[ https://issues.apache.org/jira/browse/IMPALA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bharath v updated IMPALA-7994:
------------------------------
    Priority: Blocker  (was: Critical)

> Queries hitting memory limit issues in release builds
> -----------------------------------------------------
>
>                 Key: IMPALA-7994
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7994
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: bharath v
>            Assignee: Bikramjeet Vig
>            Priority: Blocker
>              Labels: broken-build
>
> This usually causes multiple test failures, especially in tests running 
> around the time memory is oversubscribed. The failures I noticed in one of 
> the builds are:
> {noformat}
>  query_test.test_queries.TestQueriesTextTables.test_random[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none]      19 sec  1
>  
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: parquet/none]       19 sec  1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: avro/snap/block] 2.2 sec 1
>  
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol:
>  beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': 
> True} | table_format: seq/gzip/block]       1.8 sec 1
>  query_test.test_queries.TestQueriesTextTables.test_values[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none]       60 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: rc/bzip/block]   7 ms    1
>  
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol:
>  beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': 
> True} | table_format: text/none]    60 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: rc/bzip/block]    7 ms    1
>  
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: parquet/none]      76 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/lzo/block]      7 ms    1
>  verifiers.test_verify_metrics.TestValidateMetrics.test_metrics_are_zero
> {noformat}
> The following is the mem-tracker dump from one of the failed queries:
> {noformat}
> Stacktrace
> query_test/test_queries.py:182: in test_random
>     self.run_test_case('QueryTest/random', vector)
> common/impala_test_suite.py:467: in run_test_case
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:688: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:170: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:182: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:359: in __execute_query
>     self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:380: in wait_for_finished
>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:Memory limit exceeded: Error occurred on backend 
> impala-ec2-centos74-m5-4xlarge-ondemand-1509.vpc.cloudera.com:22000 by 
> fragment f84d32bad98b93af:74dc2ee400000000
> E   Memory left in process limit: -190.02 MB
> E   Query(f84d32bad98b93af:74dc2ee400000000): Reservation=364.00 MB 
> ReservationLimit=9.60 GB OtherMemory=44.76 KB Total=364.04 MB Peak=2.88 GB
> E     Fragment f84d32bad98b93af:74dc2ee400000000: Reservation=364.00 MB 
> OtherMemory=44.76 KB Total=364.04 MB Peak=2.88 GB
> E       AGGREGATION_NODE (id=4): Total=16.00 KB Peak=16.00 KB
> E         NonGroupingAggregator 0: Total=8.00 KB Peak=8.00 KB
> E           Exprs: Total=4.00 KB Peak=4.00 KB
> E       AGGREGATION_NODE (id=3): Reservation=364.00 MB OtherMemory=17.12 KB 
> Total=364.02 MB Peak=2.88 GB
> E         GroupingAggregator 0: Reservation=364.00 MB OtherMemory=17.12 KB 
> Total=364.02 MB Peak=2.88 GB
> E           Exprs: Total=17.12 KB Peak=17.12 KB
> E       NESTED_LOOP_JOIN_NODE (id=2): Total=0 Peak=408.00 KB
> E         Nested Loop Join Builder: Total=0 Peak=392.00 KB
> E       HDFS_SCAN_NODE (id=0): Reservation=0 OtherMemory=0 Total=0 
> Peak=536.00 KB
> E       HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 
> Peak=208.00 KB
> E       PLAN_ROOT_SINK: Total=0 Peak=0
> E       CodeGen: Total=3.63 KB Peak=594.50 KB
> E   Process: memory limit exceeded. 
> Limit=12.00 GB Total=12.19 GB Peak=136.84 GB
> E     Buffer Pool: Free Buffers: Total=7.56 GB
> E     Buffer Pool: Clean Pages: Total=0
> E     Buffer Pool: Unused Reservation: Total=-136.01 MB
> E     Control Service Queue: Limit=50.00 MB Total=0 Peak=772.75 KB
> E     Data Stream Service Queue: Limit=614.40 MB Total=0 Peak=796.13 MB
> E     Data Stream Manager Early RPCs: Total=0 Peak=100.00 KB
> E     TCMalloc Overhead: Total=243.58 MB
> E     RequestPool=default-pool: Total=500.11 MB Peak=136.23 GB
> E       Query(f84d32bad98b93af:74dc2ee400000000): Reservation=364.00 MB 
> ReservationLimit=9.60 GB OtherMemory=44.76 KB Total=364.04 MB Peak=2.88 GB
> E       Query(7b409553a87bb311:1a7031fa00000000): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=0 Total=0 Peak=1.04 GB
> E       Query(84d0244f7bac9f9:46fd9f6b00000000): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=0 Total=0 Peak=74.49 MB
> E       Query(244afa49159c5336:24e0097100000000): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=40.00 KB Total=40.00 KB Peak=40.00 KB
> E       Query(ac40b390f31d5add:4eff02bf00000000): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=25.00 KB Total=25.00 KB Peak=25.00 KB
> E       Query(3644d3cb42492c6c:c6efd0f800000000): Reservation=136.01 MB 
> ReservationLimit=9.60 GB OtherMemory=8.00 KB Total=136.02 MB Peak=136.02 MB
> E     RequestPool=fe-eval-exprs: Total=0 Peak=256.01 MB
> E     Untracked Memory: Total=4.03 GB
> {noformat}
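A quick sanity check on the dump above (my arithmetic using the report's own rounded figures, not part of the original report): the "Untracked Memory" line should be roughly the process total minus the sum of the tracked children, and it is.

```python
# Back-of-the-envelope check of the mem-tracker dump, with every figure
# taken from the dump above and converted to GB.
MB = 1.0 / 1024  # GB per MB

process_total = 12.19  # "Process: ... Total=12.19 GB"

tracked = (
    7.56             # Buffer Pool: Free Buffers
    + 0.0            # Buffer Pool: Clean Pages
    - 136.01 * MB    # Buffer Pool: Unused Reservation (negative)
    + 0.0            # Control Service Queue
    + 0.0            # Data Stream Service Queue
    + 0.0            # Data Stream Manager Early RPCs
    + 243.58 * MB    # TCMalloc Overhead
    + 500.11 * MB    # RequestPool=default-pool
    + 0.0            # RequestPool=fe-eval-exprs
)

untracked = process_total - tracked
print(f"untracked = {untracked:.2f} GB")  # close to the reported 4.03 GB
```

The residual agrees with the reported Untracked Memory to within the rounding of the dump's own numbers, so the dump is internally consistent; the open question is what that ~4 GB of untracked memory actually is.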
> We can see that the untracked memory is on the higher side, and that one of 
> the queries listed above, {{7b409553a87bb311:1a7031fa00000000}}, corresponds 
> to {{create table test_insert_large_string_dde7e595.insert_largestring stored 
> as parquet as select repeat('AZ', 128 * 1024 * 1024) as s}}. I can reproduce 
> the untracked memory peak when I run this query locally on my dev box.
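For scale (my arithmetic, not from the report): {{repeat('AZ', 128 * 1024 * 1024)}} builds a single 256 MiB string, so only a dozen or so untracked copies of it would account for the ~4 GB untracked total.

```python
# Size of the value built by repeat('AZ', 128 * 1024 * 1024):
# 'AZ' is 2 bytes, repeated 128 Mi times.
size_bytes = 2 * 128 * 1024 * 1024
size_mib = size_bytes // (1024 * 1024)
print(size_mib)  # 256 (MiB) for one copy of the string
```

Whether the string is actually materialized and copied multiple times without being charged to a tracker is speculation on my part; it is just one plausible way a single statement could drive untracked memory into the gigabytes.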



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
