Andrew Sherman created IMPALA-10588:
---------------------------------------

             Summary: PlannerTest/resource-requirements.test fails with bad mem estimates (from number of files?)
                 Key: IMPALA-10588
                 URL: https://issues.apache.org/jira/browse/IMPALA-10588
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 4.0
            Reporter: Andrew Sherman


We see an unexpected plan for "select * from tpch_orc_def.lineitem" 
with Hive v3.

The first line that differs is

{code}
Per-Host Resource Estimates: Memory=188MB 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
{code}

but in the scan we see
{code}
HDFS partitions=1/1 files=1 size=142.84MB
{code}
instead of the expected 
{code}
HDFS partitions=1/1 files=5 size=142.72MB
{code}
Could this be a regression from the recent change IMPALA-10503, which changed 
data loading?
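
To make the comparison of the two scan lines concrete, here is a minimal Python sketch that parses the key=value fields from each line and reports the ones that differ. The helper names (parse_fields, diff_fields) are hypothetical and are not part of Impala's test framework.

```python
import re

def parse_fields(line):
    """Return a dict of the key=value tokens found in one plan line."""
    return dict(re.findall(r"([\w-]+)=(\S+)", line))

def diff_fields(actual, expected):
    """Map each differing key to its (actual, expected) pair of values."""
    a, e = parse_fields(actual), parse_fields(expected)
    return {
        key: (a.get(key), e.get(key))
        for key in sorted(set(a) | set(e))
        if a.get(key) != e.get(key)
    }

# The two scan lines quoted in this report.
actual = "HDFS partitions=1/1 files=1 size=142.84MB"
expected = "HDFS partitions=1/1 files=5 size=142.72MB"

for key, (got, want) in diff_fields(actual, expected).items():
    print(f"{key}: actual={got} expected={want}")
```

For the lines above this reports that only files and size differ; partitions is identical in both.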

{code}
Section PLAN of query:
select * from tpch_orc_def.lineitem

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=12.00MB Threads=2
Per-Host Resource Estimates: Memory=188MB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Analyzed query: SELECT * FROM tpch_orc_def.lineitem

F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=188.00MB mem-reservation=12.00MB thread-reservation=2
PLAN-ROOT SINK
|  output exprs: tpch_orc_def.lineitem.l_orderkey, tpch_orc_def.lineitem.l_partkey, tpch_orc_def.lineitem.l_suppkey, tpch_orc_def.lineitem.l_linenumber, tpch_orc_def.lineitem.l_quantity, tpch_orc_def.lineitem.l_extendedprice, tpch_orc_def.lineitem.l_discount, tpch_orc_def.lineitem.l_tax, tpch_orc_def.lineitem.l_returnflag, tpch_orc_def.lineitem.l_linestatus, tpch_orc_def.lineitem.l_shipdate, tpch_orc_def.lineitem.l_commitdate, tpch_orc_def.lineitem.l_receiptdate, tpch_orc_def.lineitem.l_shipinstruct, tpch_orc_def.lineitem.l_shipmode, tpch_orc_def.lineitem.l_comment
|  mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
00:SCAN HDFS [tpch_orc_def.lineitem]
   HDFS partitions=1/1 files=1 size=142.84MB
   stored statistics:
     table: rows=6.00M size=142.84MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=6.00M
   mem-estimate=88.00MB mem-reservation=8.00MB thread-reservation=1
   tuple-ids=0 row-size=231B cardinality=6.00M
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=12.00MB Threads=2
Per-Host Resource Estimates: Memory=140MB
Analyzed query: SELECT * FROM tpch_orc_def.lineitem

F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=140.00MB mem-reservation=12.00MB thread-reservation=2
PLAN-ROOT SINK
|  output exprs: tpch_orc_def.lineitem.l_orderkey, tpch_orc_def.lineitem.l_partkey, tpch_orc_def.lineitem.l_suppkey, tpch_orc_def.lineitem.l_linenumber, tpch_orc_def.lineitem.l_quantity, tpch_orc_def.lineitem.l_extendedprice, tpch_orc_def.lineitem.l_discount, tpch_orc_def.lineitem.l_tax, tpch_orc_def.lineitem.l_returnflag, tpch_orc_def.lineitem.l_linestatus, tpch_orc_def.lineitem.l_shipdate, tpch_orc_def.lineitem.l_commitdate, tpch_orc_def.lineitem.l_receiptdate, tpch_orc_def.lineitem.l_shipinstruct, tpch_orc_def.lineitem.l_shipmode, tpch_orc_def.lineitem.l_comment
|  mem-estimate=100.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
00:SCAN HDFS [tpch_orc_def.lineitem]
   HDFS partitions=1/1 files=5 size=142.72MB
   stored statistics:
     table: rows=6.00M size=142.72MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.73M
   mem-estimate=40.00MB mem-reservation=8.00MB thread-reservation=1
   tuple-ids=0 row-size=231B cardinality=6.00M
   in pipelines: 00(GETNEXT)
{code}
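
In both plans above, the fragment-level mem-estimate equals the PLAN-ROOT SINK estimate plus the scan estimate, so the whole difference in the per-host estimate comes from the scan node. The short Python check below works through that arithmetic using only the values quoted in this report. The names are illustrative, and this is an observation about these two plans, not a statement of how the planner computes estimates in general.

```python
# Values in MB, copied from the actual and expected plans in this report.
SINK_MB = 100.0

plans = {
    "actual": {"fragment": 188.0, "scan": 88.0},
    "expected": {"fragment": 140.0, "scan": 40.0},
}

def scan_delta_mb(plans):
    """Difference between the actual and expected scan-node estimates."""
    return plans["actual"]["scan"] - plans["expected"]["scan"]

def fragment_delta_mb(plans):
    """Difference between the actual and expected fragment-level estimates."""
    return plans["actual"]["fragment"] - plans["expected"]["fragment"]

for name, p in plans.items():
    # In these two plans, the fragment estimate is the sink plus the scan.
    assert p["fragment"] == SINK_MB + p["scan"], name

print(f"scan delta: {scan_delta_mb(plans)} MB")
print(f"fragment delta: {fragment_delta_mb(plans)} MB")
```

The two deltas are equal, which is consistent with the scan-node estimate (and so possibly the file count and max-scan-range-rows) being the only source of the mismatch.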


