Boaz Ben-Zvi created DRILL-5728:
-----------------------------------

Summary: Hash Aggregate: Useless bigint value vector in the values batch
Key: DRILL-5728
URL: https://issues.apache.org/jira/browse/DRILL-5728
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Codegen
Affects Versions: 1.11.0
Reporter: Boaz Ben-Zvi
Priority: Minor

When aggregating a non-nullable column (like *sum(l_partkey)* below), the code generation creates an extra value vector (in addition to the actual "sum" vector) which is used as a "nonNullCount". This vector is useless, as the underlying column is non-nullable, and it wastes considerable memory (8 * 64K = 512K for each value in a batch!).

Example query:

<code>
select sum(l_partkey) as slpk from cp.`tpch/lineitem.parquet` group by l_orderkey;
</code>

As can be seen in the generated code below, the bigint value vector *vv5* is only used to hold a *1* flag to note "not null":

<code>
public void updateAggrValuesInternal(int incomingRowIdx, int htRowIdx)
    throws SchemaChangeException
{
    {
        IntHolder out11 = new IntHolder();
        {
            out11 .value = vv8 .getAccessor().get((incomingRowIdx));
        }
        IntHolder in = out11;
        work0 .value = vv1 .getAccessor().get((htRowIdx));
        BigIntHolder value = work0;
        work4 .value = vv5 .getAccessor().get((htRowIdx));
        BigIntHolder nonNullCount = work4;

        SumFunctions$IntSum_add: {
            nonNullCount.value = 1;
            value.value += in.value;
        }

        work0 = value;
        vv1 .getMutator().set((htRowIdx), work0 .value);
        work4 = nonNullCount;
        vv5 .getMutator().set((htRowIdx), work4 .value);
    }
}
</code>
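To make the cost concrete, here is a minimal standalone sketch of the two shapes of the update loop, using plain Java arrays in place of value vectors. Everything in it (the class `NonNullSumSketch`, its method names, and the array-based "vectors") is hypothetical and for illustration only; it is not Drill's generated code or API. It assumes 8-byte bigint entries and 64K rows per batch, as in the figures above.

```java
import java.util.Arrays;

// Hypothetical illustration only: plain arrays stand in for Drill value vectors.
final class NonNullSumSketch {
    // Assumed number of rows in a values batch (the 64K figure from the report).
    static final int BATCH_ROWS = 64 * 1024;

    // Bytes taken by one bigint (8-byte) vector over a full batch.
    static long bigIntVectorBytes() {
        return (long) Long.BYTES * BATCH_ROWS;
    }

    // Current shape: a "sum" vector plus a nonNullCount vector that is
    // rewritten with the constant 1 on every update.
    static long[] sumWithFlag(int[] incoming, int[] htRowIdx, int groups) {
        long[] sums = new long[groups];
        long[] nonNullCount = new long[groups]; // the extra vector
        for (int i = 0; i < incoming.length; i++) {
            nonNullCount[htRowIdx[i]] = 1;
            sums[htRowIdx[i]] += incoming[i];
        }
        return sums;
    }

    // Shape without the flag: for a non-nullable input the flag carries
    // no information, so only the "sum" vector is kept.
    static long[] sumWithoutFlag(int[] incoming, int[] htRowIdx, int groups) {
        long[] sums = new long[groups];
        for (int i = 0; i < incoming.length; i++) {
            sums[htRowIdx[i]] += incoming[i];
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] incoming = {3, 4, 5};
        int[] htRowIdx = {0, 1, 0}; // hash table row (group) for each incoming row
        System.out.println(Arrays.toString(sumWithFlag(incoming, htRowIdx, 2)));
        System.out.println(Arrays.toString(sumWithoutFlag(incoming, htRowIdx, 2)));
        System.out.println(bigIntVectorBytes() + " bytes saved per aggregated value");
    }
}
```

Both methods return the same sums for non-nullable input; the only difference is the extra 8 * 64K = 524288 bytes allocated for the flag vector in the first one.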