Weston Pace created ARROW-16549:
-----------------------------------

             Summary: [C++] Simplify AggregateNodeOptions aggregates/targets
                 Key: ARROW-16549
                 URL: https://issues.apache.org/jira/browse/ARROW-16549
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Currently AggregateNodeOptions is:

{noformat}
class ARROW_EXPORT AggregateNodeOptions : public ExecNodeOptions {
 public:
  // aggregations which will be applied to the targetted fields
  std::vector<internal::Aggregate> aggregates;
  // fields to which aggregations will be applied
  std::vector<FieldRef> targets;
  // output field names for aggregations
  std::vector<std::string> names;
  // keys by which aggregations will be grouped
  std::vector<FieldRef> keys;
};
{noformat}

It is not very obvious how {{aggregates}} and {{targets}} are related.  My 
initial read of the comments led me to think that each aggregate would be 
applied to each target and you would end up with {{len(aggregates) * 
len(targets)}} output fields.  In reality the {{aggregate}} at index {{i}} only 
applies to the {{target}} at index {{i}}.  It would be simpler to add a 
{{FieldRef target}} to {{internal::Aggregate}} (and {{Aggregate}} should not be 
{{internal}}).

Alternatively, the entire {{internal::Aggregate}} could be replaced by a "call" 
{{arrow::compute::Expression}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to