Antoine Pitrou created ARROW-18037:
--------------------------------------

             Summary: [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns
                 Key: ARROW-18037
                 URL: https://issues.apache.org/jira/browse/ARROW-18037
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Antoine Pitrou
As found while working on ARROW-18004: the dataset scanner and the Acero engine rely on {{ExecBatch::ToRecordBatch}} succeeding when the given schema has fewer fields than the ExecBatch has columns. This apparently lets the dataset-added columns ({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) be dropped implicitly from a scan's final result.

However, doing this implicitly at the {{ExecBatch::ToRecordBatch}} level seems wrong and brittle: the same silent truncation would also hide a genuine mismatch between a schema and a batch. Instead, the extra columns should probably be dropped explicitly inside Acero/dataset.
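To make the distinction concrete, here is a minimal, self-contained C++ model. It does not use the Arrow C++ API: {{ExecBatchModel}}, {{SchemaModel}}, {{RecordBatchModel}}, {{ToRecordBatchLenient}}, {{ToRecordBatchStrict}} and {{DropAugmentedColumns}} are hypothetical names, and it assumes (for illustration only) that the augmented columns are appended after the dataset's own fields. The lenient conversion reproduces the truncation the scanner currently relies on; the strict one rejects any column-count mismatch, so the caller has to drop the augmented columns explicitly first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for Arrow types; this is NOT the Arrow C++ API.
using Column = std::vector<int>;

struct ExecBatchModel {
  std::vector<Column> values;  // unnamed columns, names come from a schema
};

struct SchemaModel {
  std::vector<std::string> field_names;
};

struct RecordBatchModel {
  std::vector<std::string> names;
  std::vector<Column> columns;
};

// Lenient conversion: pairs the schema's fields with the first N columns and
// silently ignores any trailing columns (the behaviour described above).
RecordBatchModel ToRecordBatchLenient(const ExecBatchModel& batch,
                                      const SchemaModel& schema) {
  if (schema.field_names.size() > batch.values.size()) {
    throw std::invalid_argument("schema has more fields than batch has columns");
  }
  RecordBatchModel out{schema.field_names, {}};
  for (std::size_t i = 0; i < schema.field_names.size(); ++i) {
    out.columns.push_back(batch.values[i]);
  }
  return out;
}

// Strict conversion: any mismatch in column count is reported as an error.
RecordBatchModel ToRecordBatchStrict(const ExecBatchModel& batch,
                                     const SchemaModel& schema) {
  if (schema.field_names.size() != batch.values.size()) {
    throw std::invalid_argument("schema field count != batch column count");
  }
  return RecordBatchModel{schema.field_names, batch.values};
}

// Explicit step owned by the scanner: remove the trailing augmented columns
// (assumed here to be appended after the dataset's own fields).
ExecBatchModel DropAugmentedColumns(const ExecBatchModel& batch,
                                    std::size_t num_augmented) {
  if (num_augmented > batch.values.size()) {
    throw std::invalid_argument("more augmented columns than batch columns");
  }
  ExecBatchModel out;
  out.values.assign(batch.values.begin(),
                    batch.values.end() - static_cast<std::ptrdiff_t>(num_augmented));
  return out;
}
```

With this split, the conversion stays strict and the only code that knows about the augmented columns is the code that removes them, so an unrelated schema/batch mismatch surfaces as an error instead of being silently truncated.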