[ https://issues.apache.org/jira/browse/ARROW-18344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634877#comment-17634877 ]
Antoine Pitrou edited comment on ARROW-18344 at 11/16/22 3:11 PM:
------------------------------------------------------------------

We don't actually sort data in Arrow, we produce indices that would sort the (untouched, unsorted) data. Here we should follow the same approach, which means it can't be part of Concatenate.

We probably want something like a "merge_indices" compute function, similar to "sort_indices". The building blocks required for implementation are already there, since that's how "sort_indices" is implemented for chunked inputs.

One limitation is that this requires physical chunking to be aligned with logical sortedness? Unless we optionally allow the user to pass a vector of the boundaries between (logical) sorted chunks.

was (Author: pitrou):
We don't actually sort data in Arrow, we produce indices that would sort the (untouched, unsorted) data. Here we should follow the same approach, which means it can't be part of Concatenate.

We probably want something like a "merge_indices" compute function, similar to "sort_indices". The building blocks required for implementation are already there, since that's how "sort_indices" is implemented for chunked inputs.

One limitation is that this requires physical chunking to be aligned with logical sortedness?

> [C++] Use input pre-sortedness to create sorted table with ConcatenateTables
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-18344
>                 URL: https://issues.apache.org/jira/browse/ARROW-18344
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Rok Mihevc
>            Priority: Major
>              Labels: kernel
>
> In case of concatenating large sorted tables (e.g. sorted timeseries data)
> the resulting table is no longer sorted. However the input sortedness can be
> used to significantly speed up post concatenation sorting. A potential API
> could be to add ConcatenateTablesOptions.inputs_sorted and implement the
> logic in ConcatenateTables.
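To make the "merge_indices" idea from the comment concrete, here is a minimal sketch of a k-way merge that leaves the data untouched and only produces indices, taking the optional vector of boundaries between logically sorted runs. It is written against plain `std::vector<int64_t>` rather than Arrow arrays, and `MergeIndices` is a hypothetical name, not an existing Arrow function. It assumes each run is already sorted ascending and uses a min-heap keyed on the current head of each run, so merging N values from k runs costs O(N log k) instead of a full O(N log N) sort.

```cpp
#include <cstdint>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: MergeIndices is NOT an Arrow API. It only illustrates
// the proposed "merge_indices" idea on plain std::vector data.
//
// `values` is the concatenated data and is never modified.
// `boundaries` holds the start offset of each logically sorted run, e.g.
// {0, 3, 6} means runs [0,3), [3,6) and [6, values.size()).
// Precondition (not checked): every run is sorted in ascending order.
// Returns indices such that values[idx[0]], values[idx[1]], ... is sorted.
std::vector<int64_t> MergeIndices(const std::vector<int64_t>& values,
                                  const std::vector<int64_t>& boundaries) {
  const int64_t n = static_cast<int64_t>(values.size());
  const size_t k = boundaries.size();
  // Exclusive end offset of run r.
  auto run_end = [&](size_t r) { return r + 1 < k ? boundaries[r + 1] : n; };

  // Boundaries must start at 0, be non-decreasing and not exceed the length.
  if (n > 0 && (k == 0 || boundaries[0] != 0)) {
    throw std::invalid_argument("first run must start at offset 0");
  }
  for (size_t r = 0; r < k; ++r) {
    if (boundaries[r] > run_end(r)) {
      throw std::invalid_argument("boundaries must be non-decreasing and <= size");
    }
  }

  // Heap entry: (current position, run id). The comparator puts the smallest
  // value on top; equal values are ordered by position, keeping the merge stable.
  using Entry = std::pair<int64_t, size_t>;
  auto greater = [&](const Entry& a, const Entry& b) {
    if (values[a.first] != values[b.first]) return values[a.first] > values[b.first];
    return a.first > b.first;
  };
  std::priority_queue<Entry, std::vector<Entry>, decltype(greater)> heap(greater);

  // Seed the heap with the head of every non-empty run.
  for (size_t r = 0; r < k; ++r) {
    if (boundaries[r] < run_end(r)) heap.push({boundaries[r], r});
  }

  std::vector<int64_t> indices;
  indices.reserve(values.size());
  while (!heap.empty()) {
    auto [pos, r] = heap.top();
    heap.pop();
    indices.push_back(pos);
    // Advance within the same run if it still has elements.
    if (pos + 1 < run_end(r)) heap.push({pos + 1, r});
  }
  return indices;
}
```

For example, `MergeIndices({1, 4, 9, 2, 3, 10, 0, 5}, {0, 3, 6})` yields `{6, 0, 3, 4, 1, 7, 2, 5}`. When physical chunks already coincide with sorted runs, the boundaries are simply the chunk start offsets; passing them explicitly is what relaxes the alignment limitation raised in the comment.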
-- This message was sent by Atlassian Jira (v8.20.10#820010)