Hi Liya,

Thanks for the pointers. Your suggestion looks very similar to the
FixedWidthOutOfPlaceSorter.sortOutOfPlace() and
VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace() implementations, where
they get the sorted indices first, and then copy data by the sorted indices
to an output buffer. But when you mentioned that I could "design an
efficient algorithm based on your specific data properties", are
you suggesting there could be a sorting implementation more efficient than
a generalized struct-based implementation of
FixedWidthOutOfPlaceSorter.sortOutOfPlace() and
VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace()?

Or are you referring to building a generalized struct-based
FixedWidthInPlaceVectorSorter (assuming my struct only has fixed-width
child vectors)?

Best,
Daniel

On 2021/08/26 03:38:59, Fan Liya <[email protected]> wrote:
> Hi Daniel,>
>
> The following method can be used, although it may not be the most
efficient>
> one:>
>
> 1. You need to provide an implementation of>
> VectorValueComparator<StructVector>  based on your custom requirements.>
> 2. Sort your struct vector by>
> the org.apache.arrow.algorithm.sort.IndexSorter, which will produce a>
> vector with the positions of vector elements in sorted order.>
> 3. Generate the sorted vector by the element positions generated in step
2.>
> Note that we can use another struct vector and call the>
> StructVector#copyFrom API.>
>
> The above process is an out-of-place sort. An in-place sort is feasible>
> only if all fields are fixed-width, and you can design an efficient>
> algorithm based on your specific data properties.>
>
> In our current implementation, we do not have a default>
> VectorValueComparator for struct vectors, and we have no plan to provide>
> one.>
> However, we may provide a general out-of-place sorter for vectors of>
> arbitrary types.>
>
> Best,>
> Liya Fan>
>
>
>
>
>
>
> On Thu, Aug 26, 2021 at 5:55 AM Daniel Hsu . <[email protected]>>
> wrote:>
>
> > Hi,>
> >>
> > My name is Daniel Hsu and I'm an engineer at ByteDance. We're
exploring>
> > Apache Arrow, and have come across a use case that we're not sure
about.>
> >>
> > We represent our columnar dataset as a StructVector that has one child>
> > vector per column. We'd like to sort a StructVector by a composite key
from>
> > multiple of the child vectors, but it doesn't seem like this use case
is>
> > supported because:>
> >>
> > 1. FixedWidthInPlaceVectorSorter and FixedWidthOutOfPlaceVectorSorter
only>
> > work on fixed width vectors, and a StructVector is not fixed width
vector.>
> > 2.  VariableWidthOutOfPlaceVectorSorter only works>
> > on BaseVariableWidthVector, and StructVector is not a>
> > BaseVariableWidthVector.>
> >>
> > And while index sorting does work on StructVectors, it isn't able to
solve>
> > our use case.>
> >>
> > Is there a recommendation on how to sort a StructVector, or more
generally>
> > how to sort multiple vectors by composite keys? I've attached a simple
Java>
> > file that contains some sample code to demonstrate what I'm referring
to.>
> >>
> > Note: I sent this email to [email protected] yesterday,
but>
> > on second thought I'm not sure if [email protected] is>
> > meant for questions.>
> >>
> > Best,>
> > Daniel>
> >>
>

Reply via email to