I think both ways can be explored, depending on your specific scenario.
In particular, if all child vectors are fixed-sized, you can design an
in-place sorter.

Best,
Liya Fan


On Thu, Aug 26, 2021 at 5:10 PM Daniel Hsu . <[email protected]>
wrote:

> Hi Liya,
>
> Thanks for the pointers. Your suggestion looks very similar to the
> FixedWidthOutOfPlaceSorter.sortOutOfPlace() and
> VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace() implementations, where
> they get the sorted indices first, and then copy data by the sorted indices
> to an output buffer. But when you mentioned that I could "design an
> efficient algorithm based on your specific data properties", are
> you suggesting there could be a sorting implementation more efficient than
> a generalized struct-based implementation of
> FixedWidthOutOfPlaceSorter.sortOutOfPlace() and
> VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace()?
>
> Or are you referring to building a generalized struct-based
> FixedWidthInPlaceVectorSorter (assuming my struct only has fixed-width
> child vectors)?
>
> Best,
> Daniel
>
> On 2021/08/26 03:38:59, Fan Liya <[email protected]> wrote:
> > Hi Daniel,>
> >
> > The following method can be used, although it may not be the most
> efficient>
> > one:>
> >
> > 1. You need to provide an implementation of>
> > VectorValueComparator<StructVector>  based on your custom requirements.>
> > 2. Sort your struct vector by>
> > the org.apache.arrow.algorithm.sort.IndexSorter, which will produce a>
> > vector with the positions of vector elements in sorted order.>
> > 3. Generate the sorted vector by the element positions generated in step
> 2.>
> > Note that we can use another struct vector and call the>
> > StructVector#copyFrom API.>
> >
> > The above process is an out-of-place sort. An in-place sort is feasible>
> > only if all fields are fixed-width, and you can design an efficient>
> > algorithm based on your specific data properties.>
> >
> > In our current implementation, we do not have a default>
> > VectorValueComparator for struct vectors, and we have no plan to
> provide>
> > one.>
> > However, we may provide a general out-of-place sorter for vectors of>
> > arbitrary types.>
> >
> > Best,>
> > Liya Fan>
> >
> >
> >
> >
> >
> >
> > On Thu, Aug 26, 2021 at 5:55 AM Daniel Hsu . <[email protected]>>
> > wrote:>
> >
> > > Hi,>
> > >>
> > > My name is Daniel Hsu and I'm an engineer at ByteDance. We're
> exploring>
> > > Apache Arrow, and have come across a use case that we're not sure
> about.>
> > >>
> > > We represent our columnar dataset as a StructVector that has one
> child>
> > > vector per column. We'd like to sort a StructVector by a composite key
> from>
> > > multiple of the child vectors, but it doesn't seem like this use case
> is>
> > > supported because:>
> > >>
> > > 1. FixedWidthInPlaceVectorSorter and FixedWidthOutOfPlaceVectorSorter
> only>
> > > work on fixed width vectors, and a StructVector is not fixed width
> vector.>
> > > 2.  VariableWidthOutOfPlaceVectorSorter only works>
> > > on BaseVariableWidthVector, and StructVector is not a>
> > > BaseVariableWidthVector.>
> > >>
> > > And while index sorting does work on StructVectors, it isn't able to
> solve>
> > > our use case.>
> > >>
> > > Is there a recommendation on how to sort a StructVector, or more
> generally>
> > > how to sort multiple vectors by composite keys? I've attached a simple
> Java>
> > > file that contains some sample code to demonstrate what I'm referring
> to.>
> > >>
> > > Note: I sent this email to [email protected] yesterday,
> but>
> > > on second thought I'm not sure if [email protected] is>
> > > meant for questions.>
> > >>
> > > Best,>
> > > Daniel>
> > >>
> >
>

Reply via email to