I think both ways can be explored, depending on your specific scenario. In particular, if all child vectors are fixed-sized, you can design an in-place sorter.
Best, Liya Fan On Thu, Aug 26, 2021 at 5:10 PM Daniel Hsu . <[email protected]> wrote: > Hi Liya, > > Thanks for the pointers. Your suggestion looks very similar to the > FixedWidthOutOfPlaceSorter.sortOutOfPlace() and > VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace() implementations, where > they get the sorted indices first, and then copy data by the sorted indices > to an output buffer. But when you mentioned that I could "design an > efficient algorithm based on your specific data properties", are > you suggesting there could be a sorting implementation more efficient than > a generalized struct-based implementation of > FixedWidthOutOfPlaceSorter.sortOutOfPlace() and > VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace()? > > Or are you referring to building a generalized struct-based > FixedWidthInPlaceVectorSorter (assuming my struct only has fixed-width > child vectors)? > > Best, > Daniel > > On 2021/08/26 03:38:59, Fan Liya <[email protected]> wrote: > > Hi Daniel,> > > > > The following method can be used, although it may not be the most > efficient> > > one:> > > > > 1. You need to provide an implementation of> > > VectorValueComparator<StructVector> based on your custom requirements.> > > 2. Sort your struct vector by> > > the org.apache.arrow.algorithm.sort.IndexSorter, which will produce a> > > vector with the positions of vector elements in sorted order.> > > 3. Generate the sorted vector by the element positions generated in step > 2.> > > Note that we can use another struct vector and call the> > > StructVector#copyFrom API.> > > > > The above process is an out-of-place sort. An in-place sort is feasible> > > only if all fields are fixed-width, and you can design an efficient> > > algorithm based on your specific data properties.> > > > > In our current implementation, we do not have a default> > > VectorValueComparator for struct vectors, and we have no plan to > provide> > > one.> > > However, we may provide a general out-of-place sorter for vectors of> > > arbitrary types.> > > > > Best,> > > Liya Fan> > > > > > > > > > > > > > > On Thu, Aug 26, 2021 at 5:55 AM Daniel Hsu . <[email protected]>> > > wrote:> > > > > > Hi,> > > >> > > > My name is Daniel Hsu and I'm an engineer at ByteDance. We're > exploring> > > > Apache Arrow, and have come across a use case that we're not sure > about.> > > >> > > > We represent our columnar dataset as a StructVector that has one > child> > > > vector per column. We'd like to sort a StructVector by a composite key > from> > > > multiple of the child vectors, but it doesn't seem like this use case > is> > > > supported because:> > > >> > > > 1. FixedWidthInPlaceVectorSorter and FixedWidthOutOfPlaceVectorSorter > only> > > > work on fixed width vectors, and a StructVector is not fixed width > vector.> > > > 2. VariableWidthOutOfPlaceVectorSorter only works> > > > on BaseVariableWidthVector, and StructVector is not a> > > > BaseVariableWidthVector.> > > >> > > > And while index sorting does work on StructVectors, it isn't able to > solve> > > > our use case.> > > >> > > > Is there a recommendation on how to sort a StructVector, or more > generally> > > > how to sort multiple vectors by composite keys? I've attached a simple > Java> > > > file that contains some sample code to demonstrate what I'm referring > to.> > > >> > > > Note: I sent this email to [email protected] yesterday, > but> > > > on second thought I'm not sure if [email protected] is> > > > meant for questions.> > > >> > > > Best,> > > > Daniel> > > >> > > >
