Hi Liya, Thanks for the pointers. Your suggestion looks very similar to the FixedWidthOutOfPlaceSorter.sortOutOfPlace() and VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace() implementations, where they get the sorted indices first, and then copy data by the sorted indices to an output buffer. But when you mentioned that I could "design an efficient algorithm based on your specific data properties", are you suggesting there could be a sorting implementation more efficient than a generalized struct-based implementation of FixedWidthOutOfPlaceSorter.sortOutOfPlace() and VariableWidthOutOfPlaceVectorSorter.sortOutOfPlace()?
Or are you referring to building a generalized struct-based FixedWidthInPlaceVectorSorter (assuming my struct only has fixed-width child vectors)? Best, Daniel On 2021/08/26 03:38:59, Fan Liya <[email protected]> wrote: > Hi Daniel,> > > The following method can be used, although it may not be the most efficient> > one:> > > 1. You need to provide an implementation of> > VectorValueComparator<StructVector> based on your custom requirements.> > 2. Sort your struct vector by> > the org.apache.arrow.algorithm.sort.IndexSorter, which will produce a> > vector with the positions of vector elements in sorted order.> > 3. Generate the sorted vector by the element positions generated in step 2.> > Note that we can use another struct vector and call the> > StructVector#copyFrom API.> > > The above process is an out-of-place sort. An in-place sort is feasible> > only if all fields are fixed-width, and you can design an efficient> > algorithm based on your specific data properties.> > > In our current implementation, we do not have a default> > VectorValueComparator for struct vectors, and we have no plan to provide> > one.> > However, we may provide a general out-of-place sorter for vectors of> > arbitrary types.> > > Best,> > Liya Fan> > > > > > > > On Thu, Aug 26, 2021 at 5:55 AM Daniel Hsu . <[email protected]>> > wrote:> > > > Hi,> > >> > > My name is Daniel Hsu and I'm an engineer at ByteDance. We're exploring> > > Apache Arrow, and have come across a use case that we're not sure about.> > >> > > We represent our columnar dataset as a StructVector that has one child> > > vector per column. We'd like to sort a StructVector by a composite key from> > > multiple of the child vectors, but it doesn't seem like this use case is> > > supported because:> > >> > > 1. FixedWidthInPlaceVectorSorter and FixedWidthOutOfPlaceVectorSorter only> > > work on fixed width vectors, and a StructVector is not fixed width vector.> > > 2. VariableWidthOutOfPlaceVectorSorter only works> > > on BaseVariableWidthVector, and StructVector is not a> > > BaseVariableWidthVector.> > >> > > And while index sorting does work on StructVectors, it isn't able to solve> > > our use case.> > >> > > Is there a recommendation on how to sort a StructVector, or more generally> > > how to sort multiple vectors by composite keys? I've attached a simple Java> > > file that contains some sample code to demonstrate what I'm referring to.> > >> > > Note: I sent this email to [email protected] yesterday, but> > > on second thought I'm not sure if [email protected] is> > > meant for questions.> > >> > > Best,> > > Daniel> > >> >
