Hi Niranda,

Thanks a lot for the quick response! Yes, you're absolutely right, that should work! I somehow missed that the filter compute function takes _any_ input type.
So that only leaves (2)... Using (1) to filter the result of list_parent_indices gets pretty close:

filter([0, 0, 0, 1, 1, 2, 2, 2, 3, 4, 4],
       [false, false, true, false, false, false, true, false, false, false, false])
  = [0, 2]

Is there a shortcut in Arrow to transform a list of indices to a corresponding boolean mask ([0, 2] --> [true, false, true, false, false])?

Cheers,
Leo

On Fri, Aug 13, 2021 at 10:45 PM Niranda Perera <[email protected]> wrote:

> Hi Leo,
>
> Can't you call compute.filter with the resultant bool array from step (2)?
> 🤔
>
> On Fri, Aug 13, 2021, 06:17 Leonhard Gruenschloss <[email protected]> wrote:
>
>> Hi,
>>
>> I'd like to filter a ListArray, based on whether a particular value is
>> present in each list. Is there a better approach than the one described
>> below? Particularly, are there any existing compute functions that I could
>> use instead?
>>
>> Here's a concrete example, with rows consisting of variable-length lists
>> of strings:
>> ["a", "b", "x"]
>> ["c", "d"]
>> ["e", "x", "a"]
>> ["c"]
>> ["d", "e"]
>>
>> If the element to search for is "x", only the first and third row would
>> be retained after filtering:
>> ["a", "b", "x"]
>> ["e", "x", "a"]
>>
>> To implement this, the following should work, but is there a better way?
>>
>> (1) Run the "equal" compute function on the values of the list:
>> [false, false, true, false, false, false, true, false, false, false,
>> false]
>>
>> (2) Linearly scan the result of (1) in lockstep with the list's offsets,
>> to keep track of which rows matched:
>> [true, false, true, false, false]
>>
>> (3) Expand the result of (2) by the list lengths:
>> [true, true, true, false, false, true, true, true, false, false, false]
>>
>> (4) Use the "filter" compute function (using the result from (3)) to copy
>> only the matching values.
>> ["a", "b", "x", "e", "x", "a"]
>>
>> (5) Using the result of (2), sum up lengths to compute new offsets:
>> [0, 3, 6]
>>
>> (2), (3), and (5) are of course not difficult to implement, but is there
>> maybe a trick to use existing compute functions instead? Particularly for
>> non-C++ implementations that could make a big performance difference.
>>
>> Cheers,
>> Leo
