+1. We have the same issue. A direct kernel would be very useful.

On Wed, Nov 2, 2022 at 1:46 PM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
> While there are indeed some workarounds possible by composing the
> existing kernels (as David shows), we should ideally have a direct
> kernel for this kind of operation, but that kernel currently doesn't
> exist.
>
> I recently ran into a similar issue, and I opened
> https://issues.apache.org/jira/browse/ARROW-18097 about a
> "list_contains" scalar kernel, which would already allow checking
> against a single value. Maybe we then also want a "list_is_in" kernel
> for checking with multiple values (although one could already combine
> multiple "list_contains" calls).
>
> Joris
>
> On Wed, 2 Nov 2022 at 20:01, Suresh V <suresh0...@gmail.com> wrote:
> >
> > Hi David, thank you very much for the response. I apologize for not
> > posing the question correctly.
> >
> > The method you have does give the right answer, but it results in
> > multiple new objects and multiple data passes.
> >
> > I was looking for a kernel which avoids that, as I am dealing with
> > really large arrays. Please let me know if I am not being clear.
> >
> > Thanks again for your help.
> >
> > On Wed, Nov 2, 2022, 2:40 PM Lee, David <david....@blackrock.com> wrote:
> >>
> >> Slight correction for 3 or 4 instead of just 3..
> >>
> >> result = pc.is_in(list(range(len(arr))), pc.filter(indices, pc.is_in(flat_arr, pa.array([3,4]))))
> >>
> >> From: Lee, David
> >> Sent: Wednesday, November 2, 2022 11:26 AM
> >> To: user@arrow.apache.org
> >> Subject: RE: Filter a list array based on the contents of the list.
> >>
> >> This works..
> >>
> >> import pyarrow as pa
> >> import pyarrow.compute as pc
> >>
> >> arr = pa.array([[1,2],[3],[3,4,5]])
> >>
> >> indices = pc.list_parent_indices(arr)
> >> flat_arr = pc.list_flatten(arr)
> >>
> >> result = pc.is_in(list(range(len(arr))), pc.filter(indices, pc.equal(flat_arr, 3)))
> >>
> >> >>> result
> >> <pyarrow.lib.BooleanArray object at 0x00000243EA2D4D00>
> >> [
> >>   false,
> >>   true,
> >>   true
> >> ]
> >>
> >> From: Suresh V <suresh0...@gmail.com>
> >> Sent: Wednesday, November 2, 2022 10:23 AM
> >> To: user@arrow.apache.org
> >> Subject: Filter a list array based on the contents of the list.
> >>
> >> Hi ..
> >>
> >> Is there a compute function I can use to filter an array with list
> >> entries based on the contents of the list?
> >>
> >> For example:
> >>
> >> arr = pa.array([[1,2],[3],[3,4,5]])
> >>
> >> I want to run a compute function which returns true if an entry
> >> contains 3 or 4.
> >>
> >> Expected output is:
> >>
> >> pa.array([False, True, True])
> >>
> >> The closest I could find was map lookup, which expects the entries
> >> to be maps.
> >>
> >> Thanks