While there are indeed some workarounds possible by composing the existing kernels (as David shows), we should ideally have a direct kernel for this kind of operation, but that kernel currently doesn't exist.
I recently ran into a similar issue, and I opened https://issues.apache.org/jira/browse/ARROW-18097 about a "list_contains" scalar kernel, which would already allow checking against a single value. Maybe we then also want a "list_is_in" kernel for checking against multiple values (although one could already combine multiple "list_contains" calls).

Joris

On Wed, 2 Nov 2022 at 20:01, Suresh V <[email protected]> wrote:
>
> Hi David .. Thank you very much for the response. I apologize for not posing
> the question correctly.
>
> The method you have does give the right answer, but it results in multiple
> new objects and multiple data passes.
>
> I was looking for a kernel which avoids that as I am dealing with really
> large arrays. Please let me know if I am not being clear.
>
> Thanks again for your help.
>
> On Wed, Nov 2, 2022, 2:40 PM Lee, David <[email protected]> wrote:
>>
>> Slight correction for 3 or 4 instead of just 3..
>>
>> result = pc.is_in(list(range(len(arr))), pc.filter(indices,
>> pc.is_in(flat_arr, pa.array([3,4]))))
>>
>> From: Lee, David
>> Sent: Wednesday, November 2, 2022 11:26 AM
>> To: [email protected]
>> Subject: RE: Filter a list array based on the contents of the list.
>>
>> This works..
>>
>> import pyarrow as pa
>> import pyarrow.compute as pc
>>
>> arr = pa.array([[1,2],[3],[3,4,5]])
>>
>> indices = pc.list_parent_indices(arr)
>> flat_arr = pc.list_flatten(arr)
>>
>> result = pc.is_in(list(range(len(arr))), pc.filter(indices,
>> pc.equal(flat_arr, 3)))
>>
>> >>> result
>> <pyarrow.lib.BooleanArray object at 0x00000243EA2D4D00>
>> [
>>   false,
>>   true,
>>   true
>> ]
>>
>> From: Suresh V <[email protected]>
>> Sent: Wednesday, November 2, 2022 10:23 AM
>> To: [email protected]
>> Subject: Filter a list array based on the contents of the list.
>>
>> Hi ..
>>
>> Is there a compute function I can use to filter an array with list entries
>> based on the contents of the list?
>>
>> For example:
>>
>> arr = pa.array([[1,2],[3],[3,4,5]]). I want to run a compute function which
>> returns true if the entries contain 3 or 4.
>>
>> Expected output is:
>>
>> pa.array([False, True, True]).
>>
>> The closest I could find was map lookup, which expects the entries to be maps.
>>
>> Thanks
