While there are indeed some possible workarounds that compose the
existing kernels (as David shows), we should ideally have a direct
kernel for this kind of operation, but that kernel currently doesn't
exist.

I recently ran into a similar issue, and I opened
https://issues.apache.org/jira/browse/ARROW-18097 about a
"list_contains" scalar kernel, which would already cover checking
against a single value. Maybe we then also want a "list_is_in" kernel
for checking with multiple values (although one could already combine
multiple "list_contains" calls).

Joris

On Wed, 2 Nov 2022 at 20:01, Suresh V <[email protected]> wrote:
>
> Hi David. Thank you very much for the response. I apologize for not posing
> the question correctly.
>
> The method you have does give the right answer, but it results in multiple 
> new objects and multiple data passes.
>
> I was looking for a kernel that avoids that, as I am dealing with really
> large arrays. Please let me know if I am not being clear.
>
> Thanks again for your help.
>
> On Wed, Nov 2, 2022, 2:40 PM Lee, David <[email protected]> wrote:
>>
>> Slight correction, to match 3 or 4 instead of just 3:
>>
>> result = pc.is_in(list(range(len(arr))), pc.filter(indices,
>>     pc.is_in(flat_arr, pa.array([3,4]))))
>>
>> From: Lee, David
>> Sent: Wednesday, November 2, 2022 11:26 AM
>> To: [email protected]
>> Subject: RE: Filter a list array based on the contents of the list.
>>
>> This works:
>>
>> import pyarrow as pa
>> import pyarrow.compute as pc
>>
>> arr = pa.array([[1,2],[3],[3,4,5]])
>>
>> # parent list index of every element in the flattened array
>> indices = pc.list_parent_indices(arr)
>> flat_arr = pc.list_flatten(arr)
>>
>> # keep the parent indices of elements equal to 3, then flag which rows appear
>> result = pc.is_in(list(range(len(arr))), pc.filter(indices,
>>     pc.equal(flat_arr, 3)))
>>
>> >>> result
>> <pyarrow.lib.BooleanArray object at 0x00000243EA2D4D00>
>> [
>>   false,
>>   true,
>>   true
>> ]
>>
>> From: Suresh V <[email protected]>
>> Sent: Wednesday, November 2, 2022 10:23 AM
>> To: [email protected]
>> Subject: Filter a list array based on the contents of the list.
>>
>> Hi,
>>
>> Is there a compute function I can use to filter an array with list entries
>> based on the contents of the list?
>>
>> For example, given arr = pa.array([[1,2],[3],[3,4,5]]), I want to run a
>> compute function that returns true for the entries that contain 3 or 4.
>>
>> Expected output is:
>>
>> pa.array([False, True, True])
>>
>> The closest I could find was map_lookup, which expects the entries to be maps.
>>
>> Thanks
>>
