+1. We have the same issue. A direct kernel would be very useful.

On Wed, Nov 2, 2022 at 1:46 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> While there are indeed some workarounds possible by composing the
> existing kernels (as David shows), we should ideally have a direct
> kernel for this kind of operation, but that kernel currently doesn't
> exist.
>
> I recently ran into a similar issue, and I opened
> https://issues.apache.org/jira/browse/ARROW-18097 about a
> "list_contains" scalar kernel, which would already allow checking
> against a single value. Maybe we then also want a "list_is_in" kernel
> for checking with multiple values (although one could already combine
> multiple "list_contains" calls).
>
> Joris
>
> On Wed, 2 Nov 2022 at 20:01, Suresh V <suresh0...@gmail.com> wrote:
> >
> > Hi David. Thank you very much for the response. I apologize for not
> > posing the question correctly.
> >
> > The method you have does give the right answer, but it results in
> > multiple new objects and multiple data passes.
> >
> > I was looking for a kernel which avoids that, as I am dealing with really
> > large arrays. Please let me know if I am not being clear.
> >
> > Thanks again for your help.
> >
> > On Wed, Nov 2, 2022, 2:40 PM Lee, David <david....@blackrock.com> wrote:
> >>
> >> Slight correction to check for 3 or 4 instead of just 3:
> >>
> >> result = pc.is_in(list(range(len(arr))), pc.filter(indices, pc.is_in(flat_arr, pa.array([3,4]))))
> >>
> >> From: Lee, David
> >> Sent: Wednesday, November 2, 2022 11:26 AM
> >> To: user@arrow.apache.org
> >> Subject: RE: Filter a list array based on the contents of the list.
> >>
> >>
> >>
> >> This works:
> >>
> >> import pyarrow as pa
> >> import pyarrow.compute as pc
> >>
> >> arr = pa.array([[1,2],[3],[3,4,5]])
> >>
> >> indices = pc.list_parent_indices(arr)
> >> flat_arr = pc.list_flatten(arr)
> >>
> >> result = pc.is_in(list(range(len(arr))), pc.filter(indices, pc.equal(flat_arr, 3)))
> >>
> >> >>> result
> >> <pyarrow.lib.BooleanArray object at 0x00000243EA2D4D00>
> >> [
> >>   false,
> >>   true,
> >>   true
> >> ]
> >>
> >>
> >>
> >>
> >>
> >> From: Suresh V <suresh0...@gmail.com>
> >> Sent: Wednesday, November 2, 2022 10:23 AM
> >> To: user@arrow.apache.org
> >> Subject: Filter a list array based on the contents of the list.
> >>
> >>
> >>
> >> Hi,
> >>
> >> Is there a compute function I can use to filter an array with list
> >> entries based on the contents of the list?
> >>
> >> For example:
> >>
> >> arr = pa.array([[1,2],[3],[3,4,5]])
> >>
> >> I want to run a compute function which returns true if the entries
> >> contain 3 or 4.
> >>
> >> Expected output is:
> >>
> >> pa.array([False, True, True])
> >>
> >> The closest I could find was map_lookup, which expects the entries to be
> >> maps.
> >>
> >>
> >>
> >> Thanks
> >>
> >>
> >>
>
