Yes please, I think it makes sense and should be fairly straightforward.

On Mon, Nov 29, 2021 at 5:38 PM Niranda Perera <[email protected]> wrote:
> Should I open a JIRA on this?
>
> On Mon, Nov 29, 2021, 10:52 Alessandro Molina <
> [email protected]> wrote:
>
>> Oh, oops, sorry, my fault, I understood the question reversed :D
>>
>> I think that if we had a compute function that returns the indices of a
>> matching value, it could also be applied to masks to retrieve the indices
>> of any "true" value, thus also solving your question if combined with is_in
>> (or any other predicate at that point). That might be a reasonable addition
>> to compute functions.
>>
>>
>> On Sun, Nov 28, 2021 at 7:00 AM Niranda Perera <[email protected]>
>> wrote:
>>
>>> Hi guys, sorry for the late reply.
>>>
>>> Yes, Joris is right. I want the converse (I think 😊 ) of index_in. I
>>> was discussing this with Eduardo in zulip [1].
>>>
>>> I was hoping that I could do this.
>>> ```
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>>
>>> values = pa.array([1, 2, 2, 3, 4, 1])
>>> to_find = pa.array([1, 2, 1])
>>> indices = pc.index_in(to_find, value_set=values)
>>> # expected = [0, 5, 1, 2, 0, 5]
>>> # received = [0, 1, 0]
>>> ```
>>> So, index_in does not handle duplicated indices of values (I am guessing
>>> it creates a hashmap of values, and not a multimap).
>>>
>>> One suggestion was to use `aggregations.index`. And I think that might
>>> work iteratively, as follows. But I haven't tested this.
>>> ```
>>> indices = []
>>> for f in to_find:
>>>     idx = -1
>>>     while True:
>>>         # pc.index returns a scalar; convert it to a Python int
>>>         idx = pc.index(values, f, start=idx + 1, end=len(values)).as_py()
>>>         if idx == -1:
>>>             break
>>>         indices.append(idx)
>>> ```
>>>
>>> But I was wondering whether it would make sense to provide a method that
>>> finds all indices of a value (the inner while loop)?
>>>
>>> Best
>>>
>>> [1]
>>> https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Find.20a.20value.20indices.20in.20an.20array/near/262351923
>>>
>>>
>>> On Thu, Nov 25, 2021 at 3:14 PM Joris Van den Bossche <
>>> [email protected]> wrote:
>>>
>>>> I think "index_in" does the lookup the other way around? It gives,
>>>> for each value of the array, the index in the set. While if I
>>>> understand the question correctly, Niranda is looking for the index
>>>> into the array for elements that are present in the set.
>>>>
>>>> Something like that could be achieved by using "is_in", and then
>>>> getting the indices of the True values:
>>>>
>>>> >>> pc.is_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>>> <pyarrow.lib.BooleanArray object at 0x7fcc96896a00>
>>>> [
>>>>   true,
>>>>   false,
>>>>   true
>>>> ]
>>>>
>>>> To get the location of the True values, in numpy this is called
>>>> "nonzero", and we have an open JIRA for adding this as a kernel
>>>> (https://issues.apache.org/jira/browse/ARROW-13035)
>>>>
>>>> On Thu, 25 Nov 2021 at 11:17, Alessandro Molina
>>>> <[email protected]> wrote:
>>>> >
>>>> > I think index_in is what you are looking for
>>>> >
>>>> > >>> pc.index_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>>> > <pyarrow.lib.Int32Array object at 0x11e2a6580>
>>>> > [
>>>> >   0,
>>>> >   null,
>>>> >   1
>>>> > ]
>>>> >
>>>> > On Sat, Nov 20, 2021 at 4:49 AM Niranda Perera <
>>>> > [email protected]> wrote:
>>>> >>
>>>> >> Hi all, is there a compute API for searching a value index (and a
>>>> >> set of values) in an Array?
>>>> >> ex:
>>>> >> ```python
>>>> >> a = [1, 2, 2, 3, 4, 1]
>>>> >> values = pa.array([1, 2, 1])
>>>> >>
>>>> >> index = find_index(a, 1)  # = [0, 5]
>>>> >> indices = find_indices(a, values)  # = [0, 1, 2, 5]
>>>> >> ```
>>>> >> I am currently using `compute.is_in` and traversing the true indices
>>>> >> of the result Bitmap. Is there a better way?
>>>> >>
>>>> >> Best
>>>> >> --
>>>> >> Niranda Perera
>>>> >> https://niranda.dev/
>>>> >> @n1r44
>>>> >>
>>>>
>>>
>>>
>>> --
>>> Niranda Perera
>>> https://niranda.dev/
>>> @n1r44 <https://twitter.com/N1R44>
>>>
>>>
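
For reference, here is a minimal plain-Python sketch of the two behaviours discussed in the thread. It is only an illustration of the intended semantics, not pyarrow code and not a proposed kernel: the function names `find_all_indices` and `find_matching_positions` are hypothetical. The first mimics the "multimap" lookup (every position of each searched value, in search order), the second mimics "is_in followed by nonzero" (positions in the array whose value appears in the set).

```python
from collections import defaultdict


def find_all_indices(values, to_find):
    """Hypothetical helper: for each item of to_find, emit every
    position at which it occurs in values (multimap lookup)."""
    # Build value -> list of positions, keeping duplicates.
    positions = defaultdict(list)
    for i, v in enumerate(values):
        positions[v].append(i)
    out = []
    for f in to_find:
        # .get avoids inserting empty entries for missing values.
        out.extend(positions.get(f, []))
    return out


def find_matching_positions(values, value_set):
    """Hypothetical helper: positions of values that are members of
    value_set, i.e. the 'true' slots of an is_in-style mask."""
    wanted = set(value_set)
    return [i for i, v in enumerate(values) if v in wanted]


values = [1, 2, 2, 3, 4, 1]
to_find = [1, 2, 1]
print(find_all_indices(values, to_find))         # [0, 5, 1, 2, 0, 5]
print(find_matching_positions(values, to_find))  # [0, 1, 2, 5]
```

The first result matches the "expected" output from the Nov 28 message, and the second matches the `find_indices` example in the original question.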
