Hi guys, sorry for the late reply.
Yes, Joris is right. I want the converse (I think 😊) of `index_in`. I was
discussing this with Eduardo on Zulip [1].
I was hoping that I could do this:
```
import pyarrow as pa
import pyarrow.compute as pc

values = pa.array([1, 2, 2, 3, 4, 1])
to_find = pa.array([1, 2, 1])
indices = pc.index_in(to_find, value_set=values)
# expected = [0, 5, 1, 2, 0, 5]
# received = [0, 1, 0]
```
So `index_in` does not handle duplicate entries in `values` (I am guessing it
builds a hash map of the values, not a multimap).
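To illustrate the multimap idea, here is a minimal pure-Python sketch. It is
not how Arrow implements anything; `build_multimap` and `find_all` are
hypothetical names used only for this example.

```python
from collections import defaultdict


def build_multimap(values):
    """Map each distinct value to the list of all positions where it occurs."""
    positions = defaultdict(list)
    for i, v in enumerate(values):
        positions[v].append(i)
    return positions


def find_all(values, to_find):
    """For each element of to_find, emit every position of it in values."""
    positions = build_multimap(values)
    out = []
    for v in to_find:
        # .get avoids inserting empty lists for values that are absent
        out.extend(positions.get(v, []))
    return out


print(find_all([1, 2, 2, 3, 4, 1], [1, 2, 1]))  # [0, 5, 1, 2, 0, 5]
```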
One suggestion was to use the `index` aggregation (`pc.index`). I think that
might work when called repeatedly, as follows. But I haven't tested this.
```
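indices = []
for f in to_find:
    idx = -1
    while True:
        # pc.index returns an Int64Scalar; convert it to a Python int.
        # It yields -1 once there is no further match after `start`.
        idx = pc.index(values, f, start=idx + 1, end=len(values)).as_py()
        if idx == -1:
            break
        indices.append(idx)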
```
But I was wondering whether it would make sense to provide a method that finds
all indices of a value (i.e. the inner while loop)?
Best
[1]
https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Find.20a.20value.20indices.20in.20an.20array/near/262351923
On Thu, Nov 25, 2021 at 3:14 PM Joris Van den Bossche <
[email protected]> wrote:
> I think "index_in" does the index in the other way around? It gives,
> for each value of the array, the index in the set. While if I
> understand the question correctly, Niranda is looking for the index
> into the array for elements that are present in the set.
>
> Something like that could be achieved by using "is_in", and then
> getting the indices of the True values:
>
> >>> pc.is_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
> <pyarrow.lib.BooleanArray object at 0x7fcc96896a00>
> [
> true,
> false,
> true
> ]
>
> To get the location of the True values, in numpy this is called
> "nonzero", and we have an open JIRA for adding this as a kernel
> (https://issues.apache.org/jira/browse/ARROW-13035)
>
> On Thu, 25 Nov 2021 at 11:17, Alessandro Molina
> <[email protected]> wrote:
> >
> > I think index_in is what you are looking for
> >
> > >>> pc.index_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
> > <pyarrow.lib.Int32Array object at 0x11e2a6580>
> > [
> > 0,
> > null,
> > 1
> > ]
> >
> > On Sat, Nov 20, 2021 at 4:49 AM Niranda Perera <[email protected]> wrote:
> >>
> >> Hi all, is there a compute API for searching a value index (and a set of values) in an Array?
> >> ex:
> >> ```python
> >> a = [1, 2, 2, 3, 4, 1]
> >> values= pa.array([1, 2, 1])
> >>
> >> index = find_index(a, 1) # = [0, 5]
> >> indices = find_indices(a, values) # = [0, 1, 2, 5]
> >> ```
> >> I am currently using `compute.is_in` and traversing the true indices of the result Bitmap. Is there a better way?
> >>
> >> Best
> >> --
> >> Niranda Perera
> >> https://niranda.dev/
> >> @n1r44
> >>
>
--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>