[Numpy-discussion] Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Dear all, There is a PR that adds a lookup table approach to `unique`, shown below. You can get up to ~16x speedup for large integer arrays, at the cost of potentially greater memory usage. https://github.com/numpy/numpy/pull/21843 This is controlled by a new `kind` parameter, which is describ

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Ralf Gommers
On Tue, Jun 28, 2022 at 6:33 PM Miles Cranmer wrote: > Dear all, > > There is a PR that adds a lookup table approach to `unique`, shown below. > You can get up to ~16x speedup for large integer arrays, at the cost of > potentially greater memory usage. > I've seen multiple requests for not sorti

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread David Menéndez Hurtado
On Tue, 28 Jun 2022, 6:50 pm Ralf Gommers, wrote: > > >> ``` >> kind : {None, 'sort', 'table'}, optional >> > > Regarding the name, `'table'` is an implementation detail. The end user > should not have to care what the data structure is that is used. I suggest > to use something like "unsorte

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Thanks for the comments, Ralf! > You cannot switch the default behavior, that will break backwards compatibility. The default `kind=None` has no effect on the input/output behavior of the function. The only changes a user will see are in terms of speed and memory usage. `unique` will select this

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Ah, I did not clarify this: `kind="table"` will *also* return a sorted array. It simply does not use a sorting algorithm to get to it. This is because the table is generated using `np.arange` (i.e., already sorted) which is then masked.
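To make the "arange, then mask" idea concrete, here is a minimal sketch of how a lookup-table `unique` could work for integer arrays. It is not the code from the PR: the helper name `unique_via_table` is hypothetical, and it assumes the values fit in `int64` and that the value range (max - min) is small enough for the table to fit in memory.

```python
import numpy as np

def unique_via_table(ar):
    """Hypothetical helper: sorted unique values of an integer array,
    found with a boolean lookup table instead of a sort."""
    ar = np.asarray(ar)
    if not np.issubdtype(ar.dtype, np.integer):
        raise TypeError("this sketch only handles integer arrays")
    if ar.size == 0:
        return ar.ravel().copy()

    lo, hi = int(ar.min()), int(ar.max())

    # One boolean slot per possible value in [lo, hi]. Memory grows with
    # the value range, not with the number of elements.
    seen = np.zeros(hi - lo + 1, dtype=bool)

    # Cast to a wide index type first so that subtracting `lo` cannot
    # overflow a narrow dtype such as int8 (assumes values fit in int64).
    seen[ar.ravel().astype(np.int64) - lo] = True

    # np.arange is already sorted, so masking it yields the unique values
    # in sorted order without running any sorting algorithm.
    return np.arange(lo, hi + 1, dtype=np.int64)[seen].astype(ar.dtype)


print(unique_via_table(np.array([5, -2, 5, 0, -2])))  # [-2  0  5]
```

Both passes over the data (filling the table and masking the range) are linear, which is where the speedup over sorting would come from; the trade-off is the extra memory for the table when the value range is large.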

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Ralf Gommers
On Tue, Jun 28, 2022 at 7:21 PM Miles Cranmer wrote: > Thanks for the comments Ralf! > > > You cannot switch the default behavior, that will break backwards > compatibility. > > The default `kind=None` has no effect on input/output behavior of the > function. The only changes a user will see are

[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

2022-06-28 Thread Miles Cranmer
Regarding 2., did you have a particular approach in mind? This new lookup table method is already O(n) scaling (similar to a counting sort), so I cannot fathom a method that, as you suggest, would get significantly better performance for integer arrays. The sorting here is "free" in some sense s