[Rd] Partial matching performance in data frame rownames using [

2023-12-11 Thread Hilmar Berger via R-devel
Dear all, I have seen that others have discussed the partial matching behaviour of data.frame[idx,] in the past, in particular with respect to unexpected result sets. I am aware that one can work around this either by using match() or by switching to tibble/data.table or similar altogethe
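
For readers skimming the thread, here is a minimal sketch of the behaviour and of the match() workaround mentioned above. The data frame and the query strings are made up for illustration and do not come from the original message.

```r
# Illustrative data frame with character row names (not from the thread).
df <- data.frame(x = 1:3, row.names = c("alpha", "beta", "gamma"))

# Character row indexing with `[` partially matches row names:
# "al" is a unique prefix of "alpha", so that row is returned.
partial <- df["al", , drop = FALSE]

# match() accepts exact matches only; names without an exact match give NA.
idx <- match(c("al", "beta"), rownames(df))
exact <- df[idx, , drop = FALSE]
```

With the match() variant, a query that has no exact counterpart yields an NA row instead of silently picking up a partially matching one, so callers may want to check anyNA(idx) before subsetting.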

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-12 Thread Ivan Krylov
On Mon, 11 Dec 2023 21:11:48 +0100, Hilmar Berger via R-devel wrote: > What was unexpected in this case was that [.data.frame was > hanging for a long time (I waited about 10 minutes and then restarted > R). Also, this cannot be interrupted in interactive mode. That's unfortunate. If an op

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-13 Thread Hilmar Berger via R-devel
Dear Ivan, thanks a lot, that is helpful. Still, I feel that default partial matching cripples the functionality of data.frame for larger tables. Thanks again and best regards, Hilmar. On 12.12.23 13:55, Ivan Krylov wrote: On Mon, 11 Dec 2023 21:11:48 +0100, Hilmar Berger via R-devel wrote: W

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-16 Thread Ivan Krylov
On Wed, 13 Dec 2023 09:04:18 +0100 Hilmar Berger via R-devel wrote: > Still, I feel that default partial matching cripples the functionality > of data.frame for larger tables. Changing the default now would require a long deprecation cycle to give everyone who uses `[.data.frame` and relies on p

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-19 Thread Toby Hocking
Hi Hilmar and Ivan, I have used your code examples to write a blog post about this topic, which has figures that show the asymptotic time complexity of the various approaches, https://tdhock.github.io/blog/2023/df-partial-match/ The asymptotic complexity of partial matching appears to be quadratic
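
A rough timing sketch of the scaling described above, not the code from the blog post. It assumes that character row lookups in `[.data.frame` go through pmatch(..., duplicates.ok = TRUE), which is my reading of the base R source. The helper time_lookup, the sizes, and the generated names are hypothetical.

```r
# Hypothetical helper: time partial vs. exact lookup of n names, none of which
# match, against n row names.
time_lookup <- function(n) {
  rows  <- paste0("row", seq_len(n))
  query <- paste0("miss", seq_len(n))  # no exact or partial hits
  c(n      = n,
    pmatch = system.time(pmatch(query, rows, duplicates.ok = TRUE))[["elapsed"]],
    match  = system.time(match(query, rows))[["elapsed"]])
}

# One row per problem size; columns are n, pmatch and match (elapsed seconds).
timings <- t(sapply(c(1000, 2000, 4000), time_lookup))
```

If the quadratic behaviour holds, doubling n should roughly quadruple the pmatch column, while the match column should stay close to linear. These small sizes are chosen to finish quickly and may be too small to show the effect clearly on a fast machine.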

Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-21 Thread Hilmar Berger via R-devel
Dear Toby and Ivan, thanks a lot for the proposed patch and this detailed analysis. The timing analysis nicely shows what I suspected: that partial matching in large tables (>>10^5 rows) can get prohibitively slow. For 10^6 rows with 50% non-hits in exact matching I would roughly expect 10,000 s