Re: topN using a heap

2016-09-25 Thread Andrei Alexandrescu via Digitalmars-d
On 9/25/16 4:19 AM, Jon Degenhardt wrote: On Saturday, 24 September 2016 at 18:22:51 UTC, Andrei Alexandrescu wrote: Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d uses the new syntax in various places. P

Re: topN using a heap

2016-09-25 Thread Jon Degenhardt via Digitalmars-d
On Saturday, 24 September 2016 at 18:22:51 UTC, Andrei Alexandrescu wrote: Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d uses the new syntax in various places. Probably the best baseline is the equivale

Re: topN using a heap

2016-09-24 Thread Andrei Alexandrescu via Digitalmars-d
On 09/23/2016 04:45 PM, Jon Degenhardt wrote: That's a very nice result. Both the absolute numbers and that all three sets achieve similar performance, as they different distribution characteristics. Got curious so I tried to patch my ldc installation (0.17.1 (DMD v2.068.2, LLVM 3.8.0)), but s

Re: topN using a heap

2016-09-23 Thread Jon Degenhardt via Digitalmars-d
On Friday, 23 September 2016 at 16:09:18 UTC, Andrei Alexandrescu wrote: BTW, as I commented in https://issues.dlang.org/show_bug.cgi?id=16517, using the new topN implementation instead of sort to compute the median over google's 1-grams is over 11x faster using dmd. That's a very nice resu

Re: topN using a heap

2016-09-23 Thread Andrei Alexandrescu via Digitalmars-d
On 09/23/2016 11:40 AM, Stefan Koch wrote: On Friday, 23 September 2016 at 15:30:20 UTC, Andrei Alexandrescu wrote: Work is blocked by https://issues.dlang.org/show_bug.cgi?id=16528, which is quite a head-scratcher. Any ideas for a workaround? Thanks! -- Andrei annotate the templates. Nagonn

Re: topN using a heap

2016-09-23 Thread Andrei Alexandrescu via Digitalmars-d
On 09/23/2016 11:40 AM, Stefan Koch wrote: On Friday, 23 September 2016 at 15:30:20 UTC, Andrei Alexandrescu wrote: Work is blocked by https://issues.dlang.org/show_bug.cgi?id=16528, which is quite a head-scratcher. Any ideas for a workaround? Thanks! -- Andrei annotate the templates. I don'

Re: topN using a heap

2016-09-23 Thread Stefan Koch via Digitalmars-d
On Friday, 23 September 2016 at 15:30:20 UTC, Andrei Alexandrescu wrote: Work is blocked by https://issues.dlang.org/show_bug.cgi?id=16528, which is quite a head-scratcher. Any ideas for a workaround? Thanks! -- Andrei annotate the templates.

Re: topN using a heap

2016-09-23 Thread Andrei Alexandrescu via Digitalmars-d
Work is blocked by https://issues.dlang.org/show_bug.cgi?id=16528, which is quite a head-scratcher. Any ideas for a workaround? Thanks! -- Andrei

Re: topN using a heap

2016-09-21 Thread Andrei Alexandrescu via Digitalmars-d
On 09/21/2016 01:37 PM, Jon Degenhardt wrote: On Wednesday, 21 September 2016 at 14:58:27 UTC, Andrei Alexandrescu wrote: On 9/21/16 4:16 AM, Jon Degenhardt wrote: Timing comparison of sort and topN, times in milliseconds: sort topN Field 2: 289 1756 Field 3: 285148

Re: topN using a heap

2016-09-21 Thread Jon Degenhardt via Digitalmars-d
On Wednesday, 21 September 2016 at 14:58:27 UTC, Andrei Alexandrescu wrote: On 9/21/16 4:16 AM, Jon Degenhardt wrote: Timing comparison of sort and topN, times in milliseconds: sort topN Field 2: 289 1756 Field 3: 285148793 Field 4: 273668906 The above times a

Re: topN using a heap

2016-09-21 Thread Andrei Alexandrescu via Digitalmars-d
On 9/21/16 4:16 AM, Jon Degenhardt wrote: Timing comparison of sort and topN, times in milliseconds: sort topN Field 2: 289 1756 Field 3: 285148793 Field 4: 273668906 The above times are for LDC 1.1.0-beta2 (DMD 2.071.1). Similar behavior is seen for DMD 2.071.

Re: topN using a heap

2016-09-21 Thread Jon Degenhardt via Digitalmars-d
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: So let me summarize what has happened: 1. topN was reportedly slow. It was using a random pivot. I made it use getPivot (deterministic) instead of a random pivot in https://github.com/D-Programming-Language/phobos/pull/39

Re: topN using a heap

2016-01-20 Thread Andrei Alexandrescu via Digitalmars-d
On 01/19/2016 08:08 PM, Ivan Kazmenko wrote: On Tuesday, 19 January 2016 at 13:52:08 UTC, Andrei Alexandrescu wrote: On 01/18/2016 09:21 PM, Ivan Kazmenko wrote: Do you think sort and topN would be attackable if they used a per-process-seeded RNG as per Xinok's idea? -- Andrei Yes, but with a

Re: topN using a heap

2016-01-19 Thread Ivan Kazmenko via Digitalmars-d
On Tuesday, 19 January 2016 at 13:52:08 UTC, Andrei Alexandrescu wrote: On 01/18/2016 09:21 PM, Ivan Kazmenko wrote: Do you think sort and topN would be attackable if they used a per-process-seeded RNG as per Xinok's idea? -- Andrei Yes, but with a little interactivity (not generating the inpu

Re: topN using a heap

2016-01-19 Thread Andrei Alexandrescu via Digitalmars-d
On 01/19/2016 01:27 PM, Timon Gehr wrote: On 01/19/2016 04:33 PM, Andrei Alexandrescu wrote: On 01/19/2016 08:10 AM, Andrei Alexandrescu wrote: So anyhow we need the "intro" part. I'll get to that soon. Destroy! I made it part of https://github.com/D-Programming-Language/phobos/pull/3934. ht

Re: topN using a heap

2016-01-19 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 04:33 PM, Andrei Alexandrescu wrote: On 01/19/2016 08:10 AM, Andrei Alexandrescu wrote: So anyhow we need the "intro" part. I'll get to that soon. Destroy! I made it part of https://github.com/D-Programming-Language/phobos/pull/3934. https://github.com/andralex/phobos/commit/4ba

Re: topN using a heap

2016-01-19 Thread Andrei Alexandrescu via Digitalmars-d
On 01/19/2016 08:10 AM, Andrei Alexandrescu wrote: So anyhow we need the "intro" part. I'll get to that soon. Destroy! I made it part of https://github.com/D-Programming-Language/phobos/pull/3934. https://github.com/andralex/phobos/commit/4ba95cd1bd5124b53324c441e62c51d759481b04 One interes

Re: topN using a heap

2016-01-19 Thread Andrei Alexandrescu via Digitalmars-d
On 01/18/2016 09:21 PM, Ivan Kazmenko wrote: On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: 4. sort was and is attackable before all of these changes No, sort utilizes Introsort (Quicksort but switch to Heapsort if recurse too deep): see https://github.com/D-Programmin

Re: topN using a heap

2016-01-19 Thread Andrei Alexandrescu via Digitalmars-d
On 01/18/2016 09:42 PM, Xinok wrote: To be clear, sort is NOT attackable. Introsort is used for unstable sorting which begins with quick sort but falls back to heap sort after too many recursions to maintain O(n log n) running time. Timsort is used for stable sorting which is a variant of merge s

Re: topN using a heap

2016-01-19 Thread Andrei Alexandrescu via Digitalmars-d
On 01/18/2016 09:44 PM, Ivan Kazmenko wrote: On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: 4. sort was and is attackable before all of these changes (b) Improve sort() first, then apply a similar strategy to improving topN. Do not use RNGs at all. Since point 4 is in

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: 4. sort was and is attackable before all of these changes (b) Improve sort() first, then apply a similar strategy to improving topN. Do not use RNGs at all. Since point 4 is in fact already fixed a good while ago, my sug

Re: topN using a heap

2016-01-18 Thread Xinok via Digitalmars-d
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: Of course not. I think this back-and-forth takes away from the gist of things. So let me summarize what has happened: 1. topN was reportedly slow. It was using a random pivot. I made it use getPivot (deterministic) instead

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: 4. sort was and is attackable before all of these changes No, sort utilizes Introsort (Quicksort but switch to Heapsort if recurse too deep): see https://github.com/D-Programming-Language/phobos/blob/2.067/std/algorithm/s

Re: topN using a heap

2016-01-18 Thread deadalnix via Digitalmars-d
On Tuesday, 19 January 2016 at 00:17:30 UTC, Andrei Alexandrescu wrote: How would this translate to a matter of selecting the pivot during sort? -- Andrei A large chunk of a given datacenter going quadratic at the same time.

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 01:04:03 UTC, Andrei Alexandrescu wrote: On 1/18/16 7:46 PM, Ilya wrote: On Tuesday, 19 January 2016 at 00:38:14 UTC, Timon Gehr wrote: On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Progr

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 01:04:03 UTC, Andrei Alexandrescu wrote: On 1/18/16 7:46 PM, Ilya wrote: On Tuesday, 19 January 2016 at 00:38:14 UTC, Timon Gehr wrote: On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Progr

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 01:04:03 UTC, Andrei Alexandrescu wrote: On 1/18/16 7:46 PM, Ilya wrote: On Tuesday, 19 January 2016 at 00:38:14 UTC, Timon Gehr wrote: On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Progr

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 7:46 PM, Ilya wrote: On Tuesday, 19 January 2016 at 00:38:14 UTC, Timon Gehr wrote: On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Programming-Language/phobos/blob/master/std/random.d#L1151 --Ilya The defau

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 00:38:14 UTC, Timon Gehr wrote: On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Programming-Language/phobos/blob/master/std/random.d#L1151 --Ilya The default RNG is seeded with unpredictabl

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 01:12 AM, Ilya wrote: There is already implementation with predictable seed. Proof: https://github.com/D-Programming-Language/phobos/blob/master/std/random.d#L1151 --Ilya The default RNG is seeded with unpredictableSeed. What is the point you are trying to make?

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:51 PM, Ilya wrote: On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 6:55 PM, deadalnix wrote: On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: unpredictableSeed uses the system clock as a source of randomness, so we're good there. -- Andrei I got problem with that even when crytographically secure randomness wasn't needed more

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Tuesday, 19 January 2016 at 00:02:08 UTC, Timon Gehr wrote: On 01/19/2016 12:55 AM, Ilya wrote: On Monday, 18 January 2016 at 23:53:53 UTC, Timon Gehr wrote: On 01/19/2016 12:50 AM, Ilya wrote: ... 1. Yes, probability of hitting the worst case repeatedly is is practically zero. But RNGs

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 6:51 PM, Ilya wrote: On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It o

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Monday, 18 January 2016 at 23:39:02 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:18 PM, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for worst case. --Ilya Well it does improve things. The probability of hitting the worst case repeatedly is practically zero,

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 12:55 AM, Ilya wrote: On Monday, 18 January 2016 at 23:53:53 UTC, Timon Gehr wrote: On 01/19/2016 12:50 AM, Ilya wrote: ... 1. Yes, probability of hitting the worst case repeatedly is is practically zero. But RNGs do not change this probability. 2. It is possible to build attack

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 23:55:38 UTC, Timon Gehr wrote: On 01/19/2016 12:51 AM, Ilya wrote: On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: [...] No, it is definitel

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 12:51 AM, Ilya wrote: On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case.

Re: topN using a heap

2016-01-18 Thread deadalnix via Digitalmars-d
On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: unpredictableSeed uses the system clock as a source of randomness, so we're good there. -- Andrei I got problem with that even when crytographically secure randomness wasn't needed more than once. A specific case included

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 23:53:53 UTC, Timon Gehr wrote: On 01/19/2016 12:50 AM, Ilya wrote: ... 1. Yes, probability of hitting the worst case repeatedly is is practically zero. But RNGs do not change this probability. 2. It is possible to build attack for our RNGs, because they are Pseu

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 12:39 AM, Andrei Alexandrescu wrote: On 1/18/16 6:18 PM, Ilya wrote: On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: Here goes the test which shows quadratic behavior for the new version: http://dpa

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/19/2016 12:50 AM, Ilya wrote: ... 1. Yes, probability of hitting the worst case repeatedly is is practically zero. But RNGs do not change this probability. 2. It is possible to build attack for our RNGs, because they are Pseudo-RNGs. --Ilya You also need to predict the seed. How do you d

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for w

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 23:39:02 UTC, Andrei Alexandrescu wrote: On 1/18/16 6:18 PM, Ilya wrote: On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: Here goes the test which shows quadratic behavior for the new

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 6:44 PM, Ilya wrote: On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for worst case. --Ilya Still, use of RNG makes it impossible to construct th

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 6:27 PM, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for worst case. --Ilya Still, use of RNG makes it impossible to construct the worst case beforehand, once and for all. In that sense

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for worst case. --Ilya Still, use of RNG makes it impossible to construct the worst case beforehand, once an

Re: topN using a heap

2016-01-18 Thread Andrei Alexandrescu via Digitalmars-d
On 1/18/16 6:18 PM, Ilya wrote: On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: Here goes the test which shows quadratic behavior for the new version: http://dpaste.dzfl.pl/e4b3bc26c3cf (dpaste kills the slow code

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote: A RNGs don't improve worst case. It only changes an permutation for worst case. --Ilya Still, use of RNG makes it impossible to construct the worst case beforehand, once and for all. In that sense, this is a regression.

Re: topN using a heap

2016-01-18 Thread Ilya via Digitalmars-d
On Monday, 18 January 2016 at 20:45:56 UTC, Ivan Kazmenko wrote: On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: Here goes the test which shows quadratic behavior for the new version: http://dpaste.dzfl.pl/e4b3bc26c3cf (dpaste kills the slow code before it completes the task)

Re: topN using a heap

2016-01-18 Thread Timon Gehr via Digitalmars-d
On 01/18/2016 01:00 PM, Ivan Kazmenko wrote: The old version (2.070.0-b2) could not be tricked with it, does it use random? Yes, it selected the pivot uniformly at random using the global RNG. (This is also why the documentation says topN is O(n) in expectation.)

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: Here goes the test which shows quadratic behavior for the new version: http://dpaste.dzfl.pl/e4b3bc26c3cf (dpaste kills the slow code before it completes the task) Correction: this is the result of removing a uniform call in pull

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Monday, 18 January 2016 at 12:00:10 UTC, Ivan Kazmenko wrote: On Sunday, 17 January 2016 at 22:20:30 UTC, Andrei Alexandrescu wrote: All - let me know how things can be further improved. Thx! Here goes the test which shows quadratic behavior for the new version: http://dpaste.dzfl.pl/e4b3b

Re: topN using a heap

2016-01-18 Thread Ivan Kazmenko via Digitalmars-d
On Sunday, 17 January 2016 at 22:20:30 UTC, Andrei Alexandrescu wrote: On 01/17/2016 03:32 PM, Ivan Kazmenko wrote: Here is a more verbose version. OK, very nice. Thanks! I've modified topN to work as follows. In a loop: * If nth <= r.length / log2(r.length)^^2 (or is similarly close to r.

Re: topN using a heap

2016-01-17 Thread deadalnix via Digitalmars-d
On Monday, 18 January 2016 at 01:38:16 UTC, Andrei Alexandrescu wrote: On 1/17/16 8:07 PM, deadalnix wrote: A common way to do it is to go quicksort, but only recurse on one side of the set. That should give log(n)^2 complexity on average. Yah, that's quickselect (which this work started from

Re: topN using a heap

2016-01-17 Thread Andrei Alexandrescu via Digitalmars-d
On 1/17/16 8:07 PM, deadalnix wrote: A common way to do it is to go quicksort, but only recurse on one side of the set. That should give log(n)^2 complexity on average. Yah, that's quickselect (which this work started from). It's linear, and you can't get top n in sublinear time because you ne

Re: topN using a heap

2016-01-17 Thread deadalnix via Digitalmars-d
On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: https://github.com/D-Programming-Language/phobos/pull/3934 So, say you're looking for the smallest 10 elements out of 100_000. The quickselect algorithm (which topN currently uses) will successively partition the set in (

Re: topN using a heap

2016-01-17 Thread Andrei Alexandrescu via Digitalmars-d
On 01/17/2016 03:32 PM, Ivan Kazmenko wrote: Here is a more verbose version. OK, very nice. Thanks! I've modified topN to work as follows. In a loop: * If nth <= r.length / log2(r.length)^^2 (or is similarly close to r.length), use topNHeap with one heap and stop * If nth <= r.length / log2(r

Re: topN using a heap

2016-01-17 Thread Ivan Kazmenko via Digitalmars-d
On Sunday, 17 January 2016 at 16:06:31 UTC, Andrei Alexandrescu wrote: On 01/17/2016 06:41 AM, Ivan Kazmenko wrote: The average case is O(n + (k log n log k)) for small enough k. So, any k below roughly n / log^2 (n) makes the second summand less than the first. I don't understand how you der

Re: topN using a heap

2016-01-17 Thread Andrei Alexandrescu via Digitalmars-d
On 01/17/2016 06:41 AM, Ivan Kazmenko wrote: The average case is O(n + (k log n log k)) for small enough k. So, any k below roughly n / log^2 (n) makes the second summand less than the first. I don't understand how you derived the average case complexity, and I don't understand how you derived

Re: topN using a heap

2016-01-17 Thread Andrei Alexandrescu via Digitalmars-d
On 01/17/2016 06:55 AM, Ivan Kazmenko wrote: So, what can be done is to introduce TopNStrategy.auto which is the default and uses the (current or better) heuristic to switch between strategies, but also leave a way to explicitly select one of the two strategies, just like one can do now with sort

Re: topN using a heap

2016-01-17 Thread Ivan Kazmenko via Digitalmars-d
On Sunday, 17 January 2016 at 02:37:48 UTC, Timon Gehr wrote: On 01/17/2016 03:09 AM, Andrei Alexandrescu wrote: On 1/16/16 8:00 PM, Timon Gehr wrote: The implementation falls back to topNHeap whenever k is within the first or last ~n/8 elements and therefore is Ω(n log n) on average. Correc

Re: topN using a heap

2016-01-17 Thread Ivan Kazmenko via Digitalmars-d
On Sunday, 17 January 2016 at 03:26:54 UTC, Andrei Alexandrescu wrote: On 1/16/16 9:37 PM, Timon Gehr wrote: Ivan's analysis suggests that even something significantly larger, like n/log(n)² might work as an upper bound for k. I'm not clear on how you got to that boundary. There are a few im

Re: topN using a heap

2016-01-16 Thread Ilya Yaroshenko via Digitalmars-d
On Sunday, 17 January 2016 at 03:26:54 UTC, Andrei Alexandrescu wrote: On 1/16/16 9:37 PM, Timon Gehr wrote: Ivan's analysis suggests that even something significantly larger, like n/log(n)² might work as an upper bound for k. I'm not clear on how you got to that boundary. There are a few im

Re: topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
On 1/16/16 9:37 PM, Timon Gehr wrote: Ivan's analysis suggests that even something significantly larger, like n/log(n)² might work as an upper bound for k. I'm not clear on how you got to that boundary. There are a few implementation details to be minded as well (quickly and carefully approxi

Re: topN using a heap

2016-01-16 Thread Timon Gehr via Digitalmars-d
On 01/17/2016 03:09 AM, Andrei Alexandrescu wrote: On 1/16/16 8:00 PM, Timon Gehr wrote: On 01/16/2016 10:28 PM, Ivan Kazmenko wrote: On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: ... To summarize: For k< The implementation falls back to topNHeap whenever k is wit

Re: topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
On 1/16/16 9:12 PM, Andrei Alexandrescu wrote: On 1/16/16 4:58 PM, Ziad Hatahet via Digitalmars-d wrote: On Sat, Jan 16, 2016 at 7:25 AM, Andrei Alexandrescu via Digitalmars-d mailto:digitalmars-d@puremagic.com>> wrote: 1. Organize the first 11 elements into a max heap Why not the first

Re: topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
On 1/16/16 5:27 PM, Ivan Kazmenko wrote: On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: 3. At the end, swap the largest in the heap with the 10th and you're done! And why this? Do we additionally require the k-th element to arrive exactly on k-th place? Yes. -- And

Re: topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
On 1/16/16 4:58 PM, Ziad Hatahet via Digitalmars-d wrote: On Sat, Jan 16, 2016 at 7:25 AM, Andrei Alexandrescu via Digitalmars-d mailto:digitalmars-d@puremagic.com>> wrote: 1. Organize the first 11 elements into a max heap Why not the first 10? So you get to put the appropriate element

Re: topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
On 1/16/16 8:00 PM, Timon Gehr wrote: On 01/16/2016 10:28 PM, Ivan Kazmenko wrote: On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: ... To summarize: For k< The implementation falls back to topNHeap whenever k is within the first or last ~n/8 elements and therefore is

Re: topN using a heap

2016-01-16 Thread Timon Gehr via Digitalmars-d
On 01/16/2016 10:28 PM, Ivan Kazmenko wrote: On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: ... To summarize: For k< The implementation falls back to topNHeap whenever k is within the first or last ~n/8 elements and therefore is Ω(n log n) on average.

Re: topN using a heap

2016-01-16 Thread Ivan Kazmenko via Digitalmars-d
On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: 3. At the end, swap the largest in the heap with the 10th and you're done! And why this? Do we additionally require the k-th element to arrive exactly on k-th place?

Re: topN using a heap

2016-01-16 Thread Ziad Hatahet via Digitalmars-d
On Sat, Jan 16, 2016 at 7:25 AM, Andrei Alexandrescu via Digitalmars-d < digitalmars-d@puremagic.com> wrote: > > 1. Organize the first 11 elements into a max heap > > Why not the first 10?

Re: topN using a heap

2016-01-16 Thread Ivan Kazmenko via Digitalmars-d
On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu wrote: That's quite a bit of work, so 3934 uses an alternate strategy for finding the smallest 10: 1. Organize the first 11 elements into a max heap 2. Scan all other elements progressively. Whenever an element is found that is

topN using a heap

2016-01-16 Thread Andrei Alexandrescu via Digitalmars-d
https://github.com/D-Programming-Language/phobos/pull/3934 So, say you're looking for the smallest 10 elements out of 100_000. The quickselect algorithm (which topN currently uses) will successively partition the set in (assuming the pivot choice works well) 50_000, 25_000, etc chunks all the