Reserving/Preallocating associative array?

2013-12-24 Thread Gordon
Hello, I want to load a large text file containing two numeric fields into an associative array. The file looks like: 1 40 4 2 42 11 ... And has 11M lines. My code looks like this: === void main() { size_t[size_t] unions; auto f = File("input.txt"); fo

Re: Reserving/Preallocating associative array?

2013-12-24 Thread bearophile
Gordon: void main() { size_t[size_t] unions; auto f = File("input.txt"); foreach ( line ; f.byLine() ) { auto fields = line.split(); size_t i = to!size_t(fields[0]); size_t j = to!size_t(fields[1]); unions[i]

Re: Reserving/Preallocating associative array?

2013-12-24 Thread bearophile
Gordon: The file looks like: 1 40 4 2 42 11 ... And has 11M lines. What's the interval of the numbers of the first column? Bye, bearophile

Re: Reserving/Preallocating associative array?

2013-12-24 Thread bearophile
Some ides to speed up your code: - Try to use parse instead of split + to!size_t - Try to use byLineFast, in the attach here (the code is not mine): https://d.puremagic.com/issues/attachment.cgi?id=1305 - Try to disable the GC before the associative array creation and re-enable it when it's buil

Re: Reserving/Preallocating associative array?

2013-12-24 Thread Gordon
Hello Bearophine, On Tuesday, 24 December 2013 at 23:13:59 UTC, bearophile wrote: Some ides to speed up your code: - Try to use parse instead of split + to!size_t - Try to use byLineFast, in the attach here (the code is not mine): https://d.puremagic.com/issues/attachment.cgi?id=1305 - Try to d

Re: Reserving/Preallocating associative array?

2013-12-24 Thread Andrei Alexandrescu
On 12/24/13 2:28 PM, Gordon wrote: Hello, I want to load a large text file containing two numeric fields into an associative array. The file looks like: 1 40 4 2 42 11 ... And has 11M lines. My code looks like this: === void main() { size_t[size_t] unions;

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Philippe Sigaud
Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example?

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Gordon
On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud wrote: Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example? Good question, I did test each one incrementally: 1. Switching from

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Gordon
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote: void main() { size_t[size_t] unions; foreach (e; "input.txt" .slurp!(size_t, size_t)("%s %s").sort.uniq ) { unions[e[0]] = e[1]; } } Thanks for this very elegant solution! For completenes

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Andrei Alexandrescu
On 12/25/13 8:29 AM, Gordon wrote: On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote: void main() { size_t[size_t] unions; foreach (e; "input.txt" .slurp!(size_t, size_t)("%s %s").sort.uniq ) { unions[e[0]] = e[1]; } } Thanks for this ver

Re: Reserving/Preallocating associative array?

2013-12-25 Thread John Colvin
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote: On 12/24/13 2:28 PM, Gordon wrote: Hello, I want to load a large text file containing two numeric fields into an associative array. The file looks like: 1 40 4 2 42 11 ... And has 11M lines. My code lo

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Andrei Alexandrescu
On 12/25/13 10:25 AM, John Colvin wrote: watch out for the parenthsesis on sort. As bearophile likes to point out frequently, without parenthesis you are calling the builtin sort, not the std.algorithm one. Ew, good point. Thanks. Gordon, you may find this has better performance if you add ()

Re: Reserving/Preallocating associative array?

2013-12-25 Thread H. S. Teoh
On Wed, Dec 25, 2013 at 10:51:23AM -0800, Andrei Alexandrescu wrote: > On 12/25/13 10:25 AM, John Colvin wrote: > >watch out for the parenthsesis on sort. As bearophile likes to point > >out frequently, without parenthesis you are calling the builtin sort, > >not the std.algorithm one. > > Ew, goo

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Gordon
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote: On 12/25/13 10:25 AM, John Colvin wrote: Gordon, you may find this has better performance if you add () to sort. Also, can you share your data somewhere if it's not confidential? It will be related to this researc

Re: Reserving/Preallocating associative array?

2013-12-25 Thread Philippe Sigaud
On Wed, Dec 25, 2013 at 5:47 PM, Andrei Alexandrescu wrote: > On 12/25/13 8:29 AM, Gordon wrote: >> For completeness (since we're dealing with timing): >> >> 1. Running the above code with Garbage-collection enabled, takes 1m45s. >> >> 2. Running it with GC disabled takes 50s . >> >> 3. Running wi

Re: Reserving/Preallocating associative array?

2013-12-25 Thread bearophile
H. S. Teoh: Is the built-in sort deprecated yet? If it is, when can we finally say bye-bye to it? https://d.puremagic.com/issues/show_bug.cgi?id=10318 Bye, bearophile

Re: Reserving/Preallocating associative array?

2013-12-26 Thread bearophile
GC.disable; size_t[size_t] unions; foreach (line; "input.txt".File.byLineFast) { line.munch(" \t"); // skip ws immutable i = line.parse!size_t; line.munch(" \t"); // skip ws immutable j = line.parse!size_t; unions[i] = j; } GC.enable; }

Re: Reserving/Preallocating associative array?

2013-12-26 Thread thedeemon
On Thursday, 26 December 2013 at 14:09:29 UTC, bearophile wrote: A question for people that know something about the D garbage collector: can it be added some logic to the GC to detect the situations where you have to allocate many times very frequently, and disable or rarefy the collection fr

Re: Reserving/Preallocating associative array?

2013-12-26 Thread Benjamin Thaut
Am 25.12.2013 21:26, schrieb Gordon: On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote: On 12/25/13 10:25 AM, John Colvin wrote: Gordon, you may find this has better performance if you add () to sort. Also, can you share your data somewhere if it's not confidential?

Re: Reserving/Preallocating associative array?

2013-12-26 Thread Gordon
On Thursday, 26 December 2013 at 18:07:38 UTC, Benjamin Thaut wrote: Did you use D to convert that SQL dump to tabular data? If so, could you post the small snippet? If not did you convert the "relationship" table? No, I use "D" for some internal experimentation. To get the data as tabular I

Re: Reserving/Preallocating associative array?

2013-12-26 Thread Marco Leise
Am Wed, 25 Dec 2013 16:12:09 + schrieb "Gordon" : > On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud > wrote: > > Out of curiosity, do you know which one of his 3 suggestions > > brought you > > the highest speed boost? What happens if you do not disable the > > GC, for > > e

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Am 25.12.2013 17:12, schrieb Gordon: Good question, I did test each one incrementally: 1. Switching from "split+to!size_t" to "parse" (and commenting out the "union") reduced running time from ~26s to ~20s (on average). 2. Switching from "byLine" to "byLineFast" (and commenting out the "union")

Re: Reserving/Preallocating associative array?

2013-12-27 Thread bearophile
Benjamin Thaut: If you replace the default hash function D uses with the MurMur hash function it brings it down even more to 8 seconds (29425713 ticks) What compiler are you using? ldc2? And is it a good idea to put MurMur in D? 50% find free entry in hashmap 21% parse uint Perhaps th

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Am 27.12.2013 12:02, schrieb bearophile: Benjamin Thaut: If you replace the default hash function D uses with the MurMur hash function it brings it down even more to 8 seconds (29425713 ticks) What compiler are you using? ldc2? dmd 2.064.2 And is it a good idea to put MurMur in D? I do

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Gordon
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote: Am 25.12.2013 17:12, schrieb Gordon: So I was interrested how far you can take this. The version bearophile posted runs in 14 seconds (48420527 ticks) on my machine. <...> At this point the profiler looked something like t

Re: Reserving/Preallocating associative array?

2013-12-27 Thread bearophile
Gordon: Who is "JM" who wrote "byLineFast" ? I'm using his code and would like to properly acknowledge him/her. It should be Juan Manuel Cabo: http://permalink.gmane.org/gmane.comp.lang.d.general/117750 Bye, bearophile

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Am 27.12.2013 17:49, schrieb Gordon: Benjamin, Can you point me to your Hashmap implementation? I could perhaps use it to improve the timings even more. https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d This implementation depends on my own allocator design, but it should b

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Also, gordon whats is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easly preprocess it into a binary blob. You could even modify my hashmap implementation to support binary serialization, by just writing the enti

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Peter Alexander
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote: At this point the profiler looked something like this: 50% find free entry in hashmap When I'm building large immutable hash tables I do this: Read in all the unordered key-value pairs into an array of Tuple!(Key, Value) De

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Daniel Kozak
On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote: Hello, I want to load a large text file containing two numeric fields into an associative array. The file looks like: 1 40 4 2 42 11 ... And has 11M lines. My code looks like this: === void main() { size_t[s

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Am 27.12.2013 19:25, schrieb Daniel Kozak: using OrderedAA improve speed 3x https://github.com/Kozzi11/Trash/tree/master/util A possible downside of this implementation is though, that due to the fact that you are using a double linked list per index, there will be more chache misses during

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Gordon
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote: Also, gordon whats is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easly preprocess it into a binary blob. You could even modify my hashmap imple

Re: Reserving/Preallocating associative array?

2013-12-27 Thread Benjamin Thaut
Am 27.12.2013 20:55, schrieb Gordon: On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote: Also, gordon whats is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easly preprocess it into a binary blob. Yo

Re: Reserving/Preallocating associative array?

2013-12-28 Thread Benjamin Thaut
Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results: My implementation. Power of two size (25% free): building hashmap: 8 seconds 28880605 ticks looking entries up: 0 seconds 31685 ticks My implementation. Prime number size (25% free):

Re: Reserving/Preallocating associative array?

2013-12-30 Thread Daniel Kozak
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut wrote: Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results: My implementation. Power of two size (25% free): building hashmap: 8 seconds 28880605 ticks looking entries up: 0

Re: Reserving/Preallocating associative array?

2013-12-30 Thread Benjamin Thaut
Am 30.12.2013 15:06, schrieb Daniel Kozak: This is cool! Can you post somewhere your code and data which you use for this test? I really like to test it on my new AA implementation :) Here is the post that describes how to get the data: http://forum.dlang.org/post/yglwnmawrvbhpszds...@forum.