Hello,
I want to load a large text file containing two numeric fields
into an associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
===
void main()
{
size_t[size_t] unions;
auto f = File("input.txt");
foreach ( line ; f.byLine() ) {
auto fields = line.split();
size_t i = to!size_t(fields[0]);
size_t j = to!size_t(fields[1]);
unions[i] = j;
}
}
===
Gordon:
void main()
{
size_t[size_t] unions;
auto f = File("input.txt");
foreach ( line ; f.byLine() ) {
auto fields = line.split();
size_t i = to!size_t(fields[0]);
size_t j = to!size_t(fields[1]);
unions[i] = j;
}
}
Gordon:
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
What's the range of the numbers in the first column?
Bye,
bearophile
Some ideas to speed up your code:
- Try to use parse instead of split + to!size_t
- Try to use byLineFast, in the attach here (the code is not
mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
- Try to disable the GC before the associative array creation
and re-enable it when it's built.
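Suggestion 3 can be sketched like this (a minimal outline; the loop body is a stand-in for the real file-parsing loop):

```d
import core.memory : GC;

void main()
{
    GC.disable();            // no collections while the AA is being built
    scope(exit) GC.enable(); // re-enable even on early return or exception
    size_t[size_t] unions;
    foreach (i; 0 .. 1_000_000)
        unions[i] = i * 2;   // stand-in for the parsing loop
}
```

The scope(exit) guard keeps the GC from staying disabled if the loop throws.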
Hello bearophile,
On Tuesday, 24 December 2013 at 23:13:59 UTC, bearophile wrote:
Some ideas to speed up your code:
- Try to use parse instead of split + to!size_t
- Try to use byLineFast, in the attach here (the code is not
mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
- Try to disable the GC before the associative array creation
and re-enable it when it's built.
On 12/24/13 2:28 PM, Gordon wrote:
Hello,
I want to load a large text file containing two numeric fields into an
associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
===
void main()
{
size_t[size_t] unions;
[...]

void main()
{
size_t[size_t] unions;
foreach (e; "input.txt"
.slurp!(size_t, size_t)("%s %s").sort.uniq ) {
unions[e[0]] = e[1];
}
}
Out of curiosity, do you know which one of his 3 suggestions brought you
the highest speed boost? What happens if you do not disable the GC, for
example?
On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud
wrote:
Out of curiosity, do you know which one of his 3 suggestions
brought you
the highest speed boost? What happens if you do not disable the
GC, for
example?
Good question. I did test each one incrementally:
1. Switching from "split+to!size_t" to "parse" (and commenting out the
"union") reduced running time from ~26s to ~20s (on average).
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu
wrote:
void main()
{
size_t[size_t] unions;
foreach (e; "input.txt"
.slurp!(size_t, size_t)("%s %s").sort.uniq ) {
unions[e[0]] = e[1];
}
}
Thanks for this very elegant solution!
For completeness (since we're dealing with timing):
1. Running the above code with garbage collection enabled takes 1m45s.
2. Running it with GC disabled takes 50s.
On 12/25/13 8:29 AM, Gordon wrote:
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
void main()
{
size_t[size_t] unions;
foreach (e; "input.txt"
.slurp!(size_t, size_t)("%s %s").sort.uniq ) {
unions[e[0]] = e[1];
}
}
Thanks for this very elegant solution!
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu
wrote:
On 12/24/13 2:28 PM, Gordon wrote:
Hello,
I want to load a large text file containing two numeric fields
into an
associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
On 12/25/13 10:25 AM, John Colvin wrote:
watch out for the parentheses on sort. As bearophile likes to point out
frequently, without parentheses you are calling the built-in sort, not
the std.algorithm one.
Ew, good point. Thanks.
Gordon, you may find this has better performance if you add () to sort.
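The difference is easy to show in a tiny sketch (on the compilers of that era, `a.sort` without parentheses resolved to the slower built-in sort property instead of std.algorithm.sort):

```d
import std.algorithm : sort;
import std.stdio : writeln;

void main()
{
    auto a = [3, 1, 2];
    a.sort();      // with (): std.algorithm.sort
    // a.sort;     // without (): historically the built-in array sort property
    writeln(a);    // [1, 2, 3]
}
```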
On Wed, Dec 25, 2013 at 10:51:23AM -0800, Andrei Alexandrescu wrote:
> On 12/25/13 10:25 AM, John Colvin wrote:
> >watch out for the parentheses on sort. As bearophile likes to point
> >out frequently, without parentheses you are calling the built-in sort,
> >not the std.algorithm one.
>
> Ew, good point. Thanks.
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei
Alexandrescu wrote:
On 12/25/13 10:25 AM, John Colvin wrote:
Gordon, you may find this has better performance if you add ()
to sort.
Also, can you share your data somewhere if it's not
confidential?
It will be related to this research.
On Wed, Dec 25, 2013 at 5:47 PM, Andrei Alexandrescu
wrote:
> On 12/25/13 8:29 AM, Gordon wrote:
>> For completeness (since we're dealing with timing):
>>
>> 1. Running the above code with Garbage-collection enabled, takes 1m45s.
>>
>> 2. Running it with GC disabled takes 50s .
>>
>> 3. Running wi
H. S. Teoh:
Is the built-in sort deprecated yet? If it is, when can we
finally say bye-bye to it?
https://d.puremagic.com/issues/show_bug.cgi?id=10318
Bye,
bearophile
import std.stdio : File;
import std.string : munch;
import std.conv : parse;
import core.memory : GC;
// byLineFast is from the module attached earlier in the thread.

void main()
{
    GC.disable;
    size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip ws
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip ws
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;
}
On Thursday, 26 December 2013 at 14:09:29 UTC, bearophile wrote:
A question for people who know something about the D garbage
collector: can some logic be added to the GC to detect
situations where you have to allocate many times very
frequently, and disable or rarefy the collection frequency?
On 25.12.2013 21:26, Gordon wrote:
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote:
On 12/25/13 10:25 AM, John Colvin wrote:
Gordon, you may find this has better performance if you add () to sort.
Also, can you share your data somewhere if it's not confidential?
On Thursday, 26 December 2013 at 18:07:38 UTC, Benjamin Thaut
wrote:
Did you use D to convert that SQL dump to tabular data? If so,
could you post the small snippet? If not did you convert the
"relationship" table?
No, I use "D" for some internal experimentation.
To get the data as tabular I
On Wed, 25 Dec 2013 16:12:09, "Gordon" wrote:
> On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud
> wrote:
> > Out of curiosity, do you know which one of his 3 suggestions
> > brought you
> > the highest speed boost? What happens if you do not disable the
> > GC, for
> > e
On 25.12.2013 17:12, Gordon wrote:
Good question, I did test each one incrementally:
1. Switching from "split+to!size_t" to "parse" (and commenting out the
"union") reduced running time from ~26s to ~20s (on average).
2. Switching from "byLine" to "byLineFast" (and commenting out the
"union")
Benjamin Thaut:
If you replace the default hash function D uses with the MurMur
hash function it brings it down even more to 8 seconds
(29425713 ticks)
What compiler are you using? ldc2?
And is it a good idea to put MurMur in D?
50% find free entry in hashmap
21% parse uint
Perhaps th
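For illustration, a hypothetical wrapper key with a custom `toHash`; the mixer is the 64-bit MurmurHash3 finalizer ("fmix64"), used here as a stand-in for a full MurMur implementation, not Benjamin's actual code:

```d
struct Key
{
    size_t k;

    // MurmurHash3 64-bit finalizer as the hash (computed in ulong so the
    // constants fit even where size_t is 32 bits).
    size_t toHash() const @safe pure nothrow
    {
        ulong h = k;
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccd;
        h ^= h >> 33;
        h *= 0xc4ceb9fe1a85ec53;
        h ^= h >> 33;
        return cast(size_t) h;
    }

    bool opEquals(const Key o) const @safe pure nothrow
    {
        return k == o.k;
    }
}

void main()
{
    size_t[Key] unions;    // the AA now hashes through Key.toHash
    unions[Key(1)] = 40;
}
```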
On 27.12.2013 12:02, bearophile wrote:
Benjamin Thaut:
If you replace the default hash function D uses with the MurMur hash
function it brings it down even more to 8 seconds (29425713 ticks)
What compiler are you using? ldc2?
dmd 2.064.2
And is it a good idea to put MurMur in D?
I do
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
On 25.12.2013 17:12, Gordon wrote:
So I was interested in how far you can take this. The version
bearophile posted runs in 14 seconds (48420527 ticks) on my
machine.
<...>
At this point the profiler looked something like this:
Gordon:
Who is "JM" who wrote "byLineFast" ? I'm using his code and
would like to properly acknowledge him/her.
It should be Juan Manuel Cabo:
http://permalink.gmane.org/gmane.comp.lang.d.general/117750
Bye,
bearophile
On 27.12.2013 17:49, Gordon wrote:
Benjamin,
Can you point me to your Hashmap implementation? I could perhaps use it
to improve the timings even more.
https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d
This implementation depends on my own allocator design, but it should b
Also, Gordon, what is keeping you from converting the textual
representation to a binary one? I mean, if you need to read that file in
often, you can easily preprocess it into a binary blob. You could even
modify my hashmap implementation to support binary serialization, by
just writing the entire
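A hypothetical sketch of that preprocessing step (the file name and the flat (key, value) layout are made up for illustration):

```d
import std.stdio : File;

// Write the pairs once as raw machine words: 16 bytes per pair on 64-bit,
// no text formatting at all.
void writePairs(size_t[2][] pairs, string path)
{
    auto f = File(path, "wb");
    f.rawWrite(pairs);
}

// Load them back with a single sequential read.
size_t[2][] readPairs(string path)
{
    auto f = File(path, "rb");
    auto pairs = new size_t[2][](cast(size_t)(f.size / (2 * size_t.sizeof)));
    if (pairs.length)
        f.rawRead(pairs);
    return pairs;
}
```

Loading 11M pairs this way is one read instead of 11M parse calls, at the price of a format tied to the machine's word size and endianness.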
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
At this point the profiler looked something like this:
50% find free entry in hashmap
When I'm building large immutable hash tables I do this:
Read in all the unordered key-value pairs into an array of
Tuple!(Key, Value)
De
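The post is cut off here, but one plausible completion of the idea (an assumption, not the author's actual recipe) is to sort the pair array once and then serve lookups by binary search on the sorted array:

```d
import std.algorithm : sort;
import std.range : assumeSorted;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, "key", size_t, "value");

// Build once: sort the unordered pairs in place by key.
Pair[] buildTable(Pair[] pairs)
{
    pairs.sort!((a, b) => a.key < b.key);
    return pairs;
}

// Look a key up by binary search; returns true and fills `value` on a hit.
bool lookup(Pair[] table, size_t key, out size_t value)
{
    auto hits = table.assumeSorted!((a, b) => a.key < b.key)
                     .equalRange(Pair(key, 0));
    if (hits.empty)
        return false;
    value = hits.front.value;
    return true;
}
```

Compared to an AA there is no rehashing or per-entry allocation while building, at the cost of O(log n) lookups.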
On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
Hello,
I want to load a large text file containing two numeric fields
into an associative array.
The file looks like:
1 40
4 2
42 11
...
And has 11M lines.
My code looks like this:
===
void main()
{
size_t[size_t] unions;
On 27.12.2013 19:25, Daniel Kozak wrote:
using OrderedAA improves speed 3x
https://github.com/Kozzi11/Trash/tree/master/util
A possible downside of this implementation, though, is that due to the
fact that you are using a doubly linked list per index, there will be
more cache misses during
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
Also, Gordon, what is keeping you from converting the textual
representation to a binary one? I mean, if you need to read
that file in often, you can easily preprocess it into a binary
blob. You could even modify my hashmap implementation
On 27.12.2013 20:55, Gordon wrote:
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
Also, Gordon, what is keeping you from converting the textual
representation to a binary one? I mean, if you need to read that file
in often, you can easily preprocess it into a binary blob. You
Because I was always curious to do some hashmap profiling with real
data, I did some more. Here are the results:
My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks
My implementation. Prime number size (25% free):
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut
wrote:
Because I was always curious to do some hashmap profiling with
real data, I did some more. Here are the results:
My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks
On 30.12.2013 15:06, Daniel Kozak wrote:
This is cool!
Can you post your code and the data you used for this test somewhere? I'd
really like to test it on my new AA implementation :)
Here is the post that describes how to get the data:
http://forum.dlang.org/post/yglwnmawrvbhpszds...@forum.