Arkanosis added a comment.

FWIW, I've just tried to convert the ttl dump of the 1st of November 2017 on a machine with 378 GiB of RAM and no swap, and… well… it failed with std::bad_alloc after more than 21 hours of runtime. Granted, another process was eating ~100 GiB of memory, but I thought that would still leave enough headroom — I was proven wrong.

Being optimistic, I ran the conversion directly from the ttl.gz file, which may have prevented some memory-mapping optimization, and I also added the -i flag to generate the index at the same time. I'll re-run the conversion without these in the hope of finally getting the hdt file.

So, here are the statistics I got:

$ /usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz  wikidata-20171101-all.hdt
Catch exception load: std::bad_alloc
ERROR: std::bad_alloc
Command exited with non-zero status 1
        Command being timed: "rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz wikidata-20171101-all.hdt"
        User time (seconds): 64999.77
        System time (seconds): 10906.79
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 21:13:25
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 200475524
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 703
        Minor (reclaiming a frame) page faults: 8821385485
        Voluntary context switches: 36774
        Involuntary context switches: 4514261
        Swaps: 0
        File system inputs: 81915000
        File system outputs: 2767696
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
/usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz   64999,77s user 10906,80s system 99% cpu 21:13:25,50 total

NB: the exceptionally long runtime is the result of the conversion being single-threaded while the machine has many cores but relatively low per-core performance (2.3 GHz). The process wasn't under memory pressure until it crashed (there was no swap anyway) and wasn't waiting much on I/O — so it was entirely CPU-bound.
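As a quick sanity check on the numbers above (my own arithmetic, not part of the original report), the peak resident set size reported by time(1) converts to roughly 191 GiB, and the user+system time matches the 99% CPU figure:

```python
# "Maximum resident set size" from GNU time -v is in kbytes (KiB).
max_rss_kib = 200_475_524
max_rss_gib = max_rss_kib / 1024**2
print(f"Peak RSS: {max_rss_gib:.1f} GiB")  # ~191.2 GiB

# CPU utilization: (user + system) / wall-clock elapsed.
user_s, sys_s = 64999.77, 10906.79
elapsed_s = 21 * 3600 + 13 * 60 + 25  # 21:13:25
print(f"CPU: {100 * (user_s + sys_s) / elapsed_s:.0f}%")  # ~99%
```

So the process peaked below 200 GiB of the 378 GiB available; with the other process holding ~100 GiB, the fatal allocation was presumably a large one on top of that recorded peak.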


TASK DETAIL
https://phabricator.wikimedia.org/T179681

To: Arkanosis
Cc: Smalyshev, Ladsgroup, Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331