Arkanosis added a comment.
FWIW, I've just tried to convert the ttl dump of the 1st of November 2017 on a machine with 378 GiB of RAM and 0 GiB of swap and… well… it failed with std::bad_alloc after more than 21 hours of runtime. Granted, there was another process eating ~100 GiB of memory, but I thought it would be okay; I was proved wrong.
Being optimistic, I ran the conversion directly on the ttl.gz file, which may have prevented some memory-mapping optimization, and I also passed the -i flag to generate the index in the same run. I'll re-run the conversion without either of these, in the hope of finally getting the HDT file.
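For the re-run, the dump can be decompressed up front in a streaming fashion, so the uncompressed data never has to fit in memory (`gunzip -c` does the same from the shell). Below is a minimal Python sketch; the helper name `decompress_gz` and the 64 MiB chunk size are illustrative assumptions. Whether rdf2hdt is actually faster or leaner on an uncompressed input is only the hypothesis stated above, not something this code establishes.

```python
import gzip
import shutil


def decompress_gz(src: str, dst: str, chunk_size: int = 64 * 1024 * 1024) -> int:
    """Stream-decompress the gzip file `src` into `dst` (hypothetical helper).

    Data is copied in chunks of `chunk_size` bytes, so memory use stays
    bounded regardless of the dump's size. Returns the number of bytes written.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
        return fout.tell()
```

For example, `decompress_gz("wikidata-20171101-all.ttl.gz", "wikidata-20171101-all.ttl")` would produce a plain ttl file that rdf2hdt can then be pointed at.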
So, here are the statistics I got:
$ /usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz wikidata-20171101-all.hdt
Catch exception load: std::bad_alloc
ERROR: std::bad_alloc
Command exited with non-zero status 1
	Command being timed: "rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz wikidata-20171101-all.hdt"
	User time (seconds): 64999.77
	System time (seconds): 10906.79
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 21:13:25
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 200475524
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 703
	Minor (reclaiming a frame) page faults: 8821385485
	Voluntary context switches: 36774
	Involuntary context switches: 4514261
	Swaps: 0
	File system inputs: 81915000
	File system outputs: 2767696
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1
/usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz 64999,77s user 10906,80s system 99% cpu 21:13:25,50 total
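The raw fields are easier to reason about once converted. As a minimal Python sketch (the helper name `parse_time_v` is hypothetical, and only a subset of the report above is copied in), the peak resident set size works out to about 191 GiB, assuming GNU time's "kbytes" are KiB (1024 bytes).

```python
# Subset of the `/usr/bin/time -v` report above, copied verbatim.
REPORT = """\
\tUser time (seconds): 64999.77
\tSystem time (seconds): 10906.79
\tPercent of CPU this job got: 99%
\tElapsed (wall clock) time (h:mm:ss or m:ss): 21:13:25
\tMaximum resident set size (kbytes): 200475524
\tMajor (requiring I/O) page faults: 703
"""


def parse_time_v(text: str) -> dict[str, str]:
    """Map each 'Label: value' line of a `time -v` report to {label: value}."""
    fields = {}
    for line in text.splitlines():
        # Split at the first ": "; colons inside labels such as
        # "(h:mm:ss or m:ss)" are not followed by a space, so they are kept.
        label, sep, value = line.strip().partition(": ")
        if sep:
            fields[label] = value
    return fields


fields = parse_time_v(REPORT)
# Assumption: "kbytes" means KiB, so dividing by 1024**2 yields GiB.
max_rss_gib = int(fields["Maximum resident set size (kbytes)"]) / 1024**2
print(f"peak RSS: {max_rss_gib:.1f} GiB")
```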
NB: the exceptionally long runtime is due to the conversion being single-threaded, on a machine with many hardware threads but relatively low per-thread performance (2.3 GHz). The process wasn't under memory pressure until it crashed (there's no swap anyway) and wasn't waiting much on I/O, so it was essentially all CPU-bound.
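The CPU-bound claim can be cross-checked against the timings in the report: dividing user plus system time by wall-clock time gives the average number of cores kept busy, which for a single-threaded process should stay at or just below 1.0. A minimal Python sketch follows; the constant and function names are illustrative. A result of about 0.99, alongside only 703 major (I/O-requiring) page faults in over 21 hours, is consistent with the process spending nearly all its time computing rather than waiting.

```python
# Figures copied from the `/usr/bin/time -v` report above.
USER_S = 64999.77                  # User time (seconds)
SYS_S = 10906.79                   # System time (seconds)
WALL_S = 21 * 3600 + 13 * 60 + 25  # Elapsed 21:13:25, converted to seconds


def cores_busy(user_s: float, sys_s: float, wall_s: float) -> float:
    """Average number of CPU cores in use over the run's wall-clock time."""
    return (user_s + sys_s) / wall_s


util = cores_busy(USER_S, SYS_S, WALL_S)
print(f"average cores busy: {util:.3f}")
```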
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs