Lucas_Werkmeister_WMDE updated the task description.
CHANGES TO TASK DESCRIPTION
As for the processing time, on my system 9% of the dump was processed in 23 minutes, so the full conversion would probably take some hours, but not days. The CPU time as reported by Bash’s `time` builtin was actually less than the wall-clock time, so it doesn’t look like the tool is multi-threaded. But of course it’s possible that there is some additional phase of processing after the tool is done reading the file, and I have no idea how long that could take.
...
There is an `rdf2hdt` tool ([link](https://github.com/rdfhdt/hdt-cpp/tree/develop/libhdt); LGPLv2.1+) that can convert TTL dumps to HDT files. Unfortunately, it doesn’t run in a streaming fashion (it doesn’t even open the output file until it’s done converting) and seems to require almost as much memory as the uncompressed TTL dump to run. I tried to run it on the latest Wikidata dump, but the program was OOM-killed after having consumed 2.32 GiB of the gzipped input dump (according to `pv`), which corresponds to 15.63 GiB of uncompressed input data; the last `VmSize` before it was killed was 13.04 GiB. As the full uncompressed TTL dump is 187 GiB (201 GB), it looks like we would need a machine with at least ~200 GB of memory to do the conversion. ~~(Perhaps we could get away with using lots of swap space instead of actual RAM – I have no idea what kind of memory access patterns the tool has.)~~ As for the processing time, on my system 9% of the dump was processed in 23 minutes, so the full conversion would probably take some hours, but not days. The CPU time as reported by Bash’s `time` builtin was actually less than the wall-clock time, so it doesn’t look like the tool is multi-threaded. But of course it’s possible that there is some additional phase of processing after the tool is done reading the file, and I have no idea how long that could take.
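For anyone who wants to redo the arithmetic, here is a minimal sketch of the extrapolation behind these estimates. It assumes that both memory use and runtime scale linearly with the amount of input read, which the measurements above do not confirm, and it ignores any processing phase after the input has been read. The function and variable names are purely illustrative; only the numbers come from the description.

```python
# Figures quoted in the task description.
INPUT_READ_GIB = 15.63   # uncompressed TTL consumed before the OOM kill
VMSIZE_GIB = 13.04       # last VmSize observed before the kill
FULL_DUMP_GIB = 187.0    # size of the full uncompressed TTL dump
FRACTION_DONE = 0.09     # share of the dump processed (as quoted)
MINUTES_ELAPSED = 23.0   # wall-clock time for that share

BYTES_PER_GIB = 2**30
BYTES_PER_GB = 10**9


def extrapolate_memory_gib(input_read_gib, vmsize_gib, full_dump_gib):
    """Scale the observed memory use to the full input (linear assumption)."""
    return vmsize_gib / input_read_gib * full_dump_gib


def extrapolate_minutes(fraction_done, minutes_elapsed):
    """Scale the observed runtime to the full input (linear assumption)."""
    return minutes_elapsed / fraction_done


memory_gib = extrapolate_memory_gib(INPUT_READ_GIB, VMSIZE_GIB, FULL_DUMP_GIB)
memory_gb = memory_gib * BYTES_PER_GIB / BYTES_PER_GB
total_minutes = extrapolate_minutes(FRACTION_DONE, MINUTES_ELAPSED)

print(f"estimated memory: {memory_gib:.0f} GiB ({memory_gb:.0f} GB)")
print(f"estimated runtime: {total_minutes / 60:.1f} hours")
```

Under these assumptions the memory estimate comes out somewhat below the size of the uncompressed dump, so the ~200 GB figure leaves some headroom, and the runtime lands in the "hours, not days" range.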
...
TASK DETAIL
To: Lucas_Werkmeister_WMDE
Cc: Addshore, Smalyshev, Ladsgroup, Arkanosis, Tarrow, Lucas_Werkmeister_WMDE, Aklapper, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs