Lucas_Werkmeister_WMDE added a comment.
I ran the conversion directly from the ttl.gz file
Interesting, I couldn’t get that to work and had to pipe gunzip output into the program.
I also tried converting the latest dump, and since I don’t have access to any system with that much RAM, I thought I could perhaps trade some execution time for swap space. Bad idea :) the process got through 20% of the input file and then slowed to a crawl, at data rates of single-digit kilobytes per second. It would’ve taken half a year to finish at that rate.
But FWIW, here’s the command I used, with a healthy dose of systemd sandboxing since it’s a completely unknown program I’m running:
time pv latest-all.ttl.gz | gunzip | sudo systemd-run --wait --pipe --unit rdf2hdt \
    -p CapabilityBoundingSet=CAP_DAC_OVERRIDE \
    -p ProtectSystem=strict -p PrivateNetwork=yes -p ProtectHome=yes -p PrivateDevices=yes \
    -p ProtectKernelTunables=yes -p ProtectControlGroups=yes \
    -p NoNewPrivileges=yes -p RestrictNamespaces=yes \
    -p MemoryAccounting=yes -p CPUAccounting=yes -p BlockIOAccounting=yes -p IOAccounting=yes -p TasksAccounting=yes \
    /usr/local/bin/rdf2hdt -i -f ttl -B 'http://wikiba.se/ontology-beta#Dump' /dev/stdin /dev/stdout \
    >| wikidata-2017-11-01.hdt
I had to make install the program because the libtoolized dev build doesn’t really support being run like that. (See systemd/systemd#7254 for the CapabilityBoundingSet part – knowing what I know now, -p $USER would’ve been the better choice.)
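For anyone who wants to try the streaming pattern without the sandboxing, here is a minimal sketch of just the pipe part: decompress on the fly and hand the program /dev/stdin and /dev/stdout as its input and output paths. The convert function is a hypothetical stand-in that only copies bytes, not rdf2hdt itself, and stream_convert is likewise a name made up for illustration.

```shell
# Hypothetical stand-in for the real converter: it takes an input path and an
# output path, like the rdf2hdt call above, but merely copies the bytes.
convert() {
    cat "$1" > "$2"
}

# Decompress on the fly and feed the result to the converter through
# /dev/stdin, so no uncompressed copy of the dump ever lands on disk.
stream_convert() {
    gunzip -c "$1" | convert /dev/stdin /dev/stdout
}
```

Usage would look like stream_convert latest-all.ttl.gz > out.hdt, with the real converter swapped in for convert. This only saves disk space for the intermediate file; it does nothing about the memory the conversion itself needs.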
In T179681#3736044, @Addshore wrote:
@Smalyshev we discussed dumping the JNL files used by Blazegraph directly at points during WikidataCon. I'm aware that isn't an HDT dump, but I'm wondering if this would help in any way.
Can we reliably get a consistent snapshot of those files when Blazegraph is constantly writing updates to them?