Smalyshev added a comment.

it doesn’t even open the output file until it’s done converting

That might be a problem when we have 4bn triples... I think "load the whole thing in memory" is a doomed approach - even if we find a way to get past the memory limits for the current dump, what would happen when it doubles in size?

The idea that you need to keep everything in memory to compress/optimize is of course not true - you can still do pretty well with disk-based storage; that's what Blazegraph does, for example, and probably nearly every other graph DB. Yes, it would be a bit slower and would require some careful programming, but it shouldn't be impossible. Unfortunately, it sounds like the people behind HDT are not interested in doing this work. Without it, converting the Wikidata data set is a no-go - I do not see how the Wikidata data set can be served with a "load everything into memory" paradigm. If we find somebody who wants to and can do the work that allows HDT to process large datasets, then I think it is a good idea to have it in dumps, but not before that.
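
To make the disk-based point concrete, here is a rough sketch of one standard technique, an external merge sort, used to build a sorted term dictionary with bounded memory. This is only an illustration and not how HDT or Blazegraph actually work; the names build_dictionary, _spill, _read_run and max_in_memory are made up for the example, and it assumes terms are strings without embedded newlines.

```python
import heapq
import os
import tempfile
from itertools import groupby


def _spill(sorted_terms, tmp_dir):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(term + "\n" for term in sorted_terms)
    return path


def _read_run(path):
    """Stream the terms of one sorted run back from disk, one at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def build_dictionary(terms, max_in_memory=100_000, tmp_dir=None):
    """Yield (term, id) pairs in sorted order.

    At most max_in_memory terms are buffered at once; everything else
    lives in sorted run files on disk (hypothetical helper, sketch only).
    """
    runs, buf = [], []
    for term in terms:
        buf.append(term)
        if len(buf) >= max_in_memory:
            runs.append(_spill(sorted(buf), tmp_dir))
            buf = []
    if buf:
        runs.append(_spill(sorted(buf), tmp_dir))
    try:
        # k-way merge of the sorted runs; groupby drops duplicate terms.
        merged = heapq.merge(*(_read_run(p) for p in runs))
        for term_id, (term, _) in enumerate(groupby(merged), start=1):
            yield term, term_id
    finally:
        for p in runs:
            os.remove(p)
```

For example, list(build_dictionary(["b", "a", "c", "a"], max_in_memory=2)) assigns IDs to the three distinct terms in sorted order while never buffering more than two terms. A real converter would also have to cap the number of simultaneously open run files and rewrite the triples using the assigned IDs, which this sketch leaves out.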


