I am very happy to read that the Wikidata team is working on
the much-anticipated fixes to WDQS [5], toward making
Wikidata scalable.

On my side, the biggest technical problem that I know of
is the on-disk footprint.

Last year, I managed to achieve a 1:1 ratio between the .nt
textual format and the stored data, while keeping the added
advantage of time-traveling queries.
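To make the idea concrete, here is a minimal sketch (in Python,
purely illustrative; the names and the in-memory list are not the
actual implementation) of the kind of layout that makes
time-traveling queries cheap: every triple carries the transaction
id that added it and, once deleted, the one that removed it, so an
"as of" query is just a filter on that interval.

    from typing import NamedTuple, Optional, Iterator

    class Fact(NamedTuple):
        subject: str
        predicate: str
        object: str
        added_at: int                 # transaction id that inserted the triple
        removed_at: Optional[int]     # transaction id that removed it, if any

    class TimeTravelStore:
        def __init__(self):
            self._facts: list[Fact] = []
            self._tx = 0

        def add(self, s: str, p: str, o: str) -> int:
            self._tx += 1
            self._facts.append(Fact(s, p, o, self._tx, None))
            return self._tx

        def remove(self, s: str, p: str, o: str) -> int:
            self._tx += 1
            for i, f in enumerate(self._facts):
                if (f.subject, f.predicate, f.object) == (s, p, o) and f.removed_at is None:
                    self._facts[i] = f._replace(removed_at=self._tx)
            return self._tx

        def as_of(self, tx: int) -> Iterator[tuple]:
            # A triple is visible at `tx` if it was added at or before `tx`
            # and not yet removed at that point.
            for f in self._facts:
                if f.added_at <= tx and (f.removed_at is None or f.removed_at > tx):
                    yield (f.subject, f.predicate, f.object)

Nothing is ever overwritten, only marked as removed, which is what
makes it possible to replay the store at any past transaction.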

Since then, I have identified further optimizations that should
bring the on-disk footprint down to something similar to the
current Blazegraph production setup.

That is, around 1/3 to 1/2 of the .nt dump size, or roughly
2TB to 3TB of SSD to store the current Wikidata, while keeping
the added advantage of scaling both queries and storage
horizontally, possibly without limit!
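To give an idea of where savings of that order typically come from
(a generic illustration, not the exact scheme I have in mind):
terms are interned once in a dictionary and each triple becomes a
tuple of small integers, so the long IRIs and literals of the .nt
text are not repeated for every statement.

    class Dictionary:
        """Interns each distinct term once and hands out integer ids."""
        def __init__(self):
            self._term_to_id: dict[str, int] = {}
            self._id_to_term: list[str] = []

        def encode(self, term: str) -> int:
            if term not in self._term_to_id:
                self._term_to_id[term] = len(self._id_to_term)
                self._id_to_term.append(term)
            return self._term_to_id[term]

        def decode(self, uid: int) -> str:
            return self._id_to_term[uid]

    def encode_triple(d: Dictionary, s: str, p: str, o: str) -> tuple[int, int, int]:
        # Three small integers per statement instead of repeating
        # the full IRIs and literals of the textual dump.
        return (d.encode(s), d.encode(p), d.encode(o))

    d = Dictionary()
    t = encode_triple(d,
                      "<http://www.wikidata.org/entity/Q42>",
                      "<http://www.wikidata.org/prop/direct/P31>",
                      "<http://www.wikidata.org/entity/Q5>")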


Best regards,

[0] https://www.youtube.com/watch?v=oV4qelj9fxM via 
https://etherpad.wikimedia.org/p/WikidataCon2021-ScalingWDQS
[1] https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021
[2] https://phabricator.wikimedia.org/T291207
[3] https://phabricator.wikimedia.org/T206560
[4] https://phabricator.wikimedia.org/T291340
[5] https://meta.wikimedia.org/wiki/Grants:Project/Future-proof_WDQS

Amirouche Amazigh BOUBEKKI ~ https://hyper.dev