Hi!

> For the technical guys, consider our growth and plan for at least one
> year. When the impression exists that the current architecture will not
> scale beyond two years, start a project to future proof Wikidata.
We may also want to consider whether Wikidata is actually the best store for all kinds of data. Let's consider an example: https://www.wikidata.org/w/index.php?title=Q57009452

This is an entity that is almost 2 MB in size, with almost 3000 statements, and each edit to it produces another 2 MB data structure. Its dump, albeit slightly smaller, is still 780 KB and will need to be updated on each edit. Our database is obviously not optimized for such entities, and they won't perform very well.

We have 21 million scientific articles in the DB, and if even 2% of them were like this, that would be about 420,000 entities of ~2 MB each: almost a terabyte of data (multiplied by the number of revisions) and over a billion statements. While I am not against storing this as such, I do wonder if it is sustainable to keep this kind of data together with the rest of the Wikidata data in a single database. After all, every query you run, even one not related to those 21 million items in any way, will still have to run within the same enormous database and be hosted on the same hardware. This is especially important for services like the Wikidata Query Service, where all data (at least currently) occupies a shared space and cannot easily be separated.

Any thoughts on this?

--
Stas Malyshev
smalys...@wikimedia.org
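P.S. For scale, a minimal back-of-envelope sketch of the estimate above, in Python; the entity size, statement count, item count, and the 2% fraction are the rough figures quoted in this message, not measured values:

    # Back-of-envelope estimate (assumed figures from this thread, not measurements)
    entity_size_mb = 2            # ~2 MB of JSON per large entity
    statements_per_entity = 3000  # ~3000 statements each
    article_items = 21_000_000    # scientific-article items in Wikidata
    large_fraction = 0.02         # suppose 2% grow this large

    large_items = article_items * large_fraction              # ~420,000 items
    storage_gb = large_items * entity_size_mb / 1024          # ~820 GB per revision set
    total_statements = large_items * statements_per_entity    # ~1.26 billion statements

    print(f"{large_items:,.0f} items, ~{storage_gb:,.0f} GB, {total_statements:,.0f} statements")

Even before multiplying by revision count, that is roughly 800 GB of entity JSON for the current revisions alone.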