Yeah, the Wikibase storage doesn't sound right here, but these are two
different issues: one with Wikibase (SQL) and one with the Wikidata Query
Service (Blazegraph).

Is that 2M footprint the SQL DB blob? And each additional edit adds
another 2M to the version history, correct?

So the issue you are referring to here is in the design of the SQL-based
"Wikibase Repository"? How does the 2M footprint and its versions compare
to a large Wikipedia blob?
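For comparison, one could check the actual stored blob sizes through the standard MediaWiki API (action=query, prop=revisions, rvprop=size). A minimal sketch; the Wikipedia article title and the User-Agent string are just illustrative picks, not anything from this thread:

```python
# Sketch: compare the latest-revision size of the large Wikidata item
# Q57009452 against a big Wikipedia article, via the MediaWiki API.
# The article title below is an arbitrary example of a large page.
import json
import urllib.parse
import urllib.request


def latest_revision_size(api_url: str, title: str) -> int:
    """Return the byte size of the latest stored revision of `title`."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "size",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    })
    # Wikimedia APIs expect a descriptive User-Agent; this one is made up.
    req = urllib.request.Request(
        f"{api_url}?{params}",
        headers={"User-Agent": "size-comparison-sketch/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["query"]["pages"][0]["revisions"][0]["size"]


if __name__ == "__main__":
    # Each edit stores a full new revision, so N edits cost roughly N * size.
    q = latest_revision_size("https://www.wikidata.org/w/api.php", "Q57009452")
    w = latest_revision_size("https://en.wikipedia.org/w/api.php", "World War II")
    print(f"Q57009452: {q} bytes, enwiki article: {w} bytes")
```

Since every edit writes a complete new revision row, the history cost grows linearly with the edit count times the full item size, which is what makes these mega-items expensive.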

WQS data doesn't have versions; it doesn't have to be in one space and can
easily be separated. The whole point of LOD is to decentralize your data.
But I understand that Wikidata/WQS is currently designed as a centralized,
closed-shop service, for several reasons, granted.




On Sat, May 4, 2019 at 8:57 AM Stas Malyshev <smalys...@wikimedia.org>
wrote:

> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, with almost 3000 statements,
> and each edit to it produces another 2M data structure. And its dump,
> albeit slightly smaller, is still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to those 21 million in any way - will still have to run within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA