Hoi,
Yes, we could do that. What follows is that the functionality of Wikidata is
killed. Completely dead.

Also, this thread is about us being ready for what we could be, not for
what we are. Even at that, we are not as good as we could be. In your
suggestions for inclusion we could have the "concept cloud" for Wikipedia
articles, defined by their wikilinks, and define them in Wikidata. We don't.
We could have a usable user interface like Reasonator. We don't.

The reason for this thread is: what does it take for us to have a
performing system, because we don't have one. Our growth is less than what
it could be. Our functionality is less than what it could be. At the same
time we are restricted, it seems, by annual budgets that do not take into
account what functionality we provide and could provide. The notion that we
should restrict our content for performance's sake... from an organisation
as rich as the Wikimedia Foundation. REALLY!!
Thanks,
       GerardM

On Sat, 4 May 2019 at 20:27, Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu> wrote:

> Hi Stas,
>
> Many thanks for writing this down! It is very useful to have a clear
> statement like this from the dev team.
>
> Given the sustainability concerns that you mention, I think the way
> forward for the community could be to hold an RFC to determine a stricter
> admissibility criterion for scholarly articles.
>
> It could be one of (or a boolean combination of) these:
> - having a site link;
> - being used as a reference for a statement on Wikidata;
> - being cited in a sister project;
> - being cited in a sister project using a template that fetches the
> metadata from Wikidata such as {{cite Q}};
> - being authored by someone with a Wikipedia page about them;
> - … any other criterion that comes to mind.
>
> This way, the size of the corpus could be kept under control, and the
> criterion could be loosened later if the scalability concerns are
> addressed.
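> 
> As a rough illustration (not something I have run at scale), a query
> against the public SPARQL endpoint could give a sense of how many items a
> criterion like "used as a reference" would retain. This sketch assumes the
> usual modelling (wd:Q13442814 = scholarly article, pr:P248 = "stated in")
> and would very likely need to be narrowed or paginated to avoid timeouts:
> 
>   import requests
> 
>   ENDPOINT = "https://query.wikidata.org/sparql"
>   # Count scholarly articles that are cited as "stated in" (P248)
>   # by at least one reference on some statement.
>   QUERY = """
>   SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {
>     ?article wdt:P31 wd:Q13442814 .
>     ?ref pr:P248 ?article .
>   }
>   """
> 
>   resp = requests.get(ENDPOINT,
>                       params={"query": QUERY, "format": "json"},
>                       headers={"User-Agent": "admissibility-estimate/0.1"})
>   print(resp.json()["results"]["bindings"][0]["count"]["value"])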
>
> Cheers,
> Antonin
>
> On 5/4/19 8:37 AM, Stas Malyshev wrote:
> > Hi!
> >
> >> For the technical guys, consider our growth and plan for at least one
> >> year. When the impression exists that the current architecture will not
> >> scale beyond two years, start a project to future proof Wikidata.
> >
> > We may also want to consider if Wikidata is actually the best store for
> > all kinds of data. Let's consider example:
> >
> > https://www.wikidata.org/w/index.php?title=Q57009452
> >
> > This is an entity that is almost 2M in size, with almost 3000 statements,
> > and each edit to it produces another 2M data structure. And its dump,
> > albeit slightly smaller, is still 780K and will need to be updated on
> > each edit.
> >
> > Our database is obviously not optimized for such entities, and they
> > won't perform very well. We have 21 million scientific articles in the
> > DB, and if even 2% of them were like this, it would be almost a terabyte
> > of data (multiplied by the number of revisions) and billions of statements.
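> >
> > As a back-of-envelope check of that figure (assuming roughly 2 MB per
> > revision of such an entity, as with Q57009452):
> >
> >   articles = 21_000_000            # scientific articles currently in the DB
> >   large_fraction = 0.02            # suppose 2% grow as large as Q57009452
> >   bytes_per_entity = 2 * 1024**2   # ~2 MB per revision
> >   total_tib = articles * large_fraction * bytes_per_entity / 1024**4
> >   print(round(total_tib, 2))       # ~0.8 TiB per stored revision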
> >
> > While I am not against storing this as such, I do wonder if it's
> > sustainable to keep this kind of data together with other Wikidata data
> > in a single database. After all, each query that you run - even if not
> > related to those 21 million items in any way - will still have to run
> > within the same enormous database and be hosted on the same hardware.
> > This is especially important for services like Wikidata Query Service,
> > where all data (at least currently) occupies a shared space and cannot
> > be easily separated.
> >
> > Any thoughts on this?
> >
>
>
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
