So, I'm not particularly involved with the scholarly-papers work, but
with my day-job bibliographic analysis hat on...

Papers like this are a *remarkable* anomaly - hyperauthorship on this
scale is confined to a few quite specific areas of physics, and is still
relatively uncommon even there. I don't think we have to worry
about it approaching anything like 2% of papers any time soon :-)
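
As a rough sanity check on the figures in the quoted message below, here
is a minimal sketch of that back-of-envelope arithmetic. It assumes that
"almost 2M" means about 2 MB per entity and rounds "almost 3000
statements" up to 3000; the function name `estimate` and the constants
are purely illustrative, not anything Wikidata provides.

```python
# Back-of-envelope estimate using the numbers from the quoted message.
ARTICLES = 21_000_000      # scientific articles in the DB (quoted figure)
HYPER_SHARE = 0.02         # hypothetical share of very large entities
ENTITY_BYTES = 2_000_000   # assumption: "almost 2M" taken as 2 MB
STATEMENTS = 3_000         # "almost 3000 statements", rounded up


def estimate(articles, share, entity_bytes, statements):
    """Return (entity count, total bytes, total statements) for one revision each."""
    entities = round(articles * share)
    return entities, entities * entity_bytes, entities * statements


entities, total_bytes, total_statements = estimate(
    ARTICLES, HYPER_SHARE, ENTITY_BYTES, STATEMENTS
)
print(f"{entities:,} entities")
print(f"{total_bytes / 1e12:.2f} TB per revision of each entity")
print(f"{total_statements / 1e9:.2f} billion statements")
```

Under those assumptions the 2% scenario comes to 420,000 entities,
roughly 0.84 TB before multiplying by revisions, and about 1.26 billion
statements. Changing HYPER_SHARE shows how quickly the total shrinks if
such papers are much rarer than 2%.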

For 2018 publications, the global mean number of authors/paper is
slightly under five (all disciplines). Over all time, allowing for
there being more new papers than old ones, I'd guess it's something
like three.

Andrew.



On Sat, 4 May 2019 at 08:58, Stas Malyshev <smalys...@wikimedia.org> wrote:
>
> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider an example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, with almost 3000 statements,
> and each edit to it produces another 2M data structure. Its dump, albeit
> slightly smaller, is still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them were like this, it would be almost a terabyte
> of data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep this kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will still have to run within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



--
- Andrew Gray
  and...@generalist.org.uk

