So, I'm not particularly involved with the scholarly-papers work, but with my day-job bibliographic analysis hat on...
Papers like this are a *remarkable* anomaly - hyperauthorship like this is confined to some quite specific areas of physics, and is still relatively uncommon even in those. I don't think we have to worry about it approaching anything like 2% of papers any time soon :-)

For 2018 publications, the global mean number of authors/paper is slightly under five (all disciplines). Over all time, allowing for there being more new papers than old ones, I'd guess it's something like three.

Andrew.

On Sat, 4 May 2019 at 08:58, Stas Malyshev <smalys...@wikimedia.org> wrote:
>
> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

--
- Andrew Gray
  and...@generalist.org.uk
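
For anyone wanting to check the back-of-envelope figures in the quoted message, here is a rough sketch of the arithmetic. It assumes "2M" means about 2 megabytes (decimal) per entity and takes the 21 million articles, the 2% share and the roughly 3000 statements straight from the quote; the function and variable names are just illustrative, and revisions are ignored.

```python
# Rough check of the quoted estimate; all inputs come from the quoted message.
# Assumption: "almost 2M" is read as 2,000,000 bytes per entity (decimal MB).
ARTICLES = 21_000_000          # scientific articles in the DB
SHARE_PERCENT = 2              # hypothetical share of very large entities
ENTITY_BYTES = 2_000_000       # size of one such entity
STATEMENTS_PER_ENTITY = 3_000  # statements on one such entity


def estimate(articles=ARTICLES, share_percent=SHARE_PERCENT,
             entity_bytes=ENTITY_BYTES, statements=STATEMENTS_PER_ENTITY):
    """Return counts for a single revision of each large entity."""
    # Integer arithmetic avoids floating-point rounding in the share.
    large_entities = articles * share_percent // 100
    return {
        "large_entities": large_entities,
        "total_bytes": large_entities * entity_bytes,
        "total_statements": large_entities * statements,
    }


if __name__ == "__main__":
    result = estimate()
    print(f"large entities:   {result['large_entities']:,}")
    print(f"data (one rev):   {result['total_bytes'] / 1e12:.2f} TB")
    print(f"statements:       {result['total_statements']:,}")
```

Under those assumptions this gives 420,000 entities, about 0.84 TB for a single revision of each, and about 1.26 billion statements, which matches the "almost a terabyte" and "billions of statements" in the quote. Changing `share_percent` shows how sensitive the total is to the 2% guess.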