That is a tough question. We are pretty sure that we technically scale
quite well, and there is no reason for the community to restrict
itself for technical reasons. If the number of items suddenly increased
by one or two orders of magnitude, we would probably meet a few hiccups on
the way, but the architecture should be able to deal with that.

What I am much more worried about, though, is the scaling of the
community. One of my statements from my Wikidata talks is "we do not want to
become the biggest data heap out there, but rather aim for an organic
community, that is strong and resilient enough to maintain the data that is
being collected." See also Wikidata requirement #6
<http://meta.wikimedia.org/wiki/Wikidata/Notes/Requirements> (a page worth
re-reading).

Sometimes it might make sense for Wikidata to bridge and connect to external
data sources that have their own way of maintenance and curation. Should
the dataset really be merged into Wikidata? Is the data wikilike? Is it
used in the Wikimedia projects? Or could it also be provided as a linked
open dataset that is referenced from Wikidata?

Just to give an example: sure, one could theoretically start to collect
temperature data of a city in hourly measurements*, but it could maybe make
more sense to point to an external site that collects this data in a more
efficient format, provide the mapping identifiers, and allow for a bot to
go there and discover the data. Wikidata in turn could provide an
aggregation of the data, which indeed would be used on e.g. Wikipedia and
Wikivoyage, but leave the full dataset on the external site.
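
A minimal sketch of that split, in Python, under stated assumptions: the
external site keeps the hourly readings, and only a mapping identifier plus
a small aggregate is kept on the Wikidata side. The station identifier, the
field names and the readings are made up for illustration; this is not
Wikidata's data model or API.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical hourly readings as an external site might store them:
# (ISO timestamp, temperature in degrees Celsius). The values are invented.
HOURLY_READINGS = [
    ("2013-01-01T00:00", -2.0),
    ("2013-01-01T01:00", -3.0),
    ("2013-01-15T12:00", 1.0),
    ("2013-07-01T14:00", 25.0),
    ("2013-07-01T15:00", 27.0),
]


def monthly_means(readings):
    """Collapse hourly readings into one mean temperature per month."""
    by_month = defaultdict(list)
    for timestamp, celsius in readings:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        by_month[month].append(celsius)
    return {
        month: round(mean(values), 1)
        for month, values in sorted(by_month.items())
    }


def build_summary(external_id, readings):
    """Build the small record kept centrally: a pointer plus aggregates."""
    return {
        "external_id": external_id,  # hypothetical mapping identifier
        "reading_count": len(readings),
        "monthly_mean_celsius": monthly_means(readings),
    }


summary = build_summary("station-0001", HOURLY_READINGS)
print(summary)
```

The point of the sketch is the size difference: a full year of hourly
readings is 24 * 365 = 8,760 values per city, while the monthly summary
stays at twelve values and one identifier pointing back to the full dataset.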

(Which, by the way, would also be a viable solution for datasets that
have incompatible licenses.)

I hope this makes sense. Cheers,
Denny

* Actually, this kind of data would probably kill us faster than creating
many items, as it would make a single item ginormous. We do not scale
that well in that direction.



2013/3/14 Benjamin Good <ben.mcgee.g...@gmail.com>

> I've been struggling to understand what should go into wikidata and what
> should not.  I see that this is because it hasn't been decided yet ;)
> http://www.wikidata.org/wiki/Wikidata_talk:Notability
>
> In helping the community to make this decision I think it would be really
> helpful for the developers to weigh in on the technical capacity of the
> envisioned/realized wikidata infrastructure.  If we know how big the system
> could realistically be and continue to work well technically, it might help
> discussions about how much and what kind of content we should put into it.
>  If the plan is to cope with only a few tens of millions of subjects that
> is quite different than if the plan allows for the potential creation of
> billions of items.  (Suggesting less inclusive versus more inclusive
> policies).
>
> ?
>
> -Ben
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
>


-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
