> > - Wikidata has 40k organisations:
> > https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ43229.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

Hi,

I think Wikidata contains many more organizations than that. If we choose "instance of business enterprise" (Q4830453), we get 135570 results. And I imagine there are many other classes that group commercial companies.

https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ4830453.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

On the substance, the project of adding all companies of a country would make Wikidata a kind of completely free clone of OpenCorporates <https://opencorporates.com/>. I would of course be delighted to see that, but would it not be a challenge to maintain such a database? Companies are like humans: they appear and disappear every day.

2017-10-16 13:41 GMT+02:00 Sebastian Hellmann <hellm...@informatik.uni-leipzig.de>:

> Hi all,
>
> the technical challenges are not so difficult.
>
> - 2.2 million is the exact number of German organisations, i.e.
>   associations and companies. They are also unique.
>
> - Wikidata has 40k organisations:
>   https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ43229.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
>   so there would be a maximum of 40k duplicates. These are easy to find
>   and deduplicate.
>
> - The crawl can be done easily; a colleague has done so before.
>
> The issues here are:
>
> - Do you want to upload the data to Wikidata? It would be a really big
>   extension. Can I go ahead?
>
> - If the data were available externally as structured data under an open
>   license, I would probably not suggest loading it into Wikidata, as the
>   data could be retrieved from the official source directly. However,
>   here this data will not be published in a decent format.
>
> I thought that the way data is copied from copyrighted sources, i.e. only
> facts, is OK for Wikidata.
> This is done in a lot of places, I guess. The same goes for Wikipedia,
> where news articles and copyrighted books are referenced. So Wikimedia or
> the Wikimedia community are experts on this.
>
> All the best,
> Sebastian
>
> On 16.10.2017 10:18, Neubert, Joachim wrote:
>
> Hi Sebastian,
>
> This is huge! It will cover almost all currently existing German
> companies. Many of these will have similar names, so preparing for
> disambiguation is a concern.
>
> A good way to approach this would be to propose a property for an
> external identifier, load the data into Mix-n-match, create links for
> companies already in Wikidata, and add the rest (or perhaps only parts of
> them; I'm not sure if having all of them in Wikidata makes sense, but
> that's another discussion), preferably with location and/or sector of
> trade in the description field.
>
> I've tried to figure out what could be used as the key for an external
> identifier property. However, it looks like the registry does not offer
> any (persistent) URL to its entries. So for looking up a company,
> apparently there are two options:
>
> - conducting an extended search for the exact string "A&A
>   Dienstleistungsgesellschaft mbH"
> - copying the register number "32853", selecting the court (Leipzig) from
>   the corresponding dropdown list, and searching for that
>
> Neither way is very intuitive, even if we can provide a link to the
> search form. This would make for a weak connection to the source of
> information. Much more importantly, it makes disambiguation in
> Mix-n-match difficult. This applies to the preparation of your initial
> load (you would not want to create duplicates), but even more so to
> everybody else who wants to match his or her data later on. Being forced
> to search for entries manually in a cumbersome way to disambiguate a new,
> possibly large and rich dataset is, in my eyes, not something we want to
> impose on future contributors.
> And often, the free information they find in the registry (formal name,
> register number, legal form, address) will not easily match the
> information they have (common name, location, perhaps founding date, and
> most importantly sector of trade), so disambiguation may still be
> difficult.
>
> Have you checked which parts of the accessible information shown below
> can be crawled and added legally to external databases such as Wikidata?
>
> Cheers, Joachim
>
> --
> Joachim Neubert
>
> ZBW – German National Library of Economics
> Leibniz Information Centre for Economics
> Neuer Jungfernstieg 21
> 20354 Hamburg
> Phone +49-42834-462
>
> From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On behalf of
> Sebastian Hellmann
> Sent: Sunday, 15 October 2017 09:45
> To: wikidata@lists.wikimedia.org
> Subject: [Wikidata] Kickstartet: Adding 2.2 million German organisations
> to Wikidata
>
> Hi all,
>
> the German business registry contains roughly 2.2 million organisations.
> Some of the information is paid, but other information is public, i.e.
> the info you get by searching at the following URL and clicking on UT
> (see example below):
>
> https://www.handelsregister.de/rp_web/mask.do?Typ=e
>
> I would like to add this to Wikidata, either by crawling or by raising
> money to use crowdsourcing services like CrowdFlower or Amazon Mechanical
> Turk.
>
> It should meet notability criterion 2:
> https://www.wikidata.org/wiki/Wikidata:Notability
>
> 2. It refers to an instance of a *clearly identifiable conceptual or
> material entity*. The entity must be notable, in the sense that it *can
> be described using serious and publicly available references*. If there
> is no item about you yet, you are probably not notable.
>
> The reference is the official German business registry, which is serious
> and public. Organisations are also by definition clearly identifiable
> legal entities.
>
> How can I get clearance to proceed on this?
>
> All the best,
> Sebastian
>
> Entity data
>
> Saxony District court *Leipzig HRB 32853* – A&A
> Dienstleistungsgesellschaft mbH
>
> Legal status: Gesellschaft mit beschränkter Haftung
> Capital: 25.000,00 EUR
> Date of entry: 29/08/2016
>   (When entering date of entry, wrong data input can occur due to system
>   failures!)
> Date of removal: -
> Balance sheet available: -
> Address (subject to correction):
>   A&A Dienstleistungsgesellschaft mbH
>   Prager Straße 38-40
>   04317 Leipzig
>
> --
> All the best,
> Sebastian Hellmann
>
> Director of Knowledge Integration and Linked Data Technologies (KILT)
> Competence Center
> at the Institute for Applied Informatics (InfAI) at Leipzig University
> Executive Director of the DBpedia Association
> Projects: http://dbpedia.org, http://nlp2rdf.org,
> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org

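PS: The query links in this thread get mangled easily once a mail client wraps or quotes them, and they list items rather than count them. Below is a minimal sketch, in Python, of how a count query for each of the two classes mentioned above (Q43229 and Q4830453) could be turned into a fully percent-encoded link for the query service. The COUNT query and the helper names count_query and wdqs_link are my own assumptions for illustration, not something posted earlier in the thread, and I make no claim about the numbers the queries return.

```python
from urllib.parse import quote, unquote

# Base of the Wikidata Query Service UI; the query goes into the URL fragment.
WDQS_UI = "https://query.wikidata.org/#"


def count_query(class_qid: str) -> str:
    """Build SPARQL counting items whose 'instance of' (P31) is class_qid.

    Hypothetical helper: it counts direct P31 statements only and does not
    follow subclasses.
    """
    return (
        "SELECT (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid}.\n"
        "}"
    )


def wdqs_link(sparql: str) -> str:
    """Percent-encode the whole query so the link has no spaces or braces."""
    return WDQS_UI + quote(sparql, safe="")


# The two classes discussed in this thread.
for label, qid in [("organization", "Q43229"), ("business enterprise", "Q4830453")]:
    print(label, wdqs_link(count_query(qid)))

# Decoding the fragment gives back the original query text.
assert unquote(wdqs_link(count_query("Q43229"))[len(WDQS_UI):]) == count_query("Q43229")
```

Because only direct "instance of" statements are counted, companies that are typed with a more specific class would be missed, which is the point made above about other classes grouping commercial companies.
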
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata