> - Wikidata has 40k organisations:
>
> https://query.wikidata.org/#SELECT %3Fitem %3FitemLabel %0AWHERE %0A{%0A
> %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}


Hi,

I think Wikidata contains many more organizations than that. If we query
for "instance of business enterprise" (Q4830453) instead, we get 135570
results. And I imagine there are many other classes that group commercial
companies.


https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ4830453.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
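
For anyone who wants to compare other classes, here is a minimal Python
sketch that rebuilds the query link above for any class ID. The function
names are my own, not part of any Wikidata tooling. The optional
wdt:P31/wdt:P279* path also follows "subclass of", which might catch some of
the other classes I mentioned; I have not checked what counts it gives.

```python
from urllib.parse import quote

WDQS_UI = "https://query.wikidata.org/#"


def instance_query(class_qid, count_only=False, include_subclasses=False):
    """Build a SPARQL query for items that are instances of class_qid.

    include_subclasses switches from direct instances (wdt:P31) to
    instances of the class or any of its subclasses (wdt:P31/wdt:P279*).
    """
    path = "wdt:P31/wdt:P279*" if include_subclasses else "wdt:P31"
    pattern = f"  ?item {path} wd:{class_qid}.\n"
    if count_only:
        # Only return the number of matching items, not the items themselves.
        return "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n" + pattern + "}"
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        + pattern
        + '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        + "}"
    )


def query_url(sparql):
    """Percent-encode the query and append it to the query service UI link."""
    return WDQS_UI + quote(sparql, safe="")


if __name__ == "__main__":
    # Q4830453 = business enterprise, Q43229 = organization (IDs from the links above)
    print(query_url(instance_query("Q4830453")))
    print(query_url(instance_query("Q43229", count_only=True, include_subclasses=True)))
```

The first printed link should be identical to the one above; the second only
asks for a count, which is lighter on the query service than listing every
item.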

On the substance, a project to add all companies of a country would make
Wikidata a kind of totally free clone of OpenCorporates
<https://opencorporates.com/>. I would of course be delighted to see that,
but is it not a challenge to maintain such a database? Companies are like
humans: they appear and disappear every day.
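
To make the maintenance question concrete, here is a rough Python sketch of
what a periodic update would have to compute: which entries were added,
removed or renamed between two crawls. It assumes each crawl can be reduced
to a mapping from (register court, register number) to company name. That
key and the function name are my assumptions, not something the registry
provides in this form.

```python
from typing import Dict, Set, Tuple

# Assumed key: (register court, register number), e.g. ("Leipzig", "HRB 32853").
RegistryKey = Tuple[str, str]
Snapshot = Dict[RegistryKey, str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[RegistryKey], Set[RegistryKey], Set[RegistryKey]]:
    """Return the keys that were added, removed, or renamed between two crawls."""
    old_keys, new_keys = set(old), set(new)
    added = new_keys - old_keys        # newly registered companies
    removed = old_keys - new_keys      # companies no longer listed
    renamed = {k for k in old_keys & new_keys if old[k] != new[k]}
    return added, removed, renamed


if __name__ == "__main__":
    # The first entry is the example from the thread; the others are made up.
    old = {
        ("Leipzig", "HRB 32853"): "A&A Dienstleistungsgesellschaft mbH",
        ("Example court", "HRB 1"): "Example Old GmbH",
    }
    new = {
        ("Leipzig", "HRB 32853"): "A&A Dienstleistungsgesellschaft mbH",
        ("Example court", "HRB 2"): "Example New GmbH",
    }
    print(diff_snapshots(old, new))
```

Each of the three sets would need its own handling in Wikidata (new items,
dissolution dates, label changes), and that is where I expect the ongoing
effort to be.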



2017-10-16 13:41 GMT+02:00 Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de>:

> Hi all,
>
> the technical challenges are not so difficult.
>
> - 2.2 million is the exact number of German organisations, i.e.
> associations and companies. They are also unique.
>
> - Wikidata has 40k organisations:
>
> https://query.wikidata.org/#SELECT %3Fitem %3FitemLabel %0AWHERE %0A{%0A
> %3Fitem wdt%3AP31 wd%3AQ43229.%0A SERVICE wikibase%3Alabel {
> bd%3AserviceParam wikibase%3Alanguage "[AUTO_LANGUAGE]%2Cen". }%0A}
>
> so there would be a maximum of 40k duplicates. These are easy to find and
> deduplicate.
>
> - The crawl can be done easily, a colleague has done so before.
>
>
> The issues here are:
>
> - Do you want to upload the data into Wikidata? It would be a really big
> extension. Can I go ahead?
>
> - If the data were available externally as structured data under an open
> license, I would probably not suggest loading it into Wikidata, as the data
> could be retrieved from the official source directly. Here, however, this
> data will not be published in a decent format.
>
> I thought that the way data is copied from copyrighted sources, i.e. only
> facts, is OK for Wikidata. This is done in a lot of places, I guess. Same
> for Wikipedia, i.e. news articles and copyrighted books are referenced. So
> Wikimedia or the Wikimedia community are experts on this.
>
> All the best,
>
> Sebastian
>
> On 16.10.2017 10:18, Neubert, Joachim wrote:
>
> Hi Sebastian,
>
>
>
> This is huge! It will cover almost all currently existing German
> companies. Many of these will have similar names, so preparing for
> disambiguation is a concern.
>
>
>
> A good way for such an approach would be proposing a property for an
> external identifier, loading the data into Mix-n-match, creating links for
> companies already in Wikidata, and adding the rest (or perhaps only parts
> of them - I’m not sure if having all of them in Wikidata makes sense, but
> that’s another discussion), preferably with location and/or sector of trade
> in the description field.
>
>
>
> I’ve tried to figure out what could be used as the key for an external
> identifier property. However, it looks like the registry does not offer any
> (persistent) URL to its entries. So for looking up a company, apparently
> there are two options:
>
>
>
> - conducting an extended search for the exact string “A&A
> Dienstleistungsgesellschaft mbH“
>
> - copying the register number “32853”, selecting the court (Leipzig)
> from the corresponding dropdown list, and searching for that
>
>
>
> Neither way is very intuitive, even if we can provide a link to the
> search form. This would make a weak connection to the source of
> information. Much more important, it makes disambiguation in Mix-n-match
> difficult. This applies to the preparation of your initial load (you would
> not want to create duplicates), but much more so to everybody else who
> wants to match his or her data later on. Being forced to search for entries
> manually in a cumbersome way for disambiguation of a new, possibly large
> and rich dataset is, in my eyes, not something we want to impose on future
> contributors. And often, the free information they find in the registry
> (formal name, register number, legal form, address) will not easily match
> the information they have (common name, location, perhaps founding
> date, and most importantly sector of trade), so disambiguation may still be
> difficult.
>
>
>
> Have you checked which parts of the accessible information shown below can
> be crawled and legally added to external databases such as Wikidata?
>
>
>
> Cheers, Joachim
>
>
>
> --
>
> Joachim Neubert
>
>
>
> ZBW – German National Library of Economics
>
> Leibniz Information Centre for Economics
>
> Neuer Jungfernstieg 21
> 20354 Hamburg
>
> Phone +49-42834-462
>
>
>
>
>
>
>
> *From:* Wikidata [mailto:wikidata-boun...@lists.wikimedia.org
> <wikidata-boun...@lists.wikimedia.org>] *On behalf of* Sebastian
> Hellmann
> *Sent:* Sunday, 15 October 2017 09:45
> *To:* wikidata@lists.wikimedia.org
> *Subject:* [Wikidata] Kickstartet: Adding 2.2 million German
> organisations to Wikidata
>
>
>
> Hi all,
>
> the German business registry contains roughly 2.2 million organisations.
> Some information is paid, but some is public, i.e. the info you get when
> searching at the link below and clicking on UT (see example below):
>
> https://www.handelsregister.de/rp_web/mask.do?Typ=e
>
>
>
> I would like to add this to Wikidata, either by crawling or by raising
> money to use crowdsourcing platforms like CrowdFlower or Amazon Mechanical
> Turk.
>
>
>
> It should meet notability criterion 2:
> https://www.wikidata.org/wiki/Wikidata:Notability
>
> 2. It refers to an instance of a *clearly identifiable conceptual or
> material entity*. The entity must be notable, in the sense that it *can
> be described using serious and publicly available references*. If there
> is no item about you yet, you are probably not notable.
>
>
> The reference is the official German business registry, which is serious
> and public. Organisations are also by definition clearly identifiable legal
> entities.
>
> How can I get clearance to proceed on this?
>
> All the best,
> Sebastian
>
>
>
>
> Entity data
>
> Saxony District court *Leipzig HRB 32853* – A&A
> Dienstleistungsgesellschaft mbH
>
> Legal status: Gesellschaft mit beschränkter Haftung
> Capital: 25.000,00 EUR
> Date of entry: 29/08/2016
> (When entering date of entry, wrong data input can occur due to system
> failures!)
> Date of removal: -
> Balance sheet available: -
> Address (subject to correction):
> A&A Dienstleistungsgesellschaft mbH
> Prager Straße 38-40
> 04317 Leipzig
>
>
>
>
> --
> All the best,
> Sebastian Hellmann
>
> Director of Knowledge Integration and Linked Data Technologies (KILT)
> Competence Center
> at the Institute for Applied Informatics (InfAI) at Leipzig University
> Executive Director of the DBpedia Association
> Projects: http://dbpedia.org, http://nlp2rdf.org,
> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> <http://www.w3.org/community/ld4lt>
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org
>
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
