Re: [Wikidata] Which external identifiers are worth covering?

2017-09-11 Thread Neubert, Joachim
for the Wikidata project. > Subject: Re: [Wikidata] Which external identifiers are worth covering? > > Hi Marco, > > I guess this depends what you mean by "exhaustive". Exhaustive in that every > Wikidata item has ID X, or exhaustive in that we have every instance

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-09 Thread Lucy Chambers
Hi Jane and Gerard, Thanks for the suggestions! Labels are definitely a very important consideration - will have a think about this. On your questions, Jane, the people we've been working with have increasingly been adding quite a lot of "occupation:politician" statements, and even more specific

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Maarten Dammers
Hi Marco, On 07-09-17 20:51, Marco Fossati wrote: Hi everyone, As a data quality addict, I've been investigating the coverage of external identifiers linked to Wikidata items about people. Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems that even the second most used

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Gerard Meijssen
Hi, I understand the tendency to have English labels for items seen as important. However, there are bots that add labels in many, many languages, as the labels tend to be the same. In my opinion we should encourage the inclusion of information. Yes, we may get duplicates but having the data early

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Jane Darnell
Interesting! I noticed that suddenly a lot more politicians were showing up in my queries - have you been adding the occupation=politician property? I believe politicians are severely underrepresented on Wikipedia projects (except for the top people in the news) so if you have good metadata, then

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Lucy Chambers
Hi folks, Very happy to see this discussion happening. I work on the EveryPolitician project [0] and for several years, we have been mapping official IDs to Wikidata IDs. We have probably half the national legislators in the world mapped this way, and many of the ones we’re missing are because

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Andy Mabbett
On 7 September 2017 at 19:51, Marco Fossati wrote: > external identifiers linked to Wikidata items about people. I'll take this as an invitation to remind everyone about ORCID iDs ;-) See: https://www.wikidata.org/wiki/Wikidata:ORCID and:

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Osma Suominen
Somewhat related to this discussion is the coli-conc project, which collects statistics about KOS-type (thesaurus, authority file etc.) identifier links in Wikidata: http://coli-conc.gbv.de/concordances/wikidata/ You can also find statistics about indirect mappings, from one KOS via Wikidata

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Antonin Delpeuch (lists)
In general, I think it would be great to store inside Wikidata the graph of relations between identifiers. Something like:
VIAF linksTo ISNI
VIAF linksTo GND
…
GRID linksTo ISNI
arXiv linksTo DOI
Last time I looked, there was no simple way to do that. So for WikiProject Universities we have used
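
As a rough illustration of the identifier graph described in this message, the relations could be held as directed edges and queried for reachability. This is only a sketch: the edge list reuses the four examples from the message, and the names `LINKS` and `reachable` are hypothetical, not anything that exists in Wikidata.

```python
from collections import deque

# Directed "linksTo" edges, copied from the examples in the message.
# Hypothetical structure: Wikidata itself does not store this graph.
LINKS = {
    "VIAF": {"ISNI", "GND"},
    "GRID": {"ISNI"},
    "arXiv": {"DOI"},
}


def reachable(start):
    """Return every identifier scheme reachable from `start` via linksTo edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in LINKS.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

With these edges, `reachable("VIAF")` yields ISNI and GND, while a scheme with no outgoing links yields an empty set.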

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Magnus Manske
Is anyone working on an "auto-resolve" bot? If you have VIAF (but nothing else), you can resolve other identifiers via the VIAF site; similarly, if you have only GND, you could try to reverse-lookup VIAF. I think a list of items that have zero external identifiers, ordered by "importance"
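
The auto-resolve idea could look roughly like the sketch below. It assumes the cross-references have already been fetched into a local table: `CROSSWALK`, `auto_resolve`, and the sample identifier values are all hypothetical placeholders, and no real VIAF or GND service is called.

```python
# Hypothetical crosswalk: (scheme, value) -> identifiers that record links to.
# In practice this data would come from the external databases; values are made up.
CROSSWALK = {
    ("VIAF", "viaf-001"): {"ISNI": "isni-001", "GND": "gnd-001"},
    ("GND", "gnd-002"): {"VIAF": "viaf-002"},
    ("VIAF", "viaf-002"): {"ISNI": "isni-002"},
}


def auto_resolve(known):
    """Return a copy of `known` extended with identifiers found via CROSSWALK.

    Existing values are never overwritten; lookups repeat until nothing new appears.
    """
    resolved = dict(known)
    pending = list(resolved.items())
    while pending:
        scheme, value = pending.pop()
        for other, other_value in CROSSWALK.get((scheme, value), {}).items():
            if other not in resolved:
                resolved[other] = other_value
                pending.append((other, other_value))
    return resolved
```

Starting from only a GND value, the sketch first picks up the VIAF value and then, through it, the ISNI value. A real bot would also need to handle conflicting or ambiguous matches, which this sketch ignores.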

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-08 Thread Jane Darnell
As a basic rule for "which external identifiers are worth covering", I would begin with any national identifiers we have for people (politicians, artists, writers, theologians, scientists, etc), then national identifiers for organizations (government-related, GNP-related businesses, nonprofits,

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-07 Thread Andrew Gray
Hi Marco, I guess this depends what you mean by "exhaustive". Exhaustive in that every Wikidata item has ID X, or exhaustive in that we have every instance of ID X in Wikidata? The first is probably not going to happen, as the vast majority of external identifiers have a defined scope for what

Re: [Wikidata] Which external identifiers are worth covering?

2017-09-07 Thread john cummings
I guess the question for me is: how do we do this in practice? How do we make sure Wikidata stays up to date/synced with external databases we think are important? On 7 September 2017 at 20:51, Marco Fossati wrote: > Hi everyone, > > As a data quality addict, I've been

[Wikidata] Which external identifiers are worth covering?

2017-09-07 Thread Marco Fossati
Hi everyone, As a data quality addict, I've been investigating the coverage of external identifiers linked to Wikidata items about people. Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems that even the second most used ID (VIAF) only covers circa *25%* of people items.