As a basic rule for "which external identifiers are worth covering", I
would begin with any national identifiers we have for people (politicians,
artists, writers, theologians, scientists, etc.), then national identifiers
for organizations (government-related, GNP-related businesses, nonprofits,
educational institutions, etc.), then national identifiers for places
(census-defined population centers, battle sites, etc.).

In my opinion, the question should not be "which identifier has the most
coverage" but "which items have the most identifiers".
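
To make the distinction concrete, here is a rough sketch in Python of the
two ways of counting. The item names and the (item, property) pairs are
made-up toy data, not real Wikidata output; the property IDs (P214, P227,
P244) are just placeholders for whichever external identifiers you care
about, and both function names are my own.

```python
from collections import Counter

# Hypothetical toy data: (item, external-identifier property) pairs.
# Item names are invented for illustration only.
claims = [
    ("item_a", "P214"),
    ("item_a", "P227"),
    ("item_a", "P244"),
    ("item_b", "P214"),
    ("item_c", "P214"),
    ("item_c", "P227"),
]


def items_per_identifier(pairs):
    """Per-identifier view: how many distinct items carry each identifier."""
    return Counter(prop for _item, prop in set(pairs))


def identifiers_per_item(pairs):
    """Per-item view: how many distinct identifiers each item carries."""
    return Counter(item for item, _prop in set(pairs))


# Items ranked by how many identifiers they have, most first.
ranked = identifiers_per_item(claims).most_common()
```

The first function answers "which identifier has the most coverage"; the
second answers "which items have the most identifiers", which is the
ranking I would look at first.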


On Thu, Sep 7, 2017 at 9:26 PM, Andrew Gray <and...@generalist.org.uk>
wrote:

> Hi Marco,
>
> I guess this depends what you mean by "exhaustive". Exhaustive in that
> every Wikidata item has ID X, or exhaustive in that we have every
> instance of ID X in Wikidata?
>
> The first is probably not going to happen, as the vast majority of
> external identifiers have a defined scope for what they identify. Some
> are pretty broad - VIAF is essentially "everyone who exists in a
> library catalogue as an author or subject" - but still have a limit.
> We're never really going to reach a situation where there is a single
> identifier type that covers everyone, unless we're linking across to
> another Wikidata-type comprehensive knowledgebase, and even then we'd
> need to ensure we're in a position where they already cover everything
> in Wikidata.
>
> The second can be (and has been) done - the largest one I know of offhand
> for people is the Oxford DNB (60k items) but for non-people we have
> complete coverage of e.g. Swedish district codes, P1841 (160k items).
> It's a bit of a slog to get these completed and then maintained, since
> the last 5-10% tend to be the more challenging, complicated cases, but one
> or two determined people can make it happen. And of course it's not
> appropriate for many identifiers, as they may issue IDs for things
> that we don't intend to have in Wikidata, so we will never completely
> cover them.
>
> I should quickly plug the "expected completeness" property which is
> really useful for identifiers - P2429 - as this can quickly show
> whether something is a) completely on Wikidata; b) not complete yet
> but eventually might be; or c) probably never will be. Not very widely
> rolled out yet, though...
>
> Andrew.
>
>
> On 7 September 2017 at 19:51, Marco Fossati <foss...@spaziodati.eu> wrote:
> > Hi everyone,
> >
> > As a data quality addict, I've been investigating the coverage of
> > external identifiers linked to Wikidata items about people.
> >
> > Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems
> > that even the second most used ID (VIAF) covers only circa *25%* of
> > people items.
> > Then, there is a long tail of IDs that are barely used at all.
> >
> > So here is my question:
> > *which external identifiers deserve an effort to achieve exhaustive
> > coverage?*
> >
> > Looking forward to your valuable feedback.
> > Cheers,
> >
> > Marco
> >
> > [1] https://tools.wmflabs.org/sqid/#/browse?type=properties "Select
> > datatype" set to "ExternalId", "Used for class" set to "human Q5"
> > [2] total people: http://tinyurl.com/ybvcm5uw
> > [3] people with a VIAF link: http://tinyurl.com/ya6dnpr7
> >
> > _______________________________________________
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> - Andrew Gray
>   and...@generalist.org.uk
>
