2015-01-17 4:27 GMT-05:00 Lydia Pintscher <lydia.pintsc...@wikimedia.de>:

>
> The log is at
> https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2015-01-16
> for anyone who couldn't make it.


Denny discusses importing all missing VIAF keys from Freebase using
"multichill" (unclear what that is from the context) on the assumption that
the error rate is low.  It would be worth checking assumptions like that
with folks who are familiar with the Freebase data before acting on them.

Here are some things that I think are true about the VIAF keys in Freebase:

- they were assigned by a user, not by Google/Metaweb (not necessarily a
bad thing since some of the biggest problems in Freebase were created by
G/M and some users have contributed very high quality data)

- the keys were, I believe, assigned based heavily on existing Library of
Congress identifiers that had previously been assigned by Metaweb.  Those
key assignments are not as high quality as other parts of Freebase.  One
easy thing to check for is people with two LC (and thus two VIAF) keys
assigned.  In cases where there is more than one key and the extra keys don't
represent pseudonyms, this is a clear error.

- Freebase doesn't create separate entities for pseudonyms, unlike the
library cataloging world.  Depending on what decision Wikidata makes in
this regard, it's something to watch out for when reusing Freebase author
data (including VIAF keys)

- much Freebase author data was imported from OpenLibrary which has its own
set of quality issues.  A bunch of this data was later deleted, leaving
that portion of the graph somewhat thready and moth-eaten.  It's unclear
whether that was a net gain or loss in overall bibliographic data quality
for Freebase.

- I suspect that most VIAF keys which are in Freebase but not in Wikidata
represent entities which are not in Wikidata either.  That would mean they
aren't useful anyway, since Denny wants to focus on creating new links, not
new entities (a direction that I'm not sure I agree with, but that's a
whole separate discussion).
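
As a rough sketch of the duplicate-key check mentioned above: assuming the
(entity, VIAF key) pairs have already been extracted from Freebase into a
list of tuples, something like the following would flag entities carrying
more than one distinct key.  The function name, the entity ids, and the key
values are all made up for illustration; nothing here touches the real
Freebase API or dump format.

```python
from collections import defaultdict


def find_multi_key_entities(pairs):
    """Return entities that have more than one distinct key.

    `pairs` is an iterable of (entity_id, key) tuples. The result maps
    each flagged entity_id to a sorted list of its distinct keys.
    """
    keys_by_entity = defaultdict(set)
    for entity_id, key in pairs:
        # A set collapses exact repeats of the same key for an entity.
        keys_by_entity[entity_id].add(key)
    return {
        entity_id: sorted(keys)
        for entity_id, keys in keys_by_entity.items()
        if len(keys) > 1
    }


# Hypothetical sample data: ids and keys are placeholders, not real records.
sample_pairs = [
    ("/m/example1", "viaf-0001"),
    ("/m/example1", "viaf-0002"),  # second distinct key: gets flagged
    ("/m/example2", "viaf-0003"),
    ("/m/example2", "viaf-0003"),  # exact repeat: not flagged
]

suspects = find_multi_key_entities(sample_pairs)
for entity_id, keys in sorted(suspects.items()):
    print(entity_id, keys)
```

Anything this flags still needs a human look, since an extra key that
represents a pseudonym is legitimate rather than an error.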

One of the key inputs to judging the quality of assertions is their
provenance. Fortunately, this is recorded for all assertions in Freebase
and it's possible to trace a given fact back to the user, toolchain, or
process that added it to the database. Unfortunately, this information is
only available through the Freebase API, not the bulk data dump.
Hopefully, this will change before Google completely abandons Freebase.

If any Wikidata folk want to discuss VIAF keys in Freebase (or its author
data in general), feel free to get in touch.

Tom
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
