Hi Denny,
Thanks for publishing your Colab notebook!
I went through it and would like to share my first thoughts here. We can
then move further discussion somewhere else.
1. in general, how can we compare datasets that were last updated at
very different times? Wikidata is alive, Freebase is dead, and the latest
DBpedia dump is old;
2. given that all datasets contain Wikipedia links, perhaps we could use
them as a bridge for the comparison, instead of Wikidata mappings. I'm
assuming that Freebase and DBpedia entities with Wikidata mappings are
subsets of the whole datasets (but this should be verified);
3. we could use record linkage techniques to connect Wikidata entities
with Freebase and DBpedia ones, then assess the agreement in terms of
statements per entity. There has been some experimental work (different
use case and goal) in the soweego project:
https://soweego.readthedocs.io/en/latest/validator.html
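To make points 2 and 3 a bit more concrete, here is a minimal sketch of
what I have in mind. It is not what soweego does, and everything in it is
an assumption for illustration: the entity layout (a dict with
"wikipedia_links" and "statements"), the function names, the toy data,
and above all the idea that statements from both datasets have already
been normalized to a shared (property, value) vocabulary, which is the
hard part in practice. It groups statements by Wikipedia link, then
measures the Jaccard overlap for every link that both datasets share.

```python
from collections import defaultdict


def index_by_wikipedia_link(entities):
    """Map each Wikipedia URL to the set of statements of the entities linking to it."""
    index = defaultdict(set)
    for entity in entities:
        for link in entity["wikipedia_links"]:
            index[link] |= set(entity["statements"])
    return dict(index)


def statement_agreement(left, right):
    """Jaccard overlap of statements for every Wikipedia link found in both indexes."""
    agreement = {}
    for link in left.keys() & right.keys():
        union = left[link] | right[link]
        shared = left[link] & right[link]
        # Two entities with no statements at all trivially agree.
        agreement[link] = len(shared) / len(union) if union else 1.0
    return agreement


# Hypothetical toy records, NOT real dump contents.
LINK = "https://en.wikipedia.org/wiki/Douglas_Adams"
wikidata_sample = [
    {
        "wikipedia_links": [LINK],
        "statements": [("birth_year", "1952"), ("occupation", "writer")],
    }
]
dbpedia_sample = [
    {
        "wikipedia_links": [LINK],
        "statements": [("birth_year", "1952"), ("occupation", "novelist")],
    }
]

scores = statement_agreement(
    index_by_wikipedia_link(wikidata_sample),
    index_by_wikipedia_link(dbpedia_sample),
)
```

Entities that only appear in one dataset are simply skipped here; counting
them separately would tell us how much each dataset covers on its own.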
On 10/1/19 1:13 AM, Denny Vrandečić wrote:
> Marco, I totally agree with what you said - the project has stalled, and
> there is plenty of opportunity to harvest more data from Freebase and
> bring it to Wikidata, and this should be reignited.
Yeah, that would be great.
There is known work to do, but it's hard to sustain such a big project
without allocated resources:
https://phabricator.wikimedia.org/maniphest/query/CPiqkafGs5G./#R
BTW, there is also version 2 of the Wikidata primary sources tool that
needs love, although I'm now skeptical that it will be an effective way
to achieve the Freebase harvesting.
We should probably rethink the whole thing, and restart small with very
simple use cases, pretty much like the Harvest templates tool you mentioned:
https://tools.wmflabs.org/pltools/harvesttemplates/
Cheers,
Marco
P.S.: I *might* have found the freshest relevant DBpedia datasets:
https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects
I said *might* because it was really painful to find a download button
and to guess among multiple versions of the same dataset:
https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang=en.ttl.bz2
@Sebastian may know if it's the right one :-)
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata