Hi Denny,

Thanks for publishing your Colab notebook!
I went through it and would like to share my first thoughts here. We can then move further discussion somewhere else.

1. In general, how can we compare datasets with totally different timestamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump is old.
2. Given that all datasets contain Wikipedia links, perhaps we could use them as a bridge for the comparison, instead of Wikidata mappings. I'm assuming that Freebase and DBpedia entities with Wikidata mappings are subsets of the whole datasets (but this should be verified).
3. We could use record linkage techniques to connect Wikidata entities with Freebase and DBpedia ones, then assess the agreement in terms of statements per entity. There has been some experimental work (different use case and goal) in the soweego project:
https://soweego.readthedocs.io/en/latest/validator.html
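
To make points 2 and 3 a bit more concrete, here is a minimal sketch of the idea, not how soweego does it. It assumes each dataset has already been reduced to two hypothetical dictionaries: entity ID to Wikipedia title, and entity ID to a set of (property, value) statements normalized to a shared vocabulary. The entity IDs, the title, and the choice of Jaccard overlap as the agreement measure are all my own assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Statement = Tuple[str, str]  # (property, value), assumed already normalized


def index_by_wikipedia(links: Dict[str, str]) -> Dict[str, str]:
    """Invert 'entity -> Wikipedia title' into 'Wikipedia title -> entity'."""
    return {title: entity for entity, title in links.items()}


def agreement_per_entity(
    links_a: Dict[str, str],
    statements_a: Dict[str, Set[Statement]],
    links_b: Dict[str, str],
    statements_b: Dict[str, Set[Statement]],
) -> Dict[str, float]:
    """Jaccard overlap of statements for every Wikipedia title found in both datasets."""
    by_title_a = index_by_wikipedia(links_a)
    by_title_b = index_by_wikipedia(links_b)
    scores = {}
    # Only titles present in both datasets can act as a bridge.
    for title in by_title_a.keys() & by_title_b.keys():
        a = statements_a.get(by_title_a[title], set())
        b = statements_b.get(by_title_b[title], set())
        union = a | b
        scores[title] = len(a & b) / len(union) if union else 1.0
    return scores


# Made-up toy data: the IDs and the title are placeholders, not real entities.
wikidata_links = {"wd:E1": "Example_Article", "wd:E2": "Only_In_Wikidata"}
dbpedia_links = {"dbp:E1": "Example_Article"}
wikidata_statements = {"wd:E1": {("occupation", "writer"), ("country", "UK")}}
dbpedia_statements = {"dbp:E1": {("occupation", "writer")}}

scores = agreement_per_entity(
    wikidata_links, wikidata_statements, dbpedia_links, dbpedia_statements
)
print(scores)  # {'Example_Article': 0.5}
```

The hard part is of course hidden in the "already normalized" assumption: mapping properties and values across the three vocabularies is where the real work would be.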


On 10/1/19 1:13 AM, Denny Vrandečić wrote:
> Marco, I totally agree with what you said - the project has stalled, and there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited.
Yeah, that would be great.
There is known work to do, but it's hard to sustain such a big project without allocated resources:
https://phabricator.wikimedia.org/maniphest/query/CPiqkafGs5G./#R

BTW, there is also version 2 of the Wikidata primary sources tool that needs love, although I'm now skeptical that it will be an effective way to achieve the Freebase harvesting. We should probably rethink the whole thing, and restart small with very simple use cases, pretty much like the Harvest templates tool you mentioned:
https://tools.wmflabs.org/pltools/harvesttemplates/

Cheers,

Marco

P.S.: I *might* have found the freshest relevant DBpedia datasets:
https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects
I said *might* because it was really painful to find a download button and to guess among multiple versions of the same dataset:
https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/mappingbased-objects_lang=en.ttl.bz2
@Sebastian may know if it's the right one :-)
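
In case it helps anyone who wants a quick look at that dump, here is a small sketch that streams a locally downloaded copy and counts statements per subject. It assumes the file is serialized one triple per line (N-Triples style), which I have not verified for this particular release; the function name and the `limit` parameter are just for illustration.

```python
import bz2
from collections import Counter
from typing import Optional


def count_statements_per_subject(path: str, limit: Optional[int] = None) -> Counter:
    """Count triples per subject in a bz2-compressed, one-triple-per-line file."""
    counts: Counter = Counter()
    # Text mode decompresses on the fly, so the dump is never fully loaded in memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if limit is not None and i >= limit:
                break  # stop after reading `limit` lines (comments included)
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Assumption: the subject is the first whitespace-separated token.
            subject = line.split(None, 1)[0]
            counts[subject] += 1
    return counts


# Example usage on a local copy of the file linked above:
# counts = count_statements_per_subject("mappingbased-objects_lang=en.ttl.bz2", limit=100000)
# print(counts.most_common(5))
```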

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
