Out of interest, is there still a live Freebase SPARQL endpoint?

And is it kept up to date with which items have been matched to Wikidata?

Both of these would be useful, I think.

  --  James.


On 01/10/2015 20:25, Thomas Tanon wrote:
It's me.

https://www.wikidata.org/wiki/User:Tpt
https://twitter.com/Tpt93

Cheers,

Thomas

On 1 Oct 2015, at 21:10, Stéphane Corlosquet <scorlosq...@gmail.com> wrote:

Hi Denny,

This is great work! Who is Tpt?

Steph.

On Thu, Oct 1, 2015 at 2:09 PM, Denny Vrandečić <vrande...@google.com> wrote:
Hi all,

As you know, Tpt has been working as an intern at Google this summer. He finished his work a few weeks ago, and I am happy to announce today the publication of all the scripts and the resulting data he has been working on. Additionally, we are publishing a few novel visualizations of the data in Wikidata and Freebase. We are still working on the actual report summarizing the effort and providing numbers on its effectiveness and progress; this will take another few weeks.

First, thanks to Tpt for his amazing work! I had not expected to see such rich results. He has exceeded my expectations by far and produced much more transferable data than I expected. Additionally, he also worked directly on the primary sources tool and helped Marco Fossati to upload a second, sports-related dataset (when the Primary Sources tool is switched on, you can select that dataset by clicking on the gears icon next to the Freebase item link in the sidebar on Wikidata).

The scripts that were created and used can be found here:

https://github.com/google/freebase-wikidata-converter

All scripts are released under the Apache license v2.

The following data files are also released. All data is released under the CC0 license (to make this explicit, a comment stating the copyright and the license has been added to the start of each file; if any script dealing with the files hiccups on that line, simply remove the first line).

https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-mapped-missing.tsv.gz
The actual missing statements, including URLs for sources, are in this file. They were filtered against statements already existing in Wikidata, and they are mapped to Wikidata IDs. The file contains about 14.3M statements (214 MB gzipped, 831 MB unzipped). The statements were created using the mappings below in addition to the mappings already in Wikidata. Their quality is rather mixed.
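
If you just want to poke at the file before deciding what to do with it, here is a minimal Python sketch of how it could be streamed. It assumes the copyright/license comment mentioned above starts with '#' and that the columns are tab-separated; both are guesses on my part, so adjust to what you actually find in the file.

import gzip

# Stream the gzipped TSV without unpacking the whole 831 MB to disk.
# Assumption: the copyright/license comment at the top starts with '#';
# if it does not, just skip the first line unconditionally instead.
def read_statements(path):
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue  # the CC0 notice mentioned above
            yield line.rstrip("\n").split("\t")

if __name__ == "__main__":
    count = sum(1 for _ in read_statements("freebase-mapped-missing.tsv.gz"))
    print(count)  # roughly 14.3M if the whole file is read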

Additional datasets that we know meet a higher quality bar have been previously 
released and uploaded directly to Wikidata by Tpt, following community 
consultation.

https://tools.wmflabs.org/wikidata-primary-sources/data/additional-mapping.pairs.gz
This file contains additional mappings between Freebase MIDs and Wikidata QIDs which are not yet available in Wikidata. The mappings are based on statistical methods and single interwiki links. Unlike the first set of mappings we created and published previously (which required at least multiple interwiki links), these mappings are expected to be of lower quality: sufficient for a manual process, but probably not for an automatic upload. The file contains about 3.4M mappings (30 MB gzipped, 64 MB unzipped).
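
As a sketch of how the mappings could be loaded and used (assuming one MID/QID pair per line, MID first, whitespace-separated; I have not verified the exact layout):

import gzip

# Load the additional Freebase MID -> Wikidata QID mappings into a dict.
# Assumptions: one pair per line, MID first and QID second, whitespace-
# separated, with the CC0 notice as a leading '#' comment line.
def load_mapping(path):
    mapping = {}
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            parts = line.split()
            if len(parts) >= 2:
                mapping[parts[0]] = parts[1]  # MID -> QID
    return mapping

mapping = load_mapping("additional-mapping.pairs.gz")
print(len(mapping))               # should be around 3.4M
print(mapping.get("/m/012rkqx"))  # look up an arbitrary MID (illustrative only)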

https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-new-labels.tsv.gz
This file includes labels and aliases that seem to be currently missing from Wikidata items. The quality of these labels is undetermined. The file contains about 860k labels in about 160 languages, with 33 languages having more than 10k labels each (14 MB gzipped, 32 MB unzipped).
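
If you want a quick look at how the labels are distributed across languages, something like the following would do; the column layout (language code in the second column) is an assumption, so adapt it to the actual file:

import gzip
from collections import Counter

# Count labels per language in the new-labels dump.
# Assumption: tab-separated columns with the language code in the second
# column (item, language, label, ...); verify against the actual file.
per_language = Counter()
with gzip.open("freebase-new-labels.tsv.gz", mode="rt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            per_language[fields[1]] += 1

print(sum(per_language.values()))                          # ~860k labels in total
print(sum(1 for n in per_language.values() if n > 10000))  # ~33 languages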

https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-reviewed-missing.tsv.gz
This is an interesting file, as it includes a quality signal for the statements in Freebase. What you will find here are ordered pairs of Freebase MIDs and properties, each indicating that the given pair went through a review process and is therefore likely of higher quality on average. This covers only those pairs that are missing from Wikidata. The file includes about 1.4M pairs, and it can be used for importing part of the data directly (6 MB gzipped, 52 MB unzipped).
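
One way to use this file is as a whitelist: load the reviewed (MID, property) pairs into a set and treat matching statements as higher-confidence. The column order below is an assumption, as is the example pair at the end:

import gzip

# Build a whitelist of reviewed (MID, property) pairs.
# Assumption: tab-separated, MID in the first column, property in the second.
reviewed = set()
with gzip.open("freebase-reviewed-missing.tsv.gz", mode="rt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            reviewed.add((fields[0], fields[1]))

print(len(reviewed))  # about 1.4M pairs
# A statement can then be treated as higher-confidence if its
# (MID, property) pair is in the whitelist, e.g.:
print(("/m/012rkqx", "/people/person/date_of_birth") in reviewed)  # illustrative pair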

Now anyone can take the statements, analyse them, slice and dice them, upload them, use them for their own tools and games, etc. They also remain available through the primary sources tool, which has already led to several thousand new statements in the last few weeks.

Additionally, in the last few days of his internship, Tpt and I created a few visualizations of the current data in Wikidata and in Freebase.

First, the following is a visualization of the whole of Wikidata:

https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-color.png

The visualization needs a bit of explanation, I guess. The y-axis (up/down) represents time, and the x-axis (left/right) represents space, i.e. geolocation. The further down you go, the closer you are to the present; the further up, the further back you go into the past. Time is given on a rational scale, so the 20th century gets much more space than the 1st century. The x-axis represents longitude, with the prime meridian in the center of the image.
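
To make the scales a bit more concrete, here is one possible reading of them in Python. This is only my interpretation of the description, not necessarily the formula the published scripts use (the repository above has the authoritative code), and the time-scale constant is an arbitrary choice:

# One possible reading of the scales described above; this is a guess, not
# necessarily the formula used in the published scripts.
WIDTH, HEIGHT = 2000, 2000
REFERENCE_YEAR = 2015   # "the present"
TIME_SCALE = 100.0      # arbitrary constant controlling how fast the past compresses

def to_pixel(year, longitude):
    # x: longitude mapped linearly, prime meridian in the centre of the image.
    x = (longitude + 180.0) / 360.0 * WIDTH
    # y: a rational (reciprocal) time scale; the present sits at the bottom and
    # recent centuries get far more vertical space than ancient ones.
    years_ago = max(REFERENCE_YEAR - year, 0)
    y = HEIGHT / (1.0 + years_ago / TIME_SCALE)
    return int(x), int(y)

print(to_pixel(1969, 2.35))  # a recent event near Paris lands in the lower part
print(to_pixel(-450, 23.7))  # classical Athens lands near the top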

Every item is placed at its longitude (averaged, if it has several) and at the earliest point in time mentioned on the item. For items lacking either value, neighbouring items propagate their values to them (averaging, if necessary), as sketched below. This is repeated until the items are saturated.
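
For those curious what that propagation step might look like, here is a rough reconstruction from the description; "neighbouring" is taken to mean items connected by statements, and the real implementation in the repository may differ:

# Rough sketch of the value-propagation step: items missing a longitude (or
# date) inherit the average of their neighbours' known values, repeated until
# no more items can be filled in ("saturated").
# Assumption: 'neighbours' means items linked by statements.
def propagate(values, neighbours):
    # values: dict item -> float (longitude or year), only for items that have one
    # neighbours: dict item -> list of linked items
    values = dict(values)
    changed = True
    while changed:
        changed = False
        for item, linked in neighbours.items():
            if item in values:
                continue
            known = [values[n] for n in linked if n in values]
            if known:
                values[item] = sum(known) / len(known)
                changed = True
    return values

# Tiny example: B has no longitude, so it takes the average of A and C.
longitudes = {"A": 2.35, "C": 13.40}
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(propagate(longitudes, links))   # {'A': 2.35, 'C': 13.4, 'B': 7.875}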

To understand this a bit better, the following image offers a supporting grid: each horizontal line represents a century (going back to the first century), and each vertical line represents a meridian (with London in the middle of the graph).

https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-grid-color.png

The same visualizations have also been created for Freebase:

https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-color.png
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-grid-color.png

In order to compare the two graphs, we also overlaid them on each other. I will leave the interpretation to you, but you can easily see the strengths and weaknesses of both knowledge bases.

https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-red-freebase-green.png
https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-red-wikidata-green.png

The programs for creating the visualizations are all available in the GitHub repository mentioned above (plenty of RAM is recommended to run them).

Enjoy the visualizations, the data, and the scripts! Tpt and I are available to answer questions. I hope this will help with understanding and analysing some of the results of the work that we did this summer.

Cheers,
Denny

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
