It's me.

https://www.wikidata.org/wiki/User:Tpt
https://twitter.com/Tpt93

Cheers,

Thomas

> On 1 Oct 2015, at 21:10, Stéphane Corlosquet <scorlosq...@gmail.com> wrote:
> 
> Hi Denny,
> 
> This is great work! Who is Tpt?
> 
> Steph.
> 
> On Thu, Oct 1, 2015 at 2:09 PM, Denny Vrandečić <vrande...@google.com> wrote:
> Hi all,
> 
> as you know, Tpt has been working as an intern this summer at Google. He 
> finished his work a few weeks ago and I am happy to announce today the 
> publication of all scripts and the resulting data he has been working on. 
> Additionally, we publish a few novel visualizations of the data in Wikidata 
> and Freebase. We are still working on the actual report summarizing the 
> effort and providing numbers on its effectiveness and progress. This will 
> take another few weeks.
> 
> First, thanks to Tpt for his amazing work! I had not expected to see such
> rich results. He exceeded my expectations by far and produced much more
> transferable data than I expected. He also worked on the Primary Sources
> tool directly and helped Marco Fossati to upload a second,
> sports-related dataset (you can select that by clicking on the gears icon 
> next to the Freebase item link in the sidebar on Wikidata, when you switch on 
> the Primary Sources tool).
> 
> The scripts that were created and used can be found here:
> 
> https://github.com/google/freebase-wikidata-converter
> 
> All scripts are released under the Apache license v2.
> 
> The following data files are also released. All data is released under the
> CC0 license. To make this explicit, a comment stating the copyright and the
> license has been added to the start of each file. If any script dealing with
> the files hiccups on that line, simply remove the first line.
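
A minimal sketch of reading one of these gzipped files while skipping the leading license comment, so the first line does not have to be removed by hand. It assumes the comment line starts with "#", which the announcement does not specify; the function name iter_data_lines is hypothetical.

```python
import gzip


def iter_data_lines(path, comment_prefix="#"):
    """Yield the lines of a gzipped text file, skipping a leading comment line.

    Assumption: the license comment is the first line and starts with
    `comment_prefix`. Adjust the prefix if the files use another marker.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            # Only the very first line is treated as a possible license comment.
            if index == 0 and line.startswith(comment_prefix):
                continue
            yield line.rstrip("\n")
```

Calling iter_data_lines on a downloaded copy of, say, freebase-mapped-missing.tsv.gz would then yield only the data rows; splitting each row on tabs is left to the caller, since the column layout is not described here.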
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-mapped-missing.tsv.gz
> The actual missing statements, including URLs for sources, are in this file. 
> This was filtered against statements already existing in Wikidata, and the 
> statements are mapped to Wikidata IDs. This contains about 14.3M statements 
> (214MB gzipped, 831MB unzipped). These are created using the mappings below 
> in addition to the mappings already in Wikidata. The quality of these 
> statements is rather mixed.
> 
> Additional datasets that we know meet a higher quality bar have been 
> previously released and uploaded directly to Wikidata by Tpt, following 
> community consultation.
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/additional-mapping.pairs.gz
> Contains additional mappings between Freebase MIDs and Wikidata QIDs which
> are not available in Wikidata. These mappings are based on statistical
> methods and single interwiki links. Unlike the first set of mappings we
> created and published previously (which required multiple interwiki
> links), these mappings are expected to have a lower quality - sufficient for
> a manual process, but probably not sufficient for an automatic upload. This
> contains about 3.4M mappings (30MB gzipped, 64MB unzipped).
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-new-labels.tsv.gz
> This file includes labels and aliases for Wikidata items which seem to be 
> currently missing. The quality of these labels is undetermined. The file 
> contains about 860k labels in about 160 languages, with 33 languages having 
> more than 10k labels each (14MB gzipped, 32MB unzipped).
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-reviewed-missing.tsv.gz
> This is an interesting file as it includes a quality signal for the
> statements in Freebase. It contains ordered pairs of Freebase MIDs and
> properties, each indicating that the given pair went through a review
> process and likely has a higher quality on average. Only pairs that are
> missing from Wikidata are included. The file includes about 1.4M pairs and
> can be used for importing part of the data directly (6MB gzipped, 52MB
> unzipped).
> 
> Now anyone can take the statements, analyse them, slice and dice them, upload
> them, use them for their own tools and games, etc. They remain available
> through the Primary Sources tool as well, which has already led to several
> thousand new statements in the last few weeks.
> 
> Additionally, in the last few days of his internship, Tpt and I created a few
> visualizations of the current data in Wikidata and in Freebase.
> 
> First, the following is a visualization of the whole of Wikidata:
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-color.png
> 
> The visualization needs a bit of explanation, I guess. The y-axis (up/down)
> represents time, the x-axis (left/right) represents space / geolocation. The
> further down you go, the closer you are to the present; the further up, the
> further into the past. Time is given in a rational scale - the 20th century
> gets much more space than the 1st century. The x-axis represents longitude,
> with the prime meridian in the center of the image.
> 
> Every item is placed at its longitude (averaged, if several) and at the
> earliest point in time mentioned on the item. For items without either,
> neighbouring items propagate their values to them (averaging, if necessary).
> This is done repeatedly until the items are saturated.
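
The propagation step described above could be sketched roughly as follows. This is not the code from the repository, only an illustration under stated assumptions: "neighbouring" is taken to mean items linked to each other, "saturated" is taken to mean that a further round fills in no new item, and the names propagate, values and neighbours are hypothetical.

```python
def propagate(values, neighbours, max_rounds=100):
    """Fill in missing values from the average of already-known neighbours.

    values:     dict mapping item -> float, or None when the value is missing
    neighbours: dict mapping item -> iterable of linked items (assumed meaning)
    Returns a dict with every item that ended up with a value.
    """
    known = {item: v for item, v in values.items() if v is not None}
    for _ in range(max_rounds):
        updates = {}
        for item in values:
            if item in known:
                continue  # original and already propagated values are kept
            nearby = [known[n] for n in neighbours.get(item, ()) if n in known]
            if nearby:
                updates[item] = sum(nearby) / len(nearby)
        if not updates:
            break  # saturated: another round would not fill in anything new
        known.update(updates)
    return known
```

The same function would be run once for longitudes and once for dates. Items that are not connected to any item with a value stay absent from the result.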
> 
> In order to understand that a bit better, the following image offers a
> supporting grid: each line from left to right represents a century (up to the
> first century), and each line from top to bottom represents a meridian (with
> London in the middle of the graph).
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-grid-color.png
> 
> The same visualizations have also been created for Freebase:
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-color.png
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-grid-color.png
> 
> In order to compare the two graphs, we also overlaid them on each other. I
> will leave the interpretation to you, but you can easily see the strengths and
> weaknesses of both knowledge bases.
> 
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-red-freebase-green.png
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-red-wikidata-green.png
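
One way such an overlay can be built is to put one image into the red channel and the other into the green channel, which is what the file names suggest. The sketch below is an assumption about the approach, not the repository's code; it works on plain lists of grayscale intensities (0-255) and the name overlay_red_green is hypothetical.

```python
def overlay_red_green(red_layer, green_layer):
    """Combine two equally sized grayscale grids into one grid of RGB pixels.

    red_layer and green_layer are lists of rows of intensities (0-255).
    The first grid fills the red channel and the second the green channel,
    so areas covered by both show up as yellow.
    """
    if len(red_layer) != len(green_layer):
        raise ValueError("layers must have the same number of rows")
    combined = []
    for red_row, green_row in zip(red_layer, green_layer):
        if len(red_row) != len(green_row):
            raise ValueError("rows must have the same length")
        combined.append([(r, g, 0) for r, g in zip(red_row, green_row)])
    return combined
```

Swapping the two arguments gives the second variant, with the roles of red and green exchanged.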
> 
> The programs for creating the visualizations are all available in the GitHub
> repository mentioned above (plenty of RAM is recommended to run them).
> 
> Enjoy the visualizations, the data and the scripts! Tpt and I are available to
> answer questions. I hope this will help with understanding and analysing some 
> of the results of the work that we did this summer.
> 
> Cheers,
> Denny
> 
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> 
> 
> 
> 
> --
> Steph.
