It's me.
https://www.wikidata.org/wiki/User:Tpt
https://twitter.com/Tpt93
Cheers,
Thomas

> On 1 Oct 2015 at 21:10, Stéphane Corlosquet <scorlosq...@gmail.com> wrote:
>
> Hi Denny,
>
> This is great work! Who is Tpt?
>
> Steph.
>
> On Thu, Oct 1, 2015 at 2:09 PM, Denny Vrandečić <vrande...@google.com> wrote:
> Hi all,
>
> as you know, Tpt has been working as an intern this summer at Google. He
> finished his work a few weeks ago and I am happy to announce today the
> publication of all scripts and the resulting data he has been working on.
> Additionally, we publish a few novel visualizations of the data in Wikidata
> and Freebase. We are still working on the actual report summarizing the
> effort and providing numbers on its effectiveness and progress. This will
> take another few weeks.
>
> First, thanks to Tpt for his amazing work! I did not expect to see such
> rich results. He has exceeded my expectations by far, and produced much more
> transferable data than I expected. Additionally, he also worked on the
> primary sources tool directly and helped Marco Fossati to upload a second,
> sports-related dataset (you can select that by clicking on the gears icon
> next to the Freebase item link in the sidebar on Wikidata, when you switch on
> the Primary Sources tool).
>
> The scripts that were created and used can be found here:
>
> https://github.com/google/freebase-wikidata-converter
>
> All scripts are released under the Apache license v2.
>
> The following data files are also released. All data is released under the
> CC0 license (in order to make this explicit, a comment has been added to the
> start of each file, stating the copyright and the license. If any script
> dealing with the files hiccups due to that line, simply remove the first
> line).
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-mapped-missing.tsv.gz
> The actual missing statements, including URLs for sources, are in this file.
> This was filtered against statements already existing in Wikidata, and the
> statements are mapped to Wikidata IDs. This contains about 14.3M statements
> (214MB gzipped, 831MB unzipped). These were created using the mappings below
> in addition to the mappings already in Wikidata. The quality of these
> statements is rather mixed.
>
> Additional datasets that we know meet a higher quality bar have been
> previously released and uploaded directly to Wikidata by Tpt, following
> community consultation.
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/additional-mapping.pairs.gz
> Contains additional mappings between Freebase MIDs and Wikidata QIDs, which
> are not available in Wikidata. These are mappings based on statistical
> methods and single interwiki links. Unlike the first set of mappings we had
> created and published previously (which required multiple interwiki links at
> least), these mappings are expected to have a lower quality - sufficient for
> a manual process, but probably not sufficient for an automatic upload. This
> contains about 3.4M mappings (30MB gzipped, 64MB unzipped).
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-new-labels.tsv.gz
> This file includes labels and aliases for Wikidata items which seem to be
> currently missing. The quality of these labels is undetermined. The file
> contains about 860k labels in about 160 languages, with 33 languages having
> more than 10k labels each (14MB gzipped, 32MB unzipped).
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-reviewed-missing.tsv.gz
> This is an interesting file as it includes a quality signal for the
> statements in Freebase. What you will find here are ordered pairs of Freebase
> MIDs and properties, each indicating that the given pair went through a
> review process and is likely of higher quality on average. This is only for
> those pairs that are missing from Wikidata.
> The file includes about 1.4M
> pairs, and this can be used for importing part of the data directly (6MB
> gzipped, 52MB unzipped).
>
> Now anyone can take the statements, analyse them, slice and dice them, upload
> them, use them for their own tools and games, etc. They remain available
> through the primary sources tool as well, which has already led to several
> thousand new statements in the last few weeks.
>
> Additionally, Tpt and I created in the last few days of his internship a few
> visualizations of the current data in Wikidata and in Freebase.
>
> First, the following is a visualization of the whole of Wikidata:
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-color.png
>
> The visualization needs a bit of explanation, I guess. The y-axis (up/down)
> represents time, the x-axis (left/right) represents space / geolocation. The
> further down, the closer you are to the present, the further up the more you
> go in the past. Time is given in a rational scale - the 20th century gets
> much more space than the 1st century. The x-axis represents longitude, with
> the prime meridian in the center of the image.
>
> Every item is placed at its longitude (averaged, if several) and at the
> earliest point of time mentioned on the item. For items without either,
> neighbouring items propagate their value to them (averaging, if necessary).
> This is done repeatedly until the items are saturated.
>
> In order to understand that a bit better, the following image offers a
> supporting grid: each line from left to right represents a century (up to the
> first century), and each line from top to bottom represents a meridian (with
> London in the middle of the graph).
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-grid-color.png
>
> The same visualizations have also been created for Freebase:
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-color.png
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-grid-color.png
>
> In order to compare the two graphs, we also overlaid them over each other. I
> will leave the interpretation to you, but you can easily see the strengths and
> weaknesses of both knowledge bases.
>
> https://tools.wmflabs.org/wikidata-primary-sources/data/wikidata-red-freebase-green.png
> https://tools.wmflabs.org/wikidata-primary-sources/data/freebase-red-wikidata-green.png
>
> The programs for creating the visualizations are all available in the GitHub
> repository mentioned above (plenty of RAM is recommended to run them).
>
> Enjoy the visualizations, the data and the scripts! Tpt and I are available to
> answer questions. I hope this will help with understanding and analysing some
> of the results of the work that we did this summer.
>
> Cheers,
> Denny
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> --
> Steph.
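
The placement rule described in the quoted announcement (items with a known value keep it; items without one take the average of their neighbours; repeat until nothing changes) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the repository: the function name `propagate`, the dictionary-based link graph, and the round-by-round update order are all hypothetical, and a plain arithmetic mean is used, which ignores longitude wrap-around.

```python
from statistics import mean


def propagate(known, neighbours):
    """Fill in missing values by repeatedly averaging over neighbours.

    known      -- dict mapping an item to its value (e.g. a longitude or a year)
    neighbours -- dict mapping an item to the items it is linked to
                  (hypothetical representation of the link graph)

    Returns a new dict. Items that can never be reached from an item with a
    known value stay absent from the result.
    """
    values = dict(known)  # never mutate the caller's data
    while True:
        updates = {}
        for item, adjacent in neighbours.items():
            if item in values:
                continue  # assumption: a value, once set, stays fixed
            seen = [values[n] for n in adjacent if n in values]
            if seen:
                updates[item] = mean(seen)
        if not updates:
            # Saturated: no further item can receive a value.
            return values
        # Apply the whole round at once so that the result does not
        # depend on dictionary iteration order.
        values.update(updates)
```

For example, with `known = {"a": 10.0, "b": 30.0}` and a graph in which `c` links to `a`, `b` and `d`, the first round assigns `c` the mean of its two known neighbours, and the second round passes that value on to `d`.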