[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
Lydia_Pintscher added a comment.

In https://phabricator.wikimedia.org/T95316#1375098, @mkroetzsch wrote:
> I will do a complete review of the updated RDF mapping in the course of the next week. I will report back then if there is anything missing in the diff.

Thank you!

> Also, what is the expected outcome of this bug? A table like the one posted by Lucie? Or something with more detail? Some rows in the current table are probably only understood by people who already know both dumps ;-) Is this meant to be only for our internal information?

This is for us internally to make sure we're all on the same page and are good with what we have. When we're further along we can check what kind of public-facing documentation we need.

TASK DETAIL
  https://phabricator.wikimedia.org/T95316

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucie, Lydia_Pintscher
Cc: Lydia_Pintscher, mkroetzsch, daniel, Smalyshev, Aklapper, Lucie, Wikidata-bugs, aude
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
Lucie added a comment.

| | http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150223/ | https://dumps.wikimedia.org/wikidatawiki/entities/20150420/ |
| file ending/type | nt (N-Triples, a subset of RDF/Turtle) | ttl |
| triple | the whole link, e.g. http://wikidata.org/entity/Q1 | Turtle (prefixes) |
| dumps | multiple dumps | one, https://phabricator.wikimedia.org/T93488 |
| labels (aliasesdescriptions) | one label per language, as one triple with http://www.w3.org/2000/01/rdf-schema#label | three triples per language: rdfs:label, skos:prefLabel, schema:name |
| statement GUID | always uppercase, starting with S (Q1Sf5d5115d-489a-7654-9a0a-5eea5be80d07) | sometimes uppercase, sometimes lowercase, starting with - (q1-0479EB23-FC5B-4EEC-9529-CEE21D6C6FA9) |
| statement value | e:Q1Sguid e:P1036v 113 // truthy would be with suffix c | as triple, 'truthy': e:Q1 wdt:P1036 113; also as full statement |
| properties | P123s for the statement and P123v for the value | prefix s (statement) for statements and wdt (assert) for values (in full statements); otherwise prefix v |
| sitelinks | no badges, enwikilink a wikidata.org/ontology#Article | badges, enwikilink a schema:Article |
| metadata (like license and date) | no | yes |
| defining WD links as types of RDF classes | yes | no (planned as a separate OWL file; https://phabricator.wikimedia.org/T97522) |
| calendars | Gregorian | Julian and Gregorian |
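
The statement GUID row can be illustrated with a small sketch. It assumes, based only on the two example IDs in the table, that a statement ID consists of an entity ID, a one-character separator (S or -), and a GUID in 8-4-4-4-12 hex form. The function name normalize_statement_id is hypothetical and is not part of either dump's tooling.

```python
import re

# Assumed shape, inferred from the two examples in the table:
#   <entity ID><separator "S" or "-"><GUID in 8-4-4-4-12 hex form>
_STATEMENT_ID = re.compile(
    r"^([QqPp]\d+)[Ss-]"
    r"([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})$"
)


def normalize_statement_id(raw):
    """Split a statement ID into (entity ID, GUID) with uniform casing.

    Hypothetical helper: the entity ID is upper-cased and the GUID is
    lower-cased so that IDs from both dumps can be compared.
    """
    match = _STATEMENT_ID.match(raw)
    if match is None:
        raise ValueError(f"unrecognized statement ID: {raw!r}")
    entity_id, guid = match.groups()
    return entity_id.upper(), guid.lower()


if __name__ == "__main__":
    # The two example IDs from the table (they belong to different statements).
    print(normalize_statement_id("Q1Sf5d5115d-489a-7654-9a0a-5eea5be80d07"))
    print(normalize_statement_id("q1-0479EB23-FC5B-4EEC-9529-CEE21D6C6FA9"))
```

Whether case and separator are really the only differences between the two ID schemes is not confirmed in this thread, so the pattern should be checked against both dumps before relying on it.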
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
Lydia_Pintscher added a subscriber: Lydia_Pintscher.
Lydia_Pintscher added a comment.

Are there any differences we're missing? Are we ok with these differences?
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
mkroetzsch added a comment.

In https://phabricator.wikimedia.org/T95316#1373937, @Lydia_Pintscher wrote:
> Are there any differences we're missing? Are we ok with these differences?

I will do a complete review of the updated RDF mapping in the course of the next week. I will report back then if there is anything missing in the diff.

Also, what is the expected outcome of this bug? A table like the one posted by Lucie? Or something with more detail? Some rows in the current table are probably only understood by people who already know both dumps ;-) Is this meant to be only for our internal information?

Another relevant note here might be that the plan is to fully align the WDTK mappings with the updated RDF dumps, so that many of the above differences will go away (the split into several files would remain though). We just did not do this while we were still discussing the updated RDF mapping.

@Lucie:
- The second row difference is just a consequence of what was already stated in the first row (NTriples vs Turtle). Maybe this can be merged/deleted.
- It seems that the entry in row labels (aliasesdescriptions) only refers to labels. The properties skos:prefLabel and schema:name are not used for descriptions or aliases in either dump, AFAIK.
- It would make sense to distinguish differences in distribution/surface syntax (which format, how many files, which compression algorithm, ...) from real differences in the RDF model (= differences that matter for SPARQL users).
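
The point about surface syntax versus the RDF model can be sketched in a few lines: a full URI as written in NTriples and a prefixed name as written in Turtle denote the same term once the prefix is expanded. The rdfs namespace below is the standard one; the mapping of the prefix e to http://wikidata.org/entity/ is an assumption taken from the examples in the table, and the helpers expand and same_term are hypothetical names for illustration.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    # Assumption: "e:" abbreviates the entity namespace shown in the table.
    "e": "http://wikidata.org/entity/",
}


def expand(term, prefixes=PREFIXES):
    """Return the full URI for an <absolute-URI> or a prefix:local name."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError(f"cannot expand term: {term!r}")


def same_term(a, b, prefixes=PREFIXES):
    """True if both spellings denote the same URI after expansion."""
    return expand(a, prefixes) == expand(b, prefixes)


if __name__ == "__main__":
    # NTriples spelling vs. Turtle spelling of the same two terms.
    print(same_term("<http://wikidata.org/entity/Q1>", "e:Q1"))
    print(same_term("<http://www.w3.org/2000/01/rdf-schema#label>", "rdfs:label"))
```

A difference that disappears under this kind of expansion is purely syntactic and does not affect SPARQL queries, whereas a difference such as using one label property versus three changes which triples a query can match.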
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
daniel added a subscriber: daniel.
daniel added a comment.

First results are collected in a spreadsheet here: https://docs.google.com/a/wikimedia.de/spreadsheets/d/1cI7EYMiyUIqqsvMxPH5Zryt8dVIxJb0bYOOtBY-cSno
[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps
Smalyshev added a subscriber: Smalyshev.
Smalyshev added a comment.

The docs for the new RDF dump format are here: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format