[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-06-18 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

In https://phabricator.wikimedia.org/T95316#1375098, @mkroetzsch wrote:

 I will do a complete review of the updated RDF mapping in the course of the 
 next week. I will report back then if there is anything missing in the diff.


Thank you!

 Also, what is the expected outcome of this bug? A table like the one posted 
 by Lucie? Or something with more detail? Some rows in the current table are 
 probably only understood by people who already know both dumps ;-) Is this 
 meant to be only for our internal information?


This is for us internally to make sure we're all on the same page and are good 
with what we have. When we're further along we can check what kind of 
public-facing documentation we need.


TASK DETAIL
  https://phabricator.wikimedia.org/T95316

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucie, Lydia_Pintscher
Cc: Lydia_Pintscher, mkroetzsch, daniel, Smalyshev, Aklapper, Lucie, 
Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-06-17 Thread Lucie
Lucie added a comment.

| | http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150223/ | https://dumps.wikimedia.org/wikidatawiki/entities/20150420/ |
| file ending/type | nt (subset of RDF/ttl) | ttl |
| triple | the whole link http://wikidata.org/entity/Q1 | turtle (prefixes) |
| dumps | multiple dumps | one, https://phabricator.wikimedia.org/T93488 |
| labels (aliases, descriptions) | one label per language, as a triple with http://www.w3.org/2000/01/rdf-schema#label | three triples per language: rdfs:label, skos:prefLabel, schema:name |
| statement GUID | always uppercase, starting with S (Q1Sf5d5115d-489a-7654-9a0a-5eea5be80d07) | sometimes upper, sometimes lowercase, starting with - (q1-0479EB23-FC5B-4EEC-9529-CEE21D6C6FA9) |
| statement value | e:Q1Sguid e:P1036v 113 // truthy would be with suffix c | as triple, 'truthy': e:Q1 wdt:P1036 113; also as full statement |
| properties | with P123s for statement and P123v for value | prefix s (statement) for statements and wdt (assert) for values (in full statements), otherwise prefix v |
| sitelinks | no badges, enwikilink a wikidata.org/ontology#Article | badges, enwikilink a schema:Article |
| metadata (like license and date) | no | yes |
| defining WD links as types of RDF classes | yes | no (planned as separate OWL file; https://phabricator.wikimedia.org/T97522) |
| calendars | Gregorian | Julian and Gregorian |
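
To compare statements across the two dumps, the identifiers from the statement GUID row would first have to be brought into one form. The following is only a minimal Python sketch based on the two example identifiers in the table: it assumes the entity ID is followed by either `S` or `-` and then a standard 8-4-4-4-12 hex GUID. `normalize_statement_id` is a hypothetical helper, not part of either export tool, and the target form (uppercase entity ID, `-`, lowercase GUID) is an arbitrary choice.

```python
import re

# Assumed shape, taken from the two examples in the table:
#   <entity id><separator "S" or "-"><8-4-4-4-12 hex GUID>
_STATEMENT_ID = re.compile(
    r"([QqPp]\d+)[S-]"
    r"([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
)


def normalize_statement_id(local_name: str) -> str:
    """Return '<ENTITY>-<guid in lowercase>' for a statement ID from either dump."""
    match = _STATEMENT_ID.fullmatch(local_name)
    if match is None:
        raise ValueError(f"unrecognized statement ID: {local_name!r}")
    entity_id, guid = match.groups()
    return f"{entity_id.upper()}-{guid.lower()}"


old_style = normalize_statement_id("Q1Sf5d5115d-489a-7654-9a0a-5eea5be80d07")
new_style = normalize_statement_id("q1-0479EB23-FC5B-4EEC-9529-CEE21D6C6FA9")
```

With both identifiers in the same form, statement nodes from the two dumps can be matched by plain string comparison.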



To: Lucie
Cc: mkroetzsch, daniel, Smalyshev, Aklapper, Lucie, Wikidata-bugs, aude





[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-06-17 Thread Lydia_Pintscher
Lydia_Pintscher added a subscriber: Lydia_Pintscher.
Lydia_Pintscher added a comment.

Are there any differences we're missing? Are we ok with these differences?



To: Lucie, Lydia_Pintscher
Cc: Lydia_Pintscher, mkroetzsch, daniel, Smalyshev, Aklapper, Lucie, 
Wikidata-bugs, aude





[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-06-17 Thread mkroetzsch
mkroetzsch added a comment.

In https://phabricator.wikimedia.org/T95316#1373937, @Lydia_Pintscher wrote:

 Are there any differences we're missing? Are we ok with these differences?


I will do a complete review of the updated RDF mapping in the course of the next 
week. I will report back then if there is anything missing in the diff.

Also, what is the expected outcome of this bug? A table like the one posted by 
Lucie? Or something with more detail? Some rows in the current table are 
probably only understood by people who already know both dumps ;-) Is this 
meant to be only for our internal information?

Another relevant note here might be that the plan is to fully align WDTK 
mappings with the updated RDF dumps, so that many of the differences above will 
go away (the split into several files would remain though). We just did not do 
this while we were still discussing the updated RDF mapping.

@Lucie:

- The difference in the second row is just a consequence of what was already 
stated in the first row (NTriples vs Turtle). Maybe this can be merged/deleted.
- It seems that the entry in row labels (aliases, descriptions) only refers 
to labels. The properties skos:prefLabel and schema:name are not used for 
descriptions or aliases in either dump, AFAIK.
- It would make sense to distinguish differences in distribution/surface syntax 
(which format, how many files, which compression algorithm, ...) from real 
differences in the RDF model (=differences that matter for SPARQL users).
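
The surface-syntax point can be illustrated with a small Python sketch. It assumes the prefix `e:` stands for http://wikidata.org/entity/ (the full IRI shown in the table) and uses a made-up label triple; `expand` and `canonical` are hypothetical helpers, not part of WDTK or the dump scripts. Once prefixed names are expanded, a Turtle-style triple and an NTriples-style triple compare as equal, so this row would not be a difference in the RDF model.

```python
# Assumed prefix map; only the two namespaces quoted in the table are used.
PREFIXES = {
    "e": "http://wikidata.org/entity/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as e:Q1 to <full IRI>; leave other terms as-is."""
    if term.startswith("<") or term.startswith('"'):
        return term  # already a full IRI or a literal
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    return term


def canonical(triple):
    """Expand every term of a (subject, predicate, object) tuple."""
    return tuple(expand(term) for term in triple)


# Illustrative triples only; the label value is made up for the example.
nt_triple = (
    "<http://wikidata.org/entity/Q1>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    '"universe"@en',
)
ttl_triple = ("e:Q1", "rdfs:label", '"universe"@en')

same_in_model = canonical(nt_triple) == canonical(ttl_triple)
```

Differences that survive this kind of normalization (for example the extra skos:prefLabel and schema:name triples) are the ones that would matter to SPARQL users.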



To: Lucie, mkroetzsch
Cc: Lydia_Pintscher, mkroetzsch, daniel, Smalyshev, Aklapper, Lucie, 
Wikidata-bugs, aude





[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-05-08 Thread daniel
daniel added a subscriber: daniel.
daniel added a comment.

First results are collected in a spreadsheet here: 
https://docs.google.com/a/wikimedia.de/spreadsheets/d/1cI7EYMiyUIqqsvMxPH5Zryt8dVIxJb0bYOOtBY-cSno



To: Lucie, daniel
Cc: daniel, Smalyshev, Aklapper, Lucie, Wikidata-bugs, aude





[Wikidata-bugs] [Maniphest] [Commented On] T95316: Comparison of the existing Wikidata RDF dumps

2015-04-07 Thread Smalyshev
Smalyshev added a subscriber: Smalyshev.
Smalyshev added a comment.

The docs for the new RDF dump format are here: 
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format



To: Lucie, Smalyshev
Cc: Smalyshev, Aklapper, Lucie, Wikidata-bugs, aude


