[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata
Christopher added a comment.

Do you foresee any changes to the context/vocabulary/ontology in the future (e.g. implementing processing features of JSON-LD 1.1)? How will context changes be versioned and published? Could the ontology <http://wikiba.se/ontology-1.0.owl#> also be made dereferenceable as a JSON-LD context? Then you could use @vocab to provide a default for the Wikibase properties and types (e.g. "@vocab": "http://wikiba.se/ontology-1.0.jsonld").

TASK DETAIL https://phabricator.wikimedia.org/T207168 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: cscott, Christopher Cc: Addshore, WMDE-leszek, Pablo-WMDE, dbarratt, abian, _jensen, Christopher, Salgo60, daniel, Lydia_Pintscher, Denny, Abraham, AnjaJentzsch, Aklapper, intracer, Liuxinyu970226, cscott, PokestarFan, gerritbot, Prtksxna, Lucas_Werkmeister_WMDE, Tpt, thiemowmde, Multichill, Eroux108, Realworldobject, Smalyshev, Lea_Lacroix_WMDE, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, rosalieper, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
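A dereferenceable context along the lines suggested here might look like the following sketch (the context URL, the @vocab target, and the term names are assumptions drawn from this comment, not a published Wikibase context):

```json
{
  "@context": {
    "@version": 1.1,
    "@vocab": "http://wikiba.se/ontology-1.0.jsonld",
    "wikibase": "http://wikiba.se/ontology-beta#",
    "statements": { "@id": "wikibase:statements" },
    "sitelinks": { "@id": "wikibase:sitelinks" }
  }
}
```

Pinning "@version": 1.1 would also let consumers detect an incompatible context revision (JSON-LD 1.0 processors reject it) rather than silently misinterpreting terms, which speaks to the versioning question above.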
[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata
Christopher added a comment. thanks, I look forward to this being deployed. json-ld will be very useful for wikidata, particularly framing. You might want to consider providing the context as a remote link to reduce the payloads (and "noise" in the data). Here is that test entity, framed on the playground. Notice how it merges the statements and references. (sorry for the long link ...) https://json-ld.org/playground-dev/#startTab=tab-framed=https%3A%2F%2Ftest.wikidata.org%2Fwiki%2FSpecial%3AEntityData%2FQ64.jsonld=%7B%22%40context%22%3A%7B%22wdata%22%3A%22https%3A%2F%2Ftest.wikidata.org%2Fwiki%2FSpecial%3AEntityData%2F%22%2C%22schema%22%3A%22http%3A%2F%2Fschema.org%2F%22%2C%22about%22%3A%7B%22%40id%22%3A%22schema%3Aabout%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22wd%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fentity%2F%22%2C%22cc%22%3A%22http%3A%2F%2Fcreativecommons.org%2Fns%23%22%2C%22license%22%3A%7B%22%40id%22%3A%22cc%3Alicense%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22softwareVersion%22%3A%7B%22%40id%22%3A%22schema%3AsoftwareVersion%22%7D%2C%22version%22%3A%7B%22%40id%22%3A%22schema%3Aversion%22%7D%2C%22xsd%22%3A%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%22%2C%22dateModified%22%3A%7B%22%40id%22%3A%22schema%3AdateModified%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22wikibase%22%3A%22http%3A%2F%2Fwikiba.se%2Fontology-beta%23%22%2C%22statements%22%3A%7B%22%40id%22%3A%22wikibase%3Astatements%22%7D%2C%22identifiers%22%3A%7B%22%40id%22%3A%22wikibase%3Aidentifiers%22%7D%2C%22sitelinks%22%3A%7B%22%40id%22%3A%22wikibase%3Asitelinks%22%7D%2C%22rdfs%22%3A%22http%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%22%2C%22label%22%3A%7B%22%40id%22%3A%22rdfs%3Alabel%22%7D%2C%22skos%22%3A%22http%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%22%2C%22prefLabel%22%3A%7B%22%40id%22%3A%22skos%3AprefLabel%22%7D%2C%22name%22%3A%7B%22%40id%22%3A%22schema%3Aname%22%7D%2C%22wdt%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fdirect%2F%22%2C%22P63%22%3A%7B%22%40id%22%3A%22wdt%3AP63%2
2%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22P17%22%3A%7B%22%40id%22%3A%22wdt%3AP17%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22p%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2F%22%2C%22wds%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fentity%2Fstatement%2F%22%2C%22p%3AP63%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22rank%22%3A%7B%22%40id%22%3A%22wikibase%3Arank%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22ps%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fstatement%2F%22%2C%22ps%3AP63%22%3A%7B%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22psv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fstatement%2Fvalue%2F%22%2C%22wdv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fvalue%2F%22%2C%22psv%3AP63%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22quantityAmount%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityAmount%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityUpperBound%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityUpperBound%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityLowerBound%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityLowerBound%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityUnit%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityUnit%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22pq%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fqualifier%2F%22%2C%22P66%22%3A%7B%22%40id%22%3A%22pq%3AP66%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22pqv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fqualifier%2Fvalue%2F%22%2C%22pqv%3AP66%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22timeValue%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeValue%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22timePrecision%22%3A%7B%22%40id%22%3A%22wikibase%3AtimePrecision%22%7D%2C%22timeTimezone%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeTimezone%22%7D%2C%22timeCalendarModel%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeCalendarModel%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22prov%22%3A%22http%3A%2F%2Fwww.w3.org%2Fns%2Fprov%23%22%2C%22wasDerivedFrom%22%3A%7B%22%40id%22%3A%22prov%3AwasDerivedFrom%22%2C%22%40type%22%3A
%22%40id%22%7D%2C%22wdref%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Freference%2F%22%2C%22pr%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Freference%2F%22%2C%22P20%22%3A%7B%22%40id%22%3A%22pr%3AP20%22%7D%2C%22p%3AP17%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22ps%3AP17%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22propertyType%22%3A%7B%22%40id%22%3A%22wikibase%3ApropertyType%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22directClaim%22%3A%7B%22%40id%22%3A%22wikibase%3AdirectClaim%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22claim%22%3A%7B%22%40id%22%3A%22wikibase%3Aclaim%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22statementProperty%22%3A%7B%22%40id%22%3A%22wikibase%3AstatementProperty%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22statementValue%22%3A%7B%22%40id%22%3A%22wikibase%3AstatementValue%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22qualifier%22%3A%7B%22%40id%22%3A%22wikibase%3Aqualifier%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22qualifierValue%22%3A%7B%22%40id%22%3A%22wikibase%3Aqualifi
[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata
Christopher added a comment.

According to the mailing list (Wikidata Digest, Vol 83, Issue 18), this is now enabled on beta. Yet when one requests the link https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/Q64.jsonld, it does not work.

TASK DETAIL https://phabricator.wikimedia.org/T207168
Re: [Wikidata] How to find the Dbpedia data for a Wikidata
Hi Scott,

One way to do that would be to get the language code label list from WDQS with this: http://tinyurl.com/y9p7q9l2

SELECT ?label WHERE {
  ?s wdt:P424 ?code ;
     rdfs:label ?label
  FILTER (lang(?label) = "en") .
}

and then stream the list to LDF client [1] requests: https://tinyurl.com/ycc3dyce

ldf-client https://query.wikidata.org/bigdata/ldf http://fragments.dbpedia.org/2015-10/en "SELECT * WHERE { ?s rdfs:label " + language + "@en . ?s owl:sameAs ?link }"

The results would be in JSON from the client. It should give a relatively complete list of DBpedia resources corresponding to Wikidata language code entities.

Also, a simple way to get a DBpedia resource with TPF is with the entity label, which is one of the properties that is the same in both datasets. So,

SELECT * WHERE { ?s rdfs:label "German"@en . }

will return the matching DBpedia and Wikidata resources for that label. This could also perhaps be done with a federated query in WDQS (untested).

Christopher Johnson

[1] https://github.com/LinkedDataFragments/Client.js

> Message: 3 > Date: Sun, 29 Apr 2018 17:48:10 -0700 > From: Scott MacLeod <worlduniversityandsch...@gmail.com> > To: Discussion list for the Wikidata project > <wikidata@lists.wikimedia.org> > Subject: Re: [Wikidata] How to find the Dbpedia data for a Wikidata > item? > Message-ID: > <CADy6Cs8pVNqEQu909Y64DXFOq7SBJs4M3stvzPD7GQ3AMZNBkw@mail. > gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi Paris Writers' News/PWN, Markus, and Wikidatans, > > Based on your example (http://tinyurl.com/yahwql2n), Markus, I'm seeking > to > learn how to do a similar query for all languages. > > In Wikidata I found a Q item # for "language" - Q34770 ( > https://www.wikidata.org/wiki/Q34770) - and plugged this into your query, > replaced the word "countries" with "languages," etc. but didn't get a > result, where your query yields 209 countries, Markus. > > In a parallel way, how would one compute them from the names of the > articles in Wikipedia?
> > Thanks, > Scott > > > > > > > > On Fri, Apr 27, 2018 at 2:31 PM, Markus Kroetzsch < > markus.kroetz...@tu-dresden.de> wrote: > > > Hi, > > > > (English) DBpedia URIs are basically just (English) Wikipedia URIs with > > the first part exchanged. So one can compute them from the names of the > > articles. Example: a query for DBpedia URIs for all countries: > > > > http://tinyurl.com/yahwql2n > > > > """ > > SELECT ?dbpediaId > > WHERE > > { > > ?item wdt:P31 wd:Q6256 . # for the example: get IDs for all countries > > ?sitelink schema:about ?item ; > > schema:isPartOf <https://en.wikipedia.org/> . > > > > BIND(URI(CONCAT("http://dbpedia.org/resource/",SUBSTR( > STR(?sitelink),31))) > > as ?dbpediaId) > > } > > """ > > > > Of course, depending on your use case, you can do the same offline > > (without requiring SPARQL to rewrite the id strings for you). > > > > In theory, one could use federation to pull in data from the DBpedia > > endpoint, but in practice I could not find an interesting query that > > completes within the timeout (but I did not try for very long to debug > > this). > > > > Best regards, > > > > Markus > > > > > > > > > > On 23/04/18 06:41, PWN wrote: > > > >> If one knows the Q code (or URI) for an entity on Wikidata, how can one > >> find the Dbpedia Id and the information linked to it? > >> Thank you. 
> >> > >> Sent from my iPad > >> ___ > >> Wikidata mailing list > >> Wikidata@lists.wikimedia.org > >> https://lists.wikimedia.org/mailman/listinfo/wikidata > >> > >> > > > > ___ > > Wikidata mailing list > > Wikidata@lists.wikimedia.org > > https://lists.wikimedia.org/mailman/listinfo/wikidata > > > > > > > -- > > -- > - Scott MacLeod - Founder & President > - World University and School > - http://worlduniversityandschool.org > > - 415 480 4577 > - http://scottmacleod.com > > > - CC World University and School - like CC Wikipedia with best STEM-centric > CC OpenCourseWare - incorporated as a nonprofit university and school in > California, and is a U.S. 501 (c) (3) tax-exempt educational organization.
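The federated-query idea floated in the reply above (untested in the original message) might be sketched like this on WDQS, which predefines the wdt: and rdfs: prefixes; the DBpedia endpoint URL and the exact-label-match assumption are mine:

```sparql
# Match Wikidata language items to DBpedia resources by shared English label.
SELECT ?s ?label ?dbpedia WHERE {
  ?s wdt:P424 ?code ;
     rdfs:label ?label .
  FILTER (lang(?label) = "en")
  SERVICE <https://dbpedia.org/sparql> {
    ?dbpedia rdfs:label ?label .
  }
}
LIMIT 10
```

In practice such federation may hit the WDQS timeout, as Markus notes in the quoted message.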
Re: [Wikidata] How to split a label by whitespace in WDQS ?
Hi Thad,

"Assignment" can be done with CONSTRUCT, so something like this would work to split a name into two parts:

PREFIX ex: <http://example.org#>
CONSTRUCT {
  ?human ex:hasFirstName ?first .
  ?human ex:hasSecondName ?second
}
WHERE {
  ?human wdt:P31 wd:Q5 ;
         rdfs:label ?label .
  BIND (STRBEFORE(?label, " ") AS ?first)
  BIND (STRAFTER(?label, " ") AS ?second)
  FILTER (lang(?label) = "en")
}

Christopher Johnson
Scientific Associate
Universitätsbibliothek Leipzig

On 19 September 2017 at 14:00, <wikidata-requ...@lists.wikimedia.org> wrote:
> Send Wikidata mailing list submissions to > wikidata@lists.wikimedia.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.wikimedia.org/mailman/listinfo/wikidata > or, via email, send a message with subject or body 'help' to > wikidata-requ...@lists.wikimedia.org > > You can reach the person managing the list at > wikidata-ow...@lists.wikimedia.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Wikidata digest..." > > > Today's Topics: > >1. Weekly Summary #278 (Léa Lacroix) >2. How to split a label by whitespace in WDQS ? (Thad Guidry) >3. Re: How to split a label by whitespace in WDQS ? (Marco Neumann) >4. Re: How to split a label by whitespace in WDQS ? > (Nicolas VIGNERON) >5. Re: How to split a label by whitespace in WDQS ? > (Lucas Werkmeister) >6. Re: How to split a label by whitespace in WDQS ? (Thad Guidry) >7. Categories in RDF/WDQS (Stas Malyshev) > > > -- > > Message: 1 > Date: Mon, 18 Sep 2017 17:36:38 +0200 > From: Léa Lacroix <lea.lacr...@wikimedia.de> > To: "Discussion list for the Wikidata project."
> <wikidata@lists.wikimedia.org> > Subject: [Wikidata] Weekly Summary #278 > Message-ID: > 1798em4...@mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > *Here's your quick overview of what has been happening around Wikidata over > the last week.*Events > <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events>/ > Press/Blogs > <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage> > >- Upcoming: Wikidata Wahldaten Workshop 2017 ><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_ > Wahldaten_Workshop_2017> >– 30 September 2017 in Vienna, Austria >- Upcoming: Wikimedia Research Showcase ><https://meta.wikimedia.org/wiki/Wikimedia_Research/ > Showcase#September_2017> >- Past: Wikidata workshop in Zurich ><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_Zurich> (the >slides of the speakers are linked on the page) >- Past: GLAMhack Wikidata workshop in Lausanne (see the slides of the > Query >Service introduction ><https://docs.google.com/presentation/d/1hwUBbtP0TppAKrEpjtSjdOXePZ_ > 7OIRNDWsAHzVk0NA/edit#slide=id.g1f4d0124c0_0_0> >) >- Past: Wikidata workshop in Kolkata ><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_ > workshop_Kolkata_2017>, >India >- Bridging real and fictional worlds ><https://medium.com/wiki-playtime/bridging-real-and- > fictional-worlds-1af32ee65a26> >in Wikidata, by Martin Poulter >- Weekend at the Museum (of Brittany) ><https://www.lehir.net/weekend-at-the-museum-of-brittany/>, by Envel Le >Hir <https://www.wikidata.org/wiki/User:Envlh> >- Wiki Loves Monuments und Wikidata ><http://archivalia.hypotheses.org/67371>, by SW >- The French Connection at the Wikimania 2017 Hackathon ><https://www.lehir.net/the-french-connection-at-the- > wikimania-2017-hackathon/>, >by Envel Le Hir <https://www.wikidata.org/wiki/User:Envlh> > > Other Noteworthy Stuff > >- Wikidata ontology explorer ><https://lucaswerkmeister.github.io/wikidata-ontology-explorer/>: >creates a tree of a class or property, 
shows common properties and >statements >- Join the mysterious group of Wikidata:Flashmob ><https://www.wikidata.org/wiki/Wikidata:Flashmob> who improve labels, > or >summon them on an item >- A breaking change to the *wbcheckconstraints* API output format was >announced ><https://www.wikidata.org/wiki/Wikidata:Project_chat#BREAKING_CHANGE:_ > wbcheckconstraints_API_output_format> >- Q4000 <https://www.wikidata.org/wiki/Q4000> was created >- Improvements coming soon to Recent Changes ><h
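Outside SPARQL, the STRBEFORE/STRAFTER split used in the reply above corresponds to partitioning a label on its first space; a quick Python sketch (the sample label is invented):

```python
def split_label(label: str) -> tuple[str, str]:
    """Mimic SPARQL's STRBEFORE/STRAFTER on the first space."""
    first, sep, second = label.partition(" ")
    # Caveat: for a label with no space, STRBEFORE returns "",
    # while partition keeps the whole string in `first`.
    return first, second

print(split_label("Ada Lovelace"))  # → ('Ada', 'Lovelace')
```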
Re: [Wikidata] Wikidata Digest, Vol 70, Issue 11
Hi Amir,

The idea that I think you are trying to render reminds me of a query I wrote some time ago that uses the number of sitelinks (which basically equates to the number of different-language Wikipedia articles for a given Wikidata concept) to make a ranked list (providing a rudimentary metric for linguistic "coverage") for a type. For example, this one ranks instances of Q571 (book):

PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) AS ?linkcount)
WHERE {
  ?s wdt:P31 wd:Q571 .
  ?sitelink schema:about ?s .
  ?s wdt:P50 ?author
  OPTIONAL { ?s rdfs:label ?desc FILTER (lang(?desc) = "en") }
  OPTIONAL { ?author rdfs:label ?authorlabel FILTER (lang(?authorlabel) = "en") }
}
GROUP BY ?s ?desc ?authorlabel
ORDER BY DESC(?linkcount)

Hope it helps,
Christopher

On 11 September 2017 at 14:00, <wikidata-requ...@lists.wikimedia.org> wrote:
> Today's Topics: > >1. missing/existing Wikipedia articles by number of speakers > (Amir E. Aharoni) >2. Re: [Wikimediaindia-l] New portal on Wikidata: > Wikidata:WikiProject India (Abhijeet Safai) >3. Re: missing/existing Wikipedia articles by number of speakers > (Reem Al-Kashif) >4. Re: missing/existing Wikipedia articles by number of speakers > (Gerard Meijssen) > > > -- > > Message: 1 > Date: Sun, 10 Sep 2017 15:43:40 +0300 > From: "Amir E.
Aharoni" <amir.ahar...@mail.huji.ac.il> > To: "Discussion list for the Wikidata project." > <wikidata@lists.wikimedia.org> > Subject: [Wikidata] missing/existing Wikipedia articles by number of > speakers > Message-ID: > <CACtNa8sEhyz66jm9XOzM5SAYzb3Nd2tDiqQcnxn5SxmRx91r_g@mail. > gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hi, > > Is there an existing tool that shows whether a Wikipedia article exists or > doesn't exist in a list of languages sorted by the number of speakers? > > For example, I'd give this tool an article name, and it would show me a > list similar to the one at the English Wikipedia article [[List of > languages by total number of speakers]], and indicating whether the article > exists or not in each language. > > If there is no such tool, I guess I could write something in SPARQL, but > I'd have to learn SPARQL first, so I'm trying to ask here :) > > Thanks! > > -- > Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי > http://aharoni.wordpress.com > “We're living in pieces, > I want to live in peace.” – T. Moore > -- next part -- > An HTML attachment was scrubbed... > URL: <https://lists.wikimedia.org/pipermail/wikidata/ > attachments/20170910/2c37931c/attachment-0001.html> > > -- > > Message: 2 > Date: Sat, 9 Sep 2017 10:32:26 +0530 > From: Abhijeet Safai <abhijeet.sa...@gmail.com> > To: Wikimedia India Community list > <wikimediaindi...@lists.wikimedia.org> > Cc: "Discussion list for the Wikidata project." > <wikidata@lists.wikimedia.org> > Subject: Re: [Wikidata] [Wikimediaindia-l] New portal on Wikidata: > Wikidata:WikiProject India > Message-ID: > <CAAwPGk1+U41F0rkog7pNUc1X=X6HSq6dkQuWw-HkQ7ym_370-w@mail. 
> gmail.com> > Content-Type: text/plain; charset="utf-8" > > "This WikiProject is to coordinate the efforts to create, enhance and > populate the coverage of topics related to India including her history, > geography, culture, society, people, infrastructure, education, > demographics and anything related between India and other fields such as > science, technology, arts, entertainment etc." > > Excellent! I am extremely happy to see it. I do not know how much I will be > able to help, but I will try to help as per my time and abilities. > > -- > Dr. Abhijeet Safai > > On Sat, Sep 9, 2017 at 3:46 AM, ViswaPrabha (വിശ്വപ്രഭ) <
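If the precomputed sitelink count that WDQS exposes as wikibase:sitelinks is available (an assumption; check the current data model), the GROUP BY in the ranking query above could be avoided; an untested sketch, using the WDQS default prefixes:

```sparql
# Rank books by their precomputed sitelink count instead of counting.
SELECT ?s ?desc ?linkcount WHERE {
  ?s wdt:P31 wd:Q571 ;
     wikibase:sitelinks ?linkcount .
  OPTIONAL { ?s rdfs:label ?desc FILTER (lang(?desc) = "en") }
}
ORDER BY DESC(?linkcount)
LIMIT 100
```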
[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher added a comment.

I can add here that in fcrepo4, with PR #1187, they have changed to not use RFC 5785 for representing skolemized bnodes. Instead, a new fragment-URI convention has been implemented: internally minted UUIDs are appended to the resource subject as a fragment (aka hash URI identifier) rather than creating a new resource node. This convention actually makes more sense than RFC 5785 for statements and references, I suspect. Graph serializations would then "naturally" entail these identifier bnodes in a single resource/entity context, which facilitates round-tripping and other downstream-from-RDF operations, like JSON-LD framing.

TASK DETAIL https://phabricator.wikimedia.org/T155891
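The fragment-URI convention described here might look roughly like this for a Wikibase statement (the UUID and the exact shape are invented for illustration):

```turtle
@prefix p:  <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix wd: <http://www.wikidata.org/entity/> .

# The statement is a hash fragment of the entity URI, so it stays
# inside the entity's own document rather than being a new resource.
wd:Q42 p:P31 <http://www.wikidata.org/entity/Q42#330df1bc-7d05-4f91-a7b1-2d6a4e3bfb82> .
<http://www.wikidata.org/entity/Q42#330df1bc-7d05-4f91-a7b1-2d6a4e3bfb82> ps:P31 wd:Q5 .
```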
[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher added a comment.

The fact remains that the claim without its entity relationship, represented in the GUID by the Q prefix, would be lost into a vacuum of nothing. And really, the concatenation of an entity ID with its statement UUID (with the expectation that a parser can understand the $ as a delimiter) is a rather questionable convention. I guess I am not clear on why the MW API should constrain RDF serialization. They are separate implementations. Is there a convenient "round trip" import-from-RDF mechanism available in the API? If not, who cares what the MW API expects. The basic problem is with the "claim" design. It seems to me that statement GUIDs are actually unnecessary overhead, because the subject of a claim is always the item/entity. There is really no need to mint a GUID subject for the claim. If you needed a separate statement node, it may have been better to do something like this:

<> wikibase:hasClaim _:b1 .
_:b1 wdt:someprop "somevalue" .

A bnode is always first an object of a <> resource.

TASK DETAIL https://phabricator.wikimedia.org/T155891
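Spelled out as complete Turtle, the bnode shape proposed above would be something like this (wikibase:hasClaim is the hypothetical property from the comment, not part of the actual Wikibase ontology):

```turtle
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix wd:  <http://www.wikidata.org/entity/> .

# The claim hangs off the entity as a blank node; no GUID is minted.
wd:Q42 wikibase:hasClaim _:b1 .
_:b1 wdt:P31 wd:Q5 .
```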
[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher added a comment.

Statement IDs should definitely be represented as bnodes (internally) and skolem IRIs externally, because they are uniquely defined within an entity node representation. They have no meaning outside of the entity. The typing semantics of Wikibase values are very obscure and entirely too complex for most normal reuse implementations of the data. If values are intended to be "shared between items" by an external consumer, then they should be represented as another entity type, and optimally their URIs should be dereferenceable. However, we know that this is not the case, so my personal "impression" of these things is already wrong. Similarly confusing is the muddled reference implementation. My use case simply needs the references to be presented in the context of the statement that gives the reference a meaning. In my estimation, a reference is just a statement about a statement in the context of one item, so I do not see how or why a reference can be "shared between items". Note that if the reference statement itself was semantically equal to another in the same item, it should therefore simply be a bnode!

TASK DETAIL https://phabricator.wikimedia.org/T155891
[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher edited the task description.

TASK DETAIL https://phabricator.wikimedia.org/T155891
[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher edited the task description.

TASK DETAIL https://phabricator.wikimedia.org/T155891
[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher edited the task description.

TASK DETAIL https://phabricator.wikimedia.org/T155891
[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher edited the task description.

TASK DETAIL https://phabricator.wikimedia.org/T155891
[Wikidata-bugs] [Maniphest] [Created] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher created this task. Christopher added projects: Wikidata-Query-Service, Wikibase-DataModel-Serialization. Herald added a subscriber: Aklapper. Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
Note: this relates more to my localized use of Wikibase RDF serialization than to the Wikidata Query Service directly, though it may also be relevant to the WDQS.

It is my opinion that the RDF representation of statement and reference URIs should conform to a W3C standard (RFC 5785) so that other libraries (like JSON-LD), for example, can recognize them as Skolem IRIs (or uniquely minted identifiers). One possible scenario is that a JSON-LD consumer may want to frame an entity, and this would require it to make the statements and references into bnodes so that their values can be formatted as sets or lists. Having these types of URIs in the .well-known namespace simplifies the parsing task.

This seems relatively trivial to do. I have already made the experimental changes in my instance, which touch these files:
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/RdfVocabulary.php
and
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/FullStatementRdfBuilder.php
to produce the intended output attached. F5323223: well-knownUUID-statements and references.n-triples

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]
[[ https://tools.ietf.org/html/rfc5785 | RFC5785 ]]

TASK DETAIL https://phabricator.wikimedia.org/T155891
WORKBOARD https://phabricator.wikimedia.org/project/board/891/
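For comparison, the skolemization scheme from RDF 1.1 that this task points at mints .well-known/genid IRIs like the following (the UUID is invented):

```turtle
@prefix p:  <http://www.wikidata.org/prop/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix wd: <http://www.wikidata.org/entity/> .

# A statement node as a skolem IRI in the .well-known/genid namespace,
# which JSON-LD tooling can recognize and turn back into a bnode.
wd:Q42 p:P31 <http://www.wikidata.org/.well-known/genid/d26a2d0e98884d1b9b0c1c5a7e6a4a2f> .
<http://www.wikidata.org/.well-known/genid/d26a2d0e98884d1b9b0c1c5a7e6a4a2f> ps:P31 wd:Q5 .
```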
[Wikidata-bugs] [Maniphest] [Created] T155890: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785
Christopher created this task. Christopher added projects: Wikidata-Query-Service, Wikibase-DataModel-Serialization. Herald added a subscriber: Aklapper. Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION (identical to the T155891 description above)

TASK DETAIL https://phabricator.wikimedia.org/T155890
WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] [Created] T131960: "_" character encoded as %20 in Wikidata URI RDF serialization
Christopher created this task. Christopher moved this task to Need investigation on the Wikidata-Query-Service workboard. Herald added a subscriber: Aklapper. Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
Wikipedia and Commons URIs do not match their RDF representation in Wikidata if there is an underscore. For example, even though the rewrite rules of Wikipedia translate spaces to the underscore form of the URI, the canonical URI for a Wikipedia article has the underscore. This underscore form of the URI is what should be represented in the RDF. DBpedia uses the foaf:isPrimaryTopicOf property for the Wikipedia sitelink, and its URI form contains the underscores. This essentially breaks federation between DBpedia resources and Wikidata entities using the sitelink as the primary key (if the Wikidata sitelink has a %20).

Example: a query (http://tinyurl.com/gntg9wx) for a sitelink of the entity Q3032 with the article URI https://de.wikipedia.org/wiki/Darwin_Harbour returns: https://de.wikipedia.org/wiki/Darwin%20Harbour

TASK DETAIL https://phabricator.wikimedia.org/T131960
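A sketch of the normalization the task is asking for, with invented names (this is not MediaWiki or Wikibase code): decode any percent-encoding, then apply the space-to-underscore convention so "Darwin%20Harbour" becomes "Darwin_Harbour".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical sketch (not Wikibase code): normalizing a sitelink title to
// the canonical underscore form used by Wikipedia article URIs.
public class SitelinkNormalizer {

    static String canonicalTitle(String encodedTitle) {
        try {
            // Decode any percent-encoding first (%20 -> space)...
            // Note: URLDecoder also maps '+' to space, which a production
            // normalizer would need to guard against for titles containing '+'.
            String decoded = URLDecoder.decode(encodedTitle, "UTF-8");
            // ...then apply MediaWiki's space-to-underscore convention.
            return decoded.replace(' ', '_');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalTitle("Darwin%20Harbour")); // Darwin_Harbour
    }
}
```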
[Wikidata-bugs] [Maniphest] [Commented On] T131235: wikibase:GlobecoordinateValue decimal representation not in lexical form in WDQS.
Christopher added a comment.

The PRETTY_PRINT setting of the TurtleWriter is set to "true" by default. This causes the writer to write only the literal "label" without the datatype, which affects boolean, decimal, integer and double literals. To fix it, make the following change (starting at line 623) in Munge.java:

final RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, lastWriter);
final WriterConfig config = writer.getWriterConfig();
config.set(BasicWriterSettings.PRETTY_PRINT, false);
handler = new PrefixRecordingRdfHandler(writer, prefixes);

Other relevant default config settings are:

config.set(BasicWriterSettings.RDF_LANGSTRING_TO_LANG_LITERAL, true);
config.set(BasicWriterSettings.XSD_STRING_TO_PLAIN_LITERAL, true);

TASK DETAIL https://phabricator.wikimedia.org/T131235
[Wikidata-bugs] [Maniphest] [Created] T131235: wikibase:GlobecoordinateValue decimal representation not in lexical form in WDQS.
Christopher created this task. Herald added a subscriber: Aklapper. Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
It seems that using shorthand rather than a lexical form for decimal coordinates breaks (XSD schema) validation of the munged/split Wikibase Turtle dumps. Example:

wdv:d0a7604c8ae9777857887ac4f1807286 a wikibase:GlobecoordinateValue ;
    wikibase:geoLatitude 30.12684 ;
    wikibase:geoLongitude 120.25657 ;
    a wikibase:GeoAutoPrecision ;
    wikibase:geoPrecision 0.00028 ;
    wikibase:geoGlobe wd:Q2 .

This is a problem for loading this data into Virtuoso, and possibly other triple stores. The geodata decimals are serialized in lexical form if requested directly from Wikibase, however.

TASK DETAIL https://phabricator.wikimedia.org/T131235
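To make the shorthand/lexical distinction concrete, here is an illustrative helper (invented names, not Wikibase or RDF4J code) showing the two serializations of the same decimal value:

```java
import java.math.BigDecimal;

// Illustrative sketch (not Wikibase code): the shorthand Turtle form that
// broke validation versus the explicitly typed lexical form the task asks for.
public class LexicalDecimal {

    // Shorthand form: a bare numeric token, e.g.  wikibase:geoPrecision 0.00028
    static String shorthand(BigDecimal v) {
        return v.toPlainString();
    }

    // Lexical form: an explicitly typed literal, e.g.
    //   wikibase:geoPrecision "0.00028"^^xsd:decimal
    static String lexical(BigDecimal v) {
        return "\"" + v.toPlainString() + "\"^^xsd:decimal";
    }

    public static void main(String[] args) {
        BigDecimal precision = new BigDecimal("0.00028");
        System.out.println(shorthand(precision)); // 0.00028
        System.out.println(lexical(precision));   // "0.00028"^^xsd:decimal
    }
}
```

Stores that validate literals against the XSD schema accept the second form unambiguously, which is why disabling the writer's pretty-printing (as in the comment above) resolves the load failure.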
[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle
Christopher added a comment.

I have worked around the counting problem. The experimental TPF server is here: http://orbeon-bb.wmflabs.org/ This Wikidata datasource uses the SPARQL interface at http://query.wikidata.org/sparql I think that this issue can be closed.

TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle
Christopher added a comment.

It seems that with a CONSTRUCT query, sending Accept: text/turtle works: http://wdm-rdf.wmflabs.org/short/NyJpTCnpl This is actually all that is required to get a linked data fragment from the SPARQL interface. The problem with TPF access to the WDQS SPARQL interface is that the very simple query (required for the metadata callback)

SELECT (COUNT(*) AS ?count) WHERE {?s ?p ?o}

cannot return Turtle. Also, there seems to be a minor difference from the COUNT implementation in OpenVirtuoso, which allows a count to be unbound, like:

SELECT COUNT(*) WHERE {?s ?p ?o}

TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle
Christopher added a comment.

The node.js version of the TPF server is actually why I created this issue. My concept of the fragment server was that it could decentralize a big dataset by distributing data fragments to it with selectors (http://www.hydra-cg.com/spec/latest/linked-data-fragments/#selectors), so I am not exactly sure how implementing a Blazegraph TPF server fits in the overall WDQS scope. It could perhaps help with decentralizing your data internally, or with caching named graphs, though it would probably not by itself facilitate using WDQS as a SPARQL datasource for external TPF implementations.

TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Reopened] T130799: provide sparql results as text/turtle
Christopher reopened this task as "Open". TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Closed] T130799: provide sparql results as text/turtle
Christopher closed this task as "Invalid". TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Created] T130799: provide sparql results as text/turtle
Christopher created this task. Christopher moved this task to Blazegraph on the Wikidata-Query-Service workboard. Herald added a subscriber: Aklapper. Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
OpenVirtuoso (DBpedia) can do this. There is no Maven artifact similar to sesame-queryresultio-sparqlxml for Turtle, so this seems not possible without developing a new package for Sesame. There is, however, org.openrdf.rio.turtle.TurtleWriter. This would be useful for generating contextual subsets (fragments) of "loadable data" for consumers.

TASK DETAIL https://phabricator.wikimedia.org/T130799
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

Coincidentally, it seems that people who know a lot more about this than I do have debated this issue at length in a long and very informative thread: CRS specification (was: Re: ISA Core Location Vocabulary) <https://lists.w3.org/Archives/Public/public-locadd/2014Jan/.html> It is clearly more involved than just using "proper" software libraries and "handling requirements". The conflicting point that I see from your side is that introducing unneeded complexity is bad, and also that this was the best practical alternative available for handling "garbage data". Sure, I agree with that, but oversimplification of a problem is worse. I personally feel that the "grunt approach" of using regex in SPARQL to filter URIs from literals in a result set is not clean, and also quite costly. The alternative that introduces subproperties for geometry values is definitely more complex, and as indicated in the thread:

"You need OWL 2 to formally define a complex class and then say that geometry consists of exactly two parts, one that contains the coordinate sequence and another one that contains the CRS. You cannot make such statements using RDFS or OWL."

To assert that everything that needs to be said about a geometry value can be put into a standard RDFS string literal is obviously not true. The irregular form of geo:wktLiteral is a kind of "convenience method" that seems to work for most use cases, but definitely not for all, and I really doubt that it is functionally sustainable for complex geodata.

TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

@Smalyshev: so, by stating that geometry and CRS are different, you concur with the main arguments referenced above that they should not be conflated in a simple literal.

@daniel: I agree with the idea of specifying the CRS as an additional component of the GlobeCoordinate data value, separately from the geometry. I do not agree that a Wikidata entity can be inferred to be a CRS without it providing or pointing to a serialization that can be validated against a known CRS encoding (e.g. gml:GeodeticCRS). Stating that a CRS is an instance of a "geodetic reference system" is only a concept pointer; it does not provide the syntax of a CRS schema, which is necessary for software to understand the meaning of a geometry.

In summary, these are the reasons why a CRS should not be represented as a URI in a simple WKT literal string (that contains point geometry):

1. Geometry and its CRS are just two separate things.
2. It becomes much harder to use the CRS as a filter in a SPARQL query.
3. It is not possible to assign multiple CRS specifications to a geometry.
4. The domain of a CRS specification should not be limited to a single geometry.
5. The CRS is a URI, so it should be published as one.
6. It is not possible to assign a CRS to a collection of geometries (e.g. a dataset).
7. Software libraries that handle WKT geometry do not expect a CRS at the start of the string.

The current use of simple geo:wktLiteral for WGS84 points is fine, but if a Wikidata goal is to introduce more complex GIS spatial data (which I think would be very worthwhile), then the implementation should adhere to justifiable and reasonable standards for the data representation.
TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

Please see "geoSPARQL CRS design is debatable" <https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems#GeoSPARQL> from the W3C Coordinate Reference Systems wiki. Also, see #7 here: the conflation of the CRS with the WKT in a literal has many undesirable effects <https://lists.w3.org/Archives/Public/public-locadd/2013Dec/0052.html>

From the 12-063r5 document:

"The WKT representation of coordinate reference systems as defined in ISO 19125-1:2004 and OGC specification 01-009 is inconsistent with the terminology and technical provisions of ISO 19111:2007 and OGC Abstract Specification topic 2 (08-015r2), 'Geographic information – Spatial referencing by coordinates'."

Is this clear? They are admitting that the previous design form of WKT is **inconsistent** with other specifications. While it says nothing directly about the geoSPARQL specification, on page 4 of that 2012 design spec WKT is defined through direct reference to the //deprecated standard//: "as it is specified in Well Known Text (as defined by Simple Features or ISO 19125)".

What is outlined in this new specification is a de facto "WKT string form", and this form should accommodate all of the semantics of the geo:wktLiteral string format, including geometries (which are not explicitly mentioned in the new specification). Admittedly, this is quite a difficult thing to sort out, and there are definitely politics and big money at work in the standards process. If you want something more concrete to build on, maybe I should ask them for a concrete geoSPARQL "best practice guideline".

Finally, I do not feel that it is accurate to use a Wikidata entity (e.g. Mars) as a CRS by the design definition of geo:wktLiteral and then not properly specify it. It is probably better to just omit it and use a different data type for non-Earth coordinates than to provide an entity concept URI as a CRS identifier.
TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Updated] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

@Smalyshev: have you tried reading the updated WKT CRS specification http://docs.opengeospatial.org/is/12-063r5/12-063r5.html yet? From what I can interpret, they have now deprecated the 2012 "non-ISO compliant" concatenation of a URI form of the CRS and a geometry. Instead, the CRS string semantics are specified in the section "WKT string form", which takes the format KEYWORD1[attribute1,KEYWORD2[attribute2,attribute3]]. Note the use of an UPPERCASE keyword. So, a WDQS non-Earth coordinate might be specified like: "IMAGECRS["crs name"], POINT[18.4 226]"^^ogc:wktLiteral. Also, this variation of the wktLiteral datatype could live in a different namespace from geo:wktLiteral and could then be easily filtered by the map parser. I suggest considering this to fix https://phabricator.wikimedia.org/T130428 and other globe variant issues.

TASK DETAIL https://phabricator.wikimedia.org/T129072
Re: [Wikidata-tech] Wikidata-tech Digest, Vol 35, Issue 6
If the page redirect titles exist in Wikipedia, they are valid in Wikidata as data, regardless of what they represent in *your view* of "quality". If cleanup needs to be done, it should be done in the context of the source first. Evaluating the value of a specific "alias" to a Wikidata item is a judgment that should be based entirely on a *referenceable* data source. Wikidata aliases (as well as descriptions and preferred labels) are completely arbitrary and unreferenced, and in my judgment worthless, without a primary source or clearly defined semantic relationship. The judgmental curation of Wikidata is, in fact, not that useful. Wikidata should simply seek to represent data *as it exists* (errors or not) in the primary source.

Furthermore, apparently you do not get why skos:hiddenLabel exists. Why you feel that it is not worthwhile is not relevant to its primary function, which is to facilitate searching (see https://www.w3.org/2012/09/odrl/semantic/draft/doco/skos_hiddenLabel.html). And it is not difficult to argue that the searching in Wikidata could use improvement.

On 16 March 2016 at 13:00, <wikidata-tech-requ...@lists.wikimedia.org> wrote:
> Message: 1
> Date: Tue, 15 Mar 2016 16:49:40 +
> From: Lydia Pintscher <lydia.pintsc...@wikimedia.de>
> To: wikidata-tech@lists.wikimedia.org
> Subject: Re: [Wikidata-tech] Wikipedia Page Redirect Titles in Wikidata
>
> On Sat, Mar 12, 2016 at 2:14 PM Christopher Johnson <christopher.john...@wikimedia.de> wrote:
>
> > Hi,
> >
> > I am developing a scientific terms thesaurus and have discovered that existing Wikipedia "page redirect titles" provide a useful way to resolve an odd or archaic form to a "canonical" term label as it is represented by the Wikipedia page title (aka the Wikidata "sitelink"). For example,
> > https://en.wikipedia.org//w/api.php?action=query=xml=redirects=universe
> >
> > In Wikidata, these "page redirect titles" are not represented in the data model except very inconsistently and sparsely as skos:altLabel ("alias"). My use case is that I would like to be able to query Wikidata for these page redirect titles in order to resolve odd multi-linguistic names to a single concept.
> >
> > My question is: if I were to create a bot that imported all "page redirect titles" for a given sitelink and created them with the skos:altLabel property en masse, is this a valid semantic relationship? Or should it rather be represented as ?sitelink owl:sameAs <redirect URI>? Or both?
> >
> > Furthermore, in some cases (e.g. mis-spellings), skos:hiddenLabel may be more appropriate, but this has no definition in the data model. There would potentially be a lot of clutter in the UI without a hiddenLabel alias property. Also, there are no types for page redirects in Wikipedia, afaik.
> > Additional value for the searching in the Wikidata UI could probably be obtained from indexing these alternate page titles as well.
>
> There are several points to address:
> 1) Should redirects from Wikipedia be imported as aliases on Wikidata? No. This has been done before and created a massive amount of cleanup work because the redirects contained a lot of not meaningful misspellings and more. Please do not import them to Wikidata without approval through the bot approval process and clear quality control.
> 2) Should we allow more fine-grained distinction between real aliases and misspellings in the UI and datamodel? No. I don't believe this is worth the complexity and resulting discussions/edit wars and more.
>
> Cheers
> Lyd
[Wikidata-tech] Wikipedia Page Redirect Titles in Wikidata
Hi,

I am developing a scientific terms thesaurus and have discovered that existing Wikipedia "page redirect titles" provide a useful way to resolve an odd or archaic form to a "canonical" term label as it is represented by the Wikipedia page title (aka the Wikidata "sitelink"). For example: https://en.wikipedia.org//w/api.php?action=query=xml=redirects=universe

In Wikidata, these "page redirect titles" are not represented in the data model except very inconsistently and sparsely as skos:altLabel ("alias"). My use case is that I would like to be able to query Wikidata for these page redirect titles in order to resolve odd multi-linguistic names to a single concept.

My question is: if I were to create a bot that imported all "page redirect titles" for a given sitelink and created them with the skos:altLabel property en masse, is this a valid semantic relationship? Or should it rather be represented as ?sitelink owl:sameAs <redirect URI>? Or both?

Furthermore, in some cases (e.g. mis-spellings), skos:hiddenLabel may be more appropriate, but this has no definition in the data model. There would potentially be a lot of clutter in the UI without a hiddenLabel alias property. Also, there are no types for page redirects in Wikipedia, afaik.

Additional value for the searching in the Wikidata UI could probably be obtained from indexing these alternate page titles as well.

Regards,
Christopher Johnson
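As a toy illustration of the resolution step described above (the class, the mapping, and the titles are invented for the example; in practice the data would come from the MediaWiki API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the thesaurus use case: resolving variant or archaic
// forms to a canonical page title via redirect titles. The mapping here is
// invented; a real implementation would populate it from the MediaWiki API.
public class RedirectResolver {

    private final Map<String, String> redirectToCanonical = new HashMap<>();

    void addRedirect(String redirectTitle, String canonicalTitle) {
        redirectToCanonical.put(redirectTitle, canonicalTitle);
    }

    // Returns the canonical title, or the input itself if no redirect exists.
    String resolve(String title) {
        return redirectToCanonical.getOrDefault(title, title);
    }

    public static void main(String[] args) {
        RedirectResolver r = new RedirectResolver();
        r.addRedirect("The Universe", "Universe"); // invented example pair
        System.out.println(r.resolve("The Universe")); // Universe
        System.out.println(r.resolve("Universe"));     // Universe
    }
}
```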
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

Eh, http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral is an RDFS Datatype, so the semantics are defined by the RDF schema, right? But I found http://docs.opengeospatial.org/is/12-063r5/12-063r5.html, which demonstrates that the WKT CRS extends far beyond RDF. I suspect that the implementation of wktLiteral is bound to RDFS, regardless of the "rich semantics" of WKT.

TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

Thanks for the clarification. However, Req 10 of the geoSPARQL specification seems to be at odds with the definition of a "literal value" (according to https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal). The way that I read this specification, a literal is either a URI or a string, but not both.

TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher added a comment.

Intentional or not, it is wrong. Why is it necessary? The problem is that it breaks parsing of geoSPARQL literals. For example, if I ask for instances of volcanoes, I have to make exceptions for weird non-Earth coordinates.

TASK DETAIL https://phabricator.wikimedia.org/T129072
[Wikidata-bugs] [Maniphest] [Created] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Christopher created this task. Christopher moved this task to All WDQS-related tasks on the Wikidata-Query-Service workboard. Herald added a subscriber: Aklapper. Herald added a project: Discovery.

TASK DESCRIPTION
See http://tinyurl.com/grkd7qw for an example query that returns the coordinates for Olympus Mons, a Martian volcano.

Raw result:

{
  "head" : { "vars" : [ "o" ] },
  "results" : {
    "bindings" : [ {
      "o" : {
        "datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral",
        "type" : "literal",
        "value" : "<http://www.wikidata.org/entity/Q111> Point(18.4 226)"
      }
    } ]
  }
}

TASK DETAIL https://phabricator.wikimedia.org/T129072
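A hypothetical sketch of the client-side workaround this value forces (the class and helper names are invented): peeling the embedded globe IRI off the front of the literal so that the remaining WKT can be parsed normally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side workaround (names invented, not WDQS code):
// split the embedded globe IRI from the WKT portion of the literal value,
// returning {globeIri, wkt}. For Earth coordinates there is no IRI prefix
// and globeIri is null.
public class GlobeWktSplitter {

    private static final Pattern PREFIXED =
            Pattern.compile("^<([^>]+)>\\s*(.*)$", Pattern.DOTALL);

    static String[] split(String literalValue) {
        Matcher m = PREFIXED.matcher(literalValue);
        if (m.matches()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { null, literalValue };
    }

    public static void main(String[] args) {
        String[] parts = split("<http://www.wikidata.org/entity/Q111> Point(18.4 226)");
        System.out.println(parts[0]); // http://www.wikidata.org/entity/Q111
        System.out.println(parts[1]); // Point(18.4 226)
    }
}
```

This is exactly the kind of regex filtering the later comments call "not clean, and also quite costly" when it has to be done inside SPARQL instead of in client code.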
[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries
Christopher added a comment.

I may be wrong, but the headers returned from a request to the nginx server wdqs1002 say that Varnish 1.1 is already being used there. And, for whatever reason, **it misses**, because repeating the same query gives the same response time. For example, this one returns in 25180>26966 ms:

http://query.wikidata.org/sparql?query=PREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0APREFIX+v%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0APREFIX+q%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fqualifier%2F%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0ASELECT+%3FcountryLabel+(COUNT(DISTINCT+%3Fchild)+AS+%3Fnumber)%0AWHERE+%7B%0A++%3Fchild+wdt%3AP106%2Fwdt%3AP279*+wd%3AQ855091+.++%0A++%3Fchild+wdt%3AP27+%3Fcountry+.%0A++SERVICE+wikibase%3Alabel+%7B%0Abd%3AserviceParam+wikibase%3Alanguage+%22en%22+.%0A%3Fcountry+rdfs%3Alabel+%3FcountryLabel%0A++%7D+%0A++%0A%7D+GROUP+BY+%3FcountryLabel+ORDER+BY+DESC(%3Fnumber)

Even though a Varnish cache **should work** to proxy nginx for optimizing delivery of static query results, it lacks several important features of an object broker, namely client control of object expiration (TTL) and retrieval of "named query results" from persistent storage. A WDQS service use case may in fact be to compare results from several days ago with current results; thus, assuming that the latest results state is what the client wants may actually not be true. Possibly, the optimal solution would use the varnish-api-engine (http://info.varnish-software.com/blog/introducing-varnish-api-engine) in conjunction with a WDQS REST API (provided with a modified RESTBase?). Is the varnish-api-engine being used anywhere in WMF? Also, delegating query requests to an API could allow POSTs. With a Varnish cache alone, the POST problem would remain unresolved.

TASK DETAIL https://phabricator.wikimedia.org/T126730
[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries
Christopher added a comment.

I perceive the use of Varnish as not directly related to how an object broker could manage this use case (expensive querying of the WDQS nano SPARQL API), though it is probably related to any UI elements (i.e. the query editor or results renderer) that may generally be connected to the query service. If a REST solution (like RESTBase) is used, a client request could either GET the results from cache with an ID or trigger a query event webhook that forwards (and stores) the response from the NanoSparqlServer directly with a callback. The basic API design could be something like GET /query/:owner/:qid or /query/hooks/:owner/:qid, where the first case would just return the results from a db cache and the second would trigger a callback that returns (and stores) a payload from the NanoSparqlServer. A typical use case for this is a static query that returns dynamic results updated on a regular frequency (e.g. daily) from a single client. The payload event handler for the SPARQL server callback could also be controlled based on client quota and retention policies.

TASK DETAIL https://phabricator.wikimedia.org/T126730
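One way the "named query results" lookup could be keyed, sketched with an invented helper (this is not part of any actual WDQS or RESTBase design): derive a stable identifier from the query text itself and use it as the :qid in the GET route, so identical queries map to the same stored result.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Invented sketch, not an actual WDQS/RESTBase design: derive a stable
// identifier for a stored query result from the normalized query text, so
// GET /query/:owner/:qid can fetch cached results without re-executing.
public class QueryResultId {

    static String idFor(String sparql) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(sparql.trim().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always present
        }
    }

    public static void main(String[] args) {
        String q = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }";
        // Same query text -> same id; leading/trailing whitespace ignored.
        System.out.println(QueryResultId.idFor(q).equals(QueryResultId.idFor("  " + q)));
    }
}
```

A content-derived key like this also supports the "compare results from several days ago" use case mentioned earlier, since the same id can index several dated snapshots of the same query's results.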
[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata SPARQL queries for Graphs
Christopher added a comment. @Smalyshev I completely agree with the concept of an intermediate service between the NanoSparqlServer and the client. I think that this service should "broker" requests (based on an options configuration object) and evaluate whether a query is re-executed against the Blazegraph db or whether the results can be returned from the "cache", i.e. an "offline", response-only db. I have been looking at Huginn (https://github.com/cantino/huginn) recently. This is an application that delegates tasks to agents. This (or a similar app) may be suitable for MW extension usage just by using agents or webhooks instead of inline queries. TASK DETAIL https://phabricator.wikimedia.org/T126730
[Wikidata-bugs] [Maniphest] [Commented On] T126730: Caching for results of wikidata SPARQL queries for Graphs
Christopher added a subscriber: Christopher. Christopher added a comment. Question: why is this task limited in scope to the Graph extension? TASK DETAIL https://phabricator.wikimedia.org/T126730
[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations
Christopher added a comment. @Smalyshev no, I think that this specific issue has been practically resolved. TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Commented On] T122848: Kill wdm.wmflabs.org
Christopher added a comment. done. TASK DETAIL https://phabricator.wikimedia.org/T122848
[Wikidata-bugs] [Maniphest] [Closed] T122848: Kill wdm.wmflabs.org
Christopher closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T122848
[Wikidata-bugs] [Maniphest] [Commented On] T115996: [Task] Use package manager
Christopher added a comment. I have actively started working on this. You can see the work here: https://github.com/christopher-johnson/wdqs-gui Since using node requires a lot of refactoring and code-style changes, I am interested in developing the GUI as a separate dev branch or package; when or if it meets with general approval, it can then be merged into production. I am using Gulp for the live build tasks, and everything is installed with npm. It also now runs completely independently of Blazegraph as a stand-alone app. TASK DETAIL https://phabricator.wikimedia.org/T115996
[Wikidata-bugs] [Maniphest] [Commented On] T115996: [Task] Use package manager
Christopher added a subscriber: Christopher. Christopher added a comment. Question: why isn't the GUI a completely independent project / repo / build / deployment from WDQS? One reason not to require a full Maven build for every GUI patch can be seen here: https://integration.wikimedia.org/ci/job/wikidata-query-rdf/777/console. The CI failed because of a network problem. Using npm is a really good idea, but perhaps the first step is to just split the front end out from the main Blazegraph package. TASK DETAIL https://phabricator.wikimedia.org/T115996
Re: [Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Obviously, a main aspect of the data presented in the todo stats is "referenced statements" (even though the chart labels there are wrong). Whether or not this query maps directly to todo is actually not the key issue. Clearly, measuring data quality requires that the arity of statement-to-reference relationships is quantified. Right? This assumption is based on Wikipedia's policy of maintaining a NPOV. And, unfortunately, all unreferenced statements contain a "bias" that makes the data theoretically worthless, even though they may in fact be "correct". On 8 Dec 2015 1:52 pm, "Addshore" <no-re...@phabricator.wikimedia.org> wrote: > Addshore added a comment. > > Okay, I'm struggling to see which part of the todo stats this is covering > > TASK DETAIL > https://phabricator.wikimedia.org/T117234
Re: [Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Since P143 is primarily a "reference type" property, it should be used when the reference node is the subject (with a few exceptions, apparently). The query only evaluates the arity of the reference nodes as objects, so the results for P143 are expected. On 8 Dec 2015 1:09 pm, "Addshore" <no-re...@phabricator.wikimedia.org> wrote: > Addshore added a comment. > > I am still confused. Running this for P143 gives the following: > > nrefs count > 0 920 > 1 8 > > TASK DETAIL > https://phabricator.wikimedia.org/T117234
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. @Addshore Some progress was made on this in https://phabricator.wikimedia.org/T120166. The only practical way to get the statement and reference metrics is to facet the data by property. It is just not possible to run counting queries against the whole database and get any reasonable response time. This means that any large domain or range metric should iterate over all 1800+ properties with separate SPARQL calls and then aggregate the numbers. We can do this for the statement -> reference arity with:

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX p: <http://www.wikidata.org/prop/>
SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
  { SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs) WHERE {
      ?item p:$property ?wds .
      OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
    } GROUP BY ?wds }
} GROUP BY ?nrefs ORDER BY ?nrefs

Would you do this in PHP? If you want to handle this, just let me know; otherwise we could reuse the bulk SPARQL scripts that I have already written in R. In addition to tracking aggregates, it would also be useful to show all property counts in a table like I did here: http://wdm.wmflabs.org/?t=wikidata_property_usage_count. TASK DETAIL https://phabricator.wikimedia.org/T117234
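The iterate-and-aggregate approach described above could look roughly like this in Python. This is a sketch: it fills the per-property query template and merges the per-property {nrefs: count} histograms client-side; `build_query` and `aggregate` are hypothetical helpers, and the actual HTTP calls to the query service are omitted.

```python
from collections import Counter
from string import Template

# Sketch of the facet-by-property approach: one SPARQL call per property,
# then client-side aggregation of the per-property histograms.
QUERY = Template(
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "PREFIX p: <http://www.wikidata.org/prop/>\n"
    "SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {\n"
    "  { SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs) WHERE {\n"
    "      ?item p:$pid ?wds .\n"
    "      OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .\n"
    "    } GROUP BY ?wds }\n"
    "} GROUP BY ?nrefs ORDER BY ?nrefs"
)

def build_query(pid):
    """Fill in one property ID, e.g. build_query('P227')."""
    return QUERY.substitute(pid=pid)

def aggregate(histograms):
    """Merge per-property {nrefs: statement_count} dicts into one total."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return dict(total)
```

Running `build_query` for each of the 1800+ properties and feeding the results to `aggregate` yields the overall nrefs distribution without a single whole-database counting query.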
[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. I think that you may have missed the point. I added the $property variable in the above query to indicate that this has to be run for **every** property. p:P227 is a random example. TASK DETAIL https://phabricator.wikimedia.org/T117234
[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. So, basically a clever adaptation of what I suggested in https://phabricator.wikimedia.org/T119775 to get statements referenced to the Wikipedias. It works, but seems a very hacky approach around the core problem of not having a way to ask how many references a statement has. So, just so I am clear on this: a statement-to-reference triple is always unique in the dataset? I was under the assumption that a single reference statement could potentially be duplicated with different hashes, which is why DISTINCT would need to be enforced on the subject. In theory, there should also be metadata on the reference that identifies it as "the latest" version, and previous revisions should not simply be replaced. This is another issue, I guess. Imho, there are clear problems with the reference implementation that should be addressed and not just worked around, which is why I created https://phabricator.wikimedia.org/T120166 to start. Is the objective here just to produce some numbers, or to improve the quality of the data? TASK DETAIL https://phabricator.wikimedia.org/T117234
[Wikidata-bugs] [Maniphest] [Updated] T120166: Semantically define arity of statement -> reference relations
Christopher added a comment. Quick edit: I ran this query successfully in 13 min, 11 sec, 476 ms, returning 312,068 results giving the arity of GND (P227) property statements. So it is possible, but really, really slow.

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
SELECT ?wds (COUNT(DISTINCT(?o)) AS ?ocount) WHERE {
  ?s p:P227 ?wds .
  ?wds a wikibase:Statement
  OPTIONAL { ?wds prov:wasDerivedFrom ?o }
} GROUP BY ?wds

TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations
Christopher added a comment. @Jheald Thank you for your suggestions. What is fairly clear from my research is that counting-type queries on large (or undefined) ranges with an unbound domain are just not possible (without huge resource consumption) when the namespace contains many millions of triples. For example,

PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE { ?stmt prov:wasDerivedFrom ?ref }

will not work, even with no query timeout. I have tried it on http://wdm-rdf.wmflabs.org and it uses all of the 8 GB heap space and crashes Blazegraph. Of course, there are ways to use SPARQL to post-process/filter manageable result sets (in memory) as you suggest, but this seems not possible for the 800M+ triples in wdq. By introducing an "arity class property" (like "hasNullReference"), the evaluation of **all** data can be achieved with minimal processing overhead, because the query range is a boolean value and not a variable like "all references". TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations
Christopher added a comment. @Jheald Perfect. This works; even with adding OPTIONAL it runs in 10 seconds. Yeah, definitely outputting the statements is unnecessary and adds a lot of time.

Total results: 5, duration: 10445 ms
nrefs count
0 39775
1 339700
2 10050
3 382
4 14

Conclusion: faceting the namespace by property (and avoiding unnecessary output processing) is a practical way to get this data. Thanks again. TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Created] T120166: Semantically define arity of statement -> reference relations
Christopher created this task. Christopher added a subscriber: Christopher. Christopher added projects: Wikidata, Wikidata-Query-Service, Wikibase-DataModel. Herald added subscribers: StudiesWorld, Aklapper. Herald added a project: Discovery. TASK DESCRIPTION This is a data model and RDF serialization problem. The primary use case is measuring and evaluating "unreferenced statements", a nullary relationship that dominates the data set. (See T117234) Since there are no attributes/properties in the data model/ontology to represent the arity of statement-to-reference relationships, querying for this property is not currently possible with SPARQL. See http://www.w3.org/TR/swbp-n-aryRelations/ for recommendations on implementation. TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a blocking task: T120166: Semantically define arity of statement -> reference relations. TASK DETAIL https://phabricator.wikimedia.org/T117234
[Wikidata-bugs] [Maniphest] [Updated] T120166: Semantically define arity of statement -> reference relations
Christopher added a blocked task: T117234: Reproduce wikidata-todo/stats data using analytics infrastructure. TASK DETAIL https://phabricator.wikimedia.org/T120166
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. The only way to get a count of statements with references in the current model/format is like this:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT (COUNT(DISTINCT(?s)) AS ?scount) WHERE { ?s prov:wasDerivedFrom ?wdref . }

This query is super slow! In fact, it has crashed Blazegraph because, with an unlimited query timeout, it uses all of the 8 GB allocated heap space. Since a single statement can have multiple references, just counting prov:wasDerivedFrom using estimated cardinality only returns a count of all references. I asked the experts on the mailing list how we can address this reference query problem, and no one has responded with anything useful yet. This is an issue that could be handled in the Wikibase RDF serialization with any number of different solutions. In addition to the idea of introducing a null reference object, another possibility would be to create a new attribute like wikibase:hasReference with a boolean datatype constraint. I will create a new ticket for this issue, I guess. TASK DETAIL https://phabricator.wikimedia.org/T117234
[Wikidata-bugs] [Project] [Updated] Wikidata-Query-Service
Christopher added a member: Christopher. PROJECT DETAIL https://phabricator.wikimedia.org/project/profile/891/
Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5
The statement-to-reference relation problem also relates to the topic of metadata reification, which, from what I can gather, is not really addressed in the current WDQS RDF approach. In Blazegraph, this could be supported by quads or RDR (Reification Done Right). See http://arxiv.org/pdf/1406.3399.pdf and https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right One possible approach using triples for the use case could be to assign a blank node to a reference placeholder and introduce the valid range class for prov:wasDerivedFrom (prov:Entity) with the canonical reference UUID like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom _:refhash .
_:refhash a prov:Entity, wikibase:Reference, wdref:referenceUUID ;
    pr:P7 "Some data" ;
    pr:P8 "1976-01-12T00:00:00Z"^^xsd:dateTime ;
    prv:P8 wdv:b74072c03a5ced412a336ff213d69ef1 .

Introducing an owl:minCardinality on prov:wasDerivedFrom would mean that if there were no refhash for a statement, then a null object (similar to wdno) would identify "unreferenced statements" like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom wikibase:nullRef .

There are a lot of ways to deal with this issue, I guess. But it seems to me that having a simple programmatic method to validate statement integrity (as supported or unsupported claims) is very important to substantiating the utility of Wikidata for the academic community.
[Wikidata-bugs] [Maniphest] [Commented On] T119775: Create WDQS service for snak / reference hashes
Christopher added a comment. You can get reference hashes for objects using the http://www.wikidata.org/prop/reference/ predicate. For example,

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT (COUNT(DISTINCT(?s)) AS ?scount) WHERE {
  ?wds wdt:P31 wd:Q10876391 .
  ?wdref <http://www.wikidata.org/prop/reference/P143> ?wds .
  ?s prov:wasDerivedFrom ?wdref .
}

This returns a count of 16,266,065 references to all the Wikipedias (from http://wdm-rdf.wmflabs.org). TASK DETAIL https://phabricator.wikimedia.org/T119775
Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5
Thank you for the explanation. The content negotiation for an Item IRI is clear. Any request for http://www.wikidata.org/entity/Q... requires an Accept application/rdf+xml header in order to get the RDF. The default response is JSON, and Accept text/html returns a 200 response delivering the UI page. For statement resolution in the Item RDF, is this not a fragment? So, in the Item context, the IRI for a statement resource would be http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the statement IRI http://www.wikidata.org/entity/statement/Statement_UUID could just return the statement as a separate entity. On the topic of references, a use case is to measure data quality by counting the number of "unreferenced statements". At https://phabricator.wikimedia.org/T117234#1834728, I propose the possibility of using blank reference nodes to identify these "bad" statements. Having an object to count greatly expedites the query process because of the estimated cardinality feature of Blazegraph. The only alternative to this is to count distinct statements with the prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may not be possible without a huge amount of memory). I do not know what would be involved in implementing blank reference nodes and what performance consequences might occur. It seems to me that the pairing of statements and references is a core feature of the data model, and it is odd that there can exist statements that have no associated reference node in the RDF. Cheers, Christopher
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T119775: Create WDQS service for snak / reference hashes
Christopher added a subscriber: Christopher. TASK DETAIL https://phabricator.wikimedia.org/T119775
[Wikidata-tech] RDF Item, Statement and Reference IRI Resolution?
Hi, After looking at the RDF format closely, I am asking whether the item, statement, and reference IRIs could/should be directly resolvable to XML/JSON formatted resources. It seems that currently http://www.wikidata.org/entity/ redirects to the UI at https://www.wikidata.org/wiki/ which is not what a machine reader would expect. Without a simple method to resolve the IRIs (perhaps a RESTful API?), these RDF data objects are opaque to parsers. Of course, with wbgetclaims, it is possible to get the statement like this: https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1 but the GUID format the API expects does not match the RDF UUID representation (there is a $ or "%24" after the item instead of a -), and it returns both the statement and the references. Since the reference is its own node in the RDF, it can be queried independently. For example, to ask "return all of the statements where reference R is bound." But then the return value is a list of statement IDs, and a subquery or separate query is then required to return the associated statement node. I am also wondering why item, statement, and reference "UUIDs" are not in canonical format in the RDF. This is a question of compliance with IETF guidelines, which may or may not be relevant. Item: Q20913766 Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1 Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9 See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml and http://tools.ietf.org/html/rfc4122 for information on urn:uuid guidelines. Thanks for your feedback, Christopher ___ Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
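The GUID mismatch described above (a "-" after the item ID in the RDF, a "$" in the GUID the API expects) is mechanical enough to sketch; `rdf_to_api_guid` is a hypothetical helper, not part of any Wikibase library.

```python
# Sketch of bridging the two ID shapes described above: the RDF statement
# ID uses "-" after the item ID, while wbgetclaims expects "$" ("%24" when
# URL-encoded). `rdf_to_api_guid` is a hypothetical name for illustration.
def rdf_to_api_guid(statement_id):
    """Q20913766-CD2816... -> Q20913766$CD2816... ('$' after the item ID)."""
    item, _, uuid = statement_id.partition("-")
    return f"{item}${uuid}"
```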
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. I am blocked on this by several problems with the data model/ontology. The question of the relationship between the data model and the RDF node definitions is a bit complicated, perhaps more so than it should be. A reference is a special type of statement defined by its relationship to other statements. An "unreferenced statement" is undefined in the ontology and in the RDF format. All statements **should** in practice have a reference node, but apparently this is not an enforceable constraint in the data model. I think that when a statement is created, it should also create a reference "placeholder" or blank node in the RDF. With this information in the RDF, counting these "bad" statements would be much easier. TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. Truthy statement counts per Item can be done like this: PREFIX wd: <http://www.wikidata.org/entity/> SELECT (count(distinct(?o)) AS ?ocount) WHERE { wd:Q7239 ?p ?o FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/prop/direct")) } Labels per Item like this: PREFIX wd: <http://www.wikidata.org/entity/> SELECT (count(distinct(?o)) AS ?ocount) WHERE { wd:Q7239 ?p ?o FILTER (REGEX(STR(?p), "http://www.w3.org/2000/01/rdf-schema#label")) } Descriptions per Item: PREFIX wd: <http://www.wikidata.org/entity/> SELECT (count(distinct(?o)) AS ?ocount) WHERE { wd:Q7239 ?p ?o FILTER (REGEX(STR(?p), "http://schema.org/description")) } Sitelinks per item: PREFIX wd: <http://www.wikidata.org/entity/> SELECT (count(distinct(?s)) AS ?ocount) WHERE { ?s ?p wd:Q7239 FILTER (REGEX(STR(?p), "http://schema.org/about")) } TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
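These four per-item count queries differ only in the item and the predicate IRI prefix being filtered, so they can be generated from a small template. A sketch (the helper name and template layout are illustrative, not part of any Wikidata tooling):

```python
# Illustrative helper generating the per-item count queries above.
PREFIX = "PREFIX wd: <http://www.wikidata.org/entity/>"


def count_query(item: str, predicate_prefix: str) -> str:
    """Build a SPARQL query counting distinct objects of triples whose
    predicate IRI starts with predicate_prefix (truthy statements,
    labels, descriptions, ...)."""
    return (
        f"{PREFIX}\n"
        f"SELECT (COUNT(DISTINCT ?o) AS ?ocount) WHERE {{\n"
        f"  wd:{item} ?p ?o\n"
        f'  FILTER(STRSTARTS(STR(?p), "{predicate_prefix}"))\n'
        f"}}"
    )


print(count_query("Q7239", "http://www.wikidata.org/prop/direct"))
```

STRSTARTS is used throughout, since a prefix match does not need the full REGEX machinery used in some of the queries above.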
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. OK. So the title "Referenced Statements by Statement Type" is just wrong then. Rather, it shows **All Statements by Type**: | Date | itemlink | string | globecoordinate | time | quantity | somevalue | novalue | Total | | 2015-10-19 | 46,177,560 | 20,631,391 | 2,363,191 | 3,588,295 | 470,476 | 9,630 | 4,436 | 73,244,979 | TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. True, a statement is either referenced or "unreferenced". Getting the number of referenced statements (currently 41,735,203) is easy and fast with: curl -G https://query.wikidata.org/bigdata/namespace/wdq/sparql --data-urlencode ESTCARD --data-urlencode 'p=<http://www.w3.org/ns/prov#wasDerivedFrom>' So we use the total of wikibase:Statement objects to represent the total number of statements and subtract the referenced statements to get the "unreferenced statements". What is still murky to me, and I think possibly wrong with the todo/stats data, is the "Referenced statements by statement type". Something does not add up there, because the total should not be greater than the sum of "Statements referenced to Wikipedia by statement type" and "Statements referenced to other sources by statement type". For getting counts of objects per item, does this mean running 19M separate queries, or is there another way? Creating a script to do this would be very similar to the property distribution method that I have already done, I guess. Basically, ask "list all of the items" and then "lapply(items, count labels, statements, links, descriptions)" TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
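The "total minus referenced" arithmetic can be sketched in a few lines. This sketch assumes the Blazegraph fast-range-count (ESTCARD) endpoint returns an XML snippet carrying a `rangeCount` attribute (e.g. `<data rangeCount="123" .../>`); check the actual server response before relying on that shape. The function names are mine:

```python
import re


def parse_estcard(xml_text: str) -> int:
    """Extract the count from an assumed ESTCARD response of the form
    <data rangeCount="..." .../>."""
    match = re.search(r'rangeCount="(\d+)"', xml_text)
    if match is None:
        raise ValueError("no rangeCount attribute found")
    return int(match.group(1))


def unreferenced(total_statements: int, referenced: int) -> int:
    """Statements lacking a prov:wasDerivedFrom triple."""
    return total_statements - referenced


# Using the counts quoted in this thread: 73,244,979 total statements
# (table above) and 41,735,203 referenced statements.
sample = '<data rangeCount="41735203" milliseconds="2"/>'
print(unreferenced(73_244_979, parse_estcard(sample)))
```

Note the two counts were taken at different times, so the difference is only an estimate.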
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. OK. I may have found an answer to the question of wildcard "Prefix Matching" that is necessary in order to query for the number of statements in an item. PREFIX bds: <http://www.bigdata.com/rdf/search#> PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wikibase: <http://wikiba.se/ontology#> SELECT (count(distinct(?s)) AS ?scount) WHERE { wd:Q20903715 ?p wikibase:Item . ?s bds:search "wd:statement*" . } This requires FullTextSearch https://wiki.blazegraph.com/wiki/index.php/FullTextSearch to be enabled (it is not on query.wikidata.org). I will test on labs. TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. Yes. It seems I first need to disable the 10 minute query timeout set here: https://github.com/wikimedia/wikidata-query-rdf/blob/b3e646284f0b74131bce99a1b7d5fc6bfe675ec1/war/src/config/web.xml#L55 A fat query like this: PREFIX wikibase: <http://wikiba.se/ontology#> PREFIX prov: <http://www.w3.org/ns/prov#> SELECT (count(distinct(?wds)) AS ?scount) WHERE { ?wds ?p wikibase:Statement . OPTIONAL { ?wds1 prov:wasDerivedFrom ?o . FILTER (?wds1 = ?wds) . } FILTER (!bound(?wds1)) . } to find out how many statements do not have references is currently not possible. There may be a better way to ask for this, but the way the data is coded does not really facilitate type joins. An important point is that wikidata-todo/stats, and possibly the standing perception of the data, assumes an iterable hierarchy. But RDF does not encode hierarchy. So an Item does not "contain" statements, and statements do not "contain" references. The relationship between statements and references is difficult to query by type, because a binding triple looks like this: wd:statement/Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1 prov:wasDerivedFrom wdref:39f3ce979f9d84a0ebf09abe1702bf22326695e9 Note that simply counting the frequency of http://www.w3.org/ns/prov#wasDerivedFrom and comparing it to the frequency of wikibase:Statement would provide a kind of global ratio that is a fast and easy alternative to counting individual statements without references. I am rebuilding wdm-rdf now with the new Munger and no query timeout. Also, I will load the dump from 17 November, so that the updater has some chance to sync. It had fallen 14 days behind, and I doubt that it would ever have caught up. 
TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117735: Track all Wikidata metrics currently gathered in Graphite rather than SQL and TSVs
Christopher added a subscriber: Christopher. Christopher added a comment. Expanding on the use cases for a metrics storage backend is appropriate here. I think that Wikidata content metrics favor long term retention (i.e. forever) because their purpose is to evaluate dynamics over both short and long time intervals. Since content is always changing, recreating a past state from live data is not possible. The value of these historical measurement "snapshots" is therefore quite high. These old data are also never archived and must be retrievable without loading a dump or using some offline process. In contrast, ops metrics are much more focused on the present and/or recent state. Thus, two different use cases exist here. If the proposal to use Graphite can substantiate a long term (not decaying) storage method, then it should work for both. If not, then something else (like OpenTSDB/HBase) should be implemented. TASK DETAIL https://phabricator.wikimedia.org/T117735 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Christopher Cc: Christopher, Aklapper, StudiesWorld, Addshore, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster
Christopher added a comment. I am not sure why this is considered to be "a simple use case" since, as mentioned in https://phabricator.wikimedia.org/T117735, there are at least two different requirements. Content metrics require long term (non-decaying) storage; operational metrics do not. Whisper (Graphite's database) is not robust and has a fixed size. Even the documentation says it is not "disk space efficient". Of course, if we assume that the need is only to record a small number of data points at a low resolution, none of this matters. The added complexity of introducing backups and HDFS, etc. to the Graphite proposition does not seem "simple". Also, the puppet module would still need to be reconfigured/modified, as @Addshore tried to do, for long term retention, but this does not solve the archiving problem. There has to be a built-in way to preserve and "snapshot" the database, or else it could be a real pain to restore. And, in the interim period from snapshot to restoration, all measurements would be lost, unless it were on a cluster. As far as I know, Cassandra can also run on a single instance; it does not need a cluster. TASK DETAIL https://phabricator.wikimedia.org/T117732 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
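For context, Graphite's retention and decay behavior is configured per metric pattern in storage-schemas.conf. A non-decaying, long-term retention for content metrics might look like the following sketch (the section name and metric pattern are illustrative, not taken from any deployed config):

```ini
# storage-schemas.conf sketch (illustrative names):
# keep one datapoint per day for 25 years, with no coarser decay tier.
[wikidata_content_metrics]
pattern = ^wikidata\.content\..*
retentions = 1d:25y
```

Whisper still preallocates a fixed-size file per metric from this schema, so this addresses decay but not the archiving/snapshot concern raised above.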
[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster
Christopher added a comment. If not HBase, what about Cassandra? This is already puppetized. At least you will be using a storage solution that is designed for HDFS. TASK DETAIL https://phabricator.wikimedia.org/T117732 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster
Christopher added a comment. If you are going to use HDFS, why not just use HBase instead of Graphite? TASK DETAIL https://phabricator.wikimedia.org/T117732 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. No. the blocking task code enables an option to not filter item, statement, value and reference rdf:types in the munger. I decided not to wait for this, so that I could get started, but having it in master is very helpful going forward. In order to have these types on live wdqs, would require a complete rebuild of their data, which takes a long time. The wdm-rdf instance is a clone that includes these types, and should eventually synch up to production (hopefully in another 5 or 6 day ... 24 hours of edits takes approx. 12 hours to process). It is possible to do estimated cardinality queries on live wdqs for the property usage counts and anything else other than these primary types, however. TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Christopher added a comment. Update: All data loaded into Blazegraph (it took over 24 hours). Sync now running and up to 27 October. Using Fast Range Counts returns counts of content objects instantly. Examples: curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql --data-urlencode ESTCARD --data-urlencode 'o=http://wikiba.se/ontology#Item' Number of Items: 18,733,307 curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql --data-urlencode ESTCARD --data-urlencode 'o=http://wikiba.se/ontology#Statement' Number of Statements: 74,709,111 curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql --data-urlencode ESTCARD --data-urlencode 'p=http://www.w3.org/ns/prov#wasDerivedFrom' Number of Predicate wasDerivedFrom: 38,985,221 Trending these kinds of objects should show interesting usage frequency patterns. TASK DETAIL https://phabricator.wikimedia.org/T117234 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
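The three fast-range counts quoted above already support some quick ratios. A small sketch (my arithmetic, not part of the thread; note the wasDerivedFrom triple count only approximates the number of referenced statements, since one statement may carry several references):

```python
# Ratios derived from the ESTCARD counts quoted above.
items = 18_733_307
statements = 74_709_111
derived_from = 38_985_221  # prov:wasDerivedFrom triples

statements_per_item = statements / items
reference_triples_per_statement = derived_from / statements

print(f"{statements_per_item:.3f} statements per item")
print(f"{reference_triples_per_statement:.1%} wasDerivedFrom triples per statement")
```

Trending these ratios over repeated snapshots would show whether reference coverage is improving, without ever running a per-statement query.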
[Wikidata-bugs] [Maniphest] [Block] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher reopened blocking task T117194: Evaluate Spark on YARN as "Open". TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher added a comment. Note: A new task will be created for measuring SPARQL performance for counting tasks in different environments. This has some relationship to Hadoop and Spark potentially, but the first step is to profile Blazegraph with complex counting queries and use this as a benchmark for improvement. TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher added a comment. Can we agree that Graphite is the way forward for the backend and close this task? TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher closed blocking task T117194: Evaluate Spark on YARN as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher closed blocking task T117194: Evaluate Spark on YARN as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher closed blocking task T117195: Develop Wikidata (JSON or RDF) Dump Processing API for use with Spark as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore, Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard
Christopher added a comment. I have observed that the property data does not have a persistent frequency (i.e. some days there are no values reported). It may be better to regularly generate null values for properties that do not report usage. There are two options for the aggregate table: 1. show all properties without the latest value. 2. only show the latest reported properties. I favor option 2. Having a complete list of properties with option 2, though, requires a consistently reported dataset including nulls. This is the patchset for the change: https://gerrit.wikimedia.org/r/#/c/250185/4 TASK DETAIL https://phabricator.wikimedia.org/T116009 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard
Christopher added a comment. See the change here: http://wdm.wmflabs.org/?t=wikidata_daily_getclaims_property_use TASK DETAIL https://phabricator.wikimedia.org/T116009 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist
Christopher added a comment. This is why there is the config.R file. The only path variable that needs to be changed is there. See base_uri <- "/srv/dashboards/shiny-server/wdm/". On Windows this would be C:\whatever\whatever, I guess. TASK DETAIL https://phabricator.wikimedia.org/T116150 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Closed] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist
Christopher closed this task as "Resolved". Christopher set Security to None. TASK DETAIL https://phabricator.wikimedia.org/T116150 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist
Christopher added a subscriber: Christopher. Christopher added a comment. I cannot reproduce this now. I assume that this is fixed. The file is local and in the repo now. https://github.com/wikimedia/wikidata-analytics-dashboard/blob/master/data/wikidata_eng_social_media.tsv TASK DETAIL https://phabricator.wikimedia.org/T116150 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard
Christopher added a subscriber: Christopher. Christopher added a comment. What is the benefit of having all properties on one graph? To me, the simplest approach is to pass a parameter with a single property id from the ordered table link to a chart. Analysing the trend of a single property over time seems valuable, and possible; but because of the wide range of values, I do not think that graphing all properties on one chart is. TASK DETAIL https://phabricator.wikimedia.org/T116009 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Closed] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist
Christopher closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T116150 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: gerritbot, Christopher, Addshore, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T117206: Move KPI section up to dashboard
Christopher added a subscriber: Christopher. Christopher added a comment. Does this mean that you would prefer the KPI tab on the dashboard sidebar to be first in the list? TASK DETAIL https://phabricator.wikimedia.org/T117206 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Abraham, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher added a project: WMDE-Analytics-Engineering. Christopher set Security to None. TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Project Column] T116009: Add graph to getclaimsusage on dashboard
Christopher moved this task to Doing on the WMDE-Analytics-Engineering workboard. TASK DETAIL https://phabricator.wikimedia.org/T116009 WORKBOARD https://phabricator.wikimedia.org/project/board/1585/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Project Column] T113180: Create semantic definitions for Wikidata Metrics
Christopher moved this task to Doing on the WMDE-Analytics-Engineering workboard. TASK DETAIL https://phabricator.wikimedia.org/T113180 WORKBOARD https://phabricator.wikimedia.org/project/board/1585/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: gerritbot, Christopher, Aklapper, JanZerebecki, Deskana, Ricordisamoa, EBernhardson, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T115242: Add Munger option to not filter uninteresting object type triples
Christopher added a blocked task: T117234: Reproduce wikidata-todo data using analytics infrastructure . TASK DETAIL https://phabricator.wikimedia.org/T115242 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, Christopher Cc: JanZerebecki, Aklapper, Christopher, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T117203: [WD] External usage KPI
Christopher added subscribers: Addshore, Christopher. Christopher added a comment. Do you mean this https://searchdata.wmflabs.org/external/ ? This should be retrievable at a short interval from Graphite? @Addshore? The KPI is defined with a "rolling 30 day window". Is this a requirement? A 30-day aggregate might be a super huge number... TASK DETAIL https://phabricator.wikimedia.org/T117203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Christopher, Addshore, Lydia_Pintscher, Abraham, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
Christopher added blocking tasks: T117194: Evaluate Spark on YARN, T117195: Develop Wikidata (JSON or RDF) Dump Processing API for use with Spark. TASK DETAIL https://phabricator.wikimedia.org/T116547 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
Re: [Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
It is possible that a Hadoop architecture could provide the performance and scalability needed for robust statistical analysis of the Wikidata RDF datasets. It is also possible that Jena may have better integration tools with Hadoop than Blazegraph. See https://jena.apache.org/documentation/hadoop/ I do not see a direct relationship, however, between T115242 and performance, other than that the reasoning behind filtering these "boring" objects is based on the perceived negative performance impact of allowing them to be queried from a publicly accessible endpoint. The intent of T115242 is to provide these objects in a dataset to a "nonpublic" query interface for metrics evaluation only. The question that should be asked is whether Blazegraph and the WDQS platform are robust enough for intense stat analysis and, if not, why, and what can be done to improve them? On 26 Oct 2015 10:00, "JanZerebecki" <no-re...@phabricator.wikimedia.org> wrote: > JanZerebecki added a comment. > > @Christopher can as he created https://phabricator.wikimedia.org/T115242. > > > TASK DETAIL > https://phabricator.wikimedia.org/T116547 > > EMAIL PREFERENCES > https://phabricator.wikimedia.org/settings/panel/emailpreferences/ > > To: JanZerebecki > Cc: Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, > Ricordisamoa, Wikidata-bugs, aude > > > > ___ > Wikidata-bugs mailing list > Wikidata-bugs@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs > ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T115120: Wikidata Metrics
Christopher added a project: WMDE-Analytics-Engineering. Christopher set Security to None. TASK DETAIL https://phabricator.wikimedia.org/T115120 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, TempleM, Aklapper, RP88, revi, Luke081515, JGirault, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Gryllida, JanZerebecki ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T113180: Create semantic definitions for Wikidata Metrics
Christopher added a project: WMDE-Analytics-Engineering. Christopher set Security to None. TASK DETAIL https://phabricator.wikimedia.org/T113180 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: gerritbot, Christopher, Aklapper, JanZerebecki, Deskana, Ricordisamoa, EBernhardson, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T108404: [Story] create a Wikidata analytics dashboard
Christopher added a project: WMDE-Analytics-Engineering. TASK DETAIL https://phabricator.wikimedia.org/T108404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Christopher Cc: gerritbot, Addshore, Lydia_Pintscher, EBernhardson, Ricordisamoa, Deskana, JanZerebecki, Aklapper, Wikidata-bugs, aude ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T115120: Wikidata Metrics
Christopher added a comment. @Andrew Is there something else that needs to be said or done in order to make this happen? Currently, the development dashboard is running on the scrumbugz project (http://wdm.wmflabs.org/wdm/), so this will just be reallocated.

Additional note: if the RDF dumps are available on /public/dumps, access to them would be beneficial.

TASK DETAIL
https://phabricator.wikimedia.org/T115120

To: Christopher
Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, TempleM, Aklapper, RP88, revi, Luke081515, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Gryllida, JanZerebecki
[Wikidata-bugs] [Maniphest] [Created] T115242: Add Munger option to not filter uninteresting object type triples
Christopher created this task. Christopher assigned this task to Smalyshev. Christopher added a subscriber: Christopher. Christopher added projects: Wikidata-Query-Service, Wikidata. Christopher moved this task to All WDQS-related tasks on the Wikidata-Query-Service workboard. Herald added a subscriber: Aklapper. Herald added a project: Discovery.

TASK DESCRIPTION
Triples with the object types wikibase:Item, wikibase:Statement, wikibase:Reference, and wikibase:Value are filtered out by the Munger by default. For certain use cases, such as object counting and comparison, it is desirable to retain them. An option to skip this filtering, analogous to the existing removeSiteLinks option, should be added. See T115120.

TASK DETAIL
https://phabricator.wikimedia.org/T115242

WORKBOARD
https://phabricator.wikimedia.org/project/board/891/

To: Smalyshev, Christopher
Cc: Aklapper, Christopher, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, JanZerebecki
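(The actual Munger is Java; as an illustration only, the requested behavior can be sketched in Python with hypothetical names — `munge`, `keep_types`, and the ontology IRIs are assumptions for the sketch, not the real Munger API.)

```python
from typing import Iterable, List, Tuple

# rdf:type objects the munger treats as "uninteresting" and drops by
# default. The wikibase ontology namespace here is an assumption.
UNINTERESTING = {
    "http://wikiba.se/ontology#Item",
    "http://wikiba.se/ontology#Statement",
    "http://wikiba.se/ontology#Reference",
    "http://wikiba.se/ontology#Value",
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]  # (subject, predicate, object)

def munge(triples: Iterable[Triple], keep_types: bool = False) -> List[Triple]:
    """Drop rdf:type triples whose object is an 'uninteresting' wikibase
    type, unless keep_types is set (the option this task requests)."""
    out = []
    for s, p, o in triples:
        if not keep_types and p == RDF_TYPE and o in UNINTERESTING:
            continue
        out.append((s, p, o))
    return out
```

With `keep_types=True` the type triples survive, so queries can count statements, references, and values per entity directly.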
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T115120: Wikidata Metrics
Christopher added a subscriber: Smalyshev. Christopher added a comment. After researching this, I have discovered that the Munger that processes the RDF dump removes several ontology types (wikibase:Item, wikibase:Statement, wikibase:Reference, and wikibase:Value) that are needed for object counting and comparison. See https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/rdf/Munger.java, lines 405, 466, 514, 556. @Smalyshev Is it possible to add an option to keep them? And approximately how much additional space/memory would they use?

TASK DETAIL
https://phabricator.wikimedia.org/T115120

To: Christopher
Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, TempleM, Aklapper, RP88, Revi, Luke081515, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Gryllida, JanZerebecki
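(A rough way to answer the space question yourself: count rdf:type objects in an N-Triples dump before munging. A minimal stdlib-only sketch — the function name and the assumption that each line is a simple `s p o .` triple are mine, not from the task:)

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def count_object_types(ntriples_lines):
    """Count rdf:type objects in N-Triples input, to estimate how many
    triples the munger's type filter would remove."""
    counts = Counter()
    for line in ntriples_lines:
        parts = line.strip().split(None, 2)  # subject, predicate, rest
        if len(parts) == 3 and parts[1] == RDF_TYPE:
            obj = parts[2].rstrip(" .")  # drop the trailing " ."
            counts[obj] += 1
    return counts
```

Summing the counts for the four wikibase types gives the number of extra triples, from which a rough size estimate per triple can be derived for the target store.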