[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata

2019-07-02 Thread Christopher
Christopher added a comment.


  Do you foresee any changes to the context/vocabulary/ontology in the future 
(e.g. implementing processing features of JSON-LD 1.1)?   How will context 
changes be versioned / published?
  
  Could the ontology <http://wikiba.se/ontology-1.0.owl#> not also be 
dereferenceable as a JSON-LD context?  Then you could use @vocab to provide a 
default for the wikibase properties and types (e.g. "@vocab": 
"http://wikiba.se/ontology-1.0.jsonld").
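A minimal sketch of what such a dereferenceable context document could contain. The @vocab namespace below is illustrative only (not a confirmed Wikidata URL), and the sketch just shows that the document round-trips as plain JSON:

```python
import json

# Hypothetical JSON-LD context document for the wikibase ontology,
# using @vocab as a default namespace so unprefixed terms (e.g. "rank")
# expand to http://wikiba.se/ontology#rank. Namespace is an assumption.
context_doc = {
    "@context": {
        "@vocab": "http://wikiba.se/ontology#",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
    }
}

# A context served over HTTP is just JSON; verify it round-trips.
serialized = json.dumps(context_doc)
roundtrip = json.loads(serialized)
print(roundtrip["@context"]["@vocab"])
```

Versioning such a context (the question above) would then amount to publishing new documents at versioned URLs.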

TASK DETAIL
  https://phabricator.wikimedia.org/T207168

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: cscott, Christopher
Cc: Addshore, WMDE-leszek, Pablo-WMDE, dbarratt, abian, _jensen, Christopher, 
Salgo60, daniel, Lydia_Pintscher, Denny, Abraham, AnjaJentzsch, Aklapper, 
intracer, Liuxinyu970226, cscott, PokestarFan, gerritbot, Prtksxna, 
Lucas_Werkmeister_WMDE, Tpt, thiemowmde, Multichill, Eroux108, Realworldobject, 
Smalyshev, Lea_Lacroix_WMDE, darthmon_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, rosalieper, Jonas, Wikidata-bugs, 
aude, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata

2018-10-23 Thread Christopher
Christopher added a comment.
Thanks, I look forward to this being deployed.  JSON-LD will be very useful for Wikidata, particularly framing.  You might want to consider providing the context as a remote link to reduce the payloads (and "noise" in the data).  Here is that test entity, framed on the playground; notice how it merges the statements and references.  (Sorry for the long link ...)
https://json-ld.org/playground-dev/#startTab=tab-framed=https%3A%2F%2Ftest.wikidata.org%2Fwiki%2FSpecial%3AEntityData%2FQ64.jsonld=%7B%22%40context%22%3A%7B%22wdata%22%3A%22https%3A%2F%2Ftest.wikidata.org%2Fwiki%2FSpecial%3AEntityData%2F%22%2C%22schema%22%3A%22http%3A%2F%2Fschema.org%2F%22%2C%22about%22%3A%7B%22%40id%22%3A%22schema%3Aabout%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22wd%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fentity%2F%22%2C%22cc%22%3A%22http%3A%2F%2Fcreativecommons.org%2Fns%23%22%2C%22license%22%3A%7B%22%40id%22%3A%22cc%3Alicense%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22softwareVersion%22%3A%7B%22%40id%22%3A%22schema%3AsoftwareVersion%22%7D%2C%22version%22%3A%7B%22%40id%22%3A%22schema%3Aversion%22%7D%2C%22xsd%22%3A%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%22%2C%22dateModified%22%3A%7B%22%40id%22%3A%22schema%3AdateModified%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22wikibase%22%3A%22http%3A%2F%2Fwikiba.se%2Fontology-beta%23%22%2C%22statements%22%3A%7B%22%40id%22%3A%22wikibase%3Astatements%22%7D%2C%22identifiers%22%3A%7B%22%40id%22%3A%22wikibase%3Aidentifiers%22%7D%2C%22sitelinks%22%3A%7B%22%40id%22%3A%22wikibase%3Asitelinks%22%7D%2C%22rdfs%22%3A%22http%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%22%2C%22label%22%3A%7B%22%40id%22%3A%22rdfs%3Alabel%22%7D%2C%22skos%22%3A%22http%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%22%2C%22prefLabel%22%3A%7B%22%40id%22%3A%22skos%3AprefLabel%22%7D%2C%22name%22%3A%7B%22%40id%22%3A%22schema%3Aname%22%7D%2C%22wdt%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fdirect%2F%22%2C%22P63%22%3A%7B%22%40id%22%3A%22wdt%3AP63%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22P17%22%3A%7B%22%40id%22%3A%22wdt%3AP17%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22p%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2F%22%2C%22wds%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fentity%2Fstatement%2F%22%2C%22p%3AP63%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22rank%22%3A%7B%22%40id%22%3A%22wikibase%3Arank%22%2C%22%40type%22%3A%22%40id%
22%7D%2C%22ps%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fstatement%2F%22%2C%22ps%3AP63%22%3A%7B%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22psv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fstatement%2Fvalue%2F%22%2C%22wdv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fvalue%2F%22%2C%22psv%3AP63%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22quantityAmount%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityAmount%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityUpperBound%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityUpperBound%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityLowerBound%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityLowerBound%22%2C%22%40type%22%3A%22xsd%3Adecimal%22%7D%2C%22quantityUnit%22%3A%7B%22%40id%22%3A%22wikibase%3AquantityUnit%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22pq%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fqualifier%2F%22%2C%22P66%22%3A%7B%22%40id%22%3A%22pq%3AP66%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22pqv%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Fqualifier%2Fvalue%2F%22%2C%22pqv%3AP66%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22timeValue%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeValue%22%2C%22%40type%22%3A%22xsd%3AdateTime%22%7D%2C%22timePrecision%22%3A%7B%22%40id%22%3A%22wikibase%3AtimePrecision%22%7D%2C%22timeTimezone%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeTimezone%22%7D%2C%22timeCalendarModel%22%3A%7B%22%40id%22%3A%22wikibase%3AtimeCalendarModel%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22prov%22%3A%22http%3A%2F%2Fwww.w3.org%2Fns%2Fprov%23%22%2C%22wasDerivedFrom%22%3A%7B%22%40id%22%3A%22prov%3AwasDerivedFrom%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22wdref%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Freference%2F%22%2C%22pr%22%3A%22http%3A%2F%2Ftest.wikidata.org%2Fprop%2Freference%2F%22%2C%22P20%22%3A%7B%22%40id%22%3A%22pr%3AP20%22%7D%2C%22p%3AP17%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22ps%3AP17%22%3A%7B%22%40type%22%3A%22%40id%22%7D%2C%22propertyType%22%3A%7B%22%40id%22%3A%22wikibase%3ApropertyType%22%2C%22%40typ
e%22%3A%22%40id%22%7D%2C%22directClaim%22%3A%7B%22%40id%22%3A%22wikibase%3AdirectClaim%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22claim%22%3A%7B%22%40id%22%3A%22wikibase%3Aclaim%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22statementProperty%22%3A%7B%22%40id%22%3A%22wikibase%3AstatementProperty%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22statementValue%22%3A%7B%22%40id%22%3A%22wikibase%3AstatementValue%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22qualifier%22%3A%7B%22%40id%22%3A%22wikibase%3Aqualifier%22%2C%22%40type%22%3A%22%40id%22%7D%2C%22qualifierValue%22%3A%7B%22%40id%22%3A%22wikibase%3Aqualifi

[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata

2018-10-23 Thread Christopher
Christopher added a comment.
According to the mailing list (Wikidata Digest, Vol 83, Issue 18), this is now enabled on beta.  Yet when one requests the link https://wikidata.beta.wmflabs.org/wiki/Special:EntityData/Q64.jsonld, it does not work?


Re: [Wikidata] How to find the Dbpedia data for a Wikidata

2018-04-30 Thread Christopher Johnson
Hi Scott,

One way to do that would be to get the language code label list from WDQS
with this query (http://tinyurl.com/y9p7q9l2):

SELECT ?label WHERE {
  ?s wdt:P424 ?code ;
     rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
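For reference, a sketch (untested against the live service) of how that query could be issued as a GET request; the endpoint is the public WDQS one and the parameter names follow the standard SPARQL protocol:

```python
from urllib.parse import urlencode

# Build a WDQS request URL for the label-list query above.
# "format=json" asks for SPARQL JSON results; adjust if the service differs.
query = '''SELECT ?label WHERE {
  ?s wdt:P424 ?code ;
     rdfs:label ?label .
  FILTER (lang(?label) = "en")
}'''

url = "https://query.wikidata.org/sparql?" + urlencode(
    {"query": query, "format": "json"}
)
print(url)
```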

and then stream the list to LDF client [1] requests
(https://tinyurl.com/ycc3dyce):

ldf-client https://query.wikidata.org/bigdata/ldf \
  http://fragments.dbpedia.org/2015-10/en \
  "SELECT * WHERE { ?s rdfs:label \"" + language + "\"@en . ?s owl:sameAs ?link }"

The results would be in JSON from the client.  It should give a relatively
complete list of the DBpedia resources corresponding to Wikidata entities
that have a language code.
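The "stream the list" step might look like this small Python sketch; the labels are hardcoded here in place of the actual WDQS result, and the query template mirrors the ldf-client invocation above:

```python
# Generate one TPF query per language label fetched from WDQS.
# Each query would then be passed to the LDF client as its argument.
labels = ["German", "French"]  # placeholder for the real WDQS result list

template = ('SELECT * WHERE {{ ?s rdfs:label "{label}"@en . '
            '?s owl:sameAs ?link }}')

queries = [template.format(label=l) for l in labels]
print(queries[0])
```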

Also, a simple way to get a DBpedia resource with TPF is via the entity
label, which is one of the properties that is the same in both datasets.
So,
SELECT * WHERE {
?s rdfs:label "German"@en  .
}
will return the matching DBpedia and Wikidata resources for that label.
This could perhaps also be done with a federated query in WDQS (untested).

Christopher Johnson

[1] https://github.com/LinkedDataFragments/Client.js


> Message: 3
> Date: Sun, 29 Apr 2018 17:48:10 -0700
> From: Scott MacLeod <worlduniversityandsch...@gmail.com>
> To: Discussion list for the Wikidata project
> <wikidata@lists.wikimedia.org>
> Subject: Re: [Wikidata] How to find the Dbpedia data for a Wikidata
> item?
> Message-ID:
> <CADy6Cs8pVNqEQu909Y64DXFOq7SBJs4M3stvzPD7GQ3AMZNBkw@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Paris Writers' News/PWN, Markus, and Wikidatans,
>
> Based on your example (http://tinyurl.com/yahwql2n), Markus, I'm seeking
> to
> learn how to do a similar query for all languages.
>
> In Wikidata I found a Q item # for "language" - Q34770 (
> https://www.wikidata.org/wiki/Q34770) - and plugged this into your query,
> replaced the word "countries" with "languages," etc. but didn't get a
> result, where your query yields 209 countries, Markus.
>
> In a parallel way, how would one compute them from the names of the
> articles in Wikipedia?
>
> Thanks,
> Scott
>
>
>
>
>
>
>
> On Fri, Apr 27, 2018 at 2:31 PM, Markus Kroetzsch <
> markus.kroetz...@tu-dresden.de> wrote:
>
> > Hi,
> >
> > (English) DBpedia URIs are basically just (English) Wikipedia URIs with
> > the first part exchanged. So one can compute them from the names of the
> > articles. Example: a query for DBpedia URIs for all countries:
> >
> > http://tinyurl.com/yahwql2n
> >
> > """
> > SELECT ?dbpediaId
> > WHERE
> > {
> >   ?item wdt:P31 wd:Q6256 . # for the example: get IDs for all countries
> >   ?sitelink schema:about ?item ;
> > schema:isPartOf <https://en.wikipedia.org/> .
> >
> > BIND(URI(CONCAT("http://dbpedia.org/resource/",SUBSTR(STR(?sitelink),31)))
> > as ?dbpediaId)
> > }
> > """
> >
> > Of course, depending on your use case, you can do the same offline
> > (without requiring SPARQL to rewrite the id strings for you).
> >
> > In theory, one could use federation to pull in data from the DBpedia
> > endpoint, but in practice I could not find an interesting query that
> > completes within the timeout (but I did not try for very long to debug
> > this).
> >
> > Best regards,
> >
> > Markus
> >
> >
> >
> >
> > On 23/04/18 06:41, PWN wrote:
> >
> >> If one knows the Q code (or URI) for an entity on Wikidata, how can one
> >> find the Dbpedia Id and the information linked to it?
> >> Thank you.
> >>
> >> Sent from my iPad
> >> ___
> >> Wikidata mailing list
> >> Wikidata@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
> >>
> >>
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> >
>
>
> --
>
> --
> - Scott MacLeod - Founder & President
> - World University and School
> - http://worlduniversityandschool.org
>
> - 415 480 4577
> - http://scottmacleod.com
>
>
> - CC World University and School - like CC Wikipedia with best STEM-centric
> CC OpenCourseWare - incorporated as a nonprofit university and school in
> California, and is a U.S. 501 (c) (3) tax-exempt educational organization.
>
>

Re: [Wikidata] How to split a label by whitespace in WDQS ?

2017-09-19 Thread Christopher Johnson
Hi Thad,

"Assignment" can be done with CONSTRUCT, so something like this would work
to split a name into two parts:

PREFIX ex: <http://example.org#>
CONSTRUCT {
  ?human ex:hasFirstName ?first .
  ?human ex:hasSecondName ?second .
} WHERE {
  ?human wdt:P31 wd:Q5 ; rdfs:label ?label .
  BIND (STRBEFORE(?label, " ") AS ?first) .
  BIND (STRAFTER(?label, " ") AS ?second) .
  FILTER (lang(?label) = "en") .
}
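As a quick local sanity check, the STRBEFORE/STRAFTER pair on the first space behaves like Python's str.partition (the sample name is just an example):

```python
# STRBEFORE(?label, " ") / STRAFTER(?label, " ") split on the *first*
# space, exactly as str.partition does.
label = "Ada Lovelace"
first, _, second = label.partition(" ")
print(first, second)  # → Ada Lovelace
```

Note this means a three-part label like "Jean Paul Sartre" yields "Jean" and "Paul Sartre", in both SPARQL and the sketch.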

Christopher Johnson
Scientific Associate
Universitätsbibliothek Leipzig

On 19 September 2017 at 14:00, <wikidata-requ...@lists.wikimedia.org> wrote:

> Send Wikidata mailing list submissions to
> wikidata@lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> or, via email, send a message with subject or body 'help' to
> wikidata-requ...@lists.wikimedia.org
>
> You can reach the person managing the list at
> wikidata-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikidata digest..."
>
>
> Today's Topics:
>
>1. Weekly Summary #278 (Léa Lacroix)
>2. How to split a label by whitespace in WDQS ? (Thad Guidry)
>3. Re: How to split a label by whitespace in WDQS ? (Marco Neumann)
>4. Re: How to split a label by whitespace in WDQS ?
>   (Nicolas VIGNERON)
>5. Re: How to split a label by whitespace in WDQS ?
>   (Lucas Werkmeister)
>6. Re: How to split a label by whitespace in WDQS ? (Thad Guidry)
>7. Categories in RDF/WDQS (Stas Malyshev)
>
>
> --
>
> Message: 1
> Date: Mon, 18 Sep 2017 17:36:38 +0200
> From: Léa Lacroix <lea.lacr...@wikimedia.de>
> To: "Discussion list for the Wikidata project."
> <wikidata@lists.wikimedia.org>
> Subject: [Wikidata] Weekly Summary #278
> Message-ID:
>  1798em4...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> *Here's your quick overview of what has been happening around Wikidata over
> the last week.*Events
> <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events>/
> Press/Blogs
> <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage>
>
>- Upcoming: Wikidata Wahldaten Workshop 2017
><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_
> Wahldaten_Workshop_2017>
>– 30 September 2017 in Vienna, Austria
>- Upcoming: Wikimedia Research Showcase
><https://meta.wikimedia.org/wiki/Wikimedia_Research/
> Showcase#September_2017>
>- Past: Wikidata workshop in Zurich
><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_Zurich> (the
>slides of the speakers are linked on the page)
>- Past: GLAMhack Wikidata workshop in Lausanne (see the slides of the
> Query
>Service introduction
><https://docs.google.com/presentation/d/1hwUBbtP0TppAKrEpjtSjdOXePZ_
> 7OIRNDWsAHzVk0NA/edit#slide=id.g1f4d0124c0_0_0>
>)
>- Past: Wikidata workshop in Kolkata
><https://www.wikidata.org/wiki/Wikidata:Events/Wikidata_
> workshop_Kolkata_2017>,
>India
>- Bridging real and fictional worlds
><https://medium.com/wiki-playtime/bridging-real-and-
> fictional-worlds-1af32ee65a26>
>in Wikidata, by Martin Poulter
>- Weekend at the Museum (of Brittany)
><https://www.lehir.net/weekend-at-the-museum-of-brittany/>, by Envel Le
>Hir <https://www.wikidata.org/wiki/User:Envlh>
>- Wiki Loves Monuments und Wikidata
><http://archivalia.hypotheses.org/67371>, by SW
>- The French Connection at the Wikimania 2017 Hackathon
><https://www.lehir.net/the-french-connection-at-the-
> wikimania-2017-hackathon/>,
>by Envel Le Hir <https://www.wikidata.org/wiki/User:Envlh>
>
> Other Noteworthy Stuff
>
>- Wikidata ontology explorer
><https://lucaswerkmeister.github.io/wikidata-ontology-explorer/>:
>creates a tree of a class or property, shows common properties and
>statements
>- Join the mysterious group of Wikidata:Flashmob
><https://www.wikidata.org/wiki/Wikidata:Flashmob> who improve labels,
> or
>summon them on an item
>- A breaking change to the *wbcheckconstraints* API output format was
>announced
><https://www.wikidata.org/wiki/Wikidata:Project_chat#BREAKING_CHANGE:_
> wbcheckconstraints_API_output_format>
>- Q4000 <https://www.wikidata.org/wiki/Q4000> was created
>- Improvements coming soon to Recent Changes
><h

Re: [Wikidata] Wikidata Digest, Vol 70, Issue 11

2017-09-11 Thread Christopher Johnson
Hi Amir,

The idea that I think you are trying to render reminds me of a query I
wrote some time ago that uses the number of sitelinks (which basically
equates to the number of different-language Wikipedia articles for a given
Wikidata concept) to make a ranked list, providing a rudimentary metric
for linguistic "coverage" for a type.

For example, this one ranks instances of Q571 (a book).
prefix schema: <http://schema.org/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) AS ?linkcount)
WHERE {
  ?s wdt:P31 wd:Q571 .
  ?sitelink schema:about ?s .
  ?s wdt:P50 ?author .
  OPTIONAL {
    ?s rdfs:label ?desc FILTER (lang(?desc) = "en") .
  }
  OPTIONAL {
    ?author rdfs:label ?authorlabel FILTER (lang(?authorlabel) = "en") .
  }
} GROUP BY ?s ?desc ?authorlabel ORDER BY DESC(?linkcount)
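What the COUNT(DISTINCT ?sitelink) ... GROUP BY ?s part computes can be sketched locally with mock data (the sitelink/entity pairs below are placeholders, not real Wikidata data):

```python
from collections import Counter

# Tally sitelinks per entity from mock (sitelink, entity) pairs,
# then rank entities by link count -- the "linguistic coverage" metric.
mock_triples = [
    ("enwiki/A", "Q1"), ("dewiki/A", "Q1"), ("frwiki/A", "Q1"),
    ("enwiki/B", "Q2"),
]

linkcount = Counter(entity for _sitelink, entity in mock_triples)
ranked = linkcount.most_common()  # highest linkcount first
print(ranked)
```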

Hope it helps,
Christopher

On 11 September 2017 at 14:00, <wikidata-requ...@lists.wikimedia.org> wrote:

> Send Wikidata mailing list submissions to
> wikidata@lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/wikidata
> or, via email, send a message with subject or body 'help' to
> wikidata-requ...@lists.wikimedia.org
>
> You can reach the person managing the list at
> wikidata-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikidata digest..."
>
>
> Today's Topics:
>
>1. missing/existing Wikipedia articles by number of speakers
>   (Amir E. Aharoni)
>2. Re: [Wikimediaindia-l] New portal on Wikidata:
>   Wikidata:WikiProject India (Abhijeet Safai)
>3. Re: missing/existing Wikipedia articles by number of  speakers
>   (Reem Al-Kashif)
>4. Re: missing/existing Wikipedia articles by number of  speakers
>   (Gerard Meijssen)
>
>
> --
>
> Message: 1
> Date: Sun, 10 Sep 2017 15:43:40 +0300
> From: "Amir E. Aharoni" <amir.ahar...@mail.huji.ac.il>
> To: "Discussion list for the Wikidata project."
> <wikidata@lists.wikimedia.org>
> Subject: [Wikidata] missing/existing Wikipedia articles by number of
> speakers
> Message-ID:
> <CACtNa8sEhyz66jm9XOzM5SAYzb3Nd2tDiqQcnxn5SxmRx91r_g@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> Is there an existing tool that shows whether a Wikipedia article exists or
> doesn't exist in a list of languages sorted by the number of speakers?
>
> For example, I'd give this tool an article name, and it would show me a
> list similar to the one at the English Wikipedia article [[List of
> languages by total number of speakers]], and indicating whether the article
> exists or not in each language.
>
> If there is no such tool, I guess I could write something in SPARQL, but
> I'd have to learn SPARQL first, so I'm trying to ask here :)
>
> Thanks!
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
> -- next part --
> An HTML attachment was scrubbed...
> URL: <https://lists.wikimedia.org/pipermail/wikidata/
> attachments/20170910/2c37931c/attachment-0001.html>
>
> --
>
> Message: 2
> Date: Sat, 9 Sep 2017 10:32:26 +0530
> From: Abhijeet Safai <abhijeet.sa...@gmail.com>
> To: Wikimedia India Community list
> <wikimediaindi...@lists.wikimedia.org>
> Cc: "Discussion list for the Wikidata project."
> <wikidata@lists.wikimedia.org>
> Subject: Re: [Wikidata] [Wikimediaindia-l] New portal on Wikidata:
> Wikidata:WikiProject India
> Message-ID:
> <CAAwPGk1+U41F0rkog7pNUc1X=X6HSq6dkQuWw-HkQ7ym_370-w@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> "This WikiProject is to coordinate the efforts to create, enhance and
> populate the coverage of topics related to India including her history,
> geography, culture, society, people, infrastructure, education,
> demographics and anything related between India and other fields such as
> science, technology, arts, entertainment etc."
>
> Excellent! I am extremely happy to see it. I do not know how much I will be
> able to help, but I will try to help as per my time and abilities.
>
> --
> Dr. Abhijeet Safai
>
> On Sat, Sep 9, 2017 at 3:46 AM, ViswaPrabha (വിശ്വപ്രഭ) <

[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-05-20 Thread Christopher
Christopher added a comment.
I can add here that in fcrepo4, with PR #1187, they have changed to not use RFC5785 for representing skolemized bnodes.  Instead, a new fragment URI convention has been implemented: internally minted UUIDs are appended to the resource subject as a fragment (a hash URI identifier) rather than creating a new resource node.  This convention actually makes more sense than RFC5785 for statements and references, I suspect.  Graph serializations would then "naturally" entail these identifier bnodes in a single resource/entity context, which facilitates round-tripping and other downstream-from-RDF operations, like JSON-LD framing.

TASK DETAIL
  https://phabricator.wikimedia.org/T155891


[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-02-16 Thread Christopher
Christopher added a comment.
The fact remains that the claim without its entity relationship, represented in the GUID by the Q prefix, would be lost into a vacuum of nothing.  And really, the concatenation of an entity ID with its statement UUID (with the expectation that a parser understands the $ as a delimiter) is a rather questionable convention.  I am not clear on why the MW API should constrain RDF serialization; they are separate implementations.  Is there a convenient "round trip" import-from-RDF mechanism available in the API?  If not, who cares what the MW API expects.

The basic problem is with the "claim" design.  It seems to me that Statement GUIDs are actually unnecessary overhead because the subject of a claim is always the item/entity.  There is really no need to mint a GUID subject for the claim.  If you needed to have a separate statement node, it may have been better to do something like this:

<> wikibase:hasClaim _:b1 .
_:b1 wdt:someprop "somevalue" .

A bnode is always first an object of a <> resource.


[Wikidata-bugs] [Maniphest] [Commented On] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-24 Thread Christopher
Christopher added a comment.
Statement IDs should definitely be represented as bnodes (internally) and skolem IRIs externally because they are uniquely defined within an entity node representation.  They have no meaning outside of the entity.

The typing semantics of Wikibase values are very obscure and entirely too complex for most normal reuse implementations of the data.  If values are intended to be "shared between items" by an external consumer, then they should be represented as another entity type, and optimally their URIs should be dereferenceable.  However, we know that this is not the case, so my personal "impression" of these things is already wrong.

Similarly confusing is the muddled reference implementation.  My use case simply needs the references to be presented in the context of the statement that gives the reference its meaning.  In my estimation, a reference is just a statement about a statement in the context of one item, so I do not see how or why a reference can be "shared between items".  Note that if the reference statement itself were semantically equal to another in the same item, it should simply be a bnode!


[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher edited the task description. (Show Details)
EDIT DETAILS...{F5323364}75}

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]...


[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher edited the task description. (Show Details)
EDIT DETAILS...{F5323350}64}

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]...


[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher edited the task description. (Show Details)
EDIT DETAILS...to produce the intended output attached.

 {F5323223} 350}

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]
[[ https://tools.ietf.org/html/rfc5785 | RFC5785 ]]


[Wikidata-bugs] [Maniphest] [Edited] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher edited the task description. (Show Details)
EDIT DETAILS...to produce the intended output attached.

 {F5323223} 

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]...


[Wikidata-bugs] [Maniphest] [Created] T155891: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher created this task.  Christopher added projects: Wikidata-Query-Service, Wikibase-DataModel-Serialization.  Herald added a subscriber: Aklapper.  Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
Note: this relates more to my localized use of Wikibase RDF serialization than to the Wikidata Query Service directly, though it may also be relevant to the WDQS.

It is my opinion that the RDF representation of statement and reference URIs should conform to an IETF standard (RFC5785) so that other tools (JSON-LD processors, for example) can recognize them as Skolem IRIs (i.e. uniquely minted identifiers).

One possible scenario is that a JSON-LD consumer may want to frame an entity, and this would require it to make the statements and references into bnodes so that their values can be formatted as sets or lists.  Having these types of URIs in the .well-known namespace simplifies the parsing task.
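A sketch of the proposed minting convention, assuming an illustrative base URI (not a real Wikidata endpoint); the /.well-known/genid/ path is the one RDF 1.1 suggests for skolemizing blank nodes:

```python
import uuid

# Replace a blank node with a Skolem IRI under /.well-known/genid/,
# so consumers can recognize it as a minted identifier rather than a
# globally meaningful resource. Base URI is an assumption for this sketch.
def skolemize(base="http://wikidata.example/"):
    return base + ".well-known/genid/" + uuid.uuid4().hex

iri = skolemize()
print(iri)
```

A JSON-LD framer can then match on the /.well-known/genid/ prefix and convert such IRIs back to bnodes before framing.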

This seems relatively trivial to do.  I have already made the experimental changes in my instance, which touch these files: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/RdfVocabulary.php
and 
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/FullStatementRdfBuilder.php

to produce the intended output attached. F5323223: well-knownUUID-statements and references.n-triples

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]]
[[ https://tools.ietf.org/html/rfc5785 | RFC5785 ]]


[Wikidata-bugs] [Maniphest] [Created] T155890: Represent Statement and Reference URIs as Skolem IRIs consistent with RFC5785

2017-01-21 Thread Christopher
Christopher created this task.  Christopher added projects: Wikidata-Query-Service, Wikibase-DataModel-Serialization.  Herald added a subscriber: Aklapper.  Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
Note: this relates more to my localized use of Wikibase RDF serialization than to the Wikidata Query Service directly, though it may also be relevant to the WDQS.

It is my opinion that the RDF representation of statement and reference URIs should conform to an IETF standard (RFC5785) so that other tools (JSON-LD processors, for example) can recognize them as Skolem IRIs (i.e. uniquely minted identifiers).

One possible scenario is that a JSON-LD consumer may want to frame an entity, and this would require it to make the statements and references into bnodes so that their values can be formatted as sets or lists.  Having these types of URIs in the .well-known namespace simplifies the parsing task.

This seems relatively trivial to do.  I have already made the experimental changes in my instance, which touch these files: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/RdfVocabulary.php
and 
https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes/Rdf/FullStatementRdfBuilder.php

to produce the intended output attached. F5323223: well-knownUUID-statements and references.n-triples

[[ https://www.w3.org/2011/rdf-wg/wiki/Skolemisation | Skolemization ]] (RFC5785)

TASK DETAIL
  https://phabricator.wikimedia.org/T155890

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, EBjune, mschwarzer, merbst, Avner, debt, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Created] T131960: "_" character encoded as %20 in Wikidata URI RDF serialization

2016-04-06 Thread Christopher
Christopher created this task.
Christopher moved this task to Need investigation on the Wikidata-Query-Service 
workboard.
Herald added a subscriber: Aklapper.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
  Wikipedia and Commons URIs do not match their RDF representation in Wikidata 
if there is an underscore.
  
For example, even though the rewrite rules of Wikipedia translate spaces to 
the underscore form of the URI, the canonical URI for a Wikipedia article has 
the underscore.  This underscore form of the URI is what should be represented 
in the RDF.
  
  dbpedia uses the foaf:isPrimaryTopicOf property for Wikipedia sitelink and 
their URI form contains the underscores.  This essentially breaks federation 
between dbpedia resources and Wikidata entities using the sitelink as the 
primary key (if the Wikidata sitelink has a %20).
  
  Examples:
  A query (http://tinyurl.com/gntg9wx) for a sitelink for the entity Q3032 with 
the article URI https://de.wikipedia.org/wiki/Darwin_Harbour
  
  returns:
  https://de.wikipedia.org/wiki/Darwin%20Harbour
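The mismatch is easy to reproduce with a standard URL-encoding routine: a percent-encoder turns the space into %20, while the canonical article URI uses the underscore. A sketch with Python's standard library:

```python
from urllib.parse import quote

title = "Darwin Harbour"
percent_encoded = quote(title)        # what the RDF currently contains
canonical = title.replace(" ", "_")   # what the Wikipedia URI actually uses
print(percent_encoded, canonical)
```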

TASK DETAIL
  https://phabricator.wikimedia.org/T131960

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Avner, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T131235: wikibase:GlobecoordinateValue decimal representation not in lexical form in WDQS.

2016-04-03 Thread Christopher
Christopher added a comment.


  The PRETTY_PRINT setting of the TurtleWriter is set to "true" by default.  
This causes the writer to write only the literal's label, without the datatype.  
This affects boolean, decimal, integer and double literals.
  
  To fix make the following change (starting at line 623) in Munge.java:
  
final RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, lastWriter);
final WriterConfig config = writer.getWriterConfig();
// Disable pretty printing so typed literals keep their explicit datatypes.
config.set(BasicWriterSettings.PRETTY_PRINT, false);
handler = new PrefixRecordingRdfHandler(writer, prefixes);
  
  Other default config settings are:
  
config.set(BasicWriterSettings.RDF_LANGSTRING_TO_LANG_LITERAL, true);
config.set(BasicWriterSettings.XSD_STRING_TO_PLAIN_LITERAL, true);

TASK DETAIL
  https://phabricator.wikimedia.org/T131235

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Avner, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Created] T131235: wikibase:GlobecoordinateValue decimal representation not in lexical form in WDQS.

2016-03-30 Thread Christopher
Christopher created this task.
Herald added a subscriber: Aklapper.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
  It seems that using shorthand rather than a lexical form for decimal 
coordinates breaks (xsd schema) validation of the munged/split wikibase turtle 
dumps.  Example:
  
wdv:d0a7604c8ae9777857887ac4f1807286 a wikibase:GlobecoordinateValue ;
wikibase:geoLatitude 30.12684 ;
wikibase:geoLongitude 120.25657 ;
a wikibase:GeoAutoPrecision ;
wikibase:geoPrecision 0.00028 ;
wikibase:geoGlobe wd:Q2 .
  
  This is a problem for loading this data into Virtuoso, and possibly other 
triple stores.  The geodata decimals are serialized in lexical form if 
requested directly from wikibase, however.
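For comparison, the expected serialization writes each decimal in full lexical form with an explicit datatype. A minimal sketch of producing such a literal (the helper name is illustrative):

```python
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def typed_decimal(value: str) -> str:
    # Full lexical form: quoted value plus an explicit datatype IRI,
    # e.g. "30.12684"^^<http://www.w3.org/2001/XMLSchema#decimal>
    return f'"{value}"^^<{XSD_DECIMAL}>'

literal = typed_decimal("30.12684")
print(literal)
```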

TASK DETAIL
  https://phabricator.wikimedia.org/T131235

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle

2016-03-28 Thread Christopher
Christopher added a comment.


  I have worked around the counting problem.  The experimental TPF Server is 
here:
  http://orbeon-bb.wmflabs.org/
  
  This wikidata datasource uses SPARQL interface at 
http://query.wikidata.org/sparql
  
  I think that this issue can be closed.

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle

2016-03-24 Thread Christopher
Christopher added a comment.


  It seems that with a CONSTRUCT query, sending an Accept: text/turtle header works.
  
  http://wdm-rdf.wmflabs.org/short/NyJpTCnpl
  
  this is actually all that is required to get a linked data fragment from the 
SPARQL interface.
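A sketch of such a request using Python's standard library (the endpoint and query shape follow the comment above; no request is actually sent here):

```python
from urllib.parse import quote
from urllib.request import Request

query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"
req = Request(
    "https://query.wikidata.org/sparql?query=" + quote(query),
    headers={"Accept": "text/turtle"},  # ask the endpoint for Turtle
)
print(req.full_url)
```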
  
  The problem with TPF access to the WDQS SPARQL interface is that the very 
simple query (required for the metadataCallback)
  
SELECT (COUNT(*) AS ?count) WHERE {?s ?p ?o}
  
  cannot return turtle.  Also, there seems to be a minor difference with the 
COUNT implementation in OpenVirtuoso, that allows a count to be unbound like:
  
SELECT COUNT(*) WHERE {?s ?p ?o}

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T130799: provide sparql results as text/turtle

2016-03-23 Thread Christopher
Christopher added a comment.


  The node.js version of the TPF server is actually why I created this issue.
  
  My concept of the fragment server was that it could decentralize a big 
dataset by distributing data fragments to it with selectors, 
<http://www.hydra-cg.com/spec/latest/linked-data-fragments/#selectors> so I am 
not exactly sure how implementing a Blazegraph TPFServer is in the overall WDQS 
scope.  It could help perhaps for decentralizing your data internally, or for 
caching named graphs, though it would probably not by itself facilitate using 
WDQS as a SPARQL datasource for external TPF implementations.

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Reopened] T130799: provide sparql results as text/turtle

2016-03-23 Thread Christopher
Christopher reopened this task as "Open".

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Closed] T130799: provide sparql results as text/turtle

2016-03-23 Thread Christopher
Christopher closed this task as "Invalid".

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Created] T130799: provide sparql results as text/turtle

2016-03-23 Thread Christopher
Christopher created this task.
Christopher moved this task to Blazegraph on the Wikidata-Query-Service 
workboard.
Herald added a subscriber: Aklapper.
Herald added projects: Wikidata, Discovery.

TASK DESCRIPTION
  OpenVirtuoso (DBpedia) can do this.
  
  There is no Maven artifact similar to sesame-queryresultio-sparqlxml for 
Turtle, so this seems not possible without developing a new package for Sesame.  
There is, however, org.openrdf.rio.turtle.TurtleWriter.
  
  This would be useful to have for generating contextual subsets (fragments) of 
"loadable data" for consumers.

TASK DETAIL
  https://phabricator.wikimedia.org/T130799

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Aklapper, Christopher, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-21 Thread Christopher
Christopher added a comment.


  Coincidentally, it seems that there are people who know a lot more about this 
than I do that have debated this issue at length in a long and very informative 
thread:
  CRS specification (was: Re: ISA Core Location Vocabulary) 
<https://lists.w3.org/Archives/Public/public-locadd/2014Jan/.html>
  
  It is clearly more involved than just using "proper" software libraries and 
"handling requirements".  The conflicting point that I see from your side is 
that introducing unneeded complexity is bad.  And, also, that this was the best 
practical alternative available for handling "garbage data".  Sure, I agree 
with that, but oversimplification of a problem is worse.  I personally feel 
that the "grunt approach" of using regex in SPARQL to filter URIs from literals 
in a result set is not clean, and also quite costly.
  
  The alternative that introduces subproperties for geometry values is 
definitely more complex and as indicated in the thread:
  
You need OWL 2 to formally define a complex class and then say that 
geometry consists of exactly two parts, one that contains the coordinate
sequence and another on that contain the CRS. You cannot make such 
statements using RDFS or OWL.
  
  To assert that everything that needs to be said about a geometry value can be 
put into a standard RDFS string literal is obviously not true.  The irregular 
form of geo:wktLiteral is a kind of  "convenience method" that seems to work 
for most use cases, but definitely not for all, and I really doubt that it is 
functionally sustainable for complex geodata.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: daniel, Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-21 Thread Christopher
Christopher added a comment.


  @Smalyshev so, by stating that geometry and CRS are different, you then 
concur with the main arguments referenced above that they should not be 
conflated in a simple literal.  @Daniel I agree with the idea of specifying the 
CRS as an additional component of the GlobeCoordinate data value separately 
from the geometry.
  
  I do not agree that a Wikidata entity can be inferred to be a CRS without it 
providing or pointing to a serialization that can be validated against a known 
CRS encoding (e.g. gml:GeodeticCRS).  Stating that a CRS is an instance of a 
"geodetic reference system" is only a concept pointer, and does not provide the 
syntax of a CRS schema which is necessary for a software to understand the 
meaning of a geometry.
  
  In summary, these are the reasons why a CRS should not be represented as a 
URI in a simple WKT literal string (that contains point geometry).
  
  1. geometry and its CRS are just two separate things
  2. it becomes much harder to use the CRS as a filter in a SPARQL query
  3. it is not possible to assign multiple CRS specifications to a geometry
  4. the domain of a CRS specification should not be limited to a single 
geometry
  5. The CRS is a URI, so it should be published as one
  6. It is not possible to assign a CRS to a collection of geometries (e.g. a 
dataset)
  7. Software libraries that handle WKT geometry do not expect a CRS as start 
of the string
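The parsing burden in point 7 can be seen in a short sketch: any consumer of the current output has to split the literal back apart before handing the geometry to a WKT library (`split_wkt` is a hypothetical helper, not WDQS code):

```python
import re

def split_wkt(literal: str):
    # Separate an optional leading CRS IRI ("<...>") from the geometry
    # so the geometry can be passed to a standard WKT parser.
    m = re.match(r"\s*<([^>]+)>\s*(.*)", literal)
    if m:
        return m.group(1), m.group(2)
    return None, literal.strip()

crs, geom = split_wkt("<http://www.wikidata.org/entity/Q111> Point(18.4 226)")
print(crs, geom)
```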
  
  The current use of simple geo:wktLiteral for WGS84 points is fine, but if a 
Wikidata goal is to introduce more complex GIS spatial data (which I think 
would be very worthwhile), then the implementation should adhere to justifiable 
and reasonable standards for the data representation.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: daniel, Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-21 Thread Christopher
Christopher added a comment.


  Please see geoSPARQL CRS design is debatable 
<https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems#GeoSPARQL> 
from the W3C Coordinate Reference System website.
  
  Also, #7 here: the conflation of CRS with the WKT in a literal has many 
undesirable effects 
<https://lists.w3.org/Archives/Public/public-locadd/2013Dec/0052.html>
  
  From the 12-063r5 document:
  
The WKT representation of coordinate reference systems as defined in ISO 
19125-1:2004 and OGC specification 01-009 is inconsistent with the terminology 
and 
technical provisions of ISO 19111:2007 and OGC Abstract Specification topic 
2 (08-015r2), “Geographic information – Spatial referencing by coordinates”.
  
  Is this clear?  They are admitting that the previous design form of WKT  is 
**inconsistent** with other specifications.   While it says nothing directly 
about the geoSPARQL specification, from page 4 of that 2012 design spec, WKT is 
defined through direct reference to the //deprecated standard//:
  
"as it is specified in Well Known Text (as defined by Simple Features or 
ISO 19125)" 
  
  What is outlined in this new specification is a de facto "WKT string form".  
And this form should accommodate all of the semantics of the geo:wktLiteral 
string format, including geometries (that are not explicitly mentioned in the 
new specification).   Admittedly, this is quite a difficult thing to sort out, 
and there is definitely politics and big money at work in the standards 
process. If you want something more concrete to build on, maybe I should ask 
them for a concrete geoSPARQL "best practice guideline".
  
  Finally, I do not feel that it is accurate to use a Wikidata entity (e.g. 
Mars) as a CRS by the design definition of geo:wktLiteral and then not properly 
specify it.  It is probably better to just omit it and use a different data 
type for non-earth coordinates than to provide an entity concept URI as a CRS 
identifier.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Updated] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-20 Thread Christopher
Christopher added a comment.


  @Smalyshev have you tried to read the updated WKT CRS specification 
http://docs.opengeospatial.org/is/12-063r5/12-063r5.html yet?  From what I can 
interpret, they have now deprecated the 2012 "non-ISO compliant" concatenation 
of a URI form of CRS and geometry.
  
  Instead, the CRS string semantics are specified in section "WKT string form" 
which takes the format KEYWORD1[attribute1,KEYWORD2[attribute2,attribute3]]. 
 Note the use of an UPPERCASE keyword.
  
  So, a WDQS non-earth coordinate may be specified like:  "IMAGECRS["crs 
name"], POINT[18.4 226]"^^ogc:wktLiteral.  Also, this variation of the 
wktLiteral datatype could be in a different namespace from the geo:wktLiteral 
and could then be easily filtered by the map parser.
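Because the keyword is a fixed uppercase token at the start of the string, filtering on it is a trivial pattern match. A sketch of how a map parser might recognise the proposed form (the function name is illustrative):

```python
import re

def crs_keyword(wkt_crs: str):
    # Return the leading UPPERCASE keyword of a WKT-CRS "string form",
    # e.g. IMAGECRS["crs name"] -> "IMAGECRS"; None if there is none.
    m = re.match(r"\s*([A-Z][A-Z0-9]*)\[", wkt_crs)
    return m.group(1) if m else None

print(crs_keyword('IMAGECRS["crs name"]'))
```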
  
  I suggest possibly considering this to fix 
https://phabricator.wikimedia.org/T130428 and other globe variant issues.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





Re: [Wikidata-tech] Wikidata-tech Digest, Vol 35, Issue 6

2016-03-19 Thread Christopher Johnson
If the page redirect titles exist in Wikipedia, they are valid in Wikidata
as data, regardless of what they represent in *your view* of "quality".  If
cleanup needs to be done, it should be done in the context of the source
first.  Evaluating the value of a specific "alias" to a Wikidata item is a
judgment that should be based entirely on a *referenceable* data source.

Wikidata aliases (as well as descriptions and preferred labels) are
completely arbitrary and unreferenced, and in my judgment worthless,
without a primary source or clearly defined semantic relationship. The
judgmental curation of Wikidata is in fact, not that useful.  Wikidata
should simply seek to represent data *as it exists* (errors or not) in the
primary source.

Furthermore, apparently you do not get why skos:hiddenLabel exists.  Why
you feel that it is not worthwhile is not relevant to its primary function,
which is to facilitate searching. (see
https://www.w3.org/2012/09/odrl/semantic/draft/doco/skos_hiddenLabel.html)

 And, it is not difficult to argue that the searching in Wikidata could use
improvement.

On 16 March 2016 at 13:00, <wikidata-tech-requ...@lists.wikimedia.org>
wrote:

> Today's Topics:
>
>1. Re: Wikipedia Page Redirect Titles in Wikidata (Lydia Pintscher)
>
>
> --
>
> Message: 1
> Date: Tue, 15 Mar 2016 16:49:40 +
> From: Lydia Pintscher <lydia.pintsc...@wikimedia.de>
> To: wikidata-tech@lists.wikimedia.org
> Subject: Re: [Wikidata-tech] Wikipedia Page Redirect Titles in
> Wikidata
> Message-ID:
> <
> cabfqugjj3hadoaa+oi6wkot9zr6hbcnq9w40ztxtxz9-+vh...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sat, Mar 12, 2016 at 2:14 PM Christopher Johnson <
> christopher.john...@wikimedia.de> wrote:
>
> > Hi,
> >
> > I am developing a scientific terms thesaurus and have discovered that
> > existing Wikipedia "page redirect titles" provide a useful way to resolve
> > an odd or archaic form to a "canonical" term label as it is represented
> by
> > the Wikipedia page title (aka Wikidata "sitelink").  For example,
> >
> >
> https://en.wikipedia.org//w/api.php?action=query=xml=redirects=universe
> >
> > In Wikidata, these "page redirect titles" are not represented in the data
> > model except very inconsistently and sparsely as skos:altLabel or
> > ("alias").  My use case is that I would like to be able to query Wikidata
> > for these page redirect titles in order to resolve odd multi-linguistic
> > names to a single concept.
> >
> > My question is that if I were to create a bot that imported all "page
> > redirect titles" for a given sitelink and created them with the
> > skos:altLabel property en masse, is this a valid semantic relationship?
> > Or, should it rather be represented as ?sitelink owl:sameAs <page
> > redirect URI>?  Or both?
> >
> > Furthermore, in some cases (e.g. mis-spellings), skos:hiddenLabel may be
> > more appropriate, but this has no definition in the data model.  There
> > potentially would be a lot of clutter in the UI without a hiddenLabel
> alias
> > property.  Also, there are no types for page redirects in Wikipedia,
> afaik.
> >
> > Additional value for the searching in the Wikidata UI could probably be
> > obtained from indexing these alternate page titles as well.
> >
>
> There are several points to address:
> 1) Should redirects from Wikipedia be imported as aliases on Wikidata? No.
> This has been done before and created a massive amount of cleanup work
> because the redirects contained a lot of not meaningful misspellings and
> more. Please do not import them to Wikidata without approval through the
> bot approval process and clear quality control.
> 2) Should we allow more fine-grained distinction between real aliases and
> misspellings in the UI and datamodel? No. I don't believe this is worth the
> complexity and resulting discussions/edit wars and more.
>
>
> Cheers
> Lyd

[Wikidata-tech] Wikipedia Page Redirect Titles in Wikidata

2016-03-12 Thread Christopher Johnson
Hi,

I am developing a scientific terms thesaurus and have discovered that
existing Wikipedia "page redirect titles" provide a useful way to resolve
an odd or archaic form to a "canonical" term label as it is represented by
the Wikipedia page title (aka Wikidata "sitelink").  For example,
https://en.wikipedia.org//w/api.php?action=query=xml=redirects=universe

In Wikidata, these "page redirect titles" are not represented in the data
model except very inconsistently and sparsely as skos:altLabel or
("alias").  My use case is that I would like to be able to query Wikidata
for these page redirect titles in order to resolve odd multi-linguistic
names to a single concept.

My question is that if I were to create a bot that imported all "page
redirect titles" for a given sitelink and created them with the
skos:altLabel property en masse, is this a valid semantic relationship?
Or, should it rather be represented as ?sitelink owl:sameAs <page redirect URI>?  Or both?

Furthermore, in some cases (e.g. mis-spellings), skos:hiddenLabel may be
more appropriate, but this has no definition in the data model.  There
potentially would be a lot of clutter in the UI without a hiddenLabel alias
property.  Also, there are no types for page redirects in Wikipedia, afaik.

Additional value for the searching in the Wikidata UI could probably be
obtained from indexing these alternate page titles as well.
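For reference, one way to fetch redirect titles is the `prop=redirects` module of the MediaWiki `action=query` API; the parameter names below are taken from the public API and may need adjusting for a given wiki:

```python
from urllib.parse import urlencode

params = {
    "action": "query",
    "prop": "redirects",   # list pages that redirect to the given titles
    "titles": "Universe",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```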

Regards,
Christopher Johnson
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-07 Thread Christopher
Christopher added a comment.


  Eh, 
http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral is 
an RDFS Datatype so the semantics are defined by the RDF schema, right?  But, I 
found this http://docs.opengeospatial.org/is/12-063r5/12-063r5.html that 
demonstrates that the WKT CRS extends far beyond RDF.   I suspect that the 
implementation of wktLiteral is bound to RDFS, regardless of the "rich 
semantics" of WKT.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-07 Thread Christopher
Christopher added a comment.


  Thanks for the clarification.  However, Req 10 of the geoSPARQL 
specification seems to be at odds with the definition of a "literal value".  
(According to https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal).  
The way that I read this specification is that a literal is either a URI or a 
string, but not both.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-07 Thread Christopher
Christopher added a comment.


  Intentional or not, it is wrong.  Why is it necessary?  The problem is that 
it breaks parsing of geosparql literals.  For example, if I ask for instances 
of volcanoes, I have to make exceptions for weird non-Earth coordinates.

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Created] T129072: wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates

2016-03-07 Thread Christopher
Christopher created this task.
Christopher moved this task to All WDQS-related tasks on the 
Wikidata-Query-Service workboard.
Herald added a subscriber: Aklapper.
Herald added a project: Discovery.

TASK DESCRIPTION
  See http://tinyurl.com/grkd7qw for an example query that returns the 
coordinates for Olympus Mons, a Martian volcano.
  
  Raw Result
  
{
  "head" : {
"vars" : [ "o" ]
  },
  "results" : {
"bindings" : [ {
  "o" : {
"datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral;,
"type" : "literal",
"value" : "<http://www.wikidata.org/entity/Q111> Point(18.4 226)"
  }
} ]
  }
}

TASK DETAIL
  https://phabricator.wikimedia.org/T129072

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries

2016-02-17 Thread Christopher
Christopher added a comment.

I may be wrong, but the headers that are returned from a request to the nginx 
server wdqs1002 say that varnish 1.1 is already being used there.  And, for 
whatever reason,** it misses**, because repeating the same query gives the same 
response time.  For example, this one returns in 25180–26966 ms.

  
http://query.wikidata.org/sparql?query=PREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0APREFIX+v%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0APREFIX+q%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fqualifier%2F%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0ASELECT+%3FcountryLabel+(COUNT(DISTINCT+%3Fchild)+AS+%3Fnumber)%0AWHERE+%7B%0A++%3Fchild+wdt%3AP106%2Fwdt%3AP279*+wd%3AQ855091+.++%0A++%3Fchild+wdt%3AP27+%3Fcountry+.%0A++SERVICE+wikibase%3Alabel+%7B%0Abd%3AserviceParam+wikibase%3Alanguage+%22en%22+.%0A%3Fcountry+rdfs%3Alabel+%3FcountryLabel%0A++%7D+%0A++%0A%7D+GROUP+BY+%3FcountryLabel+ORDER+BY+DESC(%3Fnumber)

Even though Varnish cache **should work** to proxy nginx for optimizing 
delivery of static query results, it lacks several important features of an 
object broker.  Namely, client control of object expiration (TTL) and retrieval 
of "named query results" from persistent storage.   A WDQS service use case may 
in fact be to compare results from several days ago with current results.   
Thus, assuming the latest results state is what the client wants may actually 
not be true.

Possibly, the optimal solution would use the varnish-api-engine 
(http://info.varnish-software.com/blog/introducing-varnish-api-engine) in 
conjunction with a WDQS REST API (provided with a modified RESTBase?).   Is the 
varnish-api-engine being used anywhere in WMF?  Also, delegating query requests 
to an API could allow POSTs.  Simply with Varnish cache, the POST problem would 
remain unresolved.


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: BBlack, GWicke, Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, 
Jonas, Christopher, Yurik, hoo, Aklapper, aude, debt, Gehel, Izno, Luke081515, 
jkroll, Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331, Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries

2016-02-16 Thread Christopher
Christopher added a comment.

I perceive the use of Varnish as not directly related to how an object broker 
could manage this use case (expensive querying of the wdqs nano sparql api), 
though it is probably related to any UI elements (i.e. the query editor or 
results renderer) that may generally be connected to the query service.

If a REST solution (like RESTBase) is used, a client request could either GET 
the results from cache with an ID or trigger a query event webhook that 
forwards (and stores) the response from the nanosparql server directly via a 
callback.  The basic API design could be something like GET /query/:owner/:qid 
or /query/hooks/:owner/:qid, where the first case would just return the results 
from a db cache and the second would trigger a callback that returns (and 
stores) a payload from the nanosparql server.

A typical use case for this is a static query that returns dynamic results 
updated on a regular frequency (e.g. daily) from a single client.  The payload 
event handler for the sparql server callback could also be controlled based on 
client quota and retention policies.
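
The two endpoint shapes sketched above could be dispatched like this (a hypothetical sketch in Python; route names and return tags are illustrative, not part of any existing API):

```python
import re
from typing import Optional, Tuple

# Hypothetical route patterns for the two endpoint shapes:
#   GET /query/:owner/:qid        -> return stored results from the db cache
#   GET /query/hooks/:owner/:qid  -> trigger the sparql-server callback
_CACHE_ROUTE = re.compile(r"^/query/(?!hooks/)(?P<owner>[^/]+)/(?P<qid>[^/]+)$")
_HOOK_ROUTE = re.compile(r"^/query/hooks/(?P<owner>[^/]+)/(?P<qid>[^/]+)$")


def dispatch(path: str) -> Optional[Tuple[str, str, str]]:
    """Classify a request path as a cache read or a webhook trigger."""
    m = _HOOK_ROUTE.match(path)
    if m:
        return ("trigger_hook", m.group("owner"), m.group("qid"))
    m = _CACHE_ROUTE.match(path)
    if m:
        return ("read_cache", m.group("owner"), m.group("qid"))
    return None
```

The `read_cache` branch would be answered straight from storage; the `trigger_hook` branch would forward to the nanosparql server, store the payload subject to quota and retention policy, and return it via the callback.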


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: BBlack, GWicke, Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, 
Jonas, Christopher, Yurik, hoo, Aklapper, aude, debt, Gehel, Izno, Luke081515, 
jkroll, Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331, Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata sparql queries for Graphs

2016-02-15 Thread Christopher
Christopher added a comment.

@smalyshev I completely agree with the concept of an intermediate service 
between the nanosparqlserver and the client.  I think that this service should 
"broker" requests (based on an options configuration object), and eval whether 
a query is re-executed against the BG db or the results could be returned from 
the "cache", i.e. an "offline" "response only" db.

I have been looking at Huginn https://github.com/cantino/huginn recently.  This 
is an application that delegates tasks to agents.   This (or a similar app) may 
be suitable for MW extension usage just by using agents or webhooks instead of 
inline queries.


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, Jonas, Christopher, 
Yurik, hoo, Aklapper, aude, debt, Gehel, Izno, Luke081515, jkroll, 
Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331, Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: Caching for results of wikidata sparql queries for Graphs

2016-02-12 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

question:  why is this task limited in scope to the Graph extension?


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Yurik, hoo, Aklapper, aude, Izno, Wikidata-bugs, Mbch331, 
Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations

2016-02-08 Thread Christopher
Christopher added a comment.

@smalyshev no, I think that this specific issue has been practically resolved.


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, 
StudiesWorld, debt, Gehel, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, 
Deskana, Manybubbles, JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T122848: Kill wdm.wmflabs.org

2016-01-18 Thread Christopher
Christopher added a comment.

done.


TASK DETAIL
  https://phabricator.wikimedia.org/T122848

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Abraham, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T122848: Kill wdm.wmflabs.org

2016-01-18 Thread Christopher
Christopher closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T122848

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Abraham, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T115996: [Task] Use package manager

2015-12-31 Thread Christopher
Christopher added a comment.

I have actively started working on this.  You can see the work here:  
https://github.com/christopher-johnson/wdqs-gui

Since using node requires a lot of refactoring and code style changes, I am 
interested in developing the GUI as a separate dev branch or package.  If and 
when it meets with general approval, it can be merged into 
production.  I am using Gulp for the live build tasks and everything is 
installed with npm.

It also now runs completely independently of Blazegraph as a stand-alone app.


TASK DETAIL
  https://phabricator.wikimedia.org/T115996

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Smalyshev, JanZerebecki, StudiesWorld, Aklapper, Jonas, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T115996: [Task] Use package manager

2015-12-28 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

Question:  Why is the GUI not a completely independent project / repo / build 
/ deployment from WDQS?

One reason not to require a full Maven build for every GUI patch can be seen 
here: https://integration.wikimedia.org/ci/job/wikidata-query-rdf/777/console.  
The CI failed because of a network problem.

Using npm is a really good idea, but perhaps the first step is to just split 
the front end out from the main Blazegraph package.


TASK DETAIL
  https://phabricator.wikimedia.org/T115996

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Smalyshev, JanZerebecki, StudiesWorld, Aklapper, Jonas, 
jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


Re: [Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Obviously, a main aspect of the data presented in the todo stats is
"referenced statements".  (even though the chart labels there are wrong).
Whether or not this query maps directly to todo is actually not the key
issue.  Clearly, measuring data quality requires that the arity of
statement-to-reference relationships is quantified.  Right?

This assumption is based on Wikipedia's policy of maintaining a NPOV.  And,
unfortunately, all unreferenced statements contain a "bias" that makes the
data theoretically worthless, even though they may in fact be "correct".
On 8 Dec 2015 1:52 pm, "Addshore" <no-re...@phabricator.wikimedia.org>
wrote:

> Addshore added a comment.
>
> Okay, I'm struggling to see which part of the todo stats this is covering
>
>
> TASK DETAIL
>   https://phabricator.wikimedia.org/T117234
>
> EMAIL PREFERENCES
>   https://phabricator.wikimedia.org/settings/panel/emailpreferences/
>
> To: Christopher, Addshore
> Cc: Wikidata-bugs, Lydia_Pintscher, StudiesWorld, Addshore, Christopher,
> Aklapper, aude, Mbch331
>
>
>
> ___
> Wikidata-bugs mailing list
> Wikidata-bugs@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
>
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


Re: [Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Since P143 is primarily a "reference type" property, it should be used when
the reference node is the subject (with a few exceptions apparently). The
query only evaluates the arity of the reference nodes as objects.  So, the
results for P143 are expected.
On 8 Dec 2015 1:09 pm, "Addshore" <no-re...@phabricator.wikimedia.org>
wrote:

> Addshore added a comment.
>
> I am still confused, Running this for
> https://phabricator.wikimedia.org/P143 gives the following:
>
>   nrefs count
>   0 920
>   1 8
>
>
> TASK DETAIL
>   https://phabricator.wikimedia.org/T117234
>
> EMAIL PREFERENCES
>   https://phabricator.wikimedia.org/settings/panel/emailpreferences/
>
> To: Christopher, Addshore
> Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper,
> Wikidata-bugs, aude, Mbch331
>
>
>
> ___
> Wikidata-bugs mailing list
> Wikidata-bugs@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
>
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-04 Thread Christopher
Christopher added a comment.

@Addshore Some progress was made on this in 
https://phabricator.wikimedia.org/T120166.  The only "practical" way to get the 
statement and reference metrics is to facet the data by property.  It is just 
not possible to run counting queries against the whole database and get any 
reasonable response time.

This means that any large domain or range metric counts should iterate over all 
1800+ properties with separate SPARQL calls and then aggregate the numbers.  We 
can do this for the statement -> reference arity with:

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX wd: <http://www.wikidata.org/entity/> 
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  prefix p: <http://www.wikidata.org/prop/>
  
  SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
{
  SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
  WHERE {
  ?item p:$property ?wds .
  OPTIONAL {?wds prov:wasDerivedFrom ?ref } .
  } GROUP BY ?wds
}
  } GROUP BY ?nrefs 
  ORDER BY ?nrefs

Would you do this in PHP?  If you want to handle this, just let me know, 
otherwise we could reuse the bulk SPARQL scripts that I have already done in R.

In addition to tracking aggregates, it would also be useful to show all 
property counts in a table like I did for here 
http://wdm.wmflabs.org/?t=wikidata_property_usage_count.
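
The iterate-and-aggregate approach described above could be scripted in any language; a minimal Python sketch (the query template mirrors the SPARQL above, with $property as the per-property slot; function names are hypothetical):

```python
from collections import Counter
from string import Template
from typing import Dict, Iterable, List, Tuple

# Per-property query template; $property is substituted with e.g. "P227".
QUERY_TEMPLATE = Template("""SELECT ?nrefs (COUNT(?wds) AS ?count) WHERE {
  { SELECT ?wds (COUNT(DISTINCT(?ref)) AS ?nrefs)
    WHERE { ?item p:$property ?wds .
            OPTIONAL { ?wds prov:wasDerivedFrom ?ref } .
    } GROUP BY ?wds }
} GROUP BY ?nrefs ORDER BY ?nrefs""")


def build_query(property_id: str) -> str:
    """Render the SPARQL for one property (one call per property)."""
    return QUERY_TEMPLATE.substitute(property=property_id)


def aggregate(per_property_rows: Iterable[List[Tuple[int, int]]]) -> Dict[int, int]:
    """Merge the (nrefs, count) rows from every per-property call into one
    site-wide histogram of references-per-statement."""
    totals: Counter = Counter()
    for rows in per_property_rows:
        for nrefs, count in rows:
            totals[nrefs] += count
    return dict(totals)
```

Each `build_query` result would be POSTed to the SPARQL endpoint in turn; the per-property histograms are then summed, and the un-aggregated rows can feed a per-property table like the one linked above.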


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-04 Thread Christopher
Christopher added a comment.

I think that you may have missed the point.  I added the $property variable in 
the above query to indicate that this has to be run for **every** property.  
p:P227 is a random example.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-03 Thread Christopher
Christopher added a comment.

So basically a clever adaptation of what I suggested in 
https://phabricator.wikimedia.org/T119775 to get statements referenced to the 
Wikipedias.  It works, but seems a very hacky approach around the core problem 
of not having a way to ask how many references a statement has.

So, just so I am clear on this, a statement to reference triple is always 
unique in the dataset?  I was under the assumption that a singular reference 
statement could potentially be duplicated with different hashes, which is why 
distinct would need to be enforced on the subject.  In theory, there should 
also be metadata on the reference that identifies it as "the latest" version, 
and previous revisions should not simply be replaced.  This is another issue, I 
guess.

Imho, there are clear problems with the reference implementation that should be 
addressed and not just worked around, which is why I created 
https://phabricator.wikimedia.org/T120166 to start.   Is the objective here 
just to produce some numbers or to improve the quality of the data?


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T120166: Semantically define arity of statement -> reference relations

2015-12-03 Thread Christopher
Christopher added a comment.

Quick edit:  I ran this query successfully in 13 min, 11 sec, 476 ms, returning 
312,068 results giving the arity of GND 
(P227) property statements.  So it is 
possible, but really, really slow.

  prefix wikibase: <http://wikiba.se/ontology#>
  prefix wdt: <http://www.wikidata.org/prop/direct/>
  prefix prov: <http://www.w3.org/ns/prov#>
  prefix wd: <http://www.wikidata.org/entity/>
  prefix p: <http://www.wikidata.org/prop/>
  
  SELECT ?wds (count(distinct(?o)) AS ?ocount) WHERE {
?s p:P227 ?wds .
?wds a wikibase:Statement
OPTIONAL {
?wds prov:wasDerivedFrom ?o
} 
  } GROUP BY ?wds 


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations

2015-12-03 Thread Christopher
Christopher added a comment.

@Jheald Thank you for your suggestions.  What is fairly clear in my research is 
that counting-type queries over large (or undefined) ranges with an unbound 
domain are just not possible (without huge resource consumption) when the 
namespace contains millions and millions of triples.  For example, the

  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (COUNT(DISTINCT ?stmt) AS ?count) WHERE {  
 ?stmt prov:wasDerivedFrom ?ref 
  }

will not work, even with no query timeout.  I have tried it on 
http://wdm-rdf.wmflabs.org and it uses all of the 8GB heap spaces and crashes 
Blazegraph.  Of course, there are ways to use SPARQL to post-process/filter 
manageable result sets (in memory) as you suggest, but this seems not possible 
for the 800M+ triples in wdq.

By introducing an "arity class property" (like "hasNullReference"), the 
evaluation on **all** data can be achieved with minimal processing overhead 
because the query range is a boolean value and not a variable like "all 
references".


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, StudiesWorld, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T120166: Semantically define arity of statement -> reference relations

2015-12-03 Thread Christopher
Christopher added a comment.

@Jheald Perfect.  This works; even with the OPTIONAL added, it runs in 10 seconds.  
Yea, definitely outputting the statements is unnecessary and adds a lot of time.

  Total results: 5, duration: 10445 ms
  nrefs count
  0 39775
  1 339700
  2 10050
  3 382
  4 14

Conclusion:  Faceting the namespace by property (and avoiding unnecessary 
output processing) is a practical way to get this data.  Thanks again.


TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Smalyshev, Jheald, daniel, Lydia_Pintscher, Aklapper, Christopher, 
StudiesWorld, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Created] T120166: Semantically define arity of statement -> reference relations

2015-12-02 Thread Christopher
Christopher created this task.
Christopher added a subscriber: Christopher.
Christopher added projects: Wikidata, Wikidata-Query-Service, 
Wikibase-DataModel.
Herald added subscribers: StudiesWorld, Aklapper.
Herald added a project: Discovery.

TASK DESCRIPTION
  This is a data model and RDF serialization problem.  
  
  The primary use case is for measuring and evaluating "unreferenced 
statements", a nullary relationship that dominates the data set.  (See T117234)
  
  Since there are no attributes/properties in the data model/ontology to 
represent the arity of statement to reference relationships, querying for this 
property is not currently possible with SPARQL.  
  
  See http://www.w3.org/TR/swbp-n-aryRelations/ for recommendations on 
implementation.

TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Aklapper, Christopher, StudiesWorld, jkroll, Smalyshev, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-02 Thread Christopher
Christopher added a blocking task: T120166: Semantically define arity of 
statement -> reference relations.

TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T120166: Semantically define arity of statement -> reference relations

2015-12-02 Thread Christopher
Christopher added a blocked task: T117234: Reproduce wikidata-todo/stats data 
using analytics infrastructure  .

TASK DETAIL
  https://phabricator.wikimedia.org/T120166

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Aklapper, Christopher, StudiesWorld, jkroll, Smalyshev, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, JeroenDeDauw, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-02 Thread Christopher
Christopher added a comment.

The only way to get a count of statements with references in the current 
model/format is like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (count(distinct(?s)) AS ?scount) WHERE {
?s prov:wasDerivedFrom ?wdref .  
  }  

This query is super slow!  In fact, it has crashed Blazegraph because, with an 
unlimited query timeout, it uses all of the 8GB allocated heap space.

Since a single statement can have multiple references, just counting 
prov:wasDerivedFrom using estimated cardinality only returns a count of all 
references.

I asked the experts in the mailing list how we can address this reference query 
problem, and no one has responded with anything useful yet.   This is an issue 
that could be handled in the Wikibase RDF serialization with any number of 
different solutions.  In addition to the idea of introducing a null reference 
object, another possibility would be to create a new attribute like 
wikibase:hasReference with a boolean datatype constraint.  I will create a new 
ticket for this issue I guess.
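
Both serialization fixes floated above (a null reference object, or a boolean attribute like wikibase:hasReference) reduce the "unreferenced statements" question to counting triples with a constant object, which a triple store can answer cheaply from cardinality estimates.  A toy Python sketch of the null-object variant (the sentinel name and tuple shapes are hypothetical):

```python
from typing import Iterable, Iterator, Tuple

NULL_REF = "wikibase:nullRef"  # hypothetical sentinel object

Statement = Tuple[str, Tuple[str, ...]]  # (statement id, reference ids)
Triple = Tuple[str, str, str]


def annotate(statements: Iterable[Statement]) -> Iterator[Triple]:
    """Emit prov:wasDerivedFrom triples, substituting the null-reference
    sentinel for statements that have no references at all."""
    for stmt, refs in statements:
        if refs:
            for ref in refs:
                yield (stmt, "prov:wasDerivedFrom", ref)
        else:
            yield (stmt, "prov:wasDerivedFrom", NULL_REF)


def count_unreferenced(triples: Iterable[Triple]) -> int:
    # A fixed-object scan: the shape that is cheap for a triple store,
    # unlike counting DISTINCT subjects over a variable object.
    return sum(1 for _, _, o in triples if o == NULL_REF)
```

The point of the sketch is the query shape: with the sentinel materialized, "how many statements are unreferenced" becomes a lookup on one constant object instead of a DISTINCT count over every reference node.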


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Project] [Updated] Wikidata-Query-Service

2015-11-30 Thread Christopher
Christopher added a member: Christopher.

PROJECT DETAIL
  https://phabricator.wikimedia.org/project/profile/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher, Gage, ksmith, Jdouglas, DanielFriesen, hoo, Addshore, Tpt, 
JeroenDeDauw, Joe, Eloquence, aude, Tobi_WMDE_SW, Wikidata-bugs, daniel, 
MaxSem, jkroll, JanZerebecki, Smalyshev, Manybubbles, GWicke



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5

2015-11-29 Thread Christopher Johnson
The statement to reference relation problem also relates to the topic of
Metadata Reification, which, from what I can gather, is not really addressed in
the current WDQS RDF approach.

In Blazegraph, this could be supported by Quads or RDR (Reification Done
Right).
See http://arxiv.org/pdf/1406.3399.pdf ,
https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right

One possible approach using triples for the use case could be to assign a
blank node to a reference placeholder and introduce the valid range class
for prov:wasDerivedFrom (prov:entity) with the canonical reference UUID
like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom _:refhash .

_:refhash
a prov:entity, wikibase:Reference, wdref:referenceUUID ;
pr:P7 "Some data" ;
pr:P8 "1976-01-12T00:00:00Z"^^xsd:dateTime ;
prv:P8 wdv:b74072c03a5ced412a336ff213d69ef1 .

Introducing an owl:minCardinality on prov:wasDerivedFrom would mean that if
there were no refhash for a statement, then a null object (similar to wdno)
would identify "unreferenced statements" like this:

wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom
wikibase:nullRef .

There are a lot of ways to deal with this issue, I guess.  But, it seems to me
that having a simple programmatic method to validate statement integrity
(as supported or unsupported claims) is very important to substantiating
the utility of Wikidata for the academic community.


On 28 November 2015 at 11:20, Christopher Johnson <
christopher.john...@wikimedia.de> wrote:

> Thank you for the explanation.  The content negotiation for an Item IRI is
> clear.  Any request for  http://www.wikidata.org/entity/Q... requires an
> Accept application/rdf+xml header in order to get the RDF.  The default
> response is JSON and Accept text/html returns a 200 response delivering the
> UI page.
>
> For statement resolution in the Item RDF, is not this a fragment?  So in
> the Item context, the IRI for a statement resource would be
> http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the
> statement IRI http://www.wikidata.org/entity/statement/Statement_UUID
> could just return the statement as a separate entity.
>
> On the topic of references, a use case is to measure data quality by
> counting the number of "unreferenced statements".  At
> https://phabricator.wikimedia.org/T117234#1834728, I propose the
> possibility of using blank reference nodes to identify these "bad"
> statements.  Having an object to count greatly expedites the query process
> because of the estimated cardinality feature of Blazegraph.  The only
> alternative to this is to count distinct statements with the
> prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may
> not be possible without a huge amount of memory).
>
> I do not know what would be involved in implementing blank reference nodes
> and what performance consequences may also occur. It seems to me that the
> pairing of statements and references is a core feature of the data model,
> and it is odd that there can exist statements that have no associated
> reference node in the RDF.
>
> Cheers,
> Christopher
>
> On 27 November 2015 at 13:00, <wikidata-tech-requ...@lists.wikimedia.org>
> wrote:
>
>> Send Wikidata-tech mailing list submissions to
>> wikidata-tech@lists.wikimedia.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
>> or, via email, send a message with subject or body 'help' to
>> wikidata-tech-requ...@lists.wikimedia.org
>>
>> You can reach the person managing the list at
>> wikidata-tech-ow...@lists.wikimedia.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Wikidata-tech digest..."
>>
>>
>> Today's Topics:
>>
>>1. RDF Item, Statement and Reference IRI Resolution?
>>   (Christopher Johnson)
>>2. Re: RDF Item, Statement and Reference IRI Resolution?
>>   (Markus Krötzsch)
>>
>>
>> --
>>
>> Message: 1
>> Date: Fri, 27 Nov 2015 07:21:10 +0100
>> From: Christopher Johnson <christopher.john...@wikimedia.de>
>> To: wikidata-tech@lists.wikimedia.org,  wikimedia-de-tech
>> <wikimedia-de-t...@wikimedia.de>
>> Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
>> Resolution?
>> Message-ID:
>> <CACzuuKvGK1dM1+dn4ypocjhO=
>> psuk4lltwngzp1yfvp6wmv...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> 

[Wikidata-bugs] [Maniphest] [Commented On] T119775: Create WDQS service for snak / reference hashes

2015-11-28 Thread Christopher
Christopher added a comment.

You can get reference hashes for objects using the 
http://www.wikidata.org/prop/reference/ predicate.

For example,

  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (count(distinct(?s)) AS ?scount) WHERE {
?wds wdt:P31 wd:Q10876391 . 
?wdref <http://www.wikidata.org/prop/reference/P143> ?wds . 
?s prov:wasDerivedFrom ?wdref .
  }

This returns a count of 16,266,065 references to all the Wikipedias (from 
http://wdm-rdf.wmflabs.org)


TASK DETAIL
  https://phabricator.wikimedia.org/T119775

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Smalyshev, Aklapper, Addshore, StudiesWorld, jkroll, 
Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


Re: [Wikidata-tech] Wikidata-tech Digest, Vol 31, Issue 5

2015-11-28 Thread Christopher Johnson
Thank you for the explanation.  The content negotiation for an Item IRI is
clear.  Any request for  http://www.wikidata.org/entity/Q... requires an
Accept application/rdf+xml header in order to get the RDF.  The default
response is JSON and Accept text/html returns a 200 response delivering the
UI page.

For statement resolution in the Item RDF, is not this a fragment?  So in
the Item context, the IRI for a statement resource would be
http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the
statement IRI http://www.wikidata.org/entity/statement/Statement_UUID could
just return the statement as a separate entity.

On the topic of references, a use case is to measure data quality by
counting the number of "unreferenced statements".  At
https://phabricator.wikimedia.org/T117234#1834728, I propose the
possibility of using blank reference nodes to identify these "bad"
statements.  Having an object to count greatly expedites the query process
because of the estimated cardinality feature of Blazegraph.  The only
alternative to this is to count distinct statements with the
prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may
not be possible without a huge amount of memory).

I do not know what would be involved in implementing blank reference nodes
and what performance consequences may also occur. It seems to me that the
pairing of statements and references is a core feature of the data model,
and it is odd that there can exist statements that have no associated
reference node in the RDF.

Cheers,
Christopher

On 27 November 2015 at 13:00, <wikidata-tech-requ...@lists.wikimedia.org>
wrote:

> Today's Topics:
>
>1. RDF Item, Statement and Reference IRI Resolution?
>   (Christopher Johnson)
>2. Re: RDF Item, Statement and Reference IRI Resolution?
>   (Markus Krötzsch)
>
>
> ------
>
> Message: 1
> Date: Fri, 27 Nov 2015 07:21:10 +0100
> From: Christopher Johnson <christopher.john...@wikimedia.de>
> To: wikidata-tech@lists.wikimedia.org,  wikimedia-de-tech
> <wikimedia-de-t...@wikimedia.de>
> Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
> Resolution?
> Message-ID:
> <CACzuuKvGK1dM1+dn4ypocjhO=
> psuk4lltwngzp1yfvp6wmv...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> After looking at the RDF format closely, I am asking if the item, statement
> and reference IRIs could/should be directly resolvable to XML/JSON
> formatted resources.
>
> It seems that currently http://www.wikidata.org/entity/ redirects to
> the UI at https://www.wikidata.org/wiki/ which is not what a machine
> reader
> would expect.
> Without a simple method to resolve the IRIs (perhaps a RESTful API?), these
> RDF data objects are opaque for parsers.
>
> Of course, with wbgetclaims, it is possible to get the statement like this:
>
> https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>
> but the API expected GUID format does not match the RDF UUID representation
> (there is a $ or "%24" after the item instead of a -) and it returns both
> the statement and the references.
>
> Since the reference is its own node in the RDF,  it can be queried
> independently.  For example, to ask "return all of the statements where
> reference R is bound."  But then, the return value is a list of statement
> IDs and a subquery or separate query is then required to return the
> associated statement node.
>
> I am also wondering why item, statement and reference "UUIDs" are not in
> canonical format in the RDF.  This is a question of compliance with IETF
> guidelines, which may or may not be relevant.
>
> Item: Q20913766
> Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
> Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
>
> See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
> See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
> and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
> guidelines.
>

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T119775: Create WDQS service for snak / reference hashes

2015-11-27 Thread Christopher
Christopher added a subscriber: Christopher.

TASK DETAIL
  https://phabricator.wikimedia.org/T119775

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Smalyshev, Aklapper, Addshore, StudiesWorld, jkroll, 
Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-tech] RDF Item, Statement and Reference IRI Resolution?

2015-11-26 Thread Christopher Johnson
Hi,

After looking at the RDF format closely, I am asking if the item, statement
and reference IRIs could/should be directly resolvable to XML/JSON
formatted resources.

It seems that currently http://www.wikidata.org/entity/ redirects to
the UI at https://www.wikidata.org/wiki/ which is not what a machine reader
would expect.
Without a simple method to resolve the IRIs (perhaps a RESTful API?), these
RDF data objects are opaque for parsers.

Of course, with wbgetclaims, it is possible to get the statement like this:
https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1

but the API's expected GUID format does not match the RDF UUID representation
(there is a "$", or "%24", after the item ID instead of a "-"), and it returns
both the statement and the references.

Since the reference is its own node in the RDF,  it can be queried
independently.  For example, to ask "return all of the statements where
reference R is bound."  But then, the return value is a list of statement
IDs and a subquery or separate query is then required to return the
associated statement node.

I am also wondering why item, statement and reference "UUIDs" are not in
canonical format in the RDF.  This is a question of compliance with IETF
guidelines, which may or may not be relevant.

Item: Q20913766
Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9

See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
guidelines.

Thanks for your feedback,
Christopher
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-26 Thread Christopher
Christopher added a comment.

I am blocked on this by several problems with the data model/ontology.  The 
question of the relationship of the data model and the RDF node definitions is 
a bit complicated, perhaps more so than it should be.  A reference is a special 
type of statement defined by its relationship to other statements.  An 
"unreferenced statement" is undefined in the ontology and in the RDF format.  
All statements **should** in practice have a reference node.  But this is not 
an enforceable constraint in the data model apparently.

I think that when a statement is born, it should also create a reference 
"placeholder" or blank node in the RDF.  With this information in the RDF, 
counting these "bad" statements would be much easier.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-21 Thread Christopher
Christopher added a comment.

Truthy statement counts per Item can be done like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount)   WHERE {
   wd:Q7239 ?p ?o
   FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/prop/direct"))
  }  

Labels per Item like this:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount)   WHERE {
   wd:Q7239 ?p ?o
   FILTER (REGEX(STR(?p), "http://www.w3.org/2000/01/rdf-schema#label"))
  } 

Descriptions per Item:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?o)) AS ?ocount)   WHERE {
   wd:Q7239 ?p ?o
   FILTER (REGEX(STR(?p), "http://schema.org/description"))
  } 

Sitelinks per item:

  PREFIX wd: <http://www.wikidata.org/entity/>
  
  SELECT (count(distinct(?s)) AS ?ocount)   WHERE {
   ?s ?p wd:Q7239
   FILTER (REGEX(STR(?p), "http://schema.org/about"))
  } 
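The first three queries differ only in the IRI prefix the FILTER matches (the sitelinks query reverses subject and object, so it needs its own form). A sketch of generating them from one template; the helper name is illustrative, not an existing tool:

```python
# Hypothetical helper: build a per-item count query from a predicate IRI
# prefix, instead of repeating three near-identical queries by hand.
COUNT_TEMPLATE = """PREFIX wd: <http://www.wikidata.org/entity/>

SELECT (count(distinct(?o)) AS ?ocount) WHERE {{
  wd:{item} ?p ?o
  FILTER(STRSTARTS(STR(?p), "{prefix}"))
}}"""

def per_item_count_query(item: str, prefix: str) -> str:
    return COUNT_TEMPLATE.format(item=item, prefix=prefix)

# Truthy statements, labels, and descriptions for Q7239:
for prefix in ("http://www.wikidata.org/prop/direct",
               "http://www.w3.org/2000/01/rdf-schema#label",
               "http://schema.org/description"):
    print(per_item_count_query("Q7239", prefix))
```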


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

OK.  So the title "Referenced Statements by Statement Type" is just wrong then. 
Rather, it shows **All Statements** by Type.

| Date       | itemlink   | string     | globecoordinate | time      | quantity | somevalue | novalue | Total      |
| 2015-10-19 | 46,177,560 | 20,631,391 | 2,363,191       | 3,588,295 | 470,476  | 9,630     | 4,436   | 73,244,979 |


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

True, a statement is either referenced or "unreferenced".  Getting the number 
of referenced statements (currently 41,735,203) is easy and fast with:

  curl -G https://query.wikidata.org/bigdata/namespace/wdq/sparql 
--data-urlencode ESTCARD --data-urlencode 
'p=<http://www.w3.org/ns/prov#wasDerivedFrom>'

So we use the total of wikibase:Statement objects to represent the total number 
of statements and subtract referenced statements to get "unreferenced 
statements".

What is still murky to me, and I think possibly wrong with the todo/stats data, 
is the "Referenced statements by statement type".  Something does not add up 
there, because the total should not be greater than the sum of "Statements 
referenced to Wikipedia by statement type" and "Statements referenced to other 
sources by statement type".

For getting counts of objects per item, this means running 19M separate queries, 
or is there another way?   Creating a script to do this would be very similar 
to the property distribution method that I have already done, I guess.  
Basically, ask "list all of the items" and then "lapply(items, count labels, 
statements, links, descriptions)".


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-20 Thread Christopher
Christopher added a comment.

OK.  I may have found an answer to the question of the wildcard "Prefix Matching" 
that is necessary in order to query for the number of statements in an item.

  PREFIX bds: <http://www.bigdata.com/rdf/search#>
  PREFIX wd: <http://www.wikidata.org/entity/>
  PREFIX wikibase: <http://wikiba.se/ontology#>
  
  SELECT (count(distinct(?s)) AS ?scount) 
  WHERE {
    wd:Q20903715 ?p wikibase:Item .
    ?s bds:search "wd:statement*" .
  }

This requires the FullTextSearch 
https://wiki.blazegraph.com/wiki/index.php/FullTextSearch to be enabled (it is 
not on query.wikidata.org).  I will test on labs.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-19 Thread Christopher
Christopher added a comment.

Yes.  It seems I need to disable the 10 minute query timeout set here first: 
https://github.com/wikimedia/wikidata-query-rdf/blob/b3e646284f0b74131bce99a1b7d5fc6bfe675ec1/war/src/config/web.xml#L55

A fat query like this:

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX prov: <http://www.w3.org/ns/prov#>
  
  SELECT (count(distinct(?wds)) AS ?scount) WHERE {
    ?wds ?p wikibase:Statement .
    OPTIONAL {
      ?wds1 prov:wasDerivedFrom ?o .
      FILTER (?wds1 = ?wds) .
    }
    FILTER (!bound(?wds1)) .
  }

to find out how many statements do not have references is currently not 
possible.
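For what it is worth, SPARQL 1.1 also offers FILTER NOT EXISTS for this anti-join; whether Blazegraph executes it any faster than the OPTIONAL/!bound form within the timeout is untested here. A sketch of the query string:

```python
# Equivalent anti-join phrased with FILTER NOT EXISTS (SPARQL 1.1).
# This is only a cleaner phrasing; performance on WDQS is not verified.
UNREFERENCED_QUERY = """PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT (count(distinct(?wds)) AS ?scount) WHERE {
  ?wds ?p wikibase:Statement .
  FILTER NOT EXISTS { ?wds prov:wasDerivedFrom ?o . }
}"""

print(UNREFERENCED_QUERY)
```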

There may be a better way to ask for this, but the way that the data is coded 
does not really facilitate type joins.   An important point is that 
wikidata-todo/stats, and possibly the standing perception of the data, assumes 
an iterable hierarchy.  But RDF does not impose hierarchy.  So an Item does not 
"contain" statements, and statements do not "contain" references.

The relationship between statements and references is difficult to query by 
type, because a binding triple looks like this:

  wds:Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1 prov:wasDerivedFrom 
wdref:39f3ce979f9d84a0ebf09abe1702bf22326695e9

Note that simply counting the frequency of 
http://www.w3.org/ns/prov#wasDerivedFrom and comparing that to the frequency of 
wikibase:Statement would provide a kind of global ratio that is a fast and easy 
alternative to counting individual statements without references.

I am rebuilding wdm-rdf now with the new Munger and no query timeout.

Also, I will load the dump from 17 November, so that the updater has some 
chance to sync.  It had fallen back to 14 days old, and I doubt that it would 
ever have caught up.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117735: Track all Wikidata metrics currently gathered in Graphite rather than SQL and TSVs

2015-11-09 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

It seems appropriate to expand here on the use cases for a metrics storage backend.

I think that Wikidata content metrics favor long term retention (i.e. forever) 
because their purpose is to evaluate dynamics over both short and long period 
intervals.  Since content is always changing, recreation of a past state from 
live data is not possible.  The value of these historical measurement 
"snapshots" is therefore quite high.  These old data are never archived either 
and must be able to be retrieved without loading a dump or using some offline 
process.

In contrast, ops metrics are much more focused on the present and/or recent 
state.

Thus, two different use cases exist here.  If the proposal to use Graphite can 
substantiate a long-term (not decaying) storage method, then it should work 
for both.  If not, then something else (like OpenTSDB/HBase) should be 
implemented.


TASK DETAIL
  https://phabricator.wikimedia.org/T117735

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, Christopher
Cc: Christopher, Aklapper, StudiesWorld, Addshore, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

I am not sure why this is considered to be "a simple use case" since, as 
mentioned in https://phabricator.wikimedia.org/T117735, there are at least two 
different requirements.  Content metrics require long-term (non-decaying) 
storage; operational metrics do not.

Whisper (Graphite's database) is not robust and has a fixed size.  Even the 
documentation says it is not "disk space efficient".   Of course, if we assume 
that the need is only to record a small number of data points with a low 
resolution, none of this matters.

The added complexity of introducing backups and HDFS, etc. to the Graphite 
proposition does not seem "simple".  Also, the puppet module would still need 
to be reconfigured/modified, as @Addshore tried to do, for long-term retention, 
but this does not solve the archiving problem.   There has to be a built-in way 
to preserve and "snapshot" the database, or else it could be a real pain to 
restore.  And, in the interim period from snapshot to restoration, all 
measurements would be lost, unless it were on a cluster.

As far as I know, Cassandra can also run on a single instance, it does not need 
a cluster.


TASK DETAIL
  https://phabricator.wikimedia.org/T117732

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, 
Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

If not HBase, what about Cassandra?  This is already puppetized.  At least you 
will be using a storage solution that is designed for HDFS.


TASK DETAIL
  https://phabricator.wikimedia.org/T117732

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, 
Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117732: Create a Graphite instance in the Analytics cluster

2015-11-09 Thread Christopher
Christopher added a comment.

If you are going to use HDFS, why not just use HBase instead of Graphite?


TASK DETAIL
  https://phabricator.wikimedia.org/T117732

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Joe, Lydia_Pintscher, fgiunchedi, Christopher, JanZerebecki, Nuria, 
Ottomata, Aklapper, Addshore, StudiesWorld, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-09 Thread Christopher
Christopher added a comment.

No.  The blocking task code enables an option to not filter item, statement, 
value and reference rdf:types in the munger.  I decided not to wait for this, 
so that I could get started, but having it in master is very helpful going 
forward.

In order to have these types on live wdqs would require a complete rebuild of 
their data, which takes a long time.  The wdm-rdf instance is a clone that 
includes these types, and should eventually sync up to production (hopefully 
in another 5 or 6 days ...  24 hours of edits takes approx. 12 hours to process).

It is possible to do estimated cardinality queries on live wdqs for the 
property usage counts and anything else other than these primary types, however.


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-11-07 Thread Christopher
Christopher added a comment.

Update:  All data loaded into Blazegraph (it took over 24 hours).  Sync now 
running and up to 27 October.

Using Fast Range Counts returns counts of content objects instantly.

Examples:
curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql 
--data-urlencode ESTCARD --data-urlencode 'o=<http://wikiba.se/ontology#Item>'
Number of Items: 18,733,307
curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql 
--data-urlencode ESTCARD --data-urlencode 
'o=<http://wikiba.se/ontology#Statement>'
Number of Statements: 74,709,111
curl -G http://wdm-rdf.wmflabs.org/bigdata/namespace/wdq/sparql 
--data-urlencode ESTCARD --data-urlencode 
'p=<http://www.w3.org/ns/prov#wasDerivedFrom>'
Number of Predicate wasDerivedFrom: 38,985,221

Trending these kinds of objects should show interesting usage frequency 
patterns.
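A minimal sketch of recording such a trend point, assuming a simple date/metric/count TSV layout (the function name and column order are my own, not an existing dashboard format):

```python
import csv
import datetime
import io

def append_measurement(fh, metric: str, count: int, day=None) -> None:
    """Append one dated (date, metric, count) row in TSV form."""
    day = day or datetime.date.today()
    csv.writer(fh, delimiter="\t").writerow([day.isoformat(), metric, count])

# Example: record the Item count reported above for 2015-11-07.
buf = io.StringIO()
append_measurement(buf, "wikibase:Item", 18733307,
                   day=datetime.date(2015, 11, 7))
print(buf.getvalue().strip())
```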


TASK DETAIL
  https://phabricator.wikimedia.org/T117234

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, 
Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Block] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher reopened blocking task T117194: Evaluate Spark on YARN as "Open".

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher added a comment.

Note: A new task will be created for measuring SPARQL performance for counting 
tasks in different environments.  This has some relationship to Hadoop and 
Spark potentially, but the first step is to profile Blazegraph with complex 
counting queries and use this as a benchmark for improvement.


TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher added a comment.

Can we agree that Graphite is the way forward for the backend and close this 
task?


TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher closed blocking task T117194: Evaluate Spark on YARN as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher closed blocking task T117194: Evaluate Spark on YARN as "Declined".

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Unblock] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-11-05 Thread Christopher
Christopher closed blocking task T117195: Develop Wikidata (JSON or RDF) Dump 
Processing API for use with Spark as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard

2015-10-31 Thread Christopher
Christopher added a comment.

I have observed that the property data does not have a persistent frequency 
(i.e. some days there are no values reported).  It may be better to regularly 
generate null values for properties that do not report usage.

There are two options with the aggregate table:

1. show all properties without latest value.
2. only show latest reported properties.

  I favor option 2.  To have a complete list of properties with option 2, 
though, requires a consistently reported dataset including nulls.

This is the patchset for the change:
https://gerrit.wikimedia.org/r/#/c/250185/4


TASK DETAIL
  https://phabricator.wikimedia.org/T116009

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard

2015-10-31 Thread Christopher
Christopher added a comment.

See the change here: 
http://wdm.wmflabs.org/?t=wikidata_daily_getclaims_property_use


TASK DETAIL
  https://phabricator.wikimedia.org/T116009

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist

2015-10-30 Thread Christopher
Christopher added a comment.

This is why there is the config.R file.  The only path variable that needs to 
be changed is there.
See base_uri <- "/srv/dashboards/shiny-server/wdm/".  In windows this would be 
C:\whatever\whatever I guess.


TASK DETAIL
  https://phabricator.wikimedia.org/T116150

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist

2015-10-30 Thread Christopher
Christopher closed this task as "Resolved".
Christopher set Security to None.

TASK DETAIL
  https://phabricator.wikimedia.org/T116150

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist

2015-10-30 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

I cannot reproduce this now.  I assume that this is fixed.  The file is local 
and in the repo now.
https://github.com/wikimedia/wikidata-analytics-dashboard/blob/master/data/wikidata_eng_social_media.tsv


TASK DETAIL
  https://phabricator.wikimedia.org/T116150

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Addshore, Aklapper, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116009: Add graph to getclaimsusage on dashboard

2015-10-30 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

What is the benefit of having all properties on one graph?  To me, the simplest 
approach is to pass a parameter with a single property id from ordered table 
link to a chart.  To analyse the trend of a single property over time seems 
valuable, and possible, but because of the wide range, I do not think that 
graphing all property values on one chart is.


TASK DETAIL
  https://phabricator.wikimedia.org/T116009

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T116150: Error : '/srv/dashboards/shiny-server/wdm/data/wikidata_eng_social_media.tsv' does not exist

2015-10-30 Thread Christopher
Christopher closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T116150

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: gerritbot, Christopher, Addshore, Aklapper, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T117206: Move KPI section up to dashboard

2015-10-30 Thread Christopher
Christopher added a subscriber: Christopher.
Christopher added a comment.

Does this mean that you would prefer the KPI tab on the dashboard sidebar to be 
first in the list?


TASK DETAIL
  https://phabricator.wikimedia.org/T117206

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Christopher, Abraham, Aklapper, Wikidata-bugs, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-10-30 Thread Christopher
Christopher added a project: WMDE-Analytics-Engineering.
Christopher set Security to None.

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T116009: Add graph to getclaimsusage on dashboard

2015-10-30 Thread Christopher
Christopher moved this task to Doing on the WMDE-Analytics-Engineering 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T116009

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1585/

To: Christopher
Cc: Christopher, Aklapper, Addshore, Wikidata-bugs, aude


[Wikidata-bugs] [Maniphest] [Changed Project Column] T113180: Create semantic definitions for Wikidata Metrics

2015-10-30 Thread Christopher
Christopher moved this task to Doing on the WMDE-Analytics-Engineering 
workboard.

TASK DETAIL
  https://phabricator.wikimedia.org/T113180

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1585/

To: Christopher
Cc: gerritbot, Christopher, Aklapper, JanZerebecki, Deskana, Ricordisamoa, 
EBernhardson, Wikidata-bugs, aude


[Wikidata-bugs] [Maniphest] [Updated] T115242: Add Munger option to not filter uninteresting object type triples

2015-10-30 Thread Christopher
Christopher added a blocked task: T117234: Reproduce wikidata-todo data using 
analytics infrastructure.

TASK DETAIL
  https://phabricator.wikimedia.org/T115242

To: Smalyshev, Christopher
Cc: JanZerebecki, Aklapper, Christopher, jkroll, Smalyshev, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T117203: [WD] External usage KPI

2015-10-30 Thread Christopher
Christopher added subscribers: Addshore, Christopher.
Christopher added a comment.

Do you mean this https://searchdata.wmflabs.org/external/ ?

This should be retrievable at a short interval from Graphite?  @Addshore?  
The KPI is defined with a "rolling 30 day window".  Is this a requirement?  
A 30-day aggregate might be a very large number...
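The "rolling 30 day window" mentioned above can be sketched as follows. This is an illustrative stand-in only, with made-up daily counts rather than actual KPI data, showing how a rolling window differs from a single 30-day aggregate:

```python
from collections import deque

def rolling_window_sums(daily_counts, window=30):
    """Return the sum over a trailing window for each day.

    daily_counts: iterable of per-day totals (oldest first).
    Until `window` days have accumulated, a partial sum is returned.
    """
    q = deque(maxlen=window)
    sums = []
    for count in daily_counts:
        q.append(count)  # deque drops the oldest value automatically
        sums.append(sum(q))
    return sums

# Illustrative (made-up) daily external-usage counts, NOT real KPI data.
daily = [100] * 40
rolling = rolling_window_sums(daily, window=30)
print(rolling[29])  # first full 30-day window: 3000
print(rolling[39])  # window stays at 3000 once saturated
```

With such a series, the dashboard would plot one point per day rather than a single aggregate per month.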


TASK DETAIL
  https://phabricator.wikimedia.org/T117203

To: Christopher
Cc: Christopher, Addshore, Lydia_Pintscher, Abraham, Aklapper, Wikidata-bugs, 
aude


[Wikidata-bugs] [Maniphest] [Updated] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-10-30 Thread Christopher
Christopher added blocking tasks: T117194: Evaluate Spark on YARN, T117195: 
Develop Wikidata (JSON or RDF) Dump Processing API for use with Spark.

TASK DETAIL
  https://phabricator.wikimedia.org/T116547

To: Christopher
Cc: Wikidata-bugs, Addshore, Christopher, JanZerebecki, Lydia_Pintscher, 
Aklapper, Ricordisamoa, aude


Re: [Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-10-26 Thread Christopher Johnson
It is possible that a Hadoop architecture could provide the performance
and scalability needed for robust statistical analysis of the Wikidata RDF
datasets.

It is also possible that Jena may have better integration tools with Hadoop
than Blazegraph does.

See https://jena.apache.org/documentation/hadoop/

However, I do not see a direct relationship between T115242 and performance,
other than that the rationale for filtering these "boring" objects is the
perceived negative performance impact of allowing them to be queried from a
publicly accessible endpoint.

The intent of T115242 is to provide these objects in a dataset to a
"nonpublic" query interface for metrics evaluation only.

The question that should be asked is whether Blazegraph and the WDQS
platform are robust enough for intensive statistical analysis and, if not,
why, and what can be done to improve them?
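A plain-Python sketch of the kind of statistic such a Hadoop/Spark job would compute: counting entities by rdf:type over a line-based N-Triples dump. In a distributed deployment each worker would run this over a shard of the dump and the Counters would be merged in a reduce step; the sample triples below are illustrative, not real dump data.

```python
import re
from collections import Counter

# Minimal N-Triples matcher: each line is "<subject> <predicate> <object> ."
# Sufficient here because the objects of interest are IRIs without spaces.
TRIPLE_RE = re.compile(r'^(<[^>]+>)\s+(<[^>]+>)\s+(.+?)\s*\.\s*$')

def count_object_types(lines):
    """Count rdf:type objects in a stream of N-Triples lines."""
    rdf_type = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'
    counts = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m and m.group(2) == rdf_type:
            counts[m.group(3)] += 1
    return counts

sample = [
    '<http://www.wikidata.org/entity/Q64> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://wikiba.se/ontology-beta#Item> .',
    '<http://www.wikidata.org/entity/statement/x> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://wikiba.se/ontology-beta#Statement> .',
]
print(count_object_types(sample))
```

This is exactly the sort of embarrassingly parallel scan that favors a dump-processing pipeline over a live SPARQL endpoint.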
On 26 Oct 2015 10:00, "JanZerebecki" <no-re...@phabricator.wikimedia.org>
wrote:

> JanZerebecki added a comment.
>
> @Christopher can as he created https://phabricator.wikimedia.org/T115242.


[Wikidata-bugs] [Maniphest] [Updated] T115120: Wikidata Metrics

2015-10-20 Thread Christopher
Christopher added a project: WMDE-Analytics-Engineering.
Christopher set Security to None.

TASK DETAIL
  https://phabricator.wikimedia.org/T115120

To: Christopher
Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, 
TempleM, Aklapper, RP88, revi, Luke081515, JGirault, jkroll, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, Gryllida, JanZerebecki


[Wikidata-bugs] [Maniphest] [Updated] T113180: Create semantic definitions for Wikidata Metrics

2015-10-20 Thread Christopher
Christopher added a project: WMDE-Analytics-Engineering.
Christopher set Security to None.

TASK DETAIL
  https://phabricator.wikimedia.org/T113180

To: Christopher
Cc: gerritbot, Christopher, Aklapper, JanZerebecki, Deskana, Ricordisamoa, 
EBernhardson, Wikidata-bugs, aude


[Wikidata-bugs] [Maniphest] [Updated] T108404: [Story] create a Wikidata analytics dashboard

2015-10-20 Thread Christopher
Christopher added a project: WMDE-Analytics-Engineering.

TASK DETAIL
  https://phabricator.wikimedia.org/T108404

To: Christopher
Cc: gerritbot, Addshore, Lydia_Pintscher, EBernhardson, Ricordisamoa, Deskana, 
JanZerebecki, Aklapper, Wikidata-bugs, aude


[Wikidata-bugs] [Maniphest] [Commented On] T115120: Wikidata Metrics

2015-10-18 Thread Christopher
Christopher added a comment.

@Andrew Is there something else that needs to be said/done in order to make 
this happen?

Currently, the development dashboard is running on the scrumbugz project 
(http://wdm.wmflabs.org/wdm/), so this will just be reallocated.  Additional 
note: if the RDF dumps are available on /public/dumps, access to them would be 
beneficial.


TASK DETAIL
  https://phabricator.wikimedia.org/T115120

To: Christopher
Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, 
TempleM, Aklapper, RP88, revi, Luke081515, jkroll, Wikidata-bugs, Jdouglas, 
aude, Deskana, Manybubbles, Gryllida, JanZerebecki


[Wikidata-bugs] [Maniphest] [Created] T115242: Add Munger option to not filter uninteresting object type triples

2015-10-12 Thread Christopher
Christopher created this task.
Christopher assigned this task to Smalyshev.
Christopher added a subscriber: Christopher.
Christopher added projects: Wikidata-Query-Service, Wikidata.
Christopher moved this task to All WDQS-related tasks on the 
Wikidata-Query-Service workboard.
Herald added a subscriber: Aklapper.
Herald added a project: Discovery.

TASK DESCRIPTION
  Triples with object types wikibase:Item, wikibase:Statement, 
wikibase:Reference, and wikibase:Value are filtered by default by the Munger.  
  
  For certain use cases, such as object counting and comparison, it is desirable 
to retain these.  An option to skip this filtering, similar to the existing 
removeSiteLinks option, should be added.
  
  See T115120
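The actual Munger is Java code in the wikidata-query-rdf repository; the following Python sketch only illustrates the proposed behavior, a hypothetical `keep_types` flag that bypasses the default filtering. All names here are illustrative, not the real Munger API.

```python
# The four ontology types the Munger filters by default (see task description).
UNINTERESTING_TYPES = {
    'wikibase:Item',
    'wikibase:Statement',
    'wikibase:Reference',
    'wikibase:Value',
}

def munge(triples, keep_types=False):
    """Drop rdf:type triples for 'uninteresting' types unless keep_types is set.

    Hypothetical stand-in for the proposed Munger option; triples are
    (subject, predicate, object) tuples using prefixed names for brevity.
    """
    for s, p, o in triples:
        if (not keep_types and p == 'rdf:type'
                and o in UNINTERESTING_TYPES):
            continue  # default behavior: filter the triple out
        yield (s, p, o)

triples = [
    ('wd:Q64', 'rdf:type', 'wikibase:Item'),
    ('wd:Q64', 'rdfs:label', '"Berlin"@en'),
]
print(len(list(munge(triples))))                   # 1: type triple filtered
print(len(list(munge(triples, keep_types=True))))  # 2: type triple retained
```

With the flag enabled, object-counting queries could group entities by their wikibase type directly.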

TASK DETAIL
  https://phabricator.wikimedia.org/T115242

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

To: Smalyshev, Christopher
Cc: Aklapper, Christopher, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, 
Deskana, Manybubbles, JanZerebecki


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T115120: Wikidata Metrics

2015-10-11 Thread Christopher
Christopher added a subscriber: Smalyshev.
Christopher added a comment.

After researching this, I have discovered that the Munger that processes the 
RDF dump removes several ontology types (wikibase:Item, wikibase:Statement, 
wikibase:Reference, and wikibase:Value) that are needed for object counting and 
comparison.

See here 
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/rdf/Munger.java,
 lines 405, 466, 514, 556.

@Smalyshev Is it possible to add an option to keep them?  And approximately how 
much additional space/memory would these use?
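One rough way to reason about the space question: each retained node contributes one extra rdf:type triple. All numbers below are placeholders for illustration, not actual Wikidata counts or measured storage costs.

```python
def extra_type_triples_estimate(n_items, n_statements, n_references,
                                n_values, bytes_per_triple=100):
    """Back-of-envelope: one rdf:type triple per retained entity/node.

    bytes_per_triple is a placeholder for the average indexed cost of a
    triple; the real figure depends on the triple store's indexing.
    """
    n_triples = n_items + n_statements + n_references + n_values
    return n_triples, n_triples * bytes_per_triple

# Placeholder magnitudes only -- not actual Wikidata counts.
triples, size_bytes = extra_type_triples_estimate(
    n_items=20_000_000, n_statements=80_000_000,
    n_references=30_000_000, n_values=10_000_000)
print(triples)            # 140000000 extra triples
print(size_bytes / 1e9)   # 14.0 GB at 100 bytes/triple
```

Plugging in real dump counts would give an order-of-magnitude answer to the space question.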


TASK DETAIL
  https://phabricator.wikimedia.org/T115120

To: Christopher
Cc: Smalyshev, Christopher, Andrew, yuvipanda, coren, scfc, Matthewrbowker, 
TempleM, Aklapper, RP88, Revi, Luke081515, jkroll, Wikidata-bugs, Jdouglas, 
aude, Deskana, Manybubbles, Gryllida, JanZerebecki

