[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

dcausse Tue, 31 Oct 2023 10:49:52 -0700

dcausse added a comment.


  @EgonWillighagen thanks for the question!
  The triples that will be part of the split are those that we 
consider //owned// by the item; in other words, these are the triples listed by 
Special:EntityData using the //dump// flavor, e.g. 
https://www.wikidata.org/wiki/Special:EntityData/Q59239844.ttl?flavor=dump.
  A scholarly article item will be part of the //scholarly item// subgraph if 
it matches this constraint: `?item wdt:P31 wd:Q13442814`.
  All its corresponding triples will also be part of the split. P53121 
<https://phabricator.wikimedia.org/P53121> is a relatively painful query that 
demonstrates which triples can be considered //owned// by an entity and thus 
moved alongside the scholarly article to the same subgraph.
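  
  As a minimal sketch of how one might inspect the triples owned by an item, 
the Python snippet below builds the Special:EntityData URL for the //dump// 
flavor and downloads the Turtle. The helper names `entity_dump_url` and 
`fetch_owned_triples` and the User-Agent string are illustrative assumptions, 
not part of any Wikimedia tooling.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# URL pattern taken from the example link in this comment.
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_dump_url(entity_id: str, fmt: str = "ttl") -> str:
    """Return the Special:EntityData URL for the dump flavor of an entity."""
    return f"{ENTITY_DATA_BASE}/{quote(entity_id)}.{fmt}?flavor=dump"


def fetch_owned_triples(entity_id: str) -> str:
    """Download the Turtle listing the triples owned by the entity.

    Needs network access; the User-Agent value is a placeholder.
    """
    request = Request(
        entity_dump_url(entity_id),
        headers={"User-Agent": "owned-triples-sketch/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

  Calling `entity_dump_url("Q59239844")` gives the link quoted above; 
`fetch_owned_triples` then returns the Turtle text for that item.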
  
  For instance in my query the BGP `?article wdt:P50 wd:Q1042470` matches a 
triple owned by the article and thus is queryable from the split.
  On the other hand, anything requiring access to the triples owned by the author 
`wd:Q1042470` is not queryable from the split, and thus the BGP:
  
    ?article wdt:P50 ?author .
    ?author wdt:P213 "0000 0001 2124 7940"
  
  won't be possible and would require federation like:
  
    # all papers by ISNI 0000 0001 2124 7940 (Carlo Rovelli)
    SELECT ?article ?articleLabel {
      ?author wdt:P213 "0000 0001 2124 7940"
      SERVICE <https://query.wikidata.org/sparql> {
        # Querying the scholarly article split
        ?article wdt:P50 ?author .
        BIND(?articleLabel as ?articleLabel) .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      }
    }
  
  Here the target endpoint is the main graph and the federated one is the 
scholarly article split.
  I suppose federation can be done the other way around with:
  
    # all papers by ISNI 0000 0001 2124 7940 (Carlo Rovelli)
    SELECT ?article ?articleLabel {
      SERVICE <https://query.wikidata.org/sparql> {
        # Querying the wikidata main graph split
        ?author wdt:P213 "0000 0001 2124 7940"
      }
      hint:Prior hint:runFirst true . # Tell blazegraph to first collect ?author
      ?article wdt:P50 ?author .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  
  Here the target endpoint is the scholarly split and the federated one is the 
main Wikidata graph.
  In the latter example we already see that we have to help Blazegraph by 
telling it what to run first (here, collecting the author information first).
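  
  For early experiments, the second variant could be assembled and turned into 
a request URL along these lines. This is only a sketch: the function names are 
hypothetical, the endpoint URLs are parameters because the address of the 
scholarly split is not given here, and it assumes the target endpoint predefines 
the usual WDQS prefixes (`wdt:`, `hint:`, `wikibase:`, `bd:`) and accepts the 
`query` and `format=json` parameters as query.wikidata.org does.

```python
from urllib.parse import urlencode


def papers_by_isni_query(isni: str, main_graph_endpoint: str) -> str:
    """Build the query meant to run on the scholarly split (hypothetical helper).

    The author lookup is federated to the main graph, and the Blazegraph
    hint asks for that SERVICE clause to be evaluated first.
    """
    # Refuse characters that would break out of the SPARQL string literal.
    if '"' in isni or "\\" in isni or "\n" in isni:
        raise ValueError("unexpected character in ISNI")
    return (
        "SELECT ?article ?articleLabel {\n"
        f"  SERVICE <{main_graph_endpoint}> {{\n"
        f'    ?author wdt:P213 "{isni}"\n'
        "  }\n"
        "  hint:Prior hint:runFirst true .\n"
        "  ?article wdt:P50 ?author .\n"
        "  SERVICE wikibase:label { bd:serviceParam wikibase:language "
        '"[AUTO_LANGUAGE],en". }\n'
        "}"
    )


def request_url(target_endpoint: str, query: str) -> str:
    """Return a GET URL that submits the query to the target endpoint."""
    params = urlencode({"query": query, "format": "json"})
    return target_endpoint + "?" + params
```

  With the current setup both endpoint arguments would be 
`https://query.wikidata.org/sparql`, i.e. WDQS federating itself.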
  
  I agree that using the current WDQS endpoint to federate with itself can be 
error-prone, but it is in theory possible for anyone interested in doing 
early experiments.

TASK DETAIL
  https://phabricator.wikimedia.org/T349911

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dr0ptp4kt, Fnielsen, Daniel_Mietchen, EgonWillighagen, dcausse, Aklapper, 
Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331

_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
