Manuel added a comment.

  Some thoughts about the notebook:
  
  **Double checking**
  
  Triples should always be distinct, correct? Even so, the count of 15 billion 
seems lower than figures I have read elsewhere.
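
  As a minimal sketch of such a check, the total row count can be compared with 
the distinct row count. The sample triples and the function name 
`count_total_and_distinct` are made up for illustration; the real table would 
be checked with the notebook's own query engine.

```python
def count_total_and_distinct(triples):
    """Return (total rows, distinct rows) for an iterable of (s, p, o) tuples."""
    rows = list(triples)
    return len(rows), len(set(rows))


# Hypothetical toy data: the second row is an exact duplicate of the first.
sample_triples = [
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://www.w3.org/2000/01/rdf-schema#label>",
     '"Douglas Adams"@en'),
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://www.w3.org/2000/01/rdf-schema#label>",
     '"Douglas Adams"@en'),
    ("<http://www.wikidata.org/entity/Q42>",
     "<http://schema.org/description>",
     '"English writer"@en'),
]

total, distinct = count_total_and_distinct(sample_triples)
print(f"total={total}, distinct={distinct}")
```

  If the two numbers differ on the real table, the 15 billion figure would 
include duplicates and should be recomputed on distinct rows.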
  
  **Size calculations**
  
  The predicates look correct to me for this analysis.
  
    predicate_representation_dict = {
        "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
        "description": "<http://schema.org/description>",
        "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>" 
    }
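
  To illustrate how this mapping could be used, here is a small sketch that 
counts triples per term type. It assumes triples are available as (subject, 
predicate, object) tuples with URIs in angle brackets; the toy data and the 
helper name `count_terms` are hypothetical and not taken from the notebook.

```python
from collections import Counter

predicate_representation_dict = {
    "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "description": "<http://schema.org/description>",
    "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>",
}


def count_terms(triples):
    """Count triples whose predicate is one of the label/description/alias URIs."""
    uri_to_name = {uri: name for name, uri in predicate_representation_dict.items()}
    counts = Counter()
    for _subject, predicate, _obj in triples:
        name = uri_to_name.get(predicate)
        if name is not None:
            counts[name] += 1
    return counts


# Hypothetical toy data; the last row uses a predicate outside the mapping.
Q42 = "<http://www.wikidata.org/entity/Q42>"
term_triples = [
    (Q42, "<http://www.w3.org/2000/01/rdf-schema#label>", '"Douglas Adams"@en'),
    (Q42, "<http://www.w3.org/2000/01/rdf-schema#label>", '"Douglas Adams"@de'),
    (Q42, "<http://schema.org/description>", '"English writer"@en'),
    (Q42, "<http://www.w3.org/2004/02/skos/core#altLabel>", '"Douglas N. Adams"@en'),
    (Q42, "<http://www.wikidata.org/prop/direct/P31>",
     "<http://www.wikidata.org/entity/Q5>"),
]

term_counts = count_terms(term_triples)
print(dict(term_counts))
```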
  
  For the other tasks (e.g. T342111 
<https://phabricator.wikimedia.org/T342111>), however, it will not be as simple 
as querying for Q-IDs in the subject position: doing only that would 
underestimate the size of the subgraph in question. For example, I can see that 
qualifiers and references follow a different pattern.
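
  A rough sketch of the underestimation, assuming the usual Wikidata RDF layout 
in which qualifiers hang off statement nodes and reference details hang off 
reference nodes. The statement and reference IDs below are placeholders, and 
the function names are hypothetical. Reference nodes can be shared between 
statements, so on real data this traversal is only an approximation.

```python
STATEMENT_PREFIX = "<http://www.wikidata.org/entity/statement/"
REFERENCE_PREFIX = "<http://www.wikidata.org/reference/"


def subject_only_count(triples, entity):
    """Count triples whose subject is exactly the entity URI."""
    return sum(1 for s, _p, _o in triples if s == entity)


def subgraph_count(triples, entity):
    """Count triples reachable from the entity via statement and reference nodes."""
    subjects = {entity}
    changed = True
    while changed:  # repeat until no new statement/reference node is found
        changed = False
        for s, _p, o in triples:
            is_node = o.startswith((STATEMENT_PREFIX, REFERENCE_PREFIX))
            if s in subjects and is_node and o not in subjects:
                subjects.add(o)
                changed = True
    return sum(1 for s, _p, _o in triples if s in subjects)


# Toy data with placeholder statement and reference IDs.
Q42 = "<http://www.wikidata.org/entity/Q42>"
STMT = "<http://www.wikidata.org/entity/statement/Q42-EXAMPLE>"
REF = "<http://www.wikidata.org/reference/EXAMPLE>"
toy_triples = [
    (Q42, "<http://www.w3.org/2000/01/rdf-schema#label>", '"Douglas Adams"@en'),
    (Q42, "<http://www.wikidata.org/prop/P69>", STMT),
    (STMT, "<http://www.wikidata.org/prop/statement/P69>",
     "<http://www.wikidata.org/entity/Q691283>"),
    (STMT, "<http://www.wikidata.org/prop/qualifier/P582>", '"1974"'),
    (STMT, "<http://www.w3.org/ns/prov#wasDerivedFrom>", REF),
    (REF, "<http://www.wikidata.org/prop/reference/P248>",
     "<http://www.wikidata.org/entity/Q5375741>"),
]

print(subject_only_count(toy_triples, Q42), subgraph_count(toy_triples, Q42))
```

  In this toy case the subject-only filter sees two of the six triples, 
because the statement value, the qualifier, and the reference all have a 
statement or reference node as their subject rather than the Q-ID.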
  
  I would suggest that we set up a short meeting with someone from the Wikidata 
team who can explain this table to us. In the meeting, you could also briefly 
explain the most relevant steps in this notebook so that they could provide a 
high-level code review.

TASK DETAIL
  https://phabricator.wikimedia.org/T337021
