Manuel added a comment.
Some thoughts about the notebook:

**Double checking**

Triples should always be distinct, correct? But the figure of 15 billion seems lower than what I have read elsewhere.

**Size calculations**

The predicates look correct to me for this analysis.

  predicate_representation_dict = {
      "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
      "description": "<http://schema.org/description>",
      "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>"
  }

But for the other tasks (e.g. T342111 <https://phabricator.wikimedia.org/T342111>) it will not be as easy as querying Q-Ids in subjects: if we only did that, we would underestimate the size of the subgraph in question. I can see, for example, that qualifiers and references follow a different pattern.

I would suggest that we set up a short meeting with someone from the Wikidata team who can explain this table to us. In the meeting, you could also briefly explain the most relevant steps in this notebook so that they could provide a high-level code review.

TASK DETAIL
  https://phabricator.wikimedia.org/T337021
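To make the two points in the comment concrete, here is a minimal sketch in plain Python. It is not the notebook's code: the function names, the `ENTITY_PREFIX` constant, and the sample triples (including the statement ID `Q42-example`) are made up for illustration, and measuring size as UTF-8 bytes of the three terms is an assumption. It deduplicates triples, sums a size per predicate from the dict above, and shows that a filter on Q-Ids in the subject position skips a triple whose subject is a statement node.

```python
# Predicates quoted in the comment.
predicate_representation_dict = {
    "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "description": "<http://schema.org/description>",
    "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>",
}

# Assumed prefix for item subjects; used only for this illustration.
ENTITY_PREFIX = "<http://www.wikidata.org/entity/Q"


def triple_size(triple):
    """Size of one triple as the UTF-8 byte length of its three terms (an assumption)."""
    return sum(len(term.encode("utf-8")) for term in triple)


def summarize_terms(triples):
    """Count distinct triples and total bytes per representation type."""
    name_by_predicate = {uri: name for name, uri in predicate_representation_dict.items()}
    summary = {name: {"triples": 0, "bytes": 0} for name in predicate_representation_dict}
    for triple in set(triples):  # set() drops exact duplicate triples
        name = name_by_predicate.get(triple[1])
        if name is None:
            continue  # predicate is not a label, description, or alias
        summary[name]["triples"] += 1
        summary[name]["bytes"] += triple_size(triple)
    return summary


def missed_by_q_id_filter(triples):
    """Distinct triples that a 'subject is a Q-Id' filter would not select."""
    return [t for t in set(triples) if not t[0].startswith(ENTITY_PREFIX)]


# Hypothetical sample data, including one exact duplicate and one qualifier triple
# whose subject is a statement node rather than an item.
label_triple = (
    "<http://www.wikidata.org/entity/Q42>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
    '"Douglas Adams"@en',
)
sample_triples = [
    label_triple,
    label_triple,
    (
        "<http://www.wikidata.org/entity/Q42>",
        "<http://schema.org/description>",
        '"English writer"@en',
    ),
    (
        "<http://www.wikidata.org/entity/Q42>",
        "<http://www.w3.org/2004/02/skos/core#altLabel>",
        '"Douglas Noel Adams"@en',
    ),
    (
        "<http://www.wikidata.org/entity/statement/Q42-example>",
        "<http://www.wikidata.org/prop/qualifier/P580>",
        '"1971"',
    ),
]

summary = summarize_terms(sample_triples)
missed = missed_by_q_id_filter(sample_triples)
```

In this sample the qualifier triple ends up in `missed`, which is the kind of underestimate the comment warns about for the other subgraph tasks.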