GoranSMilovanovic added a comment.
@agray

> I can confirm that the numbers on the tables seem a bit off for some other properties. I've been looking at P1614 <https://phabricator.wikimedia.org/P1614> (History of Parliament), which is complete and fairly stable. It currently has 21428 IDs on 17942 items (there's a lot of items with two/three IDs) and hasn't had any big changes since I finished matching in mid-2018.

- For P1614 <https://phabricator.wikimedia.org/P1614> (History of Parliament), our dashboard reports: `This identifier is used by 17950 WD items.`
- Regarding "It currently has 21428 IDs on 17942 items (there's a lot of items with two/three IDs)": **the dashboard discards multiple uses of the same identifier on the same item**. The data are "binarized": in our datasets, a particular item either makes use of the identifier or it does not.

> For VIAF (P214) the dashboard reports 440 items, against a SPARQL total of 2807.

- For VIAF (P214), the dashboard reports 1380767 items (tab: Tables, the table on the right; search for this identifier). Are we comparing the same data? Which dashboard functionality did you use to find 440 items for P214 VIAF, please?

> For Hansard ID (P2015 <https://phabricator.wikimedia.org/P2015>), the dashboard has 110 and a SPARQL query has 2369.

- For Hansard ID (P2015 <https://phabricator.wikimedia.org/P2015>), the dashboard reports 14467 items, and an overlap of 355 items with P214 VIAF ID. Are we looking at the same dashboard? :)

> Most dramatically, for the Oxford DNB (P1415 <https://phabricator.wikimedia.org/P1415>), the dashboard has two items and SPARQL has 3171.

- For the Oxford DNB (P1415 <https://phabricator.wikimedia.org/P1415>), the dashboard reports that 61143 items make use of it. Could you share your SPARQL queries? I think we are discussing different datasets here.
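To illustrate why a SPARQL count of identifier statements and the dashboard's item count can differ, here is a minimal sketch in Python. It assumes the raw data arrive as hypothetical (item, property, value) triples; the helper names and the toy data are illustrative only and are not the dashboard's actual code. It shows binarization (an item counts once per identifier, however many values it has) and an overlap count computed on the binarized sets.

```python
from collections import defaultdict

# Hypothetical toy data: one (item, property, value) triple per identifier
# statement, roughly what a statement-level SPARQL query would return.
statements = [
    ("Q1", "P1614", "a"),
    ("Q1", "P1614", "b"),   # same item, second History of Parliament ID
    ("Q2", "P1614", "c"),
    ("Q2", "P214", "123"),
    ("Q3", "P214", "456"),
    ("Q3", "P214", "789"),  # same item, second VIAF ID
]


def binarize(statements):
    """Map each property to the set of items that use it at least once."""
    items_by_property = defaultdict(set)
    for item, prop, _value in statements:
        items_by_property[prop].add(item)
    return dict(items_by_property)


def statement_count(statements, prop):
    """Count every identifier value, including repeats on the same item."""
    return sum(1 for _item, p, _value in statements if p == prop)


def overlap(items_by_property, prop_a, prop_b):
    """Number of items that use both identifiers (binarized data)."""
    items_a = items_by_property.get(prop_a, set())
    items_b = items_by_property.get(prop_b, set())
    return len(items_a & items_b)


usage = binarize(statements)
print(statement_count(statements, "P1614"))  # 3 identifier values
print(len(usage["P1614"]))                   # 2 items
print(overlap(usage, "P1614", "P214"))       # 1 item (Q2)
```

Counting distinct items rather than statements is what makes the dashboard's figure for P1614 close to the 17942 items rather than the 21428 IDs.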
> Looking at P1415 <https://phabricator.wikimedia.org/P1415> specifically, since it's the weirdest one there, the "overlap data" for that property is even lower - the most frequent item is VIAF, but only 61 matches. In reality, this should be ~40,000 matches out of ~61,000 items. Perhaps some specific properties have worse data than others, for some reason?

- I am inspecting the issue right now. The tests are difficult and take time, but in the end we will have correct overlap data for all identifiers. **Again:** this dashboard eliminates multiple uses of the same identifier on the same item; the only data we are looking at are binary: an item either does or does not use a particular identifier.

Thank you very much for your comments. I will report back on this ticket as work on the overlap data progresses.

TASK DETAIL
https://phabricator.wikimedia.org/T204440

To: GoranSMilovanovic
Cc: Jheald, agray, Envlh, Lea_Lacroix_WMDE, VIGNERON, Pintoch, Daniel_Mietchen, connorshea, Moebeus, Multichill, Hjfocs, RazShuty, GoranSMilovanovic, Aklapper, Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs