GoranSMilovanovic added a comment.

  @agray
  
  > I can confirm that the numbers on the tables seem a bit off for some other properties. I've been looking at P1614 (History of Parliament), which is complete and fairly stable. It currently has 21428 IDs on 17942 items (there's a lot of items with two/three IDs) and hasn't had any big changes since I finished matching in mid-2018.
  
  
  
  - For P1614 (History of Parliament), our dashboard reports: `This identifier is used by 17950 WD items.`
  - Regarding "It currently has 21428 IDs on 17942 items (there's a lot of items with two/three IDs)": **the dashboard discards multiple uses of the same identifier on the same item**. The data are "binarized": in our datasets, a particular item either uses the identifier or it does not.
  
  > For VIAF (P214) the dashboard reports 440 items, against a SPARQL total of 
2807.
  
  - For VIAF (P214), the dashboard reports 1380767 items (tab: Tables, the table on the right; search for this identifier). Are we comparing the same data? Which dashboard functionality did you use to find 440 items for P214 VIAF, please?
  
  
  
  > For Hansard ID (P2015), the dashboard has 110 and a SPARQL query has 2369.
  
  - For Hansard ID (P2015), the dashboard reports 14467 items and an overlap of 355 items with P214 VIAF ID. Are we looking at the same dashboard? :)
  
  > Most dramatically, for the Oxford DNB (P1415), the dashboard has two items and SPARQL has 3171.
  
  - For the Oxford DNB (P1415), the dashboard reports that 61143 items use it. Could you share your SPARQL queries? I think we are discussing different datasets here.
  
  > Looking at P1415 specifically, since it's the weirdest one there, the "overlap data" for that property is even lower - the most frequent item is VIAF, but only 61 matches. In reality, this should be ~40,000 matches out of ~61,000 items. Perhaps some specific properties have worse data than others, for some reason?
  
  - I am inspecting the issue right now. The tests are difficult and take time, but in the end we will have correct overlap data for all identifiers. **Again:** this dashboard eliminates multiple uses of the same identifier on the same item; the only data we are looking for are binary: an item either does or does not use a particular identifier.
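
  Under the binary model, the overlap between two identifiers is the number of items that use both, i.e. the size of the intersection of their item sets. A minimal sketch, assuming per-identifier item sets have already been built. The data and function names are hypothetical, and this is not the dashboard's actual implementation:

```python
from itertools import combinations

# Hypothetical binarized data: identifier -> set of items that use it.
# Multiple values of one identifier on the same item are already collapsed.
item_sets = {
    "P214": {"Q1", "Q2", "Q3", "Q4"},
    "P1415": {"Q1", "Q2", "Q5"},
    "P2015": {"Q2", "Q6"},
}

def pairwise_overlap(item_sets):
    """Return {(prop_a, prop_b): number of items that use both identifiers}."""
    return {
        (a, b): len(item_sets[a] & item_sets[b])
        for a, b in combinations(sorted(item_sets), 2)
    }

def top_overlap(item_sets, prop):
    """Return the identifier sharing the most items with `prop`, and that count."""
    others = (p for p in item_sets if p != prop)
    best = max(others, key=lambda p: len(item_sets[prop] & item_sets[p]))
    return best, len(item_sets[prop] & item_sets[best])

print(pairwise_overlap(item_sets))
# {('P1415', 'P2015'): 1, ('P1415', 'P214'): 2, ('P2015', 'P214'): 1}
print(top_overlap(item_sets, "P1415"))  # ('P214', 2)
```

  Because overlap is a set intersection, it can never exceed the smaller of the two item counts, which gives a quick sanity check for any reported overlap figure.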
  
  Thank you very much for your comments. I will be reporting back on this 
ticket as the situation with the overlap data progresses.

TASK DETAIL
  https://phabricator.wikimedia.org/T204440


To: GoranSMilovanovic
Cc: Jheald, agray, Envlh, Lea_Lacroix_WMDE, VIGNERON, Pintoch, Daniel_Mietchen, 
connorshea, Moebeus, Multichill, Hjfocs, RazShuty, GoranSMilovanovic, Aklapper, 
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs