GoranSMilovanovic added a subscriber: Jheald.
GoranSMilovanovic added a comment.


  @Lydia_Pintscher
  
  - Everything else takes place once the WD JSON dump copy to HDFS (T209655 
<https://phabricator.wikimedia.org/T209655>) is in production, and 
Analytics-Engineering tells me that is going to take a while.
  - I think we should consider investing a bit more of my time here to optimize 
the dashboard (large datasets --> heavy on client-side processing). Please let 
me know what you think.
  
  **Status (final):**
  
  - Check why 20-30% of the data is not delivered from Spark; most probable 
cause: failures during I/O operations:
    - **DONE** (it was indeed due to I/O failures when writing from Spark)
  
  **Tests**
  
  @agray Replicating T204440#5112525 
<https://phabricator.wikimedia.org/T204440#5112525> using your SPARQL query 
<https://query.wikidata.org/#select%20distinct%20%3FhopCount%20%3Fviaf_overlap%20%3Fhansard_overlap%20%3Fodnb_overlap%20where%0A%23%20hopcount%20-%20total%20items%20with%20P1614%0A%23%20xxx_overlap%20-%20number%20of%20items%20with%20P1614%20and%20that%20property%0A%7B%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3FhopCount%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fviaf_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP214%20%3Fviaf%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fhansard_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP2015%20%3Fhansard%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fodnb_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP1415%20%3Fodnb%20%7D%0A%20%20%7D%0A%7D>:
  
  Tables tab, selected ID: P1614 <https://phabricator.wikimedia.org/P1614>
  
  - Usage: dashboard reports 17946, your query: 17942;
  - Overlap with VIAF: dashboard reports 2787, your query: 2807;
  - Overlap with Hansard (1803–2005) ID (P2015 
<https://phabricator.wikimedia.org/P2015>): dashboard reports 2369, your query: 
2369;
  - Overlap with ODNB P1415 <https://phabricator.wikimedia.org/P1415>: 
dashboard reports 3171, your query: 3171.
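  
  To put the four comparisons above on one scale, here is a minimal sketch that 
computes the absolute and relative difference for each pair of counts. The 
numbers are copied from the list above; the helper name `discrepancy` and the 
labels are only illustrative, and nothing here is part of the dashboard code.

```python
# Counts copied from the comparison above: (dashboard, SPARQL query).
counts = {
    "P1614 usage": (17946, 17942),
    "P1614 and VIAF (P214)": (2787, 2807),
    "P1614 and Hansard (P2015)": (2369, 2369),
    "P1614 and ODNB (P1415)": (3171, 3171),
}


def discrepancy(dashboard, sparql):
    """Return (absolute difference, difference as % of the SPARQL count)."""
    diff = dashboard - sparql
    return diff, 100.0 * diff / sparql


for label, (dashboard, sparql) in counts.items():
    diff, pct = discrepancy(dashboard, sparql)
    print(f"{label}: {diff:+d} ({pct:+.2f}%)")
```

  All four differences come out below 1% of the SPARQL count, and two of them 
are exactly zero.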
  
  Once again, please bear in mind that we're still processing 
the February dump for the dashboard.
  
  @Jheald Could you please check out the results for P1367 
<https://phabricator.wikimedia.org/P1367> again, and let me know if everything 
is fine now? Thank you.
  
  @VIGNERON
  
  - Mérimée id and VIAF id overlap: dashboard reports 640, SPARQL: 647;
  - usage of Mérimée id: it is used on 48128 items.
  
  @Envlh Thanks once again; the moment you wrote
  
  > My tool checks overlaps only on properties used as statements, not when 
they are used as qualifiers or references...
  
  in T204440#5111313 <https://phabricator.wikimedia.org/T204440#5111313>, I 
figured out where things went wrong, apart from the I/O failures with 
Spark, which are now fixed.
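  
  For reference, a minimal sketch of the statement vs. qualifier/reference 
distinction quoted above. It assumes the usual Wikidata JSON entity layout 
(`claims`, `qualifiers`, `references` with `snaks`); the function names and the 
toy item are hypothetical, and this is not the actual Spark code behind the 
dashboard.

```python
def statement_properties(entity):
    """Properties used as main statements on an entity."""
    return set(entity.get("claims", {}))


def all_properties(entity):
    """Properties used as statements, qualifiers, or references."""
    props = set()
    for pid, statements in entity.get("claims", {}).items():
        props.add(pid)
        for statement in statements:
            # Qualifiers are keyed by property ID.
            props.update(statement.get("qualifiers", {}))
            # Each reference holds its own snaks, also keyed by property ID.
            for reference in statement.get("references", []):
                props.update(reference.get("snaks", {}))
    return props


# Hypothetical toy item: P1614 as a statement, P214 only inside a reference.
item = {
    "claims": {
        "P1614": [{
            "mainsnak": {"property": "P1614"},
            "qualifiers": {},
            "references": [{"snaks": {"P214": [{"property": "P214"}]}}],
        }]
    }
}

pair = {"P1614", "P214"}
print(pair <= statement_properties(item))  # overlap counted on statements only
print(pair <= all_properties(item))        # overlap counted on any usage
```

  The same item counts towards the overlap under one definition and not under 
the other, which is the kind of difference that can separate two tools' counts.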

TASK DETAIL
  https://phabricator.wikimedia.org/T204440

To: GoranSMilovanovic
Cc: Jheald, Pintoch, Pigsonthewing, agray, Envlh, Lea_Lacroix_WMDE, VIGNERON, 
Daniel_Mietchen, connorshea, Moebeus, Multichill, Hjfocs, RazShuty, 
GoranSMilovanovic, Aklapper, Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, 
QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Mbch331