GoranSMilovanovic added a subscriber: Jheald.
GoranSMilovanovic added a comment.
@Lydia_Pintscher

- Everything else takes place once the WD JSON dump copy to HDFS (T209655 <https://phabricator.wikimedia.org/T209655>) is in production, and the Analytics-Engineering tell me that is going to take a while.
- I think we should consider investing a bit more of my time here to optimize the dashboard (large datasets --> heavy on client-side processing). Please let me know what you think.

**Status (final):**

- check why 20 - 30% of data are not delivered from Spark; most probable cause: failures due to I/O operations: **DONE** (it was due to I/O failures when writing from Spark indeed)

**Tests**

@agray Replicating T204440#5112525 <https://phabricator.wikimedia.org/T204440#5112525> using your SPARQL query <https://query.wikidata.org/#select%20distinct%20%3FhopCount%20%3Fviaf_overlap%20%3Fhansard_overlap%20%3Fodnb_overlap%20where%0A%23%20hopcount%20-%20total%20items%20with%20P1614%0A%23%20xxx_overlap%20-%20number%20of%20items%20with%20P1614%20and%20that%20property%0A%7B%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3FhopCount%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fviaf_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP214%20%3Fviaf%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fhansard_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP2015%20%3Fhansard%20%7D%0A%20%20%7D%0A%20%20%7B%0A%20%20%20%20select%20%28count%28distinct%20%3Fitem%29%20as%20%3Fodnb_overlap%29%20where%0A%20%20%20%20%7B%20%3Fitem%20wdt%3AP1614%20%3Fhop%20.%20%3Fitem%20wdt%3AP1415%20%3Fodnb%20%7D%0A%20%20%7D%0A%7D>:

Tables tab, selected ID: P1614 <https://phabricator.wikimedia.org/P1614>

- Usage: dashboard reports 17946, your query: 17942;
- Overlap with VIAF: dashboard reports 2787, your query: 2807;
- Overlap with Hansard (1803–2005) ID (P2015 <https://phabricator.wikimedia.org/P2015>): dashboard reports 2369, your query: 2369;
- Overlap with ODNB P1415 <https://phabricator.wikimedia.org/P1415>: dashboard reports 3171, your query: 3171.

Once again, please take into your consideration that we're still processing the February dump for the dashboard.

@Jheald Could you please check out the results for P1367 <https://phabricator.wikimedia.org/P1367> again, and let me know if everything is fine now? Thank you.

@VIGNERON

- Mérimée id and VIAF id overlap: dashboard reports 640, SPARQL: 647;
- usage of Mérimée id: it is used on 48128 items.

@Envlh Thanks once again; the moment you wrote

> My tool checks overlaps only on properties used as statements, not when they are used as qualifiers or references...

in T204440#5111313 <https://phabricator.wikimedia.org/T204440#5111313>, I've figured out where the things went wrong - besides facing I/O failures with Spark which is now fixed.
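
To illustrate the distinction in the quoted remark, here is a minimal Python sketch of how an overlap count can change depending on whether a property is counted only when it is used as a statement, or also when it appears as a qualifier or in a reference. It is not the dashboard's Spark code. The function names and the toy entities are hypothetical; they only mimic the general shape of Wikidata entity JSON (`claims`, `qualifiers`, `references` with `snaks`).

```python
def properties_used(entity, statements_only=True):
    """Return the set of property IDs used in one Wikidata-style entity dict.

    With statements_only=True, only properties used as statements count.
    Otherwise properties used as qualifiers or in references count as well.
    """
    used = set()
    for prop, statements in entity.get("claims", {}).items():
        used.add(prop)  # property used as a statement
        if statements_only:
            continue
        for statement in statements:
            # qualifiers: {property ID: [snaks]}
            used.update(statement.get("qualifiers", {}).keys())
            # references: [{"snaks": {property ID: [snaks]}}]
            for reference in statement.get("references", []):
                used.update(reference.get("snaks", {}).keys())
    return used


def overlap(entities, prop_a, prop_b, statements_only=True):
    """Count the entities that use both prop_a and prop_b."""
    return sum(
        1
        for entity in entities
        if {prop_a, prop_b} <= properties_used(entity, statements_only)
    )


# Hand-made toy data (hypothetical, NOT taken from the dump).
TOY_ENTITIES = [
    # both properties used as statements
    {"id": "toy1", "claims": {"P1614": [{}], "P214": [{}]}},
    # P214 appears only as a qualifier
    {"id": "toy2", "claims": {"P1614": [{"qualifiers": {"P214": [{}]}}]}},
    # P214 appears only inside a reference
    {"id": "toy3", "claims": {"P1614": [{"references": [{"snaks": {"P214": [{}]}}]}]}},
    # P214 only, no P1614 at all
    {"id": "toy4", "claims": {"P214": [{}]}},
]

statements_only_count = overlap(TOY_ENTITIES, "P1614", "P214", statements_only=True)
all_uses_count = overlap(TOY_ENTITIES, "P1614", "P214", statements_only=False)
print(statements_only_count, all_uses_count)
```

On this toy data the two counting modes give different overlap counts, so two tools can disagree on the same data unless they use the same definition of "usage".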
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs