GoranSMilovanovic added a comment.

  No way this is going to work with Spark `stat.crosstab`:
  
  - `crosstab` collects at most `1e6` pairs from a contingency table,
  - while we're looking at a problem of approximately `55M x 4247+` in size
    (i.e. there are ~55M items to inspect x 4247+ external identifiers to cross-tabulate across the items).
  
  This is going to be tough.
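
  For a rough sense of the mismatch, here is a back-of-the-envelope sketch in plain Python (no Spark needed). The item and identifier counts are the approximate figures quoted above, and the `1e6` cap is the limit mentioned above. The dense cell count is only an upper bound: the real table is sparse, and the actual number of non-zero pairs is not known here. The identifier-by-identifier pair count is an assumption about one possible way to frame the cross-tabulation, not something stated in the task.

```python
# Approximate figures taken from the comment above.
ITEMS = 55_000_000           # ~55M items to inspect
IDENTIFIERS = 4247           # 4247+ external identifiers (lower bound)
CROSSTAB_PAIR_LIMIT = 10**6  # the 1e6 pair limit mentioned above

# Upper bound: every (item, identifier) cell of the dense table.
dense_cells = ITEMS * IDENTIFIERS

# Assumed alternative framing: unordered identifier-by-identifier pairs.
identifier_pairs = IDENTIFIERS * (IDENTIFIERS - 1) // 2

# How many times each count exceeds the limit.
dense_ratio = dense_cells / CROSSTAB_PAIR_LIMIT
pair_ratio = identifier_pairs / CROSSTAB_PAIR_LIMIT

print(f"dense cells:      {dense_cells:,} ({dense_ratio:,.0f}x the limit)")
print(f"identifier pairs: {identifier_pairs:,} ({pair_ratio:.1f}x the limit)")
```

  Even under the smaller identifier-by-identifier framing, the pair count is above the limit, so the computation would have to be expressed some other way than a single `crosstab` call.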

TASK DETAIL
  https://phabricator.wikimedia.org/T214897

To: GoranSMilovanovic
Cc: RazShuty, Addshore, JAllemandou, Aklapper, GoranSMilovanovic, 
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Mbch331