GoranSMilovanovic added a comment.
This is not going to work with Spark's `stat.crosstab`:
- `crosstab` returns at most `1e6` non-zero pairs from a contingency table,
- while we are looking at a problem of approximately `55M x 4247+` in size (i.e. there are ~55M items to inspect x 4247+ external identifiers to cross-tabulate across the items).

This is going to be tough.

TASK DETAIL
https://phabricator.wikimedia.org/T214897
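To put the sizes quoted in the comment above into numbers, here is a rough back-of-the-envelope sketch. It uses only the figures from the comment (~55M items, 4247 identifiers as a lower bound, and the `1e6` pair limit); the constant and function names are made up for illustration. The dense cell count is an upper bound: the number of non-zero pairs actually depends on how sparse the item x identifier table is, which the comment does not state.

```python
# Rough size check using the approximate figures quoted in the comment.
N_ITEMS = 55_000_000              # "~55M items" (approximate)
N_IDENTIFIERS = 4_247             # "4247+" external identifiers (lower bound)
CROSSTAB_PAIR_LIMIT = 1_000_000   # the 1e6 pair limit cited for stat.crosstab


def dense_cells(n_rows: int, n_cols: int) -> int:
    """Number of cells in a fully dense n_rows x n_cols contingency table."""
    return n_rows * n_cols


cells = dense_cells(N_ITEMS, N_IDENTIFIERS)
ratio = cells / CROSSTAB_PAIR_LIMIT

print(f"{cells:,} cells in the dense item x identifier table")
print(f"{ratio:,.0f}x the crosstab pair limit")
```

Even if only a small fraction of those cells are non-zero, the result would still be far above the limit, which is consistent with the conclusion in the comment.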