[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
daniel added a comment.@jcrespo so we should do one query per ID, with limit 1? ok!TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: danielCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment.@daniel I think this is a case of prematurely optimizing. It is true that things like: foreach ... { 'SELECT' } Are usually considered bad practices, but for trying to minimize "round-trip time", we are actually doing a way worse query. Your query cannot know that with only 1 result per IN value it would be enough- it is implemented using a group by. Look at this quick script I wrote: time while read id; do mysql -h db1068 commonswiki -e "SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = '$id' LIMIT 1"; done < eu_entity_id eu_entity_id Q148475 eu_entity_id Q54919 eu_entity_id Q423048 eu_entity_id Q2494649 eu_entity_id Q13219454 eu_entity_id Q131454 eu_entity_id Q36578 eu_entity_id Q1798125 eu_entity_id Q2597810 eu_entity_id Q19938912 eu_entity_id Q193563 eu_entity_id Q2553334 eu_entity_id Q477675 eu_entity_id Q623578 eu_entity_id Q19675 eu_entity_id Q13481868 eu_entity_id Q384602 real 0m0.119s user 0m0.028s sys 0m0.048s Despite creating a *new connection* each time, it only takes 0.1 seconds (and I am connecting and disconnecting 15 times! It will take even less if we are not connecting every time, as a normal script would do). Let's go with the simpler approach (point SELECT queries), and we can later try to optimize the round-trip time if needed. InnoDB is very, very fast to do index lookups thanks to internal hash tables. Of course, that was not a trivial example, but this happens many times- we assume only a few records per value exist, when in reality, uses explodes to several millions quickly.TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
daniel added a comment.@jcrespo @hoo: ick, 14 million rows? And this isn't optimized away because of the DISTINCT? The intent of the query is: "for the given set of entity IDs, tell me which ones are in the table (at least once)". So the total result will never be larger than the number of IDs in the query. Instead of looking up batches, we could check each ID individually: SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = @id LIMIT 1; The crucial bit here is the LIMIT. Doing this for each ID would perform a lot better than the "bad" case I think. But how does performance compare for the typical case? Also: Would it help to add a limit to the batched query? Would it be planned/executed differently? I mean like this: SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475', 'Q54919', 'Q423048', 'Q2494649', 'Q13219454', 'Q131454', 'Q36578', 'Q1798125', 'Q2597810', 'Q19938912', 'Q193563', 'Q2553334', 'Q477675', 'Q623578', 'Q19675', 'Q13481868', 'Q384602') LIMIT 17; With 17 being the number if IDs in the set. Does this help?TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: danielCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment.This is not a problem with the servers, the query planner, or the indexing: MariaDB db1068 commonswiki > EXPLAIN SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G *** 1. row *** id: 1 select_type: SIMPLE table: wbc_entity_usage type: index possible_keys: eu_entity_id key: eu_entity_id key_len: 300 ref: NULL rows: 17882560 Extra: Using where; Using index 1 row in set (0.03 sec) MariaDB db1068 commonswiki > SHOW CREATE TABLE wbc_entity_usage\G *** 1. row *** Table: wbc_entity_usage Create Table: CREATE TABLE `wbc_entity_usage` ( `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT, `eu_entity_id` varbinary(255) NOT NULL, `eu_aspect` varbinary(37) NOT NULL, `eu_page_id` int(11) NOT NULL, `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0', PRIMARY KEY (`eu_row_id`), UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`), KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`) ) ENGINE=InnoDB AUTO_INCREMENT=18752832 DEFAULT CHARSET=binary 1 row in set (0.01 sec) MariaDB db1070 wikidatawiki > EXPLAIN SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings */ DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G *** 1. row *** id: 1 select_type: SIMPLE table: wbc_entity_usage type: range possible_keys: eu_entity_id key: eu_entity_id key_len: 257 ref: NULL rows: 1323 Extra: Using where; Using index 1 row in set (0.05 sec) MariaDB db1070 wikidatawiki > SHOW CREATE TABLE wbc_entity_usage\G *** 1. row *** Table: wbc_entity_usage Create Table: CREATE TABLE `wbc_entity_usage` ( `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT, `eu_entity_id` varbinary(255) NOT NULL, `eu_aspect` varbinary(37) NOT NULL, `eu_page_id` int(11) NOT NULL, `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0', PRIMARY KEY (`eu_row_id`), UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`), KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`) ) ENGINE=InnoDB AUTO_INCREMENT=15125849 DEFAULT CHARSET=binary 1 row in set (0.00 sec) You are selecting in the first case 14 million rows: MariaDB db1068 commonswiki > SELECT count(*) FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G *** 1. row *** count(*): 13997900 1 row in set (1 min 30.30 sec) and 1328 in the second case: MariaDB db1070 wikidatawiki > SELECT count(*) FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602'); +--+ | count(*) | +--+ | 1328 | +--+ 1 row in set (0.00 sec) I will start killing this query when it takes more than 5 seconds until it is fixed.TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
hoo added a comment.db1070 vs. db1068 (different database, cold queries, the fact that the result rows match is just a coincidence): mysql:wikiadmin@db1068 [commonswiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings */ DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602'); … 17 rows in set (5.47 sec) mysql:wikiadmin@db1070 [wikidatawiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings */ DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602'); … 17 rows in set (0.06 sec)TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: hooCc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
hoo added a comment. I haven't seen it in the error logs recently at least, but I'm not looking at them every day. TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo, hoo Cc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, TerraCodes, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment. Marius, or someone else, do you know if this is still ongoing after latest deployments + hardware upgrades? TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, TerraCodes, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
hoo added a comment. I can no longer see this issue on either db1060 nor db1054, but it's still reproducible on eg. db1018. TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo, hoo Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment. The initial issue still happens, although now the query is consistently slow every time on both servers. TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment. A similar thing is happening on zhwiki for a different query- the optimizer seems to have some bug for that wiki in particular? TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment. Can confirm last seen on db1018: SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q10864210','Q10866766','Q10874855','Q10877844','Q10877846','Q10878314','Q10879635', 'Q10880445','Q10882043','Q10887655','Q10890010','Q10890075','Q10896051','Q10911990', 'Q10912402','Q10912474','Q10916748','Q10920782','Q10925493','Q10927910','Q10931742', 'Q10933511','Q10934945','Q10937305','Q10938551','Q11063426','Q11066570','Q11072572', 'Q11079946','Q11082798','Q11086067','Q11099893','Q11101554','Q11105025','Q11109539', 'Q0088','Q11122401','Q11124417','Q11131036','Q11181300','Q1152679','Q13908633', 'Q13909197','Q13910038','Q13910640','Q13910996','Q13911366','Q13912288','Q13912391', 'Q13912743','Q13912856','Q13912956','Q13913067','Q13913354','Q13913410','Q13913531', 'Q13913585','Q148','Q19770','Q426996','Q7418222') This could be related to the index dive limit. TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
jcrespo added a comment. I do not see this happening on enwiki. Checking on other wikis/hosts. TASK DETAIL https://phabricator.wikimedia.org/T116404 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Krenair ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs