[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-06-10 Thread daniel
daniel added a comment.@jcrespo so we should do one query per ID, with limit 1? ok!TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: danielCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-06-10 Thread jcrespo
jcrespo added a comment.@daniel I think this is a case of prematurely optimizing. It is true that things like:

foreach ... { 'SELECT' }

Are usually considered bad practices, but for trying to minimize "round-trip time", we are actually doing a way worse query. Your query cannot know that with only 1 result per IN value it would be enough- it is implemented using a group by.

Look at this quick script I wrote:

time while read id; do mysql -h db1068 commonswiki -e "SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = '$id' LIMIT 1"; done < eu_entity_id

eu_entity_id
Q148475
eu_entity_id
Q54919
eu_entity_id
Q423048
eu_entity_id
Q2494649
eu_entity_id
Q13219454
eu_entity_id
Q131454
eu_entity_id
Q36578
eu_entity_id
Q1798125
eu_entity_id
Q2597810
eu_entity_id
Q19938912
eu_entity_id
Q193563
eu_entity_id
Q2553334
eu_entity_id
Q477675
eu_entity_id
Q623578
eu_entity_id
Q19675
eu_entity_id
Q13481868
eu_entity_id
Q384602

real	0m0.119s
user	0m0.028s
sys	0m0.048s

Despite creating a *new connection* each time, it only takes 0.1 seconds (and I am connecting and disconnecting 15 times! It will take even less if we are not connecting every time, as a normal script would do).

Let's go with the simpler approach (point SELECT queries), and we can later try to optimize the round-trip time if needed. InnoDB is very, very fast to do index lookups thanks to internal hash tables.

Of course, that was not a trivial example, but this happens many times- we assume only a few records per value exist, when in reality, uses explodes to several millions quickly.TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-06-10 Thread daniel
daniel added a comment.@jcrespo @hoo: ick, 14 million rows? And this isn't optimized away because of the DISTINCT?

The intent of the query is: "for the given set of entity IDs, tell me which ones are in the table (at least once)". So the total result will never be larger than the number of IDs in the query.

Instead of looking up batches, we could check each ID individually:

SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = @id LIMIT 1;

The crucial bit here is the LIMIT. Doing this for each ID would perform a lot better than the "bad" case I think. But how does performance compare for the typical case?

Also: Would it help to add a limit to the batched query? Would it be planned/executed differently? I mean like this:

SELECT DISTINCT eu_entity_id  FROM `wbc_entity_usage` 
WHERE eu_entity_id IN ('Q148475', 'Q54919', 'Q423048', 'Q2494649', 'Q13219454', 'Q131454', 'Q36578', 'Q1798125', 'Q2597810', 'Q19938912', 'Q193563', 'Q2553334', 'Q477675', 'Q623578', 'Q19675', 'Q13481868', 'Q384602')
LIMIT 17;

With 17 being the number if IDs in the set. Does this help?TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: danielCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-06-10 Thread jcrespo
jcrespo added a comment.This is not a problem with the servers, the query planner, or the indexing:

MariaDB  db1068 commonswiki > EXPLAIN SELECT  DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*** 1. row ***
   id: 1
  select_type: SIMPLE
table: wbc_entity_usage
 type: index
possible_keys: eu_entity_id
  key: eu_entity_id
  key_len: 300
  ref: NULL
 rows: 17882560
Extra: Using where; Using index
1 row in set (0.03 sec)

MariaDB  db1068 commonswiki > SHOW CREATE TABLE wbc_entity_usage\G
*** 1. row ***
   Table: wbc_entity_usage
Create Table: CREATE TABLE `wbc_entity_usage` (
  `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `eu_entity_id` varbinary(255) NOT NULL,
  `eu_aspect` varbinary(37) NOT NULL,
  `eu_page_id` int(11) NOT NULL,
  `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`eu_row_id`),
  UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`),
  KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=18752832 DEFAULT CHARSET=binary
1 row in set (0.01 sec)

MariaDB  db1070 wikidatawiki > EXPLAIN SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*** 1. row ***
   id: 1
  select_type: SIMPLE
table: wbc_entity_usage
 type: range
possible_keys: eu_entity_id
  key: eu_entity_id
  key_len: 257
  ref: NULL
 rows: 1323
Extra: Using where; Using index
1 row in set (0.05 sec)

MariaDB  db1070 wikidatawiki > SHOW CREATE TABLE wbc_entity_usage\G
*** 1. row ***
   Table: wbc_entity_usage
Create Table: CREATE TABLE `wbc_entity_usage` (
  `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `eu_entity_id` varbinary(255) NOT NULL,
  `eu_aspect` varbinary(37) NOT NULL,
  `eu_page_id` int(11) NOT NULL,
  `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`eu_row_id`),
  UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`),
  KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15125849 DEFAULT CHARSET=binary
1 row in set (0.00 sec)

You are selecting in the first case 14 million rows:

MariaDB  db1068 commonswiki > SELECT count(*)  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*** 1. row ***
count(*): 13997900
1 row in set (1 min 30.30 sec)

and 1328 in the second case:

MariaDB  db1070 wikidatawiki > SELECT count(*) FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');
+--+
| count(*) |
+--+
| 1328 |
+--+
1 row in set (0.00 sec)

I will start killing this query when it takes more than 5 seconds until it is fixed.TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-06-09 Thread hoo
hoo added a comment.db1070 vs. db1068 (different database, cold queries, the fact that the result rows match is just a coincidence):

mysql:wikiadmin@db1068 [commonswiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');   
…  
17 rows in set (5.47 sec) 


mysql:wikiadmin@db1070 [wikidatawiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');  
…
17 rows in set (0.06 sec)TASK DETAILhttps://phabricator.wikimedia.org/T116404EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: hooCc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-04-22 Thread hoo
hoo added a comment.


  I haven't seen it in the error logs recently at least, but I'm not looking at 
them every day.

TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo, hoo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, 
TerraCodes, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2016-04-22 Thread jcrespo
jcrespo added a comment.


  Marius, or someone else, do you know if this is still ongoing after latest 
deployments + hardware upgrades?

TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, 
TerraCodes, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2015-11-07 Thread hoo
hoo added a comment.

I can no longer see this issue on either db1060 nor db1054, but it's still 
reproducible on eg. db1018.


TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo, hoo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2015-11-07 Thread jcrespo
jcrespo added a comment.

The initial issue still happens, although now the query is consistently slow 
every time on both servers.


TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2015-11-07 Thread jcrespo
jcrespo added a comment.

A similar thing is happening on zhwiki for a different query- the optimizer 
seems to have some bug for that wiki in particular?


TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2015-11-07 Thread jcrespo
jcrespo added a comment.

Can confirm last seen on db1018:

  SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN 
  
('Q10864210','Q10866766','Q10874855','Q10877844','Q10877846','Q10878314','Q10879635',
  
'Q10880445','Q10882043','Q10887655','Q10890010','Q10890075','Q10896051','Q10911990',
  
'Q10912402','Q10912474','Q10916748','Q10920782','Q10925493','Q10927910','Q10931742',
  
'Q10933511','Q10934945','Q10937305','Q10938551','Q11063426','Q11066570','Q11072572',
  
'Q11079946','Q11082798','Q11086067','Q11099893','Q11101554','Q11105025','Q11109539',
  
'Q0088','Q11122401','Q11124417','Q11131036','Q11181300','Q1152679','Q13908633',
  
'Q13909197','Q13910038','Q13910640','Q13910996','Q13911366','Q13912288','Q13912391',
  
'Q13912743','Q13912856','Q13912956','Q13913067','Q13913354','Q13913410','Q13913531',
  'Q13913585','Q148','Q19770','Q426996','Q7418222') 

This could be related to the index dive limit.


TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Mbch331, 
Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T116404: EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow

2015-10-29 Thread jcrespo
jcrespo added a comment.

I do not see this happening on enwiki. Checking on other wikis/hosts.


TASK DETAIL
  https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: aude, daniel, hoo, Aklapper, jcrespo, Wikidata-bugs, GWicke, Krenair



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs