Anomie added a subscriber: Anomie.
Anomie added a comment.

The slowness in the module is due to the database query to fetch the list of 
used tags:

  SELECT ct_tag, 0 AS hitcount FROM change_tag
  GROUP BY ct_tag ORDER BY ct_tag LIMIT 11;

Strangely, increasing the limit seems to make it much faster. The query plans
for the high-limit (101) and low-limit (11) cases look like this:

  > explain SELECT ct_tag, 0 AS hitcount FROM change_tag GROUP BY ct_tag ORDER BY ct_tag LIMIT 101;
  +------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
  | id   | select_type | table      | type  | possible_keys | key               | key_len | ref  | rows | Extra                    |
  +------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
  |    1 | SIMPLE      | change_tag | range | NULL          | change_tag_tag_id | 257     | NULL |   35 | Using index for group-by |
  +------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
  1 row in set (0.00 sec)
  
  > explain SELECT ct_tag, 0 AS hitcount FROM change_tag GROUP BY ct_tag ORDER BY ct_tag LIMIT 11;
  +------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
  | id   | select_type | table      | type  | possible_keys | key               | key_len | ref  | rows     | Extra       |
  +------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
  |    1 | SIMPLE      | change_tag | index | NULL          | change_tag_tag_id | 272     | NULL | 31483031 | Using index |
  +------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
  1 row in set (0.00 sec)

There's also another mode, requested with `tgprop=hitcount` (e.g.
https://www.wikidata.org/w/api.php?format=json&action=query&list=tags&tgprop=displayname|hitcount&continue=),
that does `COUNT(*) AS hitcount` instead. That one uses the slower plan in all
cases.
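One plausible reason the `COUNT(*)` mode never gets the faster plan: a loose index scan can skip the rest of a group only because it doesn't need those rows, and a per-tag count does. As far as I know, MySQL/MariaDB only use "Using index for group-by" for a limited set of aggregates (such as MIN()/MAX()), not COUNT(*). The Python model below is a sketch with made-up data, assuming the index is a sorted list of `ct_tag` values; it shows that counting has to read every entry of every returned group.

```python
from itertools import groupby, islice


def hitcounts(keys, limit):
    """Simplified model of GROUP BY ct_tag ... COUNT(*) over a sorted index.

    A group's count is only known after reading all of its entries, so
    there is nothing to skip: the work grows with the number of rows in
    the first `limit` groups. Returns (list of (key, count), entries_read).
    """
    counts, read = [], 0
    for key, entries in islice(groupby(keys), limit):
        n = sum(1 for _ in entries)  # must touch every entry in the group
        counts.append((key, n))
        read += n
    return counts, read


# Hypothetical index contents: one huge group followed by small ones.
keys = ["a"] * 1000 + ["b"] * 3 + ["c"] * 2 + ["d"]
print(hitcounts(keys, 2))  # ([('a', 1000), ('b', 3)], 1003)
```

The count for "a" alone costs 1000 reads, so any request whose result includes a very large tag pays for all of that tag's rows.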

BTW, I note that the module only returns 7 OAuth entries, although one of
them (cid 93) accounts for 83% of the 31176994 rows in the change_tag table,
and the seven together account for almost 94%.


TASK DETAIL
  https://phabricator.wikimedia.org/T105189

To: Anomie
Cc: Anomie, Sjoerddebruin, Krinkle, Aklapper, Wikidata-bugs, aude, Legoktm, Malyacko, P.Copp



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs