https://bugzilla.wikimedia.org/show_bug.cgi?id=45529

--- Comment #8 from Daniel Kinzler <daniel.kinz...@wikimedia.de> ---
(In reply to comment #3)
> It looks like there are four common select query types in production:
> 
> 1) /* Wikibase\TermSqlIndex::getTermsOfEntity */ select term_language,
> term_type, term_text from `wb_terms` where term_entity_id = ? and
> term_entity_type = ? 

This is getting all the terms defined for an entity. This is mostly used when
determining which term entries need to be updated when an entity is saved.

> 2) /* Wikibase\TermSqlIndex::getMatchingIDs */ select distinct term_entity_id
> from `wb_terms` where (term_language=? and term_search_key like ? and
> term_type=? and term_entity_type=?) or (term_language=? and term_search_key
> like ? and term_type=? and term_entity_type=?) limit ? 

This is used to find entities based on a user provided prefix. We should see
LOTS of those, since they are used for find-as-you-type suggestions in the
search box and in the entity selector.

> 3) /* Wikibase\TermSqlIndex::getMatchingTerms */ SELECT term_entity_type,
> term_type, term_language, term_text, term_entity_id FROM `wb_terms` WHERE
> (term_language=? AND term_type=? AND term_entity_type=?)

This should be relatively rare: it's used to load all labels for all
properties. The number of rows in the result should be the number of properties
defined, which at the moment is about 1000. TermPropertyLabelResolver already
caches the result in memcached.

There should be another variant of the Wikibase\TermSqlIndex::getMatchingTerms
queries, which searches for a specific term_text:

3b) /* Wikibase\TermSqlIndex::getMatchingTerms */ SELECT term_entity_type,
term_type, term_language, term_text, term_entity_id FROM `wb_terms` WHERE
(term_language=? AND term_type=? AND term_entity_type=? AND term_text=? )

This is quite similar to case (2), but using term_text, not term_search_key.

> 4) /* Wikibase\TermSqlIndex::getMatchingTermCombination */ select
> terms?term_entity_type as terms?term_entity_type, terms?term_type as
> terms?term_type, terms?term_language as terms?term_language, terms?term_text 
> [...]

Used to find entities with a given combination of label and description, to
avoid conflicts. This query is performed once per edit.

I suppose the query as such could be optimized - maybe using a subquery is more
efficient here than a full join. Should be investigated.


To cover all these use cases, I'm considering the following indexes:

-- for TermSqlIndex::getMatchingIDs
CREATE INDEX /*i*/term_search ON /*_*/wb_terms (term_language,
term_search_key(12), term_entity_type, term_type);

-- for TermSqlIndex::getTermsOfEntity and for the join in
TermSqlIndex::getMatchingTermCombination
CREATE INDEX /*i*/term_entity ON /*_*/wb_terms (term_entity_type,
term_entity_id, term_type);

-- TermSqlIndex::getMatchingTerms with or without given term_text, as well as
for TermSqlIndex::getMatchingTermCombination
CREATE UNIQUE INDEX /*i*/term_identity ON /*_*/wb_terms (term_language,
term_type, term_entity_type, term_text, term_entity_id);

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to