jcrespo created this task.
jcrespo added a project: Wikidata.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  At T246159 <https://phabricator.wikimedia.org/T246159> we can observe a 
really badly formatted query:
  
    SELECT /* 
Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin 
*/ wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id FROM 
`wbt_term_in_lang`, `wbt_text_in_lang`, `wbt_text`, `wbt_item_terms` WHERE 
(wbtl_text_in_lang_id=wbxl_id) AND (wbxl_text_id=wbx_id) AND wbit_item_id IN 
(...) AND wbxl_language IN ('hu', 'en') AND (`wbit_term_in_lang_id` = wbtl_id)
  
  Compare this with the equivalent:
  
    SELECT /* 
Wikibase\Lib\Store\Sql\Terms\DatabaseTermInLangIdsResolver::selectTermsViaJoin 
*/
           wbtl_id, wbtl_type_id, wbxl_language, wbx_text, wbit_item_id
      FROM wbt_term_in_lang
      JOIN wbt_text_in_lang
        ON wbtl_text_in_lang_id = wbxl_id
      JOIN wbt_text
        ON wbxl_text_id = wbx_id
      JOIN wbt_item_terms
        ON wbit_term_in_lang_id = wbtl_id
     WHERE wbit_item_id IN (...) AND wbxl_language IN ('hu', 'en')
  
  (ignore the quotation and justification, and I don't know my heart the right 
columns of each table). It is funny that a method called `selectTermsViaJoin` 
didn't contain any join!
  
  While this could be disregarded as a minor issue, I would like to direct your 
attention to this issue for 2 reasons:
  
  - A badly formatted query like this **has been the direct cause of outages at 
wikimedia database infrastructure**. By convention (outside of Wikimedia, but 
specially on Wikimedia) implicit JOINS are highly discouraged. I don't know how 
even the wikimedia abstraction layer allows them.
  
  JOINS should be explicit and be used with its ON clause. In the past, the 
removal of an ON clause but not removal from the implicit list of tables to 
join caused overload an outage. As a preventive measure, this was considered a 
bad practice and discouraged. This is also a standard among any serious 
professional SQL work. This is hard to read, to the point I could not work with 
the query. **JOIN ... ON is way more legible**. You know with a single view 
which tables are involved and differenciate filtering from joining comparisons, 
preventing mistakes when changing the query.
  
  - This issue would have been caught immediately by DBAs or other developers 
on review and asked to be changed immediately. This leads to the more important 
issue that **review process may be broken** because not even the most obvious 
formatting issue has been corrected on review. I am not looking to assign any 
blame here, I am asking to review the reviewing process for database layer 
changes so obvious mistakes like these are caught on time.

TASK DETAIL
  https://phabricator.wikimedia.org/T246232

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jcrespo
Cc: Aklapper, Marostegui, Ladsgroup, jcrespo, darthmon_wmde, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to