Greetings,

This is the weekly update from the Search Platform team for the weeks starting 2019-03-25 and 2019-04-01.
As always, feedback and questions are welcome.

== Discussions ==

=== Search ===

* Elasticsearch upgrade to v6:
** incident [0]
* Trey finished a deep dive into the performance of language identification for cross-wiki searching [1] (example [2]) and punctuation-related problems, and found that things are working pretty well overall, although the Chinese language model is a bit off.
* Erik noticed that the inlabel / incaption keywords should highlight the label/caption but did not [3]
* David worked on fixes for Elasticsearch 6 deprecations: nested_path and nested_filter [4] and _retry_on_conflict [5]
* We worked on migrating mjolnir to stdout/syslog/cee logging output [6]
* The team worked on the upgrade to Elasticsearch 6.5.4 for cirrus, specifically in codfw [7] and in eqiad [8]
* Erik worked on the implementation and testing of glent m0 integration with WMF infrastructure [9]
* David did a lot of work to update the mw-config to use the psi and omega elastic clusters [10]
* David found that auto_generate_phrase_queries is deprecated and ineffective [11]
* The team fixed an old bug where we were getting fatal errors ("cannot perform this operation with arrays") from CirrusSearch/ElasticaWrite (using JobQueueDB) [12]
* Gehel worked to make spicerack more robust when unfreezing writes to Elasticsearch / cirrus [13], and created a cookbook to reset the frozen write state on Elasticsearch / cirrus [14]
* Stas moved the WikibaseLexeme search code to the WikibaseLexemeCirrusSearch extension [15]
* We noticed that Elasticsearch indices went read-only, causing a huge lag [16]
* We also saw that search exception handling was printing response information on the screen [17]
* The team fixed an issue where mwgrep was not working [18]
* We also silenced Elasticsearch 6 deprecation warnings to avoid logspam [19]
* We needed to create extra Elasticsearch clusters in the beta cluster [20]
* We also needed some alerts so we
know if mjolnir starts misbehaving [21]
* We also converted the check_elasticsearch.py icinga plugin to py3 [22]
* We needed to start using a local nginx reverse proxy for connection reuse [23]
* The version of curator that we were using (5.2.0) isn't compatible with Elasticsearch 6, which causes issues in a few cron jobs on the logstash servers. Version 5.6.0 supports both Elasticsearch 5 and 6, so we updated to it [24]
* We also did some cleanup of the reprepro configuration for elasticsearch-curator [25]
* A centralized way to inspect the content of the search profiles might be helpful when investigating search behaviors. In the same vein as the other debug dump APIs (mapping/settings/cirrusdoc), David suggested adding a simple new API to dump the profiles (cirrus-profiles-dump) [26]
* David also found and fixed a "call to a member function toArray() on a non-object (null)" error in vendor/ruflin/elastica/lib/Elastica/Client.php:736 [27]

[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20190327-elasticsearch report
[1] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Review_of_Language_Identification_in_Production,_with_a_Special_Focus_on_Stupid_Identification_Tricks
[2] https://en.wikipedia.org/w/index.php?search=%D0%93%D0%B0%D1%80%D1%80%D0%B8+%D0%9F%D0%BE%D1%82%D1%82%D0%B5%D1%80%D0%B5
[3] https://phabricator.wikimedia.org/T217809
[4] https://phabricator.wikimedia.org/T219266
[5] https://phabricator.wikimedia.org/T219265
[6] https://phabricator.wikimedia.org/T218833
[7] https://phabricator.wikimedia.org/T218878
[8] https://phabricator.wikimedia.org/T218879
[9] https://phabricator.wikimedia.org/T218164
[10] https://phabricator.wikimedia.org/T210381
[11] https://phabricator.wikimedia.org/T219267
[12] https://phabricator.wikimedia.org/T124196
[13] https://phabricator.wikimedia.org/T219640
[14] https://phabricator.wikimedia.org/T219638
[15] https://phabricator.wikimedia.org/T216206
[16] https://phabricator.wikimedia.org/T219364
[17] https://phabricator.wikimedia.org/T216959
[18] https://phabricator.wikimedia.org/T219162
[19] https://phabricator.wikimedia.org/T219269
[20] https://phabricator.wikimedia.org/T213940
[21] https://phabricator.wikimedia.org/T214494
[22] https://phabricator.wikimedia.org/T215439
[23] https://phabricator.wikimedia.org/T215491
[24] https://phabricator.wikimedia.org/T218991
[25] https://phabricator.wikimedia.org/T216235
[26] https://phabricator.wikimedia.org/T218682
[27] https://phabricator.wikimedia.org/T217402

----

Subscribe to receive on-wiki (or opt-in email) notifications of the Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly

The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" or "Volunteer needed" in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R

Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l