[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
Gehel closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T275133 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse, Gehel Cc: dcausse, Gehel, Aklapper, Invadibot, MPhamWMF, maantietaja, wkandek, JMeybohm, CBogen, Akuckartz, Nandana, Namenlos314, jijiki, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Addshore, Mbch331, Dzahn ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
dcausse claimed this task.
dcausse moved this task from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board.
dcausse added a comment.

I ran a backfill (reaching the app servers directly) using a thread pool of size 6 on each of 12 workers (72 concurrent requests at most), and the impact on the app servers was barely noticeable. We can control the parallelism to some extent using the options of the Flink pipeline itself. The pipeline has been running from YARN with these values for a couple of months now, so I'm tentatively calling this done. We can reconsider using other techniques like our PoolCounter, but I doubt this is worth the effort at this point.

Related pipeline options:

  parallelism: 12
  wikibase_repo_thread_pool_size: 6
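As an illustration of how the two options bound the load, here is a minimal Python sketch. It is not the updater's code: `ConcurrencyProbe`, `run_worker` and the simulated latency are hypothetical stand-ins, and only the two option values come from the comment above. It shows that a fixed-size thread pool caps the number of requests in flight per worker, so the pipeline-wide ceiling is the product of the two options.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Values taken from the pipeline options quoted above.
PARALLELISM = 12                     # number of parallel workers
WIKIBASE_REPO_THREAD_POOL_SIZE = 6   # fetch threads per worker


def max_concurrent_requests(parallelism: int, pool_size: int) -> int:
    """Upper bound on simultaneous requests across the whole pipeline."""
    return parallelism * pool_size


class ConcurrencyProbe:
    """Records the highest number of fetches in flight at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0

    def fetch(self, entity_id: str) -> str:
        # Hypothetical stand-in for an HTTP request to the Wikibase repo.
        with self._lock:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
        time.sleep(0.01)  # simulated backend latency (assumption)
        with self._lock:
            self._in_flight -= 1
        return entity_id


def run_worker(entity_ids: list[str], pool_size: int) -> int:
    """Fetch all ids through a bounded pool; return the peak concurrency seen."""
    probe = ConcurrencyProbe()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(probe.fetch, entity_ids))
    return probe.peak


if __name__ == "__main__":
    ids = [f"Q{i}" for i in range(1, 61)]
    print("pipeline-wide ceiling:",
          max_concurrent_requests(PARALLELISM, WIKIBASE_REPO_THREAD_POOL_SIZE))
    print("peak in one worker:", run_worker(ids, WIKIBASE_REPO_THREAD_POOL_SIZE))
```

The peak observed in one worker can be lower than the pool size but never higher, which is why the product of the two options is a ceiling rather than a typical value.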
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
jijiki added a project: User-jijiki.
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
dcausse added a comment.
Restricted Application added a project: wdwb-tech.

Since we are going to use Envoy to contact the MW application servers, I wonder if this kind of limit could be enforced by it? Today I think the WDQS updaters are talking to the edge caches, so some requests might not reach the app servers, but when using Envoy we will always hit the app servers.

I have no clue what would be a reasonable limit here. I collected some stats on backend timings for the first 7 days of April 2021 (time_firstbyte on cache misses for `/wiki/Special:EntityData/QXYZ.ttl?flavor=dump&revision=XYZ`):

| day of April | count | p50 | p75 | p95 | p99 |
| 1 | 1241154 | 0.083 | 0.104 | 0.157 | 0.212 |
| 2 | 1570675 | 0.084 | 0.105 | 0.156 | 0.210 |
| 3 | 1315251 | 0.083 | 0.103 | 0.153 | 0.209 |
| 4 | 1064852 | 0.081 | 0.102 | 0.155 | 0.209 |
| 5 | 1232205 | 0.081 | 0.103 | 0.154 | 0.209 |
| 6 | 1242875 | 0.082 | 0.103 | 0.156 | 0.209 |
| 7 | 1257607 | 0.082 | 0.103 | 0.157 | 0.212 |
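One rough way to relate these timings to a concurrency limit is Little's law (requests in flight = arrival rate x time in system). The Python sketch below is a back-of-the-envelope estimate only. It assumes the timings are in seconds, that requests are spread evenly over the day, and that a percentile can stand in for the mean latency. The figure of 72 is the concurrency ceiling mentioned elsewhere in this thread, and the function names are made up for the example.

```python
# Day 1 of the table above.
DAY1_COUNT = 1_241_154
DAY1_P50 = 0.083   # time_firstbyte, assumed to be in seconds
DAY1_P99 = 0.212   # time_firstbyte, assumed to be in seconds
SECONDS_PER_DAY = 86_400


def mean_rate(count: int, period_s: int = SECONDS_PER_DAY) -> float:
    """Average requests per second, assuming an even spread over the period."""
    return count / period_s


def mean_in_flight(rate_per_s: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight."""
    return rate_per_s * latency_s


def saturation_rate(concurrency: int, latency_s: float) -> float:
    """Request rate reached if `concurrency` slots are kept permanently busy."""
    return concurrency / latency_s


if __name__ == "__main__":
    rate = mean_rate(DAY1_COUNT)
    print(f"mean rate on day 1: {rate:.1f} req/s")
    print(f"mean in flight at p50 latency: {mean_in_flight(rate, DAY1_P50):.2f}")
    print(f"72 busy slots at p50 latency: {saturation_rate(72, DAY1_P50):.0f} req/s")
    print(f"72 busy slots at p99 latency: {saturation_rate(72, DAY1_P99):.0f} req/s")
```

Under these assumptions, a ceiling of 72 permits a request rate well above the daily average in the table. This is consistent with the limit mattering mainly when a backlog is being consumed, not during normal operation.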
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
MPhamWMF set the point value for this task to "2".
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
MPhamWMF moved this task from All WDQS-related tasks to Current work on the Wikidata-Query-Service board.
MPhamWMF added a project: Discovery-Search (Current work).
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
MPhamWMF updated the task description.
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
Gehel added a parent task: T244590: [Epic] Rework the WDQS updater as an event driven application.
[Wikidata-bugs] [Maniphest] T275133: Limit query parallelism from Flink based WDQS updater to Wikidata
Gehel created this task.
Gehel added projects: serviceops, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION

As an operator of WDQS, I want to ensure that the Updater isn't overloading dependent services by limiting max concurrent requests.

Given the recent incident where the WDQS Flink based streaming updater was blocked as it seemed to generate too much traffic, we want to enforce appropriate limits on the query parallelism it can generate. The implementation is already in place; this is just a matter of finding the appropriate configuration.

This new updater is a lot more efficient than the current one (which is the whole point), so it can potentially generate a lot more load. As a distributed application, this updater has concurrency limits on each node (currently, 12 nodes x 6 threads = 72 concurrent requests max). Note that this maximum is never reached, since each node does other things than querying Wikidata.

In normal operations, the number of requests is limited by the edit rate on Wikidata, and whatever limit we set must be sufficient to support that edit rate. During initial data load, the backlog of edits is consumed as fast as possible, limited by whatever concurrency we set. This is where having a reasonable limit is necessary.

The current updater duplicates requests to Wikidata for each node (18 nodes at the moment). The new updater centralises this, so we can expect an 18x reduction in queries during normal operation once the new updater is pushed to production.

AC:
- parallelism limits are agreed on with Service Ops and configured
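The expected 18x reduction during normal operation can be sketched as simple arithmetic. In this Python example the edit rate is a made-up value, and the model of exactly one fetch per edit per fetching node is an assumption for illustration. Only the node count of 18 comes from the description above.

```python
CURRENT_UPDATER_NODES = 18   # each node fetches every edit itself (from the description)
NEW_UPDATER_FETCHERS = 1     # the new updater fetches each edit once, centrally


def requests_per_second(edit_rate: float, fetching_nodes: int) -> float:
    """Requests sent to Wikidata, assuming one fetch per edit per fetching node."""
    return edit_rate * fetching_nodes


if __name__ == "__main__":
    edit_rate = 10.0  # hypothetical edits per second, not a measured value
    current = requests_per_second(edit_rate, CURRENT_UPDATER_NODES)
    new = requests_per_second(edit_rate, NEW_UPDATER_FETCHERS)
    print(f"current updater: {current:.0f} req/s")
    print(f"new updater: {new:.0f} req/s")
    print(f"reduction factor: {current / new:.0f}x")
```

The reduction factor does not depend on the chosen edit rate, which is why the description can state it without knowing the actual rate.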