[Wikidata-bugs] [Maniphest] T294961: Resolve kernel hang on wcqs* instances

2021-12-13 Thread Gehel
Gehel closed this task as "Resolved". Gehel claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T294961 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Gehel Cc: Legoktm, Dzahn, MoritzMuehlenhoff, RKemper, Aklapper, Gehel, CBogen, ttaylo

2021-12-02 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

2021-12-02 Thread gerritbot
gerritbot added a comment. Change 743223 **merged** by Ryan Kemper: [operations/puppet@production] Switch WCQS to profile::base::linux510 https://gerrit.wikimedia.org/r/743223

2021-12-02 Thread gerritbot
gerritbot added a project: Patch-For-Review.

2021-12-02 Thread gerritbot
gerritbot added a comment. Change 743223 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper): [operations/puppet@production] Switch WCQS to profile::base::linux510 https://gerrit.wikimedia.org/r/743223

2021-11-30 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

2021-11-30 Thread gerritbot
gerritbot added a comment. Change 742729 **merged** by Ryan Kemper: [operations/puppet@production] Add profile::base::linux419 to the WCQS role https://gerrit.wikimedia.org/r/742729

2021-11-30 Thread gerritbot
gerritbot added a project: Patch-For-Review.

2021-11-30 Thread gerritbot
gerritbot added a comment. Change 742729 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff): [operations/puppet@production] Add profile::base::linux419 to the WCQS role https://gerrit.wikimedia.org/r/742729

2021-11-30 Thread MoritzMuehlenhoff
MoritzMuehlenhoff added a comment. In T294961#7534879, @EBernhardson wrote:
> Another round of import tests completed, nothing fell over. Calling this done for now.
We still need to add profile::base::linux419 to the WCQS roles, otherwi

2021-11-29 Thread EBernhardson
EBernhardson moved this task from Waiting to Needs Reporting on the Discovery-Search (Current work) board. EBernhardson added a comment. Another round of import tests completed, nothing fell over. Calling this done for now.

2021-11-23 Thread EBernhardson
EBernhardson added a comment. Started another round of imports today to see how it goes. If it doesn't fall over, we might as well call this done for now.

2021-11-08 Thread colewhite
colewhite triaged this task as "Medium" priority.

2021-11-08 Thread EBernhardson
EBernhardson added a comment. The import that caused everything to fall over last time completed. I'm not sure that's enough to declare this fixed (it ran once before as well), but after putting the Puppet patch in place we can probably wait on this one to see if it recurs.

2021-11-08 Thread RKemper
RKemper added a comment. In T294961#7489817, @Gehel wrote:
> Let's make sure the kernel version is pinned somewhere in our puppet code! Then we can wait to see if the problem is reproduced or not.
> ryankemper, ebernhardson:

2021-11-08 Thread RKemper
RKemper merged a task: T294865: wcqs1002 and wcqs2001 unresponsive. RKemper added a subscriber: Dzahn.

2021-11-08 Thread Gehel
Gehel added a comment. Let's make sure the kernel version is pinned somewhere in our puppet code! Then we can wait to see if the problem is reproduced or not.

2021-11-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-04T17:23:57Z] T294961 [WCQS] Installed kernel version `Linux 5.10.0-0.bpo.9-amd64` on all wcqs* hosts
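
A check that every host reports the kernel release named in this SAL entry could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tooling used on the task: the function names `kernel_release` and `hosts_off_target` and the sample host-to-release mapping are hypothetical, the 4.19 release string is invented for the example, and only `5.10.0-0.bpo.9-amd64` is taken from the log entry.

```python
from __future__ import annotations

import platform

# Release named in the SAL entry; `uname -r` reports it without the "Linux " prefix.
EXPECTED_RELEASE = "5.10.0-0.bpo.9-amd64"


def kernel_release() -> str:
    """Return the running kernel release of the local host (same value as `uname -r`)."""
    return platform.release()


def hosts_off_target(releases: dict[str, str], expected: str = EXPECTED_RELEASE) -> list[str]:
    """Return the sorted host names whose reported release differs from `expected`."""
    return sorted(host for host, rel in releases.items() if rel != expected)


# Hypothetical sample data: host names follow the wcqs[12]00[123] pattern,
# and the 4.19 release string is made up for illustration.
sample = {
    "wcqs1001": "5.10.0-0.bpo.9-amd64",
    "wcqs1002": "5.10.0-0.bpo.9-amd64",
    "wcqs2001": "4.19.0-18-amd64",
}
print(hosts_off_target(sample))  # prints ['wcqs2001']
```

In practice the mapping would have to be collected from the hosts themselves (for example by running `uname -r` remotely); how that is gathered is outside this sketch.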

2021-11-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-04T16:48:45Z] T294961 [WCQS] Power cycled all 6 wcqs* hosts via the mgmt console (`racadm serveraction powercycle`)

2021-11-04 Thread MoritzMuehlenhoff
MoritzMuehlenhoff added a comment. In T294961#7479399, @EBernhardson wrote:
> Without having better information, I would guess we are triggering a deadlock somewhere in the kernel related to disk writes? But no particular locking is me

2021-11-03 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

2021-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-03T21:56:00Z] T294961 [WCQS] Forcing recheck of `PyBal IPVS diff check` and `PyBal backends health check`

2021-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-03T21:53:32Z] T294961 [WCQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/736564 and successfully ran `ryankemper@cumin1001:~$ sudo cumin 'A:icinga or A:dns

2021-11-03 Thread gerritbot
gerritbot added a comment. Change 736564 **merged** by Ryan Kemper: [operations/puppet@production] wcqs: state change production->lvs_setup https://gerrit.wikimedia.org/r/736564

2021-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-03T21:47:45Z] T294961 [WCQS] DNS changes rolled out, proceeding to the `lvs_setup` step: https://gerrit.wikimedia.org/r/c/operations/puppet/+/736564

2021-11-03 Thread RKemper
RKemper added a comment. DNS changes from https://gerrit.wikimedia.org/r/c/operations/dns/+/736585 - ryankemper@authdns1001:~$ sudo -i authdns-update Updating authdns1001.wikimedia.org (self)... Pulling the

2021-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-03T21:45:46Z] T294961 [WCQS] Merged https://gerrit.wikimedia.org/r/c/operations/dns/+/736585, running `ryankemper@authdns1001:~$ sudo -i authdns-update`

2021-11-03 Thread gerritbot
gerritbot added a comment. Change 736585 **merged** by Ryan Kemper: [operations/dns@master] Revert "wcqs: add discovery record" https://gerrit.wikimedia.org/r/736585

2021-11-03 Thread EBernhardson
EBernhardson updated the task description.

2021-11-03 Thread gerritbot
gerritbot added a comment. Change 736585 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper): [operations/dns@master] Revert "wcqs: add discovery record" https://gerrit.wikimedia.org/r/736585

2021-11-03 Thread gerritbot
gerritbot added a project: Patch-For-Review.

2021-11-03 Thread gerritbot
gerritbot added a comment. Change 736564 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper): [operations/puppet@production] wcqs: state change production->lvs_setup https://gerrit.wikimedia.org/r/736564

2021-11-03 Thread RKemper
RKemper added a parent task: T294865: wcqs1002 and wcqs2001 unresponsive.

2021-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2021-11-03T19:35:49Z] depooled wcqs2003 (pooled=inactive) because Icinga alerts that servers are down but pooled. not in production yet but issues (T294961)

2021-11-03 Thread EBernhardson
EBernhardson added a subscriber: MoritzMuehlenhoff. EBernhardson added a comment. Some random info I looked up:
- Grafana reports free memory of at least 90G across instances. This is typical; the application leans heavily on the Linux disk cache to keep its data in memory.
- Instance

2021-11-03 Thread EBernhardson
EBernhardson updated the task description.

2021-11-03 Thread EBernhardson
EBernhardson added a project: SRE. Restricted Application added a project: wdwb-tech.

2021-11-03 Thread EBernhardson
EBernhardson created this task. EBernhardson added projects: Wikidata, Wikidata-Query-Service, Discovery-Search (Current work). TASK DESCRIPTION While setting up the new wcqs service on wcqs[12]00[123] we started an import, the same import process that we have run many times on the wcqs-beta