[Bug 1890432] Re: Create subnet is failing under high load with OVN

2021-02-01 Thread Jason Hobbs
sub'd to field high, this is affecting sqa release testing. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890432 Title: Create subnet is failing under high load with OVN To manage notifications ab

[Bug 1908108] [NEW] tg3: transmit timed out, resetting

2020-12-14 Thread Jason Hobbs
Public bug reported: On a deploy of kubernetes, we're seeing a machine have issues with its tg3 driven nics. We see: Dec 14 07:44:08 juju-fcf29c-0-lxd-1 kernel: [ 1496.772960] tg3 :02:00.1 eth1: transmit timed out, resetting Around that time, we have issues with services losing network conn

[Bug 1900016] Re: [SRU] pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-11-09 Thread Jason Hobbs
I tested the resource-agents package from focal-proposed and it fixed it for me. Marking verification-complete. Logs: http://paste.ubuntu.com/p/F5yDkV2wKS/ ** Tags removed: verification-needed verification-needed-focal ** Tags added: verification-done verification-done-focal

Re: [Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-21 Thread Jason Hobbs
We're not having any issues with the VIP moving when it's supposed to. I don't really understand what the command you suggested does, but it's not really relevant to our problem. As you say, the problem is that the xml is missing the node attributes right after the deployment. Jason On Wed, Oct 2

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-20 Thread Jason Hobbs
Here's my commands and output: https://paste.ubuntu.com/p/R4f5xX6QPq/

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-20 Thread Jason Hobbs
I can use my configuration for the test case and do the validation, no problem. Do you need anything from me right now?

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-19 Thread Jason Hobbs
Here's crm configure show: https://paste.ubuntu.com/p/Mqcn7HMzng/

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-15 Thread Jason Hobbs
Dropped back to field-high since we can hotpatch as a workaround, because there will be no additional package updates that don't contain this fix.

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-15 Thread Jason Hobbs
Bumped to field crit as we don't have a good workaround for this. We could hotpatch the resource agent, but that only lasts until the package is updated again, and then crm status output for pgsql will be broken again.

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-15 Thread Jason Hobbs
Here they are with regex that accepts either version:

is_node_online() {
    print_crm_mon | tr '[A-Z]' '[a-z]' | grep -e "^\( \* \)\?node $1 " -e "^\( \* \)\?node $1:" | grep -q -v "offline"
}

node_exist() {
    print_crm_mon | tr '[A-Z]' '[a-z]' | grep -q "^\( \* \)\?node $1"
}
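
To make the intent of the two shell functions above easier to check, here is a rough Python sketch of the same matching logic. The sample lines are made up to exercise both branches of the regex (with and without a leading " * "); they are not captured crm_mon output, and the function names simply mirror the shell ones.

```python
import re

# Hypothetical sample lines, shaped only to hit both regex branches.
OLD_STYLE = "Node pg1: online"
NEW_STYLE = " * Node pg1: online"


def node_exist(crm_mon_output: str, name: str) -> bool:
    # Equivalent of: tr '[A-Z]' '[a-z]' | grep -q "^\( \* \)\?node NAME"
    # Note: re.escape treats the name literally, while grep would not.
    pattern = re.compile(r"^(?: \* )?node " + re.escape(name))
    return any(pattern.match(line) for line in crm_mon_output.lower().splitlines())


def is_node_online(crm_mon_output: str, name: str) -> bool:
    # Equivalent of the two -e expressions (name followed by a space or a
    # colon), then keeping only matching lines that lack "offline".
    pattern = re.compile(r"^(?: \* )?node " + re.escape(name) + r"[ :]")
    return any(
        pattern.match(line) and "offline" not in line
        for line in crm_mon_output.lower().splitlines()
    )
```

As in the shell version, only the crm_mon text is lowercased, so the node name passed in is expected to be lowercase already.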

[Bug 1900016] Re: pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-15 Thread Jason Hobbs
sub'd to field high; this breaks our ability to validate postgres HA on focal.

[Bug 1900016] [NEW] pgsql resource agent uses regexes for old crm_mon format, breaks pgsql-status and pgsql-data-status attributes

2020-10-15 Thread Jason Hobbs
Public bug reported: There is a bug in the resource agent's node_exist function. It looks at crm_mon output, which has changed between bionic and focal. The result is that the 'pgsql-status' and 'pgsql-data-status' attributes are missing from crm status --as-xml output on focal. Here is the foca

[Bug 1899822] Re: hang during purging postgresql-common

2020-10-14 Thread Jason Hobbs
we were missing DEBIAN_FRONTEND=noninteractive ** Changed in: postgresql-common (Ubuntu) Status: New => Invalid
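
For anyone scripting a similar cleanup, a minimal sketch of passing that setting along with the purge command is below. The helper name `purge_command` is hypothetical, and the function only builds the command and environment; it does not run anything.

```python
import os


def purge_command(package: str):
    """Return (argv, env) for a non-interactive purge; nothing is executed."""
    env = dict(os.environ)
    # Tell debconf not to prompt, so maintainer scripts cannot block
    # waiting for input on an unattended run.
    env["DEBIAN_FRONTEND"] = "noninteractive"
    argv = ["apt-get", "-q", "autoremove", "--purge", package, "-y"]
    return argv, env
```

The result could be handed to `subprocess.run(argv, env=env, check=True)` as root. If the command goes through sudo, the variable may need to be set on the sudo command line instead, since sudo commonly resets the environment.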

[Bug 1899822] [NEW] hang during purging postgresql-common

2020-10-14 Thread Jason Hobbs
Public bug reported: purging postgresql-common on 20.04 results in a hang: root 269255 0.0 0.0 11004 4672 ?Ss 16:44 0:00 | \_ sudo bash --login -c apt-get -q autoremove --purge postgresql-common -y root 269256 0.4 0.0 69708 58152 ?S16:44 0:00 |

[Bug 1899822] Re: hang during purging postgresql-common

2020-10-14 Thread Jason Hobbs
sub'd to field high; this blocks cleaning in our CI.

[Bug 1899187] [NEW] in python 3.8, ActionAPI.__call__ breaks with "RuntimeError: dictionary keys changed during iteration" #246

2020-10-09 Thread Jason Hobbs
Public bug reported: See https://github.com/maas/python-libmaas/issues/246 This is fixed upstream and needs to be fixed in Ubuntu too. ** Affects: python-libmaas (Ubuntu) Importance: Undecided Assignee: Adam Collard (adam-collard) Status: New ** Tags: cdo-qa foundations-engi
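
For readers unfamiliar with this error, here is a generic illustration of how it arises on Python 3.8 and later. This is not the python-libmaas code; the function names and the key-renaming task are invented for the example.

```python
def rename_keys_unsafe(params):
    # Removes and re-adds keys while iterating over the dict itself.
    # The size stays the same, so Python 3.8+ reports changed keys.
    for key in params:
        params[key.replace("_", "-")] = params.pop(key)
    return params


def rename_keys_safe(params):
    # Iterates over a snapshot of the keys, so mutation is harmless.
    for key in list(params):
        params[key.replace("_", "-")] = params.pop(key)
    return params


def raises_runtime_error(func, arg):
    # Small helper to show which variant fails.
    try:
        func(arg)
    except RuntimeError:
        return True
    return False
```

Taking a `list(...)` copy of the keys before the loop is the usual way to avoid this class of failure.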

[Bug 1899187] Re: in python 3.8, ActionAPI.__call__ breaks with "RuntimeError: dictionary keys changed during iteration" #246

2020-10-09 Thread Jason Hobbs
sub'd to field high; this blocks FCE from working on focal.

[Bug 1880959] Re: Rules from the policy directory files are not reapplied after changes to the primary policy file

2020-07-08 Thread Jason Hobbs
sub'd to field high as this is blocking field high bug 1818113

[Bug 1861457] Re: pyroute2 0.5.2 doesn't support neutron-common 14.0.4

2020-02-10 Thread Jason Hobbs
I tested with the package from stein-proposed (python3-pyroute2 0.5.4-0ubuntu0.19.04.1~cloud1) and got successful results. The issue I was seeing in bug 1862200 is no longer occurring. I'm marking this verification-stein-done. ** Tags removed: verification-stein-needed ** Tags added: verification-

[Bug 1861457] Re: pyroute2 0.5.2 doesn't support neutron-common 14.0.4

2020-02-10 Thread Jason Hobbs
I'm getting errors using proposed: http://paste.ubuntu.com/p/6wMxVZbYqz/ 2020-02-10 15:24:38 DEBUG install Setting up python3-pyroute2 (0.5.4-0ubuntu0.19.04.1~cloud0) ... 2020-02-10 15:24:38 DEBUG install update-alternatives: using /usr/bin/python3-ss2 to provide /usr/bin/ss2 (ss2) in auto mode

[Bug 1861457] Re: pyroute2 0.5.2 doesn't support neutron-common 14.0.4

2020-02-10 Thread Jason Hobbs
bundle for comment #12 http://paste.ubuntu.com/p/4mC9v237H9/ ** Tags removed: verification-stein-needed ** Tags added: verification-stein-failed

[Bug 1844543] Re: timeout removing bcache device when lvm is over bcache

2019-12-17 Thread Jason Hobbs
I tested with 19.3-787-gb022ed4-0ubuntu1+228~trunk~ubuntu18.04.1 and it can repeatedly install just fine - no issues with the setup described above.

[Bug 1844543] Re: timeout removing bcache device when lvm is over bcache

2019-12-17 Thread Jason Hobbs
20:37 < rharper> powersj: jhobbs: re: curtin SRU; I had planned to SRU in Nov, but the fix that landed at the time was not complete; I was still able to recreate the failure. We have a more complete fix that's passing all of the vmtest scenarios with bcache; that's landed, so likely SRU will start

[Bug 1846535] Re: cloud-init 19.2.36 fails with python exception "Not all expected physical devices present ..." during bionic image deployment from MAAS

2019-10-04 Thread Jason Hobbs
I successfully verified the bionic fix on MAAS. Here's what I did: 1. deployed a machine with a bridge via maas 2. machine went to deployed mode, couldn't ssh in 3. switched to rescue mode, ssh'd in 4. mounted /, captured cloud-init-output.log with error: http://paste.ubuntu.com/p/Gg53xf9wtZ/ 5.

Re: [Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout

2019-08-22 Thread Jason Hobbs
@ Ryan we do not test Xenial or Disco On Thu, Aug 22, 2019 at 7:41 PM Ryan Harper <1784...@bugs.launchpad.net> wrote: > Finally, I did verify xenial proposed with our original test. I had > over 100 installs with no issue. > > @Jason > > Have you had any runs on Xenial or Disco? (or do you not

[Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout

2019-08-22 Thread Jason Hobbs
** Changed in: linux (Ubuntu Bionic) Status: Fix Committed => New

[Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout

2019-08-22 Thread Jason Hobbs
** Attachment added: "spinda.maas-curtin_config.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1784665/+attachment/5284072/+files/spinda.maas-curtin_config.txt

[Bug 1784665] Re: bcache: bch_allocator_thread(): hung task timeout

2019-08-22 Thread Jason Hobbs
We're still seeing a bcache timeout failure during curtin install 2019-08-22T10:16:40+00:00 spinda cloud-init[1604]: finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices 2019-08-22T10:16:40+00:00 spinda cloud-init[1604]:

Re: [Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures

2019-07-03 Thread Jason Hobbs
This is difficult for us to test in our lab because we are using MAAS, and we hit this during MAAS deployments of nodes, so we would need MAAS images built with these kernels. Additionally, this doesn't reproduce every time, it is maybe 1/4 test runs. It may be best to find a way to reproduce this
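
As a rough guide to how many clean runs a fix would need before it can be trusted, here is a small worked calculation. It assumes the failure rate really is about 1 in 4 and that runs are independent, neither of which is established in this thread; the function name is made up for the example.

```python
import math


def clean_runs_needed(failure_rate: float, max_false_pass: float) -> int:
    # With the bug still present, n consecutive clean runs happen with
    # probability (1 - failure_rate) ** n. Find the smallest n that pushes
    # this below max_false_pass.
    return math.ceil(math.log(max_false_pass) / math.log(1.0 - failure_rate))


# About 1 failure in 4 runs, accepting at most a 5% chance of being fooled.
runs = clean_runs_needed(0.25, 0.05)
```

Under those assumptions `runs` comes out at 11, since 0.75 ** 10 is still about 0.056 while 0.75 ** 11 is about 0.042.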

[Bug 1777512] Re: key retrieval timeouts cause failures

2019-05-21 Thread Jason Hobbs
** No longer affects: cdoqa-system-tests

[Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures

2019-05-14 Thread Jason Hobbs
** Tags added: cdo-qa foundations-engine

[Bug 1777512] Re: key retrieval timeouts cause failures

2019-05-09 Thread Jason Hobbs
** Tags added: cpe-onsite

[Bug 1777512] Re: rt #112309: keyserver.ubuntu.com increased failure rates over past few days

2019-05-09 Thread Jason Hobbs
Sub'd ~field-high. ** Also affects: software-properties (Ubuntu) Importance: Undecided Status: New ** Summary changed: - rt #112309: keyserver.ubuntu.com increased failure rates over past few days + key retrieval timeouts cause failures ** Description changed: - keyserver failures mo

[Bug 1796292] Re: Tight timeout for bcache removal causes spurious failures

2019-05-06 Thread Jason Hobbs
This occurs on a target machine during maas install. Apport is not collected in this case. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed

[Bug 1594317] Re: Cannot start lxd-bridge.service when MAAS is managing DNS

2019-04-26 Thread Jason Hobbs
MAAS is installing bind9 and configuring it for its own purposes - it provides other config for bind, and it's perfectly reasonable to expect maas to configure it to only listen on interfaces MAAS wants to provide DNS services on. ** Changed in: maas Status: Invalid => New ** Tags added: c

[Bug 1812935] Re: oslo cache mempool issues with python3

2019-04-16 Thread Jason Hobbs
With the package 1.30.1-0ubuntu1.1 from rocky-proposed in place, I ran through the test case for this and did not hit any failures. I can verify this fixes the issue for rocky. ** Tags removed: verification-rocky-needed ** Tags added: verification-rocky-done

[Bug 1812935] Re: oslo cache mempool issues with python3

2019-04-09 Thread Jason Hobbs
** Description changed: nova conductor running on a rhel8 host inside f28 based containers hits the following error: 2019-01-17 13:59:37.049 46 DEBUG oslo_concurrency.lockutils [req-284f3071-8eee-4dcb-903c-838f2e024b48 40ca1490773f49f791d3a834af3702c8 8671bdf05abf48f58a9bdcdb0ef4b740 - defa

[Bug 1812935] Re: oslo cache mempool issues with python3

2019-04-09 Thread Jason Hobbs
** Tags added: cdo-qa cdo-release-blocker foundations-engine

[Bug 1797581] Re: Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to kernel panic

2019-03-21 Thread Jason Hobbs
@Christian - release: bionic - seabios: 1.10.2-1ubuntu1 - qemu: 1:2.11+dfsg-1ubuntu7.10 - libvirt: 4.0.0-1ubuntu8.8 - ovmf - this is a uefi thing right? we're not using it. - kernel 2019-03-18T12:17:11+00:00 elastic-2 kernel: [0.00] Linux version 4.15.0-46-generic (buildd@lgw01-amd64-038)

[Bug 1797581] Re: Composing a VM in MAAS with exactly 2048 MB RAM causes the VM to kernel panic

2019-03-20 Thread Jason Hobbs
Bumped to field-high as we ran into this again in testing. We have a workaround, but it's to not use 2G VM's, which is really silly and hard to remember when we go and add new deployments, especially because the failure mode is not obvious at all.

[Bug 1792978] Re: initscript avahi-daemon, action "start" failed

2019-03-20 Thread Jason Hobbs
1) Testing 2) No, it's in baremetal 3) Attached. 4) The machine was deployed with maas - it's a bionic cloud image. We then install/configure maas, run some tests, and remove maas via purge, and repeat. This shows up sometimes on deployments after the initial one - avahi-daemon, or at least its

Re: [Bug 1820287] Re: kernel panic during pxe boot on DL360 gen9

2019-03-16 Thread Jason Hobbs
This happens only sporadically. If it happens, is there some keyboard sequence I can use to dump more information, or is the system totally frozen at this point? Jason On Sat, Mar 16, 2019 at 11:35 AM Kai-Heng Feng wrote: > Would it be possible to get earlier trace? > > -- > You received this b

[Bug 1820287] Re: kernel panic during pxe boot on DL360 gen9

2019-03-15 Thread Jason Hobbs
I can't get logs from the system because it's kernel panic'd. ** Changed in: linux (Ubuntu) Status: Incomplete => New ** Changed in: linux (Ubuntu) Status: New => Confirmed -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu

[Bug 1820287] [NEW] kernel panic during pxe boot on DL360 gen9

2019-03-15 Thread Jason Hobbs
Public bug reported: A machine in our test lab kernel panic'd during PXE boot from MAAS. It was running 4.15.0-46-generic #49-Ubuntu I've attached a screenshot of the call trace. ** Affects: linux (Ubuntu) Importance: Undecided Status: New ** Tags: cdo-qa foundations-engine **

[Bug 1594317] Re: Cannot start lxd-bridge.service when MAAS is managing DNS

2019-03-13 Thread Jason Hobbs
** Tags added: cdo-qa foundations-engine

[Bug 1792978] Re: initscript avahi-daemon, action "start" failed

2019-01-28 Thread Jason Hobbs
Subscribed to field-high as this is causing a lot of failures lately.

[Bug 1792978] Re: initscript avahi-daemon, action "start" failed

2019-01-26 Thread Jason Hobbs
In the previous package removal, we are seeing: [10.244.40.30] out: Purging configuration files for avahi-daemon (0.7-3.1ubuntu1.1) ... [10.244.40.30] out: rmdir: failed to remove '/var/run/avahi-daemon': Directory not empty

[Bug 1802355] Re: systemd-resolve is missing nameservers until interface is bounced

2018-11-08 Thread Jason Hobbs
I've configured our test runs to turn debug logging on for networkd and resolved.

[Bug 1802355] [NEW] systemd-resolve is missing nameservers until interface is bounced

2018-11-08 Thread Jason Hobbs
Public bug reported: On some deployments of bionic, using maas and juju, my system ends up not being able to resolve hostnames. This happens inconsistently, it seems like a race. systemd-resolve is showing no nameservers https://pastebin.canonical.com/p/tyFw8TfSxk rharper had a look at one repro

[Bug 1774666] Re: Bond interfaces stuck at 1500 MTU on Bionic

2018-09-12 Thread Jason Hobbs
Marked as Fix Released on Bionic/Xenial because the SRU for bug 1777912 is done. I can't make Artful "Won't Fix", but it should be. ** Changed in: cloud-init (Ubuntu Xenial) Status: Confirmed => Fix Released ** Changed in: cloud-init (Ubuntu Bionic) Status: Confirmed => Fix Released

[Bug 1651497] Re: iscsid.service fails to start in container, results in failed dist-upgrade on

2018-08-29 Thread Jason Hobbs
** Summary changed: - iscsid.service fails to start in container, results in failed apt-get install later on + iscsid.service fails to start in container, results in failed dist-upgrade on ** Summary changed: - iscsid.service fails to start in container, results in failed dist-upgrade on + iscs

[Bug 1651497] Re: install Job for iscsid.service failed because a configured resource limit was exceeded

2018-08-29 Thread Jason Hobbs
We're actually seeing iscsi errors show up as soon as the container starts; I guess the iscsid service never starts properly and so when the charm does an apt-get install of some unrelated packages, something is trying to set up that iscsi service and it's failing again. I've attached logs from a

[Bug 1651497] Re: install Job for iscsid.service failed because a configured resource limit was exceeded

2018-08-27 Thread Jason Hobbs
We're still hitting this in LXD.

[Bug 1774666] Re: Bond interfaces stuck at 1500 MTU on Bionic

2018-06-05 Thread Jason Hobbs
Subscribed to Canonical Field High SLA.

[Bug 1774666] Re: Bond interfaces stuck at 1500 MTU on Bionic

2018-06-05 Thread Jason Hobbs
This is causing test failures for us, because containers deployed by juju that are bound to a space that sits on top of the bond have the correct mtu (9000) but the bond's mtu is stuck at (1500), so packets are being dropped. curtin config for the machine: http://paste.ubuntu.com/p/8tMR2YBGYm/ cl
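
A check for this kind of mismatch could look something like the sketch below. The interface names are hypothetical and the MTU values are the ones quoted in the report; on a live system they could be read from `/sys/class/net/<interface>/mtu`.

```python
def mtu_mismatches(parent_mtu: int, child_mtus: dict) -> dict:
    # Any interface stacked on the parent with a larger MTU can emit frames
    # the parent will not carry, which shows up as dropped packets.
    return {name: mtu for name, mtu in child_mtus.items() if mtu > parent_mtu}


# Bond stuck at 1500 while a container interface expects 9000.
problems = mtu_mismatches(1500, {"container-eth0": 9000, "mgmt0": 1500})
```

An empty result would mean nothing above the bond expects more than the bond can deliver.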

[Bug 1774666] Re: Bond interfaces stuck at 1500 MTU on Bionic

2018-06-05 Thread Jason Hobbs
We are seeing this in our test runs as well. ** Tags added: cdo-qa foundations-engine

[Bug 1772947] Re: You have enabled the binary log, but you haven't provided the mandatory server-id.

2018-06-01 Thread Jason Hobbs
After testing with the new version of the charm, we're not seeing this anymore. Looks fixed to me.

[Bug 1607345] Re: Collect all logs needed to debug curtin/cloud-init for each deployment

2018-05-30 Thread Jason Hobbs
** Tags added: cdo-qa foundations-engine

[Bug 1772490] Re: 'Deploying' timed out after 40 minutes / Failedbcache: register_bcache() error

2018-05-23 Thread Jason Hobbs
*** This bug is a duplicate of bug 1768893 *** https://bugs.launchpad.net/bugs/1768893 ** This bug has been marked a duplicate of bug 1768893 installation on several nodes failed with errors relating to dmsetup remove of ceph devices.

Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
> Looks like the ls -aLR contains more data; we can compare bionic. > > On Tue, May 22, 2018 at 6:53 PM, Jason Hobbs > wrote: >> cd /sys/bus/pci/devices && grep -nr . * >> >> xenial: >> http://paste.ubuntu.com/p/F5qyvN2Qrr/ >> >> On Tue, May

Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
cd /sys/bus/pci/devices && grep -nr . * xenial: http://paste.ubuntu.com/p/F5qyvN2Qrr/ On Tue, May 22, 2018 at 5:27 PM, Jason Hobbs wrote: > Do you really want a tar? How about ls -alR? xenial: > > http://paste.ubuntu.com/p/wyQ3kTsyBB/ > > On Tue, May 22, 2018 at 5:14 PM

Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
Do you really want a tar? How about ls -alR? xenial: http://paste.ubuntu.com/p/wyQ3kTsyBB/ On Tue, May 22, 2018 at 5:14 PM, Jason Hobbs wrote: > ok; looks like that 4.15.0-22-generic just released and wasn't what I > used in the first reproduction... I doubt that's it. > >

Re: [Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
0_" name="/bin/" > pid=91949 comm="(y-helper)" flags="ro, remount, bind" > > > xenial.log:May 22 15:15:10 aurorus kernel: [ 918.311740] audit: > type=1400 audit(1527002110.131:109): apparmor="DENIED" > operation="file_mmap"

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
marked new on nova-compute-charm due to rharper's comment #18, and new on libvirt because I've posted all the requested logs now.

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-22 Thread Jason Hobbs
@rharper, here are the logs you requested from the xenial deploy. ** Attachment added: "xenial-logs-1771662.tgz" https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+attachment/5142976/+files/xenial-logs-1771662.tgz ** Changed in: charm-nova-compute Status: Invalid => New ** Ch

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-18 Thread Jason Hobbs
Christian, thanks for digging in. Yes, I really just set up base openstack and hit this condition. I'm not doing anything to set up devices as passthrough or anything along those lines, and I'm not trying to start instances.

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
all of /var/log and /etc from the bionic deploy. ** Attachment added: "bionic-var-log-and-etc.tgz" https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+attachment/5141000/+files/bionic-var-log-and-etc.tgz

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
@rharper here are the logs you asked for from the bionic deploy ** Attachment added: "bionic-logs.tgz" https://bugs.launchpad.net/charm-nova-compute/+bug/1771662/+attachment/5140998/+files/bionic-logs.tgz

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
@rharper still working on getting the other stuff you've asked for, but here is the uname -a output from xenial vs bionic: http://paste.ubuntu.com/p/rJDpK5SyW9/

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
steve captured what I meant in #8 better than I did: 17:46 < slangasek> one could as accurately say "I'm suspicious this is related to us replacing the whole networking stack in Ubuntu" ;-)

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
This looks like it is specific to this hardware and the way it does VFs and PFs, so I'm removing field-high.

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
given it works with the same libvirt and kernel on 16.04 but not 18.04, I'm suspicious of netplan here.

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
The deploy works fine with juju 2.4 beta 2 and xenial/queens. package versions: http://paste.ubuntu.com/p/PF7Jb7gxnX/ we do see this in nova-compute.log, but it's not fatal: http://paste.ubuntu.com/p/Dh4ZGVTtH8/

[Bug 1771662] Re: libvirtError: Node device not found: no node device with matching name

2018-05-17 Thread Jason Hobbs
** Description changed: After deploying openstack on arm64 using bionic and queens, no hypervisors show up. On my compute nodes, I have an error like: 2018-05-16 19:23:08.165 282170 ERROR nova.compute.manager libvirtError: Node device not found: no node device with matching name 'ne

[Bug 1579652] Re: snap ignores the proxy environment variables

2018-05-14 Thread Jason Hobbs
I got this to work: http://paste.ubuntu.com/p/K8VncJv4vp/

[Bug 1766338] Re: package shim-signed 1.34.4+13-0ubuntu2 failed to install/upgrade: installed shim-signed package post-installation script subprocess returned error exit status 1

2018-04-24 Thread Jason Hobbs
This started to affect our solutions-qa test runs yesterday/last night at some point - we can no longer deploy bionic on uefi systems: We're not using dkms packages at all; here is the failed install log from maas: http://paste.ubuntu.com/p/NVdV4tZvJW/ ** Changed in: shim-signed (Ubuntu)

[Bug 1759445] Re: kernel panic when trying to reboot in bionic

2018-04-03 Thread Jason Hobbs
After updating firmware on the servers, we can't reproduce it at all anymore.

[Bug 1759445] Re: kernel panic when trying to reboot in bionic

2018-04-02 Thread Jason Hobbs
So far we've only been able to produce this by doing bionic deploys. One thing that stands out in the rsyslog for bionic deploys is this failure: http://paste.ubuntu.com/p/y8xXc7PYjp/ Apr 2 17:48:35 leafeon blkdeactivate[1782]: /sbin/blkdeactivate: line 345: /bin/sort: No such file or directory

[Bug 1759445] Re: kernel panic when trying to reboot in bionic

2018-03-30 Thread Jason Hobbs
We reproduced it again... looking to try the testing now. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic To manage notifications about thi

[Bug 1759445] Re: kernel panic when trying to reboot in bionic

2018-03-28 Thread Jason Hobbs
We can no longer reproduce this. ** Changed in: linux (Ubuntu Bionic) Status: Triaged => Incomplete -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to r

[Bug 1759445] Re: kernel panic when trying to reboot in bionic

2018-03-28 Thread Jason Hobbs
** Tags added: foundations-engine -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1759445 Title: kernel panic when trying to reboot in bionic To manage notifications about this bug go to: https://bug

[Bug 1759445] Re: Bionic due to kernel panic

2018-03-28 Thread Jason Hobbs
This bug is a kernel panic when rebooting at the end of a MAAS deployment of bionic; there is no way to run apport-collect. ** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed ** Summary changed: - Bionic due to kernel panic + kernel panic when trying to reboot in bionic -- Yo

Re: [Bug 1750884] Re: [2.4, bionic] /etc/resolv.conf not configured correctly in Bionic, leads to no DNS resolution

2018-03-08 Thread Jason Hobbs
Ok, that's not much of a workaround then :). On Thu, Mar 8, 2018 at 3:52 AM, Dan Watkins wrote: > On Wed, Mar 07, 2018 at 11:42:29PM -0000, Jason Hobbs wrote: >> Is there a workaround for this? I can just rm /etc/resolv.conf and >> create it with the contents I want, ri

[Bug 1750884] Re: [2.4, bionic] /etc/resolv.conf not configured correctly in Bionic, leads to no DNS resolution

2018-03-07 Thread Jason Hobbs
Is there a workaround for this? I can just rm /etc/resolv.conf and create it with the contents I want, right? -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1750884 Title: [2.4, bionic] /etc/resolv.c

[Bug 1747927] Re: when net booting servers with MAAS, grub should never wait for user input to reboot

2018-02-07 Thread Jason Hobbs
Steve, Sorry, I missed comment #65 on the original bug. I think this is a bit different question than what to do when it can't find the file. In that bug, grub has been instructed to fall back to grub.cfg-default-amd64 if it can't find the file. In some cases it does, but in others it displays t

[Bug 1743249] Re: High IO on the system prevents MAAS to provide PXE config in a timely manner (less than 30 secs)

2018-02-07 Thread Jason Hobbs
Ok - I filed a few more bugs to cover the aspects of this failure other than the slow response for grub.cfg. bug 1747927 - grub should not hang waiting for user input when booting from MAAS. bug 1747928 - When a known server in Deploying state boots to the enlisting environment, it should not jus

[Bug 1747927] Re: when net booting servers with MAAS, grub should never wait for user input to reboot

2018-02-07 Thread Jason Hobbs
** Summary changed: - when net booting servers, grub should never wait for user input to reboot + when net booting servers with MAAS, grub should never wait for user input to reboot -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. htt

[Bug 1747927] [NEW] when net booting servers, grub should never wait for user input to reboot

2018-02-07 Thread Jason Hobbs
Public bug reported: In bug 1743249, grub sometimes would hang waiting for user input on the keyboard. This is never an appropriate action for a net booting server, at least not one booting under MAAS direction. It may be appropriate to pause for 30 seconds or something so someone can see the err

[Bug 1743249] Re: High IO on the system prevents MAAS to provide PXE config in a timely manner (less than 30 secs)

2018-02-07 Thread Jason Hobbs
Andres, there is more to this bug than just the slow grub response. MAAS could do more after the fact to recover in this situation. For example, once a machine boots to the ephemeral environment, it has plenty of time to talk to MAAS and find out it's not actually supposed to be enlisting, and to

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
where it's getting hung up. Jason On Tue, Feb 6, 2018 at 5:09 PM, Jason Hobbs wrote: > dm-delay looks very interesting along those lines. > > https://www.enodev.fr/posts/emulate-a-slow-block-device-with-dm-delay.html > > https://www.kernel.org/doc/Documentation/device-mapper/del

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
dm-delay looks very interesting along those lines. https://www.enodev.fr/posts/emulate-a-slow-block-device-with-dm-delay.html https://www.kernel.org/doc/Documentation/device-mapper/delay.txt On Tue, Feb 6, 2018 at 5:06 PM, Jason Hobbs wrote: > On Tue, Feb 6, 2018 at 4:50 PM, Andres Rodrig

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
n Tue, Feb 6, 2018 at 5:17 PM, Jason Hobbs > wrote: > >> Andres, it was a single test in both cases, and in both cases there was >> almost no delay from MAAS. It's not significant enough to call it >> positive results. >> >> > Comment #93 shows there are /

[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
Andres, it was a single test in both cases, and in both cases there was almost no delay from MAAS. It's not significant enough to call it positive results. Since neither of you answered yes, I'll assume the answer was no to my question of whether there was anything in my logs or data that showed

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
Blake, that's great. Do you have before and after numbers showing the improvement this change made? Do you have any data or logs that led you to believe this was the culprit in the slow responses I saw on my cluster? On Tue, Feb 6, 2018 at 3:12 PM, Blake Rouse wrote: > Actually caching does mak

[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
The patch from #84 is adding a cache for reading the template file on the rack controller. I don't understand why this change is being made. This file will almost certainly be in the page cache anyhow as these systems have a lot of free ram. Usually it's best to just let the page cache do its th

[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
Anyhow, I tested with the patch from #84 as requested, here are the results: http://paste.ubuntu.com/26531873/ We're still seeing some retries with it, same as before. But, I think the test is of limited value. It didn't make things worse but we don't have any evidence from the test that it made

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
*dhcp* changes, so this is not at > all related to the RPC boot requests for pxe. > > On Tue, Feb 6, 2018 at 11:43 AM, Jason Hobbs > wrote: > >> Can you please comment on the deadlock detected error from the db log in >> posted in #36 >> >> http://paste.ub

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
On Tue, Feb 6, 2018 at 10:40 AM, Andres Rodriguez wrote: > On Tue, Feb 6, 2018 at 11:24 AM, Jason Hobbs > wrote: > >> On Mon, Feb 5, 2018 at 4:07 PM, Andres Rodriguez >> wrote: >> > I think there's a misunderstanding on how the network boot process >> ha

[Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
Can you please comment on the deadlock detected error from the db log posted in #36 http://paste.ubuntu.com/26530761/ That is not expected behavior, is it? Also the fact that MAAS thinks it's losing rack/region connections seems like it could be related to this behavior. -- You received this

Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

2018-02-06 Thread Jason Hobbs
Andres, I ran the test with VMs limited to 9 of 20 cores (cut the core limit in half for VMs). The first time range from this dump is with the cores at their normal limit (18). As you can see, the behavior didn't change much from one set to the other. Both sets had instances where grub started
