I wonder if something is wrong with the NFS mount.  I see this error
periodically in /var/log/messages even though I have set the Domain in
/etc/idmapd.conf to the host's FQDN:

May 20 19:30:22 virthost1 rpc.idmapd[1790]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 20 19:36:20 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 20 19:44:35 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 21 10:21:25 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 21 12:46:40 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 21 13:52:42 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 21 13:55:20 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '107' does not map into domain 'redacted.com'
May 21 20:31:51 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '107' does not map into domain 'redacted.com'
May 22 10:14:18 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 22 10:18:40 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '107' does not map into domain 'redacted.com'
May 22 10:19:23 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '0' does not map into domain 'redacted.com'
May 22 10:25:16 virthost1 rpc.idmapd[1731]: nss_getpwnam: name '107' does not map into domain 'redacted.com'

The name '107' entries just started appearing in the log yesterday, which
looks unusual.  Up until then, the error was always name '0' (root); I'm
guessing 107 is the qemu user's uid on this host.
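
For reference, the Domain line is the only thing I changed from the stock
/etc/idmapd.conf, and I restarted the idmap daemon after editing it;
roughly:

[General]
Domain = redacted.com

service rpcidmapd restart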


On Thu, May 22, 2014 at 11:15 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:

> I have observed this kind of problem ("process blocked for more than xx
> sec...") when I had issues with storage - check your disks, smartctl,
> etc...
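> e.g. smartctl -a /dev/sda for each disk (device names will vary), plus
> dmesg and iostat to look for I/O errors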
> best
>
> Sent from Google Nexus 4
> On May 22, 2014 7:49 PM, "Ian Young" <iyo...@ratespecial.com> wrote:
>
> > And this is in /var/log/messages right before that event:
> >
> > May 22 10:16:07 virthost1 kernel: INFO: task qemu-kvm:2971 blocked for more than 120 seconds.
> > May 22 10:16:07 virthost1 kernel:      Not tainted 2.6.32-431.11.2.el6.x86_64 #1
> > May 22 10:16:07 virthost1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > May 22 10:16:07 virthost1 kernel: qemu-kvm      D 0000000000000002     0  2971      1 0x00000080
> > May 22 10:16:07 virthost1 kernel: ffff8810724e9be8 0000000000000082 0000000000000000 ffff88106b6529d8
> > May 22 10:16:07 virthost1 kernel: ffff880871b3e8d8 ffff880871b3e8f0 ffffffff8100bb8e ffff8810724e9be8
> > May 22 10:16:07 virthost1 kernel: ffff881073525058 ffff8810724e9fd8 000000000000fbc8 ffff881073525058
> > May 22 10:16:07 virthost1 kernel: Call Trace:
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
> > May 22 10:16:07 virthost1 kernel: [<ffffffff810555ef>] ? mutex_spin_on_owner+0x9f/0xc0
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8152969e>] __mutex_lock_slowpath+0x13e/0x180
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8152953b>] mutex_lock+0x2b/0x50
> > May 22 10:16:07 virthost1 kernel: [<ffffffffa021c2cf>] memory_access_ok+0x7f/0xc0 [vhost_net]
> > May 22 10:16:07 virthost1 kernel: [<ffffffffa021d89c>] vhost_dev_ioctl+0x2ec/0xa50 [vhost_net]
> > May 22 10:16:07 virthost1 kernel: [<ffffffffa021c411>] ? vhost_work_flush+0xe1/0x120 [vhost_net]
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8122db91>] ? avc_has_perm+0x71/0x90
> > May 22 10:16:07 virthost1 kernel: [<ffffffffa021f11a>] vhost_net_ioctl+0x7a/0x5d0 [vhost_net]
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8122f914>] ? inode_has_perm+0x54/0xa0
> > May 22 10:16:07 virthost1 kernel: [<ffffffffa01a28b7>] ? kvm_vcpu_ioctl+0x1e7/0x580 [kvm]
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8108b14e>] ? send_signal+0x3e/0x90
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8119dc12>] vfs_ioctl+0x22/0xa0
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8119ddb4>] do_vfs_ioctl+0x84/0x580
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8119e331>] sys_ioctl+0x81/0xa0
> > May 22 10:16:07 virthost1 kernel: [<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
> > May 22 10:16:07 virthost1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> >
> >
> > On Thu, May 22, 2014 at 10:39 AM, Ian Young <iyo...@ratespecial.com> wrote:
> >
> > > The console proxy became unavailable again yesterday afternoon.  I
> > > could SSH into it via its link-local address and nothing seemed to be
> > > wrong inside the VM itself.  However, the qemu-kvm process for that VM
> > > was at almost 100% CPU.  Inside the VM, the CPU usage was minimal and
> > > the java process was running and listening on port 443.  So there
> > > seems to be something wrong with it down at the KVM/QEMU level.  It's
> > > weird how this keeps happening to the console proxy only and not any
> > > of the other VMs.  I tried to reboot it from the management UI and
> > > after about 15 minutes, it finally did.  Now the console proxy is
> > > working but I don't know how long it will last before it breaks
> > > again.  I found this in libvirtd.log, which corresponds with the time
> > > the console proxy rebooted:
> > >
> > > 2014-05-22 17:17:04.362+0000: 25195: info : libvirt version: 0.10.2, package: 29.el6_5.7 (CentOS BuildSystem <http://bugs.centos.org>, 2014-04-07-07:42:04, c6b9.bsys.dev.centos.org)
> > > 2014-05-22 17:17:04.362+0000: 25195: error : qemuMonitorIO:614 : internal error End of file from monitor
> > >
> > >
> > > On Wed, May 21, 2014 at 2:07 PM, Ian Young <iyo...@ratespecial.com> wrote:
> > >
> > >> I built and installed a libvirt 1.0.4 package from the Fedora src
> > >> rpm.  It installed fine inside a test VM, but installing it on the
> > >> real hypervisor was a bad idea and I doubt I'll be pursuing it
> > >> further.  All VMs promptly stopped and this appeared in libvirtd.log:
> > >>
> > >> 2014-05-21 20:36:19.260+0000: 23567: info : libvirt version: 1.0.4, package: 1.el6 (Unknown, 2014-05-21-11:36:09, redacted.com)
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_network.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_storage.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_secret.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_interface.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so not accessible
> > >> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_lxc.so not accessible
> > >> 2014-05-21 20:36:49.471+0000: 23570: error : do_open:1220 : no connection driver available for qemu:///system
> > >> 2014-05-21 20:36:49.472+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> > >> 2014-05-21 20:36:49.473+0000: 23571: error : do_open:1220 : no connection driver available for lxc:///
> > >> 2014-05-21 20:36:49.474+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> > >> 2014-05-21 20:36:49.475+0000: 23568: error : do_open:1220 : no connection driver available for qemu:///system
> > >> 2014-05-21 20:36:49.476+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> > >> 2014-05-21 20:36:49.678+0000: 23575: error : do_open:1220 : no connection driver available for qemu:///system
> > >> 2014-05-21 20:36:49.678+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> > >> 2014-05-21 20:36:49.681+0000: 23572: error : do_open:1220 : no connection driver available for qemu:///system
> > >> 2014-05-21 20:36:49.682+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> > >>
> > >>
> > >> On Wed, May 21, 2014 at 10:45 AM, Ian Young <iyo...@ratespecial.com> wrote:
> > >>
> > >>> I was able to get it working by following these steps:
> > >>>
> > >>> 1. stop all instances
> > >>> 2. service cloudstack-management stop
> > >>> 3. service cloudstack-agent stop
> > >>> 4. virsh shutdown {domain} (for each of the system VMs)
> > >>> 5. service libvirtd stop
> > >>> 6. umount primary and secondary
> > >>> 7. reboot
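> > >>>
> > >>> In shell terms, the middle steps were roughly this (the mount points
> > >>> here are placeholders, not our real paths):
> > >>>
> > >>> service cloudstack-management stop
> > >>> service cloudstack-agent stop
> > >>> for dom in $(virsh list --name); do virsh shutdown "$dom"; done
> > >>> service libvirtd stop
> > >>> umount /mnt/primary /mnt/secondary
> > >>> reboot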
> > >>>
> > >>> The console proxy is working again.  I expect it will probably
> > >>> break again in a day or two.  I have a feeling it's a result of
> > >>> this libvirtd bug, since I've seen the "cannot acquire state change
> > >>> lock" several times.
> > >>>
> > >>> https://bugs.launchpad.net/nova/+bug/1254872
> > >>>
> > >>> I might try building my own libvirtd 1.0.3 for EL6.
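> > >>>
> > >>> If I do, the plan is just to rebuild the Fedora SRPM on this box,
> > >>> roughly like this (the exact SRPM file name is a guess):
> > >>>
> > >>> rpmbuild --rebuild libvirt-1.0.3-1.fc19.src.rpm
> > >>> rpm -Uvh ~/rpmbuild/RPMS/x86_64/libvirt*.rpm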
> > >>>
> > >>>
> > >>> On Tue, May 20, 2014 at 6:21 PM, Ian Young <iyo...@ratespecial.com> wrote:
> > >>>
> > >>>> So I got the console proxy working via HTTPS (by managing my own
> > >>>> "realhostip.com" DNS) last week and everything was working fine.
> > >>>> Today, all of a sudden, the console proxy stopped working again.
> > >>>> The browser says, "Connecting to 192-168-100-159.realhostip.com..."
> > >>>> and eventually times out.  I tried to restart it and it went into a
> > >>>> "Stopping" state that never completed and the Agent State was
> > >>>> "Disconnected."  I could not shut down the VM using virsh or with
> > >>>> "kill -9" because libvirtd kept saying, "cannot acquire state change
> > >>>> lock," so I gracefully shut down the remaining instances and
> > >>>> rebooted the entire management server/hypervisor.  Start over.
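> > >>>>
> > >>>> (The realhostip scheme is just wildcard DNS that encodes the address
> > >>>> in the hostname, so "dig +short 192-168-100-159.realhostip.com"
> > >>>> against my zone should return 192.168.100.159.)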
> > >>>>
> > >>>> When it came back up, the SSVM and console proxy started but the
> > >>>> virtual router was stopped.  I was able to manually start it from
> > >>>> the UI.  The console proxy still times out when I try to access it
> > >>>> from a browser.  I don't see any errors in the management or agent
> > >>>> logs, just this:
> > >>>>
> > >>>> 2014-05-20 18:04:27,632 DEBUG [c.c.a.t.Request] (catalina-exec-10:null) Seq 1-2130378876: Sending  { Cmd , MgmtId: 55157049428734, via: 1(virthost1.redacted.com), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"r-4-VM","wait":0}}] }
> > >>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (AgentManager-Handler-3:null) Seq 1-2130378876: Processing:  { Ans: , MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"192.168.100.6","port":5902,"result":true,"wait":0}}] }
> > >>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (catalina-exec-10:null) Seq 1-2130378876: Received:  { Ans: , MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10, { GetVncPortAnswer } }
> > >>>> 2014-05-20 18:04:27,684 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Port info 192.168.100.6
> > >>>> 2014-05-20 18:04:27,684 INFO  [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Parse host info returned from executing GetVNCPortCommand. host info: 192.168.100.6
> > >>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Compose console url: https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
> > >>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) the console url is :: <html><title>r-4-VM</title><frameset><frame src="https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A"></frame></frameset></html>
> > >>>> 2014-05-20 18:04:29,216 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-4:null) SeqA 2-545: Processing Seq 2-545:  { Cmd , MgmtId: -1, via: 2, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":2,"_loadInfo":"{\n  \"connections\": []\n}","wait":0}}] }
> > >>>>
> > >>>> If I try to restart the system VMs with cloudstack-sysvmadm, it
> > >>>> says:
> > >>>>
> > >>>> Stopping and starting 1 secondary storage vm(s)...
> > >>>> curl: (7) couldn't connect to host
> > >>>> ERROR: Failed to stop secondary storage vm with id 1
> > >>>>
> > >>>> Done stopping and starting secondary storage vm(s)
> > >>>>
> > >>>> Stopping and starting 1 console proxy vm(s)...
> > >>>> curl: (7) couldn't connect to host
> > >>>> ERROR: Failed to stop console proxy vm with id 2
> > >>>>
> > >>>> Done stopping and starting console proxy vm(s) .
> > >>>>
> > >>>> Stopping and starting 1 running routing vm(s)...
> > >>>> curl: (7) couldn't connect to host
> > >>>> 2
> > >>>> Done restarting router(s).
> > >>>>
> > >>>> I notice there are now four entries for the same management server
> > >>>> in the mshost table; they are all in an "Up" state and the
> > >>>> "removed" field is NULL.  What's wrong with this system?
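> > >>>>
> > >>>> For reference, I was checking the table with roughly this, assuming
> > >>>> the stock database and user names:
> > >>>>
> > >>>> mysql -u cloud -p cloud -e 'select id, name, version, state, removed from mshost'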
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
>
