Launchpad has imported 25 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=757382.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2011-11-26T19:00:51+00:00 Steven wrote:

Description of problem:
Not exactly sure how to reproduce the problem.  I am doing quite a bit of heavy 
development that uses libvirt.  The libvirtd.log file prints out:

11:25:21.784: 24375: error : virNetServerDispatchNewClient:220 : Too many 
active clients (20), dropping connection from 127.0.0.1;0
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:23.935: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.648: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.655: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:25:41.656: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error
11:33:39.736: 24375: error : virNetSocketReadWire:912 : End of file while 
reading data: Input/output error

A second process is spawned by libvirt as well:
root     22049 24375  0 11:49 ?        00:00:00 libvirtd --daemon
root     22353 22671  0 11:58 pts/4    00:00:00 grep --color=auto libvirt
root     24375     1  2 11:00 ?        00:01:40 libvirtd --daemon

Running kill -9 on the libvirtd process that is not parented by init unsticks the system, i.e.:
[root@beast libvirt]# kill -9 22049

Then things start working again.
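
A minimal sketch of how the stray process could be picked out of the
ps listing above: it looks for libvirtd entries whose parent PID is not
1. The function name and the assumption that the input is in `ps -ef`
column order are illustrative only, not part of the report.

```python
def find_forked_libvirtd(ps_output: str) -> list:
    """Return PIDs of libvirtd processes whose parent is not init (PID 1)."""
    pids = []
    for line in ps_output.splitlines():
        # Assumed `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip a header row or malformed lines
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if cmd.split()[0].endswith("libvirtd") and ppid != 1:
            pids.append(pid)
    return pids


# The three lines quoted in the report.
SAMPLE = """\
root     22049 24375  0 11:49 ?        00:00:00 libvirtd --daemon
root     22353 22671  0 11:58 pts/4    00:00:00 grep --color=auto libvirt
root     24375     1  2 11:00 ?        00:01:40 libvirtd --daemon
"""

print(find_forked_libvirtd(SAMPLE))
```

On the sample this selects only PID 22049, the process that was killed
in the report.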


Version-Release number of selected component (if applicable):
[root@beast libvirt]# rpm -qa | grep libvirt
libvirt-0.9.6-2.fc16.x86_64
libvirt-python-0.9.6-2.fc16.x86_64
libvirt-client-0.9.6-2.fc16.x86_64


How reproducible:
hard to reproduce

Steps to Reproduce:
1. Execute VM starts/stops
  
Actual results:
libvirt launches a second process and wedges

Expected results:
libvirt should not wedge

Additional info:

killing the second process unwedges libvirt

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/0

------------------------------------------------------------------------
On 2011-11-28T23:08:25+00:00 Steven wrote:

Not sure if the "Too many active clients" error from the log is related
to the second process being spawned.  They may be separate events.  I
have changed my program so that it does not use so many sockets, but
occasionally the software still wedges.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/1

------------------------------------------------------------------------
On 2011-11-28T23:10:51+00:00 Steven wrote:

Notice the CPU time on the forked process is quite high.  This ps was
taken ~1-2 minutes after the system wedged.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/2

------------------------------------------------------------------------
On 2011-11-28T23:13:20+00:00 Steven wrote:

Ignore comment #2; the 2nd process in the list wasn't the forked child.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/3

------------------------------------------------------------------------
On 2011-11-29T01:54:33+00:00 Laine wrote:

When this happens could you try attaching gdb to the 2nd libvirtd
process and executing the command "thread apply all bt"? That will give
us more of a clue what is (isn't) happening.
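
For unattended collection, the same gdb command can be run in batch
mode. This is only a sketch: the helper names are hypothetical, and it
assumes gdb is installed and the caller has permission to attach to the
process.

```python
import subprocess


def gdb_backtrace_command(pid: int) -> list:
    """Build a gdb command line that dumps every thread's backtrace."""
    # -p attaches to the PID, -batch exits once the commands have run,
    # -ex supplies one gdb command to execute.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtraces(pid: int) -> str:
    """Run gdb against pid and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero if the process is already gone
    )
    return result.stdout
```

Calling collect_backtraces() with the PID of the second libvirtd
process would return the text to paste into the bug.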

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/4

------------------------------------------------------------------------
On 2011-11-29T02:08:36+00:00 Steven wrote:

Laine,

Next time it happens, I'll do that for you.  I noticed that a dmidecode
error often appears when this occurs, which is documented in Bug
#754909.  I am going to try running with dmidecode installed to see if
that eliminates the problem.  Fedora 16 does not install dmidecode by
default.

Regards
-steve

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/5

------------------------------------------------------------------------
On 2011-11-29T02:28:31+00:00 Steven wrote:

__lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
96      2:      movl    %edx, %eax
(gdb) thread apply all bt
Thread 1 (Thread 0x7f8b9e029700 (LWP 5904)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:96
#1  0x00000031246ad2d4 in _L_lock_2197 () from /lib64/libc.so.6
#2  0x00000031246ad0e7 in __tz_convert (timer=0x31249b0ee8, use_localtime=1, 
    tp=0x7f8b9e027910) at tzset.c:619
#3  0x0000003e90c5552f in virLogMessage (
    category=0x3e90de9fea "file.util/command.c", priority=1, 
    funcname=0x3e90deac33 "virCommandHook", linenr=1962, flags=0, 
    fmt=<optimized out>) at util/logging.c:734
#4  0x0000003e90c45cce in virCommandHook (data=0x7f8b880c3280)
    at util/command.c:1962
#5  0x0000003e90c466cc in virExecWithHook (argv=0x7f8b88071920, 
    envp=0x7f8b880c08a0, keepfd=<optimized out>, retpid=<optimized out>, 
    infd=27, outfd=0x7f8b880c33e8, errfd=0x7f8b880c33ec, flags=4, 
    data=0x7f8b880c3280, pidfile=0x0, hook=0x3e90c45a10 <virCommandHook>)
    at util/command.c:527
#6  0x0000003e90c482cc in virCommandRunAsync (cmd=0x7f8b880c3280, pid=0x0)
    at util/command.c:2046
#7  0x0000003e90c48acc in virCommandRun (cmd=0x7f8b880c3280, exitstatus=0x0)
    at util/command.c:1841
#8  0x0000000000465d92 in qemuCapsExtractVersionInfo (
    qemu=0x7f8b880c0e90 "/usr/bin/qemu", arch=0x4f9db6 "i686", retversion=0x0, 
    retflags=0x7f8b9e028448) at qemu/qemu_capabilities.c:1297
#9  0x00000000004665ed in qemuCapsInitGuest (caps=0x7f8b880c1b10, 
    old_caps=0x7f8b8c000a70, hostmachine=<optimized out>, info=0x733440, hvm=1)
    at qemu/qemu_capabilities.c:592
#10 0x0000000000466e09 in qemuCapsInit (old_caps=0x7f8b8c000a70)
    at qemu/qemu_capabilities.c:846
#11 0x000000000044c8a0 in qemuCreateCapabilities (oldcaps=<optimized out>, 
    driver=0x7f8b940146b0) at qemu/qemu_driver.c:242
#12 0x000000000045fb86 in qemudGetCapabilities (conn=<optimized out>)
    at qemu/qemu_driver.c:1004
#13 0x0000003e90cc835e in virConnectGetCapabilities (conn=0x7f8b84000b60)
    at libvirt.c:5877
#14 0x0000000000434ea3 in remoteDispatchGetCapabilities (ret=0x7f8b880adbf0, 
    rerr=0x7f8b9e028c70, hdr=<optimized out>, client=0x1834970, 
    server=<optimized out>) at remote_dispatch.h:5555
#15 remoteDispatchGetCapabilitiesHelper (server=<optimized out>, 
    client=0x1834970, hdr=<optimized out>, rerr=0x7f8b9e028c70, 
    args=<optimized out>, ret=0x7f8b880adbf0) at remote_dispatch.h:5536
#16 0x000000000043f42e in virNetServerProgramDispatchCall (msg=0x1877430, 
    client=0x1834970, server=0x182c410, prog=0x182c3e0)
    at rpc/virnetserverprogram.c:401
#17 virNetServerProgramDispatch (prog=0x182c3e0, server=0x182c410, 

#18 0x000000000044169c in virNetServerHandleJob (jobOpaque=<optimized out>, 
    opaque=0x182c410) at rpc/virnetserver.c:136
#19 0x0000003e90c65eae in virThreadPoolWorker (opaque=<optimized out>)
    at util/threadpool.c:144
#20 0x0000003e90c65962 in virThreadHelper (data=<optimized out>)
    at util/threads-pthread.c:157
#21 0x0000003124a07d90 in start_thread (arg=0x7f8b9e029700)
    at pthread_create.c:309
#22 0x00000031246eed8d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/6

------------------------------------------------------------------------
On 2011-11-29T02:33:06+00:00 Steven wrote:

FYI, been down this path with the glibc maintainers.  Their claim is
that localtime_r while reentrant cannot be used in threaded programs,
because it is not thread safe according to POSIX.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/7

------------------------------------------------------------------------
On 2011-11-29T02:35:22+00:00 Steven wrote:

See:
http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004710.html

Which explains the problem in more detail.  Perhaps my memory is fuzzy;
it was getenv/setenv that may not be used in multithreaded programs at
all (you are probably calling getenv in one program thread, forking,
then calling localtime_r in another), and hence their use (which I see
libvirt does a lot of) can and will result in deadlocks.

Regards
-steve

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/8

------------------------------------------------------------------------
On 2011-11-29T04:42:24+00:00 Steven wrote:

See Bug #544022 which helped work around part of this problem.  I'm not
sure if it entirely resolved it - eventually we just reworked our code
to remove threads entirely.

Regards
-steve

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/9

------------------------------------------------------------------------
On 2011-11-29T08:41:35+00:00 Daniel wrote:

> 11:25:21.784: 24375: error : virNetServerDispatchNewClient:220 : Too many
> active clients (20), dropping connection from 127.0.0.1;0

This is the interesting error message. When this occurs, libvirtd will
immediately close the client socket.

> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error
> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error
> 11:25:23.926: 24375: error : virNetSocketReadWire:912 : End of file while
> reading data: Input/output error

This suggests that, after closing the client socket, we have left an I/O
watch registered against the socket FD, and are now spinning at 100% in
poll().

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/10

------------------------------------------------------------------------
On 2011-11-29T08:42:20+00:00 Daniel wrote:

To avoid hitting this problem in the first place you can simply update
max_clients in /etc/libvirt/libvirtd.conf.

Of course we still want to fix the resulting bug.
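
As a rough sketch, the limit currently in effect can be checked by
reading the setting back out of the file. The helper below is
hypothetical and assumes only that libvirtd.conf uses "key = value"
lines with "#" comments; the fallback of 20 is taken from the
"(20)" in the log message above.

```python
def read_int_setting(conf_text: str, key: str, default: int) -> int:
    """Return the integer value of key in conf_text, or default if unset."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name == key:
            return int(value)
    return default


# Example file contents: the setting commented out, then overridden.
SAMPLE_CONF = """\
#max_clients = 20
max_clients = 50
"""

print(read_int_setting(SAMPLE_CONF, "max_clients", 20))
```

In practice conf_text would come from reading
/etc/libvirt/libvirtd.conf, and libvirtd has to be restarted for a
changed value to take effect.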

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/11

------------------------------------------------------------------------
On 2011-11-29T09:12:26+00:00 Daniel wrote:

> FYI, been down this path with the glibc maintainers.  Their claim is that
> localtime_r while reentrant cannot be used in threaded programs, because it is
> not thread safe according to POSIX.

IIUC the issue here is not one of "thread safety", but rather whether
the function is "async signal safe".

If you have a multi-threaded application, and you fork a child process,
then you are only allowed to use async-signal safe functions, until you
execve(). The reason is as you describe:

 1. Thread A acquires lock L
 2. Thread B forks child process C
 3. Child process C attempts to acquire lock L
 4. Thread A releases lock L

...child process C can never see the result of step 4, since it is no
longer in the same memory address space as Thread A.

Unfortunately libvirt is not complying with the async signal safety
rules here.
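
The four steps above can be reproduced in a few lines. This is an
illustration in Python, not libvirt code: a plain threading.Lock stands
in for the lock inside glibc, the main thread plays the role of Thread
B, and the child uses a timeout so that the demonstration ends instead
of hanging as the real child does. It assumes a POSIX system where
os.fork() is available.

```python
import os
import threading


def child_can_take_lock_after_fork() -> bool:
    """Fork while another thread holds a lock; report if the child can take it."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def thread_a():
        with lock:              # step 1: thread A acquires lock L
            holding.set()
            release.wait()      # keep L held while the fork happens
        # step 4: thread A releases L, but only the parent ever sees this

    a = threading.Thread(target=thread_a)
    a.start()
    holding.wait()

    read_fd, write_fd = os.pipe()
    pid = os.fork()             # step 2: thread B (the main thread) forks C
    if pid == 0:
        # step 3: child C tries to take L; thread A does not exist here,
        # so nobody will ever release the copied, still-locked L.
        got = lock.acquire(timeout=0.5)
        os.write(write_fd, b"1" if got else b"0")
        os._exit(0)

    os.close(write_fd)
    release.set()               # let thread A finish in the parent
    a.join()
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


print(child_can_take_lock_after_fork())
```

Without the timeout the child would block forever, which matches the
backtrace in this bug: the forked libvirtd is stuck in
__lll_lock_wait_private before it ever reaches exec.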

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/12

------------------------------------------------------------------------
On 2011-11-29T12:40:01+00:00 Daniel wrote:

This series upstream improves libvirt's async signal safety by avoiding
localtime_r() completely

https://www.redhat.com/archives/libvir-list/2011-November/msg01609.html
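
To illustrate the general idea of formatting a log timestamp without
localtime_r(), a time of day can be derived from the epoch value with
integer arithmetic alone. This is a sketch of the technique in Python,
not the code from the series above; it produces UTC and the function
name is made up for the example.

```python
def format_utc_time_of_day(epoch_millis: int) -> str:
    """Format milliseconds since the epoch as HH:MM:SS.mmm in UTC."""
    seconds, millis = divmod(epoch_millis, 1000)
    seconds_today = seconds % 86400          # seconds since UTC midnight
    hours, rest = divmod(seconds_today, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


# 11 h 25 min 21.784 s after midnight, matching the first log line above.
print(format_utc_time_of_day(41121784))
```

Because no timezone database is consulted, no shared timezone lock is
involved; the trade-off is that timestamps are no longer in local time.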

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/13

------------------------------------------------------------------------
On 2011-11-29T14:02:54+00:00 Steven wrote:

Re Comment #11, I am no longer hitting the limit on FDs after changing
my program.  It is some other log message.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/14

------------------------------------------------------------------------
On 2011-11-29T14:11:11+00:00 Steven wrote:

Daniel,

Thanks for a pointer to the upstream fixes.  I'll apply them on the
fedora 16 rpm, build a scratch rpm, and see if that fixes the problem.

As I recall, what happened in our case was both the localtime_r problem
you describe as well as the following:

 1. getenv/setenv take a lock in the timezone code (I am not sure why;
    read the glibc code for more details).
 2. Another thread forks (with the timezone lock held).
 3. There are no at_fork handlers for getenv/setenv, so the lock is left
    locked in the child process.
 4. localtime_r takes the lock and deadlocks (because it uses the same
    tzset lock).
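
For comparison, this is roughly what an at-fork handler buys you. The
sketch is in Python rather than glibc's C, the lock is a hypothetical
stand-in for the timezone lock, and it assumes a POSIX system with
os.fork(). The handler gives the child a fresh, unlocked lock, so the
acquire in the child succeeds even though the parent held the lock at
fork time.

```python
import os
import threading

# Hypothetical stand-in for the timezone lock described above.
_tz_lock = threading.Lock()


def _reset_lock_in_child():
    """At-fork handler: replace the inherited lock with an unlocked one."""
    global _tz_lock
    _tz_lock = threading.Lock()


os.register_at_fork(after_in_child=_reset_lock_in_child)


def child_can_lock_after_fork() -> bool:
    """Fork while the lock is held; report if the child can still take it."""
    _tz_lock.acquire()              # the lock is held when fork() happens
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # The handler has already run, so this is the fresh lock.
        got = _tz_lock.acquire(timeout=0.5)
        os.write(write_fd, b"1" if got else b"0")
        os._exit(0)
    os.close(write_fd)
    _tz_lock.release()              # parent still owns the original lock
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


print(child_can_lock_after_fork())
```

Without the registered handler the child would inherit the lock in its
locked state, which is the situation described in the steps above.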

Regards
-steve

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/15

------------------------------------------------------------------------
On 2011-11-30T03:41:11+00:00 Steven wrote:

Daniel,

I applied the patches and rebuilt libvirt from git.  The lockup no
longer seems to occur.  Not sure if it's the patches or a change in
libvirt 0.9.7.  I couldn't get the patches backported to the older
libvirt and built in a timely manner.  Thanks for the quick response.

Regards
-steve

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/16

------------------------------------------------------------------------
On 2011-11-30T19:33:34+00:00 Fedora wrote:

This package has changed ownership in the Fedora Package Database.
Reassigning to the new owner of this component.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/17

------------------------------------------------------------------------
On 2011-11-30T19:37:08+00:00 Fedora wrote:

This package has changed ownership in the Fedora Package Database.
Reassigning to the new owner of this component.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/18

------------------------------------------------------------------------
On 2011-11-30T19:44:41+00:00 Fedora wrote:

This package has changed ownership in the Fedora Package Database.
Reassigning to the new owner of this component.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/19

------------------------------------------------------------------------
On 2011-11-30T19:55:13+00:00 Fedora wrote:

This package has changed ownership in the Fedora Package Database.
Reassigning to the new owner of this component.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/20

------------------------------------------------------------------------
On 2011-12-05T18:21:01+00:00 Eric wrote:

We should backport the following patches to F16:

3ec12898
32d3ec74
a8bb75a3
b265beda

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/21

------------------------------------------------------------------------
On 2011-12-20T02:08:51+00:00 Fedora wrote:

libvirt-0.9.6-4.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/libvirt-0.9.6-4.fc16

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/28

------------------------------------------------------------------------
On 2011-12-22T22:34:58+00:00 Fedora wrote:

Package libvirt-0.9.6-4.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-0.9.6-4.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-17267/libvirt-0.9.6-4.fc16
then log in and leave karma (feedback).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/29

------------------------------------------------------------------------
On 2012-01-05T20:59:42+00:00 Fedora wrote:

libvirt-0.9.6-4.fc16 has been pushed to the Fedora 16 stable repository.
If problems still persist, please make note of it in this bug report.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/903212/comments/32


** Changed in: libvirt (Fedora)
       Status: Unknown => Fix Released

** Changed in: libvirt (Fedora)
   Importance: Unknown => Undecided

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/903212

Title:
  libvirtd stops responding in oneiric
