Launchpad has imported 19 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=529202.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2009-10-15T13:26:42+00:00 Lennart wrote:

Created attachment 364906
the test case

Consider the attached example code. All it does is create a
PF_UNIX/SOCK_DGRAM socket, spawn 4 threads, and call recv() on the
socket in each of those threads. Nothing else. Because nobody is sending
anything on the socket, the program basically just hangs and, rightly, shows
no CPU usage in top -- except that the load average top reports starts to
climb and climb, which it shouldn't, of course.
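Since the attachment itself is not reproduced in this import, here is a
minimal illustrative sketch of what the description implies; it is a
reconstruction under stated assumptions, not the actual attachment 364906
(build with gcc -pthread):

/* One PF_UNIX/SOCK_DGRAM socket, four threads blocked in recv() on it,
 * nothing ever sent. */
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static void *reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[128];

    /* Blocks forever; nobody ever writes to the socket. */
    recv(fd, buf, sizeof(buf), 0);
    return NULL;
}

int main(void)
{
    int fds[2];
    pthread_t t[4];
    int i;

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, reader, &fds[0]);

    /* Watch top(1): CPU usage stays at zero, yet the load average climbs. */
    pause();
    return 0;
}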

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/7

------------------------------------------------------------------------
On 2009-10-15T13:35:27+00:00 Lennart wrote:

use case written by the ubuntians btw.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/8

------------------------------------------------------------------------
On 2009-10-15T13:35:52+00:00 Lennart wrote:

s/use case/test case/

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/9

------------------------------------------------------------------------
On 2009-10-15T14:36:15+00:00 Matthew wrote:

Load average seems to go to 4 for me, which is what I'd expect. Do you
see different behaviour?

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/10

------------------------------------------------------------------------
On 2009-10-15T15:53:43+00:00 Lennart wrote:

Yes, that's what I see too, but not what I expected. recv() hangs in D
state, and I believe it should be in S state, given that the sleep
actually *is* interruptible with a simple C-c, which causes EINTR on the
recv().
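For reference, a hypothetical single-threaded snippet (not from any of the
attachments) that shows the interruptibility being described: a blocked
recv() on an empty PF_UNIX/SOCK_DGRAM socket is woken by SIGINT (C-c) and
fails with EINTR, i.e. the underlying sleep is an interruptible one:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void on_sigint(int sig)
{
    /* Installed without SA_RESTART, so the blocked recv() is not restarted. */
    (void)sig;
}

int main(void)
{
    int fds[2];
    char buf[128];
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }

    if (recv(fds[0], buf, sizeof(buf), 0) < 0)
        printf("recv: %s\n", strerror(errno));  /* "Interrupted system call" */
    return 0;
}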

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/11

------------------------------------------------------------------------
On 2009-10-15T22:40:50+00:00 Lennart wrote:

Problem seems to be related to the simultaneous recv() in multiple
threads:

One of the threads will be hanging in S state, and the others in D. The
load avg should hence go up to n-1 if we have n threads calling recv()
on the same socket.

I would say this is a bug.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/13

------------------------------------------------------------------------
On 2009-10-15T22:44:05+00:00 Lennart wrote:

Doing the same thing with a pipe instead of an AF_UNIX socket, btw, works
properly: all threads will hang in S (see the sketch below).

Whether read() or recv() is used on the fd makes no real difference for
the AF_UNIX case.
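For completeness, a compact illustrative sketch of that pipe comparison
(again a reconstruction, not one of the attachments): four threads blocked
in read() on an empty pipe all stay in S state and the load average is
unaffected (build with gcc -pthread):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int pfd[2];

static void *reader(void *arg)
{
    char buf[128];

    (void)arg;
    /* Blocks forever on the empty pipe; the thread sleeps in S state. */
    read(pfd[0], buf, sizeof(buf));
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int i;

    if (pipe(pfd) < 0) {
        perror("pipe");
        return 1;
    }

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, reader, NULL);

    /* The load average stays flat here, unlike the SOCK_DGRAM case. */
    pause();
    return 0;
}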

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/14

------------------------------------------------------------------------
On 2009-10-15T22:46:51+00:00 Lennart wrote:

If this code is done with AF_UNIX/SOCK_STREAM then all threads will hang
in S, as it should be.

Summarizing:

 - On pipes, all waiting threads are in S state.
 - On AF_UNIX/SOCK_STREAM, all waiting threads are in S state.
 - On AF_UNIX/SOCK_DGRAM, one thread is in S state and the others are in
   D state (BROKEN!).

So, yes, this is definitely a bug in the socket handling code.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/15

------------------------------------------------------------------------
On 2009-10-15T22:52:08+00:00 Lennart wrote:

I am now setting this as F12Target. glib now uses libasyncns for the
resolver and we probably shouldn't show a completely bogus loadavg when
the user runs a glib app that uses the resolver.

It's admittedly not high prio though, given that only the statistics are
wrong but everything else seems to be fine.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/16

------------------------------------------------------------------------
On 2009-10-17T21:27:53+00:00 Matěj wrote:

(In reply to comment #8)
> It's admittedly not high prio though, given that only the statistics are wrong
> but everything else seems to be fine.  

Are you sure about this? When my Gajim tries to use python-libasyncns, it
makes the whole computer respond pretty slowly.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/17

------------------------------------------------------------------------
On 2009-10-17T23:42:44+00:00 Michal wrote:

Created attachment 365146
modified test case with processes instead of threads

I modified the test case slightly to use full processes instead of
threads, just to demonstrate that it's not a threads-only issue. The
high loadavg is reproducible just as well here and the processes are
really in D state.
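For reference, a sketch along the lines of that modification (illustrative
only; the real attachment 365146 is not reproduced here):

/* Process-based variant: fork four children that each block in recv() on
 * the same PF_UNIX/SOCK_DGRAM socket.  All but one of them end up in D
 * state and the load average rises, just as with threads. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    int i;

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }

    for (i = 0; i < 4; i++) {
        if (fork() == 0) {
            char buf[128];

            /* Child: blocks forever, nothing is ever sent. */
            recv(fds[0], buf, sizeof(buf), 0);
            _exit(0);
        }
    }

    /* Parent: keep the children around; inspect their states with ps. */
    pause();
    return 0;
}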

I can't reproduce the system slowdown (which Matěj is seeing) with this.
I'll try Gajim later.

I wonder if we can just use mutex_lock_interruptible(&u->readlock) in
unix_dgram_recvmsg...

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/18

------------------------------------------------------------------------
On 2009-10-19T17:36:26+00:00 Lennart wrote:

(In reply to comment #9)
> (In reply to comment #8)
> > It's admittedly not high prio though, given that only the statistics are
> > wrong but everything else seems to be fine.
> 
> Are you sure about this? When my Gajim tries to use python-libasyncns, it
> makes the whole computer respond pretty slowly.

Hmm, no, I've never seen that. Everyone else reports this as if it were
only a statistics issue. Is the process actually showing up as consuming
CPU time in top?

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/19

------------------------------------------------------------------------
On 2009-10-19T17:40:38+00:00 Lennart wrote:

*** Bug 529504 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/20

------------------------------------------------------------------------
On 2009-10-19T19:49:28+00:00 Matěj wrote:

(In reply to comment #11)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > It's admittedly not high prio though, given that only the statistics are
> > > wrong but everything else seems to be fine.
> > 
> > Are you sure about this? When my Gajim tries to use python-libasyncns, it
> > makes the whole computer respond pretty slowly.
> 
> Hmm, no, I've never seen that. Everyone else reports this as if it were
> only a statistics issue. Is the process actually showing up as consuming
> CPU time in top?

Yes, Gajim then shows up as a pretty active process (CPU usage in the low
tens of percent).

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/21

------------------------------------------------------------------------
On 2010-03-04T07:55:35+00:00 Michal wrote:

*** Bug 570323 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/22

------------------------------------------------------------------------
On 2010-03-04T14:34:24+00:00 Michal wrote:

Note to self: The mutex was added in 2.6.10 by DaveM in:
[AF_UNIX]: Serialize dgram read using semaphore just like stream
It fixed an exploitable race condition 
(http://www.securityfocus.com/archive/1/381689).
Using mutex_lock_interruptible() almost works, except that SO_RCVTIMEO will 
still work badly in this situation.
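To illustrate the SO_RCVTIMEO concern in userspace terms (a hypothetical
snippet, not taken from this bug): a receive timeout is honoured while
waiting for data, but a thread parked on the socket's read mutex rather
than on the data wait queue would miss it, so an interruptible lock alone
would not preserve this behaviour for the waiting threads:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char buf[128];
    struct timeval tv = { 1, 0 };       /* 1 second receive timeout */

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }
    setsockopt(fds[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* Nothing is ever sent, so this returns -1 with EAGAIN/EWOULDBLOCK
     * after roughly one second. */
    if (recv(fds[0], buf, sizeof(buf), 0) < 0)
        printf("recv: %s\n", strerror(errno));
    return 0;
}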

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/23

------------------------------------------------------------------------
On 2010-03-15T12:56:44+00:00 Bug wrote:


This bug appears to have been reported against 'rawhide' during the Fedora 13 
development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/24

------------------------------------------------------------------------
On 2011-06-02T17:36:26+00:00 Bug wrote:


This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/25

------------------------------------------------------------------------
On 2011-06-27T14:27:05+00:00 Bug wrote:


Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Reply at: https://bugs.launchpad.net/linux/+bug/379452/comments/26


** Changed in: linux (Fedora)
       Status: Confirmed => Won't Fix

** Changed in: linux (Fedora)
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/379452

Title:
  CPU Load Avg calculation gets very confused by multiple recv()s on the
  same PF_UNIX/SOCK_DGRAM socket

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/379452/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
