From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Tue, 06 Nov 2007 04:51:07 +0100
Here's also some output from apt-get which got stuck in my unstable
chroot while I wanted to retrieve the klibc source to try to debug it...
So the good news is that I started getting the hang seen
on the Debian buildd
David Miller wrote:
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Tue, 06 Nov 2007 04:51:07 +0100
Here's also some output from apt-get which got stuck in my unstable
chroot while I wanted to retrieve the klibc source to try to debug it...
So the good news is that I started getting the hang
So I'm not sure if the result is really useful for you - if not just let
me know. I've attached the last ~10-20 sysrq-g outputs - as it was
running in a loop I have a ton of them. In case you're wondering: http
is aptitude's http method.
The http module is stuck in a different place, I'll
From: Josip Rodin [EMAIL PROTECTED]
Date: Fri, 2 Nov 2007 17:21:06 +0100
Great. Here you go, three of them, while the load was 3 and this process was
stuck:
buildd 10813 100 0.8 987368 17504 ? RN 14:44 155:49 dpkg-query --search libpthread.so.0 libdl.so.2 libstdc++.so.6
Ok, the key in the trace is:
Nov 2 16:25:30 titan kernel: [ 978.134874] CPU[ 1]:
TSTATE[80009603] TPC[0067d2e0] TNPC[0067d2d4]
TASK[aptitude:3204]
Nov 2 16:25:30 titan kernel: [ 978.257809]
TPC[_write_unlock_irq+0x20/0x110]
...
Nov 2
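The symbolized TPC entries are the useful part of register dumps like the one quoted above. A minimal sketch for pulling them out of a saved kernel log follows; it assumes only the two line shapes that appear in this thread (`TPC[symbol+0xOFF/0xSIZE]` and `TPC: symbol+0xOFF/0xSIZE`), and the function name `extract_tpc_symbols` is made up for illustration.

```python
import re

# Matches the two symbolized forms seen in this thread:
#   TPC[_write_unlock_irq+0x20/0x110]
#   TPC: sched_clock+0x0/0x30
# Raw-address forms such as TPC[0067d2e0] are skipped on purpose,
# because they carry no "+0x.../0x..." suffix.
TPC_SYMBOL = re.compile(
    r"\bTPC(?:\[|:\s*)"
    r"(?P<symbol>[A-Za-z_]\w*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)"
)


def extract_tpc_symbols(log_text):
    """Return a list of (symbol, offset, size) tuples, in log order."""
    return [
        (m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16))
        for m in TPC_SYMBOL.finditer(log_text)
    ]
```

Counting how often each symbol shows up across many dumps (for example with `collections.Counter`) gives a rough picture of where a spinning CPU spends its time.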
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Sun, 04 Nov 2007 20:55:20 +0100
So I'm not sure if the result is really useful for you - if not just let
me know. I've attached the last ~10-20 sysrq-g outputs - as it was
running in a loop I have a ton of them. In case you're wondering: http
is
David Miller wrote:
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Sun, 04 Nov 2007 20:55:20 +0100
So I'm not sure if the result is really useful for you - if not just let
me know. I've attached the last ~10-20 sysrq-g outputs - as it was
running in a loop I have a ton of them. In case you're
In the meantime I'll build an aptitude which should exit after running
through the part that usually crashed, so it should be possible to run
it in a loop...
This was successful - it made crashing the machine pretty simple, even
without activated libnss-db.
To reproduce on Etch:
- get the
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Fri, 02 Nov 2007 16:37:25 +0100
I've sent g several times to sysrq, output is attached.
According to top the two hanging aptitude processes were running on CPU
1 + 3.
3204 root 20 0 19552 5088 4072 R 100 0.1 6:54.49 1 aptitude
3203
David Miller wrote:
From: David Miller [EMAIL PROTECTED]
Date: Thu, 01 Nov 2007 15:01:13 -0700 (PDT)
I'm working on a kernel patch for 2.6.23 that will allow you to get
some useful debugging information in situations like this.
I'll try to get you that patch by the end of tonight.
As
On Thu, Nov 01, 2007 at 09:55:44PM -0700, David Miller wrote:
I'm working on a kernel patch for 2.6.23 that will allow you to get
some useful debugging information in situations like this.
I'll try to get you that patch by the end of tonight.
As promised, here is the patch below.
echo
Hi,
lebrun.d.o hasn't crashed in a while now, but it has this in the
process list:
buildd 2382 0.0 0.2 8144 4736 ? Ss Oct30 0:00 /usr/bin/perl /usr/bin/buildd
buildd 2407 0.0 0.5 13920 11296 ? SN Oct30 0:10 \_ /usr/bin/perl /usr/bin/sbuild --batch
The futex() calls are definitely from libnss-db.
And on Lenny/testing we have futex calls from libc6.
Didn't have the time to come up with any instructions yet as we have
public holidays today, I'll try to finish them tomorrow.
--
Bernd Zeimetz
[EMAIL PROTECTED]
From: David Miller [EMAIL PROTECTED]
Date: Thu, 01 Nov 2007 15:01:13 -0700 (PDT)
I'm working on a kernel patch for 2.6.23 that will allow you to get
some useful debugging information in situations like this.
I'll try to get you that patch by the end of tonight.
As promised, here is the patch
From: Josip Rodin [EMAIL PROTECTED]
Date: Tue, 30 Oct 2007 00:37:13 +0100
I'd try doing a debootstrap of lenny (that's Debian testing),
and then inside it, run one or more of those 'dpkg-query -S libc.so.6'.
Thanks for the info.
While waiting for you to reply I created a lenny buildd
build
Hi,
(Sorry for breaking the threading - I didn't subscribe to the list,
I just found this in the web archive. I should probably subscribe... :)
David Miller wrote:
Ok, since I have a 280R just like Josip, I think a good plan
is for him to show me the commands he used to create the
build root
mount -t devpts none /dev/pts
mount --bind /dev /thechroot/dev
is what I use here, running udev in a chroot is no fun.
So, it's a lot more than just running the appropriate debootstrap
command.
I'm almost done with a howto which is 95% cut-and-paste to debootstrap
and boot a debian
Here you go.
(Mind, this is capturing the current status of the chroot, which is fairly
unclean, because right now it happens to be building python-qt4-4.3.1.)
What we're missing here is a probably important piece:
If dpkg-query is running during a build, it is running in a fakeroot
mount -t devpts none /dev/pts
mount --bind /dev /thechroot/dev
is what I use here, running udev in a chroot is no fun.
Ok.
AFAIK the buildds only have a minimal /dev, though. But to bootstrap a
system that's usually not enough.
Let's stick to 2.6.23 testing for pinpointing these
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Tue, 30 Oct 2007 01:50:30 +0100
What we're missing here is a probably important piece:
If dpkg-query is running during a build, it is running in a fakeroot
environment. I've straced that, see the attachment.
What I find in the strace are at
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Tue, 30 Oct 2007 01:47:33 +0100
mount -t devpts none /dev/pts
mount --bind /dev /thechroot/dev
is what I use here, running udev in a chroot is no fun.
Ok.
I'm almost done with a howto which is 95% cut-and-paste to debootstrap
and boot a
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Tue, 30 Oct 2007 02:54:14 +0100
Ok. Do you have a .deb with a kernel for me? If not - would you like to
have any specific options enabled - I have to build one then.
I usually just cp arch/sparc64/defconfig ./.config in a fresh
vanilla kernel tree
I think things got worse with 2.6.24...
The machine shoots itself now, I guess by running cron jobs or so.
[29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC:
0042f928 Y: Not tainted
[29074.884191] TPC: sched_clock+0x0/0x30
What kind of OOPS is this?
[29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC:
0042f928 Y: Not tainted
[29074.884191] TPC: sched_clock+0x0/0x30
What kind of OOPS is this? Please provide the kernel log messages
that appeared right before these register dumps.
Oct 28 03:25:12
Bernd Zeimetz wrote:
Hi,
please note that the futex bug also happens on US II machines,
it is just almost impossible to reproduce it - it'll just hang
after random days of building.
Everyone who sees these UltraSPARC-III problems please send me PRECISE
and FULL description of how to
Hi,
Since the mono team decided that mono is broken on Sparc (and despite
the fix provided by David Miller), I had to rebuild after enabling the
sparc arch in the source.
The hang always happens at the end of the build when invoking
dh_shgenlibs.
This is not 100%
Bernd Zeimetz wrote:
Hi,
Since the mono team decided that mono is broken on Sparc (and despite
the fix provided by David Miller), I had to rebuild after enabling the
sparc arch in the source.
Trying this at the moment.
not reproducible - mono fails to build from source in sid... so it
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Mon, 29 Oct 2007 02:18:30 +0100
But if this bug isn't fixed chances are good that the next Debian
release won't support Sparc at all.
Please don't use pseudo-threats like this, it only deters me even more
from working on this bug.
This explains why
David Miller wrote:
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Mon, 29 Oct 2007 02:18:30 +0100
But if this bug isn't fixed chances are good that the next Debian
release won't support Sparc at all.
Please don't use pseudo-threats like this, it only deters me even more
from working on
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Mon, 29 Oct 2007 03:06:13 +0100
David Miller wrote:
Josip stated explicitly that he has a SunFire280R, which disagrees
with what you're saying here.
Sorry, I mixed something up here. I was somehow sure that they were
using a v440, but it was
Bernd Zeimetz wrote:
For those who can reproduce it and have something like libnss-db
enabled, try disabling it.
- disabled it
- running vgdisplay killed the machine (wanted to create a new LV for a
chroot)... it's not accessible at all anymore, I think the kernel is
a 2.6.23-something
I think things got worse with 2.6.24...
The machine shoots itself now, I guess by running cron jobs or so.
[29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC:
0042f928 Y: Not tainted
[29074.884191] TPC: sched_clock+0x0/0x30
[29074.929988] g0:
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Sun, 28 Oct 2007 04:03:44 +0100
I think things got worse with 2.6.24...
The machine shoots itself now, I guess by running cron jobs or so.
[29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC:
0042f928 Y:
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Sat, 27 Oct 2007 20:09:47 +0200
titan:~# [ 2427.313946] BUG: soft lockup - CPU#3 stuck for 11s!
[aptitude:13375]
[ 2427.389128] TSTATE: 11009602 TPC: 0042f93c TNPC:
0042f7d0 Y: Not tainted
[ 2427.506821] TPC:
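Soft lockup reports like the one above name the CPU, the stall time, and the offending task, which makes them easy to tally across a long log. The sketch below assumes only the message shape quoted here (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`, possibly wrapped before the bracket); `parse_soft_lockups` is a hypothetical helper name.

```python
import re

# The task name and PID may land on the next line when the mail client
# wraps the message, so arbitrary whitespace is allowed before "[".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s!"
    r"\s*\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    return [
        {
            "cpu": int(m.group("cpu")),
            "seconds": int(m.group("seconds")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        }
        for m in SOFT_LOCKUP.finditer(log_text)
    ]
```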
Hi,
It seems that instead of getting stuck in the kernel where I
thought it would, the process gets stuck elsewhere and
also tends to loop allocating memory until all memory in the
machine is exhausted and the OOM killer starts to try and
kill processes left and right.
at least it runs
Hi,
just got linked to this thread, so here's a bit of input from me :)
1) system type
A Sun Fire 280R, with two CPU boards, each carrying a TI UltraSparc III
(Cheetah), and 2 GB of RAM. If you need more info, just say.
(Bernd Zeimetz has previously suggested that the problem is linked to
From: Bernd Zeimetz [EMAIL PROTECTED]
Date: Fri, 26 Oct 2007 14:30:21 +0200
at least it runs with 100% CPU, attaching strace to the pid doesn't give
any results
strace-ing the whole process doesn't result in more useful output, but
the hanging processes were killable when they were
On Fri, Oct 26, 2007 at 03:01:24PM -0700, David Miller wrote:
One thing I notice in the debian bug report is a mention of libnss-db
So I did some testing here and without libnss-db installed, running
dpkg-query does not use futexes at all.
But once I install libnss-db and enable it (by
On Sat, Oct 27, 2007 at 12:30:56AM +0200, Bernd Zeimetz wrote:
Josip, do you guys have libnss-db or similar in use on the buildd
machine?
They have, that's what Debian's userdir-ldap uses.
No, I have to correct you, this machine isn't part of that setup
(at least not yet).
--
2.
Josip Rodin wrote:
On Sat, Oct 27, 2007 at 12:30:56AM +0200, Bernd Zeimetz wrote:
Josip, do you guys have libnss-db or similar in use on the buildd
machine?
They have, that's what Debian's userdir-ldap uses.
No, I have to correct you, this machine isn't part of that setup
(at least not
For those who can reproduce it and have something like libnss-db
enabled, try disabling it.
- disabled it
- running vgdisplay killed the machine (wanted to create a new LV for a
chroot)... it's not accessible at all anymore, I think the kernel is
a 2.6.23-something here, I'll build a recent
Josip, do you guys have libnss-db or similar in use on the buildd
machine?
They have, that's what Debian's userdir-ldap uses.
For those who can reproduce it and have something like libnss-db
enabled, try disabling it.
Will do in a few minutes.
--
Bernd Zeimetz
[EMAIL PROTECTED]
Josip, give this debugging patch a try. It is against 2.6.23.1
but it should apply to most recent kernels.
It should give you debugging messages in the kernel log that
start with FUTEX_BUG if the debugging code triggers.
Please post just a few samples of whatever it spits out.
Thanks!
diff
On Wed, Oct 24, 2007 at 11:41:13PM -0700, David Miller wrote:
Josip, give this debugging patch a try. It is against 2.6.23.1
but it should apply to most recent kernels.
OK, after resurrecting the machine once again (it had died in the meantime,
reliably as ever), I did:
patching file
On Thu, Oct 25, 2007 at 05:07:36PM +0200, joy wrote:
If you try, within that troublesome build-root, a few times to try to
fork off a couple hundred:
dpkg-query --something python-2.5
or whatever, can you get some of processes to wedge under that
build root?
I did this in a
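The "fork off a couple hundred" test suggested above can be scripted roughly as follows. This is only a sketch under assumptions: the helper name `find_wedged`, the default of 200 copies, and the 60-second timeout are all illustrative choices, not something taken from the thread.

```python
import subprocess
import time


def find_wedged(command, copies=200, timeout=60.0):
    """Start `copies` instances of `command` at once and return the PIDs
    of those still running once `timeout` seconds have passed.

    Processes that miss the deadline are deliberately left running so
    they can be inspected afterwards (top, strace, sysrq output).
    """
    procs = [
        subprocess.Popen(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for _ in range(copies)
    ]
    deadline = time.monotonic() + timeout
    wedged = []
    for proc in procs:
        # All processes share one overall deadline, not one timeout each.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            wedged.append(proc.pid)
    return wedged
```

Run inside the build root, a call such as `find_wedged(["dpkg-query", "-S", "libc.so.6"])` would list the PIDs of any dpkg-query instances that did not finish in time; an empty list means every copy exited.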
From: Josip Rodin [EMAIL PROTECTED]
Date: Thu, 25 Oct 2007 00:33:32 +0200
We've been having grave issues with a few of our sparc build daemon machines
in Debian. Something causes dpkg-query(8) processes, otherwise harmless, to
run amok and allocate too much memory, but keep running and become
On Wed, Oct 24, 2007 at 03:58:29PM -0700, David Miller wrote:
I know, I've seen this report a million times :-)
Oh, I know you know, I mailed you a while ago and you told me to mail
the mailing list :)
I can't reproduce it, I've even tried the fabled test case
where you spawn thousands of