Re: unkillable dpkg-query processes

2007-11-06 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Tue, 06 Nov 2007 04:51:07 +0100 Here's also some output from apt-get which got stuck in my unstable chroot while I wanted to retrieve the klibc source to try to debug it... So the good news is that I started getting the hang seen on the Debian buildd

Re: unkillable dpkg-query processes

2007-11-06 Thread Bernd Zeimetz
David Miller wrote: From: Bernd Zeimetz [EMAIL PROTECTED] Date: Tue, 06 Nov 2007 04:51:07 +0100 Here's also some output from apt-get which got stuck in my unstable chroot while I wanted to retrieve the klibc source to try to debug it... So the good news is that I started getting the hang

Re: unkillable dpkg-query processes

2007-11-05 Thread Bernd Zeimetz
So I'm not sure if the result is really useful for you - if not just let me know. I've attached the last ~10-20 sysrq-g outputs - as it was running in a loop I have a ton of them. In case you're wondering: http is aptitude's http method. The http module is stuck in a different place, I'll

Re: unkillable dpkg-query processes

2007-11-04 Thread David Miller
From: Josip Rodin [EMAIL PROTECTED] Date: Fri, 2 Nov 2007 17:21:06 +0100 Great. Here you go, three of them, while the load was 3 and this process was stuck: buildd 10813 100 0.8 987368 17504 ? RN 14:44 155:49 dpkg-query --search libpthread.so.0 libdl.so.2 libstdc++.so.6

Re: unkillable dpkg-query processes

2007-11-04 Thread Bernd Zeimetz
Ok, the key in the trace is: Nov 2 16:25:30 titan kernel: [ 978.134874] CPU[ 1]: TSTATE[80009603] TPC[0067d2e0] TNPC[0067d2d4] TASK[aptitude:3204] Nov 2 16:25:30 titan kernel: [ 978.257809] TPC[_write_unlock_irq+0x20/0x110] ... Nov 2

Re: unkillable dpkg-query processes

2007-11-04 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Sun, 04 Nov 2007 20:55:20 +0100 So I'm not sure if the result is really useful for you - if not just let me know. I've attached the last ~10-20 sysrq-g outputs - as it was running in a loop I have a ton of them. In case you're wondering: http is

Re: unkillable dpkg-query processes

2007-11-04 Thread Bernd Zeimetz
David Miller wrote: From: Bernd Zeimetz [EMAIL PROTECTED] Date: Sun, 04 Nov 2007 20:55:20 +0100 So I'm not sure if the result is really useful for you - if not just let me know. I've attached the last ~10-20 sysrq-g outputs - as it was running in a loop I have a ton of them. In case you're

Re: unkillable dpkg-query processes

2007-11-04 Thread Bernd Zeimetz
In the meantime I'll build an aptitude which should exit after running through the part which usually crashed, so it should be possible to run it in a loop... This was successful - it made crashing the machine pretty simple, even without activated libnss-db. To reproduce on Etch: - get the

Re: unkillable dpkg-query processes

2007-11-03 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Fri, 02 Nov 2007 16:37:25 +0100 I've sent g several times to sysrq, output is attached. According to top the two hanging aptitude processes were running on CPU 1 + 3. 3204 root 20 0 19552 5088 4072 R 100 0.1 6:54.49 1 aptitude 3203

Re: unkillable dpkg-query processes

2007-11-02 Thread Bernd Zeimetz
David Miller wrote: From: David Miller [EMAIL PROTECTED] Date: Thu, 01 Nov 2007 15:01:13 -0700 (PDT) I'm working on a kernel patch for 2.6.23 that will allow you to get some useful debugging information in situations like this. I'll try to get you that patch by the end of tonight. As

Re: unkillable dpkg-query processes

2007-11-02 Thread Josip Rodin
On Thu, Nov 01, 2007 at 09:55:44PM -0700, David Miller wrote: I'm working on a kernel patch for 2.6.23 that will allow you to get some useful debugging information in situations like this. I'll try to get you that patch by the end of tonight. As promised, here is the patch below. echo

Re: unkillable dpkg-query processes

2007-11-01 Thread Josip Rodin
Hi, lebrun.d.o hasn't crashed in a while now, but it has this in the process list: buildd 2382 0.0 0.2 8144 4736 ? Ss Oct30 0:00 /usr/bin/perl /usr/bin/buildd buildd 2407 0.0 0.5 13920 11296 ? SN Oct30 0:10 \_ /usr/bin/perl /usr/bin/sbuild --batch

Re: unkillable dpkg-query processes

2007-11-01 Thread Bernd Zeimetz
The futex() calls are definitely from libnss-db. And on Lenny/testing we have futex calls from libc6. Didn't have the time to come up with any instructions yet as we have public holidays today, I'll try to finish them tomorrow. -- Bernd Zeimetz [EMAIL PROTECTED]

Re: unkillable dpkg-query processes

2007-11-01 Thread David Miller
From: David Miller [EMAIL PROTECTED] Date: Thu, 01 Nov 2007 15:01:13 -0700 (PDT) I'm working on a kernel patch for 2.6.23 that will allow you to get some useful debugging information in situations like this. I'll try to get you that patch by the end of tonight. As promised, here is the patch

Re: unkillable dpkg-query processes

2007-10-29 Thread David Miller
From: Josip Rodin [EMAIL PROTECTED] Date: Tue, 30 Oct 2007 00:37:13 +0100 I'd try doing a debootstrap of lenny (that's Debian testing), and then inside it, run one or more of those 'dpkg-query -S libc.so.6'. Thanks for the info. While waiting for you to reply I created a lenny buildd build

Re: unkillable dpkg-query processes

2007-10-29 Thread Josip Rodin
Hi, (Sorry for breaking the threading - I didn't subscribe to the list, I just found this in the web archive. I should probably subscribe... :) David Miller wrote: Ok, since I have a 280R just like Josip, I think a good plan is for him to show me the commands he used to create the build root

Re: unkillable dpkg-query processes

2007-10-29 Thread Bernd Zeimetz
mount -t devpts none /dev/pts mount --bind /dev /thechroot/dev is what I use here, running udev in a chroot is no fun. So, it's a lot more than just running the appropriate debootstrap command. I'm almost done with a howto which is cut-and-paste for 95% to debootstrap and boot a debian

Re: unkillable dpkg-query processes

2007-10-29 Thread Bernd Zeimetz
Here you go. (Mind, this is capturing the current status of the chroot, which is fairly unclean, because right now it happens to be building python-qt4-4.3.1.) What we're missing here is a probably important piece: If dpkg-query is running during a build, it is running in a fakeroot

Re: unkillable dpkg-query processes

2007-10-29 Thread Bernd Zeimetz
mount -t devpts none /dev/pts mount --bind /dev /thechroot/dev is what I use here, running udev in a chroot is no fun. Ok. AFAIK the buildds only have a minimal /dev, though. But to bootstrap a system that's usually not enough. Let's stick to 2.6.23 testing for pinpointing these

Re: unkillable dpkg-query processes

2007-10-29 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Tue, 30 Oct 2007 01:50:30 +0100 What we're missing here is a probably important piece: If dpkg-query is running during a build, it is running in a fakeroot environment. I've straced that, see the attachment. What I find in the strace are at

Re: unkillable dpkg-query processes

2007-10-29 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Tue, 30 Oct 2007 01:47:33 +0100 mount -t devpts none /dev/pts mount --bind /dev /thechroot/dev is what I use here, running udev in a chroot is no fun. Ok. I'm almost done with a howto which is cut-and-paste for 95% to debootstrap and boot a

Re: unkillable dpkg-query processes

2007-10-29 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Tue, 30 Oct 2007 02:54:14 +0100 Ok. Do you have a .deb with a kernel for me? If not - would you like to have any specific options enabled - I have to build one then. I usually just cp arch/sparc64/defconfig ./.config in a fresh vanilla kernel tree

Re: unkillable dpkg-query processes

2007-10-28 Thread Bernd Zeimetz
I think things got worse with 2.6.24... The machine shoots itself now, I guess by running cron jobs or so. [29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC: 0042f928 Y: Not tainted [29074.884191] TPC: sched_clock+0x0/0x30 What kind of OOPS is this?

Re: unkillable dpkg-query processes

2007-10-28 Thread Bernd Zeimetz
[29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC: 0042f928 Y: Not tainted [29074.884191] TPC: sched_clock+0x0/0x30 What kind of OOPS is this? Please provide the kernel log messages that appeared right before these register dumps. Oct 28 03:25:12

Re: unkillable dpkg-query processes

2007-10-28 Thread Sébastien Bernard
Bernd Zeimetz wrote: Hi, please note that the futex bug also happens on US II machines, it is just almost impossible to reproduce it - it'll just hang after random days of building. Everyone who sees these UltraSPARC-III problems please send me PRECISE and FULL description of how to

Re: unkillable dpkg-query processes

2007-10-28 Thread Bernd Zeimetz
Hi, Since the mono team decided that mono is broken on Sparc (and despite the fix provided by David Miller), I had to rebuild after enabling the sparc arch in the source. The hang always happens at the end of the build when invoking dh_shgenlibs in the build. This is not 100%

Re: unkillable dpkg-query processes

2007-10-28 Thread Bernd Zeimetz
Bernd Zeimetz wrote: Hi, Since mono team decided that the mono is broken on Sparc (and despite the fix provided by David Miller), I had to rebuild after enabling the sparc arch in the source. Trying this at the moment. not reproducible - mono fails to build from source in sid... so it

Re: unkillable dpkg-query processes

2007-10-28 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Mon, 29 Oct 2007 02:18:30 +0100 But if this bug isn't fixed chances are good that the next Debian release won't support Sparc at all. Please don't use pseudo-threats like this, it only deters me even more from working on this bug. This explains why

Re: unkillable dpkg-query processes

2007-10-28 Thread Bernd Zeimetz
David Miller wrote: From: Bernd Zeimetz [EMAIL PROTECTED] Date: Mon, 29 Oct 2007 02:18:30 +0100 But if this bug isn't fixed chances are good that the next Debian release won't support Sparc at all. Please don't use pseudo-threats like this, it only deters me even more from working on

Re: unkillable dpkg-query processes

2007-10-28 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Mon, 29 Oct 2007 03:06:13 +0100 David Miller wrote: Josip stated explicitly that he has a SunFire280R, which disagrees with what you're saying here. Sorry, I mixed something up here. I was somehow sure that they were using a v440, but it was

Re: unkillable dpkg-query processes

2007-10-27 Thread Bernd Zeimetz
Bernd Zeimetz wrote: For those who can reproduce it and have something like libnss-db enabled, try disabling it. - disabled it - running vgdisplay killed the machine (wanted to create a new LV for a chroot)... it's not accessible at all anymore, I think the kernel is a 2.6.23-something

Re: unkillable dpkg-query processes

2007-10-27 Thread Bernd Zeimetz
I think things got worse with 2.6.24... The machine shoots itself now, I guess by running cron jobs or so. [29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC: 0042f928 Y: Not tainted [29074.884191] TPC: sched_clock+0x0/0x30 [29074.929988] g0:

Re: unkillable dpkg-query processes

2007-10-27 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Sun, 28 Oct 2007 04:03:44 +0100 I think things got worse with 2.6.24... The machine shoots itself now, I guess by running cron jobs or so. [29074.766486] TSTATE: 11009600 TPC: 0042f984 TNPC: 0042f928 Y:

Re: unkillable dpkg-query processes

2007-10-27 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Sat, 27 Oct 2007 20:09:47 +0200 titan:~# [ 2427.313946] BUG: soft lockup - CPU#3 stuck for 11s! [aptitude:13375] [ 2427.389128] TSTATE: 11009602 TPC: 0042f93c TNPC: 0042f7d0 Y: Not tainted [ 2427.506821] TPC:

Re: unkillable dpkg-query processes

2007-10-26 Thread Bernd Zeimetz
Hi, It seems that instead of getting stuck in the kernel where I thought it would, the process gets stuck elsewhere and also tends to loop allocating memory until all memory in the machine is exhausted and the OOM killer starts to try and kill processes left and right. at least it runs

Re: unkillable dpkg-query processes

2007-10-26 Thread Bernd Zeimetz
Hi, just got linked to this thread, so here's a bit of input from me :) 1) system type A Sun Fire 280R, with two CPU boards, each carrying a TI UltraSparc III (Cheetah), and 2 GB of RAM. If you need more info, just say. (Bernd Zeimetz has previously suggested that the problem is linked to

Re: unkillable dpkg-query processes

2007-10-26 Thread David Miller
From: Bernd Zeimetz [EMAIL PROTECTED] Date: Fri, 26 Oct 2007 14:30:21 +0200 at least it runs with 100% CPU, attaching strace to the pid doesn't give any results strace-ing the whole process doesn't result in more useful output, but the hanging processes were killable when they were

Re: unkillable dpkg-query processes

2007-10-26 Thread Josip Rodin
On Fri, Oct 26, 2007 at 03:01:24PM -0700, David Miller wrote: One thing I notice in the debian bug report is a mention of libnss-db So I did some testing here and without libnss-db installed, running dpkg-query does not use futexes at all. But once I install libnss-db and enable it (by

Re: unkillable dpkg-query processes

2007-10-26 Thread Josip Rodin
On Sat, Oct 27, 2007 at 12:30:56AM +0200, Bernd Zeimetz wrote: Josip, do you guys have libnss-db or similar in use on the buildd machine? They have, that's what Debian's userdir-ldap uses. No, I have to correct you, this machine isn't part of that setup (at least not yet). -- 2.

Re: unkillable dpkg-query processes

2007-10-26 Thread Bernd Zeimetz
Josip Rodin wrote: On Sat, Oct 27, 2007 at 12:30:56AM +0200, Bernd Zeimetz wrote: Josip, do you guys have libnss-db or similar in use on the buildd machine? They have, that's what Debian's userdir-ldap uses. No, I have to correct you, this machine isn't part of that setup (at least not

Re: unkillable dpkg-query processes

2007-10-26 Thread Bernd Zeimetz
For those who can reproduce it and have something like libnss-db enabled, try disabling it. - disabled it - running vgdisplay killed the machine (wanted to create a new LV for a chroot)... it's not accessible at all anymore, I think the kernel is a 2.6.23-something here, I'll build a recent

Re: unkillable dpkg-query processes

2007-10-26 Thread Bernd Zeimetz
Josip, do you guys have libnss-db or similar in use on the buildd machine? They have, that's what Debian's userdir-ldap uses. For those who can reproduce it and have something like libnss-db enabled, try disabling it. Will do in a few minutes. -- Bernd Zeimetz [EMAIL PROTECTED]

Re: unkillable dpkg-query processes

2007-10-25 Thread David Miller
Josip, give this debugging patch a try. It is against 2.6.23.1 but it should apply to most recent kernels. It should give you debugging messages in the kernel log that start with FUTEX_BUG if the debugging code triggers. Please post just a few samples of whatever it spits out. Thanks! diff

Re: unkillable dpkg-query processes

2007-10-25 Thread Josip Rodin
On Wed, Oct 24, 2007 at 11:41:13PM -0700, David Miller wrote: Josip, give this debugging patch a try. It is against 2.6.23.1 but it should apply to most recent kernels. OK, after resurrecting the machine once again (it had died in the meantime, reliably as ever), I did: patching file

Re: unkillable dpkg-query processes

2007-10-25 Thread Josip Rodin
On Thu, Oct 25, 2007 at 05:07:36PM +0200, joy wrote: If you try, within that troublesome build-root, a few times to try to fork off a couple hundred: dpkg-query --something python-2.5 or whatever, can you get some of processes to wedge under that build root? I did this in a

Re: unkillable dpkg-query processes

2007-10-24 Thread David Miller
From: Josip Rodin [EMAIL PROTECTED] Date: Thu, 25 Oct 2007 00:33:32 +0200 We've been having grave issues with a few of our sparc build daemon machines in Debian. Something causes dpkg-query(8) processes, otherwise harmless, to run amok and allocate too much memory, but keep running and become

Re: unkillable dpkg-query processes

2007-10-24 Thread Josip Rodin
On Wed, Oct 24, 2007 at 03:58:29PM -0700, David Miller wrote: I know, I've seen this report a million times :-) Oh, I know you know, I mailed you a while ago and you told me to mail the mailing list :) I can't reproduce it, I've even tried the fabled test case where you spawn thousands of