Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-12-10 Thread George Spelvin
> Hum, can you try disabling the HW support of CRC32C implementation
> (CRYPTO_CRC32C_INTEL)? If the problem disappears, we know there's some
> problem in the HW support code...

To isolate it even better, I left in the hardware support, but commented out the CLMUL code. I could have just upped t

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-12-10 Thread Jan Kara
On Tue 10-12-13 16:27:01, Jan Kara wrote:
> On Tue 10-12-13 04:35:28, George Spelvin wrote:
> > One of those additional WARN_ON tests tripped, hooray!
> > And it turned out to be in the ext4 metadata checksumming. To be
> > precise, ext4_block_bitmap_csum_set() returned with irqs disabled,
> > and

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-12-10 Thread Jan Kara
On Tue 10-12-13 04:35:28, George Spelvin wrote:
> One of those additional WARN_ON tests tripped, hooray!
> And it turned out to be in the ext4 metadata checksumming. To be
> precise, ext4_block_bitmap_csum_set() returned with irqs disabled,
> and kaboom.

Ha, great. Thanks for the persistence in

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-12-10 Thread George Spelvin
One of those additional WARN_ON tests tripped, hooray! And it turned out to be in the ext4 metadata checksumming. To be precise, ext4_block_bitmap_csum_set() returned with irqs disabled, and kaboom. Since I have this experimental feature turned on and most people don't, this explains why I'm find

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-11-28 Thread Jan Kara
On Thu 28-11-13 00:09:06, George Spelvin wrote:
> Well, it finally triggered.
>
> Not *that* long before, I fiddled with a USB thumb drive, which
> I'll mention here, but I don't think it's connected.
>
> [2328294.996152] usb 1-1.3: new high-speed USB device number 6 using ehci-pci
> [2328295.

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-11-27 Thread George Spelvin
Well, it finally triggered.

Not *that* long before, I fiddled with a USB thumb drive, which I'll mention here, but I don't think it's connected.

[2328294.996152] usb 1-1.3: new high-speed USB device number 6 using ehci-pci
[2328295.080347] usb 1-1.3: New USB device found, idVendor=0781, idProduc

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread George Spelvin
Due to wanting to stick with 3.11.x baseline, as opposed to whatever you based your diff on, I had to amend the last hunk slightly. Included just FYI.

Compiled, rebooting now. It may take some days to get a bug report.

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4bbbf13b..e6f0d6b 1

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread Jan Kara
On Thu 31-10-13 21:37:25, Jan Kara wrote:
> On Thu 31-10-13 12:30:51, George Spelvin wrote:
> > Jan Kara wrote:
> > > On Thu 31-10-13 05:58:16, George Spelvin wrote:
> > >> [x.908259] Call Trace:
> > >> [x.908265] [] dump_stack+0x54/0x74
> > >> [x.908268] [] __might_sleep+0xcf/0xf0
> > >> [x.908

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread Jan Kara
On Thu 31-10-13 12:30:51, George Spelvin wrote:
> Jan Kara wrote:
> > On Thu 31-10-13 05:58:16, George Spelvin wrote:
> >> [x.908259] Call Trace:
> >> [x.908265] [] dump_stack+0x54/0x74
> >> [x.908268] [] __might_sleep+0xcf/0xf0
> >> [x.908271] [] ext4_journal_check_start+0x1b/0xa0
> >> [x.9082

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread George Spelvin
Andreas Dilger asked:
> What kind of storage stack is underneath this filesystem? If
> it is deep (e.g. DM + LVM + iSCSI) then the stack overflow is
> definitely possible.

ext4 on md raid1 on SATA. Nothing too complicated.

Personalities : [raid0] [raid1]
md1 : active raid1 sdb2[1] sda2[0]

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread Andreas Dilger
On Oct 17, 2013, at 4:14 PM, Al Viro wrote:
> On Thu, Oct 17, 2013 at 05:11:43PM -0400, George Spelvin wrote:
>>
>> Well, it happened again (error appended). Can you please clarify what you mean
>> by "such BUG_ON()"; I'm having a hard time following the RCU code and determining
>> all t

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread George Spelvin
Jan Kara wrote:
> On Thu 31-10-13 05:58:16, George Spelvin wrote:
>> [x.908259] Call Trace:
>> [x.908265] [] dump_stack+0x54/0x74
>> [x.908268] [] __might_sleep+0xcf/0xf0
>> [x.908271] [] ext4_journal_check_start+0x1b/0xa0
>> [x.908273] [] __ext4_journal_start_sb+0x21/0x80
>> [x.908276] [] ex

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread Jan Kara
Hello,

On Thu 31-10-13 05:58:16, George Spelvin wrote:
> Sorry for the long delay between updates, but it took a while to
> re-trigger the bug. It seems to be caused by iceweasel crashing due to
> some OOM condition.
>
> Anyway, here's the stack dump with CONFIG_DEBUG_ATOMIC_SLEEP enabled.
> (

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-31 Thread George Spelvin
Sorry for the long delay between updates, but it took a while to re-trigger the bug. It seems to be caused by iceweasel crashing due to some OOM condition.

Anyway, here's the stack dump with CONFIG_DEBUG_ATOMIC_SLEEP enabled. (x = 1166866 seconds of uptime.)

[x.908244] BUG: sleeping function cal

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-17 Thread Al Viro
On Thu, Oct 17, 2013 at 05:11:43PM -0400, George Spelvin wrote:
> Al Viro wrote:
> > Note that do_group_exit() is preceded by
> >         spin_unlock_irq(&sighand->siglock);
> > so no matter what happened in callers, irq is enabled. I'd suggest sticking
> > such BUG_ON() into __fput() and t

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-17 Thread Jan Kara
On Thu 17-10-13 17:11:43, George Spelvin wrote:
> Al Viro wrote:
> > Note that do_group_exit() is preceded by
> >         spin_unlock_irq(&sighand->siglock);
> > so no matter what happened in callers, irq is enabled. I'd suggest sticking
> > such BUG_ON() into __fput() and trying to reprodu

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-17 Thread George Spelvin
Al Viro wrote:
> Note that do_group_exit() is preceded by
>         spin_unlock_irq(&sighand->siglock);
> so no matter what happened in callers, irq is enabled. I'd suggest sticking
> such BUG_ON() into __fput() and trying to reproduce that crap...

Well, it happened again (error appended).

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-09 Thread Al Viro
On Wed, Oct 09, 2013 at 05:18:53PM +0200, Jan Kara wrote:
> This is really weird. We are delivering a signal to a task. While task is

ITYM "a fatal signal"

> returning from kernel space we are running queued task works and one of

get_signal_to_deliver() notices that the signal has to be dealt

Re: 3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-09 Thread Jan Kara
On Wed 09-10-13 07:55:02, George Spelvin wrote:
> This is a newly built machine (although out of "tested" parts), so RAM
> problems are not unthinkable, but I had the chance to capture this so
> it seemed worth reporting.
>
> i7-2xxx CPU, 8GB RAM, file system is ext4 on RAID-1.
> The local patches

3.11.4: kernel BUG at fs/buffer.c:1268

2013-10-09 Thread George Spelvin
This is a newly built machine (although out of "tested" parts), so RAM problems are not unthinkable, but I had the chance to capture this so it seemed worth reporting. i7-2xxx CPU, 8GB RAM, file system is ext4 on RAID-1. The local patches are to a char device driver (remote control/rf subsystem) t