Package: src:linux
Version: 3.2.46-1
Severity: important

Dear Debian Linux Kernel Maintainers,

If I create a cgroup freezer container on an SMP machine and repeatedly
freeze/thaw it in a loop, the kernel hits a BUG and the machine locks up.

To reproduce, create a cgroup freezer container holding a single process
on an SMP machine running the standard wheezy kernel 3.2.46-1:

 mkdir /dev/cgroups-freezer
 mount -t cgroup -o freezer freezer /dev/cgroups-freezer
 mkdir /dev/cgroups-freezer/crashtest
 cd /dev/cgroups-freezer/crashtest
 sleep 3600 &
 echo $! > tasks

Then run this ugly perl one-liner from within the same "crashtest"
directory:

 perl -e 'while (1) { open FILE, ">freezer.state" or die; print FILE
"FROZEN" or die; close FILE or die; open FILE, ">freezer.state" or die;
print FILE "THAWED" or die; close FILE or die; };'
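
A bounded variant of the same loop can be sketched in shell, which may be
easier to experiment with than an endless one. This is only a sketch: the
function name toggle_freezer and its two arguments are hypothetical, and
it has not been confirmed to trigger the BUG as quickly as the perl loop.

```shell
# Hypothetical helper (not part of the original report): write FROZEN and
# then THAWED to a freezer.state file a fixed number of times.
#   $1 = path to the state file (default: ./freezer.state)
#   $2 = number of freeze/thaw cycles (default: 1000)
toggle_freezer() {
    state_file=${1:-freezer.state}
    count=${2:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        # printf without a trailing newline mirrors the perl print calls
        printf 'FROZEN' > "$state_file" || return 1
        printf 'THAWED' > "$state_file" || return 1
        i=$((i + 1))
    done
}
```

Run from within the "crashtest" directory, "toggle_freezer freezer.state
100000" would freeze and thaw the group 100000 times and then stop.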

On my test machines, the following BUG reproducibly happens in less than
a second, and the machine locks up:

[ 2703.254372] ------------[ cut here ]------------
[ 2703.254530] kernel BUG at
/build/linux-dJLVDt/linux-3.2.46/kernel/cgroup_freezer.c:241!
[ 2703.254769] invalid opcode: 0000 [#1] SMP
[ 2703.254917] Modules linked in: netconsole nfnetlink_log nfnetlink
configfs nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop
snd_intel8x0 snd_ac97_codec snd_pcm snd_page_alloc snd_timer snd
soundcore ac97_bus ac battery processor parport_pc parport power_supply
thermal_sys button psmouse serio_raw pcspkr joydev evdev i2c_piix4
i2c_core vboxguest(O) ext4 crc16 jbd2 mbcache usbhid hid sg sr_mod
sd_mod cdrom crc_t10dif ata_generic ata_piix ohci_hcd ehci_hcd ahci
libahci usbcore e1000 libata scsi_mod usb_common [last unloaded: netconsole]
[ 2703.256018]
[ 2703.256018] Pid: 2835, comm: perl Tainted: G           O
3.2.0-4-686-pae #1 Debian 3.2.46-1 innotek GmbH VirtualBox/VirtualBox
[ 2703.256018] EIP: 0060:[<c106dc6f>] EFLAGS: 00010002 CPU: 0
[ 2703.256018] EIP is at update_if_frozen.isra.1+0x47/0x73
[ 2703.256018] EAX: 00000000 EBX: 00000001 ECX: df2ef4c0 EDX: dd265ee4
[ 2703.256018] ESI: 00000001 EDI: dd6a6350 EBP: 00000000 ESP: dd265edc
[ 2703.256018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2703.256018] Process perl (pid: 2835, ti=dd264000 task=df248ee0
task.ti=dd264000)
[ 2703.256018] Stack:
[ 2703.256018]  dd265ee4 df2ef4c0 00000000 de2b1284 df2ef4c0 dd6a6340
dd265f28 00000002
[ 2703.256018]  c106dd5a c12c271a c1165b6c c106dd01 c13e892c dd265f28
0916b860 c106b49d
[ 2703.256018]  00000006 df2ef4c0 00001000 5a4f5246 00004e45 520eb4b9
2fb866f6 520eb4bf
[ 2703.256018] Call Trace:
[ 2703.256018]  [<c106dd5a>] ? freezer_write+0x59/0x13c
[ 2703.256018]  [<c12c271a>] ? _cond_resched+0x5/0x18
[ 2703.256018]  [<c1165b6c>] ? _copy_from_user+0x28/0x47
[ 2703.256018]  [<c106dd01>] ? freezer_read+0x66/0x66
[ 2703.256018]  [<c106b49d>] ? cgroup_file_write+0x18f/0x1e1
[ 2703.256018]  [<c10ccddf>] ? rw_verify_area+0xc6/0xe7
[ 2703.256018]  [<c106b30e>] ? cgroup_file_open+0x87/0x87
[ 2703.256018]  [<c10cd07f>] ? vfs_write+0x83/0xd4
[ 2703.256018]  [<c10cd23f>] ? sys_write+0x3d/0x61
[ 2703.256018]  [<c12c7f5f>] ? sysenter_do_call+0x12/0x28
[ 2703.256018] Code: e8 2b f6 ff ff eb 0b e8 2d ff ff ff 46 3c 01 83 db
ff 8b 44 24 04 8d 54 24 08 e8 fe f6 ff ff 85 c0 75 e4 85 ed 75 06 85 db
74 17 <0f> 0b 4d 75 0c 39 f3 75 0e c7 07 02 00 00 00 eb 06 39 f3 74 02
[ 2703.256018] EIP: [<c106dc6f>] update_if_frozen.isra.1+0x47/0x73
SS:ESP 0068:dd265edc
[ 2703.256018] ---[ end trace 29c9f3fc0f436abe ]---

I have reproduced this on wheezy with this kernel:

 Linux [hostname] 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

I have also reproduced it on squeeze with the same kernel backported, on
different (non-virtual) amd64 hardware:

 Linux [hostname] 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.46-1~bpo60+1
x86_64 GNU/Linux

In my testing, the BUG only happens on SMP machines, and not on single
CPU machines.

Also, if I add a slight delay before each freeze, the problem no longer
occurs reproducibly, at least for me:

 perl -e 'while (1) { select (undef, undef, undef, 0.01); open FILE,
">freezer.state" or die; print FILE "FROZEN" or die; close FILE or die;
open FILE, ">freezer.state" or die; print FILE "THAWED" or die; close
FILE or die; };'  # does not BUG due to the select() delay

Looking at line 241 of kernel/cgroup_freezer.c in version 3.2.46,
something is clearly wrong: the code believes the state of the group is
CGROUP_THAWED, and yet it contains a frozen task. The fact that it's
both timing- and SMP-dependent suggests a race condition of some kind.

-- System Information:
Debian Release: 7.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
Robert L Mathews, Tiger Technologies

