Package: src:linux
Version: 3.2.46-1
Severity: important

Dear Debian Linux Kernel Maintainers,

If I create a cgroup freezer container on an SMP machine and repeatedly
freeze/thaw it in a loop, the kernel freezes with a BUG.

To reproduce, create a cgroups freezer container with a single process in
it on an SMP machine with wheezy standard kernel 3.2.46-1:

    mkdir /dev/cgroups-freezer
    mount -t cgroup -o freezer freezer /dev/cgroups-freezer
    mkdir /dev/cgroups-freezer/crashtest
    cd /dev/cgroups-freezer/crashtest
    sleep 3600 &
    echo $! > tasks

Then run this ugly perl one-liner from within the same "crashtest"
directory:

    perl -e 'while (1) {
        open FILE, ">freezer.state" or die;
        print FILE "FROZEN" or die;
        close FILE or die;
        open FILE, ">freezer.state" or die;
        print FILE "THAWED" or die;
        close FILE or die;
    };'

On my test machines, the following BUG reproducibly happens in less than a
second, and the machine locks up:

[ 2703.254372] ------------[ cut here ]------------
[ 2703.254530] kernel BUG at /build/linux-dJLVDt/linux-3.2.46/kernel/cgroup_freezer.c:241!
[ 2703.254769] invalid opcode: 0000 [#1] SMP
[ 2703.254917] Modules linked in: netconsole nfnetlink_log nfnetlink configfs nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcm snd_page_alloc snd_timer snd soundcore ac97_bus ac battery processor parport_pc parport power_supply thermal_sys button psmouse serio_raw pcspkr joydev evdev i2c_piix4 i2c_core vboxguest(O) ext4 crc16 jbd2 mbcache usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic ata_piix ohci_hcd ehci_hcd ahci libahci usbcore e1000 libata scsi_mod usb_common [last unloaded: netconsole]
[ 2703.256018]
[ 2703.256018] Pid: 2835, comm: perl Tainted: G O 3.2.0-4-686-pae #1 Debian 3.2.46-1 innotek GmbH VirtualBox/VirtualBox
[ 2703.256018] EIP: 0060:[<c106dc6f>] EFLAGS: 00010002 CPU: 0
[ 2703.256018] EIP is at update_if_frozen.isra.1+0x47/0x73
[ 2703.256018] EAX: 00000000 EBX: 00000001 ECX: df2ef4c0 EDX: dd265ee4
[ 2703.256018] ESI: 00000001 EDI: dd6a6350 EBP: 00000000 ESP: dd265edc
[ 2703.256018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2703.256018] Process perl (pid: 2835, ti=dd264000 task=df248ee0 task.ti=dd264000)
[ 2703.256018] Stack:
[ 2703.256018] dd265ee4 df2ef4c0 00000000 de2b1284 df2ef4c0 dd6a6340 dd265f28 00000002
[ 2703.256018] c106dd5a c12c271a c1165b6c c106dd01 c13e892c dd265f28 0916b860 c106b49d
[ 2703.256018] 00000006 df2ef4c0 00001000 5a4f5246 00004e45 520eb4b9 2fb866f6 520eb4bf
[ 2703.256018] Call Trace:
[ 2703.256018] [<c106dd5a>] ? freezer_write+0x59/0x13c
[ 2703.256018] [<c12c271a>] ? _cond_resched+0x5/0x18
[ 2703.256018] [<c1165b6c>] ? _copy_from_user+0x28/0x47
[ 2703.256018] [<c106dd01>] ? freezer_read+0x66/0x66
[ 2703.256018] [<c106b49d>] ? cgroup_file_write+0x18f/0x1e1
[ 2703.256018] [<c10ccddf>] ? rw_verify_area+0xc6/0xe7
[ 2703.256018] [<c106b30e>] ? cgroup_file_open+0x87/0x87
[ 2703.256018] [<c10cd07f>] ? vfs_write+0x83/0xd4
[ 2703.256018] [<c10cd23f>] ? sys_write+0x3d/0x61
[ 2703.256018] [<c12c7f5f>] ? sysenter_do_call+0x12/0x28
[ 2703.256018] Code: e8 2b f6 ff ff eb 0b e8 2d ff ff ff 46 3c 01 83 db ff 8b 44 24 04 8d 54 24 08 e8 fe f6 ff ff 85 c0 75 e4 85 ed 75 06 85 db 74 17 <0f> 0b 4d 75 0c 39 f3 75 0e c7 07 02 00 00 00 eb 06 39 f3 74 02
[ 2703.256018] EIP: [<c106dc6f>] update_if_frozen.isra.1+0x47/0x73 SS:ESP 0068:dd265edc
[ 2703.256018] ---[ end trace 29c9f3fc0f436abe ]---

I have duplicated this on wheezy with this kernel:

    Linux [hostname] 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux

And on squeeze with the same kernel backported, but on different amd64
(non-virtual) hardware:

    Linux [hostname] 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.46-1~bpo60+1 x86_64 GNU/Linux

In my testing, the BUG only happens on SMP machines, and not on single CPU
machines.

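The freeze/thaw loop above can also be written in plain shell, which may be
handy on a test machine without perl. This is only a minimal sketch: the
function names set_state and toggle_loop and the optional iteration count
are illustrative and not part of the original reproduction. A shell loop
may also hit the timing window less reliably than the Perl version.

```shell
#!/bin/sh
# Sketch of the FROZEN/THAWED toggle loop in POSIX shell.
# set_state and toggle_loop are hypothetical helper names.

# Write one state string to freezer.state in the given cgroup directory.
# $1 = cgroup directory, $2 = FROZEN or THAWED
set_state() {
    printf '%s' "$2" > "$1/freezer.state" || {
        echo "writing $2 to $1/freezer.state failed" >&2
        return 1
    }
}

# Alternate FROZEN and THAWED as fast as possible.
# $1 = cgroup directory, $2 = number of iterations (0 or omitted = forever)
toggle_loop() {
    dir=$1
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        set_state "$dir" FROZEN || return 1
        set_state "$dir" THAWED || return 1
        i=$((i + 1))
    done
}
```

After sourcing these functions as root, with the container set up as
described above, "toggle_loop /dev/cgroups-freezer/crashtest" runs the loop
until interrupted, and "toggle_loop /dev/cgroups-freezer/crashtest 1000"
stops after 1000 freeze/thaw cycles.
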
Also, if you include a slight delay before the freeze, the problem doesn't
happen reproducibly, at least for me:

    perl -e 'while (1) {
        select (undef, undef, undef, 0.01);
        open FILE, ">freezer.state" or die;
        print FILE "FROZEN" or die;
        close FILE or die;
        open FILE, ">freezer.state" or die;
        print FILE "THAWED" or die;
        close FILE or die;
    };'   # does not BUG due to the select() delay

Looking at line 241 of kernel/cgroup_freezer.c in version 3.2.46, something
is clearly wrong: the code believes the state of the group is
CGROUP_THAWED, and yet it contains a frozen task.

The fact that it's both timing- and SMP-dependent suggests a race condition
of some kind.

-- System Information:
Debian Release: 7.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
Robert L Mathews, Tiger Technologies