We've been able to reproduce the bug in a more isolated environment. I wrote a Python script (pgslam.py) that generates load similar enough to our production traffic. I also wrote a bash script that sets up a hi1.4xlarge EC2 instance to reproduce the issue. During the tests, I launched pgslam.py from another instance and pointed it at the instance prepared with the bash script.
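To give a rough idea of the kind of load involved, here is a minimal sketch of a concurrent query generator. It is not pgslam.py itself. The function names, the placeholder query, the use of psycopg2, and the reading of the numeric argument as a count of concurrent connections are all assumptions made for illustration. Only the command-line shape (a libpq-style connection string followed by a number) comes from the command used in the tests.

```python
import sys
import threading


def worker(connect, dsn, queries, errors):
    """Open one connection and run a trivial query `queries` times."""
    try:
        conn = connect(dsn)
        try:
            cur = conn.cursor()
            for _ in range(queries):
                # Placeholder query (assumption); real traffic would differ.
                cur.execute("SELECT 1")
                cur.fetchall()
        finally:
            conn.close()
    except Exception as exc:
        # Record the failure so one bad connection does not hide the rest.
        errors.append(exc)


def run_load(connect, dsn, workers, queries=100):
    """Start `workers` threads, wait for all of them, return their errors."""
    errors = []
    threads = [
        threading.Thread(target=worker, args=(connect, dsn, queries, errors))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def main(argv):
    """Hypothetical entry point: main(["prog", "<dsn>", "<connections>"])."""
    import psycopg2  # assumed driver, imported lazily

    dsn, count = argv[1], int(argv[2])
    failed = run_load(psycopg2.connect, dsn, count)
    print("%d of %d workers failed" % (len(failed), count))
```

To use this as a script, one would add a call such as `main(sys.argv)` at the bottom. The connection function is passed in as a parameter so the threading logic can be exercised without a database.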
This command results in the EC2 instance built with that script locking up in under a minute:

    $ python pgslam.py 'host=10.10.10.10 user=pgslam password=pgslam' 800

These messages appear in the console log:

    [706342.844192] BUG: soft lockup - CPU#7 stuck for 23s! [postgres:9266]
    [706342.844272] Stack:
    [706342.844296] Call Trace:
    [706342.844409] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [706370.844190] BUG: soft lockup - CPU#7 stuck for 23s! [postgres:9266]
    [706370.844519] Stack:
    [706370.844549] Call Trace:
    [706370.844916] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [706371.320186] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 11 13} (detected by 7, t=15002 jiffies)
    [706406.844191] BUG: soft lockup - CPU#7 stuck for 24s! [postgres:9266]
    [706406.844293] Stack:
    [706406.844330] Call Trace:
    [706406.844461] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [706434.844191] BUG: soft lockup - CPU#7 stuck for 22s! [postgres:9266]
    [706434.844273] Stack:
    [706434.844297] Call Trace:
    [706434.844411] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [706462.844192] BUG: soft lockup - CPU#7 stuck for 22s! [postgres:9266]
    [706462.844273] Stack:
    [706462.844297] Call Trace:
    [706462.844412] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc

--
https://bugs.launchpad.net/bugs/1011792

Title:
  Kernel lockup running 3.0.0 and 3.2.0 on multiple EC2 instance types