All: We have thus far been unable to reproduce the following application behavior when running outside of gdb (itself running inside emacs), so I am starting with this list for suggestions on how to proceed.
The high-level description of the problem is this: at times, when running an application that (a) opens many sockets, (b) receives relatively high rates of traffic on those sockets, (c) eventually has many threads running [50+], and (d) mallocs several large blocks of memory, some up to 500M or 1G, the application will "hang" for long periods inside memset(0) of one of those memory blocks. It is not clear that (a)-(c) are relevant, since the behavior is often exhibited in an initialization thread before the sockets are started.

The slow memset() happens most often -- or at least more often -- when running gdb inside emacs. C-c C-c will take 30+ seconds, and sometimes up to several minutes, to return to the (gdb) prompt, and the machine will be generally slow for some period during and after this. We do not see evidence of virtual memory paging, but we are not certain we are looking in all the right places -- hints appreciated.

The problem occurs both with stock gdb and with a gdb patched with this patch: http://sourceware.org/ml/gdb-patches/2010-04/msg00466.html. The version strings are, respectively:

  GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
  GNU gdb (GDB) 7.2

Invariably, breaking via C-c C-c leaves the program counter *inside* memset, which causes me to suspect that some kind of overactive page-fault situation is occurring and drastically slowing the machine. We have seen references to problems with memset when crossing 2G boundaries, but this is a 64-bit box with 33G of RAM, so I would have thought that does not apply here.
Further, that reference seems to involve pointers mapped into userland from a driver, which is not the case here: http://lists.kernelnewbies.org/pipermail/kernelnewbies/2011-February/000760.html. Another reference, which implicates MCFG ACPI table problems (http://lists.us.dell.com/pipermail/linux-precision/2011-February/001503.html and https://bugzilla.redhat.com/show_bug.cgi?id=581933), also seems unrelated, as booting with pci=nommconf does not help.

I am addressing gdb first because this intermittent problem does not seem to occur outside of gdb. My thought is that when running inside both emacs and gdb we end up mapped into an area of physical memory that exhibits a problem at the kernel level. However, I am not entirely sure whether gdb could be receiving signals or be otherwise interposed in the memory allocation and the subsequent walk of those bytes by memset().

I can say with a low degree of confidence that the problem occurs more frequently when the system has higher incoming network load, though there is no chance that all CPUs are pegged or that "lots" of active threads are running by the time the slow call is made. In fact, the memset() often occurs in an initialization thread that starts before the other threads. Thus, gdb, emacs and bash are always the *only* running processes besides stock Ubuntu processes -- i.e., there is no other work going on that could be starving the CPUs. Finally, we have disabled the "ondemand" governor, so all cores are running at full speed.

Another reason to suspect gdb is that we have run fairly thorough memory tests (both "BIOS level" and memtester within Linux) on the machine, including writing an application that simply allocates huge chunks of memory and memsets them. That large-memset application was run inside gdb inside emacs and did not exhibit the behavior. Could this be some kind of code offset/alignment or symbol lookup problem exhibited only by the problem executable when loaded by gdb?
Also, this does not appear to be a problem with emacs' prompt parsing / gud-mode / etc., because the application definitely does not proceed beyond the memset for a long period, and breaking puts us inside memset. Under these conditions it is sometimes necessary to C-z out of emacs and kill -9 both gdb and the application.

I am somewhat at a loss for how to debug this and do not have the resources to chase too many dead ends. Can anyone suggest whether whole-system profiling such as oprofile would help catch kmap/kunmap or other kernel / virtual-memory badness? Would running gdb inside gdb be fruitful, and if so, can anyone point me to functions or areas I should breakpoint or otherwise monitor under such a setup?

I am unable to immediately reproduce the problem in order to provide a stack trace of the slow thread at this moment, but will follow up when I can. Details of the hardware and OS below.

Thanks in advance,
A.A.

Ubuntu system with 48 AMD cores.

uname -a:
Linux 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Last /proc/cpuinfo stanza:
processor       : 47
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 9
model name      : AMD Opteron(tm) Processor 6180 SE
stepping        : 1
cpu MHz         : 2500.000
cache size      : 512 KB
physical id     : 3
siblings        : 12
core id         : 5
cpu cores       : 12
apicid          : 75
initial apicid  : 59
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter
bogomips        : 5000.18
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

/proc/meminfo:
MemTotal:       33008392 kB
MemFree:        28624044 kB
Buffers:           59708 kB
Cached:          1302196 kB
SwapCached:         1764 kB
Active:          2700812 kB
Inactive:         369772 kB
Active(anon):    1673156 kB
Inactive(anon):    35572 kB
Active(file):    1027656 kB
Inactive(file):   334200 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      14865404 kB
SwapFree:       14861092 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       1707772 kB
Mapped:            59296 kB
Shmem:                28 kB
Slab:             666848 kB
SReclaimable:      85424 kB
SUnreclaim:       581424 kB
KernelStack:        3128 kB
PageTables:        10576 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31369600 kB
Committed_AS:    4772588 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      122392 kB
VmallocChunk:   34328955388 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       65920 kB
DirectMap2M:     9369600 kB
DirectMap1G:    24117248 kB

_______________________________________________
bug-gdb mailing list
bug-gdb@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-gdb