A good way to find out what a process that's burning CPU is actually doing, if the 
logs have no useful info, is to get a stack trace of all threads, several times, 
then just look at them and see what's going on. Even better is to use top's 'H' 
option to find out which specific thread IDs are burning CPU and correlate those 
with the stack traces. Best is to run oprofile (which is kind of a PITA to set up) 
and see where all the CPU is going with the calltree option.
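
For reference, here's a rough sketch of both (12345 stands in for the freeswitch 
PID; the oprofile flags vary by version, so treat these as a starting point, not 
gospel):

    # show per-thread CPU for one process (or press 'H' inside a running top)
    top -H -p 12345

    # legacy opcontrol workflow with call graphs (run as root)
    opcontrol --init
    opcontrol --no-vmlinux --callgraph=16
    opcontrol --start
    # ... let it run for a while, during the CPU spin ...
    opcontrol --stop
    opreport --callgraph /usr/local/freeswitch/bin/freeswitch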

I'm happy to try to help read a few stack dumps; I've already chased down the 
continuous 5% CPU issue this way (and it seems to be 'by design', unfortunately: 
it idles/polls at a high rate). Not saying this is guaranteed, but it 
might give a hint as to what is getting stuck in FreeSWITCH and help point the 
way for next steps.

Grabbing a core file (with gcore) is OK too, but you really want a few samples, 
and to start with only the stacks are interesting rather than a full dump of 
memory (the core might be interesting later, though).
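
If you do want a core for later, gcore is one command (again, 12345 is a 
placeholder PID and the output path is just an example):

    # snapshot a core file without killing the process (it pauses briefly)
    gcore -o /tmp/freeswitch-core 12345
    # produces /tmp/freeswitch-core.12345, which you can open later with:
    # gdb /proc/12345/exe /tmp/freeswitch-core.12345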

1) As root, use 'ps ax' or top to find the main PID of the freeswitch process 
burning cycles.
2) In top, turn on thread mode by typing 'H' and note which freeswitch thread 
IDs are using high CPU.
3) Save all output from here on (use 'script xxx.out' or whatever tool you're 
familiar with).
4) Run 'gdb /proc/12345/exe 12345' (note this will suspend freeswitch while you 
do this; a condensed example session follows this list).
    - be sure to use the 'main' process ID, not one of the thread IDs
5) 'info threads'
    - this should show 15+ threads; if it shows only 1 you didn't use the main 
PID, try 'ps ax | grep frees'
6) 'thread apply all bt'
    - this will spew a stack trace for each thread
    - do this a few times to get a rough sample; between each, type 'cont' to 
resume freeswitch, wait a bit, then hit ctrl-c (in gdb)
7) 'quit' out of gdb and confirm it's OK to detach from the running process; 
freeswitch will continue running 'normally'.
8) For good measure (still as root) run 'lsof -p 12345' to get a list of open 
files; this frequently gives a clue as to what the process is doing.
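
Putting steps 3-8 together, a condensed example session looks like this (12345 
stands in for the main freeswitch PID, and the /tmp file names are just 
examples; the last line shows a non-interactive variant using gdb's -batch/-ex 
flags, available in reasonably recent gdb):

    script /tmp/fs-debug.out           # capture everything that follows
    gdb /proc/12345/exe 12345          # attach; freeswitch is now suspended
    (gdb) info threads                 # expect 15+ threads
    (gdb) thread apply all bt          # one stack trace per thread
    (gdb) cont                         # resume freeswitch
    ^C                                 # wait a bit, then interrupt again
    (gdb) thread apply all bt          # second sample; repeat 3-4 times
    (gdb) quit                         # answer 'y' to detach; freeswitch keeps running
    lsof -p 12345 > /tmp/fs-lsof.out   # open files often hint at what it's up to
    exit                               # end the 'script' capture

    # one-shot stack sample without an interactive session:
    gdb -batch -ex 'thread apply all bt' /proc/12345/exe 12345 > /tmp/bt-1.out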

When you look at the thread stacks, pay particular attention to the ones top 
showed running at high CPU. If you post or send me the stacks and lsof output 
(and which threads were burning the CPU) I can help look for obvious clues.

The other easy debugging tool to use is 'strace'. You can run (for a short 
time) something like 'strace -f -o /tmp/xxx.out -s 1000 -p 12345', which will 
log every system call freeswitch makes. Run this for a few seconds then hit 
ctrl-c (it will make a huge file and slow down freeswitch while it's running). If 
the stuck threads are interacting with the kernel this is a great way to get a 
clue as to what's going on in them (maybe we'll see it playing the same file over 
and over into a closed socket, for instance).
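
If the raw trace is too much to eyeball, strace's count mode gives a quick 
per-syscall summary (a standard strace option, though exact output columns vary 
a bit by version; the TID below is hypothetical):

    # attach for ~10 seconds, then hit ctrl-c to print the summary table
    strace -c -f -p 12345

    # or pull just one hot thread's lines out of the full trace, e.g. TID 12350
    # (with -f, each line in the output file is prefixed with the thread's ID):
    grep '^12350' /tmp/xxx.out | less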

-Eric
 
On Feb 4, 2010, at 8:58 AM, mkitchin.pub...@gmail.com wrote:

> Rebooted last night, and it started up again a little while ago. I'm already 
> at about 200% processor utilization. This looks like it could be a disaster 
> for me. 
> I've taken a snapshot of the logs to preserve any evidence that might be 
> there. 
> 
> Cpu0  : 98.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
> Cpu1  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   8177104k total,  3057448k used,  5119656k free,   138992k buffers
> Swap: 10223608k total,        0k used, 10223608k free,  1730152k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3719 sipxchan  18   0  298m  46m 4888 S 196.7  0.6 273:22.00 freeswitch
>  3915 sipxchan  21   0 1438m 113m 9068 S  1.3  1.4   1:18.14 java
> 
> [r...@nshpbx1 sipxpbx]# ps aux |grep "3719"
> root       790  0.0  0.0  61156   724 pts/0    S+   08:58   0:00 grep 3719
> 500       3719 35.0  0.5 305748 47672 ?        Sl   Feb03 280:04 
> /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf -db 
> /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs 
> /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql
> 
> 
> 
> On 2/3/2010 5:35 PM, mkitchin.pub...@gmail.com wrote:
>> 
>> After reading a post today that mentioned size limits on posts, I realized 
>> these posts never made it through because they had 2 screenshots. I have 
>> removed the pictures and am resending. 
>> Any help would be greatly appreciated!
>> 
>> On 2/3/2010 12:05 PM, mkitchin.pub...@gmail.com wrote:
>>> 
>>> I rebooted last night to resolve the problem. It just started happening 
>>> again.
>>> 
>>> It looks like there was a bug report briefly open about something similar:
>>> http://track.sipfoundry.org/browse/XX-5881
>>> 
>>> Any ideas on this one? I will gladly provide any info that would help. 
>>> 
>>> Cpu0  : 12.0%us,  1.3%sy,  0.0%ni, 85.4%id,  0.0%wa,  0.0%hi,  1.3%si,  
>>> 0.0%st
>>> Cpu1  : 92.0%us,  0.3%sy,  0.0%ni,  7.6%id,  0.0%wa,  0.0%hi,  0.0%si,  
>>> 0.0%st
>>> Mem:   8177104k total,  2425588k used,  5751516k free,   152772k buffers
>>> Swap: 10223608k total,        0k used, 10223608k free,  1017704k cached
>>> 
>>> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 13533 sipxchan  18   0  281m  33m 4944 S 99.5  0.4  25:31.50 freeswitch
>>> 
>>> [r...@nshpbx1 sipxpbx]# ps aux |grep "13533"
>>> 500      13533  2.9  0.4 288392 34092 pts/0    Sl   Feb02  26:32 
>>> /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf -db 
>>> /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs 
>>> /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql
>> 
>>> I see others with the same issue:
>>> http://list.sipfoundry.org/archive/sipx-dev/msg21612.html
>>> I am not subscribed to the dev list, and I don't think I could contribute too 
>>> much. The discussion on this one seems to have died off. It would seem to 
>>> me this is a pretty significant problem. 
>>> 
>>>> 
>>>> On 2/2/2010 12:55 PM, mkitchin.pub...@gmail.com wrote:
>>>>> 
>>>>> I know there is a similar thread about this, but it was a little 
>>>>> different and I didn't want to hijack it.
>>>>> http://list.sipfoundry.org/archive/sipx-users/msg21074.html
>>>>> 
>>>>> A freeswitch process has started using a large amount of CPU on my 
>>>>> server. I can't see any obvious reason why. 
>>>>> 
>>>>> Tasks: 151 total,   1 running, 150 sleeping,   0 stopped,   0 zombie
>>>>> Cpu0  : 50.2%us,  0.3%sy,  0.0%ni, 48.8%id,  0.0%wa,  0.3%hi,  0.3%si,  
>>>>> 0.0%st
>>>>> Cpu1  : 53.0%us,  0.3%sy,  0.0%ni, 46.7%id,  0.0%wa,  0.0%hi,  0.0%si,  
>>>>> 0.0%st
>>>>> Mem:   8177104k total,  2887232k used,  5289872k free,   183680k buffers
>>>>> Swap: 10223608k total,        0k used, 10223608k free,  1232760k cached
>>>>> 
>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>  3811 sipxchan  18   0  296m  49m 4944 S 97.8  0.6   1133:41 freeswitch
>>>>>  3591 sipxchan  18   0 1541m 403m  11m S  1.3  5.1  38:52.22 java
>>>>>  3946 sipxchan  18   0 1418m 197m 9100 S  1.3  2.5  25:33.71 java
>>>>>  3922 sipxchan  19   0 1379m 196m 9184 S  1.0  2.5   8:33.80 java
>>>>> 10359 postgres  15   0  121m  13m  10m S  1.0  0.2   2:49.72 postmaster
>>>>> 
>>>>> This may provide some details as to what the process is doing:
>>>>> [r...@nshpbx1 sipxpbx]# ps aux |grep "freeswitch"
>>>>> 500       3811 13.9  0.6 303420 50768 ?        Sl   Jan27 1134:08 
>>>>> /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf 
>>>>> -db /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs 
>>>>> /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql
>>>>> 
>>>>> Local CPU monitoring seems to have died shortly after it registered the 
>>>>> spike in CPU.
>>>>>   <removed - picture of sipX CPU stats showing CPU stat collection died>
>>>>> 
>>>>> Remote monitoring is still recording the high CPU utilization:
>>>>> <removed - picture of Zenoss showing CPU usage went way up and stayed up>
>>>>> 
>>>>> I only have 1 warning entry in freeswitch.log from yesterday, and none 
>>>>> from today.
>>>>> 2010-02-01 07:52:10 [WARNING] switch_core_file.c:119 
>>>>> switch_core_perform_file_open() Sample rate doesn't match
>>>>> 
>>>>> I only have a few active calls right now, and none active for more than 
>>>>> an hour. 
>>>>> 
>>>>> Anyone have any idea what might be causing this?
>>>>> 
>>>>> CentOS 5.4 64 Bit, sipX 4.0.4, sipXbridge, Verizon VOIP, no firewall (not 
>>>>> needed, private connection), Polycom 450s and 550s - bootrom 4.2.1, 
>>>>> firmware 3.1.3C split. 
>>>>> 
>>>>> Thanks as always,
>>>>> Matthew
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 

_______________________________________________
sipx-users mailing list sipx-users@list.sipfoundry.org
List Archive: http://list.sipfoundry.org/archive/sipx-users
Unsubscribe: http://list.sipfoundry.org/mailman/listinfo/sipx-users
sipXecs IP PBX -- http://www.sipfoundry.org/
