Thanks. I had to kill the process this morning, and all is well now. Next time it happens, I will start troubleshooting.

On 2/4/2010 11:51 AM, Eric Varsanyi wrote:
A good way to find out what it's doing while burning CPU, if the logs have no useful info, is to get a stack trace of all threads several times, then just look at them and see what's going on. Even better is to use top's 'H' option to find out which specific thread IDs are burning CPU and correlate that with the stack traces. Best is to run oprofile (which is kind of a PITA to set up) and see where all the CPU is going with the calltree option.

I'm happy to help read a few stack dumps. I've already chased down the continuous 5% CPU issue this way (unfortunately it seems to be 'by design': it idles/polls at a high rate). This isn't guaranteed to work, but it might give a hint as to what is getting stuck in Freeswitch and help point the direction for next steps.

Grabbing a core file (with gcore) is OK too, but you really want a few samples, and to start with only the stacks are interesting rather than a full dump of memory (the core might be interesting later, though).

1) As root, run 'ps ax' or 'top' to find the main PID of the freeswitch process burning cycles (12345 below stands in for that PID).
2) In top, turn on thread mode by typing 'H' and try to get a list of the freeswitch thread IDs with high CPU.
3) Save all output from here on (use 'script xxx.out' or whatever tool you're familiar with).
4) 'gdb /proc/12345/exe 12345' (note this will suspend freeswitch while you do this)
    - be sure to use the 'main' process ID, not one of the thread IDs
5) 'info threads'
    - this should show 15+ threads; if it shows only 1 you didn't use the main PID, try 'ps ax | grep frees'
6) 'thread apply all bt'
    - this will spew a stack trace for each thread
    - do this a few times to get a rough sample; between each, do 'cont' to resume freeswitch, wait a bit, then hit Ctrl-C (in gdb)
7) 'quit' out of gdb and say it's OK to detach the running process; freeswitch will continue running 'normally'.
8) For good measure (still as root), run 'lsof -p 12345' to get a list of open files; this frequently gives a clue as to what the process is doing.
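
If typing steps 4 to 6 by hand several times gets tedious, the sampling can be scripted. What follows is only a rough sketch, not a tested procedure for this system: it assumes gdb is installed, that it is run as root, and that gdb's batch mode ('-p', '-batch', '-ex') is acceptable in place of the interactive session. The function names, the output file prefix, and the sample count and delay are all made up for illustration.

```python
import subprocess
import time


def build_gdb_command(pid):
    """Build a non-interactive gdb invocation that dumps every thread's stack."""
    # -p attaches to the running process; -batch makes gdb exit after the -ex
    # commands have run, which also detaches from the attached process.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_stacks(pid, samples=3, delay_seconds=5.0, out_prefix="stacks"):
    """Take several stack samples, writing each to '<out_prefix>-<n>.txt'."""
    paths = []
    for n in range(1, samples + 1):
        path = f"{out_prefix}-{n}.txt"
        with open(path, "w") as out:
            # The process is paused only while gdb is attached for this sample.
            subprocess.run(build_gdb_command(pid), stdout=out,
                           stderr=subprocess.STDOUT, check=False)
        paths.append(path)
        if n < samples:
            time.sleep(delay_seconds)  # let freeswitch run between samples
    return paths


# Example call (hypothetical PID, use the main PID found in step 1):
# sample_stacks(12345)
```

As with the manual steps, pass the main process ID rather than a thread ID, otherwise only one thread will show up in the output.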

When you look at the thread stacks pay specific attention to the ones you determined were running at high CPU from top. If you post or send me the stacks and lsof output (and which threads were burning the CPU) I can help look for obvious clues.
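
The correlation between top and the stack traces can also be done mechanically. The sketch below assumes two saved text captures: top output in thread mode, where the PID column holds thread IDs and the header line contains '%CPU', and gdb 'thread apply all bt' output, whose per-thread header lines normally include '(LWP <thread id>)'. The function names and the 50% threshold are illustrative assumptions, not anything from top or gdb.

```python
import re

# gdb prints a header such as "Thread 3 (Thread 0x... (LWP 3725)):" before
# each stack; the LWP number is the thread ID that top shows in thread mode.
THREAD_HEADER = re.compile(r"^Thread \d+ \(.*LWP (\d+)\)")


def hot_thread_ids(top_text, min_cpu=50.0):
    """Return thread IDs whose %CPU in the top capture is at least min_cpu."""
    cpu_col = None
    hot = []
    for line in top_text.splitlines():
        fields = line.split()
        if "PID" in fields and "%CPU" in fields:
            cpu_col = fields.index("%CPU")  # locate the column from the header
            continue
        if cpu_col is None or len(fields) <= cpu_col or not fields[0].isdigit():
            continue
        if float(fields[cpu_col]) >= min_cpu:
            hot.append(int(fields[0]))
    return hot


def stacks_by_thread_id(gdb_text):
    """Split 'thread apply all bt' output into {thread_id: stack_text}."""
    stacks = {}
    current = None
    for line in gdb_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            current = int(match.group(1))
            stacks[current] = [line]
        elif current is not None and line.strip():
            stacks[current].append(line)
    return {tid: "\n".join(lines) for tid, lines in stacks.items()}


def hot_stacks(top_text, gdb_text, min_cpu=50.0):
    """Return only the stacks belonging to the high-CPU threads."""
    stacks = stacks_by_thread_id(gdb_text)
    return {tid: stacks[tid]
            for tid in hot_thread_ids(top_text, min_cpu) if tid in stacks}
```

Running hot_stacks over each of the samples and comparing the results should show whether the busy threads keep sitting in the same functions.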

The other easy debugging tool to use is 'strace'. You can run (for a short time) something like 'strace -f -o /tmp/xxx.out -s 1000 -p 12345', which will log every system call freeswitch is making. Run this for a few seconds, then hit Ctrl-C (it will make a huge file and slow down freeswitch while it's running). If the stuck threads are interacting with the kernel, this is a great way to get a clue as to what's going on in them (maybe we'll see it playing the same file over and over into a closed socket, for instance).
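
Because the strace file gets huge, a quick tally of which thread is making which system call can help spot a tight loop before reading the raw log. This is a minimal sketch that assumes the log was written with '-f' and '-o', so that each line starts with the thread ID followed by the call name; the function names are illustrative, and lines that do not start a call (resumed calls, signals, exits) are simply skipped.

```python
import re
from collections import Counter

# With -f and -o, a line that starts a call looks like: 3725 read(12, ...
SYSCALL_LINE = re.compile(r"^(\d+)\s+(\w+)\(")


def count_syscalls(strace_lines):
    """Count how often each (thread_id, syscall_name) pair starts a call."""
    counts = Counter()
    for line in strace_lines:
        match = SYSCALL_LINE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


def busiest(counts, limit=10):
    """Return the most frequent (thread_id, syscall_name) pairs first."""
    return counts.most_common(limit)


# Example use against the file named in the strace command above:
# with open("/tmp/xxx.out") as log:
#     for (tid, name), n in busiest(count_syscalls(log)):
#         print(tid, name, n)
```

A thread that was hot in top but barely appears here is probably spinning in user space, in which case the stack traces are the better lead.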

-Eric

On Feb 4, 2010, at 8:58 AM, mkitchin.pub...@gmail.com wrote:

Rebooted last night, and it started up again a little while ago. I'm already at about 200% processor utilization. This looks like it could be a disaster for me. I've taken a snapshot of the logs to preserve any evidence that might be there.

Cpu0 : 98.3%us, 0.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu1 : 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   8177104k total,  3057448k used,  5119656k free,   138992k buffers
Swap: 10223608k total,        0k used, 10223608k free,  1730152k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3719 sipxchan  18   0  298m  46m 4888 S 196.7  0.6 273:22.00 freeswitch
 3915 sipxchan  21   0 1438m 113m 9068 S  1.3  1.4   1:18.14 java

[r...@nshpbx1 sipxpbx]# ps aux |grep "3719"
root 790 0.0 0.0 61156 724 pts/0 S+ 08:58 0:00 grep 3719
500 3719 35.0 0.5 305748 47672 ? Sl Feb03 280:04 /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf -db /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql



On 2/3/2010 5:35 PM, mkitchin.pub...@gmail.com wrote:
After reading a post today that mentioned size limits on posts, I realized these posts never made it through because they had 2 screenshots. I have removed the pictures and am resending.
Any help would be greatly appreciated!

On 2/3/2010 12:05 PM, mkitchin.pub...@gmail.com wrote:
I rebooted last night to resolve the problem. It just started happening again.

It looks like there was a bug report briefly open about something similar:
http://track.sipfoundry.org/browse/XX-5881

Any ideas on this one? I will gladly provide any info that would help.

Cpu0 : 12.0%us, 1.3%sy, 0.0%ni, 85.4%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
Cpu1 : 92.0%us, 0.3%sy, 0.0%ni, 7.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   8177104k total,  2425588k used,  5751516k free,   152772k buffers
Swap: 10223608k total,        0k used, 10223608k free,  1017704k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13533 sipxchan  18   0  281m  33m 4944 S 99.5  0.4  25:31.50 freeswitch

[r...@nshpbx1 sipxpbx]# ps aux |grep "13533"
500 13533 2.9 0.4 288392 34092 pts/0 Sl Feb02 26:32 /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf -db /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql

I see others with the same issue:
http://list.sipfoundry.org/archive/sipx-dev/msg21612.html
I am not subscribed to the dev list, and I don't think I could contribute much there. The discussion on this one seems to have died off. It would seem to me this is a pretty significant problem.


On 2/2/2010 12:55 PM, mkitchin.pub...@gmail.com wrote:
I know there is a similar thread about this, but it was a little different and I didn't want to hijack it.
http://list.sipfoundry.org/archive/sipx-users/msg21074.html

A freeswitch process has started using a large amount of CPU on my server. I can't see any obvious reason why.

Tasks: 151 total,   1 running, 150 sleeping,   0 stopped,   0 zombie
Cpu0 : 50.2%us, 0.3%sy, 0.0%ni, 48.8%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
Cpu1 : 53.0%us, 0.3%sy, 0.0%ni, 46.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   8177104k total,  2887232k used,  5289872k free,   183680k buffers
Swap: 10223608k total,        0k used, 10223608k free,  1232760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3811 sipxchan  18   0  296m  49m 4944 S 97.8  0.6   1133:41 freeswitch
 3591 sipxchan  18   0 1541m 403m  11m S  1.3  5.1  38:52.22 java
 3946 sipxchan  18   0 1418m 197m 9100 S  1.3  2.5  25:33.71 java
 3922 sipxchan  19   0 1379m 196m 9184 S  1.0  2.5   8:33.80 java
10359 postgres  15   0  121m  13m  10m S  1.0  0.2   2:49.72 postmaster

This may provide some details as to what the process is doing:
[r...@nshpbx1 sipxpbx]# ps aux |grep "freeswitch"
500 3811 13.9 0.6 303420 50768 ? Sl Jan27 1134:08 /usr/local/freeswitch/bin/freeswitch -conf /etc/sipxpbx/freeswitch/conf -db /var/sipxdata/tmp/freeswitch -log /var/log/sipxpbx -htdocs /etc/sipxpbx/freeswitch/conf/htdocs -nc -nf -nosql

Local CPU monitoring seems to have died shortly after it registered the spike in CPU.
<removed - picture of sipx CPU stats showing CPU stat collection died>

Remote monitoring is still recording the high CPU utilization:
<removed - picture of zenoss showing CPU usage went way up and stayed up>

I only have 1 warning entry in freeswitch.log from yesterday, and none from today.
2010-02-01 07:52:10 [WARNING] switch_core_file.c:119 switch_core_perform_file_open() Sample rate doesn't match

I only have a few active calls right now, and none active for more than an hour.

Anyone have any idea what might be causing this?

CentOS 5.4 64 Bit, Sipx 4.0.4, sipXbridge, Verizon VOIP, No firewall (not needed, private connection), Polycom 450s and 550s - bootrom 4.2.1, firmware 3.1.3C split.

Thanks as always,
Matthew








_______________________________________________
sipx-users mailing list sipx-users@list.sipfoundry.org
List Archive: http://list.sipfoundry.org/archive/sipx-users
Unsubscribe: http://list.sipfoundry.org/mailman/listinfo/sipx-users
sipXecs IP PBX -- http://www.sipfoundry.org/

