Hello,

On 10/06/08 13:22, mayamatakeshi wrote:
> [...]
>
> Hello,
> we have openser 1.3.3 running in production (current rev.: 4943).
> 3 times in 50 days we had to restart openser to correct a pkg
> memory problem.
>
> openser 1.3.3 was released 3 weeks ago, so I guess you were running
> a previous version before, but it happened again since you upgraded
> to 1.3.3, right?
>
> After some time logging messages like this:
>   /openser.log:Aug 19 10:39:18 ipx022 /usr/local/sbin/openser[16991]:
>   ERROR:core:new_credentials: no pkg memory left,
> openser will eventually run out of pkg memory and refuse all
> subsequent requests.
>
> We are trying to recreate this in our lab so that we can follow the
> memory troubleshooting instructions at
> http://kamailio.net/dokuwiki/doku.php/troubleshooting:memory,
> but so far we were unable to do it even when generating millions of
> calls and registration transactions (we are using SIPp to generate
> normal call flows and even abnormal call flows detected when reading
> openser.log, like 'invalid cseq for aor', malformed SIP messages
> etc).
>
> We can spot memory leaks even if the "out of memory" message is not
> printed. Just archive the logs (the most important is the shut down
> time) and make them available for download so they can be
> investigated.
>
> There could be two reasons:
> - there is a memory leak, but it happens in some cases that you
>   don't reproduce in the lab and that occur in the production
>   environment
> - you get memory fragmentation
>
> Let's see first the debug messages...
>
> Hello,
> here is the link for the openser.log and cfg files:
> http://www.yousendit.com/download/bVlEV0o4R3NoeWJIRGc9PQ
>
> After compilation with debug flags for the memory manager, I left
> openser running in production for 24 hours. Then, I moved all
> traffic to another host and waited for more than 30 minutes before
> stopping openser.
> In the openser.cfg, I set debug=2. If you need, I can run it again
> with a higher value (but I hope it doesn't have to be too high, due
> to overhead concerns).
>
> Sorry, I forgot to tell one thing: the last revision that showed
> this problem was 4809, so we reverted back to that revision before
> performing the above.
>
> to understand that you couldn't reproduce with latest svn version?
> So you had to get a previous version?
>
> Hi,
> no, the reason for the reversion is that the latest version running
> in production will not show the problem because we adopted a
> preventive reset to minimize the impact on customer calls. So I
> don't know yet if it shows this problem or not.
> So I collected the logs using a revision that I was sure could
> recreate the problem.

OK, I understand now. I was looking at the logs and there seems to be a
leak with db operations: something does not free a db result. I will go
over the modules that you are using and try to spot any issue. I will
also check the change log to see if something changed recently
regarding such an issue.

> But here are some developments on my investigation:
> Up to now, I was trying to recreate the problem using virtual
> machines running the same OS (Fedora 5) as in production. It never
> happened there, even after 30 million calls.
> But we eventually were able to test openser 1.3 using a production
> machine with the same spec as the ones showing the problem, and we
> were able to generate the pkg memory problem using a simple outgoing
> SIPp scenario. The problem always happens after we reach around
> 28.000 calls, and we confirmed that the number of calls needed to
> cause the problem grows linearly with the amount of pkg memory
> (after increasing the pkg memory pool by a factor of 4, the problem
> started to happen only after around 128.000 calls).
> However, we also tried the same tests with kamailio 1.4 (rev. 5017)
> on that machine and we could not recreate the problem after 1.5
> million calls, so we are thinking of just upgrading to 1.4 after
> other scenarios show everything else is working.

OK, 1.4 is recommended, it has a lot of new features and many fixes.

> But I don't know why the problem cannot be recreated using the VMs:
> the only significant difference is that the production machines have
> 4 NICs that are bonded in 2 pairs (1 for private ip and another for
> public ip) while the VMs have just one NIC.

I see no relation with the NICs.

> I hope upgrading to 1.4 will solve everything, however, since nobody
> is complaining about having openser stopping after 28.000 calls, I
> still believe we have some problem in the openser.cfg itself. I'll
> check it after we put kamailio 1.4 in production.

OK, I will dig in further, I might be a bit slow, however, these days.
Cheers,
Daniel

-- 
Daniel-Constantin Mierla
http://www.asipto.com
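
The scaling reported in the thread (around 28.000 calls until failure with the original pkg pool, around 128.000 after a fourfold increase) is what one would expect from a roughly constant amount of pkg memory lost per call. A minimal sketch of that arithmetic follows. The base pool size of 1 MiB is an assumption made only for illustration, since the thread does not state the pool size, and the helper name `leak_per_call` is hypothetical. The estimate also ignores fragmentation and the memory the process legitimately holds, so it is only an upper bound on the average loss per call.

```python
def leak_per_call(pool_bytes: int, calls_until_failure: int) -> float:
    """Average pkg bytes lost per call if the whole pool is used up after N calls."""
    if calls_until_failure <= 0:
        raise ValueError("calls_until_failure must be positive")
    return pool_bytes / calls_until_failure


# ASSUMPTION: 1 MiB base pkg pool; the thread does not give the real size.
BASE_POOL = 1024 * 1024

# (pool size in bytes, approximate calls until "no pkg memory left"),
# call counts taken from the figures quoted in the thread.
observations = [
    (BASE_POOL, 28_000),
    (4 * BASE_POOL, 128_000),
]

estimates = [leak_per_call(pool, calls) for pool, calls in observations]

for (pool, calls), estimate in zip(observations, estimates):
    print(f"pool={pool} bytes, calls={calls}: ~{estimate:.1f} bytes/call")
```

Under the assumed pool size the two observations give about 37 and 33 bytes per call. Figures that close together fit a small fixed leak on every call better than they fit fragmentation alone, and that matches the suspicion of a db result that is not freed.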
