Just to conclude the thread. The issue here was high load combined with the fact that tm has two timers: a second-based timer (*tm-timer*) that runs every second, and a millisecond-based timer (*tm-utimer*) that runs every 200ms. Both timers are protected by the same lock, so they cannot run in parallel. The second-based timer *tm-timer* sometimes takes more than 200ms to complete, which prevents the millisecond-based timer *tm-utimer* from being executed within its 200ms window.
-ovidiu

On Fri, Apr 1, 2022 at 10:10 AM Ovidiu Sas <o...@voipembedded.com> wrote:
>
> Hello Bogdan,
>
> During my test, it was tm-utimer only. It was a typo on my side.
>
> I also see in the logs from time to time the other timers too,
> including tm-timer.
>
> What I noticed in my tests is that as soon as I increase the
> timer_partitions, the system is able to handle fewer cps until the workers
> become 100% loaded and calls start failing (due to
> retransmissions and the udp queue being full - the udp queue is quite big
> to accommodate spikes).
>
> Is there a way to make the timer lists more efficient (in terms of ops
> in shared memory)?
>
> Please take a look at the mentioned ticket, as it makes the ratelimit
> module unusable (and maybe with side effects for other modules that
> require accurate timeslots).
> Basically, for a timer that is supposed to fire every second, the
> observed behaviour is that the timer fires at approx 1s (or less by a
> few ms) and then from time to time it fires at 1.8s and the cycle
> repeats.
>
> Thanks,
> Ovidiu
>
> On Fri, Apr 1, 2022 at 9:48 AM Bogdan-Andrei Iancu <bog...@opensips.org> wrote:
> >
> > Hi Ovidiu,
> >
> > Originally you mentioned tm-utimer, now tm-timer... which one is it? As it
> > is very important.
> >
> > When increasing the timer_partitions, what do you mean by "instability" of
> > the system?
> >
> > Yes, in the reactor, the UDP workers may also handle timer jobs besides
> > the UDP traffic, while the timer procs are 100% dedicated to the timer
> > jobs only. So yes, if the workers are idle, they can also act as timer
> > procs.
> >
> > Increasing the TM_TABLE_ENTRIES should not have too much impact, as the
> > performance over the timer lists (in TM) has nothing to do with the size
> > of the hash table.
> >
> > I will check the mentioned ticket, but if what you are saying is true on
> > the HP malloc, it means the bottleneck is actually in the ops on the
> > shared memory.
> >
> > Best regards,
> >
> > Bogdan-Andrei Iancu
> >
> > OpenSIPS Founder and Developer
> > https://www.opensips-solutions.com
> > OpenSIPS eBootcamp 23rd May - 3rd June 2022
> > https://opensips.org/training/OpenSIPS_eBootcamp_2022/
> >
> > On 4/1/22 12:31 AM, Ovidiu Sas wrote:
> > > Hello Bogdan,
> > >
> > > Thank you for looking into this!
> > >
> > > I get warnings mostly from tm-timer. I've seen warnings from
> > > blcore-expire, dlg-options-pinger, dlg-reinvite-pinger, dlg-timer (in
> > > the logs, but not during my testing).
> > > While testing, I saw only the tm-timer warnings.
> > >
> > > I took a superficial look at the "timer_partitions" and your
> > > explanation matches my findings. However, increasing the
> > > "timer_partitions" makes the system unstable (no matter how many
> > > timer procs we have).
> > > I found that I can get the most out of the system if one
> > > "timer_partition" is used along with one timer_proc.
> > >
> > > With the reactor scheme, a UDP receiver can handle timer jobs, is that
> > > right? If yes, if the UDP workers are idle, there are enough resources
> > > to handle timer jobs, correct?
> > >
> > > I was also increasing the TM_TABLE_ENTRIES to (1<<18) and there was a
> > > little bit of performance increase, but I will need to test more to
> > > come up with a valid conclusion.
> > >
> > > On the other hand, I noticed a strange behavior in timer handling.
> > > Take a look at:
> > > https://github.com/OpenSIPS/opensips/issues/2797
> > > Not sure if this is related to the warnings that I'm seeing.
> > >
> > > The biggest performance improvement was switching to HP_MALLOC for
> > > both pkg and shm memory.
> > >
> > > I will keep you posted with my findings,
> > > Ovidiu
> > >
> > > On Thu, Mar 31, 2022 at 10:28 AM Bogdan-Andrei Iancu
> > > <bog...@opensips.org> wrote:
> > >> Hi Ovidiu,
> > >>
> > >> As for the warnings from the timer_ticker, do you get them only for the tm-utimer
> > >> task?
> > >> I'm asking as the key question here is where the bottleneck is:
> > >> in the whole "timer" subsystem, or in the tm-utimer task only?
> > >>
> > >> The TM "timer_partitions" creates multiple parallel timer lists, to
> > >> avoid having large "amounts" of transactions handled at a moment in a
> > >> single tm-utimer task (but rather split/partition the whole amount of
> > >> handled transactions into smaller chunks, to be handled one at a time in
> > >> the timer task).
> > >>
> > >> The "timer_workers" creates more than one dedicated process for
> > >> handling the timer tasks (so it scales up the timer sub-system).
> > >>
> > >> If you get warnings only on tm-utimer, I suspect the bottleneck is TM
> > >> related, mainly in performing re-transmissions (that's what that task is
> > >> doing). So increasing the timer_partitions should be the way to help.
> > >>
> > >> Best regards,
> > >>
> > >> Bogdan-Andrei Iancu
> > >>
> > >> OpenSIPS Founder and Developer
> > >> https://www.opensips-solutions.com
> > >> OpenSIPS eBootcamp 23rd May - 3rd June 2022
> > >> https://opensips.org/training/OpenSIPS_eBootcamp_2022/
> > >>
> > >> On 3/24/22 12:54 AM, Ovidiu Sas wrote:
> > >>> Hello all,
> > >>>
> > >>> I'm working on tuning an opensips server. I get this pesky:
> > >>> WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled
> > >>> I was trying to get rid of them by playing with the tm
> > >>> timer_partitions parameter and the timer_workers core param.
> > >>> Increasing either of them doesn't increase performance.
> > >>> Increasing both of them actually decreases performance.
> > >>> The server is not at its limit, the load on the UDP workers is around
> > >>> 50-60 with some spikes.
> > >>> I have around 3500+ cps sipp traffic.
> > >>>
> > >>> My understanding is that by increasing the number of timer_partitions,
> > >>> we will have more procs walking in parallel over the timer structures.
> > >>> If we have one timer structure, we have one proc walking over it.
> > >>> How does this work for two timer structures? What is the difference
> > >>> between the first and the second timer structure? Should we expect
> > >>> less work for each proc?
> > >>>
> > >>> For now, to reduce the occurrence of the warning log, I increased the
> > >>> timer interval for tm-utimer from 100ms to 200ms. This should be ok as
> > >>> the timer has the TIMER_FLAG_DELAY_ON_DELAY flag set.
> > >>>
> > >>> Thanks,
> > >>> Ovidiu
> > >>>
> >
> >
>
> --
> VoIP Embedded, Inc.
> http://www.voipembedded.com

--
VoIP Embedded, Inc.
http://www.voipembedded.com

_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users