On 6/12/20 9:11 PM, Calvin Ellison wrote:

> You suggested to "Monitor your receive queue scrupulously at a very high timing resolution". How do I do this?

If using pre-systemd systems, e.g. EL6:

# netstat --inet -n -l | grep 5060

If it's a systemd-era system -- I'm sure Ubuntu Server 18 is, though I have no experience with it:

# ss -4nl | grep 5060

Example:

---
[root@allegro-1 ~]# ss -4nl | grep 5060
udp   UNCONN   0   0    10.150.20.5:5060     *:*
udp   UNCONN   0   0    10.150.20.2:5060     *:*
udp   UNCONN   0   0    209.51.167.66:5060   *:*
tcp   LISTEN   0   128  10.150.20.2:5060     *:*
---

The third column there (all 0s) is the Recv-Q, as can be gleaned from the header this command outputs:

---
Netid  State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
---

For `netstat`, it would be the second column.
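
For illustration, `netstat`'s output is laid out something like this (a sketch reusing the addresses from the `ss` example above):

---
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 10.150.20.2:5060        0.0.0.0:*
tcp        0      0 10.150.20.2:5060        0.0.0.0:*               LISTEN
---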

To monitor it at a short interval, for example every 200 ms (5 times per second), you could do something like:

---
#!/bin/bash

# Sample the Recv-Q of the first socket bound to port 5060 every
# 200 ms, prefixed with a millisecond-resolution timestamp.
while : ; do
        echo -n "$(date +"%T.%3N"): "
        ss -4nl | grep 5060 | head -1 | awk '{print $3}'
        sleep 0.2
done
---
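
Output would look something like this (hypothetical values; brief nonzero spikes in Recv-Q are normal, but sustained nonzero values mean the children aren't draining the socket fast enough):

---
21:11:01.204: 0
21:11:01.410: 0
21:11:01.616: 18176
21:11:01.822: 0
---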

That should give you some idea of where the value sits in general.

> You propose there is a pathological issue and the increased buffer size is masking it. How do I determine what that issue is?

Without knowing what your exact routing workflow is, I can't say.

However, 99.9% of the time, the culprit is blocking queries to databases or other external data sources.
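
If you want to check for that, here's a rough sketch (assuming Linux, and that your OpenSIPS processes match `pgrep opensips`): look at the kernel function each child is sleeping in. Children parked in a socket read to a database, rather than in the usual epoll/poll wait, suggest blocking queries.

---
#!/bin/bash

# For each OpenSIPS process, print its PID, state, and the kernel
# function it is currently sleeping in (from /proc/PID/wchan).
for pid in $(pgrep opensips); do
        printf '%s %s %s\n' "$pid" \
                "$(awk '{print $3}' "/proc/$pid/stat")" \
                "$(cat "/proc/$pid/wchan")"
done
---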

> I've asked repeatedly about children, shared memory, process memory, timer_partitions, etc., but the only answers have been "try more". I've been trying more and less of these things for two weeks, and changing the buffers was the only thing that appeared to have any immediate impact. How do I know when enough is enough versus too much?

I wrote this article several years ago for Kamailio, but the same basic considerations apply to OpenSIPS:

http://www.evaristesys.com/blog/tuning-kamailio-for-high-throughput-and-performance/

"Try more" is definitely not the answer except in cases where the workload is overwhelmingly network I/O-bound and/or database-bound. Otherwise, the most natural course of action would be to spawn a functionally infinite number of children. However, children create context switches, contend with each other for CPU time (less a concern if most of the workload is waiting on blocking external I/O) and fight for various global shared memory structures and locks (still a concern regardless). So, there is a point of diminishing returns for any given workload. All other things being equal, as per the article, the reasonable number of child processes is equal to the number of available CPU threads (in /proc/cpuinfo). This number can be increased if the workload is very I/O-bound, but only to a point. It's hard to say exactly what that point is, and it does have to be empirically determined, but I would not run more than 2 * (CPU threads).

> Note, there have been no memory-related log messages. The 16-thread servers have 48GB RAM and the 8-thread servers have 16GB. I'm happy to give all that to OpenSIPS once I know the right way to carve it up.

I see no rationale for giving it all to OpenSIPS.

It's worth bearing in mind that there are two kinds of memory allocations:

- Shared memory, used by the system for global/system-wide data constructs, such as transaction memory, dialog state, etc.;

- Package memory, which is private to each process and used for handling the message immediately at hand. Every child process pre-allocates the full package memory requested, so this value should of course be much, much smaller than your shared memory pool size (both pools are set at startup, as shown below).
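
For reference, both pools are sized via the `-m` (shared, in MB) and `-M` (package, per process, in MB) command-line options; invoked directly, that looks something like this (the sizes here are illustrative, not a recommendation):

---
# 2048 MB shared memory pool, 16 MB of package memory per process
# (illustrative sizes only):
opensips -m 2048 -M 16 -f /etc/opensips/opensips.cfg
---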

But still, when you consider all the data that OpenSIPS needs to keep in the course of call processing, a lot of it is ephemeral and transaction-associated. Once the call is set up, the INVITE transaction is disposed of. Other call state may add up to a few kilobytes per call at most (notwithstanding page sizes and blocks in the underlying allocator), but nothing on the order of gigabytes upon gigabytes. Assuming 4 KB per call and 200,000 concurrent calls, that's ~800 MB, and that is a very generous assumption indeed.
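
To sanity-check that arithmetic:

---
# 4 KB of call state x 200,000 concurrent calls, expressed in MB:
echo "$(( 4 * 200000 / 1024 )) MB"    # => 781 MB, i.e. roughly 800 MB
---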

-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

_______________________________________________
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users