For a weaker AWS instance this is what we find with cpu_layout.py:

======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================

cores =  [0, 1]
sockets =  [0]
        Socket 0
        --------
Core 0  [0, 2]
Core 1  [1, 3]

So from this we deduce that logical cores 0 and 2 are on the same physical
core, and logical cores 1 and 3 are on the other physical core. Learnt
something valuable today ...

regards

On Thursday, September 26, 2024 at 08:13:48 PM PDT, amit sehas <cu...@yahoo.com> wrote:

Thanks for the suggestion, I didn't even know about cpu_layout.py ... I will
definitely try it. I made some more measurements and so far this is the
hypothesis:

1) 8 hyperthreads are not the same as 8 CPUs; the scale-up is not linear.

2) the vCPU cache allocation per logical CPU thread is also important; if 2
threads are running the same code on the same physical core but 2 different
logical cores then we will not have cores competing with each other.

3) try to run dissimilar code on the logical cores that run on the same
physical core ...

the CPU map is definitely worth figuring out ...

regards

On Thursday, September 26, 2024 at 08:03:30 PM PDT, Stephen Hemminger <step...@networkplumber.org> wrote:

On Thu, 26 Sep 2024 17:03:17 +0000 (UTC)
amit sehas <cu...@yahoo.com> wrote:

> If there is a way to determine:
>
> vCPU thread utilization numbers over a period of time, such as a few hours
>
> or which processes are consuming the most CPU
>
> top always indicates that the server is consuming the most CPU.
>
> Now I am beginning to wonder if 8 vCPU threads really are capable of
> running 6 high-intensity threads or only 4 such threads? Don't know.
>
> Also tried to utilize pthread_setschedparam() explicitly on some of the
> threads; it made no difference to the performance. But if we do it on
> more than 1-2 threads then it hangs the whole system.
>
> This is primarily a matter of CPU scheduling, and if we restrict context
> switching on even 2 critical threads we have a win.

Some other recommendations.
- avoid CPU 0
  you can't isolate it, and it has other stuff that has to run there.
  If you have a main thread that sleeps, and worker threads that poll,
  then go ahead and put main on cpu 0.
- don't put two active polling cores on a shared hyper-thread.
  You can use DPDK's cpu_layout.py script to show this.

For example:
$ ./usertools/cpu_layout.py
======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================

cores =  [0, 1, 2, 3]
sockets =  [0]

        Socket 0
        --------
Core 0  [0, 4]
Core 1  [1, 5]
Core 2  [2, 6]
Core 3  [3, 7]

On this system, don't poll on cores 0 and 4 (system activity).
Use lcores 1, 2, 3.
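To make the advice above concrete, here is a minimal sketch of pinning busy
worker threads so that no two of them land on sibling hyper-threads, leaving
CPU 0 for the main thread and system activity. The lcore numbers (1 and 2),
the worker() function, and the thread count are illustrative assumptions
based on the 2-core/4-thread layout at the top of the thread (Core 0 ->
[0, 2], Core 1 -> [1, 3]); read the sibling map for your own instance from
cpu_layout.py or /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

/* Sketch only: pin each worker to one logical CPU, chosen so the two
 * workers sit on different physical cores.  Build with: gcc -pthread. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Worker: bind itself to the logical CPU passed in, then (in a real
 * program) run its polling loop there. */
static void *worker(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        fprintf(stderr, "failed to pin thread to cpu %d\n", cpu);
        return NULL;
    }
    printf("worker pinned to logical cpu %d\n", cpu);
    /* ... polling loop would go here ... */
    return NULL;
}

int main(void)
{
    /* Logical CPUs 1 and 2 are on different physical cores in the layout
     * above; CPU 0 is left for the main thread and the OS. */
    int cpus[] = { 1, 2 };
    pthread_t tid[2];

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&tid[i], NULL, worker, &cpus[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}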
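On the pthread_setschedparam() point: below is a minimal sketch, assuming
Linux and a caller with CAP_SYS_NICE, of promoting one critical thread to
SCHED_FIFO. The helper name make_realtime and the priority value 10 are
invented for illustration. A FIFO thread that never blocks owns its CPU
outright, so promoting more polling threads than there are spare logical
CPUs can starve the rest of the system, which is consistent with the hang
reported when this was applied to more than 1-2 threads.

/* Sketch only: raise one thread to the SCHED_FIFO real-time policy. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 on success, otherwise the pthread error code. */
static int make_realtime(pthread_t tid, int priority)
{
    struct sched_param sp;
    int err;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;   /* 1..99 for SCHED_FIFO on Linux */
    err = pthread_setschedparam(tid, SCHED_FIFO, &sp);
    if (err != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
    return err;
}

int main(void)
{
    /* Promote only the calling thread; a FIFO thread that never sleeps
     * will monopolise whichever CPU it is pinned to. */
    if (make_realtime(pthread_self(), 10) == 0)
        printf("running under SCHED_FIFO, priority 10\n");
    return 0;
}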