On Wed, May 10, 2017 at 12:32 PM, Michael Lackner <michael.lack...@unileoben.ac.at> wrote:
> On 05/10/2017 08:24 AM, Pradeep Ramachandran wrote:
> > On Wed, May 10, 2017 at 11:12 AM, Michael Lackner <michael.lack...@unileoben.ac.at> wrote:
> >
> >> Thank you very much for your input!
> >>
> >> Since --pmode and --pme seem to break NUMA support, I disabled them. I simply cannot tell
> >> users that they have to switch off NUMA in their UEFI firmware just for this one
> >> application. There may be a lot of situations where this is just not doable.
> >>
> >> If there is a way to make --pmode --pme work together with x265's NUMA support, I'd use
> >> it, but I don't know how.
> >
> > Could you please elaborate more here? It seems to work ok for us here. I've
> > tried on CentOS and Win Server 2017 dual-socket systems and I see all
> > sockets being used.
>
> It's like this: x265 does *say* it's using 32 threads in two NUMA pools. That's just how
> it should be. But it behaves very weirdly, almost never loading more than two logical
> cores. FPS are extremely low, so it's really slow.
>
> CPU load stays at 190-200%, sometimes briefly dropping to 140-150%, where it should be in
> the range of 2800-3200%. As soon as I remove --pmode --pme, the system is loaded very
> well! It almost never drops below the 3000% (30 cores) mark then.
>
> It also works *with* --pmode --pme, but only if NUMA is disabled at the firmware level,
> showing only a classic, flat topology to the OS.
>
> That behavior can be seen on CentOS 7.3 Linux, with x265 2.4+2 compiled using GCC 4.8.5
> and yasm 1.3.0. The machine is an HP ProLiant DL360 Gen9 with two Intel Xeon E5-2620 CPUs.
>
> Removing --pmode --pme was suggested by Mario *LigH* Rohkrämer earlier in this thread.

This seems to be something specific to your configuration. I just tried an identical
experiment on two systems that I have, which are dual-socket E5-2699 v4s (88 threads
spread across two sockets) running CentOS 6.8 and CentOS 7.2. I compiled x265 with gcc
4.4 and see utilization pick up to closer to 5000% (monitored using htop) when --pme and
--pmode are enabled on the command line; without these options, the utilization is
closer to 3300%.
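Roughly the shape of command line I mean, for reference (the input, preset, and output
names here are only placeholders; --pools "+,+" just makes the one-pool-per-NUMA-node
layout explicit instead of relying on the default detection):

  x265 --input in_8k.y4m --preset slow --pmode --pme --pools "+,+" --output out.hevc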
> Here is my topology when NUMA is enabled (pretty simple):
>
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 0 size: 32638 MB
> node 0 free: 266 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> node 1 size: 32768 MB
> node 1 free: 82 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10
>
> Thanks!

You seem to have very little free memory in each node, which might be making you go to
disk and therefore affecting performance. I recommend trying to free up some memory
before running x265 to see if that helps.
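If most of that used memory is just page cache left over from previous runs, something
along these lines (needs root) should be enough to check whether more per-node headroom
changes the picture; running numactl -H before and after shows the effect:

  numactl -H
  sync
  echo 3 > /proc/sys/vm/drop_caches   # drops clean page cache, dentries and inodes
  numactl -H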
> >> Ah yes, I've also found that 8K does indeed help a ton. With 4K and similar settings, I'm
> >> able to load 16-25 CPUs currently, sometimes briefly 30. With 8K, load is much higher.
> >>
> >> Maybe you can advise how to maximize parallelization / load as many CPUs as possible
> >> without breaking NUMA support on both Windows and Linux.
> >>
> >> I'm saying this because my benchmarking project targets multiple operating systems;
> >> it currently works on:
> >> * Windows NT 5.2 & 6.0 (wo. NUMA)
> >> * Windows NT 6.1 - 10.0 (w. NUMA)
> >> * MacOS X (wo. NUMA)
> >> * Linux (w. and wo. NUMA)
> >> * FreeBSD, OpenBSD, NetBSD and DragonFly BSD UNIX (wo. NUMA)
> >> * Solaris (wo. NUMA)
> >> * Haiku OS (wo. NUMA)
> >>
> >> Thank you very much!
> >>
> >> Best,
> >> Michael
> >>
> >> On 05/10/2017 07:21 AM, Pradeep Ramachandran wrote:
> >>> Michael,
> >>> Adding --lookahead-threads 2 statically allocates two threads for lookahead.
> >>> Therefore, the worker threads launched to work on WPP will be 32-2 = 30 in count.
> >>> We've found some situations in which statically allocating threads for lookahead
> >>> was useful and therefore decided to expose it to the user. Please see if this
> >>> helps your use case and enable it appropriately.
> >>>
> >>> Now as far as scaling up for 8K goes, a single instance of x265 scales up well
> >>> to 25-30 threads, depending on the preset you're running. We've found pmode and
> >>> pme help performance considerably on some Broadwell server systems, but again,
> >>> that is also dependent on content. I would encourage you to play with those
> >>> settings and see if they help your use case. Beyond these thread counts, one
> >>> instance of x265 may not be beneficial for you.
> >>>
> >>> Pradeep.
> >>>
> >>> On Fri, May 5, 2017 at 3:26 PM, Michael Lackner <michael.lack...@unileoben.ac.at> wrote:
> >>>
> >>>> I found the reason for "why did x265 use 30 threads and not 32, when I have
> >>>> 32 CPUs".
> >>>>
> >>>> Actually, it was (once again) my own fault. Thinking I knew better than x265,
> >>>> I spawned two lookahead threads starting with 32 logical CPUs
> >>>> ('--lookahead-threads 2').
> >>>>
> >>>> It seems what x265 does is reserve two dedicated CPUs for this, but then it
> >>>> couldn't permanently saturate them.
> >>>>
> >>>> I still don't know at what point I should start doing that for 8K content.
> >>>> 64 CPUs? 256 CPUs? Or should I leave everything to x265? My goal is to be able
> >>>> to fully load as many CPUs as possible in the future.
> >>>>
> >>>> In any case, the culprit was myself.
> >>>>
> >>>> On 05/04/2017 11:18 AM, Mario *LigH* Rohkrämer wrote:
> >>>>> On 04.05.2017 at 10:58, Michael Lackner <michael.lack...@unileoben.ac.at> wrote:
> >>>>>
> >>>>>> Still wondering why not 32, but OK.
> >>>>>
> >>>>> x265 will calculate how many threads it will really need to utilize the WPP
> >>>>> and other parallelizable steps, in relation to the frame dimensions and the
> >>>>> complexity. It may not *need* more than 30 threads, and would not have any
> >>>>> task to give to two more. Possibly. The developers know better...
>
> --
> Michael Lackner
> Lehrstuhl für Informationstechnologie (CiT)
> Montanuniversität Leoben
> Tel.: +43 (0)3842/402-1505 | Mail: michael.lack...@unileoben.ac.at
> Fax.: +43 (0)3842/402-1502 | Web: http://institute.unileoben.ac.at/infotech
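P.S. If it helps with the debugging, you can also watch where the encoder threads
actually land while a run is in progress; psr is just the CPU each thread last ran on,
so you map the CPU IDs back to nodes using the numactl -H output above:

  ps -Lo tid,psr,comm -p $(pidof x265)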
_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel