[x265] [PATCH] asm: avx2 code for intra_ang_16 mode 17, improved over 65% than SSE asm

2015-08-24 Thread rajesh
# HG changeset patch # User Rajesh Paulraj # Date 1440482220 -19800 # Tue Aug 25 11:27:00 2015 +0530 # Node ID 65feb1620237d624296276635b2f658c0b1b1719 # Parent 8a414544bfbf64b119fa6dd2e23cef8cb89d0a54 asm: avx2 code for intra_ang_16 mode 17, improved over 65% than SSE asm diff -r 8a414544bf

[x265] [PATCH] asm: replace movu+vinserti128 by vbroadcasti128 instruction

2015-08-24 Thread rajesh
# HG changeset patch # User Rajesh Paulraj # Date 1440413739 -19800 # Mon Aug 24 16:25:39 2015 +0530 # Node ID 8a414544bfbf64b119fa6dd2e23cef8cb89d0a54 # Parent a28a863393994d8fb1d58c721352d9b4ec8c46ee asm: replace movu+vinserti128 by vbroadcasti128 instruction diff -r a28a86339399 -r 8a4145

[x265] [PATCH] Performance: Prevent small thread-pools if NUMA disabled and # CPUs > MAX_POOL_THREADS

2015-08-24 Thread pradeep
# HG changeset patch # User pradeep # Date 1440476198 -19800 # Tue Aug 25 09:46:38 2015 +0530 # Node ID ba08fde2e66bb66f0410fd5bb7a28f62df0043e0 # Parent a28a863393994d8fb1d58c721352d9b4ec8c46ee Performance: Prevent small thread-pools if NUMA disabled and # CPUs > MAX_POOL_THREADS When NUMA

Re: [x265] [PATCH] Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs

2015-08-24 Thread Pradeep Ramachandran
This may not be the best thing to do for performance though; it may spawn few threads per pool and limit performance. Please don't push this in. I think it may be better to spawn 64 threads in case the delta between # cores and 64 is < x% of 64 - basically if the second pool won't have "enough" th

[x265] [PATCH 1 of 4] asm: re-design AVX2 algorithm for intra_pred_8x8[16], 334c -> 200c

2015-08-24 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1440458765 25200 # Node ID b73698bbfae7a31651ef428a9d27bc15f21e09d4 # Parent a28a863393994d8fb1d58c721352d9b4ec8c46ee asm: re-design AVX2 algorithm for intra_pred_8x8[16], 334c -> 200c --- source/common/x86/asm-primitives.cpp |3 +- source/common/x

[x265] [PATCH 4 of 4] improve motionEstimate() by bypass reduce MV Candidate

2015-08-24 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1440460166 25200 # Node ID d3884d55fb6caedcd751db7b289dac470d139d1d # Parent 63f16fa65e3f4963863f31bc3802d7958410ff09 improve motionEstimate() by bypass reduce MV Candidate --- source/encoder/motion.cpp | 10 +- 1 files changed, 9 insertions(

[x265] [PATCH 3 of 4] reorder on intra_pred_8x8 function pointer

2015-08-24 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1440458774 25200 # Node ID 63f16fa65e3f4963863f31bc3802d7958410ff09 # Parent a22a8592a9fabd848d630c9e38b941ebba5d7799 reorder on intra_pred_8x8 function pointer --- source/common/x86/asm-primitives.cpp | 27 ++- 1 files change

[x265] [PATCH 2 of 4] asm: re-design AVX2 algorithm for intra_pred_8x8[20], 246c -> 194c

2015-08-24 Thread Min Chen
# HG changeset patch # User Min Chen # Date 1440458772 25200 # Node ID a22a8592a9fabd848d630c9e38b941ebba5d7799 # Parent b73698bbfae7a31651ef428a9d27bc15f21e09d4 asm: re-design AVX2 algorithm for intra_pred_8x8[20], 246c -> 194c --- source/common/x86/asm-primitives.cpp |2 +- source/common/x

Re: [x265] [PATCH] Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs

2015-08-24 Thread Steve Borho
On 08/24, prad...@multicorewareinc.com wrote: > # HG changeset patch > # User pradeep > # Date 1440406873 -19800 > # Mon Aug 24 14:31:13 2015 +0530 > # Node ID cf6210f6f5cbbeec441f7eee3d8abf82208942fd > # Parent f63273fa3137fef2f6898c686b68ee12608acd31 > Performance: Balance # threads per poo

[x265] [PATCH] Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs

2015-08-24 Thread pradeep
# HG changeset patch # User pradeep # Date 1440406873 -19800 # Mon Aug 24 14:31:13 2015 +0530 # Node ID cf6210f6f5cbbeec441f7eee3d8abf82208942fd # Parent f63273fa3137fef2f6898c686b68ee12608acd31 Performance: Balance # threads per pool for non-NUMA machines with > 64 vCPUs By default, each th

Re: [x265] [PATCH 1 of 2] asm: re-design AVX2 algorithm for intra_pred_8x8[20], 246c -> 194c

2015-08-24 Thread Deepthi Nandakumar
Min, can you update your patches to the tip and re-send? On Sat, Aug 22, 2015 at 6:49 AM, Min Chen wrote: > # HG changeset patch > # User Min Chen > # Date 1440204616 25200 > # Node ID 6ad08d8288099fe11f2f518e54194c02b7a36020 > # Parent 9bbcc4a622f25aeead11efd8cac82cabbf413d62 > asm: re-desig