[x265] libx265 3.5 reports version 3.4+31-6722fce1f

2021-03-26 Thread Andrey Semashev

Hi,

When libx265 3.5 (as checked out from git tag 3.5) is compiled, it 
reports version 3.4+31-6722fce1f. This is probably because the contents 
of the x265Version.txt file are incorrect.

___
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel


Re: [x265] x265 download mirrors

2020-11-24 Thread Andrey Semashev

On 11/24/20 12:08 PM, Michael Lackner wrote:

Hello,

I'm not sure whether this is a slightly stupid question (I'm no dev, so
not used to git much), but since I'm interested in obtaining and
building release versions as well, I'll just ask:

How do I get a specific x265 release version from git?


git clone https://bitbucket.org/multicoreware/x265_git x265
cd x265
git checkout 3.4

where 3.4 is the release version you want.


I tried to clone the corresponding branch into a new folder:

$ git clone --single-branch --branch 'Release_3.5'
https://bitbucket.org/multicoreware/x265_git

Compiled the resulting source code, but, alas:

$ x265 --version 2>&1 | grep version
x265 [info]: HEVC encoder version 3.4+30-g6722fce

So, I ended up with 3.4+30-g6722fce instead, which is newer than the
latest in the Release_3.4 branch, which gives me 3.4+12-g9103319.

Am I right to assume that 3.4+12-g9103319 is the "3.4 final release"
version, and that the version number in the Release_3.5 branch will
change to "3.5+something" once the release is done and a new Release_3.6
branch will have been created?


As I said in my other email, there is no 3.5 release yet. What a 
version number such as 3.4+12-g9103319 tells you is that the build is 12 
commits past the 3.4 release and thus is an arbitrary snapshot build.
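To make the reading of such version strings concrete, here is a minimal C++ sketch that splits one into its parts. It assumes the string follows the `<tag>+<commits>-g<hash>` pattern seen in this thread (the `g` before the hash is treated as optional, since one report in this archive omits it). The `SnapshotVersion` type and `parseSnapshotVersion` function are hypothetical names used for illustration only; they are not part of x265.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

namespace demo {

// Hypothetical helper type, not part of x265.
struct SnapshotVersion
{
    std::string tag;     // last release tag the build is based on, e.g. "3.4"
    int commitsPastTag;  // 0 when the string carries no "+N" part
    std::string commit;  // abbreviated commit hash, empty if absent
};

// Splits strings of the assumed form "<tag>+<N>-g<hash>".
// A string without '+' is treated as a plain release version.
inline SnapshotVersion parseSnapshotVersion(const std::string& s)
{
    SnapshotVersion v{s, 0, ""};
    const std::string::size_type plus = s.find('+');
    if (plus == std::string::npos)
        return v;

    v.tag = s.substr(0, plus);
    const std::string::size_type dash = s.find('-', plus + 1);
    const std::string count = (dash == std::string::npos)
        ? s.substr(plus + 1)
        : s.substr(plus + 1, dash - plus - 1);
    v.commitsPastTag = std::atoi(count.c_str());

    if (dash != std::string::npos)
    {
        v.commit = s.substr(dash + 1);
        // A hex hash never starts with 'g', so a leading 'g' is a prefix.
        if (!v.commit.empty() && v.commit[0] == 'g')
            v.commit.erase(0, 1);
    }
    return v;
}

} // namespace demo
```

Under these assumptions, a non-zero `commitsPastTag` marks a snapshot build rather than a tagged release.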



Re: [x265] x265 download mirrors

2020-11-24 Thread Andrey Semashev

On 11/24/20 1:33 AM, ebals...@pm.me wrote:
I’d like to update MacPorts x265 to enable the 10-bit variant.  As I 
started reading the docs and source code, I became confused about where 
the best place to obtain the source code is.  Currently MacPorts is using 
https://download.videolan.org/pub/videolan/x265/ but this is over a 
year old with version 3.2.1.


First I checked http://x265.org/developers/.  This suggests downloading 
from http://x265.org/downloads/ but that link is password protected.


I found this page https://bitbucket.org/multicoreware/x265_git 

This shows version 3.5 is released, but the download page only has 
tarballs through 3.3.


3.5 is not released, as there is no 3.5 tag. Release_3.5 is a branch 
where the upcoming 3.5 release is being prepared/maintained.



GitHub has a tarball available for 3.4.
https://github.com/videolan/x265/releases 



Which URL would be the most stable to use in port file?


I believe, both https://github.com/videolan/x265/ and 
https://bitbucket.org/multicoreware/x265_git are mirrors of the 
mercurial repository http://hg.videolan.org/x265. You can use whichever 
of them to obtain the code; just check out the release tag you need.


As to the tarballs, I think Bitbucket tarballs are manually uploaded by 
developers, so these are not up to date (there is still no 3.4 tarball). 
Tarballs on GitHub are automatically generated, so you could use those 
more or less reliably.



Re: [x265] [ANN]x265 version 3.4 released

2020-05-30 Thread Andrey Semashev

I see no 3.4 tag in git.

On 2020-05-29 21:15, Aruna Matheswaran wrote:

Hi all,

x265 version 3.4 is out with cool new features and encoder enhancements 
in terms of encoding efficiency as well as speed.


Please download v3.4 from our downloads page (MD5Sum is 
e37b91c1c114f8815a3f46f039fe79b5) and do check out the full 
documentation available in our release notes.


Release Notes of version 3.4
============================

New features
------------
1. **Edge-aware quadtree partitioning** to terminate CU depth recursion 
based on edge information. :option:`--rskip` level 2 enables the feature 
and  :option:`--rskip-edge-threshold` denotes the minimum expected 
edge-density percentage within the CU, below which the recursion is 
skipped. Experimental feature.
2. Application-level feature :option:`--abr-ladder` for automating 
efficient ABR ladder generation. Shows ~65% savings in the over-all 
turn-around time required for the generation of a typical Apple HLS 
ladder in Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz over a sequential 
ABR-ladder generation approach that leverages save-load architecture.

Enhancements to existing features
---------------------------------
1. Improved efficiency in 2-pass rate-control algorithm. The savings in 
the bitrate is ~1.72% with visual improvement in quality in the initial 
1-2 secs.

Encoder enhancements
--------------------
1. Faster ARM64 encodes enabled by ASM contributions from Huawei. The 
speed-up over no-asm version for 1080p encodes @ medium preset is ~15% 
in a 16 core H/W.
2. Strict VBV conformance in zone encoding.

Bug fixes
---------
1. Multi-pass encode failures with :option:`--frame-dup`.
2. Corrupted bitstreams with :option:`--hist-scenecut` when input depth 
and internal bit-depth differ.
3. Incorrect analysis propagation in multi-level save-load architecture.
4. Failure in detecting NUMA packages installed in non-standard directories.

Happy compressing!!


--
Regards,
*Aruna Matheswaran,*
Video Codec Engineer,
Media & AI analytics BU,






Re: [x265] Some compiler warnings under Linux

2019-07-13 Thread Andrey Semashev

Here's a patch I'm using.

On 7/13/19 6:21 PM, Mario Rohkrämer wrote:

Forgot to mention: This was v3.1.1+1


On 13.07.2019 at 16:41, Mario *LigH* Rohkrämer wrote:


Ubuntu 18.04 LTS + GCC 9.0


/home/ligh/x265/source/encoder/ratecontrol.cpp: In member function ‘int x265::RateControl::writeRateControlFrameStats(x265::Frame*, x265::RateControlEntry*)’:
/home/ligh/x265/source/encoder/ratecontrol.cpp:2867:21: warning: passing argument 1 to restrict-qualified parameter aliases with argument 3 [-Wrestrict]
 2867 | sprintf(deltaPOC, "%s%d~", deltaPOC, rpsWriter->deltaPOC[i]);
/home/ligh/x265/source/encoder/ratecontrol.cpp:2868:21: warning: passing argument 1 to restrict-qualified parameter aliases with argument 3 [-Wrestrict]
 2868 | sprintf(bUsed, "%s%d~", bUsed, rpsWriter->bUsed[i]);
/home/ligh/x265/source/encoder/ratecontrol.cpp:2867:36: warning: ‘~’ directive writing 1 byte into a region of size between 0 and 127 [-Wformat-overflow=]
 2867 | sprintf(deltaPOC, "%s%d~", deltaPOC, rpsWriter->deltaPOC[i]);
/home/ligh/x265/source/encoder/ratecontrol.cpp:2867:20: note: ‘sprintf’ output between 3 and 140 bytes into a destination of size 128
 2867 | sprintf(deltaPOC, "%s%d~", deltaPOC, rpsWriter->deltaPOC[i]);
/home/ligh/x265/source/encoder/ratecontrol.cpp:2868:33: warning: ‘~’ directive writing 1 byte into a region of size between 0 and 39 [-Wformat-overflow=]
 2868 | sprintf(bUsed, "%s%d~", bUsed, rpsWriter->bUsed[i]);
/home/ligh/x265/source/encoder/ratecontrol.cpp:2868:20: note: ‘sprintf’ output between 3 and 42 bytes into a destination of size 40
 2868 | sprintf(bUsed, "%s%d~", bUsed, rpsWriter->bUsed[i]);

Index: x265/source/encoder/ratecontrol.cpp
===
--- x265.orig/source/encoder/ratecontrol.cpp	2019-07-13 15:23:50.0 +0300
+++ x265/source/encoder/ratecontrol.cpp	2019-07-13 18:44:08.712025660 +0300
@@ -2857,17 +2857,17 @@ int RateControl::writeRateControlFrameSt
 int i, num = rpsWriter->numberOfPictures;
 char deltaPOC[128];
 char bUsed[40];
 memset(deltaPOC, 0, sizeof(deltaPOC));
 memset(bUsed, 0, sizeof(bUsed));
-sprintf(deltaPOC, "deltapoc:~");
-sprintf(bUsed, "bused:~");
+char* deltaPOCPtr = deltaPOC + snprintf(deltaPOC, sizeof(deltaPOC), "deltapoc:~");
+char* bUsedPtr = bUsed + snprintf(bUsed, sizeof(bUsed), "bused:~");
 
 for (i = 0; i < num; i++)
 {
-sprintf(deltaPOC, "%s%d~", deltaPOC, rpsWriter->deltaPOC[i]);
-sprintf(bUsed, "%s%d~", bUsed, rpsWriter->bUsed[i]);
+deltaPOCPtr += snprintf(deltaPOCPtr, sizeof(deltaPOC) - (deltaPOCPtr - deltaPOC), "%d~", rpsWriter->deltaPOC[i]);
+bUsedPtr += snprintf(bUsedPtr, sizeof(bUsed) - (bUsedPtr - bUsed), "%d~", rpsWriter->bUsed[i]);
 }
 
 if (fprintf(m_statFileOut,
 "in:%d out:%d type:%c q:%.2f q-aq:%.2f q-noVbv:%.2f q-Rceq:%.2f tex:%d mv:%d misc:%d icu:%.2f pcu:%.2f scu:%.2f nump:%d numnegp:%d numposp:%d %s %s ;\n",
 rce->poc, rce->encodeOrder,
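The append-with-offset pattern used in the patch above can also be wrapped in a small helper. The sketch below is one possible way to do that, not the code from the patch: `appendFormat` is a hypothetical name, and it additionally clamps the offset, because `snprintf`/`vsnprintf` return the length the full output would have had, which can exceed the space that was actually available when the output is truncated.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

namespace demo {

// Hypothetical helper, not part of x265: appends formatted text to buf at
// offset pos and returns the new offset. The result never moves past the
// terminating NUL, even when the output had to be truncated.
inline std::size_t appendFormat(char* buf, std::size_t size, std::size_t pos,
                                const char* fmt, ...)
{
    if (size == 0 || pos >= size - 1)
        return pos; // no room left; the buffer stays as it is

    std::va_list args;
    va_start(args, fmt);
    const int n = std::vsnprintf(buf + pos, size - pos, fmt, args);
    va_end(args);

    if (n < 0)
    {
        buf[pos] = '\0'; // formatting error: keep the string terminated
        return pos;
    }

    const std::size_t written = static_cast<std::size_t>(n);
    // vsnprintf reports the untruncated length, so clamp on truncation.
    return (written >= size - pos) ? size - 1 : pos + written;
}

} // namespace demo
```

With a helper like this, the loop body reduces to one call per buffer, for example `pos = demo::appendFormat(bUsed, sizeof(bUsed), pos, "%d~", value);`.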


Re: [x265] [PATCH] fix Issue #442: linking issue on non x86 platform

2018-10-31 Thread Andrey Semashev

On 10/31/18 2:33 PM, prav...@multicorewareinc.com wrote:

# HG changeset patch
# User Praveen Tiwari 
# Date 1540983948 -19800
#  Wed Oct 31 16:35:48 2018 +0530
# Node ID 1c878790edea64186edabcd40fb3df121f536311
# Parent  fd517ae68f93dbfdd1bff45a9dd8e626523542b6
fix Issue #442: linking issue on non x86 platform

diff -r fd517ae68f93 -r 1c878790edea source/common/cpu.cpp
--- a/source/common/cpu.cpp Tue Sep 25 16:02:31 2018 +0530
+++ b/source/common/cpu.cpp Wed Oct 31 16:35:48 2018 +0530
@@ -127,6 +127,7 @@
  {
  return(enable512);
  }
+
  uint32_t cpu_detect(bool benableavx512 )
  {
  
diff -r fd517ae68f93 -r 1c878790edea source/common/quant.cpp

--- a/source/common/quant.cpp   Tue Sep 25 16:02:31 2018 +0530
+++ b/source/common/quant.cpp   Wed Oct 31 16:35:48 2018 +0530
@@ -723,6 +723,7 @@
  X265_CHECK(coeffNum[cgScanPos] == 0, "count of coeff failure\n");
  uint32_t scanPosBase = (cgScanPos << MLS_CG_SIZE);
  uint32_t blkPos  = codeParams.scan[scanPosBase];
+#if X265_ARCH_X86
  bool enable512 = detect512();
  if (enable512)
  primitives.cu[log2TrSize - 2].psyRdoQuant(m_resiDctCoeff, m_fencDctCoeff, 
costUncoded, , , , blkPos);
@@ -731,6 +732,10 @@
  primitives.cu[log2TrSize - 2].psyRdoQuant_1p(m_resiDctCoeff,  
costUncoded, , ,blkPos);
  primitives.cu[log2TrSize - 2].psyRdoQuant_2p(m_resiDctCoeff, 
m_fencDctCoeff, costUncoded, , , , 
blkPos);
  }
+#elif


Shouldn't this be #else? #elif requires a condition. The same applies 
everywhere else in the patch.


+primitives.cu[log2TrSize - 2].psyRdoQuant_1p(m_resiDctCoeff, costUncoded, 
, , blkPos);
+primitives.cu[log2TrSize - 2].psyRdoQuant_2p(m_resiDctCoeff, m_fencDctCoeff, 
costUncoded, , , , blkPos);
+#endif
  }
  }
  else
@@ -805,8 +810,8 @@
  uint32_t blkPos = codeParams.scan[scanPosBase];
  if (usePsyMask)
  {
+#if X265_ARCH_X86
  bool enable512 = detect512();
-
  if (enable512)
  primitives.cu[log2TrSize - 2].psyRdoQuant(m_resiDctCoeff, 
m_fencDctCoeff, costUncoded, , , , 
blkPos);
  else
@@ -814,6 +819,10 @@
  primitives.cu[log2TrSize - 2].psyRdoQuant_1p(m_resiDctCoeff, 
costUncoded, , , blkPos);
  primitives.cu[log2TrSize - 2].psyRdoQuant_2p(m_resiDctCoeff, 
m_fencDctCoeff, costUncoded, , , , 
blkPos);
  }
+#elif
+primitives.cu[log2TrSize - 2].psyRdoQuant_1p(m_resiDctCoeff, 
costUncoded, , , blkPos);
+primitives.cu[log2TrSize - 2].psyRdoQuant_2p(m_resiDctCoeff, 
m_fencDctCoeff, costUncoded, , , , 
blkPos);
+#endif
  blkPos = codeParams.scan[scanPosBase];
  for (int y = 0; y < MLS_CG_SIZE; y++)
  {
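Regarding the `#elif` remark in the review above: `#elif` needs a condition of its own, while `#else` is the unconditional fallback branch. A minimal sketch of the intended shape follows; `DEMO_ARCH_X86` and `usesRuntimeDispatch` are hypothetical stand-ins for illustration and are not taken from the x265 sources.

```cpp
#include <cassert>

// Hypothetical stand-in for an architecture macro supplied by the build.
#ifndef DEMO_ARCH_X86
#define DEMO_ARCH_X86 0
#endif

namespace demo {

// Reports which branch was compiled in.
inline bool usesRuntimeDispatch()
{
#if DEMO_ARCH_X86
    return true;   // x86 build: choose a kernel at run time
#else              // unconditional fallback; a bare "#elif" has no condition to test
    return false;  // other architectures: always use the generic kernel
#endif
}

} // namespace demo
```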




Re: [x265] [PATCH] threadpool.cpp: use WIN system call for popcount

2018-05-03 Thread Andrey Semashev
On Thu, May 3, 2018 at 7:37 PM, Pradeep Ramachandran wrote:
>
> On Thu, May 3, 2018 at 2:23 PM,  wrote:
>>
>> # HG changeset patch
>> # User Praveen Tiwari 
>> # Date 1525328839 -19800
>> #  Thu May 03 11:57:19 2018 +0530
>> # Branch stable
>> # Node ID 9cbb2aadcca3a2f7a308ea1dc792fb817bcc5b51
>> # Parent  69aafa6d70ad4e151f4590766c6b125621c5d007
>> threadpool.cpp: use WIN system call for popcount
>
>
> Unless this fixes a known bug, I don't want to push this directly into
> stable. Syscalls are notorious especially when working with older versions
> of the OS.
> I would rather push this into default and allow users to test that this
> works with all kinds of systems and then merge with stable once the answer
> is known.
> Does this fix a specific issue on some platform, or improve performance?

The comment is not quite right, __popcnt is not a syscall but an
MSVC-specific intrinsic.

https://msdn.microsoft.com/en-us/library/bb385231.aspx

The equivalent gcc intrinsic is __builtin_popcount and friends.

I think, the patch is buggy because the relevant field is a 64-bit
integer on 64-bit Windows and __popcnt is 32-bit.

Note also that the popcount instruction is only available with the ABM 
ISA extension. On Intel CPUs it has been available since Nehalem.
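To illustrate the width concern raised above, here is a small self-contained sketch. It uses a portable bit-twiddling population count over the full 64 bits and compares it with counting only the low 32 bits, which is what a 32-bit popcount of a truncated mask would see. The function names are hypothetical and the affinity mask value is made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

namespace demo {

// Portable population count over the full 64-bit value.
inline int popCount64(std::uint64_t x)
{
    const std::uint64_t m1  = 0x5555555555555555ULL; // binary: 0101...
    const std::uint64_t m2  = 0x3333333333333333ULL; // binary: 00110011...
    const std::uint64_t m4  = 0x0f0f0f0f0f0f0f0fULL; // binary: 4 zeros, 4 ones...
    const std::uint64_t h01 = 0x0101010101010101ULL; // sum of 256^0, 256^1, ...
    x -= (x >> 1) & m1;               // 2-bit partial sums
    x = (x & m2) + ((x >> 2) & m2);   // 4-bit partial sums
    x = (x + (x >> 4)) & m4;          // 8-bit partial sums
    return static_cast<int>((x * h01) >> 56); // add all bytes together
}

// Counts only the low 32 bits, i.e. what remains after the mask has been
// narrowed to a 32-bit integer.
inline int popCountLow32(std::uint64_t x)
{
    return popCount64(x & 0xffffffffULL);
}

} // namespace demo
```

For a made-up mask with 40 bits set, such as `0xFFFFFFFFFF`, the full count is 40 while the truncated count is 32, so 8 processors would go uncounted. MSVC also documents a 64-bit `__popcnt64` intrinsic for x64 targets, which would avoid the truncation without a hand-written fallback.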

>> diff -r 69aafa6d70ad -r 9cbb2aadcca3 source/common/threadpool.cpp
>> --- a/source/common/threadpool.cpp  Wed May 02 15:15:05 2018 +0530
>> +++ b/source/common/threadpool.cpp  Thu May 03 11:57:19 2018 +0530
>> @@ -71,21 +71,6 @@
>>  # define strcasecmp _stricmp
>>  #endif
>>
>> -#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
>> -const uint64_t m1 = 0x5555555555555555; //binary: 0101...
>> -const uint64_t m2 = 0x3333333333333333; //binary: 00110011..
>> -const uint64_t m3 = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
>> -const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...
>> -
>> -static int popCount(uint64_t x)
>> -{
>> -x -= (x >> 1) & m1;
>> -x = (x & m2) + ((x >> 2) & m2);
>> -x = (x + (x >> 4)) & m3;
>> -return (x * h01) >> 56;
>> -}
>> -#endif
>> -
>>  namespace X265_NS {
>>  // x265 private namespace
>>
>> @@ -274,7 +259,7 @@
>>  for (int i = 0; i < numNumaNodes; i++)
>>  {
>>  GetNumaNodeProcessorMaskEx((UCHAR)i, groupAffinityPointer);
>> -cpusPerNode[i] = popCount(groupAffinityPointer->Mask);
>> +cpusPerNode[i] = __popcnt(static_cast> int>(groupAffinityPointer->Mask));
>>  }
>>  delete groupAffinityPointer;
>>  #elif HAVE_LIBNUMA
>> @@ -623,7 +608,7 @@
>>  for (int i = 0; i < numNumaNodes; i++)
>>  {
>>  GetNumaNodeProcessorMaskEx((UCHAR)i, );
>> -cpus += popCount(groupAffinity.Mask);
>> +cpus += __popcnt(static_cast(groupAffinity.Mask));
>>  }
>>  return cpus;
>>  #elif _WIN32


Re: [x265] [PATCH 1 of 2] Add support for customizing logging

2018-03-18 Thread Andrey Semashev

On 03/18/18 20:04, Derek Buitenhuis wrote:

On 3/16/2018 3:28 PM, Andrey Semashev wrote:

+/* x265_set_log:
+ *   Sets a custom logger function */
+void x265_set_log(x265_log_t log);


I would like to voice my distaste for this implementation.

This is a global callback, and it will cause problems when, e.g., multiple
libraries or dependencies in a program depend on libx265, or even when
multiple encoders are used in the same program.

Consider program A that has B as a dependency, both of which use libx265:

A - libx265
  - B - libx265

Either A or B could set the log callback for x265, and it would take
over for both. This is not ideal, and it is the same design mistake
we at FFmpeg made a long time ago, which has been haunting us for years.


Normally, there is only one _application_. There may be multiple 
_libraries_ using libx265 but none of them should be configuring its 
logging, unless it is supposed to be the only one using libx265 in the 
application (and generic libraries cannot make that assumption). It is 
the application that should configure logging in every component it uses.


That said, I myself would prefer to configure logging on per-encoder 
basis, although for different reasons. But the way x265 is currently 
written does not allow this because in many places logging is done 
without any encoder context. I'm not willing to rewrite x265 to support 
this as global logging customization is "good enough" for me. It 
certainly is much better than unconditionally logging to stdout.
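The trade-off discussed here can be sketched in a few lines of C++. All names below (`LogFn`, `setGlobalLog`, `Encoder`) are hypothetical and only illustrate the two designs; they are not the API proposed in the patch. With a process-wide callback the last caller wins, while a per-encoder callback keeps each user's logging separate.

```cpp
#include <cassert>

namespace demo {

// Hypothetical callback type: (level, message).
typedef void (*LogFn)(int level, const char* message);

inline void discardLog(int, const char*) {}

// Design 1: one process-wide logger. Whoever calls setGlobalLog() last wins.
inline LogFn& globalLog()
{
    static LogFn fn = &discardLog;
    return fn;
}

inline void setGlobalLog(LogFn fn)
{
    globalLog() = fn ? fn : &discardLog;
}

// Design 2: every encoder instance carries its own callback.
struct Encoder
{
    LogFn log;

    explicit Encoder(LogFn fn) : log(fn ? fn : &discardLog) {}

    void encode() { log(2, "frame encoded"); }
};

} // namespace demo
```

If an application and one of its libraries both call `setGlobalLog`, only the later registration receives messages. With `Encoder`, each owner gets the messages of its own instance, but every log call then needs an encoder context, which is the part described above as missing in many places in x265.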



Further, defining user types ending in _t is not allowed in C (which this
header is, and is used from), and x265 does not do this.


I'm not aware of this limitation. In particular, C11 section 7.1.3 
contains no such restriction. Can you provide a reference to the part of 
the C standard that says this?


Note also that the name starts with x265, so it clearly belongs to the 
x265 namespace and is unlikely to clash with anything.



[x265] [PATCH 1 of 2] Add support for customizing logging

2018-03-16 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1515596947 -10800
#  Wed Jan 10 18:09:07 2018 +0300
# Branch add_custom_logging_v3
# Node ID 3ebefb64a1eb1487389db4ff7a808490b3005c39
# Parent  d7c26df32fae052b7e895fee9bda1c22b24cc44b
Add support for customizing logging

This commit adds support for customizing logging behavior. This can be used
by the application using libx265 to override the default behavior of printing
messages to the console with the application-specific logging system. This is
especially important if the console is not available or monitored for the
application (e.g. if it runs as a service or daemon).

The commit also increments API version.

diff -r d7c26df32fae -r 3ebefb64a1eb source/CMakeLists.txt
--- a/source/CMakeLists.txt Tue Mar 13 13:40:13 2018 +0530
+++ b/source/CMakeLists.txt Wed Jan 10 18:09:07 2018 +0300
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 156)
+set(X265_BUILD 157)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r d7c26df32fae -r 3ebefb64a1eb source/common/common.cpp
--- a/source/common/common.cpp  Tue Mar 13 13:40:13 2018 +0530
+++ b/source/common/common.cpp  Wed Jan 10 18:09:07 2018 +0300
@@ -102,49 +102,9 @@
 return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
 }
 
-void general_log(const x265_param* param, const char* caller, int level, const 
char* fmt, ...)
-{
-if (param && level > param->logLevel)
-return;
-const int bufferSize = 4096;
-char buffer[bufferSize];
-int p = 0;
-const char* log_level;
-switch (level)
-{
-case X265_LOG_ERROR:
-log_level = "error";
-break;
-case X265_LOG_WARNING:
-log_level = "warning";
-break;
-case X265_LOG_INFO:
-log_level = "info";
-break;
-case X265_LOG_DEBUG:
-log_level = "debug";
-break;
-case X265_LOG_FULL:
-log_level = "full";
-break;
-default:
-log_level = "unknown";
-break;
-}
+x265_log_t g_x265_log = _log;
 
-if (caller)
-p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
-va_list arg;
-va_start(arg, fmt);
-vsnprintf(buffer + p, bufferSize - p, fmt, arg);
-va_end(arg);
-fputs(buffer, stderr);
-}
-
-#if _WIN32
-/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we 
use _w functions.
- * For other OS we do not make any changes. */
-void general_log_file(const x265_param* param, const char* caller, int level, 
const char* fmt, ...)
+void general_log(const x265_param* param, const char* caller, int national, 
int level, const char* fmt, ...)
 {
 if (param && level > param->logLevel)
 return;
@@ -181,19 +141,31 @@
 vsnprintf(buffer + p, bufferSize - p, fmt, arg);
 va_end(arg);
 
-HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
-DWORD mode;
-if (GetConsoleMode(console, ))
+#if _WIN32
+if (national)
 {
-wchar_t buf_utf16[bufferSize];
-int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, 
buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
-if (length_utf16 > 0)
-WriteConsoleW(console, buf_utf16, length_utf16, , NULL);
+HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
+DWORD mode;
+if (GetConsoleMode(console, ))
+{
+wchar_t buf_utf16[bufferSize];
+int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, 
buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
+if (length_utf16 > 0)
+WriteConsoleW(console, buf_utf16, length_utf16, , NULL);
+return;
+}
 }
-else
-fputs(buffer, stderr);
+#else
+// Suppress warnings about unused argument
+(void)national;
+#endif
+
+fputs(buffer, stderr);
 }
 
+#if _WIN32
+/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we 
use _w functions.
+ * For other OS we do not make any changes. */
 FILE* x265_fopen(const char* fileName, const char* mode)
 {
 wchar_t buf_utf16[MAX_PATH * 2], mode_utf16[16];
diff -r d7c26df32fae -r 3ebefb64a1eb source/common/common.h
--- a/source/common/common.h Tue Mar 13 13:40:13 2018 +0530
+++ b/source/common/common.h Wed Jan 10 18:09:07 2018 +0300
@@ -409,16 +409,15 @@
 
 /* located in common.cpp */
 int64_t  x265_mdate(void);
-#define  x265_log(param, ...) general_log(param, "x265", __VA_ARGS__)
-#define  x265_log_file(param, ...) general_log_file(param, "x265

[x265] [PATCH 0 of 2] Add support for customizing logging v3

2018-03-16 Thread Andrey Semashev
This is the same as v2; the only change is that the X265_BUILD version 
conflict has been resolved.


[x265] [PATCH 2 of 2] Do not use logging in a signal handler

2018-03-16 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1515597241 -10800
#  Wed Jan 10 18:14:01 2018 +0300
# Branch add_custom_logging_v3
# Node ID c53408bbc3021696f9b949858799639311eb44ee
# Parent  3ebefb64a1eb1487389db4ff7a808490b3005c39
Do not use logging in a signal handler.

Signal handlers can only call signal-safe functions, which the default logging
function is not (it uses a lot of unsafe C functions). User-specified logging
functions are also unlikely to be prepared to be called from a signal handler.

diff -r 3ebefb64a1eb -r c53408bbc302 source/output/reconplay.cpp
--- a/source/output/reconplay.cpp   Wed Jan 10 18:09:07 2018 +0300
+++ b/source/output/reconplay.cpp   Wed Jan 10 18:14:01 2018 +0300
@@ -42,8 +42,6 @@
 #ifndef _WIN32
 static void sigpipe_handler(int)
 {
-if (ReconPlay::pipeValid)
-g_x265_log(NULL, "exec", false, X265_LOG_ERROR, "pipe closed\n");
 ReconPlay::pipeValid = false;
 }
 #endif
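A common way to keep diagnostics without calling unsafe functions from the handler is to only set a flag there and report from normal code later. The sketch below shows that idea with hypothetical names (`g_pipeClosed`, `onPipeSignal`, `reportPipeClosed`); it is an illustration of the technique, not what the patch does, since the patch simply drops the message.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>

namespace demo {

// Written by the handler and read by normal code. volatile sig_atomic_t is
// the type the C and C++ standards allow a handler to write safely.
volatile std::sig_atomic_t g_pipeClosed = 0;

// The handler only records the event: no logging, no allocation, no stdio.
extern "C" void onPipeSignal(int)
{
    g_pipeClosed = 1;
}

// Called from ordinary (non-signal) context. Logs once per observed event
// and returns whether an event was pending.
inline bool reportPipeClosed()
{
    if (!g_pipeClosed)
        return false;
    g_pipeClosed = 0;
    std::fputs("pipe closed\n", stderr);
    return true;
}

} // namespace demo
```

The handler would be installed with something like `std::signal(SIGPIPE, demo::onPipeSignal)` on POSIX systems, and the main loop would call `demo::reportPipeClosed()` at a convenient point.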


Re: [x265] [PATCH 0 of 2] Add support for customizing logging v2

2018-03-16 Thread Andrey Semashev

On 03/05/18 13:09, Ashok Kumar Mishra wrote:


On Mon, Mar 5, 2018 at 3:26 PM, Andrey Semashev <andrey.semas...@gmail.com> wrote:


On 01/10/18 18:20, Andrey Semashev wrote:

This is a minor update from the previous patch. The changes are:

- Incremented X265_BUILD version.
- Moved set_log in x265_api above the delimiter comment to add
extensions below it.
- Added a second patch to remove logging from a signal handler.


What is the status of these patches? Is there a problem with them?

Thanks Andrey. We are going to review your patch and push it to public 
mailing list by this weekend.


Ping?


Re: [x265] [PATCH 0 of 2] Add support for customizing logging v2

2018-03-05 Thread Andrey Semashev

On 01/10/18 18:20, Andrey Semashev wrote:

This is a minor update from the previous patch. The changes are:

- Incremented X265_BUILD version.
- Moved set_log in x265_api above the delimiter comment to add extensions below 
it.
- Added a second patch to remove logging from a signal handler.



What is the status of these patches? Is there a problem with them?


Re: [x265] [ANN] x265 version 2.7 released

2018-03-01 Thread Andrey Semashev

On 03/01/18 14:50, Ashok Kumar Mishra wrote:

Hi all,
x265 version 2.7 is now out! The key new improvements include support 
for RADL pictures, moving from YASM to NASM assembler and reduced x265 
build time by more than 50%. The tarball of this release can be 
downloaded from here.


Release date - 21st Feb, 2018.


Version 2.7
===========

New features
------------
1. :option:`--gop-lookahead` can be used to extend the gop boundary (set 
by `--keyint`). The GOP will be extended, if a scene-cut frame is found 
within this many number of frames.
2. Support for RADL pictures added in x265.
    :option:`--radl` can be used to decide number of RADL pictures 
preceding the IDR picture.

Encoder enhancements
--------------------
1. Moved from YASM to NASM assembler. Supports NASM assembler version 
2.13 and greater.
2. Enable analysis save and load in a single run. Introduces two new cli 
options `--analysis-save ` and `--analysis-load `.
3. Comply to HDR10+ LLC specification.
4. Reduced x265 build time by more than 50% by re-factoring ipfilter.asm.

Bug fixes
---------
1. Fixed inconsistent output issue in deblock filter and --const-vbv.
2. Fixed Mac OS build warnings.
3. Fixed inconsistency in pass-2 when weightp and cutree are enabled.
4. Fixed deadlock issue due to dropping of BREF frames, while forcing 
slice types through qp file.


Looks like the patch for customizing logging is still not applied?


Re: [x265] issue 139

2018-02-04 Thread Andrey Semashev

On 02/05/18 02:31, John Comeau wrote:

https://bitbucket.org/multicoreware/x265/issues/139 was solved over a
year ago by rmous...@us.ibm.com when he wrote
source/common/ppc/pixel_altivec.cpp.

the issue should be closed and the bounty awarded as appropriate.

I've done some testing to prove that the altivec code is indeed being
used by the binary when built on the PPC architecture, and produces an
hevc video that plays the same as one produced on an x86_64 system. I
can provide access to the PPC instance to anyone who wishes to verify
before closing the issue.

this is my 3rd attempt to send the email. since this mailman list
doesn't seem to have archives, I can't tell if it went out or not; but
I never got the email from the list, and I checked both that and the
option to get a confirmation, and neither arrived.


FWIW, I can see all three of your messages.


Re: [x265] [PATCH] Use atomic bit test and set/reset operations on x86

2018-01-10 Thread Andrey Semashev

On 01/10/18 21:03, chen wrote:


At 2018-01-11 00:06:29, "Andrey Semashev" <andrey.semas...@gmail.com> wrote:

On 01/10/18 18:53, chen wrote:

the "lock" prefix will lock the CPU bus, which will incur a greater 
penalty on multi-core systems.


Just for the record, the lock prefix is implemented much more 
efficiently nowadays and involves CPU cache management rather than bus 
locking. It used to lock the memory bus on early CPUs (I want to say 
before Pentium, but I'm not sure which exact architecture changed this). 
In any case, the patch does not introduce new lock instructions; it 
replaces the "lock; cmpxchg" loops that are normally generated for the 
atomic AND and OR operations with a single instruction.

https://htor.inf.ethz.ch/publications/img/atomic-bench.pdf

In this paper, the authors explain that lock (SWP) causes only a small
performance drop on modern CPUs, but they only tested systems with fewer
cores (Xeon Phi shows a larger loss, and it is a single-socket CPU). On
multi-socket systems, cache coherency maintenance will be very expensive.


I don't dispute that on massively parallel systems cache coherency 
protocols are more expensive. They are equally as expensive with the 
current code. If anything, replacing a CAS loop with a single 
instruction has the potential to *reduce* the number of executed atomic 
instructions. More so on heavy contention.



However, intrinsics may benefit more from the compiler, which can decide
which method is the best choice on the target platform.


Well, on x86 there really is not much choice of atomic instructions, all 
of them have the lock prefix (xchg has an implicit one) and presumably 
rely on the same cache coherency protocols. There are TSX extensions, 
but given that the operations always modify the same memory, I don't see 
those as beneficial. So basically, you can only hope that the compiler 
does perform the optimization that this patch does, but generated code 
inspection shows that current compilers are not capable enough.
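For comparison, the same operations can be written without inline assembly using C++11 `std::atomic`, assuming a C++11 toolchain is acceptable. The function names below are hypothetical. Whether a given compiler turns this into a single `lock bts` / `lock btr` instead of a `lock cmpxchg` loop depends on the compiler and its version, which is exactly the limitation described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace demo {

// Atomically sets bit `bit` in `word` and returns the bit's previous value.
inline bool atomicBitTestAndSet(std::atomic<std::uint32_t>& word, unsigned bit)
{
    const std::uint32_t mask = std::uint32_t(1) << bit;
    return (word.fetch_or(mask) & mask) != 0;
}

// Atomically clears bit `bit` in `word` and returns the bit's previous value.
inline bool atomicBitTestAndReset(std::atomic<std::uint32_t>& word, unsigned bit)
{
    const std::uint32_t mask = std::uint32_t(1) << bit;
    return (word.fetch_and(~mask) & mask) != 0;
}

} // namespace demo
```

When the previous value is not needed, the return value can simply be ignored, which gives the optimizer the same freedom the `_VOID` macro variants in the patch are meant to provide.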



Re: [x265] [PATCH] Use atomic bit test and set/reset operations on x86

2018-01-10 Thread Andrey Semashev

On 01/10/18 18:53, chen wrote:

Hi Andrey,

Our coding rules prohibit inline assembly, especially since the patch 
uses GCC extension syntax.


Ok, I see.

the "lock" prefix will lock the CPU bus, which will incur a greater 
penalty on multi-core systems.


Just for the record, the lock prefix is implemented much more 
efficiently nowadays and involves CPU cache management rather than bus 
locking. It used to lock the memory bus on early CPUs (I want to say 
before Pentium, but I'm not sure which exact architecture changed this). 
In any case, the patch does not introduce new lock instructions; it 
replaces the "lock; cmpxchg" loops that are normally generated for the 
atomic AND and OR operations with a single instruction.



At 2018-01-10 23:30:06, "Andrey Semashev" <andrey.semas...@gmail.com> wrote:

Any feedback on this one?

I've been using it for quite some time locally. It does seem to work 
slightly faster on my Sandy Bridge machine (it should be a gain of a few 
percent in fps, although I didn't save the benchmark numbers).


On 01/01/18 15:28, Andrey Semashev wrote:

# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1514809583 -10800
#  Mon Jan 01 15:26:23 2018 +0300
# Branch atomic_bit_opsv2
# Node ID 81529b6bd6adc8eb31162daeee44399dc1f95999
# Parent  ff02513b92c000c3bb3dcc51deb79af57f5358d5
Use atomic bit test and set/reset operations on x86.

The 'lock bts/btr' instructions are potentially more efficient than the
'lock cmpxchg' loops which are emitted to implement ATOMIC_AND and ATOMIC_OR
on x86. The commit adds new macros ATOMIC_BTS and ATOMIC_BTR which atomically
set/reset the specified bit in the integer and return the previous value of
the modified bit.

Since in many places of the code the result is not needed, two more macros are
provided as well: ATOMIC_BTS_VOID and ATOMIC_BTR_VOID. The effect of these
macros is the same except that they don't return the previous value. These
macros may generate a slightly more efficient code.

diff -r ff02513b92c0 -r 81529b6bd6ad source/common/threading.h
--- a/source/common/threading.h Fri Dec 22 18:23:24 2017 +0530
+++ b/source/common/threading.h Mon Jan 01 15:26:23 2018 +0300
@@ -80,6 +80,91 @@
  #define ATOMIC_ADD(ptr, val)  __sync_fetch_and_add((volatile int32_t*)ptr, val)
  #define GIVE_UP_TIME()        usleep(0)
  
+#if defined(__x86_64__) || defined(__i386__)

+
+namespace X265_NS {
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_set_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_reset_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_set(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_reset(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+}
+
+#define ATOMIC_BTS_VOID(ptr, bit)  atomic_bit_test_and_set_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR_VOID(ptr, bit)  atomic_bit_test_and_reset_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTS(ptr, bit)  atomic_bit_test_and_set((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR(pt


[x265] [PATCH 2 of 2] Do not use logging in a signal handler

2018-01-10 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1515597241 -10800
#  Wed Jan 10 18:14:01 2018 +0300
# Branch add_custom_logging
# Node ID fbd3aa7b346e337bc5fefb4f2e51fb74c2ded597
# Parent  a5ad2f32cd45697897fdf6b643c69291861520c7
Do not use logging in a signal handler.

Signal handlers can only call signal-safe functions, which the default logging
function is not (it uses a lot of unsafe C functions). User-specified logging
functions are also unlikely to be prepared to be called from a signal handler.

diff -r a5ad2f32cd45 -r fbd3aa7b346e source/output/reconplay.cpp
--- a/source/output/reconplay.cpp   Wed Jan 10 18:09:07 2018 +0300
+++ b/source/output/reconplay.cpp   Wed Jan 10 18:14:01 2018 +0300
@@ -42,8 +42,6 @@
 #ifndef _WIN32
 static void sigpipe_handler(int)
 {
-    if (ReconPlay::pipeValid)
-        g_x265_log(NULL, "exec", false, X265_LOG_ERROR, "pipe closed\n");
     ReconPlay::pipeValid = false;
 }
 #endif
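
A minimal sketch of the pattern this patch moves toward, assuming the closed-pipe condition may be reported later from ordinary code: the handler only sets a flag, and any logging happens outside signal context. The names g_pipeClosed, onPipeClosed and reportPipeClosed are hypothetical and are not part of x265.

```cpp
#include <csignal>
#include <cstdio>

// Hypothetical flag; the patch itself keeps using ReconPlay::pipeValid.
static volatile std::sig_atomic_t g_pipeClosed = 0;

// The handler only stores to a volatile sig_atomic_t object.
// It does not call stdio or any logging function.
extern "C" void onPipeClosed(int)
{
    g_pipeClosed = 1;
}

// Called from normal (non-signal) context, where stdio is allowed.
// Returns true if a closed pipe was observed and reported.
bool reportPipeClosed()
{
    if (!g_pipeClosed)
        return false;
    g_pipeClosed = 0;
    std::fputs("exec [error]: pipe closed\n", stderr);
    return true;
}
```

The handler can be installed for whichever signal is of interest, for example SIGPIPE on POSIX systems, and reportPipeClosed() polled from the thread that writes to the pipe.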


[x265] [PATCH 1 of 2] Add support for customizing logging

2018-01-10 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1515596947 -10800
#  Wed Jan 10 18:09:07 2018 +0300
# Branch add_custom_logging
# Node ID a5ad2f32cd45697897fdf6b643c69291861520c7
# Parent  2f3c4158cf3553030920708271bc43cdc79932a3
Add support for customizing logging

This commit adds support for customizing logging behavior. This can be used
by the application using libx265 to override the default behavior of printing
messages to the console with the application-specific logging system. This is
especially important if the console is not available or monitored for the
application (e.g. if it runs as a service or daemon).

The commit also increments API version.

diff -r 2f3c4158cf35 -r a5ad2f32cd45 source/CMakeLists.txt
--- a/source/CMakeLists.txt Thu Jan 04 12:37:01 2018 +0530
+++ b/source/CMakeLists.txt Wed Jan 10 18:09:07 2018 +0300
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 151)
+set(X265_BUILD 152)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
diff -r 2f3c4158cf35 -r a5ad2f32cd45 source/common/common.cpp
--- a/source/common/common.cpp  Thu Jan 04 12:37:01 2018 +0530
+++ b/source/common/common.cpp  Wed Jan 10 18:09:07 2018 +0300
@@ -102,49 +102,9 @@
     return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
 }
 
-void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...)
-{
-    if (param && level > param->logLevel)
-        return;
-    const int bufferSize = 4096;
-    char buffer[bufferSize];
-    int p = 0;
-    const char* log_level;
-    switch (level)
-    {
-    case X265_LOG_ERROR:
-        log_level = "error";
-        break;
-    case X265_LOG_WARNING:
-        log_level = "warning";
-        break;
-    case X265_LOG_INFO:
-        log_level = "info";
-        break;
-    case X265_LOG_DEBUG:
-        log_level = "debug";
-        break;
-    case X265_LOG_FULL:
-        log_level = "full";
-        break;
-    default:
-        log_level = "unknown";
-        break;
-    }
+x265_log_t g_x265_log = &general_log;
 
-    if (caller)
-        p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
-    va_list arg;
-    va_start(arg, fmt);
-    vsnprintf(buffer + p, bufferSize - p, fmt, arg);
-    va_end(arg);
-    fputs(buffer, stderr);
-}
-
-#if _WIN32
-/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we use _w functions.
- * For other OS we do not make any changes. */
-void general_log_file(const x265_param* param, const char* caller, int level, const char* fmt, ...)
+void general_log(const x265_param* param, const char* caller, int national, int level, const char* fmt, ...)
 {
     if (param && level > param->logLevel)
         return;
@@ -181,19 +141,31 @@
     vsnprintf(buffer + p, bufferSize - p, fmt, arg);
     va_end(arg);
 
-    HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
-    DWORD mode;
-    if (GetConsoleMode(console, &mode))
+#if _WIN32
+    if (national)
     {
-        wchar_t buf_utf16[bufferSize];
-        int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
-        if (length_utf16 > 0)
-            WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+        HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
+        DWORD mode;
+        if (GetConsoleMode(console, &mode))
+        {
+            wchar_t buf_utf16[bufferSize];
+            int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
+            if (length_utf16 > 0)
+                WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+            return;
+        }
     }
-    else
-        fputs(buffer, stderr);
+#else
+    // Suppress warnings about unused argument
+    (void)national;
+#endif
+
+    fputs(buffer, stderr);
 }
 
+#if _WIN32
+/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we use _w functions.
+ * For other OS we do not make any changes. */
 FILE* x265_fopen(const char* fileName, const char* mode)
 {
     wchar_t buf_utf16[MAX_PATH * 2], mode_utf16[16];
diff -r 2f3c4158cf35 -r a5ad2f32cd45 source/common/common.h
--- a/source/common/common.h    Thu Jan 04 12:37:01 2018 +0530
+++ b/source/common/common.h    Wed Jan 10 18:09:07 2018 +0300
@@ -411,16 +411,15 @@
 
 /* located in common.cpp */
 int64_t  x265_mdate(void);
-#define  x265_log(param, ...) general_log(param, "x265", __VA_ARGS__)
-#define  x265_log_file(param, ...) general_log_file(param, "x265

[x265] [PATCH 0 of 2] Add support for customizing logging v2

2018-01-10 Thread Andrey Semashev
This is a minor update from the previous patch. The changes are:

- Incremented X265_BUILD version.
- Moved set_log in x265_api above the delimiter comment to add extensions below it.
- Added a second patch to remove logging from a signal handler.


Re: [x265] [PATCH] Add support for customizing logging

2018-01-10 Thread Andrey Semashev

On 01/10/18 11:52, Pradeep Ramachandran wrote:


On Mon, Jan 1, 2018 at 6:05 PM, Andrey Semashev <andrey.semas...@gmail.com> wrote:


# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1514810081 -10800
#      Mon Jan 01 15:34:41 2018 +0300
# Branch add_custom_logging
# Node ID 248f862033e9fb01ee54d3e466dc284eeeb30365
# Parent  ff02513b92c000c3bb3dcc51deb79af57f5358d5
Add support for customizing logging.

This commit adds support for customizing logging behavior. This can be used
by the application using libx265 to override the default behavior of printing
messages to the console with the application-specific logging system. This is
especially important if the console is not available or monitored for the
application (e.g. if it runs as a service or daemon).


I like this patch - it is a good feature to have to support better
logging for applications that include libx265. We have to increase
BUILD_NUMBER as there is a change to the API, but that is a trivial change.


I can increment X265_BUILD in source/CMakeLists.txt and prepare a new patch.

Having the g_x265_log as a global may result in some issues when the 
application instantiates multiple instances of x265 but I think we can 
live with it for now; a follow-up patch to clean this up would be great :-).


The problem with this is that in quite a few places x265_log is invoked 
with no x265_params, so the logging system has no way to know the 
current encoder state or params, where the logging function pointer 
could be stored. For now, I think the only way is to use the global 
function pointer.



We will regress it and update the repo.


diff -r ff02513b92c0 -r 248f862033e9 source/common/common.cpp
--- a/source/common/common.cpp  Fri Dec 22 18:23:24 2017 +0530
+++ b/source/common/common.cpp  Mon Jan 01 15:34:41 2018 +0300
@@ -102,49 +102,9 @@
      return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
  }

-void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...)
-{
-    if (param && level > param->logLevel)
-        return;
-    const int bufferSize = 4096;
-    char buffer[bufferSize];
-    int p = 0;
-    const char* log_level;
-    switch (level)
-    {
-    case X265_LOG_ERROR:
-        log_level = "error";
-        break;
-    case X265_LOG_WARNING:
-        log_level = "warning";
-        break;
-    case X265_LOG_INFO:
-        log_level = "info";
-        break;
-    case X265_LOG_DEBUG:
-        log_level = "debug";
-        break;
-    case X265_LOG_FULL:
-        log_level = "full";
-        break;
-    default:
-        log_level = "unknown";
-        break;
-    }
+x265_log_t g_x265_log = &general_log;

-    if (caller)
-        p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
-    va_list arg;
-    va_start(arg, fmt);
-    vsnprintf(buffer + p, bufferSize - p, fmt, arg);
-    va_end(arg);
-    fputs(buffer, stderr);
-}
-
-#if _WIN32
-/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we use _w functions.
- * For other OS we do not make any changes. */
-void general_log_file(const x265_param* param, const char* caller, int level, const char* fmt, ...)
+void general_log(const x265_param* param, const char* caller, int national, int level, const char* fmt, ...)
  {
      if (param && level > param->logLevel)
          return;
@@ -181,19 +141,31 @@
      vsnprintf(buffer + p, bufferSize - p, fmt, arg);
      va_end(arg);

-    HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
-    DWORD mode;
-    if (GetConsoleMode(console, &mode))
+#if _WIN32
+    if (national)
      {
-        wchar_t buf_utf16[bufferSize];
-        int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
-        if (length_utf16 > 0)
-            WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+        HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
+        DWORD mode;
+        if (GetConsoleMode(console, &mode))
+        {
+            wchar_t buf_utf16[bufferSize];
+            int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
+            if (length_utf16 > 0)
+                WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+            return;
+        }
      }
-    else
-        fputs(bu

[x265] [PATCH] Fix invalid UTF-8 characters in the docs

2018-01-01 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1514810404 -10800
#  Mon Jan 01 15:40:04 2018 +0300
# Branch fix_docs_invalid_utf8_chars
# Node ID 3bf4f11669b20669cf2d64f7405774c41ad24b98
# Parent  ff02513b92c000c3bb3dcc51deb79af57f5358d5
Fix invalid UTF-8 characters in the docs.

This causes errors during docs generation on Linux.

diff -r ff02513b92c0 -r 3bf4f11669b2 doc/reST/cli.rst
--- a/doc/reST/cli.rst  Fri Dec 22 18:23:24 2017 +0530
+++ b/doc/reST/cli.rst  Mon Jan 01 15:40:04 2018 +0300
@@ -2058,7 +2058,7 @@
Example for MaxCLL=1000 candela per square meter, MaxFALL=400
candela per square meter:
 
-   --max-cll “1000,400”
+   --max-cll "1000,400"
 
Note that this string value will need to be escaped or quoted to
protect against shell expansion on many platforms. No default.


[x265] [PATCH] Add support for customizing logging

2018-01-01 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1514810081 -10800
#  Mon Jan 01 15:34:41 2018 +0300
# Branch add_custom_logging
# Node ID 248f862033e9fb01ee54d3e466dc284eeeb30365
# Parent  ff02513b92c000c3bb3dcc51deb79af57f5358d5
Add support for customizing logging.

This commit adds support for customizing logging behavior. This can be used
by the application using libx265 to override the default behavior of printing
messages to the console with the application-specific logging system. This is
especially important if the console is not available or monitored for the
application (e.g. if it runs as a service or daemon).

diff -r ff02513b92c0 -r 248f862033e9 source/common/common.cpp
--- a/source/common/common.cpp  Fri Dec 22 18:23:24 2017 +0530
+++ b/source/common/common.cpp  Mon Jan 01 15:34:41 2018 +0300
@@ -102,49 +102,9 @@
     return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
 }
 
-void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...)
-{
-    if (param && level > param->logLevel)
-        return;
-    const int bufferSize = 4096;
-    char buffer[bufferSize];
-    int p = 0;
-    const char* log_level;
-    switch (level)
-    {
-    case X265_LOG_ERROR:
-        log_level = "error";
-        break;
-    case X265_LOG_WARNING:
-        log_level = "warning";
-        break;
-    case X265_LOG_INFO:
-        log_level = "info";
-        break;
-    case X265_LOG_DEBUG:
-        log_level = "debug";
-        break;
-    case X265_LOG_FULL:
-        log_level = "full";
-        break;
-    default:
-        log_level = "unknown";
-        break;
-    }
+x265_log_t g_x265_log = &general_log;
 
-    if (caller)
-        p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
-    va_list arg;
-    va_start(arg, fmt);
-    vsnprintf(buffer + p, bufferSize - p, fmt, arg);
-    va_end(arg);
-    fputs(buffer, stderr);
-}
-
-#if _WIN32
-/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we use _w functions.
- * For other OS we do not make any changes. */
-void general_log_file(const x265_param* param, const char* caller, int level, const char* fmt, ...)
+void general_log(const x265_param* param, const char* caller, int national, int level, const char* fmt, ...)
 {
     if (param && level > param->logLevel)
         return;
@@ -181,19 +141,31 @@
     vsnprintf(buffer + p, bufferSize - p, fmt, arg);
     va_end(arg);
 
-    HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
-    DWORD mode;
-    if (GetConsoleMode(console, &mode))
+#if _WIN32
+    if (national)
     {
-        wchar_t buf_utf16[bufferSize];
-        int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
-        if (length_utf16 > 0)
-            WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+        HANDLE console = GetStdHandle(STD_ERROR_HANDLE);
+        DWORD mode;
+        if (GetConsoleMode(console, &mode))
+        {
+            wchar_t buf_utf16[bufferSize];
+            int length_utf16 = MultiByteToWideChar(CP_UTF8, 0, buffer, -1, buf_utf16, sizeof(buf_utf16)/sizeof(wchar_t)) - 1;
+            if (length_utf16 > 0)
+                WriteConsoleW(console, buf_utf16, length_utf16, &mode, NULL);
+            return;
+        }
     }
-    else
-        fputs(buffer, stderr);
+#else
+    // Suppress warnings about unused argument
+    (void)national;
+#endif
+
+    fputs(buffer, stderr);
 }
 
+#if _WIN32
+/* For Unicode filenames in Windows we convert UTF-8 strings to UTF-16 and we use _w functions.
+ * For other OS we do not make any changes. */
 FILE* x265_fopen(const char* fileName, const char* mode)
 {
     wchar_t buf_utf16[MAX_PATH * 2], mode_utf16[16];
diff -r ff02513b92c0 -r 248f862033e9 source/common/common.h
--- a/source/common/common.h    Fri Dec 22 18:23:24 2017 +0530
+++ b/source/common/common.h    Mon Jan 01 15:34:41 2018 +0300
@@ -411,16 +411,15 @@
 
 /* located in common.cpp */
 int64_t  x265_mdate(void);
-#define  x265_log(param, ...) general_log(param, "x265", __VA_ARGS__)
-#define  x265_log_file(param, ...) general_log_file(param, "x265", __VA_ARGS__)
-void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...);
+extern x265_log_t g_x265_log;
+#define  x265_log(param, ...) X265_NS::g_x265_log(param, "x265", 0, __VA_ARGS__)
+#define  x265_log_file(param, ...) X265_NS::g_x265_log(param, "x265", 1, __VA_ARGS__)
+void general_log(const x265_param* param, const char* caller, int national, int level, const char* fmt, ...);
 #if _WIN32
-void general_log_file(const x265_param* param, const char* caller, int level, const char* fmt, ...);
 FILE*    x265_fopen(const char* fileName, const char* mode);
 int  x265_unlink(const char* fileName);
 int  

[x265] [PATCH] Use atomic bit test and set/reset operations on x86

2018-01-01 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1514809583 -10800
#  Mon Jan 01 15:26:23 2018 +0300
# Branch atomic_bit_opsv2
# Node ID 81529b6bd6adc8eb31162daeee44399dc1f95999
# Parent  ff02513b92c000c3bb3dcc51deb79af57f5358d5
Use atomic bit test and set/reset operations on x86.

The 'lock bts/btr' instructions are potentially more efficient than the
'lock cmpxchg' loops which are emitted to implement ATOMIC_AND and ATOMIC_OR
on x86. The commit adds new macros ATOMIC_BTS and ATOMIC_BTR which atomically
set/reset the specified bit in the integer and return the previous value of
the modified bit.

Since in many places of the code the result is not needed, two more macros are
provided as well: ATOMIC_BTS_VOID and ATOMIC_BTR_VOID. The effect of these
macros is the same except that they don't return the previous value. These
macros may generate slightly more efficient code.
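
For readers unfamiliar with the semantics, here is a portable sketch of what the two value-returning operations do, written with std::atomic instead of inline assembly. The function names are illustrative only and do not appear in the patch; whether a given compiler turns this into 'lock bts/btr' is not guaranteed.

```cpp
#include <atomic>
#include <cstdint>

// Atomically sets bit `bit` in `word` and returns the previous value of that bit.
inline bool bitTestAndSet(std::atomic<uint32_t>& word, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    // fetch_or returns the value held before the OR was applied.
    return (word.fetch_or(mask) & mask) != 0;
}

// Atomically clears bit `bit` in `word` and returns the previous value of that bit.
inline bool bitTestAndReset(std::atomic<uint32_t>& word, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    // fetch_and returns the value held before the AND was applied.
    return (word.fetch_and(~mask) & mask) != 0;
}
```

The _VOID variants described above correspond to calling these functions and ignoring the result.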

diff -r ff02513b92c0 -r 81529b6bd6ad source/common/threading.h
--- a/source/common/threading.h Fri Dec 22 18:23:24 2017 +0530
+++ b/source/common/threading.h Mon Jan 01 15:26:23 2018 +0300
@@ -80,6 +80,91 @@
 #define ATOMIC_ADD(ptr, val)  __sync_fetch_and_add((volatile int32_t*)ptr, val)
 #define GIVE_UP_TIME()        usleep(0)
 
+#if defined(__x86_64__) || defined(__i386__)
+
+namespace X265_NS {
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_set_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_reset_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_set(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_reset(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+}
+
+#define ATOMIC_BTS_VOID(ptr, bit)  atomic_bit_test_and_set_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR_VOID(ptr, bit)  atomic_bit_test_and_reset_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTS(ptr, bit)  atomic_bit_test_and_set((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR(ptr, bit)  atomic_bit_test_and_reset((uint32_t*)(ptr), (bit))
+
+#endif // defined(__x86_64__) || defined(__i386__)
+
 #elif defined(_MSC_VER)   /* Windows atomic intrinsics */
 
 #include 
@@ -93,8 +178,26 @@
 #define ATOMIC_AND(ptr, mask) _InterlockedAnd((volatile LONG*)ptr, (LONG)mask)
 #define GIVE_UP_TIME()        Sleep(0)
 
+#if defined(_M_IX86) || defined(_M_X64)
+#define ATOMIC_BTS(ptr, bit)  (!!_interlockedbittestandset((long*)(ptr), (bit)))
+#define ATOMIC_BTR(ptr, bit)  (!!_interlockedbittestandreset((long*)(ptr), (bit)))
+#endif // defined(_M_IX86) || defined(_M_X64)
+
 #endif // ifdef __GNUC__
 
+#if !defined(ATOMIC_BTS)
+#define ATOMIC_BTS(ptr, bit)  (!!(ATOMIC_OR(ptr, (1u << (bit))) & (1u << (bit))))
+#endif
+#if !defined(ATOMIC_BTR)
+#define ATOMIC_BTR(ptr, bit)  (!!(ATOMIC_AND(ptr, ~(1u << (bit))) & (1u << (bit))))
+#endif
+#if !defined(ATOMIC_BTS_VOID)
+#define ATOMIC_BTS_VOID ATOMIC_BTS
+#endif
+#if !defined(ATOMIC_BTR_VOID)
+#define ATOMIC_BTR_VOID ATOMIC_BTR
+#endif
+
 namespace X265_NS {
 // x265 private namespace
 
diff -r ff02513b92c0 -r 81529b6bd6ad source/common/threadpool.cpp
--- a/source/common/threadpool.cpp  F

Re: [x265] [PATCH 2 of 2] x86: Change assembler from YASM to NASM

2017-11-21 Thread Andrey Semashev

On 11/21/17 15:25, Sean McGovern wrote:

Hi,

Is this really necessary?

Ubuntu 16.04 Xenial Xerus only ships with nasm 2.11.08.


yasm does not support AVX-512 and has seen very little activity recently [1].
nasm has supported AVX-512 since 2.13 [2]. Even if x265 does not use AVX-512
currently, this is the right way forward in the long run. It may be
possible to lower the minimum required nasm version, though, until
AVX-512 support is added.


BTW, x264 and ffmpeg made a similar move recently.

[1]: https://github.com/yasm/yasm/commits/master
[2]: http://www.nasm.us/doc/nasmdocc.html


   Original Message
From: vign...@multicorewareinc.com
Sent: November 21, 2017 12:07 AM
To: x265-devel@videolan.org
Reply-to: x265-devel@videolan.org
Subject: [x265] [PATCH 2 of 2] x86: Change assembler from YASM to NASM

# HG changeset patch
# User Vignesh Vijayakumar
# Date 1509595841 -19800
#  Thu Nov 02 09:40:41 2017 +0530
# Node ID 16ea92bf3627c6de43d583554df294dbbfd8fa8a
# Parent  182bfd0d5af929a801a08b35ee863d79eadb2833
x86: Change assembler from YASM to NASM

Supports NASM versions 2.13 and greater

diff -r 182bfd0d5af9 -r 16ea92bf3627 source/CMakeLists.txt
--- a/source/CMakeLists.txt Thu Nov 02 09:39:58 2017 +0530
+++ b/source/CMakeLists.txt Thu Nov 02 09:40:41 2017 +0530
@@ -323,15 +323,15 @@
  execute_process(COMMAND ${CMAKE_CXX_COMPILER} -dumpversion OUTPUT_VARIABLE CC_VERSION)
endif(GCC)

-find_package(Yasm)
+find_package(Nasm)
if(ARM OR CROSS_COMPILE_ARM)
  option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" ON)
-elseif(YASM_FOUND AND X86)
-    if (YASM_VERSION_STRING VERSION_LESS "1.2.0")
-    message(STATUS "Yasm version ${YASM_VERSION_STRING} is too old. 1.2.0 or later required")
+elseif(NASM_FOUND AND X86)
+    if (NASM_VERSION_STRING VERSION_LESS "2.13.0")
+    message(STATUS "Nasm version ${NASM_VERSION_STRING} is too old. 2.13.0 or later required")
  option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" OFF)
  else()
-    message(STATUS "Found Yasm ${YASM_VERSION_STRING} to build assembly primitives")
+    message(STATUS "Found Nasm ${NASM_VERSION_STRING} to build assembly primitives")
  option(ENABLE_ASSEMBLY "Enable use of assembly coded primitives" ON)
  endif()
else()
@@ -517,18 +517,18 @@
  list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
  add_custom_command(
  OUTPUT ${ASM}.${SUFFIX}
-    COMMAND ${YASM_EXECUTABLE} ARGS ${YASM_FLAGS} ${ASM_SRC} -o ${ASM}.${SUFFIX}
+    COMMAND ${NASM_EXECUTABLE} ARGS ${NASM_FLAGS} ${ASM_SRC} -o ${ASM}.${SUFFIX}
  DEPENDS ${ASM_SRC})
  endforeach()
  endif()
endif()
source_group(ASM FILES ${ASM_SRCS})
if(ENABLE_HDR10_PLUS)
-    add_library(x265-static STATIC $ 
$ $ ${ASM_OBJS} ${ASM_SRCS})
+    add_library(x265-static STATIC $ 
$ $ ${ASM_OBJS})
  add_library(hdr10plus-static STATIC $)
  set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
else()
-    add_library(x265-static STATIC $ 
$ ${ASM_OBJS} ${ASM_SRCS})
+    add_library(x265-static STATIC $ 
$ ${ASM_OBJS})
endif()
if(NOT MSVC)
  set_target_properties(x265-static PROPERTIES OUTPUT_NAME x265)
@@ -686,11 +686,11 @@
  if(ENABLE_HDR10_PLUS)
  add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
  x265.cpp x265.h x265cli.h
-    $ $ 
$ ${ASM_OBJS} ${ASM_SRCS})
+    $ $ 
$ ${ASM_OBJS})
  else()
  add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} 
${GETOPT}
  x265.cpp x265.h x265cli.h
-    $ $ 
${ASM_OBJS} ${ASM_SRCS})
+    $ $ 
${ASM_OBJS})
  endif()
  else()
  add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} 
${X265_RC_FILE}
diff -r 182bfd0d5af9 -r 16ea92bf3627 source/cmake/CMakeASM_NASMInformation.cmake
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/source/cmake/CMakeASM_NASMInformation.cmake   Thu Nov 02 09:40:41 2017 +0530
@@ -0,0 +1,68 @@
+set(ASM_DIALECT "_NASM")
+set(CMAKE_ASM${ASM_DIALECT}_SOURCE_FILE_EXTENSIONS asm)
+
+if(X64)
+    list(APPEND ASM_FLAGS -DARCH_X86_64=1 -I ${CMAKE_CURRENT_SOURCE_DIR}/../common/x86/)
+    if(ENABLE_PIC)
+    list(APPEND ASM_FLAGS -DPIC)
+    endif()
+    if(APPLE)
+    set(ARGS -f macho64 -DPREFIX)
+    elseif(UNIX AND NOT CYGWIN)
+    set(ARGS -f elf64)
+    else()
+    set(ARGS -f win64)
+    endif()
+else()
+    list(APPEND ASM_FLAGS -DARCH_X86_64=0 -I ${CMAKE_CURRENT_SOURCE_DIR}/../common/x86/)
+    if(APPLE)
+    set(ARGS -f macho32 -DPREFIX)
+    elseif(UNIX AND NOT CYGWIN)
+    set(ARGS -f elf32)
+    else()
+    set(ARGS -f win32 -DPREFIX)
+    endif()
+endif()
+
+if(GCC)
+    list(APPEND ASM_FLAGS -DHAVE_ALIGNED_STACK=1)
+else()
+    list(APPEND ASM_FLAGS -DHAVE_ALIGNED_STACK=0)

Re: [x265] [PATCH] threadpool: fix memory leak

2017-07-21 Thread Andrey Semashev

On 07/21/17 09:29, ar...@multicorewareinc.com wrote:

# HG changeset patch
# User Aruna Matheswaran 
# Date 1500457328 -19800
#  Wed Jul 19 15:12:08 2017 +0530
# Branch stable
# Node ID fd354d5ec1328a000c24e1551804a1ce569ec3b0
# Parent  adbcc90bdef36b50a091deb5b0d0ad77debfbee7
threadpool: fix memory leak

diff -r adbcc90bdef3 -r fd354d5ec132 source/common/threadpool.cpp
--- a/source/common/threadpool.cpp  Thu Jul 13 16:50:18 2017 +0530
+++ b/source/common/threadpool.cpp  Wed Jul 19 15:12:08 2017 +0530
@@ -454,6 +454,7 @@
  if ((nodeMaskPerPool[node] >> j) & 1)
  len += sprintf(nodesstr + len, ",%d", j);
  x265_log(p, X265_LOG_INFO, "Thread pool %d using %d threads on numa nodes %s\n", i, numThreads, nodesstr + 1);
+free(nodesstr);


This applies free() to a pointer obtained with new[]. It should be
delete[] or, better yet, the original new[] should be replaced with a buffer on the
stack.
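
A rough sketch of the stack-buffer alternative suggested above, so that neither free() nor delete[] is needed. The function name formatNodeList, the 64-node limit and the buffer size are assumptions made for illustration and do not come from threadpool.cpp.

```cpp
#include <cstdio>
#include <string>

// Formats the indices of the set bits in nodeMask as a comma-separated list,
// e.g. mask 0b101 gives "0,2". The buffer lives on the stack, so there is
// nothing to release afterwards.
std::string formatNodeList(unsigned long long nodeMask, int numNodes)
{
    char nodes[64 * 4];  // assumed upper bound: 64 nodes, at most ",63" per node
    int len = 0;
    nodes[0] = '\0';
    for (int j = 0; j < numNodes && j < 64; j++)
        if ((nodeMask >> j) & 1)
            len += std::snprintf(nodes + len, sizeof(nodes) - len, ",%d", j);
    // Skip the leading comma, as the original code does with nodesstr + 1.
    return len > 0 ? std::string(nodes + 1) : std::string();
}
```

If heap allocation is kept instead, memory obtained with new[] has to be released with delete[], not free().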



  }
  else
  x265_log(p, X265_LOG_INFO, "Thread pool created using %d threads\n", numThreads);




[x265] Negative integer shifts

2017-04-26 Thread Andrey Semashev

Hi,

While compiling 2.4 I'm seeing lots of warnings like this:

.../source/common/ipfilter.cpp: In instantiation of ‘void 
{anonymous}::interp_horiz_ps_c(const pixel*, intptr_t, int16_t*, 
intptr_t, int, int) [with int N = 8; int width = 4; int height = 4; 
pixel = unsigned char; intptr_t = long int; int16_t = short int]’:

.../source/common/ipfilter.cpp:417:5:   required from here
.../source/common/ipfilter.cpp:126:36: warning: left shift of negative 
value [-Wshift-negative-value]

 int offset = -IF_INTERNAL_OFFS << shift;
  ~~^~~~

Left-shifting negative signed integers is undefined behavior in C++. 
I've attached a patch that resolves the warnings. The patch assumes 2's 
complement signed integers and that the shift does not introduce an 
arithmetic overflow.
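
The idea of the fix can be sketched in isolation as follows. The constants 
below are hypothetical stand-ins for IF_INTERNAL_OFFS and shift (the real 
values come from the x265 headers and the configured bit depth), and the 
function name is made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-ins for IF_INTERNAL_OFFS and shift.
constexpr int kInternalOffs = 1 << 13;
constexpr int kShift = 6;

// Shift the non-negative magnitude first and negate afterwards, so the
// left operand of << is never negative.
constexpr int negated_shifted_offset(int magnitude, int shift)
{
    return -(magnitude << shift);
}

// Valid as long as magnitude << shift does not overflow int.
static_assert(negated_shifted_offset(kInternalOffs, kShift) == -524288,
              "shift-then-negate yields -(8192 * 64)");
```

The parenthesized form is only well-defined while the shifted magnitude 
fits in int, which is the overflow assumption stated above.
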
Index: x265-2.4+0-e7a4dd48293b/source/common/ipfilter.cpp
===
--- x265-2.4+0-e7a4dd48293b.orig/source/common/ipfilter.cpp	2017-04-26 17:22:40.548759520 +0300
+++ x265-2.4+0-e7a4dd48293b/source/common/ipfilter.cpp	2017-04-26 17:44:38.960479060 +0300
@@ -123,7 +123,7 @@ void interp_horiz_ps_c(const pixel* src,
     const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
+    int offset = -(IF_INTERNAL_OFFS << shift);
     int blkheight = height;
 
     src -= N / 2 - 1;
@@ -209,7 +209,7 @@ void interp_vert_ps_c(const pixel* src,
     const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
+    int offset = -(IF_INTERNAL_OFFS << shift);
 
     src -= (N / 2 - 1) * srcStride;
 


[x265] [PATCH] Use atomic bit test and set/reset operations on x86

2016-11-16 Thread Andrey Semashev
# HG changeset patch
# User Andrey Semashev <andrey.semas...@gmail.com>
# Date 1475709624 -10800
#  Thu Oct 06 02:20:24 2016 +0300
# Branch atomic_bit_ops
# Node ID 539c83edceb540e7785719876a161a48382b4c09
# Parent  af3678bc1dff6eb3df5879ac49fdb532ce8bd6ac
Use atomic bit test and set/reset operations on x86.

The 'lock bts/btr' instructions are potentially more efficient than the
'lock cmpxchg' loops which are emitted to implement ATOMIC_AND and ATOMIC_OR
on x86. The commit adds new macros ATOMIC_BTS and ATOMIC_BTR which atomically
set/reset the specified bit in the integer and return the previous value of
the modified bit.

Since the result is not needed in many places in the code, two more macros
are provided as well: ATOMIC_BTS_VOID and ATOMIC_BTR_VOID. The effect of these
macros is the same except that they don't return the previous value. These
macros may generate slightly more efficient code.
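
For comparison, the semantics described above (set or clear one bit and
report its previous value) can be sketched portably with std::atomic. This
is only an illustration and not the code in this patch: the function names
are made up for the example, and whether a compiler lowers
fetch_or/fetch_and to 'lock bts/btr' is not guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative equivalent of ATOMIC_BTS: atomically sets the bit and
// returns whether it was already set.
inline bool bit_test_and_set(std::atomic<uint32_t>& word, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    return (word.fetch_or(mask) & mask) != 0;
}

// Illustrative equivalent of ATOMIC_BTR: atomically clears the bit and
// returns whether it was set before.
inline bool bit_test_and_reset(std::atomic<uint32_t>& word, uint32_t bit)
{
    const uint32_t mask = 1u << bit;
    return (word.fetch_and(~mask) & mask) != 0;
}
```

The _VOID variants correspond to calling the same operations and ignoring
the return value.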

diff -r af3678bc1dff -r 539c83edceb5 source/common/threading.h
--- a/source/common/threading.h Wed Oct 05 11:58:49 2016 +0530
+++ b/source/common/threading.h Thu Oct 06 02:20:24 2016 +0300
@@ -80,6 +80,91 @@
 #define ATOMIC_ADD(ptr, val)  __sync_fetch_and_add((volatile int32_t*)ptr, val)
 #define GIVE_UP_TIME()        usleep(0)
 
+#if defined(__x86_64__) || defined(__i386__)
+
+namespace X265_NS {
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_set_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) void atomic_bit_test_and_reset_void(uint32_t* ptr, uint32_t bit)
+{
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_set(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btsl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+inline __attribute__((always_inline)) bool atomic_bit_test_and_reset(uint32_t* ptr, uint32_t bit)
+{
+    bool res;
+#if defined(__GCC_ASM_FLAG_OUTPUTS__)
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        : [mem] "+m" (*ptr), [res] "=@ccc" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#else
+    res = false; // to avoid false dependency on the higher part of the result register
+    __asm__ __volatile__
+    (
+        "lock; btrl %[bit], %[mem]\n\t"
+        "setc %[res]\n\t"
+        : [mem] "+m" (*ptr), [res] "+q" (res)
+        : [bit] "Kq" (bit)
+        : "memory"
+    );
+#endif
+    return res;
+}
+
+}
+
+#define ATOMIC_BTS_VOID(ptr, bit)  atomic_bit_test_and_set_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR_VOID(ptr, bit)  atomic_bit_test_and_reset_void((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTS(ptr, bit)  atomic_bit_test_and_set((uint32_t*)(ptr), (bit))
+#define ATOMIC_BTR(ptr, bit)  atomic_bit_test_and_reset((uint32_t*)(ptr), (bit))
+
+#endif // defined(__x86_64__) || defined(__i386__)
+
 #elif defined(_MSC_VER)   /* Windows atomic intrinsics */
 
 #include <intrin.h>
@@ -93,8 +178,26 @@
 #define ATOMIC_AND(ptr, mask) _InterlockedAnd((volatile LONG*)ptr, (LONG)mask)
 #define GIVE_UP_TIME()        Sleep(0)
 
+#if defined(_M_IX86) || defined(_M_X64)
+#define ATOMIC_BTS(ptr, bit)  (!!_interlockedbittestandset((long*)(ptr), (bit)))
+#define ATOMIC_BTR(ptr, bit)  (!!_interlockedbittestandreset((long*)(ptr), (bit)))
+#endif // defined(_M_IX86) || defined(_M_X64)
+
 #endif // ifdef __GNUC__
 
+#if !defined(ATOMIC_BTS)
+#define ATOMIC_BTS(ptr, bit)  (!!(ATOMIC_OR(ptr, (1u << (bit))) & (1u << (bit))))
+#endif
+#if !defined(ATOMIC_BTR)
+#define ATOMIC_BTR(ptr, bit)  (!!(ATOMIC_AND(ptr, ~(1u << (bit))) & (1u << (bit))))
+#endif
+#if !defined(ATOMIC_BTS_VOID)
+#define ATOMIC_BTS_VOID ATOMIC_BTS
+#endif
+#if !defined(ATOMIC_BTR_VOID)
+#define ATOMIC_BTR_VOID ATOMIC_BTR
+#endif
+
 namespace X265_NS {
 // x265 private namespace
 
diff -r af3678bc1dff -r 539c83edceb5 source/common/threadpool.cpp
--- a/source/common/threadpool.cpp  W