Re: [PHP-DEV] reasonability of change the mbfl library

2019-02-12 Thread Legale Legage
Hello, internals. As Rowan Collins suggested, I've replaced the lookup table with simple macros:

#define UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE ((code_unit & 0xFC00) == 0xD800)
#define UTF16_BE_CODE_UNIT_IS_HIGH_SURROGATE ((code_unit & 0x00FC) == 0x00D8)

I ran the benchmarks again. Here is the

Re: [PHP-DEV] reasonability of change the mbfl library

2019-02-11 Thread Legale Legage
Got it. Thanks. On Mon, Feb 11, 2019, 18:00 Dan Ackroyd wrote: > On Sun, 10 Feb 2019 at 12:29, Legale Legage wrote: > > https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38 > > To do, or not to do: that is the

Re: [PHP-DEV] reasonability of change the mbfl library

2019-02-11 Thread Dan Ackroyd
On Sun, 10 Feb 2019 at 12:29, Legale Legage wrote: > https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38 > To do, or not to do: that is the question. > What do you think? Opening separate pull requests for

Re: [PHP-DEV] reasonability of change the mbfl library

2019-02-10 Thread Legale Legage
Good idea, thanks. It should be a bit slower than the lookup table, but faster than it is now. On Sun, Feb 10, 2019, 21:02 Rowan Collins wrote: > On 10/02/2019 12:29, Legale Legage wrote: > > This concept can be used for the UTF-16 encoding, but the table size would be 65536 bytes against 256 bytes for the UTF-8

Re: [PHP-DEV] reasonability of change the mbfl library

2019-02-10 Thread Rowan Collins
On 10/02/2019 12:29, Legale Legage wrote: > This concept can be used for the UTF-16 encoding, but the table size would be 65536 bytes against 256 bytes for the UTF-8 table. Rather than two 65-kilobyte lookup tables with most entries identical, would it be reasonable to use a bit mask to check for

[PHP-DEV] reasonability of change the mbfl library

2019-02-10 Thread Legale Legage
Hello, internals! While I was working on a new function, mb_str_split (https://wiki.php.net/rfc/mb_str_split), for the mbstring extension, I noticed an opportunity to significantly improve the performance of the mbfl library for the UTF-16 encoding. Currently, all variable-length encodings are processed