Hello, internals.
As Rowan Collins suggested, I've replaced the lookup table with simple macros:
#define UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE ((code_unit & 0xFC00) == 0xD800)
#define UTF16_BE_CODE_UNIT_IS_HIGH_SURROGATE ((code_unit & 0x00FC) == 0x00D8)
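For illustration, the bit-mask check can be sketched as small functions. This is only a sketch, not the code from the pull request: the function names are hypothetical, and it assumes the code unit is read as a uint16_t either in host byte order or with its two bytes swapped. Note that in C `==` binds more tightly than `&`, so the masking needs its own parentheses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; names are not from php-src. */

/* Code unit in host byte order: high surrogates are 0xD800..0xDBFF,
 * i.e. the top six bits are 110110. */
static int utf16_is_high_surrogate(uint16_t code_unit)
{
    return (code_unit & 0xFC00) == 0xD800;
}

/* Code unit whose two bytes are swapped relative to host order:
 * the same six bits now sit in the low byte. */
static int utf16_swapped_is_high_surrogate(uint16_t code_unit)
{
    return (code_unit & 0x00FC) == 0x00D8;
}

/* Count code points in a buffer of host-order UTF-16 code units.
 * A high surrogate followed by another unit is treated as one
 * two-unit code point; the low surrogate is not validated. */
static size_t utf16_count_code_points(const uint16_t *units, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        i += (utf16_is_high_surrogate(units[i]) && i + 1 < n) ? 2 : 1;
        count++;
    }
    return count;
}
```

Compared with a 65536-byte lookup table, each check costs one AND and one comparison and touches no extra memory.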
I repeated the benchmarks again. Here is the
Got it. Thanks.
On Mon, Feb 11, 2019, 18:00 Dan Ackroyd wrote:
> On Sun, 10 Feb 2019 at 12:29, Legale Legage wrote:
> >
> >
> >
> https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38
> >
> > To do, or not to do: that is the
On Sun, 10 Feb 2019 at 12:29, Legale Legage wrote:
>
>
> https://github.com/php/php-src/pull/3715/commits/d868059626290b7ba773b957045e08c3efb1d603#diff-22d593ced03b2cb94450d9f9990865c8R38
>
> To do, or not to do: that is the question.
> What do you think?
Opening separate pull requests for
Good idea, thanks. It should be a bit slower than the lookup table, but
faster than the current code.
On Sun, Feb 10, 2019, 21:02 Rowan Collins wrote:
> On 10/02/2019 12:29, Legale Legage wrote:
> > This conception can be used for the utf-16 encoding, but table size
> > would be 65536 bytes against 256 byte for the utf-8
On 10/02/2019 12:29, Legale Legage wrote:
> This conception can be used for the utf-16 encoding, but table size
> would be 65536 bytes against 256 byte for the utf-8 table.
Rather than two 65 kilobyte lookup tables with most entries identical,
would it be reasonable to use a bit mask to check for
Hello, internals!
While I was working on a new function, mb_str_split
(https://wiki.php.net/rfc/mb_str_split), for the mbstring extension, I
noticed an opportunity to significantly improve the performance of the
mbfl library for the UTF-16 encoding.
Currently, all variable-length encodings are processed