Re: Encode UTF-8 optimizations

2016-11-01 Thread pali
Hi! New Encode 2.87 with lots of fixes for Encode.xs and Encode::MIME::Header was released. Can you sync/import it into blead?

Re: Encode UTF-8 optimizations

2016-10-27 Thread pali
On Sunday 25 September 2016 10:49:41 Karl Williamson wrote:
> On 09/25/2016 04:06 AM, p...@cpan.org wrote:
> >On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote:
> >>On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> >>>We may change Encode in blead too, since it already differs

Re: Encode UTF-8 optimizations

2016-09-25 Thread Karl Williamson
On 09/25/2016 04:06 AM, p...@cpan.org wrote: On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote: On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote: We may change Encode in blead too, since it already differs from cpan. I'll have to get Sawyer's opinion on that. But the next st

Re: Encode UTF-8 optimizations

2016-09-25 Thread pali
On Thursday 01 September 2016 09:30:08 p...@cpan.org wrote:
> On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> > We may change Encode in blead too, since it already differs from
> > cpan. I'll have to get Sawyer's opinion on that. But the next
> > step is for me to fix Devel::PPPort t

Re: Encode UTF-8 optimizations

2016-09-01 Thread pali
On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> On 08/31/2016 03:43 PM, p...@cpan.org wrote:
> >On Monday 29 August 2016 17:00:00 Karl Williamson wrote:
> >>If you'd be willing to test this out, especially the performance
> >>parts that would be great!
> >[snip]
> >>There are 2 experi

Re: Encode UTF-8 optimizations

2016-08-31 Thread Karl Williamson
On 08/31/2016 03:43 PM, p...@cpan.org wrote: On Monday 29 August 2016 17:00:00 Karl Williamson wrote: If you'd be willing to test this out, especially the performance parts that would be great! [snip] There are 2 experimental performance commits. If you want to see if they actually improve pe

Re: Encode UTF-8 optimizations

2016-08-31 Thread pali
On Monday 29 August 2016 17:00:00 Karl Williamson wrote:
> If you'd be willing to test this out, especially the performance
> parts that would be great!
[snip]
> There are 2 experimental performance commits. If you want to see if
> they actually improve performance by doing a before/after compare

Re: Encode UTF-8 optimizations

2016-08-29 Thread Karl Williamson
On 08/25/2016 01:48 AM, p...@cpan.org wrote: Anyway, if you need some help with Encode module or something different, let me know. As I want to have UTF-8 support in Encode correctly working... I now have a branch with my proposed changes at: http://perl5.git.perl.org/perl.git/shortlog/refs/hea

Re: Encode UTF-8 optimizations

2016-08-25 Thread pali
On Wednesday 24 August 2016 22:49:21 Karl Williamson wrote: > On 08/22/2016 02:47 PM, p...@cpan.org wrote: > > snip > > >I added some tests for overlong sequences. Only for ASCII platforms, tests > >for EBCDIC > >are missing (sorry, I do not have access to any EBCDIC platform for testing). > >

Re: Encode UTF-8 optimizations

2016-08-24 Thread Karl Williamson
On 08/22/2016 02:47 PM, p...@cpan.org wrote: snip I added some tests for overlong sequences. Only for ASCII platforms, tests for EBCDIC are missing (sorry, I do not have access to any EBCDIC platform for testing). It's fine to skip those tests on EBCDIC. > > Anyway, how it behave on EBCDI

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Monday 22 August 2016 23:38:05 Karl Williamson wrote:
> And, I'd rather not tweak it to call UTF8_IS_SUPER first,
> because that relies on knowing what the current internal
> implementation is.
Then maybe add a new macro isUTF8_CHAR_STRICT which only checks whether a character is strictly valid UTF-8? I

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
(this only applies for strict UTF-8)
On Monday 22 August 2016 23:19:51 Karl Williamson wrote:
> The code could be tweaked to call UTF8_IS_SUPER first, but I'm
> asserting that an optimizing compiler will see that any call to
> is_utf8_char_slow() is pointless, and will optimize it out.
Such optim

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 03:19 PM, Karl Williamson wrote: On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a code point th

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 02:47 PM, p...@cpan.org wrote: > And I think you misunderstand when is_utf8_char_slow() is called. It is > called only when the next byte in the input indicates that the only > legal UTF-8 that might follow would be for a code point that is at least > U+200000, almost twice as high
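
A rough sketch of where that threshold comes from, assuming an ASCII platform and assuming the fast path handles sequences of up to four bytes. The function names are hypothetical and are not Perl core API. A four-byte sequence carries 21 payload bits, so the first code point that needs five bytes is U+200000, which is a little less than twice U+10FFFF.

```c
#include <stdbool.h>

/* Largest code point that fits in an n-byte sequence (n = 2..6) under the
 * classic UTF-8 bit layout: the lead byte carries (7 - n) payload bits and
 * each continuation byte carries 6 more. */
static unsigned long max_code_point_for_length(unsigned n)
{
    unsigned payload_bits = (7 - n) + 6 * (n - 1);
    return (1UL << payload_bits) - 1;
}

/* Hypothetical predicate: true when the lead byte announces a sequence of
 * five or more bytes (0xF8 and up), i.e. a code point of at least U+200000.
 * Under the assumption above, only such input would reach a slow path. */
static bool lead_byte_needs_slow_path(unsigned char lead)
{
    return lead >= 0xF8;
}
```

Every code point in that range is far above U+10FFFF, so under these assumptions a strict UTF-8 check can reject it from the lead byte alone.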

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Monday 22 August 2016 21:43:59 Karl Williamson wrote: > On 08/22/2016 07:05 AM, p...@cpan.org wrote: > > On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: > >> On 08/21/2016 02:34 AM, p...@cpan.org wrote: > >>> On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: > Top posting. >

Re: Encode UTF-8 optimizations

2016-08-22 Thread Karl Williamson
On 08/22/2016 07:05 AM, p...@cpan.org wrote: On Sunday 21 August 2016 08:49:08 Karl Williamson wrote: On 08/21/2016 02:34 AM, p...@cpan.org wrote: On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: Top posting. Attached is my alternative patch. It effectively uses a different algorithm

Re: Encode UTF-8 optimizations

2016-08-22 Thread pali
On Sunday 21 August 2016 08:49:08 Karl Williamson wrote:
> On 08/21/2016 02:34 AM, p...@cpan.org wrote:
> >On Sunday 21 August 2016 03:10:40 Karl Williamson wrote:
> >>Top posting.
> >>
> >>Attached is my alternative patch. It effectively uses a different
> >>algorithm to avoid decoding the input

Re: Encode UTF-8 optimizations

2016-08-21 Thread Karl Williamson
On 08/21/2016 02:34 AM, p...@cpan.org wrote: On Sunday 21 August 2016 03:10:40 Karl Williamson wrote: Top posting. Attached is my alternative patch. It effectively uses a different algorithm to avoid decoding the input into code points, and to copy all spans of valid input at once, instead of

Re: Encode UTF-8 optimizations

2016-08-21 Thread pali
On Sunday 21 August 2016 03:10:40 Karl Williamson wrote:
> Top posting.
>
> Attached is my alternative patch. It effectively uses a different
> algorithm to avoid decoding the input into code points, and to copy
> all spans of valid input at once, instead of character at a time.
>
> And it uses

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson
On 08/20/2016 08:33 PM, Aristotle Pagaltzis wrote:
> * Karl Williamson [2016-08-21 03:12]:
> > That should be done anyway to make sure we've got less buggy Unicode
> > handling code available to older modules.
> I think you meant “available to older perls”?
Yes, thanks

Re: Encode UTF-8 optimizations

2016-08-20 Thread Aristotle Pagaltzis
* Karl Williamson [2016-08-21 03:12]:
> That should be done anyway to make sure we've got less buggy Unicode
> handling code available to older modules.
I think you meant “available to older perls”?

Re: Encode UTF-8 optimizations

2016-08-20 Thread Karl Williamson
Top posting. Attached is my alternative patch. It effectively uses a different algorithm to avoid decoding the input into code points, and to copy all spans of valid input at once, instead of character at a time. And it uses only currently available functions. Any of these that are missing
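
The idea described in this message (validate without decoding to code points, then copy each valid span in one go) might look roughly like the sketch below. This is not the attached patch. The names strict_utf8_char_len and copy_valid_spans are hypothetical, writing U+FFFD for each invalid byte is an assumed policy, and "strict" is assumed to mean rejecting overlong forms, surrogates and code points above U+10FFFF on an ASCII platform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length (1..4) of the strictly valid UTF-8
 * character starting at s, or 0 if the bytes at s are not one. Works on
 * byte ranges only; no code point is ever computed. */
static size_t strict_utf8_char_len(const unsigned char *s,
                                   const unsigned char *end)
{
    size_t avail = (size_t)(end - s);
    unsigned char b0, lo, hi;

    if (avail == 0)
        return 0;
    b0 = s[0];
    if (b0 < 0x80)
        return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)          /* 0xC0, 0xC1 are overlong */
        return (avail >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (avail < 3)
            return 0;
        lo = (b0 == 0xE0) ? 0xA0 : 0x80;   /* excludes overlong forms */
        hi = (b0 == 0xED) ? 0x9F : 0xBF;   /* excludes surrogates */
        if (s[1] < lo || s[1] > hi)
            return 0;
        return ((s[2] & 0xC0) == 0x80) ? 3 : 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (avail < 4)
            return 0;
        lo = (b0 == 0xF0) ? 0x90 : 0x80;   /* excludes overlong forms */
        hi = (b0 == 0xF4) ? 0x8F : 0xBF;   /* excludes > U+10FFFF */
        if (s[1] < lo || s[1] > hi)
            return 0;
        return ((s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) ? 4 : 0;
    }
    return 0;                              /* stray continuation or >= 0xF5 */
}

/* Copy src to dst, one memcpy per maximal valid span, writing U+FFFD for
 * each invalid byte. dst must have room for 3 * len bytes.
 * Returns the number of bytes written. */
static size_t copy_valid_spans(const unsigned char *src, size_t len,
                               unsigned char *dst)
{
    const unsigned char *s = src;
    const unsigned char *end = src + len;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *span = s;
        size_t n = 0;

        /* Advance over as many valid characters as possible. */
        while (s < end && (n = strict_utf8_char_len(s, end)) != 0)
            s += n;

        /* Copy the whole valid span at once. */
        memcpy(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {                     /* stopped on an invalid byte */
            memcpy(d, "\xEF\xBF\xBD", 3);
            d += 3;
            s++;
        }
    }
    return (size_t)(d - dst);
}
```

For fully valid input the inner loop runs to the end and the whole buffer is copied with a single memcpy, which is where a gain over per-character copying would come from.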

Re: Encode UTF-8 optimizations

2016-08-19 Thread pali
On Thursday 18 August 2016 23:06:27 Karl Williamson wrote:
> On 08/12/2016 09:31 AM, p...@cpan.org wrote:
> >On Thursday 11 August 2016 17:41:23 Karl Williamson wrote:
> >>On 07/09/2016 05:12 PM, p...@cpan.org wrote:
> >>>Hi! As we know utf8::encode() does not provide correct UTF-8 encoding
> >>>an

Re: Encode UTF-8 optimizations

2016-08-18 Thread Karl Williamson
On 08/12/2016 09:31 AM, p...@cpan.org wrote: On Thursday 11 August 2016 17:41:23 Karl Williamson wrote: On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know utf8::encode() does not provide correct UTF-8 encoding and Encode::encode("UTF-8", ...) should be used instead. Also opening file s

Re: Encode UTF-8 optimizations

2016-08-12 Thread pali
On Thursday 11 August 2016 17:41:23 Karl Williamson wrote:
> On 07/09/2016 05:12 PM, p...@cpan.org wrote:
> >Hi! As we know utf8::encode() does not provide correct UTF-8 encoding
> >and Encode::encode("UTF-8", ...) should be used instead. Also opening
> >file should be done by :encoding(UTF-8) laye

Re: Encode UTF-8 optimizations

2016-08-11 Thread Karl Williamson
On 07/09/2016 05:12 PM, p...@cpan.org wrote: Hi! As we know utf8::encode() does not provide correct UTF-8 encoding and Encode::encode("UTF-8", ...) should be used instead. Also opening file should be done by :encoding(UTF-8) layer instead :utf8. But UTF-8 strict implementation in Encode module i

Encode UTF-8 optimizations

2016-07-09 Thread pali
Hi! As we know, utf8::encode() does not provide correct UTF-8 encoding, and Encode::encode("UTF-8", ...) should be used instead. Also, files should be opened with the :encoding(UTF-8) layer instead of :utf8. But the strict UTF-8 implementation in the Encode module is horribly slow compared to utf8::encode(