Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-07 Thread gregor herrmann
On Mon, 07 Aug 2017 08:48:36 -0700, Gregory Williams wrote:
> On Aug 7, 2017, at 8:26 AM, gregor herrmann wrote:
> > This looks indeed much better than my crude workarounds, thanks for
> > that!
> > Do you think you can take this up with upstream?
> Yes, I think Kjetil and I

Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-07 Thread Gregory Williams
On Aug 7, 2017, at 8:26 AM, gregor herrmann wrote:
> This looks indeed much better than my crude workarounds, thanks for
> that!
> Do you think you can take this up with upstream?
Yes, I think Kjetil and I can work on getting this merged upstream.
Thanks, Greg

Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-07 Thread gregor herrmann
Control: tag -1 + patch
On Sun, 06 Aug 2017 18:11:34 -0700, Gregory Williams wrote:
> I also looked into this and found another possible fix:
>
> diff -ru HTML-HTML5-Parser-0.301/lib/HTML/HTML5/Parser.pm
> HTML-HTML5-Parser-0.301-patched/lib/HTML/HTML5/Parser.pm
> ---

Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-06 Thread Vincent Lefevre
On 2017-08-06 18:11:34 -0700, Gregory Williams wrote:
> The above patch should handle the LWP case which the previously
> suggested patch avoids. It still passes the test suite (which should
> probably be improved to verify this case), and also supports the
> test case detailed in this bug report

Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-06 Thread Gregory Williams
On Sat, 5 Aug 2017 12:16:04 -0400 gregor herrmann wrote:
> What helps is:
> - replace in lib/HTML/HTML5/Parser.pm
> $response->{decoded_content} with $response->{content}
> which feels a bit dangerous
> - or in lib/HTML/HTML5/Parser/UA.pm's get:
> move the
> if ($uri =~

Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

2017-08-05 Thread gregor herrmann
On Wed, 22 Oct 2014 14:13:17 +0200, Vincent Lefevre wrote:
> Control: retitle -1 libhtml-html5-parser-perl: UTF-8 character breaks
> parse_file
>
> As a consequence of this bug, html2xhtml doesn't work at all when
> applied on a file. No problems when the HTML document is provided
> in the