Sending again, since I don't think this email made it to the libxml mailing list because I was not subscribed.
---------- Forwarded message ----------
From: Joel Hockey <joelhoc...@chromium.org>
Date: Wed, Jan 3, 2018 at 5:01 PM
Subject: Re: [xml] Patch to fix ICU flush and pivot buffer
To: "Jungshik Shin (신정식, 申政湜)" <js...@chromium.org>
Cc: Nick Wellnhofer <wellnho...@aevum.de>, Markus Scherer <msche...@google.com>, "xml@gnome.org" <xml@gnome.org>, Markus Scherer <markus....@gmail.com>

Nick,

I have another patch for some additional call sites where flush is being incorrectly set on the non-final read. This was found by the Chromium fuzzing tests.
https://bugs.chromium.org/p/chromium/issues/detail?id=790944

I have included a test case for this which uses UTF8 and only works with ICU. I saw that last time you were able to create a test case with EUC-JP which worked with both ICU and iconv. I've tried quite a bit to do something similar, but I can't replicate the error condition with that encoding. I don't expect that you would want to check in this test case, but I've included it for you to run locally if you like.

On Thu, Nov 9, 2017 at 11:36 AM, Joel Hockey <joelhoc...@chromium.org> wrote:
> Yes, I will update Chromium with this as per https://cs.chromium.org/chromium/src/third_party/libxml/chromium/roll.py
>
> On Thu, Nov 9, 2017 at 10:35 AM, Jungshik Shin (신정식, 申政湜) <js...@chromium.org> wrote:
>
>> Thank you, Joel and Nick!
>>
>> Joel: I guess you're gonna roll libxml in the Chromium tree to a version
>> including these changes.
>>
>> Jungshik
>>
>> 2017-11-08 15:22 GMT-08:00 Joel Hockey <joelhoc...@chromium.org>:
>>
>>> Thanks Nick. Nice work with the test.
>>>
>>> On Sun, Nov 5, 2017 at 2:04 AM, Nick Wellnhofer <wellnho...@aevum.de> wrote:
>>>
>>>> On 26/10/2017 03:17, Joel Hockey wrote:
>>>>
>>>>> I've updated the patch using git format-patch.
>>>>
>>>> Thanks for the updated patch. Applied here:
>>>> https://git.gnome.org/browse/libxml2/commit/?id=0b19f236a263a7b0acacd4ea84dc7237303ee3d9
>>>>
>>>>> The original bug found by the fuzzer only relates to UTF8 decoding, so
>>>>> using Shift-JIS or anything else won't help.
>>>>
>>>> Why not? My reasoning was that ICU uses the same code path for all
>>>> variable-width encodings. I simply converted your test file to EUC-JP and
>>>> it turns out that this triggers the bug as well:
>>>>
>>>> https://git.gnome.org/browse/libxml2/commit/?id=72182550926d31ad17357bd3ed69e49d7e69df02
>>>>
>>>> Nick
From 441e1e413a8f67c0813fa0e04b19dfea76e5ece6 Mon Sep 17 00:00:00 2001
From: Joel Hockey <joel.hoc...@gmail.com>
Date: Tue, 2 Jan 2018 21:47:35 -0800
Subject: [PATCH] Change calls to xmlCharEncInput to set flush false when not final call.

Having flush incorrectly set to true causes errors for ICU.
---
 HTMLparser.c      | 2 +-
 parserInternals.c | 2 +-
 xmlIO.c           | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/HTMLparser.c b/HTMLparser.c
index 7e243e60..9adeb174 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -3635,7 +3635,7 @@ htmlCheckEncodingDirect(htmlParserCtxtPtr ctxt, const xmlChar *encoding) {
 	     */
 	    processed = ctxt->input->cur - ctxt->input->base;
 	    xmlBufShrink(ctxt->input->buf->buffer, processed);
-	    nbchars = xmlCharEncInput(ctxt->input->buf, 1);
+	    nbchars = xmlCharEncInput(ctxt->input->buf, 0);
 	    if (nbchars < 0) {
 		htmlParseErr(ctxt, XML_ERR_INVALID_ENCODING,
 		             "htmlCheckEncoding: encoder error\n",
diff --git a/parserInternals.c b/parserInternals.c
index 09876ab4..8c0cd57a 100644
--- a/parserInternals.c
+++ b/parserInternals.c
@@ -1214,7 +1214,7 @@ xmlSwitchInputEncodingInt(xmlParserCtxtPtr ctxt, xmlParserInputPtr input,
             /*
              * convert as much as possible of the buffer
              */
-            nbchars = xmlCharEncInput(input->buf, 1);
+            nbchars = xmlCharEncInput(input->buf, 0);
         } else {
             /*
              * convert just enough to get
diff --git a/xmlIO.c b/xmlIO.c
index f61dd05a..82543477 100644
--- a/xmlIO.c
+++ b/xmlIO.c
@@ -3157,7 +3157,7 @@ xmlParserInputBufferPush(xmlParserInputBufferPtr in,
 	 * convert as much as possible to the parser reading buffer.
 	 */
 	use = xmlBufUse(in->raw);
-	nbchars = xmlCharEncInput(in, 1);
+	nbchars = xmlCharEncInput(in, 0);
 	if (nbchars < 0) {
 	    xmlIOErr(XML_IO_ENCODER, NULL);
 	    in->error = XML_IO_ENCODER;
@@ -3273,7 +3273,7 @@ xmlParserInputBufferGrow(xmlParserInputBufferPtr in, int len) {
 	 * convert as much as possible to the parser reading buffer.
 	 */
 	use = xmlBufUse(in->raw);
-	nbchars = xmlCharEncInput(in, 1);
+	nbchars = xmlCharEncInput(in, 0);
 	if (nbchars < 0) {
 	    xmlIOErr(XML_IO_ENCODER, NULL);
 	    in->error = XML_IO_ENCODER;
-- 
2.15.1.620.gb9897f4670-goog
<?xml version="1.0" encoding="UTF8-"?> <foo> Text with UTF8 chars at position 214 (0xd6) and 513 (0x201) ______ _______________ _______________ _______________ _______________ _______________ _______________ ____£_____ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _______________ _£________ </foo>
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml