On 11/16/2012 05:12 PM, Nick Kew wrote:
> On Fri, 16 Nov 2012 11:31:38 +0100
> Thomas Eckert<thomas.eck...@sophos.com>  wrote:
>
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the
>> filtering chain does not help.
> Looks like you've got problems over and above anything to do with
> your configuration!
>
>>       "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
> I thought you said it had charset issues?
>
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
>> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
>> trying apr_xlate
> That seems implausible.  How do you get a libxml2 install that
> doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
>> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
>> invalid byte(s) in input stream!
>> (and more conversion errors)
> It looks as if your backend incorrectly identifies the charset
> of the page in question.  Either that or you found a bug.
> Do you have a URL where your unprocessed page could be viewed?

Sorry for the delay on this. The basic problem remains: if I enable HTML rewriting and connect with a client that requests content compression, the reverse proxy fails with a message pointing at libxml2/encoding. The log entries also differ depending on whether I set the charset of the page.

So if I just send the page with "Content-Type: text/html", this is what I get:

mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated 348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed 668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output buffer ((null))


But if "Content-Type: text/html; charset=ISO-8859-1" is sent, this is what I get:

mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated 348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed 668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc: converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
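
To make the difference between the two runs easier to see, the message IDs can be compared mechanically. The following is only a minimal sketch: the helper name `ah_codes` is made up for illustration, and the excerpts are abbreviated to one representative line per message ID from the logs pasted above.

```python
import re

# Apache httpd message IDs look like "AH" followed by five digits.
AH_CODE = re.compile(r"\bAH\d{5}\b")

def ah_codes(log_text):
    """Return the set of distinct AH message IDs found in a log excerpt."""
    return set(AH_CODE.findall(log_text))

# Abbreviated excerpt of the run with "Content-Type: text/html".
without_charset = """
AH01398: Zlib: Inflated 348 to 682 : URL /
AH01430: Content-Type is text/html
AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
AH01439: xml2enc: consuming 682 bytes from bucket
AH01441: xml2enc: converted 682/682 bytes
AH01384: Zlib: Compressed 668 to 344 : URL /
AH01444: Skipping invalid byte(s) in input stream!
AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
AH01385: Zlib error -2 flushing zlib output buffer ((null))
"""

# Abbreviated excerpt of the run with "charset=ISO-8859-1" in the header.
with_charset = """
AH01398: Zlib: Inflated 348 to 682 : URL /
AH01430: Content-Type is text/html;charset=ISO-8859-1
AH01431: Got charset ISO-8859-1 from HTTP headers
AH01384: Zlib: Compressed 668 to 344 : URL /
AH01439: xml2enc: consuming 10 bytes from bucket
AH01441: xml2enc: converted 1/1 bytes
AH01444: Skipping invalid byte(s) in input stream!
AH01440: xml2enc: reinserting 334 unconsumed bytes from bucket
"""

only_without = ah_codes(without_charset) - ah_codes(with_charset)
only_with = ah_codes(with_charset) - ah_codes(without_charset)

print("only without charset:", sorted(only_without))
print("only with charset:   ", sorted(only_with))
```

Going by the pasted logs only, AH01434 and AH01385 are specific to the run without a charset, and AH01431 is specific to the run with one. The transcoder failures after "Compressed 668 to 344" appear in both.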

From what I can tell, this still seems to be the "wrong" processing, as the page cannot be inflated correctly at the user's end. Nevertheless, the message
  AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
does not show up anymore. Looking at mod_xml2enc.c +185-194 and +251-268, that makes sense, but it would imply that the encoding detection in +198-206 failed. I suggest adding some sort of "failed" debug message for the case where xmlDetectCharEncoding() did not work as desired.
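
One possible reading of the byte counts in both logs (a 10-byte bucket, then a 344-byte bucket, right after "Compressed 668 to 344") is that xml2enc is being handed data that mod_deflate has already compressed. This is an assumption drawn from the numbers, not something confirmed in this thread. The sketch below only illustrates the two general facts behind that reading: a gzip stream without optional fields starts with a 10-byte header, and compressed bytes are not valid text for a charset converter. The sample page is a hypothetical stand-in, not the real backend page.

```python
import gzip

# Hypothetical stand-in for the backend page: plain ASCII HTML.
page = b"<html><head><title>test</title></head><body><p>hello</p></body></html>"

compressed = gzip.compress(page)

# A gzip member without optional fields starts with a fixed 10-byte header:
# magic (2), method (1), flags (1), mtime (4), extra flags (1), OS (1).
header = compressed[:10]
print("magic:", header[:2].hex())   # 1f8b identifies gzip
print("flags:", header[3])          # 0 means no optional header fields

def try_decode(data, charset):
    """Report whether the bytes are valid text in the given charset."""
    try:
        data.decode(charset)
        return "ok"
    except UnicodeDecodeError as exc:
        return "invalid byte at offset %d" % exc.start

print("plain page:     ", try_decode(page, "utf-8"))
print("compressed page:", try_decode(compressed, "utf-8"))
```

The second byte of the gzip magic (0x8b) is already not a valid start byte in UTF-8, so a converter fed this stream hits invalid input almost immediately.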

I've tried a couple more combinations, including using mod_charset_lite and different non-latin1 encodings on the backend, but the only thing that works is using the Header directive on the backend to set "Content-Type: text/html; charset=UTF-8" while leaving the actual contents unchanged. Here, "works" means the page is displayed correctly at the client's end.
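
As a side note on why relabelling the page as UTF-8 without touching its contents is harmless here: assuming the page really is pure ASCII, its bytes decode to the same text under ISO-8859-1 and UTF-8. A minimal sketch, with a made-up sample page and a made-up non-ASCII counterexample:

```python
# Hypothetical stand-in for the backend page: plain ASCII only.
page = b"<html><body><p>plain ascii only</p></body></html>"

def is_ascii(data):
    """True if every byte is below 0x80."""
    return all(byte < 0x80 for byte in data)

# For pure ASCII input, both charsets yield the same text.
same_text = page.decode("iso-8859-1") == page.decode("utf-8")
print(is_ascii(page), same_text)

# Counterexample: a Latin-1 encoded e-acute (0xE9) is not valid UTF-8,
# so the relabelling trick would not be safe for such a page.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    relabel_safe = True
except UnicodeDecodeError:
    relabel_safe = False
print(is_ascii(latin1_bytes), relabel_safe)
```

This only shows that the UTF-8 label is consistent with the page's bytes. It does not explain why the ISO-8859-1 label leads to the errors above.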

The goal is still to get mod_proxy_html to rewrite the HTML just like it would do with "ProxyHTMLEnable On" while retaining compression support. So setting
 SetOutputFilter INFLATE;proxy-html
which "drops out" the "xml2enc" filter might be problematic.

Unfortunately, the page is not publicly accessible. It is rather simple, though, and I made sure there is nothing 'special' on that page - e.g. it's just plain ASCII, no meta tags, etc.

Note that I tried both "ProxyHTMLEnable On" and "SetOutputFilter INFLATE;proxy-html" as filter directives for all of the above-mentioned setups. Neither worked except with the forced UTF-8 header mentioned above.

