Alright, just forget I suggested that. If a byte above 127 (a character
outside of 7-bit ASCII) appears directly in front of an HTML control
character, the control character would get interpreted as part of the same
multi-byte character in UTF-8. In other words: It WILL break.
The suggestion just sounded too good. Back to the regularly scheduled
program...
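
To make the failure concrete, here is a minimal sketch in Python. It assumes
a lenient decoder that trusts the length announced by each lead byte and does
not validate the continuation bytes; whether the module's parser really
behaves like this is an assumption on my part. The helper naive_utf8_split
and the sample markup are made up for illustration.

```python
def naive_utf8_split(data):
    """Split bytes into 'characters' by trusting each lead byte's length.

    Hypothetical helper: it mimics a lenient decoder that never checks
    whether the following bytes are valid continuation bytes (10xxxxxx).
    """
    chunks = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n = 1              # plain 7-bit ASCII
        elif b >> 5 == 0b110:
            n = 2              # lead byte announcing a 2-byte sequence
        elif b >> 4 == 0b1110:
            n = 3              # lead byte announcing a 3-byte sequence
        elif b >> 3 == 0b11110:
            n = 4              # lead byte announcing a 4-byte sequence
        else:
            n = 1              # stray continuation byte or invalid lead
        chunks.append(data[i:i + n])
        i += n
    return chunks


# "café" directly followed by a tag, encoded as ISO-8859-1.
# The é becomes the single byte 0xE9, which looks like a 3-byte UTF-8 lead.
latin1_html = "caf\u00e9<a href='x'>".encode("iso-8859-1")
chunks = naive_utf8_split(latin1_html)

# The '<' never shows up as a character of its own: it was swallowed.
tag_start_survived = b"<" in chunks

# A strict decoder does not swallow anything, it rejects the input instead.
try:
    latin1_html.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Either way the document does not survive: the lenient path loses the tag
start, the strict path refuses the input.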

2009/11/10 Martin Gerdes <marting...@googlemail.com>

> A completely different idea to solve my actual problem:
>
> Someone else suggested just taking out the conversions altogether.
> I mean, I am converting right back into the encoding I converted from. I
> have been assured that no link uses a character above the first 128 (7-bit
> ASCII). As far as I know there are no HTML control characters outside of
> 7-bit ASCII either.
> So shouldn't the parser just be able to parse the ISO-8859-1 document as if
> it were UTF-8? Yeah, I know it sounds horrible, but as far as I can tell it
> should not actually break...
>
> As author of the module:
> Could this work?
> What would I have to change in the code to keep any input conversion from
> happening?
> (I will play around a bit myself, but I am not familiar with the code, nor
> with Apache module logic. And it's been quite a few years since I last coded
> C...)
>
> At the very least this would tell us (if it works) whether or not the
> conversions are to blame for the problems I experience.
>
> Martin
>
>
