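Alright, just forget I suggested that. If a byte above 127 (a character outside 7-bit ASCII, such as an accented letter in ISO-8859-1) appears right in front of an HTML control character like '<', a UTF-8 parser reads that byte as the start of a multi-byte character and may treat the control character as part of that same character; a stricter decoder will simply reject the byte as invalid. In other words: it WILL break. The suggestion just sounded too good. Back to the regularly scheduled program...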
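To make the failure concrete, here is a minimal sketch (not code from the module; the helper names are hypothetical) that feeds ISO-8859-1 bytes to two kinds of UTF-8 reader. The "naive" one trusts the lead byte and skips ahead without checking the following bytes, so the '<' after an é (0xE9, which looks like the lead byte of a 3-byte sequence) is swallowed. The strict one is just Python's own validating UTF-8 decoder, which rejects the document outright. A pure 7-bit ASCII document passes both, because its bytes are identical in ISO-8859-1 and UTF-8.

```python
# Minimal sketch with hypothetical helpers: ISO-8859-1 bytes read as if they were UTF-8.

def utf8_seq_len(lead: int) -> int:
    """Length a UTF-8 sequence claims, judged only from its lead byte (0 = cannot start one)."""
    if lead < 0x80:
        return 1            # 7-bit ASCII, identical in ISO-8859-1 and UTF-8
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0                # continuation byte or never-valid lead byte


def naive_count_tags(data: bytes) -> int:
    """Count '<' seen by a decoder that trusts lead bytes and skips ahead blindly."""
    tags, i = 0, 0
    while i < len(data):
        n = utf8_seq_len(data[i]) or 1   # step over stray bytes one at a time
        if n == 1 and data[i] == ord("<"):
            tags += 1
        i += n                           # may jump straight over a '<'
    return tags


def is_strict_utf8(data: bytes) -> bool:
    """Let Python's validating UTF-8 decoder judge the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1 = "café<b>".encode("iso-8859-1")   # é becomes the single byte 0xE9
ascii_only = b"cafe<b>"

print(naive_count_tags(latin1), is_strict_utf8(latin1))          # 0 False
print(naive_count_tags(ascii_only), is_strict_utf8(ascii_only))  # 1 True
```

Either way the Latin-1 document breaks: the naive reader loses the tag, the strict one refuses the input. Passing the bytes through untouched is only safe when the whole document, not just the links, stays within 7-bit ASCII.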
2009/11/10 Martin Gerdes <marting...@googlemail.com>
> A completely different idea to solve my actual problem:
>
> Someone else suggested just taking out the conversions altogether.
> I mean, I am converting right back into the encoding I converted from. I
> have been assured that no link uses a character above the first 128 (7-bit
> ASCII). As far as I know there are no HTML control characters outside of
> 7-bit ASCII either.
> So shouldn't the parser just be able to parse the ISO-8859-1 document as if
> it were UTF-8? Yeah, I know it sounds horrible, but as far as I can tell it
> should not actually break...
>
> As author of the module:
> Could this work?
> What would I have to change in the code to keep any input conversion from
> happening?
> (I will play around a bit myself, but I am not familiar with the code, nor
> with Apache module logic. And it's been quite a few years since I last coded
> C...)
>
> At the very least this would tell us (if it works) whether or not the
> conversions are to blame for the problems I experience.
>
> Martin