Be careful: in the HTTP world there's an ambiguity.
application/x-www-form-urlencoded does not specify the character encoding in
which the bytes represented by the %-escaped sequences are written.
That's fixed by the very recent URI spec, where absence means UTF-8...
My experience was that Tomcat simply con
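
To make the ambiguity above concrete, here is a minimal sketch using only the JDK's java.net.URLDecoder. The class name FormDecodingDemo and the sample value "caf%C3%A9" are made up for illustration; nothing here is taken from Tomcat itself. The same escaped bytes give two different strings depending on which charset the receiver assumes:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes one percent-escaped value two ways.
class FormDecodingDemo {
    // Decode a percent-escaped form value, interpreting the bytes in the given charset.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reached if the charset name is unknown to the JVM.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%A9 is the UTF-8 byte sequence for U+00E9 (e with acute accent).
        String escaped = "caf%C3%A9";
        // Interpreted as UTF-8: one accented character after "caf".
        System.out.println(decode(escaped, "UTF-8"));
        // Interpreted as ISO-8859-1: each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(escaped, "ISO-8859-1"));
    }
}
```

Nothing in the escaped string says which of the two readings is the intended one, which is why sender and receiver have to agree on the charset by some other means.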
Did you check that the request string you get at the analyzer
level is correctly encoded as UTF-8?
We had the same problem with French accented characters, also encoded
as UTF-8 and transmitted by Tomcat as ISO-8859-1. It was
just a test, so we didn't investigate much, but
re-encode in URL/ISO-8
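
For what it's worth, one common way to undo this kind of mix-up (UTF-8 bytes that were decoded as ISO-8859-1) is sketched below. This is an assumption about the general technique, not necessarily what was done in the test mentioned above; the class and method names (ReencodeDemo, latin1ToUtf8) are hypothetical. It relies on ISO-8859-1 mapping each byte to exactly one character, so the original bytes can be recovered and decoded again:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: repairs a string whose UTF-8 bytes were read as ISO-8859-1.
class ReencodeDemo {
    static String latin1ToUtf8(String misdecoded) {
        // Recover the original bytes. This is lossless only if every character
        // is in the range U+0000..U+00FF; anything else is replaced with '?'.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Decode the same bytes again, this time as UTF-8.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by U+00C3 U+00A9: what UTF-8 "cafe with acute" looks like
        // after being wrongly decoded as ISO-8859-1.
        String garbled = "caf\u00c3\u00a9";
        System.out.println(latin1ToUtf8(garbled));
    }
}
```

This is a workaround applied after the damage is done. Where possible it is usually cleaner to make the container decode correctly in the first place, for example with the servlet API's request.setCharacterEncoding for request bodies or the URIEncoding attribute on Tomcat's Connector for query strings.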
I'm trying to index and search HTML and JSP files that are saved using UTF-8
encoding. The pages are indexed on the file system using the
StandardAnalyzer. The files can contain a mix of English, Chinese,
Japanese, etc., saved as UTF-8. Searches using English terms are successful
but none of the