Hi Erik,
Erik Hatcher wrote:
Can you give me a test document that causes an issue? (maybe send me a
Solr XML document in private e-mail). I'll see what I can do once I
can see the issue first hand.
Thank you! Just try the utf8-example.xml file in the exampledocs
directory. After I indexed that document, the output of the
test_utf8.sh script suggests that everything works correctly:
Solr server is up.
HTTP GET is accepting UTF-8
HTTP POST is accepting UTF-8
HTTP POST does not default to UTF-8
HTTP GET is accepting UTF-8 beyond the basic multilingual plane
HTTP POST is accepting UTF-8 beyond the basic multilingual plane
HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual plane
If I use the standard QueryResponseWriter with the query q=umlauts,
the XML response contains correctly rendered non-ASCII characters.
The same query against the VelocityResponseWriter instead returns a lot
of Unicode replacement characters (U+FFFD).
-Sascha
On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:
Hi,
I've played around with Solr's VelocityResponseWriter (which is indeed
a very useful feature for rapid prototyping) and noticed that
Velocity uses ISO-8859-1 as its default character encoding. I've changed
this setting to UTF-8 in my velocity.properties file (inside the conf
directory), i.e.,
input.encoding=UTF-8
output.encoding=UTF-8
and checked that the settings were successfully loaded.
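A minimal standalone sketch for sanity-checking the two encoding keys: it parses the same two lines with java.util.Properties and tests that both are set to UTF-8. The class and method names are hypothetical and not part of Solr or Velocity, and the text is read from a string so the example is self-contained. Note that this only checks the file contents; it does not prove that the Velocity engine inside Solr actually loaded the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: checks velocity.properties-style text for UTF-8 settings.
public class VelocityPropertiesCheck {

    // Parses properties text; in practice you would open conf/velocity.properties instead.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True only if both Velocity encoding keys are UTF-8 (case-insensitive).
    static boolean usesUtf8(Properties props) {
        return "UTF-8".equalsIgnoreCase(props.getProperty("input.encoding"))
            && "UTF-8".equalsIgnoreCase(props.getProperty("output.encoding"));
    }

    public static void main(String[] args) {
        String text = "input.encoding=UTF-8\noutput.encoding=UTF-8\n";
        System.out.println("Both encodings are UTF-8: " + usesUtf8(parse(text)));
    }
}
```

A missing key makes getProperty return null, and "UTF-8".equalsIgnoreCase(null) is simply false, so an incomplete file is reported as not using UTF-8.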
Within the main Velocity template, browse.vm, the character encoding
is set to UTF-8 as well, i.e.,
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
After starting Solr (which is deployed in a Tomcat 6 server on an
Ubuntu machine), I ran into some character encoding problems.
Because input.encoding is now UTF-8, no problems occur when
non-ASCII characters are present in the query string, e.g. German
umlauts. Unfortunately, something is wrong with the encoding of the
HTML page generated by VelocityResponseWriter: the non-ASCII characters
aren't displayed properly (for example, Firefox shows a black diamond
with a white question mark). If I manually set the encoding to
ISO-8859-1, the non-ASCII characters are displayed correctly. Does
anybody have a clue?
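These symptoms are consistent with a common mismatch: the bytes sent to the browser are ISO-8859-1 while the page declares UTF-8. This is an assumption, not a confirmed diagnosis of VelocityResponseWriter. The hypothetical sketch below only reproduces the effect with plain JDK calls: text encoded as ISO-8859-1 and decoded as UTF-8 yields U+FFFD, while decoding the same bytes as ISO-8859-1 (like the manual override) restores the umlauts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr or Velocity.
public class EncodingMismatchDemo {

    // Encode text with one charset, then decode the bytes with another,
    // the way a browser decodes a response with whatever charset it assumes.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // German umlauts written as Unicode escapes to avoid source-encoding issues
        String original = "Umlaute: \u00e4\u00f6\u00fc";

        // Bytes written as ISO-8859-1, page read as UTF-8 (the failing case)
        String mismatched = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Same bytes read as ISO-8859-1 (the manual override)
        String overridden = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        // Bytes written and read consistently as UTF-8
        String consistent = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 read as UTF-8 contains U+FFFD: " + mismatched.contains("\uFFFD"));
        System.out.println("ISO-8859-1 read as ISO-8859-1 matches: " + overridden.equals(original));
        System.out.println("UTF-8 read as UTF-8 matches: " + consistent.equals(original));
    }
}
```

All three lines print true: only the mixed case loses the umlauts, which is why the page looks right once the encoding is forced to ISO-8859-1.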
Thanks in advance,
Sascha