Python does not treat strings as Unicode natively; you have to decode and encode them explicitly. It is possible that your Python receiver is not handling the incoming strings correctly. Also, Jetty has problems with UTF-8; the Wiki has more on this.
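To illustrate what "explicitly" means here, a minimal sketch follows. It is written for current Python 3, where the bytes/text split is enforced; in the Python 2 of this thread the same split existed between `str` and `unicode`. The `raw` value is a hypothetical stand-in for the bytes of a response body, and `decode_response` and `escaped` are illustrative helper names, not part of Solr or any client library.

```python
# Hypothetical stand-in for the raw bytes of a response body
# (the UTF-8 encoding of a short Japanese string).
raw = "日本語".encode("utf-8")


def decode_response(body: bytes, charset: str = "utf-8") -> str:
    """Decode raw bytes with an explicitly named charset."""
    return body.decode(charset)


def escaped(text: str) -> str:
    """Render non-ASCII characters as escapes so code points can be compared."""
    return text.encode("unicode_escape").decode("ascii")


good = decode_response(raw)             # charset matches the bytes
bad = decode_response(raw, "latin-1")   # wrong charset: one character per byte

print(escaped(good))   # \u65e5\u672c\u8a9e
print(escaped(bad))    # nine \xNN escapes instead of three code points
```

If the escaped values on the receiving side look like the second line rather than the first, the bytes were probably decoded with the wrong charset somewhere between the server and the client code.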
Lance

-----Original Message-----
From: Maximilian Hütter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 02, 2007 1:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching combined English-Japanese index

Yonik Seeley wrote:
> On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>> Yonik Seeley wrote:
>>> On 10/1/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
>>>> When I search using an English term, I get results but the Japanese
>>>> is not encoded correctly in the response. (although it is UTF-8
>>>> encoded)
>>> One quick thing to try is the python writer (wt=python) to see the
>>> actual unicode values of what you are getting back (since the python
>>> writer automatically escapes non-ascii). That can help rule out
>>> incorrect charset handling by clients.
>>>
>>> -Yonik
>>>
>> Thanks for the tip, it turns out that the unicode values are wrong...
>> I mean the browser displays correctly what is sent. But I don't know
>> how solr gets these values.
>
> OK, so they never got into the index correctly.
> The most likely explanation is that the charset wasn't set correctly
> when the update message was sent to Solr.
>
> -Yonik
>

Are you sure they are wrong in the index? When I use the Lucene Index Monitor (http://limo.sourceforge.net/) to look at the document in the index, the Japanese is displayed correctly.

I am using Jetty 6.0.1, by the way.

Best regards,

Max

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel : (+49) 0711 - 45 10 17 578
Fax : (+49) 0711 - 45 10 17 573
e-mail : [EMAIL PROTECTED]
Registered office : Stuttgart, Amtsgericht Stuttgart, HRB 24106
Managing directors: Joachim Hörnle, Thomas Gentsch, Holger Dietrich