'Lucene Users List'
Subject: RE: Problems indexing Japanese with CJKAnalyzer ... Or French with
UTF-8 and MetaData
I don't think I understand your proposal correctly.
As a basis, I am using Demo3 with indexHTML, HTMLDocument and HTMLParser.
Inside HTML parser, I am calling getMetaTags (ca
4 15:12
To: Lucene Users List
Subject: Re: Problems indexing Japanese with CJKAnalyzer
If it's a web application, you have to call request.setCharacterEncoding("UTF-8")
before reading any parameters. Also make sure the HTML page encoding is
specified as "UTF-8" in the meta tag. most web app server
If you call the above
method, I think it will solve your problem.
Praveen
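
The effect of choosing the request encoding before parameters are read can be reproduced without a servlet container. The sketch below is only an illustration: it uses java.net.URLDecoder to decode the same percent-encoded form value twice, once as UTF-8 and once as ISO-8859-1, on the assumption that the container falls back to ISO-8859-1 when no encoding has been set. The class name FormDecodingDemo, the helper decodeParam, and the sample value are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {
    // Decodes one percent-encoded form value using the named charset.
    public static String decodeParam(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as a browser would submit it from a UTF-8 page.
        String raw = "caf%C3%A9";

        String right = decodeParam(raw, "UTF-8");      // two bytes become one character
        String wrong = decodeParam(raw, "ISO-8859-1"); // each byte becomes its own character

        System.out.println(right.length()); // 4
        System.out.println(wrong.length()); // 5
    }
}
```

The wrongly decoded value is what ends up in an index or a query when the encoding is chosen too late, which would explain searches that never match.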
- Original Message -
From: "Bruno Tirel" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Thursday, July 15, 2004 6:15 AM
Subject: RE: Problems indexin
Any help available?
Best regards,
Bruno
-Original Message-
From: Jon Schuster [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 14, 2004 22:51
To: 'Lucene Users List'
Subject: RE: Problems indexing Japanese with CJKAnalyzer
Hi all,
Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.
The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of
Jon,
Java expects your files to be in the platform's default encoding, which comes from the
native locale. In most cases in the U.S., this will be a Western encoding suited to
English. If you want to read in files that are in a different
encoding, you have to tell Java what the encoding is, in this case Shift JIS. See
the javadocs for java.io.InputStr
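
Telling Java the encoding usually means passing a charset to java.io.InputStreamReader instead of relying on the platform default. The following is a minimal sketch, not the code from the thread: the class ExplicitEncodingReader and the helper readAll are hypothetical names, and it assumes the JDK in use ships the Shift_JIS charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ExplicitEncodingReader {
    // Reads the whole stream, decoding bytes with the named charset
    // rather than the platform default.
    public static String readAll(InputStream in, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Three kanji, written as escapes so the source file stays ASCII.
        String original = "\u65e5\u672c\u8a9e";
        byte[] sjisBytes = original.getBytes(Charset.forName("Shift_JIS"));

        String right = readAll(new ByteArrayInputStream(sjisBytes), "Shift_JIS");
        String wrong = readAll(new ByteArrayInputStream(sjisBytes), "ISO-8859-1");

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

For a file on disk, the same helper could be given a java.io.FileInputStream in place of the in-memory stream.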
Hi Jon,
It sounds to me like you have a character encoding problem. The
native2ascii tool is designed to produce input for the Java compiler;
the "\u7aef" notation you're seeing is understood by Java string
interpreters to mean the corresponding hexadecimal Unicode code point.
Other Java progr
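
To make the notation concrete, here is a small sketch of what turning the escaped form back into characters involves, roughly what native2ascii -reverse does. It is a simplified illustration only: the class UnicodeEscapeDecoder and the method unescape are hypothetical names, and the code handles just the plain form of a backslash, a single u, and four hex digits.

```java
class UnicodeEscapeDecoder {
    // Replaces every backslash-u sequence followed by four hex digits
    // with the single character it names; all other text is copied as-is.
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (isEscapeAt(s, i)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True when position i starts a backslash, a 'u', and four hex digits.
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String escaped = "\\u7aef"; // six ASCII characters, as native2ascii writes them
        String decoded = unescape(escaped);

        System.out.println(escaped.length()); // 6
        System.out.println(decoded.length()); // 1
        System.out.println(Integer.toHexString(decoded.codePointAt(0))); // 7aef
    }
}
```

A program that indexes the escaped text without decoding it sees six ASCII characters rather than one CJK character, so an analyzer such as CJKAnalyzer never receives the characters it is meant to tokenize.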
Hi,
I've gone through all of the past messages regarding the CJKAnalyzer but I
still must be doing something wrong because my searches don't work.
I'm using the IndexHTML application from the org.apache.lucene.demo package
to do the indexing, and I've changed the analyzer to use the CJKAnalyzer.