RE: Problems indexing Japanese with CJKAnalyzer ... Or French with UTF-8 and MetaData

2004-07-16 Thread Jon Schuster
'Lucene Users List' Subject: RE: Problems indexing Japanese with CJKAnalyzer ... Or French with UTF-8 and MetaData I don't think I understand your proposal correctly. As a basis, I am using Demo3 with indexHTML, HTMLDocument and HTMLParser. Inside HTML parser, I am calling getMetaTags (ca

RE: Problems indexing Japanese with CJKAnalyzer ... Or French with UTF-8 and MetaData

2004-07-15 Thread Bruno Tirel
4 15:12 To: Lucene Users List Subject: Re: Problems indexing Japanese with CJKAnalyzer If it's a web application, you have to call request.setCharacterEncoding("UTF-8") before reading any parameters. Also make sure the HTML page encoding is specified as "UTF-8" in the meta tag. Most web app server

Re: Problems indexing Japanese with CJKAnalyzer

2004-07-15 Thread Praveen Peddi
If you call the above method, I think it will solve your problem. Praveen - Original Message - From: "Bruno Tirel" <[EMAIL PROTECTED]> To: "'Lucene Users List'" <[EMAIL PROTECTED]> Sent: Thursday, July 15, 2004 6:15 AM Subject: RE: Problems indexin

RE: Problems indexing Japanese with CJKAnalyzer

2004-07-15 Thread Bruno Tirel
Any help available? Best regards, Bruno -Original Message- From: Jon Schuster [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 14, 2004 22:51 To: 'Lucene Users List' Subject: RE: Problems indexing Japanese with CJKAnalyzer Hi all, Thanks for the help on indexing Japanes

RE: Problems indexing Japanese with CJKAnalyzer

2004-07-14 Thread Jon Schuster
Hi all, Thanks for the help on indexing Japanese documents. I eventually got things working, and here's an update so that other folks might have an easier time in similar situations. The problem I had was indeed with the encoding, but it was more than just the encoding on the initial creation of

Re: Problems indexing Japanese with CJKAnalyzer

2004-07-06 Thread Grant Ingersoll
Jon, Java expects your files to be in the encoding of the native locale. In most cases in the U.S., this will be an English one. If you want to read in files that are in a different encoding, you have to tell Java what your encoding is, in this case, Shift JIS. See the javadocs for java.io.InputStr
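
As a rough illustration of the advice above, the sketch below decodes Shift JIS bytes by naming the charset when the java.io.InputStreamReader is constructed. The class name ShiftJisReadSketch and the helper readAll are made up for this example and come from neither the thread nor Lucene. It also assumes the JRE in use ships the Shift_JIS charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ShiftJisReadSketch {

    // Hypothetical helper: reads the whole stream, decoding its bytes
    // with the named charset instead of the platform default.
    static String readAll(InputStream in, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so that the
        // encoding of this source file does not matter.
        String text = "\u65e5\u672c\u8a9e";
        byte[] sjisBytes = text.getBytes(Charset.forName("Shift_JIS"));

        // Decoding with the matching charset recovers the original text.
        String good = readAll(new ByteArrayInputStream(sjisBytes), "Shift_JIS");
        // Decoding the same bytes with a mismatched charset does not.
        String bad = readAll(new ByteArrayInputStream(sjisBytes), "ISO-8859-1");

        System.out.println("matching charset round-trips: " + good.equals(text));
        System.out.println("mismatched charset round-trips: " + bad.equals(text));
    }
}
```

In a real indexer the ByteArrayInputStream would be replaced by a FileInputStream over the document being indexed; the in-memory bytes are used here only so the sketch runs without any files.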

Re: Problems indexing Japanese with CJKAnalyzer

2004-07-06 Thread Steven Rowe
Hi Jon, It sounds to me like you have a character encoding problem. The native2ascii tool is designed to produce input for the Java compiler; the "\u7aef" notation you're seeing is understood by Java string interpreters to mean the corresponding hexadecimal Unicode code point. Other Java progr
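
To make the distinction above concrete, here is a small sketch contrasting an escape that the Java compiler translates with the same six characters arriving as plain data at runtime (for example, read from a native2ascii output file). The class and field names are invented for this example; the only fact relied on is that the compiler, not the I/O classes, interprets this escape notation.

```java
public class UnicodeEscapeSketch {

    // In source code the compiler turns the escape into one character.
    static final String COMPILED = "\u7aef";

    // The doubled backslash yields the literal six characters
    // backslash, u, 7, a, e, f, which is what a program sees when it
    // reads native2ascii output as ordinary text.
    static final String RAW_TEXT = "\\u7aef";

    public static void main(String[] args) {
        System.out.println("compiled length: " + COMPILED.length());
        System.out.println("compiled code point: "
            + Integer.toHexString(COMPILED.codePointAt(0)));
        System.out.println("raw text length: " + RAW_TEXT.length());
    }
}
```

An indexer that reads the escaped file as text would therefore index the backslash sequences themselves rather than the Japanese characters they stand for.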

Problems indexing Japanese with CJKAnalyzer

2004-07-02 Thread Jon Schuster
Hi, I've gone through all of the past messages regarding the CJKAnalyzer but I still must be doing something wrong because my searches don't work. I'm using the IndexHTML application from the org.apache.lucene.demo package to do the indexing, and I've changed the analyzer to use the CJKAnalyzer.