Re: Lucene Unicode Usage

2005-02-11 Thread Owen Densmore
Bingo! I used the InputStreamReader and that fixed the index. Boy, it's tough to catch all the holes through which Unicode leaks!
Owen
From: aurora <[EMAIL PROTECTED]>
Date: February 9, 2005 11:04:35 PM MST
To: lucene-user@jakarta.apache.org
Subject: Re: Lucene Unicode Usage
So you got

Re: Lucene Unicode Usage

2005-02-10 Thread Andrzej Bialecki
Owen Densmore wrote: I'm building an index from a FileMaker database by dumping the data to a tab-separated file. Because the FileMaker output is encoded in MacRoman and uses Mac line separators, I run a script across the tab file to clean it up:
tr '\r\v' '\n ' | iconv -f MAC -t UTF-8
Thi

Re: Lucene Unicode Usage

2005-02-09 Thread aurora
So you got a UTF-8 encoded text file. But how do you read the file into Java? Java's default encoding is likely to be something other than UTF-8. Make sure you specify the encoding, like:
new InputStreamReader(new FileInputStream(filename), "UTF-8");
On Wed, 9 Feb 2005 22:32:38 -0700, Owen De
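For anyone landing on this thread later, the advice above can be sketched as a small self-contained reader. This is only an illustration under assumptions: the class name Utf8ReaderSketch and the method readAll are made up for the example and are not part of Lucene, and StandardCharsets and try-with-resources need Java 7 or later (on older JVMs, pass the string "UTF-8" as in the message above and close the reader in a finally block).

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene.
public class Utf8ReaderSketch {

    // Reads the whole file as text, decoding the bytes explicitly as UTF-8
    // instead of relying on the platform default encoding.
    static String readAll(String filename) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8ReaderSketch path/to/dump.tsv
        System.out.println(readAll(args[0]));
    }
}
```

The same Reader can be handed to whatever code builds the index, so the text is decoded once, at the point where the bytes enter Java.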

Lucene Unicode Usage

2005-02-09 Thread Owen Densmore
I'm building an index from a FileMaker database by dumping the data to a tab-separated file. Because the FileMaker output is encoded in MacRoman and uses Mac line separators, I run a script across the tab file to clean it up:
tr '\r\v' '\n ' | iconv -f MAC -t UTF-8
This basically converts the