Re: encoding problem when retrieving document field value

2014-03-04 Thread G.Long
Hi :) I found the source of the problem. It is indeed the input string. It comes from a CSV export from a relational database. The InputStream of this CSV file was decoded with the wrong charset (ISO-8859-1 instead of CP1252). So the right single quote was returned as this character
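
The charset mix-up described in this message can be reproduced in isolation. The following is only a minimal sketch, not the poster's actual code: the class name, the `decode` helper, and the three sample bytes are hypothetical, and it assumes the JVM ships the `windows-1252` charset (the Java name for CP1252), which is not one of the charsets every Java platform is required to support. It decodes the same byte 0x92 both ways to show why the right single quote comes out as a different character.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not taken from the original thread.
class CharsetMismatchDemo {

    // Decode raw bytes using the charset with the given name.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'l', the CP1252 byte for a right single quote, 'a'.
        byte[] raw = { 'l', (byte) 0x92, 'a' };

        String wrong = decode(raw, "ISO-8859-1");   // 0x92 becomes the C1 control U+0092
        String right = decode(raw, "windows-1252"); // 0x92 becomes U+2019

        System.out.printf("ISO-8859-1:   U+%04X%n", (int) wrong.charAt(1));
        System.out.printf("windows-1252: U+%04X%n", (int) right.charAt(1));
    }
}
```

When reading a file, the same choice is made by passing the charset explicitly to `InputStreamReader` rather than relying on a default.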

RE: encoding problem when retrieving document field value

2014-03-03 Thread Uwe Schindler
Hi G. Long, Most likely, the problem is in your application. Lucene does not change the value stored in the index. For stored fields, Lucene does not deal with entities, it's just binary data to Lucene. From your application perspective, it is String in - String out. I think maybe you strip

Re: encoding problem when retrieving document field value

2014-03-03 Thread G.Long
Hi :) I've got this result directly from tncTitle in the following code:
    field = doc.getFieldable(IndexConstants.FIELD_TNC_TITLE);
    if (field != null) {
        tncTitle = field.stringValue();
    }
PS: in my previous email, the copy/paste of the apostrophe html number made it appear correctly

Re: encoding problem when retrieving document field value

2014-03-03 Thread Jack Krupansky
What is the hex value for that second character returned that appears to display as an apostrophe? Hex 92 (decimal 146) is listed as Private Use 2, so who knows what it might display as. All that is important is the binary/hex value. Out of curiosity, how did your application come about
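
One way to answer the hex-value question is to print the code points of the string read back from the index instead of trusting how it displays. This is a rough sketch under stated assumptions: the class name, the `toHex` helper, and the sample string are all hypothetical and do not come from the thread.

```java
// Hypothetical helper class; not part of Lucene or the original application.
class CodePointDump {

    // Render each Unicode code point of s as U+XXXX, separated by spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical value standing in for a stored field's string value.
        String value = "l\u0092a";
        System.out.println(toHex(value)); // prints "U+006C U+0092 U+0061"
    }
}
```

A dump like this separates two cases that look the same on screen: U+0092 (a C1 control character) and U+2019 (the right single quotation mark).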

Re: encoding problem when retrieving document field value

2014-03-03 Thread Trejkaz
On Tue, Mar 4, 2014 at 4:44 AM, Jack Krupansky j...@basetechnology.com wrote:
> What is the hex value for that second character returned that appears to display as an apostrophe? Hex 92 (decimal 146) is listed as Private Use 2, so who knows what it might display as.
Well, if they're dealing