Hi all, I'm indexing PDF files with SolrJ, and most of them work. With some files, however, I have a problem: strange characters end up in the stored content. This is the content that was stored: +++++++
Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1 Identifying Ideal Results Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2 Starting a Search Application A Lucid Imagination White Paper
+++++++

But if I open the PDF file itself, the content displays correctly. I think this is a charset encoding issue, but I don't know whether I can avoid this behaviour by applying a different analyzer or tokenizer at indexing time. I've had this problem with some documents downloaded from Lucid's website. Has anyone had the same problem, and does anyone know how to solve it?

Thanks.
Best regards
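
P.S. In case it helps with diagnosis, the unexpected characters can be identified by their Unicode code points. This is only a minimal sketch using the Java standard library, not part of my indexing code: the class name NonAsciiReport and the method describe are hypothetical, and the input string is a short excerpt of the stored content quoted above.

```java
import java.util.ArrayList;
import java.util.List;

class NonAsciiReport {

    // Returns one entry per distinct non-ASCII code point found in the text,
    // formatted as "U+XXXX NAME".
    static List<String> describe(String text) {
        List<String> report = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)   // keep only characters outside ASCII
            .distinct()
            .forEach(cp -> report.add(
                String.format("U+%04X %s", cp, Character.getName(cp))));
        return report;
    }

    public static void main(String[] args) {
        // Excerpt of the stored content; \u00A5 and \u221E are the two
        // unexpected characters, written as escapes to avoid source-encoding issues.
        String stored = "A Lucid Imagination White Paper \u00A5 April 2009 Page ii "
                + "Do You Need Full-text Search? \u221E \u221E \u221E";
        describe(stored).forEach(System.out::println);
    }
}
```

For the excerpt above this reports U+00A5 (YEN SIGN) and U+221E (INFINITY). Knowing the exact code points may help to tell whether the characters were already wrong after text extraction from the PDF, which would point away from the analyzer or tokenizer.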