Re: Indexing PDF

Robert Muir Tue, 04 Oct 2011 12:27:51 -0700

Your Persian PDF problem is different, and it is already taken care of in PDFBox trunk:


https://issues.apache.org/jira/browse/PDFBOX-1127

On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo <ahmad.aji...@gmail.com> wrote:
> I have this problem too, when indexing some Persian PDF files.
>
> 2011/10/4 Héctor Trujillo <hecto...@gmail.com>
>
>> Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
>> some files I have problems because strange characters get stored. This is
>> the content that was stored:
>> +++++++
>>
>> Starting a Search Application
>>
>> 
>> Abstract
>>
>> Starting
>> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
>>
>> 
>> Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
>> Page ii Do You Need Full-text Search?
>>
>> ∞
>>
>> ∞
>> ∞
>>
>> Starting
>> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
>>
>> Identifying
>> Ideal Results
>>
>> Starting
>> a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
>>
>> 
>> Starting
>> a Search Application A Lucid Imagination White Paper
>>
>>
>> +++++++
>>
>> But if I open the PDF file, I can see the content correctly.
>>
>> I think this is a charset encoding issue, but I don't know whether I can
>> avoid this behaviour by applying a different analyzer or tokenizer at
>> indexing time.
>>
>> I've got this problem with some documents downloaded from Lucid's website.
>>
>> I don't know if anyone has had the same problem and knows how to solve it.
>>
>> Thanks
>>
>> Best regards
>>
>
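
For the odd symbols in the stored content quoted above, one possibility (an assumption, not something confirmed in this thread) is that the extractor emitted Unicode Private Use Area code points, which can happen when a PDF uses a symbol font without a usable Unicode mapping. A minimal sketch for checking extracted text before it is indexed, using only the JDK; the class name, method name, and sample string are hypothetical:

```java
public class PrivateUseCheck {

    /**
     * Counts code points whose Unicode general category is "Co" (private use).
     * Hypothetical helper for illustration; not part of SolrJ, Tika, or PDFBox.
     */
    public static int countPrivateUse(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.getType(cp) == Character.PRIVATE_USE) {
                count++;
            }
            // Advance by one or two chars, depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample: U+F0B7 stands in for a symbol-font bullet.
        String extracted = "Page i \uF0B7 Abstract";
        System.out.println("private-use code points: " + countPrivateUse(extracted));
    }
}
```

If the count is non-zero for the affected documents, the problem most likely sits in text extraction rather than in the analyzer or tokenizer, since those only see the text after extraction.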



-- 
lucidimagination.com
