Iain,

If you send in the patches, they'll be used.  I've seen quite a few
performance enhancement patches that will be released in version 1.0.0,
which is being wrapped up now and should be out shortly.  There are
about half a dozen items where the patches need to be applied and tested,
and I believe the only other thing left to do is switch from Ant to Maven.
I'm not the one personally doing this and I don't have an official ETA,
but I'd say it'll likely be ready in the next two weeks (give or take).

Feel free to hop on the developer list and let them know that you'd like 
to help improve font handling and they can point you to the classes you'll 
want to update.  That will probably save you some time and it'll make sure 
everyone is on the same page in terms of how the changes will be 
implemented.

--Adam



From: Iain Clapham <iain.clap...@googlemail.com>
To: users@pdfbox.apache.org
Cc: Villu Ruusmann <villu.ruusm...@gmail.com>
Date: 02/04/2010 15:37
Subject: Re: Question mark in the extracted text



I get this a lot with "obscure" fonts - I would love to improve the font
handling but worry that the project is not well controlled and any effort
in this direction would be wasted.

Who is producing 1.0.0 and WHEN?

iaincc

Villu Ruusmann wrote:
> Hello there,
>
> 
>> I'm using the text extraction of the Apache PDFBox 0.8.0 library.
>> Unfortunately, the text extraction is replacing some signs and letters
>> by '?'.
>>
>> 
> Without having seen the PDF file, I guess that the problem is that the
> "faulty" characters depend on a font which is not properly supported
> by PDFBox 0.8.0 (the translation rules from bytes to character codes
> could be embedded into the font program; PDFBox does not know yet how
> to parse/interpret all types of font programs, so it bails out with a
> "?" instead).
>
> Hopefully the upcoming PDFBox 1.0.0 release is a bit more savvy in this
> regard.
>
>
> VR
>
> 


