AW: PDF contains any text?

2010-02-04 Thread Roeder, Andreas
Erik, I wrote the following code: public boolean containsText() throws IOException { PDDocument document = null; try { document = PDDocument.load(pdfFile); PDFTextStripperByArea stripper = new PDFTextStripperByAr
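The code quoted above is cut off, so how it decides the result is not visible here. As a rough sketch of only the final step, assuming the stripper has already returned the extracted text as a `String`, the check could look like the following. The class `TextCheck` and method `hasVisibleText` are hypothetical names, not part of PDFBox.

```java
public class TextCheck {

    // Hypothetical helper: returns true if the extracted text contains at
    // least one character that is not whitespace. A null string counts as
    // "no text".
    static boolean hasVisibleText(String extracted) {
        if (extracted == null) {
            return false;
        }
        return extracted.chars().anyMatch(c -> !Character.isWhitespace(c));
    }

    public static void main(String[] args) {
        System.out.println(hasVisibleText("  \n\t "));
        System.out.println(hasVisibleText("  Seite 1 \n"));
    }
}
```

Note that a scanned PDF with no text layer would come back empty under this check, while a PDF whose text maps to unknown glyphs might still yield placeholder characters and count as containing text.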

Re: Question mark in the extracted text

2010-02-04 Thread Adam
Iain, If you send in the patches they'll be used. I've seen quite a few performance enhancement patches which will be released in version 1.0.0, which is being wrapped up now and should be released shortly. There's about a half dozen items where the patches need to be applied and tested and I

Re: Question mark in the extracted text

2010-02-04 Thread Iain Clapham
I get this a lot with "obscure" fonts - I would love to improve the font handling but worry that the project is not well controlled and any effort in this direction would be wasted. Who is producing 1.0.0 and WHEN ??? iaincc Villu Ruusmann wrote: Hello there, I'm using the text

Re: Question mark in the extracted text

2010-02-04 Thread Villu Ruusmann
Hello there,
> I'm using the text extraction of the Apache PDFBox 0.8.0 library.
> Unfortunately, the text extraction is replacing some signs and letters by '?'.
Without having seen the PDF file, I guess that the problem is that the "faulty" characters depend on a font which is not properly

Question mark in the extracted text

2010-02-04 Thread Christian Mewes
Hi, I'm using the text extraction of the Apache PDFBox 0.8.0 library. Unfortunately, the text extraction is replacing some signs and letters by '?'. The PDF file contains German text. I have extracted the text with the ExtractText.java example from the PDFBox package. Here is an exampl

Re: Conversion to display units

2010-02-04 Thread Leandro de Oliveira
I'm doing as you said: first I find the rectangular areas, converting their coordinates to display units, then I get the text from them. Thank you --- On Thu, 4/2/10, Villu Ruusmann wrote: > From: Villu Ruusmann > Subject: Re: Conversion to display units > To: users@pdfbox.apache.org > Date: Quinta-fe

Re: Conversion to display units

2010-02-04 Thread Villu Ruusmann
Hello there,
> I'm using PDFTextStripper to get text from a PDF document but I need to get
> text only from some regions in the PDF. I know these regions are being drawn
> using the "re" operator which draws a rectangle using x,y,width,height as
> arguments. How do I convert these four argume
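The reply is cut off here, so the following is not the answer given in the thread. It is a minimal sketch of one common conversion, under stated assumptions: the `re` operands are in default PDF user space (origin at the bottom-left, y pointing up), no transformation matrix or page rotation is in effect, and the page's lower-left corner is at (0, 0). The class `ReToRegion`, the method `toTopLeftOrigin`, and the `pageHeight` parameter are hypothetical names used for illustration.

```java
import java.awt.geom.Rectangle2D;

public class ReToRegion {

    // Converts the four "re" operands (x, y, width, height) from a
    // bottom-left-origin coordinate system into a rectangle whose origin
    // is the top-left corner of the page, with y growing downward.
    // Assumes an unrotated page and an identity transformation matrix.
    static Rectangle2D.Double toTopLeftOrigin(double x, double y,
                                              double width, double height,
                                              double pageHeight) {
        // "re" allows negative width or height, so normalise first.
        double left = Math.min(x, x + width);
        double bottom = Math.min(y, y + height);
        double w = Math.abs(width);
        double h = Math.abs(height);

        // The top edge in PDF space sits at (bottom + h); flip it.
        double top = pageHeight - (bottom + h);
        return new Rectangle2D.Double(left, top, w, h);
    }

    public static void main(String[] args) {
        // US Letter page height is 792 points.
        Rectangle2D.Double region = toTopLeftOrigin(72, 600, 200, 100, 792);
        System.out.println(region.getX() + ", " + region.getY()
                + ", " + region.getWidth() + ", " + region.getHeight());
    }
}
```

A rectangle produced this way could then be passed as a region to an area-based stripper, provided that stripper expects top-left-origin coordinates; that expectation should be checked against the PDFBox version in use.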

How to parse and manage Fonts

2010-02-04 Thread Leleu Eric
Hi! I would like to check whether an embedded font is damaged, and I want to access the font data (e.g. glyph widths). TrueType fonts aren't a problem (java.awt.Font is used to detect whether the font is damaged, and the TrueTypeFont class of FontBox is used to access the font data). I'm able to check if Type1 fonts