Extracting character-level text coordinates

2020-02-18 Thread Luca Loiodice
Hello, I am having problems extracting precise character-level text coordinates from a PDF. I have overridden PDFTextStripper's writeString(String text, List<TextPosition> textPositions) to access the text character information. This is a bit of code I use to extract info from the TextPosition fields and pass i

Re: Detect Invisible Text (placed by tools which make searchable PDF)

2019-05-03 Thread Luca Loiodice
> https://stackoverflow.com/questions/50044892/pdfbox-invisible-text-from-pdftextstripper-not-clip-path-or-color-issue > > https://stackoverflow.com/questions/50487520/pdfbox-2-0-invisible-text-from-pdftextstripper > > Tilman > > Am 03.05.2019 um 17:02 schrieb Luca Loiodice: > > Hell

Detect Invisible Text (placed by tools which make searchable PDF)

2019-05-03 Thread Luca Loiodice
Hello, I need to remove (often low-quality) invisible text placed on images by tools which use OCR to make searchable PDFs. We use pdfbox ourselves to make searchable PDFs, and we use setRenderingMode(RenderingMode.NEITHER); when we place the text to make it invisible. We also use pdfbox's t

Re: Extract Skewed Text

2018-11-12 Thread Luca Loiodice
vailable as a snapshot for now. > > https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.13-SNAPSHOT/ > > Tilman > > Am 31.10.2018 um 21:19 schrieb Luca Loiodice: > > Is it possible to extract the 2 lines of text from this page?

Re: Extract Skewed Text

2018-10-31 Thread Luca Loiodice
her weaknesses. > > Tilman > > Am 31.10.2018 um 21:19 schrieb Luca Loiodice: > > Is it possible to extract the 2 lines of text from this page? > > https://www.dropbox.com/s/2uh3p464i7iwjwv/textonanangle.pdf?dl=0 > > > > This is the text lines I get using standard PdfStripp

Extract Skewed Text

2018-10-31 Thread Luca Loiodice
Is it possible to extract the 2 lines of text from this page? https://www.dropbox.com/s/2uh3p464i7iwjwv/textonanangle.pdf?dl=0 These are the text lines I get using the standard PdfStripper: Tex t on e o n a n a ngl e Text two on an angle Thanks a lot, Luca

Re: Cannot Render Pdf

2018-05-10 Thread Luca Loiodice
" (ms)"); return elapsed; On Wed, May 9, 2018 at 4:59 PM, Tilman Hausherr wrote: > I'm able to render your file with PDFDebugger and had -Xmx4g, and it takes > 30 seconds on a fast computer. Did you use that KCMS option? What jdk are > you using? > > Tilman > >

Cannot Render Pdf

2018-05-09 Thread Luca Loiodice
Hello, I have some files that I cannot render using version 2.0.9. I also tried adding the following: System.setProperty("org.apache.pdfbox.rendering.UsePureJavaCMYKConversion", "true"); and pdfRenderer.setSubsamplingAllowed(true); This is an example: https://www.dropbox.com/s/4i0gf0895viwk93/my_i

Issue with Pdf form requiring Acrobat Reader 7.0.5

2018-03-19 Thread Luca Loiodice
Hello, I have a form that I can view with the latest Acrobat Reader, but cannot view with Preview or older Acrobat Reader versions. When I render this using PDFBox, I get the same message I see in Preview: "Please Wait. If this message is not eventually replaced ... " Is there a way to render these doc

Re: Rendering an image at a lower resolution

2018-02-23 Thread Luca Loiodice
I have the same issue, while trying to just render images in AWS lambda (where there is limited memory available). So if there was a way to make this work, it would be great. On 2018/02/23 07:31:31, Itai wrote: > Hi,> > > I'm trying to use PDFBox to show a preview of some PDFs containing ver

Issue when rendering a pdf page as image

2018-01-18 Thread Luca Loiodice
Hello, I get an exception when I call pdfRenderer.renderImageWithDPI(pageIndex, 300, ImageType.RGB); on the 4th page (pageIndex 3) of the PDF https://www.dropbox.com/s/ut3ayyblsifsk36/my_inputfile.pdf?dl=0 This happens on an Amazon Linux instance (but not on my dev Mac machine) ...

Re: "No Unicode mapping for" when extracting text from a PDF

2018-01-04 Thread Luca Loiodice
the character. In that case I am not sure how I can load the data from the font ... but I see the debugger is able to do it.

"No Unicode mapping for" when extracting text from a PDF

2018-01-04 Thread Luca Loiodice
I am trying to migrate a project from a commercial Windows PDF library to PDFBox, but I see reduced accuracy when I extract text from arbitrary files. For example, I have a PDF (enclosed) that does not have Unicode mappings for certain glyphs ... and so when I try and extract the text using PDFBox