Re: Eyebrow-raising memory consumption exporting PDXObjectImages in PDFBox 1.8

2014-05-23 Thread Tilman Hausherr
Hi Tim, I'd recommend opening new issues anyway, especially for the second part. Please name ONE file that is corrupt, attach it, and include the code you are using. Note that you will need jai_imageio to write TIFF images. Tilman On 23.05.2014 18:18, Allison, Timothy B. wrote: All, Over on Ti
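Tilman's note that jai_imageio is needed for TIFF output can be checked at runtime before attempting an export. The sketch below asks the standard `javax.imageio.ImageIO` registry whether any plugin on the classpath can write a given format. The class and method names are hypothetical. On the Java 7/8 runtimes current at the time, a TIFF writer usually came from a plugin such as jai_imageio. Java 9 and later ship a built-in TIFF plugin.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;

public class TiffWriterCheck {
    /** Returns true if at least one registered ImageIO plugin can write the named format. */
    static boolean hasWriterFor(String formatName) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(formatName);
        return writers.hasNext();
    }

    public static void main(String[] args) {
        // If this prints false, a TIFF plugin (e.g. jai_imageio) is missing from the classpath.
        System.out.println("TIFF writer available: " + hasWriterFor("tiff"));
    }
}
```

Failing fast on a missing writer gives a clearer error than a silent or partial image export.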

question on what text is extractable (in comparison to other tools)

2014-05-23 Thread Trey Matteson
I'm trying to extract text from documents like http://tinyurl.com/nljnnyk, having started with code from the ExtractText tool. Unfortunately I find that only a small portion of the text is extracted, which seems to be related to which fonts are used. I've seen the FAQ related to failed text ex

Underlining text

2014-05-23 Thread Ryan Bair
I'm attempting to underline some text. From what I can see, PDF itself doesn't have much to offer in this area. InDesign and other applications seem to simply draw a line underneath the word. Doing this is a bit of a hassle, but certainly doable with a little work. What I don't see is how t
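The "draw a line underneath the word" approach reduces to a small geometry calculation. The following is a minimal sketch, not PDFBox API. It assumes the text width is known in 1/1000 text-space units, as PDFBox's `PDFont.getStringWidth` reports it. The class, the `underline` helper, and the offset and thickness ratios are all hypothetical. The ratios are arbitrary illustrative values, not font metrics.

```java
public class UnderlineGeometry {
    /** A line segment in PDF user-space units (1/72 inch), plus its stroke width. */
    static final class Segment {
        final float x1, y1, x2, y2, thickness;

        Segment(float x1, float y1, float x2, float y2, float thickness) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.thickness = thickness;
        }
    }

    // Hypothetical ratios, as fractions of the font size (not taken from any font).
    static final float OFFSET_RATIO = 0.1f;     // distance of the line below the baseline
    static final float THICKNESS_RATIO = 0.05f; // stroke width of the line

    /**
     * Computes an underline for text drawn at (x, baselineY).
     * widthIn1000ths is the string width in 1/1000 text-space units.
     */
    static Segment underline(float x, float baselineY, float widthIn1000ths, float fontSize) {
        float width = widthIn1000ths / 1000f * fontSize; // convert to user-space units
        float y = baselineY - OFFSET_RATIO * fontSize;   // place the line just below the baseline
        return new Segment(x, y, x + width, y, THICKNESS_RATIO * fontSize);
    }

    public static void main(String[] args) {
        Segment s = underline(100f, 700f, 2500f, 12f);
        System.out.println(s.x1 + "," + s.y1 + " -> " + s.x2 + "," + s.y2);
    }
}
```

The resulting segment could then be stroked in the page's content stream, for example with `PDPageContentStream`'s line-width and line-drawing methods in PDFBox 1.8. A real implementation might prefer the font's own underline position and thickness when the font provides them.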

Eyebrow-raising memory consumption exporting PDXObjectImages in PDFBox 1.8

2014-05-23 Thread Allison, Timothy B.
All, Over on Tika, we recently added the ability to export PDXObjectImages (TIKA-1268), as we already do with regular attachments. After we made the change, some users noticed eyebrow-raising memory consumption with certain files. We're currently using PDFBox 1.8.5. This 4MB file shows the

overwrite an overlay

2014-05-23 Thread Skrilay
Hey guys, first of all, PDFBox is awesome! But sadly I’m in the strange situation of having to delete a watermark (sometimes life is bad and you can’t choose the requirements). Here is my code, which should do this: public void setFile(File f) throws IOException { Fil