Re: [SURVEY] PDFBox Use Cases

Hannes Erven Mon, 06 Jan 2014 10:19:35 -0800

Hi,


I'm using PDFBox in my client's project to:

- set crop/media boxes to automatically crop whitespace and/or unwanted content (the actual cut points are calculated with the Ghostscript bbox device and with text extraction from "suspect" unwanted areas)


- extract individual pages from foreign documents

- add overlays to existing documents (e.g. a "COPY" stamp on an invoice PDF, highlighting a particular area on a page, or "underlaying" a page with another document [e.g. 'business paper'])

- extract text from foreign documents (or parts of such documents) for full-text search


- "convert" images to PDF documents (in that case, one image per page)
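
The Ghostscript part of the cropping step in the first item could look roughly like the sketch below. This is a hedged illustration, not the code used in this project. It assumes Ghostscript was run with its bbox device (e.g. `gs -q -dBATCH -dNOPAUSE -sDEVICE=bbox input.pdf`), which reports one `%%HiResBoundingBox:` line per page. The class name `GsBBoxParser` and the `margin` padding are hypothetical, and only the Java standard library is used.

```java
import java.util.ArrayList;
import java.util.List;

public class GsBBoxParser {

    private static final String PREFIX = "%%HiResBoundingBox:";

    /** Box in PDF points: lower-left (llx, lly) and upper-right (urx, ury). */
    public static final class Box {
        public final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

    /**
     * Parses Ghostscript bbox-device output (one HiResBoundingBox line per page)
     * and pads every box by a hypothetical safety margin on all four sides.
     * The caller should still intersect the result with the page's media box.
     */
    public static List<Box> parse(String gsOutput, double margin) {
        List<Box> boxes = new ArrayList<>();
        for (String raw : gsOutput.split("\\R")) {
            String line = raw.trim();
            // Skip the integer %%BoundingBox lines and any other Ghostscript output.
            if (!line.startsWith(PREFIX)) {
                continue;
            }
            String[] parts = line.substring(PREFIX.length()).trim().split("\\s+");
            if (parts.length != 4) {
                continue; // malformed line: ignore rather than guess
            }
            double llx = Double.parseDouble(parts[0]);
            double lly = Double.parseDouble(parts[1]);
            double urx = Double.parseDouble(parts[2]);
            double ury = Double.parseDouble(parts[3]);
            boxes.add(new Box(llx - margin, lly - margin, urx + margin, ury + margin));
        }
        return boxes;
    }

    public static void main(String[] args) {
        String sample = "%%BoundingBox: 71 70 541 721\n"
                + "%%HiResBoundingBox: 71.500000 70.250000 540.750000 720.000000\n";
        for (Box b : parse(sample, 2.0)) {
            System.out.println(b.llx + " " + b.lly + " " + b.urx + " " + b.ury);
        }
    }
}
```

Assuming PDFBox 2.x, each box could then be applied with something like `page.setCropBox(new PDRectangle(x, y, width, height))`, using `float` values derived from `llx`, `lly`, `urx - llx` and `ury - lly`. The text-extraction check on "suspect" areas is not covered by this sketch.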

What I would like to do is to "optimize" a document in a way that removes everything that is not related to the currently "visible" (possibly cropped) area of the document, including metadata. I once asked about metadata removal on the mailing list (see http://mail-archives.apache.org/mod_mbox/pdfbox-dev/201307.mbox/%[email protected]%3E), but since that is still "only" a nice-to-have for my project, I have yet to look further into how to "write back the [modified] PDMetadata stream" (and then supply a patch ;-] ).

Anyway, for me PDFBox has always been a very valuable tool. This survey is a perfect occasion to say THANK YOU to the busy community!



Best regards,

        -hannes erven
