[ https://issues.apache.org/jira/browse/PDFBOX-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713746#comment-17713746 ]
Moritz Flöter edited comment on PDFBOX-5584 at 4/19/23 6:00 PM:
----------------------------------------------------------------

I do get that point. Depending on how deep the plugins are allowed to integrate into the Debugger, that could complicate future development. A lightweight approach that doesn't add much hindrance could be to only allow the following:
 * Pass the PDF as InputStream to the plugin, allow for an OutputStream to come back
 ** Plugins that do not take an OutputStream as input and only work on the InputStream cannot modify the document and are considered analysis plugins (e.g. validation through pdfcpu, ghostscript, perhaps also stuff like image extraction, rendering to images etc.)
_performAnalysis(InputStream input)_
 ** Plugins that write their result (the modified PDF file) to an OutputStream do modify the document and are considered editing plugins (e.g., removing all text, removing all images, removing pages, moving printboxes)
_performPdfModification(InputStream input, OutputStream output)_
 ** After the execution of an editing plugin, PDFDebugger opens the document received through the OutputStream and loads it into a PDDocument that is displayed in the GUI
 * Plugins are responsible for creating their own dialogs if parameter input is needed and are responsible for displaying analysis results
 * Plugins register as menu entry (similar to the screenshot)
 * As long as no editing plugin has been used after opening the file, the input stream passed to an analysis plugin should always be that of the original file. If an editing plugin has been used, the PDDocument gets serialized and passed to the plugin through the InputStream.

Adding stuff like context menu entries in the tree view depending on the selected object would certainly be nice in terms of functionality, but they would indeed introduce overhead for future development.


was (Author: moritzf):
I do get that point.
Depending on how deep the plugins are allowed to integrate into the Debugger, that could complicate future development. A lightweight approach that doesn't add much hindrance could be to only allow the following:
 * Pass the PDF as InputStream to the plugin, allow for an OutputStream to come back
 ** Plugins that do not return an OutputStream cannot modify the document and are considered analysis plugins (e.g. validation through pdfcpu, ghostscript, perhaps also stuff like image extraction, rendering to images etc.)
 ** Plugins that return an OutputStream do modify the document and are considered editing plugins (e.g., removing all text, removing all images, removing pages, moving printboxes)
 ** After the execution of an editing plugin, PDFDebugger opens the OutputStream and loads it into a PDDocument that is displayed in the GUI
 * Plugins are responsible for creating their own dialogs if parameter input is needed
 * Plugins register as menu entry (similar to the screenshot)
 * As long as no editing plugin has been used after opening the file, the input stream passed to an analysis plugin should always be that of the original file. If an editing plugin has been used, the PDDocument gets serialized and passed to the plugin through the InputStream.

Adding stuff like context menu entries in the tree view depending on the selected object would certainly be nice in terms of functionality, but they would indeed introduce overhead for future development.

> Plugins for PDFDebugger
> -----------------------
>
>                 Key: PDFBOX-5584
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5584
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Utilities
>            Reporter: Moritz Flöter
>            Priority: Minor
>         Attachments: 2023-04-12_09-00-01_explorer_bmps4tqOTT.png
>
>
> The PDFBox Debugger is a great tool for analyzing PDF documents due to its
> functionality and licence.
> However, it is constrained to what PDFBox itself can do.
> We extended the Debugger to accomplish some of the more frequent tasks
> needed for processing service tickets for our own software product.
> !2023-04-12_09-00-01_explorer_bmps4tqOTT.png!
> Some of the extended functionality relies on our proprietary PDF processing
> (this is completely separate from PDFBox) but other features rely on
> implementations around PDFBox functionality (such as drawing PrintBoxes or
> moving them, removing document security attributes for subsequent analysis in
> other tools, removing all text to get rid of sensitive data etc.).
> There is also functionality that relies on Java libraries like VeraPDF,
> OpenPDF or even calls to external command line tools like ghostscript and
> pdfcpu (the latter with bundled binaries, the former without because of GPL).
> We would very much like to publish and contribute plugins for the Debugger
> but as of now, everything is based on a direct extension (even using some
> reflection) of the PDFDebugger class and thus cannot be published as
> source (as it also relies on our proprietary PDF code).
> Furthermore, dependencies on external software or third-party PDF libraries
> really should not be directly integrated in the main PDFBox repositories and
> therefore I wouldn't know how to contribute back to the PDFBox project.
> I do however not see any harm in having such dependencies for plugins that
> are provided in different repositories and possibly developed and maintained
> by other developers. So I do see a benefit in having a plugin interface in
> PDFDebugger.
> Most likely the group of people making and using such plugins is rather small
> but I still wanted to run the idea by you in case you are interested.
> I am also willing to work on this feature if I am provided with some input as
> to what you expect.
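
The stream-based contract proposed in the comment could look roughly like the following Java sketch. Only the method names performAnalysis and performPdfModification come from the comment. The interface names AnalysisPlugin and EditingPlugin, the PluginHost class and its byte-array bookkeeping are assumptions made for illustration, and the toy plugins operate on placeholder bytes rather than a real PDF. A real host would additionally load the edited bytes into a PDDocument and refresh the GUI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical contract for plugins that only read the document. */
interface AnalysisPlugin {
    void performAnalysis(InputStream input) throws IOException;
}

/** Hypothetical contract for plugins that write a modified document. */
interface EditingPlugin {
    void performPdfModification(InputStream input, OutputStream output) throws IOException;
}

/** Hypothetical stand-in for the debugger side of the contract. */
final class PluginHost {
    // Bytes handed to the next plugin: the original file until an
    // editing plugin has run, the edited document afterwards.
    private byte[] currentBytes;
    private boolean edited;

    PluginHost(byte[] originalBytes) {
        this.currentBytes = originalBytes.clone();
    }

    /** Analysis plugins get a read-only view and cannot change the document. */
    void runAnalysis(AnalysisPlugin plugin) throws IOException {
        try (InputStream in = new ByteArrayInputStream(currentBytes)) {
            plugin.performAnalysis(in);
        }
    }

    /** Editing plugins write a new document, which replaces the current one. */
    void runEdit(EditingPlugin plugin) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(currentBytes)) {
            plugin.performPdfModification(in, out);
        }
        // Assumption: a real host would parse these bytes into a
        // PDDocument here and update the GUI.
        currentBytes = out.toByteArray();
        edited = true;
    }

    boolean isEdited() {
        return edited;
    }

    byte[] currentBytes() {
        return currentBytes.clone();
    }
}

final class PluginSketchDemo {
    public static void main(String[] args) throws IOException {
        // Placeholder content, not a valid PDF.
        PluginHost host = new PluginHost(
                "%PDF-placeholder\n".getBytes(StandardCharsets.US_ASCII));

        // Toy editing plugin: copies the input and appends a marker line.
        host.runEdit((in, out) -> {
            in.transferTo(out);
            out.write("%edited\n".getBytes(StandardCharsets.US_ASCII));
        });

        // Toy analysis plugin: reports the size of what it was given.
        host.runAnalysis(in ->
                System.out.println("analysis saw " + in.readAllBytes().length + " bytes"));
    }
}
```

Keeping the two contracts as separate single-method interfaces means the host can tell from the type alone whether a plugin is able to change the document, which matches the analysis/editing split described in the comment.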