I would think that supporting the following PDF 2.0 features is highly relevant, 
given that other implementations are already generating PDF 2.0 files today 
(see https://pdfa.org/supporting-pdf20/):

* UTF-8 support for user-visible strings, such as bookmarks (outlines), OCG 
layer names, certain annotation fields, etc. Without it, the UTF-8 BOM and any 
non-ASCII characters are displayed or extracted as ugly "mojibake". See 
https://pdfa.org/understanding-utf-8-in-pdf-2-0. Note that this does NOT impact 
content streams or text extraction (unless you also combine with Logical 
Structure)!

* the latest encryption (AES-GCM 256 bit), dig-sig, hash algorithms, and 
Unicode password support, which are FAR more up-to-date and secure than all 
legacy crypto. Third parties are already generating such files, which PDFBox 
will otherwise be unable to read or process. I don't know what crypto library 
PDFBox uses, but all the new algorithms are standard modern crypto algorithms. 
Unicode passwords also follow the latest ICU (I don't know what you use 
internally), but be careful that legacy files continue to work.

* if high-quality semantic content extraction is important, then update for the 
PDF 2.0 standard structure types, although this is likely to be a larger dev 
effort.
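
To illustrate the UTF-8 item above, here is a minimal sketch of BOM-based 
decoding of a PDF text string. It is written in Python rather than Java, the 
helper name decode_pdf_text_string is hypothetical, and it says nothing about 
how PDFBox is actually structured. Latin-1 is used only as a stand-in for 
PDFDocEncoding, which differs from it in several code ranges.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a PDF text string (hypothetical helper)."""
    if raw.startswith(codecs.BOM_UTF8):
        # EF BB BF: UTF-8 text string, new in PDF 2.0. Drop the 3-byte BOM.
        return raw[3:].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_BE):
        # FE FF: UTF-16BE text string, as in earlier PDF versions.
        return raw[2:].decode("utf-16-be")
    # No BOM: PDFDocEncoding. ASSUMPTION: Latin-1 is a rough stand-in only;
    # a real implementation needs the full PDFDocEncoding table.
    return raw.decode("latin-1")


# A UTF-8 bookmark title as it might appear in a PDF 2.0 file.
title = b"\xef\xbb\xbfCaf\xc3\xa9"
print(decode_pdf_text_string(title))
# What a reader without UTF-8 support shows instead (the "mojibake").
print(title.decode("latin-1"))
```

The second print shows the BOM bytes and the two-byte "e acute" sequence each 
turned into separate Latin-1 characters, which is the symptom described above.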


If you want to address rendering issues, then:

* PLEASE PLEASE PLEASE fix your incorrect rendering of "fill and stroke" in the 
presence of transparency! See 
https://github.com/pdf-association/pdf-differences/blob/main/Atomic-Fill%2BStroke/README.md

* making sure you are using the correct blend mode formulae for ColorBurn and 
ColorDodge (from 2009 - I have not checked your implementation): 
https://github.com/pdf-association/pdf-differences/tree/main/ColorBurn-ColorDodge

* ensuring negative dash phase is handled correctly (earlier PDF specifications 
did not state what to do) - see 
https://github.com/pdf-association/pdf-differences/tree/main/Negative-DashPhase

* page-based OutputIntents. These are already in use, especially in 
print-centric workflows where page merging and imposition across PDFs is now 
far easier to do. Updating the implementation to select a page-based 
OutputIntent ahead of the document-level OutputIntent should be relatively easy.

* the use of transparency and blend mode of an annotation's appearance stream 
when rendered onto a page (not _within_ the annot). 
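
For the ColorBurn and ColorDodge item above, this is a rough sketch of the 
per-channel blend functions as I understand the corrected definitions (they 
match the ones in the W3C Compositing and Blending specification). It is in 
Python, the function names are my own, and the repository linked above should 
be treated as the reference. cb is the backdrop component and cs the source 
component, both in the range 0 to 1.

```python
def color_dodge(cb: float, cs: float) -> float:
    """ColorDodge blend function for one colour component."""
    if cb == 0.0:
        return 0.0  # a black backdrop stays black, even when cs == 1
    if cs >= 1.0:
        return 1.0  # avoids dividing by zero
    return min(1.0, cb / (1.0 - cs))


def color_burn(cb: float, cs: float) -> float:
    """ColorBurn blend function for one colour component."""
    if cb == 1.0:
        return 1.0  # a white backdrop stays white, even when cs == 0
    if cs <= 0.0:
        return 0.0  # avoids dividing by zero
    return 1.0 - min(1.0, (1.0 - cb) / cs)


# The corner cases are where implementations tend to disagree.
print(color_dodge(0.0, 1.0), color_burn(1.0, 0.0))
```

The order of the tests matters: checking the backdrop first is what decides 
the result when both special cases apply at once.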


Obviously there are other PDF 2.0 features but these would be my go-to short 
list for starting to address the most obvious visible differences. 
See also https://pdfa.org/how-to-get-started-with-pdf-2-0/ since reporting a 
simple PDF version is unlikely to withstand the test of time... 


Of course I am also biased 😊 - and I'm not a Java expert!
  

> -----Original Message-----
> From: Tilman Hausherr <thaush...@t-online.de>
> Sent: Thursday, November 9, 2023 3:35 AM
> To: users@pdfbox.apache.org
> Subject: Re: PDF 2.0, PDF/A-4 support
> 
> We don't have roadmaps. If you need a PDF 2.0 feature, tell us which one
> and why. PDF/A-4 isn't a topic because preflight isn't developed
> further. Use VeraPDF instead. You can create PDF/A-4 files like you can
> create PDF/A-1b files.
> 
> Tilman
> 
> On 08.11.2023 00:15, Gili Tzabari wrote:
> > Hi,
> >
> > I noticed that PDFBox 3.0 was recently released, but I can't tell what
> > the status/roadmap is for PDF 2.0 and PDF/A-4 support.
> >
> > Can someone in the know please let me know where we stand?
> >
> > Thanks,
> > Gili
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > For additional commands, e-mail: users-h...@pdfbox.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org


