On Wednesday, January 13, 2016 at 3:52:48 PM UTC-5, John Scancella wrote:
>
> I tried searching but couldn't find which versions of PDF/A (if any)
> tesseract supports. Specifically I have a requirement for PDF/A-2a
> generation, but I couldn't find anywhere if tesseract can write PDF/A-2a
> compliant files, and if so how to tell it to do so. Any help is greatly
> appreciated.
>
PDF/A-2 is a profile of PDF 1.7, and Tesseract currently writes PDF 1.5 (although changing that is probably the easiest of the changes required). The metadata that Jeff mentions would probably need to be provided externally. For example, the document title, author, and so on would likely have to be supplied by the user.

One thing you might consider is using a tool like Adobe Acrobat Pro to conform Tesseract's output to the necessary standard.

Getting someone to update Tesseract to conform to an ISO standard is going to be difficult, since the standards are not freely available and must be purchased (ISO 19005-2:2011 <http://www.iso.org/iso/catalogue_detail?csnumber=50655> is 158 Swiss Francs).

Tom
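To see which PDF version a given output file declares, something like the following might help. It is only a rough sketch: the function name pdf_header_version is made up for illustration, it reads just the "%PDF-x.y" header line, and it says nothing about PDF/A conformance (a /Version entry in the document catalog can also override the header, and that is not checked here).

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in a PDF header such as b'%PDF-1.5', or None.

    Hypothetical helper: it inspects the header only and does not validate
    the file against PDF/A or any other profile.
    """
    # The header normally sits at the very start of the file, so only the
    # first 1024 bytes are searched.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


# Example with in-memory bytes that mimic the start of a PDF 1.5 file.
sample = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(pdf_header_version(sample))  # prints 1.5
```

For a real file, the same function can be given the bytes read from disk, e.g. open(path, "rb").read(2048), where path is whatever Tesseract wrote.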

