On Wednesday, January 13, 2016 at 3:52:48 PM UTC-5, John Scancella wrote:
>
> I tried searching but couldn't find which versions of PDF/A (if any) 
> tesseract supports. Specifically I have a requirement for PDF/A-2a 
> generation, but I couldn't find anywhere if tesseract can write PDF/A-2a 
> compliant files, and if so how to tell it to do so. Any help is greatly 
> appreciated.
>

PDF/A-2 is a profile of PDF 1.7, whereas Tesseract currently writes PDF 1.5 
(although changing the version is probably the easiest of the changes 
required).
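
To see which version a given output file declares, you can look at its 
header line. The following is only a rough sketch: the function name 
pdf_header_version is made up for illustration, and it assumes the 
"%PDF-x.y" header sits within the first 1024 bytes. Note that a /Version 
entry in the document catalog can override the header, which this check 
does not look at.

```python
import re


def pdf_header_version(data: bytes):
    """Return the (major, minor) version from a %PDF-x.y header, or None.

    Hypothetical helper for illustration; it only inspects the header and
    ignores any /Version entry in the document catalog.
    """
    # Assumption: the header appears somewhere in the first 1024 bytes.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Example with an in-memory header instead of a real file.
sample = b"%PDF-1.5\n1 0 obj\n"
print(pdf_header_version(sample))
```

For a real file, pass in the first kilobyte, e.g. 
open(path, "rb").read(1024).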

The metadata that Jeff mentions would probably need to be provided 
externally.  For example, the document title, author, etc. would likely 
have to be supplied by the user.
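
To give an idea of what user-supplied metadata could look like, here is a 
minimal sketch that fills a title and author into an XMP packet carrying 
the PDF/A identification properties (pdfaid:part and pdfaid:conformance). 
The names XMP_TEMPLATE and build_xmp are hypothetical and not part of 
Tesseract, and the packet by itself does not make a file PDF/A-2a 
compliant; it only illustrates the metadata portion.

```python
from xml.sax.saxutils import escape

# Hypothetical template: Dublin Core title/creator plus the PDF/A
# identification schema, declaring part 2, conformance level A.
XMP_TEMPLATE = """<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{title}</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>{author}</rdf:li></rdf:Seq></dc:creator>
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>A</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def build_xmp(title: str, author: str) -> str:
    """Return an XMP packet string with the user-supplied title and author."""
    # Escape &, < and > so the values cannot break the XML.
    return XMP_TEMPLATE.format(title=escape(title), author=escape(author))


print(build_xmp("Scanned report", "J. Doe"))
```

The title and author are passed in as arguments because they cannot be 
derived from the scanned image itself.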

One thing that you might consider is using a tool like Adobe Acrobat Pro to 
conform the output of Tesseract to the necessary standard.  Getting someone 
to update Tesseract to conform to an ISO standard is going to be difficult, 
since ISO standards are not freely available and need to be purchased 
(ISO 19005-2:2011 
<http://www.iso.org/iso/catalogue_detail?csnumber=50655> is 158 Swiss 
Francs).

Tom
