I've been trying to use PoDoFo to extract accessible text from PDFs.
I have some PDF documents tagged for accessibility which show as tagged in
Adobe Reader properties (Tagged: Yes), but
PoDoFo::PdfDocument::GetStructTreeRoot returns null. My (limited)
understanding of ISO 32000 is
that the ta
According to the document catalog, the StructTreeRoot is in object 74 0 R
which is missing in this PDF.
Maybe Acrobat Reader just checks if there is a StructTreeRoot entry in the
catalog to display that a document is tagged. But from my understanding there
is no StructTreeRoot dictionary in thi
There seems to be some sort of tagged text in there:
- the Read Out Loud feature of Adobe Reader does a good job of reading out the
document and synchronising the reading to highlighted text on the document
- the online PDF to HTML converter at Adobe gets all the document structure
right (includ
2009/6/23, Dominik Seichter:
> According to the document catalog, the StructTreeRoot is in object 74 0 R
> which is missing in this PDF.
Uncompressing the file with pdftk unearths a StructTreeRoot object.
Maybe it's there but not in the reference table?
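One rough way to test this is to look at the raw bytes of the file before and
after uncompressing it (for example with "pdftk in.pdf output out.pdf
uncompress", where in.pdf and out.pdf are placeholder names). The Python
sketch below is only a plain-text scan, not a PDF parser, and the function
names are made up for illustration. It looks for the /StructTreeRoot reference
and then checks whether an object with that number appears as an ordinary
"N G obj" header. It cannot see anything stored inside compressed streams, so
a "not visible" result on the original file does not prove the object is
absent.

```python
import re
import sys


def find_struct_tree_root_ref(data: bytes):
    """Return (object number, generation) of the first /StructTreeRoot
    indirect reference found in plain text, or None if there is none."""
    m = re.search(rb"/StructTreeRoot\s+(\d+)\s+(\d+)\s+R", data)
    return (int(m.group(1)), int(m.group(2))) if m else None


def has_plain_object(data: bytes, num: int, gen: int) -> bool:
    """True if 'num gen obj' appears as an uncompressed object header."""
    pattern = (
        rb"(?<!\d)" + str(num).encode()
        + rb"\s+" + str(gen).encode() + rb"\s+obj\b"
    )
    return re.search(pattern, data) is not None


def check_struct_tree_root(data: bytes) -> str:
    """Summarise what the plain-text scan can tell about the structure tree root."""
    ref = find_struct_tree_root_ref(data)
    if ref is None:
        return "no /StructTreeRoot reference visible in plain text"
    num, gen = ref
    if has_plain_object(data, num, gen):
        return f"object {num} {gen} is present as a plain object"
    return f"object {num} {gen} is referenced but not visible as a plain object"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_root.py some.pdf
    with open(sys.argv[1], "rb") as f:
        print(check_struct_tree_root(f.read()))
```

Running it on both the original and the uncompressed copy should show whether
the object only becomes visible after uncompressing.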
Best
Martin