[Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-23 Thread Mark Rogers
I've been trying to use PoDoFo to extract accessible text from PDFs. I have some PDF documents tagged for accessibility that show as tagged in the Adobe Reader document properties (Tagged: Yes), but PoDoFo::PdfDocument::GetStructTreeRoot returns null. My (limited) understanding of ISO 32000 is that the ta
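One way to narrow this down without PoDoFo is to look at the catalog dictionary itself for the two entries ISO 32000 associates with tagging: the /StructTreeRoot key and the /Marked flag of the /MarkInfo dictionary. Below is a minimal C++ sketch, assuming the catalog is available as uncompressed text. TaggingHints, findName and inspectCatalog are hypothetical names (not PoDoFo API), and this is a plain text scan, not a PDF parser. Which of these entries Reader uses for "Tagged: Yes" is not established here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: which tagging-related keys appear in a catalog.
struct TaggingHints {
    bool hasStructTreeRootKey = false;  // /StructTreeRoot key present at all
    bool markedTrue = false;            // /Marked true (normally inside /MarkInfo)
};

// Finds `key` as a whole PDF name (not a prefix of a longer name such as
// /MarkedX). On success, sets `after` to the index just past the key.
// Heuristic: treats only alphanumerics as continuing a name.
static bool findName(const std::string& text, const std::string& key, std::size_t& after) {
    std::size_t pos = 0;
    while ((pos = text.find(key, pos)) != std::string::npos) {
        std::size_t end = pos + key.size();
        if (end == text.size() || !std::isalnum(static_cast<unsigned char>(text[end]))) {
            after = end;
            return true;
        }
        pos = end;
    }
    return false;
}

// Scans raw catalog text; ignores strings, comments and dictionary nesting.
TaggingHints inspectCatalog(const std::string& catalog) {
    TaggingHints hints;
    std::size_t after = 0;
    hints.hasStructTreeRootKey = findName(catalog, "/StructTreeRoot", after);
    if (findName(catalog, "/Marked", after)) {
        // Skip whitespace between the key and its boolean value.
        while (after < catalog.size() && std::isspace(static_cast<unsigned char>(catalog[after]))) ++after;
        hints.markedTrue = catalog.compare(after, 4, "true") == 0;
    }
    return hints;
}
```

If hasStructTreeRootKey comes back true while GetStructTreeRoot() still returns null, that would point at resolving the referenced object rather than at the catalog entry itself.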

Re: [Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-23 Thread Dominik Seichter
According to the document catalog, the StructTreeRoot is in object 74 0 R, which is missing in this PDF. Maybe Acrobat Reader just checks whether there is a StructTreeRoot entry in the catalog to display that a document is tagged. But from my understanding there is no StructTreeRoot dictionary in thi
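The check described here (the catalog names 74 0 R, but no such object is in the file) can be sketched in C++ by reading the "N G R" reference after /StructTreeRoot and then searching the raw file for a matching "N G obj" header. ObjRef, structTreeRootRef and hasObjectHeader are hypothetical helpers, not PoDoFo API. It is a text-level heuristic under stated assumptions: the header is written with single spaces, and the object is not stored inside a compressed object stream (PDF 1.5+), where no "74 0 obj" header exists in the raw bytes. So a miss does not prove the object is absent.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

struct ObjRef { int num; int gen; };

// Parses the indirect reference following /StructTreeRoot in raw catalog
// text. Returns nullopt if the key is absent or not followed by "N G R".
std::optional<ObjRef> structTreeRootRef(const std::string& catalog) {
    const std::string key = "/StructTreeRoot";
    std::size_t pos = catalog.find(key);
    if (pos == std::string::npos) return std::nullopt;
    std::istringstream in(catalog.substr(pos + key.size()));
    ObjRef ref{};
    std::string r;
    // "R" may be glued to the next token, e.g. "74 0 R/MarkInfo".
    if (in >> ref.num >> ref.gen >> r && !r.empty() && r[0] == 'R') return ref;
    return std::nullopt;
}

// Looks for a textual "N G obj" header in the raw file bytes.
bool hasObjectHeader(const std::string& pdf, const ObjRef& ref) {
    const std::string header = std::to_string(ref.num) + " " + std::to_string(ref.gen) + " obj";
    std::size_t pos = 0;
    while ((pos = pdf.find(header, pos)) != std::string::npos) {
        // Reject matches such as "174 0 obj" when looking for object 74.
        if (pos == 0 || !std::isdigit(static_cast<unsigned char>(pdf[pos - 1]))) return true;
        pos += 1;
    }
    return false;
}
```

Running hasObjectHeader on both the original and a decompressed copy of the file would show whether the object only becomes visible once object streams are expanded.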

Re: [Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-23 Thread Mark Rogers
There seems to be some sort of tagged text in there:
- the Read Out Loud feature of Adobe Reader does a good job of reading out the document and synchronising the reading to highlighted text on the document
- the online PDF to HTML converter at Adobe gets all the document structure right (includ

Re: [Podofo-users] GetStructTreeRoot returns null on some tagged PDF documents?

2009-06-25 Thread Martin Schröder
2009/6/23, Dominik Seichter :
> According to the document catalog, the StructTreeRoot is in object 74 0 R
> which is missing in this PDF.

Uncompressing the file with pdftk unearths a StructTreeRoot object. Maybe it's there but not in the reference table?

Best
Martin
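To test the idea that the object exists in the file body but has no entry in the cross-reference table, one could parse the classic xref section and look for an in-use entry for object 74. A minimal C++ sketch, assuming the text from the "xref" keyword up to "trailer" has already been extracted; parseXrefSection is a hypothetical helper. It reads a single classic table only: it does not follow /Prev chains from incremental updates, and it does not read cross-reference streams (PDF 1.5+).

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses one classic cross-reference section ("xref", then subsections of
// "first count" followed by `count` entries "oooooooooo ggggg n|f").
// Returns object number -> byte offset for in-use ('n') entries only.
std::map<int, long> parseXrefSection(const std::string& text) {
    std::map<int, long> inUse;
    std::istringstream in(text);
    std::string token;
    if (!(in >> token) || token != "xref") return inUse;
    int first = 0, count = 0;
    // The loop stops when the next token (e.g. "trailer") is not a number.
    while (in >> first >> count) {
        for (int i = 0; i < count; ++i) {
            long offset = 0;
            int gen = 0;
            char type = 0;
            if (!(in >> offset >> gen >> type)) return inUse;
            if (type == 'n') inUse[first + i] = offset;  // free ('f') entries are skipped
        }
    }
    return inUse;
}
```

If object 74 has no in-use entry here while a "74 0 obj" header does appear in the file body, the body and the reference table disagree, which would match the suspicion above.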