Hi
I'm trying to figure out how to extract text from a PDF into an
accessibility tool
I've figured out how to walk the tagged structure returned by
GetStructTreeRoot, but I'm stuck on how to get from an integer marked content
identifier (PDF 32000 14.7.2) to the actual text.
Looks like I pr
a problem document is:
http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf
Any pointers or suggestions would be gratefully accepted.
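A minimal sketch of the MCID-to-text step, independent of PoDoFo. A structure element's integer MCID matches a /MCID entry in the property list of a BDC operator in a content stream. As I read the spec, the MCID identifies the sequence within one content stream, normally the page named by the element's /Pg entry. The text is whatever is shown between that BDC and its matching EMC.

This is a toy under stated assumptions:
- the stream is already decompressed;
- literal strings contain no escaped parentheses;
- text is shown only with Tj.

Real extraction also has to handle TJ arrays and font encodings. `Tokenize` and `TextForMcid` are hypothetical names, not PoDoFo API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a simplified, already-decompressed content stream into tokens.
// Assumptions: literal strings contain no escaped or nested parentheses,
// and there are no hex strings or inline images.
std::vector<std::string> Tokenize(const std::string& s) {
    std::vector<std::string> tokens;
    const std::string delimiters = "()<>[]/";
    std::size_t i = 0;
    while (i < s.size()) {
        const char c = s[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '(') {
            // Keep the whole literal string, parentheses included.
            std::size_t end = s.find(')', i);
            if (end == std::string::npos) end = s.size() - 1;
            tokens.push_back(s.substr(i, end - i + 1));
            i = end + 1;
        } else if ((c == '<' || c == '>') && i + 1 < s.size() && s[i + 1] == c) {
            tokens.push_back(s.substr(i, 2));  // "<<" or ">>"
            i += 2;
        } else {
            // Name (/MCID), number or operator: read up to the next whitespace
            // or delimiter. The first character is always taken, so a leading
            // '/' stays part of the name.
            const std::size_t start = i++;
            while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
                   delimiters.find(s[i]) == std::string::npos)
                ++i;
            tokens.push_back(s.substr(start, i - start));
        }
    }
    return tokens;
}

// Returns the text shown with Tj inside the marked-content sequence whose
// BDC property list contains /MCID mcid (everything up to the matching EMC).
std::string TextForMcid(const std::string& content, int mcid) {
    const std::vector<std::string> tokens = Tokenize(content);
    std::vector<int> open;   // one entry per open BMC/BDC; -1 if it has no MCID
    int pendingMcid = -1;    // /MCID seen in a property list, waiting for BDC
    std::string lastString;  // most recent literal-string operand
    std::string text;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        if (t == "/MCID" && i + 1 < tokens.size()) {
            pendingMcid = std::stoi(tokens[i + 1]);
        } else if (t == "BDC") {
            open.push_back(pendingMcid);
            pendingMcid = -1;
        } else if (t == "BMC") {
            open.push_back(-1);
        } else if (t == "EMC") {
            if (!open.empty()) open.pop_back();
        } else if (t.front() == '(') {
            lastString = t.substr(1, t.size() - 2);
        } else if (t == "Tj") {
            // The innermost open sequence that carries an MCID owns the text.
            int current = -1;
            for (auto it = open.rbegin(); it != open.rend(); ++it) {
                if (*it != -1) { current = *it; break; }
            }
            if (current == mcid) text += lastString;
        }
    }
    return text;
}
```

In practice the MCID comes from the structure element (or from an MCR dictionary in its /K entry), and the content stream comes from the page the element points to.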
Regards
Mark Rogers - mark.rog...@electrum.co.uk
--
Are y
There seems to be some sort of tagged text in there:
- the Read Out Loud feature of Adobe Reader does a good job of reading out the
document and synchronising the reading to highlighted text on the document
- the online PDF to HTML converter at Adobe gets all the document structure
right (includ
Filename, strLen+1);
Regards
Mark
Mark Rogers - mark.rog...@electrum.co.uk
Electrum Multimedia Ltd - http://www.electrum.co.uk
Registered in Scotland No 158435 Registered Office 50 Lothian Road
--
SOLARIS 10 is the OS
32000-1:2008 14.7.1)
* \returns PdfObject the MarkInfo dictionary
*/
PdfObject* GetMarkInfo() const { return GetNamedObjectFromCatalog( "MarkInfo" ); }
Regards
Mark
Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - http://www.powermapper.com
Registered
Filename=0x03825600) Line 160 + 0x3d bytes C++
PoDoFo::PdfMemDocument::PdfMemDocument(const wchar_t * pszFilename=0x03825600) Line 74 C++
Regards
Mark
pp
Add #include "PdfDefines.h" before first include.
Only tested on Visual Studio - don't have a linux tool chain currently (but
looks correct after a quick scan of the GCC documentation).
PS Congratulations on getting the 0.8.0 release out - worked very smoothly
apart from this issue.
ion::s_lNumActions
PdfElement.cpp
PdfElement::TypeNameToIndex faults if ppTypes[] contains sentinel NULL entries
and the name isn't found
Changes:
Line 98, change: if( strcmp( pszType, ppTypes[i] ) == 0 )
To: if( ppTypes[i] != NULL && strcmp( pszType, ppTypes[i] ) == 0 )
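The guarded comparison can be shown in isolation. Calling strcmp with a null pointer is undefined behaviour, so NULL placeholder entries have to be skipped before the call.

This is a hypothetical stand-in, not PoDoFo's actual TypeNameToIndex. The function name, parameter names and the fallback argument are all assumptions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for PdfElement::TypeNameToIndex: returns the index of
// pszType in ppTypes, or nFallback if it is not found. NULL placeholder
// entries are skipped instead of being passed to strcmp.
int TypeNameToIndexSketch(const char* pszType, const char** ppTypes, int nLen,
                          int nFallback) {
    if (pszType == nullptr) return nFallback;
    for (int i = 0; i < nLen; ++i) {
        if (ppTypes[i] != nullptr && std::strcmp(pszType, ppTypes[i]) == 0)
            return i;
    }
    return nFallback;
}
```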
Regards
M
STL allocator should do the trick, but that could break a lot of
existing code).
The other alternative is to require apps to use the shared CRT, but that pulls
in other dependencies (e.g. installer merge modules and the broken SxS
deployment model)
Regards
Mark
*/
PdfObject* GetLanguage() const { return GetNamedObjectFromCatalog( "Lang" ); }
Do simple accessors like this need a unit test?
Best Regards
Mark
adOnDemand = true in PdfParser::ParseFile if that's important)
Has anyone any insights into the problem? I'm happy to dig in and provide a
patch if I can work out what's wrong.
Best Regards
Mark
"m_nNumObjects is greater than m_nMaxObjects." );
Best Regards
Mark
agesTree::GetPageNode
.230
- if( !pObj->IsArray() )
+ if( pObj == NULL || !pObj->IsArray() )
Best Regards
Mark
---
}
.540 added
+ --m_nReadNextTrailerLevel;
}
Best Regards
Mark
--
numbers in my patches all refer to PoDoFo 0.9.1
Best Regards
Mark
-Original Message-
From: Dominik Seichter [mailto:domseich...@googlemail.com]
Sent: 15 July 2012 08:40
To: Mark Rogers
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for performance issue
Hi Mark,
I'm happy to check the patched files against my version if you email me them,
or let me know when they're committed to SVN
Best Regards
Mark
-Original Message-
From: Dominik Seichter [mailto:domseich...@googlemail.com]
Sent: 15 July 2012 08:36
To: Mark Rogers
C
Great - checked SVN against my source - all the changes look good.
Best Regards
Mark
-Original Message-
From: Dominik Seichter
", (long)(pBuffer - pStart));
//printf("pEnd=%li ", lBufferLen);
if ( nFirstObj >= 0 && nFirstObj < m_pOffsets->size() )
if( !
(*m_pOffsets)[static_cast(nFirstObj)].bParsed )
ReadXRefStreamEntry( pBuffer, lBufferLen, nW, static_cast(nFirstObj)
meters.pdf is read,
but I can see an XRefStm when displaying the PDF in a text editor
I'm happy to try to produce a patch, but some pointers on where to start
looking for the cause would be much appreciated.
Best Regards
Mark
f
The attached patch adds support for XRefStm to PdfParser::ReadNextTrailer() by
adding code between
MergeTrailer( &trailer );
and
if( trailer.GetDictionary().HasKey( "Prev" ) )
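As I read PDF 32000 7.5.8.4, a hybrid-reference file is searched in this order:
1. the classic table;
2. the stream named by /XRefStm;
3. the section named by /Prev.

That order can be modelled with a toy trailer chain. `ToyTrailer` and `XRefReadOrder` are hypothetical and only record offsets. The visited set is an assumed form of loop protection, standing in for whatever the real parser uses.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <stdexcept>
#include <vector>

// Toy model of a trailer: only the offsets it points to matter here.
struct ToyTrailer {
    std::optional<long> xrefStm;  // /XRefStm offset (hybrid-reference files)
    std::optional<long> prev;     // /Prev offset of the previous section
};

// Returns the offsets in the order they would be read, starting at startxref:
// each table, then its /XRefStm stream, then the /Prev section.
std::vector<long> XRefReadOrder(const std::map<long, ToyTrailer>& trailers,
                                long startxref) {
    std::vector<long> order;
    std::set<long> visited;  // guards against /Prev chains that loop
    std::optional<long> offset = startxref;
    while (offset) {
        if (!visited.insert(*offset).second)
            throw std::runtime_error("loop in /Prev chain");
        order.push_back(*offset);
        const ToyTrailer& t = trailers.at(*offset);
        if (t.xrefStm) order.push_back(*t.xrefStm);
        offset = t.prev;
    }
    return order;
}
```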
Best Regards
Mark
y when the "Kids"
array is missing
PdfPages.patch
Fixes an access violation in PdfPage::GetPageNumber when the "Kids" array is
missing
Best Regards
Mark
assumes is an XRefStm
Given that fixing this might uncover more problems, and it’s very close to
release day, I’d suggest keeping r1648 for the moment and I’ll submit a patch
after the release.
Does that sound ok?
Cheers
Mark
Best Regards
Mark
patch2.diff
Description: patch2.diff
patch1.diff
Description: patch1.diff
#else
# define PODOFO_LL_LITERAL(x) x##LL
# define PODOFO_ULL_LITERAL(x) x##ULL
#endif
#endif
RC.exe doesn't define __FILE__ when compiling - all C/C++ compilers do (and
PoDoFo already has dependencies on __FILE__ in PODOFO_RAISE_ERROR and
AddToCallstack calls)
Cheers
Mark
::GetCharacterLength() const
{
return this->IsUnicode() ? this->GetUnicodeLength() : this->GetLength();
}
pdf_long PdfString::GetUnicodeLength() const
{
// patch? if ( m_buffer.GetSize() == 0 ) return 0;
return (m_buffer.GetSize() / sizeof(pdf_utf16be)) - 1;
}
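The commented-out guard matters because of the `- 1`. Assume the buffer holds UTF-16BE code units followed by a two-byte terminator. An empty buffer then has no terminator, so the formula yields -1 instead of 0.

A hypothetical free-function version with the guard is below. It is not PdfString's API, and `long` stands in for pdf_long.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for PdfString::GetUnicodeLength. Assumes the buffer
// holds UTF-16BE code units plus a two-byte terminator, so the character
// count is (bytes / 2) - 1. Without the guard an empty buffer gives -1.
long UnicodeLengthSketch(const std::vector<unsigned char>& buffer) {
    if (buffer.empty()) return 0;
    return static_cast<long>(buffer.size() / 2) - 1;
}
```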
Best Regards
Mark
's supported by very old compilers
__FILE__ is guaranteed to be defined in C/C++ code by the C/C++ standard (first
standardised in C89)
__FILE__ is documented as undefined in the Windows Resource Compiler
documentation
PoDoFo C++ code won't compile if __FILE__ is undefined since it's
/podofo/PdfExtension.h is missing from SVN (running
create_forward_headers.sh should fix this)
Best Regards
Mark
-Original Message-
From
ehaviour is
bad (i.e. access faults or out of memory errors). Where the current behaviour
is reasonable there are no changes other than documenting the behaviour.
Best Regards
Mark
memory layout of member variables) but won’t affect source compatibility.
Do you want me to submit the patches?
Best Regards
Mark
From
tested compilation on XCode 8 / Clang and Visual C++ 2015
Cheers
Mark
On 06/01/2017, 08:30, "zyx" wrote:
On Thu,
mory usage for PdfName from 70 MB to about 4MB in
PDF32000_2008.pdf
Is this worth doing? Can you think of any problems this might cause?
Best Regards
Mark
If the dictionary keys are changed to PdfName& or PdfName* then it changes to
Total: 4 bytes per dictionary key on a 32-bit build (sizeof(PdfName*) =
sizeof(void*) = 4 bytes), with no heap overhead
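One way to get pointer-sized keys is to intern names in a pool and key dictionaries by pointer. This minimal sketch uses std::string rather than PdfName.

`NamePool` and `PooledDict` are hypothetical names. The ownership model, where the pool outlives every dictionary that uses it, is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>

// Each distinct name is stored once; callers keep a pointer to the pooled
// copy. Rehashing an unordered_set invalidates iterators but not pointers
// to elements, so the pointers stay valid while the pool is alive.
class NamePool {
public:
    const std::string* Intern(const std::string& name) {
        return &*m_names.insert(name).first;
    }
    std::size_t Size() const { return m_names.size(); }

private:
    std::unordered_set<std::string> m_names;
};

// A dictionary keyed by pooled-name pointers: each key costs one pointer.
// Ordering by address is arbitrary but consistent, which is enough for lookup.
using PooledDict = std::map<const std::string*, int>;
```

Because keys are compared by address, two dictionaries only agree on keys if they intern through the same pool.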
Best Regards
Mark
delete (*it).second;
++it;
}
m_mapKeys.clear();
}
}
Thoughts?
Best Regards
Mark
--
On 14/01/2017, 22:38, "
platforms.
Eliminating this will reduce memory requirements by 28%, but requires bigger
changes than just re-ordering members. I’ll look at that for next release.
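Re-ordering alone can recover some padding. The illustrative pair of structs below holds the same members in two orders. They are hypothetical, not PoDoFo's classes, and the exact sizes depend on the compiler's ABI.

```cpp
#include <cassert>
#include <cstdint>

// Small members placed before wider ones force padding after each of them.
struct Unordered {
    bool         flagA;
    double       value;
    bool         flagB;
    void*        ptr;
    std::uint8_t type;
};

// Same members with the widest first: the small ones share one padded tail.
struct Reordered {
    double       value;
    void*        ptr;
    bool         flagA;
    bool         flagB;
    std::uint8_t type;
};
```

On a typical x86-64 build the first struct is 40 bytes and the second 24.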
Cheers
Mark
--
On 17/01/2017, 08:17, "zyx" wrote:
On Mon, 2017-01-16 at 20:25 +0100, Dominik Seichter wrote:
> Please report any is
s deleting the assert or putting an #ifdef DEBUG round it:
#ifdef DEBUG
PODOFO_ASSERT( !m_pCanvas );
#endif
Best Regards
Mark
Fro
g an array or collection of
PdfPainters means some destructors are never called
Best Regards
Mark
On 22/01/2017, 11:02, "zyx"
of different compilers PoDoFo needs to support, I don’t think
you can ever safely throw an exception in a PoDoFo destructor
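A common pattern for destructors whose work can fail has two parts:
- catch everything inside the destructor and report the failure some other way;
- offer an explicit method for callers who want the exception.

Since C++11, destructors are implicitly noexcept unless a base or member destructor says otherwise. An escaping exception therefore calls std::terminate even outside stack unwinding.

`FinishDrawing` and `PainterGuard` are hypothetical names. This is not how PdfPainter is implemented.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical cleanup step that can fail (stands in for finishing a page).
void FinishDrawing(bool fail) {
    if (fail) throw std::runtime_error("finish failed");
}

class PainterGuard {
public:
    PainterGuard(bool failOnFinish, bool* failed)
        : m_failOnFinish(failOnFinish), m_failed(failed) {}

    // Explicit path: callers who want the exception call this themselves.
    void Finish() {
        m_finished = true;  // set first so the destructor never retries
        FinishDrawing(m_failOnFinish);
    }

    // Implicitly noexcept: nothing may escape, so failures are recorded.
    ~PainterGuard() {
        if (m_finished) return;
        try {
            FinishDrawing(m_failOnFinish);
        } catch (...) {
            if (m_failed) *m_failed = true;  // report, never rethrow
        }
    }

private:
    bool m_failOnFinish;
    bool* m_failed;
    bool m_finished = false;
};
```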
Best Regards
Mark
On 22/01/2017, 13:11, "Mark Rogers" wrote:
Hi
If a destructor called during stack unwinding throws an exception, then C++
guar
(EPdfDataType)m_eDataType;
I’ll do some testing then submit a patch.
Best Regards
Mark
On 16/02/2017, 18:50, "zyx" wrote:
On
Mark
On 19/03/2017, 18:51, "Mattia Rizzolo" wrote:
On Mon, Mar 13, 2017 at 01:39:00PM +0100, Mattia Rizzolo wrote:
>
hes since the pObj ==
pObj->GetParent() case is probably the most common, but the depth check covers
other types of loops in the “Parent” structure and protects against deeply
nested PDFs
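A minimal sketch of the two checks on a toy node type. The self-parent test catches the common case immediately, and the depth limit stops any longer cycle.

`Node`, `ParentDepth` and the limit of 256 are hypothetical. The walk is iterative, so the same idea fits the while loop in GetPageNumber, but it is an illustration rather than the submitted patch.

```cpp
#include <cassert>
#include <stdexcept>

// Toy node: only the "Parent" link matters here.
struct Node {
    const Node* parent = nullptr;
};

// Hypothetical limit; it should exceed any sane page-tree depth.
constexpr int kMaxParentDepth = 256;

// Returns the length of the Parent chain, or throws if it loops.
int ParentDepth(const Node* node) {
    int depth = 0;
    for (const Node* p = node->parent; p != nullptr; p = p->parent) {
        if (p == p->parent)  // cheap check for the most common self-loop
            throw std::runtime_error("node is its own parent");
        if (++depth > kMaxParentDepth)  // longer cycles or absurd nesting
            throw std::runtime_error("Parent chain too deep or cyclic");
    }
    return depth;
}
```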
Best Regards
Mark
--
VE-2017-6844 and CVE-2017-5853)
The patch may also resolve CVE-2017-5855, but I’ve not been able to confirm
that yet.
Best Regards
Mark
--
This fixes an out by one buffer read caused by string loop control using
for( i=0;i<=lStringLen;i++ )
instead of
for( i=0;i<lStringLen;i++ )
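The difference in isolation, as a hypothetical helper (`CopyChars` is not a PoDoFo function). With `<` the loop touches indices 0 to lStringLen - 1. With `<=` it also reads index lStringLen, one element past the data.

```cpp
#include <cassert>
#include <string>

// Copies exactly lStringLen characters. The bound is i < lStringLen;
// i <= lStringLen would read one element past the end of the data.
std::string CopyChars(const char* pszString, long lStringLen) {
    std::string out;
    for (long i = 0; i < lStringLen; i++)
        out += pszString[i];
    return out;
}
```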
patch-CVE-2917-7378.di
PdfPage::GetPageNumber goes into an infinite while loop if the “Parent” chain
contains a loop
This is caused by the same underlying problem as CVE-2017-5852 (although it’s an
infinite loop rather than infinite recursion)
Best Regards
Mark
--
Previously the encoding table for PdfSimpleEncoding contained 0x entries.
This was one entry too short to encode code point 0x
Best Regards
Mark
--
The revised patch doesn’t compile because it uses:
+PODOFO_ERROR_INFO( ePdfError_ValueOutOfRange,
+"xref subsection's given entry numbers together too large" );
instead of
+PODOFO_RAISE_ERROR_INFO ( ePdfError_ValueOutOfRange,
+"xref subsect
)
Best Regards
Mark
On 02/06/2017, 11:55, "zyx" wrote:
On Fri, 2017-06-02 at 11:58 +0200, Florian Hänel wrote:
> WARNI
0.9.6 – but the fix would be
making all the errors in ReadXRefSubSection all throw ePdfError_InvalidXRef or
all throw ePdfError_InvalidXRef.
I can also submit the parser unit tests now, but I was planning to wait
until the 0.9.6 release was complete
Cheers
Mark
--
ally to include the new tests.
Best Regards
Mark
--
On 13/04/2018, 21:58, "Mattia Rizzolo" wrote:
On Fri, Apr 13, 2018
std::vector::resize(count)
Without ASAN enabled std::vector::resize with a large count will throw a
std::bad_alloc and be caught by the catch( std::exception ) statement in
ReadXRefSubsection
Does this analysis make sense?
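If that analysis holds, one fix is to validate the count before resizing. Each classic xref table entry is exactly 20 bytes, so a subsection cannot claim more entries than the remaining bytes could hold.

`XRefEntry`, `ResizeForSubsection` and the `kMaxObjects` cap are hypothetical. The cap value is an assumption for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct XRefEntry {
    std::int64_t offset = 0;
    int generation = 0;
};

// Assumed cap on object numbers for this sketch.
constexpr std::int64_t kMaxObjects = 8388607;

// Grows entries to hold objects [firstObj, firstObj + count) only if the
// numbers are plausible, instead of letting resize() attempt a huge allocation.
void ResizeForSubsection(std::vector<XRefEntry>& entries, std::int64_t firstObj,
                         std::int64_t count, std::int64_t bytesRemaining) {
    if (firstObj < 0 || count < 0 || bytesRemaining < 0)
        throw std::invalid_argument("negative xref subsection value");
    // Each classic xref entry is exactly 20 bytes long.
    if (count > bytesRemaining / 20)
        throw std::invalid_argument("more entries than the file can contain");
    // Written to avoid overflow in firstObj + count.
    if (firstObj > kMaxObjects || count > kMaxObjects - firstObj)
        throw std::invalid_argument("object numbers out of range");
    const std::size_t needed = static_cast<std::size_t>(firstObj + count);
    if (needed > entries.size())
        entries.resize(needed);
}
```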
Best Regards
Mark
--
Regards
Mark
--
patch-CVE-2018-5296.diff
Description: patch-CVE-2018-529
writing
so it can produce PDFs that Adobe Reader can’t read.
Best Regards
Mark
--
Hello Mark, hel
support JPEG 2000 or XFA?
Best Regards
Mark
--
canOutOfMemoryKillUnitTests() at end of ParserTests.cpp
Best Regards
Mark
--
ParserTest.h
Description: Parser
o byte copy
// https://stackoverflow.com/a/3751937
memcpy(&pInputBuffer[2], m_buffer.GetBuffer(), 2 - 2);
Best Regards
Mark
--
PODOFO_HAVE_OPENSSL is defined by rethrowing exception in methods that didn’t
have try … catch previously.
Patches tested on Windows / Mac without OpenSSL support. Not tested on Linux.
Best Regards
Mark
to a method just involves adding the following local
variable to any method you need to guard:
PdfRecursionGuard guard;
Is option 3) worth investigating?
What does everyone think?
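A minimal RAII version of the idea, using a thread_local counter. Each thread tracks its own depth, so no mutex is needed around the increment and decrement.

`RecursionGuard`, `kMaxRecursionDepth` and `CountDown` are hypothetical. This is not the PdfRecursionGuard proposed here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical limit on nested guarded calls.
constexpr int kMaxRecursionDepth = 500;

// Declare one as a local variable at the top of any method that can recurse.
class RecursionGuard {
public:
    RecursionGuard() {
        if (++s_depth > kMaxRecursionDepth) {
            --s_depth;  // the destructor does not run if the constructor throws
            throw std::runtime_error("recursion too deep");
        }
    }
    ~RecursionGuard() { --s_depth; }
    RecursionGuard(const RecursionGuard&) = delete;
    RecursionGuard& operator=(const RecursionGuard&) = delete;

    static int Depth() { return s_depth; }

private:
    static thread_local int s_depth;  // one counter per thread
};

thread_local int RecursionGuard::s_depth = 0;

// Example recursive function protected by the guard.
int CountDown(int n) {
    RecursionGuard guard;
    return n == 0 ? 0 : 1 + CountDown(n - 1);
}
```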
Best Regards
Mark
--
ets/7/#df09 and the PdfParser unit tests
https://sourceforge.net/p/podofo/mailman/message/36298123/
Cheers
Mark
--
);
--s_nRecursionDepth; // PoDoFo is multi-threaded, so this needs to be protected by a mutex
#endif
}
Best Regards
Mark
From: Christopher Creutzig
Date
looping
PDF structures. We’ll submit these along with a patch - these tests make it
easy to experiment with different patches for the same issue.
Best Regards
Mark
I think the standard says it’s a macro:
https://en.cppreference.com/w/c/thread/thread_local
Best Regards
Mark
From: Christopher Creutzig
Date
797. This is caused by an invalid
negative value for one of the FlateDecode compression parameters which results
in a call to podofo_calloc( -14 ) == podofo_calloc( 0xfff2 )
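The wrap-around is the usual signed-to-unsigned conversion: -14 converted to size_t becomes SIZE_MAX - 13.

Below is a hypothetical checking helper that rejects negative dictionary values before they reach an allocator. `CheckedSize` is not a PoDoFo function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Converts a signed parameter read from a PDF dictionary (for example a
// FlateDecode decode parameter) into a size_t. Negative values are rejected
// instead of silently wrapping to a huge allocation size.
std::size_t CheckedSize(std::int64_t value) {
    if (value < 0)
        throw std::out_of_range("negative size parameter");
    if (static_cast<std::uint64_t>(value) > SIZE_MAX)  // only on 32-bit size_t
        throw std::out_of_range("size parameter too large");
    return static_cast<std::size_t>(value);
}
```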
Best Regards
Mark
).
Best Regards
Mark
From: Michal Sudolsky
Date: Thursday, 25 November 2021 at 18:25
To: Christopher Creutzig
Cc: "podofo-
:
ePdfError_InvalidXRef, /* The XRef table is invalid or recursion
is too deep */
1. I don’t think replacing ePdfError_InvalidXRef completely is an option, since
that gets thrown for invalid xrefs where recursion isn’t involved
Best Regards
Mark
--
her so produce the wrong buffer size (e.g. if nColumns=1,
m_nBPC=2 and m_nColors=SIZE_MAX/2+1).
This has been tested in production for a few months on Mac 64-bit / Windows
32-bit.
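That kind of wrong buffer size comes from multiplying sizes without an overflow check. Below is a sketch of checked multiplication applied to a predictor row size.

`CheckedMul` and `RowBytes` are hypothetical names. The row formula (columns × colors × bits per component, rounded up to whole bytes) is assumed to match PNG-style predictors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Multiplies two sizes, throwing instead of silently wrapping on overflow.
std::size_t CheckedMul(std::size_t a, std::size_t b) {
    if (a != 0 && b > SIZE_MAX / a)
        throw std::overflow_error("buffer size overflow");
    return a * b;
}

// Bytes per row: columns * colors * bits-per-component, rounded up to whole
// bytes without computing bits + 7 (which could itself overflow).
std::size_t RowBytes(std::size_t nColumns, std::size_t nColors, std::size_t nBPC) {
    const std::size_t bits = CheckedMul(CheckedMul(nColumns, nColors), nBPC);
    return bits / 8 + (bits % 8 != 0 ? 1 : 0);
}
```

With nColumns=1, nBPC=2 and nColors=SIZE_MAX/2+1, RowBytes throws std::overflow_error instead of returning a tiny size.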
Best Regards
Mark
continuing-work-defect-reports-and-clarifications
It might be ok to use selected C++20 features, but how easy is it to identify
which parts of the C++20 standard are stable and are available across the main
compilers?
Best Regards
Mark
--