ok - that's what it looked like. So nothing bad would happen, we'd just wind
up with resource content streams getting added multiple times. I think that
PdfSmartCopy mostly addresses the downsides of that...
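As I understand it, PdfSmartCopy helps here by caching resources it has already written so that identical ones are reused rather than duplicated. The sketch below shows that general idea (content hashing) in isolation. `StreamDeduper` and its object numbering are hypothetical, and this is not PdfSmartCopy's actual code. A production version would probably also compare the bytes on a digest match rather than trusting the hash alone.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: hand out one object number per distinct stream content. */
final class StreamDeduper {
    private final Map<String, Integer> refByDigest = new HashMap<>();
    private int nextObjectNumber = 1;

    /** Returns an object number, reusing an earlier one when the bytes are identical. */
    int add(byte[] streamBytes) {
        String key = digest(streamBytes);
        Integer existing = refByDigest.get(key);
        if (existing != null) {
            return existing; // same content already "written": share the reference
        }
        int ref = nextObjectNumber++;
        refByDigest.put(key, ref);
        return ref;
    }

    /** Number of distinct streams that would actually be written. */
    int distinctCount() {
        return refByDigest.size();
    }

    /** Hex-encoded MD5 of the stream bytes, used as the cache key. */
    private static String digest(byte[] data) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : d) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```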
Thanks,
- K
Paulo Soares-3 wrote:
>
> freeReader() makes the writer instance forget
Mike - can we please reserve this thread for a technical discussion of the
merits of the proposal?
I'd be happy to have a conversation in a separate thread regarding how iText
works.
- K
Mike Marchywka-2 wrote:
>
>> D
After more digging, I'm wondering if the place to do this wouldn't be in the
PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do
the same flattening operation that PdfStamper does.
The ideal would be to factor out the behavior so the code isn't duplicated
in both PdfCopy an
hare?
- K
Giovanni Azua-2 wrote:
>
> Hello,
>
> On Apr 23, 2010, at 10:50 PM, trumpetinc wrote:
>> Don't know if it'll make any difference, but the way you are reading the
>> file
>> is horribly inefficient. If the code you wrote is part of your test
>> ti
I'd love to discuss specific ideas on prediction - are you familiar enough
with the PDF spec to provide any suggestions?
Some obvious ones are the xref table - but iText reads that entirely into
memory one time and holds onto it, so it seems unlikely that pre-fetch would
do much there (other than
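For reference, a classic (non-stream) xref section is a list of subsections, each a header line `first count` followed by fixed-width entries: a 10-digit byte offset, a 5-digit generation number, and `n` (in use) or `f` (free). Once that is parsed into a map, every object lookup is a hash lookup plus a seek, which is why pre-fetching probably buys little there. A minimal sketch, assuming the section is already available as a string; `XrefSketch` is hypothetical and ignores cross-reference streams and incremental updates.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: parse a classic xref section into memory once. */
final class XrefSketch {
    /** Maps object number to byte offset for in-use ('n') entries. */
    static Map<Integer, Long> parse(String section) {
        Map<Integer, Long> offsets = new HashMap<>();
        // Entries may end in CR, LF or CRLF.
        String[] lines = section.trim().split("\\r?\\n|\\r");
        int i = 0;
        if (lines[i].trim().equals("xref")) {
            i++;
        }
        while (i < lines.length && !lines[i].trim().startsWith("trailer")) {
            // Subsection header: "<first object number> <entry count>"
            String[] header = lines[i++].trim().split("\\s+");
            int first = Integer.parseInt(header[0]);
            int count = Integer.parseInt(header[1]);
            for (int k = 0; k < count; k++) {
                // Entry: offset, generation, 'n' or 'f'
                String[] entry = lines[i++].trim().split("\\s+");
                if (entry[2].equals("n")) {
                    offsets.put(first + k, Long.parseLong(entry[0]));
                }
            }
        }
        return offsets;
    }
}
```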
Don't know if it'll make any difference, but the way you are reading the file
is horribly inefficient. If the code you wrote is part of your test times,
you might want to re-try, but using this instead (I'm just tossing this
together - there might be typos):
ByteArrayOutputStream baos = new By
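A minimal sketch of the buffered-read approach described here, assuming the goal is to get the whole file into a byte[] that can then be passed to `new PdfReader(bytes)`. The `ReadAll` class name and the 64 KB buffer size are arbitrary choices. On Java 7+, `java.nio.file.Files.readAllBytes(path)` does the same job in one call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadAll {
    /** Copies a whole stream into memory in large chunks instead of byte-by-byte reads. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            baos.write(buffer, 0, n); // write only the bytes actually read
        }
        return baos.toByteArray();
    }
}
```

Usage would be along the lines of opening a `FileInputStream` in a try-with-resources block, calling `ReadAll.readFully(in)`, and handing the result to the `PdfReader(byte[])` constructor.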
sense to do some sort of paging... I'll have to think on that a bit.
- K
Giovanni Azua-2 wrote:
>
> Hello trumpetinc,
>
> On Apr 23, 2010, at 7:29 PM, trumpetinc wrote:
>
>> Giovanni - if your source PDFs are small enough, you might want to try
>> this,
>
Parsing PDF requires a lot of random access. It tends to be chunked - move
to a particular offset in the file, then parse as a stream (this is why
paging makes sense, and why memory mapping is effective until the file gets
too big). But the parsing is incredibly complex. You can have nested
obj
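The paging idea can be sketched as a small cache of fixed-size pages over a RandomAccessFile. A seek to an arbitrary offset only touches the disk when that page is not already cached, and the stream-style parsing that follows runs out of memory. `PagedFile`, the page size, and the LRU eviction policy are all assumptions for illustration; `FileChannel.map` is the memory-mapping alternative mentioned above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: random access through a small LRU cache of fixed-size pages. */
final class PagedFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final long length;
    private final int pageSize;
    private final Map<Long, byte[]> pages;

    PagedFile(String path, int pageSize, int maxPages) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.length = file.length();
        this.pageSize = pageSize;
        // An access-ordered LinkedHashMap that drops the least recently used page.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the byte at offset as 0..255, or -1 outside the file. */
    int read(long offset) throws IOException {
        if (offset < 0 || offset >= length) return -1;
        long pageNo = offset / pageSize;
        byte[] page = pages.get(pageNo);
        if (page == null) {
            // Cache miss: one seek plus one bulk read fills the whole page.
            page = new byte[pageSize];
            file.seek(pageNo * pageSize);
            int filled = 0;
            while (filled < pageSize) {
                int n = file.read(page, filled, pageSize - filled);
                if (n == -1) break; // the last page may be short
                filled += n;
            }
            pages.put(pageNo, page);
        }
        return page[(int) (offset % pageSize)] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```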
Nah - I'm not saying that memory is cheap (or that cache misses aren't
important) - just saying that int -> char casting isn't the culprit here.
The parser is a really low level algorithm that is responsible for reading
int from the bytes of a file and figuring out the appropriate value to
conver
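To make the point concrete: a number token can be built directly from the int values of the bytes, comparing against `'0'`..`'9'` and accumulating, with no char or String involved. A minimal sketch; `IntTokenSketch` is hypothetical and ignores PDF reals (tokens with a decimal point).

```java
/** Hypothetical sketch: build an integer token straight from byte values. */
final class IntTokenSketch {
    /** Parses an optionally signed integer starting at pos; stops at the first non-digit. */
    static long parseInt(byte[] data, int pos) {
        boolean negative = false;
        if (pos < data.length && (data[pos] == '-' || data[pos] == '+')) {
            negative = data[pos] == '-';
            pos++;
        }
        long value = 0;
        while (pos < data.length) {
            int ch = data[pos] & 0xFF; // unsigned value of the byte, 0..255
            if (ch < '0' || ch > '9') break;
            value = value * 10 + (ch - '0');
            pos++;
        }
        return negative ? -value : value;
    }
}
```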
determine where the
bottleneck is.
One thing that is quite clear here is that we need to have some sort of
benchmark that we can use for evaluation - for example, if I had a good
benchmark test, I would have just tried the ideas above to see how they
fared.
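Even a crude harness would help compare ideas like these. A minimal sketch, assuming the task under test is wrapped in a `Runnable`: warm-up iterations give the JIT a chance to compile the hot path, and the median of several timed runs is less sensitive to GC pauses than a single measurement. `MiniBench` is hypothetical; for serious numbers, a dedicated harness such as JMH is the safer choice.

```java
import java.util.Arrays;

/** Hypothetical sketch of a tiny timing harness. */
final class MiniBench {
    /** Runs the task warmups times untimed, then returns the median of runs timed executions. */
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed: let the JIT compile hot paths first
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }
}
```

For example, `MiniBench.medianNanos(() -> parseAllPages(reader), 5, 21)`, where `parseAllPages` is a hypothetical stand-in for whatever parsing routine is being measured.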
- K
Giovanni Azua-2 wrote:
>
Yes - it needs to be int. Regardless, we need to focus on the things that
are actually consuming run time, and this method isn't one of them (no
matter how much it could be optimized).
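The reason it has to be int: `InputStream.read()` returns 0..255 for data and -1 at end of stream, so all 257 outcomes only fit in a wider type. A signed byte would turn 0xFF into -1 and collide with end-of-stream, and a char would turn -1 into 0xFFFF. A small sketch of the usual loop; `EofDemo` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;

/** Shows why a byte-reading method returns int: -1 must stay distinct from every byte value. */
final class EofDemo {
    /** Counts bytes until read() returns the -1 end-of-stream sentinel. */
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        int ch;
        while ((ch = in.read()) != -1) { // ch is 0..255 for real data, including 0xFF
            count++;
        }
        return count;
    }
}
```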
Mike Marchywka-2 wrote:
>
> does this have to be int vs char or byte? I think earlier I suggested
>
The semantics are different (the JSE call includes more characters in its
definition of whitespace than the PDF spec). Not saying that it can't be
easily done, but throwing an if statement at it and seeing what impact it
has on performance is pretty easy also.
What was the overall time %age spe
I like your approach! A simple if (ch > 32) return false; at the very top
would give the most bang for the least effort (if you do go the bitmask
route, be sure to include unit tests!).
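Both variants side by side, assuming the caller passes the int returned by `read()`. Because the input domain is only -1..255, an exhaustive agreement check is a cheap way to back the bitmask version with a test. The class and method names are hypothetical, not iText's.

```java
/** Hypothetical helpers: two equivalent PDF white-space tests with a fast path. */
final class PdfWhitespace {
    // Bit n is set when byte value n is PDF white space: NUL(0) HT(9) LF(10) FF(12) CR(13) SPACE(32).
    private static final long MASK =
            (1L << 0) | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    /** Comparison chain behind an early exit: every value above SPACE returns immediately. */
    static boolean isWhitespaceIf(int ch) {
        if (ch > 32) return false;
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** Bitmask version: a range check, then one shift and one AND. */
    static boolean isWhitespaceMask(int ch) {
        if (ch < 0 || ch > 32) return false; // also rejects -1 (end of stream)
        return ((MASK >>> ch) & 1L) != 0;
    }

    /** Exhaustive agreement check over every value read() can return. */
    static boolean variantsAgree() {
        for (int ch = -1; ch <= 255; ch++) {
            if (isWhitespaceIf(ch) != isWhitespaceMask(ch)) return false;
        }
        return true;
    }
}
```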
I know there were a lot of calls to this method, but I'm curious: in your
profiling, how much of the total pro
Look at the parser package (com.itextpdf.text.pdf.parser) - PdfContentReaderTool
is a good starting point. I think you'll find that
this will greatly simplify your efforts.
Only caveat: I don't know if the parser has been ported to iTextSharp yet.
- K
Mircea Zahan wrote:
>
said.
>
> So while swapping the rectangle coordinates twice is certainly ODD, it
> doesn't look like there's anything genuinely broken in there... just an
> "Even Number of Sign Errors". Those are fine as long as you find both of
> them. Finding one and hav
As a point of clarification, I'm pretty sure that, in addition to swapping
width and height, rotate() signals PdfDocument to add a rotation cm entry to
the beginning of the content stream, and adjusts the rotation dictionary
entry for the page.
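For the cm part, the math for a counterclockwise rotation by a multiple of 90 degrees puts cos, sin, -sin, cos in the a/b/c/d slots, plus a translation that moves the rotated box back to the origin; that translation is also where the width/height swap comes from. A sketch of the arithmetic only: `RotationCm` is hypothetical, and the exact matrix iText writes (and how it pairs with the /Rotate entry, which the spec defines as clockwise) may differ.

```java
/** Hypothetical sketch: cm operands rotating content CCW by a multiple of 90 degrees within a w x h box. */
final class RotationCm {
    /** Returns {a, b, c, d, e, f} for "a b c d e f cm"; x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] cmFor(int degrees, double w, double h) {
        switch (((degrees % 360) + 360) % 360) {
            case 0:
                return new double[] {1, 0, 0, 1, 0, 0};
            case 90:
                return new double[] {0, 1, -1, 0, h, 0}; // result box is h wide, w tall
            case 180:
                return new double[] {-1, 0, 0, -1, w, h};
            case 270:
                return new double[] {0, -1, 1, 0, 0, w}; // result box is h wide, w tall
            default:
                throw new IllegalArgumentException("multiple of 90 expected: " + degrees);
        }
    }

    /** Applies the matrix to a point (x, y). */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
    }
}
```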
And I completely agree with the 'messy code for deal
Random thought (and more of a mental exercise than a real solution): I
wonder if it's possible to insert an object reference for the parameters to
a cm operation in a content stream...
I know, for example, that it's possible to do this with text operations, so
I'd imagine that it's possible with
For what it's worth, I've been able to create some pretty good content-based
unit tests using the parser... I have a filtering parser that I've put
together (haven't committed it yet) that allows you to specify a region of
the page to extract text from. This makes it pretty easy to determine if
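The core of region filtering is a containment test on each text chunk's position. A minimal sketch, assuming the parser has already produced chunks with a user-space start point; `RegionFilter` and `Chunk` are hypothetical types, not classes from the parser package. A unit test can then assert on the text found in a known region of a known page.

```java
import java.util.List;

/** Hypothetical sketch: keep only text chunks whose start point lies inside a rectangle. */
final class RegionFilter {
    /** A piece of extracted text positioned in user space (hypothetical type). */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Concatenates, in list order, the chunks inside [llx, urx] x [lly, ury]. */
    static String textInRegion(List<Chunk> chunks, float llx, float lly, float urx, float ury) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            boolean inside = c.x >= llx && c.x <= urx && c.y >= lly && c.y <= ury;
            if (inside) {
                sb.append(c.text);
            }
        }
        return sb.toString();
    }
}
```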
I just committed some new (highly experimental) code to SVN (rev 4221).
See com.itextpdf.text.pdf.parser.ImageRenderListener
To see an example of registering a RenderListener with a PdfContentParser,
see PdfTextExtractor.getTextFromPage() in the same package (you'll pass your
own RenderListener
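The shape of that API is the classic listener pattern: the parser walks the content and calls back into whatever listener it was given, so collecting only images means supplying a listener that ignores text. A sketch of the pattern with stand-in types; `ListenerSketch`, `Listener` and `ImageCollector` are hypothetical and deliberately not iText's RenderListener or ImageRenderListener.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the listener pattern: a parser reports events, the caller supplies the listener. */
final class ListenerSketch {
    /** Stand-in for a render listener; not iText's interface. */
    interface Listener {
        void onText(String text);

        void onImage(String imageName);
    }

    /** Stand-in "parser": walks pre-tokenized {operator, operand} pairs and notifies the listener. */
    static void process(List<String[]> operations, Listener listener) {
        for (String[] op : operations) {
            if (op[0].equals("Tj")) {
                listener.onText(op[1]);
            } else if (op[0].equals("Do")) {
                // Do paints any XObject (image or form); this sketch treats every Do as an image.
                listener.onImage(op[1]);
            }
        }
    }

    /** A listener that only collects image names and ignores text. */
    static final class ImageCollector implements Listener {
        final List<String> images = new ArrayList<>();

        @Override
        public void onText(String text) {
        }

        @Override
        public void onImage(String imageName) {
            images.add(imageName);
        }
    }
}
```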