Re: [iText-questions] Clarification on PdfCopy.freeReader()

2010-04-29 Thread trumpetinc
ok - that's what it looked like. So nothing bad would happen, we'd just wind up with resource content streams getting added multiple times. I think that PdfSmartCopy mostly addresses the downsides of that... Thanks, - K Paulo Soares-3 wrote: > > freeReader() makes the writer instance forget

Re: [iText-questions] Performance when flattening form fields

2010-04-26 Thread trumpetinc
Mike - can we please reserve this thread for a technical discussion of the merits of the proposal? I'd be happy to have a conversation in a separate thread regarding how iText works. - K Mike Marchywka-2 wrote: > > > > > > > > > > > > > >> D

Re: [iText-questions] Performance when flattening form fields

2010-04-25 Thread trumpetinc
After more digging, I'm wondering if the place to do this wouldn't be in the PdfCopy.PageStamp class? It seems like PageStamp.alterContents() could do the same flattening operation that PdfStamper does. The ideal would be to factor out the behavior so the code isn't duplicated in both PdfCopy an

Re: [iText-questions] performance follow up

2010-04-24 Thread trumpetinc
hare? - K Giovanni Azua-2 wrote: > > Hello, > > On Apr 23, 2010, at 10:50 PM, trumpetinc wrote: >> Don't know if it'll make any difference, but the way you are reading the >> file >> is horribly inefficient. If the code you wrote is part of your test >> ti

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
I'd love to discuss specific ideas on prediction - are you familiar enough with the PDF spec to provide any suggestions? Some obvious ones are the xref table - but iText reads that entirely into memory one time and holds onto it, so it seems unlikely that pre-fetch would do much there (other than

Re: [iText-questions] AWESOME! performance follow up

2010-04-23 Thread trumpetinc
Don't know if it'll make any difference, but the way you are reading the file is horribly inefficient. If the code you wrote is part of your test times, you might want to re-try, but using this instead (I'm just tossing this together - there might be typos): ByteArrayOutputStream baos = new By
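The snippet above is cut off in the archive, so the poster's actual code is not recoverable here. As a rough illustration only of the general idea (copying a file into a ByteArrayOutputStream in large blocks rather than one byte at a time), a minimal sketch might look like the following. The class name FileSlurp, the method readFully, and the 8192-byte buffer size are all assumptions made for this example, and it uses newer Java features than were available in 2010.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the thread.
final class FileSlurp {

    // Reads the whole file into memory using block reads.
    static byte[] readFully(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192]; // assumed block size
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of stream
            while ((n = in.read(buf)) != -1) {
                baos.write(buf, 0, n);
            }
            return baos.toByteArray();
        }
    }
}
```

The resulting byte array could then be handed to whatever consumes the PDF, so that file I/O is kept out of the timed section of a test.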

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
sense to do some sort of paging... I'll have to think on that a bit. - K Giovanni Azua-2 wrote: > > Hello trumpetinc, > > On Apr 23, 2010, at 7:29 PM, trumpetinc wrote: > >> Giovanni - if your source PDFs are small enough, you might want to try >> this, >

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
Parsing PDF requires a lot of random access. It tends to be chunked - move to a particular offset in the file, then parse as a stream (this is why paging makes sense, and why memory mapping is effective until the file gets too big). But the parsing is incredibly complex. You can have nested obj
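To make the "move to an offset, then parse as a stream" access pattern concrete, here is a small sketch that memory-maps one window of a file and returns its bytes. It is only an illustration of the general technique and not how iText does it; MappedChunkReader and readChunk are hypothetical names. One relevant JDK limit is that a single FileChannel.map call cannot map more than Integer.MAX_VALUE bytes, which is one reason mapping windows (paging) becomes attractive for very large files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of iText.
final class MappedChunkReader {

    // Returns up to 'length' bytes starting at 'offset', clipped to the end of the file.
    static byte[] readChunk(Path path, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (offset < 0 || offset > size || length < 0) {
                throw new IllegalArgumentException("offset/length out of range");
            }
            int n = (int) Math.min((long) length, size - offset);
            if (n == 0) {
                return new byte[0];
            }
            // Map only the requested window rather than the whole file.
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, n);
            byte[] out = new byte[n];
            window.get(out); // sequential read within the mapped window
            return out;
        }
    }
}
```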

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
Nah - I'm not saying that memory is cheap (or that cache misses aren't important) - just saying that int -> char casting isn't the culprit here. The parser is a really low level algorithm that is responsible for reading int from the bytes of a file and figuring out the appropriate value to conver

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
determine where the bottleneck is. One thing that is quite clear here is that we need to have some sort of benchmark that we can use for evaluation - for example, if I had a good benchmark test, I would have just tried the ideas above to see how they fared. - K Giovanni Azua-2 wrote: > >

Re: [iText-questions] performance follow up

2010-04-23 Thread trumpetinc
Yes - it needs to be int. Regardless, we need to focus on the things that are actually consuming run time, and this method isn't one of them (no matter how much it could be optimized). Mike Marchywka-2 wrote: > > > > > does this have to be int vs char or byte? I think earlier I suggested >

Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc
The semantics are different (the JSE call includes more characters in its definition of whitespace than the PDF spec). Not saying that it can't be easily done, but throwing an if statement at it and seeing what impact it has on performance is pretty easy also. What was the overall time %age spe
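As a quick way to see the difference in semantics, the sketch below lists the byte values on which Java's Character.isWhitespace and the PDF definition of whitespace disagree. The PDF set used here (NUL, HT, LF, FF, CR, SP) comes from the PDF specification; the class and method names are made up for the example and are not iText code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison helper, not part of iText.
final class WhitespaceSemantics {

    // PDF whitespace characters: NUL (0), HT (9), LF (10), FF (12), CR (13), SP (32).
    static boolean isPdfWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Collects every value in 0..255 where the two definitions give different answers.
    static List<Integer> disagreements() {
        List<Integer> out = new ArrayList<>();
        for (int ch = 0; ch < 256; ch++) {
            if (Character.isWhitespace(ch) != isPdfWhitespace(ch)) {
                out.add(ch);
            }
        }
        return out;
    }
}
```

The disagreement runs in both directions: Java treats vertical tab (11) and the separators 28 through 31 as whitespace while PDF does not, and PDF treats NUL (0) as whitespace while Java does not. A drop-in swap would therefore change parsing behaviour.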

Re: [iText-questions] performance follow up

2010-04-22 Thread trumpetinc
I like your approach! A simple if (ch > 32) return false; at the very top would give the most bang for the least effort (if you do go the bitmask route, be sure to include unit tests!). I know there were a lot of calls to this method, but I'm curious: in your profiling, how much of the total pro
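For readers following along, here is a sketch of the two ideas mentioned in this post: an early exit for any value above 32, and a bitmask lookup. It assumes the PDF whitespace set from the spec (0, 9, 10, 12, 13, 32) and is not the actual iText source; PdfWhitespace and its method names are hypothetical.

```java
// Hypothetical sketch, not the iText implementation.
final class PdfWhitespace {

    // One bit per PDF whitespace character: NUL, HT, LF, FF, CR, SP.
    private static final long MASK =
            (1L << 0) | (1L << 9) | (1L << 10) | (1L << 12) | (1L << 13) | (1L << 32);

    // Straightforward chain of comparisons.
    static boolean isWhitespaceBaseline(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Same checks, but most bytes in a typical file are > 32 and exit after one comparison.
    static boolean isWhitespaceFastPath(int ch) {
        if (ch > 32) {
            return false;
        }
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Bitmask variant; the range check also rejects -1 (end of stream).
    static boolean isWhitespaceBitmask(int ch) {
        return ch >= 0 && ch <= 32 && ((MASK >>> ch) & 1L) != 0;
    }
}
```

In the spirit of the "include unit tests" advice, an exhaustive comparison of all three variants over -1 through 255 is cheap and catches an off-by-one in the mask. Whether either variant is measurably faster would need to be confirmed by profiling.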

Re: [iText-questions] Low level browsing of document structure

2010-03-30 Thread trumpetinc
Look at the parser package (com.itextpdf.text.pdf.parser) - you can start with the PdfContentReaderTool as a starting point. I think you'll find that this will greatly simplify your efforts. Only caveat: I don't know if the parser has been ported to iTextSharp yet. - K Mircea Zahan wrote: >

Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-20 Thread trumpetinc
said. > > So while swapping the rectangle coordinates twice is certainly ODD, it > doesn't look like there's anything genuinely broken in there... just an > "Even Number of Sign Errors". Those are fine as long as you find both of > them. Finding one and hav

Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc
As a point of clarification, I'm pretty sure that, in addition to swapping width and height, rotate() signals PdfDocument to add a rotation cm entry to the beginning of the content stream, and adjusts the rotation dictionary entry for the page. And I completely agree with the 'messy code for deal

Re: [iText-questions] Rotate Page After Adding Text to Document

2010-01-19 Thread trumpetinc
Random thought (and more of a mental exercise than a real solution): I wonder if it's possible to insert an object reference for the parameters to a cm operation in a content stream... I know, for example, that it's possible to do this with text operations, so I'd imagine that it's possible with

Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-16 Thread trumpetinc
For what it's worth, I've been able to create some pretty good content-based unit tests using the parser... I have a filtering parser that I've put together (haven't committed it yet) that allows you to specify a region of the page to extract text from. This makes it pretty easy to determine if

Re: [iText-questions] Unit Testing, Stress Testing, Profiling...

2010-01-14 Thread trumpetinc
For what it's worth, I've been able to create some pretty good content-based unit tests using the parser... I have a filtering parser that I've put together (haven't committed it yet) that allows you to specify a region of the page to extract text from. This makes it pretty easy to determine if

Re: [iText-questions] How do I get the position of an image in a PDF file?

2009-12-21 Thread trumpetinc
I just committed some new (highly experimental) code to SVN (rev 4221). See com.itextpdf.text.pdf.parser.ImageRenderListener To see an example of registering a RenderListener with a PdfContentParser, see PdfTextExtractor.getTextFromPage() in the same package (you'll pass your own RenderListener