Re: [iText-questions] Missing spaces in extracted text

2009-07-22 Thread Alex Vigdor
Yes, I did use the version from SVN. Hopefully we can get Kevin's feedback. I've done a few more side-by-side comparisons with PDFBox, and while the "Tl < 200" logic seems entirely consistent, I don't think my change with Td is quite as solid: it has introduced extra spaces in a couple of PDFs.

Re: [iText-questions] Missing spaces in extracted text

2009-07-22 Thread 1T3XT info
Alex Vigdor wrote:
> Once again, I don't know if this is an ideal or even
> appropriate patch, not knowing the code deeply, but works in the cases I
> am testing.

I've looked at your changes, but I don't know the parser packages well enough to decide whether or not your approach is the best way

Re: [iText-questions] Missing spaces in extracted text

2009-07-21 Thread Alex Vigdor
One more follow-up: the words with 0 kerning that needed a space had 'Td' or new-line commands before them that were not being handled properly. I found another approach to fix those cases that doesn't introduce spaces in places where there is legitimately 0 kerning. The new patch follows. Once again, I
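
For readers following the thread, here is a minimal, self-contained Java sketch of the kind of heuristic being discussed. It is not the patch from this message and not iText's code. The class name, both method names, and the 200 threshold are assumptions made for illustration. The sketch relies on two facts from the PDF specification: numbers in a TJ array are expressed in thousandths of a text space unit and are subtracted from the current position (so a large negative number widens the gap), and Td moves the text position by an offset (tx, ty).

```java
import java.util.Arrays;
import java.util.List;

public class SpaceHeuristicSketch {
    // Hypothetical threshold, in thousandths of a text space unit (an assumption, not iText's value).
    static final double GAP_THRESHOLD = 200.0;

    /** Joins a TJ array: a String is shown text, a Number is a position adjustment. */
    static String joinTjArray(List<Object> elements) {
        StringBuilder out = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof String) {
                out.append((String) element);
            } else if (element instanceof Number) {
                double adjustment = ((Number) element).doubleValue();
                // TJ numbers are subtracted from the position, so a large
                // negative value means a wide gap; treat it as a word break.
                boolean wideGap = adjustment < -GAP_THRESHOLD;
                boolean needsSpace = out.length() > 0 && out.charAt(out.length() - 1) != ' ';
                if (wideGap && needsSpace) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    /** Appends text shown after a Td move, adding a separator if the position changed. */
    static void appendAfterMove(StringBuilder out, double tx, double ty, String text) {
        boolean moved = tx != 0 || ty != 0;
        boolean needsSeparator = out.length() > 0
                && !Character.isWhitespace(out.charAt(out.length() - 1));
        if (moved && needsSeparator) {
            // A vertical offset is treated as a new line, a horizontal one as a space.
            out.append(ty != 0 ? '\n' : ' ');
        }
        out.append(text);
    }

    public static void main(String[] args) {
        List<Object> tj = Arrays.<Object>asList("Hello", -250, "Wor", -20, "ld");
        System.out.println(joinTjArray(tj)); // prints "Hello World"
    }
}
```

Treating every Td move as a separator is deliberately crude. A real extractor would compare the offset against the width of the text already shown, which is probably why a simple Td rule can add unwanted spaces in some PDFs.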

Re: [iText-questions] Missing spaces in extracted text

2009-07-21 Thread Alex Vigdor
Sorry to respond so quickly to my own message, but I thought I would at least demonstrate a naive patch - obviously this would need to be validated against many other sources, but at least it solves this particular case. Interestingly, I observed that in some instances words that should be separate

[iText-questions] Missing spaces in extracted text

2009-07-21 Thread Alex Vigdor
Hello, I've begun experimenting with the PdfTextExtractor in iText as a replacement for PDFBox. So far I'm very pleased with the results in many cases; however, I've noticed several examples where all the words in the extracted text run together without spaces, so perhaps some tweaking is