Re: Illegible decoding in some pdf documents

2010-05-17 Thread Thomas Fischer
Hello Andreas, >> Either way, these TeX-created documents seem to present specific challenges >> for PDFBox. Since we need to make these files available for full-text >> search, we would be very happy if their text extraction could be improved. >> I'm ready to help with tests and examples; I am af

Re: Re: Illegible decoding in some pdf documents

2010-05-17 Thread Andreas Lehmkühler
Hi, - original message Subject: Re: Illegible decoding in some pdf documents Sent: Sun, 16 May 2010 From: Thomas Fischer > Hello Andreas, > > I added some comments and files to > https://issues.apache.org/jira/browse/PDFBOX-534 > and created three new

Re: Illegible decoding in some pdf documents

2010-05-16 Thread Thomas Fischer
ple's PDF kit, though with the usual quirks >> that made me decide to use pdfbox in the first place. But this shows that >> the file can be transformed. >> But I'm not enough of an expert in either Java or the PDF format to really >> dig into the pdfbox code, s

Re: Re: Illegible decoding in some pdf documents

2010-05-15 Thread Andreas Lehmkuehler
2010 at 09:16, Andreas Lehmkühler wrote: Hi Thomas, ----- original message Subject: Illegible decoding in some pdf documents Sent: Tue, 11 May 2010 From: Thomas Fischer Hello, I sent this note last week and didn't receive any response, here is an updated version with some addit

Re: Illegible decoding in some pdf documents

2010-05-12 Thread Thomas Fischer
't be of much help there. All the best Thomas Fischer On 12.05.2010 at 09:16, Andreas Lehmkühler wrote: > Hi Thomas, > > ----- original message > Subject: Illegible decoding in some pdf documents > Sent: Tue, 11 May 2010 > From: Thomas Fischer >> Hello

Re: Illegible decoding in some pdf documents

2010-05-12 Thread Andreas Lehmkühler
Hi Thomas, - original message Subject: Illegible decoding in some pdf documents Sent: Tue, 11 May 2010 From: Thomas Fischer > Hello, > > I sent this note last week and didn't receive any response, here is an > updated version with some additional information.

Illegible decoding in some pdf documents

2010-05-11 Thread Thomas Fischer
Hello, I sent this note last week and didn't receive any response, here is an updated version with some additional information. To explain the context a little: I tried to extract text from 5091 mathematical PDF files. While I got some messages like "You do not have permission to extract text",