The file is public. GDrive is just being difficult. Here is a Dropbox link instead: https://www.dropbox.com/s/8thckx5crdc15ml/bb3ddd9a7de5aa494cd5611128e433ea8791c569.pdf?dl=0
I had a feeling the file might be corrupt. We’re processing over 6M PDFs with this, so we’re bound to find some edge cases.

Dirk

On August 1, 2016 at 11:48:51, Tilman Hausherr ([email protected]) wrote:

On 01.08.2016 at 20:20, Dirk Groeneveld wrote:
> https://drive.google.com/a/allenai.org/file/d/0BxI7RAiTuio0a1k2amhoa1kxS1U/view?usp=sharing
>
> I hope that works?

Yes, although it requires authorization. Is the file public or not?

> There are actually two concerns. Clearly it should not go into an infinite
> loop, so that’s concern one. But even if it does, it would be good if the
> thread was interruptible. It might already be. I have not tried that yet.

It isn't interruptible...

Your file is corrupt. The cross-reference table ends like this:

0000497410 00000 n
0000497457 00000 n
0000497532 00000 n
0000497579 00000 n
0000497654 00000 n
0000497701 00000 ¶ñw%–CÞ—ò.þ=^VPƒ»y2+‰6Aºo;-Ó›^€úrhf-d„lÍ£YYD lƒ}j¶xïÊÞúÊÿ\ü¡ËnP^P–ÜÓ(W=ÊÚò¶enIxGúiº9pÉN‘ÿ¶èˆ> ×À+sJ´ç7à <æ£Ùm/

Of course it shouldn't loop forever.

Tilman

> Cheers!
>
> On August 1, 2016 at 11:07:09, Tilman Hausherr ([email protected]) wrote:
>
> On 01.08.2016 at 19:59, Dirk Groeneveld wrote:
>> I found a PDF that causes PDFBox to go into an infinite loop. I
>> attached it to this email. The problem is easy to reproduce.
> PDF attachments are not allowed, please upload your file somewhere.
>
> Tilman
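For the batch-processing side of this, one way to keep a single bad file from stalling a run is to time-box each parse. The following is only a rough sketch using plain java.util.concurrent. The class name TimeBoxedTask and the method runWithTimeout are made up for illustration, and the task is a stand-in rather than a real PDFBox call. Note that Future.cancel(true) only delivers an interrupt, so a loop that never checks for it (as reported above) keeps running in the background. Daemon threads at least stop it from blocking JVM exit, and a separate worker process would be needed to actually kill it.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of PDFBox.
public class TimeBoxedTask {
    // Daemon threads, so a worker stuck in a loop cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeboxed-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task; returns empty if it does not finish within timeoutMillis. */
    public static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel(true) only sends an interrupt; code that never checks it keeps running.
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in tasks; a real task would load and process one PDF.
        Optional<String> ok = runWithTimeout(() -> "parsed", 1000);
        Optional<String> stuck = runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a parse that does not return in time
            return "never";
        }, 100);
        System.out.println(ok.isPresent() + " " + stuck.isPresent()); // prints: true false
    }
}
```

The caller can log the file name whenever the result is empty and move on to the next PDF, which also gives a list of edge cases to report.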

