The file is public. GDrive is just being difficult. Here is a Dropbox link 
instead: 
https://www.dropbox.com/s/8thckx5crdc15ml/bb3ddd9a7de5aa494cd5611128e433ea8791c569.pdf?dl=0

I had a feeling the file might be corrupt. We’re processing over 6M PDFs with 
this, so we’re bound to find some edge cases.

Dirk

On August 1, 2016 at 11:48:51, Tilman Hausherr ([email protected]) wrote:

On 01.08.2016 at 20:20, Dirk Groeneveld wrote:
> https://drive.google.com/a/allenai.org/file/d/0BxI7RAiTuio0a1k2amhoa1kxS1U/view?usp=sharing
>   
>  
> I hope that works?  

Yes, although it requires authorization. Is the file public or not?  

>  
> There are actually two concerns. Clearly it should not go into an infinite 
> loop, so that’s concern one. But even if it does, it would be good if the 
> thread was interruptible. It might already be. I have not tried that yet.  

It isn't interruptible... Your file is corrupt; it contains this:

0000497410 00000 n  
0000497457 00000 n  
0000497532 00000 n  
0000497579 00000 n  
0000497654 00000 n  
0000497701 00000 ¶ñw%–CÞ—ò.þ=^VPƒ»y2+‰6Aºo;-Ó›^€úrhf-d„lÍ£YYD  
lƒ}j¶xïÊÞúÊÿ\ü¡ËnP^P–ÜÓ(W=ÊÚò¶enIxGúiº9pÉN‘Á¿¶èˆ> ×À+sJ´ç7à  
<æ£Ùm/  

Of course it shouldn't loop forever.
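
Until that is fixed, a caller processing many files could bound the wait from the outside. The following is a rough sketch, assuming the parse call is wrapped in a Callable; TimedTask and callWithTimeout are hypothetical names, not a PDFBox API. Because the parsing thread does not react to interruption, the sketch only stops waiting and leaves the stuck worker behind as a daemon thread, so it still burns CPU until the JVM exits. Running each parse in a separate process would be the more robust option.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not PDFBox code: waits for a task at most a fixed time.
final class TimedTask {
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) throws Exception {
        // Daemon thread, so a worker stuck in an endless loop cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pdf-worker");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            // Throws TimeoutException if the task has not finished in time.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Requests interruption; a task that ignores it simply keeps running.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The parse would then be passed in as the task, for example a lambda that loads the document, and the caller would catch TimeoutException to log the file and move on.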

Tilman  

>  
> Cheers!  
>  
> On August 1, 2016 at 11:07:09, Tilman Hausherr ([email protected]) wrote: 
>  
>  
> On 01.08.2016 at 19:59, Dirk Groeneveld wrote:
>> I found a PDF that causes PDFBox to go into an infinite loop. I  
>> attached it to this email. The problem is easy to reproduce.  
> PDF attachments are not allowed; please upload your file somewhere.
>  
> Tilman  
>  
>  
>  


---------------------------------------------------------------------  
To unsubscribe, e-mail: [email protected]  
For additional commands, e-mail: [email protected]  
