Re: Incorrect text extraction of the PDF

2020-01-22 Thread Slava G
Thanks Maruan, I got the explanation. Slava On Wed, Jan 22, 2020 at 12:18 PM Maruan Sahyoun wrote: > Hi, > > please take a look at the FAQ at > > https://pdfbox.apache.org/2.0/faq.html#how-come-i-am-getting-gibberishg38g43g36g51g5-when-extracting-text > > BR > Maruan > > > Hi, > > I have PDF, wh

Incorrect text extraction of the PDF

2020-01-22 Thread Slava G
Hi, I have a PDF which looks fine in readers, but when I try to extract text I get garbage. What am I doing wrong? The PDF is attached. Thanks

Re: Corrupted PDF file causing severe OOM

2019-05-16 Thread Slava G
Well, it seems that it'll be fixed in PDFBox 2.0.16. On Wed, May 15, 2019 at 5:35 PM Slava G wrote: > Will definitely try, is this RC available via Maven? > > On Wed, May 15, 2019, 17:20 Tim Allison wrote: > >> Yay! Tilman and colleagues on PDFBox really are _that_ fast.

Re: Corrupted PDF file causing severe OOM

2019-05-15 Thread Slava G
Got you. Thanks On Thu, May 16, 2019 at 6:42 AM Tilman Hausherr wrote: > On 15.05.2019 at 21:57, Slava G wrote: > > But I tried to extract text using 2.0.15 and immediately got an exception and > > didn't get an OOM. > > > I got slow response on the seco

Re: Corrupted PDF file causing severe OOM

2019-05-15 Thread Slava G
But I tried to extract text using 2.0.15 and immediately got an exception and didn't get an OOM. On Wed, May 15, 2019, 22:52 Tilman Hausherr wrote: > On 15.05.2019 at 16:00, Slava G wrote: > > But seems that in PDFBox 2.0.15 it's already fixed as, when I run > tika-app > >

Re: Corrupted PDF file causing severe OOM

2019-05-15 Thread Slava G
org/thread.html/2c027535156cc6862149490b289552d72ba5a9bff985fb7cce794e21@%3Cdev.tika.apache.org%3E > > On Wed, May 15, 2019 at 10:01 AM Slava G wrote: > > > Sure, I can share it privately. > > But seems that in PDFBox 2.0.15 it's already fixed as, when I run > tika-app > > (1.20) it's

Re: Corrupted PDF file causing severe OOM

2019-05-15 Thread Slava G
4:54 PM Tim Allison wrote: > Sounds like it might be a bug. > > PDFBox colleagues, any recs? > > Slava, if you’re able to share the file even if only privately, that’ll > help. > > On Wed, May 15, 2019 at 9:49 AM Slava G wrote: > > > I have small pdf file (142kb) whi

Re: Fwd: Very slow PDF parsing.

2019-02-28 Thread Slava G
Tim, which email address should I send the PDF to? Thanks On Thu, Feb 28, 2019 at 3:57 PM Slava G wrote: > I will, once I get the customer's approval. > Meanwhile I can do any checks, if you can specify what to check. > Thanks > > On Thu, Feb 28, 2019 at 3:56 PM Tim Allison

Re: Fwd: Very slow PDF parsing.

2019-02-28 Thread Slava G
I will, once I get the customer's approval. Meanwhile I can do any checks, if you can specify what to check. Thanks On Thu, Feb 28, 2019 at 3:56 PM Tim Allison wrote: > Any chance you can share the file directly with me or someone else on the > PDFBox team? > > On Wed, Feb 2

Re: Fwd: Very slow PDF parsing.

2019-02-27 Thread Slava G
rehoster (e.g. filedropper.com) and put the file > into an encrypted ZIP. Please send the link and the password to > tilman at snafu dot de. Make sure you're not breaking any laws by > sending the file. > > Tilman > > > On 27.02.2019 at 17:33, Slava G wrote: > > As this is c

Re: Fwd: Very slow PDF parsing.

2019-02-27 Thread Slava G
's going on. > > If you can't share it, you'll have to investigate yourself by using the > profiler. Before that, try with old 2.0.* versions to see if these are > faster. > > Tilman > > On 27.02.2019 at 17:23, Slava G wrote: > > After 3h 40m it's

Re: Fwd: Very slow PDF parsing.

2019-02-27 Thread Slava G
After 3h 40m it's still parsing using the PDFBox 2.0.14 app... Thanks On Wed, Feb 27, 2019 at 3:29 PM Slava G wrote: > With 2.0.14 it has been running for 40 minutes, no result, still working... > It seems the issue is still there. > Thanks > > On Wed, Feb 27, 2019 at 2:52 PM Slava G

Re: Fwd: Very slow PDF parsing.

2019-02-27 Thread Slava G
With 2.0.14 it has been running for 40 minutes, no result, still working... It seems the issue is still there. Thanks On Wed, Feb 27, 2019 at 2:52 PM Slava G wrote: > Checking with 2.0.14. Started as an app. Will update soon. > > On Wed, Feb 27, 2019 at 2:47 PM Tim Allison wrote: > >>

Re: Fwd: Very slow PDF parsing.

2019-02-27 Thread Slava G
b 27, 2019 at 3:04 AM Slava G wrote: > >> Well, I ran (as was suggested) the PDFBox app to extract text; so far 2 >> hours and still counting... >> It seems to be a PDFBox issue. >> >> On Wed, Feb 27, 2019 at 9:51 AM JB Data31 wrote: >> >>> Why

Re: Fwd: Very slow PDF parsing.

2019-02-26 Thread Slava G
ed message - >> > From: Tim Allison >> > Date: Tue, Feb 26, 2019 at 12:13 PM >> > Subject: Re: Very slow PDF parsing. >> > To: >> > >> > >> > Sorry...that's an OCR tool. One thing that can slow down processing >>