The autoscaling feature of Beam and the job stealing (not their term) look to be fantastic for Tika jobs.
> Though, it actually does work, for me at least :-)

Have you tried the MockParser? That's where the fun really begins. Simulate an oom or permanent hang!

-----Original Message-----
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:27 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs

Hi Chris

I'm getting nervous now, what will happen to me if it will not work out in the end :-).
Though, it actually does work, for me at least :-)

Cheers, Sergey

On 19/05/17 17:23, Mattmann, Chris A (3010) wrote:
> Thanks Sergey what an awesome surprise you are the best!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> On 5/19/17, 9:11 AM, "Sergey Beryozkin" <sberyoz...@gmail.com> wrote:
>
> Hi Tim
> On 19/05/17 16:47, Allison, Timothy B. wrote:
> >
> >> Yes I was asking about it as I thought it was confusing it did not work
> >> - I saw you following up on this possible issue in the other email...
> > Y, I agree. That _should_ work.
> >
> >> I'm doing some work with Tika now so it was of an immediate interest to me...
> > Yay! What are you working on?
> >
> Was supposed to be a secret for few weeks but I'll let you know, but do
> not tell anyone please :-).
> Well, I'm trying to integrate Tika with
> Apache Beam, hoping to get something ready in a couple of weeks, if it
> won't make it to the Beam source then I'll create a standalone demo,
> will share the link either way...
> >> Sure. By the way I was not complaining...
> > I didn't take it that way at all! I apologize if anything I wrote
> > came across that way.
> >
> Np, my apologies instead :-), I thought may be I asked it the way which
> sounded like a 'why does it just not work' question which would indeed
> be strange to hear from a Tika committer (nearly veteran I should say
> :-)).
>
> Thanks, Sergey
> > Cheers,
> >
> > Tim