Another idea: if you have any interest, it would be great to get Apache Beam set up on our Rackspace VM (with Spark?) and use it for our regression tests.
-----Original Message-----
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 4:21 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs

Hi Tim,

Sure, once I get an initial PR ready I'll send an update and I'll explain what I did for a start, and we will discuss it further.