date:20170608

Re: Limit on input PDF file size in Tika?

2017-06-08 Thread Nick Burch

On Thu, 8 Jun 2017, tesm...@gmail.com wrote:

Thanks for your reply. I am calling Apache Tika in Java code like this:

public String extractPDFText(String faInputFileName) throws IOException, TikaException {
    // Handler for body text of the PDF article
    BodyContentHandler handler = new

Re: Limit on input PDF file size in Tika?

2017-06-08 Thread tesm...@gmail.com

Thanks for your reply. I am calling Apache Tika in Java code like this:

public String extractPDFText(String faInputFileName) throws IOException, TikaException {
    // Handler for body text of the PDF article
    BodyContentHandler handler = new BodyContentHandler();
    // Metadata of the
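One thing worth checking with code like this: in Tika, the no-argument BodyContentHandler constructor caps the collected text at 100,000 characters, and BodyContentHandler(-1) disables that cap. Whether this is what truncates the PDFs in this thread is an assumption, since the rest of the method is not shown. The sketch below does not use Tika at all. It is a JDK-only illustration of how a character write limit on a SAX handler yields partial text, and WriteLimitSketch, LimitedTextHandler and collect are hypothetical names.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteLimitSketch {

    /** Hypothetical handler: keeps character events up to writeLimit (-1 means unlimited). */
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private final int writeLimit;

        LimitedTextHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (writeLimit < 0 || buffer.length() + length <= writeLimit) {
                buffer.append(ch, start, length);
                return;
            }
            // Keep only what still fits, then signal that the limit was hit.
            buffer.append(ch, start, writeLimit - buffer.length());
            throw new SAXException("write limit of " + writeLimit + " characters reached");
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /** Feeds text to the handler in small chunks, as a parser would, and returns what was kept. */
    public static String collect(String text, int writeLimit) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        char[] chars = text.toCharArray();
        try {
            for (int i = 0; i < chars.length; i += 4) {
                handler.characters(chars, i, Math.min(4, chars.length - i));
            }
        } catch (SAXException e) {
            // Swallowing the exception here is what makes the truncation silent.
        }
        return handler.toString();
    }

    public static void main(String[] args) {
        String text = "0123456789";
        System.out.println(collect(text, 6));   // limited: only part of the text survives
        System.out.println(collect(text, -1));  // unlimited: the full text survives
        // With Tika on the classpath, the analogous change would be:
        //   BodyContentHandler handler = new BodyContentHandler(-1);
    }
}
```

If the limit is the cause, a document whose extracted text exceeds the cap would come back partial while smaller ones look fine, which would match the symptom described for the roughly 20-page files.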

Re: Limit on input PDF file size in Tika?

2017-06-08 Thread Nick Burch

On Thu, 8 Jun 2017, tesm...@gmail.com wrote:

My Tika code is not extracting the full body text of larger PDF files. Files more than 1 MB in size and around 20 pages long are only partially extracted. Is there any limit on input PDF file size in Tika?

How are you calling Apache Tika? Direct Java calls to

Grobid with TXT and HTML files

2017-06-08 Thread tesm...@gmail.com

Dear Thamme,

https://grobid.readthedocs.io/en/latest/grobid-04-2015.pdf

The above presentation says that Grobid supports raw text. My input files are in TXT and HTML formats. Do you have any idea how this can be supported as raw text?

Regards,

On Wed, May 3, 2017 at 6:16 PM, Thamme Gowda