Re: Limit on input PDF file size in Tika?

2017-06-08 Thread Nick Burch
On Thu, 8 Jun 2017, tesm...@gmail.com wrote: Thanks for your reply. I am calling Apache Tika in Java code like this: public String extractPDFText(String faInputFileName) throws IOException,TikaException { //Handler for body text of the PDF article BodyContentHandler handler = new

Re: Limit on input PDF file size in Tika?

2017-06-08 Thread tesm...@gmail.com
Thanks for your reply. I am calling Apache Tika in Java code like this: public String extractPDFText(String faInputFileName) throws IOException,TikaException { //Handler for body text of the PDF article BodyContentHandler handler = new BodyContentHandler(); //Metadata of the
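The replies in this listing are cut off, so the actual diagnosis isn't shown here. One well-known cause of this symptom, assuming the code uses the no-argument `new BodyContentHandler()` as in the snippet, is that this constructor caps the collected text at 100,000 characters; `new BodyContentHandler(-1)` removes the cap. The sketch below does not use Tika: `LimitedTextHandler` and its `extract` helper are hypothetical names, and the class only imitates the write-limit idea with the JDK's built-in SAX parser so the truncation can be observed without extra dependencies.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, JDK-only analogue of a write-limited text handler (NOT Tika's code).
// A writeLimit of -1 means "no limit", mirroring the BodyContentHandler(int) convention.
public class LimitedTextHandler extends DefaultHandler {
    private final int writeLimit;
    private final StringBuilder text = new StringBuilder();
    private boolean limitReached = false;

    public LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit < 0) {
            text.append(ch, start, length); // unlimited: keep every character
            return;
        }
        int remaining = writeLimit - text.length();
        if (length > remaining) {
            // Keep only the characters that still fit, then abort the parse.
            text.append(ch, start, remaining);
            limitReached = true;
            throw new SAXException("Write limit of " + writeLimit + " characters reached");
        }
        text.append(ch, start, length);
    }

    public boolean isLimitReached() {
        return limitReached;
    }

    public String getText() {
        return text.toString();
    }

    // Parses an XML string; hitting the limit is treated as a soft stop,
    // while any other SAXException (a real parse error) is rethrown.
    public static LimitedTextHandler extract(String xml, int writeLimit) throws Exception {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!handler.isLimitReached()) {
                throw e;
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>" + "x".repeat(150_000) + "</p></doc>";
        LimitedTextHandler capped = extract(xml, 100_000);
        LimitedTextHandler unlimited = extract(xml, -1);
        System.out.println(capped.isLimitReached() + " " + capped.getText().length());       // true 100000
        System.out.println(unlimited.isLimitReached() + " " + unlimited.getText().length()); // false 150000
    }
}
```

The capped handler stops at exactly 100,000 characters while the unlimited one keeps all 150,000. In the Tika code quoted above, the analogous change would be `new BodyContentHandler(-1)`, or a larger explicit limit if memory use on very large PDFs is a concern.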

Re: Limit on input PDF file size in Tika?

2017-06-08 Thread Nick Burch
On Thu, 8 Jun 2017, tesm...@gmail.com wrote: My tika code is not extracting full body text of larger PDF files. Files more than 1 MB in size and around 20 pages are partially extracted. Is there any limit on input PDF file size in tika? How are you calling Apache Tika? Direct java calls to

Grobid with TXT and HTML files

2017-06-08 Thread tesm...@gmail.com
Dear Thamme, https://grobid.readthedocs.io/en/latest/grobid-04-2015.pdf The above presentation says that Grobid supports raw text. My input files are in TXT and HTML formats. Do you have any idea how this can be supported as raw text? Regards, On Wed, May 3, 2017 at 6:16 PM, Thamme Gowda