Thanks for your reply. I am calling Apache Tika in Java code like this:

    public String extractPDFText(String faInputFileName) throws IOException, TikaException {
        // Handler for body text of the PDF article
        BodyContentHandler handler = new BodyContentHandler();
        // Metadata of the article
        Metadata metadata = new Metadata();
        // Input file path
        FileInputStream inputstream = new FileInputStream(new File(faInputFileName));
        // Parser context. It is used to parse InputStream
        ParseContext pcontext = new ParseContext();
        try {
            // Parsing the document using PDF parser from Tika.
            // Case statement will be added for handling other file types.
            PDFParser pdfparser = new PDFParser();
            // Do the parsing by calling the parse function of pdfparser
            pdfparser.parse(inputstream, handler, metadata, pcontext);
        } catch (Exception e) {
            System.out.println("Exception caught:");
        }
        // Convert the body handler to string and return the string to the calling function
        return handler.toString();
    }

Regards,

On Thu, Jun 8, 2017 at 4:29 PM, Nick Burch <apa...@gagravarr.org> wrote:

> On Thu, 8 Jun 2017, tesm...@gmail.com wrote:
>
>> My tika code is not extracting full body text of larger PDF files.
>>
>> Files more than 1 MB in size and around 20 pages are partially extracted.
>> Is there any limit on input PDF file size in tika
>>
>
> How are you calling Apache Tika? Direct java calls to TikaConfig +
> AutoDetectParser? Using the Tika facade class? Using the Tika App on the
> command line? Tika Server? Other?
>
> Nick
>
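A possible explanation for the partial extraction: the no-argument BodyContentHandler() constructor is documented as bounding its internal buffer at 100,000 characters. When that limit is reached it throws a SAXException, and the catch (Exception e) block in the method above prints "Exception caught:" without the exception itself, so the truncation goes unnoticed. The sketch below is a minimal variant, not a confirmed fix. It assumes the same PDFParser setup and passes -1 to BodyContentHandler(int writeLimit) to disable the limit. It also closes the stream with try-with-resources and lets exceptions reach the caller. The class name PdfTextExtractor is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class; a sketch, not the poster's actual code.
public class PdfTextExtractor {

    // Extracts the body text of a PDF without the 100,000-character cap
    // that the no-argument BodyContentHandler constructor applies.
    public static String extractPDFText(String inputFileName)
            throws IOException, TikaException, SAXException {
        // A write limit of -1 disables the cap, so long documents are not cut off
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        PDFParser parser = new PDFParser();

        // try-with-resources closes the stream even if parsing fails
        try (InputStream input = Files.newInputStream(Paths.get(inputFileName))) {
            parser.parse(input, handler, metadata, context);
        }
        // Errors now propagate to the caller instead of being swallowed
        return handler.toString();
    }
}
```

Because the method no longer catches exceptions, a caller sees a missing file, a parse failure, or any other error directly and does not receive silently truncated text.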