Hello, this is a Tika question. I am not sure it is possible, but it might be. Please ask on the Tika mailing list.

Markus
-----Original message-----
> From: A Laxmi <a.lakshmi...@gmail.com>
> Sent: Wednesday 18th May 2016 19:46
> To: user@nutch.apache.org
> Subject: Nutch crawl line breaks
>
> Hi,
>
> I have crawled PDFs using Nutch 1.7. I found that "content" field has no
> line breaks. It grabbed all the paragraphs in the PDF as one aggregated
> paragraph without line breaks. Is it possible to crawl such that the
> "content" field has line breaks the way it appears in the original PDF?
>
> Please advise.
>
> Thanks,
> AL
>