Hello - this is a Tika question, and I am not sure this is possible, but it
might just be. Please ask on the Tika mailing list.
Markus

 
 
-----Original message-----
> From: A Laxmi <a.lakshmi...@gmail.com>
> Sent: Wednesday 18th May 2016 19:46
> To: user@nutch.apache.org
> Subject: Nutch crawl line breaks
> 
> Hi,
> 
> I have crawled PDFs using Nutch 1.7. I found that "content" field has no
> line breaks. It grabbed all the paragraphs in the PDF as one aggregated
> paragraph without line breaks. Is it possible to crawl such that the
> "content" field has line breaks the way it appears in the original PDF?
> 
> Please advise.
> 
> Thanks,
> AL
> 
