Hi Navin,

Crawl the data using the crawl command [0]. After that, use the readseg command [1], [2] to dump the segment contents to a text file. You can easily automate both steps with a shell script, Python, or any other scripting language.
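
For the automation part, here is a rough Python sketch of the two steps, not a tested recipe. It assumes a Nutch 1.x install with the script run from the Nutch home directory (so `bin/nutch` resolves), a seed directory named `urls`, and a crawl directory named `crawl`; those names and all the helper function names are my own placeholders. The `-depth`/`-topN` values are just the tutorial-style defaults, and the `-no*` switches ask readseg to keep only the parsed text.

```python
import subprocess
from pathlib import Path

# Assumption: the script is started from the Nutch 1.x home directory.
NUTCH = "bin/nutch"


def crawl_command(seed_dir, crawl_dir, depth=3, top_n=5):
    """Build the one-step crawl command (Nutch 1.x 'crawl')."""
    return [NUTCH, "crawl", str(seed_dir), "-dir", str(crawl_dir),
            "-depth", str(depth), "-topN", str(top_n)]


def list_segments(crawl_dir):
    """Return segment directories under <crawl_dir>/segments, sorted by name.

    Segment names are timestamps, so name order is also time order.
    """
    segments_root = Path(crawl_dir) / "segments"
    if not segments_root.is_dir():
        return []
    return sorted(p for p in segments_root.iterdir() if p.is_dir())


def readseg_dump_command(segment, out_dir, text_only=True):
    """Build the readseg command that dumps one segment as text."""
    cmd = [NUTCH, "readseg", "-dump", str(segment), str(out_dir)]
    if text_only:
        # Skip everything except the parsed text of each page.
        cmd += ["-nocontent", "-nofetch", "-nogenerate",
                "-noparse", "-noparsedata"]
    return cmd


def dump_all_segments(crawl_dir, out_root):
    """Run readseg on every segment and return the expected dump file paths."""
    dumps = []
    for segment in list_segments(crawl_dir):
        out_dir = Path(out_root) / segment.name
        subprocess.run(readseg_dump_command(segment, out_dir), check=True)
        # readseg writes a plain-text file named 'dump' in the output directory.
        dumps.append(out_dir / "dump")
    return dumps
```

To use it, you would call `subprocess.run(crawl_command("urls", "crawl"), check=True)` once, then `dump_all_segments("crawl", "dumps")`, and read the returned files like any other text file.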

[0]: Section 3.1 in http://wiki.apache.org/nutch/NutchTutorial
[1]: http://www.marco.bianchi.name/myPortal/using-the-binnutch-readseg-command.aspx
[2]: http://wiki.apache.org/nutch/bin/nutch_readseg

Thanks,
Tejas Patil

On Tue, Dec 25, 2012 at 9:47 PM, navinkumar <navinkumar...@gmail.com> wrote:

> Hi, I'm a newbie to Nutch. I have successfully installed and configured
> Nutch to crawl the sites. I want to get the data from the crawl:
> 1. Is there any way to get the data programmatically?
> 2. What is the command to extract the data into plain text?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Extract-data-in-nutch-tp4029072.html
> Sent from the Nutch - User mailing list archive at Nabble.com.