[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056690#comment-15056690 ] Lewis John McGibbney commented on NUTCH-2184: This issue also improves command line parsing

Deploy a Nutch crawler or use Webhose.io?

2015-12-14 Thread Jon.P
Hi all, I need your advice! I need to harvest blog posts and news articles and extract the date, author, text, title and, if possible, the comments. The way I see it, I have two choices: deploy a Nutch crawler or, as a friend suggested, use Webhose.io. The