Hi Michael,

The Nutch2Tutorial [1] only covers configuring HBase with Nutch. The 'readdb' command needs at least one option (-stats, -url, or -dump) to work with, which is why it printed the usage message.
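
For illustration, here is a minimal sketch of the three readdb modes listed in the usage output quoted below. The URL and the dump directory are placeholders, and the small run() wrapper is only an assumption for this example: it echoes each command and executes it only if bin/nutch is actually present, so the snippet is safe to try outside a Nutch install.

```shell
# Assumption: run from the Nutch 2 runtime directory, where bin/nutch lives.
NUTCH="${NUTCH:-bin/nutch}"

run() {
  # Print the command; execute it only when the nutch script is present.
  echo "+ $NUTCH $*"
  if [ -x "$NUTCH" ]; then
    "$NUTCH" "$@"
  fi
}

run readdb -stats                      # overall statistics for the webtable
run readdb -url http://example.com/    # placeholder URL: info on one entry
run readdb -dump /tmp/webtable_dump    # placeholder dir: dump webtable to text
```

Any of the additional flags from the usage text (-content, -headers, -links, -text) can be appended to the -dump call in the same way.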
Please check [2] for steps to crawl using Nutch 2 and HBase. There is also a patch in the issue [3] that adds a script for crawling with Nutch 2.

[1] - http://wiki.apache.org/nutch/Nutch2Tutorial
[2] - http://sujitpal.blogspot.com/2011/01/exploring-nutch-20-hbase-storage.html
[3] - https://issues.apache.org/jira/browse/NUTCH-1087

Hope this helps!

Regards,
Kiran.

On Mon, Jan 7, 2013 at 11:52 AM, Michael Gang <[email protected]> wrote:

> Hi all,
>
> I am trying to follow the Nutch 2 tutorial at
> http://wiki.apache.org/nutch/Nutch2Tutorial
> but the tutorial ends after the inject step, and I don't know how to
> continue from there.
>
> When I try to run
>
> nutch readdb
>
> I get an error:
>
> :bin/nutch readdb
> Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
>                 [-crawlId <id>] [-content] [-headers] [-links] [-text]
>     -crawlId <id>  - the id to prefix the schemas to operate on,
>                      (default: storage.crawl.id)
>     -stats [-sort] - print overall statistics to System.out
>         [-sort]    - list status sorted by host
>     -url <url>     - print information on <url> to System.out
>     -dump <out_dir> [-regex regex] - dump the webtable to a text file in
>                      <out_dir>
>         -content   - dump also raw content
>         -headers   - dump protocol headers
>         -links     - dump links
>         -text      - dump extracted text
>         [-regex]   - filter on the URL of the webtable entry
>
> I am wondering how I can configure Nutch so that it crawls a certain
> page and all of its child pages.
> I see that this is the topic of the tutorial
> http://wiki.apache.org/nutch/NutchTutorial
> but I am not sure from which point to continue, since in Nutch 2 I am
> working against HBase and not against a directory.
>
> Thanks,
> David
>

--
Kiran Chitturi

