Hi Michael,

The Nutch2Tutorial [1] only covers configuring HBase with Nutch. The
'readdb' command needs an option to work with (-stats, -url or -dump),
which is why it printed the usage message.
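
For example, here is a minimal sketch of readdb calls built from the options
in the usage message you pasted. It assumes you run it from the directory that
contains bin/nutch and have already injected your seed URLs. The helper name
readdb_examples, the NUTCH_BIN variable, the URL, and the output directory
are placeholders of mine, not anything Nutch defines:

```shell
# Path to the nutch launcher; override NUTCH_BIN if yours lives elsewhere (assumption).
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

readdb_examples() {
  # Print overall statistics for the webtable.
  "$NUTCH_BIN" readdb -stats

  # Print the stored information for a single URL (placeholder URL).
  "$NUTCH_BIN" readdb -url "http://example.com/"

  # Dump the webtable, including extracted text, into a directory (placeholder name).
  "$NUTCH_BIN" readdb -dump webtable_dump -text
}
```

Calling readdb_examples then runs the three commands in turn; you can of
course just type any one of them directly.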

Please check [2] for the steps to crawl using Nutch 2 and HBase. There is
also a patch in issue [3] that adds a script for crawling with Nutch 2.

[1] - http://wiki.apache.org/nutch/Nutch2Tutorial
[2] - http://sujitpal.blogspot.com/2011/01/exploring-nutch-20-hbase-storage.html
[3] - https://issues.apache.org/jira/browse/NUTCH-1087

Hope this helps!

Regards,
Kiran.

On Mon, Jan 7, 2013 at 11:52 AM, Michael Gang <[email protected]> wrote:

> Hi all,
>
> I am trying to follow the Nutch 2 tutorial at
> http://wiki.apache.org/nutch/Nutch2Tutorial
> but the tutorial ends after the inject step and I don't know how to
> continue from there.
>
> When I try to run
>
> nutch readdb
>
>
> I get an error
>
> :bin/nutch readdb
> Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
>                       [-crawlId <id>] [-content] [-headers] [-links] [-text]
>     -crawlId <id>  - the id to prefix the schemas to operate on,
>                      (default: storage.crawl.id)
>     -stats [-sort] - print overall statistics to System.out
>     [-sort]        - list status sorted by host
>     -url <url>     - print information on <url> to System.out
>     -dump <out_dir> [-regex regex] - dump the webtable to a text file in
>                      <out_dir>
>     -content       - dump also raw content
>     -headers       - dump protocol headers
>     -links         - dump links
>     -text          - dump extracted text
>     [-regex]       - filter on the URL of the webtable entry
>
> I am wondering how I can configure Nutch so that it crawls a certain
> page and all of its child pages.
> I see that this is covered in the tutorial at
> http://wiki.apache.org/nutch/NutchTutorial
> but I am not sure from which point to continue, as in Nutch 2 I am working
> against HBase and not against a directory.
>
> Thanks,
> David
>



-- 
Kiran Chitturi
