Re: nutch 2 tutorial

Michael Gang Tue, 08 Jan 2013 04:45:32 -0800

Hi,

Thanks for your mail.
It helped me.



Thanks,
David


On Mon, Jan 7, 2013 at 8:30 PM, kiran chitturi <[email protected]> wrote:

> Hi Michael,
>
> The Nutch2Tutorial [1] only covers configuring HBase with Nutch. The
> 'readdb' command needs a parameter to work with.
>
> Please check [2] for steps to crawl using Nutch 2 and HBase. There is also
> a patch in issue [3] for using a script to crawl with Nutch 2.
>
> [1] - http://wiki.apache.org/nutch/Nutch2Tutorial
> [2] -
> http://sujitpal.blogspot.com/2011/01/exploring-nutch-20-hbase-storage.html
> [3] - https://issues.apache.org/jira/browse/NUTCH-1087
>
> Hope this helps!
>
> Regards,
> Kiran.
>
> On Mon, Jan 7, 2013 at 11:52 AM, Michael Gang <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I am trying to follow the Nutch 2 tutorial at
> > http://wiki.apache.org/nutch/Nutch2Tutorial
> > but the tutorial ends after the inject step, and I don't know how to
> > continue from there.
> >
> > When I try to run
> >
> > nutch readdb
> >
> >
> > I get an error
> >
> > :bin/nutch readdb
> > Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
> >                       [-crawlId <id>] [-content] [-headers] [-links] [-text]
> >     -crawlId <id>  - the id to prefix the schemas to operate on,
> >                      (default: storage.crawl.id)
> >     -stats [-sort] - print overall statistics to System.out
> >     [-sort]        - list status sorted by host
> >     -url <url>     - print information on <url> to System.out
> >     -dump <out_dir> [-regex regex] - dump the webtable to a text file in
> >                      <out_dir>
> >     -content       - dump also raw content
> >     -headers       - dump protocol headers
> >     -links         - dump links
> >     -text          - dump extracted text
> >     [-regex]       - filter on the URL of the webtable entry
> >
> > How can I configure Nutch to crawl a certain page and all of its child
> > pages?
> > I see that this is covered in the tutorial
> > http://wiki.apache.org/nutch/NutchTutorial
> > but I am not sure from which point to continue, since in Nutch 2 I am
> > working against HBase and not against a directory.
> >
> > Thanks,
> > David
> >
>
>
>
> --
> Kiran Chitturi
>
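
For reference, the usage text quoted above means that readdb must be given one of -stats, -url or -dump. A minimal sketch of the three modes follows, using only flags that appear in that usage text. The NUTCH variable, the example URL and the webtable_dump directory name are assumptions made for illustration, and the wrapper only prints the command when the launcher is not found.

```shell
#!/bin/sh
# NUTCH is a hypothetical variable for the launcher path (defaults to bin/nutch).
NUTCH="${NUTCH:-bin/nutch}"

# Run readdb with the given arguments, or print the command if the
# launcher is not present or not executable in the current directory.
readdb() {
    if [ -x "$NUTCH" ]; then
        "$NUTCH" readdb "$@"
    else
        echo "would run: $NUTCH readdb $*"
    fi
}

# Print overall statistics for the webtable.
readdb -stats

# Print the stored information for one URL (example URL, not from the thread).
readdb -url http://nutch.apache.org/

# Dump the webtable, including extracted text, into a directory
# (webtable_dump is a made-up name).
readdb -dump webtable_dump -text
```

Run it from the Nutch runtime directory so that bin/nutch resolves, or set NUTCH to the full path of the launcher.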
