And will the ids be automatically stored in HBase?

2013/5/10 feng lu <[email protected]>

> Hi Adriana
>
> You can add metadata to each seed URL as tab-separated name=value pairs
> after the URL, like this:
>
> http://www.example.com  id=123
> http://www.example.com  id=456
>
> Each CrawlDatum can carry multiple metadata entries, and you can use them
> to store any information about a URL.
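
To make the seed format above concrete, here is a minimal sketch of how such a line splits into a URL and its metadata, assuming the fields are separated by tabs and each metadata field has the form name=value. The class SeedLineParser and its methods are hypothetical names used only for illustration; this is not Nutch's own injector code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Nutch).
public class SeedLineParser {

    /** Returns the URL, i.e. the first tab-separated field of a seed line. */
    public static String url(String line) {
        return line.split("\t")[0].trim();
    }

    /** Returns the name=value pairs that follow the URL on a seed line. */
    public static Map<String, String> metadata(String line) {
        Map<String, String> meta = new LinkedHashMap<>();
        String[] fields = line.split("\t");
        // Field 0 is the URL; every later field is expected to be name=value.
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // assumption: silently skip fields without a name
            }
            String name = fields[i].substring(0, eq).trim();
            String value = fields[i].substring(eq + 1).trim();
            meta.put(name, value);
        }
        return meta;
    }

    public static void main(String[] args) {
        String line = "http://www.example.com\tid=123";
        // Prints: http://www.example.com -> {id=123}
        System.out.println(url(line) + " -> " + metadata(line));
    }
}
```

Note that the separator between the URL and id=123 has to be a real tab character in the seed file; spaces would leave the whole line as a single field.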
>
> On Fri, May 10, 2013 at 5:26 PM, Adriana Farina
> <[email protected]> wrote:
>
> > Hello,
> >
> > I'm using Nutch 2.1 on top of Hadoop 1.0.4, with HBase 0.90.4 as the
> > storage system. I run Nutch in distributed mode.
> >
> > I need to associate an id with each url in the Nutch seed list and to
> > store this information in HBase. I think that I have to create a new
> > column family in HBase and modify the gora and hbase configuration
> > files in the nutch conf folder.
> >
> > However, I think I need to modify the code of Nutch, but I don't know
> > which classes I have to modify. I googled a bit, but I didn't find any
> > documentation; I've searched inside the code but I wasn't able to solve
> > my problem.
> >
> > Can anybody help me?
> >
> > Thank you!
> >
> >
> > --
> > Adriana Farina
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
Adriana Farina
