Re: Segments / Database in Nutch 2.X

Sznajder ForMailingList Sun, 30 Jun 2013 04:10:58 -0700

Thanks!


On Thu, Jun 27, 2013 at 3:03 PM, Tejas Patil <tejas.patil...@gmail.com> wrote:

> On Thu, Jun 27, 2013 at 3:38 AM, Sznajder ForMailingList <
> bs4mailingl...@gmail.com> wrote:
>
> > Hi
> >
> > I do not see the usage of "Segments" in Nutch 2.x.
> >
> > In addition, I do not see a DB path.
> >
>
> "segments" and "crawldb" are notions in 1.x representing the directories on
> the filesystem that hold the crawler's data (they are nothing but Hadoop
> MapFiles and SequenceFiles).
> 2.x leverages datastores to store the crawled data. A table is created in
> the datastore to hold all the information.
>
> >
> > In that case, how can we run two separate crawls, one starting from url1
> > and the second from another seed, for example?
> >
>
> You could specify different crawlIDs. To be honest, I have never tried
> running multiple crawls at the same time with 2.x.
> It's not considered a good thing to do, as mentioned by Julien in this
> thread:
>
> http://lucene.472066.n3.nabble.com/Concurrently-running-multiple-nutch-crawls-td3166207.html
>
> >
> > Benjamin
> >
>
