Re: Segments / Database in Nutch 2.X

Tejas Patil Thu, 27 Jun 2013 05:04:58 -0700

On Thu, Jun 27, 2013 at 3:38 AM, Sznajder ForMailingList <
bs4mailingl...@gmail.com> wrote:


> Hi
>
> I do not see the usage of "Segments" in nutch 2.x
>
> In addition, I do not see a DB path.
>

"segments" and "crawldb" are 1.x notions: they are directories on the
file system that hold the crawler's data (nothing more than Hadoop
MapFiles and SequenceFiles).
2.x stores the crawled data in a datastore instead. A table is created in
the datastore to hold all the information.

>
> In that case, how can we run two separate crawls, one starting from url1
> and the second from another seed, for example?
>

You could specify different crawl IDs. To be honest, I have never tried
running multiple crawls at the same time with 2.x.
It is not considered a good thing to do, as Julien mentioned in this thread:
http://lucene.472066.n3.nabble.com/Concurrently-running-multiple-nutch-crawls-td3166207.html
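
To make the crawl ID idea concrete, here is a minimal sketch that only
builds (it does not run) the commands for one round of each crawl. It
assumes that the 2.x jobs accept a -crawlId option; check the usage message
that bin/nutch prints for each command in your release before relying on
it. The crawl IDs, seed directory names and the helper functions are
hypothetical.

```python
# Sketch only: builds (does not run) the Nutch 2.x commands for one crawl round.
# Assumption: each job accepts a -crawlId option; confirm with the usage
# message printed by `bin/nutch <command>` in your release.

def crawl_round(crawl_id, seed_dir, top_n=1000):
    """Return the argv lists for one inject/generate/fetch/parse round."""
    tag = ["-crawlId", crawl_id]
    return [
        ["bin/nutch", "inject", seed_dir] + tag,
        ["bin/nutch", "generate", "-topN", str(top_n)] + tag,
        ["bin/nutch", "fetch", "-all"] + tag,
        ["bin/nutch", "parse", "-all"] + tag,
    ]

def plan(crawls):
    """Map each crawl ID to its commands; crawls is {crawl_id: seed_dir}."""
    return {cid: crawl_round(cid, seeds) for cid, seeds in crawls.items()}

if __name__ == "__main__":
    # Hypothetical IDs and seed directories, one per seed list.
    for cid, cmds in plan({"crawl1": "urls1", "crawl2": "urls2"}).items():
        for argv in cmds:
            print(" ".join(argv))
```

As far as I know, the crawl ID ends up as a prefix on the datastore table
name, which is what keeps the two crawls apart. The updatedb step should
take the same option, but its other arguments differ between 2.x releases,
so it is left out of the sketch.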

>
> Benjamin
>
