Gabriele Kahlout wrote:
> 
> On Wed, May 4, 2011 at 6:22 PM, Kelvin <[email protected]> wrote:
> 
>> Hi Gabriele,
>>
>> Thank you for your help. I am sorry, I am a newbie to Nutch. If I crawl
>> the whole of Wikipedia, will the whole of Wikipedia be stored in the
>> crawldb of my server?
>>
> 
> I think so (I'm also a newbie).
> 
Wikipedia will get stored in the segments. Once the segments have been
indexed (and all the db update steps are done), you should delete them.
Only information relating to the fetch/parse status of each link gets saved
to the crawldb. The link structure is maintained in the linkdb.
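
For what it's worth, the cleanup step could be sketched roughly like this.
It assumes a crawl directory that contains crawldb, linkdb and segments
side by side; the function name delete_indexed_segments and the path are
made up for illustration and are not part of Nutch. Only run something
like this after indexing and the db updates have finished.

```python
import shutil
from pathlib import Path


def delete_indexed_segments(crawl_dir):
    """Delete every segment directory under <crawl_dir>/segments.

    Assumed layout (hypothetical helper, not a Nutch command):
        <crawl_dir>/crawldb   - fetch/parse status per link, kept
        <crawl_dir>/linkdb    - link structure, kept
        <crawl_dir>/segments  - fetched content, removed here
    Returns the names of the segments that were removed.
    """
    segments_dir = Path(crawl_dir) / "segments"
    removed = []
    if not segments_dir.is_dir():
        # Nothing to clean up if no segments directory exists.
        return removed
    for segment in sorted(segments_dir.iterdir()):
        if segment.is_dir():
            shutil.rmtree(segment)
            removed.append(segment.name)
    return removed
```

Calling delete_indexed_segments("crawl") would empty crawl/segments and
leave crawl/crawldb and crawl/linkdb alone, so later crawl rounds still
know which links were already fetched.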


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-custom-crawl-using-Nutch-tp2899270p3081808.html
Sent from the Nutch - User mailing list archive at Nabble.com.
