Hi, I am looking for advice on how to configure Nutch (and Solr) to crawl a private Wikipedia mirror.
- It is my mirror on an intranet, so I do not need to be polite to myself.
- I need to complete this 11 million page crawl as fast as I reasonably can.
- Both crawler and mirror are 1.7GB machines dedicated to this task.
- I only need to crawl internal links (not external).
- Eventually I will need to update the crawl, but a monthly update will be sufficient.

Any advice (and sample config files) would be much appreciated!

Fred

