Re: Is any one able to successfully run Distributed Crawl?

2006-01-14 Thread Pushpesh Kr. Rajwanshi
Sorry for the late reply, but thanks for your quick response, Doug. I really appreciate it. Any idea when nutch-0.8 will be released officially? Thanks and Regards, Pushpesh On 1/10/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Pushpesh Kr. Rajwanshi wrote: > > Just wanted to confirm that this

Re: Is any one able to successfully run Distributed Crawl?

2006-01-09 Thread Doug Cutting
Pushpesh Kr. Rajwanshi wrote: Just wanted to confirm that this distributed crawl you did using nutch version 0.7.1 or some other version? And was that a successful distributed crawl using map reduce or some work around for distributed crawl? No, this is 0.8-dev. This was using in early Decembe

Re: Is any one able to successfully run Distributed Crawl?

2006-01-08 Thread Pushpesh Kr. Rajwanshi
Hi Doug, Thanks a lot for the precious time you gave to writing such a detailed and informative reply. Just wanted to confirm that this distributed crawl you did using nutch version 0.7.1 or some other version? And was that a successful distributed crawl using map reduce or some work around for d

Re: Is any one able to successfully run Distributed Crawl?

2006-01-04 Thread Doug Cutting
Earl Cahill wrote: Any chance you could walk through your implementation? Like how the twenty boxes were assigned? Maybe upload your confs somewhere, and outline what commands you actually ran? All 20 boxes are configured identically, running a Debian 2.4 kernel. These are dual-processor box

Re: Is any one able to successfully run Distributed Crawl?

2006-01-03 Thread Gal Nitzan
+1 On Mon, 2006-01-02 at 13:39 -0800, Earl Cahill wrote: > Any chance you could walk through your implementation? > Like how the twenty boxes were assigned? Maybe > upload your confs somewhere, and outline what commands > you actually ran? > > Thanks, > Earl > > --- Doug Cutting <[EMAIL PROTEC

Re: Is any one able to successfully run Distributed Crawl?

2006-01-02 Thread Earl Cahill
Any chance you could walk through your implementation? Like how the twenty boxes were assigned? Maybe upload your confs somewhere, and outline what commands you actually ran? Thanks, Earl --- Doug Cutting <[EMAIL PROTECTED]> wrote: > Pushpesh Kr. Rajwanshi wrote: > > I want to know if anyone i

Re: Is any one able to successfully run Distributed Crawl?

2006-01-02 Thread Doug Cutting
Pushpesh Kr. Rajwanshi wrote: I want to know if anyone has been able to successfully run a distributed crawl on multiple machines, crawling millions of pages, and how hard it is to do that? Do I just have to do some configuration and setup, or some implementation as well? I recently performed a

Re: Is any one able to successfully run Distributed Crawl?

2005-12-29 Thread Pushpesh Kr. Rajwanshi
Hi there, Thanks for the reply again. What volume of data are you crawling, and on how many machines? Which version of nutch are you using? 0.7.1 or another? Actually it is working more or less fine, but I want to know how many resources (machines) I will need for crawling 20,000 websites in a day? If

Re: Is any one able to successfully run Distributed Crawl?

2005-12-28 Thread Nutch Newbie
Hi I have had no problem doing distributed crawl. On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote: > Hi NN, > > Thanks for replying me. Actually I wanted to know if distributed crawling in > nutch is working fine and to what success? Like i am successful in setting > up distributed

Re: Is any one able to successfully run Distributed Crawl?

2005-12-28 Thread Pushpesh Kr. Rajwanshi
Hi NN, Thanks for replying. Actually I wanted to know whether distributed crawling in nutch is working fine, and with what success? For example, I have successfully set up a distributed crawl on 2 machines (1 master and 1 slave), but when I try with more than two machines there seems to be a problem, especially while i

Re: Is any one able to successfully run Distributed Crawl?

2005-12-27 Thread Nutch Newbie
Have you tried the following: http://wiki.apache.org/nutch/HardwareRequirements and http://wiki.apache.org/nutch/ There is no quick answer if one is planning to crawl millions of pages. Read.. Try.. Read.. On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote: > Hi, > > I want to know if

Is any one able to successfully run Distributed Crawl?

2005-12-27 Thread Pushpesh Kr. Rajwanshi
Hi, I want to know if anyone has been able to successfully run a distributed crawl on multiple machines, crawling millions of pages, and how hard it is to do that? Do I just have to do some configuration and setup, or some implementation as well? Also can anyone tell me if I want to crawl around 2