Amazon S3/Ec2 problem [injection and fs.rename() problem]

2007-01-16 Thread Mike Smith
I've been testing Tom's nice S3/EC2 patch on a couple of EC2/S3 machines. The injector fails to inject URLs because fs.rename() on line 145 of CrawlDb.java deletes the whole content and only renames the parent folder from x to current. Basically, crawl_dir/crawldb/current will be an empty folder after renami
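
The symptom described above (a rename that succeeds but leaves an empty destination folder) can be caught with a file count before and after the rename. The sketch below is only an illustration on the local filesystem, using Python's os.rename as a stand-in for the fs.rename() call in the report; it is not Hadoop's FileSystem API and not the code from the patch. The helper names and the part-00000/data/index layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files anywhere under root."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def rename_and_verify(src: Path, dst: Path) -> int:
    """Rename src to dst and fail loudly if any files went missing.

    Returns the number of files found under dst after the rename.
    """
    expected = count_files(src)
    os.rename(src, dst)  # local stand-in for the fs.rename() call in the report
    actual = count_files(dst)
    if actual != expected:
        raise RuntimeError(
            f"rename lost data: {expected} files before, {actual} after"
        )
    return actual


def demo() -> int:
    """Build a throwaway crawldb layout, rename x -> current, return file count."""
    with tempfile.TemporaryDirectory() as tmp:
        crawldb = Path(tmp) / "crawl_dir" / "crawldb"
        staging = crawldb / "x"  # folder name taken from the report
        # Hypothetical contents, only so there is something to count.
        (staging / "part-00000").mkdir(parents=True)
        (staging / "part-00000" / "data").write_text("url records")
        (staging / "part-00000" / "index").write_text("index records")
        return rename_and_verify(staging, crawldb / "current")


if __name__ == "__main__":
    print(demo())
```

On a filesystem where rename behaves as expected, the count after the rename matches the count before it; the behaviour reported above would instead raise the RuntimeError.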

Re: Unable to complete a full fetch, reason Child Error

2006-03-03 Thread Mike Smith
trackers still fetch together though I have only > 3 sites in the fetchlist. > > The task trackers fetch the same pages... > > I have used latest build from hadoop trunk. > > Gal. > > > On Fri, 2006-02-24 at 14:15 -0800, Doug Cutting wrote: > > Mike Smith wrote: >

Re: Unable to complete a full fetch, reason Child Error

2006-02-19 Thread Mike Smith
Hi, This problem is a killer! I've been struggling with this for about a month! It doesn't happen all the time, but because of this problem the largest crawl I have ever managed is about 1 million pages. I have three machines: 3 datanodes, 1 data replica, 1 job tracker. Here is what I get: nameserver

[jira] Commented: (NUTCH-136) mapreduce segment generator generates 50 % less than excepted urls

2006-01-22 Thread Mike Smith (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-136?page=comments#action_12363587 ] Mike Smith commented on NUTCH-136: -- I have had the same problem. Florent suggested using "protocol-http" instead of "protocol-httpclient"; this fixed t