Re: Spill failed error

2009-07-23 Thread Vibhooti Verma
This mainly happens when you do not have enough disk space. Please clean up and run again. On Tue, Jul 21, 2009 at 10:33 PM, George Pang p09...@gmail.com wrote: Hi users, Please help with this one - I got an error running a two-node cluster on big files; the error is: 2365222 [main] ERROR
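
A quick way to confirm the disk-space diagnosis is to check free space on every configured mapred.local.dir, since map-side spills land there. A minimal sketch, assuming 0.20-era property names and a Java 6 runtime (class name and fallback path are illustrative):

import java.io.File;
import org.apache.hadoop.conf.Configuration;

public class LocalDirSpaceCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // mapred.local.dir may be a comma-separated list of spill directories
    for (String dir : conf.getStrings("mapred.local.dir", "/tmp/hadoop/mapred/local")) {
      long freeMb = new File(dir).getUsableSpace() / (1024 * 1024); // Java 6+
      System.out.println(dir + ": " + freeMb + " MB free");
    }
  }
}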

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Mathias Herberts
On Thu, Jul 23, 2009 at 09:20, Ted Dunning ted.dunn...@gmail.com wrote: Last I heard, the API could be suborned in this scenario. Real credential-based identity would be needed to provide more than this. The hack would involve a changed Hadoop library that lies about identity. This would not
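
For context on why a changed library suffices: pre-security Hadoop takes the client-reported identity on faith, and it can even be overridden from configuration alone. A minimal sketch, assuming the 0.18-0.20 era hadoop.job.ugi property (treat the exact key as an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SpoofedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // claim to be user "hadoop" in group "supergroup"; nothing verifies this
    conf.set("hadoop.job.ugi", "hadoop,supergroup");
    FileSystem fs = FileSystem.get(conf);
    System.out.println("acting as superuser against " + fs.getUri());
  }
}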

DiskChecker$DiskErrorException in TT logs

2009-07-23 Thread Amandeep Khurana
Hi, I get these messages in the TT log while running a job: 2009-07-23 02:03:59,091 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200907221738_0020/attempt_200907221738_0020_r_00_0/output/file.out in
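
One way to narrow this down is to run the same directory check the TaskTracker itself uses against each configured local directory; a full or unwritable mapred.local.dir entry is a common cause of intermediate map output going missing. A minimal sketch, assuming the 0.20-era DiskChecker API:

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

public class LocalDirCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    for (String dir : conf.getStrings("mapred.local.dir", "/tmp/hadoop/mapred/local")) {
      try {
        // same existence/writability check the TaskTracker performs
        DiskChecker.checkDir(new File(dir));
        System.out.println(dir + ": OK");
      } catch (DiskErrorException e) {
        System.out.println(dir + ": FAILED - " + e.getMessage());
      }
    }
  }
}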

NullPtr on Balancer in 19.1

2009-07-23 Thread Ko Baryiames
Any sage advice out there for a 19.1 balancer that is throwing NullPtrExceptions? I don't believe this is an instance of https://issues.apache.org/jira/browse/HADOOP-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682566#action_12682566 As

Hadoop performance using Mahout

2009-07-23 Thread nfantone
First things first: I want to salute you all and thank you for developing a distributed engine such as Hadoop. It certainly helped me at work. I am now in the process of writing an application for user clustering based on their historical behavior as consumers. For clustering/classification

org.eclipse.core.runtime.IProgressMonitor unresolved in 0.20.0/eclipse

2009-07-23 Thread David Been
I have been trying to get Hadoop to compile; I only have 2 errors, about a missing IProgressMonitor, which I find in both versions of Eclipse I have installed on Linux. The class is not in any of the Hadoop jars. Am I missing some? Any ideas? hadoop/lib: commons-cli-2.0-SNAPSHOT.jar

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Aaron Kimball
The current best practice is to firewall off your cluster, configure a SOCKS proxy/gateway, and only allow traffic to the cluster from the gateway. Being able to SSH into the gateway provides authentication. See http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
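
On the client side, the gateway setup amounts to two configuration keys. A minimal sketch, assuming the stock SocksSocketFactory shipped with 0.19+/0.20 (the gateway host shown is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ProxiedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // route all Hadoop RPC through the SOCKS proxy running on the gateway
    conf.set("hadoop.rpc.socket.factory.class.default",
             "org.apache.hadoop.net.SocksSocketFactory");
    conf.set("hadoop.socks.server", "gateway.example.com:1080"); // hypothetical
    FileSystem fs = FileSystem.get(conf);
    System.out.println("connected through gateway to " + fs.getUri());
  }
}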

Re: Auto Tune off

2009-07-23 Thread Harish Mallipeddi
On Thu, Jul 23, 2009 at 10:30 PM, Richa Khandelwal richa@gmail.com wrote: Hey Everyone, Does anyone know how to turn off auto-tuning in Hadoop? Thanks, -- Richa Khandelwal University of California, Santa Cruz CA What's auto-tuning in Hadoop? Never heard of it. Are you talking about

a bug or not? hadoop example grep cannot run in a user defined queue...

2009-07-23 Thread Qingyan(Evan) Liu
Dears, we can simply run the Hadoop example grep program to test MapReduce on our Hadoop cluster. By default, all jobs run in the default queue called 'default'. However, sometimes we'll define a different queue, such as 'users', and remove the queue 'default', and we want all users to submit jobs to
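
For reference, the queue is a per-job setting. A minimal sketch, assuming the 0.19/0.20-era queue properties (the queue must also be declared cluster-side via mapred.queue.names):

import org.apache.hadoop.mapred.JobConf;

public class QueueSubmit {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // equivalent to conf.set("mapred.job.queue.name", "users")
    conf.setQueueName("users");
    System.out.println("will submit to queue: " + conf.getQueueName());
  }
}

Since the bundled grep example is driven through Tool, passing -Dmapred.job.queue.name=users on the command line should work without code changes.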

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Ted Dunning
Another best practice is to have a sandbox cluster separated from production. On Thu, Jul 23, 2009 at 9:17 AM, Aaron Kimball aa...@cloudera.com wrote: The current best practice is to firewall off your cluster, configure a SOCKS proxy/gateway, and only allow traffic to the cluster from the

Multi user

2009-07-23 Thread Wasim Bari
Hi, Is there a document available on setting up a multi-user Hadoop cluster? BR, Wasim

Re: A few questions about Hadoop and hard-drive failure handling.

2009-07-23 Thread Todd Lipcon
On Thu, Jul 23, 2009 at 11:56 AM, Ryan Smith ryan.justin.sm...@gmail.com wrote: I was wondering if someone could give me some answers or maybe some pointers where to look in the code. All these questions are in the same vein of hard-drive failure. Question 1: If a master (system

Questions on How the Namenode Assign Blocks to Datanodes

2009-07-23 Thread Boyu Zhang
Dear All, I have a question about HDFS that I cannot find the answer to in the documents on the Apache website. I have a cluster of 4 machines, one is the namenode and the other 3 are datanodes. When I put 6 files, each 430 MB, into HDFS, the 6 files are split into 42 blocks (64 MB each).
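
The 42 is just per-file ceiling division: each 430 MB file needs ceil(430/64) = 7 blocks (six full 64 MB blocks plus one 46 MB tail), and 6 files x 7 blocks = 42. A quick check:

public class BlockCount {
  public static void main(String[] args) {
    long fileMb = 430, blockMb = 64, files = 6;
    long perFile = (fileMb + blockMb - 1) / blockMb; // ceiling division -> 7
    System.out.println(files * perFile + " blocks"); // prints: 42 blocks
  }
}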

Re: best way to set memory

2009-07-23 Thread Allen Wittenauer
FWIW, we actually push a completely separate config to the name node, jt, etc, because of some of the other settings (like saves and dfs.[in|ex]cludes). But if you wanted to do an all-in-one, well... Hmm. Looking at the code, this worked differently than I always thought it did (at least in

Re: Testing Mappers in Hadoop 0.20.0

2009-07-23 Thread Aaron Kimball
I hacked around this in MRUnit last night. MRUnit now has support for the new API -- See MAPREDUCE-800. You can, in fact, subclass Mapper.Context and Reducer.Context, since they don't actually share any state with the outer class Mapper/Reducer implementation, just the type signatures. But doing

hdfs question when replacing dead node...

2009-07-23 Thread Andy Sautins
I recently had to replace a node on a hadoop 0.20.0 4-node cluster and I can't quite explain what happened. If anyone has any insight I'd appreciate it. When the node failed ( drive failure ) running the command 'hadoop fsck /' correctly showed the data nodes to now be 3 instead of 4

Why are only a few map tasks running at a time despite plenty of scope for the remaining?

2009-07-23 Thread akhil1988
Hi all, I am using an HTable as input to my map jobs and my reducer outputs to another HTable. There are 10 regions of my input HTable. And I have set conf.set("mapred.tasktracker.map.tasks.maximum", "2");
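
One thing worth checking: mapred.tasktracker.map.tasks.maximum is a TaskTracker daemon setting read at startup, so setting it in a job's conf has no effect. The cluster's actual concurrent-map capacity can be confirmed from the client; a minimal sketch, assuming the 0.20-era JobClient API:

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SlotCheck {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    ClusterStatus status = client.getClusterStatus();
    // capacity = trackers * per-tracker max, fixed when the TTs started
    System.out.println("trackers=" + status.getTaskTrackers()
        + ", max map slots=" + status.getMaxMapTasks()
        + ", running maps=" + status.getMapTasks());
  }
}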

Re: Remote access to cluster using user as hadoop

2009-07-23 Thread Ted Dunning
Interesting approach. My guess is that this would indeed protect the datanodes from accidental attack by stopping access before they are involved. You might also consider just changing the name of the magic hadoop user to something that is more unlikely. The name hadoop is not far off what
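
A related knob: the superuser itself is simply whoever started the NameNode, but the supergroup name is configurable, so it can be set to something harder to guess than the default. A minimal sketch, assuming the 0.19/0.20 dfs.permissions.supergroup property (the group name shown is hypothetical):

import org.apache.hadoop.conf.Configuration;

public class SupergroupRename {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // default is "supergroup"; pick something less predictable
    conf.set("dfs.permissions.supergroup", "nn-admins-7f3a"); // hypothetical
    System.out.println(conf.get("dfs.permissions.supergroup"));
  }
}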

Re: hdfs question when replacing dead node...

2009-07-23 Thread Raghu Angadi
The block reports are every hour by default. They should not cause any false negatives for replication on the NN. Andy's observation is not expected, AFAIK. Andy, please check if you can repeat it; if it happens again, please file a JIRA and you can attach your relevant log files there. We have
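
The hourly default Raghu refers to is the DataNode block-report interval; a minimal sketch of reading it, assuming the 0.20-era key dfs.blockreport.intervalMsec (default 3,600,000 ms):

import org.apache.hadoop.conf.Configuration;

public class BlockReportInterval {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long ms = conf.getLong("dfs.blockreport.intervalMsec", 3600000L);
    System.out.println("block report every " + (ms / 60000) + " minutes");
  }
}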

RE: hdfs question when replacing dead node...

2009-07-23 Thread Andy Sautins
Thanks for the help. It's quite possible it didn't quite happen as it appeared. I will try to reproduce. Thanks again. Andy