Re: Hadoop Node Monitoring

2009-11-11 Thread Kevin Sweeney
Nagios is always a good start. This webcast has some good information on this subject: http://www.cloudera.com/blog/2009/11/09/hadoop-world-monitoring-best-practices-from-ed-capriolo/ On Wed, Nov

Hadoop Node Monitoring

2009-11-11 Thread John Martyniak
Is there a good solution for Hadoop node monitoring? I know that Cacti and Ganglia are probably the two big ones, but are they the best ones to use? Easiest to set up? Most thorough reporting, etc.? I started to play with Ganglia, and the install is crazy. I am installing it on CentOS and h

Re: Could not find any valid local directory for taskTracker

2009-11-11 Thread Amareshwari Sri Ramadasu
This would happen when there is not enough space in any of the local directories. -Amareshwari On 11/11/09 11:03 PM, "Saju K K" wrote: Hi, Did you get a solution for this problem? We are facing a similar problem. saju Pallavi Palleti wrote: > > Hi, > I got the below error while running my hadoop
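The "Could not find any valid local directory" error comes from the local-directory allocator failing to find free space in every directory listed under mapred.local.dir. A hedged sketch of spreading that property across multiple disks (the paths here are hypothetical, not from the thread):

```xml
<!-- mapred-site.xml: hypothetical example; point entries at real disks -->
<property>
  <name>mapred.local.dir</name>
  <!-- Comma-separated list; the TaskTracker spreads intermediate data
       across entries and skips any directory that is full or unwritable -->
  <value>/disk1/mapred/local,/disk2/mapred/local</value>
</property>
```

Adding entries on separate volumes both increases total scratch space and reduces the chance that one full disk fails the task.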

Re: log.tmp does not exist, running hadoop in pseudo distributed

2009-11-11 Thread Ahmad Ali Iqbal
Mike, Thanks for your response. I am able to fix it. The problem I figured out was the same as I assumed in my last email, i.e. a directory access issue was creating problems. Cheers, Ahmad On Thu, Nov 12, 2009 at 1:03 PM, Ahmad Ali Iqbal wrote: > Thank you Mike, > > In fact running any sample code giv

Re: log.tmp does not exist, running hadoop in pseudo distributed

2009-11-11 Thread Ahmad Ali Iqbal
Thank you Mike, In fact running any sample code gives the same error. Yes, I already realized that the output folder MUST be deleted before running it and I am doing that in every run. As far as the tmp directory is concerned, I issue the following command and get the output as follows; ah...@aai:/usr

Re: log.tmp does not exist, running hadoop in pseudo distributed

2009-11-11 Thread Mike Kendall
My first guess is that your tmp directory isn't set up correctly. Also, I don't know about WordCount v2, but the original wordcount needed an output directory passed along with an input directory (the directory must not exist when you start the job). -mike On Wed, Nov 11, 2009 at 5:33 PM, Ahmad

log.tmp does not exist, running hadoop in pseudo distributed

2009-11-11 Thread Ahmad Ali Iqbal
Hi all, I am a new user of Hadoop and trying to run the WordCount v2 example in a pseudo-distributed operation given at http://hadoop.apache.org/common/docs/current/mapred_tutorial.html but when I run it I

Re: User permissions on dfs ?

2009-11-11 Thread Raymond Jennings III
Ah okay, I was looking at the options for hadoop and it only shows "fs" and not "dfs" - now I realize they are one and the same. Thanks! --- On Wed, 11/11/09, Allen Wittenauer wrote: > From: Allen Wittenauer > Subject: Re: User permissions on dfs ? > To: common-user@hadoop.apache.org > Da

Re: Hadoop NameNode not starting up

2009-11-11 Thread Edward Capriolo
The property you are going to need to set is dfs.name.dir (default: ${hadoop.tmp.dir}/dfs/name). It determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the di
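The property Edward quotes lives in hdfs-site.xml (or conf/hadoop-site.xml on older releases). A hedged sketch, with a hypothetical path chosen to sit outside /tmp so the name table survives /tmp cleanup and reboots:

```xml
<!-- hdfs-site.xml: hypothetical persistent location for the name table -->
<property>
  <name>dfs.name.dir</name>
  <!-- A comma-delimited list would replicate the name table
       into every listed directory -->
  <value>/var/hadoop/dfs/name</value>
</property>
```

With the default under ${hadoop.tmp.dir} (which itself defaults to /tmp), any tmp-cleaning job can silently delete the NameNode's metadata, which matches the symptom in this thread.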

Re: Hadoop NameNode not starting up

2009-11-11 Thread Kaushal Amin
which configuration file? On Wed, Nov 11, 2009 at 1:50 PM, Edward Capriolo wrote: > Are you starting hadoop as a different user? > Maybe first time you are starting as user hadoop, now this time you > are starting as user root. > > Or as stated above something is cleaning out your /tmp. Use your

Re: Hadoop NameNode not starting up

2009-11-11 Thread Edward Capriolo
Are you starting hadoop as a different user? Maybe the first time you started as user hadoop, and this time you are starting as user root. Or, as stated above, something is cleaning out your /tmp. Use your configuration files to have the namenode write to a permanent place. Edward On Wed, Nov 11, 200

Re: Hadoop NameNode not starting up

2009-11-11 Thread Kaushal Amin
I am seeing following error in my NameNode log file. 2009-11-11 10:59:59,407 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. 2009-11-11 10:59:59,449 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.Inconsiste

Re: User permissions on dfs ?

2009-11-11 Thread Allen Wittenauer
On 11/11/09 8:50 AM, "Raymond Jennings III" wrote: > Is there a way that I can set up directories in dfs for individual users and > set the permissions such that only that user can read and write, such that if I do > a "hadoop dfs -ls" I would get "/user/user1 /user/user2 " etc each directory > only
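A minimal sketch of the usual per-user setup, assuming HDFS permissions are enabled, the commands run as the HDFS superuser, and the usernames are hypothetical (hadoop dfs and hadoop fs are interchangeable here):

```shell
# Run as the HDFS superuser; user1 is a hypothetical account
hadoop dfs -mkdir /user/user1
hadoop dfs -chown user1:user1 /user/user1
hadoop dfs -chmod 700 /user/user1   # only user1 can read/write its tree
```

Repeating this per user gives exactly the layout Raymond describes: a /user/<name> directory per account, each readable and writable only by its owner (plus the superuser).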

Workflow Management Poll

2009-11-11 Thread Kevin Peterson
We're not very happy with our homegrown system for managing Hadoop jobs. I'm looking at existing tools vs. improving our own, and I wanted to get a feel for what others are using. http://www.misterpoll.com/polls/460631

Re: Lucene + Hadoop

2009-11-11 Thread Sagar
Check out MultipleOutputFormat (it is the same as your implementation). Having a separate index per author may not be a good idea. You can have one index for all authors and query it per author. But, I'm not sure of the requirements. -Sagar Hrishikesh Agashe wrote: Hi, I am trying to use Hadoop for Lucene

Re: Could not find any valid local directory for taskTracker

2009-11-11 Thread Saju K K
Hi, Did you get a solution for this problem? We are facing a similar problem. saju Pallavi Palleti wrote: > > Hi, > I got the below error while running my hadoop task. But, when I tried after > a few hours, it worked fine. > Can someone please tell me why this error occurred? > > ERROR Below: >

Re: Java heap size increase caused MORE out of memory exceptions.

2009-11-11 Thread Edward Capriolo
On Wed, Nov 11, 2009 at 11:36 AM, John Clarke wrote: > Hi, > > I've been running our app on EC2 using the small instances and it's been > mostly fine. Very occasionally a task will die due to a heap out of memory > exception. So far these failed tasks have successfully been restarted by > Hadoop o

User permissions on dfs ?

2009-11-11 Thread Raymond Jennings III
Is there a way that I can set up directories in dfs for individual users and set the permissions such that only that user can read and write, such that if I do a "hadoop dfs -ls" I would get "/user/user1 /user/user2 " etc, each directory readable and writable only by the respective user? I d

Java heap size increase caused MORE out of memory exceptions.

2009-11-11 Thread John Clarke
Hi, I've been running our app on EC2 using the small instances and it's been mostly fine. Very occasionally a task will die due to a heap out-of-memory exception. So far these failed tasks have been successfully restarted by Hadoop on other nodes and the job has run to completion. I want to know
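On an EC2 small instance (roughly 1.7 GB of RAM) the likely trade-off behind "more heap caused more OOMs" is physical memory: raising the per-task heap can push the combined footprint of the concurrent task JVMs past what the node actually has. A hedged sketch of the two knobs involved (values are illustrative, not recommendations):

```xml
<!-- mapred-site.xml: illustrative settings for a memory-constrained node -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>  <!-- max heap per task JVM -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>  <!-- fewer concurrent tasks leaves headroom per task -->
</property>
```

The rough budget to keep in mind is (map slots + reduce slots) x heap, plus the DataNode and TaskTracker daemons themselves, which must fit in the node's RAM.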

Re: Automate EC2 cluster termination

2009-11-11 Thread John Clarke
I've never used Amazon Elastic MapReduce as we are trying to minimise costs, but if I can't find a good way to solve my problem then I might reconsider. cheers, John 2009/11/10 Hitchcock, Andrew > Hi John, > > Have you considered Amazon Elastic MapReduce? (Disclaimer: I work on > Elastic MapRed

Re: Automate EC2 cluster termination

2009-11-11 Thread John Clarke
Hi Edmund, I'll look into what you suggested. Yes, I'm aware of being able to use S3 directly, but I had problems getting it working - I must try again. cheers John 2009/11/10 Edmund Kohlwey > You should be able to detect the status of the job in your java main() > method, just do either: job.wa
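Edmund's suggestion, roughly sketched against the 0.20 mapreduce API (the class name and job setup are hypothetical): block on the job in main(), then surface success or failure as the exit code, which a wrapper script can use to decide whether to terminate the EC2 cluster.

```java
// Hypothetical driver: exit non-zero on failure so a wrapper script
// can decide whether to run the cluster-termination command.
public class JobDriver {
  public static void main(String[] args) throws Exception {
    org.apache.hadoop.mapreduce.Job job = new org.apache.hadoop.mapreduce.Job(
        new org.apache.hadoop.conf.Configuration(), "my-job");
    // ... set mapper, reducer, input and output paths here ...
    boolean ok = job.waitForCompletion(true);  // blocks until the job finishes
    System.exit(ok ? 0 : 1);
  }
}
```

A wrapper along the lines of `hadoop jar myjob.jar JobDriver ... && <terminate-cluster command>` then tears the cluster down only after a successful run.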

Re: NameNode/DataNode & JobTracker/TaskTracker

2009-11-11 Thread John Martyniak
Steve and Todd, Thanks for the info, it is very helpful. I am going to start to set it up in this fashion, now that I have the cluster working correctly. :) Good idea with the DNS entries, that will make it easier if I need to move them to dedicated boxes. -John On Nov 11, 2009, at 6

Re: NameNode/DataNode & JobTracker/TaskTracker

2009-11-11 Thread Steve Loughran
John Martyniak wrote: Thanks Todd. I wasn't sure if that was possible. But you pointed out an important point, and that is that it is just the NN and JT that would run remotely. So in order to do this, would I just install the complete hadoop instance on each one? And then would they be configured as ma

SocketTimeoutException: timeout while waiting for channel to be ready for read

2009-11-11 Thread Leon Mergen
Hello, The following problem has occurred three times in the past 2 weeks for us now, twice in the last 24 hours. Our setup currently is a single server, which runs "everything": namenode, datanode and client. We're using JNI in the client as an interface to HDFS. It seems as if, afte