Best way to handle namespace host failures

2008-11-10 Thread Goel, Ankur
Hi Folks, I am looking for some advice on the ways / techniques that people are using to get around namenode failures (both disk and host). We have a small cluster with several jobs scheduled for periodic execution on the same host where the name server runs. What we would like to

Is it possible to avoid replication caused by data node decommission?

2008-11-10 Thread Jinyeon Lee
Hello, Is it possible to avoid replication caused by data node decommission? I want to stop data nodes without moving or copying data blocks, even though their blocks would then have a smaller replication factor.

Re: Best way to handle namespace host failures

2008-11-10 Thread Sharad Agarwal
Goel, Ankur wrote: Hi Folks, I am looking for some advice on the ways / techniques that people are using to get around namenode failures (both disk and host). We have a small cluster with several jobs scheduled for periodic execution on the same host where the name server runs.

Re: Dynamically terminate a job once Reporter hits a threshold

2008-11-10 Thread Aaron Kimball
Out of curiosity, how reliable are the counters from the perspective of the JobClient while the job is in progress? While hitting 'refresh' on the status web page for a job, I notice that my counters bounce all over the place, showing wildly different figures second-to-second. Is that using a

RE: hadoop start problem

2008-11-10 Thread Brian MacKay
I had a similar problem when I upgraded... not sure of the details why, but I had permissions problems trying to develop and run on Windows out of cygwin. I found that in cygwin if I ran under my account, I got the null pointer exception, but if I ssh localhost first, then format the name node,

Re: File Descriptors not cleaned up

2008-11-10 Thread Jason Venner
We have just realized one reason for the 'no live node contains block' error from DFSClient is an indication that the DFSClient was unable to open a connection due to insufficient available file descriptors. FsShell is particularly bad about consuming descriptors and leaving the
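Jason's diagnosis can be checked from the shell before digging into DFSClient. The commands below are a generic sketch: the `/proc` path is Linux-specific, and `$PID` is a placeholder for the client JVM's process id, not anything Hadoop provides.

```shell
# Soft limit on open file descriptors for the current shell session
ulimit -n

# Count descriptors a running process currently holds (Linux-specific;
# replace $PID with the DFSClient JVM's process id before uncommenting)
# ls /proc/$PID/fd | wc -l
```

A common workaround is raising the soft limit in the environment that launches the client (e.g. `ulimit -n 8192`), subject to the system's hard limit.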

Re: NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-10 Thread Steve Loughran
C G wrote: I've got a grid which has been up and running for some time. It's been using a 32-bit JVM. I am hitting the wall on memory within NameNode and need to specify a max heap size of 4G. Is it possible to switch seamlessly from a 32-bit JVM to 64-bit? I've tried this on a small test grid and

Re: Best way to handle namespace host failures

2008-11-10 Thread Amar Kamat
Goel, Ankur wrote: Hi Folks, I am looking for some advice on the ways / techniques that people are using to get around namenode failures (both disk and host). We have a small cluster with several jobs scheduled for periodic execution on the same host where the name server runs.

RE: Best way to handle namespace host failures

2008-11-10 Thread Goel, Ankur
Thanks for the replies, folks. We are not seeing this frequently, but we just want to avoid a single point of failure and keep manual intervention to a minimum, or at best none. This is to ensure that the system runs smoothly in production without abrupt failures. Thanks -Ankur -Original

RE: Dynamically terminate a job once Reporter hits a threshold

2008-11-10 Thread Brian MacKay
Thanks Arun for your tip. This morning I changed to submitJob and polled. It worked very well, and you saved me some trial and error. -Original Message- From: Aaron Kimball [mailto:[EMAIL PROTECTED] Sent: Monday, November 10, 2008 4:35 AM To: core-user@hadoop.apache.org Subject:

Customized InputFormat Problem

2008-11-10 Thread ZhiHong Fu
Hello, I am doing a task which will read dbRecord data from a web service, and then I will build an index on them. But you see, inside Hadoop the InputFormat is based on FileInputFormat, so now I have to rewrite my dbRecordInputFormat, and I do it like this: import

Re: hadoop start problem

2008-11-10 Thread Aaron Kimball
Between 0.15 and 0.18 the format for fs.default.name has changed; you should set the value there as hdfs://localhost:9000/ without the quotes. It still shouldn't give you an NPE (that should probably get a JIRA entry) under any circumstances, but putting a value in the (new) proper format might
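For reference, Aaron's suggestion corresponds to an entry along these lines in hadoop-site.xml; the host and port are simply the values from the original question, not a recommendation:

```xml
<!-- hadoop-site.xml sketch: new-style URI for fs.default.name (0.18) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>
```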

Re: NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-10 Thread Aaron Kimball
Allen, It sounds like you think the 64- and 32-bit environments are effectively interchangeable. May I ask why you are using both? The 64-bit environment gives you access to more memory; do you see faster performance for the TTs in 32-bit mode? Do you get bit by library compatibility bugs that

Re: hadoop with tomcat

2008-11-10 Thread Alex Loddengaard
Do you know about the jobtracker page? Visit http://yournamenode:50030. This page (served by Jetty) gives you statistics about your cluster and each MR job. Alex On Sun, Nov 9, 2008 at 11:33 PM, ZhiHong Fu [EMAIL PROTECTED] wrote: Hello: I have implemented a Map/Reduce job, which will

Re: Best way to handle namespace host failures

2008-11-10 Thread Alex Loddengaard
There has been a lot of discussion on this list about handling namenode failover. Generally the most common approach is to backup the namenode to an NFS mount and manually instantiate a new namenode when your current namenode fails. As Hadoop exists today, the namenode is a single point of
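A minimal sketch of the backup idea Alex describes, with throwaway directories standing in for the real metadata directory and NFS mount (all paths and file contents here are placeholders, not Hadoop defaults):

```shell
# Throwaway directories stand in for the namenode metadata dir and NFS mount
NAME_DIR=$(mktemp -d)     # in practice: the namenode's dfs.name.dir
BACKUP_DIR=$(mktemp -d)   # in practice: a directory on an NFS mount
echo "fsimage-contents" > "$NAME_DIR/fsimage"

# Mirror the metadata directory, preserving attributes
cp -a "$NAME_DIR"/. "$BACKUP_DIR"/
ls "$BACKUP_DIR"   # → fsimage
```

A cron-driven copy like this is a crude stand-in; listing a second, NFS-mounted directory in dfs.name.dir makes the namenode write both locations synchronously instead.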

Re: NameNode memory usage and 32 vs. 64 bit JVMs

2008-11-10 Thread Allen Wittenauer
On 11/10/08 1:30 AM, Aaron Kimball [EMAIL PROTECTED] wrote: It sounds like you think the 64- and 32-bit environments are effectively interchangeable. May I ask why you are using both? The 64-bit environment gives you access to more memory; do you see faster performance for the TT's in 32-bit

Re: hadoop start problem

2008-11-10 Thread Allen Wittenauer
On 11/10/08 6:18 AM, Brian MacKay [EMAIL PROTECTED] wrote: I had a similar problem when I upgraded... not sure of details why, but I had permissions problems trying to develop and run on windows out of cygwin. At Apachecon, we think we identified a case where someone forgot to copy the

Can you specify the user on the map-reduce cluster in Hadoop streaming

2008-11-10 Thread Rick Hangartner
Hi, To make Hadoop/MapReduce available for developers to experiment with, we are setting up a cluster with Hadoop/MapReduce and a dataset, and providing instructions on how developers can use streaming to submit jobs from their own machines. For purposes of explanation here, we can assume

Re: Can you specify the user on the map-reduce cluster in Hadoop streaming

2008-11-10 Thread Allen Wittenauer
On 11/10/08 12:21 PM, Rick Hangartner [EMAIL PROTECTED] wrote: But is there a proper way to allow developers to specify a remote_username they legitimately have access to on the cluster if it is not the same as the local_username of the account on their own machine they are using to submit

how many maps in a map task?

2008-11-10 Thread ma qiang
hi all, I have a data set stored in HBase, and I run a mapreduce program to analyze it. Now I want to know how many maps are in a map task. I want to use the number of the maps in my program. For example, there are 100 maps in a map task, and I want to collect all the values and analyze these

Mapper value is null problem

2008-11-10 Thread ZhiHong Fu
Hello: I have customized a DbRecordAndOpInputFormat which will retrieve data from several web services, and the data format is like the dataItem in database ResultSets. And now I have encountered a problem: I get the right (key,value) in the DbRecordReader next() method, but in Mapper

Re: how many maps in a map task?

2008-11-10 Thread Mice
Is there a reducer in your program, or do you need to output the result on the map side? 2008/11/11 ma qiang [EMAIL PROTECTED]: hi all, I have a data set stored in HBase, and I run a mapreduce program to analyze it. Now I want to know how many maps are in a map task. I want to use the number of the

Re: how many maps in a map task?

2008-11-10 Thread ma qiang
Yes, it needs further analysis in the reducer. On Tue, Nov 11, 2008 at 10:28 AM, Mice [EMAIL PROTECTED] wrote: Is there a reducer in your program, or do you need to output the result on the map side? 2008/11/11 ma qiang [EMAIL PROTECTED]: hi all, I have a data set stored in HBase, and I run a

Passing information from one job to the next in a JobControl

2008-11-10 Thread Saptarshi Guha
Hello, I am using JobControl to run a sequence of jobs (Job_1, Job_2, ..., Job_n) one after the other. Each job returns some information, e.g. key1 value1,value2 key2 value1,value2 and so on. This can be found in the outdir passed to the jar file. Is there a way for Job_1 to return some data (which can be

RE: Best way to handle namespace host failures

2008-11-10 Thread Goel, Ankur
If we are starting the namenode on a different host, the configuration on all the cluster nodes will need to be updated before a cluster restart, right? -Original Message- From: Alex Loddengaard [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 11, 2008 12:07 AM To:

Re: Customized InputFormat Problem

2008-11-10 Thread Sharad Agarwal
But when I run it, it will throw the exception in the DbRecordReader.next() method. Although I have logged in it, I still can't see anything, and don't know where I should check. Who can help me find where I can get the real execution status, so I can see where the error is? Thanks! Check the logs

Re: Best way to handle namespace host failures

2008-11-10 Thread Dhruba Borthakur
A couple of things that one can do: 1. dfs.name.dir should have at least two locations, one on the local disk and one on NFS. This means that all transactions are synchronously logged into two places. 2. Create a virtual IP, say name.xx.com, that points to the real machine name of the machine on
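Dhruba's first point translates to a configuration entry along these lines (the paths are illustrative, not defaults); the namenode logs every transaction to each listed directory synchronously:

```xml
<!-- hadoop-site.xml sketch: redundant namenode metadata locations -->
<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list: one local disk path, one NFS-mounted path -->
  <value>/hadoop/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
</property>
```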