RE: Configuring Hadoop With Eclipse Environment for C++ CDT Code

2011-04-08 Thread Sagar Kohli
Hi Adarsh, try this link: http://shuyo.wordpress.com/2011/03/08/hadoop-development-environment-with-eclipse/ Regards, Sagar From: Adarsh Sharma [adarsh.sha...@orkash.com] Sent: Friday, April 08, 2011 9:45 AM To: common-user@hadoop.apache.org Subject:

Fw: start-up with safe mode?

2011-04-08 Thread springring
Hi, when I start up Hadoop, the namenode log shows STATE* Safe mode ON. How do I turn it off? I can turn it off with the command hadoop dfsadmin -safemode leave after startup, but how can I start HDFS out of safe mode in the first place? Thanks. Ring the startup
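For reference, the safe-mode subcommands on a 0.20-era cluster look like this (a sketch; it assumes `hadoop` is on the PATH and points at a running cluster, so it is illustrative rather than something to run as-is):

```shell
# Report whether the namenode is currently in safe mode
hadoop dfsadmin -safemode get

# Force the namenode to leave safe mode immediately
hadoop dfsadmin -safemode leave

# Block until the namenode leaves safe mode on its own
# (useful in startup scripts that must wait for HDFS to become writable)
hadoop dfsadmin -safemode wait
```

The `wait` form is usually the better fit for startup scripts than forcing `leave`, since it lets the block-report threshold be reached naturally.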

Input Sampler with Custom Key Type

2011-04-08 Thread Meena_86
Hi, I am a beginner in Hadoop MapReduce. Please redirect me if I am not posting in the correct forum. I have created my own key type which implements WritableComparable. I would like to use TotalOrderPartitioner with this key and Text as the value. But I keep encountering errors when the

Re: HDFS start-up with safe mode?

2011-04-08 Thread springring
Hi, I guess it is something about the threshold 0.9990. When HDFS starts up, it enters safe mode first, then checks a value (I don't know what value or percentage) of my Hadoop, and finds the value below 99.9%, so safe mode will not turn off? But the conclusion of the log file is Safe mode

Re: Re: HDFS start-up with safe mode?

2011-04-08 Thread springring
I modified the value of dfs.safemode.threshold.pct to zero, and now everything is OK. Log file as below. But there are still three questions: 1. Can I regain the percentage of blocks that should satisfy the minimal replication requirement to 99.9%? hadoop balancer? For I feel it will be more
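For anyone revisiting this setting later, the property lives in hdfs-site.xml; a sketch of restoring the shipped default rather than leaving it at zero (the comment summarizes the documented semantics):

```xml
<!-- hdfs-site.xml: fraction of blocks that must meet their minimal
     replication before the namenode automatically leaves safe mode.
     The default is 0.999f; a value of 0 or less means the namenode
     does not wait in safe mode on startup. -->
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.999f</value>
</property>
```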

Re: Developing, Testing, Distributing

2011-04-08 Thread Ankur C. Goel
Did you try Pig? It drastically reduces boilerplate code for common operations like Join, Group, Cogroup, Filter, Projection, and Order. Pig also gives you some advanced features like multi-query optimization that are ugly to code by hand and difficult to maintain. Most of the people that I know,

Re: Configuring Hadoop With Eclipse Environment for C++ CDT Code

2011-04-08 Thread Adarsh Sharma
Thanks Sagar, but is the process the same for Eclipse Helios, which is used for running C++ code? I am not able to locate the Map-Reduce perspective in it. Best Regards, Adarsh Sagar Kohli wrote: Hi Adarsh, Try this link

Reg HDFS checksum

2011-04-08 Thread Thamizh
Hi All, this is a question regarding HDFS checksum computation. I understand that when we read a file from HDFS, it verifies the checksum by default, and the read will not succeed if the file is corrupted. Also, the CRC is internal to Hadoop. Here are my questions: 1. How can I use hadoop dfs -get
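The checksum-related flags of `get` from this era are worth sketching (paths below are hypothetical, and the commands assume a running cluster):

```shell
# Copy a file along with its .crc sidecar file
hadoop fs -get -crc /user/foo/data.txt ./data.txt

# Skip checksum verification entirely on read
# (useful to salvage a corrupted file at your own risk)
hadoop fs -get -ignoreCrc /user/foo/data.txt ./data.txt

# Inspect block-level health of a path
hadoop fsck /user/foo/data.txt -files -blocks
```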

available hadoop logs

2011-04-08 Thread bikash sharma
Hi, for research purposes, I need some real Hadoop MapReduce job traces (ideally both inter- and intra-job, in terms of Hadoop job configuration parameters like mapred.io.sort.factor). Are there any freely available Hadoop traces corresponding to a real large setup? Thanks, Bikash

Re: How is hadoop going to handle the next generation disks?

2011-04-08 Thread sridhar basam
How many files do you have per node? What I find is that most of my inodes/dentries are almost always cached, so even on a host with hundreds of thousands of files, calculating 'du -sk' generally uses high I/O for only a couple of seconds. I am using 2TB disks too. Sridhar On Fri, Apr

Re: How is hadoop going to handle the next generation disks?

2011-04-08 Thread sridhar basam
BTW, this is on systems which have a lot of RAM and aren't under high load. If you find that your system is evicting dentries/inodes from its cache, you might want to experiment with dropping vm.vfs_cache_pressure from its default so that they are preferred over the pagecache. At the extreme,
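A sketch of that tuning on Linux (the value 50 is an assumption to experiment with, not a recommendation; writing the sysctl requires root):

```shell
# Read the current value; the kernel default is 100
sysctl vm.vfs_cache_pressure

# Values below 100 bias the kernel toward retaining
# dentry/inode caches over the page cache
sysctl -w vm.vfs_cache_pressure=50

# Persist the setting across reboots
echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
```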

Re: 0.21.0 - Java Class Error

2011-04-08 Thread Tom White
Hi Witold, is this on Windows? The scripts were restructured after Hadoop 0.20, and looking at them now I notice that the Cygwin path translation for the classpath seems to be missing. You could try adding the following line to the if $cygwin clause in bin/hadoop-config.sh: CLASSPATH=`cygpath
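The suggestion above is cut off; a sketch of what such a clause typically looks like in Hadoop's bin scripts (the exact flags are an assumption based on the translation used elsewhere in those scripts):

```shell
# bin/hadoop-config.sh -- Cygwin only: translate the colon-separated
# Unix classpath into a semicolon-separated Windows path so the JVM
# can resolve it
if $cygwin; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
fi
```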

Re: Developing, Testing, Distributing

2011-04-08 Thread W.P. McNeill
I use IntelliJ, though Eclipse works too. I don't have any Hadoop-specific plug-ins; both IDEs are just set up as vanilla Java programming environments. Chapter 5 of *Hadoop: The Definitive Guide* (http://www.librarything.com/work/8488103) has a good overview of testing methodology. It's what I

Re: Re:HDFS start-up with safe mode?

2011-04-08 Thread Harsh J
Hello, I'm not quite clear on why you'd want to disable a consistency check such as the safemode feature. It guarantees that your DFS is made ready only after it has sufficient blocks reported to start handling your DFS requests. If your NN ever goes into safemode later, it is vital that

Re: How is hadoop going to handle the next generation disks?

2011-04-08 Thread Edward Capriolo
On Fri, Apr 8, 2011 at 12:24 PM, sridhar basam s...@basam.org wrote: BTW, this is on systems which have a lot of RAM and aren't under high load. If you find that your system is evicting dentries/inodes from its cache, you might want to experiment with dropping vm.vfs_cache_pressure from its default

Re: Developing, Testing, Distributing

2011-04-08 Thread Tsz Wo (Nicholas), Sze
(Resent with -hadoopuser. Apologies if you receive multiple copies.) From: Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com To: common-user@hadoop.apache.org Sent: Fri, April 8, 2011 11:08:22 AM Subject: Re: Developing, Testing, Distributing First of

Re: How is hadoop going to handle the next generation disks?

2011-04-08 Thread sridhar basam
On Fri, Apr 8, 2011 at 1:59 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Right. Most inodes are always cached when: 1) small disks, 2) light load. But that is not the case with Hadoop. Making the problem worse: it seems like Hadoop issues 'du -sk' for all disks at the same

Re: start-up with safe mode?

2011-04-08 Thread Matthew Foley
Hi Ring, the purpose of starting up with safe mode enabled is to prevent replication thrashing before and during the initial block reports from all the datanodes. Consider this thought experiment: a cluster with 100 datanodes and replication 3, so any pair of datanodes only have approx 2%
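The approximate 2% figure in the thought experiment can be checked directly: with replication 3, a block seen on one node has its two other replicas spread over the remaining 99 nodes.

```shell
# P(a specific other datanode holds a replica of a block on node A)
# = 2 remaining replicas / 99 remaining nodes, i.e. roughly 2%
awk 'BEGIN { printf "%.1f%%\n", 2 / 99 * 100 }'
# prints 2.0%
```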

Re: 0.21.0 - Java Class Error

2011-04-08 Thread Witold Januszewski
Hi Tom, thanks for your answer. Yet after adding the suggested line I get another bug (enclosed in .png). Yes, it's Windows 7. I'm considering switching to Linux now :-) There's also a virtual machine with Hadoop provided by Yahoo!, but I don't know if I can install Mahout there. That's my goal

Re: HDFS start-up with safe mode?

2011-04-08 Thread Matthew Foley
1. Can I regain the percentage of blocks that should satisfy the minimal replication requirement to 99.9%? hadoop balancer? For I feel it will be more safe. 2. I set dfs.safemode.threshold.pct to 0 or 0f; both values work, but which one is better? I guess 0. 3.

Re: Developing, Testing, Distributing

2011-04-08 Thread Tsz Wo (Nicholas), Sze
First of all, I am a Hadoop contributor and I am familiar with the Hadoop code base/build mechanism. Here is what I do: Q1: What IDE are you using? Eclipse. Q2: What plugins to the IDE are you using? No plugins. Q3: How do you test your code, which unit test libraries are you using, how do

Tasks failing with OutOfMemory after jvm upgrade to 1.6.0_18 or later

2011-04-08 Thread Koji Noguchi
This is technically a Java issue, but I thought other Hadoop users would find it interesting. When we upgraded from an old JVM to a newer (32-bit) JVM, 1.6.0_21, we started seeing users' tasks having issues at random places: 1. Tasks running 2-3 times slower 2. Tasks failing with OutOfMemory 3.

RE: Developing, Testing, Distributing

2011-04-08 Thread Guy Doulberg
Thanks, I think I will try your way of developing (replacing the ant). From: Tsz Wo (Nicholas), Sze [s29752-hadoopgene...@yahoo.com] Sent: Friday, April 08, 2011 21:08 To: common-user@hadoop.apache.org Subject: Re: Developing, Testing, Distributing

Shared lib?

2011-04-08 Thread Mark
If I have some jars that I would like to include in ALL of my jobs, is there some shared lib directory on HDFS that will be available to all of my nodes? Something similar to how Oozie uses a shared lib directory when submitting workflows. As of right now I've been cheating and copying these
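Plain Hadoop of this vintage has no built-in shared-lib directory; the usual workaround is shipping dependency jars per job with -libjars (a sketch; the jar, class, and paths below are hypothetical, and the flag is only parsed when the job's main class goes through ToolRunner/GenericOptionsParser):

```shell
# Ship extra jars with a single job submission; Hadoop copies them
# to the distributed cache and adds them to each task's classpath
hadoop jar myjob.jar com.example.MyJob \
  -libjars /local/path/dep1.jar,/local/path/dep2.jar \
  /input /output
```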