Application errors with one disk on datanode getting filled up to 100%

2013-06-10 Thread Mayank
We are running a Hadoop cluster with 10 datanodes and a namenode. Each datanode is set up with 4 disks (/data1, /data2, /data3, /data4), each disk having a capacity of 414GB. hdfs-site.xml has the following property set: <property><name>dfs.data.dir</name>
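For reference, a minimal hdfs-site.xml sketch of that kind of multi-disk setup, assuming data directories under the four mount points named above (the exact value used in the original post is cut off in the snippet):

    <property>
      <name>dfs.data.dir</name>
      <!-- comma-separated list; the datanode spreads new blocks across these disks -->
      <value>/data1/dfs/data,/data2/dfs/data,/data3/dfs/data,/data4/dfs/data</value>
    </property>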

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-10 Thread Nitin Pawar
When you say the application errors out, does that mean your MapReduce job is erroring? In that case, apart from HDFS space, you will need to look at the mapred tmp directory space as well. You have got 400GB * 4 * 10 = 16TB of disk, and let's assume that you have a replication factor of 3, so at max you will
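For context, a rough back-of-the-envelope version of that arithmetic (an illustration of the reply's point, not part of the original message):

    raw capacity:        4 disks x ~400 GB x 10 datanodes = ~16 TB
    with 3x replication: ~16 TB / 3                       = ~5.3 TB of effective space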

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-10 Thread Mayank
No, it's not a MapReduce job. We have a Java app running on around 80 machines which writes to HDFS. The error that I'd mentioned is being thrown by the application, and yes, we have the replication factor set to 3. The following is the status of HDFS: Configured Capacity: 16.15 TB, DFS Used: 11.84 TB, Non DFS

Bad datanode error while writing to datanodes

2013-06-10 Thread prem yadav
Hi, this is one issue that we have seen quite frequently lately. We have a 10-datanode cluster, with each datanode running with 1 GB of memory. Total disk space is about 17 TB, out of which 12 TB are full. Hadoop version: 1.0.4. Each of the datanodes has 4 disks attached to it, which we have

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-10 Thread Nitin Pawar
From the snapshot, you have got around 3TB left for writing data. Can you check each individual datanode's storage health? You said you have 80 servers writing to HDFS in parallel; I am not sure whether that could be an issue. As suggested in past threads, you can do a rebalance of the blocks, but that will take some time
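The rebalance being suggested is normally run with the HDFS balancer. A sketch using the classic hadoop CLI (command names differ slightly across versions; the threshold percentage is just an example value):

    # per-datanode capacity/usage report, to check individual nodes
    hadoop dfsadmin -report
    # move blocks until every datanode is within 10% of the cluster-average utilization
    hadoop balancer -threshold 10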

Re: How to run MIA Kmean Code.

2013-06-10 Thread Nirmal Kumar
http://stackoverflow.com/questions/11479600/how-do-i-build-run-this-simple-mahout-program-without-getting-exceptions From: Apurv Khare apurv.kh...@lntinfotech.com To: mahout-u...@apache.org; user@hadoop.apache.org

hdfs -copyToLocal file permission

2013-06-10 Thread Abhishek Gayakwad
Hi, while copying a file from HDFS, the file permissions are getting changed; my assumption was that permissions should be retained during the copy. Is this behavior correct? [abhishek@int019 ~]$ hdfs dfs -ls drwxr-xr-x - abhishek abhishek 0 2013-06-06 10:14 in-dir1 [abhishek@int019 ~]$

ALL HDFS Blocks on the Same Machine if Replication factor = 1

2013-06-10 Thread Razen Al Harbi
Hello, I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than the HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked. For a higher replication factor, I see the same behavior, but

Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1

2013-06-10 Thread Kai Voigt
Hello, on 10.06.2013 at 15:36, Razen Al Harbi razen.alha...@gmail.com wrote: I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put

Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1

2013-06-10 Thread Daryn Sharp
It's normal. The default placement strategy stores the first replica on the writing client's node for performance, then chooses a second, random node on another rack, then a third node on the same rack as the second. Using a replication factor of 1 is not advised if you value your data. However, if you
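To see where the blocks of a particular file actually landed, fsck can list block locations; a sketch (the path is just an example):

    hadoop fsck /path/to/file -files -blocks -locations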

Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1

2013-06-10 Thread Shahab Yunus
Yeah, Kai is right. You can read more details for your understanding at: http://hadoop.apache.org/docs/stable/hdfs_design.html#Data+Replication and, right from the horse's mouth (pgs. 70-75):

Re: hdfs -copyToLocal file permission

2013-06-10 Thread Daryn Sharp
No, permissions are not preserved. FsShell copy commands match *nix cp behavior by using the current umask (the conf setting you cited) for the permissions of the new file. Perhaps we could add a -p option, like *nix cp has, to preserve permissions. In the meantime you can set the umask if
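A sketch of the umask workaround described here, assuming the Hadoop 2.x property name fs.permissions.umask-mode (older releases use dfs.umaskmode) and an example path based on the listing above:

    # copy out with an explicit umask so the local copy gets predictable permissions
    hdfs dfs -D fs.permissions.umask-mode=022 -copyToLocal /user/abhishek/in-dir1 .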

gz containing null chars?

2013-06-10 Thread William Oberman
I posted this to the Pig mailing list, but it might be more related to Hadoop itself, I'm not sure. Quick recap: I had a file of \n-separated lines of JSON. I decided to compress it to save on storage costs. After compression I got a different answer for a Pig query that is basically a count

subscribe

2013-06-10 Thread Nathan Bamford
Nathan Bamford Engineer, RedPoint Global Inc. 1515 Walnut Street | Suite 200 | Boulder, CO 80302 T: +1 303 541 1518 | F: +1 720 294 8344 Skype: natebamford | nathan.bamf...@redpoint.net | www.redpoint.net

Re: Job History files location of 2.0.4

2013-06-10 Thread Shahab Yunus
Thanks for letting us know the solution. Regards, Shahab On Mon, Jun 10, 2013 at 3:40 PM, Boyu Zhang boyuzhan...@gmail.com wrote: I solved the problem; in my case it was because I did not start the job history server daemon. After starting it, the history logs are generated in 2 places: the

Re: gz containing null chars?

2013-06-10 Thread Niels Basjes
My best guess is that, at a low level, a string is often terminated by a null byte at the end, and perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the null byte while the basic record reader that follows simply continues. In this situation your input file
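A quick way to test that guess (a sketch; the file name is an example) is to look at the decompressed stream directly:

    # count NUL bytes in the decompressed data
    zcat data.json.gz | tr -dc '\0' | wc -c
    # count newline-delimited records, to compare against the Pig count
    zcat data.json.gz | wc -l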

Re: Resource manager question - Yarn

2013-06-10 Thread Arun C Murthy
The reason is that you don't want a socket to the RM machine to be held up - we expect to have thousands of AMs running concurrently. Essentially a blocking call could result in a DoS attack on the RM. Eventually, YARN will move to async RPC and won't have this quirk. Arun On Jun 7, 2013, at

Re: hadoop 2.0 client configuration

2013-06-10 Thread Azuryy Yu
If you want to work with HA, yes, all of these configuration settings are needed. --Sent from my Sony mobile. On Jun 11, 2013 8:05 AM, Praveen M lefthandma...@gmail.com wrote: Hello, I'm a Hadoop n00b, and I recently upgraded from Hadoop 0.20.2 to Hadoop 2 (CDH 4.2.1). For a client configuration to
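For reference, a minimal sketch of the client-side HA settings being referred to (nameservice id, namenode ids, and hostnames are placeholders; fs.defaultFS normally lives in core-site.xml, the dfs.* keys in hdfs-site.xml):

    <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1-host:8020</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2-host:8020</value></property>
    <property><name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>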

SSD support in HDFS

2013-06-10 Thread Lucas Stanley
Hi, is it possible to tell Apache HDFS to store some files on SSD and the rest of the files on spinning disks? So if each of my nodes has 1 SSD and 5 spinning disks, can I configure a directory in HDFS to put all files in that dir on the SSD? I think Intel's Hadoop distribution is working on