We are running a Hadoop cluster with 10 datanodes and a namenode. Each
datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with each
disk having a capacity of 414 GB.
hdfs-site.xml has the following property set:
<property>
<name>dfs.data.dir</name>
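(The value is cut off in the archive; given the four mounts described above,
it presumably looks something like this:)

  <property>
    <name>dfs.data.dir</name>
    <value>/data1,/data2,/data3,/data4</value>
  </property>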
When you say the application errors out, does that mean your MapReduce job is
erroring? In that case, apart from HDFS space, you will need to look at the
mapred tmp directory space as well.
You've got 400 GB * 4 * 10 = 16 TB of disk, and let's assume that you have a
replication factor of 3, so at max you will be able to store roughly
16 / 3 ≈ 5.3 TB of unique data.
No, it's not a MapReduce job. We have a Java app running on around 80
machines which writes to HDFS. The error that I'd mentioned is being thrown
by the application, and yes, we have the replication factor set to 3.
Following is the status of HDFS:
Configured Capacity: 16.15 TB
DFS Used: 11.84 TB
Non DFS
Hi,
This is an issue that we have seen quite frequently lately. We have a 10
datanode cluster, with each datanode running with 1 GB of memory. Total disk
space is about 17 TB, of which 12 TB is full.
Hadoop version: 1.0.4
Each of the datanodes has 4 disks attached to it, which we have
From the snapshot, you've got around 3 TB left for writing data.
Can you check each individual datanode's storage health?
As you said, you've got 80 servers writing to HDFS in parallel; I am not sure
whether that could be an issue.
As suggested in past threads, you can rebalance the blocks, but that will
take some time.
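For reference, this is roughly what the check and the rebalance look like on
a Hadoop 1.x shell (the threshold value is just an example):

  # Per-datanode capacity and usage, to spot nodes that are nearly full
  hadoop dfsadmin -report

  # Move blocks until each node is within 10% of the cluster average
  hadoop balancer -threshold 10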
http://stackoverflow.com/questions/11479600/how-do-i-build-run-this-simple-mahout-program-without-getting-exceptions
From: Apurv Khare apurv.kh...@lntinfotech.com
To: mahout-u...@apache.org; user@hadoop.apache.org
Hi,
While copying a file from HDFS, the file permissions are getting changed; my
assumption was that the permissions would be retained while copying. Is this
behavior correct?
[abhishek@int019 ~]$ hdfs dfs -ls
drwxr-xr-x - abhishek abhishek 0 2013-06-06 10:14 in-dir1
[abhishek@int019 ~]$
Hello,
I have deployed Hadoop on a cluster of 20 machines. I set the replication
factor to one. When I put a file (larger than HDFS block size) into HDFS,
all the blocks are stored on the machine where the Hadoop put command is
invoked.
For higher replication factors, I see the same behavior, but
Hello,
On 10.06.2013 at 15:36, Razen Al Harbi razen.alha...@gmail.com wrote:
I have deployed Hadoop on a cluster of 20 machines. I set the replication
factor to one. When I put a file (larger than HDFS block size) into HDFS, all
the blocks are stored on the machine where the Hadoop put
It's normal. The default placement strategy stores the first replica on the
node where the writer runs (for performance), then chooses a second node at
random on another rack, then a third node on the same rack as the second.
Using a replication factor of 1 is not advised if you value your data.
However, if you
Yeah, Kai is right.
You can read more details for your understanding at:
http://hadoop.apache.org/docs/stable/hdfs_design.html#Data+Replication
and right from the horse's mouth (Pgs 70-75):
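A quick way to see where the replicas of a given file actually landed on your
own cluster (the path here is illustrative):

  # Lists each block of the file and the datanodes holding its replicas
  hadoop fsck /path/to/file -files -blocks -locations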
No, permissions are not preserved. FsShell copy commands match *nix cp
behavior by using the current umask (the conf setting you cited) for the
permissions of the new file. Perhaps we could add a -p option, like *nix
cp's, to preserve permissions. In the meantime you can set the umask if
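For what it's worth, a sketch of overriding the umask for a single
invocation, assuming Hadoop 2's fs.permissions.umask-mode key (older 1.x
releases used dfs.umaskmode); paths are made up:

  # New files created by this copy get 755/644-style permissions
  hadoop fs -Dfs.permissions.umask-mode=022 -cp /src/file /dst/file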
I posted this to the pig mailing list, but it might be more related to
hadoop itself, I'm not sure.
Quick recap: I had a file of \n-separated lines of JSON. I decided to
compress it to save on storage costs. After compression I got a different
answer for a Pig query that is basically a count
Nathan Bamford
Engineer, RedPoint Global Inc.
Thanks for letting us know the solution.
Regards,
Shahab
On Mon, Jun 10, 2013 at 3:40 PM, Boyu Zhang boyuzhan...@gmail.com wrote:
I solved the problem; in my case, it was because I did not start the job
history server daemon. After starting it, the history logs are generated in
2 places: the
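For anyone hitting the same thing, the daemon is started like this in a stock
Apache Hadoop 2 tarball layout (distributions may package it differently):

  # Starts the MapReduce JobHistory Server daemon
  sbin/mr-jobhistory-daemon.sh start historyserver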
My best guess is that at a low level a string is often terminated by having
a null byte at the end.
Perhaps that's where the difference lies.
Perhaps the gz decompressor simply stops at the null byte and the basic
record reader that follows simply continues.
In this situation your input file
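One way to test that theory is to compare record and NUL-byte counts between
the two copies (the paths are illustrative):

  # Line counts should match if no records were lost
  hadoop fs -cat /data/file.json | wc -l
  hadoop fs -cat /data/file.json.gz | gunzip | wc -l

  # Count embedded null bytes in the uncompressed data
  hadoop fs -cat /data/file.json | tr -cd '\0' | wc -c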
The reason is that you don't want a socket to the RM machine to be held up - we
expect to have thousands of AMs running concurrently. Essentially a blocking
call could result in a DoS attack on the RM.
Eventually, YARN will move to async RPC and won't have this quirk.
Arun
On Jun 7, 2013, at
If you want to work with HA, yes, all of these configurations are needed.
--Send from my Sony mobile.
On Jun 11, 2013 8:05 AM, Praveen M lefthandma...@gmail.com wrote:
Hello,
I'm a Hadoop n00b, and I had recently upgraded from Hadoop 0.20.2 to
Hadoop 2 (CDH 4.2.1).
For a client configuration to
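For reference, a minimal sketch of the HA client-side configuration in
hdfs-site.xml; the nameservice ID "mycluster" and the hostnames are made up:

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

with fs.defaultFS in core-site.xml pointing at hdfs://mycluster.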
Hi,
Is it possible to tell Apache HDFS to store some files on SSD and the rest
of the files on spinning disks?
So if each of my nodes has 1 SSD and 5 spinning disks, can I configure a
directory in HDFS to put all files in that dir on the SSD?
I think Intel's Hadoop distribution is working on
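(At the time of this thread stock HDFS had no such feature; later releases,
2.6 and up, added storage types and storage policies, roughly along these
lines; the directory names are made up:)

  <!-- hdfs-site.xml: tag each data dir with its storage type -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>[SSD]/mnt/ssd,[DISK]/data1,[DISK]/data2,[DISK]/data3,[DISK]/data4,[DISK]/data5</value>
  </property>

  # Then pin a directory's files to SSD from the shell
  hdfs storagepolicies -setStoragePolicy -path /hot-data -policy ALL_SSD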