A couple of things:
1. Hadoop's strength is data locality, so you want most of your Hadoop heavy
lifting on the local filesystem (HDFS), where the computation is shipped to the
nodes that hold the data.
2. Assuming you are pulling data into Hadoop from Mongo to crunch and put the
resulting data back int
This is on an EMR cluster.
This does not work!
hadoop@ip-10-34-7-51:~$ hadoop dfs -mkdir hdfs://10.34.7.51:9000/user/foobar
mkdir: This file system object (hdfs://ip-10-34-7-51.ec2.internal:9000) does
not support access to the request path 'hdfs://10.34.7.51:9000/user/foobar' You
possibly called
Hi,
How do I know which file has this block that is causing a Distcp to fail?
Copy failed: java.io.IOException: Fail to get block MD5 for
blk_4722582869815671042_13395
at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:844)
at org.apache.hadoop.hdfs.DFSClient.getFi
So we had an NFS mount and a local disk for dfs.name.dir. Our NFS mount lost
its connection a week ago. The question is: if we now remount it and it still
has the data from back then, will the namenode detect the staleness and bring
it back in sync with the other partition? Or should we manually delete the
When I use -overwrite everything gets copied over fine. And the files are not
corrupt.
When I use the -update option for distcp, I constantly get this WARN +
exception. What is it trying to do and what is failing?
11/08/23 22:43:06 WARN hdfs.DFSClient:
src=/analytics_hive_tables/web_etl_tables
Hi,
I'm trying to copy data from one cluster to another. Without the -i option it
fails at a certain point; when I run with the -i option it completes, but I
can't be sure that the entire contents were reliably copied.
The documentation for the -i option is not very good. Can someone explain if
the
You need the HDFS classes on your client machine; they provide the HDFS API to
read from and write to the cluster. Your client machine does not have to be
part of the cluster.
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
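For illustration, a minimal sketch of what such client-side code can look like; the namenode address and path are hypothetical, and only the Hadoop client jars need to be on the classpath:

// Sketch: read a file over the HDFS API from a machine outside the cluster.
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RemoteHdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's namenode explicitly (hypothetical host).
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:9000"), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path("/user/foo/input.txt"));
            IOUtils.copyBytes(in, System.out, 4096, false); // print file contents to stdout
        } finally {
            IOUtils.closeStream(in);
            fs.close();
        }
    }
}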
From: Charles Gonçalves
To: hdfs-user@hadoop.apache.org; Ayon Sinha
Sent: Friday, June 17, 2011 10:07 AM
Subject: Re: unzip gz file in HDFS ?
I know that it's not perfect, but you can always use a Unix pipe
;P
hadoop fs -cat X | gzip -d | hadoop fs -put - Y
Or something like that!
The hadoop dfs -cp and -mv commands seem like the perfect candidates for adding
an uncompress option.
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
From: Harsh J
To: Ayon Sinha
Cc: "hdfs-user@hadoop.apache.org"
Se
To: hdfs-user@hadoop.apache.org; Ayon Sinha
Sent: Friday, June 17, 2011 1:00 AM
Subject: Re: unzip gz file in HDFS ?
Ayon,
You can run an identity map job with no output compression set to it.
On Fri, Jun 17, 2011 at 12:59 PM, Ayon Sinha wrote:
> Is there a way to unzip a gzip file within H
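To illustrate the identity-map approach suggested above, here is a rough sketch of a map-only job with output compression turned off; the paths and class names are hypothetical, and it assumes line-oriented text input (the .gz is decompressed automatically on read):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GunzipOnHdfs {

    // Emits each input line unchanged; with a NullWritable value,
    // TextOutputFormat writes just the line, so the output is plain text.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "gunzip-on-hdfs");
        job.setJarByClass(GunzipOnHdfs.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);                        // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/input.gz"));   // gzip input is decompressed on read
        FileOutputFormat.setOutputPath(job, new Path("/data/uncompressed"));
        FileOutputFormat.setCompressOutput(job, false);  // make sure the output is not re-compressed
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}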
Is there a way to unzip a gzip file within HDFS where source & target both live
on HDFS? I don't want to pull a large file to local and put it back.
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
Do newly created files get the default blocksize while old files remain the
same? Yes.
Is there a way to change the blocksize of existing files? I have done this
using a copy-out and copy-back script. I couldn't find a shortcut analogous to
setrep.
-Ayon
See My Photos on Flickr
Also check out
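As a sketch of the copy-out/copy-back approach described above (paths and block size are hypothetical), the FileSystem API lets you rewrite a file with a different block size:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RewriteWithNewBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path src = new Path("/user/foo/data.bin");       // existing file (hypothetical)
        Path dst = new Path("/user/foo/data.bin.tmp");   // rewritten copy
        long newBlockSize = 128L * 1024 * 1024;          // e.g. 128 MB
        short replication = fs.getFileStatus(src).getReplication();

        // Copy the bytes into a new file created with the desired block size.
        FSDataInputStream in = fs.open(src);
        FSDataOutputStream out = fs.create(dst, true, 4096, replication, newBlockSize);
        IOUtils.copyBytes(in, out, conf, true);          // closes both streams

        // Swap the rewritten copy into place once it looks good.
        fs.delete(src, false);
        fs.rename(dst, src);
        fs.close();
    }
}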
From: Yang Xiaoliang
To: hdfs-user@hadoop.apache.org; Ayon Sinha
Sent: Sunday, May 22, 2011 9:51 AM
Subject: Re: FileSystemCat ipc.Client: Retrying connect to server:
Yes, I am sure of that.
2011/5/23, Ayon Sinha :
> Is yours hdfs running?
>
> -Ayon
> See My Photos on F
Is your HDFS running?
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.
From: Yang Xiaoliang
To: hdfs-user@hadoop.apache.org
Sent: Sunday, May 22, 2011 7:23 AM
Subject: FileSystemCat ipc.Client: Retrying connect to
Hello,
We are occasionally seeing that, for some reason (we are using a scribe
client), some files stay open for write even after the writing process has long
since died. Is there anything we can do on the HDFS side to flush and close
these files without having to restart the namenode?
Is this a pr
One thing to note is that the HDFS client code fetches the block directly from
the datanode after obtaining the location info from the name node. That way the
namenode does not become the bottleneck for all data transfers. The clients
only
get the information about the sequence and location fro
ot
be the reason. Even the in_use.lock is more than a month old. However, we did
shut it down a few days ago and restarted it afterward, so the second shutdown
might not have been clean.
>
>
>
>On Tue, Apr 12, 2011 at 7:52 AM, Ayon Sinha wrote:
>
>The datanode used the dfs
The datanode uses the dfs config xml file to tell the datanode process what
disks are available for storage. Can you check that the config xml has all the
partitions mentioned and has not been overwritten during the restore process?
-Ayon
See My Photos on Flickr
Also check out my Blog for answe
I think your edits file is corrupt. Your best bet is to rename the edits.new
file on the namenode and try to see if the secondary node has a good copy of
it.
Loss of data is possible.
It happened to me and I did manage to start my namenode but lost some data.
-Ayon
_
as mainly worried if they all went to the same node - which
would be bad.
Take care,
-stu
____
From: Ayon Sinha
Date: Thu, 10 Mar 2011 07:41:17 -0800 (PST)
To:
ReplyTo: hdfs-user@hadoop.apache.org
Subject: Re: how does hdfs determine what node to use?
I think Stu m
I think Stu meant that each block will have a copy on at most 2 nodes.
Before Hadoop 0.20, rack awareness was not built into the algorithm for picking
the replication nodes. With 0.20 and later, rack awareness does the following:
1. The first copy of the block is placed at "random" on one of the least loaded
Use this API
http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/FileSystem.html
The code is pretty straightforward.
-Ayon
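For example, a minimal sketch of that kind of straightforward FileSystem usage (the directory name is hypothetical; the namenode address is taken from the config files on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsBasics {
    public static void main(String[] args) throws Exception {
        // Uses fs.default.name from the core-site.xml / hdfs-site.xml on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());

        Path dir = new Path("/user/foo");
        if (!fs.exists(dir)) {
            fs.mkdirs(dir);                  // create the directory if it is missing
        }
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}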
From: Alessandro Binhara
To: hdfs-user@hadoop.apache.org
Sent: Fri, February 25, 2011 5:08:57 AM
Subject:
ze
>
> Neither one was working.
>
> Is there anything I can do? I always have problems like this in hdfs. It
> seems even experts are guessing at the answers :-/
>
>
> On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote:
> conf/hdfs-site.xml
>
> restart dfs.
Is there anything I can do? I always have problems like this in hdfs. It
> seems even experts are guessing at the answers :-/
>
>
> On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote:
> conf/hdfs-site.xml
>
> restart dfs. I believe it should be sufficient to restart the nam
always have problems like this in hdfs. It seems
even experts are guessing at the answers :-/
On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote:
conf/hdfs-site.xml
>
>restart dfs. I believe it should be sufficient to restart the namenode only,
>but
>others can confi
conf/hdfs-site.xml
restart dfs. I believe it should be sufficient to restart the namenode only,
but others can confirm.
-Ayon
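For reference, a hypothetical hdfs-site.xml entry for the block size (the value shown is just an example; newer releases call the property dfs.blocksize):

<!-- Example hdfs-site.xml entry: default block size of 128 MB for new files. -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
</property>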
From: Rita
To: hdfs-user@hadoop.apache.org
Sent: Thu, February 3, 2011 4:35:09 AM
Subject: changing the block size
Currently I am u
e cluster, so I think there must be a configuration parameter or some knobs
that I can turn to make the namenode serve files to clients from a different
network.
Felix
On Mon, Jan 31, 2011 at 2:03 PM, Ayon Sinha wrote:
Also, be careful about this when you try to connect to HDFS and it doesn't
respond. T
Also, be careful about this when you try to connect to HDFS and it doesn't
respond. There was a place in the code where it was hard-coded to retry 45
times, trying every 15 secs, when there was a socket connect exception. It was
not (at least in the 0.18 version code I looked at) honoring the configura
You can run that code w/o running hadoop. As long as you have the libraries on
the classpath and the code is doing FileSystem operations, it will execute in
standalone mode just fine. Last I remember, the conf values from the client
would be the default values of the HDFS version and will not get pic
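A small sketch of that standalone usage, with the namenode address (hypothetical here) set explicitly so the client does not silently fall back to the built-in defaults:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StandaloneClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // With no core-site.xml/hdfs-site.xml on the classpath, the defaults apply
        // (fs.default.name = file:///), so point the client at the cluster explicitly.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("exists? " + fs.exists(new Path("/user/foo")));
        fs.close();
    }
}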
We did the same exercise a few months back. When we ran the balancer, which
takes a while, it balanced based on the percentage of disk usage on each node,
so you end up with usage of between, say, 45-55% on all nodes.
Sometimes the balancer does not balance well initially
Hi Matthew,
Congratulations. Having HDFS back is quite a relief, and you were lucky enough
not to lose any files/blocks.
Another thing I ended up doing was to decommission the namenode machine from
being a data node. That is what had caused the namenode to run out of disk
space.
-Ayon
_
Have you tried getting rid of the edits.new file completely (by renaming it to
something else)?
-Ayon
From: Matthew LeMieux
To: hdfs-user@hadoop.apache.org
Sent: Tue, October 5, 2010 10:14:21 AM
Subject: Re: NameNode crash - cannot start dfs - need help
No
Hi Matthew,
"(BTW, what if I didn't want to keep my recent edits, and just wanted to start
up the namenode? This is currently expensive downtime; I'd rather lose a
small
amount of data and be up and running than continue the down time). "
This was exactly my use-case as well. I chose small dat
We had almost the exact problem of the namenode filling up and the namenode
failing at this exact same point. Since you have created space now, you can
copy over the edits.new, fsimage, and the other 2 files from your
/mnt/namesecondarynode/current and try restarting the namenode.
I believe you will lose some
Use the dfs.data.dir config property. You can use comma-separated directories
to use multiple partitions.
Also, specify the percentage of total space to use for HDFS using
the dfs.datanode.du.pct property.
-Ayon
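As a sketch, hypothetical hdfs-site.xml entries for the two properties mentioned above (the paths are made up, and dfs.datanode.du.pct only exists in older releases):

<!-- Example hdfs-site.xml entries: multiple data partitions, comma-separated. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data</value>
</property>
<property>
  <!-- Older releases only; later versions use dfs.datanode.du.reserved instead. -->
  <name>dfs.datanode.du.pct</name>
  <value>0.90</value>
</property>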
From: Khaled BEN BAHRI
To: hdfs-user@hadoop.
Our namenode ran out of disk space and became unresponsive. We tried to bounce
DFS and it fails with the following error. How do I recover HDFS, even at the
cost of losing a whole bunch of recent edits?
2010-09-28 00:45:00,777 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metri
From: Ayon Sinha
To: hdfs-user@hadoop.apache.org
Sent: Tue, August 17, 2010 3:49:56 PM
Subject: HDFS java client connect retry count
Hi,
I have a java HDFS client which connects to a production cluster and gets data.
On our staging environment we see that since it cannot connect to the namenode
Hi,
I have a Java HDFS client which connects to a production cluster and gets data.
In our staging environment we see that, since it cannot connect to the namenode
(expected), it keeps retrying 45 times. I am looking for a way to set the
retry count much, much lower.
This is what we see in
I have an HDFS Java client on the same network domain as the cluster which
needs to read a DFS file with permissions rw-r--r--
I get
java.io.IOException
at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:169)
at
org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.jav