Re: hadoop cluster for querying data on mongodb

2011-12-20 Thread Ayon Sinha
A couple of things: 1. Hadoop's strength is in data locality, so do most of your Hadoop heavy lifting on the local filesystem (HDFS, where the computation is shipped to the nodes holding the data). 2. Assuming you are pulling data into Hadoop from Mongo to crunch and put the resulting data back int

HDFS on EMR weird.

2011-12-05 Thread Ayon Sinha
This is on an EMR cluster. This does not work! hadoop@ip-10-34-7-51:~$ hadoop dfs -mkdir hdfs://10.34.7.51:9000/user/foobar mkdir: This file system object (hdfs://ip-10-34-7-51.ec2.internal:9000) does not support access to the request path 'hdfs://10.34.7.51:9000/user/foobar' You possibly called
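
The error suggests the NameNode checks the URI authority literally, so the raw-IP form does not match the ip-10-34-7-51.ec2.internal name the filesystem was initialized with. A minimal sketch of sidestepping this from Java by letting the configured fs.default.name supply the authority (the path is just the one from the message, used as a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MkdirExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up fs.default.name from core-site.xml
        FileSystem fs = FileSystem.get(conf);
        // A path without scheme/authority is resolved against the configured NameNode,
        // so there is no literal hostname-vs-IP mismatch to trip over.
        fs.mkdirs(new Path("/user/foobar"));
      }
    }

On the command line the same trick is simply hadoop dfs -mkdir /user/foobar, or using the ip-10-34-7-51.ec2.internal form that the error message reports.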

How to find file with bad block during Distcp

2011-11-07 Thread Ayon Sinha
Hi, how do I know which file contains the block that is causing a distcp to fail? Copy failed: java.io.IOException: Fail to get block MD5 for blk_4722582869815671042_13395 at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:844) at org.apache.hadoop.hdfs.DFSClient.getFi

one namenode partition went down. The right way to reconnect?

2011-08-25 Thread Ayon Sinha
So we had an NFS mount and a local disk for dfs.name.dir. Our NFS mount lost its connection a week ago. The question is: if we now remount it and it still has data from back then, will the namenode detect the staleness and bring it in sync with the other partition? Or should we manually delete the

distcp -update option problem

2011-08-23 Thread Ayon Sinha
When I use -overwrite everything gets copied over fine. And the files are not corrupt. When I use the -update option for distcp, I constantly get this WARN + exception. What is it trying to do and what is failing? 11/08/23 22:43:06 WARN hdfs.DFSClient: src=/analytics_hive_tables/web_etl_tables

distcp -i option help

2011-08-15 Thread Ayon Sinha
Hi, I'm trying to copy data from one cluster to the other. Without the -i option it fails at a certain point, so when I run with the -i option it completes, but how can I be sure that the entire contents were reliably copied? The documentation for the -i option is not very good. Can someone explain if the
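
One hedged way to answer the "how can I be sure" part is to compare file checksums across the two clusters after the copy. A rough sketch (the NameNode URIs and the file name are placeholders, and the comparison is only meaningful when both clusters use the same block size and checksum settings):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CompareChecksums {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem src = FileSystem.get(URI.create("hdfs://src-nn:9000"), conf);
        FileSystem dst = FileSystem.get(URI.create("hdfs://dst-nn:9000"), conf);
        Path p = new Path("/analytics_hive_tables/some_file");   // hypothetical file

        FileChecksum a = src.getFileChecksum(p);
        FileChecksum b = dst.getFileChecksum(p);
        // Null means the filesystem could not produce a checksum for this file.
        boolean same = a != null && a.equals(b);
        System.out.println(p + (same ? " matches" : " DIFFERS"));
      }
    }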

Re: access HDFS file from outside the Hadoop Cluster

2011-07-24 Thread Ayon Sinha
You need to have the HDFS classes on your client machine, which will provide you with the HDFS API to read/write from the cluster. Your client machine does not have to be a part of the cluster. -Ayon
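
A minimal sketch of what that looks like, assuming the Hadoop jars (of a matching version) are on the client's classpath; the NameNode URI and file path are placeholders:

    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RemoteRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client only needs network access to the NameNode (and to the DataNodes for the
        // actual bytes); it does not have to be a member of the cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:9000"), conf);
        InputStream in = fs.open(new Path("/user/foobar/data.txt"));
        try {
          IOUtils.copyBytes(in, System.out, 4096, false);  // dump the file to stdout
        } finally {
          IOUtils.closeStream(in);
        }
      }
    }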

Re: unzip gz file in HDFS ?

2011-06-17 Thread Ayon Sinha
From: Charles Gonçalves, Subject: Re: unzip gz file in HDFS ? I know that it's not perfect but you can always use a unix pipe ;P hadoop fs -cat X | gzip -d | hadoop fs -put - Y Or something like that!

Re: unzip gz file in HDFS ?

2011-06-17 Thread Ayon Sinha
The hadoop dfs -cp and -mv commands seem like the perfect candidates for an uncompress option. -Ayon

Re: unzip gz file in HDFS ?

2011-06-17 Thread Ayon Sinha
Ayon, You can run an identity map job with no output compression set on it. On Fri, Jun 17, 2011 at 12:59 PM, Ayon Sinha wrote: > Is there a way to unzip a gzip file within H
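
A rough sketch along those lines with the old mapred API: a map-only job that re-emits each text line with output compression turned off. The class and path names are made up for illustration, it assumes line-oriented text data, and because gzip is not splittable a single map task reads the whole file and writes one uncompressed part file:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class Gunzip {
      // Emits each line unchanged; the NullWritable value makes TextOutputFormat write only the line.
      public static class LineMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, NullWritable> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, NullWritable> out, Reporter r) throws IOException {
          out.collect(line, NullWritable.get());
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(Gunzip.class);
        job.setJobName("gunzip-in-place");
        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0);                       // map-only: preserves line order of the single gzip split
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/data/big.gz"));        // hypothetical paths
        FileOutputFormat.setOutputPath(job, new Path("/data/big-uncompressed"));
        FileOutputFormat.setCompressOutput(job, false); // make the uncompressed output explicit
        JobClient.runJob(job);
      }
    }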

unzip gz file in HDFS ?

2011-06-17 Thread Ayon Sinha
Is there a way to unzip a gzip file within HDFS where source & target both live on HDFS? I don't want to pull a large file to local and put it back. -Ayon
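
A Java equivalent of the unix-pipe idea from Charles's reply above, as a hedged sketch (paths are placeholders): the bytes stream through the client JVM while being decompressed, but they are never written to the local disk.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class GunzipInHdfs {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        InputStream in = new GZIPInputStream(fs.open(new Path("/data/big.gz")));
        OutputStream out = fs.create(new Path("/data/big.txt"));
        // Decompress while copying; data passes through this JVM but never touches local disk.
        IOUtils.copyBytes(in, out, 64 * 1024, true);   // true = close both streams when done
      }
    }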

Re: Changing dfs.block.size

2011-06-06 Thread Ayon Sinha
Do newly created files get the default blocksize and old files remain the same? Yes. Is there a way to change the blocksize of existing files? I have done this with a copy-out-and-copy-back script; I couldn't find a shortcut analogous to setrep. -Ayon
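
A rough sketch of that copy-out-and-back idea done in one pass with the FileSystem API: write a new copy with an explicit block size, then move it over the original. File names and the 128 MB value are illustrative only; try it on something unimportant first.

    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class Reblock {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path src = new Path("/data/old-blocksize-file");            // hypothetical paths
        Path tmp = new Path("/data/old-blocksize-file.reblocked");

        InputStream in = fs.open(src);
        // create(path, overwrite, bufferSize, replication, blockSize)
        OutputStream out = fs.create(tmp, true, 64 * 1024, (short) 3, 128L * 1024 * 1024);
        IOUtils.copyBytes(in, out, 64 * 1024, true);   // rewrite the data with the new block size

        fs.delete(src, false);    // remove the old copy, then move the new one into place
        fs.rename(tmp, src);
      }
    }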

Re: FileSystemCat ipc.Client: Retrying connect to server:

2011-05-22 Thread Ayon Sinha
From: Yang Xiaoliang, Subject: Re: FileSystemCat ipc.Client: Retrying connect to server: Yes, I am sure of that. 2011/5/23, Ayon Sinha: > Is your HDFS running? > -Ayon

Re: FileSystemCat ipc.Client: Retrying connect to server:

2011-05-22 Thread Ayon Sinha
Is your HDFS running? -Ayon

Incomplete HDFS file written by Scribe

2011-04-20 Thread Ayon Sinha
Hello, we occasionally see (we are using a Scribe client) that some files stay open for write even after the writing process has long since died. Is there anything we can do on the HDFS side to flush and close these files without having to restart the namenode? Is this a pr
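
I'm not sure what was available in the version being used, but hadoop fsck <path> -openforwrite will list such files, and later releases expose a lease-recovery call on DistributedFileSystem that can close a stuck file without restarting the namenode. A hedged sketch (the path is a placeholder, and it assumes a version that has recoverLease()):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class CloseStuckFile {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Only valid when the default filesystem really is HDFS.
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        boolean closed = dfs.recoverLease(new Path("/scribe/logs/stuck-file"));
        System.out.println("lease recovered and file closed: " + closed);
      }
    }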

Re: Merging of files back in hadoop

2011-04-19 Thread Ayon Sinha
One thing to note is that the HDFS client code fetches blocks directly from the datanodes after obtaining the location info from the namenode. That way the namenode does not become the bottleneck for all data transfers. The clients only get the information about the sequence and location fro

Re: Question regarding datanode been wiped by hadoop

2011-04-12 Thread Ayon Sinha
ot be the reason. Even the in_use.lock is more than a month old. However, we did shut it down a few days ago and restarted it afterward. So the second shutdown might not have been clean. On Tue, Apr 12, 2011 at 7:52 AM, Ayon Sinha wrote: > The datanode used the dfs

Re: Question regarding datanode been wiped by hadoop

2011-04-12 Thread Ayon Sinha
The datanode uses the dfs config XML file to tell the datanode process what disks are available for storage. Can you check that the config XML has all the partitions mentioned and has not been overwritten during the restore process? -Ayon

Re: Error Lunching NameNode

2011-03-12 Thread Ayon Sinha
I think your edits file is corrupt. Your best bet is to rename the edits.new file on the namenode and see if the secondary namenode has a good copy of it. Loss of data is possible. It happened to me and I did manage to start my namenode, but lost some data. -Ayon

Re: how does hdfs determine what node to use?

2011-03-10 Thread Ayon Sinha
as mainly worried if they all went to the same node, which would be bad. Take care, -stu From: Ayon Sinha, Subject: Re: how does hdfs determine what node to use? I think Stu m

Re: how does hdfs determine what node to use?

2011-03-10 Thread Ayon Sinha
I think Stu meant that each block will have a copy on at most 2 nodes. Before Hadoop 0.20, rack awareness was not built into the algorithm that picks the replication nodes. With 0.20 and later, rack awareness does the following: 1. The first copy of the block is placed at "random" on one of the least loaded

Re: copy a file from hdfs to local file system with java

2011-02-25 Thread Ayon Sinha
Use this API: http://hadoop.apache.org/common/docs/current/api/index.html?org/apache/hadoop/fs/FileSystem.html The code is pretty straightforward. -Ayon
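
For the specific copy-to-local case, a minimal sketch against that FileSystem API (the NameNode URI and paths are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToLocal {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
        // Copies the HDFS file to the local filesystem of whatever machine runs this code.
        fs.copyToLocalFile(new Path("/user/someuser/input.txt"),
                           new Path("/tmp/input.txt"));
      }
    }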

Re: changing the block size

2011-02-06 Thread Ayon Sinha
ze Neither one was working. Is there anything I can do? I always have problems like this in hdfs. It seems even experts are guessing at the answers :-/ On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote: > conf/hdfs-site.xml > restart dfs.

Re: changing the block size

2011-02-06 Thread Ayon Sinha
Is there anything I can do? I always have problems like this in hdfs. It seems even experts are guessing at the answers :-/ On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote: > conf/hdfs-site.xml > restart dfs. I believe it should be sufficient to restart the nam

Re: changing the block size

2011-02-06 Thread Ayon Sinha
always have problems like this in hdfs. It seems even experts are guessing at the answers :-/ On Thu, Feb 3, 2011 at 11:45 AM, Ayon Sinha wrote: > conf/hdfs-site.xml > restart dfs. I believe it should be sufficient to restart the namenode only, but others can confi

Re: changing the block size

2011-02-03 Thread Ayon Sinha
Set it in conf/hdfs-site.xml and restart dfs. I believe it should be sufficient to restart the namenode only, but others can confirm. -Ayon
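
If editing conf/hdfs-site.xml cluster-wide isn't convenient, the block size for new files can also be overridden per client. This is a hedged sketch assuming the 0.20-era property name dfs.block.size (newer releases call it dfs.blocksize), and, as noted above, it only affects files written after the change:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeOverride {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("dfs.block.size", 128L * 1024 * 1024);   // client-side default for new files
        FileSystem fs = FileSystem.get(conf);
        fs.create(new Path("/tmp/newfile")).close();          // written with the 128 MB block size
      }
    }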

Re: Configure NameNode to accept connection from external ips

2011-02-01 Thread Ayon Sinha
e cluster, so I think there must be a configuration parameter or some knobs I can turn to make the namenode serve files to clients from a different network. Felix On Mon, Jan 31, 2011 at 2:03 PM, Ayon Sinha wrote: Also, be careful about this when you try to connect to HDFS and it doesn't respond. T

Re: Configure NameNode to accept connection from external ips

2011-01-31 Thread Ayon Sinha
Also, be careful about this when you try to connect to HDFS and it doesn't respond. There was a place in the code where it was hard-coded to retry 45 times, once every 15 seconds, when there was a socket connect exception. It was not (at least in the 0.18 code I looked at) honoring the configura

Re: Latency and speed of HDFS

2011-01-28 Thread Ayon Sinha
You can run that code w/o running Hadoop. As long as you have the libraries on the classpath and the code is doing FileSystem operations, it will execute in standalone mode just fine. Last I remember, the conf values on the client would be the defaults for that HDFS version and will not get pic
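
A sketch of what "standalone" looks like in practice: the client only needs the Hadoop jars plus an explicit fs.default.name (or the cluster's core-site.xml on its classpath), otherwise the default configuration quietly points at the local filesystem. The host name is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StandaloneClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without this (or core-site.xml on the classpath) the default conf resolves to file:///.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus st : fs.listStatus(new Path("/"))) {
          System.out.println(st.getPath());
        }
      }
    }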

Re: Adding new data nodes to existing cluster, with different storage capcity

2011-01-20 Thread Ayon Sinha
We did the same exercise a few months back. When we run the balancer (which takes a while), it balances based on the percentage of disk usage on each node, so you end up with usage between, say, 45-55% on all nodes. Sometimes the balancer does not balance well initially

Re: NameNode crash - cannot start dfs - need help

2010-10-05 Thread Ayon Sinha
Hi Matthew, Congratulations. Having HDFS back is quite a relief, and you were lucky enough not to lose any files/blocks. Another thing I ended up doing was to decommission the namenode machine from being a datanode. That is what had caused the namenode to run out of disk space. -Ayon

Re: NameNode crash - cannot start dfs - need help

2010-10-05 Thread Ayon Sinha
Have you tried getting rid of the edits.new file completely (by renaming it to something else)? -Ayon

Re: NameNode crash - cannot start dfs - need help

2010-10-05 Thread Ayon Sinha
Hi Matthew, "(BTW, what if I didn't want to keep my recent edits, and just wanted to start up the namenode? This is currently expensive downtime; I'd rather lose a small amount of data and be up and running than continue the down time). " This was exactly my use-case as well. I chose small dat

Re: NameNode crash - cannot start dfs - need help

2010-10-05 Thread Ayon Sinha
We had almost exactly the same problem of the namenode filling up and failing at this exact same point. Since you have now created space, you can copy over the edits.new, fsimage and the other 2 files from your /mnt/namesecondarynode/current and try restarting the namenode. I believe you will lose some

Re: Hdfs Space

2010-09-28 Thread Ayon Sinha
Use the dfs.data.dir config property. You can use comma-separated directories to use multiple partitions. Also, specify the percentage of total space to use for HDFS using the dfs.datanode.du.pct property. -Ayon

HDFS will not start after Namenode ran out of disk space

2010-09-28 Thread Ayon Sinha
Our Namenode ran out of disk space and became unresponsive. We tried to bounce DFS and it fails with the following error. How do I recover HDFS, even at the cost of losing a whole bunch of recent edits? 2010-09-28 00:45:00,777 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metri

Re: HDFS java client connect retry count

2010-08-17 Thread Ayon Sinha
From: Ayon Sinha, Subject: HDFS java client connect retry count. Hi, I have a Java HDFS client which connects to a production cluster and gets data. On our staging environment we see that since it cannot connect to the namenode

HDFS java client connect retry count

2010-08-17 Thread Ayon Sinha
Hi, I have a Java HDFS client which connects to a production cluster and gets data. On our staging environment we see that since it cannot connect to the namenode (expected) it keeps retrying 45 times. I am looking for a way to set the retry count much lower. This is what we see in
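
I believe the knob in later versions is ipc.client.connect.max.retries (the 45-retry behaviour mentioned here was hard-coded in some 0.18-era code), so treat this as a hedged sketch rather than a guaranteed fix; the NameNode URI is a placeholder:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FewRetries {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: this property is honored by your client version; some older releases ignored it.
        conf.setInt("ipc.client.connect.max.retries", 2);
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        System.out.println(fs.exists(new Path("/")));
      }
    }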

HDFS client java access

2010-08-09 Thread Ayon Sinha
I have an HDFS Java client on the same network domain as the cluster which needs to read a dfs file with permissions rw-r--r--. I get java.io.IOException at org.apache.hadoop.dfs.DFSClient.&lt;init&gt;(DFSClient.java:169) at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.jav