Re: FSDataInputStream.read(byte[]) only reads to a block boundary?

2009-06-28 Thread Raghu Angadi
This seems to be the case. I don't think there is any specific reason not to read across the block boundary... Even if HDFS does read across the blocks, it is still not a good idea to ignore the JavaDoc for read(). If you want all the bytes read, then you should have a while loop or one of t
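
A minimal sketch of the read-until-full loop the JavaDoc asks for (the path and buffer size below are made up for illustration); FSDataInputStream also inherits readFully() from DataInputStream, which is the shorter alternative hinted at above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadAllBytes {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/tmp/example.dat")); // hypothetical path
        byte[] buf = new byte[128 * 1024];                            // illustrative size
        try {
            // read() may return fewer bytes than asked for (e.g. at a block
            // boundary), so keep reading until the buffer is full or EOF.
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) break;                                     // end of file
                off += n;
            }
            // Shorter alternative: in.readFully(buf, 0, buf.length);
            // (throws EOFException if the file ends early)
        } finally {
            in.close();
        }
    }
}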

Re: HDFS Random Access

2009-06-27 Thread Raghu Angadi
Yes, FSDataInputStream allows random access. There are two ways to read x bytes at a position p: 1) in.seek(p); in.read(buf, 0, x); 2) in.read(p, buf, 0, x); These two have slightly different semantics. The second one is preferred and is easier for HDFS to optimize further. Random access should be prett
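
A small sketch contrasting the two forms on an FSDataInputStream (path, position and buffer size are illustrative); the positioned read in form 2 does not move the stream's current offset, which is the main semantic difference.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/tmp/example.dat")); // hypothetical path
        byte[] buf = new byte[4096];
        long p = 1 << 20;                       // illustrative position
        try {
            // 1) stateful: move the stream, then read from the new offset
            in.seek(p);
            int n1 = in.read(buf, 0, buf.length);

            // 2) positioned read (pread): reads at p without changing the
            //    stream's current offset
            int n2 = in.read(p, buf, 0, buf.length);

            System.out.println("read " + n1 + " and " + n2 + " bytes");
        } finally {
            in.close();
        }
    }
}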

Re: "Too many open files" error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi
ld it be that HDFS opens 3 x 3 (input - output - epoll) fd's per each thread, which makes it close to the number I mentioned? Or is it always 3 at maximum per thread / stream? Up to 10 sec looks like the correct number, it seems it gets freed around this time indeed. Regards. 2009/6/23 Raghu An

Re: "Too many open files" error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi
situation. Please check the number of threads if you are still facing the problem. Raghu. Raghu Angadi wrote: since you have HADOOP-4346, you should not have excessive epoll/pipe fds open. First of all do you still have the problem? If yes, how many hadoop streams do you have at a time? System.gc

Re: "Too many open files" error, which gets resolved after some time

2009-06-23 Thread Raghu Angadi
Stas Oskin wrote: Hi. Any idea if calling System.gc() periodically will help reducing the amount of pipes / epolls? since you have HADOOP-4346, you should not have excessive epoll/pipe fds open. First of all do you still have the problem? If yes, how many hadoop streams do you have at a time

Re: UnknownHostException

2009-06-23 Thread Raghu Angadi
Raghu Angadi wrote: This is at the RPC client level and there is requirement for fully qualified I meant to say "there is NO requirement ..." hostname. Maybe the "." at the end of "10.2.24.21" is causing the problem? btw, in 0.21 even fs.default.name does not need to

Re: UnknownHostException

2009-06-23 Thread Raghu Angadi
This is at the RPC client level and there is requirement for fully qualified hostname. Maybe the "." at the end of "10.2.24.21" is causing the problem? btw, in 0.21 even fs.default.name does not need to be a fully qualified name.. anything that resolves to an ip address is fine (at least for common/FS an

Re: Disk Usage Overhead of Hadoop Upgrade

2009-06-22 Thread Raghu Angadi
The initial overhead is fairly small (an extra hard link for each file). After that, the overhead grows as you delete files (and thus their blocks) that existed before the upgrade.. since the physical files for blocks are deleted only after you finalize. So the overhead == (the blocks that got de

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Raghu Angadi
000. I'm still trying to find out how well it behaves if I set the maximum fd number to 65K. Regards. 2009/6/22 Raghu Angadi Is this before 0.20.0? Assuming you have closed these streams, it is mostly https://issues.apache.org/jira/browse/HADOOP-4346 It is the JDK internal implementatio

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Raghu Angadi
Is this before 0.20.0? Assuming you have closed these streams, it is mostly https://issues.apache.org/jira/browse/HADOOP-4346 It is the JDK internal implementation that depends on GC to free up its cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache. Raghu. Stas Oskin w

Re: HDFS data transfer!

2009-06-11 Thread Raghu Angadi
Thanks Brian for the good advice. Slightly off topic from original post: there will be occasions where it is necessary or better to copy different portions of a file in parallel (distcp can benefit a lot). There is a proposal to let HDFS 'stitch' multiple files into one: something like Name

Re: Multiple NIC Cards

2009-06-09 Thread Raghu Angadi
I still need to go through the whole thread, but we feel your pain. First, please try setting fs.default.name to the namenode's internal ip on the datanodes. This should make the NN attach the internal ip to the datanodes (assuming your routing is correct). The NameNode webUI should list internal ips for da

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread Raghu Angadi
Did you try 'telnet 198.55.35.229 54310' from this datanode? The log shows that it is not able to connect to "master:54310". ssh from the datanode does not matter. Raghu. asif md wrote: I can SSH both ways, i.e. from master to slave and slave to master. the datanode is getting initialized at mas

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Raghu Angadi
Tom White wrote: Actually, the space is needed, to be interpreted as a Hadoop option by ToolRunner. Without the space it sets a Java system property, which Hadoop will not automatically pick up. I don't think space is required. Something like -Dfs.default.name=host:port works. I don't see Tool
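
For reference, a minimal Tool wired through ToolRunner, which is what lets a -D option reach the job Configuration via GenericOptionsParser instead of ending up as a plain Java system property. The class and jar names here are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner invokes GenericOptionsParser, which strips Hadoop options
// (such as -D key=value) from argv and applies them to the Configuration.
public class ConfDump extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        System.out.println("fs.default.name = " + getConf().get("fs.default.name"));
        return 0;
    }
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ConfDump(), args));
    }
}

Invoked, for example (jar name hypothetical), as: hadoop jar myjob.jar ConfDump -D fs.default.name=hdfs://host:8020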

Re: Renaming all nodes in Hadoop cluster

2009-06-02 Thread Raghu Angadi
Renaming datanodes should not affect HDFS. HDFS does not depend on hostname or ip for consistency of data. You can try renaming a few of the nodes. Of course, if you rename NameNode, you need to update the config file to reflect that. Stuart White wrote: Is it possible to rename all nodes

Re: Question on HDFS write performance

2009-06-01 Thread Raghu Angadi
Can you post the patch for these measurements? I can guess where these are measured but it is better to see the actual changes. For e.g., the third datanode does only two things: receiving and writing data to the disk. So "avg block writing time" for you should be around the sum of these two (~6-7k)

Re: InputStream.open() efficiency

2009-05-26 Thread Raghu Angadi
time. Raghu. Regards. 2009/5/26 Raghu Angadi 'in.seek(); in.read()' is certainly better than 'in = fs.open(); in.seek(); in.read()'. The difference is exactly one open() call. So you would save an RPC to the NameNode. There are a couple of issues that affect apps that ke

Re: InputStream.open() efficiency

2009-05-26 Thread Raghu Angadi
'in.seek(); in.read()' is certainly better than 'in = fs.open(); in.seek(); in.read()'. The difference is exactly one open() call. So you would save an RPC to the NameNode. There are a couple of issues that affect apps that keep the handles open for a very long time (many hours to days).. but those

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
Raghu Angadi wrote: As a hack, you could tunnel NN traffic from GridFTP clients through a different machine (by changing fs.default.name). Alternately these clients could use a socks proxy. A socks proxy would not be useful since you don't want datanode traffic to go through the

Re: Circumventing Hadoop's data placement policy

2009-05-23 Thread Raghu Angadi
As a hack, you could tunnel NN traffic from GridFTP clients through a different machine (by changing fs.default.name). Alternately these clients could use a socks proxy. The amount of traffic to the NN is not much and tunneling should not affect performance. Raghu. Brian Bockelman wrote: Hey all

Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi
Stas Oskin wrote: I think you should file a jira on this. Most likely this is what is happening : Here it is - hope it's ok: https://issues.apache.org/jira/browse/HADOOP-5886 looks good. I will add my earlier post as a comment. You could update the jira with any more tests. Next time, it w

Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi
Brian Bockelman wrote: On May 21, 2009, at 2:01 PM, Raghu Angadi wrote: I think you should file a jira on this. Most likely this is what is happening : * two out of 3 dns cannot take any more blocks. * While picking nodes for a new block, NN mostly skips the third dn as well since

Re: Could only be replicated to 0 nodes, instead of 1

2009-05-21 Thread Raghu Angadi
I think you should file a jira on this. Most likely this is what is happening : * two out of 3 dns cannot take any more blocks. * While picking nodes for a new block, NN mostly skips the third dn as well since '# active writes' on it is larger than '2 * avg'. * Even if there is one other b

Re: How to replace the storage on a datanode without formatting the namenode?

2009-05-15 Thread Raghu Angadi
, 2009 at 11:35 AM, Raghu Angadi wrote: Along these lines, an even simpler approach I would think is : 1) set data.dir to local and create the data. 2) stop the datanode 3) rsync local_dir network_dir 4) start the datanode with data.dir set to network_dir There is no need to format or rebalance. This way

Re: public IP for datanode on EC2

2009-05-14 Thread Raghu Angadi
Philip Zeyliger wrote: You could use ssh to set up a SOCKS proxy between your machine and ec2, and setup org.apache.hadoop.net.SocksSocketFactory to be the socket factory. http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ has more information. very useful wri

Re: How to replace the storage on a datanode without formatting the namenode?

2009-05-14 Thread Raghu Angadi
Along these lines, an even simpler approach I would think is : 1) set data.dir to local and create the data. 2) stop the datanode 3) rsync local_dir network_dir 4) start the datanode with data.dir set to network_dir There is no need to format or rebalance. This way you can switch between local and netw

Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-11 Thread Raghu Angadi
oded at 80 -- a queue of 5MB (packets are 64k). You thinking I should experiment with that? I suppose that won't help much w/ getting my writes on the datanode. Maybe I should be digging on the datanode side to figure out why it's slow getting back to the client? Thanks, St.Ack On Sun, May 10, 2009 at

Re: Huge DataNode Virtual Memory Usage

2009-05-10 Thread Raghu Angadi
what do 'jmap' and 'jmap -histo:live' show?. Raghu. Stefan Will wrote: Chris, Thanks for the tip ... However I'm already running 1.6_10: java version "1.6.0_10" Java(TM) SE Runtime Environment (build 1.6.0_10-b33) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode) Do you know of

Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-10 Thread Raghu Angadi
It should not be waiting unnecessarily. But the client has to, if any of the datanodes in the pipeline is not able to receive the data as fast as the client is writing. IOW writing goes as fast as the slowest of the nodes involved in the pipeline (1 client and 3 datanodes). But based on what your case is

Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-07 Thread Raghu Angadi
oes anyone have configuration recommendations to minimize or remove these errors under any of these circumstances, or perhaps there is another explanation? Thanks, Albert On 5/5/09 11:34 AM, "Raghu Angadi" wrote: This can happen for example when a client is killed when it has some

Re: Is HDFS protocol written from scratch?

2009-05-07 Thread Raghu Angadi
Philip Zeyliger wrote: It's over TCP/IP, in a custom protocol. See DataXceiver.java. My sense is that it's a custom protocol because Hadoop's IPC mechanism isn't optimized for large messages. yes, and job classes are not distributed using this. It is a very simple protocol used to read and

Re: Namenode failed to start with "FSNamesystem initialization failed" error

2009-05-06 Thread Raghu Angadi
le ? yes. Jira is a better place for tracking and fixing bugs. I am pretty sure what you saw is a bug (either already fixed or needing to be fixed). Raghu. Thanks, Tamir On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi wrote: Tamir, Please file a jira on the problem you are seeing with 'save

Re: Namenode failed to start with "FSNamesystem initialization failed" error

2009-05-05 Thread Raghu Angadi
the image is stored in two files : fsimage and edits (under namenode-directory/current/). Stas Oskin wrote: Well, it definitely caused the SecondaryNameNode to crash, and also seems to have triggered some strange issues today as well. By the way, how is the image file named?

Re: Namenode failed to start with "FSNamesystem initialization failed" error

2009-05-05 Thread Raghu Angadi
t preserve the logs or the image. If this happens again - I will surely do so. Regards. 2009/5/5 Raghu Angadi Stas, This is indeed a serious issue. Did you happen to store the the corrupt image? Can this be reproduced using the image? Usually you can recover manually from a corrupt or trunc

Re: java.io.EOFException: while trying to read 65557 bytes

2009-05-05 Thread Raghu Angadi
This can happen for example when a client is killed when it has some files open for write. In that case it is an expected error (the log should really be at WARN or INFO level). Raghu. Albert Sunwoo wrote: Hello Everyone, I know there's been some chatter about this before but I am seeing t

Re: Namenode failed to start with "FSNamesystem initialization failed" error

2009-05-05 Thread Raghu Angadi
Tamir, Please file a jira on the problem you are seeing with 'saveLeases'. In the past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724, and more mentioned in HADOOP-3724). Also refer the thread you started http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397

Re: Namenode failed to start with "FSNamesystem initialization failed" error

2009-05-05 Thread Raghu Angadi
Stas, This is indeed a serious issue. Did you happen to store the corrupt image? Can this be reproduced using the image? Usually you can recover manually from a corrupt or truncated image. But more importantly we want to find out how it got into this state. Raghu. Stas Oskin wrote: Hi.

Re: No route to host prevents from storing files to HDFS

2009-04-24 Thread Raghu Angadi
Telnet failure to localhost is expected and is unrelated since servers are not listening on it. What is the ip address of this machine? Try 'telnet different_datanode_ip 50010' _from_ this machine. What do you see? Raghu. Stas Oskin wrote: Hi. Shouldn't you be testing connecting _from_
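
If telnet is not handy, a rough Java equivalent of the suggested check, run from the problem machine (the address below is just a placeholder for the other datanode's ip):

import java.net.InetSocketAddress;
import java.net.Socket;

// Rough equivalent of "telnet <datanode_ip> 50010" run from the problem machine.
public class PortCheck {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        try {
            // Placeholder address; substitute the other datanode's ip.
            s.connect(new InetSocketAddress("192.168.253.21", 50010), 5000); // 5 s timeout
            System.out.println("connected");
        } finally {
            s.close();
        }
    }
}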

Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi
Stas Oskin wrote: Tried in step 3 to telnet both the 50010 and the 8010 ports of the problematic datanode - both worked. Shouldn't you be testing connecting _from_ the datanode? The error you posted is while this DN is trying to connect to another DN. Raghu. I agree there is indeed an inter

Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Raghu Angadi
There is some mismatch here.. what is the expected ip address of this machine (or does it have multiple interfaces and properly routed)? Looking at the "Receiving Block" message DN thinks its address is 192.168.253.20 but NN thinks it is 253.32 (and client is able to connect using 253.32).

Re: More Replication on dfs

2009-04-14 Thread Raghu Angadi
Aseem, Regarding over-replication, it is mostly an app-related issue as Alex mentioned. But if you are concerned about under-replicated blocks in the fsck output : these blocks should not stay under-replicated if you have enough nodes and enough space on them (check the NameNode webui). Try grep-ing for one

Re: DataXceiver Errors in 0.19.1

2009-04-13 Thread Raghu Angadi
It need not be anything to worry about. Do you see anything at user level (task, job, copy, or script) fail because of this? On a distributed system with many nodes, there would be some errors on some of the nodes for various reasons (load, hardware, reboot, etc). HDFS usually should work ar

Re: Changing block size of hadoop

2009-04-12 Thread Raghu Angadi
Aaron Kimball wrote: Blocks already written to HDFS will remain their current size. Blocks are immutable objects. That procedure would set the size used for all subsequently-written blocks. I don't think you can change the block size while the cluster is running, because that would require the Na
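
To illustrate the per-file angle: the FileSystem.create() overload below requests a block size for one new file without touching the cluster-wide setting; existing blocks keep their size either way. The path and the 128 MB value are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        long blockSize = 128L * 1024 * 1024;     // illustrative: 128 MB
        // Only data written from here on (into this new file) uses the
        // requested block size; blocks already on the cluster are unchanged.
        FSDataOutputStream out = fs.create(
                new Path("/tmp/bigblocks.dat"),              // hypothetical path
                true,                                        // overwrite
                conf.getInt("io.file.buffer.size", 4096),    // buffer size
                fs.getDefaultReplication(),
                blockSize);
        try {
            out.write(new byte[]{1, 2, 3});
        } finally {
            out.close();
        }
    }
}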

Re: swap hard drives between datanodes

2009-03-31 Thread Raghu Angadi
Raghu Angadi wrote: IP Address mismatch should not matter. What is the actual error you saw? The mismatch might be unintentional. The reason I say the ip address should not matter is that if you change the ip address of a datanode, it should still work correctly. Raghu. Raghu. Mike Andrews

Re: swap hard drives between datanodes

2009-03-31 Thread Raghu Angadi
IP Address mismatch should not matter. What is the actual error you saw? The mismatch might be unintentional. Raghu. Mike Andrews wrote: i tried swapping two hot-swap sata drives between two nodes in a cluster, but it didn't work: after restart, one of the datanodes shut down since namenode s

Re: Socket closed Exception

2009-03-30 Thread Raghu Angadi
If it is the NameNode, then there is probably a log about closing the socket around that time. Raghu. lohit wrote: Recently we are seeing a lot of Socket closed exceptions in our cluster. Many tasks' open/create/getFileInfo calls get back 'SocketException' with the message 'Socket closed'. We seem to

Re: about dfsadmin -report

2009-03-27 Thread Raghu Angadi
stchu wrote: But when the web-ui shows the node dead, -report still shows "in service" and the living nodes=3 (in web-ui: living=2 dead=1). please file a jira and describe how to reproduce in as much detail as you can in a comment. thanks, Raghu. stchu 2009/3/26 Raghu Angad

Re: about dfsadmin -report

2009-03-25 Thread Raghu Angadi
stchu wrote: Hi, I did a test of datanode crashes. I stopped the networking on one of the datanodes. The web app and fsck report that datanode as dead after 10 mins. But dfsadmin -report did not report that even after 25 mins. Is this correct? Nope. Both the web-ui and '-report' read from the same source of inf

Re: hadoop need help please suggest

2009-03-24 Thread Raghu Angadi
What is the scale you are thinking of (10s, 100s, or more nodes)? The memory for metadata at the NameNode you mentioned is the main issue with small files. There are multiple alternatives for dealing with that. This issue has been discussed many times here. Also please use the core-user@ id alone for ask

Re: hadoop configuration problem hostname-ip address

2009-03-23 Thread Raghu Angadi
you need https://issues.apache.org/jira/browse/HADOOP-5191 I don't know why there is no response to the simple patch I attached. Alternately, you could use the hostname that it expects instead of the ip address. Raghu. snehal nagmote wrote: Hi, I am using Hadoop version 0.19. I set up a hadoop cluster for

Re: Downgrading Hdfs

2009-03-18 Thread Raghu Angadi
is a bit scary - what are the reasons to go with 0.20.0 instead of 0.19.2? Yahoo is jumping from 0.18.x directly to 0.20.0? Why is Yahoo skipping the 0.19.x release? Is the expectation that 0.19.2 will be released at the same time as 0.20.0? Thanks, David On Wed, Mar 18, 2009 at 1:31 PM,

Re: Downgrading Hdfs

2009-03-18 Thread Raghu Angadi
The short answer, I am afraid, is no. As an alternative, I recommend upgrading to the latest 0.19.x or 0.20.0 (to be released in a couple of days). 0.19.2 is certainly a lot better than 0.19.0. Yahoo is rolling out 0.20.x if that helps your confidence. Raghu. David Ritch wrote: There is an establis

Re: DataNode stops cleaning disk?

2009-03-17 Thread Raghu Angadi
node web ui reported size. I'm waiting for the next time this happens to collect more details, but ever since I wrote the first email - everything works perfectly well (another application of Murphy law). Thanks, Igor -----Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.c

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-03-12 Thread Raghu Angadi
TCK wrote: How well does the read throughput from HDFS scale with the number of data nodes ? For example, if I had a large file (say 10GB) on a 10 data node cluster, would the time taken to read this whole file in parallel (ie, with multiple reader client processes requesting different parts of

Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi
Raghu Angadi wrote: Amandeep Khurana wrote: My dfs.datanode.socket.write.timeout is set to 0. This had to be done to get Hbase to work. ah.. I see, we should fix that. Not sure how others haven't seen it till now. Affects only those with write.timeout set to 0 on the clients.

Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi
value? A very large value like 100 years is the same as setting it to 0 (for all practical purposes). Raghu. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 11, 2009 at 12:00 PM, Raghu Angadi wrote: Amandeep Khurana wrote: My

Re: HDFS is corrupt, need to salvage the data.

2009-03-11 Thread Raghu Angadi
you should not have 100% of blocks missing. There are many possibilities; it is not easy for me to list the right one in your case without much info, or to list all possible conditions. Raghu. Mayuran Yogarajah wrote: Mayuran Yogarajah wrote: Raghu Angadi wrote: The block files usually don'

Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi
is a work around, please change that to some extremely large value for now. Raghu. Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi wrote: Did you change dfs.datanode.socket.write.timeout to 5 seconds

Re: Not a host:port pair when running balancer

2009-03-11 Thread Raghu Angadi
Doug Cutting wrote: Konstantin Shvachko wrote: Clarifying: port # is missing in your configuration, should be fs.default.name hdfs://hvcwydev0601:8020 where 8020 is your port number. That's the work-around, but it's a bug. One should not need to specify the default port number (8020).

Re: Error while putting data onto hdfs

2009-03-11 Thread Raghu Angadi
Did you change dfs.datanode.socket.write.timeout to 5 seconds? The exception message says so. It is extremely small. The default is 8 minutes and is intentionally pretty high. Its purpose is mainly to catch extremely unresponsive datanodes and other network issues. Raghu. Amandeep Khurana
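
A small sketch of putting the timeout back on the client-side Configuration; the property name and the 8-minute default come from this thread, the value is in milliseconds, and 0 disables the timeout entirely (the workaround discussed elsewhere in the thread).

import org.apache.hadoop.conf.Configuration;

public class WriteTimeout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Value is in milliseconds; 8 minutes is the default mentioned above.
        // Setting it to 0 disables the timeout entirely.
        conf.setInt("dfs.datanode.socket.write.timeout", 8 * 60 * 1000);
        System.out.println(conf.get("dfs.datanode.socket.write.timeout"));
    }
}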

Re: HDFS is corrupt, need to salvage the data.

2009-03-10 Thread Raghu Angadi
Mayuran Yogarajah wrote: lohit wrote: How many Datanodes do you have. From the output it looks like at the point when you ran fsck, you had only one datanode connected to your NameNode. Did you have others? Also, I see that your default replication is set to 1. Can you check if your datanodes

Re: DataNode stops cleaning disk?

2009-03-05 Thread Raghu Angadi
ding these files. hope this helps. Raghu. Thank you for help! Igor -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Thursday, March 05, 2009 11:05 AM To: core-user@hadoop.apache.org Subject: Re: DataNode stops cleaning disk? This is unexpected unless some

Re: DataNode stops cleaning disk?

2009-03-05 Thread Raghu Angadi
This is unexpected unless some other process is eating up space. Couple of things to collect next time (along with log): - All the contents under datanode-directory/ (especially including 'tmp' and 'current') - Does 'du' of this directory match with what is reported to NameNode (shown on we

Re: Big HDFS deletes lead to dead datanodes

2009-02-24 Thread Raghu Angadi
What is the Hadoop version? The DN limits deletes per heartbeat to 100 or so I think. So the dead datanodes might not be dying only because of deletes... does the stacktrace show that? > [...] Ideally we would never see "dead" datanodes from doing deletes. yes : HADOOP-4584 moves deletions out of h

Re: Hadoop Write Performance

2009-02-18 Thread Raghu Angadi
What is the hadoop version? You could check the log on a datanode around that time. You could post any suspicious errors. For e.g., you can trace a particular block in the client and datanode logs. Most likely it is not a NameNode issue, but you can check the NameNode log as well. Raghu. Xavier Stevens wr

Re: "Too many open files" in 0.18.3

2009-02-13 Thread Raghu Angadi
of file descriptors (something like 6 times the number of active 'xceivers'). In your case you seem to have a lot of simultaneous clients. I suggest increasing the file limit to something much higher (like 64k). Raghu. Regards, Sean 2009/2/13 Raghu Angadi

Re: "Too many open files" in 0.18.3

2009-02-13 Thread Raghu Angadi
didn't appear to be affecting things to begin with. Regards, Sean On Thu, Feb 12, 2009 at 2:07 PM, Raghu Angadi wrote: You are most likely hit by https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets back ported. There is a 0.18 patch posted there. btw, does 16k help in your ca

Re: "Too many open files" in 0.18.3

2009-02-12 Thread Raghu Angadi
You are most likely hit by https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets backported. There is a 0.18 patch posted there. btw, does 16k help in your case? Ideally 1k should be enough (with a small number of clients). Please try the above patch with a 1k limit. Raghu. Sea

Re: stable version

2009-02-11 Thread Raghu Angadi
Vadim Zaliva wrote: The particular problem I am having is this one: https://issues.apache.org/jira/browse/HADOOP-2669 I am observing it in version 19. Could anybody confirm that it has been fixed in 18, as Jira claims? I am wondering why the bug fix for this problem might have been committed to 1

Re: can't read the SequenceFile correctly

2009-02-09 Thread Raghu Angadi
+1 on something like getValidBytes(). Just the existence of this would warn many programmers about getBytes(). Raghu. Owen O'Malley wrote: On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote: Hey Tom, I also got burned by this ?? Why does BytesWritable.getBytes() return non-valid bytes ??
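
A tiny example of the pitfall being discussed: getBytes() hands back the whole backing array, so only the first getLength() bytes should be treated as valid. The setSize() call below just simulates a writable whose backing array is larger than its data.

import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

public class ValidBytes {
    public static void main(String[] args) {
        BytesWritable w = new BytesWritable(new byte[]{1, 2, 3});
        w.setSize(2);  // backing array is now longer than the valid data
        // getBytes() returns the whole backing array; only the first
        // getLength() bytes are valid, so copy just that prefix.
        byte[] valid = Arrays.copyOf(w.getBytes(), w.getLength());
        System.out.println(valid.length);  // prints 2, not 3
    }
}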

Re: Connect to namenode

2009-02-05 Thread Raghu Angadi
I don't think it is intentional. Please file a jira with all the details about how to reproduce (with actual configuration files). thanks, Raghu. Habermaas, William wrote: After creation and startup of the hadoop namenode, you can only connect to the namenode via hostname and not IP. EX

Re: problem with completion notification from block movement

2009-02-04 Thread Raghu Angadi
Karl Kleinpaste wrote: On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote: The Datanodes use multiple threads with locking and one of the assumptions is that the block report (once per hour by default) takes little time. The datanode will pause while the block report is running and if it happ

Re: Question about HDFS capacity and remaining

2009-01-29 Thread Raghu Angadi
Doug Cutting wrote: Ext2 by default reserves 5% of the drive for use by root only. That'd be 45GB of your 907GB capacity, which would account for most of the discrepancy. You can adjust this with tune2fs. plus, I think the DataNode reports only 98% of the space by default. Raghu. Doug Bryan D

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Raghu Angadi
Owen O'Malley wrote: On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote: By "scrub" I mean, have a tool that reads every block on a given data node. That way, I'd be able to find corrupted blocks proactively rather than having an app read the file and find it. The datanode already has a thread t

Re: Hadoop 0.19 over OS X : dfs error

2009-01-26 Thread Raghu Angadi
nitesh bhatia wrote: Thanks. It worked. :) in hadoop-env.sh its required to write exact path for java framework. I changed it to export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home and it started. In hadoop 0.18.2 export JAVA_HOME=/Library/Java/Home is working fine.

Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi
nitesh bhatia wrote: Hi Apple provides opensource discovery service called Bonjour (zeroconf). Is it possible to integrate Zeroconf with Hadoop so that discovery of nodes become automatic ? Presently for setting up multi-node cluster we need to add IPs manually. Integrating it with bonjour can ma

Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi
Nitay wrote: Why not use the distributed coordination service ZooKeeper? When nodes come up they write some ephemeral file in a known ZooKeeper directory and anyone who's interested, i.e. NameNode, can put a watch on the directory and get notified when new children come up. NameNode does not do

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Raghu Angadi
Mark Kerzner wrote: Raghu, if I write all files only once, is the cost the same in one directory or do I need to find the optimal directory size and when full start another "bucket"? If you write only once, then writing won't be much of an issue. You can write them in lexical order to help wit

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi
delete files). Raghu. On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi wrote: If you are adding and deleting files in the directory, you might notice CPU penalty (for many loads, higher CPU on NN is not an issue). This is mainly because HDFS does a binary search on files in a directory each

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi
Raghu Angadi wrote: If you are adding and deleting files in the directory, you might notice CPU penalty (for many loads, higher CPU on NN is not an issue). This is mainly because HDFS does a binary search on files in a directory each time it inserts a new file. I should add that equal or

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi
If you are adding and deleting files in the directory, you might notice CPU penalty (for many loads, higher CPU on NN is not an issue). This is mainly because HDFS does a binary search on files in a directory each time it inserts a new file. If the directory is relatively idle, then there is

Re: HDFS loosing blocks or connection error

2009-01-23 Thread Raghu Angadi
> It seems hdfs isn't as robust or reliable as the website says and/or I > have a configuration issue. quite possible. How robust does the website say it is? I agree debugging failures like the following is pretty hard for casual users. You need to look at the logs for the block, or run 'bin/hadoop

Re: watch out: Hadoop and Linux kernel 2.6.27

2009-01-22 Thread Raghu Angadi
Thanks Peter for the heads up. Note that the problem is more severe with the JVM's use of per-thread selectors. https://issues.apache.org/jira/browse/HADOOP-4346 avoids using JVM selectors. Even with HADOOP-4346, a limit of 128 is too small. I wish 4346 went into earlier versions of Hadoop. Ragh

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-12 Thread Raghu Angadi
Jason Venner wrote: There is no reason to do the block scans. All of the modern kernels will provide you notification when a file or directory is altered. This could be readily handled with a native application that writes structured data to a receiver in the Datanode, or via JNA/JNI for pure

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-12 Thread Raghu Angadi
Sagar Naik wrote: Hi Raghu, The periodic "du" and block report threads thrash the disk. (Block reports take about 21 mins on average) and I think all the datanode threads are not able to do much and freeze yes, that is the known problem we talked about in the earlier mails in this thread.

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-09 Thread Raghu Angadi
2M files is excessive. But there is no reason block reports should break. My preference is to make block reports handle this better. DNs dropping in and out of the cluster causes too many other problems. Raghu. Konstantin Shvachko wrote: Hi Jason, 2 million blocks per data-node is not goin

Re: 0.18.1 datanode psuedo deadlock problem

2009-01-09 Thread Raghu Angadi
The scan required for each block report is a well known issue and it can be fixed. It was discussed multiple times (e.g. https://issues.apache.org/jira/browse/HADOOP-3232?focusedCommentId=12587795#action_12587795 ). Earlier, inline 'du' on datanodes used to cause the same problem and they the

Re: xceiverCount limit reason

2009-01-08 Thread Raghu Angadi
Jean-Adrien wrote: Is it the responsibility of the hadoop client to manage its connection pool with the server ? In which case the problem would be an HBase problem? Anyway I found my problem, it is not a matter of performance. Essentially, yes. The client has to close the file to relinquish connec

Re: Question about the Namenode edit log and syncing the edit log to disk. 0.19.0

2009-01-07 Thread Raghu Angadi
Did you look at FSEditLog.EditLogFileOutputStream.flushAndSync()? This code was re-organized sometime back. But the guarantees it provides should be exactly same as before. Please let us know otherwise. Raghu. Jason Venner wrote: I have always assumed (which is clearly my error) that edit lo

Re: Map-Reduce job is using external IP instead of internal

2008-12-31 Thread Raghu Angadi
Your configuration for task tracker, job tracker might be using external hostnames. Essentially any hostnames in configuration files should resolve to internal ips. Raghu. Genady wrote: Hi, We're using Hadoop 0.18.2/Hbase 0.18.1 four-nodes cluster on CentOS Linux, in /etc/hosts the fol

Re: Performance testing

2008-12-31 Thread Raghu Angadi
I should add that your test should both create and delete files. Raghu. Raghu Angadi wrote: Sandeep Dhawan wrote: Hi, I am trying to create a hadoop cluster which can handle 2000 write requests per second. In each write request I would writing a line of size 1KB in a file. This is

Re: Performance testing

2008-12-31 Thread Raghu Angadi
Sandeep Dhawan wrote: Hi, I am trying to create a hadoop cluster which can handle 2000 write requests per second. In each write request I would be writing a line of size 1KB to a file. This is essentially a matter of deciding how many datanodes (with the given configuration) you need to write

Re: cannot allocate memory error

2008-12-31 Thread Raghu Angadi
Your OS is running out of memory. Usually a sign of too many processes (or threads) on the machine. Check what else is happening on the system. Raghu. sagar arlekar wrote: Hello, I am new to hadoop. I am running hadoop 0.17 in a Eucalyptus cloud instance (it's a centos image on xen) bin/hado

Re: DFS replication and Error Recovery on failure

2008-12-29 Thread Raghu Angadi
Konstantin Shvachko wrote: 1) If i set value of dfs.replication to 3 only in hadoop-site.xml of namenode(master) and then restart the cluster will this take effect. or i have to change hadoop-site.xml at all slaves ? dfs.replication is the name-node parameter, so you need to restart only the

Re: Does datanode acts as readonly in case of DiskFull ?

2008-12-17 Thread Raghu Angadi
Sagar Naik wrote: Hi, I would like to know what happens in case of DiskFull on a datanode. Does the datanode act as a block server only? Yes. I think so. Does it reject any more block creation requests OR does the Namenode not list it for new blocks? yes. The NN will not allocate it any more blocks.

Re: Hit a roadbump in solving truncated block issue

2008-12-16 Thread Raghu Angadi
Brian Bockelman wrote: Hey, I hit a bit of a roadbump in solving the "truncated block issue" at our site: namely, some of the blocks appear perfectly valid to the datanode. The block verifies, but it is still the wrong size (it appears that the metadata is too small too). What's the best w

Re: File loss at Nebraska

2008-12-09 Thread Raghu Angadi
Brian Bockelman wrote: On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote: Also it might be useful to strongly word hadoop-default.conf as many people might not know a downside exists for using 2 rather then 3 as the replication factor. Before reading this thread I would have thought 2 to be su

Re: Hadoop datanode crashed - SIGBUS

2008-12-01 Thread Raghu Angadi
FYI : Datanode does not run any user code and does not link with any native/JNI code. Raghu. Chris Collins wrote: Was there anything mentioned as part of the tombstone message about "problematic frame"? What java are you using? There are a few reasons for SIGBUS errors, one is illegal add

Re: Namenode BlocksMap on Disk

2008-11-26 Thread Raghu Angadi
Dennis Kubes wrote: From time to time a message pops up on the mailing list about OOM errors for the namenode because of too many files. Most recently there was a 1.7 million file installation that was failing. I know the simple solution to this is to have a larger java heap for the nameno

NN JVM process takes a lot more memory than assigned

2008-11-21 Thread Raghu Angadi
There is one instance of NN where the JVM process takes 40GB of memory though the jvm is started with 24GB. The Java heap is still 24GB. It looks like it ends up taking a lot of memory outside the heap. There are a lot of entries in pmap similar to the one below that account for the difference. Does anyone know what this might be? F
