This seems to be the case. I don't think there is any specific reason
not to read across the block boundary...
Even if HDFS does read across the blocks, it is still not a good idea to
ignore the JavaDoc for read(). If you want all the bytes read, then you
should have a while loop or one of t
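The loop the JavaDoc calls for can be sketched like this (plain java.io; FSDataInputStream follows the same InputStream contract, and since it extends DataInputStream its readFully() does a similar loop for you):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Loop until 'len' bytes are read or EOF; a single read() may return fewer.
    static int readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) break; // EOF before 'len' bytes arrived
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[1000];
        int got = readFully(new ByteArrayInputStream(new byte[1000]), buf, 0, 1000);
        System.out.println(got); // prints 1000
    }
}
```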
Yes, FSDataInputStream allows random access. There are two ways to read x
bytes at a position p:
1) in.seek(p); read(buf, 0, x);
2) in.read(p, buf, 0, x);
These two have slightly different semantics. The second one is preferred
and is easier for HDFS to optimize further.
Random access should be prett
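The positioned read in option 2 behaves like POSIX pread: it does not move the stream's current offset, which is why it is safer for concurrent readers and easier to optimize. A plain-JDK sketch of the same semantics using FileChannel (an analogy, not HDFS code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionedRead {
    // Fill 'buf' starting at file offset 'pos' without touching the
    // channel's own position (pread-style, like the positioned
    // read(position, buffer, offset, length) on FSDataInputStream).
    // Assumes 'buf' starts at position 0.
    static void preadFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position()); // positional read
            if (n < 0) throw new IOException("EOF before read completed");
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("pread", ".bin");
        Files.write(p, "0123456789".getBytes());
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            preadFully(ch, buf, 3);                      // 4 bytes at offset 3
            System.out.println(new String(buf.array())); // prints 3456
            System.out.println(ch.position());           // prints 0: nothing moved
        }
        Files.delete(p);
    }
}
```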
Could it be that HDFS opens 3 x 3 (input - output - epoll) fd's per each
thread, which makes it close to the number I mentioned? Or is it always 3 at
maximum per thread / stream?
Up to 10 sec looks like the correct number; it seems it gets freed around
this time indeed.
Regards.
2009/6/23 Raghu An
situation. Please check the number of
threads if you are still facing the problem.
Raghu.
Raghu Angadi wrote:
since you have HADOOP-4346, you should not have excessive epoll/pipe fds
open. First of all do you still have the problem? If yes, how many
hadoop streams do you have at a time?
System.gc
Stas Oskin wrote:
Hi.
Any idea if calling System.gc() periodically will help reducing the amount
of pipes / epolls?
Raghu Angadi wrote:
This is at RPC client level and there is requirement for fully qualified
hostname. May be "." at the end of "10.2.24.21" causing the problem?
I meant to say "there is NO requirement ..."
btw, in 0.21 even fs.default.name does not need to be fully qualified
name.. anything that resolves to an ipaddress is fine (at least for
common/FS an
The initial overhead is fairly small (extra hard link for each file).
After that, the overhead grows as you delete files (thus their blocks)
that existed before the upgrade.. since the physical files for blocks
are deleted only after you finalize.
So the overhead == (the blocks that got de
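The hard-link mechanics behind this are ordinary filesystem behavior, demonstrable with plain JDK calls (a sketch of the idea, not DataNode code): deleting one link does not free the space while the snapshot link survives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upgrade");
        Path block = dir.resolve("blk_1");
        Files.write(block, new byte[1024]);       // the "current" block file
        Path snapshot = dir.resolve("previous_blk_1");
        Files.createLink(snapshot, block);        // cheap: no data is copied

        Files.delete(block);                      // delete the block post-upgrade...
        // ...but the data still occupies disk via the snapshot link:
        System.out.println(Files.size(snapshot)); // prints 1024
        Files.delete(snapshot);                   // "finalize": space actually freed
        Files.delete(dir);
    }
}
```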
000.
I'm still trying to find out how well it behaves if I set the maximum fd
number to 65K.
Regards.
2009/6/22 Raghu Angadi
Is this before 0.20.0? Assuming you have closed these streams, it is mostly
https://issues.apache.org/jira/browse/HADOOP-4346
It is the JDK internal implementation that depends on GC to free up its
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
Raghu.
Stas Oskin w
Thanks Brian for the good advice.
Slightly off topic from original post: there will be occasions where it
is necessary or better to copy different portions of a file in parallel
(distcp can benefit a lot). There is a proposal to let HDFS 'stitch'
multiple files into one: something like
Name
I still need to go through the whole thread, but we feel your pain.
First, please try setting fs.default.name to the namenode internal ip on the
datanodes. This should make NN attach the internal ip so the datanodes
can reach it (assuming your routing is correct). NameNode webUI should list internal
ips for da
Did you try 'telnet 198.55.35.229 54310' from this datanode? The log
shows that it is not able to connect to "master:54310". ssh from datanode
does not matter.
Raghu.
asif md wrote:
I can SSH both ways, i.e. from master to slave and slave to master.
The datanode is getting initialized at mas
Tom White wrote:
Actually, the space is needed, to be interpreted as a Hadoop option by
ToolRunner. Without the space it sets a Java system property, which
Hadoop will not automatically pick up.
I don't think space is required. Something like
-Dfs.default.name=host:port works. I don't see Tool
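For illustration, a toy parser (hypothetical; loosely modeled on the -D handling that Hadoop's GenericOptionsParser provides) showing that both spellings can be treated identically when the program parses its own args:

```java
import java.util.HashMap;
import java.util.Map;

public class DOptions {
    // Collect -D definitions from program args, accepting both
    // "-Dkey=value" (one token) and "-D key=value" (two tokens).
    static Map<String, String> parse(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String def = null;
            if (args[i].equals("-D") && i + 1 < args.length) {
                def = args[++i];               // "-D key=value"
            } else if (args[i].startsWith("-D")) {
                def = args[i].substring(2);    // "-Dkey=value"
            }
            if (def != null) {
                int eq = def.indexOf('=');
                if (eq > 0) props.put(def.substring(0, eq), def.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-Dfs.default.name=host:port"}));
        System.out.println(parse(new String[] {"-D", "fs.default.name=host:port"}));
        // both print {fs.default.name=host:port}
    }
}
```

Note the separate case the thread alludes to: a `-Dkey=value` placed before the class name on the java command line is consumed by the JVM as a system property and never reaches args[] at all.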
Renaming datanodes should not affect HDFS. HDFS does not depend on
hostname or ip for consistency of data. You can try renaming a few of
the nodes.
Of course, if you rename NameNode, you need to update the config file to
reflect that.
Stuart White wrote:
Is it possible to rename all nodes
Can you post the patch for these measurements? I can guess where these
are measured but better to see the actual changes.
For example, the third datanode does only two things: receiving and
writing data to the disk. So "avg block writing time" for you should be
around the sum of these two (~6-7k)
time.
Raghu.
Regards.
2009/5/26 Raghu Angadi
'in.seek(); in.read()' is certainly better than,
'in = fs.open(); in.seek(); in.read()'
The difference is exactly one open() call. So you would save an RPC
to NameNode.
There are a couple of issues that affect apps that keep the handles open
a very long time (many hours to days).. but those
Raghu Angadi wrote:
As a hack, you could tunnel NN traffic from GridFTP clients through a
different machine (by changing fs.default.name).
Alternately these
clients could use a socks proxy.
Socks proxy would not be useful since you don't want datanode traffic to
go through the
As a hack, you could tunnel NN traffic from GridFTP clients through a
different machine (by changing fs.default.name). Alternately these
clients could use a socks proxy.
The amount of traffic to NN is not much and tunneling should not affect
performance.
Raghu.
Brian Bockelman wrote:
Hey all
Stas Oskin wrote:
I think you should file a jira on this. Most likely this is what is
happening :
Here it is - hope it's ok:
https://issues.apache.org/jira/browse/HADOOP-5886
looks good. I will add my earlier post as comment. You could update the
jira with any more tests.
Next time, it w
Brian Bockelman wrote:
On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
I think you should file a jira on this. Most likely this is what is
happening :
* two out of 3 dns can not take anymore blocks.
* While picking nodes for a new block, NN mostly skips the third dn as
well since '# active writes' on it is larger than '2 * avg'.
* Even if there is one other b
, 2009 at 11:35 AM, Raghu Angadi wrote:
Philip Zeyliger wrote:
You could use ssh to set up a SOCKS proxy between your machine and
ec2, and set org.apache.hadoop.net.SocksSocketFactory to be the
socket factory.
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
has more information.
very useful wri
Along these lines, even simpler approach I would think is :
1) set data.dir to local and create the data.
2) stop the datanode
3) rsync local_dir network_dir
4) start datanode with data.dir with network_dir
There is no need to format or rebalance.
This way you can switch between local and netw
oded at 80 -- a queue of 5MB (packets are 64k). You thinking I
should experiment with that? I suppose that won't help much with getting my
writes on the datanode. Maybe I should be digging on the datanode side to
figure why it's slow getting back to the client?
Thanks,
St.Ack
On Sun, May 10, 2009 at
what do 'jmap' and 'jmap -histo:live' show?
Raghu.
Stefan Will wrote:
Chris,
Thanks for the tip ... However I'm already running 1.6_10:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
Do you know of
It should not be waiting unnecessarily. But the client has to, if any of
the datanodes in the pipeline is not able to receive the data as fast as the
client is writing. IOW writing goes as fast as the slowest of the nodes
involved in the pipeline (1 client and 3 datanodes).
But based on what your case is
Does anyone have configuration recommendations to
minimize or remove these errors under any of these circumstances, or perhaps
there is another explanation?
Thanks,
Albert
On 5/5/09 11:34 AM, "Raghu Angadi" wrote:
This can happen for example when a client is killed when it has some
Philip Zeyliger wrote:
It's over TCP/IP, in a custom protocol. See DataXceiver.java. My sense is
that it's a custom protocol because Hadoop's IPC mechanism isn't optimized
for large messages.
yes, and job classes are not distributed using this. It is a very simple
protocol used to read and
le ?
yes. Jira is a better place for tracking and fixing bugs. I am pretty
sure what you saw is a bug (either already fixed or needing to be fixed).
Raghu.
Thanks,
Tamir
On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi wrote:
Tamir,
Please file a jira on the problem you are seeing with 'save
the image is stored in two files : fsimage and edits
(under namenode-directory/current/).
Stas Oskin wrote:
Well, it definitely caused the SecondaryNameNode to crash, and also seems to
have triggered some strange issues today as well.
By the way, how is the image file named?
t preserve the logs or the image.
If this happens again - I will surely do so.
Regards.
2009/5/5 Raghu Angadi
Stas,
This is indeed a serious issue.
Did you happen to store the corrupt image? Can this be reproduced
using the image?
Usually you can recover manually from a corrupt or trunc
This can happen for example when a client is killed when it has some
files open for write. In that case it is an expected error (the log
should really be at WARN or INFO level).
Raghu.
Albert Sunwoo wrote:
Hello Everyone,
I know there's been some chatter about this before but I am seeing t
Tamir,
Please file a jira on the problem you are seeing with 'saveLeases'. In
the past there have been multiple fixes in this area (HADOOP-3418,
HADOOP-3724, and more mentioned in HADOOP-3724).
Also refer to the thread you started
http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397
Stas,
This is indeed a serious issue.
Did you happen to store the corrupt image? Can this be reproduced
using the image?
Usually you can recover manually from a corrupt or truncated image. But
more importantly we want to find how it got in to this state.
Raghu.
Stas Oskin wrote:
Hi.
Telnet failure to localhost is expected and is unrelated since servers
are not listening on it.
What is the ip address of this machine?
Try 'telnet different_datanode_ip 50010' _from_ this machine. What do
you see?
Raghu.
Stas Oskin wrote:
Hi.
Shouldn't you be testing connecting _from_
Stas Oskin wrote:
Tried in step 3 to telnet both the 50010 and the 8010 ports of the
problematic datanode - both worked.
Shouldn't you be testing connecting _from_ the datanode? The error you
posted is while this DN is trying to connect to another DN.
Raghu.
I agree there is indeed an inter
There is some mismatch here.. what is the expected ip address of this
machine (or does it have multiple interfaces that are properly routed)?
Looking at the "Receiving Block" message DN thinks its address is
192.168.253.20 but NN thinks it is 253.32 (and client is able to connect
using 253.32).
Aseem,
Regarding over-replication, it is mostly an app-related issue as Alex mentioned.
But if you are concerned about under-replicated blocks in fsck output :
These blocks should not stay under-replicated if you have enough nodes
and enough space on them (check NameNode webui).
Try grep-ing for one
It need not be anything to worry about. Do you see anything at user
level (task, job, copy, or script) fail because of this?
On a distributed system with many nodes, there would be some errors on
some of the nodes for various reasons (load, hardware, reboot, etc).
HDFS usually should work ar
Aaron Kimball wrote:
Blocks already written to HDFS will remain their current size. Blocks are
immutable objects. That procedure would set the size used for all
subsequently-written blocks. I don't think you can change the block size
while the cluster is running, because that would require the Na
Raghu Angadi wrote:
IP Address mismatch should not matter. What is the actual error you saw?
The mismatch might be unintentional.
The reason I say ip address should not matter is that if you change the
ip address of a datanode, it should still work correctly.
Raghu.
Mike Andrews
IP Address mismatch should not matter. What is the actual error you saw?
The mismatch might be unintentional.
Raghu.
Mike Andrews wrote:
I tried swapping two hot-swap sata drives between two nodes in a
cluster, but it didn't work: after restart, one of the datanodes shut
down since the namenode s
If it is NameNode, then there is probably a log about closing the socket
around that time.
Raghu.
lohit wrote:
Recently we are seeing a lot of 'Socket closed' exceptions in our cluster. Many
tasks' open/create/getFileInfo calls get back 'SocketException' with message
'Socket closed'. We seem to
stchu wrote:
But when the web-ui shows the node dead, -report still shows "in service"
and the living nodes=3 (in web-ui: living=2 dead=1).
please file a jira and describe how to reproduce in as much detail as
you can in a comment.
thanks,
Raghu.
stchu
2009/3/26 Raghu Angad
stchu wrote:
Hi,
I did a test of datanode crash behavior: I stopped the networking on one of the
datanodes.
The Web app and fsck report that datanode as dead after 10 mins, but dfsadmin
-report
does not report that even after 25 mins. Is this correct?
Nope. Both web-ui and '-report' read from the same source of inf
What is the scale you are thinking of (10s, 100s, or more nodes)?
The memory for metadata at the NameNode you mentioned is the main issue
with small files. There are multiple alternatives for dealing with
that. This issue has been discussed many times here.
Also please use core-user@ id alone for ask
you need https://issues.apache.org/jira/browse/HADOOP-5191
I don't know why there is no response to the simple patch I attached.
alternately you could use the hostname it expects instead of the ip address.
Raghu.
snehal nagmote wrote:
Hi,
I am using Hadoop version 0.19. I set up a hadoop cluster for
is a bit scary - what are the reasons to go with 0.20.0 instead
of 0.19.2? Yahoo is jumping from 0.18.x directly to 0.20.0? Why is Yahoo
skipping the 0.19.x release?
Is the expectation that 0.19.2 will be released at the same time as 0.20.0?
Thanks,
David
On Wed, Mar 18, 2009 at 1:31 PM,
The short answer, I am afraid, is no.
As an alternative, I recommend upgrading to the latest 0.19.x or 0.20.0 (to
be released in a couple of days). 0.19.2 is certainly a lot better than
0.19.0. Yahoo is rolling out 0.20.x if that helps your confidence.
Raghu.
David Ritch wrote:
There is an establis
node
web ui reported size.
I'm waiting for the next time this happens to collect more details, but
ever since I wrote the first email - everything works perfectly well
(another application of Murphy's law).
Thanks,
Igor
-----Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.c
TCK wrote:
How well does the read throughput from HDFS scale with the number of data nodes?
For example, if I had a large file (say 10GB) on a 10 data node cluster, would the time taken to read this whole file in parallel (ie, with multiple reader client processes requesting different parts of
Raghu Angadi wrote:
Amandeep Khurana wrote:
My dfs.datanode.socket.write.timeout is set to 0. This had to be done
to get
Hbase to work.
ah.. I see, we should fix that. Not sure how others haven't seen it till
now. Affects only those with write.timeout set to 0 on the clients.
value?
a very large value like 100 years is the same as setting it to 0 (for all
practical purposes).
Raghu.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Wed, Mar 11, 2009 at 12:00 PM, Raghu Angadi wrote:
Amandeep Khurana wrote:
My
you should not have 100% of blocks missing.
There are many possibilities; it is not easy for me to list the right one
in your case without more info, or to list all possible conditions.
Raghu.
Mayuran Yogarajah wrote:
Mayuran Yogarajah wrote:
Raghu Angadi wrote:
The block files usually don
is a work around, please change that to
some extremely large value for now.
Raghu.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Wed, Mar 11, 2009 at 10:23 AM, Raghu Angadi wrote:
Did you change dfs.datanode.socket.write.timeout to 5 seconds
Doug Cutting wrote:
Konstantin Shvachko wrote:
Clarifying: port # is missing in your configuration, should be
fs.default.name
hdfs://hvcwydev0601:8020
where 8020 is your port number.
That's the work-around, but it's a bug. One should not need to specify
the default port number (8020).
Did you change dfs.datanode.socket.write.timeout to 5 seconds? The
exception message says so. It is extremely small.
The default is 8 minutes and is intentionally pretty high. Its purpose
is mainly to catch extremely unresponsive datanodes and other network
issues.
Raghu.
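For reference, the client-side setting under discussion would be configured like this (property name as used in this thread; the value is in milliseconds, so 480000 is the 8-minute default):

```xml
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <!-- milliseconds; 0 disables the timeout entirely -->
  <value>480000</value>
</property>
```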
Amandeep Khurana
Mayuran Yogarajah wrote:
lohit wrote:
How many Datanodes do you have?
From the output it looks like at the point when you ran fsck, you had
only one datanode connected to your NameNode. Did you have others?
Also, I see that your default replication is set to 1. Can you check
if your datanodes
ding these files.
hope this helps.
Raghu.
Thank you for help!
Igor
-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Thursday, March 05, 2009 11:05 AM
To: core-user@hadoop.apache.org
Subject: Re: DataNode stops cleaning disk?
This is unexpected unless some
This is unexpected unless some other process is eating up space.
Couple of things to collect next time (along with log):
- All the contents under datanode-directory/ (especially including
'tmp' and 'current')
- Does 'du' of this directory match with what is reported to NameNode
(shown on we
What is the Hadoop version? DN limits deletes per heartbeat to 100 or so
I think. So the dead datanodes might not be dying only because of
deletes... does the stacktrace show that?
> [...] Ideally we would never see "dead" datanodes from doing deletes.
yes : HADOOP-4584 moves deletions out of h
what is the hadoop version?
You could check log on a datanode around that time. You could post any
suspicious errors. For e.g. you can trace a particular block in client
and datanode logs.
Most likely it is not a NameNode issue, but you can check the NameNode log as well.
Raghu.
Xavier Stevens wr
of file descriptors (something like 6 times the number
of active 'xceivers'). In your case you seem to have a lot of simultaneous
clients. I suggest increasing the file limit to something much higher (like
64k).
Raghu.
Regards,
Sean
2009/2/13 Raghu Angadi
didn't appear to be affecting things to
begin with.
Regards,
Sean
On Thu, Feb 12, 2009 at 2:07 PM, Raghu Angadi wrote:
You are most likely hit by
https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets back
ported. There is a 0.18 patch posted there.
btw, does 16k help in your case?
Ideally 1k should be enough (with small number of clients). Please try
the above patch with 1k limit.
Raghu.
Sea
Vadim Zaliva wrote:
The particular problem I am having is this one:
https://issues.apache.org/jira/browse/HADOOP-2669
I am observing it in version 0.19. Could anybody confirm that
it has been fixed in 0.18, as Jira claims?
I am wondering why bug fix for this problem might have been committed
to 1
+1 on something like getValidBytes(). Just the existence of this would
warn many programmers about getBytes().
Raghu.
Owen O'Malley wrote:
On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:
Hey Tom,
I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes??
I don't think it is intentional. Please file a jira with all the details
about how to reproduce (with actual configuration files).
thanks,
Raghu.
Habermaas, William wrote:
After creation and startup of the hadoop namenode, you can only connect
to the namenode via hostname and not IP.
EX
Karl Kleinpaste wrote:
On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote:
The Datanodes use multiple threads with locking and one of the
assumptions is that the block report (once per hour by default) takes
little time. The datanode will pause while the block report is running
and if it happ
Doug Cutting wrote:
Ext2 by default reserves 5% of the drive for use by root only. That'd
be 45GB of your 907GB capacity which would account for most of the
discrepancy. You can adjust this with tune2fs.
plus, I think DataNode reports only 98% of the space by default.
Raghu.
Doug
Bryan D
Owen O'Malley wrote:
On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:
By "scrub" I mean, have a tool that reads every block on a given data
node. That way, I'd be able to find corrupted blocks proactively
rather than having an app read the file and find it.
The datanode already has a thread t
nitesh bhatia wrote:
Thanks. It worked. :) In hadoop-env.sh it's required to write the exact path for
the java framework. I changed it to
export
JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
and it started.
In hadoop 0.18.2 export JAVA_HOME=/Library/Java/Home is working fine.
nitesh bhatia wrote:
Hi
Apple provides an opensource discovery service called Bonjour (zeroconf). Is it
possible to integrate Zeroconf with Hadoop so that discovery of nodes becomes
automatic? Presently, for setting up a multi-node cluster we need to add IPs
manually. Integrating it with bonjour can ma
Nitay wrote:
Why not use the distributed coordination service ZooKeeper? When nodes come
up they write some ephemeral file in a known ZooKeeper directory and anyone
who's interested, i.e. NameNode, can put a watch on the directory and get
notified when new children come up.
NameNode does not do
Mark Kerzner wrote:
Raghu,
if I write all files only once, is the cost the same in one directory or do I
need to find the optimal directory size and when full start another
"bucket"?
If you write only once, then writing won't be much of an issue. You can
write them in lexical order to help wit
delete files).
Raghu.
On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi wrote:
Raghu Angadi wrote:
If you are adding and deleting files in the directory, you might notice
CPU penalty (for many loads, higher CPU on NN is not an issue). This is
mainly because HDFS does a binary search on files in a directory each
time it inserts a new file.
I should add that equal or
If you are adding and deleting files in the directory, you might notice
CPU penalty (for many loads, higher CPU on NN is not an issue). This is
mainly because HDFS does a binary search on files in a directory each
time it inserts a new file.
If the directory is relatively idle, then there is
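The insert pattern described above can be sketched with a plain sorted ArrayList (an illustration of the cost profile, not the actual NameNode INode code): each insert is a binary search plus an O(n) array shift.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedChildren {
    private final List<String> names = new ArrayList<>();

    // O(log n) search to find the slot, then an O(n) shift to insert,
    // analogous to inserting a file into a sorted directory listing.
    boolean add(String name) {
        int idx = Collections.binarySearch(names, name);
        if (idx >= 0) return false;        // name already present
        names.add(-idx - 1, name);         // binarySearch encodes the insertion point
        return true;
    }

    List<String> list() {
        return new ArrayList<>(names);     // always kept sorted
    }

    public static void main(String[] args) {
        SortedChildren dir = new SortedChildren();
        dir.add("part-00002");
        dir.add("part-00000");
        dir.add("part-00001");
        System.out.println(dir.list()); // prints [part-00000, part-00001, part-00002]
    }
}
```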
> It seems hdfs isn't so robust or reliable as the website says and/or I
> have a configuration issue.
quite possible. How robust does the website say it is?
I agree debugging failures like the following is pretty hard for casual
users. You need to look at the logs for the block, or run 'bin/hadoop
Thanks Peter for the heads up.
Note that the problem is more severe with JVM's use of per-thread
selectors. https://issues.apache.org/jira/browse/HADOOP-4346 avoids
using JVM selectors. Even with HADOOP-4346, a limit of 128 is too small.
I wish 4346 went into earlier versions of Hadoop.
Ragh
Jason Venner wrote:
There is no reason to do the block scans. All of the modern kernels will
provide you notification when a file or directory is altered.
This could be readily handled with a native application that writes
structured data to a receiver in the Datanode, or via JNA/JNI for pure
Sagar Naik wrote:
Hi Raghu,
The periodic "du" and block report threads thrash the disk. (Block
Reports take about 21 mins on average.)
and I think all the datanode threads are not able to do much and freeze
yes, that is the known problem we talked about in the earlier mails in
this thread.
2M files is excessive. But there is no reason block reports should
break. My preference is to make block reports handle this better. DNs
dropping in and out of the cluster causes too many other problems.
Raghu.
Konstantin Shvachko wrote:
Hi Jason,
2 million blocks per data-node is not goin
The scan required for each block report is well known issue and it can
be fixed. It was discussed multiple times (e.g.
https://issues.apache.org/jira/browse/HADOOP-3232?focusedCommentId=12587795#action_12587795
).
Earlier, inline 'du' on datanodes used to cause the same problem and
they the
Jean-Adrien wrote:
Is it the responsibility of the hadoop client to manage its connection pool
with the server? In which case the problem would be an HBase problem?
Anyway I found my problem; it is not a matter of performance.
Essentially, yes. Client has to close the file to relinquish
connec
Did you look at FSEditLog.EditLogFileOutputStream.flushAndSync()?
This code was re-organized sometime back. But the guarantees it provides
should be exactly same as before. Please let us know otherwise.
Raghu.
Jason Venner wrote:
I have always assumed (which is clearly my error) that edit lo
Your configuration for task tracker, job tracker might be using external
hostnames. Essentially any hostnames in configuration files should
resolve to internal ips.
Raghu.
Genady wrote:
Hi,
We're using a Hadoop 0.18.2/Hbase 0.18.1 four-node cluster on CentOS Linux;
in /etc/hosts the fol
I should add that your test should both create and delete files.
Raghu.
Raghu Angadi wrote:
Sandeep Dhawan wrote:
Hi,
I am trying to create a hadoop cluster which can handle 2000 write requests
per second.
In each write request I would be writing a line of size 1KB to a file.
This is essentially a matter of deciding how many datanodes (with the
given configuration) you need to write
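A back-of-envelope check shows the aggregate byte rate of that workload is modest (the replication factor of 3 below is an assumption; the thread does not state it):

```java
public class WriteLoadSketch {
    // Aggregate client ingest rate implied by the stated workload.
    static double aggregateMBps(int requestsPerSec, int bytesPerRequest) {
        return requestsPerSec * (double) bytesPerRequest / (1024 * 1024);
    }

    public static void main(String[] args) {
        double ingest = aggregateMBps(2000, 1024);  // 2000 x 1KB per second
        System.out.printf("client ingest: ~%.2f MB/s%n", ingest);  // ~1.95 MB/s
        // Assuming replication factor 3 (not stated in the thread), the
        // cluster as a whole writes about three times that:
        System.out.printf("raw cluster writes: ~%.2f MB/s%n", ingest * 3);
    }
}
```

So raw bandwidth is unlikely to be the constraint here; the sizing question is more about how many concurrent small writes each datanode can absorb.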
Your OS is running out of memory. Usually a sign of too many processes
(or threads) on the machine. Check what else is happening on the system.
Raghu.
sagar arlekar wrote:
Hello,
I am new to hadoop. I am running hadoop 0.17 in a Eucalyptus cloud
instance (it's a centos image on xen)
bin/hado
Konstantin Shvachko wrote:
1) If I set the value of dfs.replication to 3 only in hadoop-site.xml of the
namenode (master) and
then restart the cluster, will this take effect, or do I have to change
hadoop-site.xml at all slaves?
dfs.replication is the name-node parameter, so you need to restart
only the
Sagar Naik wrote:
Hi,
I would like to know what happens in case of DiskFull on a datanode.
Does the datanode act as a block server only?
Yes. I think so.
Does it reject any more block creation requests, or does the Namenode not list
it for new blocks?
yes. NN will not allocate it any more blocks.
Brian Bockelman wrote:
Hey,
I hit a bit of a roadbump in solving the "truncated block issue" at our
site: namely, some of the blocks appear perfectly valid to the
datanode. The block verifies, but it is still the wrong size (it
appears that the metadata is too small too).
What's the best w
Brian Bockelman wrote:
On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:
Also it might be useful to strongly word hadoop-default.conf as many
people might not know a downside exists for using 2 rather then 3 as
the replication factor. Before reading this thread I would have
thought 2 to be su
FYI : Datanode does not run any user code and does not link with any
native/JNI code.
Raghu.
Chris Collins wrote:
Was there anything mentioned as part of the tombstone message about
"problematic frame"? What java are you using? There are a few reasons
for SIGBUS errors, one is illegal add
Dennis Kubes wrote:
From time to time a message pops up on the mailing list about OOM
errors for the namenode because of too many files. Most recently there
was a 1.7 million file installation that was failing. I know the simple
solution to this is to have a larger java heap for the nameno
There is one instance of NN where JVM process takes 40GB memory though
jvm is started with 24GB. Java heap is still 24GB. Looks like it ends up
taking a lot of memory outside. There are a lot of entries in pmap similar
to below that account for the difference. Anyone know what this might be?
F