My intention isn't to make it a mandatory feature, just an option.
Keeping data locally on a filesystem as a form of Lx cache is far better
than fetching it over the network, and hitting the fs buffer cache is much
cheaper than an RPC call.
On Mon, Jan 16, 2012 at 1:07 PM, Edward Capriolo
Hello,
How much memory/JVM heap does NameNode use for each block?
I've tried locating this in the FAQ and on search-hadoop.com, but couldn't find
a ton of concrete numbers, just these two:
http://search-hadoop.com/m/RmxWMVyVvK1 - 150 bytes/block?
http://search-hadoop.com/m/O886P1VyVvK1 - 1 GB
How much memory/JVM heap does NameNode use for each block?
I don't remember the exact number; it also depends on which version of
Hadoop you're using.
http://search-hadoop.com/m/O886P1VyVvK1 - 1 GB heap for every object?
It's 1 GB for every *million* objects (files, blocks, etc.). This is a
On Tue, Jan 17, 2012 at 10:08 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hello,
How much memory/JVM heap does NameNode use for each block?
I've tried locating this in the FAQ and on search-hadoop.com, but
couldn't find a ton of concrete numbers, just these two:
Hi,
The significant factor in cluster loading is memory, not CPU. Hadoop views the
cluster only with respect to memory and cares not about CPU utilization or disk
saturation. If you run too many TaskTrackers, you risk memory overcommit, where
the Linux OOM killer will come out of the closet and
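For context, a rough sketch of the knobs involved; the slot counts and heap
size below are placeholder assumptions, and on a real cluster they would live
in mapred-site.xml on each TaskTracker rather than be set from client code:

import org.apache.hadoop.conf.Configuration;

public class SlotMemoryBudget {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Memory committed to tasks is roughly (map slots + reduce slots) * child heap,
        // on top of the TaskTracker and DataNode JVMs themselves.
        int mapSlots = 8, reduceSlots = 4, heapGb = 1;        // placeholder values
        conf.setInt("mapred.tasktracker.map.tasks.maximum", mapSlots);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", reduceSlots);
        conf.set("mapred.child.java.opts", "-Xmx" + heapGb + "g");
        // If this budget exceeds physical RAM minus the daemons' heaps,
        // memory overcommit (and the OOM killer) becomes a real risk.
        System.out.println("Task heap budget per node: ~"
                + (mapSlots + reduceSlots) * heapGb + " GB");
    }
}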
Hi Ravi,
You'll probably need to up the replication level of the affected files
and then drop it back down to the desired level. Current versions of
HDFS do not automatically repair rack policy violations if they're
introduced in this manner.
-Todd
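For concreteness, a sketch of the bump-and-drop approach described above, using
the Java FileSystem API (the path and replication factors are placeholder
assumptions; hadoop fs -setrep does the same from the shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FixRackPolicy {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path affected = new Path("/data/affected-file");  // placeholder path

        // Raise replication so the NameNode schedules additional replicas,
        // placed according to the rack-awareness policy...
        fs.setReplication(affected, (short) 4);

        // ...then, once the extra replicas exist, drop back to the desired
        // factor; per the advice above this leaves the file compliant again.
        fs.setReplication(affected, (short) 3);
    }
}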
On Mon, Jan 16, 2012 at 3:53 PM, rk vishu
Thank you very much, Todd. I hope future versions of the Hadoop rebalancer will
include this check.
I have one more question.
If we are in the process of setting up additional nodes incrementally in a
different rack (say rack-2), and rack-2's size is only 25% of rack-1's, how
would data be balanced (with
I think I've found a bug in the Merger code for Hadoop.
When the Map job runs, it creates spill files based on io.sort.mb. It then
sorts io.sort.factor files at a time in order to create an output file
that's passed to the reduce job. The higher these two settings are
configured, the more
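For reference, both knobs are ordinary job configuration properties; a minimal
sketch of setting them from a Java client, with the values as placeholder
assumptions rather than recommendations:

import org.apache.hadoop.conf.Configuration;

public class SortTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // io.sort.mb: size of the in-memory buffer that map output is
        // collected into before being spilled to disk.
        conf.setInt("io.sort.mb", 200);
        // io.sort.factor: how many spill segments the Merger combines in a
        // single merge pass.
        conf.setInt("io.sort.factor", 50);
    }
}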
Hi,
This posting is essentially about a bug, but it is also related to a
programmatic idiom endemic to Hadoop. Thus, I am posting to
'common-user' as opposed to 'common-dev'; if the latter is more
appropriate, please let me know. Also, I checked JIRA and was unable to
find a matching bug.
Hi Guys,
I'm running Clojure code inside Solr 3.4 that makes calls to Mahout
0.4 for a text clustering job. Due to some issues with Clojure I had
to put all the jar files in the Solr war file ('WEB-INF/lib'). I also
made sure to put the Hadoop core and MapReduce config XML files in the
same
Konstantin's paper
http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf
mentions that on average a file consumes about 600 bytes of memory in the
name-node (1 file object + 2 block objects).
To quote from his paper (see page 9):
.. in order to store 100 million files
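A quick back-of-the-envelope check of that figure, using only the ~600
bytes/file average quoted above:

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long files = 100000000L;   // 100 million files, as in the paper
        long bytesPerFile = 600;   // ~1 file object + 2 block objects
        double gib = files * bytesPerFile / (1024.0 * 1024 * 1024);
        // Prints roughly 56 GiB, i.e. on the order of 60 GB of name-node heap.
        System.out.printf("Approximate name-node heap: %.0f GiB%n", gib);
    }
}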
Hi,
tl;dr DUMMY should not be static.
On Tue, Jan 17, 2012 at 3:21 PM, Stan Rosenberg
srosenb...@proclivitysystems.com wrote:
class MyKey<T> implements WritableComparable<T> {
    private String ip; // first part of the key
    private final static Text DUMMY = new Text();
    ...
    public void
On Tue, Jan 17, 2012 at 6:38 PM, Brock Noland br...@cloudera.com wrote:
This class is invalid. A single thread will be executing your mapper
or reducer, but there will be multiple threads (background threads such
as the SpillThread) creating MyKey instances, which is exactly what you
are seeing.
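A sketch of the suggested fix, with DUMMY made a per-instance field; the class
and field names come from the snippet above, the generics are simplified, and
the serialization methods are assumptions since the original was truncated:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

class MyKey implements WritableComparable<MyKey> {
    private String ip = "";                // first part of the key
    private final Text dummy = new Text(); // was: private final static Text DUMMY

    // Each MyKey instance now owns its own Text, so the framework's background
    // threads (e.g. the SpillThread) can create and deserialize keys without
    // sharing mutable state with the mapper/reducer thread.

    @Override
    public void write(DataOutput out) throws IOException {
        Text.writeString(out, ip);
        dummy.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        ip = Text.readString(in);
        dummy.readFields(in);
    }

    @Override
    public int compareTo(MyKey other) {
        return ip.compareTo(other.ip);     // placeholder: compare on the first part only
    }
}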
Guys!
So can I say that if memory usage is more than, say, 90% the node is
overloaded?
If so, what should that threshold percentage be, or how can we find it?
Arun
Hi,
whatever I do, I can't make it work, that is, I cannot use
s3://host
or s3n://host
as a replacement for HDFS while running an EC2 cluster. I change the settings
in core-site.xml and hdfs-site.xml, start the Hadoop services, and it
fails with error messages.
Is there a place where this
Hey Mark,
What is the exact trouble you run into? What do the error messages indicate?
This should be definitive enough I think: http://wiki.apache.org/hadoop/AmazonS3
On Wed, Jan 18, 2012 at 11:55 AM, Mark Kerzner mark.kerz...@shmsoft.com wrote:
Hi,
whatever I do, I can't make it work, that
Well, here is my error message
Starting Hadoop namenode daemon: starting namenode, logging to
/usr/lib/hadoop-0.20/logs/hadoop-hadoop-namenode-ip-10-126-11-26.out
ERROR. Could not start Hadoop namenode daemon
Starting Hadoop secondarynamenode daemon: starting secondarynamenode,
logging to
When using S3 you do not need to run any component of HDFS at all. It
is meant to be an alternate FS choice. You need to run only MR.
The wiki page at http://wiki.apache.org/hadoop/AmazonS3 mentions
how to go about specifying your auth details to S3, either directly
via the fs.default.name URI
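A minimal sketch of what that looks like, with the bucket name and credentials
as obvious placeholders (the same keys can equally go into core-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AsDefaultFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the default filesystem at an S3 bucket instead of HDFS.
        conf.set("fs.default.name", "s3n://my-bucket");           // placeholder bucket
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");     // placeholder creds
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")));             // simple sanity check
    }
}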
Hi,
I often run into situations like this:
I am running a very heavy job (let's say job 1) on a Hadoop cluster (which
takes many hours). Then something comes up that needs to be done very
quickly (let's say job 2).
Job 2 only takes a couple of hours when executed on Hadoop. But it will
take a couple
That wiki page mentions hadoop-site.xml, but that is old; now you have
core-site.xml and hdfs-site.xml, so which one do you put it in?
Thank you (and good night Central Time:)
mark
On Wed, Jan 18, 2012 at 12:52 AM, Harsh J ha...@cloudera.com wrote:
When using S3 you do not need to run any
Edward,
You need to invest in configuring a non-FIFO scheduler. FairScheduler may be
what you are looking for. Take a look at
http://hadoop.apache.org/common/docs/current/fair_scheduler.html for the docs.
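For reference, the FairScheduler is enabled on the JobTracker side; a minimal
sketch of the relevant properties, shown here as Java key/value pairs although
in practice they go into mapred-site.xml (the allocations file path is a
placeholder assumption):

import org.apache.hadoop.conf.Configuration;

public class FairSchedulerConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Swap the default FIFO scheduler for the FairScheduler...
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");
        // ...and point it at an allocations file defining pools, so a small
        // urgent job (job 2) gets resources alongside a long-running one.
        conf.set("mapred.fairscheduler.allocation.file",
                 "/etc/hadoop/conf/fair-scheduler.xml");          // placeholder path
    }
}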
On 18-Jan-2012, at 12:27 PM, edward choi wrote:
Hi,
I often run into situations like
Ah, sorry about missing that. Settings would go in core-site.xml (hdfs-site.xml
will no longer be relevant once you switch to using S3).
On 18-Jan-2012, at 12:36 PM, Mark Kerzner wrote:
That wiki page mentions hadoop-site.xml, but that is old; now you have
core-site.xml and
Hi,
What is the minimum size of a container in Hadoop YARN?
capability.setMemory(xx);
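For context, a sketch of a container resource request, assuming a Hadoop
2.x-era YARN client API; the effective floor is the scheduler setting
yarn.scheduler.minimum-allocation-mb, to which smaller requests are rounded up:

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class ContainerRequestSketch {
    public static void main(String[] args) {
        Resource capability = Records.newRecord(Resource.class);
        // Ask for 512 MB; if yarn.scheduler.minimum-allocation-mb is 1024 (a
        // common default), the scheduler hands back a 1024 MB container anyway.
        capability.setMemory(512);
        System.out.println("Requested capability: " + capability);
    }
}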