On Aug 24, 2011, at 7:42 AM, Koert Kuipers wrote:
Does anyone have any experience using HP Proliant SL hardware for hadoop?
Yes. We've got ~600 of them.
We are currently using DL 160 and DL 180 servers, and the SL hardware seems to
fit the bill for our new servers in many ways.
On Aug 22, 2011, at 3:00 AM, אבי ווקנין wrote:
I assumed that the 1.7GB of RAM would be the bottleneck in my environment;
that's why I am trying to change it now.
I shut down the 4 datanodes with 1.7GB RAM (Amazon EC2 small instance) and
replaced them with
2 datanodes with 7.5GB RAM (Amazon
On Aug 21, 2011, at 11:00 PM, steven zhuang wrote:
Thanks Allen, I really wish there weren't such a version as 0.21.0. :)
It is tricky (lots of config work), but you could always run the two
versions in parallel on the same gear, distcp from 0.21 to 0.20.203, then
shut down the 0.21
On Aug 21, 2011, at 7:17 PM, Michel Segel wrote:
Avi,
First why 32 bit OS?
You have a 64-bit processor with 4 hyper-threaded cores, which looks like 8 CPUs.
With only 1.7GB of memory, there likely isn't much of a reason to use a
64-bit OS. The machines (as you point out) are already tight
On Aug 19, 2011, at 12:39 AM, steven zhuang wrote:
I updated my Hadoop cluster from 0.20.2 to the higher version
0.21.0 because of MAPREDUCE-1286, and now I have a problem running HBase on
it.
I saw that the 0.21.0 version is marked as unstable, unsupported, does
not include
On Aug 17, 2011, at 10:53 AM, Matt Davies wrote:
Hello,
I'm playing around with the Capacity Scheduler (coming from the Fair
Scheduler), and it appears that jobs submitted to a queue by the same user
are treated as FIFO. So, for example, if I submit job1 and job2 to the
low queue as
On Aug 17, 2011, at 12:36 AM, Steven Hafran wrote:
After reviewing the Hadoop docs, I've tried setting the following properties
when starting my streaming job; however, they don't seem to have any impact.
-jobconf mapred.tasktracker.reduce.tasks.maximum=1
The 'tasktracker' in the property name is the hint:
On Aug 15, 2011, at 9:00 PM, Chris Song wrote:
Why should Hadoop be built in Java?
http://www.quora.com/Why-was-Hadoop-written-in-Java
How would it be if Hadoop were implemented in C or Python?
http://www.quora.com/Would-Hadoop-be-different-if-it-were-coded-in-C-C++-instead-of-Java-How
On Jul 31, 2011, at 12:08 PM, jonathan.hw...@accenture.com wrote:
I was asked by our IT folks if we can put the Hadoop namenode's storage on a
shared disk storage unit.
What do you mean by shared disk storage unit? There are lots of
products out there
We really need to add a working example to the wiki and link it from the
FAQ page. Any volunteers?
On Jul 29, 2011, at 7:49 PM, Michael Segel wrote:
Here's the meat of my post earlier...
Sample code for putting a file on the cache:
DistributedCache.addCacheFile(new
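The call above is cut off by the archive. A minimal sketch of the usual
old-API pattern (hypothetical HDFS path and class name):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheExample {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheExample.class);
        // Ship an HDFS file to every task node; the '#lookup' fragment
        // becomes the symlink name in each task's working directory.
        DistributedCache.addCacheFile(new URI("/user/me/lookup.dat#lookup"), conf);
        DistributedCache.createSymlink(conf);
        // ...set input/output paths and formats, then JobClient.runJob(conf).
      }
    }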
On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:
Thanks, I'm independently doing some digging into Hadoop networking
requirements and
had a couple of quick follow-ups. Could I have some specific info on why
different data centers
cannot be supported for master
On Jul 18, 2011, at 5:01 PM, Rita wrote:
I made the big mistake of using the latest version, 0.21.0, and found a bunch
of bugs, so I got pissed off at HDFS. Then, after reading this thread, it
seems I should have used 0.20.x.
I really wish we could fix this on the website, marking 0.21.0 as
On Jul 18, 2011, at 6:02 PM, Rita wrote:
I am a dimwit.
We are conditioned by marketing that a higher number is always better.
Experience tells us that this is not necessarily true.
On Jul 11, 2011, at 9:28 AM, Juan P. wrote:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>
Single core machines with 600MB of RAM.
2x400m = 800m just for the heap of the map and reduce phases,
not counting the other memory that
On Jul 6, 2011, at 5:22 AM, Nitin Khandelwal wrote:
Hi,
I am using Hadoop 0.20.203 with the new API (mapreduce package). I want to
use JobPriority, but unfortunately there is no option to set that in Job (
the option is there in 0.21.0). Can somebody please tell me if there is a
workaround
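One possible workaround (a sketch, not from the thread): the new-API Job
exposes its underlying Configuration, and the old API's mapred.job.priority
property can be set on it directly:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class PriorityWorkaround {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "priority-demo");
        // 0.20.203's new-API Job has no setPriority(), but the old-API
        // property still drives the scheduler when the job is submitted.
        // Valid values: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.
        job.getConfiguration().set("mapred.job.priority", "HIGH");
      }
    }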
On Jun 15, 2011, at 3:18 AM, Steve Loughran wrote:
yes, put it in the same place on your HA storage and you may not even need to
reconfigure it. If you didn't shut down the filesystem cleanly, you'll need
to replay the edit logs.
As a sidenote...
Lots of weird
On Jun 13, 2011, at 5:52 AM, Steve Loughran wrote:
Unless your cluster is bigger than Facebook's, you have too many small files
+1
(I'm actually sort of surprised the NN is still standing with only 24MB. The
gc logs would be interesting to look at.)
I'd also likely increase the block
On Jun 12, 2011, at 11:15 PM, Nitesh Kaushik wrote:
Dear Sir/Madam,
I am Nitesh Kaushik, working with an institution dealing in
satellite images. I am trying to read bulk images
using Hadoop, but unfortunately that is not working; I am not
getting any clue how to
On May 27, 2011, at 7:26 AM, DAN wrote:
You see you have 2 Solaris servers for now, and dfs.replication is set
to 3.
These don't match.
That doesn't matter. HDFS will basically flag any files written with a
warning that they are under-replicated.
The problem is that
On May 27, 2011, at 1:18 PM, Xu, Richard wrote:
Hi Allen,
Thanks a lot for your response.
I agree with you that it does not matter with replication settings.
What really bothered me is: same environment, same configuration, Hadoop
0.20.203 takes us 3 minutes; why did 0.20.2 take 3 days?
Can
On May 17, 2011, at 1:01 PM, Mark question wrote:
Hi
I need to use hadoop-tool-kit for monitoring. So I followed
http://code.google.com/p/hadoop-toolkit/source/checkout
and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
Looking at the code, be aware
On May 17, 2011, at 3:11 PM, Mark question wrote:
So what other memory-consumption tools do you suggest? I don't want to do it
manually and dump statistics into a file, because the I/O will affect
performance too.
We watch memory with Ganglia. We also tune our systems such that a
task will
On May 11, 2011, at 11:11 AM, Adi wrote:
By our calculations hadoop should not exceed 70% of memory.
Allocated per node - 48 map slots (24 GB), 12 reduce slots (6 GB), 1 GB
each for DataNode/TaskTracker, and one JobTracker, totalling 33/34 GB of
allocation.
It sounds like you are only
On May 10, 2011, at 9:57 AM, Gang Luo wrote:
I was confused by the configuration and file system in Hadoop. When we create
a
FileSystem object and read/write something through it, are we writing to or
reading from HDFS?
Typically, yes.
Could it be local file system?
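To make the distinction concrete, a small sketch (illustrative class name):
which filesystem FileSystem.get() returns depends entirely on fs.default.name
in the configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;

    public class WhichFs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // HDFS if fs.default.name points at a namenode,
        // the local filesystem if it is left at file:///.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getUri());
        // The local filesystem can always be addressed explicitly:
        LocalFileSystem local = FileSystem.getLocal(conf);
        System.out.println(local.getUri());
      }
    }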
On Apr 13, 2011, at 12:38 PM, Jeffrey Wang wrote:
It's just in my home directory, which is an NFS mount. I moved it off NFS and
it seems to work fine. Is there some reason it doesn't work with NFS?
Locking on NFS--regardless of application--is a dice roll, especially when
client/server are
On Apr 5, 2011, at 5:22 AM, Matthew John wrote:
Can HDFS run over a raw disk which is mounted over a mount point with
no file system? Or does it interact only with a POSIX-compliant file
system?
It needs a POSIX file system.
On Apr 4, 2011, at 8:06 AM, Shuja Rehman wrote:
Hi All
I have created a map reduce job and, to run it on the cluster, I have
bundled all jars (hadoop, hbase, etc.) into a single jar, which increases the
size of the overall file. During the development process, I need to copy
this again and again
On Apr 1, 2011, at 9:47 PM, Guang-Nan Cheng wrote:
The problem is that the MapRed process seemingly can't load RVM. I added
/etc/profile.d/rvm.sh in hadoop-env.sh, but the script still fails with
the same error.
Add this to the .bashrc:
[ -x /etc/profile ] && . /etc/profile
and
On Apr 1, 2011, at 3:03 AM, Nitin Khandelwal wrote:
Hi,
I am right now stuck on the issue of division of tasks among slaves for a
job. Currently, as far as I know, hadoop does not allow us to fix/determine
in advance how many tasks of a job would run on each slave. I am trying to
design
On Mar 31, 2011, at 7:43 AM, XiaoboGu wrote:
I have trouble browsing the file system via the namenode web interface; the
namenode says in its log file that the -G option is invalid for getting the
groups for the user.
I don't, but I suspect you'll need to enable one of the POSIX
personalities
On Mar 26, 2011, at 3:50 PM, Ted Dunning wrote:
I think that the namenode remembers the rack. Restarting the datanode
doesn't make it forget.
Correct.
https://issues.apache.org/jira/browse/HDFS-870
On Mar 23, 2011, at 7:29 AM, Rita wrote:
I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
instead of the standard Hadoop distribution.
What do most people use? Is CDH free? Do they provide the tars, or do they
provide source code that I simply compile? Can I have
On Mar 23, 2011, at 2:02 PM, Keith Wiley wrote:
hadoop job -kill-task and -fail-task don't work for me. I get this kind of
error:
Exception in thread "main" java.lang.IllegalArgumentException: TaskAttemptId
string : task_201103101623_12995_r_00 is not properly formed
at
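(The likely cause: -kill-task and -fail-task take a task attempt ID, which
has the form attempt_201103101623_12995_r_000000_0; a task_... ID, let alone
a truncated one, won't parse.)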
On Mar 16, 2011, at 10:35 AM, W.P. McNeill wrote:
On HDFS, anyone can run hadoop fs -rmr /* and delete everything.
In addition to what everyone else has said, I'm fairly certain that
-rmr / is specifically safeguarded against. But /* might have slipped through
the cracks.
What are
(Removing common-dev, because this isn't a dev question)
On Feb 26, 2011, at 7:25 AM, bikash sharma wrote:
Hi,
I have a 10-node Hadoop cluster, where I am running some benchmarks for
experiments.
Surprisingly, when I initialize the Hadoop cluster
(hadoop/bin/start-mapred.sh), in many
On Mar 8, 2011, at 1:21 PM, Ratner, Alan S (IS) wrote:
We had tried putting all the libraries directly in HDFS with a pointer in
mapred-site.xml:
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/user/ngc/lib</value>
</property>
as described in
On Mar 6, 2011, at 9:37 PM, phil young wrote:
I do appreciate the body of work that this is all built on and your
responses, but I am surprised that there isn't a more comprehensive set of
How-Tos. I'll assist in writing up something like that when I get a chance.
Are other people using
I'm more than a little concerned that you missed the whole multiple
directories--including a remote one--for the fsimage thing (dfs.name.dir
takes a comma-separated list of directories, e.g. a couple of local disks
plus an NFS mount). That's probably the #1 thing that most of the big grids
do to maintain the NN data. I can only remember one failure where the NFS
copy wasn't used to
See also https://issues.apache.org/jira/browse/HADOOP-5670 .
On Feb 16, 2011, at 3:32 PM, Ted Yu wrote:
You need to develop externalization yourself.
Our installer uses placeholders such as:
<property>
  <name>fs.checkpoint.dir</name>
On Feb 11, 2011, at 6:12 AM, Alexander Schätzle wrote:
Hello,
I'm a little bit confused about the right key for specifying the User History
Location in CDH3B3 (which is Hadoop 0.20.2+737). Could anybody please give me
a short answer which key is the right one and which configuration file
On Feb 8, 2011, at 7:45 AM, Oleg Ruchovets wrote:
Hi, we are going to production and have some questions to ask:
We are using the 0.20_append version (as I understand, it is an HBase 0.90
requirement).
1) Currently we have to process 50GB of text files per day; it can grow to
150GB
On Feb 4, 2011, at 7:46 AM, Keith Wiley wrote:
I have since discovered that in the case of streaming, mapred.map.tasks is a
good way to achieve this goal. Ironically, if I recall correctly, this
seemingly obvious method for setting the number of mappers did not work so
well in my original
On Feb 3, 2011, at 9:16 AM, Keith Wiley wrote:
I've seen this asked before, but haven't seen a response yet.
If the input to a streaming job is not actual data splits but simply HDFS
file names which are then read by the mappers, then how can data locality be
achieved?
If I
On Feb 1, 2011, at 11:40 PM, Keith Wiley wrote:
I would really appreciate any help people can offer on the following matters.
When running a streaming job, -D, -files, -libjars, and -archives don't seem
to work, but -jobconf, -file, -cacheFile, and -cacheArchive do. With the
first four
On Jan 25, 2011, at 12:48 PM, Renaud Delbru wrote:
As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not
allow a hard upper limit on the number of concurrent tasks, does anybody know
of any other solutions to achieve this?
The specific change for capacity scheduler has been
On Jan 28, 2011, at 1:09 AM, rishi pathak wrote:
Hi,
Is there a way to drain a tasktracker? What we require is to not
schedule any more map/reduce tasks onto a tasktracker (mark it offline), but
the tasks already running should not be affected.
Decommissioning task trackers was
On Jan 28, 2011, at 5:47 PM, Keith Wiley wrote:
On Jan 28, 2011, at 15:50 , Greg Roelofs wrote:
Does your .so depend on any other potentially thread-unsafe .so that other
(non-Hadoop) processes might be using? System libraries like zlib are safe
(else they wouldn't make very good system
On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote:
Dear all,
Yesterday I was working on a cluster of 6 Hadoop nodes (load data, perform
some jobs). But today when I started my cluster, I came across a problem on
one of my datanodes.
Are you running this on NFS?
2011-01-11
On Jan 6, 2011, at 12:39 AM, Otis Gospodnetic wrote:
In the case of Hadoop, no. There has usually been at least a core dump,
message in syslog, message in datanode log, etc, etc. [You *do* have
cores
enabled, right?]
Hm, 'cores enabled'? What do you mean by that? Are you
On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:
Ah, more manual work! :(
You guys never have a JVM die 'just because'? I just had a DN's JVM die the
other day, just because, with no obvious cause. Restarting it brought it
back to life, and everything recovered smoothly. Had some
On Jan 5, 2011, at 7:57 PM, Lance Norskog wrote:
Isn't this what Ganglia is for?
No.
Ganglia does metrics, not monitoring.
On 1/5/11, Allen Wittenauer awittena...@linkedin.com wrote:
On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:
Ah, more manual work
On Jan 4, 2011, at 10:53 AM, sagar naik wrote:
The only reason I can think of for not starting a reduce task is to
avoid the unnecessary transfer of map output data in case of
failures.
Reduce tasks also eat slots while fetching the map output. On shared
grids, this can be extremely
On Jan 3, 2011, at 2:22 AM, Otis Gospodnetic wrote:
I see over on http://search-hadoop.com/?q=monit+daemontools that people *do*
use
tools like monit and daemontools (and a few other ones) to revive their
Hadoop processes when they die.
I'm not a fan of doing this for
On Dec 15, 2010, at 2:13 AM, maha wrote:
Hi everyone,
Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat, which is
supposed to put each file from the input directory in a SEPARATE split.
Is there some reason you don't just use normal InputFormat with an
extremely high
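If MultiFileInputFormat proves awkward, one common alternative (a sketch,
hypothetical class name, old API) is a regular input format that simply
refuses to split, so every file becomes exactly one map input:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Each input file yields exactly one split, and thus one mapper.
    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
      }
    }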
On Dec 15, 2010, at 9:26 AM, Konstantin Boudnik wrote:
Hey, commit rights won't give you a nice-looking certificate, will it? ;)
Isn't that what Photoshop is for?
On Dec 13, 2010, at 3:14 PM, Seth Lepzelter wrote:
Alright, a little further investigation along that line (thanks for the
hint, can't believe I didn't think of that), shows that there's actually a
carriage return character (%0D, aka \r) at the end of the filename.
This falls into
On Dec 13, 2010, at 8:51 AM, Seth Lepzelter wrote:
I've got a smallish cluster of 12 nodes, up from 6, that we're using to dip
our feet into Hadoop. One of my users has a few directories in his HDFS
home which he was using to test, and which exist, according to
hadoop fs -ls home
On Dec 11, 2010, at 3:09 AM, Rob Stewart wrote:
Or - is there support in Hadoop for multi-core nodes?
Be aware that writing a job that specifically uses multi-threaded
tasks usually means that a) you probably aren't really doing map/reduce
anymore, and b) the job will likely tickle
On Nov 19, 2010, at 5:56 PM, Skye Berghel wrote:
All of the information I've seen online suggests that this is because
mapreduce.jobtracker.address is set to local. However, in
conf/mapred-site.xml I have
<property>
  <name>mapreduce.jobtracker.address</name>
[Yes, gmail people, this likely went to your junk folder.]
On Nov 13, 2010, at 5:28 PM, Lance Norskog wrote:
It is considered good manners :)
Seriously, if you want to attract a community you have an obligation
to tell them when you're going to jerk the rug out from under their
feet.
On Nov 14, 2010, at 12:05 AM, Allen Wittenauer wrote:
The rug has been jerked in various ways for every micro version since
as long as I've been with Hadoop.
s,micro,minor,
But I'm sure you knew that.
On Nov 7, 2010, at 9:44 AM, Zhenhua Guo wrote:
It seems that currently Hadoop assumes that it is installed under the
same directory using the same userid on different machines.
Currently, the following two things cannot be done without hacks (correct
me if I am wrong):
1) install Hadoop as
On Nov 7, 2010, at 10:48 AM, sudharshan wrote:
Hello, I am a newbie to the Hadoop environment. I got stuck with a problem
running a jar file in Hadoop.
I have a mapreduce application for which I have created a web interface to
run it. The application runs perfectly from the command-line terminal. But
On Nov 5, 2010, at 8:33 AM, Shavit Netzer wrote:
Hi,
I have a Hadoop cluster with 24 nodes.
Each node has 4 mounted disks, mnt - mnt3.
I'm getting alerts on mnt: the capacity is 99% full on it.
I have free space on the other partitions, mnt1 - mnt3.
I used the balancer, but I still get
On Nov 3, 2010, at 7:27 PM, David B. Ritch wrote:
The parameter mapred.hosts.exclude has existed in documentation for many
versions, but I do not believe it has ever been implemented in the
actual code.
mradmin -refreshNodes was added in 0.21. So code definitely exists for it.
That said,
On Oct 27, 2010, at 10:44 AM, Geet Garg wrote:
I'm absolutely stuck. I've tried increasing the java heap size in
hadoop-env.sh. I've tried using parallelGC. Nothing seems to work.
Can anyone help me please?
hadoop-env.sh is for the daemons. You need to increase the heap in
http://wiki.apache.org/hadoop/LimitingTaskSlotUsage
On Oct 20, 2010, at 4:43 PM, bichonfrise74 wrote:
When I run hive, the derby.log will be created in the current directory. I
was looking around and found that if I create derby.properties and add the
line derby.stream.error.file=/hadoop/hive/logs/hadoop/derby.log, then the
derby.log will be
On Oct 18, 2010, at 3:33 AM, elton sky wrote:
When I use a blockSize bigger than 2GB, which is beyond the boundary of an
integer, something weird happens. For example, for a 3GB block it will
create more than 2 million packets.
Anyone noticed this before?
On Oct 18, 2010, at 4:08 PM, elton sky wrote:
Why would you want to use a block size of 2GB?
For keeping a map's input split in a single block.
Just use mapred.min.split.size + multifileinputformat.
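A hedged sketch of that suggestion (old API; the 3GB figure is arbitrary and
would be tuned to the file sizes involved):

    import org.apache.hadoop.mapred.JobConf;

    public class SplitSizeExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Raise the minimum split size so a whole file lands in one
        // map's input split without needing a 2GB+ HDFS block size.
        conf.setLong("mapred.min.split.size", 3L * 1024 * 1024 * 1024);
      }
    }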
On Oct 17, 2010, at 2:41 PM, Gregor Willemsen wrote:
Of course it is simple if you know about it. From a Linux and Java
point of view, the Solaris way is not that obvious.
Java has nothing to do with GNU tar's extensions or gcc.
Forward compatibility is definitely not the
On Oct 17, 2010, at 10:29 PM, shangan wrote:
Then you just write a shell script to remove the logs periodically as a
workaround? Or better ideas?
We basically have a cron job that does a few things as part of our maintenance.
We have it rigged up such that it runs on the namenode and then, over
On Oct 16, 2010, at 1:08 PM, Bruce Williams wrote:
I am doing a student Independent Study Project, and Harvey Mudd has given
me 13 Sun Netra X1s I can use as a dedicated Hadoop cluster. Right now they
are without an OS.
If anyone with experience with Hadoop and Solaris can contact me off
On Sep 23, 2010, at 11:57 AM, Andrew Purtell wrote:
From: Todd Lipcon t...@cloudera.com
[...]
4000 xcievers is a lot.
2:1 ratio of file descriptors to xceivers. 4000 xceivers is
quite normal on a heavily loaded HBase cluster in my experience.
We run with 10K xceivers...
The problem
It is a 'not enough information' error.
Check the tasks, jobtracker, tasktracker, datanode, and namenode logs.
On Sep 21, 2010, at 12:30 PM, C.V.Krishnakumar wrote:
Hi,
Just wanted to know if anyone has any idea about this one? This happens every
time I run a job.
Is this issue hardware
On Sep 10, 2010, at 11:53 AM, Mark wrote:
If I submit a jar that has a lib directory that contains a bunch of jars,
shouldn't those jars be in the classpath and available to all nodes?
Are you using distributed cache?
On Sep 9, 2010, at 10:13 AM, jiang licht wrote:
In case multiple folders from different disk drives are used for DFS on a
data node, what is the best way to balance their disk usage?
As I understand it, hadoop writes data to these folders in a round-robin
fashion. Most of the time, it reaches a
On Sep 9, 2010, at 3:15 PM, Mark wrote:
How would I go about gracefully stopping/aborting a job?
http://wiki.apache.org/hadoop/FAQ#A32
On Sep 8, 2010, at 10:00 AM, Harsh J wrote:
Hosts file or the slaves file? A valid datanode must be in the slaves
file. Alternatively you can see if they are 'triggered' to start by
start-dfs.sh or not.
No it doesn't.
The slaves file is only used by the start commands.
The hosts file is
On Sep 1, 2010, at 9:08 AM, Todd Lipcon wrote:
Currently hadoop gets its user groups from the posix user/groups.
... based upon what the client sends, not what the server knows.
Not anymore in trunk or the security branch - now it's mapped on the
server side with a configurable resolver
On Aug 31, 2010, at 12:58 PM, jiang licht wrote:
Is the number of nodes to be decommissioned bounded by replication factor?
No, it is bounded mainly by network bandwidth and NN/DN RPC call rate. I've
seen decommissions of like 400 nodes at once.
On Aug 23, 2010, at 6:49 AM, cliff palmer wrote:
Thanks Harsh, but I am still not sure I understand what is going on.
The directory specified in the dfs.name.dir property,
/var/lib/hadoop-0.20/dfsname, does exist and rights to that directory have
been granted to the OS user that is running
tail -f
On Aug 15, 2010, at 10:34 AM, Kris Jirapinyo wrote:
1) Our new cluster has 25 machines but 100 mappers. When distcp is
triggered, it seems to allocate 4 mappers per machine. Is this normal? The
issue here is that if, say, distcp only needs 8 mappers, I would think that
distcp would try to
On Aug 15, 2010, at 8:07 PM, Kevin . wrote:
I tried your recommendation, an absolute path, and it worked; I was able to
run the jobs successfully. Thank you!
I was wondering why hadoop.tmp.dir (or mapred.local.dir?) with a relative
path didn't work.
Allen's Hadoop Operations Rule #1: Nothing
On Aug 13, 2010, at 11:41 AM, Jinsong Hu wrote:
and run the namenode with the following jvm config
-Xmx1000m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:+UseCompressedOops
-XX:+DoEscapeAnalysis -XX:+AggressiveOpts -Xmx2G
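(Note that -Xmx appears twice; the JVM honors the last occurrence on the
command line, so this namenode actually runs with a 2GB heap. That is
presumably the point of quoting these flags.)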
On Aug 11, 2010, at 3:05 PM, Bobby Dennett wrote:
Has anyone hit any nasty issues with regards to the Capacity Scheduler
and, in general, are there any gotchas to look out for with either
scheduler?
We've been running Capacity for about 8 months now on all of our grids. In
order to make it
On Aug 10, 2010, at 6:46 AM, Raj V wrote:
I need to start setting up a large Hadoop cluster of 512 nodes. My biggest
problem is the SSH keys. Is there a simpler way of generating and exchanging
ssh keys among the nodes? Any best practices? If there is none, I could
volunteer to do
On Aug 10, 2010, at 7:07 AM, Brian Bockelman wrote:
Hi Erik,
You can also do this one-by-one (aka, a rolling reboot). Shut it down, wait
for it to be recognized as dead, then bring it back up with a new hostname.
It will take a much longer time, but you won't have any decrease in
On Aug 10, 2010, at 10:54 AM, Bill Graham wrote:
Is it correct to say that that would work fine? We have a replication factor
of 2, so we'd be copying twice as much data as we need to, so I'm sure
there's a more efficient approach.
It should work fine. But yes, highly inefficient.
What
On Aug 6, 2010, at 8:35 AM, He Chen wrote:
Way#3
1) bring up all 8 dn and the nn
2) retire one of your 4 nodes:
kill the datanode process
hadoop dfsadmin -refreshNodes (this should be done on nn)
No need to refresh nodes. It only re-reads the dfs.hosts.* files.
On Jul 22, 2010, at 4:40 AM, Vitaliy Semochkin wrote:
If it were context switching, would increasing the number of
mappers/reducers lead to a performance improvement?
Whoops, I misspoke. I meant process switching (which I guess is a form of
context switching). More on that later.
I have
On Jul 21, 2010, at 6:29 PM, Bobby Dennett wrote:
The team that manages our Hadoop clusters is currently being pressured
to reduce block replication from 3 to 2 in our production cluster. This
request is for various reasons -- particularly the reduction of used
space in the cluster and
On Jul 14, 2010, at 6:07 AM, Some Body wrote:
I moved everything out of lib/native/Linux-amd64/ and then linked libhadoop
to libz.
ldd libhadoop.so shows this:
Is libhadoop still in there?
On Jul 14, 2010, at 6:54 AM, Some Body wrote:
[r...@namenode] # ls -lgo
lrwxrwxrwx 1  6 2010-07-14 01:54 libhadoop.a -> libz.a
lrwxrwxrwx 1 18 2010-07-14 01:19 libhadoop.so -> libhadoop.so.1.0.0
lrwxrwxrwx 1 18 2010-07-14 01:19 libhadoop.so.1 -> libhadoop.so.1.0.0
lrwxrwxrwx
On Jul 14, 2010, at 8:45 AM, Some Body wrote:
I changed it back to the way it was. This is Cloudera's 0.20.2+228 release.
...
[r...@namenode] # ldd libhadoop.so.1.0.0
./libhadoop.so.1.0.0: /lib64/tls/libc.so.6: version `GLIBC_2.4' not found
(required by ./libhadoop.so.1.0.0)
OK, here's
On Jul 13, 2010, at 7:17 AM, Some Body wrote:
I followed the steps from the native library guide
We need to rewrite that guide. It is pretty clear that we have overloaded the
term 'native libraries' enough that no one understands what anyone else is
talking about.
1. put the OS's libz libs
When you write on a machine running a datanode process, the data is *always*
written locally first. This is to provide an optimization to the MapReduce
framework. The lesson here is that you should *never* use a datanode machine
to load your data. Always do it outside the grid.
On Jul 13, 2010, at 5:00 PM, u235sentinel wrote:
So we're talking to Dell about their new PowerEdge C2100 servers for a Hadoop
cluster, but I'm wondering: isn't this still a little overboard for nodes in
a cluster? I'm wondering if we bought, say, 100 PowerEdge 2750s instead of
just 50