Hey there,
I have a doubt about reduce tasks and block writes. Does a reduce task always
write to HDFS first on the node where it is placed? (and then these
blocks would be replicated to other nodes)
If so, say I have a cluster of 5 nodes, 4 of them run a DN and TT and one
(node A) just runs
Marc, see my inline comments.
On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese marc.sturl...@gmail.comwrote:
Hey there,
I have a doubt about reduce tasks and block writes. Does a reduce task always
write to HDFS first on the node where it is placed? (and then these
blocks would be
Assuming that node A only contains replicas, there is no guarantee that its
data would never be read.
First, you might lose a replica. The copy on node A could be used
to re-create the missing replica.
Second, data locality is best effort. If all the map slots are occupied
except one on
But since node A has no TT running, it will not run map or reduce tasks. When
the reducer node writes the output file, the first block will be written on the
local node and never on node A.
So, to answer the question, node A will contain copies of blocks of all output
files. It won't contain the
And that would help you with performance too.
Were you originally planning to have one file per Word document?
What is the average size of your Word documents?
It shouldn't be much. I am afraid your map startup time won't be negligible
in that case.
Regards
Bertrand
On Fri, Aug 24, 2012 at 8:07
Did you run the command bin/hadoop namenode -format before starting
the namenode?
On Fri, Aug 24, 2012 at 12:58 PM, Abhay Ratnaparkhi
abhay.ratnapar...@gmail.com wrote:
Hello,
I had a running Hadoop cluster.
I restarted it and after that the namenode is unable to start. I am getting an
error
Hi,
Thank you for the suggestion. Actually I was using POI to extract the text, but
since I now have so many documents I thought I would use Hadoop directly to
parse them as well. The average size of each document is around 120 KB. Also I want to
read multiple lines from the text until I find a blank
Hi,
Have you run the command namenode -format?
Thanks and regards,
Vivek
On Fri, Aug 24, 2012 at 12:58 PM, Abhay Ratnaparkhi
abhay.ratnapar...@gmail.com wrote:
Hello,
I had a running Hadoop cluster.
I restarted it and after that the namenode is unable to start. I am getting an
error saying that
Hi Abhay
What is the value of hadoop.tmp.dir or dfs.name.dir? If it was set to /tmp,
the contents would be deleted on an OS restart. You need to change this location
before you start your NN.
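For example, an entry along these lines in hdfs-site.xml keeps the namenode
metadata on a persistent disk (the path is only an illustration, not your actual
value):

    <property>
      <name>dfs.name.dir</name>
      <!-- illustrative path on a disk that survives reboots -->
      <value>/data/hadoop/dfs/name</value>
    </property>

hadoop.tmp.dir itself is set in core-site.xml and defaults to a directory under
/tmp, so it is worth pointing that somewhere persistent as well.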
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Abhay
Hello,
I was using the cluster for a long time and had not formatted the namenode.
I only ran the bin/stop-all.sh and bin/start-all.sh scripts.
I am using NFS for dfs.name.dir.
hadoop.tmp.dir points to a directory under /tmp. I have not restarted the OS.
Is there any way to recover the data?
Thanks,
Abhay
On Fri, Aug 24, 2012 at
Hello Siddharth,
You can tweak NLineInputFormat as per your requirement and use
it. It allows us to read a specified number of lines at a time,
unlike TextInputFormat. Here is a good post by Boris and Michael on
custom record readers. Also, I would suggest you
combine similar files together into one
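In case it helps, here is a rough sketch of a driver wiring in NLineInputFormat
with the old mapred API; the class name, paths, and the lines-per-map value are
placeholders and I haven't run this against your documents:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class NLineDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(NLineDriver.class);
        conf.setJobName("nline-demo");

        // Hand each map task N lines of input instead of one whole block
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 10); // placeholder, tune it

        // Plug in your own mapper/reducer here, e.g.
        // conf.setMapperClass(YourDocMapper.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

Each map task then receives 10 lines of the input at a time, which also cuts down
the per-task startup overhead on many small files.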
Hi Sonal,
Thanks for your information.
Because some users want to modify the Hadoop source code, these
users want to install their own Hadoop version on the same cluster. After
they modify their Hadoop version, they may compile and install the modified
version and don't
Hi Li,
The approach of everyone having their own version of Hadoop on the same
cluster is way too complicated.
A good approach would be for all of them to test the patches they are compiling
for Hadoop in pseudo-distributed mode on their personal laptops/desktops and then
merge the features or patches into a single
Hi, a VPN or simply first uploading the files to an EC2 node is the best option,
but an alternative is to use the external interface/IP instead of the
internal one in the Hadoop config. I assume this will be slower and more
costly...
-Håvard
On Fri, Aug 24, 2012 at 4:54 AM, igor Finkelshteyn
You should start with a reboot of the system.
A lesson to everyone: this is exactly why you should have a secondary
namenode
(http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F)
and run the namenode on a mirrored RAID-5/10 disk.
-Håvard
On Fri, Aug 24, 2012 at
All the links at: http://www.apache.org/dyn/closer.cgi/hadoop/common/ are
returning 404s, even the backup site at:
http://www.us.apache.org/dist/hadoop/common/. However, the eu site:
http://www.eu.apache.org/dist/hadoop/common/ does work.
-Steven Willis
I just tried and could go to
http://apache.techartifact.com/mirror/hadoop/common/hadoop-2.0.1-alpha/
Is this still happening for you?
Best Regards,
Sonal
Crux: Reporting for HBase https://github.com/sonalgoyal/crux
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
Hi Abhay,
I totally agree with Bejoy. Can you paste your mapred-site.xml and
hdfs-site.xml content here?
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
“Every duty is holy, and devotion to duty is the highest form of worship of
God.”
Maybe other people
Hi Abhishek,
You can use fsck for this purpose:
hadoop fsck <HDFS directory> -files -blocks -locations --- displays what you
want
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
“Every duty is holy, and devotion to duty is the highest form of worship of
God.”
Hi there,
I just want to know: for hadoop dfs -mv, does 'mv' just change the
meta info, or does it really copy data around on HDFS? Thank you very much!
Thanks
Please email the CDH lists.
On Aug 24, 2012, at 2:34 AM, jing wang wrote:
Hi,
I'm curious about what the release notes
say: http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0+91.releasenotes.html
Is CDH4 based on ‘Release 2.0.0-alpha’
Any help on the below would be really appreciated. I am stuck with it.
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
“Every duty is holy, and devotion to duty is the highest form of worship of
God.”
Maybe other people will try to limit me but I don't limit