doubt about reduce tasks and block writes

2012-08-24 Thread Marc Sturlese
Hey there, I have a doubt about reduce tasks and block writes. Does a reduce task always first write to HDFS on the node where it is placed? (And would those blocks then be replicated to other nodes?) If so, if I have a cluster of 5 nodes, 4 of them run a DN and TT and one (node A) just runs

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Minh Duc Nguyen
Marc, see my inline comments. On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I have a doubt about reduce tasks and block writes. Does a reduce task always first write to HDFS on the node where it is placed? (And would those blocks then be

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Bertrand Dechoux
Assuming that node A only contains replicas, there is no guarantee that its data would never be read. First, you might lose a replica; the copy on node A could then be used to recreate the missing one. Second, data locality is best effort. If all the map slots are occupied except one on

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Raj Vishwanathan
But since node A has no TT running, it will not run map or reduce tasks. When the reducer node writes the output file, the first block will be written on the local node and never on node A. So, to answer the question, node A will contain copies of blocks of all output files. It won't contain the
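
For reference, where a reduce output's replicas actually landed can be checked with fsck; the output path below is only an example:

    hadoop fsck /user/marc/output/part-00000 -files -blocks -locations

This lists every block of the file together with the datanodes holding its replicas, so whether node A received copies can be verified directly.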

Re: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Bertrand Dechoux
And that would help you with performance too. Were you originally planning to have one file per Word document? What is the average size of your Word documents? It shouldn't be much. I am afraid your map startup time won't be negligible in that case. Regards Bertrand On Fri, Aug 24, 2012 at 8:07

Re: namenode not starting

2012-08-24 Thread Nitin Pawar
Did you run the command bin/hadoop namenode -format before starting the namenode? On Fri, Aug 24, 2012 at 12:58 PM, Abhay Ratnaparkhi abhay.ratnapar...@gmail.com wrote: Hello, I had a running hadoop cluster. I restarted it and after that the namenode is unable to start. I am getting an error
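
Note that the format command reinitializes the directories listed in dfs.name.dir and discards any existing filesystem metadata, so it is only meant for a brand-new cluster:

    bin/hadoop namenode -format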

RE: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Siddharth Tiwari
Hi, thank you for the suggestion. I was actually using POI to extract the text, but now that I have so many documents I thought I would use Hadoop directly to parse them as well. The average size of each document is around 120 KB. Also, I want to read multiple lines from the text until I find a blank

Re: namenode not starting

2012-08-24 Thread vivek
Hi, have you run the command namenode -format? Thanks and regards, Vivek On Fri, Aug 24, 2012 at 12:58 PM, Abhay Ratnaparkhi abhay.ratnapar...@gmail.com wrote: Hello, I had a running hadoop cluster. I restarted it and after that the namenode is unable to start. I am getting an error saying that

Re: namenode not starting

2012-08-24 Thread Bejoy KS
Hi Abhay, what is the value of hadoop.tmp.dir or dfs.name.dir? If it is set to /tmp, the contents would be deleted on an OS restart. You need to change this location before you start your NN. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhay
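
As an illustration (the path is arbitrary and the property name is the Hadoop 1.x one), keeping the namenode metadata out of /tmp just means pointing dfs.name.dir at a persistent location in hdfs-site.xml:

    <property>
      <name>dfs.name.dir</name>
      <value>/var/lib/hadoop/namenode</value>
    </property>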

Re: namenode not starting

2012-08-24 Thread Abhay Ratnaparkhi
Hello, I was using the cluster for a long time and have not formatted the namenode. I only ran the bin/stop-all.sh and bin/start-all.sh scripts. I am using NFS for dfs.name.dir, and hadoop.tmp.dir is a /tmp directory. I have not restarted the OS. Is there any way to recover the data? Thanks, Abhay On Fri, Aug 24, 2012 at

Re: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Mohammad Tariq
Hello Siddharth, you can tweak NLineInputFormat as per your requirement and use it. It lets you read a specified number of lines, unlike TextInputFormat. Here is a good post by Boris and Michael on custom record readers. I would also suggest you combine similar files together into one
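
As a baseline before any tweaking, the stock old-API NLineInputFormat takes its line count from a job property; a minimal sketch of setting it from the command line, assuming the driver class (here called WordExtractor, a made-up name) goes through ToolRunner so -D properties are picked up:

    hadoop jar wordjob.jar WordExtractor \
      -D mapred.line.input.format.linespermap=10 \
      /input/docs /output/docs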

Re: About many user accounts in hadoop platform

2012-08-24 Thread Li Shengmei
Hi Sonal, thanks for the information. Some users want to modify the Hadoop source code, so they want to install their own Hadoop version on the same cluster. After they modify their Hadoop version, they may compile and install the modified version and don't

Re: About many user accounts in hadoop platform

2012-08-24 Thread Nitin Pawar
Hi Li, the approach of everyone having their own version of Hadoop on the same cluster is far too complicated. A good approach would be for all of them to test the patches they are compiling for Hadoop in pseudo-distributed mode on their personal laptops/desktops and then merge the features or patches into a single

Re: Hadoop on EC2 Managing Internal/External IPs

2012-08-24 Thread Håvard Wahl Kongsgård
Hi, a VPN, or simply uploading the files to an EC2 node first, is the best option, but an alternative is to use the external interface/IP instead of the internal one in the Hadoop config. I assume this will be slower and more costly... -Håvard On Fri, Aug 24, 2012 at 4:54 AM, igor Finkelshteyn

Re: namenode not starting

2012-08-24 Thread Håvard Wahl Kongsgård
You should start with a reboot of the system. A lesson to everyone: this is exactly why you should have a secondary namenode (http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F) and run the namenode on a mirrored RAID-5/10 disk. -Håvard On Fri, Aug 24, 2012 at
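
If a secondary namenode had been taking checkpoints, the last one can be loaded back with the Hadoop 1.x importCheckpoint option, assuming fs.checkpoint.dir points at the secondary's checkpoint directory and dfs.name.dir at an empty directory:

    bin/hadoop namenode -importCheckpoint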

hadoop download path missing

2012-08-24 Thread Steven Willis
All the links at http://www.apache.org/dyn/closer.cgi/hadoop/common/ are returning 404s, even the backup site at http://www.us.apache.org/dist/hadoop/common/. However, the EU site http://www.eu.apache.org/dist/hadoop/common/ does work. -Steven Willis

Re: hadoop download path missing

2012-08-24 Thread Sonal Goyal
I just tried and could go to http://apache.techartifact.com/mirror/hadoop/common/hadoop-2.0.1-alpha/ Is this still happening for you? Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal

RE: namenode not starting

2012-08-24 Thread Siddharth Tiwari
Hi Abhay, I totally agree with Bejoy. Can you paste your mapred-site.xml and hdfs-site.xml content here? ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God. Maybe other people

RE: How do we view the blocks of a file in HDFS

2012-08-24 Thread Siddharth Tiwari
Hi Abhishek, you can use fsck for this purpose: hadoop fsck <HDFS directory> -files -blocks -locations displays what you want. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God.
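
A concrete invocation (the path is only an example) looks like this; the report shows each block's ID, size, and the datanodes holding its replicas:

    hadoop fsck /user/abhishek/data/part-00000 -files -blocks -locations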

unsubscribe

2012-08-24 Thread Dan Yi

easy mv or heavy mv

2012-08-24 Thread Yue Guan
Hi there, I just want to know: for hadoop dfs -mv, does 'mv' just change the meta info, or does it really copy data around on HDFS? Thank you very much!
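
For a rename within the same HDFS instance, a sketch of the command (the paths are made up) is:

    hadoop dfs -mv /user/yue/data/old_name /user/yue/data/new_name

This is recorded as a namespace change on the namenode; the blocks themselves are not copied or rewritten.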

Re: questions about CDH Version 4.0.1

2012-08-24 Thread Arun C Murthy
Please email the CDH lists. On Aug 24, 2012, at 2:34 AM, jing wang wrote: Hi, I'm curious about what the release notes say: http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0+91.releasenotes.html Is CDH4 based on 'Release 2.0.0-alpha'

RE: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Siddharth Tiwari
Any help on the below would be really appreciated. I am stuck with it. ** Cheers !!! Siddharth Tiwari Have a refreshing day !!! Every duty is holy, and devotion to duty is the highest form of worship of God. Maybe other people will try to limit me but I don't limit