Biggest cluster running YARN in the world?

2013-01-14 Thread Tan, Wangda
Hi guys, I've had a question in my head for a long time: what's the biggest cluster running YARN? I've heard rumors about the biggest clusters running MapReduce 1.0 having 10,000+ nodes, but I rarely hear such rumors about YARN. Any information about this is welcome, like inside information or

Re: Multi-threaded map task

2013-01-14 Thread Bertrand Dechoux
Well... it all depends on where your bottleneck is. Do a benchmark for your use case if it is critical. Multi-threading might be useful, but not always. And you would rather avoid locally shared mutable state, because it can become a pain to manage. But that doesn't mean you can't do
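For illustration, a minimal sketch of Hadoop's built-in MultithreadedMapper, which runs several copies of a delegate mapper inside one map task JVM; the delegate class, output types and thread count below are hypothetical, and the delegate must be thread-safe (no shared mutable state):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedSetup {
        // Hypothetical I/O-bound mapper; MultithreadedMapper will run several
        // instances of it concurrently, so it must not share mutable state.
        public static class IoBoundMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                // slow, per-record work (e.g. a remote lookup) would go here
                context.write(value, key);
            }
        }

        public static void configure(Job job) {
            // The job's mapper is MultithreadedMapper; the real work is
            // delegated to IoBoundMapper, run on 8 threads per map task.
            job.setMapperClass(MultithreadedMapper.class);
            MultithreadedMapper.setMapperClass(job, IoBoundMapper.class);
            MultithreadedMapper.setNumberOfThreads(job, 8);
        }
    }

This only helps when map() is dominated by waiting (network or disk); for CPU-bound work it usually gains nothing.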

Re: Multi-threaded map task

2013-01-14 Thread Mark Olimpiati
Never mind, it depends on the platform; in my case it would work fine. Thanks guys! Mark On Mon, Jan 14, 2013 at 12:23 PM, Mark Olimpiati markq2...@gmail.com wrote: Thanks Bertrand, I shall try it and hope to gain some speed. One last question though, do you think the threads used are user-level or

TupleWritable Format

2013-01-14 Thread Stuti Awasthi
Hi, I wanted to write my input file, which is in Text format, to TupleWritable, but when I collect the output using output.collect(intWritable, TupleWritable) it gives me an empty tuple. private final static IntWritable one = new IntWritable(1); public void map(LongWritable key, Text value,

Re: TupleWritable Format

2013-01-14 Thread Harsh J
As the doc on http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/join/TupleWritable.html states, I don't think you should be relying on the join package's TupleWritable directly and should instead either use your own tested implementation or an alternative, better serializer such as

RE: Exit code 126?

2013-01-14 Thread Dave Shine
I was referring to https://issues.apache.org/jira/browse/MAPREDUCE-2374 Dave Shine Sr. Software Engineer 321.939.5093 direct | 407.314.0122 mobile CI Boost™ Clients Outperform Online™ www.ciboost.com -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent:

RE: TupleWritable Format

2013-01-14 Thread Stuti Awasthi
Thanks Harsh, Actually I wanted TupleWritable for the MatrixMultiplication job in Mahout, which takes (IntWritable, TupleWritable) as input, but now I have created an input file with (IntWritable, VectorWritable), which worked perfectly fine for me. Thanks Stuti From: Harsh J
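For reference, a minimal sketch of how such an (IntWritable, VectorWritable) input could be written as a SequenceFile; the output path and vector values are made up, and the Mahout math classes are assumed to be on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.VectorWritable;

    public class WriteMatrixRows {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("matrix/rows.seq"); // hypothetical path

            // Each record is one matrix row: key = row index, value = the row
            // wrapped in a VectorWritable.
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, IntWritable.class, VectorWritable.class);
            try {
                writer.append(new IntWritable(0),
                        new VectorWritable(new DenseVector(new double[] {1.0, 2.0, 3.0})));
                writer.append(new IntWritable(1),
                        new VectorWritable(new DenseVector(new double[] {4.0, 5.0, 6.0})));
            } finally {
                writer.close();
            }
        }
    }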

Adding new HUG in Sao Paulo to HadoopUsersGroup wiki page

2013-01-14 Thread Paulo E . V . Magalhaes
Hi all, I need some help to get the Sao Paulo Hadoop users group in the HadoopUserGroups wiki page. Either giving me the rights or making the update will do. My id: PauloMagalhaes What I want to add: === South America === * [http://www.meetup.com/SaoPauloHUG/| Sao Paulo HUG]: Hadoop Users

Re: Compile error using contrib.utils.join package with new mapreduce API

2013-01-14 Thread Hemanth Yamijala
Hi, No. I didn't find any reference to a working sample. I also didn't find any JIRA that asks for a migration of this package to the new API. Not sure why. I have asked on the dev list. Thanks hemanth On Mon, Jan 14, 2013 at 6:25 PM, Michael Forage michael.for...@livenation.co.uk wrote:

Adding new HUG in Sao Paulo to HadoopUsersGroup

2013-01-14 Thread Paulo E. V. Magalhaes
Hi all, I need help to get the Sao Paulo Hadoop users group in the HadoopUserGroups wiki page. Either giving me the rights or making the update will do. My id: PauloMagalhaes What I want to add: === South America === * [http://www.meetup.com/SaoPauloHUG/| Sao Paulo HUG]: Hadoop Users Group in

Re: Exit code 126?

2013-01-14 Thread Jean-Marc Spaggiari
Thanks. I will see if I can migrate from 1.0.3 to something more recent... JM 2013/1/14, Dave Shine dave.sh...@channelintelligence.com: I was referring to https://issues.apache.org/jira/browse/MAPREDUCE-2374 Dave Shine Sr. Software Engineer 321.939.5093 direct | 407.314.0122 mobile CI

Re: Adding new HUG in Sao Paulo to HadoopUsersGroup

2013-01-14 Thread Harsh J
I've added you as a wiki contributor - feel free to edit the HUGs page yourself. Thanks for taking the time to contribute! On Mon, Jan 14, 2013 at 7:01 PM, Paulo E. V. Magalhaes paulo.magalh...@gmail.com wrote: Hi all, I need help to get the Sao Paulo Hadoop users group in the

Re: Adding new HUG in Sao Paulo to HadoopUsersGroup wiki page

2013-01-14 Thread Harsh J
Done. You can go ahead and edit it now. On Mon, Jan 14, 2013 at 7:21 PM, Paulo E. V. Magalhaes paulo.magalh...@gmail.com wrote: Hi all, I need some help to get the Sao Paulo Hadoop users group in the HadoopUserGroups wiki page. Either giving me the rights or making the update will do.

Re: question about ZKFC daemon

2013-01-14 Thread Colin McCabe
Hi ESGLinux, In production, you need to run QJM on at least 3 nodes. You also need to run ZKFC on at least 3 nodes. You can run them on the same nodes if you like, though. Of course, none of this is needed to set up an example cluster. If you just want to try something out, you can run

Re: question about ZKFC daemon

2013-01-14 Thread Colin McCabe
On Mon, Jan 14, 2013 at 11:49 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote: Hi ESGLinux, In production, you need to run QJM on at least 3 nodes. You also need to run ZKFC on at least 3 nodes. You can run them on the same nodes if you like, though. Er, this should read You also need to run

Re: /etc/hosts

2013-01-14 Thread Colin McCabe
Hi Pavel, There's some information about the need for 127.0.1.1 here: http://www.leonardoborda.com/blog/127-0-1-1-ubuntu-debian/ If you have more questions about this, I recommend asking on a Kerberos-related mailing list (perhaps the specific one for your Kerberos implementation). best, Colin

Map Failure reading .gz (gzip) files

2013-01-14 Thread Terry Healy
I'm trying to run a Map-only job using .gz input format. For testing, I have one compressed log file in the input directory. If the file is un-zipped, the code works fine. Watching the jobs with .gz input via the job tracker shows that the mapper apparently has read the correct number of records

Re: Some mappers are much slower than others in reading data from HDFS

2013-01-14 Thread Andy Isaacson
It's hard to speculate with the minimal information you've provided so far. Seems like it's time to break out the performance analysis toolkit and see what's going on differently between the fast and the slow nodes. I'd look at raw performance with dd, then watch behavior during mapper runs with

probably very stupid question

2013-01-14 Thread jamal sasha
Hi, Probably a very lame question. I have two documents and I want to find the overlap of both documents in map-reduce fashion and then compare the overlap (let's say I have some measure to do that). So this is what I am thinking: 1) Run the normal wordcount job on one document (

Issue with partitioning using streaming

2013-01-14 Thread Aleksandr Elbakyan
Hello All, I am trying to partition data and sort it in Hadoop streaming. Most of the time the data is sorted and partitioned correctly, but if I run it multiple times, sometimes data goes to the wrong partition. The data looks like: asdas 0 ada asdas 1 asd 12123 1 ccc 12123 0 xxx   hadoop  jar
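As an illustration of the usual remedy (make the partition depend only on the field that should group the records), a hedged Java sketch of a partitioner keyed on the first whitespace-separated field, so rows like "12123 1 ccc" and "12123 0 xxx" always reach the same reducer; in streaming the equivalent is typically configured through KeyFieldBasedPartitioner options:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch only: partitions on the first field of the key instead of the
    // whole key, so records that share that field cannot land in different
    // partitions across runs.
    public class FirstFieldPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            String firstField = key.toString().split("\\s+", 2)[0];
            return (firstField.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }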

hadoop namenode recovery

2013-01-14 Thread Panshul Whisper
Hello, Is there a standard way to prevent a NameNode crash from bringing down a Hadoop cluster? Or, what is the standard or best practice for overcoming the single point of failure problem in Hadoop? I am not ready to take chances on a production server with the Hadoop 2.0 Alpha release, which claims to

Re: hadoop namenode recovery

2013-01-14 Thread bejoy . hadoop
Hi Panshul, Usually, for reliability, there will be multiple dfs.name.dir locations configured, of which one would be a remote location such as an NFS mount, so that even if the NN machine crashes as a whole you still have the fs image and edit log on the NFS mount. This can be utilized for reconstructing the
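A minimal sketch of what that setting looks like, expressed programmatically only for illustration (in practice the value lives in hdfs-site.xml on the NameNode host; the paths below are hypothetical):

    import org.apache.hadoop.conf.Configuration;

    public class NameDirExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Comma-separated list of name directories: a local disk plus an
            // NFS mount, each holding a full copy of the fsimage and edit log.
            conf.set("dfs.name.dir", "/data/1/dfs/nn,/mnt/nfs/dfs/nn");
            System.out.println("dfs.name.dir = " + conf.get("dfs.name.dir"));
        }
    }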

Re: probably very stupid question

2013-01-14 Thread bejoy . hadoop
Hi Jamal I believe a reduce-side join is what you are looking for. You can use MultipleInputs to achieve a reduce-side join for this. http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html Regards Bejoy KS Sent from remote device, Please excuse typos
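A minimal sketch of that reduce-side join for the two-document overlap question; the class names are hypothetical, and each mapper simply tags words with the document they came from so the reducer can keep only the words seen in both:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class OverlapJob {
        // Tags every word from document A with "A".
        public static class DocAMapper extends Mapper<LongWritable, Text, Text, Text> {
            private static final Text TAG = new Text("A");
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String word : value.toString().split("\\s+")) {
                    if (!word.isEmpty()) ctx.write(new Text(word), TAG);
                }
            }
        }

        // Tags every word from document B with "B".
        public static class DocBMapper extends Mapper<LongWritable, Text, Text, Text> {
            private static final Text TAG = new Text("B");
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String word : value.toString().split("\\s+")) {
                    if (!word.isEmpty()) ctx.write(new Text(word), TAG);
                }
            }
        }

        // Emits a word only if it was seen in both documents.
        public static class OverlapReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text word, Iterable<Text> tags, Context ctx)
                    throws IOException, InterruptedException {
                boolean inA = false, inB = false;
                for (Text t : tags) {
                    if ("A".equals(t.toString())) inA = true;
                    else if ("B".equals(t.toString())) inB = true;
                }
                if (inA && inB) ctx.write(word, new Text("overlap"));
            }
        }

        public static void wire(Job job, Path docA, Path docB) {
            MultipleInputs.addInputPath(job, docA, TextInputFormat.class, DocAMapper.class);
            MultipleInputs.addInputPath(job, docB, TextInputFormat.class, DocBMapper.class);
            job.setReducerClass(OverlapReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
        }
    }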

Re: Map Failure reading .gz (gzip) files

2013-01-14 Thread bejoy . hadoop
Hi Terry When the file is unzipped and zipped, what is the number of map tasks running in each case? If the file is large, I assume the below should be the case: gz is not a splittable compression codec, so the whole file would be processed by a single mapper. And this might be causing the job to
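To confirm why, a small sketch (file name hypothetical) that asks Hadoop which codec it would pick for the input; for a .gz file it returns GzipCodec, which cannot be split, so TextInputFormat hands the whole file to one map task:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class GzipSplitCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            Path input = new Path("logs/access.log.gz"); // hypothetical file

            // The codec is chosen from the file extension; a gzip stream cannot
            // be read from the middle, hence a single mapper per .gz file.
            CompressionCodec codec = factory.getCodec(input);
            if (codec == null) {
                System.out.println(input + ": no codec detected, splits normally");
            } else {
                System.out.println(input + ": " + codec.getClass().getSimpleName()
                        + " detected; gzip input is not splittable");
            }
        }
    }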

Re: hadoop namenode recovery

2013-01-14 Thread Panshul Whisper
Thank you for the reply. Is there a way I can configure my cluster to switch to the Secondary NameNode automatically in case of a Primary NameNode failure? When I run my current Hadoop, I see both the primary and secondary NameNodes running. I was wondering what that Secondary

Re: hadoop namenode recovery

2013-01-14 Thread bejoy . hadoop
Hi Panshul The SecondaryNameNode is better known as the checkpoint node. At periodic intervals it merges the edit log from the NN with the FS image to prevent the edit log from growing too large. This is its main functionality. At any point the SNN would have the latest fs image but not the updated edit log.

Re: hadoop namenode recovery

2013-01-14 Thread Panshul Whisper
Hello Bejoy, Thank you for the information. About the Hadoop HA 2.x releases: they are in the Alpha phase and I cannot use them for production. For my requirements, the cluster is supposed to be extremely available; availability is of the highest concern. I have looked into different distributions as

Re: hadoop namenode recovery

2013-01-14 Thread Panshul Whisper
Hello, I have another idea for solving the single point of failure in Hadoop... What if I have multiple NameNodes set up and running behind a load balancer in the cluster? This way I can have multiple NameNodes at the single IP address of the load balancer, which resolves the problem of

Re: hadoop namenode recovery

2013-01-14 Thread nagarjuna kanamarlapudi
I am not sure if this is possible in the 0.2X or 1.0 releases of Hadoop. On Tuesday, January 15, 2013, Panshul Whisper wrote: Hello, I have another idea for solving the single point of failure in Hadoop... What if I have multiple NameNodes set up and running behind a load balancer in

Re: hadoop namenode recovery

2013-01-14 Thread anil gupta
Inline On Mon, Jan 14, 2013 at 7:48 PM, Panshul Whisper ouchwhis...@gmail.com wrote: Hello, I have another idea for solving the single point of failure in Hadoop... What if I have multiple NameNodes set up and running behind a load balancer in the cluster? This way I can have

Re: hadoop namenode recovery

2013-01-14 Thread Harsh J
It's very rare to observe an NN crash due to a software bug in production. Most of the time it's a hardware fault you should worry about. On 1.x, or any non-HA-carrying release, the best you can get to safeguard against a total loss is to have redundant disk volumes configured, one preferably over

Re: OutofMemoryError when running an YARN application with 25 containers

2013-01-14 Thread anil gupta
The following log tells you the exact error: JVMDUMP013I Processed dump event systhrow, detail java/lang/OutOfMemoryError. Exception in thread Thread-7 java.lang.OutOfMemoryError at ApplicationMaster.readMessage(ApplicationMaster.java:241) at

Re: Hadoop Pseudo-configuration

2013-01-14 Thread Nitin Pawar
See if this helps you: https://github.com/nitinpawar/hadoop/tree/master/installation It's not polished, but you should be able to get a working Hadoop setup from it. On Tue, Jan 15, 2013 at 11:55 AM, Shagun Bhardwaj shagun...@gmail.com wrote: Hi, I am not able to install Apache Hadoop in a Pseudo-distributed

Re: Hadoop Pseudo-configuration

2013-01-14 Thread Yuva Raj raghunapu
Check this out, it may be helpful; it illustrates the setup on Ubuntu, but I hope the steps are almost the same. http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Shagun Bhardwaj shagun...@gmail.com wrote: Hi, I am not able to install Apache Hadoop in a