Re: muti-thread mapreduce

2012-12-13 Thread Yu Yang
Thank you all. In fact, I don't expect this approach to improve performance. I need to process 3 different logs (each with a different format). I just want to start processing all 3 logs at the same time, all in this one program. But I can give a different separator to each

Re: muti-thread mapreduce

2012-12-13 Thread Harsh J
I suppose you could also leverage job configuration or per-input mapper impl. via MultipleInputs to do this. On Thu, Dec 13, 2012 at 5:44 PM, Yu Yang clouder...@gmail.com wrote: Thank you all. In fact, I don't expect that this way can help to enhance the performance. I need to process 3

Map output files and partitions.

2012-12-13 Thread Pedro Sá da Costa
Hi, There are only 2 types of map output files, Sequence and Text files. If those files are going to be used as input to several reduce tasks, they need to be partitioned into blocks. Are there any SEPARATOR bits that delimit each partition? Can I read a specific partition of a map output file? Is

Re: Map output files and partitions.

2012-12-13 Thread Harsh J
Map output files, by which you perhaps mean intermediate data files for temporary K/V persistence, are stored in IFiles. They use neither text nor sequence files (historically, though, they did use sequence files at some point). You can read the IFile's sources at

Re: Map output files and partitions.

2012-12-13 Thread Mohammad Tariq
Hello Pedro, The first part of your question is very well covered by Harsh. For the second part, the generation of partitions is governed by the getPartition() method of the 'Partitioner' interface; the number of partitions matches the number of reduce tasks. The default behavior is to create partitions based on hashing. You can have
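To illustrate the hash-based default mentioned above, here is a minimal sketch in plain Java with no Hadoop dependency. The class name HashPartitionSketch and the helper partitionAll are hypothetical and exist only for this example; the arithmetic in getPartition mirrors what the stock HashPartitioner does, but treat this as an approximation and not as the library's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone class; not part of Hadoop.
class HashPartitionSketch {

    // Mask off the sign bit so the value is non-negative, then take it
    // modulo the number of reduce tasks to pick a partition index.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Illustrative helper (an assumption for this example): groups keys
    // into one bucket per reduce task.
    static List<List<String>> partitionAll(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("alpha");
        keys.add("beta");
        keys.add("gamma");
        // Each inner list holds the keys that one reduce task would receive.
        System.out.println(partitionAll(keys, 2));
    }
}
```

Every record with the same key lands in the same partition, which is why a single reduce task sees all values for that key.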

Re: which version should I take

2012-12-13 Thread Mohammad Tariq
I agree with Harsh. Regards, Mohammad Tariq On Thu, Dec 13, 2012 at 12:26 PM, Harsh J ha...@cloudera.com wrote: If your production target is a bit far away, I'd encourage setting up and using the 2.x based releases for their feature set, which may aid you in your design. We'll be releasing

Re: which version should I take

2012-12-13 Thread Ivan Ryndin
Harsh, can you please tell me whether the 2.0.3 release will be ready by the end of Jan 2013? Regards, Ivan 2012/12/13 Harsh J ha...@cloudera.com If your production target is a bit far away, I'd encourage setting up and using the 2.x based releases for their feature set, which may aid you in your design.

Re: which version should I take

2012-12-13 Thread Harsh J
I do feel so. This is the ongoing discussion with further details: http://search-hadoop.com/m/4U27S1Zf9eF1 On Thu, Dec 13, 2012 at 2:53 PM, Ivan Ryndin iryn...@gmail.com wrote: Harsh, can you please tell me whether the 2.0.3 release will be ready by the end of Jan 2013? Regards, Ivan 2012/12/13 Harsh

Erasure Coding in HDFS

2012-12-13 Thread Pankaj Misra
Dear All, I was looking at options for reducing the overall cost of storage that is incurred due to replication of data across the datanodes for higher availability and data localization for processing. I stumbled on a few articles suggesting erasure coding (software-raid) as one such

compile hadoop-1.1.1 on zLinux using apache maven

2012-12-13 Thread Emile Kao
Hello Guys, Now that I have downloaded the hadoop 1.1.1 source tarball, I am trying to compile it for my platform (s390) running SLES 11. I am encountering a couple of problems for which I have some questions: 1) Is there an official guide from the hadoop project showing how to build a binary

eclipse plugin for hadoop 0.22.0

2012-12-13 Thread Jennifer Lopez
After all my R&D, I have set up hadoop 0.22.0 successfully. Right now, I am using Eclipse Indigo Service Release 2 and hadoop 0.22.0 on Windows 7. I am trying to use the eclipse plugin provided in the Hadoop package, but that does not seem to work. When I try adding a new Hadoop Location, I get an error

Re: compile hadoop-1.1.1 on zLinux using apache maven

2012-12-13 Thread Nicolas Liochon
branch-1 does not use Maven but Ant. There are some docs here: http://wiki.apache.org/hadoop/BuildingHadoopFromSVN, not sure they are totally up to date. On Thu, Dec 13, 2012 at 11:08 AM, Emile Kao emile...@gmx.net wrote: 3) Can I compile the package in a simpler way other than Maven?

Re: compile hadoop-1.1.1 on zLinux using apache maven

2012-12-13 Thread Jean-Marc Spaggiari
FYI, I compiled 1.0.3 successfully using Ant last week. So the steps still seem to be good. JM On 13 Dec 2012 05:28, Nicolas Liochon nkey...@gmail.com wrote: branch1 does not use maven but ant. There are some docs here: http://wiki.apache.org/hadoop/BuildingHadoopFromSVN, not sure it's

Re: which version should I take

2012-12-13 Thread Hernán Leoni
Great news, thanks a lot. So, yes, our production date would be around May or June 2013. Do you think we would have a production-ready, stable 2.x version by then? Thanks a lot, Hernan 2012/12/13 Harsh J ha...@cloudera.com I do feel so. This is the ongoing discussion with further

compile hadoop-1.1.1 on zLinux using apache maven

2012-12-13 Thread Lebikasa Kao
Hello Guys, Now that I have downloaded the hadoop 1.1.1 source tarball, I am trying to compile it for my platform (s390) running SLES 11. I am encountering a couple of problems for which I have some questions: 1) Is there an official guide from the hadoop project showing how to build a binary for

Re: compile hadoop-1.1.1 on zLinux using apache maven

2012-12-13 Thread Jean-Marc Spaggiari
Hi, Take a look here: I think you should be using ant instead... http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk#Building_branch-1 JM

Encrypted Shuffle Help

2012-12-13 Thread Ryan Garvey
Hi, I am relatively new to Hadoop and completely new to SSL encryption. I am having issues getting encrypted shuffle working on a small test cluster with MapReduce v1. I am using self-signed certificates I generated with the Java keytool. I followed the instructions on the site Apache Hadoop

Re: Sane max storage size for DN

2012-12-13 Thread Hemanth Yamijala
This is a dated blog post, so it would help if someone with current HDFS knowledge can validate it: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/ . There is a bit about the RAM required for the Namenode and how to compute it: You can look at the 'Namespace

patch pre-commit help in branch-1

2012-12-13 Thread pengwenwu2008
Hi all, I am relatively new to Hadoop and want to run a pre-commit check on branch-1 before submitting a patch to the community. However, there is no pre-commit job on the community Jenkins. Does anyone have a good suggestion, or can the community Jenkins help? Thanks in advance!! Regards, Wenwu,Peng

Re: Hadoop-1.1.1 namenode and datanode version mismatch

2012-12-13 Thread Harsh J
I do think running a build would cause that. Doing an 'ant clean' would resolve it for you. But I agree that the default version may perhaps be the release itself (although then it gets harder to identify that the user ran a build?), please file a JIRA for further discussion. On Fri, Dec 14,

How to submit Tool jobs programatically in parallel?

2012-12-13 Thread David Parks
I'm submitting unrelated jobs programmatically (using AWS EMR) so they run in parallel. I'd like to run an s3distcp job in parallel as well, but the interface to that job is a Tool, e.g. ToolRunner.run(...). ToolRunner blocks until the job completes though, so presumably I'd need to create a
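One way to keep a blocking call such as ToolRunner.run(...) from holding up the other submissions is to hand each call to its own thread. The sketch below is plain Java with no Hadoop dependency: the BlockingJob interface is a hypothetical stand-in for the blocking ToolRunner call, and ParallelToolSketch and runAll are names invented for this example. Whether a given Tool (s3distcp included) is safe to run this way inside one JVM is an assumption that would need checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-alone class; not part of Hadoop.
class ParallelToolSketch {

    // Stand-in for a blocking call such as ToolRunner.run(tool, args),
    // which returns the tool's exit code.
    interface BlockingJob {
        int run(String[] args) throws Exception;
    }

    // Starts every job on its own thread, then waits for all of them and
    // returns the exit codes in submission order.
    static List<Integer> runAll(List<BlockingJob> jobs, String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (BlockingJob job : jobs) {
                Callable<Integer> task = () -> job.run(args);
                futures.add(pool.submit(task)); // returns immediately
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> future : futures) {
                exitCodes.add(future.get()); // blocks until that job finishes
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<BlockingJob> jobs = new ArrayList<>();
        // In a real program each lambda would wrap one blocking Tool invocation.
        jobs.add(a -> { Thread.sleep(100); return 0; });
        jobs.add(a -> { Thread.sleep(100); return 0; });
        System.out.println(runAll(jobs, new String[0]));
    }
}
```

The caller only blocks when it asks for results, so other jobs can be submitted in between.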

Re: eclipse plugin for hadoop 0.22.0

2012-12-13 Thread Jennifer Lopez
I ran bin/hadoop tasktracker and that started it :) Thanks, Andy On Thu, Dec 13, 2012 at 10:40 PM, Kartashov, Andy andy.kartas...@mpac.ca wrote: #service --status-all

Re: How to submit Tool jobs programatically in parallel?

2012-12-13 Thread Manoj Babu
David, Try the following: instead of runJob(), you can use submitJob(). JobClient jc = new JobClient(job); jc.submitJob(job); Cheers! Manoj. On Fri, Dec 14, 2012 at 10:09 AM, David Parks davidpark...@yahoo.com wrote: I'm submitting unrelated jobs programmatically (using AWS EMR) so they

Re: How to submit Tool jobs programatically in parallel?

2012-12-13 Thread Manoj Babu
Can you show some sample code for submitting a distcp job? Cheers! Manoj. On Fri, Dec 14, 2012 at 11:44 AM, David Parks davidpark...@yahoo.com wrote: Can I do that with s3distcp / distcp? The job is being configured in the run() method of s3distcp (as it implements Tool). So I think I can’t