Re: Mapreduce and unit tests

2012-02-20 Thread Mohit Anchlia
Thanks! Also, is there some place that can help me set up Eclipse for MapReduce and Pig UDFs? Is there a Maven setup for that too? Thanks On Mon, Feb 20, 2012 at 6:20 PM, Brock Noland wrote: > Hi, > > On Mon, Feb 20, 2012 at 6:03 PM, Mohit Anchlia > wrote: > > Could someone give me some directions or

failed to import python package when using cacheArchive

2012-02-20 Thread devdoer bird
Hi: I packed a Python module into "mypackage.tar.gz" and uploaded it to HDFS, then accessed the package with "-cacheArchive /app/mypackage.tar.gz#mypackage". But the Python script failed to "import mypacakge"; it threw an import exception, "no module named mypacakge". I need some help. T
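
A minimal sketch of one possible cause, offered as an assumption rather than a diagnosis of this thread. With "-cacheArchive /app/mypackage.tar.gz#mypackage", Hadoop unpacks the archive and links the unpacked directory into the task's working directory under the name after the "#". If the tarball itself contains a top-level "mypackage/" directory, the importable package ends up at ./mypackage/mypackage, so the link directory has to be on sys.path before the import. The helper name add_archive_to_path and its cwd parameter are hypothetical.

```python
import os
import sys


def add_archive_to_path(link_name="mypackage", cwd=None):
    """Prepend the unpacked cache-archive directory to sys.path.

    link_name is the part after '#' in the -cacheArchive argument.
    cwd defaults to the task's current working directory; it is a
    parameter only so the helper can be exercised outside Hadoop.
    """
    base = os.path.abspath(os.path.join(cwd or os.getcwd(), link_name))
    if base not in sys.path:
        sys.path.insert(0, base)
    return base


# In the streaming script, call this before the import, for example:
#   add_archive_to_path("mypackage")
#   import mypackage
```

The alternative is to build the tarball so that __init__.py sits at the archive root; the link directory itself is then the package and no sys.path change should be needed.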

Re: Mapreduce and unit tests

2012-02-20 Thread Brock Noland
Hi, On Mon, Feb 20, 2012 at 6:03 PM, Mohit Anchlia wrote: > Could someone give me some directions or examples of writing mapreduce and > unit tests to test them? There is an Apache project for this called MRUnit: http://cwiki.apache.org/confluence/display/MRUNIT Example: https://cwiki.apache.or
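
MRUnit is a Java library, so the sketch below is not MRUnit. It only illustrates the same idea for Hadoop Streaming style Python code: keep the map and reduce logic in plain functions and test them without a cluster. The word-count job and the names map_words and reduce_counts are hypothetical.

```python
import unittest


def map_words(line):
    """Mapper logic: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reducer logic: sum all counts that were grouped under one word."""
    return (word, sum(counts))


class WordCountTest(unittest.TestCase):
    def test_mapper_lowercases_and_emits_ones(self):
        self.assertEqual(map_words("Hadoop hadoop"),
                         [("hadoop", 1), ("hadoop", 1)])

    def test_reducer_sums_counts(self):
        self.assertEqual(reduce_counts("hadoop", [1, 1]), ("hadoop", 2))
```

Saved as a module, the tests can be run with "python -m unittest" followed by the module name.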

Mapreduce and unit tests

2012-02-20 Thread Mohit Anchlia
Could someone give me some directions or examples of writing MapReduce jobs and unit tests for them? Also, I need some help setting it up in Eclipse.

Re: Optimized Hadoop

2012-02-20 Thread Schubert Zhang
We just updated the slides on these improvements: http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a Updates: (1) revised some descriptions to make things clearer and more accurate. (2) added some benchmarks to support the points made. On Sat, Feb 18, 2012 at 11:12 PM, Anty wrote:

Re: Query regarding Hadoop Partitioning

2012-02-20 Thread Piyush Kansal
Thanks Harsh. I will try it and will get back to you. On Mon, Feb 20, 2012 at 3:55 AM, Harsh J wrote: > I do not think you can do it out of the box with streaming, but > last.fm's Dumbo (highly recommended if you use Python M/R) and its > add-on Feathers libraries can do it apparently. > > See E

Re: Query regarding Hadoop Partitioning

2012-02-20 Thread Harsh J
I do not think you can do it out of the box with streaming, but last.fm's Dumbo (highly recommended if you use Python M/R) and its add-on Feathers libraries can do it apparently. See Erik Forsberg's detailed answer (second) on http://stackoverflow.com/questions/1626786/generating-separate-output-f

Re: Query regarding Hadoop Partitioning

2012-02-20 Thread Piyush Kansal
Thanks for the immediate reply, Harsh. I will try using it. By the way, can't we achieve the same goal with Hadoop Streaming (using Python)? On Mon, Feb 20, 2012 at 2:59 AM, Harsh J wrote: > Piyush, > > Yes. Currently the partitioned data is always sorted by (and then > grouped by) keys before th

Re: Query regarding Hadoop Partitioning

2012-02-20 Thread Harsh J
Piyush, Yes. Currently the partitioned data is always sorted by (and then grouped by) keys before the reduce() calls begin. On Mon, Feb 20, 2012 at 12:51 PM, Piyush Kansal wrote: > Thanks Harsh. > > But will it also sort the data as Partitioner does? > > > On Sun, Feb 19, 2012 at 10:54 PM, Harsh
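
The ordering described in this reply (partition, then sort by key, then group by key before reduce() is called) can be sketched as a small in-memory simulation. This is not Hadoop code. The function names and the CRC32-based default partitioner are assumptions made for illustration; Hadoop's own default partitioner hashes the key differently.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def default_partition(key, num_partitions):
    """Hypothetical stand-in for a hash partitioner (deterministic CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(pairs, num_partitions, partition_fn=default_partition):
    """Assign (key, value) pairs to partitions, then sort and group each one.

    Returns one list per partition; each holds (key, [values]) entries in
    key order, which is the shape of input a reducer sees.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_fn(key, num_partitions)].append((key, value))

    grouped = []
    for part in partitions:
        # Sort by key only. Python's sort is stable, so values keep their
        # arrival order here; Hadoop makes no such promise for values.
        part.sort(key=itemgetter(0))
        grouped.append([(key, [value for _, value in group])
                        for key, group in groupby(part, key=itemgetter(0))])
    return grouped
```

For example, with a custom partition function that sends keys below "b" to partition 0 and all others to partition 1, shuffle([("b", 1), ("a", 2), ("b", 3), ("c", 4)], 2, lambda k, n: 0 if k < "b" else 1) returns [[("a", [2])], [("b", [1, 3]), ("c", [4])]].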