Thanks! Also, is there some place that can help me set up Eclipse for
MapReduce and Pig UDFs? Is there a Maven setup for that too?
Thanks
On Mon, Feb 20, 2012 at 6:20 PM, Brock Noland wrote:
> Hi,
>
> On Mon, Feb 20, 2012 at 6:03 PM, Mohit Anchlia
> wrote:
> > Could someone give me some directions or
Hi,
I packed a Python module into "mypackage.tar.gz" and uploaded it to HDFS, then
referenced the package with " -cacheArchive /app/mypackage.tar.gz#mypackage".
But the Python script failed on "import mypacakge"; it threw an import
exception: "no module named mypacakge".
I need some help.
T
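One possible cause, offered as a guess rather than a diagnosis: with
-cacheArchive, the archive is unpacked under the link name given after "#", so
if the tarball itself has a top-level package directory, the package ends up
one level down (for example ./mypackage/mypackage/__init__.py) and a plain
import from the task's working directory will not find it. A minimal sketch of
a workaround, assuming that layout; the helper name add_archive_to_path is
hypothetical and not part of Hadoop:

```python
import os
import sys


def add_archive_to_path(link_name, workdir=None):
    """Put the unpacked cache archive directory on sys.path.

    link_name is the part after '#' in the -cacheArchive option.
    workdir defaults to the current working directory of the task.
    """
    base = workdir if workdir is not None else os.getcwd()
    archive_dir = os.path.abspath(os.path.join(base, link_name))
    if archive_dir not in sys.path:
        # Insert first so the unpacked archive wins over other entries.
        sys.path.insert(0, archive_dir)
    return archive_dir
```

In the streaming script, this would be called as add_archive_to_path("mypackage")
before the import statement. It is also worth checking that the name in the
import statement matches the package name exactly.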
Hi,
On Mon, Feb 20, 2012 at 6:03 PM, Mohit Anchlia wrote:
> Could someone give me some directions or examples of writing mapreduce and
> unit tests to test them?
There is an Apache project for this called MRUnit:
http://cwiki.apache.org/confluence/display/MRUNIT
Example: https://cwiki.apache.or
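MRUnit itself is a Java library, so the following is not MRUnit. It is only a
rough Python sketch of the same idea (feed a mapper known input and assert on
the emitted pairs), which may be useful for streaming jobs. The mapper
map_line is a hypothetical word-count example, not something from this thread:

```python
import unittest


def map_line(line):
    """Hypothetical word-count mapper: emit (word, 1) for each word in the line."""
    for word in line.split():
        yield word.lower(), 1


class MapLineTest(unittest.TestCase):
    def test_emits_one_pair_per_word(self):
        self.assertEqual(
            list(map_line("Hello hadoop hello")),
            [("hello", 1), ("hadoop", 1), ("hello", 1)],
        )

    def test_blank_line_emits_nothing(self):
        self.assertEqual(list(map_line("   ")), [])
```

Keeping the mapper logic in a plain function, separate from the stdin/stdout
loop, is what makes it testable without a cluster.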
Could someone give me some directions or examples of writing MapReduce jobs
and unit tests for them?
I also need some help setting it up in Eclipse.
We just updated the slides for these improvements:
http://www.slideshare.net/hanborq/hanborq-optimizations-on-hadoop-mapreduce-20120216a
Updates:
(1) Revised some descriptions to make things clearer and more accurate.
(2) Added some benchmarks to support the results.
On Sat, Feb 18, 2012 at 11:12 PM, Anty wrote:
Thanks Harsh. I will try it and will get back to you.
On Mon, Feb 20, 2012 at 3:55 AM, Harsh J wrote:
> I do not think you can do it out of the box with streaming, but
> last.fm's Dumbo (highly recommended if you use Python M/R) and its
> add-on Feathers libraries can do it apparently.
>
> See E
I do not think you can do it out of the box with streaming, but
last.fm's Dumbo (highly recommended if you use Python M/R) and its
add-on Feathers libraries can do it apparently.
See Erik Forsberg's detailed answer (second) on
http://stackoverflow.com/questions/1626786/generating-separate-output-f
Thanks for the immediate reply, Harsh. I will try using it.
By the way, can't we achieve the same goal with Hadoop Streaming (using
Python)?
On Mon, Feb 20, 2012 at 2:59 AM, Harsh J wrote:
> Piyush,
>
> Yes. Currently the partitioned data is always sorted by (and then
> grouped by) keys before th
Piyush,
Yes. Currently the partitioned data is always sorted by (and then
grouped by) keys before the reduce() calls begin.
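To illustrate the order of operations, here is a toy simulation in plain
Python, not Hadoop code. Each pair is first assigned to a partition, each
partition is then sorted by key, and reduce is called once per key group. The
function names and the byte-sum partition function are made up for this
sketch; Hadoop's default partitioner hashes the key instead.

```python
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Stand-in for a hash partitioner: it only picks a reducer, it does not sort.
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Assign each (key, value) pair to a partition, then sort each partition by key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def run_reducers(partitions, reduce_fn):
    """Group each sorted partition by key and call reduce_fn once per key."""
    results = []
    for part in partitions:
        for key, group in groupby(part, key=itemgetter(0)):
            results.append(reduce_fn(key, [value for _, value in group]))
    return results
```

For example, run_reducers(shuffle(pairs, 2), lambda k, vs: (k, sum(vs))) sums
the values per key, with each simulated reducer seeing its keys in sorted order.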
On Mon, Feb 20, 2012 at 12:51 PM, Piyush Kansal wrote:
> Thanks Harsh.
>
> But will it also sort the data as the Partitioner does?
>
>
> On Sun, Feb 19, 2012 at 10:54 PM, Harsh