Rules of thumb IMO:
You should use Pig in place of MR jobs whenever performance isn't
absolutely crucial. Writing unnecessary MR is needless technical
debt that you will regret as people are replaced and your organization
scales. Pig gets it done in much less time. If you need fas
Streaming is good for simulation: long-running map-only processes, where Pig
doesn't really help and it is simple to fire off a streaming process. You do
have to set some options so that tasks can take a long time to return, or have
them return counters while they run.
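As a rough illustration of the streaming case above, here is a minimal sketch of a map-only mapper that keeps a long-running task alive by reporting status and counters. It relies on the Hadoop Streaming convention that lines written to stderr in the form `reporter:counter:<group>,<counter>,<amount>` or `reporter:status:<message>` are treated as progress updates. The `simulate` function, the counter names, and the one-integer-seed-per-line input format are hypothetical stand-ins, not anything from this thread.

```python
#!/usr/bin/env python
"""Sketch of a map-only Hadoop Streaming mapper: one long simulation per input line."""
import sys


def counter_line(group, name, amount=1):
    # Hadoop Streaming parses stderr lines of this form as counter increments.
    return "reporter:counter:%s,%s,%d\n" % (group, name, amount)


def status_line(message):
    # Status updates also count as progress, so the task is not killed as hung.
    return "reporter:status:%s\n" % message


def simulate(seed, steps=100000, report=None, report_every=10000):
    # Hypothetical stand-in for a real simulation: a deterministic LCG walk.
    state = seed
    for step in range(1, steps + 1):
        state = (state * 1103515245 + 12345) % (2 ** 31)
        if report is not None and step % report_every == 0:
            report("seed %d: step %d of %d" % (seed, step, steps))
    return state


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    # Assumes each non-empty input line holds a single integer seed.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seed = int(line)
        result = simulate(seed, report=lambda msg: err.write(status_line(msg)))
        out.write("%d\t%d\n" % (seed, result))
        err.write(counter_line("Simulation", "runs_completed"))
        err.flush()
```

To use it as the mapper script, call `run_mapper(sys.stdin)` at the bottom of the file. The job-level options would typically be zero reducers (`mapred.reduce.tasks=0`) and, if a single step can stay silent for a long time, a larger `mapred.task.timeout` (in milliseconds); those are the property names as I recall them for Hadoop of this era, so check them against your version.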
Russell Jurney http://datasyndrome.com
On Mar 5, 2012, at 1
I'm really interested in this as well. I have trouble seeing a really
good use case for streaming map-reduce. Is there something I can do in
streaming that I can't do in Pig? If I want to re-use previously made
Python functions from my code base, I can do that in Pig as much as
Streaming, and f
On Fri, Mar 2, 2012 at 12:38 PM, Harsh J wrote:
> On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote:
> > Hello Folks,
> >
> > Are there any pointers to such comparisons between Apache Pig and Hadoop
> > Streaming Map Reduce jobs?
>
> I do not see why you seek to compare these two. Pig offers a la
Thank you Jie!
I have downloaded Pig Experience and will read it.
On Fri, Mar 2, 2012 at 12:36 PM, Jie Li wrote:
> Considering Pig essentially translates scripts into Map Reduce jobs, one
> can always write as good Map Reduce jobs as Pig does. You can refer to "Pig
> experience" paper to see th
On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote:
> Hello Folks,
>
> Are there any pointers to such comparisons between Apache Pig and Hadoop
> Streaming Map Reduce jobs?
I do not see why you seek to compare these two. Pig offers a language
that lets you write data-flow operations and runs these st
Considering that Pig essentially translates scripts into MapReduce jobs, one
can always write MapReduce jobs that are as good as the ones Pig generates.
You can refer to the "Pig Experience" paper to see the overhead Pig
introduces, though it has been improving all the time.
Btw if you really care about the performance, how you conf
Hello Folks,
Are there any pointers to such comparisons between Apache Pig and Hadoop
Streaming Map Reduce jobs?
Also, there was a claim in our company that Pig performs better than
MapReduce jobs. Is this true? Are there any such benchmarks available?
Thanks, Subir