Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-06 Thread Russell Jurney
Rules of thumb IMO: You should be using Pig in place of MR jobs whenever performance isn't absolutely crucial. Writing unnecessary MR is needless technical debt that you will regret as people are replaced and your organization scales. Pig gets it done in much less time. If you need fas

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-05 Thread Russell Jurney
Streaming is good for simulation: long-running, map-only processes, where Pig doesn't really help and it is simple to fire off a streaming process. You do have to set some options so that the tasks can take a long time to return and can report counters. Russell Jurney http://datasyndrome.com On Mar 5, 2012, at 1
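Russell doesn't say which options he means, so this is only a minimal sketch under stated assumptions. Hadoop Streaming lets a task update its status and counters by writing `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>` lines to stderr, and periodic status updates are the usual way to keep a long, quiet task from being killed. The other knob is the task timeout itself (`mapred.task.timeout`, in milliseconds, in Hadoop 1.x). The `simulate` function, the seed-per-line input format and the `Simulation`/`RunsCompleted` counter names below are hypothetical.

```python
#!/usr/bin/env python3
"""Map-only Hadoop Streaming task that runs one simulation per input line
and keeps the framework informed through the stderr reporter protocol."""
import sys


def report_status(message, err):
    # Hadoop Streaming parses "reporter:status:<message>" lines on stderr.
    err.write(f"reporter:status:{message}\n")
    err.flush()


def incr_counter(group, counter, amount, err):
    # Hadoop Streaming parses "reporter:counter:<group>,<counter>,<amount>".
    err.write(f"reporter:counter:{group},{counter},{amount}\n")
    err.flush()


def simulate(seed, steps):
    # Hypothetical stand-in for an expensive simulation: a toy LCG walk.
    x = seed
    for _ in range(steps):
        x = (x * 1103515245 + 12345) % 2**31
        yield x


def run_simulations(lines, out, err, steps=1_000_000, report_every=10_000):
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        seed = int(line)  # assumed input format: one integer seed per line
        last = seed
        for i, value in enumerate(simulate(seed, steps), start=1):
            last = value
            if i % report_every == 0:
                # Periodic status updates are meant to keep a long run from timing out.
                report_status(f"seed {seed}: step {i}/{steps}", err)
        incr_counter("Simulation", "RunsCompleted", 1, err)
        out.write(f"{seed}\t{last}\n")  # map-only: this is the job's final output


if __name__ == "__main__":
    run_simulations(sys.stdin, sys.stdout, sys.stderr)
```

Launched through the streaming jar with `-numReduceTasks 0`, the mapper's output goes straight to the output directory with no shuffle, which is what makes this kind of job cheap to fire off.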

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-05 Thread Eli Finkelshteyn
I'm really interested in this as well. I have trouble seeing a really good use case for streaming map-reduce. Is there something I can do in streaming that I can't do in Pig? If I want to re-use previously made Python functions from my code base, I can do that in Pig as much as Streaming, and f
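On the reuse point, a Streaming mapper is just a program that reads lines on stdin and writes tab-separated key/value lines to stdout, so an existing function can be wrapped as-is. A minimal sketch: `normalize_domain` is a hypothetical stand-in for a function from your own code base (normally imported rather than defined inline). For comparison, Pig 0.8 and later can register a Python file as Jython UDFs (`REGISTER 'udfs.py' USING jython AS myfuncs;`), with the caveat that Jython cannot load CPython C extensions.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper that reuses an existing Python function."""
import sys
from urllib.parse import urlparse


def normalize_domain(url):
    # Hypothetical "existing" helper; normally: from mylib import normalize_domain
    host = urlparse(url.strip()).hostname or ""
    return host[4:] if host.startswith("www.") else host


def mapper(lines, out):
    # One URL per input line -> "domain<TAB>url"; by default Streaming
    # treats everything before the first tab as the key.
    for line in lines:
        url = line.rstrip("\n")
        domain = normalize_domain(url)
        if domain:  # drop lines that do not parse as URLs
            out.write(f"{domain}\t{url}\n")


if __name__ == "__main__":
    mapper(sys.stdin, sys.stdout)
```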

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-02 Thread Subir S
On Fri, Mar 2, 2012 at 12:38 PM, Harsh J wrote:
> On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote:
> > Hello Folks,
> >
> > Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs?
>
> I do not see why you seek to compare these two. Pig offers a la

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-02 Thread Subir S
Thank you Jie! I have downloaded the "Pig experience" paper and will read it.

On Fri, Mar 2, 2012 at 12:36 PM, Jie Li wrote:
> Considering Pig essentially translates scripts into Map Reduce jobs, one can always write Map Reduce jobs that are as good as the ones Pig generates. You can refer to the "Pig experience" paper to see th

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Harsh J
On Fri, Mar 2, 2012 at 10:18 AM, Subir S wrote:
> Hello Folks,
>
> Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs?

I do not see why you seek to compare these two. Pig offers a language that lets you write data-flow operations and runs these st
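To make the difference concrete, here is a sketch of word count written by hand for Streaming in Python. The two functions would be passed as commands to the streaming jar's `-mapper` and `-reducer` options (e.g. `python wc.py map`); the file name and the `map`/`reduce` argument switch are hypothetical conventions. The reducer relies on Streaming sorting map output by key, so identical words arrive on consecutive lines. In Pig Latin the same job is a few statements (LOAD, FOREACH ... FLATTEN(TOKENIZE(...)), GROUP, COUNT), and Pig plans the map, shuffle and reduce phases itself.

```python
#!/usr/bin/env python3
"""Word count as a Hadoop Streaming mapper and reducer."""
import sys
from itertools import groupby


def mapper(lines, out):
    # Emit "word<TAB>1" for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            out.write(f"{word}\t1\n")


def reducer(lines, out):
    # Streaming sorts map output by key before the reducer sees it,
    # so all lines for one word are adjacent and groupby can sum them.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        out.write(f"{word}\t{total}\n")


if __name__ == "__main__":
    # Hypothetical CLI convention: first argument selects "map" or "reduce".
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin, sys.stdout)
```

Everything Pig would handle for you (tokenizing, grouping, the key/value text protocol, choosing the role per phase) is explicit here, which is the maintenance cost the thread is weighing against raw control.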

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Jie Li
Considering Pig essentially translates scripts into Map Reduce jobs, one can always write Map Reduce jobs that are as good as the ones Pig generates. You can refer to the "Pig experience" paper to see the overhead Pig introduces, though it has been improving all the time. Btw, if you really care about performance, how you conf

Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Subir S
Hello Folks,

Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? Also, someone in our company claimed that Pig performs better than Map Reduce jobs. Is this true? Are there any such benchmarks available?

Thanks,
Subir