-Original Message-
From: Ankur C. Goel [mailto:gan...@yahoo-inc.com]
Sent: Wednesday, February 24, 2010 1:24 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Algorithm implementations in Pig
Pallavi,
Thanks for your comments. Some clarifications w.r.t. Pig.
Pig does not generate any M/R code. What ...
... for sharing the information. I am
looking forward to experimenting with it.
Thanks
Pallavi
-Original Message-
From: Ankur C. Goel [mailto:gan...@yahoo-inc.com]
Sent: Wednesday, February 24, 2010 1:24 PM
To: mahout-dev@lucene.apache.org
Subject: Re: Algorithm implementations in Pig
Pallavi
Indeed. I have observed Pig running considerably faster than hand-written
MR programs, precisely because it is willing and able to do optimizations
that decrease the number of passes over the data. These optimizations break
abstraction boundaries in a way that would be very unpleasant or
Subject: Re: Algorithm implementations in Pig
As an interesting test case, can you write a pig program that counts
words.
BUT, it takes an input file name AND an input field name.
On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
That isn't an issue here
Hi,
Glad to hear that Mahout devs are interested in Pig. Actually, I believe
Pig is very helpful when you want to quickly implement a prototype of a
machine learning algorithm. And since Pig has a Java API, it is easy to integrate
Pig scripts with Java. Maybe we can start with implementing NB using
I see pig as useful for data preparation, but for any numerical tasks, it is
likely to be completely hopeless.
On Mon, Feb 22, 2010 at 12:16 AM, Jeff Zhang zjf...@gmail.com wrote:
Glad to hear that Mahout devs are interested in Pig. Actually, I believe
Pig is very helpful when you want
Pig only makes the implementation of map-reduce easier; the numerical
computation can be done in a UDF. And Piglet is a DSL on top of Pig Latin
that lets Pig support loops.
http://github.com/iconara/piglet
On Mon, Feb 22, 2010 at 4:25 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I see pig as
On Mon, Feb 22, 2010 at 1:55 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I see pig as useful for data preparation, but for any numerical tasks, it is
likely to be completely hopeless.
PIG will be a great tool to experiment quickly with algorithms. But, with
people here trying to focus on
Ted,
The latest Pig release, 0.6.0 on Hadoop 0.20, is a clear winner not just for
performance but also for doing a better job of managing memory in its MR job
pipeline. Also, support for both inner and outer skewed joins is something that I
found indispensable when dealing with really large
I'm all for Pig, especially once we are a TLP. I haven't had the proper time
to review the PLSI implementation, but it looks useful. I agree on the other
points, though, in that I think it would be nice to have consistent formats
based on Vector so that things can be more portable.
On
Seems like the guys at Twitter are going down the Pig/Hadoop route:
http://highscalability.com/blog/2010/2/19/twitters-plan-to-analyze-100-billion-tweets.html
It could be worth getting them on board the Mahout wagon, especially given the
previous discussion we had about classification efforts
Actually, no.
I meant other programs written in pure Java. It used to be that the very
restricted scripting ability of Pig made processing chains composed of Pig
and map-reduce programs very brittle. In fact, just gluing together
multiple Pig programs used to be very ugly.
On Mon, Feb 22, 2010
Has the interface for writing UDFs stabilized? For quite some time, the
UDF API was changing every 3 months.
On Mon, Feb 22, 2010 at 12:35 AM, Jeff Zhang zjf...@gmail.com wrote:
Pig only makes the implementation of map-reduce easier; the numerical
computation can be done in a UDF.
--
In the next Pig release (0.7), Pig's load/store functions will move to using
Hadoop's input/output formats, so there are some changes planned for that:
http://wiki.apache.org/pig/Pig070IncompatibleChanges
After that I don't expect any interface-level changes in UDFs.
-...@nkur
On 2/22/10 10:10
I agree with you, and while some of that has been remedied, I wouldn't say
things are perfect.
Scripting ability, while still limited, has better streaming support, so you can
have relations streamed into a custom script executing in either the map or the
reduce phase, depending on where it is placed.
If
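To make the streaming point concrete, here is a minimal sketch of the kind of custom script a relation could be streamed through. It assumes Pig's default streaming serialization (one tuple per line, fields separated by tabs, on both stdin and stdout). The script name tokenize.py, the column layout (an id in field 0, text in field 1) and the tokenizing behavior are all hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical filter for Pig's STREAM operator (tokenize.py).

Assumes each input tuple arrives on stdin as one tab-delimited line and
that each output line is read back by Pig as a tab-delimited tuple.
"""
import sys

TEXT_COLUMN = 1  # hypothetical: position of the text field in each tuple


def tokenize(lines, column=TEXT_COLUMN):
    """Yield one 'id<TAB>word' line per whitespace-separated token."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # skip malformed tuples instead of failing the task
        for word in fields[column].lower().split():
            yield f"{fields[0]}\t{word}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Pig feeds tuples on stdin and parses whatever we write to stdout.
    for out in tokenize(stdin):
        stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

On the Pig side, such a script would be registered with DEFINE (shipping the file to the cluster with SHIP) and invoked with STREAM ... THROUGH; whether it runs in the map or the reduce phase depends on where the STREAM statement is placed relative to operators like GROUP.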
That isn't an issue here. It is the invocation of pig programs and passing
useful information to them that is the problem.
On Mon, Feb 22, 2010 at 9:20 AM, Ankur C. Goel gan...@yahoo-inc.com wrote:
Scripting ability, while still limited, has better streaming support, so you
can have relations
As an interesting test case, can you write a pig program that counts words.
BUT, it takes an input file name AND an input field name.
On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning ted.dunn...@gmail.com wrote:
That isn't an issue here. It is the invocation of pig programs and passing
useful
Those would be passed as parameters, either through the -param option or through a
parameter file with the -param_file option, and Pig's preprocessor just
substitutes the values into your script.
Since it's just blind parameter substitution, in my shingling script I even
had the schema definition
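Ted's word-count test case and this answer can be illustrated together. The sketch below only mimics blind textual substitution using Python's string.Template; it is not Pig's preprocessor, which has its own syntax and extra features. The parameter names (input, schema, field, output), the file names, the simplified name=value file format, and passing the whole schema as one parameter are hypothetical choices. The embedded Pig Latin uses the built-in LOAD/PigStorage, TOKENIZE, GROUP and COUNT operators.

```python
"""Sketch of blind, textual parameter substitution over a Pig Latin script.

Mimics the idea behind Pig's -param/-param_file preprocessing with Python's
string.Template; it is NOT Pig's preprocessor. Parameter names are
hypothetical.
"""
from string import Template

# Word count over one named field of a tab-delimited file.
WORDCOUNT_PIG = Template(r"""
raw     = LOAD '$input' USING PigStorage('\t') AS ($schema);
words   = FOREACH raw GENERATE FLATTEN(TOKENIZE($field)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO '$output';
""")


def read_param_file(text):
    """Parse simple name=value lines, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, _, value = line.partition("=")
            params[name.strip()] = value.strip()
    return params


def substitute(template, params):
    # Blind substitution: values are pasted in as raw text, so a value can
    # carry a whole schema, not just a scalar. Missing names raise KeyError.
    return template.substitute(params)


if __name__ == "__main__":
    params = read_param_file("""
input=tweets.tsv
schema=id:chararray, text:chararray
field=text
output=wordcounts
""")
    print(substitute(WORDCOUNT_PIG, params))
```

With real Pig, the same values would be supplied on the command line through repeated -param options, or collected in a file passed with -param_file; values containing spaces or commas, like the schema here, would need shell quoting.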
Good answer.
On Mon, Feb 22, 2010 at 8:52 PM, Ankur C. Goel gan...@yahoo-inc.com wrote:
Those would be passed as parameters, either through the -param option or through
a parameter file with the -param_file option, and Pig's preprocessor just
substitutes the values into your script.
Since it's just a
Hi Folks,
I would like to know how the Mahout community feels about having
some of the Mahout algorithms implemented in Pig -
http://hadoop.apache.org/pig. The benefits of using Pig are many, including:
1. Small learning curve: people with a bit of SQL knowledge will find it very
I have had both positive and negative results with PIG.
The positive results were that I was able to express large recommendation
computations in a very concise way. That was really helpful.
My negative results have had to do with the brittle nature of PIG vis-à-vis
the version of the