subject:"RE\: Algorithm implementations in Pig"

Re: Algorithm implementations in Pig

2010-02-25 Thread Ankur C. Goel

m: Ankur C. Goel [mailto:gan...@yahoo-inc.com] Sent: Wednesday, February 24, 2010 1:24 PM To: mahout-dev@lucene.apache.org Subject: Re: Algorithm implementations in Pig Pallavi, Thanks for your comments. Some clarifications w.r.t. Pig. Pig does not generate any M/R code. What is it generates

Re: Algorithm implementations in Pig

2010-02-24 Thread Ted Dunning

Indeed. I have observed Pig running considerably faster than hand-written MR programs, precisely because it is willing and able to do optimizations that decrease the number of passes over the data. These optimizations break abstraction boundaries in a way that would be very unpleasant or infeasib

RE: Algorithm implementations in Pig

2010-02-24 Thread Palleti, Pallavi

sharing the information. I am looking forward to experimenting with it. Thanks Pallavi -Original Message- From: Ankur C. Goel [mailto:gan...@yahoo-inc.com] Sent: Wednesday, February 24, 2010 1:24 PM To: mahout-dev@lucene.apache.org Subject: Re: Algorithm implementations in Pig Pallavi

Re: Algorithm implementations in Pig

2010-02-23 Thread Ankur C. Goel

2, 2010 11:32 PM To: mahout-dev@lucene.apache.org Subject: Re: Algorithm implementations in Pig As an interesting test case, can you write a Pig program that counts words? BUT, it takes an input file name AND an input field name. On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning wrote: > > That isn

RE: Algorithm implementations in Pig

2010-02-23 Thread Palleti, Pallavi

e.org Subject: Re: Algorithm implementations in Pig As an interesting test case, can you write a Pig program that counts words? BUT, it takes an input file name AND an input field name. On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning wrote: > > That isn't an issue here. It is the i

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

Good answer. On Mon, Feb 22, 2010 at 8:52 PM, Ankur C. Goel wrote: > Those would be passed as parameters, either through the -param option or through > a parameter file with the -param_file option, and Pig's preprocessor just > substitutes the values in your script. > Since it's just a blind parameter

Re: Algorithm implementations in Pig

2010-02-22 Thread Ankur C. Goel

Those would be passed as parameters, either through the -param option or through a parameter file with the -param_file option, and Pig's preprocessor just substitutes the values in your script. Since it's just a blind parameter substitution, in my shingling script I even had the schema definition pass
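
The "blind parameter substitution" described in this message can be sketched outside Pig. The snippet below is a minimal illustration in Python, not Pig's actual preprocessor: the template, the helper name `substitute_params`, and the parameter names `input`, `schema` and `field` are all assumptions made up for this example. It shows how a word-count script could take both an input file name and an input field name, with the schema passed in the same way.

```python
import re

# Hypothetical Pig Latin word-count template. $input, $schema and $field
# are placeholders that get filled in before the script is run.
WORDCOUNT_TEMPLATE = """\
records = LOAD '$input' AS ($schema);
words   = FOREACH records GENERATE FLATTEN(TOKENIZE($field)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
"""


def substitute_params(script, params):
    """Replace each $name with params[name], with no parsing of the script.

    This is a toy stand-in for a preprocessor: it does plain text
    replacement and raises KeyError if a placeholder has no value.
    """
    def replace(match):
        name = match.group(1)
        if name not in params:
            raise KeyError("undefined parameter: " + name)
        return params[name]

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", replace, script)


# Example values (made up for illustration).
script = substitute_params(
    WORDCOUNT_TEMPLATE,
    {
        "input": "docs.tsv",
        "schema": "id:chararray, body:chararray",
        "field": "body",
    },
)
print(script)
```

With Pig itself, the equivalent values would be supplied on the command line, for example `pig -param input=docs.tsv -param field=body wordcount.pig`, where `wordcount.pig` is a hypothetical script name.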

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

As an interesting test case, can you write a Pig program that counts words? BUT, it takes an input file name AND an input field name. On Mon, Feb 22, 2010 at 9:56 AM, Ted Dunning wrote: > > That isn't an issue here. It is the invocation of pig programs and passing > useful information to them

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

That isn't an issue here. It is the invocation of Pig programs and passing useful information to them that is the problem. On Mon, Feb 22, 2010 at 9:20 AM, Ankur C. Goel wrote: > Scripting ability, while still limited, has better streaming support, so you > can have relations streamed > into a cus

Re: Algorithm implementations in Pig

2010-02-22 Thread Ankur C. Goel

I agree with you, and while some of that has been remedied, I wouldn't say things are perfect. Scripting ability, while still limited, has better streaming support, so you can have relations streamed into a custom script executing in either the map or the reduce phase, depending upon where it is placed. If

Re: Algorithm implementations in Pig

2010-02-22 Thread Ankur C. Goel

In the next Pig release (0.7), Pig's load/store funcs will be moving to use Hadoop's input/output formats, so there are some changes planned for that - http://wiki.apache.org/pig/Pig070IncompatibleChanges After that I don't expect any interface-level changes in UDFs. -...@nkur On 2/22/10 10:10 PM,

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

Has the interface for writing UDFs stabilized? For quite some time, the UDF API was changing every 3 months. On Mon, Feb 22, 2010 at 12:35 AM, Jeff Zhang wrote: > Pig can only make the implementation of map-reduce easier; the numerical > computation can be done in a UDF. > -- Ted Dunning,

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

Actually, no. I meant other programs written in pure Java. It used to be that the very restricted scripting ability of Pig made processing chains composed of Pig and map-reduce programs very brittle. In fact, just gluing together multiple Pig programs used to be very ugly. On Mon, Feb 22, 2010

Re: Algorithm implementations in Pig

2010-02-22 Thread David Stuart

Seems like the guys at Twitter are going down the Pig/Hadoop route: http://highscalability.com/blog/2010/2/19/twitters-plan-to-analyze-100-billion-tweets.html It could be worth getting them on board the Mahout wagon, especially given the previous discussion about classification efforts http://old.nab

Re: Algorithm implementations in Pig

2010-02-22 Thread Grant Ingersoll

I'm all for Pig, especially once we are a TLP. I haven't had the proper time to review the PLSI implementation, but it looks useful. I agree on the other points, though, in that I think it would be nice to have consistent formats based on Vector so that things can be more portable. On Feb

Re: Algorithm implementations in Pig

2010-02-22 Thread Ankur C. Goel

Ted, The latest Pig release, 0.6.0 on Hadoop 20, is a clear winner, not just for performance but also for doing a better job of managing memory in its MR job pipeline. Also, support for both inner and outer skewed joins is something that I found indispensable when dealing with really large datas

Re: Algorithm implementations in Pig

2010-02-22 Thread Robin Anil

On Mon, Feb 22, 2010 at 1:55 PM, Ted Dunning wrote: > I see pig as useful for data preparation, but for any numerical tasks, it is likely to be completely hopeless. > PIG will be a great tool to experiment quickly on algorithms. But, with people here trying to focus on using Vector to stand

Re: Algorithm implementations in Pig

2010-02-22 Thread Jeff Zhang

Pig can only make the implementation of map-reduce easier; the numerical computation can be done in a UDF. And piglet is a DSL on top of Pig Latin which makes Pig support loops. http://github.com/iconara/piglet On Mon, Feb 22, 2010 at 4:25 PM, Ted Dunning wrote: > I see pig as useful for data prepara

Re: Algorithm implementations in Pig

2010-02-22 Thread Ted Dunning

I see Pig as useful for data preparation, but for any numerical tasks, it is likely to be completely hopeless. On Mon, Feb 22, 2010 at 12:16 AM, Jeff Zhang wrote: > > Glad to hear here that mahout devs are interested in pig. Actually I believe pig is very helpful when you want to quickly imp

Re: Algorithm implementations in Pig

2010-02-22 Thread Jeff Zhang

Hi, Glad to hear that Mahout devs are interested in Pig. Actually, I believe Pig is very helpful when you want to quickly implement a prototype of machine learning algorithms. Pig also has a Java API, so it is easy to integrate a Pig script with Java. Maybe we can start with implementing NB using pig

Re: Algorithm implementations in Pig

2010-02-21 Thread Ted Dunning

I have had both positive and negative results with PIG. The positive results were that I was able to express large recommendation computations in a very concise way. That was really helpful. My negative results have had to do with the brittle nature of PIG vis-à-vis the version of the underlyin


21 matches
