Mahout - Pig Hackday

2012-05-02 Thread Timothy Potter
Hi, The Dachis Group data analytics team are big users of Pig and just a little Mahout so far, but that's changing soon, so we'd like to contribute some of our work and know-how back to the Mahout / Pig communities. In the near term, we're planning a Pig-Mahout hackday (code-named Pigout) at our

Re: Mahout - Pig Hackday

2012-05-02 Thread Ted Dunning
On Wed, May 2, 2012 at 11:06 AM, Timothy Potter thelabd...@gmail.com wrote: We're really keen on Ted's pig-vector project (https://github.com/tdunning/pig-vector) as we're building a number of classifiers on Mahout's SGD framework, with the bulk of our data being in Cassandra processed almost

Re: Mahout - Pig Hackday

2012-05-02 Thread praneet mhatre
On Wed, May 2, 2012 at 11:13 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, May 2, 2012 at 11:06 AM, Timothy Potter thelabd...@gmail.com wrote: We're really keen on Ted's pig-vector project (https://github.com/tdunning/pig-vector) as we're building a number of classifiers on

Re: Mahout - Pig Hackday

2012-05-02 Thread Timothy Potter
Thanks Ted! Removing the elephant-bird dependency / build problems sounds like a good task we should include in our plans for the hackday ... what are your thoughts on adding pig-vector to Mahout as a contrib module? Do you want to keep it separate or eventually make its way into the project?

Re: Mahout - Pig Hackday

2012-05-02 Thread Andy Schlaikjer
Hi Tim, Ted, I wanted to chime in here regarding Elephant Bird utilities for Pig-Mahout integration. I'm the author of EB's SequenceFileLoader, SequenceFileStorage, and all the supporting WritableConverters, including the VectorWritableConverter which facilitates conversion of Mahout Vector data

Re: Mahout - Pig Hackday

2012-05-02 Thread Ted Dunning
Making a pig module for mahout is a fine idea. The twitter guys may have something better, though, so we should explore that as well. Andy's comments make that possibility very interesting. On Wed, May 2, 2012 at 5:20 PM, Timothy Potter thelabd...@gmail.com wrote: Thanks Ted! Removing the

Re: [mahout] labels in clustering algorithms

2012-05-02 Thread Konstantin Shmakov
Mahout is missing integration tools, this is true. Data have to be converted to Mahout-accepted input. One way of doing it is outlined below: 1) collect unique terms from your data and make the dictionary of terms. This can be done by any means, e.g. Hadoop streaming job in 2 steps - collect

Re: Mahout - Pig Hackday

2012-05-02 Thread Jake Mannix
On Wed, May 2, 2012 at 8:07 PM, Ted Dunning ted.dunn...@gmail.com wrote: Making a pig module for mahout is a fine idea. The twitter guys may have something better, though, so we should explore that as well. Andy's comments make that possibility very interesting. What I'd want to suggest is

Re: Mahout - Pig Hackday

2012-05-02 Thread Ted Dunning
On Wed, May 2, 2012 at 9:05 PM, Jake Mannix jake.man...@gmail.com wrote: On Wed, May 2, 2012 at 8:07 PM, Ted Dunning ted.dunn...@gmail.com wrote: Making a pig module for mahout is a fine idea. The twitter guys may have something better, though, so we should explore that as well. Andy's