Re: Spec for a common import/export service for Mahout jobs

2011-09-13 Thread Sean Owen
On Tue, Sep 13, 2011 at 6:27 AM, Lance Norskog <goks...@gmail.com> wrote:
> Machine learning has quite a few algorithms where data is processed in a way foreign to its domain. Running SVD on user/item/preference matrices is a great example: this makes no sense whatsoever.
(Why?? this is one of the

Re: Spec for a common import/export service for Mahout jobs

2011-09-13 Thread Ted Dunning
The cluster, classification and decompositional jobs all like the same kind of input. These can be viewed as matrices or sequences of vectors; it comes to much the same sort of thing. The gotcha is that the user often has tokens in fielded documents (ratings, documents, purchase history). Other

Re: Spec for a common import/export service for Mahout jobs

2011-09-12 Thread Sean Owen
I think we discussed several of these points on the mailing list. I am not sure I would ever expect there to be a common format across all jobs. They just don't all operate on the same information. Even where two jobs ingest vectors, it doesn't mean vectors for one are meaningful for another. If

Re: Spec for a common import/export service for Mahout jobs

2011-09-12 Thread Lance Norskog
> I am not sure I would ever expect there to be a common format across all jobs. They just don't all operate on the same information. Even where two jobs ingest vectors, it doesn't mean vectors for one are meaningful for another.
Machine learning has quite a few algorithms where data is

Spec for a common import/export service for Mahout jobs

2011-09-11 Thread Lance Norskog
https://cwiki.apache.org/confluence/display/MAHOUT/Import+Export+Sequence+File+Formats
Please have a look; comment or rewrite as you please. It's a wish list of what I would want, approaching Mahout either as an experienced user or as a newbie.
--
Lance Norskog <goks...@gmail.com>