On Tue, Sep 13, 2011 at 6:27 AM, Lance Norskog goks...@gmail.com wrote:
Machine learning has quite a few algorithms where data is processed in a way
foreign to its domain. Running SVD on user/item/preference matrices is a
great example: this makes no sense whatsoever.
(Why?? this is one of the
The clustering, classification, and decomposition jobs all take the same kind
of input. It can be viewed as a matrix or as a sequence of vectors; the two
amount to much the same thing. The gotcha is that the user often has
tokens in fielded documents (ratings, documents, purchase history). Other
I think we discussed several of these points on the mailing list.
I am not sure I would ever expect there to be a common format across
all jobs. They just don't all operate on the same information. Even
where two jobs ingest vectors, it doesn't mean vectors for one are
meaningful for another.
If
https://cwiki.apache.org/confluence/display/MAHOUT/Import+Export+Sequence+File+Formats
Please have a look; comment or rewrite as you please. It's a wish list of
what I would want, approaching Mahout either as an experienced user or as a
newbie.
--
Lance Norskog
goks...@gmail.com