On 21.06.2010 Ted Dunning wrote:
> I would like to start a discussion about a framework into which we can fit
> all of these approaches, in much the same way that the recommendations
> stuff has such nice pluggable properties.
+1. I like the ideas that have been tossed around in this discussion.

I agree that models should be highly generic. I just don't think that we
should legislate the content of either their internal model or their
serialized representation.

The contract is pretty clear, however. There are just a few methods, and it
isn't hard for all models to support them, especially ...
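For concreteness, that contract might look something like the sketch below.
ClassifierModel and its method signatures are hypothetical illustrations of
the "few methods" idea, not an existing Mahout interface:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.mahout.math.Vector;

// Hypothetical sketch of a minimal, generic model contract.
public interface ClassifierModel {

  // Score one instance; returns a vector of per-category scores.
  Vector classify(Vector instance);

  // Number of categories the model distinguishes.
  int numCategories();

  // Each model serializes itself however it likes (the "blob" idea below).
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}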
On Tue, Jun 22, 2010 at 9:47 AM, Robin Anil wrote:
> > Again, I would recommend a blob as the on-disk
> > format.
>
> Why a blob? Why not a flexible multi-list of matrices and vectors?
> Is there any model storing byte-level information?

The SGD has a parameter vector as well as a trace dictionary ...
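A fixed multi-list of matrices and vectors has no obvious slot for state
like a trace dictionary, which is the point of the blob suggestion. A
minimal sketch of that argument, assuming a hypothetical SgdModelSketch
class (the real SGD trace dictionary is shaped differently; a plain
name-to-index map stands in here purely for illustration):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Hypothetical sketch: a model whose state is more than matrices and
// vectors, so it serializes itself as an opaque, self-describing blob.
public class SgdModelSketch {

  private Vector beta = new DenseVector(1000);          // parameter vector
  private Map<String, Integer> traceDictionary = new HashMap<String, Integer>();

  public void write(DataOutput out) throws IOException {
    new VectorWritable(beta).write(out);                // numeric state
    out.writeInt(traceDictionary.size());               // non-numeric state
    for (Map.Entry<String, Integer> e : traceDictionary.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeInt(e.getValue());
    }
  }

  public void readFields(DataInput in) throws IOException {
    VectorWritable vw = new VectorWritable();
    vw.readFields(in);
    beta = vw.get();
    int n = in.readInt();
    traceDictionary = new HashMap<String, Integer>();
    for (int i = 0; i < n; i++) {
      traceDictionary.put(in.readUTF(), in.readInt());
    }
  }
}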
On Tue, Jun 22, 2010 at 9:44 AM, Robin Anil wrote:
> > On Mon, Jun 21, 2010 at 8:35 PM, Robin Anil wrote:
> >
> >> A Classifier Training Job will take a Trainer, and a Vector location,
> >> and produce a Model.

How about a transform layer which converts on-disk data into vectors
seamlessly? That should solve the issue.
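Such a transform layer could be as thin as an Iterable over vectors.
VectorSource and PrecomputedVectorSource below are hypothetical names, a
sketch of the idea rather than anything in Mahout:

import java.util.Iterator;

import org.apache.mahout.math.Vector;

// Hypothetical sketch of the proposed transform layer: trainers iterate
// over vectors without caring whether the vectors were pre-generated on
// disk or encoded on the fly.
public interface VectorSource extends Iterable<Vector> {
}

// Pass-through flavor: the vectors already exist (e.g. generated in
// parallel) and simply stream through.
class PrecomputedVectorSource implements VectorSource {

  private final Iterable<Vector> onDiskVectors;

  PrecomputedVectorSource(Iterable<Vector> onDiskVectors) {
    this.onDiskVectors = onDiskVectors;
  }

  @Override
  public Iterator<Vector> iterator() {
    return onDiskVectors.iterator();
  }
}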
The Wikipedia unigram dictionary is 381 MB on disk. Bigram and trigram sizes
will explode like anything. So the Vectorizer could be a pass-through, if
each job reads vectors that were generated in parallel, or convert on the
fly if using the randomizer.
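The randomizer route sidesteps the giant dictionary because terms can be
hashed straight into a fixed number of vector slots instead of being looked
up. A minimal sketch of that idea, using a hypothetical HashingVectorizer
rather than Mahout's actual randomizer classes:

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// Hypothetical sketch of on-the-fly "randomized" vectorization: terms are
// hashed into a fixed-size vector, so no 381 MB unigram dictionary (let
// alone exploding bigram/trigram dictionaries) ever has to be loaded.
public class HashingVectorizer {

  private final int numFeatures;

  public HashingVectorizer(int numFeatures) {
    this.numFeatures = numFeatures;
  }

  public Vector vectorize(String[] tokens) {
    Vector v = new RandomAccessSparseVector(numFeatures);
    for (String token : tokens) {
      int slot = Math.abs(token.hashCode() % numFeatures);
      v.set(slot, v.get(slot) + 1);                     // count into hashed slot
    }
    return v;
  }
}

The trade-off is that hash collisions fold distinct terms into the same
slot, which is usually an acceptable price for never materializing the
dictionary.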
The reason I said models should be generic is because they ...
On Mon, Jun 21, 2010 at 8:35 PM, Robin Anil wrote:
> See how this sounds (listing down requirements):
>
> A model can be a class with a list of matrices and a list of vectors. Each
> algorithm takes care of naming these matrices/vectors and of reading and
> writing values to it (similar to Datastore).
>
> All Classifiers will work with vectors.
> All Trainers will work with vectors.
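As a sketch of that proposal (GenericModel is a hypothetical name, not
existing code):

import java.util.HashMap;
import java.util.Map;

import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.Vector;

// Hypothetical sketch of the proposal above: a model is a bag of named
// matrices and vectors, and each algorithm owns its own naming scheme
// (much like the existing Datastore).
public class GenericModel {

  private final Map<String, Matrix> matrices = new HashMap<String, Matrix>();
  private final Map<String, Vector> vectors = new HashMap<String, Vector>();

  public void putMatrix(String name, Matrix m) { matrices.put(name, m); }
  public Matrix getMatrix(String name) { return matrices.get(name); }

  public void putVector(String name, Vector v) { vectors.put(name, v); }
  public Vector getVector(String name) { return vectors.get(name); }
}

Naive Bayes fits this shape naturally; state like the SGD trace dictionary
does not, which is exactly the blob-versus-multi-list disagreement running
through this thread.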
On Tue, Jun 22, 2010 at 8:33 AM, Grant Ingersoll wrote:
>
> On Jun 21, 2010, at 1:12 PM, Ted Dunning wrote:
>
> > We really need to have a simple way to integrate all of the input
> > processing options easily into new and old code.
>
> More or less, what we need is a pipeline that can ingest many ...
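One hypothetical shape for such a pipeline, with IngestStage and
TextToVectorPipeline invented here purely for illustration:

import org.apache.mahout.math.Vector;

// Hypothetical sketch of an ingest pipeline: a chain of stages turning raw
// input (files, Lucene indexes, databases, ...) into trainer-ready vectors.
public interface IngestStage<I, O> {
  O process(I input);
}

// Example composition: raw text -> tokens -> vector.
class TextToVectorPipeline {

  private final IngestStage<String, String[]> tokenizer;
  private final IngestStage<String[], Vector> vectorizer;

  TextToVectorPipeline(IngestStage<String, String[]> tokenizer,
                       IngestStage<String[], Vector> vectorizer) {
    this.tokenizer = tokenizer;
    this.vectorizer = vectorizer;
  }

  Vector ingest(String rawDocument) {
    return vectorizer.process(tokenizer.process(rawDocument));
  }
}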
On Tue, Jun 22, 2010 at 9:25 AM, Ted Dunning wrote:
>
> On Mon, Jun 21, 2010 at 8:35 PM, Robin Anil wrote:
>
>> A Classifier Training Job will take a Trainer, and a Vector location,
>> and produce a Model.
>
> No. Well, not exclusively, anyway. We can't be limited to reading vectors
> due to ...
On Jun 21, 2010, at 1:12 PM, Ted Dunning wrote:
> We are now beginning to have lots of classifiers in Mahout. The naive
> Bayes, complementary naive Bayes, and random forest grandfathers have been
> joined by my recent SGD and Zhao Zhendong's prolific set of approaches for
> logistic regression ...