I think it changed after Jeff committed his code. It was there earlier.
On Mon, Apr 26, 2010 at 12:24 AM, Sean Owen wrote:
> Where, though? I just deleted all the methods to try it, and every test
> passes.
Where, though? I just deleted all the methods to try it, and every test passes.
On Sun, Apr 25, 2010 at 7:51 PM, Robin Anil wrote:
> It's used in clustering to generate cluster id -> point id mappings. It will also be used in
> classification (by the end of this summer) to keep class labels.
On Mon, Apr 26, 2010 at 12:17 AM, Sean Owen wrote:
> I agree that it'd be good to kind of finalize the Vector stuff. I
> don't think it's reasonable for users to expect data output by 0.3 to
> be compatible with 0.4 though, so wouldn't worry about that.
I agree that it'd be good to kind of finalize the Vector stuff. I
don't think it's reasonable for users to expect data output by 0.3 to
be compatible with 0.4 though, so wouldn't worry about that.
I think we're on the verge of wanting a proper serialization system
like Avro for vectors here -- but
A Vector is simply either an array of doubles or an array of (int:double) pairs, and
this info and other stuff are stored in a MetadataWritable. Makes sense to
me, assuming MetadataWritable allows us to skip over it efficiently without
deserializing.
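The two-layout idea (dense array of doubles vs. sparse int:double pairs) can be illustrated with a single boolean flag at the front of each record. This is a minimal sketch, assuming plain `java.io` streams rather than Mahout's actual `VectorWritable` or `MetadataWritable` code; `VectorCodec` and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical codec (not Mahout's API): one boolean flag distinguishes
// dense (array of doubles) from sparse (array of int:double pairs).
final class VectorCodec {

  private VectorCodec() {}

  // Dense layout: flag=true, length, then every value.
  static byte[] writeDense(double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeBoolean(true);
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Sparse layout: flag=false, cardinality, entry count, then (index, value) pairs.
  static byte[] writeSparse(int cardinality, int[] indices, double[] values) {
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices and values differ in length");
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeBoolean(false);
      out.writeInt(cardinality);
      out.writeInt(indices.length);
      for (int i = 0; i < indices.length; i++) {
        out.writeInt(indices[i]);
        out.writeDouble(values[i]);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads either layout; for simplicity this sketch materializes a dense double[].
  static double[] read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      boolean dense = in.readBoolean();
      int size = in.readInt();
      double[] result = new double[size];
      if (dense) {
        for (int i = 0; i < size; i++) {
          result[i] = in.readDouble();
        }
      } else {
        int entries = in.readInt();
        for (int k = 0; k < entries; k++) {
          int index = in.readInt();
          result[index] = in.readDouble();
        }
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The sparse branch stores only the listed entries, which is the reason to keep two layouts at all. Whether one flag stays enough is exactly the open question: a third representation would need a wider tag than a boolean.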
On Sun, Apr 25, 2010 at 8:58 PM, Sean Owen wrote:
Yes, I think if we can convince ourselves that there won't be that
many different possibilities for representing a vector, then a simple
boolean might unify everything. This approach doesn't 'scale' but I
don't know that there are other representations we must have.
The issue of named vectors is intere
>
>
> - How about moving label bindings out to NamedVector?
> - How about similar restructuring of matrices?
>
I don't know what the correct choice is here. It depends on whether we
should keep a single written representation for all vectors on disk. If so, an
optional field could be there for the name.
PS: let's see a patch to keep discussing. I'm seeing ideas on lots of
good topics here and want to take the opportunity to strike while the
iron is hot and continue overhauling this.
But making everything a named vector is sort of a step
backwards, to the design we just agreed to move away from.
NamedVectorWritable already extends VectorWritable, though honestly I
don't like that and kept it to minimize disruption.
Serialized vector formats aren't exactly "polymorphic". I can't read
an X vector with the code intended to deserialize something that
extends X. So, really the Writables shoul
Put in other words, this would mean that there is either one or two output
formats but most importantly only one input format that would always read
NamedVectorWritables, possibly by inserting default names. Due to
inheritance, those objects would serve both purposes.
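The single-input-format idea can be sketched as a reader that always returns named vectors and inserts a default name when a record carries none. This is a minimal sketch under stated assumptions: the optional name is encoded as a boolean presence flag followed by a UTF string, the default-name scheme `"vector-" + position` is invented here, and `PlainVector`, `NamedVector`, and `NamedVectorReader` are hypothetical classes, not Mahout's `VectorWritable`/`NamedVectorWritable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain vector: just the values.
class PlainVector {
  final double[] values;
  PlainVector(double[] values) { this.values = values; }
}

// Hypothetical named vector; by inheritance it can also be used as a PlainVector.
class NamedVector extends PlainVector {
  final String name;
  NamedVector(String name, double[] values) { super(values); this.name = name; }
}

final class NamedVectorReader {

  private NamedVectorReader() {}

  // Encodes rows; a null entry in names means "no name stored" for that row.
  static byte[] encode(String[] names, double[][] rows) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int i = 0; i < rows.length; i++) {
        String name = names[i];
        out.writeBoolean(name != null); // presence flag for the optional name
        if (name != null) {
          out.writeUTF(name);
        }
        out.writeInt(rows[i].length);
        for (double v : rows[i]) {
          out.writeDouble(v);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // The one input path: every record comes back as a NamedVector,
  // with "vector-<position>" assigned to records that had no name.
  static List<NamedVector> readAll(byte[] data) {
    List<NamedVector> result = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      while (in.available() > 0) {
        boolean hasName = in.readBoolean();
        String name = hasName ? in.readUTF() : "vector-" + result.size();
        double[] values = new double[in.readInt()];
        for (int j = 0; j < values.length; j++) {
          values[j] = in.readDouble();
        }
        result.add(new NamedVector(name, values));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }
}
```

Algorithms that only need plain vectors can take the returned objects as `PlainVector` and ignore the name. Those that need labels get one for every record, whether it was stored or defaulted.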
That sounds good and simple.
On Sat, Apr 24, 2010 at 11:50 PM, Ted Dunning wrote:
> If we are talking about the Writable aspect of this, then whatever input
> format we use should reasonably be able to handle both kinds of data with
> the conversions as you suggest.
>
Yes, having two separate writable classes as of the momen
If we are talking about the Writable aspect of this, then whatever input
format we use should reasonably be able to handle both kinds of data with
the conversions as you suggest.
For algorithms that accept arguments of a particular type, it might
be reasonable to let NVW extend VW (I am not
Some algorithms use NamedVectorWritable, some use VectorWritable.
Shouldn't we have an identity converter for the forward direction and some form of
name-assigning converter for the backward conversion? Otherwise it's going to be messy.
Robin
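The two converters proposed above can be sketched in memory. This is a minimal sketch, assuming "forward" means named-to-plain and that the named type extends the plain one (as NamedVectorWritable already extends VectorWritable), so the forward direction really is the identity, while the backward direction needs a caller-supplied naming rule. `Vec`, `NamedVec`, `VectorConverters`, and the `IntFunction<String>` namer are hypothetical names, not Mahout classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical plain vector type.
class Vec {
  final double[] values;
  Vec(double[] values) { this.values = values; }
}

// Hypothetical named subtype, mirroring NamedVectorWritable extends VectorWritable.
class NamedVec extends Vec {
  final String name;
  NamedVec(String name, double[] values) { super(values); this.name = name; }
}

final class VectorConverters {

  private VectorConverters() {}

  // Forward (named -> plain): identity. Returns a read-only view;
  // the same element objects are exposed, nothing is copied or changed.
  static List<Vec> toPlain(List<NamedVec> named) {
    return Collections.unmodifiableList(named);
  }

  // Backward (plain -> named): each unnamed vector gets a name from the caller's rule.
  static List<NamedVec> toNamed(List<? extends Vec> plain, IntFunction<String> namer) {
    List<NamedVec> result = new ArrayList<>(plain.size());
    for (int i = 0; i < plain.size(); i++) {
      Vec v = plain.get(i);
      // Keep an existing name rather than overwrite it.
      result.add(v instanceof NamedVec ? (NamedVec) v : new NamedVec(namer.apply(i), v.values));
    }
    return result;
  }
}
```

For example, `VectorConverters.toNamed(vectors, i -> "row-" + i)` labels vectors by position. Keeping the naming rule as a parameter leaves each algorithm free to choose labels such as cluster ids or class labels.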