Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Robin Anil
I think it changed after Jeff committed his code. It was there earlier. On Mon, Apr 26, 2010 at 12:24 AM, Sean Owen wrote: > Where, though? I just deleted all the methods to try it and every test > passes. > > On Sun, Apr 25, 2010 at 7:51 PM, Robin Anil wrote: > > It's used in clustering to generat

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Sean Owen
Where, though? I just deleted all the methods to try it and every test passes. On Sun, Apr 25, 2010 at 7:51 PM, Robin Anil wrote: > It's used in clustering to generate clusterid -> point id. It's also to be used in > classification (by the end of this summer) to keep class labels.

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Robin Anil
On Mon, Apr 26, 2010 at 12:17 AM, Sean Owen wrote: > I agree that it'd be good to kind of finalize the Vector stuff. I > don't think it's reasonable for users to expect data output by 0.3 to > be compatible with 0.4 though, so I wouldn't worry about that. > > I think we're on the verge of wanting a

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Sean Owen
I agree that it'd be good to kind of finalize the Vector stuff. I don't think it's reasonable for users to expect data output by 0.3 to be compatible with 0.4 though, so I wouldn't worry about that. I think we're on the verge of wanting a proper serialization system like Avro for vectors here -- but

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Robin Anil
A Vector is simply either an array of doubles or an array of (int:double) pairs, and this info and other stuff is stored in a MetadataWritable. Makes sense to me, assuming MetadataWritable allows us to skip over it efficiently without deserializing. On Sun, Apr 25, 2010 at 8:58 PM, Sean Owen wrote: > Yes, I
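One way to meet the "skip over efficiently without deserializing" condition is to write the metadata as a length-prefixed blob ahead of the vector, so a reader that only wants the vector can jump past it. The following is a minimal sketch under that assumption; the class name `SkippableMetadataRecord` and the byte layout are hypothetical and are not the actual `MetadataWritable` format. Only standard `java.io` calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical record layout (an assumption, not Mahout's format):
 * [int metadataLength][metadata bytes][int cardinality][double values...]
 */
final class SkippableMetadataRecord {

  /** Serializes metadata as a length-prefixed UTF-8 blob, followed by a dense vector. */
  static byte[] write(String metadata, double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      byte[] meta = metadata.getBytes(StandardCharsets.UTF_8);
      out.writeInt(meta.length); // the length prefix is what makes skipping possible
      out.write(meta);
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads only the vector: the metadata blob is skipped without being decoded. */
  static double[] readVectorOnly(byte[] record) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
      int metadataLength = in.readInt();
      if (in.skipBytes(metadataLength) != metadataLength) {
        throw new EOFException("truncated metadata");
      }
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return values;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The cost of skipping here is independent of what the metadata contains, since the reader never parses it.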

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Sean Owen
Yes, I think if we can convince ourselves that there won't be that many different possibilities for representing a vector, then a simple boolean might unify everything. This approach doesn't 'scale', but I don't know that there are other representations we must have. The issue of named vectors is intere
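To make the "simple boolean" idea concrete: if the only two representations are an array of doubles and an array of (int:double) pairs, one leading flag is enough for a single reader to handle both. This is a minimal sketch assuming that a flag value of `true` means dense; `FlaggedVectorFormat`, `VectorWriter` and the exact layout are hypothetical illustrations, not Mahout's `VectorWritable` format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical single layout for all vectors; one leading boolean selects the encoding. */
final class FlaggedVectorFormat {

  /** Dense: flag=true, cardinality, then every value in order. */
  static void writeDense(DataOutput out, double[] values) throws IOException {
    out.writeBoolean(true);
    out.writeInt(values.length);
    for (double v : values) {
      out.writeDouble(v);
    }
  }

  /** Sparse: flag=false, cardinality, non-zero count, then (index, value) pairs. */
  static void writeSparse(DataOutput out, int cardinality, SortedMap<Integer, Double> nonZeros)
      throws IOException {
    out.writeBoolean(false);
    out.writeInt(cardinality);
    out.writeInt(nonZeros.size());
    for (Map.Entry<Integer, Double> e : nonZeros.entrySet()) {
      out.writeInt(e.getKey());
      out.writeDouble(e.getValue());
    }
  }

  /** One reader for both layouts; returns a dense copy to keep the sketch short. */
  static double[] read(DataInput in) throws IOException {
    boolean dense = in.readBoolean();
    double[] values = new double[in.readInt()];
    if (dense) {
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
    } else {
      int nonZeroCount = in.readInt();
      for (int k = 0; k < nonZeroCount; k++) {
        int index = in.readInt();
        values[index] = in.readDouble();
      }
    }
    return values;
  }

  /** Callback so either writer can be used with encode(). */
  interface VectorWriter {
    void writeTo(DataOutput out) throws IOException;
  }

  static byte[] encode(VectorWriter writer) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      writer.writeTo(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static double[] decode(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For the vector {0, 3, 0}, the dense encoding takes 29 bytes and the sparse one 21, and both decode to the same values. As the message says, this does not "scale" to many representations: each new one would need more than a boolean.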

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Robin Anil
> > > - How about moving label bindings out to NamedVector? > - How about similar restructuring of matrices? > I don't know what the correct choice is here. It depends on whether we should keep a single written representation for all vectors on disk. Then an optional field could be there for name
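A single written representation with an optional name field could look roughly like the following sketch, which assumes a leading `hasName` flag followed by the name only when present. `OptionalNameFormat`, its nested `Record`, and the layout are hypothetical; only dense values are handled to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical single on-disk layout with an optional name field:
 * [boolean hasName][UTF name, present only if hasName][int cardinality][double values...]
 */
final class OptionalNameFormat {

  /** Decoded record; name is null when the record was written without one. */
  static final class Record {
    final String name;
    final double[] values;

    Record(String name, double[] values) {
      this.name = name;
      this.values = values;
    }
  }

  static byte[] write(String nameOrNull, double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeBoolean(nameOrNull != null); // one flag byte tells the reader what follows
      if (nameOrNull != null) {
        out.writeUTF(nameOrNull); // 2-byte length prefix + modified UTF-8
      }
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static Record read(byte[] record) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
      String name = in.readBoolean() ? in.readUTF() : null;
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return new Record(name, values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this design an unnamed vector pays one extra byte, and every reader can handle both named and unnamed records.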

Re: How to tackle Vector->NamedVector and back conversion

2010-04-25 Thread Sean Owen
PS: let's see a patch to keep discussing. I'm seeing ideas on lots of good topics here and want to take the opportunity to strike while the iron is hot and continue overhauling this. But something like making everything a named vector is sort of stepping backwards to where we just agreed to move from

Re: How to tackle Vector->NamedVector and back conversion

2010-04-24 Thread Sean Owen
NamedVectorWritable already extends VectorWritable, though honestly I don't like that and kept it to minimize disruption. Serialized vector formats aren't exactly "polymorphic". I can't read an X vector with the code intended to deserialize something that extends X. So, really the Writables shoul
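The point that serialized formats aren't polymorphic can be shown with two hypothetical layouts, where the "named" layout is the plain one with a name written in front (mirroring a subclass that adds a field). `NonPolymorphicDemo` and both layouts are illustrations, not Mahout's actual Writables. A reader for the extended layout misinterprets data written in the base layout, because it expects a field that isn't there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical layouts: "named" = plain layout with a name field prepended. */
final class NonPolymorphicDemo {

  /** Plain layout: [int cardinality][double values...]. */
  static byte[] writePlain(double[] values) {
    return write(null, values);
  }

  /** Named layout: [UTF name][int cardinality][double values...]. */
  static byte[] writeNamed(String name, double[] values) {
    return write(name, values);
  }

  private static byte[] write(String nameOrNull, double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      if (nameOrNull != null) {
        out.writeUTF(nameOrNull); // the extra field only the named layout has
      }
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /**
   * Reader for the named layout. Given plain-layout bytes, it takes the first two bytes
   * as a name length and then reads the cardinality from the wrong offset. For a plain
   * record holding {1.0, 2.0} this ends in EOFException; other inputs can instead yield
   * a wrong name and wrong values without any error.
   */
  static Map.Entry<String, double[]> readNamed(byte[] record) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
      String name = in.readUTF();
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return new AbstractMap.SimpleImmutableEntry<>(name, values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading `writeNamed("a", ...)` output works, while reading `writePlain(...)` output with the same reader fails, even though in Java terms a named vector "is a" vector.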

Re: How to tackle Vector->NamedVector and back conversion

2010-04-24 Thread Ted Dunning
Put in other words, this would mean that there are either one or two output formats but, most importantly, only one input format that would always read NamedVectorWritables, possibly by inserting default names. Due to inheritance, those objects would serve both purposes. That sounds good and simple.
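The "one input format, default names, inheritance serves both purposes" idea can be sketched in memory as follows. `PlainVector`, `NamedPlainVector`, `AlwaysNamedReader`, and the `"row-" + position` default-naming rule are all hypothetical stand-ins, not Mahout classes or conventions. The reader always produces named vectors, and an algorithm written against the base type accepts them unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a plain vector. */
class PlainVector {
  final double[] values;

  PlainVector(double[] values) {
    this.values = values.clone();
  }

  double sum() {
    double total = 0.0;
    for (double v : values) {
      total += v;
    }
    return total;
  }
}

/** Hypothetical named vector that extends the plain one. */
class NamedPlainVector extends PlainVector {
  final String name;

  NamedPlainVector(String name, double[] values) {
    super(values);
    this.name = name;
  }
}

final class AlwaysNamedReader {

  /**
   * Turns raw input records (name may be null) into named vectors, inserting a
   * default name when a record has none. The "row-<position>" rule is an assumption.
   */
  static List<NamedPlainVector> readAll(List<String> names, List<double[]> rows) {
    List<NamedPlainVector> result = new ArrayList<>(rows.size());
    for (int i = 0; i < rows.size(); i++) {
      String name = names.get(i);
      result.add(new NamedPlainVector(name != null ? name : "row-" + i, rows.get(i)));
    }
    return result;
  }

  /** An algorithm written against the base type; named vectors are accepted as-is. */
  static double totalOf(List<? extends PlainVector> vectors) {
    double total = 0.0;
    for (PlainVector v : vectors) {
      total += v.sum();
    }
    return total;
  }
}
```

Note that this only removes conversions in memory; on disk the layouts still have to agree, which is the concern about non-polymorphic serialized formats raised elsewhere in the thread.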

Re: How to tackle Vector->NamedVector and back conversion

2010-04-24 Thread Robin Anil
On Sat, Apr 24, 2010 at 11:50 PM, Ted Dunning wrote: > If we are talking about the Writable aspect of this, then whatever input > format we use should reasonably be able to handle both kinds of data with > the conversions as you suggest. > Yes, having two separate Writable classes as of the momen

Re: How to tackle Vector->NamedVector and back conversion

2010-04-24 Thread Ted Dunning
If we are talking about the Writable aspect of this, then whatever input format we use should reasonably be able to handle both kinds of data with the conversions as you suggest. For algorithms that accept arguments of a particular type, it might be reasonable to let NVW extend VW (I am not

How to tackle Vector->NamedVector and back conversion

2010-04-24 Thread Robin Anil
Some algorithms use NamedVectorWritable, some use VectorWritable. Don't we need an identity converter for the forward conversion and some form of name-assigning converter for the backward conversion? Otherwise it's going to be messy. Robin
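The pair of converters asked about above could be sketched like this, assuming a named vector is modeled by composition (a name plus the vector data) rather than inheritance. Dropping the name passes the vector data through unchanged, which is the identity-like direction; adding names needs a rule, here a caller-supplied prefix plus position. `Named`, `VectorConverters`, and the naming rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical named vector built by composition: a name plus the vector data. */
final class Named {
  final String name;
  final double[] vector;

  Named(String name, double[] vector) {
    this.name = name;
    this.vector = vector;
  }
}

final class VectorConverters {

  /** Named -> plain: the same vector objects pass through; only the names are dropped. */
  static List<double[]> dropNames(List<Named> named) {
    List<double[]> plain = new ArrayList<>(named.size());
    for (Named n : named) {
      plain.add(n.vector);
    }
    return plain;
  }

  /** Plain -> named: assigns prefix + position as the name (an assumed naming rule). */
  static List<Named> assignNames(List<double[]> plain, String prefix) {
    List<Named> named = new ArrayList<>(plain.size());
    for (int i = 0; i < plain.size(); i++) {
      named.add(new Named(prefix + i, plain.get(i)));
    }
    return named;
  }
}
```

Going plain to named to plain returns the original vector objects, but named to plain to named replaces the original names with generated ones, so which algorithms consume which type still matters.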