I think it changed after Jeff committed his code. It was there earlier.
On Mon, Apr 26, 2010 at 12:24 AM, Sean Owen wrote:
> Where though, I just deleted all the methods to try it and every test
> passes.
>
> On Sun, Apr 25, 2010 at 7:51 PM, Robin Anil wrote:
> > It's used in clustering to generat
Where though, I just deleted all the methods to try it and every test passes.
On Sun, Apr 25, 2010 at 7:51 PM, Robin Anil wrote:
> It's used in clustering to generate clusterid -> point id. It will also be
> used in classification (by end of this summer) to keep class labels.
On Mon, Apr 26, 2010 at 12:17 AM, Sean Owen wrote:
> I agree that it'd be good to kind of finalize the Vector stuff. I
> don't think it's reasonable for users to expect data output by 0.3 to
> be compatible with 0.4 though, so wouldn't worry about that.
>
> I think we're on the verge of wanting a
I agree that it'd be good to kind of finalize the Vector stuff. I
don't think it's reasonable for users to expect data output by 0.3 to
be compatible with 0.4 though, so wouldn't worry about that.
I think we're on the verge of wanting a proper serialization system
like Avro for vectors here -- but
A Vector is simply either an (array of doubles) or an array of (int:double), and
this info and other stuff are stored in a MetadataWritable. Makes sense to
me, assuming MetadataWritable allows us to skip over it efficiently without
deserializing
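
To make the dense/sparse idea above concrete, here is a minimal sketch of one written layout that covers both an array of doubles and an array of (int:double) pairs, selected by a leading boolean flag. It uses only java.io. The class name VectorCodecSketch, the method names, and the byte layout are all assumptions made for illustration; this is not Mahout's VectorWritable or the proposed MetadataWritable format, and it does not address skipping records without deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

/** Hypothetical sketch: a single layout for dense and sparse vectors. */
class VectorCodecSketch {

  /** Dense layout (assumed): flag=true, cardinality, then every value in order. */
  public static byte[] writeDense(double[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeBoolean(true);
    out.writeInt(values.length);
    for (double value : values) {
      out.writeDouble(value);
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Sparse layout (assumed): flag=false, cardinality, entry count, then (index, value) pairs. */
  public static byte[] writeSparse(int cardinality, Map<Integer, Double> entries)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeBoolean(false);
    out.writeInt(cardinality);
    out.writeInt(entries.size());
    for (Map.Entry<Integer, Double> entry : entries.entrySet()) {
      out.writeInt(entry.getKey());
      out.writeDouble(entry.getValue());
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Reads either layout back into a plain double[]; the leading flag picks the branch. */
  public static double[] read(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    boolean dense = in.readBoolean();
    int cardinality = in.readInt();
    double[] result = new double[cardinality];
    if (dense) {
      for (int i = 0; i < cardinality; i++) {
        result[i] = in.readDouble();
      }
    } else {
      int entryCount = in.readInt();
      for (int i = 0; i < entryCount; i++) {
        int index = in.readInt();
        result[index] = in.readDouble();
      }
    }
    return result;
  }
}
```

A reader only needs the first byte to know which representation follows, which is the appeal of the flag. The cost is that every new representation needs another flag value and another branch in read.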
On Sun, Apr 25, 2010 at 8:58 PM, Sean Owen wrote:
> Yes, I
Yes, I think if we can convince ourselves that there won't be that
many different possibilities for representing a vector, then a simple
boolean might unify everything. This approach doesn't 'scale' but I
don't know there are other representations we must have.
The issue of named vectors is intere
[
https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860690#action_12860690
]
Jake Mannix commented on MAHOUT-369:
Danny, thanks for looking into this so carefully,
[
https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix reassigned MAHOUT-369:
--
Assignee: Jake Mannix
> Issues with DistributedLanczosSolver output
> -
>
>
> - How about moving label bindings out to NamedVector?
> - How about similar restructuring of matrices?
>
I don't know what the correct choice is here. It depends on whether we
should keep a single written representation for all vectors on disk. Then an
optional field could be there for name
PS let's see a patch to keep discussing, I'm seeing ideas on lots of
good topics here and want to take the opportunity to strike while the
iron is hot and continue overhauling this.
But things like making everything a named vector are sort of stepping
backwards to where we just agreed to move from
I'm not seeing it in my client, hmm.
While I'd tend to guess my change broke it, I don't see the direct
link... this code writes TreeID -> MapredOutput in its test and then
tries to read exactly that. I don't yet see how the
SequenceFile.Reader expects anything related to VectorWritable nor why
it
Is this happening to anyone else?
---
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.184 sec <<< FAILURE!
testProcessOutput(org.apache.mahout.df.mapreduce.partial.PartialBuilderTest) Time elapsed: 0.171