[jira] Created: (MAHOUT-323) Classify new data using Decision Forest

2010-03-05 Thread Deneche A. Hakim (JIRA)
Classify new data using Decision Forest --- Key: MAHOUT-323 URL: https://issues.apache.org/jira/browse/MAHOUT-323 Project: Mahout Issue Type: Improvement Components: Classification Affects Ve

[jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842213#action_12842213 ] Jake Mannix commented on MAHOUT-322: It should actually be noted that Danny's original

[jira] Resolved: (MAHOUT-314) DistributedRowMatrix needs a sparse DistributedRowMatrix times(DistributedRowMatrix other) implementation

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-314. Resolution: Fixed Fix Version/s: 0.3 Committed. Current implementation is a map-side join

[jira] Resolved: (MAHOUT-313) DistributedRowMatrix needs times(Vector) implementation as M/R job

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-313. Resolution: Fixed Fix Version/s: 0.3 Committed, code piggybacks on timesSquared() with a lit

[jira] Resolved: (MAHOUT-310) LanczosSolver and DistributedLanczosSolver always assume rectangular input, but should also handle symmetric eigensystems.

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-310. Resolution: Fixed Fix Version/s: 0.3 committed > LanczosSolver and DistributedLanczosSolver

[jira] Resolved: (MAHOUT-312) DistributedRowMatrix iterateAll() and iterate() don't work on multi-part SequenceFiles

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-312. Resolution: Fixed Fix Version/s: 0.3 committed > DistributedRowMatrix iterateAll() and iter

Re: Yet another 0.3 patch?

2010-03-05 Thread Ted Dunning
+1 Sounds right to me. On Fri, Mar 5, 2010 at 6:33 PM, Drew Farris wrote: > Unit tests are included and I've regression tested the patch against > the original implementation on the 20news corpus -- it produces the > same results. > > So, with the group's blessing I will commit. > -- Ted Du

Re: svd algorithms

2010-03-05 Thread Ted Dunning
Speaking of spinning, Mike, there is a bit of a move afoot to use the 0.3 release to do some *really* big SVD in order to claim a size record of sorts. The goal is to find some realistic and interesting matrix with about 5 x 10^9 non-zero elements. On Fri, Mar 5, 2010 at 8:05 PM, Jake Mannix wro

Re: svd algorithms

2010-03-05 Thread Jake Mannix
Hi Mike, Welcome to the long journey down the road of dimensional reduction. :) On Fri, Mar 5, 2010 at 5:05 PM, mike bowles wrote: > > Really large matrices require using one of the randomizing methods to get > done. "Require" is a strong term. Really really large (but still sparse!) matric

Yet another 0.3 patch?

2010-03-05 Thread Drew Farris
In the spirit of Jake's message, would anyone be opposed to a commit of MAHOUT-317? (https://issues.apache.org/jira/browse/MAHOUT-317) It is a re-factoring of the LLR Collocation work to eliminate in-memory frequency calculations for ngram and n-1gram frequencies. Using a secondary sort eliminates

Re: svd algorithms

2010-03-05 Thread Ted Dunning
Mike, http://issues.apache.org/jira/browse/MAHOUT-180 might be of interest. Jake has done a fair bit of work beyond that. Next up is a stochastic decomposition version. You can see the seeds of that in Jake's other JIRAs. On Fri, Mar 5, 2010 at 5:05 PM, mike bowles wrote: > ... I thought i

svd algorithms

2010-03-05 Thread mike bowles
I've been trying to figure out how to code an SVD algorithm and I've seen some questions about SVD algorithms floating around the Mahout mailing lists. I thought it might be helpful to share what I've found so far. Really large matrices require using one of the randomizing methods to get don

Re: 0.3 Patches

2010-03-05 Thread Jake Mannix
Ha! Perpetual code freeze to get new features, now there's a concept! Three +1's. OK, if I don't get any negative feedback before I get back to a computer, I'll check in. I've just added more wiki pages for me to write too, I guess... -jake On Mar 5, 2010 2:53 PM, "Jeff Eastman" wrote: Robi

Re: 0.3 Patches

2010-03-05 Thread Jeff Eastman
Robin Anil wrote: > Seems we are most productive when it's a code freeze :) +1 from me as well. You have time till Hadoop resolves 6617, assuming nothing gets broken Maybe we should just declare a perpetual code freeze. But you guys are really on a roll, so I'm +1 too. Jeff

Re: Who owns mahout bucket on s3?

2010-03-05 Thread Jake Mannix
On Thu, Mar 4, 2010 at 7:41 AM, Robin Anil wrote: > Based on what I have in mind, the usage will just be > > mahout vectorize -i s3://input -o s3://output -tmp hdfs://file (here, there > is a risk of fixing an exact path and not knowing the Hadoop user; I would > have preferred a relative path) >

Re: 0.3 Patches

2010-03-05 Thread Robin Anil
Seems we are most productive when it's a code freeze :) +1 from me as well. You have time till Hadoop resolves 6617, assuming nothing gets broken Robin On Sat, Mar 6, 2010 at 12:50 AM, Jake Mannix wrote: > Hey all, > > Our "flash-freeze" has unthawed considerably, but I've been trying to be > g

Re: 0.3 Patches

2010-03-05 Thread Ted Dunning
Tentative +1 from me. On Fri, Mar 5, 2010 at 11:20 AM, Jake Mannix wrote: > Can I get these in for 0.3? I could commit today if it's ok with the team. -- Ted Dunning, CTO DeepDyve

0.3 Patches

2010-03-05 Thread Jake Mannix
Hey all, Our "flash-freeze" has unthawed considerably, but I've been trying to be good and not check in stuff with functionality improvements I've been wanting to check in. What do you folks say about me checking in fixes for MAHOUT-312 DistributedRowMatrix iterateAll() and iterate() don't wo

[jira] Resolved: (MAHOUT-315) VectorDumper should also do printing to simple {index : value, index : value, ... } output, if no dictionary is specified.

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-315. Resolution: Fixed Fix Version/s: (was: 0.4) 0.3 Committed. > VectorD

Re: 0.3 release issues

2010-03-05 Thread Drew Farris
Ahh, a JIRA issue, I should have thought of that. Thanks Benson. Drew On Fri, Mar 5, 2010 at 12:59 PM, Benson Margulies wrote: > https://issues.apache.org/jira/browse/HADOOP-6617 > > > On Fri, Mar 5, 2010 at 9:32 AM, Grant Ingersoll wrote: > >> Has anyone filed a JIRA with them to do so? >> >>

Re: 0.3 release issues

2010-03-05 Thread Benson Margulies
https://issues.apache.org/jira/browse/HADOOP-6617 On Fri, Mar 5, 2010 at 9:32 AM, Grant Ingersoll wrote: > Has anyone filed a JIRA with them to do so? > > The Extremely Esteemed PMC Chair (aka Paper Pusher Extraordinaire), > Grant > > On Mar 5, 2010, at 7:28 AM, Benson Margulies wrote: > > > Co

Re: 0.3 release issues

2010-03-05 Thread Benson Margulies
Coming Right Up. On Fri, Mar 5, 2010 at 9:32 AM, Grant Ingersoll wrote: > Has anyone filed a JIRA with them to do so? > > The Extremely Esteemed PMC Chair (aka Paper Pusher Extraordinaire), > Grant > > On Mar 5, 2010, at 7:28 AM, Benson Margulies wrote: > > > Could I be stupid for a moment? Our

Re: 0.3 release issues

2010-03-05 Thread Grant Ingersoll
Has anyone filed a JIRA with them to do so? The Extremely Esteemed PMC Chair (aka Paper Pusher Extraordinaire), Grant On Mar 5, 2010, at 7:28 AM, Benson Margulies wrote: > Could I be stupid for a moment? Our fellow Apache project, Hadoop, makes > releases but doesn't bother to stick them into

Re: 0.3 release issues

2010-03-05 Thread Drew Farris
On Fri, Mar 5, 2010 at 7:28 AM, Benson Margulies wrote: > > Are we proposing to just do their work for them, or to publish them under a > Mahout-specific Maven triple? The former would be the better of the two. > If the latter, I would ask our esteemed PMC > chair to make a personal

Re: 0.3 release issues

2010-03-05 Thread Benson Margulies
Could I be stupid for a moment? Our fellow Apache project, Hadoop, makes releases but doesn't bother to stick them into the Apache repo where they will replicate to central? Are we proposing to just do their work for them, or to publish them under a Mahout-specific Maven triple? If the latter, I wo

Re: 0.3 release issues

2010-03-05 Thread Drew Farris
OK, Sean. I still need to take a final pass over the pom: I'll replace the placeholder version variables with real versions and switch the group to be org.apache.mahout.hadoop (as before) instead of org.apache.hadoop. Once the pom's in good shape, I'll switch the dependency in mahout, do a full re

Re: 0.3 release issues

2010-03-05 Thread Sean Owen
If you have the mojo ready and working, commit? Sounds OK to me. On Fri, Mar 5, 2010 at 5:14 AM, Drew Farris wrote: > Ok, the unit tests completed successfully with this setup. I suspect > this means we probably want to deploy our own dependency for hadoop > 0.20.2 with the proper versions specif