[jira] Commented: (MAHOUT-369) Issues with DistributedLanczosSolver output

2010-04-25 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860690#action_12860690 ] Jake Mannix commented on MAHOUT-369: Danny, thanks for looking into this so caref

[jira] Assigned: (MAHOUT-369) Issues with DistributedLanczosSolver output

2010-04-25 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix reassigned MAHOUT-369: -- Assignee: Jake Mannix > Issues with DistributedLanczosSolver out

[jira] Commented: (MAHOUT-364) [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop

2010-04-19 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858659#action_12858659 ] Jake Mannix commented on MAHOUT-364: Moving this discussion over to MAHOUT-383

Re: AbstractVector.minus(Vector)

2010-04-19 Thread Jake Mannix
52 AM, Sean Owen wrote: > On Mon, Apr 19, 2010 at 5:33 PM, Jake Mannix > wrote: > > result.times(-1.0) > > with > > result.assign(Functions.negate) > > Cool, good one. > > > The efficiency points are twofold: number of nonzero elements, and > > the impl: y

Re: AbstractVector.minus(Vector)

2010-04-19 Thread Jake Mannix
On Mon, Apr 19, 2010 at 9:13 AM, Sean Owen wrote: > More on Vector, as I'm browsing through it: > > AbstractVector.minus(Vector) says: > //snip > The stanza after the instanceof checks can just become the body of an > overriding method in these two subclasses right? > Yep, sure. > Since we'

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
On Sun, Apr 18, 2010 at 3:23 PM, Sean Owen wrote: > On Sun, Apr 18, 2010 at 11:16 PM, Jake Mannix > wrote: > > VectorWritable currently is a proper decorator, right? It doesn't even > > implement Vector at all. > > Yeah, the other *Writable classes should be as we

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
at is just math, not labels. That does begin to make things > > complex, though. > > > > This polymorphism pain makes putting the name into the vector and > accepting > > whatever strange semantics that result (missing == "" instead of null, > for > > i

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
about making this a decorator pattern rather than subclass. On Sun, Apr 18, 2010 at 7:26 PM, Jake Mannix wrote: > What would be the Wri...

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
What would be the Writable hierarchy with this NamedVector proposal? > > On Apr 18, 2010 11:05 AM, "Sean Owen" wrote: > > On keeping 'name': sure, I ... On Sun, Apr 18, 2010 at 6:45 PM, Jake Mannix wrote: > Ok this is a good con...

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
tes that contract to compile-time checking which seems like a good thing. Am I convincing? On Sun, Apr 18, 2010 at 6:45 PM, Jake Mannix wrote: > Ok this is a good con...

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
istingCanopies emits clusterId :: VectorWritable On 4/18/10 10:07 AM, Jake Mannix wrote: > > In code we already have? > > -jake > > On Apr 18, 2...

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
> > On Sun, Apr 18, 2010 at 4:41 PM, Jake Mannix > wrote: > > Which one is "this"? Wrapping Vector impls into a > > NamedVector/LabeledVector, > > or seeing if we even need the label *inside* of the Vector itself, and > > instead > >

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-18 Thread Jake Mannix
Which one is "this"? Wrapping Vector impls into a NamedVector/LabeledVector, or seeing if we even need the label *inside* of the Vector itself, and instead just having those live in the "key" part of the key-value pair in hadoop, like DistributedRowMatrix has it? -jake On Sun, Apr 18, 2010 at

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-17 Thread Jake Mannix
On Sat, Apr 17, 2010 at 2:14 PM, Robin Anil wrote: > > For this bug, let's put the id back in and remove it from the > comparator/equals. Let's focus on getting the document structure correct > You mean put the 'name' back in? Since Sean has done the initial work of possibly completely removing it

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
On Fri, Apr 16, 2010 at 11:56 AM, Sean Owen wrote: > On Fri, Apr 16, 2010 at 7:39 PM, Jake Mannix > wrote: > > I will start playing around with Anthony's github-based stuff, and > > see where a patch can be made. The question is where it would > > go? It's a

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
On Fri, Apr 16, 2010 at 11:26 AM, Grant Ingersoll wrote: > > On Apr 16, 2010, at 2:21 PM, Jake Mannix wrote: > > > So here's my take: once we're a TLP (next month sometime?), it is > > a good time to start allowing subprojects or submodules which are > > Subm

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
On Fri, Apr 16, 2010 at 11:31 AM, Robin Anil wrote: > > > > > > > > Hmm... this was a bit scattered of a response, but I'm really loathe > > to turn away a) nice hooks between Solr and Mahout, b) scripting-style > > wrappers which could expand our community, and c) simply new > > functionality. >

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
So here's my take: once we're a TLP (next month sometime?), it is a good time to start allowing subprojects or submodules which are "scripting" layers on top of Mahout - whether they are PigLatin, or Cascalog, JRuby, or Clojure. If it's JVM-based, especially, having code/scripts which are "drivers

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
On Fri, Apr 16, 2010 at 7:56 AM, Robin Anil wrote: > On Fri, Apr 16, 2010 at 7:52 PM, Anthony wrote: > > > All, > > > > I have begun work on an integration of Apache Solr and Mahout, > > http://github.com/algoriffic/lsa4solr which is related to #MAHOUT-343 > > (https://issues.apache.org/jira/b

Re: mahout/solr integration

2010-04-16 Thread Jake Mannix
gin implementing a > hierarchical clustering algorithm so that the number of clusters does > not need to be specified in advance. Has anyone done anything like > this in Mahout yet? Also, I'd be happy to contribute the code to > Mahout if anyone is interested. > > Thanks, > An

Re: Having some trouble with SequentialAccessSparseVector.DenseVector

2010-04-15 Thread Jake Mannix
Hey Sean, On Thu, Apr 15, 2010 at 7:16 AM, Sean Owen wrote: > Along the way to a patch for MAHOUT-379, I'm having some trouble > figuring out SequentialAccessSparseVector.DenseVector. I think it can > be simplified, but unless I'm misunderstanding there are several bugs > here. I'd like to find

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-14 Thread Jake Mannix
+1 -jake On Apr 14, 2010 3:20 PM, "Jeff Eastman" wrote: Ted Dunning wrote: > > On Wed, Apr 14, 2010 at 12:53 PM, Sean Owen < sro...@gmail.com> wrote: > > >... +1 from the creator thereof, even. Especially since they never got used.

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

2010-04-14 Thread Jake Mannix
Ok, back on list with this then (Thanks Danny for reminding us to deal with this perennial issue we have!) On Wed, Apr 14, 2010 at 2:26 AM, Sean Owen (JIRA) wrote: > > Yeah let's take some time to get this right. At the moment I see four > notions of equivalence in Vector (which is down from fi

[jira] Commented: (MAHOUT-364) [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop

2010-04-13 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856711#action_12856711 ] Jake Mannix commented on MAHOUT-364: Zoran, Any form of BSD-style license

Re: Status of Mahout TLP

2010-04-12 Thread Jake Mannix
From what Grant said last time we talked about this, we need to wait until the next Apache directors meeting (or whatever it's called) before we move forward with that, I thought. -jake On Mon, Apr 12, 2010 at 2:43 PM, Robin Anil wrote: > Hi everyone, > I am just checking on

Re: [jira] Updated: (MAHOUT-376) Implement Map-reduce version of stochastic SVD

2010-04-11 Thread Jake Mannix
I haven't had a chance to read your attached pdf, but I *have* had a chance to code up an impl of this jira. Patch coming soon. On Apr 11, 2010 6:50 AM, "Ted Dunning (JIRA)" wrote: [ https://issues.apache.org/jira/browse/MAHOUT-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-t

[jira] Commented: (MAHOUT-364) [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop

2010-04-11 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855742#action_12855742 ] Jake Mannix commented on MAHOUT-364: Hi Zoran, Neuroph looks very interes

[jira] Commented: (MAHOUT-369) Issues with DistributedLanczosSolver output

2010-04-08 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855014#action_12855014 ] Jake Mannix commented on MAHOUT-369: Hold on that Sean, I made the loop like that

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-08 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854950#action_12854950 ] Jake Mannix commented on MAHOUT-363: If possible, Shannon, if you could simply

[jira] Commented: (MAHOUT-364) [GSOC] Proposal to implement Neural Network with backpropagation learning on Hadoop

2010-04-06 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854304#action_12854304 ] Jake Mannix commented on MAHOUT-364: I've got to say, this is a fantastic

Re: Proposal: make collections releases independent of the rest of Mahout

2010-04-06 Thread Jake Mannix
I guess I'm fine with whatever, making fast releases of collections is in fact pretty cool, it will give us practice with making releases in mahout in general. And if we can do this for mahout-math as well, some of us who care about, for example, eventually adding unit tests for all of the old Col

Re: Proposal: make collections releases independent of the rest of Mahout

2010-04-06 Thread Jake Mannix
I agree in principle, but having a whole different set of versionings seems kinda... messy? If m-collections goes 1.0, and then 1.1, and then m-math goes 1.0, and core goes to 0.5, we have a whole pile of different version numbers to keep track of. Didn't Lucene and Solr just intentionally do the

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Jake Mannix
Hi Richard, A few notes about what would be required to get a nice distributed SVD recommender in Mahout: if you look at the current distributed recommenders (in org.apache.mahout.cf.taste.hadoop package and children), you can see how it works: using HDFS-backed data, a batch of recommendations

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
thanks. On Sun, Apr 4, 2010 at 10:40 PM, Sean Owen wrote: > Oh OK I'll revert the change then, didn't know you wanted that. Some > of the other statements could probably go but not worth digging > through it. > > On Mon, Apr 5, 2010 at 6:33 AM, Jake Mannix wrote: >

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
table thought but for me collections and Math are just tools to aid > complex algorithms in Mahout core. Maybe we can move it under core and > adding the required logging. > > Robin > > > On Mon, Apr 5, 2010 at 11:03 AM, Jake Mannix > wrote: > > > Umm, I actuall

[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853311#action_12853311 ] Jake Mannix commented on MAHOUT-362: It would be really nice if this code coul

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
Umm, I actually depend pretty heavily on the logging in the SVD solvers. They are very long-running processes, and give off a ton of useful information about what the heck is going on. Reducing dependencies is great, but logging? I think the math stuff could really use logging. I haven't been a

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853308#action_12853308 ] Jake Mannix commented on MAHOUT-363: ... and actually, there is no need for Hama

[jira] Commented: (MAHOUT-350) add one "JobName" and reduceNumber parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

2010-04-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852477#action_12852477 ] Jake Mannix commented on MAHOUT-350: bq. I suppose I hadn't wanted to be pre

Re: Javadocs?

2010-03-30 Thread Jake Mannix
Awesome, thanks guys. Doesn't Maven do this kind of thing for us, if we tell it to? (ie can't we also have daily updates of the 0.4-SNAPSHOT javadocs automagically posted up there too?) -jake On Tue, Mar 30, 2010 at 6:28 AM, Sean Owen wrote: > Done, they're all up under > http://lucene.apac

Javadocs?

2010-03-29 Thread Jake Mannix
Hey gang, Where are the 0.3 javadocs on the web? All I can find right now are the 0.1's. -jake

[jira] Commented: (MAHOUT-350) add one "JobName" and reduceNumber parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

2010-03-29 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851015#action_12851015 ] Jake Mannix commented on MAHOUT-350: Don't the jobs which implement Tool

[jira] Commented: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques

2010-03-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848583#action_12848583 ] Jake Mannix commented on MAHOUT-228: Excellent. The only thing I did to mak

[jira] Updated: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques

2010-03-22 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-228: --- Attachment: MAHOUT-228.patch *bump* I think this is now the third time I'd brought this patch

Re: Stochastic SVD

2010-03-22 Thread Jake Mannix
PM, Ted Dunning wrote: > You are probably right. I had a wild hare tromp through my thoughts the > other day saying that one pass should be possible, but I can't reconstruct > the details just now. > > On Mon, Mar 22, 2010 at 6:00 PM, Jake Mannix > wrote: > > > I gues

Re: Stochastic SVD

2010-03-22 Thread Jake Mannix
> > On Mon, Mar 22, 2010 at 4:38 PM, Jake Mannix > wrote: > > > If you could help get MAHOUT-228 finished and put in trunk, we could > > quickly move forward on MAHOUT-309. I think this can be done in > > possibly only 2 MR passes, but we can chat about that a bit more > > as we dig into it. :) > > >

Re: Stochastic SVD

2010-03-22 Thread Jake Mannix
Hi Dmitriy, Stochastic SVD is high on my list of pieces to get into Mahout as well, but is partly dependent on getting some of Ted's murmurhash stuff from the SGD work he's got sitting idle in a patch on MAHOUT-228. If you could help get MAHOUT-228 finished and put in trunk, we could quickly

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-03-22 Thread Jake Mannix
And to provide some options, I can say that I would certainly help, Sisir, with getting the distributed SVD into the recommender framework. While there is not much fundamental "computer science" left to that, there is a fair amount of high-performance *engineering* left to do in that direction.

Re: git or svn

2010-03-22 Thread Jake Mannix
Dmitriy, The official apache repository (where the committers write to) is the subversion repo. Git is just a clone/read-only mirror. But since you're not writing to either of them, use whichever you are more comfortable working with. :) -jake On Mon, Mar 22, 2010 at 1:03 PM, D L wrote:

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-03-22 Thread Jake Mannix
Hi Sisir, I'm the one who added the most recent SVD stuff to Mahout, and while I'd love to see improvements in that, and incorporation into netflix-style recommenders, I would even more like to see a stacked-RBM implementation if you think you can do that. We don't currently have anything like

Re: [VOTE] Mahout as TLP

2010-03-19 Thread Jake Mannix
+1 -jake

Re: Significance of "name" in AbstractVector

2010-03-18 Thread Jake Mannix
Hi Pallavi, I personally agree that keeping the name as part of the mathematical vector is wrong, because it leads to not only the issues you've brought up, but also means we still have these *4* different ways of saying that two vectors are "the same": ==, equals(), equivalent(), and strictEqui

Re: Significance of "name" in AbstractVector

2010-03-17 Thread Jake Mannix
On Wed, Mar 17, 2010 at 6:14 AM, Jeff Eastman wrote: > Pallavi Palleti wrote: > >> Hi, >> >> Could some one kindly let me know the significance of instance variable >> "name" in AbstractVector? It is causing problems, when I write a vector to >> file and read and compare with the same vector if th

[jira] Commented: (MAHOUT-337) Don't serialize cached length squared in JSON vector representation

2010-03-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845519#action_12845519 ] Jake Mannix commented on MAHOUT-337: Depending on how Pallavi was seeing this com

[jira] Commented: (MAHOUT-337) Don't serialize cached length squared in JSON vector representation

2010-03-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845495#action_12845495 ] Jake Mannix commented on MAHOUT-337: [quote] Yes it's possible to fix this b

Re: [NOMINATION] Sean Owen as Mahout PMC Chair

2010-03-15 Thread Jake Mannix
+1 from over here. On Mon, Mar 15, 2010 at 11:36 AM, Drew Farris wrote: > +1 as well. > > On Mon, Mar 15, 2010 at 2:34 PM, Ted Dunning > wrote: > > Dang. I can only second second it now. > > > > On Mon, Mar 15, 2010 at 11:28 AM, Robin Anil > wrote: > > > >> I second the nomination. > >> > >>

[jira] Commented: (MAHOUT-337) Don't serialize cached length squared in JSON vector representation

2010-03-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845434#action_12845434 ] Jake Mannix commented on MAHOUT-337: So a question about this: do we really want t

Re: [DISCUSS] Mahout TLP Board Resolution

2010-03-15 Thread Jake Mannix
+1 and I'm in (my email @apache is just jmannix btw, for some reason it's not listed on those resolutions) On Mar 15, 2010 9:07 AM, "Robin Anil" wrote: I'm in :) :thumbs up: On Mon, Mar 15, 2010 at 8:01 PM, Grant Ingersoll wrote: > Now that 0.3 is almost out and also given discussions over on

Re: A mahout logo Revamp

2010-03-14 Thread Jake Mannix
+1 for RC1 On Sun, Mar 14, 2010 at 8:18 AM, Sean Owen wrote: > Coming in late -- I like RC1 too, and for all the reasons given on this > thread. > > To be honest, I had always thought that was a stylish cap on the > elephant in the original logo, which I liked, but which made no sense > when I t

Re: A mahout logo Revamp

2010-03-13 Thread Jake Mannix
I'm in favor most of 5) - blue color for both the mahout and "mahout", yellow elephant for hadoop, and not with hair because I don't think we should genderify the mahout: machine learning mahouts can be female too! (I'd be ok with arms but no hair, however) -jake On Sat, Mar 13, 2010 at 3:48 PM

Re: [VOTE]: release Mahout 0.3 (resend, I forgot gene...@lucene.apache.org)

2010-03-12 Thread Jake Mannix
Won't vote for it before I get a chance to really play with the artifacts, which I can't do just yet. -jake On Fri, Mar 12, 2010 at 2:33 PM, Benson Margulies wrote: > Sean's problems aside, we're not exactly drowning in +1 votes for the > release here ... >

Re: Release process status report

2010-03-11 Thread Jake Mannix
Benson for President pro-tempore of Mahout Release! +1 On Thu, Mar 11, 2010 at 6:19 PM, Ted Dunning wrote: > Should we vote Benson in? > > On Thu, Mar 11, 2010 at 5:04 PM, Benson Margulies >wrote: > > > 1) Vote someone as release manager. This shifts liability from the person > > to > > the fo

[jira] Updated: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-07 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-322: --- Fix Version/s: (was: 0.3) pulling this out of the track for 0.3 > DistributedRowMatrix sho

[jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842213#action_12842213 ] Jake Mannix commented on MAHOUT-322: It should actually be noted that Danny's

[jira] Resolved: (MAHOUT-314) DistributedRowMatrix needs a sparse DistributedRowMatrix times(DistributedRowMatrix other) implementation

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-314. Resolution: Fixed Fix Version/s: 0.3 Committed. Current implementation is a map-side

[jira] Resolved: (MAHOUT-313) DistributedRowMatrix needs times(Vector) implementation as M/R job

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-313. Resolution: Fixed Fix Version/s: 0.3 Committed, code piggybacks on timesSquared() with a

[jira] Resolved: (MAHOUT-310) LanczosSolver and DistributedLanczosSolver always assume rectangular input, but should also handle symmetric eigensystems.

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-310. Resolution: Fixed Fix Version/s: 0.3 committed > LanczosSolver

[jira] Resolved: (MAHOUT-312) DistributedRowMatrix iterateAll() and iterate() don't work on multi-part SequenceFiles

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-312. Resolution: Fixed Fix Version/s: 0.3 committed > DistributedRowMatrix iterateAll()

Re: svd algorithms

2010-03-05 Thread Jake Mannix
Hi Mike, Welcome to the long journey down the road of dimensional reduction. :) On Fri, Mar 5, 2010 at 5:05 PM, mike bowles wrote: > > Really large matrices require using one of the randomizing methods to get > done. "Require" is a strong term. Really really large (but still sparse!) matric

Re: 0.3 Patches

2010-03-05 Thread Jake Mannix
Ha! Perpetual code freeze to get new features, now there's a concept! Three +1's, ok if I don't get any negative feedback before I get back to a computer, I'll check in. I've just added more wiki pages for me to write, too I guess... -jake On Mar 5, 2010 2:53 PM, "Jeff Eastman" wrote: Robi

Re: Who owns mahout bucket on s3?

2010-03-05 Thread Jake Mannix
On Thu, Mar 4, 2010 at 7:41 AM, Robin Anil wrote: > Based on what I have in mind, the usage will just be > > mahout vectorize -i s3://input -o s3://output -tmp hdfs://file (here, there > is a risk of fixing an exact path and not knowing the hadoop user, I would > have preferred a relative path) >

0.3 Patches

2010-03-05 Thread Jake Mannix
Hey all, Our "flash-freeze" has unthawed considerably, but I've been trying to be good and not check in stuff with functionality improvements I've been wanting to check in. What do you folks say about me checking in fixes for MAHOUT-312 DistributedRowMatrix iterateAll() and iterate() don't wo

[jira] Resolved: (MAHOUT-315) VectorDumper should also do printing to simple {index : value, index : value, ... } output, if no dictionary is specified.

2010-03-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-315. Resolution: Fixed Fix Version/s: (was: 0.4) 0.3 Committed

Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix
Er... I'm not sure I even had 3 "Yes/No" questions, Ted... On Thu, Mar 4, 2010 at 9:43 AM, Ted Dunning wrote: > Yes. > > Maybe. > > Yes. > > On Thu, Mar 4, 2010 at 9:33 AM, Jake Mannix wrote: > > > Ok, you're just saying that you can *hav

[jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841403#action_12841403 ] Jake Mannix commented on MAHOUT-322: "not continuous" also meaning unbo

Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix
). -jake > > On Thu, Mar 4, 2010 at 8:59 AM, Jake Mannix wrote: > > > On Thu, Mar 4, 2010 at 8:54 AM, Ted Dunning > wrote: > > > > > I haven't examined the out-of-core scenarios at all, but in-memory, it > is > > > possible to have labels

Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix
On Thu, Mar 4, 2010 at 8:54 AM, Ted Dunning wrote: > I haven't examined the out-of-core scenarios at all, but in-memory, it is > possible to have labels with no performance cost if you assume add the > constraint that labeled matrices are only conformable if they share the > identical label dicti

[jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841382#action_12841382 ] Jake Mannix commented on MAHOUT-322: Meaning what, Robin? We can certainly com

[jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

2010-03-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841371#action_12841371 ] Jake Mannix commented on MAHOUT-322: The implementation of some of the orig

The new improved command-line: MahoutDriver (get it?)

2010-03-02 Thread Jake Mannix
Hey all, Just an update on the new-and-improved command-line "UI" we have now. After a ton of iterations back and forth with Drew (thanks!), MAHOUT-301 has been committed, and brings with it the easy ability to trim down your long long command lines for most of our *Driver main() methods, by sa

[jira] Updated: (MAHOUT-311) Update assemblies to include components of launcher script from MAHOUT-301

2010-03-02 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-311: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed > Update assemblies

[jira] Resolved: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-03-02 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-301. Resolution: Fixed Checked in a version of this which works, not sure if it had the most updated

Re: Assign hack slowdown

2010-03-02 Thread Jake Mannix
On Tue, Mar 2, 2010 at 5:21 AM, Sean Owen wrote: > I'll have a look there. May be worth piling in one more little thing > like this in the 'code freeze'. > > Incidentally Hadoop announced version 0.20.2 a few days ago -- still > looking for it on Maven but I will be starting up our release proces

Re: Assign hack slowdown

2010-03-02 Thread Jake Mannix
Adding a skipZero() method to all the functions is probably better here, because that will be faster than an instanceof check, and easier to document than other interfaces. On Tue, Mar 2, 2010 at 1:22 AM, Sean Owen wrote: > How about merely a flag/method on BinaryFunction / UnaryFunction? or > D

Re: [jira] Created: (MAHOUT-315) VectorDumper should also do printing to simple {index : value, index : value, ... } output, if no dictionary is specified.

2010-03-01 Thread Jake Mannix
does this I think, though it has not yet been wired into > > ClusterDumper.printClusters. I wanted to give the ClusterDumper users a > > chance to critique my formatting but it is like the below. > > > > Jeff > > > > > > > > Jake Mannix (JIRA) wrot

[jira] Updated: (MAHOUT-310) LanczosSolver and DistributedLanczosSolver always assume rectangular input, but should also handle symmetric eigensystems.

2010-03-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-310: --- Attachment: MAHOUT-lots.diff I hope we get this release out soon, I've got a giant pile of

[jira] Created: (MAHOUT-319) SVD solvers should be gracefully stoppable/restartable

2010-03-01 Thread Jake Mannix (JIRA)
: 0.3 Reporter: Jake Mannix Assignee: Jake Mannix Fix For: 0.4 LanczosSolver, DistributedLanczosSolver, and HebbianSolver all keep copious amounts of memory-resident data which is lost if the app crashes or is killed (OOM, forgetting to run in a screen

Re: Who owns mahout bucket on s3?

2010-02-28 Thread Jake Mannix
ar 1, 2010 at 8:56 AM, Jake Mannix wrote: > What's the final size...

Re: Who owns mahout bucket on s3?

2010-02-28 Thread Jake Mannix
What's the final size of the vectorized output? -jake On Feb 28, 2010 6:47 PM, "Robin Anil" wrote: Finally some good news tried with cloudera 4 node c1.medium on 6 GB compressed (26GB uncompressed Wikipedia) org.apache.mahout.text.SparseVectorsFromSequenceFiles -i wikipedia/ -o wikipedia-unigra

[jira] Created: (MAHOUT-316) CardinalityException and IndexException should remove the default constructor, and always construct with arguments saying what the error was

2010-02-28 Thread Jake Mannix (JIRA)
Key: MAHOUT-316 URL: https://issues.apache.org/jira/browse/MAHOUT-316 Project: Mahout Issue Type: Improvement Components: Math Affects Versions: 0.2 Reporter: Jake Mannix Fix For: 0.4 CardinalityException already has

[jira] Created: (MAHOUT-315) VectorDumper should also do printing to simple {index : value, index : value, ... } output, if no dictionary is specified.

2010-02-28 Thread Jake Mannix (JIRA)
URL: https://issues.apache.org/jira/browse/MAHOUT-315 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Jake Mannix Assignee: Jake Mannix Fix For: 0.4 I've got a patch for this, tied up in other code. --

[jira] Created: (MAHOUT-314) DistributedRowMatrix needs a sparse DistributedRowMatrix times(DistributedRowMatrix other) implementation

2010-02-28 Thread Jake Mannix (JIRA)
/jira/browse/MAHOUT-314 Project: Mahout Issue Type: New Feature Affects Versions: 0.3 Reporter: Jake Mannix Assignee: Jake Mannix If the matrix which is being multiplied by has been transformed into a column-sparse matrix backed by a SequenceFile

[jira] Created: (MAHOUT-313) DistributedRowMatrix needs times(Vector) implementation as M/R job

2010-02-28 Thread Jake Mannix (JIRA)
Feature Affects Versions: 0.3 Reporter: Jake Mannix Assignee: Jake Mannix pretty self-explanatory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.

[jira] Created: (MAHOUT-312) DistributedRowMatrix iterateAll() and iterate() don't work on multi-part SequenceFiles

2010-02-28 Thread Jake Mannix (JIRA)
T-312 Project: Mahout Issue Type: Bug Affects Versions: 0.3 Reporter: Jake Mannix Assignee: Jake Mannix DistributedRowMatrixIterator does not properly handle file glob paths of the various part-0 files.

Re: Who owns mahout bucket on s3?

2010-02-28 Thread Jake Mannix
I thought you were doing the secondary sort idea? That's certainly the way to make sure you need nothing significant kept in memory, and this clearly won't scale without that optimization... I'd say this should get fixed before we release 0.3 -jake On Sun, Feb 28, 2010 at 7:30 AM, Drew Farris

Re: Who owns mahout bucket on s3?

2010-02-27 Thread Jake Mannix
> up in like 4 hours. So bye > > Robin > > > On Sun, Feb 28, 2010 at 3:57 AM, Robin Anil wrote: > > > like i said only 5 mil articles. Maybe you can generate a co-occurrence > > matrix :) every ngram to every other ngram :) Sounds fun? It will be > HUGE! >

Re: Who owns mahout bucket on s3?

2010-02-27 Thread Jake Mannix
ly 5 mil articles. Maybe you can generate a co-occurrence > matrix :) every ngram to every other ngram :) Sounds fun? It will be HUGE! > > > On Sun, Feb 28, 2010 at 3:43 AM, Jake Mannix > wrote: > > > 15GB of tokenized documents, not bad, not bad. We're not going > >

Re: Who owns mahout bucket on s3?

2010-02-27 Thread Jake Mannix
15GB of tokenized documents, not bad, not bad. We're not going to get a multi-billion entry matrix out of this though, are we? -jake On Sat, Feb 27, 2010 at 2:06 PM, Robin Anil wrote: > Update: > > in 20 mins the tokenization stage is complete But it's not evident in the > online UI. > I foun

Re: Who owns mahout bucket on s3?

2010-02-27 Thread Jake Mannix
> On Sun, Feb 28, 2010 at 3:04 AM, Jake Mannix > wrote: > > > Er, the one you posted! > > > > > > > > > http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5] > > > > < > > > http://mahout-wikipedia.s3.amazo

Re: Who owns mahout bucket on s3?

2010-02-27 Thread Jake Mannix
you tried > > > On Sun, Feb 28, 2010 at 2:59 AM, Jake Mannix > wrote: > > > Hey Robin, that http url gives me a permission denied response... I'm not > > too S3 savvy, not sure if I'm checking on it right... > > > > On Sat, Feb 27, 2010 at 12:40 PM, Robin
