VectorWritable bug et al

2010-02-11 Thread Sean Owen
I'm writing up an appendix on Vector and Matrix. In the course of this, I noticed a big problem with VectorWritable. It is pretty glaringly un-thread-safe. It caches, in a static member, the class of the vector to be read. The read method is not synchronized. Oops. Synchronization fixes this, but

[jira] Commented: (MAHOUT-236) Cluster Evaluation Tools

2010-02-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832615#action_12832615 ] Grant Ingersoll commented on MAHOUT-236: I don't have any, but should be pretty eas

Re: VectorWritable bug et al

2010-02-11 Thread Jake Mannix
This code was copied right out of AbstractVector, I never really understood why we had to do that caching. On Thu, Feb 11, 2010 at 9:10 AM, Sean Owen wrote: > I'm writing up an appendix on Vector and Matrix. In the course of > this, I noticed a big problem with VectorWritable. It is pretty > gla

Re: VectorWritable bug et al

2010-02-11 Thread Jake Mannix
As a side note: for this appendix, we've got lots more stuff coming down the pipe regarding distributed / HDFS-backed matrices too, which is going to be pretty critical to be covered in this appendix (see latest patches for MAHOUT-180). On Thu, Feb 11, 2010 at 9:10 AM, Sean Owen wrote: > I'm wri

[jira] Assigned: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-185: -- Assignee: Grant Ingersoll > Add mahout shell script for easy launching of various algor

[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832626#action_12832626 ] Grant Ingersoll commented on MAHOUT-185: Looks like a good start. Longer term, we

[jira] Updated: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-185: --- Affects Version/s: (was: 0.2) Fix Version/s: (was: 0.4)

[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832661#action_12832661 ] Grant Ingersoll commented on MAHOUT-185: Committed revision 909120. > Add mahout s

Re: VectorWritable bug et al

2010-02-11 Thread Sean Owen
On Thu, Feb 11, 2010 at 6:37 PM, Jake Mannix wrote: > Why would the sparse representation be the only way to represent it > on disk?  It's nearly twice as big as the dense form for dense vectors > (ok, 50% bigger). On disk (well, in any serialized form) you just have key-value, key-value pairs in

Re: VectorWritable bug et al

2010-02-11 Thread Ted Dunning
On Thu, Feb 11, 2010 at 11:51 AM, Sean Owen wrote: > On Thu, Feb 11, 2010 at 6:37 PM, Jake Mannix > wrote: > > Why would the sparse representation be the only way to represent it > > on disk? It's nearly twice as big as the dense form for dense vectors > > (ok, 50% bigger). > > On disk (well, i

Re: VectorWritable bug et al

2010-02-11 Thread Jake Mannix
On Thu, Feb 11, 2010 at 11:51 AM, Sean Owen wrote: > On Thu, Feb 11, 2010 at 6:37 PM, Jake Mannix > wrote: > > Where do we actually use the VectorWritable.readVector() static > > method? > > Looks like it's used in about 16 places across the code. > We should remove them, I think. I'm pretty s

Re: VectorWritable bug et al

2010-02-11 Thread Drew Farris
+1 to eliminating the statics, they are indeed evil. The type to read should be stored in the thing doing/facilitating the reading not the vector itself and definitely not in a static field. Pretty sure vector shouldn't be facilitating the reading of itself. No need for synchronization then. The st
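
The suggestion above (keep the type to read in the object doing the reading, not in a static field) might look roughly like the following. This is only a sketch under stated assumptions, not Mahout's actual code: SimpleVector, DenseSimpleVector, VectorReader and VectorReaderDemo are hypothetical names invented for illustration, and the length-prefixed wire format is assumed. Because each reader instance carries its own factory and no static state is shared, two threads using separate readers cannot overwrite each other's cached type, so no synchronization is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-in for a vector type; this is NOT Mahout's Vector interface.
interface SimpleVector {
    void readFields(DataInput in) throws IOException;
    double get(int index);
    int size();
}

// Hypothetical dense implementation: an int length followed by that many doubles.
final class DenseSimpleVector implements SimpleVector {
    private double[] values = new double[0];

    @Override
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        double[] read = new double[n];
        for (int i = 0; i < n; i++) {
            read[i] = in.readDouble();
        }
        values = read;
    }

    @Override
    public double get(int index) {
        return values[index];
    }

    @Override
    public int size() {
        return values.length;
    }
}

// The reader, not the vector and not a static field, knows which type to create.
final class VectorReader<V extends SimpleVector> {
    private final Supplier<V> factory; // per-instance state, never shared statically

    VectorReader(Supplier<V> factory) {
        this.factory = factory;
    }

    V read(DataInput in) throws IOException {
        V vector = factory.get(); // fresh instance for every call
        vector.readFields(in);
        return vector;
    }
}

final class VectorReaderDemo {
    // Helper that produces bytes in the assumed format (length, then values).
    static byte[] encode(double... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        VectorReader<DenseSimpleVector> reader = new VectorReader<>(DenseSimpleVector::new);
        DenseSimpleVector v =
            reader.read(new DataInputStream(new ByteArrayInputStream(encode(1.0, 2.5))));
        System.out.println(v.size() + " elements, second = " + v.get(1));
    }
}
```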

Freq. Pattern Mining page?

2010-02-11 Thread Grant Ingersoll
Robin, Any chance you could add a page on FPM on http://cwiki.apache.org/MAHOUT/algorithms.html? I'm trying to find out more about it, but don't see much for documentation. Thanks, Grant

Re: VectorWritable bug et al

2010-02-11 Thread Ted Dunning
Seems like Avro is a great way to manage this enum (as in, we don't have to think about it). On Thu, Feb 11, 2010 at 12:31 PM, Drew Farris wrote: > +1 to eliminating class names in serializations (this is especially > bad when an efficiently managed enum can do the job) > -- Ted Dunning, CTO
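
For comparison, a hand-rolled version of the "enum instead of class name" idea quoted above could be sketched as follows. This is not Avro and not Mahout code: VectorKind, KindHeader and KindHeaderDemo are hypothetical names, and the one-byte tag layout is an assumption made for illustration. The point is only that a fixed tag costs one byte per record, whereas a serialized class name costs its full string length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical set of vector kinds. Tags are explicit rather than ordinal(),
// so reordering the constants does not change the on-disk format.
enum VectorKind {
    DENSE((byte) 0),
    SPARSE((byte) 1);

    private final byte tag;

    VectorKind(byte tag) {
        this.tag = tag;
    }

    byte tag() {
        return tag;
    }

    static VectorKind fromTag(byte tag) throws IOException {
        for (VectorKind kind : values()) {
            if (kind.tag == tag) {
                return kind;
            }
        }
        throw new IOException("Unknown vector kind tag: " + tag);
    }
}

// Writes and reads the one-byte header that replaces a class-name string.
final class KindHeader {
    private KindHeader() {
    }

    static void write(DataOutput out, VectorKind kind) throws IOException {
        out.writeByte(kind.tag());
    }

    static VectorKind read(DataInput in) throws IOException {
        return VectorKind.fromTag(in.readByte());
    }
}

final class KindHeaderDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        KindHeader.write(new DataOutputStream(bytes), VectorKind.SPARSE);

        VectorKind kind =
            KindHeader.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(kind + " stored in " + bytes.size() + " byte(s)");
    }
}
```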

Re: VectorWritable bug et al

2010-02-11 Thread Drew Farris
On Thu, Feb 11, 2010 at 3:37 PM, Ted Dunning wrote: > Seems like Avro is a great way to manage this enum (as in, we don't have to > think about it). Yes, I hope so. Now that the dictionary vectorizer/n-gram integration is complete I will be getting back to that. Drew

Re: VectorWritable bug et al

2010-02-11 Thread Jake Mannix
Sure, but we're not doing Avro for 0.3, so we should probably at least fix this in some minimal way before another release. -jake On Thu, Feb 11, 2010 at 12:37 PM, Ted Dunning wrote: > Seems like Avro is a great way to manage this enum (as in, we don't have to > think about it). > > On Thu, F

Re: VectorWritable bug et al

2010-02-11 Thread Drew Farris
On Thu, Feb 11, 2010 at 3:40 PM, Jake Mannix wrote: > Sure, but we're not doing Avro for 0.3, so we should probably at least fix > this in some minimal way before another release. Agreed. I'm wondering -- and this would probably be pretty obvious if I just looked at the code (sorry!) -- are the

[jira] Commented: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-02-11 Thread Edward J. Yoon (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832813#action_12832813 ] Edward J. Yoon commented on MAHOUT-180: --- Hi, Quick question. It works using M/R iter

[jira] Commented: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-02-11 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832824#action_12832824 ] Jake Mannix commented on MAHOUT-180: Yes. Multiplication of a matrix (or the square of

logging? log4j?

2010-02-11 Thread Drew Farris
Hi All, java.util.logging is really getting me down - I never really paid much attention to it because I've always used log4j in the past, but it looks like it can't do things like change the format of the logs using a config file, do mapped diagnostic contexts, etc. Does anyone have any issue