DenseMatrix and [][]

2009-05-29 Thread Benson Margulies
Folks, in my experience, you can save a good deal of memory and some time by implementing a 2D matrix as a single vector indexed with a multiplication, instead of a [][] array, which results in a vector of pointers to vectors. Would a patch for this purpose be viewed as helpful, or was my experience anomalous?
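As a rough illustration of the idea, here is a minimal Java sketch of a dense matrix backed by one flat `double[]` in row-major order, where element (row, col) lives at offset `row * columns + col`. The class name `FlatDenseMatrix` and its methods are hypothetical and are not Mahout's `DenseMatrix` API. Compared with `double[][]`, it allocates one array object instead of one per row. That avoids the per-row object headers and the outer array of references, and it keeps the rows contiguous in memory.

```java
/**
 * Hypothetical sketch: a dense matrix stored in one flat double[] in
 * row-major order, instead of a double[][] (an array of row arrays).
 */
final class FlatDenseMatrix {
    private final int rows;
    private final int columns;
    private final double[] values; // rows * columns entries, row-major

    FlatDenseMatrix(int rows, int columns) {
        if (rows < 0 || columns < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.columns = columns;
        // multiplyExact throws instead of silently overflowing int
        this.values = new double[Math.multiplyExact(rows, columns)];
    }

    // Element (row, col) lives at offset row * columns + col.
    private int offset(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= columns) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * columns + col;
    }

    double get(int row, int col) {
        return values[offset(row, col)];
    }

    void set(int row, int col, double value) {
        values[offset(row, col)] = value;
    }

    int numRows() {
        return rows;
    }

    int numCols() {
        return columns;
    }

    public static void main(String[] args) {
        FlatDenseMatrix m = new FlatDenseMatrix(2, 3);
        m.set(1, 2, 5.0);
        System.out.println(m.get(1, 2)); // prints 5.0
        System.out.println(m.get(0, 0)); // prints 0.0
    }
}
```

The explicit bounds check in `offset` matters. Without it, `get(0, 3)` on a 2 by 3 matrix would silently read element (1, 0), whereas a `double[][]` would throw.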

Canopy implementation

2009-05-29 Thread Benson Margulies
I've looked at the implementation of Canopy in DisplayKMeans, and then tried to compare it to the MapReduce version. I'm sure that the simple version in DisplayKMeans has the potential to loop indefinitely. I can't prove one way or the other whether the problem exists in the real map/reduce code. Th
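For reference, here is a minimal Java sketch of sequential canopy clustering. It assumes Euclidean distance and thresholds with `t1 > t2 >= 0`, and it is not the DisplayKMeans code or the MapReduce code. It illustrates one way to rule out an infinite loop: remove each chosen center from the candidate list unconditionally, rather than relying on some distance test to remove it. `CanopySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of sequential canopy clustering; not Mahout's code.
 * Assumes Euclidean distance and thresholds with t1 > t2 >= 0.
 */
final class CanopySketch {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Groups points into canopies; the first point of each canopy is its center. */
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (!(t1 > t2 && t2 >= 0.0)) {
            throw new IllegalArgumentException("expected t1 > t2 >= 0");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Remove the center unconditionally: every pass shrinks the list by at
            // least one point, so the loop terminates even when t2 == 0 and no
            // other point is ever removed.
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    canopy.add(p); // loosely close: member of this canopy
                }
                if (d < t2) {
                    it.remove();   // tightly close: can never become a center
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0});
        points.add(new double[] {1.0});
        points.add(new double[] {10.0});
        System.out.println(canopies(points, 3.0, 1.5).size()); // prints 2
    }
}
```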

Re: Text pipeline, or what comes before (ids on vectors)?

2009-05-29 Thread Ted Dunning
This is a great thing to have. In general, I think that rows and columns of matrices should be labeled. A vector is either a row or a column and thus it should have a label and its elements should have labels. It should not cost you anything if you don't want labels and I do. Tracing back to a
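One way to make labels free for callers who don't want them is a decorator. Plain vectors carry no label fields, and a wrapper adds a vector label (for example a document id) plus a shared term-to-column dictionary for element labels. All type names here (`SimpleVector`, `ArrayVector`, `LabeledVector`) are hypothetical. This is only a sketch of the design, not Mahout's Vector API.

```java
import java.util.Map;

/** Hypothetical minimal vector interface; not Mahout's Vector API. */
interface SimpleVector {
    int size();

    double get(int index);
}

/** Plain vector backed by an array: it carries no label fields at all. */
final class ArrayVector implements SimpleVector {
    private final double[] values;

    ArrayVector(double... values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

/**
 * Decorator that adds a vector label (such as a document id) and element
 * labels (such as terms). Only callers who want labels pay for the wrapper.
 */
final class LabeledVector implements SimpleVector {
    private final SimpleVector delegate;
    private final String label;
    // Shared term -> column dictionary; many vectors can reference one map.
    private final Map<String, Integer> columnOf;

    LabeledVector(SimpleVector delegate, String label, Map<String, Integer> columnOf) {
        this.delegate = delegate;
        this.label = label;
        this.columnOf = columnOf;
    }

    String label() {
        return label;
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public double get(int index) {
        return delegate.get(index);
    }

    /** Looks up an element by its label; throws if the label is unknown. */
    double get(String elementLabel) {
        Integer index = columnOf.get(elementLabel);
        if (index == null) {
            throw new IllegalArgumentException("unknown element label: " + elementLabel);
        }
        return delegate.get(index);
    }
}
```

Because the element dictionary is passed by reference, every row of a document-term matrix can share a single map. The per-vector overhead is then one wrapper object and one label string.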

Text pipeline, or what comes before (ids on vectors)?

2009-05-29 Thread Benson Margulies
I think I have a grip on this, but I'm not quite sure. Please forgive me if I'm confused and this belongs back on 'users'. Drivers like the K-Means driver eat vectors and emit clusters of vectors. The vectors don't carry any sort of field that can be used to back-associate them with documents or o

[jira] Commented: (MAHOUT-129) Kmeans sample does not expose numIterations control from KMeansDriver

2009-05-29 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714632#action_12714632 ] Jeff Eastman commented on MAHOUT-129: The KMeansDriver numCentroids argument is incorr

[jira] Resolved: (MAHOUT-129) Kmeans sample does not expose numIterations control from KMeansDriver

2009-05-29 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman resolved MAHOUT-129. Resolution: Fixed. Rename completed.

Re: warnings

2009-05-29 Thread Benson Margulies
If it is Class, then we need an instance of Class> and I don't know how to get one. On May 29, 2009, at 3:36 PM, Ted Dunning wrote: Probably needs Class What mumble should be is tricky to say without looking at the code. Object is a candidate. On Fri, May 29, 2009 at 6:16 AM, Sean Owen

[jira] Updated: (MAHOUT-127) Remove warnings

2009-05-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-127: Priority: Minor (was: Major); Affects Version/s: 0.2; Fix Version/s: 0.2

[jira] Updated: (MAHOUT-127) Remove warnings

2009-05-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-127: Attachment: warnings.patch. How about this patch? Removed the serialVersionUID and tweaked what appeared t

[jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714604#action_12714604 ] Benson Margulies commented on MAHOUT-127: OK, I'm convinced that I should turn off

Re: [jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Sean Owen
I can't find any documentation that 1L is a special value that prompts the JVM to construct its own value. I do find some posts to the contrary, like http://www.nabble.com/serialVersionUID-td23001300.html and https://jira.jboss.org/jira/browse/SECURITY-341 Setting it to 1L is worse than setti
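The point about 1L can be checked directly with `java.io.ObjectStreamClass`. A declared `serialVersionUID` is used exactly as written. Only when the field is absent does serialization compute a default from the class's structure (its name, modifiers, interfaces, and members). The class names below are made up for the demonstration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Demonstrates how serialVersionUID is resolved; class names are made up. */
final class SerialVersionDemo {

    /** Declares a UID explicitly: serialization uses 1L exactly as written. */
    static final class WithExplicitUid implements Serializable {
        private static final long serialVersionUID = 1L;
        int field;
    }

    /** Declares no UID: serialization computes a default from the class structure. */
    static final class WithoutUid implements Serializable {
        int field;
    }

    public static void main(String[] args) {
        long explicit = ObjectStreamClass.lookup(WithExplicitUid.class).getSerialVersionUID();
        long computed = ObjectStreamClass.lookup(WithoutUid.class).getSerialVersionUID();
        System.out.println(explicit); // prints 1
        // A hash over the class name, modifiers, interfaces, and members; it
        // typically changes when the class's shape changes.
        System.out.println(computed);
    }
}
```

So declaring `1L` does not ask the JVM to compute anything. It pins the version, and a later incompatible change to the class will not be caught by a UID mismatch unless someone bumps the number by hand.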

Re: warnings

2009-05-29 Thread Sean Owen
"?" and "? extends Object" should be equivalent and yeah nothing seems to work for any value of mumble. On Fri, May 29, 2009 at 8:36 PM, Ted Dunning wrote: > Probably needs Class

Re: warnings

2009-05-29 Thread Ted Dunning
Probably needs Class What mumble should be is tricky to say without looking at the code. Object is a candidate. On Fri, May 29, 2009 at 6:16 AM, Sean Owen wrote: > Anyway, yeah if you make the type parameter Class, then it complains that, essentially, Class.class is not of this type, which

Re: maven-shade-plugin

2009-05-29 Thread Benson Margulies
The problem with the ant stuff is that it is a lot of work for an end-user to copy and incorporate outside of the mahout tree. So, to make the example more exemplary, shade is attractive. On Fri, May 29, 2009 at 12:56 PM, Grant Ingersoll wrote: > Feel free to submit a patch. The ant approach wor

[jira] Updated: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-128: Attachment: pom.diff. Here's the right patch.

[jira] Updated: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-128: Attachment: (was: pom.diff)

[jira] Updated: (MAHOUT-129) Kmeans sample does not expose numIterations control from KMeansDriver

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-129: Component/s: (was: Collaborative Filtering) Clustering Summary:

[jira] Updated: (MAHOUT-129) Kmeans sample does not expose numIterations control from KMeansDriver

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-129: Description: The KMeans driver forces the numReduceTasks parameter of KMeans to 1, and ther

[jira] Commented: (MAHOUT-129) kmeans sample makes one cluster

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714518#action_12714518 ] Benson Margulies commented on MAHOUT-129: This patch isn't quite right, as follows

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714515#action_12714515 ] Grant Ingersoll commented on MAHOUT-126: Shashikant, a couple of comments on the Luc

[jira] Updated: (MAHOUT-129) kmeans sample makes one cluster

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-129: Attachment: kmeans-patch.diff

[jira] Created: (MAHOUT-129) kmeans sample makes one cluster

2009-05-29 Thread Benson Margulies (JIRA)
kmeans sample makes one cluster. Key: MAHOUT-129 URL: https://issues.apache.org/jira/browse/MAHOUT-129 Project: Mahout Issue Type: Bug Components: Collaborative Filtering Affects Versions: 0.2

[jira] Commented: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714510#action_12714510 ] Benson Margulies commented on MAHOUT-128: Yes, the problem is that the pom isn't in

Re: maven-shade-plugin

2009-05-29 Thread Grant Ingersoll
Feel free to submit a patch. The ant approach works for now, but I'm open to better ways of doing it. On May 29, 2009, at 12:01 PM, Benson Margulies wrote: http://maven.apache.org/plugins/maven-shade-plugin/ I'll post an example of this once I get through the puzzle of how to map input to a

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714509#action_12714509 ] Grant Ingersoll commented on MAHOUT-126: So just kind of brainstorming here, but I

[jira] Commented: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714499#action_12714499 ] Grant Ingersoll commented on MAHOUT-128: I'm not following what this does. When I

Re: maven-shade-plugin

2009-05-29 Thread Benson Margulies
http://maven.apache.org/plugins/maven-shade-plugin/ I'll post an example of this once I get through the puzzle of how to map input to a SparseVector instead of a vector. On Fri, May 29, 2009 at 11:57 AM, Grant Ingersoll wrote: > Link? > On May 29, 2009, at 11:51 AM, Benson Margulies wrote:

Re: maven-shade-plugin

2009-05-29 Thread Grant Ingersoll
Link? On May 29, 2009, at 11:51 AM, Benson Margulies wrote: Any reason not to use maven-shade-plugin to make jobs?

maven-shade-plugin

2009-05-29 Thread Benson Margulies
Any reason not to use maven-shade-plugin to make jobs?

[jira] Updated: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-128: Attachment: pom.diff

[jira] Created: (MAHOUT-128) maven parent not included in build

2009-05-29 Thread Benson Margulies (JIRA)
maven parent not included in build. Key: MAHOUT-128 URL: https://issues.apache.org/jira/browse/MAHOUT-128 Project: Mahout Issue Type: Bug Reporter: Benson Margulies The maven parent isn't inclu

[jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714487#action_12714487 ] Benson Margulies commented on MAHOUT-127: Are you sure? Eclipse claims that coding

Re: [jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Sean Owen
By "default ID" do you mean no serialVersionUID? Then I agree. Setting any particular value, though, would introduce the concern I mentioned. And fixing it to a particular value like 1 would be even more problematic. On May 29, 2009 4:11 PM, "Benson Margulies (JIRA)" wrote: [ https://issues.apach

[jira] Updated: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-127: Attachment: warnings.diff Here is a patch with less gratuitous changes. Perhaps few enough?

[jira] Updated: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-127: Attachment: (was: warnings.diff)

[jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714480#action_12714480 ] Benson Margulies commented on MAHOUT-127: The cast changes were overuse of the Ecl

[jira] Commented: (MAHOUT-127) Remove warnings

2009-05-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714477#action_12714477 ] Sean Owen commented on MAHOUT-127: I like the changes you mentioned, like @SuppressWarn

[jira] Updated: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-127: Attachment: warnings.diff

[jira] Created: (MAHOUT-127) Remove warnings

2009-05-29 Thread Benson Margulies (JIRA)
Remove warnings Key: MAHOUT-127 URL: https://issues.apache.org/jira/browse/MAHOUT-127 Project: Mahout Issue Type: Bug Reporter: Benson Margulies The patch I'm about to attach gets rid of all the current yellow

Re: warnings

2009-05-29 Thread Sean Owen
Ah right. I am not sure why I don't see a warning on that anymore in IntelliJ 9. Anyway, yeah if you make the type parameter Class, then it complains that, essentially, Class.class is not of this type, which doesn't make sense to me. I agree, it is worth @SuppressWarnings. Send over your patch an
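The complaint about `Class.class` has a concrete cause. Its static type is `Class<Class>` with a raw type argument, and Java will not convert that directly to `Class<Class<?>>`. Assuming the parameter type is `Class<?>`, a common workaround is to widen to `Class<?>` and then apply an unchecked cast, isolated in one helper annotated with `@SuppressWarnings("unchecked")`. The names `AbstractParameterSketch` and `ClassParameterSketch` are hypothetical stand-ins, because the real `ClassParameter` and `AbstractParameter` signatures are not shown in this thread.

```java
/** Hypothetical stand-in for AbstractParameter; the real signature may differ. */
abstract class AbstractParameterSketch<T> {
    private final Class<T> type;

    protected AbstractParameterSketch(Class<T> type) {
        this.type = type;
    }

    Class<T> getType() {
        return type;
    }
}

/** Hypothetical stand-in for ClassParameter, parameterized by Class<?>. */
final class ClassParameterSketch extends AbstractParameterSketch<Class<?>> {

    ClassParameterSketch() {
        // Class.class has type Class<Class> (raw argument), which the compiler
        // rejects where a Class<Class<?>> is required.
        super(classOfClass());
    }

    // Widen to Class<?>, then narrow with an unchecked cast. This is safe because
    // the only object involved is the Class object for java.lang.Class, and the
    // suppression is confined to this one method.
    @SuppressWarnings("unchecked")
    private static Class<Class<?>> classOfClass() {
        return (Class<Class<?>>) (Class<?>) Class.class;
    }
}
```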

Re: warnings

2009-05-29 Thread Benson Margulies
Eclipse always warns for a naked use of Class. So, Eclipse thinks, more or less, that any time you might want to say X, you really meant to say X>, or X>, which is in some cases the same sort of beast. However, this is not one of those cases. I think that an @SuppressWarnings is called for. O

Re: warnings

2009-05-29 Thread Sean Owen
I don't think so... that class doesn't need a new parameter. It is an AbstractParameter, parameterized by Class, already. Actually, I opened this up again in IntelliJ and no longer see the unchecked-cast sort of warnings I remember on this code. I certainly recall something like that here. What's

Re: warnings

2009-05-29 Thread Benson Margulies
Should that class really be ClassParameter instead of just ClassParameter? On Fri, May 29, 2009 at 8:13 AM, Sean Owen wrote: > I personally would be very into it. I use IntelliJ and yeah it yells > about a whole lot of the same stuff every time I open a file. > > I also cannot figure out what to

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-126: Attachment: mahout-126-benson.patch Improved patch. Allows specification of file character
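Specifying the file character encoding explicitly avoids depending on the platform default. Below is a minimal sketch with standard `java.nio` calls. `CharsetAwareReader` and `readAll` are hypothetical names, not what the patch uses. Note that `Files.newBufferedReader` reports malformed input with an `IOException` (a `MalformedInputException`) instead of silently substituting characters, so decoding with the wrong charset tends to fail loudly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads text with a caller-specified charset. */
final class CharsetAwareReader {

    /** Reads the file line by line with the given charset, appending '\n' after each line. */
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        try {
            // Write a word containing an e-acute as ISO-8859-1 bytes, then
            // decode it with the same charset.
            Files.write(tmp, "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1));
            System.out.print(readAll(tmp, StandardCharsets.ISO_8859_1));
        } finally {
            Files.delete(tmp);
        }
    }
}
```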

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714426#action_12714426 ] Benson Margulies commented on MAHOUT-126: This patch needs to explicitly manage th

Re: warnings

2009-05-29 Thread Sean Owen
I personally would be very into it. I use IntelliJ and yeah it yells about a whole lot of the same stuff every time I open a file. I also cannot figure out what to do with that Class line! On Fri, May 29, 2009 at 1:11 PM, Benson Margulies wrote: > Can I interest you in a patch to zero out the ec

warnings

2009-05-29 Thread Benson Margulies
Can I interest you in a patch to zero out the Eclipse 3.4 warning count? This would involve deleting some unused variables, removing some unnecessary @SuppressWarnings annotations, and in one case (ClassParameter) adding a @SuppressWarnings, since I can't for the life of me figure out how to make that into clas

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714414#action_12714414 ] Grant Ingersoll commented on MAHOUT-126: See SOLR-1193.

[jira] Assigned: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-126: Assignee: Grant Ingersoll

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714401#action_12714401 ] Grant Ingersoll commented on MAHOUT-126: Passing in a way to make a custom weight o
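A pluggable weight could be as small as a single-method interface that receives term and document frequencies. The sketch below assumes that shape. `TermWeight`, `Weights`, and the parameter list are hypothetical, not the API in the MAHOUT-126 patch, and the tf-idf variant uses the plain `ln(numDocs / docFreq)` idf without smoothing.

```java
/** Hypothetical weighting hook; the name and parameters are assumptions. */
interface TermWeight {
    /**
     * @param termFreq occurrences of the term in the document
     * @param docFreq  number of documents containing the term (at least 1)
     * @param numDocs  total number of documents in the corpus
     */
    double weight(int termFreq, int docFreq, int numDocs);
}

/** Two example weights; a caller could pass any TermWeight instead. */
final class Weights {
    /** Raw term frequency; ignores corpus statistics. */
    static final TermWeight TF = (tf, df, n) -> tf;

    /** tf * idf, with unsmoothed idf = ln(numDocs / docFreq). */
    static final TermWeight TF_IDF = (tf, df, n) -> tf * Math.log((double) n / df);

    private Weights() {
    }
}
```

With this shape, a term that appears in every document gets weight 0 under `TF_IDF`, and a caller that wants raw counts could pass `Weights.TF`.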

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread David Hall (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714362#action_12714362 ] David Hall commented on MAHOUT-126: Sure, I just want to be able to have: double weig

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714356#action_12714356 ] Shashikant Kore commented on MAHOUT-126: David, Sorry, I don't have any background

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Kore updated MAHOUT-126: Attachment: MAHOUT-126.patch. Patch to create index and document vectors from text.

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread David Hall (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714348#action_12714348 ] David Hall commented on MAHOUT-126: I actually need something like this as well for LDA,

[jira] Created: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Shashikant Kore (JIRA)
Prepare document vectors from the text. Key: MAHOUT-126 URL: https://issues.apache.org/jira/browse/MAHOUT-126 Project: Mahout Issue Type: New Feature Reporter: Shashikant Kore Clustering al
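The core step of turning text into document vectors can be sketched in two parts. First, assign each distinct term a column index in a dictionary. Then count term occurrences per document into a sparse vector. This minimal Java sketch assumes documents arrive already tokenized (no analysis, stemming, or stop-word removal). It stores each sparse vector as a sorted `TreeMap` from column to count, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tokenized documents -> sparse term-count vectors. */
final class DocumentVectorizer {
    // term -> column index, shared by every document vectorized so far
    private final Map<String, Integer> dictionary = new HashMap<>();

    /** Returns the term's column, assigning the next free index on first sight. */
    int columnFor(String term) {
        return dictionary.computeIfAbsent(term, t -> dictionary.size());
    }

    /** Counts tokens into a sparse vector stored as column -> count, sorted by column. */
    TreeMap<Integer, Integer> vectorize(List<String> tokens) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(columnFor(token), 1, Integer::sum);
        }
        return counts;
    }

    int dictionarySize() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        DocumentVectorizer v = new DocumentVectorizer();
        System.out.println(v.vectorize(Arrays.asList("to", "be", "or", "not", "to", "be"))); // prints {0=2, 1=2, 2=1, 3=1}
        System.out.println(v.vectorize(Arrays.asList("be", "quick")));                      // prints {1=1, 4=1}
    }
}
```

Because the dictionary is shared across calls, the second document reuses column 1 for "be" and only adds a new column for "quick".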