Re: Hadoop upgrade

2009-03-18 Thread Grant Ingersoll
D'oh! Thanks! On Mar 18, 2009, at 4:32 AM, Sean Owen wrote: (I upgraded to 0.19.1 last week.) On Tue, Mar 17, 2009 at 10:41 PM, Grant Ingersoll gsing...@apache.org wrote: OK, pending MAHOUT-110, I think we are good to go on the release. Not sure who volunteered to upgrade Hadoop, so go

[jira] Commented: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682980#action_12682980 ] Sean Owen commented on MAHOUT-110: -- I say go for it. I will merge with my patch locally

[jira] Commented: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682983#action_12682983 ] Grant Ingersoll commented on MAHOUT-110: Will do. Thanks for the kick in the pants

[jira] Resolved: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-99. --- Resolution: Fixed Fix Version/s: 0.1 Committed revision 755548. Thanks! Improving

[jira] Updated: (MAHOUT-111) Redirect Test output to file

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-111: --- Affects Version/s: (was: 0.2) 0.1 Redirect Test output to file

[jira] Resolved: (MAHOUT-111) Redirect Test output to file

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-111. Resolution: Fixed Fix Version/s: 0.1 Fixed Redirect Test output to file

Re: Concerns about Maven

2009-03-18 Thread Grant Ingersoll
On Mar 17, 2009, at 9:06 AM, Enis Soztutar wrote: -Grant Knowing nothing about the mahout build script(s), I think that having both ant and maven scripts might prove to be problematic. However keeping one module(taste) in ant will work. As a side note, we have discussed this same thing

Thoughts on ...

2009-03-18 Thread Grant Ingersoll
http://lingpipe-blog.com/2009/03/12/speeding-up-k-means-clustering-algebra-sparse-vectors/ -Grant

[jira] Resolved: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-110. Resolution: Fixed Committed Ant script for building Taste web app

Re: Concerns about Maven

2009-03-18 Thread Enis Soztutar
Grant Ingersoll wrote: On Mar 17, 2009, at 9:06 AM, Enis Soztutar wrote: -Grant Knowing nothing about the mahout build script(s), I think that having both ant and maven scripts might prove to be problematic. However keeping one module(taste) in ant will work. As a side note, we have

Dirchlet Job example

2009-03-18 Thread Grant Ingersoll
Hey Jeff, Is it appropriate to have a Job example like we do for k-means and some of the other clustering algorithms for dirichlet? I see you do have some type of UI in there, right? Are there directions somewhere for running the example?

Re: Thoughts on ...

2009-03-18 Thread Jeff Eastman
Interesting optimization. We can incorporate it by adding a centroid^2 argument to DistanceMeasure interface and adjusting the affected clustering algorithms. All would benefit from this optimization. I will build a test to assess its impact and report. Jeff Grant Ingersoll wrote:
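The linked post is not quoted in this thread, so the following is only a guess at the algebra behind the "centroid^2 argument": squared Euclidean distance expands as ||x - c||^2 = ||x||^2 - 2(x . c) + ||c||^2, so ||c||^2 can be computed once per centroid and passed in rather than recomputed for every point. A minimal sketch under that assumption follows. The class and method names are hypothetical and are not Mahout's DistanceMeasure interface.

```java
/** Hypothetical sketch; none of these names come from Mahout. */
final class SquaredDistanceSketch {

    /** Dot product of two dense vectors of equal length. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Baseline: squared Euclidean distance computed term by term. */
    static double naive(double[] x, double[] c) {
        if (x.length != c.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - c[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Same quantity via ||x||^2 - 2(x . c) + ||c||^2, where both squared
     * norms are supplied by the caller so they can be cached.
     */
    static double withCachedNorms(double[] x, double xNormSq,
                                  double[] c, double cNormSq) {
        return xNormSq - 2.0 * dot(x, c) + cNormSq;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] c = {2.0, 0.0, 1.0};
        double cNormSq = dot(c, c); // computed once per centroid
        double xNormSq = dot(x, x); // computed once per point
        System.out.println("naive:  " + naive(x, c));
        System.out.println("cached: " + withCachedNorms(x, xNormSq, c, cNormSq));
    }
}
```

With sparse points, the dot product only needs to visit the non-zero entries of x, which is where the saving would come from. The expanded form can lose precision when x and c are nearly equal, so a test comparing both paths, as Jeff proposes, seems prudent.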

Re: [jira] Reopened: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Did you reopen this issue because of this error? I just ran the example and it ran without error. Jeff Grant Ingersoll (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-99:

Re: Dirchlet Job example

2009-03-18 Thread Jeff Eastman
Not only appropriate but essential. I will add a README file in the code and instructions in the wiki today. Jeff Grant Ingersoll wrote: Hey Jeff, Is it appropriate to have a Job example like we do for k-means and some of the other clustering algorithms for dirichlet? I see you do have

Re: Dirchlet Job example

2009-03-18 Thread Otis Gospodnetic
Yeah, I was wondering about that simple, but nice cluster-showing UI... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll gsing...@apache.org To: mahout-dev@lucene.apache.org Sent: Wednesday, March 18, 2009 12:01:28 PM

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
I'm running the example in Eclipse using the stand-alone mode in the hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in Eclipse. I cannot, however, get any hadoop stuff to work from the command line. Even though my JAVA_HOME environment is set to /Library/Java/Home and java

mvn package tar file issue

2009-03-18 Thread Otis Gospodnetic
Hi, Am I the only person getting the following after mvn package? [INFO] [ERROR] BUILD ERROR [INFO] [INFO] Failed to create assembly: Error creating

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll
On my Mac, I have: $ echo $JAVA_HOME /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home -Grant On Mar 18, 2009, at 2:10 PM, Jeff Eastman wrote: I'm running the example in Eclipse using the stand-alone mode in the hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683140#action_12683140 ] Grant Ingersoll commented on MAHOUT-99: --- I seem to recall hitting something similar

Re: mvn package tar file issue

2009-03-18 Thread Otis Gospodnetic
Yes, at the top. Bad? Doing it from core worked. How come it doesn't work from root and should it, at least for 0.2? Would be more intuitive, no? Otis - Original Message From: Grant Ingersoll gsing...@apache.org To: mahout-dev@lucene.apache.org Sent: Wednesday, March 18, 2009

Taste: user's neighbours and their similarity

2009-03-18 Thread Otis Gospodnetic
Hi, Is there a way to get a collection of neighbours for a given user? I'm referring to the same neighbour collection that recommendations are derived from. I didn't see a way, so I simply made NearestNUserNeighborhood.Estimator public (diff below), so I could do something like this:

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683170#action_12683170 ] Sean Owen commented on MAHOUT-103: -- 1. How do you feel about, therefore, changing to use

Re: Taste: user's neighbours and their similarity

2009-03-18 Thread Sean Owen
How about the method UserBasedRecommender.mostSimilarUsers()? or a bit more directly, UserNeighborhood.getUserNeighborhood()? (They are arguably kind of redundant but it's 'for historical reasons' and low on my list of design sins.) These in turn largely use TopItems.getTopUsers() and you

Re: Packaging step taking forever... is this right?

2009-03-18 Thread Sean Owen
Took me ~15 minutes the first time, 5 minutes subsequent times. Yeah it still seems long, and does seem like something is amiss, but if it works it seems OK for now. On Wed, Mar 18, 2009 at 9:52 PM, Jeff Eastman j...@windwardsolutions.com wrote: [WARNING] Entry:

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683232#action_12683232 ] Ted Dunning commented on MAHOUT-103: 1. How do you feel about, therefore, changing to

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683238#action_12683238 ] Sean Owen commented on MAHOUT-103: -- The comparison would be to Item. You could say that's

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683251#action_12683251 ] Sean Owen commented on MAHOUT-103: -- The comparison would be to Item. You could say that's

[jira] Commented: (MAHOUT-59) Create some examples of clustering well-known datasets

2009-03-18 Thread Richard Tomsett (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683254#action_12683254 ] Richard Tomsett commented on MAHOUT-59: --- Ugh, I had an example almost done but managed

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683297#action_12683297 ] Pallavi Palleti commented on MAHOUT-99: --- Yup. That must be the issue. But I am

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ] Pallavi Palleti commented on MAHOUT-99: --- I have used KeyValueLineRecordReader

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
The Synthetic Control kMeans job calls the Canopy job to build its initial clusters as is commonly done. If the kMeans record format was changed and the Canopy not changed accordingly, then everything would still compile but there would be a mismatch when the kMeans mapper tried to read in the

RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
Yeah. But I am wondering how the test cases succeeded. I ran them using the mvn clean install command. Thanks Pallavi -Original Message- From: Jeff Eastman [mailto:j...@windwardsolutions.com] Sent: Thursday, March 19, 2009 9:56 AM To: mahout-dev@lucene.apache.org Subject: Re: [jira]

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Sure, why don't you go ahead and post a patch? Pallavi Palleti (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ] Pallavi Palleti commented on MAHOUT-99:

RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
It depends on the kind of output. If we are outputting only numeric values, then a SequenceFile is preferable because the data is written as binary. If not, it is preferable to write simple text. A text file is human-readable, whereas a binary file is not. As we consider the data as
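The binary-versus-text trade-off Pallavi describes can be illustrated with plain JDK classes. This is only a sketch of the general idea and does not use Hadoop's SequenceFile, which also writes headers, keys and sync markers, so its actual sizes would differ. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; plain JDK streams, not Hadoop's SequenceFile. */
final class EncodingSizeSketch {

    /** Fixed-width binary: every double takes exactly 8 bytes. */
    static byte[] asBinary(double[] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Space-separated text line: readable, but width varies per value. */
    static byte[] asText(double[] values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append(' ');
            }
            line.append(values[i]);
        }
        line.append('\n');
        return line.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        double[] values = {0.123456789, 1234.5678, 3.0};
        System.out.println("binary bytes: " + asBinary(values).length);
        System.out.println("text bytes:   " + asText(values).length);
        System.out.print(new String(asText(values), StandardCharsets.UTF_8));
    }
}
```

The binary form has a predictable size and needs no parsing on read, while the text form can be inspected with ordinary tools. Which of the two is smaller depends on how many digits the values carry.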