[jira] Resolved: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil resolved MAHOUT-285. --- Resolution: Fixed Committed. More or less working. Haven't tested against a large dataset just to make

[jira] Assigned: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil reassigned MAHOUT-285: - Assignee: Robin Anil > Wrap up collocation and dictionary vectorizer integration > --

[jira] Updated: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-285: --- Attachment: MAHOUT-285.patch Robin got the bulk of this done yesterday night, reviewed his changes an

[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-02-10 Thread Rohini Uppuluri (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832380#action_12832380 ] Rohini Uppuluri commented on MAHOUT-153: Hi all, I have implemented an extension

Re: possible bug in org.apache.mahout.cf.taste.hadoop.item.RecommenderMapper class

2010-02-10 Thread Sean Owen
Good catch, I will look at this more tonight but I am pretty certain that you are correct. I will commit a fix soon if applicable. On Wed, Feb 10, 2010 at 9:27 PM, Guohua Hao wrote: > Hello All, > > When I studied the code on the trunk, I was wondering that on line 130 in > the class org.apache.m

[jira] Commented: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832263#action_12832263 ] Robin Anil commented on MAHOUT-285: --- Success. I just finished the integration of Dictiona

[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832239#action_12832239 ] Jake Mannix commented on MAHOUT-185: Why don't we just commit the shell script and clos

possible bug in org.apache.mahout.cf.taste.hadoop.item.RecommenderMapper class

2010-02-10 Thread Guohua Hao
Hello All, When I studied the code on the trunk, I was wondering whether, on line 130 of the class org.apache.mahout.cf.taste.hadoop.item.RecommenderMapper, we should use the condition userVector.get(index) == 0.0 instead. My understanding is that only an item which is not rated by the user (i.e.,
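The condition discussed in this message can be illustrated with a small sketch. It is not Mahout's code: the archived snippet is truncated and does not show line 130, so the function name, the dict-based vectors, and the sample data below are all assumptions made for illustration. The sketch only shows the idea that a candidate item is kept when the user's vector holds 0.0 (no rating) at that index.

```python
# Hypothetical sketch, not the RecommenderMapper source.
# user_vector: item index -> rating; a missing index is treated as 0.0 (unrated).
# scores: item index -> estimated preference for that item.

def unrated_candidates(user_vector, scores):
    """Return (index, score) pairs for items the user has not rated, best first."""
    kept = [
        (index, score)
        for index, score in scores.items()
        if user_vector.get(index, 0.0) == 0.0  # keep only unrated items
    ]
    # Highest score first; ties broken by the lower item index.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept


user_vector = {0: 4.0, 2: 3.5}            # the user rated items 0 and 2
scores = {0: 0.9, 1: 0.7, 2: 0.4, 3: 0.8}  # assumed candidate scores
recommendations = unrated_candidates(user_vector, scores)
```

With these assumed inputs, items 0 and 2 are dropped because the user already rated them. Testing for a nonzero value instead would keep exactly the rated items, which appears to be the inversion the message asks about.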

Re: Twister: Iterative MapReduce

2010-02-10 Thread Ted Dunning
Applicable for tiny clusters only. There is no fault tolerance and all data is streamed from map to reduce. There is also no distributed store (they are depending on NFS or local data copies). That is highly effective for algorithms like k-means on small clusters which are I/O bound. Small clus

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Jake Mannix
I actually want to try and see how much runs on Amazon EMR (0.18.3*), as that would be good to document. I like running on 0.20 better, and I certainly think we should recommend people use it, but there are certainly some jobs which simply won't run on 0.18, although it would be good to document w

Re: Some more dependencies

2010-02-10 Thread Ted Dunning
Right. On Wed, Feb 10, 2010 at 10:45 AM, Robin Anil wrote: > On Thu, Feb 11, 2010 at 12:10 AM, Ted Dunning > wrote: > > > The use of MapMaker should probably be updated to use the same object > from > > google collections (which is now in guava). > > > So without watchmaker making that change n

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Ted Dunning
+1 from me even though I am still on 19 at work. On Wed, Feb 10, 2010 at 3:53 AM, Isabel Drost wrote: > On Wed Sean Owen wrote: > > > I'd say we recommend 0.20, since that's what we develop against and > > it's the current stable release, and everything we have works on it. > > > > We can also

Re: Some more dependencies

2010-02-10 Thread Robin Anil
On Thu, Feb 11, 2010 at 12:10 AM, Ted Dunning wrote: > The use of MapMaker should probably be updated to use the same object from > google collections (which is now in guava). > So without watchmaker making that change nothing could be done right > > On Wed, Feb 10, 2010 at 9:27 AM, Robin Anil

Re: Some more dependencies

2010-02-10 Thread Ted Dunning
The use of MapMaker should probably be updated to use the same object from google collections (which is now in guava). On Wed, Feb 10, 2010 at 9:27 AM, Robin Anil wrote: > Kicked the two files out. We can always bring it back as its in the > repository > > > > On Wed, Feb 10, 2010 at 10:56 PM, J

Fwd: Twister: Iterative MapReduce

2010-02-10 Thread Robin Anil
Well, things seem to be heating up. We'd better start refactoring :) Robin -- Forwarded message -- From: Jaliya Ekanayake Date: Wed, Feb 10, 2010 at 11:37 PM Subject: Twister: Iterative MapReduce To: common-...@hadoop.apache.org Hi All, We would like to announce the first op

[jira] Commented: (MAHOUT-227) Parallel SVM

2010-02-10 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832089#action_12832089 ] Ted Dunning commented on MAHOUT-227: Zhao, My thought is that having a good sequentia

Re: Some more dependencies

2010-02-10 Thread Robin Anil
Kicked the two files out. We can always bring them back, as they're in the repository. On Wed, Feb 10, 2010 at 10:56 PM, Jeff Eastman wrote: > Robin Anil wrote: > >> any more +1s ? >> >> >> > +1 keep Mahout as unentangled as possible >

Re: Some more dependencies

2010-02-10 Thread Jeff Eastman
Robin Anil wrote: any more +1s ? +1 keep Mahout as unentangled as possible

[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms

2010-02-10 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-281: Status: Patch Available (was: Open) > scm urls are wrong in the poms > ---

[jira] Updated: (MAHOUT-281) scm urls are wrong in the poms

2010-02-10 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost updated MAHOUT-281: Attachment: MAHOUT-281.diff Changed scm connection strings. (Needed a comparably simple example to

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Benson Margulies
We could have a profile for that. On Wed, Feb 10, 2010 at 11:17 AM, Drew Farris wrote: > On Wed, Feb 10, 2010 at 6:40 AM, Sean Owen wrote: >> >> We can also say it should work on 0.19 and 0.18, but we don't >> guarantee or support that. (Slightly different than my last suggestion >> -- we don't

[jira] Updated: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-285: --- Attachment: MAHOUT-285.patch Robin, check out the DocumentProcessor integration here, is this what yo

[jira] Commented: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832047#action_12832047 ] Drew Farris commented on MAHOUT-285: Yes, I'm very close on this and should be able to

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Drew Farris
On Wed, Feb 10, 2010 at 6:40 AM, Sean Owen wrote: > > We can also say it should work on 0.19 and 0.18, but we don't > guarantee or support that. (Slightly different than my last suggestion > -- we don't actually know how it all goes on 0.19) > +1 -- we can't really know how it will work unless we

[jira] Commented: (MAHOUT-232) Implementation of sequential SVM solver based on Pegasos

2010-02-10 Thread zhao zhendong (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832022#action_12832022 ] zhao zhendong commented on MAHOUT-232: -- Hi Sean, For Mahout-232, I suppose to finishe

[jira] Commented: (MAHOUT-288) Select only Binary attributes from ARFF format for Bayes Classifier

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831973#action_12831973 ] Sean Owen commented on MAHOUT-288: -- It's up to your judgment about whether it's useful eno

[jira] Commented: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831964#action_12831964 ] Robin Anil commented on MAHOUT-285: --- This won't take much time, nor does it depend on anyth

[jira] Commented: (MAHOUT-288) Select only Binary attributes from ARFF format for Bayes Classifier

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831961#action_12831961 ] Robin Anil commented on MAHOUT-288: --- It's a hacky solution for 0.3, just to get ARFF runni

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Wed Sean Owen wrote: > I'd say we recommend 0.20, since that's what we develop against and > it's the current stable release, and everything we have works on it. > > We can also say it should work on 0.19 and 0.18, but we don't > guarantee or support that. (Slightly different than my last sug

[jira] Commented: (MAHOUT-285) Wrap up collocation and dictionary vectorizer integration

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831958#action_12831958 ] Sean Owen commented on MAHOUT-285: -- Do you guys think the current patch is committable? or

[jira] Updated: (MAHOUT-232) Implementation of sequential SVM solver based on Pegasos

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-232: - Fix Version/s: (was: 0.3) 0.4 This is evidently linked to MAHOUT-227 and so pushes

[jira] Updated: (MAHOUT-227) Parallel SVM

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-227: - Fix Version/s: (was: 0.3) 0.4 Moving to 0.4 per Zhao's comment > Parallel SVM > -

Re: Some more dependencies

2010-02-10 Thread Isabel Drost
On Wed Jake Mannix wrote: > > May I kick them out? > > > > +1 +1 from me as well. Isabel

[jira] Updated: (MAHOUT-288) Select only Binary attributes from ARFF format for Bayes Classifier

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-288: - Fix Version/s: (was: 0.3) 0.4 For 0.4 right? we shouldn't be opening anything agai

[jira] Updated: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-185: - Fix Version/s: (was: 0.3) 0.4 This timed out for 0.3 methinks > Add mahout shell

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Sean Owen
I'd say we recommend 0.20, since that's what we develop against and it's the current stable release, and everything we have works on it. We can also say it should work on 0.19 and 0.18, but we don't guarantee or support that. (Slightly different than my last suggestion -- we don't actually know ho

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Wed, 10 Feb 2010 11:10:41 + Sean wrote: > For simplicity, I'd document that Mahout works on 0.19 and 0.20, and > may work on 0.18 +1 Assuming that the majority of the algorithms may work on e.g. 0.19, we could tell users something along the lines of "works with Hadoop 0.19, except $algor

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Robin Anil
fpm is based purely on the 0.20.x API and works perfectly fine on it. On Wed, Feb 10, 2010 at 4:40 PM, Sean wrote: > For simplicity, I'd document that Mahout works on 0.19 and 0.20, and > may work on 0.18. That's more what people need to know, rather than > confuse the issue with talk of old/new

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Sean
For simplicity, I'd document that Mahout works on 0.19 and 0.20, and may work on 0.18. That's more what people need to know, rather than confuse the issue with talk of old/new APIs, since even I am confused about what's going on. The two are blending together, while one is deprecated, and it causes

Re: Mahout 0.3 Plan and other changes

2010-02-10 Thread Isabel Drost
On Thu deneche abdelhakim wrote: > although I maintain two versions of Decision Forests, one with the old > api and with the new one, the differences between the two APIs are so > important that I can't just keep working on the two versions. Thus all > the new stuff is being committed using the ne

Mahout Usage and Beyond

2010-02-10 Thread Robin Anil
Hi Mahouters, I am trying to find out how you are using Mahout for your work or project, and which of the algorithms in Mahout are most important for that work. And finally, what do you expect to see in Mahout (a kind of wish list)? It won't take much of your time. Please reply with

[jira] Commented: (MAHOUT-153) Implement kmeans++ for initial cluster selection in kmeans

2010-02-10 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831926#action_12831926 ] Shashikant Kore commented on MAHOUT-153: Pallavi, I can see two potential improvem

[jira] Updated: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-02-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-180: --- Attachment: MAHOUT-180.patch Adds an EigenVerificationJob, which takes just as long as the Distribut

Re: Some more dependencies

2010-02-10 Thread Sean
Yes, I imagine lots of the code in there can be removed. On Wed, Feb 10, 2010 at 8:50 AM, Robin Anil wrote: > any more +1s ? >

[jira] Created: (MAHOUT-288) Select only Binary attributes from ARFF format for Bayes Classifier

2010-02-10 Thread Robin Anil (JIRA)
Select only Binary attributes from ARFF format for Bayes Classifier --- Key: MAHOUT-288 URL: https://issues.apache.org/jira/browse/MAHOUT-288 Project: Mahout Issue Type: Sub-tas

[jira] Commented: (MAHOUT-286) Need to be able to run classifiers from non-text input (such as ARFF data)

2010-02-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831912#action_12831912 ] Robin Anil commented on MAHOUT-286: --- I will have to move this to 0.4. Bayes classifier on

[jira] Created: (MAHOUT-287) Bayes Classifier should use Vector as input

2010-02-10 Thread Robin Anil (JIRA)
Bayes Classifier should use Vector as input --- Key: MAHOUT-287 URL: https://issues.apache.org/jira/browse/MAHOUT-287 Project: Mahout Issue Type: Improvement Components: Classification Af

[jira] Updated: (MAHOUT-286) Need to be able to run classifiers from non-text input (such as ARFF data)

2010-02-10 Thread Martin Häger (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Häger updated MAHOUT-286: Attachment: run.sh data.training.arff data.arff Attaching: * data.

Re: Some more dependencies

2010-02-10 Thread Robin Anil
any more +1s ?

Re: Some more dependencies

2010-02-10 Thread Robin Anil
In case we do need multithreading, all the algos should be reusable in that framework without any code modification. And I have a feeling Hadoop will strive to improve multicore processor utilisation. Robin On Wed, Feb 10, 2010 at 2:13 PM, Jake Mannix wrote: > On Wed, Feb 10, 2010 at 12:39

Re: Some more dependencies

2010-02-10 Thread Jake Mannix
On Wed, Feb 10, 2010 at 12:39 AM, Robin Anil wrote: > Smp.java is not used anywhere. > SmpBlas is used at one place and could be replaced by Sequential version. > In > Mahout we don't need to run multithreading anyways. Assuming our allegiance > is to Hadoop M/R. and a map job shouldn't be doing f

Re: Some more dependencies

2010-02-10 Thread Robin Anil
Smp.java is not used anywhere. SmpBlas is used in one place and could be replaced by the Sequential version. In Mahout we don't need to run multithreading anyway, assuming our allegiance is to Hadoop M/R, and a map job shouldn't be doing further splitting of work. May I kick them out? Robin On Wed, Fe