Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
This was on the Reuters collection (real sparse vectors). On Sat, Feb 20, 2010 at 11:31 AM, Jake Mannix jake.man...@gmail.com wrote: On Fri, Feb 19, 2010 at 3:56 PM, Robin Anil robin.a...@gmail.com wrote: Another tidbit: The getDistanceSquared of AbstractVector is much faster than the

Re: Welcome Drew Farris

2010-02-20 Thread Isabel Drost
On 18.02.2010 Drew Farris wrote: I'm looking forward to working with you all, Welcome to the Mahout community, Drew. Looking forward to working with you. Isabel

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Jeff Eastman
+1 to upgrade; addTo did not exist when clustering was written. Should be pretty easy to upgrade it though. Robin Anil wrote: Ah! It's not being used anywhere :). Should we make that a big task before 0.3? Sweep through the code (mainly clustering) and change all these things. Robin On Fri, Feb

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
Hi Jeff, I will take care of Canopy and KMeans. If you can take a look at the others, it would be great. I have kept the issue open here: https://issues.apache.org/jira/browse/MAHOUT-297 Robin On Sat, Feb 20, 2010 at 5:44 PM, Jeff Eastman j...@windwardsolutions.com wrote: +1 to upgrade, addTo

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Jeff Eastman
Will do. I'm on a jet back to CA tomorrow for 11 hrs and will do it then. You doing fuzzyK too? Jeff Robin Anil wrote: Hi Jeff, I will take care of Canopy and KMeans. If you can take a look at the others, it would be great. I have kept the issue open here

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
FuzzyK is creating problems as is. Still, for Reuters it is converging to the same point as is; I tried m=1,2,3,4 with no difference. I found one slowdown though (that is, distance calculation with the centroid as the second parameter; it's much faster with the centroid as the first parameter). Better we tackle

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Sean Owen
I know I silently fixed a similar error a while ago, and someone else mentioned such an error before. This would be the third time. This seems like a dangerous optimization if competent developers have overlooked it consistently. Is it such a performance win that it justifies a likely bug in the

[jira] Commented: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-02-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836159#action_12836159 ] Sean Owen commented on MAHOUT-180: It's looking good to me, from a cursory visual

[jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836162#action_12836162 ] Sean Owen commented on MAHOUT-299: Broadly it looks fine to me, especially as proven by

[jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836163#action_12836163 ] Sean Owen commented on MAHOUT-300: Tiny stuff -- in things like dotSelf(), you don't need

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
+1 for more tests for the Vector implementations. Really, if vectors start acting weirdly there is no way we can debug an ML algorithm, and less so on top of a distributed system. As Grant once said, debugging such a system would result in loss of hair. I am OK with pulling out caching

[jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836167#action_12836167 ] Robin Anil commented on MAHOUT-300: I removed the hasNoElements check as per Sean's and Ted's

[jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836169#action_12836169 ] Robin Anil commented on MAHOUT-300: An issue I found here was for empty dense vectors

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Jake Mannix
On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil robin.a...@gmail.com wrote: +1 for more tests for the Vector implementations. Really, if vectors start acting weirdly there is no way we can debug an ML algorithm, and less so on top of a distributed system. As Grant once said, debugging such a

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
On Sat, Feb 20, 2010 at 8:55 PM, Jake Mannix jake.man...@gmail.com wrote: On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil robin.a...@gmail.com wrote: +1 for more tests for the Vector implementations. Really, if vectors start acting weirdly there is no way we can debug an ML algorithm, and less so

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Jake Mannix
On Sat, Feb 20, 2010 at 7:27 AM, Robin Anil robin.a...@gmail.com wrote: And we do have v1.plus() vs. v1.plusMutable() - the latter is addTo(). What about other things like minus, divide etc etc Those methods all return copies, and the mutable versions are simply generalizations of the
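
The copy-versus-mutate distinction discussed in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: `TinyVector`, `plus`, and `addInPlace` are hypothetical names invented here and are not Mahout's actual Vector API, whose exact signatures are not shown in the thread.

```java
import java.util.Arrays;

/** Hypothetical dense vector used only for illustration; not Mahout's Vector interface. */
final class TinyVector {
    private final double[] values;

    TinyVector(double... values) {
        this.values = values.clone(); // defensive copy so callers cannot alias our storage
    }

    double get(int index) {
        return values[index];
    }

    /** Copying variant: returns a new vector and leaves both operands unchanged. */
    TinyVector plus(TinyVector other) {
        TinyVector result = new TinyVector(values); // the constructor clones the array
        result.addInPlace(other);
        return result;
    }

    /** Mutating variant: adds other into this vector without allocating a new one. */
    void addInPlace(TinyVector other) {
        if (other.values.length != values.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        for (int i = 0; i < values.length; i++) {
            values[i] += other.values[i];
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        TinyVector a = new TinyVector(1.0, 2.0);
        TinyVector b = new TinyVector(10.0, 20.0);
        TinyVector sum = a.plus(b); // a is unchanged here
        a.addInPlace(b);            // a itself now holds the sum
        System.out.println(sum + " " + a);
    }
}
```

The mutating form avoids one allocation per call, which matters in inner loops such as centroid updates, at the cost of the aliasing hazards raised elsewhere in the thread.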

[jira] Resolved: (MAHOUT-180) port Hadoop-ified Lanczos SVD implementation from decomposer

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-180. Resolution: Fixed Committed revision 912134. Wiki on usage forthcoming. port Hadoop-ified

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
Adding to that, current tests don't cover all cases at all levels of sparseness and across multiple implementations (Seq.fn(Rand), Rand.fn(Dense), and so on), so we need to add a framework which does that. Robin On Sat, Feb 20, 2010 at 9:10 PM, Jake Mannix jake.man...@gmail.com wrote: On Sat, Feb

[jira] Updated: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-300: Attachment: MAHOUT-300.patch Solve performance issues with Vector Implementations

[jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836207#action_12836207 ] Drew Farris commented on MAHOUT-299: Thanks for the review Sean, I'll get it committed

Re: [jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Jake Mannix
Personally I'm a fan of judicious use of static imports if readability is good (esp. if there's only one class you're statically importing from), because who writes java code without an ide? Just my two cents. On Feb 20, 2010 9:08 AM, Drew Farris (JIRA) j...@apache.org wrote: [

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836209#action_12836209 ] Drew Farris commented on MAHOUT-301: This is pretty nice, it gets to the point where

Need comments on Proposal for linear SVM framework (Google Summer of Code 2010)

2010-02-20 Thread zhao zhendong
Hi all, Robin told me about this great chance to keep contributing code here (many thanks to Robin). Because I am still working on Sequential SVM (MAHOUT-232) and would like to extend it into a unified framework that incorporates some other state-of-the-art linear SVM classifiers, I propose Linear Support

[jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836224#action_12836224 ] Drew Farris commented on MAHOUT-299: bq. I'd not throw RuntimeException -

test output in mahout-utils?

2010-02-20 Thread Drew Farris
I've noticed that output and testdata directories are being created in mahout-utils -- does anyone know where they're coming from? The eclipse svn client wants to add them of course which is why it's bugging me -- and I can set svn:ignore on them or figure out how to change the tests so that

more svn:ignore

2010-02-20 Thread Drew Farris
While I'm on the subject of svn:ignore, does anyone have a problem if I set svn:ignore on the various detritus eclipse litters all over the projectspace -- e.g: .settings, .classpath, .project

Re: test output in mahout-utils?

2010-02-20 Thread Robin Anil
Many of the clustering and classification algorithms use these dirs for tests. Sean had suggested earlier that we move away from them and use temp directories. It's not changed yet. Robin On Sun, Feb 21, 2010 at 12:06 AM, Drew Farris drew.far...@gmail.com wrote: I've noticed that output and testdata

Re: more svn:ignore

2010-02-20 Thread Robin Anil
+1 On Sun, Feb 21, 2010 at 12:09 AM, Drew Farris drew.far...@gmail.com wrote: While I'm on the subject of svn:ignore, does anyone have a problem if I set svn:ignore on the various detritus eclipse litters all over the projectspace -- e.g: .settings, .classpath, .project

Re: [jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Drew Farris
On Sat, Feb 20, 2010 at 12:23 PM, Jake Mannix jake.man...@gmail.com wrote: Personally I'm a fan of judicious use of static imports if readability is good (esp. if there's only one class you're statically importing from), because who writes java code without an ide? Just my two cents. I

Re: test output in mahout-utils?

2010-02-20 Thread Robin Anil
MAHOUT-301 will help track this, so we won't miss it next time. On Sun, Feb 21, 2010 at 12:09 AM, Robin Anil robin.a...@gmail.com wrote: Many of the clustering and classification algorithms use these dirs for tests. Sean had suggested earlier that we move away from them and use temp directories.

[jira] Created: (MAHOUT-302) Change tests to use temp directories instead of output, testdata

2010-02-20 Thread Robin Anil (JIRA)
Change tests to use temp directories instead of output, testdata Key: MAHOUT-302 URL: https://issues.apache.org/jira/browse/MAHOUT-302 Project: Mahout Issue Type: Task
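
As a rough illustration of what this ticket asks for, a test can create a fresh scratch directory and always delete it afterwards instead of writing to fixed `output` and `testdata` directories in the source tree. The sketch below assumes Java 7 or later (`java.nio.file`); `TempDirSketch` and `withTempDir` are hypothetical names and are not taken from Mahout's test code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;

/** Hypothetical helper; a sketch of the temp-directory idea, not Mahout's implementation. */
final class TempDirSketch {

    /** Creates a scratch directory, runs the test body against it, then always deletes it. */
    static Path withTempDir(Consumer<Path> body) {
        try {
            Path dir = Files.createTempDirectory("mahout-test-");
            try {
                body.accept(dir);
            } finally {
                deleteRecursively(dir);
            }
            return dir; // returned only so callers can verify the cleanup
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes children before parents by visiting paths in reverse sorted order. */
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) {
        Path used = withTempDir(dir -> System.out.println("writing test output under " + dir));
        System.out.println("still exists afterwards: " + Files.exists(used));
    }
}
```

Because nothing is written inside the working copy, there is nothing left over for an svn client to offer to add.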

[jira] Assigned: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris reassigned MAHOUT-299: Assignee: Drew Farris Collocations: improve performance by making Gram BinaryComparable

[jira] Updated: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-299: Resolution: Fixed Status: Resolved (was: Patch Available) resolved in r912189

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Jake Mannix
Ah, you mean SequentialAccessVector.assign(RandomAccessVector, BinaryFunction map), etc? Yes, we do need to make sure all combinations are properly checked for that in the unit tests. We need a Jira ticket for this too! :) -jake On Sat, Feb 20, 2010 at 8:05 AM, Robin Anil
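
One way to picture the "all combinations" testing discussed here is a loop over every ordered pair of implementations at several sparsity levels, comparing each result with a plain-array reference. The sketch below is only an assumption about how such a framework might look: `Vec`, `ArrayVec`, and `SortedMapVec` are hypothetical stand-ins for the dense, random-access, and sequential-access implementations, and `dot` stands in for whichever operation is under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.IntFunction;

/** Hypothetical cross-implementation check; not Mahout's test framework. */
final class CrossImplementationCheck {

    /** Minimal vector contract invented for this sketch. */
    interface Vec {
        int size();
        double get(int index);
        void set(int index, double value);
    }

    /** Array-backed stand-in for a dense implementation. */
    static final class ArrayVec implements Vec {
        private final double[] values;
        ArrayVec(int size) { values = new double[size]; }
        public int size() { return values.length; }
        public double get(int index) { return values[index]; }
        public void set(int index, double value) { values[index] = value; }
    }

    /** Sorted-map-backed stand-in for a sparse implementation; stores only nonzeros. */
    static final class SortedMapVec implements Vec {
        private final int size;
        private final TreeMap<Integer, Double> entries = new TreeMap<>();
        SortedMapVec(int size) { this.size = size; }
        public int size() { return size; }
        public double get(int index) { return entries.getOrDefault(index, 0.0); }
        public void set(int index, double value) {
            if (value == 0.0) {
                entries.remove(index);
            } else {
                entries.put(index, value);
            }
        }
    }

    /** Operation under test; a fuller suite would loop over many operations. */
    static double dot(Vec a, Vec b) {
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    /** Small integer values keep every sum exactly representable as a double. */
    static double[] randomData(int size, double density, Random random) {
        double[] data = new double[size];
        for (int i = 0; i < size; i++) {
            if (random.nextDouble() < density) {
                data[i] = random.nextInt(10) + 1;
            }
        }
        return data;
    }

    static Vec fill(IntFunction<Vec> factory, double[] data) {
        Vec v = factory.apply(data.length);
        for (int i = 0; i < data.length; i++) {
            v.set(i, data[i]);
        }
        return v;
    }

    /** Counts (left, right, density) combinations whose result differs from the reference. */
    static int countFailures() {
        List<IntFunction<Vec>> factories = new ArrayList<>();
        factories.add(ArrayVec::new);
        factories.add(SortedMapVec::new);
        double[] densities = {0.0, 0.01, 0.5, 1.0};
        Random random = new Random(42); // fixed seed keeps the run reproducible
        int failures = 0;
        for (double density : densities) {
            double[] x = randomData(100, density, random);
            double[] y = randomData(100, density, random);
            double expected = 0.0;
            for (int i = 0; i < x.length; i++) {
                expected += x[i] * y[i];
            }
            for (IntFunction<Vec> left : factories) {
                for (IntFunction<Vec> right : factories) {
                    if (dot(fill(left, x), fill(right, y)) != expected) {
                        failures++;
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failing combinations: " + countFailures());
    }
}
```

Adding a new implementation then means adding one factory to the list, and every pairing with the existing ones is exercised automatically, including the empty (density 0.0) edge case.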

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836231#action_12836231 ] Jake Mannix commented on MAHOUT-301: The TODO refers to the issue that I think there,

Re: more svn:ignore

2010-02-20 Thread Drew Farris
Ok, I have this all set to commit, I'll pause a bit for further opinions. On Sat, Feb 20, 2010 at 1:41 PM, Robin Anil robin.a...@gmail.com wrote: +1 On Sun, Feb 21, 2010 at 12:09 AM, Drew Farris drew.far...@gmail.com wrote: While I'm on the subject of svn:ignore, does anyone have a problem

[jira] Created: (MAHOUT-303) Exhaustive Tests for Vector implementations

2010-02-20 Thread Robin Anil (JIRA)
Exhaustive Tests for Vector implementations Key: MAHOUT-303 URL: https://issues.apache.org/jira/browse/MAHOUT-303 Project: Mahout Issue Type: Task Affects Versions: 0.4 Reporter:

Re: Profiling SequentialAccessSparseVector

2010-02-20 Thread Robin Anil
https://issues.apache.org/jira/browse/MAHOUT-303 Ticket. All aboard the test train! Robin On Sun, Feb 21, 2010 at 12:33 AM, Jake Mannix jake.man...@gmail.com wrote: Ah, you mean SequentialAccessVector.assign(RandomAccessVector, BinaryFunction map), etc? Yes, we do need to make sure all

[jira] Commented: (MAHOUT-300) Solve performance issues with Vector Implementations

2010-02-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836238#action_12836238 ] Ted Dunning commented on MAHOUT-300: {quote} I don't know what to do in the edge case of

Re: Need comments on Proposal for linear SVM framework (Google Summer of Code 2010)

2010-02-20 Thread Ted Dunning
This seems like a good idea for a project, but I see two issues: a) it seems very ambitious for one summer. This is good and bad. Good because you are excited and want to accomplish something grand, bad if it is too ambitious and would cause you to officially fail while still accomplishing

Re: [jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Ted Dunning
Doug Cutting. On Sat, Feb 20, 2010 at 9:23 AM, Jake Mannix jake.man...@gmail.com wrote: who writes java code without an ide? -- Ted Dunning, CTO DeepDyve

Re: [jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Jake Mannix
Well then when he joins us in Mahout, I'll offer to go back and swap out all the import statics for him! :P On Sat, Feb 20, 2010 at 11:56 AM, Ted Dunning ted.dunn...@gmail.com wrote: Doug Cutting. On Sat, Feb 20, 2010 at 9:23 AM, Jake Mannix jake.man...@gmail.com wrote: who writes java

[jira] Commented: (MAHOUT-299) Collocations: improve performance by making Gram BinaryComparable

2010-02-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836246#action_12836246 ] Ted Dunning commented on MAHOUT-299: {quote} Just wanted to check on this - I think the

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836247#action_12836247 ] Ted Dunning commented on MAHOUT-301: This also helps non-command-line usage, actually.

Re: more svn:ignore

2010-02-20 Thread Ted Dunning
Wouldn't hurt to do the same for the IDEA project (*.ipr), module (*.iml) and workspace (*.iws) files. Lately, it seems IDEA is keeping this all in a .idea sub-directory of the parent. On Sat, Feb 20, 2010 at 11:20 AM, Drew Farris drew.far...@gmail.com wrote: Ok, I have this all set to commit,

Re: svn commit: r912198 - in /lucene/mahout/site: publish/ publish/skin/images/ src/documentation/content/xdocs/

2010-02-20 Thread Jake Mannix
How does one regenerate this? I never added myself to here as well. I've got forrest, and I can get it to regenerate the site in lucene/mahout/site/build, but I'm not sure what target there is to push into the svn-watched directories of site/publish... -jake On Sat, Feb 20, 2010 at 11:25 AM,

Re: svn commit: r912198 - in /lucene/mahout/site: publish/ publish/skin/images/ src/documentation/content/xdocs/

2010-02-20 Thread Drew Farris
Jake, I just did a cp -a ./build/site/* ./publish and committed per the instructions at http://cwiki.apache.org/MAHOUT/howtoupdatethewebsite.html -- the only gotcha I ran into was that forrest didn't like running under JDK 1.6, but I'd remembered mention of that on the list. Of course we won't see

Re: svn commit: r912198 - in /lucene/mahout/site: publish/ publish/skin/images/ src/documentation/content/xdocs/

2010-02-20 Thread Jake Mannix
On Sat, Feb 20, 2010 at 1:32 PM, Drew Farris drew.far...@gmail.com wrote: Jake, I just did a cp -a ./build/site/* ./publish and committed per the instructions at http://cwiki.apache.org/MAHOUT/howtoupdatethewebsite.html -- the only That's the page I was looking for! Thanks! gotcha I

[jira] Created: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil (JIRA)
MeanShift doesn't read from VectorWritable Key: MAHOUT-304 URL: https://issues.apache.org/jira/browse/MAHOUT-304 Project: Mahout Issue Type: Improvement Components: Clustering Affects

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836268#action_12836268 ] Drew Farris commented on MAHOUT-301: {blockquote} What does GenericOptionsParser do if

Re: [jira] Created: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil
Hi Jeff, I am trying to create an M/R job to create the MeanShiftCanopy from the Vectors. Do they need unique identifiers when they are being created? In a Map/Reduce format it becomes difficult to assign unique int ids. I also cannot use the id of the vector as it is a String. Robin

[jira] Updated: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-304: Attachment: MAHOUT-304.patch Added MeanShiftCanopyCreatorMapper (a map only job) to convert vectors to

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836271#action_12836271 ] Jake Mannix commented on MAHOUT-301: So this current patch will totally take -conf /

[jira] Updated: (MAHOUT-294) Uniform API behavior for Jobs

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-294: Description: * Move AbstractJob to common and convert all the Driver classes to extend that. One

[jira] Commented: (MAHOUT-294) Uniform API behavior for Jobs

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836274#action_12836274 ] Jake Mannix commented on MAHOUT-294: Have you checked out my patch on MAHOUT-301 - it's

Re: [jira] Created: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Ted Dunning
Given a plausible maximum number of mappers ( 50,000), it is reasonable to generate a random number here, especially if seeded using the host/task. 2^16 / (small number) is roughly where a random int quits being useful due to collisions. But I think that the task id itself may have the makings of
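
The 2^16 figure mentioned here matches the usual birthday-problem estimate: drawing n uniformly random ids from N possible values gives a collision probability of roughly 1 - exp(-n(n-1)/(2N)), and for a 32-bit int (N = 2^32) the exponent reaches about 0.5 at n = 2^16, i.e. a collision chance near 0.39. The sketch below merely evaluates that approximation; `IdCollisionEstimate` is a hypothetical name, and uniform, independent ids are an assumption.

```java
/** Hypothetical helper evaluating the birthday-problem approximation. */
final class IdCollisionEstimate {

    /**
     * Approximate probability that at least two of idCount uniformly random ids,
     * drawn independently from spaceSize possible values, are equal.
     */
    static double collisionProbability(long idCount, double spaceSize) {
        if (idCount < 2) {
            return 0.0; // fewer than two ids cannot collide
        }
        double pairs = (double) idCount * (idCount - 1) / 2.0;
        // 1 - exp(-x), computed via expm1 to stay accurate when x is tiny
        return -Math.expm1(-pairs / spaceSize);
    }

    public static void main(String[] args) {
        double intSpace = Math.pow(2, 32); // number of distinct 32-bit int values
        for (long n : new long[] {1_000, 10_000, 1L << 16}) {
            System.out.printf("%d ids -> %.4f%n", n, collisionProbability(n, intSpace));
        }
    }
}
```

Running it for a few id counts shows how quickly the risk grows as n approaches the square root of the id space, which is why a random int is only comfortable well below 2^16 ids.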

Re: more svn:ignore

2010-02-20 Thread Ted Dunning
I don't know the normal conventions (and they all seem to have changed recently anyway). *.ipr is the project file and the workspace and project files used to be at the top level. the module files could be below or not. The .idea directory is new and I don't grok it yet. It would only appear

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836278#action_12836278 ] Robin Anil commented on MAHOUT-301: Looks great. In parallel, we need to convert all

[jira] Updated: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-304: Attachment: MAHOUT-304.patch MeanShift doesn't read from VectorWritable

[jira] Assigned: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil reassigned MAHOUT-304: Assignee: Robin Anil MeanShift doesn't read from VectorWritable

[jira] Updated: (MAHOUT-304) MeanShift doesn't read from VectorWritable

2010-02-20 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-304: Status: Patch Available (was: Open) MeanShift doesn't read from VectorWritable

[jira] Updated: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-301: Attachment: MAHOUT-301.patch Better version. Javadocs updated in the patch to reflect the way it

[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files

2010-02-20 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836328#action_12836328 ] Jake Mannix commented on MAHOUT-301: This patch modifies the mahout shell script to add