This was on the Reuters collection (real sparse vectors)
On Sat, Feb 20, 2010 at 11:31 AM, Jake Mannix jake.man...@gmail.com wrote:
On Fri, Feb 19, 2010 at 3:56 PM, Robin Anil robin.a...@gmail.com wrote:
Another tidbit: The getDistanceSquared of AbstractVector is much faster than the
On 18.02.2010 Drew Farris wrote:
I'm looking forward to working with you all,
Welcome to the Mahout community, Drew. Looking forward to working with you.
Isabel
+1 to upgrade, addTo did not exist when clustering was written. Should
be pretty easy to upgrade it though.
Robin Anil wrote:
Ah! It's not being used anywhere :). Should we make that a big task before
0.3? Sweep through the code (mainly clustering) and change all these things.
Robin
On Fri, Feb
Hi Jeff, I will take care of Canopy and KMeans. If you can take a look at
the others, it would be great.
I have kept the issue open here
https://issues.apache.org/jira/browse/MAHOUT-297
Robin
On Sat, Feb 20, 2010 at 5:44 PM, Jeff Eastman j...@windwardsolutions.com wrote:
+1 to upgrade, addTo
Will do. I'm on a jet back to CA tomorrow for 11 hrs and will do it
then. You doing fuzzyK too?
Jeff
Robin Anil wrote:
Hi Jeff, I will take care of Canopy and KMeans. If you can take a look at
the others, it would be great.
I have kept the issue open here
FuzzyK is creating problems as is. For Reuters it still converges to the
same point; I tried m=1,2,3,4 with no difference. I found one slowdown,
though: distance calculation with the centroid as the second parameter (it's
much faster with the centroid as the first parameter). Better we tackle
I know I silently fixed a similar error a while ago, and someone else
mentioned such an error before. This would be the third time. This seems
like a dangerous optimization if competent developers have overlooked
it consistently. Is it such a performance win that it justifies a
likely bug in the
[
https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836159#action_12836159
]
Sean Owen commented on MAHOUT-180:
--
It's looking good to me, from a cursory visual
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836162#action_12836162
]
Sean Owen commented on MAHOUT-299:
--
Broadly it looks fine to me, especially as proven by
[
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836163#action_12836163
]
Sean Owen commented on MAHOUT-300:
--
Tiny stuff -- in things like dotSelf(), you don't need
+1 for more tests to the Vector implementations. Really, if vectors start
acting weirdly there is no way we can debug an ML algorithm, and less so on
top of a distributed system. Like Grant once said, debugging such a system
would result in loss of hair.
I am ok with pulling out caching
[
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836167#action_12836167
]
Robin Anil commented on MAHOUT-300:
---
I removed the hasNoElements check as per Sean's and Ted's
[
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836169#action_12836169
]
Robin Anil commented on MAHOUT-300:
---
An issue I found here was for empty dense vectors
On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil robin.a...@gmail.com wrote:
+1 for more tests to the Vector implementations. Really, if vectors start
acting weirdly there is no way we can debug an ML algorithm, and less so on
top of a distributed system. Like Grant once said, debugging such a
On Sat, Feb 20, 2010 at 8:55 PM, Jake Mannix jake.man...@gmail.com wrote:
On Sat, Feb 20, 2010 at 5:25 AM, Robin Anil robin.a...@gmail.com wrote:
+1 for more tests to the Vector implementations. Really, if vectors start
acting weirdly there is no way we can debug an ML algorithm, and less so
On Sat, Feb 20, 2010 at 7:27 AM, Robin Anil robin.a...@gmail.com wrote:
And we do have v1.plus() vs. v1.plusMutable() - the latter is addTo().
What about other things like minus, divide, etc.?
Those methods all return copies, and the mutable versions are simply
generalizations of the
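To illustrate the copy-versus-mutate distinction discussed in this thread, here is a minimal sketch on plain double arrays. The class and method names (VectorOps, plus, plusInPlace) are hypothetical stand-ins chosen for this example and are not Mahout's Vector API.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Mahout's Vector interface.
class VectorOps {
    // Copying variant: returns a new array and leaves both inputs untouched.
    static double[] plus(double[] a, double[] b) {
        double[] result = Arrays.copyOf(a, a.length);
        plusInPlace(result, b);
        return result;
    }

    // Mutating variant: accumulates b into a without allocating a new array.
    static void plusInPlace(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            a[i] += b[i];
        }
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 2.0};
        double[] v2 = {10.0, 20.0};
        double[] sum = plus(v1, v2);   // v1 is unchanged here
        plusInPlace(v1, v2);           // v1 now holds the sum
        System.out.println(Arrays.toString(sum) + " " + Arrays.toString(v1));
    }
}
```

The in-place form matters in tight loops such as accumulating points into a centroid, where the copying form would allocate one new array per addition.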
[
https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix resolved MAHOUT-180.
Resolution: Fixed
Committed revision 912134.
Wiki on usage forthcoming.
port Hadoop-ified
Adding to that, current tests don't cover all cases at all levels of
sparseness and across multiple implementations:
Seq.fn(Rand)
Rand.fn(Dense) and so on, so we need to add a framework which does that.
Robin
On Sat, Feb 20, 2010 at 9:10 PM, Jake Mannix jake.man...@gmail.com wrote:
On Sat, Feb
[
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-300:
--
Attachment: MAHOUT-300.patch
Solve performance issues with Vector Implementations
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836207#action_12836207
]
Drew Farris commented on MAHOUT-299:
Thanks for the review Sean, I'll get it committed
Personally I'm a fan of judicious use of static imports if readability is
good (esp. if there's only one class you're statically importing from),
because who writes java code without an ide?
Just my two cents.
On Feb 20, 2010 9:08 AM, Drew Farris (JIRA) j...@apache.org wrote:
[
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836209#action_12836209
]
Drew Farris commented on MAHOUT-301:
This is pretty nice, it gets to the point where
Hi all,
Robin told me about this great chance to keep contributing code here (many
thanks to Robin). Because I am still working on Sequential SVM (MAHOUT-232) and
would prefer to extend it into a unified framework that incorporates some other
state-of-the-art linear SVM classifiers, I propose Linear Support
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836224#action_12836224
]
Drew Farris commented on MAHOUT-299:
bq. I'd not throw RuntimeException -
I've noticed that output and testdata directories are being created in
mahout-utils -- does anyone know where they're coming from?
The eclipse svn client wants to add them of course which is why it's
bugging me -- and I can set svn:ignore on them or figure out how to
change the tests so that
While I'm on the subject of svn:ignore, does anyone have a problem if
I set svn:ignore on the various detritus eclipse litters all over the
projectspace -- e.g: .settings, .classpath, .project
Many of the clustering and classification algorithms use these dirs for
tests. Sean had suggested earlier that we move away from them and use temp
directories. It's not changed yet.
Robin
On Sun, Feb 21, 2010 at 12:06 AM, Drew Farris drew.far...@gmail.com wrote:
I've noticed that output and testdata
+1
On Sun, Feb 21, 2010 at 12:09 AM, Drew Farris drew.far...@gmail.com wrote:
While I'm on the subject of svn:ignore, does anyone have a problem if
I set svn:ignore on the various detritus eclipse litters all over the
projectspace -- e.g: .settings, .classpath, .project
On Sat, Feb 20, 2010 at 12:23 PM, Jake Mannix jake.man...@gmail.com wrote:
Personally I'm a fan of judicious use of static imports if readability is
good (esp. if there's only one class you're statically importing from),
because who writes java code without an ide?
Just my two cents.
I
MAHOUT-301 will help track this, so we won't miss it next time.
On Sun, Feb 21, 2010 at 12:09 AM, Robin Anil robin.a...@gmail.com wrote:
Many of the clustering and classification algorithms use these dirs for
tests. Sean had suggested earlier that we move away from them and use temp
directories.
Change tests to use temp directories instead of output, testdata
Key: MAHOUT-302
URL: https://issues.apache.org/jira/browse/MAHOUT-302
Project: Mahout
Issue Type: Task
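As a rough sketch of what a test using a temp directory (rather than fixed output/testdata directories) could look like: the example below uses java.nio.file from modern Java, which postdates this thread, and the class and method names (TempDirExample, createTestDir, deleteRecursively) are hypothetical, not taken from any Mahout patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper; names are illustrative only.
class TempDirExample {
    // Creates a fresh, uniquely named directory under the system temp location.
    static Path createTestDir(String prefix) throws IOException {
        return Files.createTempDirectory(prefix);
    }

    // Deletes the directory tree, visiting children before their parents.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = createTestDir("mahout-test-");
        try {
            // A test would write its job output here instead of ./output.
            Files.write(dir.resolve("part-00000"), "sample".getBytes());
            System.out.println("wrote into " + dir);
        } finally {
            deleteRecursively(dir);   // nothing is left in the source tree
        }
    }
}
```

Because nothing is written into the working copy, the svn client has no stray directories to offer for addition.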
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris reassigned MAHOUT-299:
--
Assignee: Drew Farris
Collocations: improve performance by making Gram BinaryComparable
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris updated MAHOUT-299:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)
resolved in r912189
Ah, you mean SequentialAccessVector.assign(RandomAccessVector,
BinaryFunction map), etc?
Yes, we do need to make sure all combinations are properly checked for that
in the unit tests.
We need a Jira ticket for this too! :)
-jake
On Sat, Feb 20, 2010 at 8:05 AM, Robin Anil
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836231#action_12836231
]
Jake Mannix commented on MAHOUT-301:
The TODO refers to the issue that I think there,
Ok, I have this all set to commit, I'll pause a bit for further opinions.
On Sat, Feb 20, 2010 at 1:41 PM, Robin Anil robin.a...@gmail.com wrote:
+1
On Sun, Feb 21, 2010 at 12:09 AM, Drew Farris drew.far...@gmail.com wrote:
While I'm on the subject of svn:ignore, does anyone have a problem
Exhaustive Tests for Vector implementations
---
Key: MAHOUT-303
URL: https://issues.apache.org/jira/browse/MAHOUT-303
Project: Mahout
Issue Type: Task
Affects Versions: 0.4
Reporter:
https://issues.apache.org/jira/browse/MAHOUT-303
Ticket. All aboard the test train!
Robin
On Sun, Feb 21, 2010 at 12:33 AM, Jake Mannix jake.man...@gmail.com wrote:
Ah, you mean SequentialAccessVector.assign(RandomAccessVector,
BinaryFunction map), etc?
Yes, we do need to make sure all
[
https://issues.apache.org/jira/browse/MAHOUT-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836238#action_12836238
]
Ted Dunning commented on MAHOUT-300:
{quote}
I dont know what to do in the edge case of
This seems like a good idea for a project, but I see two issues:
a) it seems very ambitious for one summer. This is good and bad. Good
because you are excited and want to accomplish something grand, bad if it is
too ambitious and would cause you to officially fail while still
accomplishing
Doug Cutting.
On Sat, Feb 20, 2010 at 9:23 AM, Jake Mannix jake.man...@gmail.com wrote:
who writes java code without an ide?
--
Ted Dunning, CTO
DeepDyve
Well then when he joins us in Mahout, I'll offer to go back and swap out all
the
import statics for him! :P
On Sat, Feb 20, 2010 at 11:56 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Doug Cutting.
On Sat, Feb 20, 2010 at 9:23 AM, Jake Mannix jake.man...@gmail.com
wrote:
who writes java
[
https://issues.apache.org/jira/browse/MAHOUT-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836246#action_12836246
]
Ted Dunning commented on MAHOUT-299:
{quote}
Just wanted to check on this - I think the
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836247#action_12836247
]
Ted Dunning commented on MAHOUT-301:
This also helps non command line usage, actually.
Wouldn't hurt to do the same for the IDEA project (*.ipr), module (*.iml)
and workspace (*.iws) files. Lately, it seems IDEA is keeping this all in a
.idea sub-directory of the parent.
On Sat, Feb 20, 2010 at 11:20 AM, Drew Farris drew.far...@gmail.com wrote:
Ok, I have this all set to commit,
How does one regenerate this? I never added myself here either. I've
got forrest, and I can get it to regenerate the site in
lucene/mahout/site/build, but I'm not sure what target there is to push
into the svn-watched directories of site/publish...
-jake
On Sat, Feb 20, 2010 at 11:25 AM,
Jake,
I just did a cp -a ./build/site/* ./publish and committed per the
instructions at
http://cwiki.apache.org/MAHOUT/howtoupdatethewebsite.html -- the only
gotcha I ran into was that forrest didn't like running under JDK 1.6,
but I'd remembered mention of that on the list. Of course we won't see
On Sat, Feb 20, 2010 at 1:32 PM, Drew Farris drew.far...@gmail.com wrote:
Jake,
I just did a cp -a ./build/site/* ./publish and committed per the
instructions at
http://cwiki.apache.org/MAHOUT/howtoupdatethewebsite.html -- the only
That's the page I was looking for! Thanks!
gotcha I
MeanShift doesn't read from VectorWritable
--
Key: MAHOUT-304
URL: https://issues.apache.org/jira/browse/MAHOUT-304
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836268#action_12836268
]
Drew Farris commented on MAHOUT-301:
{blockquote}
What does GenericOptionsParser do if
Hi Jeff, I am trying to create a M/R to create the MeanShiftCanopy from the
Vectors. Do they need unique identifiers when they are being created? In a
Map/Reduce format it becomes difficult to assign unique int ids. I also
cannot use the id of the vector, as it is a String.
Robin
[
https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-304:
--
Attachment: MAHOUT-304.patch
Added MeanShiftCanopyCreatorMapper (a map only job) to convert vectors to
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836271#action_12836271
]
Jake Mannix commented on MAHOUT-301:
So this current patch will totally take -conf /
[
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-294:
--
Description:
* Move AbstractJob to common and convert all the Driver classes to extend that.
One
[
https://issues.apache.org/jira/browse/MAHOUT-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836274#action_12836274
]
Jake Mannix commented on MAHOUT-294:
Have you checked out my patch on MAHOUT-301 - it's
Given a plausible maximum number of mappers (50,000), it is reasonable to
generate a random number here, especially if seeded using the host/task.
2^16 / (small number) is roughly where a random int quits being useful due
to collisions.
But I think that the task id itself may have the makings of
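The collision point mentioned above can be estimated with the standard birthday-bound approximation: for n IDs drawn uniformly from N values, P(collision) is roughly 1 - exp(-n(n-1)/(2N)), which becomes substantial as n approaches sqrt(N), i.e. 2^16 for a 32-bit int. The sketch below is only an illustration; the class and method names are hypothetical, and it assumes IDs are independent and uniform over all 2^32 int values.

```java
// Hypothetical helper for estimating ID collisions; not part of Mahout or Hadoop.
class CollisionEstimate {
    // Birthday-bound approximation: P(collision) ~= 1 - exp(-n(n-1) / (2N))
    // for n IDs drawn uniformly and independently from N possible values.
    static double collisionProbability(long n, double space) {
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double intSpace = Math.pow(2, 32);   // number of distinct 32-bit int values
        for (long n : new long[] {1_000, 10_000, 50_000}) {
            System.out.printf("n=%d p=%.4f%n", n, collisionProbability(n, intSpace));
        }
    }
}
```

Running it for a few mapper counts shows how quickly the estimate grows as n nears 2^16, which is why a deterministic source such as the task id may be attractive.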
I don't know the normal conventions (and they all seem to have changed
recently anyway).
*.ipr is the project file, and the workspace and project files used to be at
the top level. The module files could be below or not.
The .idea directory is new and I don't grok it yet. It would only appear
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836278#action_12836278
]
Robin Anil commented on MAHOUT-301:
---
Looks great. In parallel, we need to convert all
[
https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-304:
--
Attachment: MAHOUT-304.patch
MeanShift doesn't read from VectorWritable
[
https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil reassigned MAHOUT-304:
-
Assignee: Robin Anil
MeanShift doesn't read from VectorWritable
[
https://issues.apache.org/jira/browse/MAHOUT-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-304:
--
Status: Patch Available (was: Open)
MeanShift doesn't read from VectorWritable
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-301:
---
Attachment: MAHOUT-301.patch
Better version. Javadocs updated in the patch to reflect the way it
[
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836328#action_12836328
]
Jake Mannix commented on MAHOUT-301:
This patch modifies the mahout shell script to add