+1. I read over most of the patch and see three distinct patterns:
- replacing ad-hoc string arguments that name file paths with Path objects
- replacing ad-hoc temp file allocation, deallocation with a uniform
mechanism
- whitespace formatting differences between your and my formatters
Kudos
You could try using more, smaller input splits, but large datasets and
too-small distance thresholds will choke up the mappers with the number of
canopies approaching the number of points seen by the mapper. Also the
single reducer will choke unless the thresholds allow condensing the
mapper
I can't seem to log into the wiki any more and two password reset
attempts have failed to produce the promised password email (I checked
my spam filter too). Does anybody have enough karma to help me out?
Jeff
I saw that email too, but confluence appears to be working. I've sent a
request to infrastructure...
On 5/2/10 9:12 AM, Robin Anil wrote:
I believe they are upgrading confluence. I got an email about it yesterday
On Sun, May 2, 2010 at 9:40 PM, Jeff Eastman jeast...@windwardsolutions.com
Indeed, the wiki is pretty out of date in some areas and the actual apis
have changed (since 2008!). For users wishing to launch clustering jobs
using trunk I suggest checking out utils TestCDbwEvaluator and
TestClusterDumper which employ the latest versions. These do not use the
command-line
These sorts of optimizations could delay the growth of canopy clusters
in situations where the clustering thresholds are set too low for the
dataset. At some point the mapper would still run out of memory with enough points if
all become clusters. That decision rests with the T2 threshold which
determines
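
To make the threshold point concrete, here is a minimal sketch of how the canopy count grows as T2 shrinks. It is a simplification under stated assumptions, not Mahout's CanopyClusterer: it uses only a T2-style threshold (no T1), plain double[] points, and Euclidean distance, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
class CanopyCountSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Simplified single-pass canopy creation: a point starts a new canopy
     * unless it lies strictly within t2 of an existing canopy center.
     */
    static int countCanopies(List<double[]> points, double t2) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            boolean covered = false;
            for (double[] c : centers) {
                if (distance(p, c) < t2) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                centers.add(p); // every uncovered point is held in memory as a center
            }
        }
        return centers.size();
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i, 0.0}); // ten points spaced 1.0 apart on a line
        }
        // T2 smaller than the point spacing: every point becomes its own canopy.
        System.out.println(countCanopies(points, 0.5));
        // A larger T2 condenses the same points into far fewer canopies.
        System.out.println(countCanopies(points, 3.0));
    }
}
```

With T2 below the spacing between points, the list of centers grows one-for-one with the input, which is the memory behaviour described above.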
The surefire report seems to indicate this might be a timing problem with
HDFS being lazy. Sometimes it passes and sometimes it fails, but of
course, right at the end of the 15 min core tests which makes it
especially annoying. Any resolution possible?
Failed tests:
testSimple(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
testSimpleItem(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
testNoCorrelation1(org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarityTest)
27, 2010 at 10:54 PM, Jeff Eastman
j...@windwardsolutions.com wrote:
Hi Sean,
I was under the impression that the recently refactored NamedVectors would
be just another kind of Vector and that they would not need to show up in
method signatures unless there really was a requirement
[
https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860981#action_12860981
]
Jeff Eastman commented on MAHOUT-236:
-
Ok, the above patch was committed on the 21st
[
https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861194#action_12861194
]
Jeff Eastman commented on MAHOUT-297:
-
I don't understand why the constructors
[
https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861194#action_12861194
]
Jeff Eastman edited comment on MAHOUT-297 at 4/26/10 9:15 PM
version somewhere that I could get working again on trunk?
On 4/23/10 9:10 AM, Sean Owen wrote:
Good eye, this was fixed in the manuscript a while ago.
I will ping Manning to re-publish Chapters 1-6 since a lot of small
updates have happened since then.
On Fri, Apr 23, 2010 at 4:53 PM, Jeff
Yeay team!
On 4/21/10 1:09 PM, Grant Ingersoll wrote:
The Board has approved Mahout, Tika, and Nutch moving to top-level project status.
Congrats! Now begins the fun part of changing mailing lists, domains, etc.
-Grant
Mahout Vectors and Clusters currently support JSON encodings for input
and output. What else is needed?
Jeff
On 4/21/10 4:18 PM, Robin Anil wrote:
The details are not clear at the moment. But I am sure this will help
the adoption of Mahout quickly.
Things to do. Parse JSON and make the
[
https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859027#action_12859027
]
Jeff Eastman commented on MAHOUT-236:
-
I'm running into a challenge integrating Fuzzy
[
https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-236:
Attachment: MAHOUT-236.patch
Added a mean shift clustering job and now it works for CDbw too
CanopyClusterer.emitPointToExistingCanopies emits clusterId ::
VectorWritable
On 4/18/10 10:07 AM, Jake Mannix wrote:
In code we already have?
-jake
On Apr 18, 2010 9:53 AM, Jeff Eastman j...@windwardsolutions.com wrote:
I can think of situations where I need to use a clusterId as the
Also mean shift clustering relies on vector identities and tbd emitting
its clustered points for CDbw would need to retain them.
On 4/18/10 10:22 AM, Jeff Eastman wrote:
CanopyClusterer.emitPointToExistingCanopies emits clusterId ::
VectorWritable
On 4/18/10 10:07 AM, Jake Mannix wrote
Sure, maybe just initialize names to "" instead of null?
private String name = "";
On 4/18/10 10:45 AM, Jake Mannix wrote:
Ok this is a good concrete example, I like concrete. :)
I'm still very wary of having to have some mapper or reducer classes deal
with LabeledVector some deal with just
+1 NamedVector seems a lot like VectorView. I'm comfortable enough with
this proposal for Sean to go forward with it <grin>. I agree with
separating the naming/identifying/labeling into a separate wrapper class
so that vectors themselves can be pure mathematical entities. Unifying
as many as
Looking at the KMeansClusterer.outputPointWithClusterInfo it seems this
code will have to change in the patch but I haven't yet looked:
String name = point.getName();
String key = (name != null && name.length() != 0) ? name :
point.asFormatString();
output.collect(new Text(key),
Are you thinking of replacing our Writable or Json (asFormatString)
encodings? Certainly, using Avro as an I/O format for clustering would
improve their utility for other languages. Seems like a major rewrite to
replace Writable within our MR jobs.
On 4/17/10 9:10 AM, Ted Dunning wrote:
IF
On 4/16/10 10:05 AM, Sean Owen wrote:
Clojure isn't my cup of tea but that's not important.
It's an interesting question, how much belongs under the Mahout tent?
There's a tradeoff between excluding useful extensions to the project
on the one hand, and becoming a spare parts bin of code of
Ted Dunning wrote:
On Wed, Apr 14, 2010 at 12:53 PM, Sean Owen sro...@gmail.com wrote:
I would actually prefer ripping names out of the base vectors entirely.
They should decorate the mathematical vector, but as their use is
decidedly
non-mathematical and application specific
Benson Margulies wrote:
https://repository.apache.org/content/repositories/orgapachemahout-015/
contains (this time for sure) all the artifacts for release 1.0 of the
mahout-collections component. This is the first independent release of
collections from the rest of mahout; it differs from the
Benson Margulies wrote:
In order to decouple the mahout-collections library from the rest of
Mahout, to allow more frequent releases and other good things, we
propose to release the code generator for the collections library as a
separate Maven artifact. (Followed, in short order, by the
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854742#action_12854742
]
Jeff Eastman commented on MAHOUT-270:
-
r931372 renames Printable to Cluster and adds
[
https://issues.apache.org/jira/browse/MAHOUT-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852948#action_12852948
]
Jeff Eastman commented on MAHOUT-339:
-
Problem with example was introduced by a recent
of the Apache Mahout Project; and be it further
RESOLVED, that the persons listed immediately below be and
hereby are appointed to serve as the initial members of the
Apache Mahout Project:
• Isabel Drost (isa...@...)
• Ted Dunning (tdunn...@...)
• Jeff Eastman (jeast
Jake Mannix wrote:
On Wed, Mar 17, 2010 at 6:14 AM, Jeff Eastman j...@windwardsolutions.com wrote:
Pallavi Palleti wrote:
Hi,
Could some one kindly let me know the significance of instance variable
name in AbstractVector? It is causing problems, when I write a vector to
file and read
When I run mvn eclipse:eclipse it generates .classpath and .project
files in each of the mahout module directories. The last time I did this
I manually merged the .classpath library declarations from each module
into the main project's .classpath and that made Eclipse happy. This was
really
Drew Farris wrote:
On Wed, Mar 17, 2010 at 3:07 PM, Jeff Eastman
j...@windwardsolutions.com wrote:
Are any of you using this IDE in a more automatic way?
I use eclipse Galileo and m2eclipse 0.9 and the 'import maven
projects' feature. I check out the mahout sources into my workspace
: Bug
Components: Clustering
Affects Versions: 0.3
Reporter: Jeff Eastman
Priority: Critical
Fix For: 0.4
Mar 17, 2010 2:15:00 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0002
java.lang.ClassCastException
I'm going to be on holiday in Mexico with unknown Internet connectivity
for the next week. Please record my +1 vote on this resolution.
Jeff
Grant Ingersoll wrote:
On Mar 17, 2010, at 9:51 AM, Jeff Eastman wrote:
Hi Grant,
This version still has the old copy/paste problem
Grant Ingersoll wrote:
It usually takes 24 hours. Just follow the release dirs and we'll be good.
Tomorrow is a great day for a Mahout announcement! Maybe we can change the
logo to be green for tomorrow.
I've got corned beef n' cabbage on the boil so a bit o' green works for
me. Maybe
committers:
• Isabel Drost (isa...@...)
• Ted Dunning (tdunn...@...)
• Jeff Eastman (jeast...@...)
• Drew Farris (d...@...)
• Otis Gospodnetic (o...@...)
• Grant Ingersoll (gsing...@...)
• Sean Owen (sro...@...)
• Karl Wettin (ka
Robin Anil wrote:
This one is with a blue elephant :P
https://issues.apache.org/jira/secure/attachment/12438704/mahout-blueE-200.png
+1 to this one. I like the yellow in the mahout(s) as it stands out more
I can't see any difference between #3 and #4 but I do like the mahout
with hair and arms. The avatar blue person color is growing on me too,
especially when I put my 3d glasses on <grin>. The brownish elephant is
nice and so is the yellow one. Have you tried a gray elephant? Sorry I'm
not
I think if you mark the instVar as transient then Json won't include its
state in the JsonString.
Sean Owen wrote:
It seems like the length squared should not be part of the string
representation. Does anyone know how to control this Gson output
formatter to ignore this field?
The
I'm getting a consistent compiler heap overflow during mvn clean install
on one of two machines with the last commit. Ironically, my MacBook Pro
compiles and my Mac Pro does not. Both compiled before the commit.
Benson Margulies wrote:
Did you get the number of that commit? The very last commit was my release
arranging, and it's pretty hard to see how it could have that effect.
On Thu, Mar 11, 2010 at 7:54 PM, Jeff Eastman j...@windwardsolutions.com wrote:
I'm getting a consistent compiler heap
And check the asFormatString(bindings) implementation in ClusterBase. It
does this I think, though it has not yet been wired into
ClusterDumper.printClusters. I wanted to give the ClusterDumper users a
chance to critique my formatting but it is like the below.
Jeff
Jake Mannix (JIRA)
wrote:
It already does this, i think. But floats can be formatted better
On Tue, Mar 2, 2010 at 2:55 AM, Jeff Eastman j...@windwardsolutions.com wrote:
And check the asFormatString(bindings) implementation in ClusterBase. It
does this I think, though it has not yet been wired
they should all be
Printable too (the latter two are already). That would let us refactor
VectorDumper into AbstractVector and clean up another code duplication.
On Tue, Mar 2, 2010 at 3:16 AM, Jeff Eastman j...@windwardsolutions.com wrote:
The loop still needs to be closed in order to unify
.
-jake
On Mon, Mar 1, 2010 at 1:36 PM, Robin Anil robin.a...@gmail.com wrote:
It already does this, i think. But floats can be formatted better
On Tue, Mar 2, 2010 at 2:55 AM, Jeff Eastman j...@windwardsolutions.com
wrote:
And check the asFormatString(bindings) implementation
I'm +1 on getting these changes in asap, but +0 on whether to do them
during code freeze. I'm pretty confident Robin can pull it off, but it
is code freeze.
Jeff
Robin Anil wrote:
Hi guys,
I have some patches ready, this cleans up our clustering code,
gets the examples running,
AbstractVector.minus has a bug in the first if clause. Don't know if my
fix or this one would do what is intended by the optimization:
if (x instanceof RandomAccessSparseVector || x instanceof DenseVector) {
// TODO: if both are RandomAccess check the numNonDefault to
determine which
Jake Mannix wrote:
why is this not showing up in the unit tests?
On Wed, Feb 24, 2010 at 6:36 PM, Jeff Eastman
jeast...@windwardsolutions.com wrote:
AbstractVector.minus has a bug in the first if clause. Don't know if my fix
or this one would do what is intended by the optimization
+1 from me too
Ted Dunning wrote:
+1 to code freeze, waiting for hadoop release and testing the RC
On Tue, Feb 23, 2010 at 8:38 AM, Isabel Drost isa...@apache.org wrote:
On Tue Grant Ingersoll gsing...@apache.org wrote:
On Feb 23, 2010, at 9:18 AM, Sean Owen wrote:
It does
If the Vector-MSCanopy pre-job outputs all of its canopies then each of
those canopies would contain the generated canopyId and its canopy
center would contain the original vector with its docId. Seems like one
could use that data set to get the membership information in a separate
Robin Anil wrote:
after the List<Vector> -> List<canopyId> optimization.
I did that in the patch. Take a look :)
+1 Simply marvelous
+1. This will then enable a small step forwards towards reducing the
memory footprint of MeanShiftCanopy.boundPoints by allowing the
List<Vector> to be replaced by List<Integer>. The boundPoints don't need
to be accumulated at all if one is only interested in the resulting
cluster centers, but
+1 to upgrade, addTo did not exist when clustering was written. Should
be pretty easy to upgrade it though.
Robin Anil wrote:
ah! It's not being used anywhere :). Should we make that a big task before
0.3? Sweep through the code (mainly clustering) and change all these things.
Robin
On Fri, Feb
/browse/MAHOUT-297
Robin
On Sat, Feb 20, 2010 at 5:44 PM, Jeff Eastman j...@windwardsolutions.com wrote:
+1 to upgrade, addTo did not exist when clustering was written. Should be
pretty easy to upgrade it though.
Robin Anil wrote:
ah! It's not being used anywhere :). Should we make
Very similar, especially when you consider that k-means only adds the
whole point value to the single, closest cluster (i.e.
weightedPointTotal += 1), whereas fuzzy adds it partially to all. I
don't think the other clustering routines require/expect numPoints to be
an integer and the instvar
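
The difference in how a point contributes to the cluster totals can be sketched as follows. This is an illustration under assumptions, not the Mahout code: it uses the textbook fuzzy c-means membership formula with fuzziness m > 1, assumes all distances are strictly positive, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Mahout.
class MembershipWeightSketch {

    /** k-means style: the whole point (weight 1) goes to the closest cluster. */
    static double[] hardWeights(double[] distances) {
        int closest = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[closest]) {
                closest = i;
            }
        }
        double[] weights = new double[distances.length]; // all zeros initially
        weights[closest] = 1.0;
        return weights;
    }

    /**
     * Fuzzy style: the point is shared among all clusters using
     * u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
     * Assumes every distance is > 0 and m > 1.
     */
    static double[] fuzzyWeights(double[] distances, double m) {
        double exponent = 2.0 / (m - 1.0);
        double[] weights = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < distances.length; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            weights[i] = 1.0 / sum;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0}; // distances from one point to two cluster centers
        System.out.println(Arrays.toString(hardWeights(distances)));
        System.out.println(Arrays.toString(fuzzyWeights(distances, 2.0)));
    }
}
```

In both cases the weights for one point sum to 1, but only the fuzzy weights are fractional, so a running total built from them need not be an integer.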
please take a
look
Robin
On Wed, Feb 17, 2010 at 3:35 PM, Jeff Eastman j...@windwardsolutions.com wrote:
Robin Anil wrote:
Hadoop reuses the *same* instance whenever it uses readFields and I've
been
bitten more than once by assuming otherwise.
Yep! That's our bug
Robin Anil wrote:
Hadoop reuses the *same* instance whenever it uses readFields and I've been
bitten more than once by assuming otherwise.
Yep! That's our bug. Always assume mutability in Hadoop :). I will see
where the writable is causing the error.
Best is if we could have some
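
For anyone who has not been bitten yet, the instance-reuse pitfall can be reproduced without Hadoop. The sketch below is an assumption-laden stand-in: MutableValue is a hypothetical class playing the role of a Writable, and readFrom plays the role of readFields; none of these names are Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; no Hadoop classes are used.
class ReusedInstanceSketch {

    /** Stand-in for a Writable whose state is overwritten on each read. */
    static final class MutableValue {
        private double value;

        void readFrom(double v) { // plays the role of readFields
            this.value = v;
        }

        double get() {
            return value;
        }

        MutableValue copy() {
            MutableValue c = new MutableValue();
            c.value = this.value;
            return c;
        }
    }

    /** Buggy: keeps references to the one instance the "framework" reuses. */
    static List<MutableValue> collectByReference(double[] records) {
        MutableValue reused = new MutableValue();
        List<MutableValue> kept = new ArrayList<>();
        for (double r : records) {
            reused.readFrom(r);
            kept.add(reused); // every list entry is the same object
        }
        return kept;
    }

    /** Safe: copies the state before the instance is overwritten again. */
    static List<MutableValue> collectByCopy(double[] records) {
        MutableValue reused = new MutableValue();
        List<MutableValue> kept = new ArrayList<>();
        for (double r : records) {
            reused.readFrom(r);
            kept.add(reused.copy());
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] records = {1.0, 2.0, 3.0};
        for (MutableValue v : collectByReference(records)) {
            System.out.println("by reference: " + v.get()); // last record, three times
        }
        for (MutableValue v : collectByCopy(records)) {
            System.out.println("by copy: " + v.get()); // each record once
        }
    }
}
```

The by-reference list ends up holding the last record three times, which is the kind of symptom that makes clusters or vectors appear identical after a read loop.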
Looks to me like the unit tests are the only calls to recomputeCenter,
which is where the center is set. The clusterer seems to be calling
computeCentroid, which sets the centroid, instead. I'm not sure why it
needs both instance variables, as the pointProbSum and
weightedPointTotal variables
to be identical (and especially if they are not all
zeros).
Jeff
Robin Anil wrote:
On Tue, Feb 16, 2010 at 10:25 PM, Jeff Eastman
j...@windwardsolutions.com wrote:
Looks to me like the unit tests are the only calls to recomputeCenter,
which is where the center is set. The clusterer seems to be calling
I went to run the syntheticcontrol example (which uses MR) but there is
no fuzzy version. It might be easier to debug if one was created from
the kmeans job. It really feels to me like the problem lies somewhere in
Writable handling and not in the ClusterBase refactoring.
Jeff Eastman wrote
+1 on Isabel's comments.
Isabel Drost wrote:
On Sat Grant Ingersoll gsing...@apache.org wrote:
I don't see any harm in getting 0.3 out first if that makes folks
more comfortable.
Yeah, this feels better to me the more I think about it.
+1 from me as well: I really like the
+1 to all of Drew's comments
Drew Farris wrote:
+1 to eliminating the statics, they are indeed evil. The type to read
should be stored in the thing doing/facilitating the reading not the
vector itself and definitely not in a static field. Pretty sure vector
shouldn't be facilitating the reading
Robin Anil wrote:
any more +1s ?
+1 keep Mahout as unentangled as possible
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831678#action_12831678
]
Jeff Eastman commented on MAHOUT-270:
-
r908235 commits the Printable interface
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831678#action_12831678
]
Jeff Eastman edited comment on MAHOUT-270 at 2/9/10 9:39 PM
in IntelliJ too. I assume it's an
artifact of reloading the pom.xml file and resetting some things.
On Tue, Feb 9, 2010 at 12:06 AM, Jeff Eastman
j...@windwardsolutions.com wrote:
I'm getting a lot of compile errors in Eclipse after my most recent svn
update today. The errors begin in the math module
Components: Clustering
Affects Versions: 0.2
Reporter: Jeff Eastman
Assignee: Jeff Eastman
I looked over the R reference code and alpha_0 is used in two places, not one
as in the current implementation:
- in state initialization beta = rbeta(K, 1, alpha_0) [K
[
https://issues.apache.org/jira/browse/MAHOUT-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830740#action_12830740
]
Jeff Eastman commented on MAHOUT-276:
-
The fix involves adding alpha_0 as an argument
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830741#action_12830741
]
Jeff Eastman commented on MAHOUT-270:
-
I'd like to deprecate the asFormatString
Jeff Eastman wrote:
Jeff Eastman wrote:
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes
the
probability that a point will go to any new cluster quite low. You
can
compensate somewhat for this with different values of alpha
Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the
probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
Could you elaborate more on the function
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the
probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
Could you elaborate more on the function of alpha in the algorithm?
Just notice this didn't go to the list.
---BeginMessage---
Hi Jerry,
I'm not sure why Dirichlet is doing that with this dataset and have not
been able to get better results than you. I have gotten excellent
results using it with other models on other datasets, so I'm pretty
confident in the
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806873#action_12806873
]
Jeff Eastman commented on MAHOUT-270:
-
In the beginning, vectors, canopies and clusters
Affects Versions: 0.2
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Given the binary representation of models/clusters in Dirichlet, extend the
ClusterDumper utility to dump out a printable representation of them too.
Sean Owen wrote:
That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.
I note it only because the fix raises a question. It causes core/ to
depend on utils/, and I had thought that was not the
20, 2010 at 4:56 PM, Jeff Eastman
j...@windwardsolutions.com wrote:
Sean Owen wrote:
That last commit concerning the dirichlet code and models seems to
cause the build to fail -- or else I'm the victim of another
environment-specific issue.
I note it only because the fix raises
I will run the build before I commit.
I will run the build before I commit.
...
I will run the build before I commit.
my bad
Ted Dunning wrote:
Our modules aren't working out as well as expected.
On Wed, Jan 20, 2010 at 4:56 PM, Jeff Eastman j...@windwardsolutions.com wrote:
Sean Owen
The build compiles but org.apache.mahout.math.TestVectorWritable fails
for some reason and it does not get to my test.
Jeff Eastman wrote:
I will run the build before I commit.
I will run the build before I commit.
...
I will run the build before I commit.
my bad
Ted Dunning wrote:
Our
I'm planning on attending
Jeff
Grant Ingersoll wrote:
On Jan 17, 2010, at 8:35 PM, Ted Dunning wrote:
We should have a beer some time anyway and the beers we owe you for cleaning
up Colt more than cancel any potential beer on this issue so I will be happy
to buy (Sean, you are included
[
https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman resolved MAHOUT-251.
-
Resolution: Fixed
r900519 wrapped up loose ends in the patch, adding new command line arguments
I've made some changes for MAHOUT-251 and all the tests run in Eclipse,
but two of them fail when run from Maven. How can I poke mvn to give me
more diagnostics?
Sean Owen wrote:
Could be. I took an indirect stab at mitigating possible sources of
this issue by increasing encapsulation in the tests -- I still believe
fields should never be non-private. This may start to surface the
behind-the-scenes dependencies and side effects that shouldn't be
there.
I just did a successful mvn install on trunk without seeing any
problems. My checkout is a couple of days old and there have been a few
other commits in addition to mine since.
Drew Farris wrote:
Yes, I'm seeing this too. Deneche encountered it back when working with:
: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.2
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Users attempting to use Dirichlet Process Clustering on real life problems
cannot use any of the existing models or model
[
https://issues.apache.org/jira/browse/MAHOUT-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-251:
Attachment: MAHOUT-251.patch
This patch generalizes the 2-d dense models by introducing a new
Issue Type: Improvement
Components: Classification, Clustering, Collaborative Filtering,
Frequent Itemset/Association Rule Mining, Genetic Algorithms, Matrix
Affects Versions: 0.1
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Fix For: 0.4
[
https://issues.apache.org/jira/browse/MAHOUT-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-167:
Attachment: MAHOUT-167.patch
Work in progress patch which compiles most Canopy changes needed
I'm inclined towards Sean's perspective. Making the kinds of significant
changes to the vector implementation that 165 entails strikes me as
non-trivial and likely to delay 0.2 significantly. I vote to not include
it in this point release so that the functionality which is ready to go
public
[
https://issues.apache.org/jira/browse/MAHOUT-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760462#action_12760462
]
Jeff Eastman commented on MAHOUT-136:
-
I think this issue has been completed and should
Some of the clustering unit tests that test the Hadoop jobs take a while
to run through their iterations. This is on the order of a minute or two
in some cases. I think testing the jobs should still be done in the
pre-commit batch, since the commits really need them to pass successfully.
Jeff
I propose leaving MAHOUT-167 out of 0.2 for the reasons which Sean
mentioned previously. MAHOUT-136 is, afaict, done and can probably be
closed. Grant, you had some comments in the issue; have they been resolved?
Grant Ingersoll wrote:
Here's the list of unresolved issues for 0.2:
Robin Anil wrote:
Dear Mahout Devs, the YourKit sales rep gave me my open-source
license. If anyone would like to get one, I can aggregate and send all the
requests to him.
If you would like to have an open-source license of YourKit Profiler, reply
on this thread within 24 hours of this
-167
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.1
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Fix For: 0.2
We need to update the clustering implementations to remove the deprecated
Hadoop
Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740573#action_12740573 ]
Grant Ingersoll commented on MAHOUT-121:
bq. Please,
+1 Looks reasonable to me. I had to change my Lucene formatter profile
to make lines and comments wrap at 120 vs. 80 in order for Eclipse to
curtail most of its reformatting changes. A new line or two is still
added/removed in the files I've checked but otherwise we are in synch.
Sean Owen
Grant Ingersoll wrote:
Isn't the KMeansJob pretty much redundant, assuming we add a parameter
to KMeansDriver to take in the number of reduce tasks?
The purpose of the clustering jobs, in general, was to simplify
computing the clusters and then clustering the data. It has been applied
- and
Ingersoll wrote:
Check out the patch I just put up on M-138
On Jun 26, 2009, at 12:32 PM, Jeff Eastman wrote:
Grant Ingersoll wrote:
Isn't the KMeansJob pretty much redundant, assuming we add a
parameter to KMeansDriver to take in the number of reduce tasks?
The purpose of the clustering jobs
then more. How
about you do Canopy and KMeans and I do the others, since those seem to
be in your critical path at the current time.
Jeff
Grant Ingersoll wrote:
On Jun 26, 2009, at 3:04 PM, Jeff Eastman wrote:
That looks reasonable, just reading the patch. You might also want to
put the clusters-x