going to look into reducing the number of iterations on the
clustering tests, which are some of the culprits.
On 4/29/10 6:15 PM, Grant Ingersoll wrote:
On Apr 29, 2010, at 6:36 PM, Jeff Eastman wrote:
right at the end of the 15-minute core tests, which makes it especially annoying.
Hi Sean,
I was under the impression that the recently refactored NamedVectors
would be just another kind of Vector and that they would not need to
show up in method signatures unless there really was a requirement for
that explicit type. What I see now in many places in the clustering code
is
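The point that a named vector should be "just another kind of Vector" is the decorator pattern: the named type implements the same interface and delegates to a wrapped vector, so methods can keep taking the plain interface. A minimal sketch, assuming simplified stand-ins: Vec, DenseVec and NamedVec are hypothetical names, not Mahout's Vector and NamedVector classes.

```java
/**
 * Sketch: a named vector as a decorator that is "just another kind of Vector".
 * Vec, DenseVec and NamedVec are hypothetical stand-ins, not Mahout's classes.
 */
public class NamedVectorSketch {

  /** Minimal vector abstraction (hypothetical). */
  interface Vec {
    int size();
    double get(int index);
  }

  /** Plain dense implementation backed by an array. */
  static final class DenseVec implements Vec {
    private final double[] values;
    DenseVec(double... values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
  }

  /** Decorator: adds a name and delegates every vector operation. */
  static final class NamedVec implements Vec {
    private final Vec delegate;
    private final String name;
    NamedVec(Vec delegate, String name) { this.delegate = delegate; this.name = name; }
    String getName() { return name; }
    public int size() { return delegate.size(); }
    public double get(int index) { return delegate.get(index); }
  }

  /** Clustering-style code only needs Vec, so named and unnamed vectors both work. */
  static double squaredDistance(Vec a, Vec b) {
    if (a.size() != b.size()) throw new IllegalArgumentException("size mismatch");
    double sum = 0.0;
    for (int i = 0; i < a.size(); i++) {
      double d = a.get(i) - b.get(i);
      sum += d * d;
    }
    return sum;
  }

  public static void main(String[] args) {
    Vec point = new NamedVec(new DenseVec(1.0, 2.0), "doc-17");
    Vec center = new DenseVec(0.0, 0.0);
    System.out.println(squaredDistance(point, center)); // prints 5.0
  }
}
```

Only code that actually needs the name (for example, when writing output labels) would have to refer to the named type.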
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-236:
Attachment: MAHOUT-236.patch
Here's a new patch that has initial, probably inco
From Mahout In Action:
You may be searching for something like “CosineMeasureSimilarity” in
Mahout. You’ve actually already found it, but under
an unexpected name: PearsonCorrelationSimilarity. The cosine
measure similarity and Pearson correlation aren’t the same thin
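The relationship behind that naming: Pearson correlation is the cosine similarity of the two vectors after each has been centered on its own mean. A small sketch on plain arrays, assuming full (non-missing) data; it is not Mahout's PearsonCorrelationSimilarity code, which may differ in details such as which co-rated items enter the means.

```java
/**
 * Sketch: Pearson correlation equals cosine similarity after mean-centering.
 * Plain arrays with no missing values; not Mahout's implementation.
 */
public class PearsonVsCosine {

  /** Cosine of the angle between x and y. */
  static double cosine(double[] x, double[] y) {
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      nx += x[i] * x[i];
      ny += y[i] * y[i];
    }
    return dot / Math.sqrt(nx * ny);
  }

  /** Copy of x with its mean subtracted from every element. */
  static double[] centered(double[] x) {
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= x.length;
    double[] out = new double[x.length];
    for (int i = 0; i < x.length; i++) out[i] = x[i] - mean;
    return out;
  }

  /** Pearson correlation computed as cosine of the centered vectors. */
  static double pearson(double[] x, double[] y) {
    return cosine(centered(x), centered(y));
  }

  public static void main(String[] args) {
    double[] a = {1.0, 2.0, 3.0};
    double[] b = {11.0, 12.0, 13.0}; // same pattern as a, shifted by 10
    System.out.println("cosine  = " + cosine(a, b));
    System.out.println("pearson = " + pearson(a, b));
  }
}
```

Shifting every value by a constant lowers the cosine here but leaves the Pearson correlation at 1, which is why the two measures are related but not identical.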
Ok, just checking. I've got an initial implementation that I'm debugging
and will post a patch soon. The equations in the paper still leave a bit
to the student from a completeness perspective.
On 4/27/10 12:15 AM, Robin Anil (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-
I'm not arguing it is a performance improvement for sparse vectors, just
that changing the class of the vector should not be necessary: if the
vectors being clustered are dense then the cluster constructors should
leave them dense. If the vectors that are being clustered are of a
sparse variety
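One way to leave dense points dense and sparse points sparse is to create the cluster center from a point with a factory method that returns an empty vector of the same concrete class. Mahout's Vector interface has a like() method along these lines; the sketch below uses simplified hypothetical stand-ins (Vec, DenseVec, SparseVec) rather than the real classes, and ignores performance, as the message above does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: build a cluster center with the same storage type as the points.
 * Vec, DenseVec, SparseVec and like() are hypothetical stand-ins.
 */
public class PreserveVectorType {

  interface Vec {
    int size();
    double get(int i);
    void set(int i, double v);
    /** Returns an all-zero vector of the same concrete type and size. */
    Vec like();
  }

  static final class DenseVec implements Vec {
    private final double[] values;
    DenseVec(int size) { values = new double[size]; }
    public int size() { return values.length; }
    public double get(int i) { return values[i]; }
    public void set(int i, double v) { values[i] = v; }
    public Vec like() { return new DenseVec(values.length); }
  }

  static final class SparseVec implements Vec {
    private final int size;
    private final Map<Integer, Double> entries = new HashMap<>();
    SparseVec(int size) { this.size = size; }
    public int size() { return size; }
    public double get(int i) { return entries.getOrDefault(i, 0.0); }
    public void set(int i, double v) {
      if (v == 0.0) entries.remove(i); else entries.put(i, v); // store non-zeros only
    }
    public Vec like() { return new SparseVec(size); }
  }

  /** Mean of the points; the center's type follows the first point instead of being forced to dense. */
  static Vec centroid(List<? extends Vec> points) {
    Vec center = points.get(0).like();
    for (Vec p : points) {
      for (int i = 0; i < p.size(); i++) {
        double v = p.get(i);
        if (v != 0.0) center.set(i, center.get(i) + v);
      }
    }
    for (int i = 0; i < center.size(); i++) {
      double v = center.get(i);
      if (v != 0.0) center.set(i, v / points.size());
    }
    return center;
  }

  public static void main(String[] args) {
    Vec p1 = new SparseVec(1000);
    p1.set(3, 2.0);
    Vec p2 = new SparseVec(1000);
    p2.set(3, 4.0);
    Vec c = centroid(Arrays.asList(p1, p2));
    System.out.println(c.getClass().getSimpleName() + " " + c.get(3)); // prints SparseVec 3.0
  }
}
```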
Correct, all the clustered points are now clusterId -> VectorWritable.
This reflects some loss of generality for the two fuzzy clusterers
(fuzzyK, Dirichlet) and I will likely need to add another clustering
option for them that includes probability of membership. But for now and
for the CDbw ca
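For the fuzzy clusterers, "probability of membership" usually means the standard fuzzy k-means weight u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), where d_j is the point's distance to center j and m > 1 is the fuzziness exponent. A minimal sketch of that textbook formula, assuming distances are already computed; it is not Mahout's FuzzyKMeansClusterer code, and the class and method names are hypothetical.

```java
/**
 * Sketch: textbook fuzzy k-means membership of one point in each cluster.
 * Hypothetical names; distances are passed in precomputed.
 */
public class FuzzyMembership {

  /**
   * @param distances distance from the point to each cluster center (non-negative)
   * @param m fuzziness exponent, must be greater than 1
   * @return one membership weight per cluster; the weights sum to 1
   */
  static double[] memberships(double[] distances, double m) {
    if (m <= 1.0) throw new IllegalArgumentException("m must be > 1");
    int k = distances.length;
    double[] u = new double[k];
    // A point exactly on a center belongs fully to the first such cluster.
    for (int j = 0; j < k; j++) {
      if (distances[j] == 0.0) {
        u[j] = 1.0;
        return u;
      }
    }
    double exponent = 2.0 / (m - 1.0);
    for (int j = 0; j < k; j++) {
      double denom = 0.0;
      for (int l = 0; l < k; l++) {
        denom += Math.pow(distances[j] / distances[l], exponent);
      }
      u[j] = 1.0 / denom;
    }
    return u;
  }

  public static void main(String[] args) {
    double[] u = memberships(new double[] {1.0, 3.0}, 2.0);
    System.out.println(u[0] + " " + u[1]); // approximately 0.9 and 0.1
  }
}
```

With clusterId -> VectorWritable output, a point keeps only one cluster id, so weights like these are what an additional output option would have to carry.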
+1
On 4/26/10 5:24 AM, Grant Ingersoll wrote:
My edits inline.
On Apr 26, 2010, at 3:45 AM, Sean Owen wrote:
Here's my suggested boilerplate -- see below and please suggest edits
if desired. There's a 150 word limit.
Apache Mahout provides scalable implementations of machine learning
alg
See ClusterBase for those constants
On 4/23/10 11:53 AM, Robin Anil wrote:
May I suggest keeping these constants as public String values? That way people
will not hard-code clusters-0 and so on, and will
instead use Clusterer.CLUSTER_DIR.
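A minimal sketch of the suggested approach: directory names live in public String constants, and helpers build paths from them so callers never type "clusters-0" or "clusteredPoints" by hand. The two directory names come from this thread; the class ClusterDirs, its constant names and its helper methods are hypothetical. The project's actual constants are in ClusterBase and may be named differently.

```java
/**
 * Sketch: centralize clustering output directory names.
 * ClusterDirs and its members are hypothetical names.
 */
public final class ClusterDirs {
  /** Prefix for per-iteration cluster output, e.g. "clusters-0". */
  public static final String CLUSTERS_DIR = "clusters-";
  /** Directory holding the clustered points (clusterId -> point). */
  public static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

  private ClusterDirs() {} // constants holder, not instantiable

  /** Path of the clusters written by a given iteration under an output root. */
  public static String clustersPath(String output, int iteration) {
    return output + "/" + CLUSTERS_DIR + iteration;
  }

  /** Path of the clustered-points output under an output root. */
  public static String clusteredPointsPath(String output) {
    return output + "/" + CLUSTERED_POINTS_DIR;
  }

  public static void main(String[] args) {
    System.out.println(clustersPath("output", 0));     // output/clusters-0
    System.out.println(clusteredPointsPath("output")); // output/clusteredPoints
  }
}
```

A rename such as "points" to "clusteredPoints" then touches one constant instead of every caller.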
On Fri, Apr 23, 2010 at 11:55 PM, Jeff Eastman wrote:
ntion and the book will
follow (it shouldn't be the other way around).
Robin
On Fri, Apr 23, 2010 at 11:30 PM, Jeff Eastman wrote:
The APIs did not change but the clustered points directory changed from
"points" to "clusteredPoints" and the various clusters directori
ain on trunk?
On 4/23/10 9:10 AM, Sean Owen wrote:
Good eye, this was fixed in the manuscript a while ago.
I will ping Manning to re-publish Chapters 1-6 since a lot of small
updates have happened since then.
On Fri, Apr 23, 2010 at 4:53 PM, Jeff Eastman wrote:
Section 4.
Section 4.5.1 says:
"The third line shows how it is based on item-item similarities, not
user-user similarities as before. The algorithms are similar, but not
entirely symmetric. They do have notably different properties. For
instance, the running time of an item-based recommender scales up as
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-236:
Attachment: MAHOUT-236.patch
This patch runs on top of Sean's latest patch (r936453) and a
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-236:
Attachment: MAHOUT-236.patch
I made some small changes to fuzzyK clustering and now the evaluator
+1. As Robin noted, this patch will affect some of the clustering code
and it will conflict with the changes I've been working on for MAHOUT-236.
On balance, fixing the whole Vector equivalence mess seems prudent and I
will deal with the rework. You've done a pile of work here and I think
factoring
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-236:
Attachment: MAHOUT-236.patch
Here's a patch that adds a CDbw reference point MR job that ite