Hi,
Erm, I finally get to use a proper machine for testing (phew~), and now fkmeans
with k=50 works fine (I will try a larger k value later). However, as you
mentioned, clusterdumper is still failing with an OOME; my HADOOP_HEAPSIZE is 2000
(apparently the maximum I can assign, running a machine with
Hi Jeff,
I tried running this on the synthetic_control dataset. I see load being balanced
across the reducers now, but the job stops after multiple failures with the
following message:
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at
I thought HBase might be a little slow for large data queries. It normally
takes 10-30 ms to do a random read request.
And even in a parallel/map-reduce setting, it will still take some time to
query from the region server to the data node. I really suspect that HBase
would become an I/O bottleneck for
Hello,
This is my first email to the mahout-user-list.
I am trying to do some clustering with Mahout and I have a question concerning
the cluster center and cluster radius.
For testing, I clustered 10 points using the KMeansClusterer:
points:
[13.000, 4455.000]
[13.000, 5101.000]
[13.000,
I've been testing the Mahout Recommender software using a dataModel derived
from activity data generated by page views and downloads of items in an
open-access repository. The taste data has preferences based on the number of
times a user has viewed an item and I've also tested with boolean
The problem you've described is actually simpler than the 'classic'
recommendation problem, which is personalized per user.
All you want is a list of most-similar items. That's a lot easier. You could
easily roll your own by using an ItemSimilarity implementation and iterating
over all items. No
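
A rough sketch of the "roll your own" idea above, in plain Python rather than
against the Mahout API: score every other item against the target and keep the
best few. The names most_similar_items and jaccard_from_viewers, and the
viewer-overlap similarity itself, are assumptions made up for illustration;
any pairwise item similarity could be plugged in instead.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


def most_similar_items(
    target: Hashable,
    items: List[Hashable],
    similarity: Callable[[Hashable, Hashable], float],
    how_many: int,
) -> List[Tuple[Hashable, float]]:
    """Score every other item against `target` and keep the top `how_many`."""
    scored = [(other, similarity(target, other)) for other in items if other != target]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:how_many]


def jaccard_from_viewers(
    viewers: Dict[Hashable, Set[Hashable]],
) -> Callable[[Hashable, Hashable], float]:
    """Hypothetical similarity: overlap of the sets of users who viewed each item."""
    def sim(a: Hashable, b: Hashable) -> float:
        union = viewers[a] | viewers[b]
        return len(viewers[a] & viewers[b]) / len(union) if union else 0.0
    return sim


# Toy data (invented): item -> set of user ids who viewed it.
viewers = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4}, "D": {9}}
top = most_similar_items("A", list(viewers), jaccard_from_viewers(viewers), 2)
```

This is one pass over the items per query, so it needs no per-user
personalization step.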
Hi Immo,
Did you have an extra cluster assignment at the end? I ask because KMeans uses
two phases: the first where all points are assigned to a cluster and the second
where the cluster centroids are calculated based on the first assignment. So my
idea is that you could use the clustering flag
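
To make the two phases concrete, here is a minimal plain-Python sketch. It is
not Mahout's implementation; the function names, the Euclidean distance, and
the choice to leave an empty cluster's centroid where it was are all
assumptions made for illustration. The last line of kmeans is the extra
assignment pass, so the returned labels match the final centroids.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Phase 1: give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def recompute(
    points: List[Point], assignment: List[int], centroids: List[Point]
) -> List[List[float]]:
    """Phase 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
            continue
        dims = len(members[0])
        new_centroids.append(
            [sum(m[d] for m in members) / len(members) for d in range(dims)]
        )
    return new_centroids


def kmeans(
    points: List[Point], centroids: List[Point], iterations: int
) -> Tuple[List[Point], List[int]]:
    for _ in range(iterations):
        assignment = assign(points, centroids)
        centroids = recompute(points, assignment, centroids)
    # Extra assignment pass: label the points against the final centroids.
    return centroids, assign(points, centroids)


# Toy data (invented for the example).
pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers, labels = kmeans(pts, [[0.0, 0.0], [10.0, 0.0]], iterations=3)
```

The labels list is effectively a point-to-cluster map: labels[i] is the
cluster index for pts[i].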
Hi Christoph,
Thanks for your reply!
The cluster assignment is pretty much what I want to do:
I have some points that I want to be clustered. That's what I use
KMeansClusterer.clusterpoints(...) for. Unfortunately this method does not
provide me with an item-cluster-map. The only thing I get
The first problem is that the input doesn't have comparable variability.
This means that distance is going to be pretty much just y-distance.
One way to improve this is to reduce each coordinate by dividing by the
standard deviation of that coordinate.
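
As a small illustration of that scaling, here is a plain-Python sketch. The
function name scale_by_std is made up, and two choices are assumptions rather
than anything stated above: it uses the population standard deviation, and it
leaves a coordinate untouched when that coordinate has zero spread.

```python
import statistics
from typing import List


def scale_by_std(points: List[List[float]]) -> List[List[float]]:
    """Divide each coordinate by that coordinate's standard deviation."""
    dims = len(points[0])
    # Population standard deviation per coordinate (assumption: not the sample one).
    stds = [statistics.pstdev([p[d] for p in points]) for d in range(dims)]
    return [
        # A constant coordinate has std 0; leave it as-is instead of dividing.
        [p[d] / stds[d] if stds[d] > 0 else p[d] for d in range(dims)]
        for p in points
    ]


# Toy data (invented): the second coordinate varies 100x more than the first.
scaled = scale_by_std([[1.0, 100.0], [3.0, 300.0]])
```

After scaling, both coordinates have unit spread, so neither one dominates a
Euclidean distance.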
Depending on what your y coordinate is
On Tue, Jul 26, 2011 at 4:27 AM, Benjamin Heilbrunn ben...@gmail.com wrote:
1) How can I display the topic distribution for an (existing) document
from the reuters corpus?
There is a sequence file called docTopics in the output directory. Keys are
docIds, values are VectorWritable. Use
I am new and have doubts
http://www.acunu.com/blogs/sean-owen/recommending-cassandra/
I put together this quick-and-dirty writeup on using Cassandra as a
backend for recommenders. May be of interest to anyone using Cassandra
and/or the non-distributed recommenders.
Sean
(Abhik, this has nothing to do with Mahout; it concerns the Manning forum
system. I will reply privately as this is not the place.)
On Tue, Jul 26, 2011 at 6:41 PM, Abhik Banerjee
banerjee.abhik@gmail.com wrote:
I get a message saying your post is more than 80 characters, fix that
On 07/26/2011 01:22 PM, Sean Owen wrote:
http://www.acunu.com/blogs/sean-owen/recommending-cassandra/
I put together this quick-and-dirty writeup on using Cassandra as a
backend for recommenders. May be of interest to anyone using Cassandra
and/or the non-distributed recommenders.
Sean
Yep.
That sounds like a fine approach.
You should try several algorithms, but the basic text classification
approach should work reasonably well, especially if you include phrases and
are aggressive about getting rid of garbage text.
On Tue, Jul 26, 2011 at 2:17 PM, Shrikar archak
The FPGrowth driver page:
https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
gives a command line that only works in mahout/core, rather than
mahout/. Is this drift, or a document bug?
--
Lance Norskog
goks...@gmail.com