What do you mean by self similarity? Power law size scaling? Or that two
successive clusterings get nearly the same answer?
Sent from my iPhone
On Jul 8, 2012, at 8:40 PM, Lance Norskog goks...@gmail.com wrote:
Are there any measures of self-similarity?
On Sun, Jul 8, 2012 at 6:07 PM,
I have thought about this problem before, and I have read several posts about
it. Sean Owen is right that the math doesn't care what the things
are. But in practice I think a better way is to evaluate the
individual similarity of the different kinds of data, and then combine
Power law size scaling.
On Sun, Jul 8, 2012 at 11:39 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Hi Mohsen, hello Sean,
there is already a lot of research going on into doing recommendations,
especially matrix factorization, on GPUs:
e.g.
http://www.slideshare.net/NVIDIA/1034-gtc09
20x - 300x faster
or
http://www.multicoreinfo.com/research/papers/2009/ipdps09-lahabar.pdf
60x faster over
(I agree, it's quite a useful approach -- was answering the question
about whether there was any such thing in Mahout. This all assumes you
can fit the data in memory in the GPU but that is true for moderately
large data sets.)
On Mon, Jul 9, 2012 at 9:04 AM, Manuel Blechschmidt
Just a quick and possible innumerate thought re WebGL (which is OpenGL
exposed as Web browser content via Javascript).
Perhaps the big heavy number-crunching can be done on server-side
Mahout / Hadoop, but with a role for *delivery* of computed matrices
in the browser? The memory concerns are
You can derive many metrics based on just co-occurrence, if your data
is 1s and 0s. Pearson, cosine similarity, Tanimoto/Jaccard, Euclidean
distance, and log-likelihood all just reduce to counting. Why not at least
give the choice?
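As a sketch of how this reduces to counting (illustrative code, not the Mahout API): with 0/1 preference data, each item pair is summarized by three counts -- how many users preferred each item, and how many preferred both -- and metrics like Tanimoto and cosine fall out directly:

```java
// Illustrative sketch, not Mahout code: with binary (0/1) preference data,
// pairwise similarity needs only three counts per item pair.
public class BinarySimilarity {
    // nA = users who preferred item A, nB = users who preferred item B,
    // nAB = users who preferred both (the co-occurrence count)
    static double tanimoto(int nA, int nB, int nAB) {
        return (double) nAB / (nA + nB - nAB);             // intersection over union
    }
    static double cosine(int nA, int nB, int nAB) {
        return nAB / Math.sqrt((double) nA * (double) nB); // dot product over norms
    }
    public static void main(String[] args) {
        // e.g. 40 users preferred A, 50 preferred B, 20 preferred both
        System.out.println(tanimoto(40, 50, 20)); // 20 / 70
        System.out.println(cosine(40, 50, 20));
    }
}
```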
You can keep half the diff matrix since it's symmetric of course.
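A minimal sketch of that storage trick (illustrative, not Mahout's implementation): pack the upper triangle of an n x n symmetric matrix into a flat array of n*(n+1)/2 entries, and swap indices on lookup so (i, j) and (j, i) hit the same slot:

```java
// Sketch: store only the upper triangle of an n x n symmetric matrix
// in a flat array of length n*(n+1)/2, halving the memory needed.
public class TriangularStore {
    final int n;
    final double[] data;
    TriangularStore(int n) {
        this.n = n;
        this.data = new double[n * (n + 1) / 2];
    }
    // map (i, j) with i <= j to a flat index; swap for i > j by symmetry
    int index(int i, int j) {
        if (i > j) { int t = i; i = j; j = t; }
        return i * n - i * (i - 1) / 2 + (j - i);
    }
    void set(int i, int j, double v) { data[index(i, j)] = v; }
    double get(int i, int j) { return data[index(i, j)]; }
}
```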
The factorization is the heavy number crunching. The client of a
recommender needs to do very little computation in comparison, like a
vector-matrix product. While a GPU might make this happen faster, it's
already on the order of microseconds. Compare with the cost of
downloading the whole
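To make the scale of that client-side step concrete, here is a sketch (all names and shapes are illustrative, not a Mahout API): after factorization, scoring one user against every item is a single vector-matrix product over the learned factors:

```java
// Sketch: the client-side work of a factorization-based recommender is
// just a vector-matrix product: user factors (length k) against the
// item-factor matrix (nItems rows of length k).
public class ScoreUser {
    static double[] score(double[] userFactors, double[][] itemFactors) {
        double[] scores = new double[itemFactors.length];
        for (int i = 0; i < itemFactors.length; i++) {
            double s = 0;
            for (int f = 0; f < userFactors.length; f++) {
                s += userFactors[f] * itemFactors[i][f]; // dot product per item
            }
            scores[i] = s;
        }
        return scores;
    }
}
```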
Of course it's possible. It does mean you need to make sure your
similarity metrics are meaningfully comparable.
This ranges from basic stuff, like making sure they both
output the same range, to more subtle stuff, like making sure a
0.5 from both means something comparable. This is not
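The basic piece of that, sketched with illustrative helpers (not Mahout code): map each metric onto [0, 1] before mixing them -- Pearson correlation lives in [-1, 1] while Tanimoto is already in [0, 1]:

```java
// Sketch: put two similarity metrics on the same [0, 1] scale before
// combining them with a weight.
public class CombineSimilarities {
    // Pearson correlation is in [-1, 1]; rescale it to [0, 1]
    static double pearsonTo01(double r) { return (r + 1.0) / 2.0; }
    // weighted blend of two metrics already on the same [0, 1] scale
    static double blend(double a, double b, double weightA) {
        return weightA * a + (1.0 - weightA) * b;
    }
}
```

Note this only handles the easy part: even with both metrics in [0, 1], a 0.5 from each may still not mean the same thing, which is the subtler calibration problem the thread mentions.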
Hi all,
I'm reading Mahout in Action and new to Mahout. Before I can run the code
in 2.2.2 Creating a recommender, I think I need to import Mahout into
Eclipse first.
However, I encountered a problem when trying to import Mahout's source
code into Eclipse. My steps are as follows.
1 Start
try mvn eclipse:eclipse
2012/7/9 huangjia cucumbergua...@gmail.com
Do not combine maven-eclipse-plugin (eclipse:eclipse) and the m2e plugin for
Eclipse.
To explain what's happening when you import with m2e:
for Maven plugins configured in build scripts to execute on specific build
lifecycle phases, m2e needs metadata/info on what to do with them - execute,
ignore,
Can you rephrase that question? I do a rowsimilarity measure for the
docs excluding self-similarity but I doubt that is what you are asking.
Are you asking if I do a similarity calc on clusters? I'm planning to
find clusters that are similar using their centroids. This is to create
a sort of
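A sketch of the centroid approach described here (illustrative code, not Pat's): treat each cluster's centroid as a vector, and call two clusters similar when the cosine similarity of their centroids is high:

```java
// Sketch: compare clusters by the cosine similarity of their centroid vectors.
public class CentroidSimilarity {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // numerator: dot product
            na  += a[i] * a[i];   // squared norm of a
            nb  += b[i] * b[i];   // squared norm of b
        }
        return dot / Math.sqrt(na * nb);
    }
}
```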
Thanks for clarifications and comments.
On Mon, Jul 9, 2012 at 10:18 AM, Sean Owen sro...@gmail.com wrote:
Power law scaling is very rarely observed directly in k-means clusters
because the algorithm tends to force them to be the same physical size.
Bayesian non-parametric clustering algorithms can show some scaling
effects, but it is very difficult to see very many clusters, so it is very
difficult to
Dot products are an example of something a GPU can't help with. The problem
is that there are the same number of flops as memory operations, and memory is slow.
To get acceleration you need lots of flops per memory fetch. Usually you need
at least a matrix-by-matrix multiply with both matrices dense.
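A back-of-envelope version of that argument (assumed operation counts, ignoring caches): a length-n dot product does about 2n flops against 2n element reads, roughly one flop per memory operation, while a dense n x n matrix multiply does 2n^3 flops against roughly 3n^2 elements touched, i.e. O(n) flops per element:

```java
// Sketch: arithmetic intensity (flops per memory element) under simple
// assumed counts -- this ratio is what decides whether a GPU can help.
public class ArithmeticIntensity {
    // dot product: 2n flops (n multiplies + n adds) over 2n element reads
    static double dot(long n) { return (2.0 * n) / (2.0 * n); }
    // dense n x n matmul: 2n^3 flops over ~3n^2 elements (A, B, and C)
    static double matmul(long n) { return (2.0 * n * n * n) / (3.0 * n * n); }
    public static void main(String[] args) {
        System.out.println(dot(1_000_000)); // ~1 flop per memory op, regardless of n
        System.out.println(matmul(1000));   // ~667 flops per element: GPU-friendly
    }
}
```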
Hi Stevo,
Sorry, but I couldn't quite understand your answer. Do you suggest that I
change the pom.xml? Is there a permanent solution to my problem?
Hi Chenghao,
Where shall I execute the mvn eclipse:eclipse command? In Cygwin?
Thank you both !
Jia
On Mon, Jul 9, 2012 at 12:13 PM, Stevo
There hasn't been much of a use case for clustering up to now. Also, our
clustering is dead slow, which discourages use.
On Mon, Jul 9, 2012 at 9:59 AM, Pat Ferrel p...@occamsmachete.com wrote:
Scale is one of Mahout's primary benefits. And at scale you require
evaluators or you ignore quality. I
Sorry, I'm not following this shorthand. Are you asking if the term
weights of each centroid follow a power law, like they are supposed to?
On 7/9/12 12:34 AM, Lance Norskog wrote:
Power law size scaling.
Yes, it makes sense.
But I am more interested in getting faster computation by combining Mahout
and GPU capabilities. I just wanted to know whether the people involved in Mahout
have thought about it, and whether it is at all possible, for example speeding up
the Map and Reduce phases by parallelizing computations
Hadoop and CUDA are quite at odds -- Hadoop is all about splitting up
a problem across quite remote machines while CUDA/GPU approaches rely
on putting all computation together not only on one machine but within
one graphics card.
It doesn't make sense to combine them. Either you want to
I think that he means cluster sizes rather than term weights.
For text, term frequencies follow an approximate power law.
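One way to check that claim on actual counts (a sketch; names are illustrative): a rank-frequency list follows a power law f_r ≈ C / r^s exactly when log f is linear in log r, and the least-squares slope of that line recovers -s:

```java
// Sketch: estimate the power-law exponent of a rank-frequency list by
// fitting a line to (log rank, log frequency).
public class PowerLawSlope {
    // ordinary least-squares slope of y on x
    static double slope(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }
    public static void main(String[] args) {
        // synthetic Zipf-like frequencies f_r = 1000 / r (exponent s = 1)
        int n = 100;
        double[] lx = new double[n], ly = new double[n];
        for (int r = 1; r <= n; r++) {
            lx[r - 1] = Math.log(r);
            ly[r - 1] = Math.log(1000.0 / r);
        }
        System.out.println(slope(lx, ly)); // close to -1 for an exact power law
    }
}
```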
On Mon, Jul 9, 2012 at 10:06 AM, Pat Ferrel p...@occamsmachete.com wrote:
See https://issues.apache.org/jira/browse/MAHOUT-1043 for more info, and
feel free to vote it up if you're interested in having the Apache Mahout sources
importable and buildable in a modern eclipse/maven environment.
Kind regards,
Stevo Slavic.
On Mon, Jul 9, 2012 at 6:59 PM, huangjia