Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Ted Dunning
What do you mean by self similarity? Power law size scaling? Or that two successive clusterings get nearly the same answer? Sent from my iPhone On Jul 8, 2012, at 8:40 PM, Lance Norskog goks...@gmail.com wrote: Are there any measures of self-similarity? On Sun, Jul 8, 2012 at 6:07 PM,

Re: Re: Approaches for combining multiple types of item data for user-user similarity

2012-07-09 Thread bangbig
I have thought about this problem before, and I read several posts talking about this. Sean Owen is right that the math doesn't care about what the things are. But in practice I think a better way is that you can evaluate the individual similarity of different kinds of data, and then combine

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Lance Norskog
Power law size scaling. On Sun, Jul 8, 2012 at 11:39 PM, Ted Dunning ted.dunn...@gmail.com wrote: What do you mean by self similarity? Power law size scaling? Or that two successive clusterings get nearly the same answer? Sent from my iPhone On Jul 8, 2012, at 8:40 PM, Lance Norskog

Re: mahout on GPU

2012-07-09 Thread Manuel Blechschmidt
Hi Mohsen, hello Sean, there is already a lot of research going on into doing recommendations, especially matrix factorization, on GPUs: e.g. http://www.slideshare.net/NVIDIA/1034-gtc09 20x - 300x faster or http://www.multicoreinfo.com/research/papers/2009/ipdps09-lahabar.pdf 60x faster over

Re: mahout on GPU

2012-07-09 Thread Sean Owen
(I agree, it's quite a useful approach -- was answering the question about whether there was any such thing in Mahout. This all assumes you can fit the data in memory in the GPU but that is true for moderately large data sets.) On Mon, Jul 9, 2012 at 9:04 AM, Manuel Blechschmidt

Re: mahout on GPU

2012-07-09 Thread Dan Brickley
Just a quick and possibly innumerate thought re WebGL (which is OpenGL exposed as Web browser content via JavaScript). Perhaps the big heavy number-crunching can be done on server-side Mahout / Hadoop, but with a role for *delivery* of computed matrices in the browser? The memory concerns are

Re: Candidate items for different cases

2012-07-09 Thread Sean Owen
You can derive many metrics based on just co-occurrence, if your data is 1 and 0. Pearson, cosine similarity, Tanimoto/Jaccard, Euclidean distance, log-likelihood all just reduce to counting. Why not at least give the choice? You can keep half the diff matrix since it's symmetric of course.
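
To illustrate the point that these metrics reduce to counting on 0/1 data, here is a minimal sketch in plain Python (not Mahout's API). The function name binary_similarities and the set-based representation are assumptions made for illustration. Pearson and log-likelihood are left out because they also need the total number of items, which is one more count.

```python
import math

def binary_similarities(a, b):
    """Similarity measures for two sets of item IDs (0/1 preference data).

    Hypothetical helper for illustration; everything is derived from three
    counts: |a|, |b| and the co-occurrence count |a & b|.
    """
    n_a, n_b = len(a), len(b)
    both = len(a & b)                      # co-occurrence count
    union = n_a + n_b - both
    tanimoto = both / union if union else 0.0
    cosine = both / math.sqrt(n_a * n_b) if n_a and n_b else 0.0
    # For 0/1 vectors the squared Euclidean distance is the number of
    # positions where the two vectors differ.
    euclidean = math.sqrt(n_a + n_b - 2 * both)
    return {"tanimoto": tanimoto, "cosine": cosine, "euclidean": euclidean}
```

For example, binary_similarities({1, 2, 3}, {2, 3, 4}) has two co-occurring items out of four distinct ones, so the Tanimoto coefficient is 0.5.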

Re: mahout on GPU

2012-07-09 Thread Sean Owen
The factorization is the heavy number crunching. The client of a recommender needs to do very little computation in comparison, like a vector-matrix product. While a GPU might make this happen faster, it's already on the order of microseconds. Compare with the cost of downloading the whole

Re: Re: Approaches for combining multiple types of item data for user-user similarity

2012-07-09 Thread Sean Owen
Of course it's possible. It does mean you need to make sure your similarity metrics are meaningfully comparable. This ranges from basic stuff, like making sure they are both outputting the same range, to more subtle stuff like making sure a 0.5 from both means something comparable. This is not
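
A rough sketch of the "same range" step, in plain Python rather than Mahout code. The names rescale and combine, the metric labels, and the weights are all hypothetical. It only handles the basic part (mapping each metric onto [0, 1] before a weighted average); it does not address the subtler question of whether equal values from two metrics mean something comparable.

```python
def rescale(value, lo, hi):
    """Map a score from its native range [lo, hi] onto [0, 1]."""
    if hi <= lo:
        raise ValueError("range must satisfy lo < hi")
    return (value - lo) / (hi - lo)

def combine(scores, ranges, weights):
    """Weighted average of similarity scores after rescaling each one.

    scores, ranges and weights are dicts keyed by a metric label; the
    labels and weights are assumptions chosen by the caller.
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[name] * rescale(scores[name], *ranges[name])
        for name in scores
    )
    return weighted / total
```

For instance, a Pearson-style score of 0.0 on [-1, 1] and a Tanimoto-style score of 0.5 on [0, 1] both rescale to 0.5, so with equal weights the combined score is 0.5.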

Import Mahout's source code to eclipse

2012-07-09 Thread huangjia
Hi all, I'm reading Mahout in Action and new to Mahout. Before I can run the code in 2.2.2 Creating a recommender, I think I need to import Mahout into Eclipse first. However, I encountered a problem when trying to import *Mahout*'s source code into Eclipse. My steps are as follows. 1 Start

Re: Import Mahout's source code to eclipse

2012-07-09 Thread chenghao liu
try mvn eclipse:eclipse 2012/7/9 huangjia cucumbergua...@gmail.com Hi all, I'm reading Mahout in Action and new to Mahout. Before I can run the code in 2.2.2 Creating a recommender, I think I need to import Mahout into Eclipse first. However, I encountered a problem when trying to import

Re: Import Mahout's source code to eclipse

2012-07-09 Thread Stevo Slavić
Do not combine maven-eclipse-plugin (eclipse:eclipse) and m2e plugin for eclipse. To explain what's happening when you import with m2e: For maven plugins configured in build scripts to execute on specific build lifecycle phases m2e needs metadata/info on what to do with them - execute, ignore,

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Pat Ferrel
Can you rephrase that question? I do a rowsimilarity measure for the docs excluding self-similarity but I doubt that is what you are asking. Are you asking if I do a similarity calc on clusters? I'm planning to find clusters that are similar using their centroids. This is to create a sort of

Re: mahout on GPU

2012-07-09 Thread mohsen jadidi
Thanks for clarifications and comments. On Mon, Jul 9, 2012 at 10:18 AM, Sean Owen sro...@gmail.com wrote: The factorization is the heavy number crunching. The client of a recommender needs to do very little computation in comparison, like a vector-matrix product. While a GPU might make this

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Ted Dunning
Power law scaling is very rare to observe directly in k-means clusters because the algorithm tends to force them to be the same physical size. Bayesian non-parametric clustering algorithms can show some scaling effects, but it is very difficult to see very many clusters so it is very difficult to

Re: mahout on GPU

2012-07-09 Thread Ted Dunning
Dot products are an example of something that a GPU can't help with. The problem is that there are the same number of flops as memory operations, and memory is slow. To get acceleration you need lots of flops per memory fetch. Usually you need at least a matrix-by-matrix multiply with both dense.
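
The flops-per-memory-fetch argument can be made concrete with a back-of-the-envelope count. This is a sketch under an idealized model that assumes every operand is moved to or from memory exactly once; real caches and hardware behave differently, and the function names are made up for illustration.

```python
def dot_intensity(n):
    """Flops per element moved for a dot product of two length-n vectors."""
    flops = 2 * n - 1          # n multiplies plus n - 1 additions
    moved = 2 * n              # every element of both vectors is read once
    return flops / moved

def matmul_intensity(n):
    """Flops per element moved for a dense n x n by n x n matrix multiply."""
    flops = n * n * (2 * n - 1)    # one length-n dot product per output cell
    moved = 3 * n * n              # read both inputs, write the result
    return flops / moved
```

Under this model the dot product stays below one flop per element moved no matter how large n gets, while the dense matrix multiply grows roughly as 2n/3, which is why the latter can keep a GPU busy and the former cannot.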

Re: Import Mahout's source code to eclipse

2012-07-09 Thread huangjia
Hi Stevo, Sorry, but I couldn't quite understand your answer. Do you suggest that I change the pom.xml? Is there a permanent solution to my problem? Hi Chenghao, Where shall I execute the mvn eclipse:eclipse command? In Cygwin? Thank you both ! Jia On Mon, Jul 9, 2012 at 12:13 PM, Stevo

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Ted Dunning
There hasn't been much use-case for clustering up to now. Also, our clustering is dead slow which discourages use. On Mon, Jul 9, 2012 at 9:59 AM, Pat Ferrel p...@occamsmachete.com wrote: Scale is one of Mahout's primary benefits. And at scale you require evaluators or you ignore quality. I

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Pat Ferrel
Sorry, I'm not following this shorthand. Are you asking if the term weights of each centroid follow a power law, like they are supposed to? On 7/9/12 12:34 AM, Lance Norskog wrote: Power law size scaling. On Sun, Jul 8, 2012 at 11:39 PM, Ted Dunning ted.dunn...@gmail.com wrote: What do you

Re: mahout on GPU

2012-07-09 Thread mohsen jadidi
Yes, it makes sense. But I am more interested in getting faster computation by combining the Mahout and GPU capabilities. I just wanted to know if people involved in Mahout have thought about it, or whether it is possible at all. For example, speed up the Map and Reduce phases by parallelising computations

Re: mahout on GPU

2012-07-09 Thread Sean Owen
Hadoop and CUDA are quite at odds -- Hadoop is all about splitting up a problem across quite remote machines while CUDA/GPU approaches rely on putting all computation together not only on one machine but within one graphics card. It doesn't make sense to combine them. Either you want to

Re: Cluster Evaluation 0.8 style

2012-07-09 Thread Ted Dunning
I think that he means cluster sizes rather than term weights. For text, term frequencies follow an approximate power law. On Mon, Jul 9, 2012 at 10:06 AM, Pat Ferrel p...@occamsmachete.com wrote: Sorry, I'm not following this shorthand. Are you asking if the term weights of each centroid

Re: Import Mahout's source code to eclipse

2012-07-09 Thread Stevo Slavić
See https://issues.apache.org/jira/browse/MAHOUT-1043 for more info and feel free to vote up if you're interested in having Apache Mahout sources importable and buildable in a modern Eclipse/Maven environment. Kind regards, Stevo Slavic. On Mon, Jul 9, 2012 at 6:59 PM, huangjia