date:20111204

Re: Mahout performance issues

2011-12-04 Thread Sebastian Schelter

I created a jira to supply a non-distributed counterpart of the sampling that is done in the distributed item similarity computation: https://issues.apache.org/jira/browse/MAHOUT-914 2011/12/2 Sean Owen sro...@gmail.com: For your purposes, it's LogLikelihoodSimilarity. I made similar changes

Re: problem at : Installing and testing Taste

2011-12-04 Thread VIGNESH PRAJAPATI

Yes, I have not modified it, but after referring to this link I replaced my taste-web pom. There are other errors, like: *At command mvn compile* [INFO] Scanning for projects... [WARNING] [WARNING] Some problems were encountered while building the effective model for

Re: problem at : Installing and testing Taste

2011-12-04 Thread Sean Owen

OK, only the errors are relevant, and some indicate that you're missing some dirs. For example make the lib directory referenced below. I think you should go back to version 0.5 as-is, rather than try any modifications. I think it was built with Maven 2.x rather than 3.x, so you may have to try

Re: Mahout performance issues

2011-12-04 Thread Daniel Zohar

Combining the latest commits with my optimized-SamplingCandidateItemsStrategy (http://pastebin.com/6n9C8Pw1) I achieved satisfying results. All the queries were under one second. Sebastian, I took a look at your patch and I think it's more practical than the current

Re: Mahout performance issues

2011-12-04 Thread Sean Owen

Are you referring to my patch, MAHOUT-910? It does let you specify a hard cap, really -- if you place a limit of X, then at most X^2 item-item associations come out. Before you could not bound the result, really, since one user could rate a lot of items. I think it's slightly more efficient and

Re: Mahout performance issues

2011-12-04 Thread Daniel Zohar

Actually I was referring to Sebastian's. I haven't seen that you committed anything to SamplingCandidateItemsStrategy. Can you tell me in which class the change appears? On Sun, Dec 4, 2011 at 2:06 PM, Sean Owen sro...@gmail.com wrote: Are you referring to my patch, MAHOUT-910? It does let you

Re: Mahout performance issues

2011-12-04 Thread Sean Owen

Have a look at the patch attached to MAHOUT-910. I have not committed it yet so as to allow review. https://issues.apache.org/jira/browse/MAHOUT-910 The current implementation samples users. MAHOUT-914 samples items from users. MAHOUT-910 samples both. What's most ideal? I had supposed we want

Re: Mahout performance issues

2011-12-04 Thread Sebastian Schelter

Hi Daniel, My view is this: I think you can pretty safely down-sample power users like it is done in https://issues.apache.org/jira/browse/MAHOUT-914 I did some experiments on the movielens1M dataset that showed that you get a negligible error given you look at enough interactions per user:
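The down-sampling of power users mentioned above can be sketched in plain Java. This is only an illustration of the idea, not the code attached to MAHOUT-914: the class name `PowerUserSampler`, the method `sample`, and the parameter `maxPrefs` are hypothetical, and uniform sampling without replacement is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout: caps the number of
// interactions considered for any single user.
final class PowerUserSampler {

  // Returns the user's items unchanged if there are at most maxPrefs of them,
  // otherwise a uniform random sample of exactly maxPrefs items.
  static List<Long> sample(List<Long> itemIDs, int maxPrefs, Random random) {
    List<Long> copy = new ArrayList<>(itemIDs);
    if (copy.size() <= maxPrefs) {
      return copy;
    }
    Collections.shuffle(copy, random);
    return new ArrayList<>(copy.subList(0, maxPrefs));
  }

  public static void main(String[] args) {
    List<Long> powerUser = new ArrayList<>();
    for (long itemID = 0; itemID < 5000; itemID++) {
      powerUser.add(itemID);
    }
    // A user with 5000 interactions is reduced to 500 sampled ones.
    System.out.println(sample(powerUser, 500, new Random(42)).size());
  }
}
```

Ordinary users fall below the cap and keep all of their interactions; only the heavy users are thinned out.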

Re: Mahout performance issues

2011-12-04 Thread Daniel Zohar

Sean, your impl. is indeed better than mine but for some reason when I ran it for a user with a lot of interactions, I got 2023 possibleItemIDs (although I used 10,2 in the constructor). Sebastian, I will also try to experiment with your patch. I would just like to add that in my opinion, as

Re: Mahout performance issues

2011-12-04 Thread Daniel Zohar

I assume the parameter does not affect the possibleItemIDs because of the following line: max = (int) Math.max(defaultMaxPrefsPerItemConsidered, userItemCountMultiplier * Math.log(Math.max(dataModel.getNumUsers(), dataModel.getNumItems()))); On Sun, Dec 4, 2011 at 2:59 PM, Daniel Zohar
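To see how the quoted expression behaves, here is a stand-alone sketch that evaluates it with made-up numbers. The class `MaxPrefsExample`, the method `effectiveMax`, and the data sizes are hypothetical; mapping the constructor arguments "10,2" to the default and the multiplier, in that order, is also an assumption.

```java
// Hypothetical stand-alone illustration of the quoted expression;
// this is not the Mahout class itself.
final class MaxPrefsExample {

  // Mirrors: max = (int) Math.max(default, multiplier * ln(max(numUsers, numItems)))
  static int effectiveMax(int defaultMax, int multiplier, int numUsers, int numItems) {
    return (int) Math.max(defaultMax, multiplier * Math.log(Math.max(numUsers, numItems)));
  }

  public static void main(String[] args) {
    // Assumed sizes: 1,000,000 users and 50,000 items.
    // ln(1,000,000) is about 13.8, so 2 * 13.8 = 27.6 exceeds the default of 10.
    System.out.println(effectiveMax(10, 2, 1_000_000, 50_000));  // prints 27
    // A larger default wins over the logarithmic term.
    System.out.println(effectiveMax(100, 2, 1_000_000, 50_000)); // prints 100
  }
}
```

Because of the outer `Math.max`, the configured default acts as a lower bound on the limit rather than a hard cap, so the effective value grows with the size of the data model.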

Re: Mahout performance issues

2011-12-04 Thread Sean Owen

To talk about this clearly, let me go back to my example and add to it: --- Say we're recommending for user A. User A is connected to items 1, 2, 3. Those items are connected to other users X, Y, Z. And those users in turn are connected to items 100, 101, 102, 103 You can down-sample three
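The user-to-items-to-users-to-items walk in this example can be sketched as follows. Everything here is an assumption made for illustration: the class `CandidateItemsSketch`, the single `maxPerHop` limit applied at every hop, the numeric IDs given to users A, X, Y, Z, and which of those users holds which of the items 100 to 103. It is not the code from MAHOUT-910 or MAHOUT-914.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of candidate item discovery with a cap at each hop.
final class CandidateItemsSketch {

  // Returns at most max elements of ids, picked uniformly at random.
  static List<Long> cap(List<Long> ids, int max, Random random) {
    List<Long> copy = new ArrayList<>(ids);
    if (copy.size() <= max) {
      return copy;
    }
    Collections.shuffle(copy, random);
    return copy.subList(0, max);
  }

  static Set<Long> candidates(long userID, Map<Long, List<Long>> itemsByUser,
                              Map<Long, List<Long>> usersByItem, int maxPerHop, Random random) {
    List<Long> none = Collections.<Long>emptyList();
    List<Long> ownItems = itemsByUser.getOrDefault(userID, none);
    Set<Long> result = new HashSet<>();
    for (long itemID : cap(ownItems, maxPerHop, random)) {                                  // hop 1: user's items
      for (long other : cap(usersByItem.getOrDefault(itemID, none), maxPerHop, random)) {   // hop 2: their users
        result.addAll(cap(itemsByUser.getOrDefault(other, none), maxPerHop, random));       // hop 3: those users' items
      }
    }
    result.removeAll(ownItems); // items the user already has are not candidates
    return result;
  }

  // Builds the example graph: A=1, X=2, Y=3, Z=4 (assumed IDs and edges).
  static Set<Long> demo(int maxPerHop) {
    Map<Long, List<Long>> itemsByUser = new HashMap<>();
    itemsByUser.put(1L, Arrays.asList(1L, 2L, 3L));
    itemsByUser.put(2L, Arrays.asList(1L, 100L, 101L));
    itemsByUser.put(3L, Arrays.asList(2L, 101L, 102L));
    itemsByUser.put(4L, Arrays.asList(3L, 103L));
    Map<Long, List<Long>> usersByItem = new HashMap<>();
    usersByItem.put(1L, Arrays.asList(1L, 2L));
    usersByItem.put(2L, Arrays.asList(1L, 3L));
    usersByItem.put(3L, Arrays.asList(1L, 4L));
    return candidates(1L, itemsByUser, usersByItem, maxPerHop, new Random(7));
  }

  public static void main(String[] args) {
    System.out.println(demo(10)); // no cap is hit: items 100, 101, 102, 103
    System.out.println(demo(1));  // at most one candidate survives
  }
}
```

With a limit of `maxPerHop` at every hop, the result can never hold more than `maxPerHop` cubed items, whatever the activity of any single user or the popularity of any single item.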

Re: Mahout performance issues

2011-12-04 Thread Ted Dunning

Sean, You can also do #1. That is what I have used in the past and what I recommend. That achieves a large part of #2, but what is most important is that it *directly* addresses the key cost factor in off-line recommendations since the number of item pairs emitted is proportional to the sum of

Re: 20newsgroups example does not print verbose output

2011-12-04 Thread Grant Ingersoll

What do you have for logging in your classpath? On Dec 1, 2011, at 1:24 PM, magicalo wrote: Hello, I have run the 20newsgroups example on my own data set. It runs successfully and prints the summary output. However, I have enabled the verbose option in the script when I run the

Re: Time series analysis

2011-12-04 Thread Ted Dunning

2011/12/4 myn m...@163.com: Does Mahout contain this method? Which method? Time series analysis is not a method.

Re: Time series analysis

2011-12-04 Thread Peyman Mohajerian

Any time you have data collected over time, you have time series data. Examples are data from the trajectory of a hand movement in biomechanics, or the movement of a given stock on a given day, where the x-axis is time. FFT, a frequency analysis of the data, is an example of time series analysis. In general regression are

Re: Time series analysis

2011-12-04 Thread Ted Dunning

Classification and clustering are also common tasks in time series analysis. Furthermore, not all time series have samples that are expressed as simple continuous values. Think about click streams or financial transactions. Neither can be expressed as a simple number. On Sun, Dec 4, 2011 at 7:29

When is PCA expected to be fully implemented into Mahout?

2011-12-04 Thread magicalo

Hello, Is there an expected release date for the PCA algorithm as part of Mahout? Tx!

Re: When is PCA expected to be fully implemented into Mahout?

2011-12-04 Thread Raphael Cendrillon

Hi Magicalo, You can find a patch for PCA under MAHOUT-512 which is available here https://issues.apache.org/jira/browse/MAHOUT-512. This implementation scales well with training samples and calculates the covariance matrix in a distributed way. The feature size is not so scalable as the

18 matches
