I created a JIRA issue to supply a non-distributed counterpart of the
sampling that is done in the distributed item-similarity computation:
https://issues.apache.org/jira/browse/MAHOUT-914
2011/12/2 Sean Owen sro...@gmail.com:
For your purposes, it's LogLikelihoodSimilarity. I made similar changes
Yes, I have not modified it,
but after consulting this link I replaced the pom of taste-web.
There are other errors, such as:
*At command mvn compile*
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model
for
OK, only the errors are relevant, and some indicate that you're missing
some dirs. For example make the lib directory referenced below.
I think you should go back to version 0.5 as-is, rather than try any
modifications. I think it was built with Maven 2.x rather than 3.x, so you
may have to try
Combining the latest commits with my
optimized-SamplingCandidateItemsStrategy (http://pastebin.com/6n9C8Pw1)
I achieved satisfying results. All the queries were under one second.
Sebastian, I took a look at your patch and I think it's more practical than
the current
Are you referring to my patch, MAHOUT-910?
It does let you specify a hard cap, really -- if you place a limit of X,
then at most X^2 item-item associations come out. Before you could not
bound the result, really, since one user could rate a lot of items.
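To make the bound concrete, read the limit X as a per-user cap on the number of items considered (an assumption about how the patch applies it). A user who rated n items then emits n(n-1) ordered item-item pairs, and after capping at most X(X-1), which is below X^2, no matter how many items that user rated. The hypothetical helper pairsEmitted below only does this arithmetic; it is not code from MAHOUT-910.

```java
public class PairBound {

    // Hypothetical helper: ordered item-item pairs (i, j) with i != j emitted for
    // one user who rated numItemsRated items, after keeping at most cap of them.
    static long pairsEmitted(long numItemsRated, long cap) {
        long kept = Math.min(numItemsRated, cap);
        return kept * (kept - 1);
    }

    public static void main(String[] args) {
        long cap = 100;
        // A "power user" with 10,000 ratings, without and with the cap.
        System.out.println(pairsEmitted(10_000, Long.MAX_VALUE)); // 99990000
        System.out.println(pairsEmitted(10_000, cap));            // 9900
    }
}
```

The quadratic growth is why a single heavy user can dominate the output before the cap is applied.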
I think it's slightly more efficient and
Actually I was referring to Sebastian's. I haven't seen that you committed
anything to SamplingCandidateItemsStrategy. Can you tell me in which class
the change appears?
On Sun, Dec 4, 2011 at 2:06 PM, Sean Owen sro...@gmail.com wrote:
Have a look at the patch attached to MAHOUT-910. I have not committed it
yet so as to allow review.
https://issues.apache.org/jira/browse/MAHOUT-910
The current implementation samples users. MAHOUT-914 samples items from
users. MAHOUT-910 samples both.
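As an illustration of "sampling items from users" only, here is a sketch that caps one power user's item list at maxItems using reservoir sampling (Algorithm R), which keeps a uniform random subset in a single pass. It is not the code from MAHOUT-914 or MAHOUT-910; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class UserDownSampler {

    // Hypothetical helper: keep a uniform random sample of at most maxItems
    // of a user's item IDs (reservoir sampling, Algorithm R). maxItems >= 0.
    static List<Long> sampleItems(List<Long> itemIDs, int maxItems, Random random) {
        List<Long> reservoir = new ArrayList<>(Math.min(itemIDs.size(), maxItems));
        for (int i = 0; i < itemIDs.size(); i++) {
            if (i < maxItems) {
                // Fill the reservoir with the first maxItems items.
                reservoir.add(itemIDs.get(i));
            } else {
                // Item i replaces a random slot with probability maxItems / (i + 1).
                int j = random.nextInt(i + 1);
                if (j < maxItems) {
                    reservoir.set(j, itemIDs.get(i));
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Long> powerUser = new ArrayList<>();
        for (long id = 0; id < 5_000; id++) {
            powerUser.add(id);
        }
        List<Long> lightUser = Arrays.asList(7L, 8L, 9L);
        Random random = new Random(42);
        System.out.println(sampleItems(powerUser, 100, random).size()); // 100
        System.out.println(sampleItems(lightUser, 100, random));        // [7, 8, 9]
    }
}
```

Users below the cap are left untouched, so only power users lose interactions.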
What's most ideal?
I had supposed we want
Hi Daniel,
My view is this: I think you can pretty safely down-sample power users
like it is done in https://issues.apache.org/jira/browse/MAHOUT-914
I did some experiments on the movielens1M dataset that showed that you
get a negligible error, provided you look at enough interactions per user:
Sean, your impl. is indeed better than mine, but for some reason when I ran
it for a user with a lot of interactions, I got 2023 possibleItemIDs
(although I used 10,2 in the constructor).
Sebastian, I will also try experimenting with your patch. I would just like
to add that in my opinion, as
I assume the parameter does not affect the possibleItemIDs because of the
following line:
    max = (int) Math.max(defaultMaxPrefsPerItemConsidered,
        userItemCountMultiplier * Math.log(Math.max(dataModel.getNumUsers(), dataModel.getNumItems())));
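Evaluating that expression with some assumed inputs shows why the first argument may have no effect: defaultMaxPrefsPerItemConsidered only acts as a floor, and with a multiplier of 2 the log term already exceeds 10 once the larger of the user and item counts passes e^5 (about 148). The sketch assumes int parameters, assumes that (10, 2) map onto (defaultMaxPrefsPerItemConsidered, userItemCountMultiplier), and uses made-up data-set sizes. It does not by itself account for the 2023 figure, which depends on how the cap is applied across the sampling steps.

```java
public class MaxPrefsCap {

    // Mirrors the expression quoted above (parameter types assumed to be int).
    // The default is only a lower bound; the log term usually decides the result.
    static int maxPrefsConsidered(int defaultMaxPrefsPerItemConsidered,
                                  int userItemCountMultiplier,
                                  int numUsers, int numItems) {
        return (int) Math.max(defaultMaxPrefsPerItemConsidered,
            userItemCountMultiplier * Math.log(Math.max(numUsers, numItems)));
    }

    public static void main(String[] args) {
        // Hypothetical data set: 1,000,000 users and 50,000 items.
        // 2 * ln(1,000,000) is about 27.6, which beats the default of 10.
        System.out.println(maxPrefsConsidered(10, 2, 1_000_000, 50_000)); // 27
        // Only a default above the log term changes the outcome.
        System.out.println(maxPrefsConsidered(50, 2, 1_000_000, 50_000)); // 50
    }
}
```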
On Sun, Dec 4, 2011 at 2:59 PM, Daniel Zohar
To talk about this clearly, let me go back to my example and add to it:
---
Say we're recommending for user A. User A is connected to items 1, 2, 3.
Those items are connected to other users X, Y, Z. And those users in turn
are connected to items 100, 101, 102, 103. You can down-sample three
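The example describes a two-hop walk from user A to candidate items. The following sketch performs that walk without any sampling, so each of the three hops (A's items, their other raters, those raters' items) is visible as a loop level where down-sampling could be applied. The exact edges of the toy graph are made up for illustration, and the class is hypothetical, not Mahout's CandidateItemsStrategy API. It needs Java 9+ for List.of and Map.of.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CandidateItems {

    // Walk user -> items -> other users -> their items, then drop items the user
    // already has. No sampling here; each hop is a place one could down-sample.
    static Set<Long> candidates(String user,
                                Map<String, List<Long>> itemsByUser,
                                Map<Long, List<String>> usersByItem) {
        List<Long> ownItems = itemsByUser.getOrDefault(user, List.of());
        Set<Long> result = new TreeSet<>();
        for (long item : ownItems) {
            // Hop 2: other users connected to this item.
            for (String other : usersByItem.getOrDefault(item, List.of())) {
                if (other.equals(user)) {
                    continue;
                }
                // Hop 3: everything those users are connected to.
                result.addAll(itemsByUser.getOrDefault(other, List.of()));
            }
        }
        result.removeAll(ownItems);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical edges for the example: A has 1, 2, 3; X, Y, Z co-rated them.
        Map<String, List<Long>> itemsByUser = Map.of(
            "A", List.of(1L, 2L, 3L),
            "X", List.of(1L, 100L, 101L),
            "Y", List.of(2L, 101L, 102L),
            "Z", List.of(3L, 103L));
        Map<Long, List<String>> usersByItem = Map.of(
            1L, List.of("A", "X"), 2L, List.of("A", "Y"), 3L, List.of("A", "Z"),
            100L, List.of("X"), 101L, List.of("X", "Y"),
            102L, List.of("Y"), 103L, List.of("Z"));
        System.out.println(candidates("A", itemsByUser, usersByItem)); // [100, 101, 102, 103]
    }
}
```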
Sean,
You can also do #1. That is what I have used in the past and what I
recommend. That achieves a large part of #2, but what is most important is
that it *directly* addresses the key cost factor in off-line
recommendations since the number of item pairs emitted is proportional to
the sum of
What do you have for logging in your classpath?
On Dec 1, 2011, at 1:24 PM, magicalo wrote:
Hello,
I have run the 20newsgroups example on my own data set. It runs successfully
and prints the summary output. However, I have enabled the verbose option in
the script when I run the
2011/12/4 myn m...@163.com
Does Mahout contain this method?
Which method?
Time series analysis is not a method.
Any time you have data collected over time, you have time series data.
For example, data from the trajectory of hand movement in biomechanics, or the
movement of a given stock in a given day: the x-axis is time. FFT (frequency
analysis) of the data is an example of time series analysis.
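As a small illustration of frequency analysis on a sampled series, the sketch below finds the dominant frequency bin with a naive O(n^2) discrete Fourier transform. This is a plain DFT, not an FFT, and not anything shipped with Mahout; it is only meant to show what the analysis computes.

```java
public class DominantFrequency {

    // Naive O(n^2) DFT: returns the bin k in 1..n/2 with the largest magnitude,
    // skipping k = 0 (the mean, or DC component).
    static int dominantBin(double[] x) {
        int n = x.length;
        int bestK = 0;
        double bestMag = -1.0;
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            double mag = Math.hypot(re, im);
            if (mag > bestMag) {
                bestMag = mag;
                bestK = k;
            }
        }
        return bestK;
    }

    public static void main(String[] args) {
        int n = 64;
        double[] signal = new double[n];
        for (int t = 0; t < n; t++) {
            // Five cycles over the window plus a constant offset.
            signal[t] = 3.0 + Math.sin(2.0 * Math.PI * 5 * t / n);
        }
        System.out.println(dominantBin(signal)); // 5
    }
}
```

With a sampling rate fs, bin k corresponds to k * fs / n cycles per unit time; an FFT computes the same spectrum in O(n log n).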
In general regression are
Classification and clustering are also common tasks in time series analysis.
Furthermore, not all time series have samples that are expressed as simple
continuous values. Think about click streams or financial transactions.
Neither can be expressed as a simple number.
On Sun, Dec 4, 2011 at 7:29
Hello,
Is there an expected release date for the PCA algorithm as part of Mahout? Tx!
Hi Magicalo,
You can find a patch for PCA under MAHOUT-512 which is available here
https://issues.apache.org/jira/browse/MAHOUT-512.
This implementation scales well with training samples and calculates the
covariance matrix in a distributed way. The feature size is not so scalable as
the
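One common way to compute a covariance matrix in a distributed fashion is to have each worker accumulate the sample count, the sum of the vectors, and the sum of their outer products, then merge those partial sums. Because the sums are associative, this scales with the number of samples, while the d x d outer-product matrix grows with the square of the feature count. The sketch below illustrates that general idea; it is an assumption about the approach, not the MAHOUT-512 code.

```java
public class DistributedCovariance {

    // Partial sums one worker can build over its own block of samples.
    static final class Partial {
        long n;
        final double[] sum;
        final double[][] sumOuter;

        Partial(int d) {
            sum = new double[d];
            sumOuter = new double[d][d];
        }

        void add(double[] x) {
            n++;
            for (int i = 0; i < x.length; i++) {
                sum[i] += x[i];
                for (int j = 0; j < x.length; j++) {
                    sumOuter[i][j] += x[i] * x[j];
                }
            }
        }

        // Merging is plain addition, so blocks can be combined in any order.
        void merge(Partial other) {
            n += other.n;
            for (int i = 0; i < sum.length; i++) {
                sum[i] += other.sum[i];
                for (int j = 0; j < sum.length; j++) {
                    sumOuter[i][j] += other.sumOuter[i][j];
                }
            }
        }

        // Sample covariance: (sumOuter - n * mean * mean^T) / (n - 1).
        double[][] covariance() {
            int d = sum.length;
            double[][] cov = new double[d][d];
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    double meanI = sum[i] / n;
                    double meanJ = sum[j] / n;
                    cov[i][j] = (sumOuter[i][j] - n * meanI * meanJ) / (n - 1);
                }
            }
            return cov;
        }
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        // Two "workers", each seeing half of the samples.
        Partial a = new Partial(2);
        Partial b = new Partial(2);
        a.add(data[0]);
        a.add(data[1]);
        b.add(data[2]);
        b.add(data[3]);
        a.merge(b);
        double[][] cov = a.covariance();
        System.out.println(cov[0][0] + " " + cov[0][1] + " " + cov[1][1]);
    }
}
```

The single-pass sum-of-squares form is simple to distribute but can lose precision when the means are large relative to the variances; centering the data first or using a pairwise-merge update is more robust.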