Thanks Ted,
Yes on the time problem: we tend to use aggregations of session data. So
instead of asking for user recommendations, we do things like user+session
recommendations.
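To make the user+session idea concrete, here is a minimal sketch of one common heuristic: start a new session whenever a user has been inactive for longer than some gap. The 30-minute gap, the (user, timestamp, item) event layout and the "user#k" key format are all assumptions for illustration, not a description of what we actually run.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=1800):
    """Group (user, timestamp, item) events into per-user sessions.

    A new session starts when the gap since the user's previous event
    exceeds gap_seconds (hypothetical default: 30 minutes).
    Returns a dict mapping "user#session_index" to the list of items.
    """
    sessions = defaultdict(list)
    last_seen = {}       # user -> timestamp of the previous event
    session_index = {}   # user -> index of the current session

    # Sort by user, then by time, so gaps are measured between neighbours.
    for user, ts, item in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen or ts - last_seen[user] > gap_seconds:
            session_index[user] = session_index.get(user, -1) + 1
        last_seen[user] = ts
        sessions[f"{user}#{session_index[user]}"].append(item)
    return dict(sessions)


if __name__ == "__main__":
    demo = [("u1", 0, "a"), ("u1", 600, "b"), ("u1", 10000, "c"), ("u2", 50, "a")]
    for key, items in sessionize(demo).items():
        print(key, items)
```

Each "user#k" key can then be fed to the recommender as if it were a separate user.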
Of course, deciding when sessions start and stop isn't trivial. Ideally,
what I would want to do is time-weight views usi
For the poly-syllable challenged,
heteroscedasticity - the degree of variation changes. This is common with
counts because you expect the standard deviation of count data to be
proportional to sqrt(n).
time inhomogeneity - changes in behavior over time. One way to handle this
(roughly) is to first r
For my team it has usually been heteroscedasticity and time inhomogeneity.
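The sqrt(n) scaling of count noise is easy to check by simulation. The sketch below assumes counts are Poisson distributed, which is a modelling assumption and not something the data guarantees; the sampler and the helper names are made up for the example.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def count_spread(means, n_samples=4000, seed=0):
    """For each mean, return (mean, sample sd, sd / sqrt(mean))."""
    rng = random.Random(seed)
    rows = []
    for lam in means:
        draws = [poisson_sample(lam, rng) for _ in range(n_samples)]
        sd = statistics.pstdev(draws)
        rows.append((lam, sd, sd / math.sqrt(lam)))
    return rows


if __name__ == "__main__":
    for lam, sd, ratio in count_spread([1, 4, 16, 64]):
        print(f"mean={lam:>3}  sd={sd:6.2f}  sd/sqrt(mean)={ratio:4.2f}")
```

The last column should stay close to 1 while the raw standard deviation grows with the mean, which is exactly the changing degree of variation that a constant-variance model ignores.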
On Thu, Mar 27, 2014 at 10:18 AM, Tevfik Aytekin
wrote:
> Interesting topic,
> Ted, can you give examples of those mathematical assumptions
> under-pinning ALS which are violated by the real world?
>
> On Thu, Mar 27,
Least squares techniques in general depend on an assumption of normal
distribution of errors. With counts, that is only plausible with large values.
Also, decompositions like this make linearity assumptions which imply all
items/words are independent. They are clearly not.
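As a rough illustration of why normal errors are only plausible for large counts, the sketch below takes a Poisson distribution as a stand-in for count data (an assumption, not something stated above) and measures two things: its skewness, and how much probability a normal with the same mean and variance would put on impossible negative counts. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, lam):
    """Poisson probability mass at k, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_mismatch(lam):
    """Return (skewness, mass a matching normal puts below zero)."""
    # Sum far enough into the tail that the truncated mass is negligible.
    upper = int(lam + 12 * math.sqrt(lam) + 20)
    pmf = [poisson_pmf(k, lam) for k in range(upper + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(pmf)) / var ** 1.5
    below_zero = NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(0.0)
    return skew, below_zero


if __name__ == "__main__":
    for lam in (1, 10, 100):
        skew, below = normal_mismatch(lam)
        print(f"mean={lam:>3}  skewness={skew:.3f}  normal mass below 0={below:.2e}")
```

Both numbers shrink as the mean grows, so the normal approximation improves for large counts and is poor for the small counts that dominate most interaction data.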
I am running Fuzzy KMeans algorithm on Reuters corpus.
I am using Mahout 0.7 on Hadoop 1.1 on Ubuntu 12.04 machine.
Hadoop cluster consists of two machines
* master: 8GB RAM ( 4 cores )
* slave: 4GB RAM ( a KVM vm with only 1 core )
When I run this command, the clustering fails at iteration 3
Interesting topic,
Ted, can you give examples of those mathematical assumptions
under-pinning ALS which are violated by the real world?
On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning wrote:
> How can there be any other practical method? Essentially all of the
> mathematical assumptions under-pinni
How can there be any other practical method? Essentially all of the
mathematical assumptions under-pinning ALS are violated by the real world.
Why would any mathematical consideration of the number of features be much
more than heuristic?
That said, you can make an information content argument.
Hi,
does anyone know of a principled approach to choosing the number of
features for ALS (other than cross-validation)?
--sebastian
Dear Suneel,
I am very sorry that I did not reply to you for so long.
I just realized that your mail was automatically recognized as spam
My data is 700MB.
mapred.child.java.opt=-Xmx4g
It takes 2 hours for one map to complete its task.
15 maps were started for the job.
Map runs very fast, I th