On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian wrote:
> Hi Ted,
>
> Thanks for your response. I thought that the mean of a sparse vector is
> simply the mean of the "defined" elements? Why would the vectors become
> dense unless you're meaning that all the undefined elements (0?) now will
> be (0-m_x)?
I have the exact same problem. Some of the longer-running sub-tasks get
only 1 reducer, and each reduce task runs for 2-3 hours!
Hi Ted,
Thanks for your response. I thought that the mean of a sparse vector is
simply the mean of the "defined" elements? Why would the vectors become
dense unless you're meaning that all the undefined elements (0?) now will
be (0-m_x)?
Looking at the following example:
X = [5 - 4] and Y = [4 5 2
Well, the best way to compute correlation using sparse vectors is to make
sure you keep them sparse. To do that, you must avoid subtracting the mean
by expanding whatever formulae you are using. For instance, if you are
computing
(x - m_x) . (y - m_y)
(here . means dot product)
If you do t
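The expansion Ted describes can be sketched concretely. The following is a minimal illustration, not Mahout's actual code: sparse vectors as `HashMap`s, and the expanded identity x.y - m_y*sum(x) - m_x*sum(y) + n*m_x*m_y in place of (x - m_x) . (y - m_y), so no dense centered vectors are ever built. It reuses the X and Y from the question above, assuming "-" marks an unset (zero) element and that the truncated Y is [4 5 2].

```java
import java.util.HashMap;
import java.util.Map;

public class SparseCenteredDot {
    // Dot product of two sparse vectors stored as index -> value maps.
    static double dot(Map<Integer, Double> x, Map<Integer, Double> y) {
        double s = 0.0;
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv != null) s += e.getValue() * yv;
        }
        return s;
    }

    // Sum of the stored (non-zero) elements.
    static double sum(Map<Integer, Double> v) {
        double s = 0.0;
        for (double d : v.values()) s += d;
        return s;
    }

    // (x - m_x) . (y - m_y) expanded so both vectors stay sparse:
    //   x.y - m_y*sum(x) - m_x*sum(y) + n*m_x*m_y
    static double centeredDot(Map<Integer, Double> x, Map<Integer, Double> y, int n) {
        double mx = sum(x) / n;   // mean over all n coordinates, zeros included
        double my = sum(y) / n;
        return dot(x, y) - my * sum(x) - mx * sum(y) + n * mx * my;
    }

    public static void main(String[] args) {
        Map<Integer, Double> x = new HashMap<>();
        x.put(0, 5.0); x.put(2, 4.0);                 // X = [5, 0, 4]
        Map<Integer, Double> y = new HashMap<>();
        y.put(0, 4.0); y.put(1, 5.0); y.put(2, 2.0);  // Y = [4, 5, 2]
        // Equals -5 (dense check: [2, -3, 1] . [1/3, 4/3, -5/3] = -5).
        System.out.println(centeredDot(x, y, 3));
    }
}
```

Note that the subtraction never touches the stored elements, so the sparse structure is preserved; only a handful of scalar sums are computed on top of the plain sparse dot product.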
Okay, so I rethought my question and realized that the paper never really
talked about collaborative filtering, but just about how to calculate
item-item similarity in a scalable fashion. Perhaps that is why the
common ratings aren't used? Because they aren't a prerequisite for this
calculation?
If you always insert 1's for each element, then you can detect collisions
by inserting all your elements (or all elements in each document
separately) and looking for the max value in the vector. If you see
something >1, you have a collision.
But collisions are actually good. The only way to com
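The check described above can be sketched without Mahout's encoders. This hypothetical version uses `String.hashCode` as a stand-in for the encoder's hash, and deliberately forces a collision by hashing five distinct tokens into four slots (pigeonhole), so the max must exceed 1:

```java
public class CollisionCheck {
    // Insert 1.0 for each element, then look at the max value in the vector:
    // anything > 1 means two elements landed in the same slot.
    static double maxAfterHashing(String[] tokens, int dim) {
        double[] v = new double[dim];
        for (String t : tokens) {
            int slot = Math.floorMod(t.hashCode(), dim); // stand-in hash
            v[slot] += 1.0;                              // always insert 1
        }
        double max = 0.0;
        for (double d : v) max = Math.max(max, d);
        return max;
    }

    public static void main(String[] args) {
        // Five distinct tokens into four slots must collide somewhere.
        double max = maxAfterHashing(new String[]{"a", "b", "c", "d", "e"}, 4);
        System.out.println(max > 1.0 ? "collision" : "no collision"); // prints "collision"
    }
}
```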
Hi there,
I'm reopening this old topic since I now have some more information,
because I was able to talk with my customer.
Basically my customer wants the following:
using some historical data, we have to cluster the data with some
cluster analysis and some environment variables; for each cluster we ha
Hi, thanks for your quick reply. So multiple probes are a protection
against collisions? After playing a little with the default length of
a RandomAccessSparseVector object I noticed that (of course)
collisions occur when the length is too short. Therefore, I'm asking
myself if there is a possibili
The default with the Mahout encoders is two probes. This is unnecessary
for the intercept term, of course, if you protect the intercept term from
other updates, possibly by encoding other data using a view of the original
feature vector.
For each probe, a different hash is used so each value is
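The multi-probe idea can be sketched as follows. This is a hypothetical stand-in for Mahout's encoders, not their actual implementation: here each probe gets a different hash simply by salting the term with the probe index.

```java
public class MultiProbe {
    // Add weight w for `term` at `probes` different slots,
    // using a different hash per probe (salt the term with the probe index).
    static void addToVector(String term, double w, double[] v, int probes) {
        for (int p = 0; p < probes; p++) {
            int slot = Math.floorMod((term + "#" + p).hashCode(), v.length);
            v[slot] += w;
        }
    }

    public static void main(String[] args) {
        double[] v = new double[16];
        addToVector("age", 1.0, v, 2);     // two probes, as in Mahout's default
        addToVector("height", 1.0, v, 2);
        double total = 0.0;
        for (double d : v) total += d;
        System.out.println(total);         // total mass = 2 probes x 2 terms = 4.0
    }
}
```

Because each probe uses a different hash, a collision on one probe is unlikely to coincide with a collision on the other, so a feature's signal partially survives even when one of its slots clashes with another feature.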
I am new to Mahout.
I am trying to run the Decision tree code in
https://github.com/apache/mahout/tree/mahout-0.6/examples/src/main/java/org/apache/mahout/classifier/df
I have gone through
https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
I need to run "df" as an Eclipse project.
So
On Fri, 22 Nov 2013 17:55:13 +0800
Jason Lee wrote:
> I noticed lots of algorithm implementations have been deprecated in Mahout
> 0.8 and removed in 0.9, but no reasons or comments were noted. Can
> I ask why?
As Suneel mentioned earlier: Before removing these algorithms we asked
on the user list
For an example program using mahout I use the donut.csv sample data
from the project (
https://svn.apache.org/repos/asf/mahout/trunk/examples/src/main/resources/donut.csv
). My code looks like this:
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
On Thu, 28 Nov 2013 13:24:26 +0530
Tharindu Rusira wrote:
> Yes that's the exact issue Suneel, it was a careless mistake while
> adding projects to Eclipse that I missed those .jars.
When changing Mahout code, make sure to either run
mvn eclipse:eclipse before importing the project into your workspace