Hi,
I have a recommender with a boolean prefs model. I am following the
instructions in the MIA book, but I only get this exception:
Illegal precision: NaN
[Thrown class java.lang.IllegalArgumentException]
Restarts:
0: [QUIT] Quit to the SLIME top level
Backtrace:
0: com.google.common.base.
Some Mahout algorithms use map-reduce; others (e.g. logistic regression)
do not. If your data is in Hive, you could look into shoehorning the
Mahout algorithm into a UDAF. This is what I'll be looking into in the
next couple of weeks, so if it's of potential interest, ping me in a few
weeks an
yes, that's the one. Thank you, Ted.
On Fri, Jul 6, 2012 at 2:32 PM, Ted Dunning wrote:
> I think that Dmitriy is referring to this:
>
> http://www.deepdyve.com/lp/association-for-computing-machinery/regression-based-latent-factor-models-1ebJXMCs0K
>
> On Fri, Jul 6, 2012 at 2:26 PM, Dmitriy Lyub
I think that Dmitriy is referring to this:
http://www.deepdyve.com/lp/association-for-computing-machinery/regression-based-latent-factor-models-1ebJXMCs0K
On Fri, Jul 6, 2012 at 2:26 PM, Dmitriy Lyubimov wrote:
> (it is in ACM library, or Ted knows a cheaper arrangement to pull it off).
>
These guys show one way to combine content info with dyadic data
factorization, which is pretty close to what I used. Unfortunately I
don't have a free download link for them (it is in the ACM library, or
Ted knows a cheaper arrangement to pull it off).
Agarwal, Chen: "Regression-based Latent Factor Models"
That's right, in the formulation you are referring to you are not
predicting the original input values, so you can't compare them with
RMSE or something.
To test precision / recall you hold out some of the top-rated items
(these are the "relevant results"), and see how many come back in the
recommendations.
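Sean's hold-out procedure can be sketched in a few lines of Python (the function name and item IDs here are hypothetical, not Mahout's API):

```python
# Minimal sketch of hold-out precision/recall for a top-N recommender.
# 'held_out' are the user's top-rated items hidden from training
# (the "relevant results"); 'recommended' is the model's top-N list.

def precision_recall_at_n(recommended, held_out):
    relevant = set(held_out)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 2 of the 4 recommendations were among the 3 held-out items.
p, r = precision_recall_at_n(["i1", "i5", "i2", "i9"], ["i1", "i2", "i3"])
```

Here p is 2/4 and r is 2/3; averaging these over many users gives the evaluation numbers.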
Thanks Sean
I've accidentally continued this thread under the thread you opened, so I'm
moving back to my thread :)
I will rephrase the question I've asked there.
Let's say that as part of my held-out test my model finds that user u2's
connection to i1 has a strength of 28.94, to i2 17.9, and to i3 4.5.
T
Thanks.
On Thu, Jul 5, 2012 at 9:55 AM, Ted Dunning wrote:
> For this size a dense solver like in commons math should work. For larger
> sizes (up to about a million non-zeros), the in-memory stochastic
> projection SVD in Mahout should work well.
>
> On Thu, Jul 5, 2012 at 12:44 AM, Sean Owen
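Ted's point about a dense solver being enough at this size can be sketched with NumPy standing in for commons-math (a rank-k truncated SVD of a small dense matrix; the matrix values here are made up):

```python
import numpy as np

# Rank-k truncated SVD of a small dense matrix: at this scale a plain
# dense decomposition is cheap; stochastic projection only pays off on
# much larger inputs.
A = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-2 approximation
```

By the Eckart-Young theorem the spectral-norm error of A_k equals the first discarded singular value, s[2].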
I suggest asking this question on the lucene-users mailing list.
On Thu, Jul 5, 2012 at 8:56 AM, Praveen Chandar
wrote:
> Hi,
> I've used lucene as a data source for Mahout in the past. Recently, I
> switched to Lucene 4.0 (trunk) and in lucene 4.0 the indexing/term vector
> APIs have changed.
>
It is critical to use randomized projections here in order to get the
dimension independent characteristics.
On Fri, Jul 6, 2012 at 11:32 AM, Sean Owen wrote:
> LSH is probably my ticket, thanks all. I tried a form of this, but
> just used the basis of the feature space to define the hyperplanes
LSH is probably my ticket, thanks all. I tried a form of this, but
just used the basis of the feature space to define the hyperplanes
because I was lazy and experimenting. It didn't work well in the sense
that the best recommendations were not hashed together unless you had
fairly few buckets (i.e.,
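The fix Ted alludes to, using random hyperplanes instead of the feature-space basis, can be sketched like this (a toy illustration, not Mahout's implementation):

```python
import numpy as np

# Random-hyperplane LSH sketch: each random Gaussian hyperplane
# contributes one bit (the sign of the dot product), so vectors that
# are close in cosine distance tend to land in the same bucket.
# Random hyperplanes give the dimension-independent guarantees that
# axis-aligned (feature-basis) planes lack.

rng = np.random.default_rng(42)

def lsh_bucket(vec, planes):
    bits = planes @ vec > 0          # one sign bit per hyperplane
    return tuple(bits.tolist())      # hashable bucket key

planes = rng.standard_normal((8, 5))   # 8 hyperplanes in a 5-d feature space
v = rng.standard_normal(5)
```

Note the bucket key is invariant to positive scaling of the vector, which is exactly the cosine-similarity behaviour you want.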
On 6 July 2012 19:36, Sean Owen wrote:
> I don't recall that it has ever caused a problem, no. The values are
> just keys in a hashtable, so don't need to be sequential.
Thanks, Sean. Quite possibly I was misinterpreting something; I've not
managed to track down the source of my belief and am hap
There is a very lightweight LSH implementation in
https://github.com/tdunning/knn that will be what I am bringing into Mahout
as part of 0.8. It is specifically designed to approximate dot products to
accelerate search of this sort. You should be able to decrease the number
of actual dot products b
I don't recall that it has ever caused a problem, no. The values are
just keys in a hashtable, so don't need to be sequential.
On Fri, Jul 6, 2012 at 8:26 PM, Dan Brickley wrote:
> I recall having problems with this before, using the non-Mahout Taste
> code. I have meaningful strings for content
I recall having problems with this before, using the non-Mahout Taste
code. I have meaningful strings for content IDs and had mapped them
systematically to pseudo-meaningful (but non-sequential) numbers. I
remember that causing some problems a year or so back, ... but I'm
trying it again now with t
(Changed subject from unrelated thread)
You measure precision / recall, or the related F1 measure, or
normalized discounted cumulative gain, or ROC. They are different,
standard metrics that are less complicated than they sound.
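Of the metrics Sean lists, NDCG is the least familiar; a minimal sketch of the standard definition (the helper name is hypothetical):

```python
import math

# NDCG@k sketch: DCG discounts each item's relevance by log2 of its
# rank; dividing by the DCG of the ideal ordering normalizes the score
# into [0, 1], so a perfectly ordered list scores 1.0.

def ndcg_at_k(relevances, k):
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([3, 2, 1, 0], 4)` is 1.0 (already ideally ordered), while any misordering scores below 1.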
On Fri, Jul 6, 2012 at 6:13 PM, Razon, Oren wrote:
> Thanks, it help
One more thought.
Cosine similarity kind of measures the ratio of different feature
preferences. In a recommendation job, I think the ratio of feature
preferences is more relevant than the score itself (kind of reducing bias
impact; some people rank scores higher, ..)
Sam
On Fri, Jul 6, 2012 at 9:01 AM, sam
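Sam's point above, that cosine compares preference ratios and so discounts per-user rating scale, can be verified in a couple of lines (toy values):

```python
import math

# Cosine similarity depends only on the ratios between features:
# a user who scores everything twice as high looks identical,
# which is the bias-reducing property described above.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

u = [4.0, 2.0, 1.0]
v = [8.0, 4.0, 2.0]   # same preference ratios, twice the scores
```

Here cosine(u, v) is 1.0 despite the different magnitudes.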
LSH has many different flavors (based on the different similarity metrics).
Normally MinHash, which is good if you have boolean (yes-no, 0-1)
features, and in the case of k-shingles, it fits well.
In latent topic models, like ALS, the features are no longer 0-1. I think
Random Hyperplane (cosine
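The MinHash flavor mentioned above, for boolean (set-valued) features, can be sketched as follows (hypothetical helper names; a simple linear hash family rather than a production one):

```python
import random

# MinHash sketch: under a random permutation, the probability that two
# sets share the same minimum element equals their Jaccard similarity,
# so the fraction of matching signature slots estimates it.

def minhash_signature(items, n_hashes, seed=0):
    rng = random.Random(seed)
    p = 2_147_483_647  # large prime modulus
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(n_hashes)]
    return [min((a * hash(x) + b) % p for x in items) for a, b in params]

def estimated_jaccard(sig1, sig2):
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)

s1 = minhash_signature({"a", "b", "c", "d"}, 128)
s2 = minhash_signature({"a", "b", "c", "e"}, 128)
# true Jaccard is 3/5 = 0.6; the estimate lands in that neighborhood
```

More hash functions tighten the estimate at the cost of longer signatures.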
Hi Vignesh,
Hive is not a database; it is a query language on Hadoop. Hive internally
converts queries into MapReduce jobs and executes them.
Mahout is an implementation of ML algorithms using MapReduce. Both use HDFS
for storage.
What exactly do you want to do?
Thanks,
B Anil Kumar.
On Fri, Jul 6, 2012
Thanks, it helped!
After having some thoughts about the outcome prediction, I have a
question about measuring the quality of my model.
If I'm using a technique in which, in the end, I'm predicting a preference value
(implicit \ explicit), I could easily measure my model by applying it on a
Hi Dmitriy,
Thank you for the answer.
I will be happy to read such paper
-Original Message-
From: Dmitriy Lyubimov [mailto:dlie...@gmail.com]
Sent: Thursday, July 05, 2012 19:18
To: user@mahout.apache.org
Subject: RE: A bunch of SVD questions...
Cold start problem is usually best attacke
Maybe locality-sensitive hashing can help to get candidates before
calculating the exact distance?
Bye,
Jens
On 07/06/2012 11:35 AM, Sean Owen wrote:
Here's one I've been puzzling over for a bit. In a factorization based
on the SVD or what have you, you reconstruct the approximate original
mat
Can someone please answer the following questions:
1) What is the input of Mahout (an XML file? The output of Solr, that's what
interests me!)?
2) What is the output of Mahout, I mean after clustering with k-means for
example (an XML file again?)?
3) Where is the output stored?
4) Can somebod
Here's one I've been puzzling over for a bit. In a factorization based
on the SVD or what have you, you reconstruct the approximate original
matrix (well, one row) by multiplying the matrices back together and
looking for the largest elements. This is essentially multiplying a
user feature vector b
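The reconstruction step Sean describes (multiplying one user's feature vector against the item-feature matrix and keeping the largest entries) can be sketched with NumPy on made-up random factors:

```python
import numpy as np

# Score every item for one user by multiplying the user's feature
# vector against the item-feature matrix, i.e. reconstructing one row
# of the approximate original matrix, then take the largest entries
# as recommendations.

rng = np.random.default_rng(0)
n_items, k = 1000, 20
user_features = rng.standard_normal(k)             # one row of U
item_features = rng.standard_normal((n_items, k))  # V

scores = item_features @ user_features    # approximate row of the matrix
top_n = np.argsort(scores)[::-1][:10]     # indices of the 10 largest entries
```

This is the brute-force O(n_items * k) pass that the LSH discussion elsewhere in the thread is trying to shortcut.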
thank you
it's very helpful
Best Regards
Alexander Aristov
On 5 July 2012 20:12, Andy Schlaikjer wrote:
> Hi Lance,
>
> Elephant Bird includes support for SequenceFile i/o from Pig:
>
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/sto