Thanks a bunch. That's very helpful.

On Friday, December 16, 2016, Sean Owen <so...@cloudera.com> wrote:

> That all looks correct.
>
> On Thu, Dec 15, 2016 at 11:54 PM Manish Tripathi <tr.man...@gmail.com>
> wrote:
>
>> ok. Thanks. So here is what I understood.
>>
>> 1) Input data to ALS.fit(implicitPrefs=True) is the actual strengths
>> (count data). So if I have a matrix of (user, item, views/purchases), I
>> pass that as the input, not the binarized one (preference). This
>> signifies the strength.
>>
>> 2) Since we also pass the alpha parameter to this ALS.fit() method, Spark
>> internally creates the confidence matrix 1 + alpha * input_data (or some
>> other function of alpha).
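The preference/confidence construction described in point 2 (which Spark performs internally, following the Hu, Koren & Volinsky formulation) can be sketched in plain NumPy; the count matrix and alpha value here are made-up illustrations:

```python
import numpy as np

# Hypothetical count matrix R (users x items): raw view/purchase counts.
R = np.array([[3.0, 0.0],
              [0.0, 5.0]])
alpha = 40.0

# Preference matrix P: binarized interactions (1 if any interaction, else 0).
P = (R > 0).astype(float)

# Confidence matrix C = 1 + alpha * R, per the implicit-feedback ALS paper.
C = 1.0 + alpha * R

print(P)  # identity-like 0/1 matrix for this toy R
print(C)  # observed cells get high confidence, unobserved cells get 1
```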
>>
>> 3) The output it gives is basically a factorization of the 0/1 matrix
>> (the binarized version of the initial input data), hence the output also
>> resembles the preference matrix (0/1), suggesting the interaction. So
>> typically it should be between 0 and 1, but if it is negative it means a
>> very weak preference/interaction.
>>
>> *Does all of the above sound correct?*
>>
>> If yes, then one last question:
>>
>> 1) *For an explicit dataset, where we don't use implicitPrefs=True,* the
>> predicted ratings would be actual ratings (they can be 2.3, 4.5, etc.)
>> and not the interaction measure. That is because for explicit feedback we
>> don't use the confidence-matrix and preference-matrix concepts, and use
>> the actual rating data. So any output from Spark ALS for explicit data
>> would be a rating prediction.
>>
>> On Thu, Dec 15, 2016 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> No, the inputs are weights or strengths. The output is a factorization
>>> of the binarization of that to 0/1, not probabilities, and not a
>>> factorization of the input. This explains the range of the output.
>>>
>>>
>>> On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:
>>>
>>>> When you say *implicit ALS is factoring the 0/1 matrix, are you saying
>>>> that for the implicit feedback algorithm we need to pass the input data
>>>> as the preference matrix, i.e. a matrix of 0s and 1s?*
>>>>
>>>> Then how will it calculate the confidence matrix, which is basically
>>>> 1 + alpha * count_matrix? If we don't pass the actual count values
>>>> (views etc.), then how does Spark calculate the confidence matrix?
>>>>
>>>> I was of the understanding that the input data for
>>>> als.fit(implicitPrefs=True) is the actual count matrix of the
>>>> views/purchases. Am I going wrong here? If yes, then how is Spark
>>>> calculating the confidence matrix if it doesn't have the actual count
>>>> data?
>>>>
>>>> The original paper on which the Spark algorithm is based needs the
>>>> actual count data to create a confidence matrix, and also needs the 0/1
>>>> matrix, since the objective function uses both the confidence matrix
>>>> and the 0/1 matrix to find the user and item factors.
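As a rough illustration, the objective from that paper does combine both matrices; here is a toy NumPy sketch (made-up counts, random factors, hypothetical hyperparameters) of evaluating that weighted loss:

```python
import numpy as np

# Toy sketch of the implicit-feedback ALS objective from Hu, Koren &
# Volinsky: sum over (u, i) of c_ui * (p_ui - x_u . y_i)^2 + L2 penalty.
R = np.array([[3.0, 0.0],
              [0.0, 5.0]])          # made-up raw counts (users x items)
alpha, reg = 40.0, 0.1              # hypothetical hyperparameters

P = (R > 0).astype(float)           # 0/1 preference matrix
C = 1.0 + alpha * R                 # confidence weights

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2))         # user factors (rank 2)
Y = rng.normal(size=(2, 2))         # item factors

# Confidence-weighted squared reconstruction error plus regularization;
# ALS alternates solving for X and Y to minimize this.
loss = float(np.sum(C * (P - X @ Y.T) ** 2)
             + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))
print(loss)
```

The point is only that both C (from counts) and P (the 0/1 binarization) appear in the loss, which is why the raw counts must be supplied.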
>>>>
>>>> On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> No, you can't interpret the output as probabilities at all. In
>>>>> particular, they may be negative. It is not predicting a rating but an
>>>>> interaction. Negative means very strongly not predicted to interact.
>>>>> No, implicit ALS *is* factoring the 0/1 matrix.
>>>>>
>>>>> On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com> wrote:
>>>>>
>>>>>> Ok. So we can kind of interpret the output as probabilities even
>>>>>> though it is not modeling probabilities, so as to be able to use it
>>>>>> with the BinaryClassificationEvaluator.
>>>>>>
>>>>>> So the way I understand it, as per the algorithm, the predicted
>>>>>> matrix is basically the dot product of the user-factor and
>>>>>> item-factor matrices.
>>>>>>
>>>>>> But in what circumstances can the predicted ratings be negative? I
>>>>>> can understand that if the individual user-factor and item-factor
>>>>>> vectors have negative terms, then the product can be negative. But
>>>>>> does a negative value make any sense practically? As per the
>>>>>> algorithm, the dot product is the predicted rating, so the rating
>>>>>> shouldn't be negative for it to make sense. Also, is a rating between
>>>>>> 0 and 1 a normalised rating? Typically we expect a rating to be any
>>>>>> real value, like 2.3, 4.5, etc.
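For what it's worth, the negative case being asked about is easy to reproduce with two made-up factor vectors (the values here are purely illustrative):

```python
import numpy as np

# Illustrative rank-2 user and item factor vectors; values are made up.
user_factors = np.array([0.8, -0.6])
item_factors = np.array([-0.5, 0.7])

# The predicted score is the dot product of the two factor vectors; with
# mixed-sign factors it can fall below 0 (or above 1), even though the
# model reconstructs a 0/1 preference matrix.
prediction = float(user_factors @ item_factors)
print(prediction)  # approximately -0.82
```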
>>>>>>
>>>>>> Also please note, for implicit-feedback ALS we don't feed a 0/1
>>>>>> matrix. We feed the count matrix (discrete count values), and I am
>>>>>> assuming Spark internally converts it into a preference matrix (0/1)
>>>>>> and a confidence matrix = 1 + alpha * count_matrix.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>
>>>>>>> No, ALS is not modeling probabilities. The outputs are
>>>>>>> reconstructions of a 0/1 matrix. Most values will be in [0, 1], but
>>>>>>> it's possible to get values outside that range.
>>>>>>>
>>>>>>> On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi
>>>>>>> <tr.man...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I ran the ALS model for the implicit feedback case. Then I used
>>>>>>>> the .transform method of the model to predict the ratings for the
>>>>>>>> original dataset. My dataset is of the form (user, item, rating).
>>>>>>>>
>>>>>>>> I see something like below:
>>>>>>>>
>>>>>>>> predictions.show(5,truncate=False)
>>>>>>>>
>>>>>>>>
>>>>>>>> Why is the last prediction value negative? Isn't the transform
>>>>>>>> method giving the prediction (probability) of seeing the rating as
>>>>>>>> 1? I had counts data for the rating (implicit feedback), and for
>>>>>>>> the validation dataset I binarized the rating (1 if > 0, else 0).
>>>>>>>> My training data has positive ratings (it's basically the count of
>>>>>>>> views of a video).
>>>>>>>>
>>>>>>>> I used following to train:
>>>>>>>>
>>>>>>>> als = ALS(rank=x, maxIter=15, regParam=y,
>>>>>>>>           implicitPrefs=True, alpha=40.0)
>>>>>>>> model = als.fit(self.train)
>>>>>>>>
>>>>>>>> What does a negative prediction mean here, and is it OK to have
>>>>>>>> that?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
