Precision and Recall for User Based Recommender System using Binary Data

2015-12-09 Thread Shady Hanna
Hi everyone, I am trying to develop a user-based recommender system using binary data. The data has the user ID and the product ID which the user has bought, and the preference is always 1 since I don't have ratings in my dataset. If the user did not buy an item, it is not included in the datase

Uncentered Cosine Similarity with Binary Data for userBased Recommender System

2015-10-21 Thread Shady Hanna
Hi, I have a boolean/binary dataset where a customer and product ID are found when the customer actually bought the product and not found if the customer did not buy it. The dataset is represented like this: Customer ID Product ID Preference (1: customer bought the product) 1
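
For binary purchase data like the dataset described above, uncentered cosine similarity between two customers reduces to the number of shared products divided by the square root of the product of their basket sizes. The following is a minimal sketch in plain Python, not Mahout's implementation; the function name `binary_cosine` and the sample `purchases` data are assumptions made for illustration.

```python
from math import sqrt


def binary_cosine(items_a, items_b):
    """Uncentered cosine similarity for two sets of purchased product IDs.

    With 0/1 preferences the dot product is the size of the intersection
    and each vector norm is the square root of the set size.
    """
    a, b = set(items_a), set(items_b)
    if not a or not b:
        # A customer with no purchases has a zero vector; return 0.0 by convention.
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical data: customer ID -> set of product IDs bought.
purchases = {
    1: {101, 102, 103},
    2: {102, 103},
    3: {104},
}

sim_12 = binary_cosine(purchases[1], purchases[2])  # two shared products
sim_13 = binary_cosine(purchases[1], purchases[3])  # no shared products
```

Because absent pairs are treated as zeros, customers with no products in common always get a similarity of 0 under this measure.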

Re: "Binary" Data

2014-05-17 Thread Floris Devriendt
Ted Dunning, You may be right that recommendation isn't the best thing to use in this situation. However, it's partly requested of me to test Recommender Systems in such scenarios. But I will take your comments into account and see what I can do. A last note on the dataset I'm using, at the moment

Re: "Binary" Data

2014-05-17 Thread Ted Dunning
Floris, Given the size of the data you have and the goals that you have, I am not convinced that recommendation is the right fit for your needs. I would recommend using multi-dimensional response analysis and then define distance between users in terms of the latent variables you get from that.

Re: "Binary" Data

2014-05-17 Thread Floris Devriendt
Hello Ted Dunning, First of all, thank you for the response, I appreciate it. Am I right if I say you are suggesting a combination of recommendation systems and an item-response analysis of the data? You're right when saying my data isn't huge, so R could work as a tool. I'm just a little bit c

Re: "Binary" Data

2014-05-16 Thread Ted Dunning
The easiest way to shoehorn this data into the binary framework for recommenders is to keep two matrices, one for success, one for failure. There is lots to do from there. Most analyses of this kind of data (so-called item-response data [1]), however, require some kind of hidden variable analysi
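
The two-matrix idea above can be sketched as follows. This is only an illustration in plain Python under stated assumptions: the `(user, exercise, passed)` record layout, the function name `split_results`, and the sample data are hypothetical and do not come from the thread. Each "matrix" is stored sparsely as a mapping from user to the set of exercises with that outcome.

```python
from collections import defaultdict


def split_results(results):
    """Split (user, exercise, passed) records into two sparse binary matrices.

    Returns (success, failure), each a dict mapping a user to the set of
    exercises that user passed or failed, respectively.
    """
    success = defaultdict(set)
    failure = defaultdict(set)
    for user, exercise, passed in results:
        target = success if passed else failure
        target[user].add(exercise)
    return dict(success), dict(failure)


# Hypothetical exercise results.
results = [
    ("u1", "e1", True),
    ("u1", "e2", False),
    ("u2", "e1", True),
]

success, failure = split_results(results)
```

Each of the two structures can then be fed to a boolean-preference recommender separately, which keeps "attempted and failed" distinct from "never attempted".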

"Binary" Data

2014-05-16 Thread Floris Devriendt
Hello everybody, I'm a new Mahout user and I was hoping some people could point me in the right direction. My data consists of exercise results made by different users and I want to recommend different exercises to different users using the collaborative filtering techniques available in Mahou

Re: irregular kmeans clusters on binary data

2012-07-13 Thread Masoud Moshref Javadi
approximately similar # of boxes that can be seen through all windows in the cluster. You talked about scaling. By scaling do you mean I should not use binary data and for example use 0 to 1 values and try to maximize variance of data in each dimension? On 7/13/2012 12:13 PM, Ted Dunning

Re: irregular kmeans clusters on binary data

2012-07-13 Thread Ted Dunning
On Fri, Jul 13, 2012 at 12:09 PM, Masoud Moshref Javadi wrote: > First of all thank you for your response with pictures. > That's true. Some features are 1 in many points and some are not. That's > the nature of my problem. But I did not scale features. > Should I do scaling? may be using a dimens

Re: irregular kmeans clusters on binary data

2012-07-13 Thread Masoud Moshref Javadi
ing: https://dl.dropbox.com/u/36863361/plot4.png On Fri, Jul 13, 2012 at 10:34 AM, Masoud Moshref Javadi wrote: I am clustering binary data (feature values are 0 or 1) over 20k points with 200k columns. I use canopy to find initial clusters and then do kmeans using Manhattan distance in 10 iterati

Re: irregular kmeans clusters on binary data

2012-07-13 Thread Ted Dunning
://dl.dropbox.com/u/36863361/plot4.png On Fri, Jul 13, 2012 at 10:34 AM, Masoud Moshref Javadi wrote: > I am clustering binary data (feature values are 0 or 1) over 20k points > with 200k columns. I use canopy to find initial clusters and then do kmeans > using Manhattan distance in 10 i

irregular kmeans clusters on binary data

2012-07-13 Thread Masoud Moshref Javadi
I am clustering binary data (feature values are 0 or 1) over 20k points with 200k columns. I use canopy to find initial clusters and then do kmeans using Manhattan distance in 10 iterations. After clustering I found that there are many clusters with just one point and a few very large clusters