Hi everyone,
I am trying to develop a user-based recommender system using binary data.
The data has the user ID and the product ID of each product the user has
bought, and the preference is always 1 since I don't have ratings in my
dataset. If the user did not buy an item, it is not included in the dataset.
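For the record, a minimal sketch of what I mean (plain Python, not Mahout code; the tiny dataset and names are invented for illustration): with all-ones preferences, user similarity reduces to set overlap, e.g. the Tanimoto (Jaccard) coefficient, and candidate items come from what similar users bought.

```python
# User-based recommendation over binary purchase data.
# Each user is represented simply as the set of product IDs they bought.
purchases = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p1", "p5"},
}

def tanimoto(a, b):
    """Jaccard/Tanimoto coefficient between two item sets."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter) if inter else 0.0

def recommend(user, k=2):
    """Score unseen items by the summed similarity of the users who bought them."""
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = tanimoto(purchases[user], items)
        for item in items - purchases[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Mahout's Taste API covers this case directly with its boolean-preference recommenders and Tanimoto similarity, so the sketch above is only to show the shape of the computation.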
Hi,
I have a boolean/binary dataset where a customer ID and product ID pair is
present when the customer actually bought the product and absent if the
customer did not buy it. The dataset is represented like this:
Customer ID   Product ID   Preference (1: the customer bought the product)
Ted Dunning,
You may be right that recommendation isn't the best thing to use in this
situation. However, I have in part been asked to test recommender systems
in such scenarios. But I will take your comments into account and see what
I can do.
A last note on the dataset I'm using, at the moment
Floris,
Given the size of the data you have and the goals that you have, I am not
convinced that recommendation is the right fit for your needs.
I would recommend using multi-dimensional response analysis and then
defining distance between users in terms of the latent variables you get
from that.
Hello Ted Dunning,
First of all thank you for the response, I appreciate it.
Am I right if I say you are suggesting a combination of recommender
systems and an item-response analysis of the data?
You're right that my data isn't huge, so R could work as a tool. I'm
just a little bit c
The easiest way to shoehorn this data into the binary framework for
recommenders is to keep two matrices, one for success, one for failure.
There is a lot you can do from there.
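A sketch of that bookkeeping (plain Python with invented names; a real Mahout setup would use its data model classes instead): keeping one binary matrix for successes and one for failures means "never tried" stays distinguishable from "tried and failed".

```python
# Two sparse binary matrices for exercise results, stored as
# user -> set-of-exercises: one records successes, the other failures.
success = {}
failure = {}

def record(user, exercise, passed):
    """Store a result in the matrix matching its outcome."""
    (success if passed else failure).setdefault(user, set()).add(exercise)

def outcome(user, exercise):
    """Return 'success', 'failure', or 'unseen' for a (user, exercise) pair."""
    if exercise in success.get(user, set()):
        return "success"
    if exercise in failure.get(user, set()):
        return "failure"
    return "unseen"

record("u1", "ex1", True)
record("u1", "ex2", False)
record("u2", "ex1", False)
```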
Most analyses of this kind of data (so-called item-response data [1]),
however, require some kind of hidden-variable analysis.
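As a hedged illustration of the hidden-variable idea (a tiny logistic matrix factorization in plain Python, not any particular item-response model; the response matrix is invented): fit low-dimensional latent vectors to the observed 0/1 responses, then measure distance between users in the latent space rather than in the raw data.

```python
import math
import random

random.seed(0)

# Observed binary responses: rows = users, columns = items (1 = positive).
R = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
K = 2  # number of latent dimensions

users = [[random.random() for _ in range(K)] for _ in R]
items = [[random.random() for _ in range(K)] for _ in R[0]]

def sigmoid(x):
    if x < -30:  # avoid math.exp overflow for very negative inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

# Logistic matrix factorization by plain stochastic gradient descent.
for _ in range(500):
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            p = sigmoid(sum(users[u][k] * items[i][k] for k in range(K)))
            err = r - p
            for k in range(K):
                uk = users[u][k]
                users[u][k] += 0.1 * err * items[i][k]
                items[i][k] += 0.1 * err * uk

def user_distance(a, b):
    """Euclidean distance between two users' latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(users[a], users[b])))
```

Users 0 and 1 answered identically, so their latent vectors end up much closer to each other than either is to user 2, which is exactly the kind of distance Ted is suggesting.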
Hello everybody,
I'm a new Mahout user and I was hoping some people could point me in the
right direction.
My data consists of exercise results from different users, and I want to
recommend different exercises to different users using the collaborative
filtering techniques available in Mahout.
approximately similar # of boxes that can be seen through
all windows in the cluster.
You talked about scaling. By scaling, do you mean I should not use binary
data and instead, for example, use values between 0 and 1 and try to
maximize the variance of the data in each dimension?
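To make the question concrete, column-wise standardization would look like this (a plain-Python sketch with invented data; this is generic feature scaling, not anything Mahout-specific):

```python
def standardize(matrix):
    """Scale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*matrix))
    out_cols = []
    for col in cols:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = var ** 0.5 or 1.0  # leave constant columns unchanged
        out_cols.append([(x - mean) / std for x in col])
    return [list(row) for row in zip(*out_cols)]
```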
On 7/13/2012 12:13 PM, Ted Dunning wrote:
On Fri, Jul 13, 2012 at 12:09 PM, Masoud Moshref Javadi wrote:
> First of all thank you for your response with pictures.
> That's true. Some features are 1 at many points and some are not. That's
> the nature of my problem. But I did not scale the features.
> Should I do scaling? Maybe using a dimens
https://dl.dropbox.com/u/36863361/plot4.png
On Fri, Jul 13, 2012 at 10:34 AM, Masoud Moshref Javadi wrote:
I am clustering binary data (feature values are 0 or 1) over 20k points
with 200k columns. I use canopy to find initial clusters and then do
k-means using Manhattan distance in 10 iterations.
I am clustering binary data (feature values are 0 or 1) over 20k points
with 200k columns. I use canopy to find initial clusters and then do
k-means using Manhattan distance in 10 iterations.
After clustering I found that there are many clusters with just one
point and a few very large clusters.
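For reference, a toy version of that pipeline in plain Python (invented data; the real job runs Mahout's distributed canopy and k-means drivers): k-means with Manhattan distance, where the distance between 0/1 vectors is just the number of differing coordinates, and the L1-appropriate centroid update is the componentwise median rather than the mean.

```python
def manhattan(a, b):
    """L1 distance; for 0/1 vectors this equals the Hamming distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, centroids, iters=10):
    """Plain Lloyd iterations using Manhattan distance.

    The componentwise median is used as the centroid update, since the
    median minimizes L1 cost (the mean is optimal for Euclidean, not L1).
    """
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            best = min(range(len(centroids)),
                       key=lambda c: manhattan(p, centroids[c]))
            clusters[best].append(p)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sorted(col)[len(col) // 2]
                                for col in zip(*members)]
    return clusters

points = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
clusters = kmeans(points, centroids=[[0, 0, 1], [1, 0, 0]])
```

Starting centroids come from canopy in the real pipeline; with bad initial centroids, the same loop readily produces the degenerate singleton-plus-giant clusters described above.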