Hi everyone.
I have a data set that looks like this:
Number of users: 198651
Number of items: 9972
Statistics of purchases per user:
mean number of purchases: 3.3
stdDev number of purchases: 3.5
min number of purchases: 1
max number of purchases: 176
median number
Yes, I don't know if removing that data would improve results. It might
mean you can compute things faster, at little or no observable loss in
quality of the results.
I'm not sure, but you probably have repeat purchases of the same item, and
items of different value. Working that data into the preference values may
help.
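
As a rough illustration of working repeat purchases and item value into the
data, here is a minimal sketch in Python. It assumes the raw data is a list of
(user, item) purchase events and that a per-item value factor is available;
the function name build_preferences, the item_values dict, and the 1 + log(n)
damping are all assumptions made for this example, not something taken from
your data set or from any particular library.

```python
import math
from collections import defaultdict


def build_preferences(purchases, item_values=None):
    """Turn raw purchase events into weighted (user, item) preferences.

    purchases   -- iterable of (user_id, item_id), one entry per purchase
    item_values -- optional dict item_id -> relative value factor
                   (hypothetical; items missing from it get 1.0)
    """
    # Count how many times each user bought each item.
    counts = defaultdict(int)
    for user_id, item_id in purchases:
        counts[(user_id, item_id)] += 1

    prefs = {}
    for (user_id, item_id), n in counts.items():
        # Dampen repeat purchases: 1 purchase -> 1.0, 2 -> ~1.69, 10 -> ~3.30.
        weight = 1.0 + math.log(n)
        # Scale by the item's value factor if one was supplied.
        if item_values is not None:
            weight *= item_values.get(item_id, 1.0)
        prefs[(user_id, item_id)] = weight
    return prefs


purchases = [("u1", "a"), ("u1", "a"), ("u1", "b"), ("u2", "a")]
prefs = build_preferences(purchases)
valued = build_preferences(purchases, item_values={"a": 2.0})
```

Whether the log damping or the value scaling actually helps is something you
would have to check against held-out data; a plain count or a boolean
"bought / did not buy" signal may do just as well.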