Hi Sean,
I have been playing around with your patch. It looks good.
From the little testing I did, I can also say that the recommendations seem
to be more accurate than in my initial proposal (#4).

I just have one suggestion, though. I think the current parameters (int
defaultMaxPrefsPerItemConsidered, int userItemCountMultiplier) are not very
clear and don't give enough control over the sampling.
I would introduce two different parameters instead (this won't be
backwards-compatible, though):
- maxSourcePrefsConsidered: to be used in conjunction with
  SamplingLongPrimitiveIterator to do #1.
- maxFinalPrefs: to set the value of 'int max' in your patch directly, i.e.
  to get rid of:
    max = (int) Math.max(defaultMaxPrefsPerItemConsidered,
        userItemCountMultiplier * Math.log(Math.max(dataModel.getNumUsers(),
        dataModel.getNumItems())));
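To make the difference concrete, here is a rough Java sketch (not Mahout
code; the class and method names are made up for illustration). currentMax()
reproduces the expression above, with dataModel.getNumUsers() and
getNumItems() passed in as plain ints. sampleSourcePrefs() shows what
maxSourcePrefsConsidered would bound; it uses reservoir sampling as a
stand-in so that exactly that many items are kept, which is not necessarily
how SamplingLongPrimitiveIterator samples.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not Mahout code: all names are illustrative only.
final class SamplingParamsSketch {

  private SamplingParamsSketch() {}

  // The cap as the patch computes it today, with dataModel.getNumUsers()
  // and dataModel.getNumItems() replaced by plain int arguments.
  static int currentMax(int defaultMaxPrefsPerItemConsidered,
                        int userItemCountMultiplier,
                        int numUsers,
                        int numItems) {
    return (int) Math.max(defaultMaxPrefsPerItemConsidered,
        userItemCountMultiplier * Math.log(Math.max(numUsers, numItems)));
  }

  // Proposed: keep at most maxSourcePrefsConsidered of the user's item IDs.
  // Reservoir sampling (Algorithm R) is a stand-in here: every subset of
  // that size is equally likely and exactly that many IDs are returned.
  static long[] sampleSourcePrefs(long[] itemIDs,
                                  int maxSourcePrefsConsidered,
                                  Random random) {
    if (maxSourcePrefsConsidered <= 0) {
      return new long[0];
    }
    if (itemIDs.length <= maxSourcePrefsConsidered) {
      return itemIDs.clone(); // nothing needs to be dropped
    }
    long[] reservoir = Arrays.copyOf(itemIDs, maxSourcePrefsConsidered);
    for (int i = maxSourcePrefsConsidered; i < itemIDs.length; i++) {
      int j = random.nextInt(i + 1); // uniform in [0, i]
      if (j < maxSourcePrefsConsidered) {
        reservoir[j] = itemIDs[i]; // replace a random slot
      }
    }
    return reservoir;
  }
}
```

Under the proposal, the per-item cap would simply be the configured
maxFinalPrefs, rather than a value derived from the size of the data model
the way currentMax() does it.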

In the future, it would be possible to add a strategy that controls how the
maxSourcePrefsConsidered preferences are sampled: for example, most recent
items, least recent items, or random sampling (as we have now). That said,
this might not be the right place to do it, since it isn't in the context of
the user.
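One possible shape for such a strategy, again only a hypothetical Java
sketch: TimedPref, SelectionStrategy and the three implementations are
invented names, and it assumes each preference carries a timestamp, which
the current code may not have available.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout code: all names are illustrative only.
final class SourcePrefSelectionSketch {

  private SourcePrefSelectionSketch() {}

  // Assumed (itemID, timestamp) pair for one of the user's preferences.
  static final class TimedPref {
    final long itemID;
    final long timestamp;

    TimedPref(long itemID, long timestamp) {
      this.itemID = itemID;
      this.timestamp = timestamp;
    }
  }

  // Decides which of the user's preferences become source items.
  interface SelectionStrategy {
    List<TimedPref> select(List<TimedPref> prefs, int maxSourcePrefsConsidered);
  }

  // Newest preferences first, truncated to the cap.
  static final SelectionStrategy MOST_RECENT = (prefs, cap) -> {
    List<TimedPref> sorted = new ArrayList<>(prefs);
    sorted.sort(Comparator.comparingLong((TimedPref p) -> p.timestamp).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(cap, sorted.size())));
  };

  // Oldest preferences first, truncated to the cap.
  static final SelectionStrategy LEAST_RECENT = (prefs, cap) -> {
    List<TimedPref> sorted = new ArrayList<>(prefs);
    sorted.sort(Comparator.comparingLong((TimedPref p) -> p.timestamp));
    return new ArrayList<>(sorted.subList(0, Math.min(cap, sorted.size())));
  };

  // Uniformly random subset, roughly what random sampling does today.
  static SelectionStrategy random(Random rnd) {
    return (prefs, cap) -> {
      List<TimedPref> shuffled = new ArrayList<>(prefs);
      Collections.shuffle(shuffled, rnd);
      return new ArrayList<>(shuffled.subList(0, Math.min(cap, shuffled.size())));
    };
  }
}
```

Keeping the selection behind an interface would let the sampling order be
swapped without touching the maxSourcePrefsConsidered cap itself.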

What do you think?


On Mon, Dec 5, 2011 at 3:04 PM, Sean Owen <[email protected]> wrote:

> I am posting a new patch to MAHOUT-910 in a second that shows efficient
> sampling of all three.
>
> On Mon, Dec 5, 2011 at 12:50 PM, Daniel Zohar <[email protected]> wrote:
>
> > No worries, Manuel.
> > I think we have almost cracked the problem.
> > Let's wait for Sean's response.
> > Cheers
>
