OK,
yet another crazy idea of mine.
Generally, if we coerce to classical SVD form with singular values,
then Tikhonov regularization can probably be optimized
post-decomposition. Indeed, I can see no reason why we can't control
the smoothing at the prediction stage by hacking the predictor as in
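A rough numpy sketch of what that post-decomposition smoothing could look like, assuming the SVD A = U diag(s) V^T is already computed (the function name and toy data below are made up for illustration, not Mahout code): lambda only enters through per-singular-value filter factors, so it can be varied at solve/prediction time without redoing the factorization.

import numpy as np

# Sketch only: with A = U diag(s) Vt precomputed, the Tikhonov-regularized
# solution x = argmin ||A x - b||^2 + lambda ||x||^2 replaces 1/s_i with the
# filter factor s_i / (s_i^2 + lambda), so lambda can be swept cheaply.
def tikhonov_solve(U, s, Vt, b, lam):
    filt = s / (s ** 2 + lam)          # one filter factor per singular value
    return Vt.T @ (filt * (U.T @ b))   # x = V diag(filt) U^T b

# Toy usage: sweep several lambdas against the same decomposition.
A = np.random.rand(100, 20)
b = np.random.rand(100)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
for lam in (0.01, 0.1, 1.0):
    x = tikhonov_solve(U, s, Vt, b, lam)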
Sort of kind of, but it is hard to extrapolate over a size range of more than
10x, and that is the scale difference we are talking about.
On Fri, Dec 16, 2011 at 11:44 AM, Dmitriy Lyubimov wrote:
> and there's no way to estimate a difference for a bigger sample?
>
> On Fri, Dec 16, 2011 at 11:37 AM, Ted Dunning wrote:
and there's no way to estimate a difference for a bigger sample?
On Fri, Dec 16, 2011 at 11:37 AM, Ted Dunning wrote:
> This doesn't work because the correct value for a sub-sampled batch will be
> smaller than for a full data set.
>
> On Fri, Dec 16, 2011 at 10:05 AM, Dmitriy Lyubimov wrote:
>
Not a bad idea at all. The objective function is probably very
asymmetrical when expressed as a function of lambda itself. Transforming lambda
might help with that. The asymmetry shouldn't be all that big a deal if
you put a constrained 1-d optimizer on the problem.
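A sketch of what such a constrained 1-d optimizer could look like, assuming some per-lambda held-out error can be evaluated (held_out_error below is a hypothetical stand-in, with a synthetic curve only so the snippet runs); searching over log10(lambda) is one way of transforming lambda to tame the asymmetry.

import numpy as np
from scipy.optimize import minimize_scalar

def held_out_error(lam):
    # Hypothetical stand-in for the held-out RMSE of an ALS/SVD run at this
    # lambda; a synthetic curve with its minimum near lambda = 0.05.
    return (np.log10(lam) - np.log10(0.05)) ** 2 + 1.0

# Bounded 1-d search in log-space rather than over lambda directly.
res = minimize_scalar(lambda t: held_out_error(10.0 ** t),
                      bounds=(-6, 2), method='bounded')
best_lambda = 10.0 ** res.x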
On Fri, Dec 16, 2011 at 10:50 AM, Rap
This doesn't work because the correct value for a sub-sampled batch will be
smaller than for a full data set.
On Fri, Dec 16, 2011 at 10:05 AM, Dmitriy Lyubimov wrote:
> if it
> makes sense to find a better guess for lambda by just doing an R
> simulation on randomly subsampled data before putt
I just suspect there must have been some research or study done on
how accurate factorization problems are on a subsample, similar to
standard errors and confidence intervals: e.g. I know how many
samples I need to fit the observed mean into a certain confidence
interval provided I know the original
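That analogy is cut off above, but for reference, the standard mean/CI version of the sample-size question looks roughly like this (a sketch, assuming a known population standard deviation; whether an equivalent bound exists for factorization accuracy is exactly what is open here).

from scipy.stats import norm

def required_samples(sigma, margin, confidence=0.95):
    # Samples needed so the observed mean lands within +/- margin of the true
    # mean at the given confidence, for i.i.d. data with known std dev sigma.
    z = norm.ppf(0.5 + confidence / 2.0)   # two-sided z quantile
    return int((z * sigma / margin) ** 2) + 1

print(required_samples(1.0, 0.05))   # ~1537 samples for a 95% CI of +/- 0.05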
The problem is convex, but the idea is not to use MapReduce but to
take a subsample and solve it in memory on the reduced sample (I was
actually thinking of a simple bisect rather than trying to fit to
anything), but that's not the point.
The point is how accurate the solution for a random subsample would be.
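A rough sketch of what such a simple bisect on the in-memory subsample could look like, assuming the held-out error is unimodal in log10(lambda) (subsample_error is a hypothetical stand-in for solving the factorization on the reduced sample and scoring it).

def subsample_error(log_lam):
    # Hypothetical: solve the factorization on the in-memory subsample at
    # lambda = 10**log_lam and return its validation error; a toy unimodal
    # curve with its minimum at 10**-1.3 stands in here.
    return (log_lam + 1.3) ** 2

def bisect_lambda(lo=-6.0, hi=2.0, tol=1e-3, eps=1e-4):
    # Bisect on the sign of a finite-difference slope of the error curve.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        slope = subsample_error(mid + eps) - subsample_error(mid - eps)
        if slope > 0:      # past the minimum, move the upper bound down
            hi = mid
        else:
            lo = mid
    return 10.0 ** (0.5 * (lo + hi))

print(bisect_lambda())     # ~10**-1.3 for the toy curve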
Hi Dmitry,
I have a feeling the objective may be very close to convex. In that case there
are faster approaches than random subsampling.
A common strategy, for example, is to fit a quadratic to the previously
evaluated lambda values and then solve it for the minimum.
This is an iterative approach.
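A rough sketch of that quadratic-fit iteration, assuming some evaluate() stand-in for the held-out error of the factorization at a given lambda (the stand-in and its toy curve are hypothetical, just so the snippet runs).

import numpy as np

def evaluate(log_lam):
    # Hypothetical stand-in: held-out error at lambda = 10**log_lam.
    return (log_lam + 1.3) ** 2 + 0.5     # toy convex curve for illustration

pts = [(x, evaluate(x)) for x in (-4.0, -2.0, 0.0)]
for _ in range(20):
    # Fit a parabola through the three best (log10(lambda), error) points
    # seen so far and jump to its vertex.
    xs, ys = zip(*sorted(pts, key=lambda p: p[1])[:3])
    a, b, c = np.polyfit(xs, ys, 2)
    if a <= 0:
        break                             # fitted parabola is not convex
    x_new = -b / (2.0 * a)                # vertex of the fitted parabola
    if min(abs(x_new - x) for x, _ in pts) < 1e-4:
        break                             # vertex already sampled: converged
    pts.append((x_new, evaluate(x_new)))

best_log_lam = min(pts, key=lambda p: p[1])[0]
print("lambda ~", 10.0 ** best_log_lam)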
Hi,
I vaguely remember the discussion about finding the optimum for the
regularization rate in the ALS-WR stuff.
Would it make sense to take a subsample (or, rather, a random
submatrix) of the original input and try to find the optimum for it
somehow, similar to the total order partitioner's distribution sampling?
I have put