Hi everyone,

I am doing some standardization using StandardScaler on data from
VectorAssembler, which is represented as sparse vectors. I plan to fit a
regularized model. However, StandardScaler does not allow the mean to be
subtracted from sparse vectors; it will only divide by the standard
deviation, which I understand is to keep the vectors sparse. I am therefore
considering converting my sparse vectors into dense vectors, but this may
not be worthwhile.
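
For context, here is roughly what I am doing, as a minimal Scala sketch
(df and the input column names f1/f2/f3 are placeholders for my actual
DataFrame and feature columns):

import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Assemble the feature columns; the output vectors are often sparse
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2", "f3"))  // placeholder column names
  .setOutputCol("features")
val assembled = assembler.transform(df)

// Scaling only: withMean stays false on sparse input, since
// subtracting the mean would destroy sparsity
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithStd(true)
  .setWithMean(false)
val scaled = scaler.fit(assembled).transform(assembled)

// The workaround I am considering: densify first, so that
// setWithMean(true) becomes possible
val toDense = udf((v: Vector) => v.toDense: Vector)
val densified = assembled.withColumn("features", toDense(col("features")))

Densifying like this works mechanically, but every row then stores all
dimensions explicitly, which is why I am unsure it is worthwhile.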

So my questions are:
Is subtracting the mean during standardization only important when working
with dense vectors, or does it matter for sparse vectors too? Is dividing
by the standard deviation alone on sparse vectors (x / sd) effectively
equivalent, for a regularized model, to subtracting the mean and dividing
by the standard deviation on dense vectors ((x - mean) / sd)?

Thank you,
Tobi
