Re: MLLib Sparse Input

2014-02-05 Thread Xiangrui Meng
s is a big decision that will affect a lot of code. Not sure if > anybody else has some work in progress. I could potentially try to get some > performance numbers for comparison and point at some code I've used in the > past, though it's all very rough. > > thanks, > Imran

Re: MLLib Sparse Input

2014-02-05 Thread Xiangrui Meng
llib...if you have a pull > request or jira link could you please point to it ? Jblas did not implement > sparse formats the last time I looked at it but colt had sparse formats > which could be reused... > > Thanks. > Deb > > On Jan 31, 2014 11:15 AM, "Xiangrui Meng"

Re: MLLib Sparse Input

2014-01-31 Thread Xiangrui Meng
Hi Jason, Sorry, I didn't see this message before I replied in another thread. So the following is copy-and-paste: We are currently working on the sparse data support, one of the highest priority features for MLlib. All existing algorithms will support sparse input. We will open a JIRA ticket for

Re: computeStats() in MLUtils will cause Nan (not a number) error

2014-01-28 Thread Xiangrui Meng
It happens when there are empty columns. Adding a very small smoothing factor should help. Btw, I noticed that the variance computation there is not numerically stable; it should use the stable method implemented in RDD[Double]. -Xiangrui On Tue, Jan 28, 2014 at 5:22 AM, yinxusen wrote: > Hi all, > > Th
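
As a rough illustration of the two points in the reply above (a small smoothing factor for empty columns, and a numerically stable variance), here is a minimal standalone sketch in plain Python. It is not the MLUtils code and not the RDD[Double] implementation; the function names `column_stats` and `standardize` and the `eps` default are hypothetical, and the stable method shown is Welford's one-pass update, chosen here as an assumption about what "stable" means in this context.

```python
import math


def column_stats(rows):
    """Per-column mean and population variance via Welford's one-pass update.

    Hypothetical helper for illustration only; assumes every row has the
    same number of columns.
    """
    n = 0
    mean, m2 = [], []
    for row in rows:
        if n == 0:
            mean = [0.0] * len(row)
            m2 = [0.0] * len(row)
        n += 1
        for j, x in enumerate(row):
            delta = x - mean[j]
            mean[j] += delta / n
            # Uses the updated mean, which avoids subtracting two large sums.
            m2[j] += delta * (x - mean[j])
    if n == 0:
        return [], []
    return mean, [s / n for s in m2]


def standardize(rows, eps=1e-12):
    """Scale columns to zero mean and unit variance.

    eps is the assumed smoothing factor: a constant (for example empty)
    column has std 0, and 0/0 would otherwise produce NaN.
    """
    mean, var = column_stats(rows)
    std = [math.sqrt(v) + eps for v in var]
    return [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in rows]
```

With `eps` added to the standard deviation, an all-zero column maps to zeros instead of NaN, at the cost of a negligible bias on the other columns.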