Hi Jason,

Sorry, I didn't see this message before I replied in another thread, so the following is copied and pasted from that reply:

We are currently working on the sparse data support, one of the highest priority features for MLlib. All existing algorithms will support sparse input. We will open a JIRA ticket for progress tracking and discussions.

Best,
Xiangrui

On Fri, Jan 31, 2014 at 10:49 AM, jshao <jasonsh...@gmail.com> wrote:
> Hi,
>
> Spark is absolutely amazing for machine learning as its iterative process is
> super fast. However one big issue that I realized was that the MLLib API
> isn't suitable for sparse inputs at all because it requires the feature
> vector to be a dense array.
>
> For example, I currently want to run a logistic regression on data that is
> wide and sparse (each data point might have 3 million fields with most of
> them being 0). It is impossible to represent each data point as an array of
> length 3 million.
>
> Can I expect/contribute to any changes that might handle sparse inputs?
>
> Thanks,
> Jason
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
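
For readers of this thread, here is a rough sketch of the idea behind a sparse representation for the kind of data Jason describes. It is plain Python and is not the MLlib API; the names `sparse_dot` and `predict_proba`, the index/value layout, and the toy weights are all assumptions made for illustration. It stores only the non-zero entries of a 3-million-field data point and computes a logistic regression prediction from them.

```python
import math


def sparse_dot(indices, values, weights):
    """Dot product of a sparse vector with a weight lookup.

    indices/values hold only the non-zero entries of the data point;
    weights is a dict from feature index to weight (hypothetical layout).
    """
    return sum(v * weights.get(i, 0.0) for i, v in zip(indices, values))


def predict_proba(indices, values, weights, intercept=0.0):
    """Logistic regression probability for one sparse data point."""
    z = sparse_dot(indices, values, weights) + intercept
    return 1.0 / (1.0 + math.exp(-z))


# One data point with 3 million fields, of which only three are non-zero.
num_features = 3_000_000
indices = [7, 1024, 2_999_999]
values = [1.0, 0.5, 2.0]

# Toy weights, invented for this example.
weights = {7: 0.2, 1024: -0.4, 2_999_999: 0.1}

p = predict_proba(indices, values, weights)
print(p)
```

The work per data point is proportional to the number of non-zero entries (three here) rather than to `num_features`, which is why a dense array of length 3 million is unnecessary for this kind of input.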