Hi Jason,

Sorry, I didn't see this message before I replied in another thread,
so the following is copied from that reply:

We are currently working on sparse data support, one of the
highest-priority features for MLlib. All existing algorithms will
support sparse input. We will open a JIRA ticket for progress tracking
and discussion.
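For illustration only, here is a minimal sketch of what "sparse input" means for a case like logistic regression on 3 million features. Each example stores only its nonzero (index, value) pairs, so the dot product and the gradient update cost O(nonzeros) instead of O(3 million). The `SparseVector` class and `sgd_step` function are hypothetical names made up for this sketch. They are not the MLlib API, and the plain SGD update is just one simple way to train the model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVector:
    """Hypothetical sparse feature vector (not the MLlib API):
    keeps only the nonzero entries of a logically long vector."""
    size: int            # logical length, e.g. 3_000_000
    indices: List[int]   # positions of the nonzero features, sorted ascending
    values: List[float]  # values at those positions

    def dot(self, weights: List[float]) -> float:
        # Only the nonzero positions are visited: O(nnz), not O(size).
        return sum(v * weights[i] for i, v in zip(self.indices, self.values))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(weights: List[float], x: SparseVector,
             label: float, step_size: float) -> None:
    """One logistic-regression SGD update on a single example (label in {0, 1}).
    Mutates `weights` in place; only entries at x's nonzero indices change."""
    error = sigmoid(x.dot(weights)) - label  # d(log-loss)/d(margin)
    for i, v in zip(x.indices, x.values):
        weights[i] -= step_size * error * v


if __name__ == "__main__":
    # A 3-million-wide example with only two nonzero features.
    w = [0.0] * 3_000_000
    x = SparseVector(3_000_000, [42, 2_999_999], [1.0, 0.5])
    sgd_step(w, x, label=1.0, step_size=0.1)
    print(w[42], w[2_999_999])
```

The weight vector stays dense here because every feature can eventually receive a weight, but each training example only carries its few nonzero entries.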

Best,
Xiangrui

On Fri, Jan 31, 2014 at 10:49 AM, jshao <jasonsh...@gmail.com> wrote:
> Hi,
>
> Spark is absolutely amazing for machine learning because its iterative
> processing is super fast. However, one big issue I noticed is that the
> MLlib API isn't suitable for sparse inputs at all, because it requires the
> feature vector to be a dense array.
>
> For example, I currently want to run a logistic regression on data that is
> wide and sparse (each data point might have 3 million fields with most of
> them being 0). It is impossible to represent each data point as an array of
> length 3 million.
>
> Can I expect, or contribute to, any changes that would handle sparse inputs?
>
> Thanks,
> Jason
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
