Hi,

Spark is absolutely amazing for machine learning, since its iterative
processing is super fast. However, one big issue I've run into is that the
MLlib API isn't suitable for sparse inputs at all, because it requires the
feature vector to be a dense array.
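For concreteness, here is roughly what the dense representation forces on
you (just a sketch, assuming the LabeledPoint(label, features: Array[Double])
constructor the API currently exposes):

import org.apache.spark.mllib.regression.LabeledPoint

// Sketch: the dense representation the current API expects.
// A single 3-million-field point costs ~24 MB as an Array[Double],
// even when only a handful of entries are non-zero.
val numFeatures = 3000000
val features = new Array[Double](numFeatures) // 3M * 8 bytes ~ 24 MB
features(42) = 1.0                            // only a few non-zeros
features(1999999) = 0.5
val point = LabeledPoint(1.0, features)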

For example, I currently want to run logistic regression on data that is
wide and sparse (each data point might have 3 million fields, with most of
them being 0). It isn't feasible to represent each data point as a dense
array of 3 million doubles (roughly 24 MB per point).
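To illustrate what I mean, here is a minimal sketch of a sparse
representation as parallel index/value arrays (the SparseVector name and
dot helper below are hypothetical, not an existing MLlib API):

// Hypothetical sketch: a sparse point stored as parallel arrays of
// sorted indices and values, so a 3M-field point with k non-zeros
// costs O(k) memory instead of O(3M).
case class SparseVector(size: Int, indices: Array[Int], values: Array[Double]) {
  // Dot product against a dense weight vector: the core operation
  // logistic regression's gradient needs per data point.
  def dot(weights: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < indices.length) {
      sum += values(i) * weights(indices(i))
      i += 1
    }
    sum
  }
}

// A 3-million-field point with three non-zero entries.
val x = SparseVector(3000000, Array(7, 42, 1999999), Array(1.0, 1.0, 0.5))
val w = new Array[Double](3000000)
val margin = x.dot(w) // computed in O(3), not O(3M)

Something along the lines of Breeze's sparse vectors would presumably work
here as well.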

Can I expect, or contribute to, any changes that would handle sparse inputs?

Thanks,
Jason


