Hi, Spark is great for machine learning since it handles iterative algorithms so quickly. However, one big issue I've run into is that the MLlib API isn't suitable for sparse inputs at all, because it requires the feature vector to be a dense array.
For example, I want to run logistic regression on data that is wide and sparse: each data point may have 3 million fields, most of them 0. It is impractical to represent each data point as a dense array of length 3 million. Can I expect, or contribute to, any changes that would handle sparse inputs?

Thanks,
Jason

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-Sparse-Input-tp1085.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
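To make the request concrete, here is a minimal sketch (plain Python, not Spark or MLlib code; the names `sparse_dot` and `predict` are my own) of the kind of representation I have in mind: store only the nonzero entries of each feature vector as an index-to-value map, so a 3-million-field point with three nonzeros costs three entries instead of a 3-million-element array.

```python
import math

def sparse_dot(weights, features):
    # features: dict mapping feature index -> value; only nonzeros are stored,
    # so the dot product touches just the nonzero entries
    return sum(weights.get(i, 0.0) * v for i, v in features.items())

def predict(weights, features):
    # logistic regression probability via the sparse dot product
    return 1.0 / (1.0 + math.exp(-sparse_dot(weights, features)))

# a point with 3 million possible fields but only 3 nonzero entries
x = {12: 1.0, 4096: 0.5, 2999999: 2.0}
w = {12: 0.4, 4096: -1.2, 777: 3.0}

score = sparse_dot(w, x)   # 0.4*1.0 + (-1.2)*0.5 + 0 ≈ -0.2
prob = predict(w, x)
```

The point is that both storage and the per-example dot product scale with the number of nonzeros rather than the nominal dimensionality, which is what a sparse-aware MLlib API would need internally.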