MLlib supports sparse vectors: it recognizes its own sparse vector
type and SciPy's csc_matrix with a single column. You can create an RDD
of sparse vectors from your data and save/load them to/from Parquet
format using DataFrames. Sparse matrix support will be added in 1.4.
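
For what it's worth, here is a minimal sketch of turning one tokenized
message into the (size, indices, values) form that a sparse vector is
built from. The helper name to_sparse, the toy vocabulary, and the sample
tokens are made up for illustration and are not part of MLlib.

```python
from collections import Counter


def to_sparse(tokens, vocab):
    """Return (size, indices, values) for one tokenized message.

    vocab maps each word to its column index; words that are not in
    the vocabulary are dropped. (Hypothetical helper, not an MLlib API.)
    """
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    indices = sorted(counts)                       # ascending column indices
    values = [float(counts[i]) for i in indices]   # term counts as floats
    return len(vocab), indices, values


# Toy vocabulary and message, for illustration only.
vocab = {"spark": 0, "mllib": 1, "sparse": 2, "vector": 3, "tweet": 4}
size, indices, values = to_sparse("sparse vector in spark spark".split(), vocab)
print(size, indices, values)
```

If PySpark is installed, the triple can presumably be passed on as
pyspark.mllib.linalg.Vectors.sparse(size, indices, values), and a list of
such vectors turned into an RDD with sc.parallelize. Only the nonzero
entries are stored, so a message with dozens of words stays small even
when the vocabulary is large.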
-Xiangrui
On Mon, Apr
I'm trying to apply Spark to an NLP problem that I'm working on. I have nearly
4 million tweet texts and I have converted them into word vectors. They're pretty
sparse because each message has just dozens of words but the vocabulary has
tens of thousands of words.
These vectors should be loaded each t