These files follow the LIBSVM format: each line is a record, the first 
column is the label, and the remaining fields are offset:value pairs, where 
offset is the position in the feature vector and value is the value of that 
input feature. 

This is a fairly efficient representation for sparse data, but it can double 
(or more) the storage requirements for dense data, since every value is 
written together with its offset. 
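
To make the layout concrete, here is a rough sketch of reading one such line in Python. It is only an illustration, not code from Spark: the names parse_libsvm_line and to_dense and the sample line are made up, and it assumes the offsets are one-based, which is the usual LIBSVM convention.

```python
def parse_libsvm_line(line):
    """Parse one record into (label, {offset: value})."""
    tokens = line.split()
    label = float(tokens[0])           # first column is the label
    features = {}
    for token in tokens[1:]:           # remaining columns are offset:value
        offset, value = token.split(":", 1)
        features[int(offset)] = float(value)
    return label, features


def to_dense(features, size):
    """Expand the sparse pairs into a dense list of length `size`."""
    dense = [0.0] * size
    for offset, value in features.items():
        dense[offset - 1] = value      # assumption: offsets are one-based
    return dense


# Hypothetical sample line: label 1, two nonzero features out of 8.
label, features = parse_libsvm_line("1 3:0.5 7:2.0")
print(label, features)
print(to_dense(features, 8))
```

Only the two nonzero entries appear in the file, which is where the savings for sparse data come from; a fully dense record would carry an offset next to every single value.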

- Evan

> On Jun 22, 2014, at 3:35 PM, Justin Yip <yipjus...@gmail.com> wrote:
> 
> Hello,
> 
> I am looking into a couple of MLLib data files in 
> https://github.com/apache/spark/tree/master/data/mllib. But I cannot find any 
> explanation for these files? Does anyone know if they are documented?
> 
> Thanks.
> 
> Justin
