Re: Loading data from files - Samsara

Pat Ferrel Tue, 04 Apr 2017 09:21:06 -0700

Mahout-Samsara has a couple of CLI drivers, but these are mostly examples. They
read from CSV files but may not do what you want.


Mahout can also run in the Spark shell or as a library in your app, which gives
you all the data-loading functions of Spark and Scala. For instance, I use
SimilarityAnalysis.cooccurrence, which takes the Mahout data type
IndexedDataset. IndexedDataset has a conversion helper that takes a Spark
RDD[(String, String)], and Spark can read in an RDD[(String, String)] in many
ways.
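A minimal sketch, in Scala, of turning delimited lines such as "user,item" into
the (String, String) pairs that an RDD[(String, String)] holds. The names
PairParser and parsePairs are hypothetical, for illustration only. The commented
Spark lines assume Mahout's spark module and a SparkContext are available, use a
hypothetical HDFS path, and the IndexedDatasetSpark call should be checked
against the signature in your Mahout version.

```scala
// Hypothetical helper: parse "rowID,columnID" lines (e.g. "user,item")
// into the (String, String) pairs an RDD[(String, String)] would hold.
object PairParser {
  /** Trim each line, drop blank lines, split on the delimiter (a regex),
    * keep the first two fields, and silently skip lines with fewer than two. */
  def parsePairs(lines: Seq[String], delim: String = ","): Seq[(String, String)] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split(delim, -1).map(_.trim))
      .collect { case Array(row, col, _*) => (row, col) }
}

// In a Spark driver (assumed: Mahout's spark module on the classpath and a
// SparkContext `sc` in scope), the same parsing could run distributed:
//   val pairs = sc.textFile("hdfs:///path/to/interactions.csv") // hypothetical path
//     .map(_.split(",", -1).map(_.trim))
//     .filter(_.length >= 2)
//     .map(f => (f(0), f(1)))                 // RDD[(String, String)]
//   val ids = IndexedDatasetSpark(pairs)(sc)  // assumed companion apply
```

Malformed lines are skipped rather than rejected here; whether that is the right
policy depends on how dirty the input is.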

In short, you can draw on everything Java, HDFS, and Spark offer for reading
data. Mahout does not reimplement these, so all you need to do is convert your
data into something Mahout works with, such as a DRM (DistributedRowMatrix) or
an IndexedDataset (which wraps a DRM), depending on what you want to do with it.
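For the in-core case asked about in the question, plain Scala is enough to get
the numbers into memory. The sketch below assumes a header-less, comma-separated
file of numbers whose rows all have the same width; CsvMatrix, parse, and load
are hypothetical names. The resulting Array[Array[Double]] can then be handed to
Mahout's in-core types, for example the Java constructor DenseMatrix(double[][])
in org.apache.mahout.math, or DenseVector(double[]) for a single row.

```scala
// Minimal sketch: read a numeric CSV into a row-major Array[Array[Double]].
// Assumptions: no header row, one delimiter character, equal-width rows.
import scala.io.Source

object CsvMatrix {
  /** Parse CSV text into rows of doubles; throws if a field is not numeric
    * or if rows have different widths. */
  def parse(text: String, delim: Char = ','): Array[Array[Double]] = {
    val rows = text
      .split("\\r?\\n")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split(delim).map(_.trim.toDouble))
    require(rows.map(_.length).distinct.length <= 1, "ragged CSV: rows differ in width")
    rows
  }

  /** Read a local file and parse it, closing the Source when done. */
  def load(path: String): Array[Array[Double]] = {
    val src = Source.fromFile(path)
    try parse(src.mkString) finally src.close()
  }
}

// With Mahout's math module on the classpath (assumed), the rows could be
// wrapped as an in-core matrix:
//   val m = new org.apache.mahout.math.DenseMatrix(CsvMatrix.load("data.csv"))
```

This keeps everything on one machine; for data too large for that, the Spark
route described in the reply is the better fit.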


On Apr 4, 2017, at 2:57 AM, Nantia Makrynioti <nantiam...@gmail.com> wrote:

Hello,

Is there a way to load data from a file, e.g. a CSV file, into an in-core
vector or matrix?

Thanks a lot,
Nantia
