Correct me if I'm wrong, but this is done for distributed processing of
large data sets: it follows the MapReduce principle and relies on a common
file type so that the processing can be distributed.
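
For illustration, the file-based round trip in the Samsara Scala DSL looks
roughly like this (an untested sketch against the Spark bindings; the paths,
matrix and master URL are just placeholders):

import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

object SeqFileDrmSketch {
  def main(args: Array[String]): Unit = {
    // Distributed context backed by Spark.
    implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "seqfile-drm-sketch")

    // Small in-core matrix, distributed over 2 partitions.
    val drmA = drmParallelize(dense((1, 2, 3), (3, 4, 5)), numPartitions = 2)

    // Persist it as sequence files on (H)DFS -- the common file type the DSL assumes.
    drmA.dfsWrite("hdfs:///tmp/drm-A")

    // ...and read it back later through the same file-based entry point.
    val drmA2 = drmDfsRead("hdfs:///tmp/drm-A")
    println(s"read back ${drmA2.nrow} x ${drmA2.ncol}")
  }
}

On the NoSQL part of the question there is also a rough sketch below the quoted mail.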

Sent from my iPhone

> On 9 Oct 2014, at 20:56, Reinis Vicups <mah...@orbit-x.de> wrote:
> 
> Hello,
> 
> I am currently looking into the new (DRM) Mahout framework.
> 
> I find myself wondering why, on the one hand, a lot of thought, effort
> and design complexity is being invested in abstracting engines,
> contexts and algebraic operations,
> 
> while on the other hand even the abstract interfaces are defined in a
> way that everything has to be read from or written to files (on HDFS).
> 
> I am considering implementing reading/writing to a NoSQL database, and
> initially I assumed it would be enough just to implement my own
> ReaderWriter, but I am now realizing that I would have to re-implement,
> or hack around by deriving my own versions of, large(?) portions of the
> framework, including my own variants of CheckpointedDrm,
> DistributedEngine and what not.
> 
> Is it because abstracting away the storage type would introduce even
> more complexity, or because there are aspects of the design that
> absolutely require reading/writing only to (seq)files?
> 
> kind regards
> reinis
> 
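
On the NoSQL question above: at least with the Spark bindings there may be a
shortcut that avoids files entirely, namely loading (rowKey, Vector) pairs
from the store into an ordinary RDD and, if I read the sparkbindings package
right, handing that to its drmWrap helper to get a CheckpointedDrm back. This
is an untested sketch, it ties the code to the Spark engine rather than the
engine-neutral DSL, and the parallelize call below only stands in for whatever
Spark connector your store actually provides:

import org.apache.spark.rdd.RDD
import org.apache.mahout.math.{Vector => MahoutVector}
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

object NoSqlDrmSketch {
  def main(args: Array[String]): Unit = {
    implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "nosql-drm-sketch")

    // Stand-in for the NoSQL read: in reality, produce (rowKey, mahout Vector)
    // pairs with the store's own Spark connector. Assumes the Spark context is
    // reachable as ctx.sc on the SparkDistributedContext wrapper.
    val rows: RDD[(Int, MahoutVector)] = ctx.sc.parallelize(Seq(
      0 -> dvec(1, 2, 3),
      1 -> dvec(4, 5, 6)
    ))

    // Wrap the engine-level RDD as a CheckpointedDrm -- no (seq)files involved.
    val drmA: CheckpointedDrm[Int] = drmWrap(rows)

    // From here the usual distributed algebra applies; nrow/ncol are computed lazily.
    println(s"nrow = ${drmA.nrow}, ncol = ${drmA.ncol}")
  }
}

Whether this extends to a full DistributedEngine implementation for another
store is a different question, but for plain reading it might save
re-implementing the checkpointing machinery.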
