Correct me if I'm wrong, but this is done for distributed processing on large data sets: the framework applies the MapReduce principle and relies on a common file format as the interchange medium between distributed processing stages.
Sent from my iPhone

> On 9 Oct 2014, at 20:56, Reinis Vicups <mah...@orbit-x.de> wrote:
>
> Hello,
>
> I am currently looking into the new (DRM) Mahout framework.
>
> I find myself wondering why, on the one side, a lot of thought, effort,
> and design complexity is invested in abstracting engines, contexts, and
> algebraic operations,
>
> while on the other side even the abstract interfaces are defined in a
> way that everything has to be read from or written to files (on HDFS).
>
> I am considering implementing reading/writing to a NoSQL database.
> Initially I assumed it would be enough to implement my own
> ReaderWriter, but I am now realizing that I will have to re-implement,
> or hack around by deriving my own versions of, large(?) portions of the
> framework, including my own variants of CheckpointedDrm,
> DistributedEngine, and so on.
>
> Is this because abstracting away the storage type would introduce even
> more complexity, or because there are aspects of the design that
> absolutely require reading/writing only to (seq)files?
>
> kind regards
> reinis
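For what it's worth, the kind of storage seam being asked about could in principle look like the sketch below. This is purely illustrative and not the actual Mahout API: `DrmReaderWriter` and `InMemoryReaderWriter` are hypothetical names, and an in-memory map stands in for a NoSQL backend. The point is only that if the engine coded against such an interface, swapping HDFS sequence files for another store would mean writing one new implementation rather than deriving new variants of CheckpointedDrm and friends.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction for DRM row blocks -- NOT a real
// Mahout interface, just a sketch of the seam being discussed.
interface DrmReaderWriter {
    Map<Integer, double[]> read(String path);
    void write(String path, Map<Integer, double[]> rows);
}

// In-memory stand-in for a NoSQL backend. A real implementation would
// talk to HBase, Cassandra, etc. behind the same interface.
class InMemoryReaderWriter implements DrmReaderWriter {
    private final Map<String, Map<Integer, double[]>> store = new HashMap<>();

    public Map<Integer, double[]> read(String path) {
        return store.get(path);
    }

    public void write(String path, Map<Integer, double[]> rows) {
        store.put(path, rows);
    }
}

public class Demo {
    public static void main(String[] args) {
        // Engine code would only see the interface, not the backend.
        DrmReaderWriter rw = new InMemoryReaderWriter();

        Map<Integer, double[]> a = new HashMap<>();
        a.put(0, new double[]{1.0, 2.0});
        a.put(1, new double[]{3.0, 4.0});

        rw.write("matrix-A", a);
        System.out.println(rw.read("matrix-A").get(1)[0]); // prints 3.0
    }
}
```

Whether this is feasible inside the current design is exactly the question: if checkpointing and engine internals assume HDFS paths directly, the seam is not this narrow.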