There are a few things going on with DRM. First, the Hadoop/MapReduce DRM in Mahout is pretty much constrained to its persistent format on HDFS (row-wise row key/vector pairs).
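To make that persistent format concrete, here is a minimal sketch in plain Scala (no Hadoop/Mahout dependency; the type names are illustrative, not Mahout's own) of a matrix stored as row key/vector pairs:

```scala
object RowWiseFormat {
  // Illustrative aliases: a DRM row is a (row key, row vector) pair.
  type RowKey = Long
  type RowVector = Array[Double]

  // A 3 x 2 matrix stored row-wise, keyed by row index:
  val drmRows: Seq[(RowKey, RowVector)] = Seq(
    (0L, Array(1.0, 2.0)),
    (1L, Array(3.0, 4.0)),
    (2L, Array(5.0, 6.0))
  )

  // Reassembling the dense matrix from its row-wise form:
  val dense: Array[Array[Double]] = drmRows.sortBy(_._1).map(_._2).toArray
}
```

Row keys need not be contiguous longs in general, which is why rows are keyed rather than positional.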
When we moved to Scala, this notion was expanded: DRM became one of the types governed by the R-like DSL and the algebraic optimizer of such algebraic expressions. E.g., a distributed ridge regression solution under this DSL, for a dataset represented by a tall-and-skinny matrix X, would look something like this:

    val drmX = drmFromHdfs("X")
    val y = ..  // the observation vector y
    val w = solve(drmX.t %*% drmX, drmX.t %*% y)

The algebraic optimizer then optimizes the execution plan for a particular engine, one of them being Spark's RDDs. Mahout RDDs in their checkpoint form (i.e., a fully formed intermediate RDD result) have a dual representation -- either row-wise (tuples of key and row vector) or block-wise (array of keys -> vertical/horizontal matrix block).

Finally, assuming the backend engine is Spark's RDDs, it is possible to wrap certain RDD types into the DRM type and, vice versa, to get access to the checkpoint RDD (e.g., drmX.rdd automatically creates a checkpoint and exports the matrix data as an RDD).

For further details, I would hope the Mahout/Spark page makes it a bit clearer. There are also a talk and slides from the last Mahout meetup discussing the main ideas.

-d

On Sun, Sep 21, 2014 at 3:34 AM, kalmohsen <kalmoh...@ahlia.edu.bh> wrote:
> I am continuously reading about Mahout, Hadoop, Spark and Scala; willing
> to be able to add value to them. However, I am confused with 2 things:
> Spark RDD and Mahout DRM.
> I do know that spark's RDD is used while working with Mahout. However, I
> came across some Scala code which is using Mahout DRM or wrapping RDD to
> DRM.
>
> Thus, could anyone clarify the difference between them?
>
> Thanks in advance
> Regards
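P.S. The dual checkpoint representation mentioned above can be sketched in plain Scala (no Spark/Mahout dependency; `Seq` stands in for an RDD here, and the names are illustrative only):

```scala
object DualRepresentation {
  // Illustrative aliases for the two checkpoint forms.
  type Key = Long
  type Vec = Array[Double]
  type Block = Array[Vec] // a vertical block: several contiguous rows

  // Row-wise form: (key, row vector) tuples, as in an RDD[(K, Vector)].
  val rowWise: Seq[(Key, Vec)] = Seq(
    (0L, Array(1.0, 2.0)),
    (1L, Array(3.0, 4.0)),
    (2L, Array(5.0, 6.0)),
    (3L, Array(7.0, 8.0))
  )

  // Block-wise form: (array of keys -> vertical matrix block) tuples.
  // Here we re-block two rows per block, keeping the keys each block covers.
  val blockWise: Seq[(Array[Key], Block)] =
    rowWise.grouped(2).map { rows =>
      (rows.map(_._1).toArray, rows.map(_._2).toArray)
    }.toSeq
}
```

The point of the block-wise form is that operations can run on whole matrix blocks at a time instead of one row vector per tuple.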