On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> While I agree with D and T, I’ll add a few things to watch out for.
>
> One of the hardest things to learn is the new model of execution; it’s
> not quite Spark or any other compute engine. You need to create contexts
> that virtualize the actual compute engine. But you will probably need to
> use the actual compute engine too. Switching back and forth is fairly
> simple but must be learned and could be documented better.

Mahout indeed abstracts the native engine's context by wrapping it in a DistributedMahoutContext. This is done largely so that algebraic expressions can remain completely backend-agnostic.

Obtaining the native engine's context is easy, although at that point the code acquires native-engine dependencies and is no longer backend-agnostic. E.g., the code to unwrap a Spark context from a Mahout context (dc) is

  val sparkContext = dc.asInstanceOf[SparkDistributedMahoutContext].sc

i.e., we simply cast the abstract context to the concrete engine's implementation, at which point backend-specific structures such as SparkContext are readily available.
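To make the round trip concrete, here is a minimal Scala sketch of dropping from the abstract Mahout context down to Spark. It is not verified against a particular Mahout release: the class names DistributedMahoutContext and SparkDistributedMahoutContext follow this thread (the shipped classes may carry slightly different names, e.g. DistributedContext / SparkDistributedContext in the sparkbindings module), and the mahoutSparkContext helper is assumed here for context creation.

  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  // Backend-agnostic code sees only the abstract Mahout context;
  // algebraic expressions written against it run on any supported engine.
  val dc: DistributedMahoutContext =
    mahoutSparkContext(masterUrl = "local[2]", appName = "unwrap-demo")

  // Dropping down to the native engine: cast to the Spark-specific
  // implementation and pull out the wrapped SparkContext. From this point
  // on the code depends on Spark and is no longer backend-agnostic.
  val sparkContext: SparkContext =
    dc.asInstanceOf[SparkDistributedMahoutContext].sc

  // Now native Spark structures are available directly.
  val rdd: RDD[Int] = sparkContext.parallelize(1 to 10)

The cast is cheap and reversible in the sense that dc itself stays valid, so one can freely interleave backend-agnostic algebra with engine-specific calls.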