Hi All,

I've submitted a proposal for GORA-386, Spark Backend Support.

As you already know, the Apache Gora open source framework provides an
in-memory data model and persistence for big data. Gora supports persisting
to column stores, key-value stores, document stores and RDBMSs, and
analyzing the data with extensive Apache Hadoop MapReduce support.

On the other hand, Spark is an Apache project advertised as “lightning fast
cluster computing”. It has a thriving open-source community and is the most
active Apache project at the moment.

Apache Gora already supports Map/Reduce. However, there is no generic
abstraction layer that would allow other execution engines to be used in
its place.

In my proposal, I aim to create such an abstraction layer and support Spark
as a backend. My goals include a Gora Input Format to RDD transformation, a
generic abstraction layer for backends, and data storage via a newly
developed GoraInputmap. Because this will change Gora's architecture, I
plan to test its functionality against the new architecture.

I also have some further plans if I finish the proposed work early. I would
like to test how well Hadoop-style Map/Reduce jobs can be mapped onto
Spark-style ones. There are some interesting articles about this, e.g. [1].

Kind Regards,
Furkan KAMACI

[1]
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
