As you know, there is an open issue for integrating Apache Spark with
Apache Gora [1]. Spark is a popular project: in contrast to Hadoop's
two-stage, disk-based MapReduce paradigm, Spark's in-memory primitives
provide performance up to 100 times faster for certain applications [2].
There are also alternatives to Apache Spark, e.g. Apache Tez [3].

When implementing an integration for Spark, we should consider designing an
abstraction over such execution engines as part of the architecture. There
is a related issue for this: [4].
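
The kind of abstraction mentioned above could look roughly like the sketch
below. This is only an illustration under stated assumptions: all names
(ExecutionEngine, SequentialEngine, ParallelEngine, keyLengths) are
hypothetical and are not part of Gora, Spark, or Tez, and the two
implementations are in-process stand-ins rather than real backends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types exist in Gora today.
class EngineSketch {

    /** What a pluggable execution engine might expose to Gora jobs. */
    interface ExecutionEngine {
        String name();
        <T, R> List<R> map(List<T> records, Function<T, R> fn);
    }

    /** Stand-in for a MapReduce-style backend: a plain sequential loop. */
    static class SequentialEngine implements ExecutionEngine {
        public String name() { return "sequential"; }
        public <T, R> List<R> map(List<T> records, Function<T, R> fn) {
            List<R> out = new ArrayList<>();
            for (T record : records) {
                out.add(fn.apply(record));
            }
            return out;
        }
    }

    /** Stand-in for a Spark/Tez-style backend: a parallel stream. */
    static class ParallelEngine implements ExecutionEngine {
        public String name() { return "parallel"; }
        public <T, R> List<R> map(List<T> records, Function<T, R> fn) {
            // Collecting an ordered parallel stream keeps encounter order.
            return records.parallelStream().map(fn).collect(Collectors.toList());
        }
    }

    /** Job code depends only on the interface, never on a concrete engine. */
    static List<Integer> keyLengths(ExecutionEngine engine, List<String> keys) {
        return engine.map(keys, String::length);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("row1", "row22", "row333");
        List<ExecutionEngine> engines =
                List.of(new SequentialEngine(), new ParallelEngine());
        for (ExecutionEngine engine : engines) {
            System.out.println(engine.name() + " -> " + keyLengths(engine, keys));
        }
    }
}
```

The point of the sketch is that keyLengths is written once against the
interface, so swapping the engine does not touch the job code.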

There is another Apache project, Apache Crunch [5], which aims to provide a
framework for writing, testing, and running MapReduce pipelines. Its goal
is to make pipelines that are composed of many user-defined functions
simple to write, easy to test, and efficient to run. It is a high-level
tool for writing data pipelines, as opposed to developing against the
MapReduce, Spark, or Tez APIs directly [6].

I would like to understand how Apache Crunch fits with creating a multi
execution engine for Gora [4]. What benefits would we get from integrating
Apache Gora with Apache Crunch, and what gaps would remain, compared with
developing a custom engine for our purpose?

Kind Regards,
Furkan KAMACI

[1] https://issues.apache.org/jira/browse/GORA-386
[2] Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker,
Scott; Stoica, Ion (June 2013).
[3] http://tez.apache.org/
[4] https://issues.apache.org/jira/browse/GORA-418
[5] https://crunch.apache.org/
[6] https://crunch.apache.org/user-guide.html#motivation
