Hi Furkan,

In what context are we talking here? GSoC or just development?
I am very keen to essentially work towards what we can release as Gora 1.0.

Thank you Furkan

On Saturday, March 21, 2015, Furkan KAMACI <furkankam...@gmail.com> wrote:

> As you know, there is an issue for integrating Apache Spark and Apache
> Gora [1]. Apache Spark is a popular project and, in contrast to Hadoop's
> two-stage disk-based MapReduce paradigm, Spark's in-memory primitives
> provide performance up to 100 times faster for certain applications [2].
> There are also some alternatives to Apache Spark, e.g. Apache Tez [3].
>
> When implementing an integration for Spark, we should consider having an
> abstraction for such projects as part of the architectural design, and
> there is a related issue for it: [4].
>
> There is another Apache project, Apache Crunch [5], which aims to provide
> a framework for writing, testing, and running MapReduce pipelines.
> Its goal is to make pipelines that are composed of many user-defined
> functions simple to write, easy to test, and efficient to run. It is a
> high-level tool for writing data pipelines, as opposed to developing
> against the MapReduce, Spark, or Tez APIs directly [6].
>
> I would like to learn how Apache Crunch fits with creating a multi
> execution engine for Gora [4]. What benefits can we get from integrating
> Apache Gora and Apache Crunch, and what gaps would still remain, compared
> with developing a custom engine for our purpose?
>
> Kind Regards,
> Furkan KAMACI
>
> [1] https://issues.apache.org/jira/browse/GORA-386
> [2] Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker,
> Scott; Stoica, Ion (June 2013).
> [3] http://tez.apache.org/
> [4] https://issues.apache.org/jira/browse/GORA-418
> [5] https://crunch.apache.org/
> [6] https://crunch.apache.org/user-guide.html#motivation
>

--
*Lewis*