Re: how can I write a language "wrapper"?

2015-06-29 Thread Justin Uang
My guess is that if you are just wrapping the Spark SQL APIs, you can get away with not having to reimplement a lot of the complexities in PySpark, like storing everything in RDDs as pickled byte arrays, pipelining RDDs, doing aggregations and joins in the Python interpreters, etc. Since the canoni

Re: how can I write a language "wrapper"?

2015-06-29 Thread Daniel Darabos
Hi Vasili, It so happens that the entire SparkR code was merged into Apache Spark in a single pull request, so you can see all the required changes at once in https://github.com/apache/spark/pull/5096. It's 12,043 lines and took more than 20 people about a year to write, as I understand it. On Mon, J

Re: how can I write a language "wrapper"?

2015-06-29 Thread Vasili I. Galchin
Shivaram, Vis-a-vis Haskell support, I am reading DataFrame.R, SparkRBackend*, context.R, et al. Am I headed in the correct direction? Yes or no, please give more guidance. Thank you. Kind regards, Vasili On Tue, Jun 23, 2015 at 1:46 PM, Shivaram Venkataraman wrote: > Every language h

Re: how can I write a language "wrapper"?

2015-06-24 Thread Shivaram Venkataraman
The SparkR code is in the `R` directory, i.e. https://github.com/apache/spark/tree/master/R Shivaram On Wed, Jun 24, 2015 at 8:45 AM, Vasili I. Galchin wrote: > Matei, > > Last night I downloaded the Spark bundle. > In order to save me time, can you give me the name of the SparkR example >

Re: how can I write a language "wrapper"?

2015-06-24 Thread Vasili I. Galchin
Matei, Last night I downloaded the Spark bundle. To save me time, can you tell me what the SparkR example is called and where it is in the Spark tree? Thanks, Bill On Tuesday, June 23, 2015, Matei Zaharia wrote: > Just FYI, it would be easiest to follow SparkR's example and add

Re: how can I write a language "wrapper"?

2015-06-23 Thread Matei Zaharia
Just FYI, it would be easiest to follow SparkR's example and add the DataFrame API first. Other APIs will be designed to work on DataFrames (most notably machine learning pipelines), and the surface of this API is much smaller than that of the RDD API. This API will also give you great performance as

Re: how can I write a language "wrapper"?

2015-06-23 Thread Shivaram Venkataraman
Every language has its own quirks / features -- so I don't think there exists a document on how to go about doing this for a new language. The most closely related write-up I know of is the wiki page on PySpark internals https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals written by Josh Ro

how can I write a language "wrapper"?

2015-06-23 Thread Vasili I. Galchin
Hello, I want to add support for another language (other than Scala, Java, et al.). Where is the documentation that explains how to provide support for a new language? Thank you, Vasili