create a SchemaRDD from a custom datasource

2015-01-13 Thread Niranda Perera
Hi, we have a custom data sources API which connects to various data sources and exposes them through a common API. We are now trying to implement the Spark Data Sources API released in 1.2.0 to connect Spark for analytics. Looking at the sources API, we figured out that we should extend a scan cla…
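Extending a scan trait from the 1.2.0 Data Sources API might look like the sketch below. This is only an illustration, assuming Spark 1.2.x: the relation, field names, and the stand-in data are hypothetical, and the type aliases under org.apache.spark.sql moved to org.apache.spark.sql.types in later releases.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, StructType, StructField, StringType}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}

// Sketch of a relation over a hypothetical custom data source.
class CustomRelation(@transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // Describe the columns the source exposes (placeholder schema).
  override def schema: StructType =
    StructType(StructField("value", StringType, nullable = true) :: Nil)

  // Pull the data from the source and hand it to Spark as an RDD of Rows.
  override def buildScan(): RDD[Row] = {
    val data: Seq[String] = Seq("a", "b", "c") // stand-in for the custom source
    sqlContext.sparkContext.parallelize(data).map(Row(_))
  }
}
```

Registering such a relation through a RelationProvider then lets Spark SQL query the custom source like any other table.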

Re: create a SchemaRDD from a custom datasource

2015-01-13 Thread Reynold Xin
Depends on what the other side is doing. You can create your own RDD implementation by subclassing RDD, or it might work if you use sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data and return an iterator */ ) where n is the number of partitions. On Tue, Jan 13, 2015 at 12…
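Spelled out, the mapPartitionsWithIndex pattern described above might look like this; readPartition is a hypothetical stand-in for whatever reads one shard of the external source, not part of the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("custom-scan"))
val n = 4 // number of partitions, matched to the source's shard count

// Hypothetical reader: returns the records for shard i of the source.
def readPartition(i: Int): Iterator[String] =
  Iterator(s"record-$i-0", s"record-$i-1") // stand-in data

// Parallelize a dummy range into n partitions, then have each
// partition pull its own slice of the external source by index.
val rdd = sc.parallelize(1 to n, n).mapPartitionsWithIndex {
  (index, _) => readPartition(index)
}
```

The dummy range exists only to fix the partition count; the real data is produced inside each partition, so nothing large ever passes through the driver.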

Re: create a SchemaRDD from a custom datasource

2015-01-13 Thread Reynold Xin
If it is a small collection of them on the driver, you can just use sc.parallelize to create an RDD. On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka wrote: > Hi Reynold, > > Thanks for the response. I am just wondering, lets say we have set of Row > objects. Isn't there a straightforward way…
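For a small driver-side collection of Row objects, the parallelize route combined with SQLContext.applySchema (the Spark 1.2-era way to obtain a SchemaRDD; superseded by createDataFrame in 1.3) could look like the following sketch. The schema and sample rows are illustrative, not taken from the thread, and an existing SparkContext `sc` is assumed:

```scala
import org.apache.spark.sql.{Row, SQLContext, StructType, StructField,
  StringType, IntegerType}

val sqlContext = new SQLContext(sc)

// A small collection of Rows already on the driver (example data).
val rows = Seq(Row("alice", 1), Row("bob", 2))
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("count", IntegerType, nullable = false)))

// Parallelize the collection, then attach the schema to get a SchemaRDD.
val schemaRDD = sqlContext.applySchema(sc.parallelize(rows), schema)
schemaRDD.registerTempTable("people")
```

Because the rows originate on the driver, this only makes sense for small collections; large datasets should be read inside the executors via one of the RDD-based approaches above.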