Re: [discuss] new Java friendly InputSource API

2015-04-24 Thread Mingyu Kim
In the ctor of InputSource (I'm also considering adding an explicit initialize call), the implementation of InputSource can execute arbitrary code. The state in it will also be serialized and passed onto the execut

Re: [discuss] new Java friendly InputSource API

2015-04-23 Thread Reynold Xin
In the ctor of InputSource (I'm also considering adding an explicit initialize call), the implementation of InputSource can execute arbitrary code. The state in it will also be serialized and passed onto the executors. Yes - technically you can hijack getSplits in Hadoop InputFormat to do the same

Re: [discuss] new Java friendly InputSource API

2015-04-23 Thread Mingyu Kim
Hi Reynold, You mentioned that the new API allows arbitrary code to be run on the driver side, but it's not very clear to me how this is different from what Hadoop API provides. In your example of using broadcast, did you mean broadcasting something in InputSource.getPartitions() and having InputP

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Soren Macbeth
I'm also super interested in this. Flambo (our Clojure DSL) wraps the Java API and it would be great to have this. On Tue, Apr 21, 2015 at 4:10 PM, Reynold Xin wrote: > It can reuse. That's a good point and we should document it in the API > contract. > > > On Tue, Apr 21, 2015 at 4:06 PM, Punya

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Reynold Xin
It can reuse. That's a good point and we should document it in the API contract. On Tue, Apr 21, 2015 at 4:06 PM, Punyashloka Biswal wrote: > Reynold, thanks for this! At Palantir we're heavy users of the Java APIs > and appreciate being able to stop hacking around with fake ClassTags :) > > Re

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Punyashloka Biswal
Reynold, thanks for this! At Palantir we're heavy users of the Java APIs and appreciate being able to stop hacking around with fake ClassTags :) Regarding this specific proposal, is the contract of RecordReader#get intended to be that it returns a fresh object each time? Or is it allowed to mutate

[discuss] new Java friendly InputSource API

2015-04-21 Thread Reynold Xin
I created a pull request last night for a new InputSource API that is essentially a stripped down version of the RDD API for providing data into Spark. Would be great to hear the community's feedback. Spark currently has two de facto input source APIs: 1. RDD 2. Hadoop MapReduce InputFormat Neithe