PySpark: Make persist() return a context manager

2016-08-04 Thread Nicholas Chammas
Context managers are a natural way to capture closely related setup and teardown code in Python. For example, they are commonly used when doing file I/O:

    with open('/path/to/file') as f:
        contents = f.read()
        ...

Once
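A minimal sketch of what such a wrapper could look like; the `persisted` helper below is hypothetical and not part of the PySpark API, and the default storage level is an assumption:

    from contextlib import contextmanager
    from pyspark import StorageLevel

    @contextmanager
    def persisted(df, storage_level=StorageLevel.MEMORY_AND_DISK):
        # Persist on entry, unpersist on exit, mirroring open()/close().
        df.persist(storage_level)
        try:
            yield df
        finally:
            df.unpersist()

    # Usage: the DataFrame stays cached only for the duration of the block.
    # with persisted(some_df) as df:
    #     df.count()
    #     df.groupBy("key").count().show()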

Re: We don't use ASF Jenkins builds, right?

2016-08-04 Thread Reynold Xin
We don't. On Friday, August 5, 2016, Sean Owen wrote: > There was a recent message about deprecating many Maven, Ant and JDK > combos for ASF Jenkins machines, and I was just triple-checking we're > only making use of the Amplab ones.

We don't use ASF Jenkins builds, right?

2016-08-04 Thread Sean Owen
There was a recent message about deprecating many Maven, Ant and JDK combos for ASF Jenkins machines, and I was just triple-checking we're only making use of the Amplab ones.

Inquiry about Spark's behaviour for configurations in the Hadoop configuration instance via read/write.options()

2016-08-04 Thread Hyukjin Kwon
Hi all, If my understanding is correct, Spark now supports setting some options on the Hadoop configuration instance via the read/write.option(..) API. However, I recently saw some comments and opinions about this. If I understood them correctly, they were as below: - Respecting all the
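For concreteness, a small sketch of the two places such a setting can live today; the property names and paths are examples only, and whether per-read/write options should reach the underlying Hadoop configuration instance is exactly the question raised here:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Global route: mutate the shared Hadoop configuration on the
    # SparkContext (via the private _jsc handle in PySpark). This
    # affects every subsequent job, not just a single read or write.
    spark.sparkContext._jsc.hadoopConfiguration().set(
        "mapreduce.input.fileinputformat.input.dir.recursive", "true")

    # Per-operation route: options passed through the DataFrameReader/
    # Writer. "compression" is a documented text-writer option; whether
    # arbitrary Hadoop properties passed this way are respected is the
    # open question in this thread.
    df = spark.read.text("/path/to/input")
    df.write.option("compression", "gzip").text("/path/to/output")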

Re: Source API requires unbounded distributed storage?

2016-08-04 Thread Michael Armbrust
Yeah, this API is in the private execution package because we are planning to continue to iterate on it. Today, we will only ever go back one batch, though that might change in the future if we do async checkpointing of internal state. You are totally right that we should relay this info back to

Source API requires unbounded distributed storage?

2016-08-04 Thread Fred Reiss
Hi, I've been looking over the Source API in org.apache.spark.sql.execution.streaming, and I'm at a loss for how the current API can be implemented in a practical way. The API defines a single getBatch() method for fetching records from the source, with the following Scaladoc comments defining

Re: Spark SQL and Kryo registration

2016-08-04 Thread Amit Sela
It should. Codegen uses the SparkConf in SparkEnv when instantiating a new Serializer. On Thu, Aug 4, 2016 at 6:14 PM Jacek Laskowski wrote: > Hi Olivier, > > I don't know either, but am curious what you've tried already. > > Jacek > > On 3 Aug 2016 10:50 a.m., "Olivier
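As a concrete illustration of the configuration being discussed, a minimal sketch follows; the registered class is an example only, and the difficulty with registrationRequired=true is that classes Spark SQL serializes internally must be registered as well:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Sketch of Kryo registration via SparkConf. Any class Kryo
    # serializes at runtime (including Spark's own internals when
    # registrationRequired is true) must be registered, or Kryo
    # raises an error.
    conf = (SparkConf()
            .set("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
            .set("spark.kryo.registrationRequired", "true")
            .set("spark.kryo.classesToRegister",
                 "org.apache.spark.sql.types.StructType"))

    spark = SparkSession.builder.config(conf=conf).getOrCreate()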

Re: Spark SQL and Kryo registration

2016-08-04 Thread Jacek Laskowski
Hi Olivier, I don't know either, but am curious what you've tried already. Jacek On 3 Aug 2016 10:50 a.m., "Olivier Girardot" < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > I'm currently trying to use Spark 2.0.0 and making DataFrames work with Kryo. > registrationRequired=true > Is it