Re: Spark 2.4.2

2019-04-17 Thread Reynold Xin
For Jackson - are you worrying about JSON parsing for users or internal Spark functionality breaking? On Wed, Apr 17, 2019 at 6:02 PM Sean Owen wrote: > There's only one other item on my radar, which is considering updating > Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
There's only one other item on my radar, which is considering updating Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up a few times now that there are a number of CVEs open for 2.6.7. Cons: not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson behavior

Re: Spark 2.4.2

2019-04-17 Thread Wenchen Fan
I volunteer to be the release manager for 2.4.2, as I was also going to propose 2.4.2 because of the reverting of SPARK-25250. Are there any other ongoing bug fixes we want to include in 2.4.2? If not, I'd like to start the release process today (CST). Thanks, Wenchen On Thu, Apr 18, 2019 at 3:44

Re: Thoughts on dataframe cogroup?

2019-04-17 Thread Li Jin
I have left some comments. This looks like a good proposal to me. As a heavy pyspark user, this is a pattern that we see over and over again and I think it could be pretty high value to other pyspark users as well. The fact that Chris and I came to the same ideas sort of verifies my intuition. Also, this

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply

Re: JDK vs JRE in Docker Images

2019-04-17 Thread Sean Owen
I confess I don't know, but I don't think scalac or janino need javac and related tools, and those are the only things that come to mind. If the tests pass without a JDK, that's good evidence. On Wed, Apr 17, 2019 at 8:49 AM Rob Vesse wrote: > > Folks > > > > For those using the Kubernetes

Re: JDK vs JRE in Docker Images

2019-04-17 Thread Stavros Kontopoulos
Hi Rob, We are using registry.redhat.io/redhat-openjdk-18/openjdk18-openshift ( https://docs.openshift.com/online/using_images/s2i_images/java.html) It looks most convenient as Red Hat leads the openjdk updates which is even more important from now on and also from a security point of view. There

Re: pyspark.sql.functions ide friendly

2019-04-17 Thread Reynold Xin
Are you talking about the ones that are defined in a dictionary? If yes, that was actually not that great in hindsight (makes it harder to read & change), so I'm OK changing it. E.g. _functions = {     'lit': _lit_doc,     'col': 'Returns a :class:`Column` based on the given column name.',  
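For readers following the thread, the dictionary-driven pattern under discussion can be sketched roughly as below. This is a minimal, self-contained illustration and not the actual pyspark source: `_create_function` here is a simplified stand-in that returns a string instead of a `Column`, and the `upper` entry and the explicit `lower` function are hypothetical examples added for contrast.

```python
# Simplified stand-in for a helper that builds one wrapper per name.
# Assumption: the real helper would build a Column; this one just
# returns a string so the sketch runs without Spark installed.
def _create_function(name, doc=""):
    def _(arg):
        return f"{name}({arg})"
    _.__name__ = name
    _.__doc__ = doc
    return _


# Name -> docstring table, in the style quoted in the message above.
_functions = {
    "col": "Returns a :class:`Column` based on the given column name.",
    "upper": "Hypothetical entry: converts a string column to upper case.",
}

# Dynamic style: the module-level names only exist after this loop runs,
# which is why static analysers and IDE inspections cannot see them.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)


# Explicit style: an ordinary def that any IDE can resolve statically.
def lower(arg):
    """Hypothetical example: converts a string column to lower case."""
    return f"lower({arg})"
```

Both styles behave the same at runtime; the difference is only in what tooling can see without executing the module, which is the trade-off the thread is weighing.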

JDK vs JRE in Docker Images

2019-04-17 Thread Rob Vesse
Folks, for those using the Kubernetes support and building custom images, are you using a JDK or a JRE in the container images? Using a JRE saves a reasonable chunk of image size (about 50MB with our preferred Linux distro) but I didn’t want to make this change if there was a reason to

Re: pyspark.sql.functions ide friendly

2019-04-17 Thread Sean Owen
I use IntelliJ and have never seen an issue parsing the pyspark functions... you're just saying the linter has an optional inspection to flag it? just disable that? I don't think we want to complicate the Spark code just for this. They are declared at runtime for a reason. On Wed, Apr 17, 2019 at

Re: pyspark.sql.functions ide friendly

2019-04-17 Thread educhana
Hi, I'm aware of various workarounds to make this work smoothly in various IDEs, but wouldn't it be better to solve the root cause? I've seen the code and don't see anything that requires such a level of dynamic code; the translation is 99% trivial. On 2019/04/16 12:16:41, 880f0464