Re: [SQL] parse_url does not work for Internationalized domain names ?

2018-01-11 Thread StanZhai
This problem was introduced by which is designed to improve performance of PARSE_URL(). The same issue exists in the following SQL:

```SQL
SELECT PARSE_URL('http://stanzhai.site?p=["abc;]', 'QUERY', 'p')
// return null in Spark 2.1+
// return ["abc"]
```
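For comparison, here is a minimal sketch of a lenient query-key lookup, written in Python rather than Spark. It does not reproduce Spark's implementation; the helper name `extract_query_param` is hypothetical. It only illustrates how a regex-based lookup can still return the raw value when the query contains characters such as `"` that a strict URI parser may reject.

```python
import re
from urllib.parse import urlsplit


def extract_query_param(url, key):
    """Return the raw (undecoded) value of `key` in the URL's query, or None.

    Hypothetical helper for illustration only; this is not Spark's code.
    """
    # urlsplit does not validate the characters inside the query string.
    query = urlsplit(url).query
    if not query:
        return None
    # Match "key=value" at the start of the query or right after an "&".
    match = re.search(r"(?:^|&)" + re.escape(key) + r"=([^&]*)", query)
    return match.group(1) if match else None


# The URL from the message above, passed through unchanged.
print(extract_query_param('http://stanzhai.site?p=["abc;]', "p"))
```

The lookup returns the value exactly as it appears in the URL, so any decoding or validation would have to be done by the caller.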

Re: Schema Evolution in Apache Spark

2018-01-11 Thread Georg Heiler
Isn't this related to the data format used, i.e. Parquet, Avro, ..., which already support changing schemas?

Dongjoon Hyun wrote on Fri, Jan 12, 2018 at 02:30:
> Hi, All.
>
> A data schema can evolve in several ways and Apache Spark 2.3 already
> supports the

Accessing the SQL parser

2018-01-11 Thread Abdeali Kothari
I was writing some code to automatically find the list of tables and databases used in a SparkSQL query. Mainly, I was looking to auto-check the permissions and owners of all the tables a query will try to access. I was wondering whether PySpark has some method for me to directly use the

[SQL] parse_url does not work for Internationalized domain names ?

2018-01-11 Thread yash datta
Hi devs, I stumbled across an interesting problem with the parse_url function that was implemented in Spark in https://issues.apache.org/jira/browse/SPARK-16281. When using internationalized domains in URLs like: val url = "http://правительство.рф "
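One general way to handle internationalized domain names is to convert the host to its ASCII (Punycode) form before handing the URL to a parser that only accepts ASCII hosts. This is not taken from the thread. The sketch below uses Python's standard `idna` codec as a stand-in, and the helper name `to_ascii_host` is hypothetical.

```python
from urllib.parse import urlsplit


def to_ascii_host(url):
    """Return the URL's host in ASCII (Punycode) form, or None if there is no host.

    Hypothetical helper for illustration only; this is not part of Spark.
    """
    # Strip surrounding whitespace first; the sample URL ends with a space.
    host = urlsplit(url.strip()).hostname
    if host is None:
        return None
    # The built-in "idna" codec encodes each non-ASCII label as "xn--...".
    return host.encode("idna").decode("ascii")


ascii_host = to_ascii_host("http://правительство.рф ")
print(ascii_host)

# Decoding the ASCII form recovers the original Unicode host.
print(ascii_host.encode("ascii").decode("idna"))
```

Whether a caller should apply such a conversion before parsing, or whether the parser should accept Unicode hosts directly, is a design choice this sketch leaves open.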

Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-11 Thread Dongjoon Hyun
Hi, All and Shane. Can we increase the build timeout for `branch-2.3` during the 2.3 RC period? There are two known test issues, but the Jenkins job on branch-2.3 with hadoop-2.7 fails with a build timeout. So, it's difficult to monitor whether the branch is healthy or not. Build timed out (after 255

Schema Evolution in Apache Spark

2018-01-11 Thread Dongjoon Hyun
Hi, All. A data schema can evolve in several ways, and Apache Spark 2.3 already supports the following for file-based data sources like CSV/JSON/ORC/Parquet:

1. Add a column
2. Remove a column
3. Change a column position
4. Change a column type

Can we guarantee users some schema evolution

Re: Branch 2.3 is cut

2018-01-11 Thread Sameer Agarwal
All major blockers have now been resolved with the exception of a couple of known test issues (SPARK-23020 and SPARK-23000) that are being actively worked on. Unless there is an objection, I'll

Structured Streaming with S3 file source duplicates data because of eventual consistency

2018-01-11 Thread Yash Sharma
Hi Team, I have been using Structured Streaming with the S3 data source, but I am seeing it duplicate the data intermittently. A new run seems to fix it, but the duplication happens ~10% of the time. The ratio increases with the number of files in the source. Investigating further, I see this is clearly an

Re: Palantir replease under org.apache.spark?

2018-01-11 Thread Prajwal Tuladhar
If you check https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22, it's only listing "official" ones.

On Thu, Jan 11, 2018 at 7:59 PM, Steve Loughran wrote:
>
> On 9 Jan 2018, at 18:10, Sean Owen wrote:
>
> Just to follow up --

Re: Palantir replease under org.apache.spark?

2018-01-11 Thread Steve Loughran
On 9 Jan 2018, at 18:10, Sean Owen wrote: Just to follow up -- those are actually in a Palantir repo, not Central. Deploying to Central would be uncourteous, but this approach is legitimate and how it has to work for vendors to release distros

Re: Publishing container images for Apache Spark

2018-01-11 Thread Craig Russell
Hi, I think your summary is spot on. I don't see further issues. Craig

> On Jan 11, 2018, at 9:18 AM, Erik Erlandson wrote:
>
> Dear ASF Legal Affairs Committee,
>
> The Apache Spark development community has begun some discussions
>

Publishing container images for Apache Spark

2018-01-11 Thread Erik Erlandson
Dear ASF Legal Affairs Committee, The Apache Spark development community has begun some discussions about publishing container images for Spark as part of its

Re: Kubernetes: why use init containers?

2018-01-11 Thread Anirudh Ramanathan
If we can separate those concerns out, that might make sense in the short term IMO. There are several benefits to reusing spark-submit and spark-class as you pointed out previously, so we should be looking to leverage those irrespective of how we do dependency management - in the interest of

Call for Presentations FOSS Backstage open

2018-01-11 Thread Isabel Drost-Fromm
Hi, As announced on Berlin Buzzwords we (that is Isabel Drost-Fromm, Stefan Rudnitzki as well as the eventing team over at newthinking communications GmbH) are working on a new conference in summer in Berlin. The name of this new conference will be "FOSS Backstage". Backstage comprises all things