Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Xiao Li
Yes, this is not a blocker. "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by default. As DB Tsai said, column pruning of nested schema for Parquet tables is experimental. In this release, we encourage the whole community to try this new feature but it might have bugs like w

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread DB Tsai
+0 I understand that schema pruning is an experimental feature in Spark 2.4, and that it can help read performance a lot, since people are trying to keep hierarchical data in nested format. We just found a serious bug: it can cause the Parquet reader to fail if a nested field and top level field are select

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Hyukjin Kwon
+1 On Tue, Oct 30, 2018 at 11:03 AM, Gengliang Wang wrote: > +1 > > > On Oct 30, 2018, at 10:41 AM, Sean Owen wrote: > > > > +1 > > > > Same result as in RC4 from me, and the issues I know of that were > > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. > > > > These items are still targeted to 2.

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Gengliang Wang
+1 > On Oct 30, 2018, at 10:41 AM, Sean Owen wrote: > > +1 > > Same result as in RC4 from me, and the issues I know of that were > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. > > These items are still targeted to 2.4.0; Xiangrui I assume these > should just be untargeted now, or resolv

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Sean Owen
+1 Same result as in RC4 from me, and the issues I know of that were raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. These items are still targeted to 2.4.0; Xiangrui I assume these should just be untargeted now, or resolved? SPARK-25584 Document libsvm data source in doc site SPAR

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Xiao Li
+1 On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.0. > > The vote is open until November 1 PST and passes if a majority +1 PMC > votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package as

Re: DataSourceV2 hangouts sync

2018-10-29 Thread Ryan Blue
Everyone, There are now 25 guests invited, which is a lot of people to actively participate in a sync like this. For those of you who probably won't actively participate, I've added a live stream. If you don't plan to talk, please use the live stream instead of the meet/hangout so that we don't e

[VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 2.4.0. The vote is open until November 1 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.0 [ ] -1 Do not release this package because ... To