Re: SPARK-22267 issue: Spark SQL incorrectly reads ORC file when column order is different

2017-11-15 Thread Mark Petruska
Hi Dongjoon, Thanks for the info. Unfortunately I did not find any means to fix the issue without forcing CONVERT_METASTORE_ORC or changing the ORC reader implementation. Closing the PR, as it was only used to demonstrate the root cause. Best regards, Mark On Tue, Nov 14, 2017 at 6:58 PM, Dongjo

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Xiao Li
Hi, Felix, https://issues.apache.org/jira/browse/SPARK-22469 Maybe also include this 2.2 regression? It works in 2.1. Thanks, Xiao 2017-11-14 22:25 GMT-08:00 Felix Cheung: > Please vote on releasing the following candidate as Apache Spark version > 2.2.1. The vote is open until Monday No

[SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-15 Thread Jacek Laskowski
Hi, I've been playing with LocalTableScanExec and noticed that it defines numOutputRows metric, but I couldn't find it in the diagram in web UI's Details for Query in SQL tab. Why?
scala> spark.version
res1: String = 2.3.0-SNAPSHOT
scala> val hello = udf { s: String => s"Hello $s" }
hello: org.a

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-15 Thread Jorge Sánchez
Hi, after seeing that IDF needed refactoring to use ML vectors instead of MLlib ones, I have created a Jira ticket at https://issues.apache.org/jira/browse/SPARK-22531 and submitted a PR for it. If anyone can have a look and suggest any changes

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Xiao Li
Another issue, https://issues.apache.org/jira/browse/SPARK-22479, is also critical for security. Should we also merge it into 2.2.1? 2017-11-15 9:12 GMT-08:00 Xiao Li: > Hi, Felix, > > https://issues.apache.org/jira/browse/SPARK-22469 > > Maybe also include this regression of 2.2? It works in 2.1 >

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Felix Cheung
Thanks Xiao, please continue to merge them to branch-2.2 and tag with TargetVersion 2.2.2. They look to be fairly isolated; please continue to test this RC1 as much as possible. I think we should hold off on rolling another RC until Sunday. On Wed, Nov 15, 2017 at 2:15 PM Xiao Li wrote: > Another

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Sean Owen
The signature is fine, with your new sig. Updated hashes look fine too. LICENSE is still fine to my knowledge. Is anyone else seeing this failure?
- GenerateOrdering with ShortType *** RUN ABORTED ***
java.lang.StackOverflowError:
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:3

Re: [discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?

2017-11-15 Thread Takuya UESHIN
Thanks for the feedback. Hyukjin Kwon: > My only worry is, users who depends on lower pandas versions That's what I was worried about, and it is one of the reasons I moved this discussion here. Li Jin: > how complicated it is to support pandas < 0.19.2 with old non-Arrow interops In my original PR (https://github.c

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-15 Thread Davis Varghese
Since we are on spark 2.2, I backported/fixed it. Here is the diff file comparing against https://github.com/apache/spark/blob/73fe1d8087cfc2d59ac5b9af48b4cf5f5b86f920/mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala 24c24 < import org.apache.spark.ml.param.{Param, ParamMap, P
