[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19008 Can you also update the various json functions in which we document the options? The way it is right now there is no way for end-users to discover this option. --- If your project is set up
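To illustrate what "allowing unquoted control characters" in a JSON parser can mean, here is a minimal sketch using Python's standard `json` module rather than Spark. The PR title above is truncated, so the Spark option name is not shown here; the wrapper `parse_json` and its `allow_unquoted_control_chars` parameter are hypothetical names chosen for this example. The only real API relied on is the `strict` argument of `json.loads`, which controls whether raw control characters (code points 0 to 31) are accepted inside strings.

```python
import json


def parse_json(text, allow_unquoted_control_chars=False):
    """Hypothetical wrapper: parse JSON, optionally accepting raw control
    characters (such as a literal tab) inside string values."""
    # json.loads(strict=True) rejects raw control characters in strings;
    # strict=False accepts them.
    return json.loads(text, strict=not allow_unquoted_control_chars)


# The "\t" below is a real tab character embedded in the JSON string value,
# not the two-character JSON escape sequence.
raw = '{"note": "line1\tline2"}'

try:
    parse_json(raw)
    strict_result = "accepted"
except json.JSONDecodeError:
    strict_result = "rejected"

lenient_result = parse_json(raw, allow_unquoted_control_chars=True)
```

Because such a switch is off by default, it is easy to miss unless it is listed wherever the reader functions document their options, which is the point of the comment above.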

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123916 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123358 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx

spark git commit: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7880909c4 -> a2db5c576 [MINOR][TYPO] Fix typos: runnning and Excecutors ## What changes were proposed in this pull request? Fix typos ## How was this patch tested? Existing tests Author: Andrew Ash Closes #18996

[GitHub] spark issue #18996: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18996 Merging in master. Thanks.

[GitHub] spark issue #18988: [SPARK-21778][SQL] Simpler Dataset.sample API in Scala /...

2017-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18988 Thanks. Do you want to add the Python and R ones? It is a little bit tricky because in Python we would need to detect whether withReplacement is a boolean or a floating point value
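The difficulty mentioned above, telling a leading boolean `withReplacement` apart from a leading floating point fraction, can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the pull request: the function name `resolve_sample_args` and its return shape are hypothetical, and the default of sampling without replacement is an assumption of the sketch.

```python
def resolve_sample_args(with_replacement=None, fraction=None, seed=None):
    """Hypothetical helper: normalize the arguments of a sample() call.

    Accepts (fraction), (fraction, seed) or (with_replacement, fraction, seed)
    and returns a (with_replacement, fraction, seed) tuple.
    """
    # Check bool first: bool is a subclass of int in Python, so the order
    # of the isinstance checks matters.
    if isinstance(with_replacement, bool):
        if not isinstance(fraction, float):
            raise TypeError("fraction must be a float")
        return with_replacement, fraction, seed

    if isinstance(with_replacement, float):
        # The first positional argument is really the fraction, so the
        # second positional argument (if any) is the seed.
        if fraction is not None and seed is not None:
            raise TypeError("too many arguments after fraction")
        shifted_seed = fraction if fraction is not None else seed
        return False, with_replacement, shifted_seed

    if with_replacement is None and isinstance(fraction, float):
        # Keyword-only call such as sample(fraction=0.5).
        return False, fraction, seed

    raise TypeError("expected a bool or a float as the first argument")
```

A call such as `resolve_sample_args(0.5, 42)` is read as fraction 0.5 with seed 42, while `resolve_sample_args(True, 0.5, 42)` keeps the original three-argument meaning. An integer first argument is rejected rather than guessed at.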

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-08-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r133887920 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark pull request #18988: [SPARK-21778][SQL] Simpler Dataset.sample API in ...

2017-08-17 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18988 [SPARK-21778][SQL] Simpler Dataset.sample API in Scala / Java ## What changes were proposed in this pull request? Dataset.sample requires a boolean flag withReplacement as the first argument

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 lgtm

[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-08-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r133360047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -37,6 +37,12 @@ import org.apache.spark.sql.types

[GitHub] spark pull request #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18640#discussion_r133131618 --- Diff: sql/core/pom.xml --- @@ -87,6 +87,16 @@ + org.apache.orc + orc-core + ${orc.classifier

[GitHub] spark issue #18923: [SPARK-21710][StSt] Fix OOM on ConsoleSink with large in...

2017-08-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18923 Jenkins, test this please.

spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

2017-08-10 Thread rxin
log. getTableOption. ## How was this patch tested? Removed the test case. Author: Reynold Xin <r...@databricks.com> Closes #18912 from rxin/remove-getTableOption. (cherry picked from commit 584c7f14370cdfafdc6cd554b2760b7ce7709368) Signed-off-by: Reynold Xin <r...@databricks.com> Proj

spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog

2017-08-10 Thread rxin
log. getTableOption. ## How was this patch tested? Removed the test case. Author: Reynold Xin <r...@databricks.com> Closes #18912 from rxin/remove-getTableOption. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/584c7f14 Tree: h

[GitHub] spark issue #18912: [SPARK-21699][SQL] Remove unused getTableOption in Exter...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18912 Merging in master.

[GitHub] spark pull request #18912: [SQL] Remove unused getTableOption in ExternalCat...

2017-08-10 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18912 [SQL] Remove unused getTableOption in ExternalCatalog ## What changes were proposed in this pull request? This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18900 We should put this in the catalog, shouldn't we?

spark git commit: [SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs

2017-08-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 84454d7d3 -> 95ad960ca [SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs ## What changes were proposed in this pull request? This patch introduces an internal interface for tracking metrics and/or

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 Merging in master.

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 this looks good to me, but I didn't review super carefully. cc @cloud-fan

spark git commit: [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator

2017-08-09 Thread rxin
lly configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple seconds to connect. See https://issues.apache.org/jira/browse/SPARK-21551 ## How was this patch tested? Ran the tests. cc rxin Author: peay

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-08-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 Merging in master.

[GitHub] spark issue #18886: [SPARK-21671][core] Move kvstore to "util" sub-package, ...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18886 thx lgtm

[GitHub] spark issue #18786: [SPARK-21584][SQL][SparkR] Update R method for summary t...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18786 I suspect it is ok for R ...

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132022172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -128,6 +128,7 @@ class FileStreamSink

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132006851 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed

[GitHub] spark pull request #18884: [SPARK-21669] Internal API for collecting metrics...

2017-08-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18884#discussion_r132006674 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed

[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18607 We can just put this code in a 3rd party library, can't we? If there is an issue with service/code discovery, we can come up with some sort of registration process similar to the data source API

[GitHub] spark issue #18884: [SPARK-21669] Internal API for collecting metrics/stats ...

2017-08-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18884 Jenkins, add to white list.

[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-08-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18607 Unfortunately I think drill is not popular enough to warrant inclusion in here yet. If this is not extensible, we should make it possible to include such mappings outside Spark and then perhaps Drill

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18851 Looks like the strip global limit is used by at least some test cases.

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Ok makes sense. LGTM.

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18851 cc @JoshRosen

[GitHub] spark pull request #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...

2017-08-04 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18851 [SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly ## What changes were proposed in this pull request? The definition of `maxRows` in `LocalLimit` operator was simply wrong. This patch

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Actually why do we need this? Can't you just add Error?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 I just checked the dependency size. They look pretty reasonable, roughly 2 MBs in total (although I do worry in the future whether ORC would bring in a lot more jars). cc @omalley any

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why don't we then create a separate orc module? Just copy a few of the files over?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 To the best of my knowledge almost everybody runs with Hive anyway and the vast majority of users that run ORC are Hive users. In hindsight we probably should have put most of the data source

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @srowen that's not what I said. Almost always an explicit LGTM is preferred. There are tiny changes that might not require them, and it is up to the judgement of the committer. But those are more

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why are we adding this to core? Why not just the hive module?

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18844 Jenkins, test this please.

[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 cc @gatorsmile

[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 Some test on string form of the plan might fail. Let's see ...

[GitHub] spark pull request #18839: [SPARK-21634][SQL] Change OneRowRelation from a c...

2017-08-03 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18839 [SPARK-21634][SQL] Change OneRowRelation from a case object to case class ## What changes were proposed in this pull request? OneRowRelation is the only plan that is a case object, which causes

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @HyukjinKwon you weren't a committer before :)

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 @srowen search for "RTC vs CTR (was: Concerning Sentry...)" From Todd Lipcon: ``` I don't have incubator stats... nor do I have a good way to measure "most a

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Actually Sean I disagree. Spark has always been review then commit from the days before it entered ASF. In a huge debate last year within the ASF on RTC vs CTR, Spark was cited as a prominent example

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Ah OK. That's what we are discussing here. In the past it has always been an explicit "LGTM". That was defined before github had even the approval feature. Now most committers are actual

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18828 Still looking into it, but the failure is related to reuse exchange and caching.

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 What's your point? You should be able to merge PR without anybody reviewing?

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 Yes.

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 It is documented: http://spark.apache.org/contributing.html It's been the convention forever and it's also good to use one way rather than multiple, so I'd prefer us just using that ... until

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 I think @srowen did it here using the new github approval: srowen approved these changes 20 hours ago @srowen might be better if we stick with the LGTM one.

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 seems fine to me.

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18828 cc @adrian-ionescu @gatorsmile

[GitHub] spark pull request #18828: [SPARK-21619][SQL] Fail the execution of canonica...

2017-08-02 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18828 [SPARK-21619][SQL] Fail the execution of canonicalized plans explicitly ## What changes were proposed in this pull request? Canonicalized plans are not supposed to be executed. I ran into a case

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 @sitalkedia anyway you can talk to the FB team that does that one and relicense, similar to RocksDB?

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 Our compression codec is actually completely decoupled from Hadoop's, but dependency management (and licensing) can be annoying to deal with.

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 How big is the dependency that's getting pulled in? If we are adding more compression codecs maybe we should retire some old ones, or move them into a separate package so downstream apps can

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18805 Any benchmark data?

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 OK great. I think we should avoid breaking developer APIs, unless it has a huge upside. It wouldn't be fun to break it just for some cosmetic things ...

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 What is the compatibility concern?

[GitHub] spark issue #18780: [INTRA] Close stale PRs

2017-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18780 If you are asking for their opinions it'd be easier if you ask more explicitly (A vs B) in one comment, rather than asking them to go through and read the entire thread ...

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 cc @JoshRosen

spark git commit: [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions

2017-07-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master cf29828d7 -> 60472dbfd [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions ## What changes were proposed in this pull request? This generates a documentation for Spark SQL built-in functions. One drawback

[GitHub] spark issue #18702: [SPARK-21485][SQL][DOCS] Spark SQL documentation generat...

2017-07-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18702 LGTM too. Merging in master.

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18697 cc @cloud-fan @hvanhovell

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 When users upgrade from 2.11 to 2.12, their app would be broken, wouldn't it?

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 @srowen I don't agree that we should just break source compatibility here. We have already spent a lot of time doing this in the past and figuring out how to preserve it.

[GitHub] spark issue #18715: [minor] Remove **** in test case names in FlatMapGroupsW...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18715 Wait let's ask why @tdas did it this way... On Sun, Jul 23, 2017 at 10:45 AM asfgit <notificati...@github.com> wrote: > Closed #18715 <https://github.com/apache/spark/pul

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 It is still source breaking change, and this is why I was saying it would be a lot of work to upgrade to Scala 2.12 without breaking existing source code. For 2.12 we should get rid of the functions

[GitHub] spark issue #18715: [minor] Remove **** in test case names in FlatMapGroupsW...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18715 cc @tdas Was there a reason to use `****`?

[GitHub] spark pull request #18715: [minor] Remove **** in test case names in FlatMap...

2017-07-22 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18715 [minor] Remove **** in test case names in FlatMapGroupsWithStateSuite ## What changes were proposed in this pull request? This patch removes the `****` string from test names

[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18709 "Create Version" isn't a good user facing description. It'd make more sense to just say "Created by Spark xxx"

[GitHub] spark pull request #18714: [SPARK-20236][SQL] hive style partition overwrite

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18714#discussion_r128908118 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -881,6 +881,16 @@ object SQLConf { .intConf

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 @srowen You just showed that the Scala 2.12 changes are source breaking, isn't it?

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890891 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -353,7 +353,7 @@ class DatasetSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890868 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala --- @@ -54,7 +54,10 @@ class TaskContextSuite extends SparkFunSuite

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18468 Uncompress a small block at a time.

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18468 Hey sorry for commenting late, but I don't think this change really makes sense. If anything, I'd decompress data in batch into uncompressed column batch, rather than building an adapter

[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

2017-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18680 Have you guys checked the performance of this change? It changes the number of concrete implementations for column vector from 2 to 3 (and potentially 1 to 2 at runtime). This might (or might

[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...

2017-07-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18487 hm is this a bug fix? if not we shouldn't cherry pick it.

[GitHub] spark issue #18306: [SPARK-21029][SS] All StreamingQuery should be stopped w...

2017-07-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18306 cc @zsxwing

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128162324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159874 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159780 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -58,6 +55,13 @@ case class UserDefinedFunction protected

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Are you working on 2.12?

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Do the removal (i.e. this PR).

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Maybe do it a bit later, when the backport rate drops? E.g. it's unlikely we still do a lot of backports when 2.3 is cut.

[GitHub] spark issue #18606: [SPARK-21382] The note about Scala 2.10 in building-spar...

2017-07-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18606 It's already merged. https://github.com/apache/spark/commit/24367f23f77349a864da340573e39ab2168c5403

[GitHub] spark issue #18606: [SPARK-21382] The note about Scala 2.10 in building-spar...

2017-07-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18606 That's true. Merging in master.

spark git commit: [SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.

2017-07-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2cbfc975b -> 24367f23f [SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong. [https://issues.apache.org/jira/browse/SPARK-21382](https://issues.apache.org/jira/browse/SPARK-21382) There should be "Note that support for

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17633 @mallman we don't backport such risky changes to maintenance branches. Those branches typically go through much less testing.

spark git commit: [SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark

2017-07-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master d03aebbe6 -> c3713fde8 [SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark ## What changes were proposed in this pull request? At example of repartitionAndSortWithinPartitions at rdd.py, third argument

[GitHub] spark issue #18586: [SPARK-21358][Examples] Argument of repartitionandsortwi...

2017-07-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18586 Merging in master. Thanks.

[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18559 It'd be important to document what syntaxes are no longer allowed in the JIRA ticket (and PR description), and we should also highlight that in the release notes.

[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql

[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,24 @@ object TypeCoercion
