[GitHub] spark pull request #22576: [SPARK-25560][SQL] Allow FunctionInjection in Spa...

2018-10-19 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/22576#discussion_r226622273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala --- @@ -168,4 +173,21 @@ class SparkSessionExtensions

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225992993 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -1136,4 +1121,27 @@ object SparkSession extends Logging

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225805019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -1136,4 +1121,27 @@ object SparkSession extends Logging

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 Addressed Comments from @HyukjinKwon , I'm interested in @ueshin 's suggestions, but I can't figure out how to do that unless we bake it into the Extensions constructor. If we place

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225613517 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -1136,4 +1121,27 @@ object SparkSession extends Logging

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225612270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -1136,4 +1121,27 @@ object SparkSession extends Logging

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225610975 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -1136,4 +1121,27 @@ object SparkSession extends Logging

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r225608605 --- Diff: python/pyspark/sql/session.py --- @@ -219,6 +219,7 @@ def __init__(self, sparkContext, jsparkSession=None

[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...

2018-10-16 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/22576 Cleaned up --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-09-29 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 I'm fine with anything really, I still think the ideal solution is probably not to tie the creation of the py4j gateway to the SparkContext, but that's probably a much bigger refactor

[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...

2018-09-28 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/22576 Ah I was registering functions with the built-in registry which is not reset. I've changed it to register only with a clone of the built-in registry. This would allow multiple extensions
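The leak described in this comment, and the clone-based fix, can be illustrated with a small sketch. This is not Spark's FunctionRegistry; the class, the BUILTIN object and new_session_registry are hypothetical stand-ins that only show why registering against a clone keeps injected functions out of the shared registry.

```python
class FunctionRegistry:
    """Hypothetical stand-in for a SQL function registry (not Spark's class)."""

    def __init__(self, functions=None):
        # Copy the mapping so two registries never share one dict.
        self._functions = dict(functions or {})

    def register(self, name, builder):
        self._functions[name] = builder

    def lookup(self, name):
        return self._functions.get(name)

    def clone(self):
        # A copy of the mapping is enough here: entries are added or
        # replaced, never mutated in place.
        return FunctionRegistry(self._functions)


# Shared, process-wide registry that is never reset between sessions.
BUILTIN = FunctionRegistry({"upper": str.upper})


def new_session_registry(injected):
    """Build a per-session registry from a clone, leaving BUILTIN untouched."""
    registry = BUILTIN.clone()
    for name, builder in injected:
        registry.register(name, builder)
    return registry


session_a = new_session_registry([("shout", lambda s: s.upper() + "!")])
session_b = new_session_registry([])

print(session_a.lookup("shout")("hi"))  # HI!
print(session_b.lookup("shout"))        # None
print(BUILTIN.lookup("shout"))          # None
```

Registering directly on BUILTIN instead would make "shout" visible to session_b and to every later session, which matches the cross-suite leak reported in the 2018-09-28 comment on this pull request.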

[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...

2018-09-28 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/22576 Looks like the session with extensions from the test suite is leaking to other suites ... Investigating On Fri, Sep 28, 2018 at 11:25 AM UCB AMPLab wrote: > T

[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...

2018-09-27 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/22576 @hvanhovell Made a full PR for the change we discussed. Also updated the signature to match the new defined types for the registry and Identifier

[GitHub] spark pull request #22576: [SPARK-25560][SQL] Allow FunctionInjection in Spa...

2018-09-27 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/22576 [SPARK-25560][SQL] Allow FunctionInjection in SparkExtensions This allows an implementer of Spark Session Extensions to utilize a method "injectFunction" which will add a ne

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-09-19 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 Added new method of injecting extensions, this way the "getOrCreate" code from the scala method is not needed. @H

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-09-18 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 @HyukjinKwon so you want me to rewrite the code in python? I will note SparkR is doing this exact same thing

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-08-27 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 What I wanted was to just call the Scala Methods, instead of having half the code and half in python, but we create the JVM in the SparkContext creation code so this ends up not being a good

[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-08-20 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21990 @HyukjinKwon So I've been staring at this for a while today, and I guess the big issue is that we always need to make a Python SparkContext to get a handle on the JavaGateway, so everything

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210941883 --- Diff: python/pyspark/sql/tests.py --- @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210909196 --- Diff: python/pyspark/sql/tests.py --- @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-17 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210909083 --- Diff: python/pyspark/sql/tests.py --- @@ -3563,6 +3563,51 @@ def test_query_execution_listener_on_collect_with_arrow(self

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-15 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/21990#discussion_r210428804 --- Diff: python/pyspark/sql/session.py --- @@ -218,7 +218,9 @@ def __init__(self, sparkContext, jsparkSession=None

[GitHub] spark pull request #21988: [SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExt...

2018-08-07 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/21988

[GitHub] spark pull request #21989: [SPARK-25003][PYSPARK][BRANCH-2.3] Use SessionExt...

2018-08-07 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/21989

[GitHub] spark issue #21988: [SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExtensions...

2018-08-06 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21988 @felixcheung I just didn't know what version to target so I made a PR for each one. We can just close the ones that shouldn't be merged

[GitHub] spark issue #21989: [SPARK-25003][PYSPARK][BRANCH-2.3] Use SessionExtensions...

2018-08-06 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21989 @kiszk Sure, it all depends on which branch the merge target should be; I wasn't sure which one was being used for changes of this nature. Technically it's a bug fix, I believe

[GitHub] spark issue #21988: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-08-03 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/21988 Local PEP didn't seem to mind this code ... Fixed up the indentation so hopefully jenkins will like it now

[GitHub] spark pull request #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-03 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/21990 [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark Master ## What changes were proposed in this pull request? Previously Pyspark used the private constructor

[GitHub] spark pull request #21989: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-03 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/21989 [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark (Branch-2.3) ## What changes were proposed in this pull request? Previously Pyspark used the private constructor

[GitHub] spark pull request #21988: [SPARK-25003][PYSPARK] Use SessionExtensions in P...

2018-08-03 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/21988 [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark ## What changes were proposed in this pull request? Previously Pyspark used the private constructor for SparkSession when

[GitHub] spark pull request #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/21453 Test branch to see how Scala 2.11.12 performs This may be useful when Java 8 is no longer supported since Scala 2.11.12 supports later versions of Java ## What changes were

[GitHub] spark pull request #20190: [SPARK-22976][Core]: Cluster mode driver director...

2018-01-17 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/20190

[GitHub] spark issue #20190: [SPARK-22976][Core]: Cluster mode driver directories can...

2018-01-17 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/20190 @jerryshao https://github.com/apache/spark/pull/20298

[GitHub] spark pull request #20298: [SPARK-22976][Core]: Cluster mode driver dir remo...

2018-01-17 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/20298 [SPARK-22976][Core]: Cluster mode driver dir removed while running ## What changes were proposed in this pull request? The clean up logic on the worker previously determined

[GitHub] spark issue #20201: [SPARK-22389][SQL] data source v2 partitioning reporting...

2018-01-09 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/20201 This looks very exciting to me

[GitHub] spark issue #20190: [SPARK-22976][Core]: Cluster mode driver directories can...

2018-01-09 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/20190 @zsxwing I think you were the last to touch this code, could you please review?

[GitHub] spark pull request #20190: [SPARK-22976][Core]: Cluster mode driver director...

2018-01-08 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/20190 [SPARK-22976][Core]: Cluster mode driver directories can be removed w… …hile running ## What changes were proposed in this pull request? The clean up logic

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137531270 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-06 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r137333974 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] spark pull request #10655: [SPARK-12639][SQL] Improve Explain for Datasource...

2016-09-15 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/10655 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

2016-09-15 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/10655 We fixed this on a different pr https://github.com/apache/spark/pull/11317

[GitHub] spark pull request #11796: [SPARK-13579][build][test-maven] Stop building th...

2016-08-05 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/11796#discussion_r73781398 --- Diff: assembly/pom.xml --- @@ -69,6 +68,17 @@ spark-repl_${scala.binary.version} ${project.version

[GitHub] spark issue #11317: [SPARK-12639] [SQL] Mark Filters Fully Handled By Source...

2016-07-08 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/11317 Updated

[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...

2016-06-15 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/13652 I would love to be able to just specify days since epoch rather than using java.sql.Date
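The wish in this comment can be made concrete with a minimal sketch. It is not Spark's conversion code; the function names and the fixed-offset time zone are assumptions chosen for illustration. The point is that a date stored as days since 1970-01-01 needs no time zone, while turning it into epoch milliseconds forces a choice of zone.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def date_to_days(d):
    """Calendar date -> whole days since 1970-01-01; no time zone involved."""
    return (d - EPOCH).days


def days_to_date(days):
    """Inverse of date_to_days."""
    return EPOCH + timedelta(days=days)


def days_to_millis(days, utc_offset_hours):
    """Epoch millis of local midnight on that day, for a fixed UTC offset.

    Hypothetical helper: a real conversion would use named zones with
    daylight-saving rules, not a fixed offset.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    d = days_to_date(days)
    local_midnight = datetime(d.year, d.month, d.day, tzinfo=tz)
    return int(local_midnight.timestamp() * 1000)


days = date_to_days(date(2016, 6, 15))
utc_millis = days_to_millis(days, 0)
pacific_millis = days_to_millis(days, -8)

# The same calendar day maps to different instants depending on the zone.
print(pacific_millis - utc_millis)  # 28800000, i.e. 8 hours in milliseconds
```

The day count round-trips exactly through days_to_date, whereas any millisecond value is only meaningful together with the zone used to produce it.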

[GitHub] spark issue #13652: [SPARK-] Fix incorrect days to millis conversion

2016-06-14 Thread RussellSpitzer
Github user RussellSpitzer commented on the issue: https://github.com/apache/spark/pull/13652 I think this is unfortunately the right thing to do. I wish we didn't have to use java.sql.Date :(

[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...

2016-05-09 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/11317#issuecomment-218052162 I don't think this is because of me ``` # A fatal error has been detected by the Java Runtime Environment: # # Internal Error

[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...

2016-05-09 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/11317#issuecomment-218030539 @HyukjinKwon + @yhuai Sorry it took so long! Things have been busy :)

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-05-06 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-217483240 Sorry I forgot about this, I'll clean this up tomorrow and get it ready

[GitHub] spark pull request: [SPARK-12639] [SQL] Mark Filters Fully Handled...

2016-02-22 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/11317 [SPARK-12639] [SQL] Mark Filters Fully Handled By Sources with * ## What changes were proposed in this pull request? In order to make it clear which filters are fully handled
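As a rough illustration of the marking scheme named in this pull request title, the sketch below prefixes fully handled filters with an asterisk when rendering a pushed-filter list. It is not the code from the pull request; format_pushed_filters, the bracketed output layout and the sample filter strings are all hypothetical.

```python
def format_pushed_filters(pushed, handled):
    """Render pushed filters, prefixing the fully handled ones with '*'.

    pushed  -- filter descriptions offered to the data source, in order
    handled -- the subset the source evaluates completely on its own
    """
    handled = set(handled)
    rendered = [("*" + f) if f in handled else f for f in pushed]
    return "[" + ", ".join(rendered) + "]"


# Illustrative filter strings only; the real explain output may differ.
pushed = ["EqualTo(a,1)", "GreaterThan(b,2)"]
handled = ["EqualTo(a,1)"]

print(format_pushed_filters(pushed, handled))
# [*EqualTo(a,1), GreaterThan(b,2)]
```

A reader of such output can tell at a glance that the first predicate needs no re-evaluation after the scan, while the unmarked one may still be applied again on top of the source.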

[GitHub] spark pull request: SPARK-12639 SQL Mark Filters Fully Handled By ...

2016-02-22 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/10929

[GitHub] spark pull request: SPARK-12639 SQL Mark Filters Fully Handled By ...

2016-01-26 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/10929 SPARK-12639 SQL Mark Filters Fully Handled By Sources with * In order to make it clear which filters are fully handled by the underlying datasource we will mark them

[GitHub] spark pull request: [SPARK-13021][CORE] Fail fast when custom RDDs...

2016-01-26 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10932#issuecomment-175294425 I'm +1 on this in 2.0 :)

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-22 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-174113049 Haven't forgotten this will have a new pr soon :)

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-15 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-172145152 I personally think the ambiguous `PUSHED_FILTERS` is more confusing. When we see a predicate there we have no idea whether or not it is a valid filter

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-15 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-172136871 @yhuai I removed the PushedFilters and added the other examples. We could re-add the "PushedFilters" if you like. I wasn't sure if you still wanted

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-11 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/10655#discussion_r49342098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-08 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/10655#issuecomment-170064440 @rxin Added, basically I think the current "PushedFilters" list isn't very valuable if everything is listed there. So instead we should just list thos

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-07 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/10655#discussion_r49153042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-07 Thread RussellSpitzer
Github user RussellSpitzer commented on a diff in the pull request: https://github.com/apache/spark/pull/10655#discussion_r49152876 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -321,8 +321,8 @@ private[sql] object

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-01-07 Thread RussellSpitzer
GitHub user RussellSpitzer opened a pull request: https://github.com/apache/spark/pull/10655 SPARK-12639 SQL Improve Explain for Datasources with Handled Predicates SPARK-11661 Makes all predicates pushed down to underlying Datasources regardless of whether the source can handle

[GitHub] spark pull request: SPARK-11415: Remove timezone shift of Catalyst...

2015-11-18 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request: https://github.com/apache/spark/pull/9369#issuecomment-157886376 np

[GitHub] spark pull request: SPARK-11415: Remove timezone shift of Catalyst...

2015-11-18 Thread RussellSpitzer
Github user RussellSpitzer closed the pull request at: https://github.com/apache/spark/pull/9369