[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-06-07 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r193664416 --- Diff: docs/submitting-applications.md --- @@ -218,6 +218,115 @@ These commands can be used with `pyspark`, `spark-shell`, and `spark-submit` to For

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-06-07 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 Thanks for the interest in this PR and the info about `Pipfiles`. I think we could support that after this PR gets merged so that we can provide users more options for virtualenv based on their

[GitHub] spark issue #13493: [SPARK-15750][MLLib][PYSPARK] Constructing FPGrowth fail...

2018-05-01 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13493 Thanks @jkbradley The failed tests seem unrelated. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-02-27 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 That would be awesome.

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-02-27 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 I am afraid I will not be present at Strata SJ; I live in Shanghai, China, and may not be able to travel at that time.

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-02-04 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 ping @holdenk @HyukjinKwon

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-29 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164646157 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-26 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164069473 --- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala --- @@ -39,12 +39,17 @@ object PythonRunner { val pyFiles = args(1

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-26 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164068172 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164037516 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1032,41 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164037239 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -98,7 +98,7 @@ class

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164037055 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -299,20 +300,34 @@ // 4. environment variable

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164036488 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164034980 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-25 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r164034871 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1032,41 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-23 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r163427975 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1032,42 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-22 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 @holdenk @HyukjinKwon @ueshin I have updated the PR, and now it also works when an executor is restarted and even when dynamic allocation is enabled. The only overhead is on the driver side when executor

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-09 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160572606 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1032,35 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160310618 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1039,33 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160308321 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -299,20 +301,39 @@ // 4. environment variable

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160285377 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-08 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 @holdenk @ueshin @HyukjinKwon Thanks for reviewing the long-pending PR. Will refine the PR soon.

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160141363 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1039,33 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160140451 --- Diff: docs/submitting-applications.md --- @@ -218,6 +218,73 @@ These commands can be used with `pyspark`, `spark-shell`, and `spark-submit` to For

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160139161 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160138782 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160138391 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -60,6 +66,12 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160138349 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -29,7 +30,10 @@ import org.apache.spark._ import

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-07 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160070613 --- Diff: core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-07 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160070518 --- Diff: python/pyspark/context.py --- @@ -980,6 +996,33 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-07 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160070457 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -475,6 +475,19 @@ object SparkSubmit extends CommandLineUtils with Logging

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-07-05 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 Thanks @viirya --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-07-05 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 This PR fails the PySpark pip packaging tests, but I don't know what's wrong here. @holdenk Is the `PySpark pip packaging test` failure a known issue?

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-06-24 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r123876794 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala --- @@ -20,16 +20,19 @@ package

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-06-23 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r123871674 --- Diff: sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java --- @@ -31,7 +31,7 @@ import

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-06-23 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r123871670 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala --- @@ -20,16 +20,19 @@ package

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Support for deploying Anaconda an...

2017-06-16 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/14180 Here's my approach (#13599) for virtualenv and conda support; any comments and reviews are welcome. https://docs.google.com/document/d/1EGNEf4vFmpGXSd2DPOLu_HL23Xhw9aWKeUrzzxsEbQs/edi

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-06-12 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @gatorsmile Sorry for the late response, will update it soon.

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-12 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r116325723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-12 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r116293947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-12 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r116293890 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala --- @@ -20,16 +20,19 @@ package

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-05 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r114963449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-05 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r114962484 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-05-05 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r114948086 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-05-04 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @cloud-fan This is not about using a Python UDF; it is to allow PySpark to use a Java UDF (no Python daemon will be launched), so it would actually improve performance.

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-05-03 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk @gatorsmile Any more comments?

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk The link you pasted is for the case of using a Scala closure to create a UDF, while `registerJava` uses Java reflection to create the UDF. This is what I use in `registerJava` https://github.com

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-24 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r113085517 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk But it has nothing to return, because the Scala side returns Unit. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L528

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-20 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 ping @holdenk

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-12 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r111279320 --- Diff: python/pyspark/ml/classification.py --- @@ -172,6 +172,47 @@ def intercept(self): """ return self._call_

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-11 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 Good catch! @holdenk `return` is removed.

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r111042227 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-11 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17586#discussion_r111042049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -355,6 +368,19 @@ object LinearSVCModel extends MLReadable

[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...

2017-04-11 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16906 Kindly ping @holdenk

[GitHub] spark issue #17586: [SPARK-20249][ML][PYSPARK] Add summary for LinearSVCMode...

2017-04-10 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17586 @hhbyyh @jkbradley Please help review.

[GitHub] spark issue #17586: [SPARK-20249][ML][PYSPARK] Add summary for LinearSVCMode...

2017-04-09 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17586 I haven't added metrics like ROC to this summary yet; I can add them if necessary.

[GitHub] spark pull request #17586: [SPARK-20249][ML][PYSPARK] Add summary for Linear...

2017-04-09 Thread zjffdu
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/17586 [SPARK-20249][ML][PYSPARK] Add summary for LinearSVCModel ## What changes were proposed in this pull request? Add summary for LinearSVCModel so that user can get the training process

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-06 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @viirya Thanks for the careful review.

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-06 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r110113683 --- Diff: python/pyspark/sql/tests.py --- @@ -436,6 +436,20 @@ def test_udf_with_order_by_and_limit(self): res.explain(True

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-06 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r110108533 --- Diff: python/pyspark/sql/context.py --- @@ -228,6 +228,24 @@ def registerJavaFunction(self, name, javaClassName, returnType=None): jdt

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-05 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk Would you mind reviewing it?

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-29 Thread zjffdu
GitHub user zjffdu reopened a pull request: https://github.com/apache/spark/pull/17222 [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support UDAFs ## What changes were proposed in this pull request? Support registering Java UDAFs in PySpark so that

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-29 Thread zjffdu
Github user zjffdu closed the pull request at: https://github.com/apache/spark/pull/17222

[GitHub] spark issue #17367: [MINOR][PYSPARK] Remove _inferSchema in context.py

2017-03-20 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17367 Closing it, as `_inferSchema` is still used in many places.

[GitHub] spark pull request #17367: [MINOR][PYSPARK] Remove _inferSchema in context.p...

2017-03-20 Thread zjffdu
Github user zjffdu closed the pull request at: https://github.com/apache/spark/pull/17367

[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...

2017-03-20 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16906 Yeah, makes sense. Fixed it.

[GitHub] spark pull request #17367: [MINOR][PYSPARK] Remove _inferSchema in context.p...

2017-03-20 Thread zjffdu
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/17367 [MINOR][PYSPARK] Remove _inferSchema in context.py ## What changes were proposed in this pull request? `_inferSchema` is not used in context.py; everything has been moved to

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-03-13 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 @holdenk Do you have time to review this? Thanks

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-03-13 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 I created a Google doc about how to use it: https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit?usp=sharing

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-10 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r105392650 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -484,6 +484,21 @@ class UDFRegistration private[sql] (functionRegistry

[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-03-09 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17222 @holdenk @marmbrus Please help review --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-03-08 Thread zjffdu
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/17222 [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support UDAFs ## What changes were proposed in this pull request? Support registering Java UDAFs in PySpark so that use

[GitHub] spark issue #17194: Add new aggregates EVERY and ANY (SOME).

2017-03-08 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/17194 @ptkool Please update the title to include the JIRA ID so that it can be linked to JIRA automatically.

[GitHub] spark pull request #10307: [SPARK-12334][SQL][PYSPARK] Support read from mul...

2017-03-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/10307#discussion_r104874993 --- Diff: python/pyspark/sql/readwriter.py --- @@ -282,6 +282,23 @@ def parquet(self, *paths): """ retur

[GitHub] spark pull request #10307: [SPARK-12334][SQL][PYSPARK] Support read from mul...

2017-03-07 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/10307#discussion_r104829036 --- Diff: python/pyspark/sql/readwriter.py --- @@ -407,15 +424,17 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark pull request #10307: [SPARK-12334][SQL][PYSPARK] Support read from mul...

2017-02-28 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/10307#discussion_r103600310 --- Diff: python/pyspark/sql/readwriter.py --- @@ -388,16 +388,18 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non

[GitHub] spark pull request #16907: [SPARK-19572][SPARKR] Allow to disable hive in sp...

2017-02-28 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/16907#discussion_r103599670 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -47,12 +47,15 @@ private[sql] object SQLUtils extends Logging

[GitHub] spark issue #16907: [SPARK-19572][SPARKR] Allow to disable hive in sparkR sh...

2017-02-24 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16907 Yeah, it would be nice to have this merged into 2.1 as well. Thanks

[GitHub] spark issue #16907: [SPARK-19572][SPARKR] Allow to disable hive in sparkR sh...

2017-02-24 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16907 Seems to be a flaky test; let me trigger the build

[GitHub] spark pull request #16907: [SPARK-19572][SPARKR] Allow to disable hive in sp...

2017-02-23 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/16907#discussion_r102865692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -48,13 +48,14 @@ private[sql] object SQLUtils extends Logging

[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...

2017-02-22 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/11211 @holdenk description is updated.

[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...

2017-02-20 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/11211 ping @holdenk @HyukjinKwon PR is updated, please help review. Thanks

[GitHub] spark issue #11211: [SPARK-13330][PYSPARK] PYTHONHASHSEED is not propgated t...

2017-02-13 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/11211 Sorry for the late reply; I may come back to this issue later this week.

[GitHub] spark issue #16907: [SPARK-19572][SPARKR] Allow to disable hive in sparkR sh...

2017-02-13 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16907 Addressed the comments. @felixcheung, correct, `shell.R` is not supposed to be used outside. This ticket is mainly for disabling hive in the sparkR shell; sparkR batch mode already supports this feature

[GitHub] spark issue #16907: [SPARK-19572][SPARKR] Allow to disable hive in sparkR sh...

2017-02-13 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16907 @felixcheung Please help review.

[GitHub] spark pull request #16907: [SPARK-19582][SPARKR] Allow to disable hive in sp...

2017-02-12 Thread zjffdu
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16907 [SPARK-19582][SPARKR] Allow to disable hive in sparkR shell ## What changes were proposed in this pull request? SPARK-15236 does this for the Scala shell; this ticket is for the sparkR shell. This is not

[GitHub] spark issue #16906: [SPARK-19570][PYSPARK] Allow to disable hive in pyspark ...

2017-02-12 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/16906 @holdenk Please help review

[GitHub] spark pull request #16906: [SPARK-19570][PYSPARK] Allow to disable hive in p...

2017-02-12 Thread zjffdu
GitHub user zjffdu opened a pull request: https://github.com/apache/spark/pull/16906 [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell ## What changes were proposed in this pull request? SPARK-15236 does this for the Scala shell; this ticket is for the pyspark shell. This

[GitHub] spark issue #13557: [SPARK-15819][PYSPARK][ML] Add KMeanSummary in KMeans of...

2016-11-28 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13557 @sethah Thanks for the review, I have updated the PR.

[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files & spark.jars shoul...

2016-11-01 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15669 Hmm, I notice that spark.files is still passed to SparkContext in yarn-client mode; it seems I need to do that in SparkSubmit

[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files should not be pass...

2016-11-01 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15669 That's correct, this PR will also fix the yarn-client case. PR title is updated.

[GitHub] spark pull request #15669: [SPARK-18160][CORE][YARN] spark.files should not ...

2016-11-01 Thread zjffdu
Github user zjffdu closed the pull request at: https://github.com/apache/spark/pull/15669

[GitHub] spark pull request #15669: [SPARK-18160][CORE][YARN] spark.files should not ...

2016-11-01 Thread zjffdu
GitHub user zjffdu reopened a pull request: https://github.com/apache/spark/pull/15669 [SPARK-18160][CORE][YARN] spark.files should not be passed to driver in yarn-cluster mode ## What changes were proposed in this pull request? spark.files is still passed to driver in

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85877935 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85877793 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-10-31 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13599 Thanks for the review @mridulm, this approach tries to move the overhead from the user to the cluster. The user just needs to specify the requirements file and Spark will set up the virtualenv automatically

[GitHub] spark pull request #15669: [SPARK-18160][CORE][YARN] spark.files should not ...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/15669#discussion_r85872343 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1716,29 +1716,12 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...

2016-10-31 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15669 That's correct, it is due to `spark.files`; the JIRA has been updated. Will update the PR soon.

[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...

2016-10-31 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15669 spark.files would still be passed to the driver even in yarn-cluster mode, as the following code shows. https://github.com/apache/spark/blob/7bf8a4049866b2ec7fdf0406b1ad0c3a12488645/core/src/main
