Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r193664416
--- Diff: docs/submitting-applications.md ---
@@ -218,6 +218,115 @@ These commands can be used with `pyspark`,
`spark-shell`, and `spark-submit` to
For
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
Thanks for the interest in this PR and the info about `Pipfiles`. I think
we could support that after this PR gets merged, so that we can provide users
more options for virtualenv based on their
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13493
Thanks @jkbradley. The failed tests seem unrelated.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
That would be awesome.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
I am afraid I will not be present at Strata SJ; I live in Shanghai, China,
and may not be able to travel at that time.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
ping @holdenk @HyukjinKwon
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164646157
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164069473
--- Diff: core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala ---
@@ -39,12 +39,17 @@ object PythonRunner {
val pyFiles = args(1
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164068172
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164037516
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,41 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164037239
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
---
@@ -98,7 +98,7 @@ class
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164037055
--- Diff:
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java
---
@@ -299,20 +300,34 @@
// 4. environment variable
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164036488
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164034980
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164034871
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,41 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r163427975
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,42 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
@holdenk @HyukjinKwon @ueshin I have updated the PR, and now it also works
when an executor is restarted, even when dynamic allocation is enabled. The only
overhead is on the driver side when executor
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160572606
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,35 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160310618
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1039,33 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160308321
--- Diff:
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java
---
@@ -299,20 +301,39 @@
// 4. environment variable
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160285377
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
@holdenk @ueshin @HyukjinKwon Thanks for reviewing the long-pending PR. Will
refine the PR soon.
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160141363
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1039,33 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160140451
--- Diff: docs/submitting-applications.md ---
@@ -218,6 +218,73 @@ These commands can be used with `pyspark`,
`spark-shell`, and `spark-submit` to
For
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160139161
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160138782
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160138391
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -60,6 +66,12 @@ private[spark] class PythonWorkerFactory
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160138349
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -29,7 +30,10 @@ import org.apache.spark._
import
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160070613
--- Diff:
core/src/main/scala/org/apache/spark/api/python/VirtualEnvFactory.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160070518
--- Diff: python/pyspark/context.py ---
@@ -980,6 +996,33 @@ def getConf(self):
conf.setAll(self._conf.getAll())
return conf
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r160070457
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -475,6 +475,19 @@ object SparkSubmit extends CommandLineUtils with
Logging
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
Thanks @viirya
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
This PR fails the PySpark pip packaging tests, but I don't know what's
wrong here. @holdenk Is the `PySpark pip packaging test` a known issue?
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r123876794
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
---
@@ -20,16 +20,19 @@ package
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r123871674
--- Diff:
sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java ---
@@ -31,7 +31,7 @@
import
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r123871670
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
---
@@ -20,16 +20,19 @@ package
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/14180
Here's my approach, #13599, for virtualenv and conda support; any comments and
reviews are welcome.
https://docs.google.com/document/d/1EGNEf4vFmpGXSd2DPOLu_HL23Xhw9aWKeUrzzxsEbQs/edi
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@gatorsmile sorry for the late response, will update it soon
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r116325723
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r116293947
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r116293890
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
---
@@ -20,16 +20,19 @@ package
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r114963449
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r114962484
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r114948086
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@cloud-fan This is not about using Python UDFs; it is about allowing PySpark to
use Java UDFs (no Python daemon will be launched). So it would actually improve
performance.
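For context, a minimal sketch of calling a Java UDF from PySpark via `registerJavaFunction`, whose signature appears in the diffs above. The class name `com.example.StrLenUDF` is hypothetical; any class implementing Spark's `UDF1` interface and shipped on the classpath would do:
```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import IntegerType

sc = SparkContext(appName="java-udf-demo")
sqlContext = SQLContext(sc)

# com.example.StrLenUDF is hypothetical: a Java class implementing
# org.apache.spark.sql.api.java.UDF1<String, Integer>, made available on the
# driver and executor classpath (e.g. via --jars).
sqlContext.registerJavaFunction("strLen", "com.example.StrLenUDF", IntegerType())

# The UDF is evaluated entirely in the JVM; no Python worker is launched.
sqlContext.sql("SELECT strLen('hello')").show()
```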
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@holdenk @gatorsmile Any more comments?
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@holdenk The link you pasted is for the case of using a Scala closure to
create a UDF, while `registerJava` uses Java reflection to create the UDF. This is
what I use in `registerJava`:
https://github.com
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r113085517
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@holdenk But it has nothing to return, because the Scala side returns `Unit`. See
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L528
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
ping @holdenk
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r111279320
--- Diff: python/pyspark/ml/classification.py ---
@@ -172,6 +172,47 @@ def intercept(self):
"""
return self._call_
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
Good catch, @holdenk! The `return` is removed.
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r111042227
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -355,6 +368,19 @@ object LinearSVCModel extends
MLReadable
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17586#discussion_r111042049
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -355,6 +368,19 @@ object LinearSVCModel extends
MLReadable
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16906
Kindly ping @holdenk
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17586
@hhbyyh @jkbradley Please help review.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17586
I didn't add metrics like ROC for this summary yet; I can add them if
necessary.
GitHub user zjffdu opened a pull request:
https://github.com/apache/spark/pull/17586
[SPARK-20249][ML][PYSPARK] Add summary for LinearSVCModel
## What changes were proposed in this pull request?
Add a summary for LinearSVCModel so that users can get the training process
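A rough sketch of the kind of API this proposes; the names below (`hasSummary`, `summary`, `objectiveHistory`, `totalIterations`) are assumptions modeled on LogisticRegressionModel's training summary, not necessarily what the PR exposes:
```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LinearSVC
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("svc-summary-demo").getOrCreate()
train = spark.createDataFrame(
    [(1.0, Vectors.dense(0.0, 1.0)), (0.0, Vectors.dense(1.0, 0.0))],
    ["label", "features"])

model = LinearSVC(maxIter=10, regParam=0.01).fit(train)

# Assumed attribute names, mirroring LogisticRegressionModel's summary.
if model.hasSummary:
    print(model.summary.objectiveHistory)  # loss per iteration
    print(model.summary.totalIterations)
```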
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@viirya Thanks for the careful review.
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r110113683
--- Diff: python/pyspark/sql/tests.py ---
@@ -436,6 +436,20 @@ def test_udf_with_order_by_and_limit(self):
res.explain(True
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r110108533
--- Diff: python/pyspark/sql/context.py ---
@@ -228,6 +228,24 @@ def registerJavaFunction(self, name, javaClassName,
returnType=None):
jdt
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@holdenk Mind reviewing it?
GitHub user zjffdu reopened a pull request:
https://github.com/apache/spark/pull/17222
[SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support
UDAFs
## What changes were proposed in this pull request?
Support registering Java UDAFs in PySpark so that
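As a sketch of the usage this PR is after, assuming a `registerJavaUDAF`-style method on the SQL context and a hypothetical Java class `com.example.MyAverage`:
```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="java-udaf-demo")
sqlContext = SQLContext(sc)

# com.example.MyAverage is hypothetical: a Java class extending
# org.apache.spark.sql.expressions.UserDefinedAggregateFunction, shipped
# via --jars. The method name follows the PR's direction and is an assumption.
sqlContext.registerJavaUDAF("myAvg", "com.example.MyAverage")

df = sqlContext.createDataFrame([(1, 10.0), (1, 20.0), (2, 30.0)], ["id", "v"])
df.createOrReplaceTempView("t")
sqlContext.sql("SELECT id, myAvg(v) FROM t GROUP BY id").show()
```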
Github user zjffdu closed the pull request at:
https://github.com/apache/spark/pull/17222
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17367
Closing it, as `_inferSchema` is still used in many places.
Github user zjffdu closed the pull request at:
https://github.com/apache/spark/pull/17367
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16906
Yeah, makes sense. Fixed it.
GitHub user zjffdu opened a pull request:
https://github.com/apache/spark/pull/17367
[MINOR][PYSPARK] Remove _inferSchema in context.py
## What changes were proposed in this pull request?
_inferSchema is not used in context.py; everything has been moved to
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
@holdenk Do you have time to review this? Thanks
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
I created a Google doc about how to use it:
https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit?usp=sharing
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/17222#discussion_r105392650
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -484,6 +484,21 @@ class UDFRegistration private[sql] (functionRegistry
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17222
@holdenk @marmbrus Please help review
GitHub user zjffdu opened a pull request:
https://github.com/apache/spark/pull/17222
[SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support
UDAFs
## What changes were proposed in this pull request?
Support registering Java UDAFs in PySpark so that use
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/17194
@ptkool Please update the title to include the JIRA ID so that it can be
linked to JIRA automatically.
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/10307#discussion_r104874993
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -282,6 +282,23 @@ def parquet(self, *paths):
"""
retur
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/10307#discussion_r104829036
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -407,15 +424,17 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/10307#discussion_r103600310
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -388,16 +388,18 @@ def csv(self, path, schema=None, sep=None,
encoding=None, quote=None, escape=Non
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/16907#discussion_r103599670
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala
---
@@ -47,12 +47,15 @@ private[sql] object SQLUtils extends Logging
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16907
Yeah, it would be nice to have it merged into 2.1 as well. Thanks
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16907
Seems like a flaky test; let me trigger the build.
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/16907#discussion_r102865692
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala
---
@@ -48,13 +48,14 @@ private[sql] object SQLUtils extends Logging
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/11211
@holdenk The description is updated.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/11211
ping @holdenk @HyukjinKwon The PR is updated; please help review. Thanks
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/11211
Sorry for the late reply; I may come back to this issue late this week.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16907
Addressed the comments. @felixcheung, correct, `shell.R` is not supposed to
be used outside. This ticket is mainly for disabling Hive in the sparkR shell;
sparkR batch mode already supports this feature
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16907
@felixcheung Please help review.
GitHub user zjffdu opened a pull request:
https://github.com/apache/spark/pull/16907
[SPARK-19582][SPARKR] Allow to disable hive in sparkR shell
## What changes were proposed in this pull request?
SPARK-15236 did this for the Scala shell; this ticket is for the sparkR shell. This
is not
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/16906
@holdenk Please help review
GitHub user zjffdu opened a pull request:
https://github.com/apache/spark/pull/16906
[SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell
## What changes were proposed in this pull request?
SPARK-15236 did this for the Scala shell; this ticket is for the pyspark shell. This
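The PR wires this into shell startup; as a sketch, the same effect in a plain PySpark program comes from the `spark.sql.catalogImplementation` switch that SPARK-15236 introduced (app name below is arbitrary):
```python
from pyspark.sql import SparkSession

# "in-memory" selects the built-in catalog instead of the Hive metastore;
# the conf must be set before the session is created, which the builder does.
spark = (SparkSession.builder
         .appName("no-hive-demo")
         .config("spark.sql.catalogImplementation", "in-memory")
         .getOrCreate())

print(spark.conf.get("spark.sql.catalogImplementation"))  # "in-memory"
```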
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13557
@sethah Thanks for the review, I have updated the PR.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/15669
Hmm, I notice spark.files is still passed to SparkContext in yarn-client
mode; it seems I need to do that in SparkSubmit.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/15669
That's correct, this PR will also fix the yarn-client case. PR title is
updated.
Github user zjffdu closed the pull request at:
https://github.com/apache/spark/pull/15669
GitHub user zjffdu reopened a pull request:
https://github.com/apache/spark/pull/15669
[SPARK-18160][CORE][YARN] spark.files should not be passed to driver in
yarn-cluster mode
## What changes were proposed in this pull request?
spark.files is still passed to driver in
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r85877935
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r85877793
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/13599
Thanks for the review, @mridulm. This approach tries to move the
overhead from the user to the cluster: users just need to specify the requirements
file and Spark will set up the virtualenv automatically
Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/15669#discussion_r85872343
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1716,29 +1716,12 @@ class SparkContext(config: SparkConf) extends
Logging
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/15669
That's correct, it is due to `spark.files`; the JIRA has been updated. Will
update the PR soon.
Github user zjffdu commented on the issue:
https://github.com/apache/spark/pull/15669
spark.files would still be passed to the driver even in yarn-cluster mode if you
check the following code:
https://github.com/apache/spark/blob/7bf8a4049866b2ec7fdf0406b1ad0c3a12488645/core/src/main