[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-09 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r160549913 --- Diff: python/pyspark/context.py --- @@ -1023,6 +1032,35 @@ def getConf(self): conf.setAll(self._conf.getAll()) return conf

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Support for deploying Anaconda an...

2018-01-08 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14180 Hello. It has been a long time; it probably needs a full rework. Maybe we need to take a step back and have a discussion among the several people interested in this feature to see what is the most suitable for

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2017-07-02 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Hello. Sadly, I cannot work on this; we are in the middle of a big restructuring at work. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-06-16 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 I would be so pleased to see proper support for virtualenv and conda. I also lobby for wheelhouse support. There is now Pipfile, which is replacing requirements.txt. Having truly independent

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2017-04-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 I guess a rebase would be welcome; I can do it by tomorrow if you want.

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-03-31 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 +1

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2017-02-15 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Hello. This is actually the execution of the pylint/autopep8 config proposed in #14963. I can shrink this PR a little more by ignoring more rules.

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-02-15 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r101263394 --- Diff: examples/src/main/python/mllib/naive_bayes_example.py --- @@ -24,16 +24,17 @@ from __future__ import print_function

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-02-15 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r101263279 --- Diff: examples/src/main/python/mllib/streaming_linear_regression_example.py --- @@ -25,13 +25,14 @@ # $example off$ from pyspark

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-02-15 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r101263182 --- Diff: examples/src/main/python/streaming/network_wordjoinsentiments.py --- @@ -54,22 +54,25 @@ def print_happiest_words(rdd): # Read in

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-02-15 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r101262948 --- Diff: examples/src/main/python/ml/decision_tree_classification_example.py --- @@ -65,8 +67,9 @@ predictions.select("predi

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-02-15 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r101262606 --- Diff: examples/src/main/python/ml/count_vectorizer_example.py --- @@ -17,23 +17,26 @@ from __future__ import print_function -from

[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2017-01-25 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14918 OK, abandoning.

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2017-01-25 Thread Stibbons
Github user Stibbons closed the pull request at: https://github.com/apache/spark/pull/14918

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2017-01-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Can we agree on merging this patch? Thanks.

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2017-01-09 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r95129312 --- Diff: examples/src/main/python/ml/generalized_linear_regression_example.py --- @@ -17,9 +17,10 @@ from __future__ import print_function

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2017-01-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Any hope this patch might be integrated?

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-27 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93977216 --- Diff: examples/src/main/python/ml/cross_validator.py --- @@ -42,20 +42,22 @@ # $example on$ # Prepare training documents, which

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-27 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93977062 --- Diff: examples/src/main/python/ml/generalized_linear_regression_example.py --- @@ -17,9 +17,10 @@ from __future__ import print_function

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-22 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93613427 --- Diff: examples/src/main/python/ml/vector_slicer_example.py --- @@ -20,15 +20,18 @@ # $example on$ from pyspark.ml.feature import VectorSlicer

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2016-12-21 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Rebased. The space issue will be fixed in another PR.

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-21 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93424188 --- Diff: docs/streaming-programming-guide.md --- @@ -2105,7 +2105,7 @@ documentation), or set the `spark.default.parallelism` {:.no_toc} The

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-21 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93424082 --- Diff: docs/streaming-programming-guide.md --- @@ -1626,10 +1626,10 @@ See the full [source code]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-21 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93423843 --- Diff: examples/src/main/python/ml/dct_example.py --- @@ -23,6 +23,7 @@ # $example off$ from pyspark.sql import SparkSession

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

2016-12-21 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r93423576 --- Diff: docs/streaming-programming-guide.md --- @@ -1099,7 +1099,7 @@ joinedStream = stream1.join(stream2) {% endhighlight %} -Here

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-11-01 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 Maybe we can try to split this work into several parts to ease merging. I firmly believe Python jobs should be deployed a bit like the way jar dependencies are specified for Scala (with `--packages`), and this

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-11-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85907419 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,67 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-11-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85907139 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-20 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 I agree, just email me :)

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Oh, I have these pylint errors on my Ubuntu! I probably did not rebase correctly. Fixed by ignoring new checks: - deprecated-method - unsubscriptable-object - used-before

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 I really think using Spark with Anaconda is a **must have**. Deploying jobs that run inside a Conda environment is so fast and efficient. I really want to push for this pull request #14180 that

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 You and I agree, actually: - PySpark can run inside Anaconda, and indeed this is greatly valuable. This will make all the packages provided by Anaconda available to the "driver" (in c

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Here is my proposal: I leave the environment and let the script create the virtual env, plus a minor improvement to the linter script.

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 I have just tested this pull request on MacOS X with latest version of Spark, no error: ``` ... Checking Pep8... PEP8 checks passed. Checking Pylint... Pylint checks

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-12 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Indeed, I have not taken the Anaconda environment into account, first because this tool provides a quick and efficient way of having an almost good working environment to run jobs with numpy, pandas, and

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and ...

2016-10-12 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r83010354 --- Diff: dev/requirements.txt --- @@ -1,3 +1,7 @@ jira==1.0.3 +numpy==1.11.1 +pep8==1.7.0 PyGithub==1.26.0 +pylint==1.6.4 +Sphinx

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-10 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Thanks to both of you for your reviews. It also fixes the version of Sphinx for the documentation build during `lint-python`. My point of view is that for an **application**, all versions of tools

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-08 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Sorry, I wrote on the wrong pull request; I was talking about the virtualenv support for executors (#14180) :( This one is indeed only to re-enable pylint. Merging this would allow me to submit

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-08 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 I would love to have a bit more feedback on this matter but it does not seem to interest core developers, sadly :( It's a bit disappointing, seeing how Python support on Spark is great,

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-09-26 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Hello. Has anyone had time to review this patch? Is there anything I should do to help get it merged? This re-enables Pylint checks using the `lint-python` script, and there are some errors that

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-09-26 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 Yes. This file is only written once, depending on the configuration of the Spark cluster, and each time you want to send a Python job that will use virtualenv/conda, the user would just add

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-09-26 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 I see that everything in these "conf" settings may be set in a .conf file that defines the default values, so the user can give it to spark-submit with all default values. I wonder if t

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-09-26 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r80423208 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,67 @@ private[spark] class PythonWorkerFactory

[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14918 For information, I continue to look at these kinds of simple optimisations that do not cost too much. Python is a pretty slow language, very productive in terms of code writing, but inefficient in

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Wheelhouse and VirtualEnv support

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14180 Rebased.

[GitHub] spark issue #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and import...

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14567 Rebased

[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14918 Rebased

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Rebased

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-09-22 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Pull Request rebased

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-09-21 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 What is the PR dashboard? I usually rebase this patch once or twice a week; I'll do it tomorrow.

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-09-16 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Hello, sorry to bother you, but if this patch gets merged, I can work on the pylint errors I had to add to pylint's ignore list and submit new PRs. If I re-enable most of them, here is

[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-13 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/15026#discussion_r78669516 --- Diff: python/pyspark/broadcast.py --- @@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None

[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-12 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/15026#discussion_r78464418 --- Diff: python/pyspark/broadcast.py --- @@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None

[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-09-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14918 Indeed, I don't think it will be feasible to propagate the generator up to the JVM. It would be cool, because when we have the schema there is no need to iterate several times over the complete

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8

2016-09-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Great!

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8

2016-09-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Great! By the way, I am also working on virtualenv and wheel support for PySpark job deployment (see #14180)

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on...

2016-09-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Sure!

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8

2016-09-09 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Yes, but for pylint you have many dependencies to update as well (astroid,...). At least with a virtualenv, pip does it for us :) And every time I see a hard-coded external URL I am afraid it

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and ...

2016-09-08 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r78136301 --- Diff: dev/lint-python --- @@ -26,30 +67,26 @@ PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt" PYLINT_INSTALL_INFO="$SPA

[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-08 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14963 Hmm, I don't see how it was re-enabled... Where is it called? And I had many errors to ignore once I re-enabled it in the execution of lint-python. I'll update the title. At le

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-08 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r78135769 --- Diff: dev/lint-python --- @@ -17,6 +17,47 @@ # limitations under the License. # +VIRTUAL_ENV_DIR="build/venv" +

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-08 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r78135582 --- Diff: dev/requirements.txt --- @@ -1,3 +1,5 @@ jira==1.0.3 PyGithub==1.26.0 Unidecode==0.04.19 +pep8==1.7.0 --- End diff

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-08 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r78135524 --- Diff: dev/lint-python --- @@ -26,30 +67,26 @@ PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt" PYLINT_INSTALL_INFO="$SPA

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Wheelhouse and VirtualEnv support

2016-09-07 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14180 I have written a blog post about this pull request to explain what we can do with it: http://www.great-a-blog.co/wheel-deployment-for-pyspark/

[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-09-06 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14918 At last, tests pass! :)

[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-09-06 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/13599 @zjffdu any news?

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r77542206 --- Diff: dev/lint-python --- @@ -26,30 +67,26 @@ PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt" PYLINT_INSTALL_INFO="$SPA

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14963#discussion_r77540952 --- Diff: python/pylintrc --- @@ -84,7 +84,85 @@ enable= # If you would like to improve the code quality of pyspark, remove any of these disabled

[GitHub] spark pull request #14963: [SPARK-16992][PYSPARK] Reenable Pylint

2016-09-05 Thread Stibbons
GitHub user Stibbons opened a pull request: https://github.com/apache/spark/pull/14963 [SPARK-16992][PYSPARK] Reenable Pylint Use a virtualenv for isolation and easy installation. This basically reverts 85a50a6352b72c4619d010e29e3a76774dbc0c71 Might have been a

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] PEP8 on Pyspark docu...

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r77530346 --- Diff: examples/src/main/python/ml/string_indexer_example.py --- @@ -22,6 +22,7 @@ # $example off$ from pyspark.sql import SparkSession

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] PEP8 on Pyspark docu...

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r77530312 --- Diff: examples/src/main/python/ml/tf_idf_example.py --- @@ -18,7 +18,7 @@ from __future__ import print_function # $example on$ -from

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] PEP8 on Pyspark docu...

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r77530245 --- Diff: docs/streaming-programming-guide.md --- @@ -1626,10 +1626,10 @@ See the full [source code]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] PEP8 on Pyspark docu...

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r77530186 --- Diff: docs/streaming-programming-guide.md --- @@ -1585,7 +1585,7 @@ public class JavaRow implements java.io.Serializable { /** DataFrame

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] PEP8 on Pyspark docu...

2016-09-05 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r77530179 --- Diff: docs/streaming-programming-guide.md --- @@ -1099,7 +1099,7 @@ joinedStream = stream1.join(stream2) {% endhighlight %} -Here

[GitHub] spark issue #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and import...

2016-09-05 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14567 I am not sure whether pylint has a check for the backslash syntax; there are some cases, such as the ```with``` statement, where the backslash might not be easily replaceable (see https
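
A minimal Python sketch of the `with` case mentioned above, using invented names (`first`, `second`, `combined`) and in-memory streams rather than anything from the pull request. Parenthesized context managers are only officially supported since Python 3.10, so on older interpreters a backslash is the usual way to break a multi-manager `with` line:

```python
import io

# Two context managers in one `with` statement. On Python versions before 3.10
# the header cannot be wrapped in parentheses, so the line break needs a backslash.
with io.StringIO("left") as first, \
        io.StringIO("right") as second:
    combined = first.read() + "-" + second.read()
```

Nesting two `with` blocks avoids the backslash, at the cost of an extra indentation level.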

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Wheelhouse and VirtualEnv support

2016-09-05 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14180 Hello. Can someone help to review this PR? I find the current way Spark handles Python programs really problematic. With this proposal (built on top of #13599), job deployment becomes so much

[GitHub] spark issue #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and import...

2016-09-01 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14567 Rebased. Update: - move ```.editorconfig``` up to the root of the project. This is needed so editor plugins will find it and configure both Scala and Python files. I didn't fo

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r77177492 --- Diff: .gitignore --- @@ -28,6 +28,7 @@ build/*.jar build/apache-maven* build/scala* build/zinc* +build/venv --- End diff

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14918#discussion_r77174968 --- Diff: python/pyspark/sql/tests.py --- @@ -196,7 +199,8 @@ def setUpClass(cls): cls.tempdir = tempfile.NamedTemporaryFile(delete=False

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14918#discussion_r77174898 --- Diff: python/pyspark/sql/session.py --- @@ -396,14 +398,18 @@ def _createFromLocal(self, data, schema): raise TypeError("schema s

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14918#discussion_r77174813 --- Diff: python/pyspark/sql/session.py --- @@ -373,16 +375,16 @@ def _createFromRDD(self, rdd, schema, samplingRatio): rdd = rdd.map

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14918#discussion_r77174714 --- Diff: python/pyspark/sql/session.py --- @@ -396,14 +398,18 @@ def _createFromLocal(self, data, schema): raise TypeError("schema s

[GitHub] spark pull request #14863: [SPARK-16992][PYSPARK] use map comprehension in d...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14863#discussion_r77165689 --- Diff: examples/src/main/python/ml/quantile_discretizer_example.py --- @@ -29,7 +29,7 @@ .getOrCreate() # $example on

[GitHub] spark pull request #14863: [SPARK-16992][PYSPARK] use map comprehension in d...

2016-09-01 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14863#discussion_r77165741 --- Diff: examples/src/main/python/ml/vector_slicer_example.py --- @@ -32,8 +32,8 @@ # $example on$ df = spark.createDataFrame

[GitHub] spark issue #14863: [SPARK-16992][PYSPARK] use map comprehension in doc

2016-09-01 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14863 No my proposal was wrong. I have updated it

[GitHub] spark issue #14863: [SPARK-16992][PYSPARK] use map comprehension in doc

2016-09-01 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14863 This is actually wrong, 'map()' returns a 'list' and not a dict
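
To illustrate the point about return types, here is a minimal sketch. The `pairs` data is hypothetical and not taken from the pull request. On Python 2, `map()` returns a list; on Python 3 it returns a lazy iterator. In neither case is the result a dict, whereas a dict comprehension builds one directly.

```python
# Hypothetical sample data, not taken from the pull request.
pairs = [("a", 1), ("b", 2)]

# map() never yields a dict: a list on Python 2, a lazy map object on Python 3.
mapped = map(lambda kv: (kv[0], kv[1] * 10), pairs)

# A dict comprehension builds the dict directly.
scaled = {key: value * 10 for key, value in pairs}

print(isinstance(mapped, dict))  # False
print(scaled == {"a": 10, "b": 20})  # True
```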

[GitHub] spark pull request #14918: [SPARK-17360][PYSPARK] Support generator in creat...

2016-09-01 Thread Stibbons
GitHub user Stibbons opened a pull request: https://github.com/apache/spark/pull/14918 [SPARK-17360][PYSPARK] Support generator in createDataFrame ## What changes were proposed in this pull request? Avoid useless iteration within 'data' structure when creating a
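
The description above is truncated, so the following is only a sketch of the general idea and not the change made in the pull request. It shows one way to inspect the first row of a generator without materializing the whole input. The helper `peek_first` and the sample generator `generate_rows` are hypothetical names and are not part of PySpark.

```python
from itertools import chain


def peek_first(rows):
    """Return (first_row, iterator over all rows) without materializing `rows`.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    iterator = iter(rows)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty input: nothing to inspect, nothing to iterate.
        return None, iter(())
    # Put the consumed row back in front of the remaining ones.
    return first, chain([first], iterator)


def generate_rows():
    # Hypothetical sample data produced lazily.
    for i in range(3):
        yield (i, str(i))


first, rows = peek_first(generate_rows())
```

After the call, `first` can be used for inspection (for example, to look at the row's types), and `rows` still yields every row exactly once.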

[GitHub] spark issue #14863: [SPARK-16992][PYSPARK] use map comprehension in doc

2016-09-01 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14863 I agree. I would prefer that the Spark examples also "promote" good Python practice, i.e., replacing 'map' and 'filter' with list or map comprehensions ('reduce&
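
As a small sketch of the practice mentioned in this comment, the snippet below writes the same computation with `map`/`filter` and with a list comprehension. The `numbers` list is hypothetical and chosen only for illustration.

```python
# Hypothetical input, chosen only for illustration.
numbers = [1, 2, 3, 4, 5, 6]

# Functional style: keep the even values, then square them.
functional = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

# Equivalent list comprehension.
comprehension = [n * n for n in numbers if n % 2 == 0]

print(functional == comprehension)  # True
```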

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-31 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r76968414 --- Diff: python/pep8rc --- @@ -0,0 +1,21 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-31 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r76967915 --- Diff: python/.editorconfig --- @@ -0,0 +1,30 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-31 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r76967736 --- Diff: dev/py-validate.sh --- @@ -0,0 +1,110 @@ +#!/usr/bin/env bash --- End diff -- My point of view: - don't enforce it right

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-31 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r76967333 --- Diff: dev/isort.cfg --- @@ -1,9 +1,9 @@ # Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements

[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Wheelhouse and VirtualEnv support

2016-08-30 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14180 Status for test "standalone install, 'client' deployment": - virtualenv create and pip install from PyPI repository: ok (1 min 30 exec) - wheelhouse (PyPI repository): ko, be

[GitHub] spark pull request #14863: [SPARK-16992][PYSPARK] use map comprehension in d...

2016-08-29 Thread Stibbons
GitHub user Stibbons opened a pull request: https://github.com/apache/spark/pull/14863 [SPARK-16992][PYSPARK] use map comprehension in doc The code is equivalent, but a map comprehension is usually faster than a call to map(). You can merge this pull request into a Git repository by

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] PEP8 on documentation exam...

2016-08-29 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r76608499 --- Diff: examples/src/main/python/als.py --- @@ -62,10 +62,10 @@ def update(i, mat, ratings): example. Please use pyspark.ml.recommendation.ALS

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] autopep8 on documentation example...

2016-08-29 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Cool, I wasn't sure of it. No problem, I can even split it into several PRs

[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] autopep8 on documentation example...

2016-08-29 Thread Stibbons
Github user Stibbons commented on the issue: https://github.com/apache/spark/pull/14830 Here is a new proposal. I've taken into account your remark, hope all $on/$off things are ok, and added some minor rework with the multiline syntax (I find using \ weird and inelegant,

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r76598828 --- Diff: examples/src/main/python/ml/aft_survival_regression.py --- @@ -17,9 +17,9 @@ from __future__ import print_function +from

[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14830#discussion_r76595360 --- Diff: examples/src/main/python/ml/binarizer_example.py --- @@ -17,9 +17,10 @@ from __future__ import print_function -from

[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-29 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/14567#discussion_r76582188 --- Diff: python/pep8rc --- @@ -0,0 +1,21 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements
