[GitHub] spark issue #12229: [SPARK-10063][SQL] Remove DirectParquetOutputCommitter

2016-08-22 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/12229 @rxin so it seems like `DirectParquetOutputCommitter` has been removed in Spark 2.0; is there a recommended replacement? (I'm in the process of migrating from Spark 1.6 to 2.0)
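A replacement is not named in this thread; one commonly suggested mitigation (an assumption here, not an answer from this PR) is enabling version 2 of Hadoop's `FileOutputCommitter` commit algorithm, which cheapens the final rename step that the direct committer was designed to avoid:

```shell
# Hedged sketch: the property name comes from Hadoop's FileOutputCommitter,
# not from this thread; my_job.py is a placeholder application.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  my_job.py
```

Note that algorithm version 2 trades some failure-recovery guarantees for speed, so whether it is an acceptable substitute depends on the job.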

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon thanks for your help! I'm happy to complete this PR and follow what you suggest for testing. How would the package level docstring work? The goal (which I think we all

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't quite see how we could define the input DataFrame globally while at the same time have a clear, self-contained docs

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-25 Thread mortada
Github user mortada commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r80405104 --- Diff: python/pyspark/sql/functions.py --- @@ -411,7 +415,7 @@ def monotonically_increasing_id(): The generated ID is guaranteed to be

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I may still be confused about something - first of all what do you mean by the package level docstring? Do you mean here: https://github.com/apache/spark/blob/master/python/pyspark/sql

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 I see, ok so you mean leave all the docstrings for the individual methods unchanged, but instead just add ``` """ >>> df.show() +-

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 But that's what this PR is supposed to fix, the problem that the docstring for each individual method is not self-contained :) I think I now see where I was confused - it seems like w

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 In other words, currently it is not possible for the user to follow the examples in this docstring below. It's not clear what all these input variables (`df`, `df2`, etc.) are, and where you
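The self-contained style under discussion can be illustrated with a plain-Python sketch (the function and names here are hypothetical, not taken from pyspark): each docstring builds its own input, so the example runs exactly as shown instead of depending on a `df` defined somewhere else.

```python
import doctest

def add_one(x):
    """Return x + 1.

    A self-contained doctest: the example constructs its own input
    inline, so a reader can copy and run it without extra setup.

    >>> values = [1, 2, 3]
    >>> [add_one(v) for v in values]
    [2, 3, 4]
    """
    return x + 1

# Run the docstring examples in this module; failed is 0 when they all pass.
failures = doctest.testmod().failed
print(failures)
```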

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-29 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon do you know how I could run the doctests for these files? I found this online: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals which says that I could do

[GitHub] spark issue #15053: [SPARK-18069][Doc] improve python API docstrings

2017-01-28 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon my apologies for not having been able to follow up on this. I still think this doc improvement would be very helpful to pyspark users. Would you like to take over the PR

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-11 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/15053 [Doc] improve python API docstrings ## What changes were proposed in this pull request? a lot of the python API functions show example usage that is incomplete. The docstring shows output

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread mortada
Github user mortada commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78480019 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1393,6 +1420,7 @@ def withColumnRenamed(self, existing, new): :param existing: string, name of

[GitHub] spark issue #15053: [SPARK-18069][Doc] improve python API docstrings

2016-10-23 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I've created a JIRA ticket and also went through all the files you mentioned, please take a look --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #13639: [DOCUMENTATION] fixed typos in python programming guide

2016-06-13 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/13639 @srowen I went through the docs, found a few more minor fixes

[GitHub] spark pull request #14253: [Doc] improve python doc for rdd.histogram

2016-07-18 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/14253 [Doc] improve python doc for rdd.histogram ## What changes were proposed in this pull request? doc change only ## How was this patch tested? doc change only
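For context on what the doc describes: `RDD.histogram(n)` with an integer argument computes `n` evenly spaced buckets between the minimum and maximum, with the last bucket inclusive of the maximum. A plain-Python sketch of those documented semantics (not Spark's actual implementation) might look like:

```python
def histogram(values, num_buckets):
    # Evenly spaced bucket edges between min and max; the last bucket
    # is closed on the right, so the maximum falls in the final bucket.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    edges = [lo + i * width for i in range(num_buckets + 1)]
    counts = [0] * num_buckets
    for v in values:
        i = min(int((v - lo) / width), num_buckets - 1) if width else 0
        counts[i] += 1
    return edges, counts
```

For example, `histogram([1, 2, 3, 4], 2)` yields edges `[1.0, 2.5, 4.0]` and counts `[2, 2]`.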

[GitHub] spark pull request #13587: [Documentation] fixed groupby aggregation example...

2016-06-09 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/13587 [Documentation] fixed groupby aggregation example for pyspark ## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How

[GitHub] spark pull request #13639: [DOCUMENTATION] fixed typo

2016-06-13 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/13639 [DOCUMENTATION] fixed typo ## What changes were proposed in this pull request? minor typo ## How was this patch tested? minor typo in the doc, should be self

[GitHub] spark issue #13639: [DOCUMENTATION] fixed typo

2016-06-13 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/13639 sure will do

[GitHub] spark pull request: python3 compatibility for launching ec2 m3 ins...

2015-11-17 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/9797 python3 compatibility for launching ec2 m3 instances this currently breaks for python3 because the `string` module doesn't have `letters` anymore; `ascii_letters` should be used instead You can
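The incompatibility is easy to demonstrate: `string.letters` existed only on Python 2, while `string.ascii_letters` is available on both 2 and 3, making it the portable spelling.

```python
import string

# Python 3 removed string.letters; ascii_letters is the portable name.
assert not hasattr(string, "letters")      # holds on Python 3
assert hasattr(string, "ascii_letters")    # holds on Python 2 and 3
print(string.ascii_letters[:6])            # 'abcdef'
```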

[GitHub] spark pull request: python3 compatibility for launching ec2 m3 ins...

2015-11-18 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/9797#issuecomment-157912292 @JoshRosen sure I just created a JIRA ticket here: https://issues.apache.org/jira/browse/SPARK-11837

[GitHub] spark pull request: [SPARK-11837] [EC2] python3 compatibility for ...

2015-11-19 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/9797#issuecomment-158145798 really puzzled by the test results ... the failed test doesn't seem to have anything to do with this PR ``` FAIL: test_update_state_by_key (__m

[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread mortada
GitHub user mortada opened a pull request: https://github.com/apache/spark/pull/10867 [SPARK-12760] [DOCS] invalid lambda expression in python example for … …local vs cluster @srowen thanks for the PR at https://github.com/apache/spark/pull/10866! sorry it took me a

[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173639832 @srowen it compiles for local, let me test that on a cluster I noticed that the next line is actually also invalid python ``` In [7]: print
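The exact doc line is truncated above, but the general issue is that a Python lambda body must be a single expression, so any statement inside one is a syntax error. A small illustration (names here are made up for the example):

```python
import ast

def parses(src):
    """Return True when src is syntactically valid Python."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

# Augmented assignment is a statement, so it cannot appear in a lambda.
assert not parses("f = lambda x: counter += x")
# A plain expression body is fine.
assert parses("f = lambda x: x + 1")
# (On Python 2, `lambda x: print(x)` also failed to parse, because print
# was a statement there; on Python 3 it is a function call and parses.)
```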

[GitHub] spark pull request: [SPARK-12760] [DOCS] invalid lambda expression...

2016-01-21 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/10867#issuecomment-173648674 @srowen I tested the python code in cluster mode (5 ec2 workers) and this works fine ``` 16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering

[GitHub] spark pull request: [SPARK-11837] [EC2] python3 compatibility for ...

2015-11-19 Thread mortada
Github user mortada commented on a diff in the pull request: https://github.com/apache/spark/pull/9797#discussion_r45396501 --- Diff: ec2/spark_ec2.py --- @@ -591,11 +591,15 @@ def launch_cluster(conn, opts, cluster_name): # AWS ignores the AMI-specified block device

[GitHub] spark pull request: [SPARK-11837] [EC2] python3 compatibility for ...

2015-11-19 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/9797#issuecomment-158270661 just updated the PR incorporating your comment, thanks!

[GitHub] spark pull request: [SPARK-11837] [EC2] python3 compatibility for ...

2015-11-21 Thread mortada
Github user mortada commented on the pull request: https://github.com/apache/spark/pull/9797#issuecomment-158695890 @JoshRosen Jenkins seemed to have failed again, but this PR should be good to go