Github user mortada commented on the issue:
https://github.com/apache/spark/pull/12229
@rxin so it seems like `DirectParquetOutputCommitter` has been removed with
Spark 2.0, is there a recommended replacement?
(I'm in the process of migrating from Spark 1.6 to 2.0)
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon thanks for your help! I'm happy to complete this PR and follow
what you suggest for testing.
How would the package level docstring work? The goal (which I think we all
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't
quite see how we could define the input DataFrame globally while at the same
time having clear, self-contained docs
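One standard-library way to reconcile those two goals (a sketch only, not necessarily what the PR adopted) is `doctest`'s `extraglobs` argument, which injects a shared input into every docstring's namespace once, at test time. The `head` function and data below are hypothetical:

```python
import doctest

def head(df, n=2):
    """Return the first ``n`` rows of ``df``.

    The example relies on ``df`` being injected into the doctest
    namespace, so each docstring stays short:

    >>> head(df)
    [('Alice', 2), ('Bob', 5)]
    """
    return df[:n]

if __name__ == "__main__":
    # Define the shared input once, for every doctest in the module.
    shared_input = [("Alice", 2), ("Bob", 5), ("Carol", 7)]
    results = doctest.testmod(extraglobs={"df": shared_input})
    print(results.failed)  # 0 when all examples pass
```

The trade-off is exactly the one raised above: the examples run, but a reader of the rendered docs still cannot see where `df` came from.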
Github user mortada commented on a diff in the pull request:
https://github.com/apache/spark/pull/15053#discussion_r80405104
--- Diff: python/pyspark/sql/functions.py ---
@@ -411,7 +415,7 @@ def monotonically_increasing_id():
The generated ID is guaranteed to be
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I may still be confused about something - first of all what do
you mean by the package level docstring? Do you mean here:
https://github.com/apache/spark/blob/master/python/pyspark/sql
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
I see, ok so you mean leave all the docstrings for the individual methods
unchanged, but instead just add
```
"""
>>> df.show()
+-
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
But that's what this PR is supposed to fix: the problem that the docstring
for each individual method is not self-contained :)
I think I now see where I was confused - it seems like w
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
In other words, currently it is not possible for the user to follow the
examples in this docstring below. It's not clear what all these input variables
(`df`, `df2`, etc) are, and where you
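The fix being argued for is that each example should construct its own input. A minimal toy illustration of the pattern (the function and data here are hypothetical, not the actual PySpark docstring):

```python
def rename_column(rows, old, new):
    """Rename a key in every row dict.

    The example builds its own input, so a reader can copy it verbatim
    instead of guessing where ``df`` came from:

    >>> df = [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}]
    >>> rename_column(df, "age", "age2")
    [{'age2': 2, 'name': 'Alice'}, {'age2': 5, 'name': 'Bob'}]
    """
    return [{(new if k == old else k): v for k, v in row.items()}
            for row in rows]
```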
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon do you know how I could run the doctests for these files? I
found this online:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals which says
that I could do
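For context, the convention in the PySpark source is that each `sql` module runs its own doctests when the file is executed directly, via a `_test()` helper that also sets up Spark. The mechanism underneath is plain `doctest.testmod()`, sketched here with a hypothetical toy function:

```python
import doctest

def upper(s):
    """
    >>> upper("spark")
    'SPARK'
    """
    return s.upper()

if __name__ == "__main__":
    # Executing the file directly runs its doctests; PySpark's sql modules
    # follow this pattern (their _test() helper additionally builds the
    # Spark context and globals before calling doctest.testmod()).
    failed, attempted = doctest.testmod()
    print(failed)  # 0 when every example passes
```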
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon my apologies for not having been able to follow up on this. I
still think this doc improvement would be very helpful to pyspark users. Would
you like to take over the PR
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/15053
[Doc] improve python API docstrings
## What changes were proposed in this pull request?
A lot of the Python API functions show example usage that is incomplete.
The docstring shows output
Github user mortada commented on a diff in the pull request:
https://github.com/apache/spark/pull/15053#discussion_r78480019
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1393,6 +1420,7 @@ def withColumnRenamed(self, existing, new):
:param existing: string, name of
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I've created a JIRA ticket and also went through all the files
you mentioned, please take a look
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/13639
@srowen I went through the docs, found a few more minor fixes
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/14253
[Doc] improve python doc for rdd.histogram
## What changes were proposed in this pull request?
doc change only
## How was this patch tested?
doc change only
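The bucketing rule that `rdd.histogram` documents can be sketched in plain Python (this toy runs locally; the real method runs distributed over the RDD and also accepts an integer bucket count):

```python
def histogram(values, buckets):
    """Count values into buckets defined by sorted boundaries.

    A value falls in bucket i when buckets[i] <= v < buckets[i+1];
    the last bucket is closed on the right, so buckets[-1] itself
    lands in the final bucket (matching RDD.histogram's semantics).
    """
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:  # right edge of the last bucket
            counts[-1] += 1
            continue
        for i in range(len(buckets) - 1):
            if buckets[i] <= v < buckets[i + 1]:
                counts[i] += 1
                break
    return counts

print(histogram([1, 5, 10, 15, 20], [0, 10, 20]))  # [2, 3]
```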
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/13587
[Documentation] fixed groupby aggregation example for pyspark
## What changes were proposed in this pull request?
fixing documentation for the groupby/agg example in python
## How
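The shape of a groupby/agg computation like the one in that example can be sketched in plain Python (a conceptual stand-in for `df.groupBy(...).agg(...)`; the names and data below are hypothetical, not the actual docstring being fixed):

```python
from collections import defaultdict

def group_by_agg(rows, key, agg_col):
    """Conceptual sketch of df.groupBy(key).agg({agg_col: "sum"}):
    collect each row's agg_col value by key, then reduce with sum."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[agg_col])
    return {k: sum(vals) for k, vals in groups.items()}

rows = [{"name": "Alice", "amount": 1},
        {"name": "Bob", "amount": 2},
        {"name": "Alice", "amount": 3}]
print(group_by_agg(rows, "name", "amount"))  # {'Alice': 4, 'Bob': 2}
```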
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/13639
[DOCUMENTATION] fixed typo
## What changes were proposed in this pull request?
minor typo
## How was this patch tested?
minor typo in the doc, should be self
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/13639
sure will do
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/9797
python3 compatibility for launching ec2 m3 instances
This currently breaks on Python 3 because the `string` module no longer has
`letters`; `ascii_letters` should be used instead.
You can
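The one-line nature of the fix can be shown directly. `string.letters` existed only in Python 2, while `string.ascii_letters` exists in both 2 and 3, so it is the portable spelling. The suffix-generating snippet is illustrative only, not the actual `spark_ec2.py` code:

```python
import random
import string

# Python 2 had string.letters; Python 3 removed it.
# string.ascii_letters works under both versions.
print(string.ascii_letters)  # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

# Hypothetical random identifier in the style such scripts generate:
suffix = "".join(random.choice(string.ascii_letters) for _ in range(8))
print(len(suffix))  # 8
```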
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/9797#issuecomment-157912292
@JoshRosen sure I just created a JIRA ticket here:
https://issues.apache.org/jira/browse/SPARK-11837
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/9797#issuecomment-158145798
really puzzled by the test results ... the failed test doesn't seem to have
anything to do with this PR
```
FAIL: test_update_state_by_key (__m
GitHub user mortada opened a pull request:
https://github.com/apache/spark/pull/10867
[SPARK-12760] [DOCS] invalid lambda expression in python example for …
…local vs cluster
@srowen thanks for the PR at https://github.com/apache/spark/pull/10866!
sorry it took me a
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/10867#issuecomment-173639832
@srowen it compiles for local, let me test that on a cluster
I noticed that the next line is actually also invalid python
```
In [7]: print
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/10867#issuecomment-173648674
@srowen I tested the python code in cluster mode (5 ec2 workers) and this
works fine
```
16/01/21 17:33:29 INFO BlockManagerMasterEndpoint: Registering
Github user mortada commented on a diff in the pull request:
https://github.com/apache/spark/pull/9797#discussion_r45396501
--- Diff: ec2/spark_ec2.py ---
@@ -591,11 +591,15 @@ def launch_cluster(conn, opts, cluster_name):
# AWS ignores the AMI-specified block device
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/9797#issuecomment-158270661
just updated the PR incorporating your comment, thanks!
Github user mortada commented on the pull request:
https://github.com/apache/spark/pull/9797#issuecomment-158695890
@JoshRosen Jenkins seemed to have failed again, but this PR should be good
to go