Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
BTW, it'd be even nicer if the images in
https://github.com/apache/spark/pull/15053#issuecomment-249550254 were included
in the PR description, as we already have the images.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@mortada I see. Then,
- It'd barely increase the test time.
- It will improve the documentation.
- We have a reference for this, `numpy`.
Therefore, +1 for each docstring
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon do you know how I could run the doctests for these files? I
found this online:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals which says
that I could do
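(A hedged sketch for anyone trying the same thing, since the wiki text is cut
off above: the `pyspark.sql` modules wire their doctests into `__main__`
through a `_test()` function, so `./python/run-tests --modules=pyspark-sql`
runs them, and something like the standalone runner below should too, assuming
a Spark 2.x-era PySpark importable on `sys.path`.)

```python
# Hypothetical standalone runner, not taken from the wiki page above;
# it mirrors the _test() pattern used by pyspark/sql/dataframe.py.
import doctest

from pyspark import SparkContext
from pyspark.sql import Row, SQLContext
import pyspark.sql.dataframe as mod

sc = SparkContext("local[4]", "doctest-runner")
sqlContext = SQLContext(sc)

# The doctests expect predefined names such as `df` (the module's own
# _test() predefines several more); recreate them for doctest's globals.
globs = mod.__dict__.copy()
globs["sc"] = sc
globs["sqlContext"] = sqlContext
globs["df"] = sqlContext.createDataFrame(
    [Row(name="Alice", age=2), Row(name="Bob", age=5)])

failure_count, test_count = doctest.testmod(
    mod, globs=globs,
    optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
sc.stop()
print("ran %d tests, %d failures" % (test_count, failure_count))
```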
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Also, I guess this change will set an example and will affect all the Python
documentation, and I worry about disagreement from others in the future. So,
I'd rather set out a clear reason and investigation fu
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
IMHO, I think the original author intended to avoid creating a dataframe in
each method docstring. I am uncertain about doing it in each method docstring,
although I do prefer verbose/complete documentation
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
OK, I just manually fixed and built the docs.
- [**Original**](https://cloud.githubusercontent.com/assets/6477701/18833109/11098e5c-842a-11e6-9e48-6b3cae7c10b0.png)
- [**Suggest
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@mortada Otherwise, please let me try this locally and post some images here
myself, if you can bear with me for a few days :)
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@mortada yes, I understand your point. However, I'd like to avoid changing a
lot, and I guess @holdenk feels the same way (she is insightful in Python).
I guess it'd not take a lot of time
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
In other words, currently it is not possible for the user to follow the
examples in the docstring below. It's not clear what all these input variables
(`df`, `df2`, etc.) are, or where you'd even f
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
But that's what this PR is supposed to fix: the problem that the docstring
for each individual method is not self-contained :)
I think I now see where I was confused - it seems like we are a
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@mortada Exactly!
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
I see, ok so you mean leave all the docstrings for the individual methods
unchanged, but instead just add
```
"""
>>> df.show()
+-----+---+
| name|age|
+-----+---+
|Alice
```
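(Completing the shape of that idea as an illustrative sketch only; the rows
assume the Alice/Bob dataframe that `_test()` predefines, and the wording is
not from the PR.)

```python
"""
The examples in this module share a few predefined DataFrames; for
instance, `df` contains:

>>> df.show()
+-----+---+
| name|age|
+-----+---+
|Alice|  2|
|  Bob|  5|
+-----+---+
"""
```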
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Oh, maybe my previous comments were not clear. I meant
https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L18
> Also, is the idea that we would define df globally
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I may still be confused about something - first of all, what do
you mean by the package-level docstring? Do you mean here:
https://github.com/apache/spark/blob/master/python/pyspark/sql/
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Hm.. @mortada It seems the dataframe definition is added into each
docstring. Could you maybe remove the duplicates and just print the
contents in the package-level docstring?
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15053
@mortada what do you think of @HyukjinKwon 's suggestions?
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/15053
Oh yes I agree that's a much smaller change - I was just explaining the
motivation behind my initial comment as mortada asked me to elaborate.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@holdenk I am also cautious, but leaving everything as-is while adding
`df.show()` in the package docstring and cleaning up the duplicated dataframe
definitions in each docstring would be a minimal change and
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/15053
I was thinking that the user would probably read the package docstring
before looking at the individual functions (or if they went looking for the
definition of the dataframe). I'm a little
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Oh, I meant don't touch the `globs` in `_test()`, but just print the global
dataframes (rather, `show()` them to show the contents) so that users can
understand the input and output
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't
quite see how we could define the input DataFrame globally while at the same
time having a clear, self-contained docstring fo
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Hi @mortada, I am sorry for being a bit noisy here, but I just took a look
for myself.
I mimicked the PySpark structure and made a draft.
```python
"""
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
I haven't checked whether a package-level docstring can define global
variables that can be accessed from other docstrings. So, I would like to
defer this to @holdenk (if you are not sure either, then we
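(The question above can be checked without Spark; a minimal plain-Python
sketch: a docstring itself cannot define variables, but names passed to
`doctest` via `globs` are visible to every docstring in the module, while
names defined inside one doctest do not leak into others.)

```python
# Plain-Python demo of doctest `globs` sharing; no Spark needed.
import doctest

def f():
    """
    >>> shared  # provided via globs, not defined in this docstring
    42
    """

def g():
    """
    >>> shared + 1  # the same globs entry is visible here too
    43
    """

if __name__ == "__main__":
    # Prints TestResults(failed=0, attempted=2)
    print(doctest.testmod(globs={"shared": 42}))
```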
Github user mortada commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon thanks for your help! I'm happy to complete this PR and follow
what you suggest for testing.
How would the package level docstring work? The goal (which I think we all
agree o
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Hi @mortada, do you mind if I ask you to address my or @holdenk's comments?
If you find any problem with testing, I am willing to take this over, in which
case I will ask committers to credit it to you.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@holdenk Personally, I like the package-level docstring more if we can write
it pretty and well.
(If we are not sure about that, then I think we can maybe do this for each
method as a safe choice, though
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/15053
What if, for the shared DF, we also put the comment in the package-level
docstring, so people reading the documentation would see it first but we
wouldn't need to have it repeated for all of the
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@srowen - Thanks for your feedback. In my personal opinion, examples in
documentation should be self-contained, at least enough to allow copying and
pasting them to test, and to let users understand
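(For concreteness, a sketch of what a self-contained method example could
look like, with the input created inline instead of relying on a predefined
`df`; the method body and names are illustrative, not taken from the PR, and
`sqlContext` is still assumed to come from the test globals.)

```python
class DataFrame(object):
    def filter(self, condition):
        """Filters rows using the given condition.

        >>> df = sqlContext.createDataFrame([("Alice", 2), ("Bob", 5)],
        ...                                 ["name", "age"])
        >>> df.filter(df.age > 3).show()
        +----+---+
        |name|age|
        +----+---+
        | Bob|  5|
        +----+---+
        """
```

Readers can copy the example verbatim, at the cost of repeating the
`createDataFrame` line in every method, which is the duplication trade-off
discussed above.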
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15053
@HyukjinKwon I think you'd know better than I. I tend to agree it would
make the examples more self-contained and readable. It would lead to some more
duplication of code I guess, but if you judge it
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
@srowen Would it be sensible to remove all the predefined dataframes as
globals and convert them to be defined within each doctest, if this change
looks good?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
BTW - if you meant how they passed before, not how they ran, it seems some
dataframes were created before actually running the tests:
[dataframe.py#L1600-L1617](https://github.com/mortada/spark/b
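(For reference, a hedged sketch of that setup, paraphrased from memory rather
than copied from the linked lines: `_test()` builds a `globs` dict, predefines
the example DataFrames in it, and hands it to `doctest.testmod`, which is why
the docstrings could pass without defining `df` themselves.)

```python
def _test():
    import doctest
    from pyspark import SparkContext
    from pyspark.sql import Row, SQLContext
    import pyspark.sql.dataframe

    globs = pyspark.sql.dataframe.__dict__.copy()
    sc = SparkContext("local[4]", "PythonTest")
    globs["sc"] = sc
    globs["sqlContext"] = SQLContext(sc)
    # DataFrames are created up front, before any doctest runs:
    globs["df"] = globs["sqlContext"].createDataFrame(
        [Row(name="Alice", age=2), Row(name="Bob", age=5)])

    failure_count, test_count = doctest.testmod(
        pyspark.sql.dataframe, globs=globs,
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
    globs["sc"].stop()
    if failure_count:
        exit(-1)
```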
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/15053
Oh, it will run doctests as far as I know:
http://www.sphinx-doc.org/en/stable/ext/doctest.html
Maybe I will try running it locally to check it myself.
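(Context, hedged: Sphinx only executes `>>>` examples when the doctest
builder is used; a minimal `conf.py` sketch, assuming a standard Sphinx
layout, after which `make doctest` / `sphinx-build -b doctest` reports
pass/fail per document.)

```python
# conf.py (sketch): enable the extension so the doctest builder
# collects and runs the >>> examples embedded in the docs.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.doctest",
]
```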
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15053
Looks like the expected output doesn't match now. Maybe you all can teach
me something -- how did this docstring work as a test at all before? Or, if it
doesn't run correctly, is the output not checked
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15053
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15053
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65242/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15053
**[Test build #65242 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)**
for PR 15053 at commit
[`52240bc`](https://github.com/apache/spark/commit/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15053
**[Test build #65242 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)**
for PR 15053 at commit
[`52240bc`](https://github.com/apache/spark/commit/5
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15053
Jenkins test this please
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15053
Can one of the admins verify this patch?