[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 BTW, it'd be even nicer if the images in https://github.com/apache/spark/pull/15053#issuecomment-249550254 are included in the PR description as we have the images already. --- If your project

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @mortada I see. Then, - It'd not almost increase the test time. - It will improve the documentation. - We have a reference for this, `numpy` Therefore, +1 for each docstr

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-29 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon do you know how I could run the doctests for these files? I found this online: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals which says that I could do

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Also, I guess this change will be an example and will affect all the Python documentation and worry disagreement from others in the future. So, I'd rather set a clear reason and investigation fu

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 IMHO, i think the original author intended to avoid creating each dataframe in each method docstring. I am uncertain about doing each method docstring although I do prefer verbose/complete docum

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 OK, I just manually fixed and built the docs. - [**Original**](https://cloud.githubusercontent.com/assets/6477701/18833109/11098e5c-842a-11e6-9e48-6b3cae7c10b0.png) - [**Suggest

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @mortada Otherwise, let me please try this in my local and post some images here by myself if you can bear with few days :) --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @mortada yes, I understand your point. However, I'd like to avoid change a lot and I guess @holdenk feels in the same way (she is insightful in Python). I guess it'd not take a lot of ti

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 In other words, currently it is not possible for the user to follow the examples in this docstring below. It's not clear what all these input variables (`df`, `df2`, etc) are, and where you'd even f

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 But that's what this PR is supposed to fix, the problem that the docstring for each individual method is not self-contained :) I think I now see where I was confused - it seems like we are a

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @mortada Exactly! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 I see, ok so you mean leave all the docstrings for the individual methods unchanged, but instead just add ``` """ >>> df.show() +-+---+ | name|age| +-+---+ |Alice

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Oh, maybe my previous comments were not clear. I meant https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L18 > Also, is the idea that we would define df globall

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I may still be confused about something - first of all what do you mean by the package level docstring? Do you mean here: https://github.com/apache/spark/blob/master/python/pyspark/sql/

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Hm.. @mortada It seems the dataframe definition is added into each docstring. Could you maybe remove the duplicates and then just print the contents in the package level docstring? --- If your

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-25 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15053 @mortada what do you think of @HyukjinKwon 's suggestions? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not hav

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-19 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15053 Oh yes I agree that's a much smaller change - I was just explaining the motivation behind my initial comment as mortada asked me to elaborate. --- If your project is set up for it, you can reply to

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @holdenk I am also cautious but leaving everything but adding `df.show()` in the package docstring with cleaning up duplicated defining dataframes in each docstring will be minimal change and

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15053 I was thinking that the user would probably read the package string documentation before looking at the individual functions (or if they went looking for the definition of the dataframe). I'm a litt

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Oh, I meant just don't touch the `globs` in `_test()` but just print the global dataframes (which should be rather `show()` to show the contents) so that users can understand the input and outpu

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I understand we can have `py.test` and `doctest`, but I don't quite see how we could define the input DataFrame globally while at the same time have a clear, self-contained docstring fo

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Hi @mortada , I am sorry that I am a bit being noisy here but I just took a look for myself. I resembled the PySpark structure and made a draft for myself. ```python """

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 I haven't checked if package level docstring can define global level variables to be accessed from other docstrings. So, I would like to defer this to @holdenk (if you are not sure too, then we

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread mortada
Github user mortada commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon thanks for your help! I'm happy to complete this PR and follow what you suggest for testing. How would the package level docstring work? The goal (which I think we all agree o

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Hi @mortada, do you mind if I ask to address mine or @holdenk' comment? If you find any problem with testing, I am willing to take over this which I will ask comitters to credit this to you. -

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @holdenk Personally, I like package level docstring more if we can write them pretty and well. (If we are not sure on that, then, I think we can maybe do this for each as a safe choice thoug

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-15 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15053 What about if for the shared DF we also put the comment in the package level docstring so people reading the documentation would see it first but we wouldn't need to have it repeated for all of the

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @srowen - Thanks for your feedback. In my personal opinion, I believe examples in documentation should be self-contained, at least, to allow copy and paste them to test and let users understand

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-15 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15053 @HyukjinKwon I think you'd know better than I. I tend to agree it would make the examples more self-contained and readable. It would lead to some more duplication of code I guess, but if you judge it

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @srowen Would this sensible if we remove all the predefined dataframes as globals and convert them to within each doctest if this change looks good? --- If your project is set up for it, you ca

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 BTW - If you meant how they passed before not how they ran, it seems some dataframes were created before actually running the tests, [dataframe.py#L1600-L1617](https://github.com/mortada/spark/b

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Oh, it will run doc tests as far as I know, http://www.sphinx-doc.org/en/stable/ext/doctest.html Maybe I will try to run it locally to check it by myself. --- If your project is set up

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15053 Looks like the expected output doesn't match now. Maybe you all can teach me something -- how did this docstring work as a test at all before? or if it doesn't run correctly is the output not checked

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15053 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature e

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65242/ Test FAILed. ---

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15053 **[Test build #65242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)** for PR 15053 at commit [`52240bc`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15053 **[Test build #65242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)** for PR 15053 at commit [`52240bc`](https://github.com/apache/spark/commit/5

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15053 Jenkins test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15053 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat