Hi Seth and David,

Thank you for your replies and suggestions. I'd like to share my thoughts
here:

The main motivation for refactoring the PyFlink docs is to make sure that
Python users can find everything they need starting from the PyFlink
documentation main page. That is, the PyFlink documentation should have a
catalogue that covers all the functionality available in PyFlink. However,
this doesn't mean that we will copy content from other parts of the
documentation. Where needed, an entry may simply be a reference/link to the
existing documentation. For documentation added under the PyFlink main page,
the principle is that it should include only Python-specific content, rather
than a copy of the Java content.

>> I'm concerned that this proposal duplicates a lot of content that will
>> quickly get out of sync. It feels like it is documenting PyFlink separately
>> from the rest of the project.

Regarding the concerns about maintainability: as mentioned above, the goal
of this FLIP is to provide an intelligible entry point to the Python API,
and its content should contain only information that is useful for Python
users. Many agenda items in this FLIP do mirror the Java documentation, but
that doesn't mean the content will be copied from it. Specifically:

  - If the content is the same as the corresponding Java document, we will
    add a link to the Java document, e.g. "Built-in functions" and "SQL".
  - If the content is partly shared with Java, we only create a page for the
    Python-only content and redirect to the Java document for the shared
    parts, e.g. "Connectors" and "Catalogs".
  - If a Python-only document already exists, we will move it from the old
    Python documentation to the new Python documentation, e.g.
    "Configurations".
  - If a Python-only document does not exist yet, we will create a new page
    for it, e.g. "DataTypes".

The main reason we create a new page for Python Data Types is that, although
Python Data Types correspond one-to-one with Java Data Types conceptually,
the actual documentation content would be very different. Some of the
detailed differences are as follows:

  - The text in the Java Data Types document is written for users of
    JVM-based languages, which makes it hard to understand for users who
    only know Python.

  - Currently, Python Data Types do not support the "bridgedTo" method,
    DataTypes.RAW, DataTypes.NULL, or user-defined types.

  - The sections "Planner Compatibility" and "Data Type Extraction" are only
    useful for Java/Scala users.

  - We want to add sections that apply only to Python, such as which Data
    Types are currently supported in Python, the mapping between DataTypes
    and Python object types, etc.

I think the root cause of this difference from the existing documents is
that Python is the first non-JVM language supported in Flink. This means the
approach we have used to share documents between Java and Scala may not be
suitable for Python, so we will adopt some quite different methods to
provide documentation for Python users. Of course, we should keep
maintenance costs as low as possible while ensuring a good user experience.
Furthermore, Python is the first step of Flink's multi-language support, and
there may be R, Go, etc. in the future. It is therefore necessary to have a
main page for each language, so that users of each language can focus on the
content they care about.

>> Things like the cookbook and tutorial should be under the Try Flink
>> section of the documentation.

Regarding the position of the "Cookbook" section: in my view, "Try Flink"
is for new users and the "Cookbook" is for more advanced users. That is,
"Try Flink" can contain the simplest end-to-end example, such as "Hello
World", while the "Cookbook" can contain use cases closer to production
business, such as CDN log analysis or PV/UV statistics for e-commerce. So I
prefer to keep the current structure.

>> it's relatively straightforward to compare the Python API with the Java
>> and Scala versions.

Regarding the comparison between the Python API and the Java/Scala API, I
think the majority of users, especially beginners, would not need this. From
my side, improving the experience for beginners seems to have a higher
priority. Could you please share more about why users want to compare the
APIs, and how much the comparison would be affected if it spans multiple
pages? :)

Thanks for all of your feedback and suggestions; any follow-up feedback is
welcome.

Best,

Jincheng


David Anderson <da...@alpinegizmo.com> wrote on Mon, Aug 3, 2020 at 10:49 PM:

> Jincheng,
>
> One thing that I like about the way that the documentation is currently
> organized is that it's relatively straightforward to compare the Python API
> with the Java and Scala versions. I'm concerned that if the PyFlink docs
> are more independent, it will be challenging to respond to questions about
> which features from the other APIs are available from Python.
>
> David
>
> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
>> It would be great if you could join the contribution to the PyFlink
>> documentation, @Marta!
>> Thanks for all of the positive feedback. I will start a formal vote
>> later...
>>
>> Best,
>> Jincheng
>>
>>
>> Shuiqiang Chen <acqua....@gmail.com> wrote on Mon, Aug 3, 2020 at 9:56 AM:
>>
>> > Hi jincheng,
>> >
>> > Thanks for the discussion. +1 for the FLIP.
>> >
>> > Well-organized documentation will greatly improve the efficiency and
>> > experience for developers.
>> >
>> > Best,
>> > Shuiqiang
>> >
>> > Hequn Cheng <he...@apache.org> wrote on Sat, Aug 1, 2020 at 8:42 AM:
>> >
>> >> Hi Jincheng,
>> >>
>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>> >>
>> >> I think this will bring big benefits for PyFlink users. Currently,
>> >> the Python Table API documentation is buried deep under the
>> >> Table API & SQL tab, which makes it quite hard to read. Also, the
>> >> PyFlink documentation is mixed with the Java/Scala documentation, so it
>> >> is hard for users to get an overview of all the PyFlink documents. As
>> >> more and more functionality is added to PyFlink, I think it's time for
>> >> us to refactor the documentation.
>> >>
>> >> Best,
>> >> Hequn
>> >>
>> >>
>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <ma...@ververica.com>
>> >> wrote:
>> >>
>> >>> Hi, Jincheng!
>> >>>
>> >>> Thanks for creating this detailed FLIP, it will make a big difference in
>> >>> the experience of Python developers using Flink. I'm interested in
>> >>> contributing to this work, so I'll reach out to you offline!
>> >>>
>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>> >>> it's great to see that there are already production users.
>> >>>
>> >>> Marta
>> >>>
>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <hxbks...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hi Jincheng,
>> >>> >
>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>> >>> >
>> >>> > Big +1 for improving the structure of PyFlink doc.
>> >>> >
>> >>> > Giving PyFlink users a unified entry point to the PyFlink
>> >>> > documentation will make learning PyFlink much friendlier.
>> >>> >
>> >>> > Best,
>> >>> > Xingbo
>> >>> >
>> >>> > Dian Fu <dian0511...@gmail.com> wrote on Fri, Jul 31, 2020 at 11:00 AM:
>> >>> >
>> >>> >> Hi Jincheng,
>> >>> >>
>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1 to
>> >>> >> improving the Python API doc.
>> >>> >>
>> >>> >> I have received a lot of feedback from PyFlink beginners about the
>> >>> >> PyFlink doc, e.g. there is too little material, the Python doc is
>> >>> >> mixed with the Java doc, and it's not easy to find the docs they want.
>> >>> >>
>> >>> >> I think it would greatly improve the user experience if we had one
>> >>> >> place that includes most of what PyFlink users need to know.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Dian
>> >>> >>
>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
>> >>> >>
>> >>> >> Hi folks,
>> >>> >>
>> >>> >> Since the release of Flink 1.11, the number of PyFlink users has
>> >>> >> continued to grow. As far as I know, many companies use PyFlink for
>> >>> >> data analysis and for operations and maintenance monitoring, and have
>> >>> >> put it into production (such as 聚美优品[1] (Jumei), 浙江墨芷[2]
>> >>> >> (Mozhi), etc.). According to the feedback we received, the current
>> >>> >> documentation is not very friendly to PyFlink users. There are two
>> >>> >> shortcomings:
>> >>> >>
>> >>> >> - Python-related content is mixed into the Java/Scala documentation,
>> >>> >>   which makes it difficult to read for users who only focus on
>> >>> >>   PyFlink.
>> >>> >> - There is already a "Python Table API" section in the Table API
>> >>> >>   documentation for PyFlink documents, but there are few articles and
>> >>> >>   the content is fragmented. It is difficult for beginners to learn
>> >>> >>   from it.
>> >>> >>
>> >>> >> In addition, FLIP-130 introduced the Python DataStream API, and many
>> >>> >> documents will be added for these new APIs. To improve the
>> >>> >> readability and maintainability of the PyFlink documentation, Wei
>> >>> >> Zhong and I have discussed this offline and would like to rework it
>> >>> >> via this FLIP.
>> >>> >>
>> >>> >> We will rework the documentation around the following three
>> >>> >> objectives:
>> >>> >>
>> >>> >> - Add a separate section for the Python API under the "Application
>> >>> >>   Development" section.
>> >>> >> - Restructure the current Python documentation into a brand-new
>> >>> >>   structure so that the content is complete and friendly to
>> >>> >>   beginners.
>> >>> >> - Improve the documents shared by Python/Java/Scala to make them more
>> >>> >>   friendly to Python users without affecting Java/Scala users.
>> >>> >>
>> >>> >> More details can be found in FLIP-133:
>> >>> >>
>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>> >>> >>
>> >>> >> Best,
>> >>> >> Jincheng
>> >>> >>
>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>> >>> >>
>> >>> >>
>> >>> >>
>> >>>
>> >>
>>
>
