Thank you for your positive feedback, Seth! Would you please vote in the voting mail thread? Thank you!
Best,
Jincheng

On Mon, Aug 10, 2020 at 10:34 PM Seth Wiesman <sjwies...@gmail.com> wrote:

> I think this sounds good. +1
>
> On Wed, Aug 5, 2020 at 8:37 PM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
>> Hi David,
>>
>> Thank you for sharing the problems with the current documentation. I
>> agree with you, as I have received the same feedback from Chinese users.
>> Users often contact me to ask whether PyFlink supports "Java UDF" or
>> whether PyFlink supports "xxxConnector". The root cause of these
>> problems is that our existing documents are written for Java users (text
>> and API are mixed together). Since Python was only added in 1.9, much of
>> the documentation is not friendly to Python users. They don't want to
>> look for Python content in unfamiliar Java documents. Just yesterday,
>> Chinese users complained, asking where all the document entries for the
>> Python API are. So a centralized entry and a clear document structure
>> are an urgent demand of Python users. The original intention of this
>> FLIP is to do our best to solve these user pain points.
>>
>> Hi Xingbo and Wei,
>>
>> Thank you for sharing PySpark's status on document optimization. You're
>> right. PySpark already has many Python users. They have also found that
>> the Python user community plays an important role in multi-language
>> support. Centralizing and unifying the Python document content will
>> reduce the learning cost for Python users, and a good document structure
>> and content will also reduce the Q&A burden on the community. It's a
>> once-and-for-all job.
>>
>> Hi Seth,
>>
>> I wonder if your concerns have been resolved through the previous
>> discussion?
>>
>> Anyway, the principle of the FLIP is that the Python documentation
>> should only include Python-specific content, instead of making a copy of
>> the Java content. It would be great to have you join in the improvement
>> of PyFlink (both PRs and PR reviews).
>>
>> Best,
>> Jincheng
>>
>> On Wed, Aug 5, 2020 at 5:46 PM Wei Zhong <weizhong0...@gmail.com> wrote:
>>
>>> Hi Xingbo,
>>>
>>> Thanks for your information.
>>>
>>> I think the redesign of the PySpark documentation deserves our
>>> attention. It seems that the Spark community has also begun to treat
>>> the user experience of the Python documentation more seriously. We can
>>> continue to follow the discussion and progress of the redesign in the
>>> Spark community. It is so similar to our work that there should be some
>>> ideas worth borrowing.
>>>
>>> Best,
>>> Wei
>>>
>>> On Aug 5, 2020, at 15:02, Xingbo Huang <hxbks...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I found that the Spark community has also recently been working on
>>> redesigning the PySpark documentation [1]. Maybe we can compare our
>>> document structure with theirs.
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>>
>>> Best,
>>> Xingbo
>>>
>>> On Wed, Aug 5, 2020 at 3:17 AM David Anderson <da...@alpinegizmo.com>
>>> wrote:
>>>
>>>> I'm delighted to see energy going into improving the documentation.
>>>>
>>>> With the current documentation, I get a lot of questions that I
>>>> believe reflect two fundamental problems with what we currently
>>>> provide:
>>>>
>>>> (1) We have a lot of contextual information in our heads about how
>>>> Flink works, and we are able to use that knowledge to make reasonable
>>>> inferences about how things (probably) work in cases we aren't so
>>>> familiar with. For example, I get a lot of questions of the form "If I
>>>> use <this feature> will I still have exactly once guarantees?" The
>>>> answer is always yes, but they continue to have doubts because we have
>>>> failed to clearly communicate this fundamental, underlying principle.
>>>>
>>>> This specific example about fault tolerance applies across all of the
>>>> Flink docs, but the general idea can also be applied to the Table/SQL
>>>> and PyFlink docs. The guiding principles underlying these APIs should
>>>> be written down in one easy-to-find place.
>>>>
>>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>>>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions
>>>> can be very difficult to answer because it is frequently the case that
>>>> one has to reason about why a given feature doesn't seem to appear in
>>>> the documentation. It could be that I'm looking in the wrong place, or
>>>> it could be that someone forgot to document something, or it could be
>>>> that it can in fact be done by applying a general mechanism in a
>>>> specific way that I haven't thought of -- as in this case, where one
>>>> can use a JDBC sink from Python if one thinks to use DDL.
>>>>
>>>> So I think it would be helpful to be explicit about both what is, and
>>>> what is not, supported in PyFlink. And to have some very clear
>>>> organizing principles in the documentation so that users can quickly
>>>> learn where to look for specific facts.
>>>>
>>>> Regards,
>>>> David
>>>>
>>>> On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <sunjincheng...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Seth and David,
>>>>>
>>>>> I'm very happy to have your reply and suggestions. I would like to
>>>>> share my thoughts here:
>>>>>
>>>>> Our main motivation for refactoring the PyFlink docs is to make sure
>>>>> that Python users can find everything they need starting from the
>>>>> PyFlink documentation main page. That is, the PyFlink documentation
>>>>> should have a catalogue which includes all the functionality
>>>>> available in PyFlink. However, this doesn't mean that we will copy
>>>>> content from other parts of the documentation.
>>>>> It may just be a reference/link to the other documentation if
>>>>> needed. For the documentation added under the PyFlink main page, the
>>>>> principle is that it should only include Python-specific content,
>>>>> instead of making a copy of the Java content.
>>>>>
>>>>> >> I'm concerned that this proposal duplicates a lot of content that
>>>>> will quickly get out of sync. It feels like it is documenting PyFlink
>>>>> separately from the rest of the project.
>>>>>
>>>>> Regarding the concerns about maintainability: as mentioned above, the
>>>>> goal of this FLIP is to provide an intelligible entrance to the
>>>>> Python API, and its content should only contain information that is
>>>>> useful for Python users. There are indeed many agenda items in this
>>>>> FLIP that duplicate the Java documents, but that doesn't mean the
>>>>> content would be copied from the Java documentation. That is:
>>>>>
>>>>> - If the content of a document is the same as the corresponding Java
>>>>> document, we will add a link to the Java document, e.g. "Built-in
>>>>> functions" and "SQL".
>>>>> - If only part of a document is shared with Java, we will create a
>>>>> page for the Python-only content and redirect to the Java document
>>>>> for the shared part, e.g. "Connectors" and "Catalogs".
>>>>> - If a document is Python-only and already exists, we will move it
>>>>> from the old Python documentation to the new Python documentation,
>>>>> e.g. "Configurations".
>>>>> - If a document is Python-only and does not exist yet, we will create
>>>>> a new page for it, e.g. "DataTypes".
>>>>>
>>>>> The main reason we create a new page for Python Data Types is that it
>>>>> corresponds one-to-one with Java Data Types only conceptually; the
>>>>> actual document content would be very different from Java DataTypes.
>>>>> Some detailed differences are as follows:
>>>>>
>>>>> - The text in the Java Data Types document is written for users of
>>>>> JVM-based languages, which is hard to follow for users who only know
>>>>> Python.
>>>>> - Currently, Python Data Types do not support the "bridgedTo" method,
>>>>> DataTypes.RAW, DataTypes.NULL and User Defined Types.
>>>>> - The sections "Planner Compatibility" and "Data Type Extraction" are
>>>>> only useful for Java/Scala users.
>>>>> - We want to add sections which may only apply to Python, such as
>>>>> which Data Types are currently supported in Python, the mapping
>>>>> between DataType and Python object type, etc.
>>>>>
>>>>> I think the root cause of such a difference from the existing
>>>>> documents is that Python is the first non-JVM language we support in
>>>>> Flink. This means our previous method of sharing documents between
>>>>> Java and Scala may not be suitable for Python. So we will adopt some
>>>>> very different methods to provide documentation for Python users. Of
>>>>> course, we should reduce maintenance costs as much as possible while
>>>>> ensuring a good user experience. Furthermore, Python is the first
>>>>> step of Flink's multi-language support, and there may be R, Go, etc.
>>>>> in the future. It is necessary for us to have a main page for each
>>>>> language, so that users of each language can focus on the content
>>>>> they care about.
>>>>>
>>>>> >> Things like the cookbook and tutorial should be under the Try
>>>>> Flink section of the documentation.
>>>>>
>>>>> Regarding the position of the "Cookbook" section: in my view, "Try
>>>>> Flink" is for new users and the "Cookbook" is for more advanced
>>>>> users. That is, "Try Flink" can hold the simplest end-to-end
>>>>> examples, such as "Hello World", and in the "Cookbook" we can add
>>>>> more use cases that are closer to production business, such as CDN
>>>>> log analysis or PV/UV for e-commerce.
>>>>> So I prefer to keep the current structure.
>>>>>
>>>>> >> it's relatively straightforward to compare the Python API with the
>>>>> Java and Scala versions.
>>>>>
>>>>> Regarding the comparison between the Python API and the Java/Scala
>>>>> API, I think the majority of users, especially beginners, would not
>>>>> have this need. From my side, improving the experience for beginners
>>>>> seems to have higher priority. Could you please say more about why
>>>>> users want to compare? How much would the comparison be affected if
>>>>> we put it on multiple pages? :)
>>>>>
>>>>> Thanks for all of your feedback and suggestions; any follow-up
>>>>> feedback is welcome.
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>>>> On Mon, Aug 3, 2020 at 10:49 PM David Anderson
>>>>> <da...@alpinegizmo.com> wrote:
>>>>>
>>>>>> Jincheng,
>>>>>>
>>>>>> One thing that I like about the way that the documentation is
>>>>>> currently organized is that it's relatively straightforward to
>>>>>> compare the Python API with the Java and Scala versions. I'm
>>>>>> concerned that if the PyFlink docs are more independent, it will be
>>>>>> challenging to respond to questions about which features from the
>>>>>> other APIs are available from Python.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun
>>>>>> <sunjincheng...@gmail.com> wrote:
>>>>>>
>>>>>>> It would be great if you could join in contributing to the PyFlink
>>>>>>> documentation, @Marta!
>>>>>>> Thanks for all of the positive feedback. I will start a formal vote
>>>>>>> later...
>>>>>>>
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>>
>>>>>>> On Mon, Aug 3, 2020 at 9:56 AM Shuiqiang Chen <acqua....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi Jincheng,
>>>>>>> >
>>>>>>> > Thanks for the discussion. +1 for the FLIP.
>>>>>>> >
>>>>>>> > Well-organized documentation will greatly improve the efficiency
>>>>>>> > and experience of developers.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Shuiqiang
>>>>>>> >
>>>>>>> > On Sat, Aug 1, 2020 at 8:42 AM Hequn Cheng <he...@apache.org>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >> Hi Jincheng,
>>>>>>> >>
>>>>>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>>>>>> >>
>>>>>>> >> I think this will bring big benefits for PyFlink users.
>>>>>>> >> Currently, the Python Table API document is hidden deep under
>>>>>>> >> the TableAPI&SQL tab, which makes it quite hard to find and
>>>>>>> >> read. Also, the PyFlink documentation is mixed with the
>>>>>>> >> Java/Scala documentation. It is hard for users to get an
>>>>>>> >> overview of all the PyFlink documents. As more and more
>>>>>>> >> functionality is added to PyFlink, I think it's time for us to
>>>>>>> >> refactor the documentation.
>>>>>>> >>
>>>>>>> >> Best,
>>>>>>> >> Hequn
>>>>>>> >>
>>>>>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira
>>>>>>> >> <ma...@ververica.com> wrote:
>>>>>>> >>
>>>>>>> >>> Hi, Jincheng!
>>>>>>> >>>
>>>>>>> >>> Thanks for creating this detailed FLIP, it will make a big
>>>>>>> >>> difference in the experience of Python developers using Flink.
>>>>>>> >>> I'm interested in contributing to this work, so I'll reach out
>>>>>>> >>> to you offline!
>>>>>>> >>>
>>>>>>> >>> Also, thanks for sharing some information on the adoption of
>>>>>>> >>> PyFlink, it's great to see that there are already production
>>>>>>> >>> users.
>>>>>>> >>>
>>>>>>> >>> Marta
>>>>>>> >>>
>>>>>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang
>>>>>>> >>> <hxbks...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> > Hi Jincheng,
>>>>>>> >>> >
>>>>>>> >>> > Thanks a lot for bringing up this discussion and the
>>>>>>> >>> > proposal.
>>>>>>> >>> >
>>>>>>> >>> > Big +1 for improving the structure of the PyFlink docs.
>>>>>>> >>> >
>>>>>>> >>> > Giving PyFlink users a unified entrance to the PyFlink
>>>>>>> >>> > documents will make them much easier to learn from.
>>>>>>> >>> >
>>>>>>> >>> > Best,
>>>>>>> >>> > Xingbo
>>>>>>> >>> >
>>>>>>> >>> > On Fri, Jul 31, 2020 at 11:00 AM Dian Fu
>>>>>>> >>> > <dian0511...@gmail.com> wrote:
>>>>>>> >>> >
>>>>>>> >>> >> Hi Jincheng,
>>>>>>> >>> >>
>>>>>>> >>> >> Thanks a lot for bringing up this discussion and the
>>>>>>> >>> >> proposal. +1 to improve the Python API docs.
>>>>>>> >>> >>
>>>>>>> >>> >> I have received a lot of feedback from PyFlink beginners
>>>>>>> >>> >> about the PyFlink docs, e.g. there is too little material,
>>>>>>> >>> >> the Python docs are mixed with the Java docs, and it's not
>>>>>>> >>> >> easy for them to find the docs they need.
>>>>>>> >>> >>
>>>>>>> >>> >> I think it would greatly improve the user experience if we
>>>>>>> >>> >> can have one place which includes most of the knowledge
>>>>>>> >>> >> PyFlink users need.
>>>>>>> >>> >>
>>>>>>> >>> >> Regards,
>>>>>>> >>> >> Dian
>>>>>>> >>> >>
>>>>>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun
>>>>>>> >>> >> <sunjincheng...@gmail.com> wrote:
>>>>>>> >>> >>
>>>>>>> >>> >> Hi folks,
>>>>>>> >>> >>
>>>>>>> >>> >> Since the release of Flink 1.11, the number of PyFlink users
>>>>>>> >>> >> has continued to grow. As far as I know, many companies have
>>>>>>> >>> >> used PyFlink for data analysis, and operations and
>>>>>>> >>> >> maintenance monitoring workloads have been put into
>>>>>>> >>> >> production (such as 聚美优品 [1] (Jumei), 浙江墨芷 [2] (Mozhi),
>>>>>>> >>> >> etc.). According to the feedback we received, the current
>>>>>>> >>> >> documentation is not very friendly to PyFlink users. There
>>>>>>> >>> >> are two shortcomings:
>>>>>>> >>> >>
>>>>>>> >>> >> - Python-related content is mixed into the Java/Scala
>>>>>>> >>> >> documentation, which makes it difficult to read for users
>>>>>>> >>> >> who only focus on PyFlink.
>>>>>>> >>> >> - There is already a "Python Table API" section in the Table
>>>>>>> >>> >> API document to store PyFlink documents, but the number of
>>>>>>> >>> >> articles is small and the content is fragmented. It is
>>>>>>> >>> >> difficult for beginners to learn from it.
>>>>>>> >>> >>
>>>>>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API.
>>>>>>> >>> >> Many documents will be added for those new APIs. In order to
>>>>>>> >>> >> increase the readability and maintainability of the PyFlink
>>>>>>> >>> >> documentation, Wei Zhong and I have discussed offline and
>>>>>>> >>> >> would like to rework it via this FLIP.
>>>>>>> >>> >>
>>>>>>> >>> >> We will rework the documentation around the following three
>>>>>>> >>> >> objectives:
>>>>>>> >>> >>
>>>>>>> >>> >> - Add a separate section for the Python API under the
>>>>>>> >>> >> "Application Development" section.
>>>>>>> >>> >> - Restructure the current Python documentation into a brand
>>>>>>> >>> >> new structure, to ensure the content is complete and
>>>>>>> >>> >> friendly to beginners.
>>>>>>> >>> >> - Improve the documents shared by Python/Java/Scala to make
>>>>>>> >>> >> them more friendly to Python users without affecting
>>>>>>> >>> >> Java/Scala users.
>>>>>>> >>> >>
>>>>>>> >>> >> More detail can be found in FLIP-133:
>>>>>>> >>> >>
>>>>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>>>>>> >>> >>
>>>>>>> >>> >> Best,
>>>>>>> >>> >> Jincheng
>>>>>>> >>> >>
>>>>>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>>>>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g