+1 (non-binding)

Kazu

> On Mar 1, 2024, at 5:44 PM, L. C. Hsieh <vii...@gmail.com> wrote:
> 
> +1 (binding)
> 
> On Fri, Mar 1, 2024 at 1:25 PM Joris Van den Bossche
> <jorisvandenboss...@gmail.com> wrote:
>> 
>> +1 (binding)
>> 
>> On Fri, 1 Mar 2024 at 22:18, Sutou Kouhei <k...@clear-code.com> wrote:
>>> 
>>> +1
>>> 
>>> In <CAFhtnRy2J9GCU6e2K56-KPVc=gawemuipeyhmnwcd+htkfa...@mail.gmail.com>
>>>  "[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project" 
>>> on Fri, 1 Mar 2024 06:33:08 -0500,
>>>  Andrew Lamb <al...@influxdata.com> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> As we have discussed[1][2] I would like to vote on the proposal to
>>>> create a new Apache Top Level Project for DataFusion. The text of the
>>>> proposed resolution and background document is copy/pasted below.
>>>> 
>>>> If the community is in favor of this, we plan to submit the resolution
>>>> to the ASF board for approval with the next Arrow report (for the
>>>> April 2024 board meeting).
>>>> 
>>>> The vote will be open for at least 7 days.
>>>> 
>>>> [ ] +1 Accept this Proposal
>>>> [ ] +0
>>>> [ ] -1 Do not accept this proposal because...
>>>> 
>>>> Andrew
>>>> 
>>>> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>>>> 
>>>> ---------- Proposed Resolution ---------
>>>> 
>>>> Resolution to Create the Apache DataFusion Project from the Apache
>>>> Arrow DataFusion Sub Project
>>>> 
>>>> =============================================================
>>>> 
>>>> X. Establish the Apache DataFusion Project
>>>> 
>>>> WHEREAS, the Board of Directors deems it to be in the best
>>>> interests of the Foundation and consistent with the
>>>> Foundation's purpose to establish a Project Management
>>>> Committee charged with the creation and maintenance of
>>>> open-source software related to an extensible query engine
>>>> for distribution at no charge to the public.
>>>> 
>>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>>> Committee (PMC), to be known as the "Apache DataFusion Project",
>>>> be and hereby is established pursuant to Bylaws of the
>>>> Foundation; and be it further
>>>> 
>>>> RESOLVED, that the Apache DataFusion Project be and hereby is
>>>> responsible for the creation and maintenance of software
>>>> related to an extensible query engine; and be it further
>>>> 
>>>> RESOLVED, that the office of "Vice President, Apache DataFusion" be
>>>> and hereby is created, the person holding such office to
>>>> serve at the direction of the Board of Directors as the chair
>>>> of the Apache DataFusion Project, and to have primary responsibility
>>>> for management of the projects within the scope of
>>>> responsibility of the Apache DataFusion Project; and be it further
>>>> 
>>>> RESOLVED, that the persons listed immediately below be and
>>>> hereby are appointed to serve as the initial members of the
>>>> Apache DataFusion Project:
>>>> 
>>>> * Andy Grove (agr...@apache.org)
>>>> * Andrew Lamb (al...@apache.org)
>>>> * Daniël Heres (dhe...@apache.org)
>>>> * Jie Wen (jake...@apache.org)
>>>> * Kun Liu (liu...@apache.org)
>>>> * Liang-Chi Hsieh (vii...@apache.org)
>>>> * Qingping Hou (ho...@apache.org)
>>>> * Wes McKinney (w...@apache.org)
>>>> * Will Jones (wjones...@apache.org)
>>>> 
>>>> RESOLVED, that the Apache DataFusion Project be and hereby
>>>> is tasked with the migration and rationalization of the Apache
>>>> Arrow DataFusion sub-project; and be it further
>>>> 
>>>> RESOLVED, that all responsibilities pertaining to the Apache
>>>> Arrow DataFusion sub-project encumbered upon the
>>>> Apache Arrow Project are hereafter discharged.
>>>> 
>>>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>>>> be appointed to the office of Vice President, Apache DataFusion, to
>>>> serve in accordance with and subject to the direction of the
>>>> Board of Directors and the Bylaws of the Foundation until
>>>> death, resignation, retirement, removal or disqualification,
>>>> or until a successor is appointed.
>>>> =============================================================
>>>> 
>>>> 
>>>> -------
>>>> 
>>>> 
>>>> Summary:
>>>> 
>>>> We propose creating a new top level project, Apache DataFusion, from
>>>> an existing sub project of Apache Arrow to facilitate additional
>>>> community and project growth.
>>>> 
>>>> Abstract
>>>> 
>>>> Apache Arrow DataFusion[1]  is a very fast, extensible query engine
>>>> for building high-quality data-centric systems in Rust, using the
>>>> Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
>>>> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>>>> and Avro, extensive customization, and a great community.
>>>> 
>>>> [1] https://arrow.apache.org/datafusion/
>>>> 
>>>> 
>>>> Proposal
>>>> 
>>>> We propose creating a new top level ASF project, Apache DataFusion,
>>>> governed initially by a subset of the Apache Arrow project’s PMC and
>>>> committers. The project’s code is in five existing git repositories,
>>>> currently governed by Apache Arrow which would transfer to the new top
>>>> level project.
>>>> 
>>>> Background
>>>> 
>>>> When DataFusion was initially donated to the Arrow project, it did not
>>>> have a strong enough community to stand on its own. It has since grown
>>>> significantly, benefiting immensely from being part of Arrow and from
>>>> the nurturing of the Apache Way, and now has a community that is strong
>>>> enough to stand on its own and would benefit from focused governance
>>>> attention.
>>>> 
>>>> The community has discussed this idea publicly for more than 6 months
>>>> https://github.com/apache/arrow-datafusion/discussions/6475  and
>>>> briefly on the Arrow PMC mailing list
>>>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
>>>> of the time of this writing both had exclusively positive reactions.
>>>> 
>>>> Several current members of the Arrow PMC are active contributors to
>>>> DataFusion, understand and believe deeply in the Apache Way, and
>>>> play active governance roles in the Arrow project as PMC members and
>>>> PMC chairs, guiding the community and releasing software versions.
>>>> With this existing governance experience and structure, the new top
>>>> level project will be able to function well immediately and
>>>> independently.
>>>> 
>>>> Overview of DataFusion
>>>> 
>>>> Current Status
>>>> 
>>>> Meritocracy
>>>> 
>>>> DataFusion has been developed as part of Apache Arrow and thus has
>>>> been operating as a meritocracy. Many of the developers of DataFusion
>>>> are Arrow PMC members or committers. The DataFusion project plans to
>>>> continue adding new PMC members and committers as the project matures and
>>>> grows.
>>>> 
>>>> Community
>>>> 
>>>> The DataFusion development team seeks to foster the development and
>>>> user communities. We hope that becoming a separate project will help
>>>> both Arrow and DataFusion communities by being more focused.  Focused
>>>> governance will make it easier to grow the community of committers and
>>>> PMC members and make the organization more clear to others.
>>>> 
>>>> Alignment
>>>> 
>>>> The ASF is a natural host for DataFusion given that it is already the
>>>> home of Arrow, Parquet, and other related distributed, storage,
>>>> and query execution systems.
>>>> 
>>>> Project Leadership
>>>> 
>>>> Proposed Initial PMC
>>>> 
>>>> We propose the following people as the initial DataFusion PMC members.
>>>> This is a subset of the existing Arrow PMC members who contribute to
>>>> DataFusion https://people.apache.org/phonebook.html?unix=arrow
>>>> 
>>>> Andy Grove (agrove):  Arrow PMC Chair
>>>> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>>>> Daniël Heres (dheres): Arrow PMC
>>>> Jie Wen (jakevin):  Arrow PMC, Doris Committer
>>>> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>>>> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>>>> Qingping Hou (houqp): Arrow PMC
>>>> Wes McKinney (wesm): Arrow PMC, ASF Member
>>>> Will Jones (wjones127): Arrow PMC
>>>> 
>>>> We’d like to propose Andrew Lamb as the initial Chair (and thus ASF
>>>> VP) for the DataFusion project.
>>>> 
>>>> Affiliations
>>>> 
>>>> Andy Grove (agrove):  NVidia
>>>> Andrew Lamb (alamb): InfluxData
>>>> Daniël Heres (dheres): Coralogix
>>>> Jie Wen (jakevin): SelectDB
>>>> Kun Liu (liukun): Ebay
>>>> Liang-Chi Hsieh (viirya): Apple
>>>> Qingping Hou (houqp): Scribd
>>>> Wes McKinney (wesm): Posit
>>>> Will Jones (wjones127): LanceDB
>>>> 
>>>> Proposed Initial Committers
>>>> 
>>>> In addition to the PMC, we propose the following people as the initial
>>>> DataFusion committers. This is a subset of the existing Arrow
>>>> committers who contribute to DataFusion
>>>> https://people.apache.org/phonebook.html?unix=arrow
>>>> 
>>>> akurmustafa Mustafa Akur (Synnada)
>>>> avantgardner Brent Gardner (Coralogix)
>>>> comphead Oleks V. (Unaffiliated)
>>>> jayzhan Jay Zhan (Unaffiliated)
>>>> jeffreyvo Jeffrey Vo (Unaffiliated)
>>>> jiayuliu Liu Jiayu (Airbnb)
>>>> mete Metehan Yildirim (Synnada)
>>>> mingmwang Wang Mingming (Ebay)
>>>> mneumann Marco Neumann (InfluxData)
>>>> nju_yaho Zhong Yanghong (Ebay)
>>>> ozankabak Mehmet Ozan Kabak (Synnada)
>>>> paddyhoran Paddy Horan (Assured Allies)
>>>> rdettai Rémi Dettai (Cloudfuse)
>>>> sunchao Chao Sun (Apple)
>>>> thinkharderdev Daniel Harris (Coralogix)
>>>> tustvold Raphael Taylor-Davies (InfluxData)
>>>> wayne Ruihang Xia (Greptime)
>>>> xudong963 Xudong Wang (ByteDance)
>>>> yjshen Yijie Shen (Space and Time)
>>>> yangjiang Yang Jiang (Ebay)
>>>> 
>>>> 
>>>> Risk Assessments
>>>> 
>>>> Naming / Trademarks
>>>> 
>>>> As a sub-project of Arrow, the DataFusion name has been used for over
>>>> 4 years without any known issues. A podling name search did not turn
>>>> up any concerns and was approved:
>>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>> 
>>>> Legal / IP Clearance
>>>> 
>>>> All DataFusion code has either been donated to the Arrow project with
>>>> appropriate IP clearance or  has been developed directly under ASF
>>>> processes and procedures. Thus creating a new top level project poses
>>>> no new Legal or IP risks.
>>>> 
>>>> Code Extraction
>>>> 
>>>> The relevant code is already in 5 separate repositories:
>>>> https://github.com/apache/arrow-datafusion/
>>>> https://github.com/apache/arrow-datafusion-python
>>>> https://github.com/apache/arrow-ballista
>>>> https://github.com/apache/arrow-ballista-python
>>>> https://github.com/apache/arrow-datafusion-comet
>>>> 
>>>> We foresee no issues with code extraction and propose these
>>>> repositories be renamed to reflect the new top level project.
>>>> 
>>>> Note:  https://github.com/apache/arrow-rs, the Rust implementation of
>>>> Arrow, would remain part of the Arrow project.
>>>> 
>>>> Orphaned Products
>>>> 
>>>> DataFusion is known to be used in many open source and commercial
>>>> projects 
>>>> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users,
>>>> has had multiple commits daily for several years, and its adoption and
>>>> number of contributors appear to be growing. We do not foresee the
>>>> project being orphaned in the next several years.
>>>> 
>>>> Inexperience with Open Source
>>>> 
>>>> The proposed PMC has extensive experience with Apache Arrow and other
>>>> Apache projects, and includes PMC members, PMC chairs and an ASF
>>>> Member. The DataFusion PMC and more experienced committers will
>>>> continue to coach new community members who may be less familiar with
>>>> the Apache Way.
>>>> 
>>>> Homogeneous Developers
>>>> 
>>>> The 9 proposed PMC members are from 9 different employers and the
>>>> proposed committers are similarly distributed across affiliations. No
>>>> specific entity employs more than 3 total proposed developers.
>>>> 
>>>> Reliance on Salaried Developers
>>>> 
>>>> A substantial amount of work on DataFusion has been by salaried
>>>> developers, but it also has a long tradition of attracting
>>>> contributions from students and hobbyists and we plan no changes in
>>>> contribution structure.
>>>> 
>>>> Relationships with Other Apache Products
>>>> 
>>>> DataFusion will obviously have a strong relationship with the Arrow
>>>> project given the overlap in people. We don’t foresee close
>>>> collaboration with other projects at this time.
>>>> 
>>>> Cryptography
>>>> 
>>>> DataFusion does not directly support encryption and there are no
>>>> near-term plans to add support for encryption. Users who need this
>>>> functionality can use the extension APIs.
>>>> 
>>>> Required Resources
>>>> 
>>>> Mailing Lists
>>>> 
>>>> - priv...@datafusion.apache.org for private PMC discussions (with
>>>> moderated subscriptions)
>>>> - d...@datafusion.apache.org
>>>> - comm...@datafusion.apache.org
>>>> - u...@datafusion.apache.org
>>>> 
>>>> Version Control
>>>> 
>>>> We propose to continue to use git for source control and github for
>>>> hosting and testing resources.
>>>> 
>>>> We also need to rename the github repositories to reflect the new top
>>>> level names:
>>>> 
>>>> https://github.com/apache/arrow-datafusion/ → apache/datafusion
>>>> https://github.com/apache/arrow-datafusion-python → 
>>>> apache/datafusion-python
>>>> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
>>>> https://github.com/apache/arrow-ballista-python  →
>>>> apache/datafusion-ballista-python
>>>> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
>>>> 
>>>> 
>>>> 
>>>> Issue Tracking
>>>> 
>>>> DataFusion would continue to use github for its issue tracking and
>>>> communications.
>>>> 
>>>> Other Resources
>>>> 
>>>> The existing repositories already make use of existing Apache
>>>> infrastructure, and we expect no change in the initial resource usage.
>>>> As the project continues to grow, we expect continued infrastructure
>>>> demand growth.
>>>> 
>>>> 
>>>> FAQ: Has a sub project been promoted to a top level project before?
>>>> 
>>>> Yes, and it appears to happen commonly. The Arrow project itself was
>>>> created as a top level project from work that started in Apache Drill,
>>>> and there are many sub projects of Hadoop that spun out as their own
>>>> top level projects such as Mahout, Avro and HBase:
>>>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>>>> 
>>>> 
>>>> 
>>>> Related material:
>>>> Name search request / research for DataFusion:
>>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>> Discussion about this proposal on the arrow mailing list:
>>>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>>> Discussion about which repositories on the arrow mailing list:
>>>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>>>> Discussion about initial PMC on the arrow mailing list:
>>>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>>>> Discussion in github about creating a new DataFusion top level
>>>> project: https://github.com/apache/arrow-datafusion/discussions/6475
>>>> Discussion about graduating on incubator list:
>>>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>>>> Original Proposal for the Arrow project:
>>>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
