Re: [PR] Add design process section to the docs [datafusion]

2025-06-16 Thread via GitHub


comphead merged PR #16397:
URL: https://github.com/apache/datafusion/pull/16397


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add design process section to the docs [datafusion]

2025-06-16 Thread via GitHub


alamb commented on code in PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#discussion_r2149689081


##
docs/source/contributor-guide/index.md:
##
@@ -108,6 +108,26 @@ Features above) prior to acceptance include:
 [extensions list]: ../library-user-guide/extensions.md
 [design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals
 
+### Design Build vs. Big Up Front Design
+
+Typically, the DataFusion community attacks large problems by solving them bit
+by bit and refining a solution iteratively on the `main` branch as a series of
+Pull Requests. This is different from projects which front-load the effort
+with a more comprehensive design process.
+
+By "advancing the front" we always make tangible progress, and the strategy is
+especially effective in a project that relies on individual contributors who may
+not have the time or resources to invest in a large upfront design effort.
+However, this "bit by bit approach" doesn't always succeed, and sometimes we get

Review Comment:
   I think the idea behind this sentence is to acknowledge the tradeoffs inherent in "design / build" vs "big design all upfront" (it is this tension that actually sparked the original comment in the first place).






Re: [PR] Add design process section to the docs [datafusion]

2025-06-14 Thread via GitHub


comphead commented on code in PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#discussion_r2147429880


##
docs/source/contributor-guide/index.md:
##
@@ -108,6 +108,26 @@ Features above) prior to acceptance include:
 [extensions list]: ../library-user-guide/extensions.md
 [design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals
 
+### Design Build vs. Big Up Front Design
+
+Typically, the DataFusion community attacks large problems by solving them bit
+by bit and refining a solution iteratively on the `main` branch as a series of
+Pull Requests. This is different from projects which front-load the effort
+with a more comprehensive design process.
+
+By "advancing the front" we always make tangible progress, and the strategy is
+especially effective in a project that relies on individual contributors who may
+not have the time or resources to invest in a large upfront design effort.
+However, this "bit by bit approach" doesn't always succeed, and sometimes we get

Review Comment:
   ```suggestion
   However, this "bit by bit approach" doesn't always succeed, and sometimes we get
   ```
   wondering if this is needed?






Re: [PR] Add design process section to the docs [datafusion]

2025-06-14 Thread via GitHub


comphead commented on code in PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#discussion_r2147429475


##
docs/source/contributor-guide/index.md:
##
@@ -108,6 +108,26 @@ Features above) prior to acceptance include:
 [extensions list]: ../library-user-guide/extensions.md
 [design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals
 
+### Design Build vs. Big Up Front Design
+
+Typically, the DataFusion community attacks large problems by solving them bit
+by bit and refining a solution iteratively on the `main` branch as a series of
+Pull Requests. This is different from projects which front-load the effort
+with a more comprehensive design process.
+
+By "advancing the front" we always make tangible progress, and the strategy is

Review Comment:
   ```suggestion
   By "advancing the front" the community always makes tangible progress, and the strategy is
   ```






Re: [PR] Add design process section to the docs [datafusion]

2025-06-13 Thread via GitHub


alamb commented on PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#issuecomment-2971466770

   > Relevant question to this text and the project is what the project's 
stance is wrt API stability? Merging fast means you're likely to ship something 
a little bit too quickly every now and then. I'm not saying it's a bad 
strategy, just wondering how you balance the tension between stability and 
velocity.
   
   I would say we "try not to do API churn but it happens every release". 
Indeed it has come up as a challenge for downstream users, though I would say 
it has been less of a challenge over the last 6 months or so. There is more detail 
here
   - https://github.com/apache/datafusion/issues/13525
   
   The policy is documented here: 
https://datafusion.apache.org/contributor-guide/api-health.html
   
   You can get a sense of the kinds of changes required by looking at 
https://datafusion.apache.org/library-user-guide/upgrading.html
   
   Basically at some point I expect users of DataFusion will care enough about 
non-breaking releases that they will want to help create a release vehicle 
that has stable APIs (e.g. backporting fixes to an LTS release)
   
   But until that happens we just keep cranking on the code and change APIs 
every month
   





Re: [PR] Add design process section to the docs [datafusion]

2025-06-13 Thread via GitHub


pepijnve commented on PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#issuecomment-2971445102

   Sorry to go a bit off topic for a sec, but there's some context I would like 
to add. I worked on API design of a commercial software library with tons of 
extension points for 10+ years where backwards compatibility of the public API 
was something we stuck to religiously because of the burden API breakage places 
on the entire user base. Doing that kind of work for an extended period of time 
makes you think three times about new API and all the hypothetical uses and 
abuses; perhaps a bit too much.
   
   Relevant question to this text and the project is what the project's stance 
is wrt API stability? Merging fast means you're likely to ship something a 
little bit too quickly every now and then. I'm not saying it's a bad strategy, 
just wondering how you balance the tension between stability and velocity.





Re: [PR] Add design process section to the docs [datafusion]

2025-06-13 Thread via GitHub


alamb commented on PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#issuecomment-2971392100

   > This is really nice, thanks @alamb!
   
   Thanks -- I was just channeling @ozankabak  :)





Re: [PR] Add design process section to the docs [datafusion]

2025-06-13 Thread via GitHub


alamb commented on code in PR #16397:
URL: https://github.com/apache/datafusion/pull/16397#discussion_r2145138376


##
docs/source/contributor-guide/index.md:
##
@@ -108,6 +108,26 @@ Features above) prior to acceptance include:
 [extensions list]: ../library-user-guide/extensions.md
 [design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals
 
+### Design Build vs. Big Up Front Design
+
+Typically, the DataFusion community attacks large problems by solving them bit
+by bit and refining a solution iteratively on the `main` branch as a series of
+Pull Requests. This is different from projects which front-load the effort
+with a more comprehensive design process.
+
+By "advancing the front" we always make tangible progress, and the strategy is
+especially effective in a project that relies on individual contributors who may
+not have the time or resources to invest in a large upfront design effort.
+However, this "bit by bit approach" doesn't always succeed, and sometimes we get
+stuck or go down the wrong path and then change directions.
+
+Our process necessarily results in imperfect solutions being the "state of the
+code" in some cases, and larger visions are not yet fully realized. However, the
+community is good at driving things to completion in the long run. If you see
+something that needs improvement or an area that is not yet fully realized,
+please consider submitting an issue or PR to improve it. We are always looking
+for more contributions.

Review Comment:
   Of course we always have to be 🎣 






[PR] Add design process section to the docs [datafusion]

2025-06-13 Thread via GitHub


alamb opened a new pull request, #16397:
URL: https://github.com/apache/datafusion/pull/16397

   ## Which issue does this PR close?
   
   - part of #7013 
   
   ## Rationale for this change
   
   While @pepijnve, @zhuqi-lucas, and I were discussing a design for 
cancellation, @ozankabak wrote a great summary of how the DataFusion community 
handles larger projects:
   - https://github.com/apache/datafusion/pull/16196#issuecomment-2956513724
   
   > Look, I see that you are trying to help and we do want to take it. I 
suspect we might be facing a "culture" challenge here: Typically, DF community 
attacks large problems by solving them bit by bit and refining a solution 
iteratively. This is unlike some other projects which front-load the effort by 
going through a more comprehensive design process. We also do that for some 
tasks where this iterative approach is not applicable, but it is not very 
common.
   > 
   > This "bit by bit approach" doesn't always succeed, every now and then it 
happens that we get stuck or go down the wrong path for a while, and then 
change tacks. However, we still typically prefer to "advance the front" and 
make progress in tangible ways as much as we can (if we see a way). This 
necessarily results in imperfect solutions being the "state of the code" in 
some cases, and they survive in the codebase for a while, but we are good at 
driving things to completion in the long run.
   
   
   I really liked that description and think it captures the current state 
of the project well, so it would be valuable to make it part of the docs.
   
   
   ## What changes are included in this PR?
   
   Add a description of the design process to the DataFusion docs site
   
   ## Are these changes tested?
   By CI
   
   ## Are there any user-facing changes?
   
   New docs

