Having worked on a major feature in a feature branch, I have some thoughts and observations on feature branch development.
IMO feature branch development v. piecemeal direct commits to trunk is really a choice of *granularity*. Do we want a series of fine-grained state changes on trunk or fewer coarse-grained chunks of commits on trunk? This makes me favor a branch-based development model for any "decent-sized" features (we'll need to define "decent-sized" of course). Once you have coarse-grained changes, it's easier to reason about what made what release and in what state. As importantly, it makes it fairly easy to back out a complete feature if that becomes necessary. My totally unscientific suggestion may be that if a feature takes more than a dozen commits and longer than a month, we should probably have a bias towards a feature branch.

Branch-based development also lets you go faster if your feature is larger. I wouldn't do it the other way for timeline service v.2, for example.

That said, feature branches don't come for free. Now the onus is on the feature developer to constantly rebase on trunk to keep the branch reasonably integrated with trunk. More logistics are involved for the feature developer. Another big question is, when a feature branch gets big and it's time to merge, would it get scrutinized as closely as a series of individual commits? Since the size of the merge can be big, you kind of have to rely on those feature committers and those who help them.

In terms of integrating/stabilizing, I don't think branch development necessarily makes it harder. It is again a matter of granularity. In the case of direct commits on trunk, you do a lot more fine-grained integrations. In the case of branch development, you do far fewer coarse-grained integrations via rebasing. If more people are doing branch-based development, it makes rebasing easier to manage too.

Going back to the related topic of where to release (trunk v. branch-X), I think that is more of a proxy for the real question of "how do we maintain quality and stability of the trunk?"
Even if we release from the trunk, if our bar for merging to trunk is low, the quality will not improve automatically. So I think we ought to tackle the quality question first.

My 2 cents.

On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <z...@apache.org> wrote:

> Thanks for the notes Andrew, Junping, Karthik.
>
> Here are some of my understandings:
>
> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
> Hadoop today, without legacy workloads, trunk is what he/she should use.
> - Therefore, each commit to trunk should be transactional -- atomic,
> consistent, isolated (from other uncommitted patches); I'm not so sure
> about durability, Hadoop might be gone in 50 years :). As a committer, I
> should be able to look at a patch and determine whether it's a
> self-contained improvement of trunk, without looking at other uncommitted
> patches.
> - Some comments inline:
>
> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <j...@hortonworks.com> wrote:
>
> > Comparing with advantages, I believe the disadvantages of shipping any
> > releases directly from trunk are more obvious and significant:
> > - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> > to wait to commit to trunk or put into a separated branch that could delay
> > feature development progress as additional vote process get involved even
> > the feature is simple and harmless.
> >
> Thanks Junping, those are valid concerns. I think we should clearly
> separate incompatible with uncompleted / half-done work in this
> discussion. Whether people should commit incompatible changes to trunk is a
> much more tricky question (related to trunk-incompat etc.). But per my
> comment above, IMHO, *not committing uncompleted work to trunk* should be a
> much easier principle to agree upon.
> >
> > - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> > will increase the bar largely for contributors who just start to contribute
> > on Hadoop features but no such sufficient support.
> >
> Development overhead is another valid concern. I think our rule-of-thumb
> should be that, small-medium new features should be proposed as a single
> JIRA/patch (as we recently did for HADOOP-12666). If the complexity goes
> beyond a single JIRA/patch, use a feature branch.
>
> >
> > Given these concerns, I am open to other options, like: proposed by Vinod
> > or Chris, but rather than to release anything directly from trunk.
> >
> > - This point doesn't necessarily need to be resolved now though, since
> > again we're still doing alphas.
> > No. I think we have to settle down this first. Without a common agreed and
> > transparent release process and branches in community, any release (alpha,
> > beta) bits is only called a private release but not a official apache
> > hadoop release (even alpha).
> >
> >
> > Thanks,
> >
> > Junping
> > ________________________________________
> > From: Karthik Kambatla <ka...@cloudera.com>
> > Sent: Friday, June 10, 2016 7:49 AM
> > To: Andrew Wang
> > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> > mapreduce-...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> > Subject: Re: [DISCUSS] Increased use of feature branches
> >
> > Thanks for restarting this thread Andrew. I really hope we can get this
> > across to a VOTE so it is clear.
> >
> > I see a few advantages shipping from trunk:
> >
> > - The lack of need for one additional backport each time.
> > - Feature rot in trunk
> >
> > Instead of creating branch-3, I recommend creating branch-3.x so we can
> > continue doing 3.x releases off branch-3 even after we move trunk to 4.x (I
> > said it :))
> >
> > On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > On a separate thread, a question was raised about 3.x branching and use of
> > > feature branches going forward.
> > >
> > > We discussed this previously on the "Looking to a Hadoop 3 release" thread
> > > that has spanned the years, with Vinod making this proposal (building on
> > > ideas from others who also commented in the email thread):
> > >
> > > http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
> > >
> > > Pasting here for ease:
> > >
> > > On an unrelated note, offline I was pitching to a bunch of
> > > contributors another idea to deal
> > > with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
> > >
> > > What this gains us is that
> > > - Trunk is always nearly stable or nearly ready for releases
> > > - We no longer have some code lying around in some branch (today’s
> > > trunk) that is not releasable
> > > because it gets mixed with other undesirable and incompatible changes.
> > > - This needs to be coupled with more discipline on individual
> > > features - medium to large
> > > features are always worked upon in branches and get merged into trunk
> > > (and a nearing release!)
> > > when they are ready
> > > - All incompatible changes go into some sort of a trunk-incompat
> > > branch and stay there till
> > > we accumulate enough of those to warrant another major release.
> > >
> > > Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> > > there's no need for this branch yet.
> > > This aspect of Vinod's proposal was
> > > still under a bit of discussion; Chris Douglas thought we should cut a
> > > branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> > > This point doesn't necessarily need to be resolved now though, since again
> > > we're still doing alphas.
> > >
> > > What we should get consensus on is the goal of keeping trunk stable, and
> > > achieving that by doing more development on feature branches and being
> > > judicious about merges. My sense from the Hadoop 3 email thread (and the
> > > more recent one on the async API) is that people are generally in favor of
> > > this.
> > >
> > > We're just about ready to do the first 3.0.0 alpha, so would greatly
> > > appreciate everyone's timely response in this matter.
> > >
> > > Thanks,
> > > Andrew