CVS commit: othersrc/external/bsd/bikeshed/dist/design

David A. Holland Thu, 23 May 2013 17:41:45 -0700

Module Name:    othersrc
Committed By:   dholland
Date:           Fri May 24 00:41:31 UTC 2013


Added Files:
        othersrc/external/bsd/bikeshed/dist/design: requirements.txt

Log Message:
Stuff distilled from my notes and previous arguments and bikeshed sessions


To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 \
    othersrc/external/bsd/bikeshed/dist/design/requirements.txt

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Added files:

Index: othersrc/external/bsd/bikeshed/dist/design/requirements.txt
diff -u /dev/null othersrc/external/bsd/bikeshed/dist/design/requirements.txt:1.1
--- /dev/null	Fri May 24 00:41:31 2013
+++ othersrc/external/bsd/bikeshed/dist/design/requirements.txt	Fri May 24 00:41:31 2013
@@ -0,0 +1,340 @@
+The material herein is grouped first by topic and then by priority.
+
+------------------------------------------------------------
+
+1. Operational model
+
+- Centralized operation with one master tree
+- Supports disconnected operation
+- No compare-by-hash
+- Native support for synced slave copies of the master tree (like anoncvs)
+- Transport-independent remote operation, supporting both http/https
+  and ssh
+- Checkouts can cache arbitrary amounts of history locally but are not
+obliged to clone everything
+- Non-committers with readonly checkouts should be able to package
+changesets for review and commit by committers.
+
+Rationale:
+
+Centralized operation is posed as a design requirement because it's a
+prerequisite for other things... and because this whole project is
+predicated on the assumption that centralized operation is acceptable.
+If someone comes up with a clever way to support distributed operation
+without compromising other requirements, well and good; otherwise one
+may as well use one of the modern distributed version control systems.
+
+Disconnected operation, meanwhile, covers most of the use cases people
+cite in favor of distributed version control.
+
+Compare-by-hash is bad not because it's slightly sleazy, or because
+the statistical assumptions about the probability of collisions are
+wrong (although in some contexts they're questionable) -- it's because
+cryptographic hash functions don't age well and the standard DVCS
+scheme for hashing chains of versions doesn't provide any decent way
+to migrate an existing repository to a new hash function.
+
+Native support for synced slave copies is needed in order to be able
+to provide anonymous access (like anoncvs) without needing access to
+the master tree. This is also meant to satisfy the use cases where
+currently people rsync the whole CVS repository locally.
+
+Transport-independent remote operation should be a no-brainer, but
+even many recent systems have felt the need to make up their own
+protocols and network-level constructs.
+
+Easier collaboration with non-committers is an often-requested
+feature and a de facto property of distributed version control.
+
+Unanswered questions:
+
+- Do we need support for disconnected operation by more than one user
+at a time (or perhaps more than one tree at a time) so that
+uncommitted changesets can be shared? The non-committer changeset
+support might cover this territory adequately, or not, depending on
+how it ends up working.
+
+
+2. Schema
+
+- Supports arbitrary (smallish) metadata attached to changesets, and
+also to files and directories
+- Metadata (including on old versions) is mutable and changes are kept
+in history (this includes commit message text)
+- Provides provenance tracking for changesets/commits
+- Commits/changesets are atomic
+- Version numbers (for projects, files, and subtrees if any) are
+sequential.
+- Supports rename (of files or dirs) properly, and file history
+crosses renames transparently
+- Supports copy/duplicate (of files or dirs) properly
+- Has a coherent semantic model of tree history
+- Supports local-only changes that are not pushed back to the master
+tree
+
+Rationale:
+
+While arbitrary metadata is a nuisance to support (compared to a small
+fixed metadata schema) and in many cases using this metadata facility
+(as opposed to storing information in an ordinary file in the
+repository) would be a mistake, it is nonetheless useful for various
+purposes. One of these is preserving old version numbers from a
+repository conversion; given the large number of references to NetBSD
+CVS file version numbers, including in places like security advisories
+that count as "important", preserving this information and making it
+searchable is highly desirable.
+
+Metadata should be mutable because sometimes it contains errors. One
+of the big weaknesses of current distributed version control is that
+effectively all metadata is immutable once committed; this means any
+botch not immediately detected is graven in stone for all time, unless
+someone does a complete repository rebuild updating all subsequent
+versions. Meanwhile, keeping the history should be a no-brainer.
+
+Provenance tracking for commits is important for two reasons:
+maintaining proper credit/attribution (which can involve legalities
+via copyright as well as propriety) and also making sure that bogus
+changesets cannot be introduced. In distributed version control
+systems this becomes complicated and either requires an elaborate
+solution (e.g. in monotone) or giving up on the problem entirely (e.g.
+in mercurial). For a centralized system it is much easier but still
+important, especially given tools for applying changesets that
+originate from non-developers.
+
+Changesets need to be atomic. Non-atomic changesets are a stupid design
+flaw of CVS that we should certainly not perpetuate.
+
+Version numbers need to be sequential so it's possible to tell easily
+if a version you have contains a particular change or fix... and in
+particular, tell easily without having to cut and paste a hash code
+and go ask the version control system. You should be able to tell at a
+glance from running ident on a binary whether it needs to be replaced
+with a fixed version or not.
+
+CVS doesn't support rename. We desperately need rename support because
+large sections of the NetBSD source tree are in serious need of
+organizational cleanup. File history should cross renames because if
+you're looking at the history of a particular file, you shouldn't have
+to stop and go search something else just because someone moved the
+file around. This sounds like a no-brainer but a lot of "modern"
+version control systems don't really get it right. Note that rename is
+not semantically equivalent to copy and delete.
+
+Likewise, duplicating a file is an action that should be explicitly
+recorded; the support for sideways change propagation (below) requires
+this.
+
+A coherent semantic model of tree history is required in order to do
+merges of changesets that reorganize the tree. Many "modern" version
+control systems don't really get this right.
+
+Support for local-only changes is highly desirable if you're carrying
+local modifications; it's effectively the same as keeping private
+changes as uncommitted modifications in your working tree, except with
+more structure, proper history, and a way to explicitly make sure the
+changes don't get committed by accident.
+
+
+3. Branches and branch management
+
+- Supports lightweight branches / multiple heads
+- Supports full/named branches
+- Supports something like hg bookmarks to keep git users happy
+- Distinguishes branches intended to diverge from those intended to be
+folded back in later
+- Allows enforcing a graph of branch relationships
+- Keeps track of which changesets from parallel branches have been
+pulled in/merged across (including instances of separate but
+equivalent changes)
+- Also keeps track of which changesets from parallel branches have
+been considered and rejected
+- Supports this same form of sideways change propagation for files
+that have been duplicated
+- Supports hyper-branches (preferably)
+- Supports local-only branches that are not pushed back to the master
+tree
+- Allows accumulating small local changes into a single upstream
+commit that neither loses the individual change history nor forces
+other users to wade through it except by choice
+- Maybe, support for local patch queues
+
+Rationale:
+
+Lightweight branches (that is, if you commit a change based on an
+older version you just get another head) are necessary for
+disconnected operation. These occur and get merged on short timescales
+as a routine matter during development.
+
+"Real" branches (branches with names that have metadata and tracking
+information and so on) are also required, for releases and for
+development of major features and so forth.
+
+Mercurial was forced to add "bookmarks" to keep git users happy; a lot
+of git users apparently don't understand anything besides git's insane
+branch semantics and aren't interested in learning or understanding
+what they're doing. We will need something like this too, in all
+probability (and it's a useful feature) so it may as well get designed
+in up front.
+
+Branches that are intended to diverge (releases, for example, or
+outright project forks) are fundamentally different from branches that
+are expected to reconnect to their parent (e.g. feature development
+branches) once the version control system has any kind of branch
+management or tracking support.
+
+If you have a lot of branches that are supposed to exist with certain
+relationships to one another, it's fairly easy to accidentally break
+this structure by merging with the wrong other branch; and if you do,
+backing out of the resulting mess can be quite a nuisance. Therefore,
+it should be possible to declare the intended structure and have the
+system reject accidental attempts to violate that structure. (Note:
+NetBSD may not need this. dholland specifically wants it and will put
+in the work to get it.)
+
+No existing version control system keeps track of which changesets
+from branch A have and have not been pulled in to branch B, or is
+capable of listing the ones that haven't been considered yet for
+possible action. There is absolutely no reason, however, that the
+version control system shouldn't be able to provide this information.
+AIUI, for release branches releng currently has to maintain this
+metadata by hand.
+
+If you duplicate a file, such as cloning a device driver template file
+for a new driver, or starting a new pmap by copying an old one,
+usually bug fixes applied to the original version should also be
+propagated to the clone. The same kind of changeset tracking just
+described for branches should be available for duplicated files, to
+make sure this gets done and to allow easily keeping track of where it
+has and hasn't been done.
+
+By "hyper-branches", I mean a branch of the entire repository state,
+including branches. (I have a vague recollection that somebody else
+may be using the term "hyper-branches" for something else, in which
+case we need new terminology.) This is, for example, something you
+might want if you have two parallel versions of a project (e.g. a free
+and pay version) and maintain those as branches, but then also want to
+be able to take release branches of both at once. I have no idea at
+the moment if there's a use case for hyper-hyper-branches (that is,
+branches of hyper-branches) or not. (Note: NetBSD does not need this.
+dholland specifically wants it for another project and is willing to
+put in a good deal of work to get it.)
+
+Local-only branches have the same rationale as local-only changesets.
+
+Merging cumulative local commits into a single upstream commit makes
+it possible to commit very early and very often (which is very useful
+if you ever need to bisect later) without deluging other developers on
+the project with a flood of tiny commits they don't care about in
+detail. However, because you want to maintain the individual changes
+in the master repository (to support that bisecting) but don't want to
+show them by default, there needs to be explicit support for dividing
+changesets into subchangesets and an explicit way to expand them when
+viewing history. No existing system can do this; many can do something
+similar, but in all cases I know of this either throws away the
+fine-grained history or makes everybody wade through it afterwards.
+
+Local patch queues (like mq in mercurial) are a useful way of
+maintaining private changes and/or preparing batch commits. It is
+probable that most of the use cases are subsumed by other features
+(local-only commits, cumulative commits, etc.) and we don't also need
+patch queues. Given the branch graph structure feature described
+above, even the use case of preparing patchkits for third-party trees
+may be better done with branches, although it might be worthwhile to
+arrange a way to do branch push/pop in a way akin to patch push/pop.
+
+
+4. Implementation
+
+- Written in C
+- Doesn't depend on anything other than standard system libs
+- Decently fast
+- Scales to large trees with deep history
+- Supports inotify/kqueue/whatnot for monitoring large checkout trees
+- Install doesn't spew tons of crap all over everywhere
+- Has an interface for plugins and/or extensions
+
+Rationale:
+
+Writing in C (or perhaps C++ but C++ is not really a sane choice of
+language) with no major deps is a requirement for importing into base,
+where the tool used to manage the NetBSD source tree should be found.
+
+Being decently fast is necessary to avoid driving users crazy. Scaling
+is necessary for use on/in NetBSD.
+
+The major performance bottleneck for most systems on large trees is
+scanning the tree for files that have been modified. This inevitably
+takes as long as doing find . -ls, and on a tree the size of NetBSD's
+source tree that takes a while even when the whole tree fits in RAM.
+Many recent tools have a gizmo that starts a daemon using inotify or
+similar to monitor the working tree in the background; then the
+explicit search can be avoided and things become much faster.
+
+A tidy install is desirable for a number of reasons (integration into
+base being one of them) and should not be a major problem.
+
+We want some kind of plugin/extension interface because, at a minimum,
+there are probably some graphic tools that should be available and
+they can't be part of the base install of either this program or
+NetBSD.
+
+
+5. User interface
+
+- Clean, small command set
+- No weird semantics
+- Search support for metadata (including/also change messages) as well
+as searching file contents
+
+Rationale:
+
+All of this is pretty much obvious. By "no weird semantics" I mean
+anything from oddities like mercurial's tags to core design mistakes
+like git's branches... or things in between like subversion's
+branches; anything that violates the principle of least surprise or
+that requires lengthy explanation/justification for why it doesn't
+behave the way a reasonable person would expect.
+
+
+6. Miscellaneous other features
+
+- Can remove/obsolete/blacklist unwanted changesets
+- Supports splicing of equivalent but technically unrelated versions
+- Can stash local changes temporarily
+- Can check out subtrees
+- Can explicitly revert files or whole subtrees in a checked-out tree
+to earlier versions
+- Supports configurable keyword expansion
+
+Rationale:
+
+Blacklisting or otherwise getting rid of unwanted changesets is a
+non-negotiable requirement for legal reasons.
+
+We want splicing so we can, at some point in the future and if we so
+desire, pull in the CSRG version history and connect it up with our
+own.
+
+Stashing local changes is necessary if you can't have uncommitted
+local changes while merging, and it really doesn't make sense to allow
+that. People do gripe, but the best way is to stash your changes,
+merge, and unstash them. Otherwise if you get a merge conflict in a
+file you've also got local changes to, it becomes an awful mess.
+(Note that in comparable situations CVS makes you check out a whole
+new tree...)
+
+Checking out subtrees is widely desired for working on single programs
+or (in particular) checking out only the kernel.
+
+Reverting portions of the tree locally is often necessary for one
+reason or another in practice; lack of adequate support for this in
+most of the "modern" version control systems has been and remains a
+barrier to adoption in/for NetBSD.
+
+We still need to be able to run ident on binaries and get useful
+information out. Keyword expansion is not the only way to accomplish
+this; but it's easier to deploy and use than any of the alternatives.
+A reasonable implementation should not suffer from the persistent
+aggravations that CVS keywords often cause. (All expansions need to be
+invertible; all actions, particularly diffs and merges, should always
+be done using the unexpanded form.)
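
Editorial illustration, not part of the committed file: the closing requirement (keyword expansions must be invertible, and diffs and merges must work on the unexpanded form) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's design. The `$Name$` / `$Name: value $` syntax is borrowed from CVS/RCS, and the function names `make_pattern`, `expand`, and `unexpand` are hypothetical. Values are assumed never to contain `$` or a newline, since that is what keeps the expansion reversible here.

```python
import re


def make_pattern(names):
    """Build a regex matching both "$Name$" and "$Name: value $".

    `names` is the configurable set of keyword names (hypothetical API).
    """
    names = list(names)
    if not names:
        raise ValueError("at least one keyword name is required")
    alternatives = "|".join(re.escape(n) for n in names)
    # Optional ": value " part; the value may not contain "$" or a newline.
    return re.compile(r"\$(" + alternatives + r")(?:: [^$\n]* )?\$")


def expand(text, values):
    """Replace each known keyword with its expanded form "$Name: value $"."""
    for value in values.values():
        if "$" in value or "\n" in value:
            # Such a value could not be stripped again unambiguously.
            raise ValueError("keyword values may not contain '$' or newlines")
    pattern = make_pattern(values)
    return pattern.sub(
        lambda m: "$%s: %s $" % (m.group(1), values[m.group(1)]), text
    )


def unexpand(text, names):
    """Invert expand(): collapse every known keyword back to "$Name$"."""
    pattern = make_pattern(names)
    return pattern.sub(lambda m: "$%s$" % m.group(1), text)


# Two checkouts of the same file, expanded at different versions.
source = 'static const char id[] = "$Id$";\n'
old = expand(source, {"Id": "foo.c 41"})
new = expand(source, {"Id": "foo.c 42"})

# Comparing the unexpanded forms shows no spurious keyword-only change.
same_content = unexpand(old, ["Id"]) == unexpand(new, ["Id"])
```

Because both copies collapse to the same unexpanded text, a diff or merge that normalizes its inputs this way never sees a conflict caused only by differing keyword values, which is one of the recurring CVS aggravations the paragraph above alludes to.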
