Re: [Git Developer Blog] [PATCH] post: a tour of git's object types

2019-10-22 Thread Junio C Hamano
Emily Shaffer writes:

> +# Commit
> +
> +This is the one we're all familiar with - commits are those things we write at
> +1am, angry at a pesky bug, and label with something like "really fix it this
> +time", right?
> +
> +A commit references exactly one tree. That's the root directory of your project.
> +It also references zero or more other commits - and this is where we diverge
> +from the filesystem parallel, because the other commits it references are its
> +parent(s), each of which has its own copy of the project at that commit's point
> +in time. (Commits with more than one parent are merge commits; otherwise, your
> +commit only has the one parent.)

I do not see a need for (parentheses) around the last sentence, but if
you must, then s/in time. /in time/; and s/one parent.)/one parent)./
would be better.

> +Commits represent a specific state of the repository which someone thought was
> +worth saving - a new feature, or a small step of progress which you don't want
> +to lose as you're hacking your way through that homework assignment. Each commit
> +points to a complete image of the repository - but that's not as bad as it
> +sounds, as we'll see when we try it out.

That's half of a commit (i.e. the tree that represents the specific
state).  

The other half is that a commit (at least in a serious-enough
project) is a statement by its author: I considered all the parent
commits, and declare that the tree this commit records suits my goal
better than any and all of these parents' trees.

That is what makes 3-way merge work correctly at the philosophical
level.  As long as the project participants share the same goal and
trust each other, when one creates a merge, one trusts what the
others built in the side branch (i.e. each of the commits they made
got us closer to our collective goals) and takes their changes in
the places one did not touch.

Of course, a good description in the log message helps the one
who makes such a merge to see if the work made on the side branch
moves the tree in the direction that truly fits one's goal.
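
To make the "tree plus parents" structure concrete, here is a minimal sketch that builds two commits in a scratch repository and counts the header lines of the raw commit objects. The repository path, file name, and identity are placeholders chosen for illustration.

```shell
# Scratch repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "A U Thor"
git config user.email "author@example.com"

echo "abcd" >foo.txt
git add foo.txt
git commit -q -m "first commit"
echo "efgh" >>foo.txt
git commit -q -a -m "second commit"

# Print the raw commit object: one "tree" line, then one "parent"
# line per parent, then author/committer and the log message.
git cat-file commit HEAD

# Count the header lines of interest.
tree_lines=$(git cat-file commit HEAD | grep -c '^tree ')
parent_lines=$(git cat-file commit HEAD | grep -c '^parent ')
# The root commit has no parent; grep -c prints 0 but exits non-zero.
root_parent_lines=$(git cat-file commit HEAD~1 | grep -c '^parent ' || true)
echo "tree=$tree_lines parent=$parent_lines root-parent=$root_parent_lines"
```

A merge commit made in the same way would show two or more "parent" lines, while still recording exactly one tree.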

> +# Tag
> +
> +Tags are a little lackluster after all the exciting types up until now. They are
> +essentially a label; they serve as an entry point into the graph and point to
> +either another tag or a commit.

Either say "generally point to", or "point to another object"
(i.e. a tag that points to a tree or a blob is normal---it is just
that they do not appear as frequently).
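
As an illustration that a tag may point at something other than a commit, the following sketch tags a blob directly in a scratch repository. The tag names and identity are made up for the example.

```shell
# Scratch repository; the identity and tag names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "A U Thor"
git config user.email "author@example.com"

# Write a blob without creating any commit.
blob=$(echo "abcd" | git hash-object -w --stdin)

# A lightweight tag is just a ref; here it names the blob directly.
git tag blob-tag "$blob"
light_type=$(git cat-file -t blob-tag)

# An annotated tag creates a tag object whose "type" header
# records what kind of object it points to.
git tag -a -m "annotated tag on a blob" annotated-blob-tag "$blob"
annotated_type=$(git cat-file -t annotated-blob-tag)
pointed_type=$(git cat-file tag annotated-blob-tag | sed -n 's/^type //p' | head -n 1)

# Peeling the annotated tag leads back to the blob.
peeled=$(git rev-parse 'annotated-blob-tag^{}')
echo "$light_type $annotated_type $pointed_type $peeled"
```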

> They're generally used to mark releases, and you
> +can see them pretty easily with `git log --oneline --decorate=full`.
> +
> +# A quick return to an overloaded word
> +
> +"Tree", "worktree", and "working tree" seem to refer to different concepts.

Not just seem to.  They do refer to different things.  "tree" is a
type of object.  "working tree" is a directory hierarchy where you
did "git checkout" to materialize the contents of a tree object
(recursively) and which you are using to work towards updating the
index to create the next commit.  "worktree" is a mechanism that
allows you to have multiple "working tree"s that are backed by the
same repository.

They may share the same word "tree".  You may want to update this
document to say "tree object" when you mean it---that would help
disambiguate it from other uses of words with "tree" in them.
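
The three terms can be told apart on the command line. This is a sketch under assumptions: the directory names `demo` and `demo-second`, the branch name `second`, and the identity are all invented for the example.

```shell
# Scratch area holding one repository and a second working tree.
base=$(mktemp -d)
mkdir "$base/demo"
cd "$base/demo"
git init -q
git config user.name "A U Thor"
git config user.email "author@example.com"
echo "abcd" >foo.txt
git add foo.txt
git commit -q -m "first commit"

# "tree object": an object in the object database.
tree_type=$(git cat-file -t 'HEAD^{tree}')

# "working tree": the directory where the tree was checked out.
working_tree=$(git rev-parse --show-toplevel)

# "worktree": the mechanism that attaches another working tree
# to the same repository.
git worktree add -b second "$base/demo-second" >/dev/null 2>&1
worktree_count=$(git worktree list --porcelain | grep -c '^worktree ')

echo "$tree_type | $working_tree | $worktree_count"
```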

> ...
> +predictable way. Let's walk through creating a pretty basic repository and
> +examining it with some low-level plumbing commands!
> +
> +# An empty repo
> +
> +For starters, we'll make a new, shiny, totally empty repo.
> +
> +{% highlight shell_session %}
> +$ mkdir demo
> +$ cd demo
> +$ git init
> +{% endhighlight %}
> +
> +We've got nothing. If we try `git log`, we'll be assured that we have no
> +commits, and if we try `git branch -a` we'll see we have no branches, either.
> +So let's make a very simple first commit.
> +
> +# A single commit
> +
> +{% highlight shell_session %}
> +$ echo "abcd" >foo.txt
> +$ git add foo.txt
> +$ git commit -m "first commit"
> +{% endhighlight %}
> +
> +I know this is boring, but bear with me and run `git ls-tree HEAD`.

You probably want to say that, even though you raised the readers'
expectation upfront that they would hear about plumbing soon, you
haven't used any plumbing so far.  And stress that ls-tree is a
plumbing command, which is what the readers have been waiting for!

> +{% highlight shell_session %}
> +$ git ls-tree HEAD
> +100644 blob acbe86c7c89586e0912a0a851bacf309c595c308 foo.txt
> +$ git cat-file -p acbe86c7c89586e0912a0a851bacf309c595c308
> +abcd
> +{% endhighlight %}
> +
> +While we're here, we can also take a look at the commit object. Use `git log`
> +to determine your commit's OID, then use `git cat-file -p` to print the
> +contents:

I doubt that "cat-file -p" is helpful to a reader who is learning
the basic object layer.  For non-tree types, learning "cat-file -t"
followed by "cat-file <type>" would be more useful to gain proper
understanding (and for trees, as you showed above, ls-tr
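
A small sketch of the "cat-file -t" followed by "cat-file <type>" sequence, using a blob written into a scratch repository (the repository location is arbitrary):

```shell
# Scratch repository with a single blob and no commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
blob=$(echo "abcd" | git hash-object -w --stdin)

# First ask what type the object is, and how large it is ...
type=$(git cat-file -t "$blob")
size=$(git cat-file -s "$blob")

# ... then ask for the raw contents, naming the type explicitly.
content=$(git cat-file "$type" "$blob")
echo "$type $size $content"
```

Naming the type makes the command fail if the object is not of the expected type, which is part of what a reader learns about the object layer this way.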

Re: [Git Developer Blog] [PATCH] post: a tour of git's object types

2019-10-21 Thread Derrick Stolee
On 10/18/2019 8:20 PM, Emily Shaffer wrote:
> An overview of what Git object types mean and how they loosely translate
> into filesystem types users are already familiar with is a good start to
> making Git's internals less scary to users. This post is an interactive
> overview of the various types, demonstrating subcommands which show what
> the objects look like and how their names are generated.
> 
> This is related to https://gitlab.com/git-scm/blog/issues/15
> 
> Signed-off-by: Emily Shaffer 
> ---
> Hi all,
> 
> In the hopes of getting some more momentum on the developer blog, here's
> a crosspost from my personal blog some months ago, targeted for the Git
> Developer Blog (as discussed in the Virtual Contributor Summit and
> on-list). During those conversations I emphasized my wish to make sure
> posts on this developer blog are vetted by the Git development community
> - to that end, the textual contents of this blog post are being sent to
> the vger.kernel.org list in their entirety. Feel free to provide
> comments here, or on the merge request in GitLab:
> https://gitlab.com/git-scm/blog/merge_requests/4
> Or, if you have another idea about how you'd like this review process to
> look, we may as well discuss it on this patch too.
> 
> I hope this post also shows what I hoped to achieve with the Git
> Developer Blog - in-depth, accurate information presented in a casual
> tone which helps users better understand Git.
> 
> At this time I've simply copied the blog post verbatim from my personal
> blog; I didn't do a lot of review on it because I was hoping to focus on
> the process of getting posts reviewed and accepted to the GitLab repo.
> It's probable that the tone is actually more conversational than we want
> for a developer blog, and the post itself didn't go through any kind of
> peer review, so I welcome comments on any and all aspects of the post.

Thanks for getting the conversation started (again)! I've got a post that
I've been tinkering with for some time now, and you gave me the motivation
to actually finish it.

> +Naming is one of the hard problems in computer science, right? It's hard for
> +Git developers too. One of the more arcane concepts in Git - object
> +reachability - becomes simpler to understand with a little bit of naming
> +indirection.
> +
> +Reachability is an important concept in Git. It's how the server determines
> +what objects you need in order to get it up to what the server itself knows.
> +It's also how merges and rebases work. With all this big stuff riding on
> +reachability, it seems intimidating to try to understand - but it turns out if
> +we give it a slightly simpler name, things become a little clearer.
> +
> +## Git's four object types
> +
> +Under the covers, Git is mostly a directed graph of objects. Those objects come
> +in four flavors; from root to leaf (generally), those flavors are:
> +
> +- Tag
> +- Commit
> +- Tree
> +- Blob
> +
> +We'll take a closer look in the opposite order, though.
> +
> +# Blob
> +
> +Surprise! It's a file. Well, kind of - it can also be a symlink to a file - but
> +this is the most atomic type of object. We'll explore these a little more later,
> +but really, it's just a file.

When I teach people about blobs [1], I take special care to point out that
a blob is only the file _content_. It does not actually store any information
about the filename or permissions.

It could help to describe an example: maybe `git cat-file -p HEAD:README` for a
well-known repo? I'm using "HEAD:" here because it is easier to understand
where the file reference comes from, but perhaps it would be better to use a
"git rev-parse" and "git cat-file" pair:

$ git rev-parse HEAD:README.md
88f126184c52bfe4859ec189d018872902e02a84

$ git cat-file -p 88f126184c52bfe4859ec189d018872902e02a84
[![Build Status](https://dev.azure.com/git/git/_apis/build/status/git.git)](https://dev.azure.com/git/git/_build/latest?definitionId=11)

Git - fast, scalable, distributed revision control system
=========================================================

...
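
To illustrate that a blob holds only content, here is a sketch that hashes the same bytes under two different names and permissions; the file names are placeholders. The expected object name is the one the post itself shows for a file containing "abcd".

```shell
# Scratch repository; nothing is committed here.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "abcd" >foo.txt
cp foo.txt bar.txt
chmod +x bar.txt        # different name, different permissions

# Same bytes, therefore the same blob name.
foo_oid=$(git hash-object foo.txt)
bar_oid=$(git hash-object bar.txt)
echo "$foo_oid"
echo "$bar_oid"
```

Both commands print acbe86c7c89586e0912a0a851bacf309c595c308 (with the default SHA-1 object format), since neither the name nor the mode is part of the blob.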

[1] https://stolee.dev/docs/git.pptx

> +
> +# Tree
> +
> +A tree references zero or more trees or blobs. Said another way, a tree holds

Perhaps "A tree references blobs and other trees." Saying "zero or more" makes
me get spun up about what it would mean for a tree to have zero entries, which
is not possible in porcelain Git.
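
For the curious, plumbing can still name and write a tree with zero entries. This sketch, run in a throwaway repository, computes the name of the empty tree two ways:

```shell
# Throwaway repository with an empty index.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hash zero bytes as a tree object (nothing is written).
empty_tree=$(git hash-object -t tree /dev/null)

# Writing the (empty) index out as a tree yields the same object.
written_tree=$(git write-tree)
echo "$empty_tree"
echo "$written_tree"
```

With the default SHA-1 object format both print 4b825dc642cb6eb9a060e54bf8d69288fbee4904.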

> +one or more trees or files. This sounds familiar - basically, a tree is a
> +directory. (Okay, or a symlink to a directory.) It points to more trees
> +(subdirectories) or blobs (files). It can also point to commits in the case
> +of submodules, but that's another conversation.

Here is a great time to mention that filenames happen in a tree. You can
use `git cat-file -p HEAD:docs/` (or something) to show more contents of a tree.

> +By the way, "tree" is one that gets a little sticky, because we also talk about
> +"working tree" as well as "worktree". We'll touch

Re: [Git Developer Blog] [PATCH] post: a tour of git's object types

2019-10-20 Thread Junio C Hamano
Emily Shaffer writes:

> +Under the covers, Git is mostly a directed graph of objects. Those objects come
> +in four flavors; from root to leaf (generally), those flavors are:

Is "acyclic" worth mentioning, I wonder.

> +
> +- Tag
> +- Commit
> +- Tree
> +- Blob
> +
> +We'll take a closer look in the opposite order, though.
> +
> +# Blob
> +
> +Surprise! It's a file. Well, kind of - it can also be a symlink to a file - but
> +this is the most atomic type of object. We'll explore these a little more later,
> +but really, it's just a file.

It may be easier to understand if we said it is "just a stream of
bytes".  And of course the simplest application of a stream of bytes
is to store contents of a file, but it also can be used to store the
value of a symbolic link, and also can be used to store the notes.

So, really, it's just a stream of bytes.
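
A sketch of the symbolic link case, assuming a filesystem that supports symlinks; the names `foo.txt` and `link` are placeholders. The link is recorded as a blob whose bytes are the link target, and it is the mode in the index (and later in the tree) that marks it as a link.

```shell
# Scratch repository; nothing is committed here.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "abcd" >foo.txt
ln -s foo.txt link
git add foo.txt link

# The index entry for the link carries mode 120000 ...
link_mode=$(git ls-files -s link | cut -d' ' -f1)

# ... and the object itself is an ordinary blob holding the target path.
link_oid=$(git rev-parse :link)
link_type=$(git cat-file -t "$link_oid")
link_content=$(git cat-file blob "$link_oid")
echo "$link_mode $link_type $link_content"
```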

> +A tree references zero or more trees or blobs. Said another way, a tree holds
> +one or more trees or files.

That captures only half of a tree.  It is a mapping from names to
objects.  Of course, being a mapping, it references other objects
(by the way, do not limit the contents to "trees or blobs") on the
value side of the mapping.

A tree gives names to objects within its scope.  It maps names
to objects, typically a blob or a tree.  Thus, it can be used
(and it indeed is used) to represent a directory full of files
by storing mapping from filenames to blob objects that store
their contents.  A subdirectory can be represented by having a
mapping from its name to the tree object that represents the
contents of the subdirectory.
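
The mapping can be seen directly with plumbing. In this sketch (file and directory names are placeholders) the top-level tree maps `foo.txt` to a blob and `sub` to another tree, which in turn maps `bar.txt` to a blob:

```shell
# Scratch repository; the tree is written straight from the index.
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir sub
echo "abcd" >foo.txt
echo "abcd" >sub/bar.txt
git add .
root_tree=$(git write-tree)

# Each entry is "<mode> <type> <object name><TAB><name>", e.g.
# 100644 blob acbe86c7c89586e0912a0a851bacf309c595c308	foo.txt
git ls-tree "$root_tree"

# The name "sub" maps to another tree object.
sub_tree=$(git rev-parse "$root_tree:sub")
sub_type=$(git cat-file -t "$sub_tree")
git ls-tree "$sub_tree"

# Two different names, in two different trees, map to the same blob.
foo_oid=$(git rev-parse "$root_tree:foo.txt")
bar_oid=$(git rev-parse "$root_tree:sub/bar.txt")
```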

> This sounds familiar - basically, a tree is a
> +directory. (Okay, or a symlink to a directory.) It points to more trees

No, I do not think it is a symlink to a directory.  What makes you
think so?

I'd stop here for now.  I am certain that I haven't read enough to
say things either negative or positive about the "naming is hard,
naming used in the canonical documentation of Git is unnecessarily
hard to read and we propose a better wording" premise given by the
introduction, so I won't comment on it yet.

Thanks.




[Git Developer Blog] [PATCH] post: a tour of git's object types

2019-10-18 Thread Emily Shaffer
An overview of what Git object types mean and how they loosely translate
into filesystem types users are already familiar with is a good start to
making Git's internals less scary to users. This post is an interactive
overview of the various types, demonstrating subcommands which show what
the objects look like and how their names are generated.

This is related to https://gitlab.com/git-scm/blog/issues/15

Signed-off-by: Emily Shaffer 
---
Hi all,

In the hopes of getting some more momentum on the developer blog, here's
a crosspost from my personal blog some months ago, targeted for the Git
Developer Blog (as discussed in the Virtual Contributor Summit and
on-list). During those conversations I emphasized my wish to make sure
posts on this developer blog are vetted by the Git development community
- to that end, the textual contents of this blog post are being sent to
the vger.kernel.org list in their entirety. Feel free to provide
comments here, or on the merge request in GitLab:
https://gitlab.com/git-scm/blog/merge_requests/4
Or, if you have another idea about how you'd like this review process to
look, we may as well discuss it on this patch too.

I hope this post also shows what I hoped to achieve with the Git
Developer Blog - in-depth, accurate information presented in a casual
tone which helps users better understand Git.

At this time I've simply copied the blog post verbatim from my personal
blog; I didn't do a lot of review on it because I was hoping to focus on
the process of getting posts reviewed and accepted to the GitLab repo.
It's probable that the tone is actually more conversational than we want
for a developer blog, and the post itself didn't go through any kind of
peer review, so I welcome comments on any and all aspects of the post.

Thanks all for your thoughts!
 - Emily

 .../2019-10-18-git-objects-explained.markdown | 411 ++
 1 file changed, 411 insertions(+)
 create mode 100644 content/post/2019-10-18-git-objects-explained.markdown

diff --git a/content/post/2019-10-18-git-objects-explained.markdown b/content/post/2019-10-18-git-objects-explained.markdown
new file mode 100644
index 000..038c98f
--- /dev/null
+++ b/content/post/2019-10-18-git-objects-explained.markdown
@@ -0,0 +1,411 @@
+---
+layout: post
+title: "A Tour of Git's Object Types"
+date: '2019-10-18'
+draft: true
+categories: open-source version-control scm
+---
+
+*This post originally appeared on [nasamuffin.github.io] in June 2019.*
+
+Naming is one of the hard problems in computer science, right? It's hard for
+Git developers too. One of the more arcane concepts in Git - object
+reachability - becomes simpler to understand with a little bit of naming
+indirection.
+
+Reachability is an important concept in Git. It's how the server determines
+what objects you need in order to get it up to what the server itself knows.
+It's also how merges and rebases work. With all this big stuff riding on
+reachability, it seems intimidating to try to understand - but it turns out if
+we give it a slightly simpler name, things become a little clearer.
+
+## Git's four object types
+
+Under the covers, Git is mostly a directed graph of objects. Those objects come
+in four flavors; from root to leaf (generally), those flavors are:
+
+- Tag
+- Commit
+- Tree
+- Blob
+
+We'll take a closer look in the opposite order, though.
+
+# Blob
+
+Surprise! It's a file. Well, kind of - it can also be a symlink to a file - but
+this is the most atomic type of object. We'll explore these a little more later,
+but really, it's just a file.
+
+# Tree
+
+A tree references zero or more trees or blobs. Said another way, a tree holds
+one or more trees or files. This sounds familiar - basically, a tree is a
+directory. (Okay, or a symlink to a directory.) It points to more trees
+(subdirectories) or blobs (files). It can also point to commits in the case
+of submodules, but that's another conversation.
+
+By the way, "tree" is one that gets a little sticky, because we also talk about
+"working tree" as well as "worktree". We'll touch back on that in a minute.
+
+# Commit
+
+This is the one we're all familiar with - commits are those things we write at
+1am, angry at a pesky bug, and label with something like "really fix it this
+time", right?
+
+A commit references exactly one tree. That's the root directory of your project.
+It also references zero or more other commits - and this is where we diverge
+from the filesystem parallel, because the other commits it references are its
+parent(s), each of which has its own copy of the project at that commit's point
+in time. (Commits with more than one parent are merge commits; otherwise, your
+commit only has the one parent.)
+
+Commits represent a specific state of the repository which someone thought was
+worth saving - a new feature, or a small step of progress which you don't want
+to lose as you're hacking your way through that homework assignment. Each commit
+points to a compl