A short git tutorial
====================
-v0.99.5, Aug 2005
Introduction
------------
inspect that with `ls`. For your new empty project, it should show you
three entries, among other things:
- - a symlink called `HEAD`, pointing to `refs/heads/master`
+ - a symlink called `HEAD`, pointing to `refs/heads/master` (if your
+ platform does not have native symlinks, it is a file containing the
+ line "ref: refs/heads/master")
+
Don't worry about the fact that the file that the `HEAD` link points to
doesn't even exist yet -- you haven't created the commit that will
git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238
where the `-t` tells `git-cat-file` to tell you what the "type" of the
-object is. Git will tell you that you have a "blob" object (ie just a
+object is. git will tell you that you have a "blob" object (ie just a
regular file), and you can see the contents with
git-cat-file "blob" 557db03
------------
diff --git a/hello b/hello
+index 557db03..263414f 100644
--- a/hello
+++ b/hello
@@ -1 +1,2 @@
on its standard input, and it will write out the resulting object name for the
commit to its standard output.
-And this is where we start using the `.git/HEAD` file. The `HEAD` file is
-supposed to contain the reference to the top-of-tree, and since that's
-exactly what `git-commit-tree` spits out, we can do this all with a simple
-shell pipeline:
+And this is where we create the `.git/refs/heads/master` file
+which is pointed at by `HEAD`. This file is supposed to contain
+the reference to the top-of-tree of the master branch, and since
+that's exactly what `git-commit-tree` spits out, we can do this
+all with a sequence of simple shell commands:
------------------------------------------------
-echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD
+tree=$(git-write-tree)
+commit=$(echo 'Initial commit' | git-commit-tree $tree)
+git-update-ref HEAD $commit
------------------------------------------------
which will say:
This is not hard to understand, as soon as you realize that git simply
never knows (or cares) about files that it is not told about
-explicitly. Git will never go *looking* for files to compare, it
+explicitly. git will never go *looking* for files to compare, it
expects you to tell it what the files are, and that's what the index
is there for.
================
commit itself (`git-commit`).
-Checking it out
----------------
+Inspecting Changes
+------------------
While creating changes is useful, it's even more useful if you can tell
later what changed. The most useful command for this is another of the
Copying repositories
--------------------
-Git repositories are normally totally self-sufficient, and it's worth noting
+git repositories are normally totally self-sufficient, and it's worth noting
that unlike CVS, for example, there is no separate notion of
"repository" and "working tree". A git repository normally *is* the
working tree, with the local git information hidden in the `.git`
just telling `git checkout` what the base of the checkout would be.
In other words, if you have an earlier tag or branch, you'd just do
- git checkout -b mybranch earlier-commit
+------------
+git checkout -b mybranch earlier-commit
+------------
and it would create the new branch `mybranch` at the earlier commit,
and check out the state at that time.
You can always just jump back to your original `master` branch by doing
- git checkout master
+------------
+git checkout master
+------------
(or any other branch-name, for that matter) and if you forget which
branch you happen to be on, a simple
- ls -l .git/HEAD
+------------
+ls -l .git/HEAD
+------------
+
+will tell you where it's pointing (Note that on platforms with bad or no
+symlink support, you have to execute
-will tell you where it's pointing. To get the list of branches
-you have, you can say
+------------
+cat .git/HEAD
+------------
- git branch
+instead). To get the list of branches you have, you can say
+
+------------
+git branch
+------------
which is nothing more than a simple script around `ls .git/refs/heads`.
There will be asterisk in front of the branch you are currently on.
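
The two forms of `HEAD` described above, and the remark that `git branch` is little more than a script around `ls .git/refs/heads`, can be illustrated with a small sketch. This is not the actual `git branch` script: the function names `current_branch` and `list_branches` and their optional directory argument are hypothetical, and the sketch assumes branch names have no `/` in them.

```shell
# Print the branch that HEAD refers to, whichever form HEAD takes.
# $1 is the git directory (hypothetical argument, defaults to .git).
current_branch() {
    git_dir=${1:-.git}
    if [ -L "$git_dir/HEAD" ]; then
        # HEAD is a symlink such as refs/heads/master
        ref=$(readlink "$git_dir/HEAD")
    else
        # HEAD is a regular file containing "ref: refs/heads/master"
        ref=$(sed -n 's/^ref: //p' "$git_dir/HEAD")
    fi
    echo "${ref#refs/heads/}"
}

# List the branch heads, marking the current one with an asterisk.
list_branches() {
    git_dir=${1:-.git}
    current=$(current_branch "$git_dir")
    for path in "$git_dir"/refs/heads/*; do
        [ -e "$path" ] || continue    # no branch heads yet
        name=${path##*/}
        if [ "$name" = "$current" ]; then
            echo "* $name"
        else
            echo "  $name"
        fi
    done
}
```

Running `list_branches` at the top of a working tree would print one line per file in `.git/refs/heads`, with the asterisk on the branch that `HEAD` names.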
Sometimes you may wish to create a new branch _without_ actually
checking it out and switching to it. If so, just use the command
- git branch <branchname> [startingpoint]
+------------
+git branch <branchname> [startingpoint]
+------------
which will simply _create_ the branch, but will not do anything further.
You can then later -- once you decide that you want to actually develop
------------------------------------------------
Here, we just added another line to `hello`, and we used a shorthand for
-both going a `git-update-index hello` and `git commit` by just giving the
+doing both `git-update-index hello` and `git commit` by just giving the
filename directly to `git commit`. The `-m` flag is to give the
commit log message from the command line.
(which is correct, so never mind), and you can write a small merge
message about your adventures in git-merge-land.
-After you're done, start up `gitk --all` to see graphically what the
+After you're done, start up `gitk \--all` to see graphically what the
history looks like. Notice that `mybranch` still exists, and you can
switch to it, and continue to work with it if you want to. The
`mybranch` branch will not contain the merge, but next time you merge it
! [mybranch] Some work.
--
+ [master] Merged "mybranch" changes.
-+ [master~1] Some fun.
++ [mybranch] Some work.
------------------------------------------------
means they are now part of the `master` branch. Only the "Some
work" commit has the plus `+` character in the second column,
because `mybranch` has not been merged to incorporate these
-commits from the master branch.
+commits from the master branch.  The string inside brackets
+before the commit log message is a short name you can use to
+name the commit.  In the above example, 'master' and 'mybranch'
+are branch heads.  'master~1' is the first parent of the 'master'
+branch head.  Please see the 'git-rev-parse' documentation if you
+encounter more complex cases.
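
As a rough illustration of the `master~1` notation, the sketch below builds a throwaway repository with two commits and checks that `master~1` names the first one. It uses present-day command spellings (`git commit`, `git rev-parse`) rather than the dashed forms in this tutorial. The function name `parent_demo`, the author identity and the temporary directory are assumptions made for the example.

```shell
# Build a two-commit history in a scratch repository and compare
# master~1 with the commit that was made first.
# Runs in a subshell so the caller's working directory is untouched.
parent_demo() (
    repo=$(mktemp -d) && cd "$repo" || exit 1
    git init -q .
    # Make sure the unborn branch is called master, as in the tutorial.
    git symbolic-ref HEAD refs/heads/master
    git config user.name "Example Author"        # placeholder identity
    git config user.email "author@example.com"

    echo "Hello World" > hello
    git add hello
    git commit -q -m "Some fun."
    first=$(git rev-parse HEAD)

    echo "Work, work, work" >> hello
    git commit -q -a -m "Some work."

    # master~1 is the first parent of the master branch head.
    if [ "$(git rev-parse master~1)" = "$first" ]; then
        echo "master~1 is the first commit"
    fi
)
```

Calling `parent_demo` prints the confirmation line only if the two object names agree.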
Now, let's pretend you are the one who did all the work in
`mybranch`, and the fruit of your hard work has finally been merged
to the `master` branch. Let's go back to `mybranch`, and run
resolve to get the "upstream changes" back to your branch.
- git checkout mybranch
- git resolve HEAD master "Merge upstream changes."
+------------
+git checkout mybranch
+git resolve HEAD master "Merge upstream changes."
+------------
This outputs something like this (the actual commit object names
would be different)
the tree of your branch to that of the `master` branch. This is
often called 'fast forward' merge.
-You can run `gitk --all` again to see how the commit ancestry
+You can run `gitk \--all` again to see how the commit ancestry
looks like, or run `show-branch`, which tells you this.
------------------------------------------------
both ends on the local machine instead of running other end on
the remote machine via `ssh`.
-GIT Native::
+git Native::
`git://remote.machine/path/to/repo.git/`
+
This transport was designed for anonymous downloading. Like SSH
sometimes also called 'commit walkers'.
+
The 'commit walkers' are sometimes also called 'dumb
-transports', because they do not require any GIT aware smart
-server like GIT Native transport does. Any stock HTTP server
+transports', because they do not require any git aware smart
+server like git Native transport does. Any stock HTTP server
would suffice.
+
There are (confusingly enough) `git-ssh-fetch` and `git-ssh-upload`
programs, which are 'commit walkers'; they outlived their
-usefulness when GIT Native and SSH transports were introduced,
+usefulness when git Native and SSH transports were introduced,
and not used by `git pull` or `git push` scripts.
Once you fetch from the remote repository, you `resolve` that
on the remote machine. The communication between the two over
the network internally uses an SSH connection.
-Your private repository's GIT directory is usually `.git`, but
+Your private repository's git directory is usually `.git`, but
your public repository is often named after the project name,
i.e. `<project>.git`. Let's create such a public repository for
project `my-git`. After logging into the remote machine, create
an empty directory:
- mkdir my-git.git
+------------
+mkdir my-git.git
+------------
-Then, make that directory into a GIT repository by running
+Then, make that directory into a git repository by running
`git init-db`, but this time, since its name is not the usual
`.git`, we do things slightly differently:
- GIT_DIR=my-git.git git-init-db
+------------
+GIT_DIR=my-git.git git-init-db
+------------
Make sure this directory is available for others you want your
changes to be pulled by via the transport of your choice. Also
Come back to the machine you have your private repository. From
there, run this command:
- git push <public-host>:/path/to/my-git.git master
+------------
+git push <public-host>:/path/to/my-git.git master
+------------
This synchronizes your public repository to match the named
branch head (i.e. `master` in this case) and objects reachable
repository. Kernel.org mirror network takes care of the
propagation to other publicly visible machines:
- git push master.kernel.org:/pub/scm/git/git.git/
+------------
+git push master.kernel.org:/pub/scm/git/git.git/
+------------
Packing your repository
immutable once they are created, there is a way to optimize the
storage by "packing them together". The command
- git repack
+------------
+git repack
+------------
will do it for you. If you followed the tutorial examples, you
would have accumulated about 17 objects in `.git/objects/??/`
Once you have packed objects, you do not need to leave the
unpacked objects that are contained in the pack file anymore.
- git prune-packed
+------------
+git prune-packed
+------------
would remove them for you.
back before you push your work when it happens.
+Bundling your work together
+---------------------------
+
+It is likely that you will be working on more than one thing at
+a time.  It is easy to manage those more-or-less independent tasks
+using branches with git.
+
+We have already seen how branches work in a previous example,
+the "fun and work" example using two branches.  The idea is the
+same if there are more than two branches.  Let's say you started
+out from the "master" head, and have some new code in the "master"
+branch, and two independent fixes in the "commit-fix" and
+"diff-fix" branches:
+
+------------
+$ git show-branch
+! [commit-fix] Fix commit message normalization.
+ ! [diff-fix] Fix rename detection.
+ * [master] Release candidate #1
+---
+ + [diff-fix] Fix rename detection.
+ + [diff-fix~1] Better common substring algorithm.
++ [commit-fix] Fix commit message normalization.
+ + [master] Release candidate #1
++++ [diff-fix~2] Pretty-print messages.
+------------
+
+Both fixes are well tested, and at this point, you want to merge
+in both of them.  You could merge in 'diff-fix' first and then
+'commit-fix' next, like this:
+
+------------
+$ git resolve master diff-fix 'Merge fix in diff-fix'
+$ git resolve master commit-fix 'Merge fix in commit-fix'
+------------
+
+Which would result in:
+
+------------
+$ git show-branch
+! [commit-fix] Fix commit message normalization.
+ ! [diff-fix] Fix rename detection.
+ * [master] Merge fix in commit-fix
+---
+ + [master] Merge fix in commit-fix
++ + [commit-fix] Fix commit message normalization.
+ + [master~1] Merge fix in diff-fix
+ ++ [diff-fix] Fix rename detection.
+ ++ [diff-fix~1] Better common substring algorithm.
+ + [master~2] Release candidate #1
++++ [master~3] Pretty-print messages.
+------------
+
+However, there is no particular reason to merge in one branch
+first and the other next, when what you have are a set of truly
+independent changes (if the order mattered, then they are not
+independent by definition). You could instead merge those two
+branches into the current branch at once.  First let's undo what
+we just did and start over.  We want to return the master branch
+to its state before these two merges by resetting it to 'master~2':
+
+------------
+$ git reset --hard master~2
+------------
+
+You can run 'git show-branch' to make sure the result matches the
+state before the two 'git resolve' commands you just ran.  Then,
+instead of running two 'git resolve' commands in a row, you would
+pull these two branch heads (this is known as 'making an Octopus'):
+
+------------
+$ git pull . commit-fix diff-fix
+$ git show-branch
+! [commit-fix] Fix commit message normalization.
+ ! [diff-fix] Fix rename detection.
+ * [master] Octopus merge of branches 'diff-fix' and 'commit-fix'
+---
+ + [master] Octopus merge of branches 'diff-fix' and 'commit-fix'
++ + [commit-fix] Fix commit message normalization.
+ ++ [diff-fix] Fix rename detection.
+ ++ [diff-fix~1] Better common substring algorithm.
+ + [master~1] Release candidate #1
++++ [master~2] Pretty-print messages.
+------------
+
+Note that you should not do an Octopus just because you can.  An
+Octopus is a valid thing to do and often makes it easier to view
+the commit history if you are pulling more than two independent
+changes at the same time.  However, if you have merge conflicts
+with any of the branches you are merging in and need to resolve
+them by hand, that is an indication that the development that
+happened in those branches was not independent after all, and you
+should merge two at a time, documenting how you resolved the
+conflicts and why you preferred the changes made on one side
+over the other.  Otherwise it would make the project history
+harder to follow, not easier.
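
To make the shape of an Octopus concrete, here is a minimal sketch that recreates the three-branch layout above in a scratch repository and merges both fix branches in one step. It uses the present-day `git merge` command with two branch names instead of the `git pull .` form shown in this tutorial. The function name `octopus_demo`, the file names and the author identity are assumptions made for illustration.

```shell
# Recreate master, commit-fix and diff-fix with independent changes,
# merge both fixes at once, and print how many parents the merge has.
# Runs in a subshell so the caller's working directory is untouched.
octopus_demo() (
    repo=$(mktemp -d) && cd "$repo" || exit 1
    git init -q .
    git symbolic-ref HEAD refs/heads/master
    git config user.name "Example Author"        # placeholder identity
    git config user.email "author@example.com"

    echo base > base.txt
    git add base.txt
    git commit -q -m "Pretty-print messages."
    git branch commit-fix
    git branch diff-fix

    # Each branch touches its own file, so the changes are independent.
    git checkout -q commit-fix
    echo fix > commit.txt
    git add commit.txt
    git commit -q -m "Fix commit message normalization."

    git checkout -q diff-fix
    echo fix > diff.txt
    git add diff.txt
    git commit -q -m "Fix rename detection."

    git checkout -q master
    echo rc1 > release.txt
    git add release.txt
    git commit -q -m "Release candidate #1"

    # Naming two branches asks for a single merge commit joining all three.
    git merge -q -m "Octopus merge of commit-fix and diff-fix" \
        commit-fix diff-fix >/dev/null 2>&1 || exit 1

    # rev-list --parents prints the commit followed by its parents.
    set -- $(git rev-list --parents -n 1 HEAD)
    echo $(($# - 1))
)
```

An ordinary merge commit has two parents, whereas the merge made here has one parent for each of the three branch heads involved.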
+
[ to be continued.. cvsimports ]