in your git archive. We'll start off with a few bad examples, just to
get a feel for how this works:
- echo "Hello World" > a
- echo "Silly example" > b
+ echo "Hello World" >hello
+ echo "Silly example" >example
you have now created two files in your working directory, but to
actually check in your hard work, you will have to go through two steps:
So to populate the index with the two files you just created, you can do
- git-update-cache --add a b
+ git-update-cache --add hello example
and you have now told git to track those two files.
git-cat-file "blob" 557db03de997c86a4a028e1ebd3a1ceb225be238
which will print out "Hello World". The object 557db... is nothing
-more than the contents of your file "a".
+more than the contents of your file "hello".
-[ Digression: don't confuse that object with the file "a" itself. The
+[ Digression: don't confuse that object with the file "hello" itself. The
object is literally just those specific _contents_ of the file, and
- however much you later change the contents in file "a", the object we
+ however much you later change the contents in file "hello", the object we
just looked at will never change. Objects are immutable. ]
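As a rough illustration of why the object can never change: its name is computed from the contents themselves. The sketch below assumes the usual blob encoding (the word "blob", the byte count, a NUL byte, then the data) and that a "sha1sum" utility is installed; neither is spelled out in this tutorial.

```shell
# "Hello World" plus its newline is 12 bytes. Hash the header
# "blob 12", a NUL byte, and then the contents, using SHA-1.
blob_id=$(printf 'blob 12\000Hello World\n' | sha1sum | cut -d' ' -f1)
echo "$blob_id"
```

This should print the same 557db... name that was handed to git-cat-file above. Any change to the contents gives a different name, and therefore a different object.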
Anyway, as we mentioned previously, you normally never actually take a
most basic git commands to manipulate the files or look at their status.
In particular, let's not even check in the two files into git yet, we'll
-start off by adding another line to "a" first:
+start off by adding another line to "hello" first:
- echo "It's a new day for git" >> a
+ echo "It's a new day for git" >>hello
-and you can now, since you told git about the previous state of "a", ask
+and you can now, since you told git about the previous state of "hello", ask
git what has changed in the tree compared to your old index, using the
"git-diff-files" command:
oops. That wasn't very readable. It just spit out its own internal
version of a "diff", but that internal version really just tells you
-that it has noticed that "a" has been modified, and that the old object
+that it has noticed that "hello" has been modified, and that the old object
contents it had have been replaced with something else.
To make it readable, we can tell git-diff-files to output the
which will spit out
- diff --git a/a b/a
- --- a/a
- +++ b/a
+ diff --git a/hello b/hello
+ --- a/hello
+ +++ b/hello
@@ -1 +1,2 @@
Hello World
+It's a new day for git
-ie the diff of the change we caused by adding another line to "a".
+ie the diff of the change we caused by adding another line to "hello".
In other words, git-diff-files always shows us the difference between
what is recorded in the index, and what is currently in the working
git-write-tree
and this will just output the name of the resulting tree, in this case
-(if you have does exactly as I've described) it should be
+(if you have done exactly as I've described) it should be
- 3ede4ed7e895432c0a247f09d71a76db53bd0fa4
+ 8988da15d077d4829fc51d8544c097def6644dbb
which is another incomprehensible object name. Again, if you want to,
-you can use "git-cat-file -t 3ede4.." to see that this time the object
+you can use "git-cat-file -t 8988d.." to see that this time the object
is not a "blob" object, but a "tree" object (you can also use
git-cat-file to actually output the raw object contents, but you'll see
mainly a binary mess, so that's less interesting).
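A minimal sketch of the same steps in a throwaway repository, assuming a later git where "git-update-cache" is spelled "git update-index" (the temporary directory is just for the demo):

```shell
cd "$(mktemp -d)"
git init -q

echo "Hello World" >hello
echo "Silly example" >example
git update-index --add hello example

# Write the index out as a tree object and ask what kind of object it is.
tree=$(git write-tree)
type=$(git cat-file -t "$tree")
echo "$tree is a $type"

# A readable listing of the tree, instead of the raw binary contents.
git ls-tree "$tree"
```

The listing should show one blob entry each for "example" and "hello".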
which will say:
- Committing initial tree 3ede4ed7e895432c0a247f09d71a76db53bd0fa4
+ Committing initial tree 8988da15d077d4829fc51d8544c097def6644dbb
just to warn you about the fact that it created a totally new commit
that is not related to anything else. Normally you do this only _once_
Again, normally you'd never actually do this by hand. There is a
helpful script called "git commit" that will do all of this for you. So
-you could have just writtten
+you could have just written
git commit
Making a change
---------------
-Remember how we did the "git-update-cache" on file "a" and then we
-changed "a" afterward, and could compare the new state of "a" with the
+Remember how we did the "git-update-cache" on file "hello" and then we
+changed "hello" afterward, and could compare the new state of "hello" with the
state we saved in the index file?
Further, remember how I said that "git-write-tree" writes the contents
of the _index_ file to the tree, and thus what we just committed was in
-fact the _original_ contents of the file "a", not the new ones. We did
+fact the _original_ contents of the file "hello", not the new ones. We did
that on purpose, to show the difference between the index state, and the
state in the working directory, and how they don't have to match, even
when we commit things.
Unlike "git-diff-files", which showed the difference between the index
file and the working directory, "git-diff-cache" shows the differences
-between a committed _tree_ and either the the index file or the working
+between a committed _tree_ and either the index file or the working
directory. In other words, git-diff-cache wants a tree to be diffed
against, and before we did the commit, we couldn't do that, because we
didn't have anything to diff against.
work through the index file, so the first thing we need to do is to
update the index cache:
- git-update-cache a
+ git-update-cache hello
(note how we didn't need the "--add" flag this time, since git knew
about the file already).
Note what happens to the different git-diff-xxx versions here. After
-we've updated "a" in the index, "git-diff-files -p" now shows no
+we've updated "hello" in the index, "git-diff-files -p" now shows no
differences, but "git-diff-cache -p HEAD" still _does_ show that the
current state is different from the state we committed. In fact, now
"git-diff-cache" shows the same difference whether we use the "--cached"
flag or not, since now the index is coherent with the working directory.
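The three comparisons can be tried side by side. This is only a sketch: it assumes a later git in which "git-update-cache" and "git-diff-cache" are spelled "git update-index" and "git diff-index", and it uses a made-up identity and a throwaway directory.

```shell
# Identity for the throwaway commit (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Change the working tree only: the index still has the old contents.
echo "It's a new day for git" >>hello
wt_vs_index=$(git diff-files -p)

# Bring the index up to date with the working tree.
git update-index hello
after_files=$(git diff-files -p)                 # index vs working tree
after_cached=$(git diff-index -p --cached HEAD)  # commit vs index
after_tree=$(git diff-index -p HEAD)             # commit vs working tree

echo "$after_cached"
```

After the update, the index-versus-working-tree diff is empty, while both comparisons against HEAD show the added line.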
-Now, since we've updated "a" in the index, we can commit the new
+Now, since we've updated "hello" in the index, we can commit the new
version. We could do it by writing the tree by hand again, and
committing the tree (this time we'd have to use the "-p HEAD" flag to
tell commit that the HEAD was the _parent_ of the new commit, and that
it in the ".git/refs/tags/" subdirectory instead of calling it a "head".
So the simplest form of tag involves nothing more than
- cat .git/HEAD > .git/refs/tags/my-first-tag
+ git tag my-first-tag
-after which point you can use this symbolic name for that particular
-state. You can, for example, do
+which just writes the current HEAD into the .git/refs/tags/my-first-tag
+file, after which point you can then use this symbolic name for that
+particular state. You can, for example, do
git diff my-first-tag
to diff your current state against that tag (which at this point will
obviously be an empty diff, but if you continue to develop and commit
-stuff, you can use your tag as a "anchor-point" to see what has changed
+stuff, you can use your tag as an "anchor-point" to see what has changed
since you tagged it.
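To see that such a tag is just another name for the commit HEAD pointed at, here is a small sketch in a throwaway repository (the identity and directory are made up, and "git update-index" is the later spelling of "git-update-cache"):

```shell
# Identity for the throwaway commit (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

git tag my-first-tag

# Both names resolve to the same commit object.
head_id=$(git rev-parse HEAD)
tag_id=$(git rev-parse my-first-tag)
echo "HEAD:         $head_id"
echo "my-first-tag: $tag_id"

# Nothing has changed since tagging, so this diff is empty.
tag_diff=$(git diff my-first-tag)
```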
A "signed tag" is actually a real git object, and contains not only a
pointer to the state you want to tag, but also a small tag name and
message, along with a PGP signature that says that yes, you really did
-that tag. You create these signed tags with
+that tag. You create these signed tags with the "-s" flag to "git tag":
- git tag <tagname>
+ git tag -s <tagname>
which will sign the current HEAD (but you can also give it another
argument that specifies the thing to tag, ie you could have tagged the
mkdir my-git
cd my-git
- rsync -rL rsync://rsync.kernel.org/pub/scm/git/git.git/ my-git .git
+ rsync -rL rsync://rsync.kernel.org/pub/scm/git/git.git/ .git
followed by
---------------------
Branches in git are really nothing more than pointers into the git
-object space from within the ",git/refs/" subdirectory, and as we
+object space from within the ".git/refs/" subdirectory, and as we
already discussed, the HEAD branch is nothing but a symlink to one of
these object pointers.
and nothing enforces it.
To show that as an example, let's go back to the git-tutorial archive we
-used earlier, and create a branch in it. You literally do that by just
-creating a new SHA1 reference file, and switch to it by just making the
-HEAD pointer point to it:
+used earlier, and create a branch in it. You do that by simply
+saying that you want to check out a new branch:
- cat .git/HEAD > .git/refs/heads/mybranch
- ln -sf refs/heads/mybranch .git/HEAD
+ git checkout -b mybranch
-and you're done.
+This will create a new branch based at the current HEAD position, and
+switch to it.
-Now, if you make the decision to start your new branch at some other
-point in the history than the current HEAD, you usually also want to
-actually switch the contents of your working directory to that point
-when you switch the head, and "git checkout" will do that for you:
-instead of switching the branch by hand with "ln -sf", you can just do
+[ Side note: if you make the decision to start your new branch at some
+ other point in the history than the current HEAD, you can do so by
+ just telling "git checkout" what the base of the checkout would be.
+ In other words, if you have an earlier tag or branch, you'd just do
- git checkout mybranch
+ git checkout -b mybranch earlier-branch
-which will basically "jump" to the branch specified, update your working
-directory to that state, and also make it become the new default HEAD.
+ and it would create the new branch "mybranch" at the earlier point,
+ and check out the state at that time. ]
You can always just jump back to your original "master" branch by doing
git checkout master
-and if you forget which branch you happen to be on, a simple
+(or any other branch-name, for that matter) and if you forget which
+branch you happen to be on, a simple
ls -l .git/HEAD
will tell you where it's pointing.
+NOTE! Sometimes you may wish to create a new branch _without_ actually
+checking it out and switching to it. If so, just use the command
+
+ git branch <branchname> [startingpoint]
+
+which will simply _create_ the branch, but will not do anything further.
+You can then later - once you decide that you want to actually develop
+on that branch - switch to that branch with a regular "git checkout"
+with the branchname as the argument.
+
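The difference between creating a branch and switching to it can be checked with a short sketch like this one (throwaway repository, made-up identity; "git rev-parse --abbrev-ref HEAD" is used here as a stand-in for looking at .git/HEAD by hand):

```shell
# Identity for the throwaway commit (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Creating the branch leaves us on the branch we started on.
before=$(git rev-parse --abbrev-ref HEAD)
git branch mybranch
after=$(git rev-parse --abbrev-ref HEAD)

# Only an explicit checkout switches to it.
git checkout -q mybranch
now=$(git rev-parse --abbrev-ref HEAD)
echo "was on $before, still on $after, now on $now"
```
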
Merging two branches
--------------------
that branch, and do some work there.
git checkout mybranch
- echo "Work, work, work" >> a
- git commit a
+ echo "Work, work, work" >>hello
+ git commit hello
-Here, we just added another line to "a", and we used a shorthand for
-both going a "git-update-cache a" and "git commit" by just giving the
-filename directly to "git commit".
+Here, we just added another line to "hello", and we used a shorthand for
+both doing a "git-update-cache hello" and "git commit" by just giving the
+filename directly to "git commit".
Now, to make it a bit more interesting, let's assume that somebody else
does some work in the original branch, and simulate that by going back
git checkout master
-Here, take a moment to look at the contents of "a", and notice how they
+Here, take a moment to look at the contents of "hello", and notice how they
don't contain the work we just did in "mybranch" - because that work
hasn't happened in the "master" branch at all. Then do
- echo "Play, play, play" >> a
- echo "Lots of fun" >> b
- git commit a b
+ echo "Play, play, play" >>hello
+ echo "Lots of fun" >>example
+ git commit hello example
since the master branch is obviously in a much better mood.
file, which had no differences in the "mybranch" branch), and say:
Simple merge failed, trying Automatic merge
- Auto-merging a.
+ Auto-merging hello.
merge: warning: conflicts during merge
- ERROR: Merge conflict in a.
+ ERROR: Merge conflict in hello.
fatal: merge program failed
Automatic merge failed, fix up by hand
which is way too verbose, but it basically tells you that it failed the
really trivial merge ("Simple merge") and did an "Automatic merge"
-instead, but that too failed due to conflicts in "a".
+instead, but that too failed due to conflicts in "hello".
-Not to worry. It left the (trivial) conflict in "a" in the same form you
+Not to worry. It left the (trivial) conflict in "hello" in the same form you
should already be well used to if you've ever used CVS, so let's just
-open "a" in our editor (whatever that may be), and fix it up somehow.
-I'd suggest just making it so that "a" contains all four lines:
+open "hello" in our editor (whatever that may be), and fix it up somehow.
+I'd suggest just making it so that "hello" contains all four lines:
Hello World
It's a new day for git
and once you're happy with your manual merge, just do a
- git commit a
+ git commit hello
which will very loudly warn you that you're now committing a merge
(which is correct, so never mind), and you can write a small merge
message about your adventures in git-merge-land.
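The whole conflict-and-fix-up cycle can be rehearsed in a throwaway repository. The sketch below makes a few assumptions that are not in this tutorial: it uses "git merge" to start the merge, a made-up identity, and "git commit -a" to conclude the merge, since later git versions refuse a path-limited commit in the middle of a merge.

```shell
# Identity for the throwaway commits (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
echo "It's a new day for git" >>hello
git update-index --add hello
git commit -q -m "Initial commit"
main=$(git rev-parse --abbrev-ref HEAD)

# Work on a side branch...
git checkout -q -b mybranch
echo "Work, work, work" >>hello
git commit -q -a -m "Some work"

# ...while the original branch changes the same spot in the file.
git checkout -q "$main"
echo "Play, play, play" >>hello
git commit -q -a -m "Some fun"

# The automatic merge cannot decide between the two new lines.
git merge mybranch >/dev/null 2>&1 || echo "conflict, fix up by hand"

# Resolve by keeping all four lines, then commit the merge.
printf '%s\n' "Hello World" "It's a new day for git" \
    "Play, play, play" "Work, work, work" >hello
git commit -q -a -m "Merge mybranch"

# One commit name followed by its two parents.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
```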
After you're done, start up "gitk --all" to see graphically what the
-history looks like. Notive that "mybranch" still exists, and you can
+history looks like. Notice that "mybranch" still exists, and you can
switch to it, and continue to work with it if you want to. The
"mybranch" branch will not contain the merge, but next time you merge it
from the "master" branch, git will know how you merged it, so you'll not
GIT URL
git://remote.machine/path/to/repo.git/
+
+ SSH URL
remote.machine:/path/to/repo.git/
Local directory
/path/to/repo.git/
-[ Side Note: currently, HTTP transport is slightly broken in
- that when the remote repository is "packed" they do not always
- work. But we have not talked about packing repository yet, so
- let's not worry too much about it for now. ]
-
[ Digression: you could do without using any branches at all, by
keeping as many local repositories as you would like to have
branches, and merging between them with "git pull", just like
course, you will pay the price of more disk usage to hold
multiple working trees, but disk space is cheap these days. ]
+It is likely that you will be pulling from the same remote
+repository from time to time. As a shorthand, you can store
+the remote repository URL in a file under the .git/branches/
+directory, like this:
+
+ mkdir -p .git/branches
+ echo rsync://kernel.org/pub/scm/git/git.git/ \
+ >.git/branches/linus
+
+and use the filename to "git pull" instead of the full URL.
+The contents of a file under .git/branches can even be a prefix
+of a full URL, like this:
+
+ echo rsync://kernel.org/pub/.../jgarzik/ \
+ >.git/branches/jgarzik
+
+Examples.
+
+ (1) git pull linus
+ (2) git pull linus tag v0.99.1
+ (3) git pull jgarzik/netdev-2.6.git/ e100
+
+The above are equivalent to:
+
+ (1) git pull rsync://kernel.org/pub/scm/git/git.git/ HEAD
+ (2) git pull rsync://kernel.org/pub/scm/git/git.git/ tag v0.99.1
+ (3) git pull rsync://kernel.org/pub/.../jgarzik/netdev-2.6.git/ e100
+
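The expansion described above can be pictured with a small shell helper. This is purely illustrative: "expand_remote" is a hypothetical function, not part of git and not how git implements the lookup, and the example.org URLs and the names "lead" and "alice" are made up.

```shell
# Hypothetical helper: expand "name" or "name/rest" using the file
# .git/branches/<name>, mimicking the shorthand described above.
expand_remote() {
    name=${1%%/*}          # text before the first "/"
    rest=${1#"$name"}      # remainder, including its leading "/"
    prefix=$(cat ".git/branches/$name") || return 1
    if [ -z "$rest" ]; then
        printf '%s\n' "$prefix"
    else
        # Avoid a doubled "/" where the prefix and the remainder meet.
        printf '%s%s\n' "${prefix%/}" "$rest"
    fi
}

cd "$(mktemp -d)"
mkdir -p .git/branches
echo rsync://example.org/pub/scm/project.git/ >.git/branches/lead
echo rsync://example.org/pub/scm/people/alice/ >.git/branches/alice

full=$(expand_remote lead)
sub=$(expand_remote alice/netdev.git/)
echo "$full"
echo "$sub"
```

The first call yields the stored URL unchanged; the second appends the rest of the argument to the stored prefix.
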
Publishing your work
--------------------
You can try running "find .git/objects -type f" before and after
you run "git prune-packed" if you are curious.
-[ Side Note: as we already mentioned, "git pull" is broken for
- some transports dealing with packed repositories right now, so
- do not run "git prune-packed" if you plan to give "git pull"
- access via HTTP transport for now. ]
+[ Side Note: "git pull" is slightly cumbersome for HTTP transport,
+ as a packed repository may contain relatively few objects in a
+ relatively large pack. If you expect many HTTP pulls from your
+ public repository you might want to repack & prune often, or
+ never. ]
If you run "git repack" again at this point, it will say
"Nothing to pack". Once you continue your development and
while, depending on how active your project is.
When a repository is synchronized via "git push" and "git pull",
-objects packed in the source repository is usually stored
+objects packed in the source repository are usually stored
unpacked in the destination, unless rsync transport is used.
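A small sketch of the before-and-after object counts mentioned above, in a throwaway repository with a made-up identity. It counts only the loose object files, leaving out the "pack" and "info" subdirectories.

```shell
# Identity for the throwaway commit (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"
git init -q
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"

# Count loose (unpacked) object files.
count_loose() {
    find .git/objects -type f ! -path '*/pack/*' ! -path '*/info/*' | wc -l
}

loose_before=$(count_loose)

git repack -q        # copy the loose objects into a pack
git prune-packed     # remove loose objects that are now in a pack

loose_after=$(count_loose)
packs=$(find .git/objects/pack -name '*.pack' | wc -l)
echo "loose before: $loose_before, after: $loose_after, packs: $packs"
```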
Working with Others
-------------------
-A recommended work cycle for a "project lead" is like this:
+Although git is a truly distributed system, it is often
+convenient to organize your project with an informal hierarchy
+of developers. Linux kernel development is run this way. There
+is a nice illustration (page 17, "Merges to Mainline") in Randy
+Dunlap's presentation (http://tinyurl.com/a2jdg).
+
+It should be stressed that this hierarchy is purely "informal".
+There is nothing fundamental in git that enforces the "chain of
+patch flow" this hierarchy implies. You do not have to pull
+from only one remote repository.
+
+
+A recommended workflow for a "project lead" goes like this:
(1) Prepare your primary repository on your local machine. Your
work is done there.
repository.
(4) "git repack" the public repository. This establishes a big
- pack that contains the initial set of objects.
+ pack that contains the initial set of objects as the
+ baseline. Possibly also run "git prune-packed" if the
+ transport used for pulling from your repository supports
+ packed repositories.
- (5) Keep working in your primary repository, and push your
- changes to the public repository. Your changes include
- your own, patches you receive via e-mail, and merge resulting
- from pulling the "public" repositories of your "subsystem
- maintainers".
+ (5) Keep working in your primary repository. Your changes
+ include modifications of your own, patches you receive via
+ e-mail, and merges resulting from pulling the "public"
+ repositories of your "subsystem maintainers".
You can repack this private repository whenever you feel
like.
- (6) Every once in a while, "git repack" the public repository.
+ (6) Push your changes to the public repository, and announce it
+ to the public.
+
+ (7) Every once in a while, "git repack" the public repository.
Go back to step (5) and continue working.
-A recommended work cycle for a "subsystem maintainer" that
-works on that project and has own "public repository" is like
-this:
+
+A recommended work cycle for a "subsystem maintainer" who works
+on that project and has their own "public repository" goes like this:
(1) Prepare your work repository, by "git clone" the public
- repository of the "project lead".
+ repository of the "project lead". The URL used for the
+ initial cloning is stored in .git/branches/origin.
(2) Prepare a public repository accessible to others.
(3) Copy over the packed files from "project lead" public
- repository to your public repository by hand; this part is
- currently not automated.
+ repository to your public repository by hand; preferably
+ use rsync for that task.
(4) Push into the public repository from your primary
- repository.
+ repository. Run "git repack", and possibly "git
+ prune-packed" if the transport used for pulling from your
+ repository supports packed repositories.
- (5) Keep working in your primary repository, and push your
- changes to your public repository, and ask your "project
- lead" to pull from it. Your changes include your own,
- patches you receive via e-mail, and merge resulting from
- pulling the "public" repositories of your "project lead"
- and possibly your "sub-subsystem maintainers".
+ (5) Keep working in your primary repository. Your changes
+ include modifications of your own, patches you receive via
+ e-mail, and merges resulting from pulling the "public"
+ repositories of your "project lead" and possibly your
+ "sub-subsystem maintainers".
You can repack this private repository whenever you feel
like.
- (6) Every once in a while, "git repack" the public repository.
+ (6) Push your changes to your public repository, and ask your
+ "project lead" and possibly your "sub-subsystem
+ maintainers" to pull from it.
+
+ (7) Every once in a while, "git repack" the public repository.
Go back to step (5) and continue working.
+
A recommended work cycle for an "individual developer" who does
not have a "public" repository is somewhat different. It goes
like this:
- (1) Prepare your work repositories, by "git clone" the public
- repository of the "project lead" (or "subsystem
- maintainer", if you work on a subsystem).
-
- (2) Copy .git/refs/master to .git/refs/upstream.
-
- (3) Do your work there. Make commits.
+ (1) Prepare your work repository, by "git clone" the public
+ repository of the "project lead" (or a "subsystem
+ maintainer", if you work on a subsystem). The URL used for
+ the initial cloning is stored in .git/branches/origin.
- (4) Run "git fetch" from the public repository of your upstream
- every once in a while. This does only the first half of
- "git pull" but does not merge. The head of the public
- repository is stored in .git/FETCH_HEAD. Copy it in
- .git/refs/heads/upstream.
+ (2) Do your work there. Make commits.
- (5) Use "git cherry" to see which ones of your patches were
- accepted, and/or use "git rebase" to port your unmerged
- changes forward to the updated upstream.
+ (3) Run "git fetch origin" from the public repository of your
+ upstream every once in a while. This does only the first
+ half of "git pull" but does not merge. The head of the
+ public repository is stored in .git/refs/heads/origin.
- (6) Use "git format-patch upstream" to prepare patches for
- e-mail submission to your upstream and send it out.
- Go back to step (3) and continue.
+ (4) Use "git cherry origin" to see which ones of your patches
+ were accepted, and/or use "git rebase origin" to port your
+ unmerged changes forward to the updated upstream.
-[Side Note: I think Cogito calls this upstream "origin".
- Somebody care to confirm or deny? ]
+ (5) Use "git format-patch origin" to prepare patches for e-mail
+ submission to your upstream and send them out. Go back to
+ step (2) and continue.
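The individual developer's cycle can be rehearsed locally. The sketch below rests on several assumptions: a local directory stands in for the upstream public repository, the identity is made up, and it uses a later git in which a clone tracks upstream as "origin/<branch>" rather than in .git/refs/heads/origin.

```shell
# Identity for the throwaway commits (made up for this demo).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

cd "$(mktemp -d)"

# Stand-in for the upstream public repository.
git init -q upstream
cd upstream
echo "Hello World" >hello
git update-index --add hello
git commit -q -m "Initial commit"
branch=$(git rev-parse --abbrev-ref HEAD)
cd ..

# (1) Clone it; the clone remembers where it came from as "origin".
git clone -q upstream work
cd work

# (2) Do some work and commit it.
echo "My change" >>hello
git commit -q -a -m "Add my change"

# (3) Fetch from upstream: the first half of a pull, without the merge.
git fetch -q origin

# (4) List local commits that upstream does not have yet; each such
#     commit is printed on a line starting with "+".
pending=$(git cherry "origin/$branch")
echo "$pending"

# (5) Write those commits out as patch files for e-mail submission.
patches=$(git format-patch "origin/$branch")
echo "$patches"
```

With a single local commit, step (5) should produce one file named 0001-Add-my-change.patch.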
[ to be continued.. cvsimports ]