A short git tutorial
====================
v0.99.5, Aug 2005

Introduction
------------

This is trying to be a short tutorial on setting up and using a git
repository, mainly because being hands-on and using explicit examples is
often the best way of explaining what is going on.

In normal life, most people wouldn't use the "core" git programs
directly, but rather script around them to make them more palatable.
Understanding the core git stuff may help some people get those scripts
done, though, and it may also be instructive in helping people
understand what it is that the higher-level helper scripts are actually
doing.

The core git is often called "plumbing", with the prettier user
interfaces on top of it called "porcelain". You may not want to use the
plumbing directly very often, but it can be good to know what the
plumbing does for when the porcelain isn't flushing...


Creating a git repository
-------------------------

Creating a new git repository couldn't be easier: all git repositories start
out empty, and the only thing you need to do is find yourself a
subdirectory that you want to use as a working tree - either an empty
one for a totally new project, or an existing working tree that you want
to import into git.

For our first example, we're going to start a totally new repository from
scratch, with no pre-existing files, and we'll call it `git-tutorial`.
To start up, create a subdirectory for it, change into that
subdirectory, and initialize the git infrastructure with `git-init-db`:

------------------------------------------------
mkdir git-tutorial
cd git-tutorial
git-init-db
------------------------------------------------

to which git will reply

        defaulting to local storage area

which is just git's way of saying that you haven't been doing anything
strange, and that it will have created a local `.git` directory set up for
your new project. You will now have a `.git` directory, and you can
inspect that with `ls`. For your new empty project, it should show you
three entries, among other things:

 - a symlink called `HEAD`, pointing to `refs/heads/master`
+
Don't worry about the fact that the file that the `HEAD` link points to
doesn't even exist yet -- you haven't created the commit that will
start your `HEAD` development branch yet.

 - a subdirectory called `objects`, which will contain all the
   objects of your project. You should never have any real reason to
   look at the objects directly, but you might want to know that these
   objects are what contain all the real 'data' in your repository.

 - a subdirectory called `refs`, which contains references to objects.

In particular, the `refs` subdirectory will contain two other
subdirectories, named `heads` and `tags` respectively. They do
exactly what their names imply: they contain references to any number
of different 'heads' of development (aka 'branches'), and to any
'tags' that you have created to name specific versions in your
repository.

One note: the special `master` head is the default branch, which is
why the `.git/HEAD` file was created as a symlink to it even though it
doesn't yet exist. Basically, the `HEAD` link is supposed to always
point to the branch you are working on right now, and you always
start out expecting to work on the `master` branch.

However, this is only a convention, and you can name your branches
anything you want, and don't have to ever even 'have' a `master`
branch. A number of the git tools will assume that `.git/HEAD` is
valid, though.
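
The layout just described can be reproduced by hand, which may make it
easier to picture. The following is only a sketch in POSIX shell. It is
not the code of `git-init-db` itself, which may create more than these
entries, and the function name `init_layout` is made up for illustration:

------------------------------------------------
# init_layout: hypothetical helper that creates the skeleton described
# above in directory $1 (default: the current directory).
init_layout() {
	top=${1:-.}
	# the object database and the two kinds of references
	mkdir -p "$top/.git/objects" "$top/.git/refs/heads" \
		"$top/.git/refs/tags" || return 1
	# HEAD is a symlink to the master branch file, which does not
	# exist until the first commit is made
	ln -s refs/heads/master "$top/.git/HEAD"
}
------------------------------------------------

Afterwards, `readlink .git/HEAD` prints `refs/heads/master` even though
that file does not exist yet.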

[NOTE]
An 'object' is identified by its 160-bit SHA1 hash, aka 'object name',
and a reference to an object is always the 40-byte hex
representation of that SHA1 name. The files in the `refs`
subdirectory are expected to contain these hex references
(usually with a final `\'\n\'` at the end), and you should thus
expect to see a number of 41-byte files containing these
references in these `refs` subdirectories when you actually start
populating your tree.
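
As a rough illustration of that rule, here is a sketch of a checker
for a single reference file. It assumes the strict form: exactly 40
lowercase hex digits followed by one newline. `is_ref_file` is a
hypothetical name, not a git command:

------------------------------------------------
# is_ref_file: succeed if file $1 holds exactly one 40-digit hex
# object name followed by a newline (41 bytes in total).
is_ref_file() {
	# byte count, with any padding from wc stripped
	bytes=$(wc -c < "$1" | tr -d ' ')
	[ "$bytes" -eq 41 ] || return 1
	# the first line must be nothing but 40 hex digits
	head -n 1 "$1" | grep -Eq '^[0-9a-f]{40}$'
}
------------------------------------------------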

You have now created your first git repository. Of course, since it's
empty, that's not very useful, so let's start populating it with data.


Populating a git repository
---------------------------

We'll keep this simple and stupid, so we'll start off with populating a
few trivial files just to get a feel for it.

Start off with just creating any random files that you want to maintain
in your git repository. We'll start off with a few bad examples, just to
get a feel for how this works:

------------------------------------------------
echo "Hello World" >hello
echo "Silly example" >example
------------------------------------------------

you have now created two files in your working tree (aka 'working
directory'), but to actually check in your hard work, you will have to
go through two steps:

 - fill in the 'index' file (aka 'cache') with the information about your
   working tree state.

 - commit that index file as an object.

The first step is trivial: when you want to tell git about any changes
to your working tree, you use the `git-update-cache` program. That
program normally just takes a list of filenames you want to update, but
to avoid trivial mistakes, it refuses to add new entries to the cache
(or remove existing ones) unless you explicitly tell it that you're
adding a new entry with the `\--add` flag (or removing an entry with the
`\--remove` flag).

So to populate the index with the two files you just created, you can do

------------------------------------------------
git-update-cache --add hello example
------------------------------------------------

and you have now told git to track those two files.

In fact, as you did that, if you now look into your object directory,
you'll notice that git will have added two new objects to the object
database. If you did exactly the steps above, you should now be able to do

        ls .git/objects/??/*

and see two files:

        .git/objects/55/7db03de997c86a4a028e1ebd3a1ceb225be238
        .git/objects/f2/4c74a2e500f5ee1332c86b94199f52b1d1d962

which correspond to the objects with names 557db\... and f24c7\...
respectively.
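
The mapping from object name to file is mechanical. The first two hex
digits name a subdirectory of `.git/objects`, and the remaining 38 name
the file inside it. Here is a small sketch of that mapping; the function
`object_path` is hypothetical:

------------------------------------------------
# object_path: print the path under .git/objects that holds the
# object whose 40-digit hex name is $1.
object_path() {
	dir=$(printf '%s' "$1" | cut -c1-2)    # e.g. "55"
	file=$(printf '%s' "$1" | cut -c3-)    # the other 38 digits
	printf '.git/objects/%s/%s\n' "$dir" "$file"
}
------------------------------------------------

For example, `object_path 557db03de997c86a4a028e1ebd3a1ceb225be238`
prints the first of the two paths listed above.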

If you want to, you can use `git-cat-file` to look at those objects, but
you'll have to use the object name, not the filename of the object:

        git-cat-file -t 557db03de997c86a4a028e1ebd3a1ceb225be238

where the `-t` tells `git-cat-file` to tell you what the "type" of the
object is. Git will tell you that you have a "blob" object (i.e. just a
regular file), and you can see the contents with

        git-cat-file "blob" 557db03

which will print out "Hello World". The object 557db03 is nothing
more than the contents of your file `hello`.

[NOTE]
Don't confuse that object with the file `hello` itself. The
object is literally just those specific *contents* of the file, and
however much you later change the contents in file `hello`, the object
we just looked at will never change. Objects are immutable.
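
Because an object stands for exact contents, its name can be computed
from those contents alone. In git's object format, a blob's name is the
SHA1 of the header `blob <size>` plus a NUL byte, followed by the raw
file contents. The stored file is additionally compressed, which does
not affect the name. The sketch below assumes GNU coreutils' `sha1sum`,
and `blob_name` is a made-up helper, not a git command:

------------------------------------------------
# blob_name: print the object name git would give the contents of
# file $1 when stored as a "blob".
blob_name() {
	size=$(wc -c < "$1" | tr -d ' ')      # length in bytes
	# header, NUL byte, then the contents, all hashed together
	{ printf 'blob %s\0' "$size"; cat "$1"; } |
		sha1sum | cut -d' ' -f1
}
------------------------------------------------

Run on the `hello` and `example` files created earlier, it should print
`557db03de997c86a4a028e1ebd3a1ceb225be238` and
`f24c74a2e500f5ee1332c86b94199f52b1d1d962`. These are the same names
that appeared under `.git/objects`.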

[NOTE]
The second `git-cat-file` example demonstrates that you can
abbreviate the object name to only the first several
hexadecimal digits in most places.
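
One way to picture how an abbreviation can be resolved is to search the
object directory for names starting with the given digits. The result
is accepted only if exactly one object matches. This sketch works under
that assumption; git's own rules (for instance the minimum length it
accepts) may differ, and `expand_name` is hypothetical:

------------------------------------------------
# expand_name: expand abbreviated object name $2 by searching objects
# directory $1; fail if nothing or more than one object matches.
expand_name() {
	prefix=$(printf '%s' "$2" | cut -c1-2)   # subdirectory name
	rest=$(printf '%s' "$2" | cut -c3-)      # start of the file name
	found=
	for f in "$1/$prefix/$rest"*; do
		[ -e "$f" ] || continue              # pattern matched nothing
		[ -z "$found" ] || return 1          # ambiguous abbreviation
		found=$prefix${f##*/}
	done
	[ -n "$found" ] || return 1              # no such object
	printf '%s\n' "$found"
}
------------------------------------------------

With the two objects above, `expand_name .git/objects 557db03` prints the
full name `557db03de997c86a4a028e1ebd3a1ceb225be238`.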

Anyway, as we mentioned previously, you normally never actually take a
look at the objects themselves, and typing long 40-character hex
names is not something you'd normally want to do. The above digression
was just to show that `git-update-cache` did something magical, and
actually saved away the contents of your files into the git object
database.

Updating the cache did something else too: it created a `.git/index`
file. This is the index that describes your current working tree, and
something you should be very aware of. Again, you normally never worry
about the index file itself, but you should be aware of the fact that
you have not actually really "checked in" your files into git so far,
you've only *told* git about them.

However, since git knows about them, you can now start using some of the
most basic git commands to manipulate the files or look at their status.

In particular, let's not even check the two files into git yet; we'll
start off by adding another line to `hello` first:

------------------------------------------------
echo "It's a new day for git" >>hello
------------------------------------------------

and you can now, since you told git about the previous state of `hello`, ask
git what has changed in the tree compared to your old index, using the
`git-diff-files` command:

------------
git-diff-files
------------

Oops. That wasn't very readable. It just spit out its own internal
version of a `diff`, but that internal version really just tells you
that it has noticed that "hello" has been modified, and that the old object
contents it had have been replaced with something else.

To make it readable, we can tell `git-diff-files` to output the
differences as a patch, using the `-p` flag:

------------
git-diff-files -p
------------

which will spit out

------------
diff --git a/hello b/hello
--- a/hello
+++ b/hello
@@ -1 +1,2 @@
 Hello World
+It's a new day for git
------------

i.e. the diff of the change we caused by adding another line to `hello`.

In other words, `git-diff-files` always shows us the difference between
what is recorded in the index, and what is currently in the working
tree. That's very useful.

A common shorthand for `git-diff-files -p` is to just write
`git diff`, which will do the same thing.
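
Conceptually, noticing that `hello` has been modified comes down to
re-hashing the file and comparing the result with the object name the
index recorded for it. The sketch below does only that. It is an
illustration, not how `git-diff-files` is implemented, since git also
makes use of the "stat" information cached in the index. Both function
names are hypothetical, and `sha1sum` from GNU coreutils is assumed:

------------------------------------------------
# blob_name: object name of file $1's contents stored as a blob
blob_name() {
	size=$(wc -c < "$1" | tr -d ' ')
	{ printf 'blob %s\0' "$size"; cat "$1"; } |
		sha1sum | cut -d' ' -f1
}

# has_changed: succeed if file $1 no longer matches recorded name $2
has_changed() {
	[ "$(blob_name "$1")" != "$2" ]
}
------------------------------------------------

After the `echo ... >>hello` above, `has_changed hello
557db03de997c86a4a028e1ebd3a1ceb225be238` succeeds, because the file
no longer hashes to the name recorded by `git-update-cache`.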


Committing git state
--------------------

Now, we want to go to the next stage in git, which is to take the files
that git knows about in the index, and commit them as a real tree. We do
that in two phases: creating a 'tree' object, and committing that 'tree'
object as a 'commit' object together with an explanation of what the
tree was all about, along with information of how we came to that state.

Creating a tree object is trivial, and is done with `git-write-tree`.
There are no options or other input: `git-write-tree` will take the
current index state, and write an object that describes that whole
index. In other words, we're now tying together all the different
filenames with their contents (and their permissions), and we're
creating the equivalent of a git "directory" object:

------------------------------------------------
git-write-tree
------------------------------------------------

and this will just output the name of the resulting tree, in this case
(if you have done exactly as I've described) it should be

        8988da15d077d4829fc51d8544c097def6644dbb

which is another incomprehensible object name. Again, if you want to,
you can use `git-cat-file -t 8988d\...` to see that this time the object
is not a "blob" object, but a "tree" object (you can also use
`git-cat-file` to actually output the raw object contents, but you'll see
mainly a binary mess, so that's less interesting).
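
To make "tying together filenames with their contents (and their
permissions)" concrete, the following sketch builds the same tree by
hand. It assumes git's tree entry format:

 - the mode in octal (`100644` for a regular file), a space, the file
   name, and a NUL byte;
 - then the 20 raw bytes of the blob's SHA1;
 - entries sorted by name, with the whole thing hashed behind a
   `tree <size>` header.

Only flat trees of regular files are handled. `hex_to_bin` and
`tree_name` are hypothetical helpers, and `sha1sum` is assumed to be
available:

------------------------------------------------
# hex_to_bin: write the raw bytes spelled by hex string $1
hex_to_bin() {
	h=$1
	while [ -n "$h" ]; do
		pair=$(printf '%s' "$h" | cut -c1-2)
		h=$(printf '%s' "$h" | cut -c3-)
		# hex pair -> octal escape -> one raw byte
		printf '%b' "\\0$(printf '%03o' "0x$pair")"
	done
}

# tree_name: arguments are <filename> <blob-name> pairs, already sorted
# by filename; every entry is recorded as a regular 100644 file.
tree_name() {
	body=$(mktemp) || return 1
	while [ $# -ge 2 ]; do
		printf '100644 %s\0' "$1" >> "$body"
		hex_to_bin "$2" >> "$body"
		shift 2
	done
	size=$(wc -c < "$body" | tr -d ' ')
	{ printf 'tree %s\0' "$size"; cat "$body"; } |
		sha1sum | cut -d' ' -f1
	rm -f "$body"
}
------------------------------------------------

Called as `tree_name example f24c74a2e500f5ee1332c86b94199f52b1d1d962
hello 557db03de997c86a4a028e1ebd3a1ceb225be238`, it should reproduce the
`8988da15d077d4829fc51d8544c097def6644dbb` printed by `git-write-tree`.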

However -- normally you'd never use `git-write-tree` on its own, because
normally you always commit a tree into a commit object using the
`git-commit-tree` command. In fact, it's easier to not actually use
`git-write-tree` on its own at all, but to just pass its result in as an
argument to `git-commit-tree`.

`git-commit-tree` normally takes several arguments -- it wants to know
what the 'parent' of a commit was, but since this is the first commit
ever in this new repository, and it has no parents, we only need to pass in
the object name of the tree. However, `git-commit-tree` also wants to
get a commit message on its standard input, and it will write out the
resulting object name for the commit to its standard output.

And this is where we start using the `.git/HEAD` file. The `HEAD` file is
supposed to contain the reference to the top-of-tree, and since that's
exactly what `git-commit-tree` spits out, we can do this all with a simple
shell pipeline:

------------------------------------------------
echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD
------------------------------------------------

which will say:

        Committing initial tree 8988da15d077d4829fc51d8544c097def6644dbb

just to warn you about the fact that it created a totally new commit
that is not related to anything else. Normally you do this only *once*
for a project ever, and all later commits will be parented on top of an
earlier commit, and you'll never see this "Committing initial tree"
message ever again.

Again, normally you'd never actually do this by hand. There is a
helpful script called `git commit` that will do all of this for you. So
you could have just written `git commit`
instead, and it would have done the above magic scripting for you.


Making a change
---------------

Remember how we did the `git-update-cache` on file `hello` and then we
changed `hello` afterward, and could compare the new state of `hello` with the
state we saved in the index file?

Further, remember how I said that `git-write-tree` writes the contents
of the *index* file to the tree, and thus what we just committed was in
fact the *original* contents of the file `hello`, not the new ones. We did
that on purpose, to show the difference between the index state, and the
state in the working tree, and how they don't have to match, even
when we commit things.

As before, if we do `git-diff-files -p` in our git-tutorial project,
we'll still see the same difference we saw last time: the index file
hasn't changed by the act of committing anything. However, now that we
have committed something, we can also learn to use a new command:
`git-diff-cache`.

Unlike `git-diff-files`, which showed the difference between the index
file and the working tree, `git-diff-cache` shows the differences
between a committed *tree* and either the index file or the working
tree. In other words, `git-diff-cache` wants a tree to be diffed
against, and before we did the commit, we couldn't do that, because we
didn't have anything to diff against.

But now we can do

        git-diff-cache -p HEAD

(where `-p` has the same meaning as it did in `git-diff-files`), and it
will show us the same difference, but for a totally different reason.
Now we're comparing the working tree not against the index file,
but against the tree we just wrote. It just so happens that those two
are obviously the same, so we get the same result.

Again, because this is a common operation, you can also just shorthand
it with

        git diff HEAD

which ends up doing the above for you.

In other words, `git-diff-cache` normally compares a tree against the
working tree, but when given the `\--cached` flag, it is told to
instead compare against just the index cache contents, and ignore the
current working tree state entirely. Since we just wrote the index
file to HEAD, doing `git-diff-cache \--cached -p HEAD` should thus return
an empty set of differences, and that's exactly what it does.

[NOTE]
================
`git-diff-cache` really always uses the index for its
comparisons, and saying that it compares a tree against the working
tree is thus not strictly accurate. In particular, the list of
files to compare (the "meta-data") *always* comes from the index file,
regardless of whether the `\--cached` flag is used or not. The `\--cached`
flag really only determines whether the file *contents* to be compared
come from the working tree or not.

This is not hard to understand, as soon as you realize that git simply
never knows (or cares) about files that it is not told about
explicitly. Git will never go *looking* for files to compare; it
expects you to tell it what the files are, and that's what the index
is there for.
================

However, our next step is to commit the *change* we did, and again, to
understand what's going on, keep in mind the difference between "working
tree contents", "index file" and "committed tree". We have changes
in the working tree that we want to commit, and we always have to
work through the index file, so the first thing we need to do is to
update the index cache:

------------------------------------------------
git-update-cache hello
------------------------------------------------

(note how we didn't need the `\--add` flag this time, since git knew
about the file already).

Note what happens to the different `git-diff-\*` versions here. After
we've updated `hello` in the index, `git-diff-files -p` now shows no
differences, but `git-diff-cache -p HEAD` still *does* show that the
current state is different from the state we committed. In fact, now
`git-diff-cache` shows the same difference whether we use the `\--cached`
flag or not, since now the index is coherent with the working tree.

Now, since we've updated `hello` in the index, we can commit the new
version. We could do it by writing the tree by hand again, and
committing the tree (this time we'd have to use the `-p HEAD` flag to
tell `git-commit-tree` that `HEAD` was the *parent* of the new commit, and that
this wasn't an initial commit any more), but you've done that once
already, so let's just use the helpful script this time:

------------------------------------------------
git commit
------------------------------------------------

which starts an editor for you to write the commit message and tells you
a bit about what you have done.

Write whatever message you want, and all the lines that start with '#'
will be pruned out, and the rest will be used as the commit message for
the change. If you decide you don't want to commit anything after all at
this point (you can continue to edit things and update the cache), you
can just leave an empty message. Otherwise `git commit` will commit
the change for you.

You've now made your first real git commit. And if you're interested in
looking at what `git commit` really does, feel free to investigate:
it's a few very simple shell scripts to generate the helpful (?) commit
message headers, and a few one-liners that actually do the
commit itself (`git-commit-script`).


Checking it out
---------------

While creating changes is useful, it's even more useful if you can tell
later what changed. The most useful command for this is another of the
`diff` family, namely `git-diff-tree`.

`git-diff-tree` can be given two arbitrary trees, and it will tell you the
differences between them. Perhaps even more commonly, though, you can
give it just a single commit object, and it will figure out the parent
of that commit itself, and show the difference directly. Thus, to get
the same diff that we've already seen several times, we can now do

        git-diff-tree -p HEAD

(again, `-p` means to show the difference as a human-readable patch),
and it will show what the last commit (in `HEAD`) actually changed.

More interestingly, you can also give `git-diff-tree` the `-v` flag, which
tells it to also show the commit message and author and date of the
commit, and you can tell it to show a whole series of diffs.
Alternatively, you can tell it to be "silent", and not show the diffs at
all, but just show the actual commit message.

In fact, together with the `git-rev-list` program (which generates a
list of revisions), `git-diff-tree` ends up being a veritable fount of
changes. A trivial (but very useful) script called `git-whatchanged`,
included with git, does exactly this, and shows a log of recent
activities.

To see the whole history of our pitiful little git-tutorial project, you
can do

        git log

which shows just the log messages, or if we want to see the log together
with the associated patches use the more complex (and much more
powerful)

        git-whatchanged -p --root

and you will see exactly what has changed in the repository over its
short history.

[NOTE]
The `\--root` flag is a flag to `git-diff-tree` to tell it to
show the initial aka 'root' commit too. Normally you'd probably not
want to see the initial import diff, but since the tutorial project
was started from scratch and is so small, we use it to make the result
a bit more interesting.

With that, you should now have some inkling of what git does, and
can explore on your own.

[NOTE]
Most likely, you are not directly using the core
git Plumbing commands, but using Porcelain like Cogito on top
of it. Cogito works a bit differently and you usually do not
have to run `git-update-cache` yourself for changed files (you
do tell the underlying git about additions and removals via
`cg-add` and `cg-rm` commands). Just before you make a commit
with `cg-commit`, Cogito figures out which files you modified,
and runs `git-update-cache` on them for you.


Tagging a version
-----------------

In git, there are two kinds of tags: a "light" one, and an "annotated tag".

A "light" tag is technically nothing more than a branch, except we put
it in the `.git/refs/tags/` subdirectory instead of calling it a `head`.
So the simplest form of tag involves nothing more than

------------------------------------------------
git tag my-first-tag
------------------------------------------------

which just writes the current `HEAD` into the `.git/refs/tags/my-first-tag`
file, after which point you can then use this symbolic name for that
particular state. You can, for example, do

        git diff my-first-tag

to diff your current state against that tag (which at this point will
obviously be an empty diff, but if you continue to develop and commit
stuff, you can use your tag as an "anchor-point" to see what has changed
since you tagged it).
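
Since a light tag is just a file holding an object name, its effect can
be sketched with ordinary shell commands. This assumes the layout
described earlier, with `.git/HEAD` a symlink to the current branch
file. `light_tag` is a hypothetical helper, and the real `git tag`
script is not reproduced here:

------------------------------------------------
# light_tag: record the commit HEAD currently points at under the
# tag name $1, run from the top of the working tree.
light_tag() {
	mkdir -p .git/refs/tags || return 1
	# reading .git/HEAD follows the symlink to the branch file,
	# which holds the 40-digit name of the current commit
	cat .git/HEAD > ".git/refs/tags/$1"
}
------------------------------------------------

After `light_tag my-first-tag`, the file `.git/refs/tags/my-first-tag`
holds the same object name as the current branch head.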

An "annotated tag" is actually a real git object, and contains not only a
pointer to the state you want to tag, but also a small tag name and
message, along with optionally a PGP signature that says that yes,
you really did that tag. You create these annotated tags with either
the `-a` or `-s` flag to `git tag`:

        git tag -s <tagname>

which will sign the current `HEAD` (but you can also give it another
argument that specifies the thing to tag, i.e. you could have tagged the
current `mybranch` point by using `git tag <tagname> mybranch`).

You normally only do signed tags for major releases or things
like that, while the light-weight tags are useful for any marking you
want to do -- any time you decide that you want to remember a certain
point, just create a private tag for it, and you have a nice symbolic
name for the state at that point.


Copying repositories
--------------------

Git repositories are normally totally self-sufficient, and it's worth noting
that unlike CVS, for example, there is no separate notion of
"repository" and "working tree". A git repository normally *is* the
working tree, with the local git information hidden in the `.git`
subdirectory. There is nothing else. What you see is what you got.

[NOTE]
You can tell git to split the git internal information from
the directory that it tracks, but we'll ignore that for now: it's not
how normal projects work, and it's really only meant for special uses.
So the mental model of "the git information is always tied directly to
the working tree that it describes" may not be technically 100%
accurate, but it's a good model for all normal use.

This has two implications:

 - if you grow bored with the tutorial repository you created (or you've
   made a mistake and want to start all over), you can just do a simple

        rm -rf git-tutorial
+
and it will be gone. There's no external repository, and there's no
history outside the project you created.

 - if you want to move or duplicate a git repository, you can do so. There
   is a `git clone` command, but if all you want to do is just to
   create a copy of your repository (with all the full history that
   went along with it), you can do so with a regular
   `cp -a git-tutorial new-git-tutorial`.
+
Note that when you've moved or copied a git repository, your git index
file (which caches various information, notably some of the "stat"
information for the files involved) will likely need to be refreshed.
So after you do a `cp -a` to create a new copy, you'll want to do

        git-update-cache --refresh
+
in the new repository to make sure that the index file is up-to-date.

Note that the second point is true even across machines. You can
duplicate a remote git repository with *any* regular copy mechanism, be it
`scp`, `rsync` or `wget`.

When copying a remote repository, you'll want to at a minimum update the
index cache when you do this, and especially with other people's
repositories you often want to make sure that the index cache is in some
known state (you don't know *what* they've done and not yet checked in),
so usually you'll precede the `git-update-cache` with a

        git-read-tree --reset HEAD
        git-update-cache --refresh

which will force a total index re-build from the tree pointed to by `HEAD`.
It resets the index contents to `HEAD`, and then the `git-update-cache`
makes sure to match up all index entries with the checked-out files.
If the original repository had uncommitted changes in its
working tree, `git-update-cache --refresh` notices them and
tells you they need to be updated.

The above can also be written as simply

        git reset

and in fact a lot of the common git command combinations can be scripted
with the `git xyz` interfaces, and you can learn things by just looking
at what the `git-*-script` scripts do (`git reset` is the above two lines
implemented in `git-reset-script`, but some things like `git status` and
`git commit` are slightly more complex scripts around the basic git
commands).

Many (most?) public remote repositories will not contain any of
the checked out files or even an index file, and will *only* contain the
actual core git files. Such a repository usually doesn't even have the
`.git` subdirectory, but has all the git files directly in the
repository.

To create your own local live copy of such a "raw" git repository, you'd
first create your own subdirectory for the project, and then copy the
raw repository contents into the `.git` directory. For example, to
create your own copy of the git repository, you'd do the following

        mkdir my-git
        cd my-git
        rsync -rL rsync://rsync.kernel.org/pub/scm/git/git.git/ .git

followed by

        git-read-tree HEAD

to populate the index. However, now you have populated the index, and
you have all the git internal files, but you will notice that you don't
actually have any of the working tree files to work on. To get
those, you'd check them out with

        git-checkout-cache -u -a

where the `-u` flag means that you want the checkout to keep the index
up-to-date (so that you don't have to refresh it afterward), and the
`-a` flag means "check out all files" (if you have a stale copy or an
older version of a checked out tree you may also need to add the `-f`
flag first, to tell `git-checkout-cache` to *force* overwriting of any old
files).

Again, this can all be simplified with

        git clone rsync://rsync.kernel.org/pub/scm/git/git.git/ my-git
        cd my-git
        git checkout

which will end up doing all of the above for you.

You have now successfully copied somebody else's (my) remote
repository, and checked it out.


Creating a new branch
---------------------

Branches in git are really nothing more than pointers into the git
object database from within the `.git/refs/` subdirectory, and as we
already discussed, the `HEAD` branch is nothing but a symlink to one of
these object pointers.

You can at any time create a new branch by just picking an arbitrary
point in the project history, and just writing the SHA1 name of that
object into a file under `.git/refs/heads/`. You can use any filename you
want (and indeed, subdirectories), but the convention is that the
"normal" branch is called `master`. That's just a convention, though,
and nothing enforces it.
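
Following that description literally, a branch can be created without
any git command at all. Here is a minimal sketch, assuming it is run
from the top of the working tree. `make_branch` is a hypothetical
helper, and it performs none of the checks that `git branch` or
`git checkout -b` might do:

------------------------------------------------
# make_branch: create branch $1 pointing at the object named $2.
make_branch() {
	ref=.git/refs/heads/$1
	# branch names may contain subdirectories, e.g. "topic/idea"
	mkdir -p "$(dirname "$ref")" || return 1
	echo "$2" > "$ref"    # 40 hex digits plus a newline
}
------------------------------------------------

For example, `make_branch experiment $(cat .git/HEAD)` would start a
branch named `experiment` (a made-up name) at the current commit.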

To show that as an example, let's go back to the git-tutorial repository we
used earlier, and create a branch in it. You do that by simply
saying that you want to check out a new branch:

------------
git checkout -b mybranch
------------

which will create a new branch based at the current `HEAD` position, and switch
to it.

[NOTE]
================================================
If you make the decision to start your new branch at some
other point in the history than the current `HEAD`, you can do so by
just telling `git checkout` what the base of the checkout would be.
In other words, if you have an earlier tag or branch, you'd just do

        git checkout -b mybranch earlier-commit

and it would create the new branch `mybranch` at the earlier commit,
and check out the state at that time.
================================================

You can always just jump back to your original `master` branch by doing

        git checkout master

(or any other branch name, for that matter) and if you forget which
branch you happen to be on, a simple

        ls -l .git/HEAD

will tell you where it's pointing. To get the list of branches
you have, you can say

        git branch

which is nothing more than a simple script around `ls .git/refs/heads`.
There will be an asterisk in front of the branch you are currently on.
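
What such a listing script has to do can be sketched in a few lines.
It lists the files under `.git/refs/heads` and marks the one
`.git/HEAD` links to. This is an illustration rather than the actual
`git branch` script. It assumes `.git/HEAD` is a symlink such as
`refs/heads/master` and that `readlink` is available, and it ignores
branches kept in subdirectories. `list_branches` is a hypothetical name:

------------------------------------------------
# list_branches: print each branch, with "*" before the current one.
list_branches() {
	current=$(readlink .git/HEAD)      # e.g. "refs/heads/master"
	current=${current#refs/heads/}
	for ref in .git/refs/heads/*; do
		[ -f "$ref" ] || continue      # skip subdirectories, no match
		name=${ref#.git/refs/heads/}
		if [ "$name" = "$current" ]; then
			echo "* $name"
		else
			echo "  $name"
		fi
	done
}
------------------------------------------------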

Sometimes you may wish to create a new branch _without_ actually
checking it out and switching to it. If so, just use the command

        git branch <branchname> [startingpoint]

which will simply _create_ the branch, but will not do anything further.
You can then later -- once you decide that you want to actually develop
on that branch -- switch to that branch with a regular `git checkout`
with the branchname as the argument.


Merging two branches
--------------------

One of the ideas of having a branch is that you do some (possibly
experimental) work in it, and eventually merge it back to the main
branch. So assuming you created the above `mybranch` that started out
being the same as the original `master` branch, let's make sure we're in
that branch, and do some work there.

------------------------------------------------
git checkout mybranch
echo "Work, work, work" >>hello
git commit -m 'Some work.' hello
------------------------------------------------

Here, we just added another line to `hello`, and we used a shorthand for
doing both `git-update-cache hello` and `git commit` by just giving the
filename directly to `git commit`. The `-m` flag is to give the
commit log message from the command line.

Now, to make it a bit more interesting, let's assume that somebody else
does some work in the original branch, and simulate that by going back
to the master branch, and editing the same file differently there:

------------
git checkout master
------------

Here, take a moment to look at the contents of `hello`, and notice how they
don't contain the work we just did in `mybranch` -- because that work
hasn't happened in the `master` branch at all. Then do

------------
echo "Play, play, play" >>hello
echo "Lots of fun" >>example
git commit -m 'Some fun.' hello example
------------

since the master branch is obviously in a much better mood.

Now, you've got two branches, and you decide that you want to merge the
work done. Before we do that, let's introduce a cool graphical tool that
helps you view what's going on:

        gitk --all

will show you graphically both of your branches (that's what the `\--all`
means: normally it will just show you your current `HEAD`) and their
histories. You can also see exactly how they came to be from a common
source.

Anyway, let's exit `gitk` (`^Q` or the File menu), and decide that we want
to merge the work we did on the `mybranch` branch into the `master`
branch (which is currently our `HEAD` too). To do that, there's a nice
script called `git resolve`, which wants to know which branches you want
to resolve and what the merge is all about:

------------
git resolve HEAD mybranch "Merge work in mybranch"
------------

where the third argument is going to be used as the commit message if
the merge can be resolved automatically.

Now, in this case we've intentionally created a situation where the
merge will need to be fixed up by hand, though, so git will do as much
of it as it can automatically (which in this case is just merging the `example`
file, which had no differences in the `mybranch` branch), and say:

        Simple merge failed, trying Automatic merge
        Auto-merging hello.
        merge: warning: conflicts during merge
        ERROR: Merge conflict in hello.
        fatal: merge program failed
        Automatic merge failed, fix up by hand

which is way too verbose, but it basically tells you that it failed the
really trivial merge ("Simple merge") and did an "Automatic merge"
instead, but that too failed due to conflicts in `hello`.

Not to worry. It left the (trivial) conflict in `hello` in the same form you
should already be well used to if you've ever used CVS, so let's just
open `hello` in our editor (whatever that may be), and fix it up somehow.
I'd suggest just making it so that `hello` contains all four lines:

------------
Hello World
It's a new day for git
Play, play, play
Work, work, work
------------

and once you're happy with your manual merge, just do a

------------
git commit hello
------------

which will very loudly warn you that you're now committing a merge
(which is correct, so never mind), and you can write a small merge
message about your adventures in git-merge-land.
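
Before running that `git commit`, it can be worth confirming that no conflict-marker lines survived the hand edit. A minimal sketch, assuming the CVS-style markers that open and close each conflicted region; `has_conflict_markers` is a made-up helper, not part of git, and the demo writes the resolved `hello` into a scratch directory.

```shell
# has_conflict_markers FILE: succeeds if FILE still contains the marker
# lines a conflicted merge leaves behind. Hypothetical helper; a bare
# "=======" line is not checked because it can be legitimate text
# (an AsciiDoc title underline, for example).
has_conflict_markers () {
    grep -q -E '^(<<<<<<<|>>>>>>>)( |$)' "$1"
}

# Stand-in for the hand-resolved file suggested above.
dir=$(mktemp -d)
cat > "$dir/hello" <<\EOF
Hello World
It's a new day for git
Play, play, play
Work, work, work
EOF

if has_conflict_markers "$dir/hello"; then
    echo "hello still has conflict markers; fix it before committing"
else
    echo "hello looks resolved"
fi
```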

After you're done, start up `gitk --all` to see graphically what the
history looks like. Notice that `mybranch` still exists, and you can
switch to it, and continue to work with it if you want to. The
`mybranch` branch will not contain the merge, but next time you merge it
from the `master` branch, git will know how you merged it, so you won't
have to do _that_ merge again.

Another useful tool, especially if you do not always work in an X Window
environment, is `git show-branch`.

------------------------------------------------
$ git show-branch master mybranch
* [master] Merged "mybranch" changes.
 ! [mybranch] Some work.
--
+  [master] Merged "mybranch" changes.
+  [master~1] Some fun.
++ [mybranch] Some work.
------------------------------------------------

The first two lines indicate that it is showing the two branches,
along with the first line of the commit log message from their
top-of-the-tree commits, and that you are currently on the `master`
branch (notice the asterisk `*` character). In the later output
lines, the first column shows commits contained in the `master`
branch, and the second column shows those in the `mybranch`
branch. Three commits are shown along with their log messages.
All of them have plus `+` characters in the first column, which
means they are now part of the `master` branch. Only the "Some
work" commit has the plus `+` character in the second column,
because `mybranch` has not been merged to incorporate the other
two commits from the `master` branch.
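
The column rule can be made concrete with a small `awk` filter. This is a sketch that assumes the two-branch layout shown above (one marker column per branch, a space, then the commit name). `only_in_first` is a made-up name, not a git command. With more branches, the commit name would start further to the right.

```shell
# only_in_first: read the output of "git show-branch A B" on stdin and
# print the commits marked in column 1 (branch A) but not in column 2
# (branch B). Hypothetical helper; assumes exactly two branches.
only_in_first () {
    awk '
        seen_separator {
            # Any non-blank marker means "contained in that branch".
            if (substr($0, 1, 1) != " " && substr($0, 2, 1) == " ")
                print substr($0, 4)
            next
        }
        /^-+$/ { seen_separator = 1 }
    '
}

# The sample output from the text, fed in verbatim.
only_in_first <<\EOF
* [master] Merged "mybranch" changes.
 ! [mybranch] Some work.
--
+  [master] Merged "mybranch" changes.
+  [master~1] Some fun.
++ [mybranch] Some work.
EOF
```

For the sample, it prints the merge commit and "Some fun.": the two commits that `mybranch` has not picked up yet.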

Now, let's pretend you are the one who did all the work in
`mybranch`, and the fruit of your hard work has finally been merged
to the `master` branch. Let's go back to `mybranch`, and run
resolve to get the "upstream changes" back to your branch.

        git checkout mybranch
        git resolve HEAD master "Merge upstream changes."

This outputs something like this (the actual commit object names
would be different):

        Updating from ae3a2da... to a80b4aa....
         example |    1 +
         hello   |    1 +
         2 files changed, 2 insertions(+), 0 deletions(-)

Because your branch did not contain anything more than what had
already been merged into the `master` branch, the resolve operation did
not actually do a merge. Instead, it just updated the top of
the tree of your branch to that of the `master` branch. This is
often called a 'fast forward' merge.
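
Whether a merge can be done as a fast forward comes down to one question: is the tip of your branch an ancestor of the other branch's tip? Here is a minimal sketch of that test, assuming a recent git on `$PATH`. `is_fast_forward` is a hypothetical helper built on `git merge-base`. The scratch repository, the branch name `behind` and the identity are invented for the demo.

```shell
set -e
# is_fast_forward OLD NEW: succeeds when OLD is an ancestor of NEW, i.e. the
# merge base of the two is OLD itself, so moving OLD up to NEW needs no
# merge commit. Hypothetical helper built on git merge-base.
is_fast_forward () {
    [ "$(git merge-base "$1" "$2")" = "$(git rev-parse "$1")" ]
}

# Scratch repository; assumes a recent git on $PATH.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
echo 'Hello World' > hello
git add hello
git commit -q -m 'Initial commit'
git branch behind                 # "behind" stays at the first commit
echo "It's a new day for git" >> hello
git commit -q -m 'Second commit' hello

is_fast_forward behind HEAD && echo "behind -> HEAD: fast forward"
is_fast_forward HEAD behind || echo "HEAD -> behind: not a fast forward"
```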

You can run `gitk --all` again to see what the commit ancestry
looks like, or run `show-branch`, which tells you this.

------------------------------------------------
$ git show-branch master mybranch
! [master] Merged "mybranch" changes.
 * [mybranch] Merged "mybranch" changes.
--
++ [master] Merged "mybranch" changes.
------------------------------------------------


Merging external work
---------------------

It's usually much more common to merge with somebody else than
with your own branches, so it's worth pointing out that git
makes that very easy too, and that it's not that different from
doing a `git resolve`. In fact, a remote merge ends up being nothing
more than "fetch the work from a remote repository into a temporary tag"
followed by a `git resolve`.

Fetching from a remote repository is done by, unsurprisingly,
`git fetch`:

        git fetch <remote-repository>

One of the following transports can be used to name the
repository to download from:

Rsync::
        `rsync://remote.machine/path/to/repo.git/`
+
Rsync transport is usable for both uploading and downloading,
but is completely unaware of what git does, and can produce
unexpected results when you download from the public repository
while the repository owner is uploading into it via `rsync`
transport.  Most notably, it could update the files under
`refs/`, which hold the object names of the topmost commits,
before uploading the files in `objects/` -- the downloader would
then obtain a head commit object name while that object itself is
still not available in the repository.  For this reason, it is
considered deprecated.

SSH::
        `remote.machine:/path/to/repo.git/` or
+
`ssh://remote.machine/path/to/repo.git/`
+
This transport can be used for both uploading and downloading,
and requires you to have a log-in privilege over `ssh` to the
remote machine.  It finds out the set of objects the other side
lacks by exchanging the head commits both ends have and
transfers a (close to) minimal set of objects.  It is by far the
most efficient way to exchange git objects between repositories.

Local directory::
        `/path/to/repo.git/`
+
This transport is the same as SSH transport but uses `sh` to run
both ends on the local machine instead of running the other end on
the remote machine via `ssh`.

GIT Native::
        `git://remote.machine/path/to/repo.git/`
+
This transport was designed for anonymous downloading.  Like SSH
transport, it finds out the set of objects the downstream side
lacks and transfers a (close to) minimal set of objects.

HTTP(s)::
        `http://remote.machine/path/to/repo.git/`
+
HTTP and HTTPS transports are used only for downloading.  They
first obtain the topmost commit object name from the remote site
by looking at the `repo.git/info/refs` file, then try to obtain the
commit object by downloading `repo.git/objects/xx/xxx\...`
using the object name of that commit object.  Then they read the
commit object to find out its parent commits and the associated
tree object; they repeat this process until they get all the
necessary objects.  Because of this behaviour, they are
sometimes also called 'commit walkers'.
+
The 'commit walkers' are sometimes also called 'dumb
transports', because they do not require any GIT-aware smart
server like GIT Native transport does.  Any stock HTTP server
would suffice.
+
There are (confusingly enough) `git-ssh-pull` and `git-ssh-push`
programs, which are 'commit walkers'; they outlived their
usefulness when GIT Native and SSH transports were introduced,
and are not used by the `git pull` or `git push` scripts.
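
The file a commit walker asks for follows directly from the object name. A minimal sketch, assuming only the loose-object layout described above; `object_url` is a made-up helper and the object name is just an example value.

```shell
# object_url BASE SHA1: the URL a commit walker would request for a loose
# object: objects/, the first two hex digits, a slash, the other 38.
# Hypothetical helper; packed objects live elsewhere and are not covered.
object_url () {
    base=${1%/}
    dir=$(printf '%s\n' "$2" | cut -c1-2)
    file=$(printf '%s\n' "$2" | cut -c3-)
    printf '%s/objects/%s/%s\n' "$base" "$dir" "$file"
}

object_url http://remote.machine/path/to/repo.git/ \
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The call prints `http://remote.machine/path/to/repo.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391`. The walker then reads that commit to find its parents and tree, and requests each of those in the same way.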

Once you fetch from the remote repository, you `resolve` that
with your current branch.

However -- it's such a common thing to `fetch` and then
immediately `resolve`, that it's called `git pull`, and you can
simply do

        git pull <remote-repository>

and optionally give a branch-name for the remote end as a second
argument.

[NOTE]
You could do without using any branches at all, by
keeping as many local repositories as you would like to have
branches, and merging between them with `git pull`, just like
you merge between branches. The advantage of this approach is
that it lets you keep a set of files for each `branch` checked
out, and you may find it easier to switch back and forth if you
juggle multiple lines of development simultaneously. Of
course, you will pay the price of more disk usage to hold
multiple working trees, but disk space is cheap these days.

[NOTE]
You could even pull from your own repository by
giving '.' as the <remote-repository> parameter to `git pull`.

It is likely that you will be pulling from the same remote
repository from time to time. As a shorthand, you can store
the remote repository URL in a file under the `.git/remotes/`
directory, like this:

------------------------------------------------
mkdir -p .git/remotes/
cat >.git/remotes/linus <<\EOF
URL: http://www.kernel.org/pub/scm/git/git.git/
EOF
------------------------------------------------

and use the filename to `git pull` instead of the full URL.
The URL specified in such a file can even be a prefix
of a full URL, like this:

------------------------------------------------
cat >.git/remotes/jgarzik <<\EOF
URL: http://www.kernel.org/pub/scm/linux/git/jgarzik/
EOF
------------------------------------------------


Examples.

. `git pull linus`
. `git pull linus tag v0.99.1`
. `git pull jgarzik/netdev-2.6.git/ e100`

The above are equivalent to:

. `git pull http://www.kernel.org/pub/scm/git/git.git/ HEAD`
. `git pull http://www.kernel.org/pub/scm/git/git.git/ tag v0.99.1`
. `git pull http://www.kernel.org/pub/.../jgarzik/netdev-2.6.git e100`
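
The shorthand rule can be sketched in shell. The first path component of the name picks a file under `.git/remotes/`, and the rest of the name is appended to that file's `URL:` line. This is a hypothetical illustration of the behaviour described above, not git's own code. `expand_remote` is a made-up name, and the sketch assumes, as both example files do, that the stored URL ends with a slash.

```shell
# expand_remote NAME: hypothetical sketch of the .git/remotes/ shorthand.
# The first path component of NAME selects a file under .git/remotes/;
# if that file exists, its URL: line stands in for the component.
# Assumes, like both example files, that the stored URL ends with "/".
expand_remote () {
    short=${1%%/*}               # "jgarzik" from "jgarzik/netdev-2.6.git/"
    rest=${1#"$short"}           # "/netdev-2.6.git/"
    rest=${rest#/}               # "netdev-2.6.git/"
    if [ -n "$short" ] && [ -f ".git/remotes/$short" ]; then
        url=$(sed -n 's/^URL: *//p' ".git/remotes/$short" | head -n 1)
        printf '%s%s\n' "$url" "$rest"
    else
        printf '%s\n' "$1"       # not a shorthand: use the name as given
    fi
}

# Recreate the two example files in a scratch directory.
cd "$(mktemp -d)"
mkdir -p .git/remotes
echo 'URL: http://www.kernel.org/pub/scm/git/git.git/' > .git/remotes/linus
echo 'URL: http://www.kernel.org/pub/scm/linux/git/jgarzik/' > .git/remotes/jgarzik

expand_remote linus
expand_remote jgarzik/netdev-2.6.git/
```

In the scratch directory, the two calls print `http://www.kernel.org/pub/scm/git/git.git/` and `http://www.kernel.org/pub/scm/linux/git/jgarzik/netdev-2.6.git/`. A name with no matching file, such as a plain `/path/to/repo.git/`, comes back unchanged.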


Publishing your work
--------------------

So we can use somebody else's work from a remote repository; but
how can *you* prepare a repository to let other people pull from
it?

You do your real work in your working tree that has your
primary repository hanging under it as its `.git` subdirectory.
You *could* make that repository accessible remotely and ask
people to pull from it, but in practice that is not the way
things are usually done. A recommended way is to have a public
repository, make it reachable by other people, and when the
changes you made in your primary working tree are in good shape,
update the public repository from it. This is often called
'pushing'.

[NOTE]
This public repository could further be mirrored, and that is
how git repositories at `kernel.org` are managed.

Publishing the changes from your local (private) repository to
your remote (public) repository requires a write privilege on
the remote machine. You need to have an SSH account there to
run a single command, `git-receive-pack`.

First, you need to create an empty repository on the remote
machine that will house your public repository. This empty
repository will be populated and be kept up-to-date by pushing
into it later. Obviously, this repository creation needs to be
done only once.

[NOTE]
`git push` uses a pair of programs,
`git-send-pack` on your local machine, and `git-receive-pack`
on the remote machine. The communication between the two over
the network internally uses an SSH connection.

Your private repository's GIT directory is usually `.git`, but
your public repository is often named after the project,
i.e. `<project>.git`. Let's create such a public repository for
project `my-git`. After logging into the remote machine, create
an empty directory:

        mkdir my-git.git

Then, make that directory into a GIT repository by running
`git init-db`, but this time, since its name is not the usual
`.git`, we do things slightly differently:

        GIT_DIR=my-git.git git-init-db

Make sure this directory is available, via the transport of your
choice, to the others you want to pull your changes. Also
you need to make sure that you have the `git-receive-pack`
program on the `$PATH`.

[NOTE]
Many installations of sshd do not invoke your shell as the login
shell when you directly run programs; what this means is that if
your login shell is `bash`, only `.bashrc` is read and not
`.bash_profile`. As a workaround, make sure `.bashrc` sets up
`$PATH` so that you can run the `git-receive-pack` program.
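
A `.bashrc` fragment along those lines might look like this. It is a sketch under an assumption: the git programs are taken to live in `$HOME/bin`, so substitute wherever they were actually installed. The `case` guard keeps the directory from being added twice when `.bashrc` is read more than once.

```shell
# Fragment for ~/.bashrc on the machine holding the public repository.
# Assumption: git-receive-pack and the other git programs were installed
# under $HOME/bin; change the directory to match the real install.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                     # already on $PATH, leave it alone
    *) PATH=$HOME/bin:$PATH; export PATH ;;
esac
```

To check the result from your private machine, `ssh <public-host> 'command -v git-receive-pack'` runs a command the same way a push does, without a login shell, and should print the program's path.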

Your "public repository" is now ready to accept your changes.
Come back to the machine that holds your private repository. From
there, run this command:

        git push <public-host>:/path/to/my-git.git master

This synchronizes your public repository to match the named
branch head (i.e. `master` in this case) and objects reachable
from it in your current repository.

As a real example, this is how I update my public git
repository. The kernel.org mirror network takes care of the
propagation to other publicly visible machines:

        git push master.kernel.org:/pub/scm/git/git.git/


Packing your repository
-----------------------

Earlier, we saw that one file under a `.git/objects/??/` directory
is stored for each git object you create. This representation
is efficient to create atomically and safely, but
not so convenient to transport over the network. Since git objects are
immutable once they are created, there is a way to optimize the
storage by "packing them together". The command

        git repack

will do it for you. If you followed the tutorial examples, you
would have accumulated about 17 objects in `.git/objects/??/`
directories by now. `git repack` tells you how many objects it
packed, and stores the packed file in the `.git/objects/pack`
directory.

[NOTE]
You will see two files, `pack-\*.pack` and `pack-\*.idx`,
in the `.git/objects/pack` directory. They are closely related to
each other, and if you ever copy them by hand to a different
repository for whatever reason, you should make sure you copy
them together. The former holds all the data from the objects
in the pack, and the latter holds the index for random
access.
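
If you do copy packs by hand, a small loop can make sure no `.pack` travels without its `.idx`. Below is a hypothetical sketch: `copy_packs` is a made-up helper that takes two object directories, and the demo uses dummy files rather than real packs. It copies the `.pack` before the `.idx` because git discovers packs through their `.idx` files. That way, a pair that is only half copied is not picked up.

```shell
# copy_packs SRC DST: copy every pack-*.pack under SRC/pack, together with
# its matching .idx, into DST/pack. SRC and DST are objects directories
# such as .git/objects. Hypothetical helper.
copy_packs () {
    mkdir -p "$2/pack"
    for pack in "$1"/pack/pack-*.pack; do
        [ -e "$pack" ] || continue        # the glob matched nothing
        idx=${pack%.pack}.idx
        if [ ! -f "$idx" ]; then
            echo "copy_packs: no index for $pack, skipping it" >&2
            continue
        fi
        # .pack first, .idx last: until the index arrives, git ignores the pack.
        cp "$pack" "$2/pack/" && cp "$idx" "$2/pack/"
    done
}

# Demo with dummy files standing in for a real pack pair.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/pack"
echo data > "$src/pack/pack-1234.pack"
echo index > "$src/pack/pack-1234.idx"
copy_packs "$src" "$dst"
ls "$dst/pack"
```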

If you are paranoid, running the `git-verify-pack` command would
detect a corrupt pack, but do not worry too much.
Our programs are always perfect ;-).

Once you have packed objects, you do not need to leave the
unpacked objects that are contained in the pack file anymore.

        git prune-packed

would remove them for you.

You can try running `find .git/objects -type f` before and after
you run `git prune-packed` if you are curious.  Also `git count-objects`
would tell you how many unpacked objects are in
your repository and how much space they are consuming.
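
The before-and-after comparison can be scripted. This sketch assumes a recent git on `$PATH` and uses a scratch repository with a single commit. `count_loose` is a hypothetical helper that, like the `find` command above, lists files under `.git/objects`, but it counts only those in the two-hex-digit directories, so the pack directory is left out.

```shell
set -e
# count_loose [GIT_DIR]: number of loose object files, i.e. files in the
# two-hex-digit directories under objects/ (pack/ and info/ are skipped).
count_loose () {
    find "${1:-.git}/objects" -type f \
        -path '*/objects/[0-9a-f][0-9a-f]/*' | wc -l | tr -d ' '
}

# Scratch repository with a single commit; assumes a recent git on $PATH.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
echo 'Hello World' > hello
git add hello
git commit -q -m 'Initial commit'

before=$(count_loose)
git repack -q          # writes a pack and its index under .git/objects/pack/
git prune-packed       # drops the loose copies that are now in the pack
after=$(count_loose)
echo "loose objects: $before before, $after after"
```

The first count is the handful of objects the commit created. After `git repack` and `git prune-packed`, it drops to zero, since everything now lives in the new pack.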

[NOTE]
`git pull` is slightly cumbersome for HTTP transport, as the
few objects a pull actually needs may sit in a relatively large
pack that has to be downloaded whole. If you expect many HTTP
pulls from your public repository, you might want to repack and
prune often, or never.

If you run `git repack` again at this point, it will say
"Nothing to pack". Once you continue your development and
accumulate the changes, running `git repack` again will create a
new pack that contains the objects created since you last packed
your repository. We recommend that you pack your project
soon after the initial import (unless you are starting your
project from scratch), and then run `git repack` every once in a
while, depending on how active your project is.

When a repository is synchronized via `git push` and `git pull`,
objects packed in the source repository are usually stored
unpacked in the destination, unless rsync transport is used.
While this allows you to use different packing strategies on
both ends, it also means you may need to repack both
repositories every once in a while.


Working with Others
-------------------

Although git is a truly distributed system, it is often
convenient to organize your project with an informal hierarchy
of developers. Linux kernel development is run this way. There
is a nice illustration (page 17, "Merges to Mainline") in Randy
Dunlap's presentation (`http://tinyurl.com/a2jdg`).

It should be stressed that this hierarchy is purely *informal*.
There is nothing fundamental in git that enforces the "chain of
patch flow" this hierarchy implies. You do not have to pull
from only one remote repository.

A recommended workflow for a "project lead" goes like this:

1. Prepare your primary repository on your local machine. Your
   work is done there.

2. Prepare a public repository accessible to others.
+
If other people are pulling from your repository over dumb
transport protocols, you need to keep this repository 'dumb
transport friendly'.  After `git init-db`,
`$GIT_DIR/hooks/post-update` copied from the standard templates
contains a call to `git-update-server-info`, but the
`post-update` hook itself is disabled by default -- enable it
with `chmod +x post-update`.

3. Push into the public repository from your primary
   repository.

4. `git repack` the public repository. This establishes a big
   pack that contains the initial set of objects as the
   baseline. Possibly also run `git prune`, if the transport
   used for pulling from your repository supports packed
   repositories.

5. Keep working in your primary repository. Your changes
   include modifications of your own, patches you receive via
   e-mails, and merges resulting from pulling the "public"
   repositories of your "subsystem maintainers".
+
You can repack this private repository whenever you feel like it.

6. Push your changes to the public repository, and announce it
   to the public.

7. Every once in a while, `git repack` the public repository.
   Go back to step 5 and continue working.
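
The `chmod` in step 2 can be wrapped so that it first checks the hook file is actually there. Here is a minimal sketch. `enable_post_update` is a made-up helper. The demo writes its own two-line stand-in hook into a scratch directory because the template's contents vary between git versions; newer ones install it as `post-update.sample`, which has to be renamed first.

```shell
# enable_post_update GIT_DIR: make GIT_DIR/hooks/post-update executable so
# that git-update-server-info runs after every push (hypothetical helper).
enable_post_update () {
    hook=$1/hooks/post-update
    if [ ! -f "$hook" ]; then
        echo "enable_post_update: $hook does not exist" >&2
        return 1
    fi
    chmod +x "$hook"
}

# Demo: a scratch public repository with a stand-in hook.
pub=$(mktemp -d)/my-git.git
mkdir -p "$pub/hooks"
printf '#!/bin/sh\nexec git-update-server-info\n' > "$pub/hooks/post-update"
enable_post_update "$pub" && echo "post-update enabled"
```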


A recommended work cycle for a "subsystem maintainer" who works
on that project and has their own "public repository" goes like this:

1. Prepare your work repository by running `git clone` on the public
   repository of the "project lead". The URL used for the
   initial cloning is stored in `.git/remotes/origin`.

2. Prepare a public repository accessible to others, just like
   the "project lead" person does.

3. Copy over the packed files from the "project lead" public
   repository to your public repository.

4. Push into the public repository from your primary
   repository. Run `git repack`, and possibly `git prune` if the
   transport used for pulling from your repository supports
   packed repositories.

5. Keep working in your primary repository. Your changes
   include modifications of your own, patches you receive via
   e-mails, and merges resulting from pulling the "public"
   repositories of your "project lead" and possibly your
   "sub-subsystem maintainers".
+
You can repack this private repository whenever you feel
like it.

6. Push your changes to your public repository, and ask your
   "project lead" and possibly your "sub-subsystem
   maintainers" to pull from it.

7. Every once in a while, `git repack` the public repository.
   Go back to step 5 and continue working.


A recommended work cycle for an "individual developer" who does
not have a "public" repository is somewhat different. It goes
like this:

1. Prepare your work repository by running `git clone` on the public
   repository of the "project lead" (or a "subsystem
   maintainer", if you work on a subsystem). The URL used for
   the initial cloning is stored in `.git/remotes/origin`.

2. Do your work in your repository on the 'master' branch.

3. Every once in a while, run `git fetch origin` to fetch from the
   public repository of your upstream. This does only the first
   half of `git pull`; it does not merge. The head of the
   public repository is stored in `.git/refs/heads/origin`.

4. Use `git cherry origin` to see which of your patches
   were accepted, and/or use `git rebase origin` to port your
   unmerged changes forward to the updated upstream.

5. Use `git format-patch origin` to prepare patches for e-mail
   submission to your upstream and send them out. Go back to
   step 2 and continue.
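
The individual developer's cycle can be rehearsed end to end with two scratch repositories, one of them playing the upstream. This sketch assumes a recent git on `$PATH`, and every name in it is made up. A recent git keeps the fetched upstream head under `refs/remotes/origin/` rather than in `.git/refs/heads/origin`, but `origin` still names it, so steps 3 to 5 read the same.

```shell
set -e
# Made-up identity so nothing depends on local configuration.
export GIT_AUTHOR_NAME=Tutorial GIT_AUTHOR_EMAIL=tutorial@example.com
export GIT_COMMITTER_NAME=Tutorial GIT_COMMITTER_EMAIL=tutorial@example.com
top=$(mktemp -d)
cd "$top"

# A scratch "upstream" standing in for the project lead's repository.
git init -q upstream
(cd upstream && echo 'Hello World' > hello &&
    git add hello && git commit -q -m 'Initial commit')

# Step 1: clone it; the clone remembers where it came from as "origin".
git clone -q upstream work
cd work

# Step 2: two patches of our own on the checked-out branch.
echo 'Work, work, work' >> hello
git commit -q -a -m 'Some work.'
echo 'Lots of fun' > example
git add example
git commit -q -m 'Some fun.'

# Meanwhile upstream applies only the first patch, as if it came by e-mail.
git format-patch --stdout -1 HEAD~1 > ../some-work.patch
(cd ../upstream && git am -q ../some-work.patch)

# Step 3: fetch, without merging.
git fetch -q origin

# Step 4: "-" marks a patch upstream already has, "+" one it lacks;
# rebase then replays only the missing one on top of the new upstream.
git cherry origin
git rebase -q origin

# Step 5: prepare what is left for e-mail; prints the patch file name.
git format-patch origin
```

`git cherry origin` prints two lines: `-` for the patch upstream already applied and `+` for the one it has not. After the rebase, only the latter is left, and `git format-patch origin` writes just that one patch file.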


Working with Others, Shared Repository Style
--------------------------------------------

If you are coming from a CVS background, the style of cooperation
suggested in the previous section may be new to you. You do not
have to worry. git also supports the "shared public repository" style
of cooperation you are probably more familiar with.

For this, set up a public repository on a machine that is
reachable via SSH by people with "commit privileges".  Put the
committers in the same user group and make the repository
writable by that group.

You, as an individual committer, then:

- First clone the shared repository to a local repository:
------------------------------------------------
$ git clone repo.shared.xz:/pub/scm/project.git/ my-project
$ cd my-project
$ hack away
------------------------------------------------

- Merge the work others might have done while you were hacking
  away:
------------------------------------------------
$ git pull origin
$ test the merge result
------------------------------------------------
[NOTE]
================================
The first `git clone` would have placed the following in the
`my-project/.git/remotes/origin` file, and that's why this and
the next step work.
------------
URL: repo.shared.xz:/pub/scm/project.git/
Pull: master:origin
------------
================================

- Push your work as the new head of the shared
  repository.
------------------------------------------------
$ git push origin master
------------------------------------------------
If somebody else pushed into the same shared repository while
you were working locally, `git push` in the last step would
complain, telling you that the remote `master` head does not
fast forward.  When that happens, you need to pull and merge
those other changes before you push your work.
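
The `Pull:` line in the `.git/remotes/origin` file shown in the note is a `<src>:<dst>` pair. The remote branch on the left is fetched into the local branch on the right, so here the shared repository's `master` lands in your `origin` branch. A small reader makes the two halves explicit. It is a hypothetical helper that understands only the `URL:` line and the first `Pull:` line; the demo recreates the file in a scratch directory.

```shell
# show_remote FILE: describe what a .git/remotes/ file asks for.
# Hypothetical helper: reads only the URL: line and the first Pull: line,
# whose <src>:<dst> pair names the remote branch and the local branch.
show_remote () {
    url=$(sed -n 's/^URL: *//p' "$1" | head -n 1)
    pull=$(sed -n 's/^Pull: *//p' "$1" | head -n 1)
    src=${pull%%:*}              # remote side, e.g. "master"
    dst=${pull#*:}               # local side, e.g. "origin"
    printf 'fetch %s from %s into refs/heads/%s\n' "$src" "$url" "$dst"
}

remotes=$(mktemp -d)
cat > "$remotes/origin" <<\EOF
URL: repo.shared.xz:/pub/scm/project.git/
Pull: master:origin
EOF
show_remote "$remotes/origin"
```

For this file, it prints `fetch master from repo.shared.xz:/pub/scm/project.git/ into refs/heads/origin`.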


[ to be continued.. cvsimports ]