git-fast-import(1)
==================

NAME
----
git-fast-import - Backend for fast Git data importers


SYNOPSIS
--------
[verse]
frontend | 'git fast-import' [options]

DESCRIPTION
-----------
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs,
which parses a specific type of foreign source and feeds the contents
stored there to 'git fast-import'.

fast-import reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository.
When EOF is received on standard input, fast-import writes out
updated branch and tag refs, fully updating the current repository
with the newly imported data.

The fast-import backend itself can import into an empty repository (one that
has already been initialized by 'git init') or incrementally
update an existing populated repository.  Whether or not incremental
imports are supported from a particular foreign source depends on
the frontend program in use.


OPTIONS
-------

--force::
        Force updating modified existing branches, even if doing
        so would cause commits to be lost (as the new commit does
        not contain the old commit).

--quiet::
        Disable all non-fatal output, making fast-import silent when it
        is successful.  This option disables the output shown by
        --stats.

--stats::
        Display some basic statistics about the objects fast-import has
        created, the packfiles they were stored into, and the
        memory used by fast-import during this run.  Showing this output
        is currently the default, but can be disabled with --quiet.

Options for Frontends
~~~~~~~~~~~~~~~~~~~~~

--cat-blob-fd=<fd>::
        Write responses to `get-mark`, `cat-blob`, and `ls` queries to the
        file descriptor <fd> instead of `stdout`.  Allows `progress`
        output intended for the end-user to be separated from other
        output.

--date-format=<fmt>::
        Specify the type of dates the frontend will supply to
        fast-import within `author`, `committer` and `tagger` commands.
        See ``Date Formats'' below for details about which formats
        are supported, and their syntax.

--done::
        Terminate with error if there is no `done` command at the end of
        the stream.  This option might be useful for detecting errors
        that cause the frontend to terminate before it has started to
        write a stream.

Locations of Marks Files
~~~~~~~~~~~~~~~~~~~~~~~~

--export-marks=<file>::
        Dumps the internal marks table to <file> when complete.
        Marks are written one per line as `:markid SHA-1`.
        Frontends can use this file to validate imports after they
        have been completed, or to save the marks table across
        incremental runs.  As <file> is only opened and truncated
        at checkpoint (or completion) the same path can also be
        safely given to --import-marks.

--import-marks=<file>::
        Before processing any input, load the marks specified in
        <file>.  The input file must exist, must be readable, and
        must use the same format as produced by --export-marks.
        Multiple options may be supplied to import more than one
        set of marks.  If a mark is defined to different values,
        the last file wins.

--import-marks-if-exists=<file>::
        Like --import-marks but instead of erroring out, silently
        skips the file if it does not exist.

--[no-]relative-marks::
        After specifying --relative-marks the paths specified
        with --import-marks= and --export-marks= are relative
        to an internal directory in the current repository.
        In git-fast-import this means that the paths are relative
        to the .git/info/fast-import directory. However, other
        importers may use a different location.
+
Relative and non-relative marks may be combined by interweaving
--(no-)relative-marks with the --(import|export)-marks= options.
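
Because the marks file is plain text with one `:markid SHA-1` pair per
line, a frontend can read it back with very little code.  The following
is a minimal Python sketch, not part of Git: the helper name
`parse_marks` is hypothetical, and it assumes 40-digit hexadecimal
object names.

```python
import re

# One mark per line, e.g. ":1 <40 hex digits>", as written by --export-marks.
MARK_LINE = re.compile(r"^:(\d+) ([0-9a-f]{40})$")


def parse_marks(text):
    """Return {idnum: sha1} for the contents of a marks file.

    parse_marks is a hypothetical helper, not part of Git.  If a mark
    appears more than once, the later line wins.
    """
    marks = {}
    for line in text.splitlines():
        match = MARK_LINE.match(line)
        if match is None:
            raise ValueError("malformed marks line: %r" % line)
        marks[int(match.group(1))] = match.group(2)
    return marks
```

A frontend doing incremental imports could load the table this way to
learn which source revisions were already imported.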

Performance and Compression Tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--active-branches=<n>::
        Maximum number of branches to maintain active at once.
        See ``Memory Utilization'' below for details.  Default is 5.

--big-file-threshold=<n>::
        Maximum size of a blob that fast-import will attempt to
        create a delta for, expressed in bytes.  The default is 512m
        (512 MiB).  Some importers may wish to lower this on systems
        with constrained memory.

--depth=<n>::
        Maximum delta depth, for blob and tree deltification.
        Default is 10.

--export-pack-edges=<file>::
        After creating a packfile, print a line of data to
        <file> listing the filename of the packfile and the last
        commit on each branch that was written to that packfile.
        This information may be useful after importing projects
        whose total object set exceeds the 4 GiB packfile limit,
        as these commits can be used as edge points during calls
        to 'git pack-objects'.

--max-pack-size=<n>::
        Maximum size of each output packfile.
        The default is unlimited.


Performance
-----------
The design of fast-import allows it to import large projects with minimal
memory usage and processing time.  Assuming the frontend
is able to keep up with fast-import and feed it a constant stream of data,
import times for projects holding 10+ years of history and containing
100,000+ individual commits are generally completed in just 1-2
hours on quite modest (~$2,000 USD) hardware.

Most bottlenecks appear to be in foreign source data access (the
source just cannot extract revisions fast enough) or disk IO (fast-import
writes as fast as the disk will take the data).  Imports will run
faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).


Development Cost
----------------
A typical frontend for fast-import tends to weigh in at approximately 200
lines of Perl/Python/Ruby code.  Most developers have been able to
create working importers in just a couple of hours, even though it
is their first exposure to fast-import, and sometimes even to Git.  This is
an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).


Parallel Operation
------------------
Like 'git push' or 'git fetch', imports handled by fast-import are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
or any other Git operation (including 'git prune', as loose objects
are never used by fast-import).

fast-import does not lock the branch or tag refs it is actively importing.
After the import, during its ref update phase, fast-import tests each
existing branch ref to verify the update will be a fast-forward
update (the commit stored in the ref is contained in the new
history of the commit to be written).  If the update is not a
fast-forward update, fast-import will skip updating that ref and instead
print a warning message.  fast-import will always attempt to update all
branch refs, and does not stop on the first failure.

Branch updates can be forced with --force, but it's recommended that
this only be used on an otherwise quiet repository.  Using --force
is not necessary for an initial import into an empty repository.


Technical Discussion
--------------------
fast-import tracks a set of branches in memory.  Any branch can be created
or modified at any point during the import process by sending a
`commit` command on the input stream.  This design allows a frontend
program to process an unlimited number of branches simultaneously,
generating commits in the order they are available from the source
data.  It also simplifies the frontend programs considerably.

fast-import does not use or alter the current working directory, or any
file within it.  (It does however update the current Git repository,
as referenced by `GIT_DIR`.)  Therefore an import frontend may use
the working directory for its own purposes, such as extracting file
revisions from the foreign source.  This ignorance of the working
directory also allows fast-import to run very quickly, as it does not
need to perform any costly file update operations when switching
between branches.

Input Format
------------
With the exception of raw file data (which Git does not interpret)
the fast-import input format is text (ASCII) based.  This text based
format simplifies development and debugging of frontend programs,
especially when a higher level language such as Perl, Python or
Ruby is being used.

fast-import is very strict about its input.  Where we say SP below we mean
*exactly* one space.  Likewise LF means one (and only one) linefeed
and HT one (and only one) horizontal tab.
Supplying additional whitespace characters will cause unexpected
results, such as branch names or file names with leading or trailing
spaces in their name, or early termination of fast-import when it encounters
unexpected input.

Stream Comments
~~~~~~~~~~~~~~~
To aid in debugging frontends fast-import ignores any line that
begins with `#` (ASCII pound/hash) up to and including the line
ending `LF`.  A comment line may contain any sequence of bytes
that does not contain an LF and therefore may be used to include
any detailed debugging information that might be specific to the
frontend and useful when inspecting a fast-import data stream.
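
One way for a frontend to respect the strict SP/LF rules is to build
every line through a single helper rather than by ad hoc string
concatenation.  The Python sketch below illustrates the idea; the names
`command_line` and `comment` are hypothetical and not part of
fast-import.

```python
SP = b" "   # exactly one space
LF = b"\n"  # exactly one linefeed


def command_line(*fields):
    """Join fields with single SPs and terminate with a single LF.

    A field that legitimately contains spaces (a path, for instance)
    is passed as one field.  Hypothetical helper for illustration.
    """
    for field in fields:
        if LF in field:
            raise ValueError("field contains LF: %r" % (field,))
    return SP.join(fields) + LF


def comment(text):
    """Return a stream comment line, which fast-import ignores."""
    if LF in text:
        raise ValueError("comment contains LF")
    return b"#" + text + LF
```

Working with bytes rather than text strings keeps the frontend in
control of the exact bytes written to the stream.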

Date Formats
~~~~~~~~~~~~
The following date formats are supported.  A frontend should select
the format it will use for this import by passing the format name
in the --date-format=<fmt> command-line option.

`raw`::
        This is the Git native format and is `<time> SP <offutc>`.
        It is also fast-import's default format, if --date-format was
        not specified.
+
The time of the event is specified by `<time>` as the number of
seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
written as an ASCII decimal integer.
+
The local offset is specified by `<offutc>` as a positive or negative
offset from UTC.  For example EST (which is 5 hours behind UTC)
would be expressed in `<offutc>` by ``-0500'' while UTC is ``+0000''.
The local offset does not affect `<time>`; it is used only as an
advisement to help formatting routines display the timestamp.
+
If the local offset is not available in the source material, use
``+0000'', or the most common local offset.  For example many
organizations have a CVS repository which has only ever been accessed
by users who are located in the same location and time zone.  In this
case a reasonable offset from UTC could be assumed.
+
Unlike the `rfc2822` format, this format is very strict.  Any
variation in formatting will cause fast-import to reject the value.

`rfc2822`::
        This is the standard email format as described by RFC 2822.
+
An example value is ``Tue Feb 6 11:22:18 2007 -0500''.  The Git
parser is accurate, but a little on the lenient side.  It is the
same parser used by 'git am' when applying patches
received from email.
+
Some malformed strings may be accepted as valid dates.  In some of
these cases Git will still be able to obtain the correct date from
the malformed string.  There are also some types of malformed
strings which Git will parse wrong, and yet consider valid.
Seriously malformed strings will be rejected.
+
Unlike the `raw` format above, the time zone/UTC offset information
contained in an RFC 2822 date string is used to adjust the date
value to UTC prior to storage.  Therefore it is important that
this information be as accurate as possible.
+
If the source material uses RFC 2822 style dates,
the frontend should let fast-import handle the parsing and conversion
(rather than attempting to do it itself) as the Git parser has
been well tested in the wild.
+
Frontends should prefer the `raw` format if the source material
already uses UNIX-epoch format, can be coaxed to give dates in that
format, or its format is easily convertible to it, as there is no
ambiguity in parsing.

`now`::
        Always use the current time and time zone.  The literal
        `now` must always be supplied for `<when>`.
+
This is a toy format.  The current time and time zone of this system
is always copied into the identity string at the time it is being
created by fast-import.  There is no way to specify a different time or
time zone.
+
This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit
right now, without needing to use a working directory or
'git update-index'.
+
If separate `author` and `committer` commands are used in a `commit`
the timestamps may not match, as the system clock will be polled
twice (once for each command).  The only way to ensure that both
author and committer identity information has the same timestamp
is to omit `author` (thus copying from `committer`) or to use a
date format other than `now`.
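
For the `raw` format described above, a frontend has to produce
`<time> SP <offutc>` itself.  The following Python sketch shows one way
to derive both fields from a timezone-aware `datetime`; the function
name `raw_date` is hypothetical, and the sketch assumes the offset is a
whole number of minutes.

```python
from datetime import datetime, timedelta, timezone


def raw_date(moment):
    """Format an aware datetime as `<time> SP <offutc>`.

    Hypothetical helper; assumes the offset is a whole number of minutes.
    """
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("datetime must carry a UTC offset")
    seconds = int(moment.timestamp())           # seconds since the UNIX epoch
    minutes = int(offset.total_seconds()) // 60
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    return "%d %s%02d%02d" % (seconds, sign, hours, mins)
```

With this helper, 11:22:18 on Feb 6, 2007 at five hours behind UTC
(the instant used in the `rfc2822` example) is rendered as
`1170778938 -0500`.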

Commands
~~~~~~~~
fast-import accepts several commands to update the current repository
and control the current import process.  More detailed discussion
(with examples) of each command follows later.

`commit`::
        Creates a new branch or updates an existing branch by
        creating a new commit and updating the branch to point at
        the newly created commit.

`tag`::
        Creates an annotated tag object from an existing commit or
        branch.  Lightweight tags are not supported by this command,
        as they are not recommended for recording meaningful points
        in time.

`reset`::
        Reset an existing branch (or a new branch) to a specific
        revision.  This command must be used to change a branch to
        a specific revision without making a commit on it.

`blob`::
        Convert raw file data into a blob, for future use in a
        `commit` command.  This command is optional and is not
        needed to perform an import.

`checkpoint`::
        Forces fast-import to close the current packfile, generate its
        unique SHA-1 checksum and index, and start a new packfile.
        This command is optional and is not needed to perform
        an import.

`progress`::
        Causes fast-import to echo the entire line to its own
        standard output.  This command is optional and is not needed
        to perform an import.

`done`::
        Marks the end of the stream. This command is optional
        unless the `done` feature was requested using the
        `--done` command-line option or `feature done` command.

`get-mark`::
        Causes fast-import to print the SHA-1 corresponding to a mark
        to the file descriptor set with `--cat-blob-fd`, or `stdout` if
        unspecified.

`cat-blob`::
        Causes fast-import to print a blob in 'cat-file --batch'
        format to the file descriptor set with `--cat-blob-fd` or
        `stdout` if unspecified.

`ls`::
        Causes fast-import to print a line describing a directory
        entry in 'ls-tree' format to the file descriptor set with
        `--cat-blob-fd` or `stdout` if unspecified.

`feature`::
        Enable the specified feature. This requires that fast-import
        supports the specified feature, and aborts if it does not.

`option`::
        Specify any of the options listed under OPTIONS that do not
        change stream semantics to suit the frontend's needs. This
        command is optional and is not needed to perform an import.

`commit`
~~~~~~~~
Create or update a branch with a new commit, recording one logical
change to the project.

....
        'commit' SP <ref> LF
        mark?
        ('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
        'committer' (SP <name>)? SP LT <email> GT SP <when> LF
        data
        ('from' SP <commit-ish> LF)?
        ('merge' SP <commit-ish> LF)?
        (filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
        LF?
....

where `<ref>` is the name of the branch to make the commit on.
Typically branch names are prefixed with `refs/heads/` in
Git, so importing the CVS branch symbol `RELENG-1_0` would use
`refs/heads/RELENG-1_0` for the value of `<ref>`.  The value of
`<ref>` must be a valid refname in Git.  As `LF` is not valid in
a Git refname, no quoting or escaping syntax is supported here.

A `mark` command may optionally appear, requesting fast-import to save a
reference to the newly created commit for future use by the frontend
(see below for format).  It is very common for frontends to mark
every commit they create, thereby allowing future branch creation
from any imported commit.

The `data` command following `committer` must supply the commit
message (see below for `data` command syntax).  To import an empty
commit message use a 0 length data.  Commit messages are free-form
and are not interpreted by Git.  Currently they must be encoded in
UTF-8, as fast-import does not permit other encodings to be specified.

Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit.  These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).

The `LF` after the command is optional (it used to be required).
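
To make the grammar above concrete, here is a rough Python sketch that
assembles one `commit` command with inline file contents.  The helper
names `data` and `commit` are hypothetical, the exact byte count form
of the `data` command is assumed (its syntax is described later in this
document), and paths are assumed not to need quoting.

```python
def data(payload):
    """Exact-byte-count `data` command (hypothetical helper)."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit(ref, mark, committer, when, message, parent=None, files=()):
    """Build one `commit` command as bytes.

    files is an iterable of (mode, path, content) tuples whose content
    is sent inline.  Sketch only: no quoting or validation is done.
    """
    out = [b"commit " + ref + b"\n"]
    out.append(b"mark :%d\n" % mark)
    out.append(b"committer " + committer + b" " + when + b"\n")
    out.append(data(message))
    if parent is not None:
        out.append(b"from " + parent + b"\n")
    for mode, path, content in files:
        out.append(b"M " + mode + b" inline " + path + b"\n")
        out.append(data(content))
    return b"".join(out)
```

A first commit on a new branch would omit `parent`; later commits on
the same branch can also omit it, since fast-import then uses the
current branch tip as the first ancestor.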

`author`
^^^^^^^^
An `author` command may optionally appear, if the author information
might differ from the committer information.  If `author` is omitted
then fast-import will automatically use the committer's information for
the author portion of the commit.  See below for a description of
the fields in `author`, as they are identical to `committer`.

`committer`
^^^^^^^^^^^
The `committer` command indicates who made this commit, and when
they made it.

Here `<name>` is the person's display name (for example
``Com M Itter'') and `<email>` is the person's email address
(``\cm@example.com'').  `LT` and `GT` are the literal less-than (\x3c)
and greater-than (\x3e) symbols.  These are required to delimit
the email address from the other fields in the line.  Note that
`<name>` and `<email>` are free-form and may contain any sequence
of bytes, except `LT`, `GT` and `LF`.  `<name>` is typically UTF-8 encoded.

The time of the change is specified by `<when>` using the date format
that was selected by the --date-format=<fmt> command-line option.
See ``Date Formats'' above for the set of supported formats, and
their syntax.
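
Since `<name>` and `<email>` may not contain `LT`, `GT` or `LF`, a
frontend may want to check them before writing an identity line.  The
sketch below is one possible approach in Python; `identity` is a
hypothetical helper, and it works on text strings that the caller would
still need to encode (typically as UTF-8) before writing.

```python
def identity(command, name, email, when):
    """Build an `author`, `committer` or `tagger` line.

    Hypothetical helper.  The name is optional, matching the
    `(SP <name>)?` part of the grammar.
    """
    for value in (name, email):
        if any(ch in value for ch in "<>\n"):
            raise ValueError("LT, GT and LF are not allowed: %r" % value)
    prefix = command + " " + name if name else command
    return "%s <%s> %s\n" % (prefix, email, when)
```

Rejecting bad input is only one policy; a frontend could just as well
strip or replace the offending characters found in its source data.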

`from`
^^^^^^
The `from` command is used to specify the commit to initialize
this branch from.  This revision will be the first ancestor of the
new commit.  The state of the tree built at this commit will begin
with the state at the `from` commit, and be altered by the content
modifications in this commit.

Omitting the `from` command in the first commit of a new branch
will cause fast-import to create that commit with no ancestor. This
tends to be desired only for the initial commit of a project.
If the frontend creates all files from scratch when making a new
branch, a `merge` command may be used instead of `from` to start
the commit with an empty tree.
Omitting the `from` command on existing branches is usually desired,
as the current commit on that branch is automatically assumed to
be the first ancestor of the new commit.

As `LF` is not valid in a Git refname or SHA-1 expression, no
quoting or escaping syntax is supported within `<commit-ish>`.

Here `<commit-ish>` is any of the following:

* The name of an existing branch already in fast-import's internal branch
  table.  If fast-import doesn't know the name, it's treated as a SHA-1
  expression.

* A mark reference, `:<idnum>`, where `<idnum>` is the mark number.
+
The reason fast-import uses `:` to denote a mark reference is that this
character is not legal in a Git branch name.  The leading `:` makes it easy
to distinguish between the mark 42 (`:42`) and the branch 42 (`42`
or `refs/heads/42`), or an abbreviated SHA-1 which happened to
consist only of base-10 digits.
+
Marks must be declared (via `mark`) before they can be used.

* A complete 40 byte or abbreviated commit SHA-1 in hex.

* Any valid Git SHA-1 expression that resolves to a commit.  See
  ``SPECIFYING REVISIONS'' in linkgit:gitrevisions[7] for details.

* The special null SHA-1 (40 zeros) specifies that the branch is to be
  removed.

The special case of restarting an incremental import from the
current branch value should be written as:
----
        from refs/heads/branch^0
----
The `^0` suffix is necessary as fast-import does not permit a branch to
start from itself, and the branch is created in memory before the
`from` command is even read from the input.  Adding `^0` will force
fast-import to resolve the commit through Git's revision parsing library,
rather than its internal branch table, thereby loading in the
existing value of the branch.

`merge`
^^^^^^^
Includes one additional ancestor commit.  The additional ancestry
link does not change the way the tree state is built at this commit.
If the `from` command is
omitted when creating a new branch, the first `merge` commit will be
the first ancestor of the current commit, and the branch will start
out with no files.  An unlimited number of `merge` commands per
commit are permitted by fast-import, thereby establishing an n-way merge.

Here `<commit-ish>` is any of the commit specification expressions
also accepted by `from` (see above).
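
The `from` and `merge` rules above can be combined into a small helper
that emits the ancestry lines of a commit.  This is only a sketch in
Python with hypothetical names (`mark_ref`, `restart_ref`, `ancestry`);
it covers mark references and the `^0` restart form, not every kind of
`<commit-ish>`.

```python
def mark_ref(idnum):
    """Return the `:<idnum>` form of a mark reference."""
    if idnum < 1:
        raise ValueError("marks start at 1")
    return ":%d" % idnum


def restart_ref(branch):
    """Refer to a branch's existing value when restarting an import."""
    return branch + "^0"


def ancestry(first_parent, merged=()):
    """Emit one `from` line followed by one `merge` line per extra parent."""
    lines = ["from %s\n" % first_parent]
    lines.extend("merge %s\n" % commit_ish for commit_ish in merged)
    return "".join(lines)
```

For example, continuing `refs/heads/master` while merging two
previously marked commits produces a `from` line with the `^0` suffix
followed by two `merge` lines.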

`filemodify`
^^^^^^^^^^^^
Included in a `commit` command to add a new file or change the
content of an existing file.  This command has two different means
of specifying the content of the file.

External data format::
        The data content for the file was already supplied by a prior
        `blob` command.  The frontend just needs to connect it.
+
....
        'M' SP <mode> SP <dataref> SP <path> LF
....
+
Here `<dataref>` must usually be either a mark reference (`:<idnum>`)
set by a prior `blob` command, or a full 40-byte SHA-1 of an
existing Git blob object.  If `<mode>` is `040000` then
`<dataref>` must be the full 40-byte SHA-1 of an existing
Git tree object or a mark reference set with `--import-marks`.

Inline data format::
        The data content for the file has not been supplied yet.
        The frontend wants to supply it as part of this modify
        command.
+
....
        'M' SP <mode> SP 'inline' SP <path> LF
        data
....
+
See below for a detailed description of the `data` command.

In both formats `<mode>` is the type of file entry, specified
in octal.  Git only supports the following modes:

* `100644` or `644`: A normal (not-executable) file.  The majority
  of files in most projects use this mode.  If in doubt, this is
  what you want.
* `100755` or `755`: A normal, but executable, file.
* `120000`: A symlink, the content of the file will be the link target.
* `160000`: A gitlink, SHA-1 of the object refers to a commit in
  another repository. Git links can only be specified by SHA or through
  a commit mark. They are used to implement submodules.
* `040000`: A subdirectory.  Subdirectories can only be specified by
  SHA or through a tree mark set with `--import-marks`.

In both formats `<path>` is the complete path of the file to be added
(if not already existing) or modified (if already existing).

A `<path>` string must use UNIX-style directory separators (forward
slash `/`), may contain any byte other than `LF`, and must not
start with double quote (`"`).

A path can use C-style string quoting; this is accepted in all cases
and mandatory if the filename starts with double quote or contains
`LF`. In C-style quoting, the complete name should be surrounded with
double quotes, and any `LF`, backslash, or double quote characters
must be escaped by preceding them with a backslash (e.g.,
`"path/with\n, \\ and \" in it"`).

The value of `<path>` must be in canonical form. That is it must not:

* contain an empty directory component (e.g. `foo//bar` is invalid),
* end with a directory separator (e.g. `foo/` is invalid),
* start with a directory separator (e.g. `/foo` is invalid),
* contain the special component `.` or `..` (e.g. `foo/./bar` and
  `foo/../bar` are invalid).

The root of the tree can be represented by an empty string as `<path>`.

It is recommended that `<path>` always be encoded using UTF-8.
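
The canonical-form and quoting rules for `<path>` lend themselves to
two small checks in a frontend.  The Python sketch below shows one
possible reading of those rules; `is_canonical` and `quote_path` are
hypothetical names, and paths are handled as text strings for brevity.

```python
def is_canonical(path):
    """Check the canonical-form rules for <path>."""
    if path == "":
        return True  # the empty string names the root of the tree
    # Empty components cover "foo//bar", a leading "/" and a trailing "/".
    return all(part not in ("", ".", "..") for part in path.split("/"))


def quote_path(path):
    """C-style quote a path only when the rules make it mandatory."""
    if not path.startswith('"') and "\n" not in path:
        return path
    escaped = (path.replace("\\", "\\\\")   # backslash first
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'
```

Quoting only when required keeps streams readable; since quoting is
accepted in all cases, a frontend could also choose to quote every
path.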

`filedelete`
^^^^^^^^^^^^
Included in a `commit` command to remove a file or recursively
delete an entire directory from the branch.  If the file or directory
removal makes its parent directory empty, the parent directory will
be automatically removed too.  This cascades up the tree until the
first non-empty directory or the root is reached.

....
        'D' SP <path> LF
....

here `<path>` is the complete path of the file or subdirectory to
be removed from the branch.
See `filemodify` above for a detailed description of `<path>`.

`filecopy`
^^^^^^^^^^
Recursively copies an existing file or subdirectory to a different
location within the branch.  The source file or directory must
exist.  If the destination exists it will be completely replaced
by the content copied from the source.

....
        'C' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination.  See `filemodify` above for a detailed
description of what `<path>` may look like.  To use a source path
that contains SP the path must be quoted.

A `filecopy` command takes effect immediately.  Once the source
location has been copied to the destination any future commands
applied to the source location will not impact the destination of
the copy.

`filerename`
^^^^^^^^^^^^
Renames an existing file or subdirectory to a different location
within the branch.  The source file or directory must exist. If
the destination exists it will be replaced by the source file or
directory.

....
        'R' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination.  See `filemodify` above for a detailed
description of what `<path>` may look like.  To use a source path
that contains SP the path must be quoted.

A `filerename` command takes effect immediately.  Once the source
location has been renamed to the destination any future commands
applied to the source location will create new files there and not
impact the destination of the rename.

Note that a `filerename` is the same as a `filecopy` followed by a
`filedelete` of the source location.  There is a slight performance
advantage to using `filerename`, but the advantage is so small
that it is never worth trying to convert a delete/add pair in
source material into a rename for fast-import.  This `filerename`
command is provided just to simplify frontends that already have
rename information and don't want to bother with decomposing it into a
`filecopy` followed by a `filedelete`.
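
Because the source path of `filecopy` and `filerename` is followed by
SP and a second path, a source containing SP has to be quoted.  The
following Python sketch shows one way to handle that; the helper names
are hypothetical, and the destination is quoted only when it starts
with a double quote or contains `LF`.

```python
def _quote(path, quote_spaces):
    """C-style quote path if required (hypothetical helper)."""
    needed = (path.startswith('"') or "\n" in path
              or (quote_spaces and " " in path))
    if not needed:
        return path
    escaped = (path.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def filecopy(src, dst):
    """Build a `C` line; only the source needs quoting for SP."""
    return "C %s %s\n" % (_quote(src, True), _quote(dst, False))


def filerename(src, dst):
    """Build an `R` line with the same quoting rules as filecopy."""
    return "R %s %s\n" % (_quote(src, True), _quote(dst, False))
```

The destination may contain SP unquoted because it runs to the end of
the line.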

`filedeleteall`
^^^^^^^^^^^^^^^
Included in a `commit` command to remove all files (and also all
directories) from the branch.  This command resets the internal
branch structure to have no files in it, allowing the frontend
to subsequently add all interesting files from scratch.

....
        'deleteall' LF
....

This command is extremely useful if the frontend does not know
(or does not care to know) what files are currently on the branch,
and therefore cannot generate the proper `filedelete` commands to
update the content.

Issuing a `filedeleteall` followed by the needed `filemodify`
commands to set the correct content will produce the same results
as sending only the needed `filemodify` and `filedelete` commands.
The `filedeleteall` approach may however require fast-import to use slightly
more memory per active branch (less than 1 MiB for even most large
projects); so frontends that can easily obtain only the affected
paths for a commit are encouraged to do so.
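
A frontend that only has full snapshots of each revision can use the
`filedeleteall` approach described above.  The Python sketch below
emits `deleteall` followed by one external-format `filemodify` line per
file; `snapshot_changes` is a hypothetical helper, and it assumes every
blob was already sent and marked and that no path needs quoting.

```python
def snapshot_changes(files):
    """Emit `deleteall` followed by one `M` line per file.

    files maps path -> (mode, dataref), where dataref is a mark
    reference such as ":3".  Hypothetical helper for illustration.
    """
    lines = ["deleteall\n"]
    for path in sorted(files):
        mode, dataref = files[path]
        lines.append("M %s %s %s\n" % (mode, dataref, path))
    return "".join(lines)
```

Sorting the paths is not required by fast-import; it just makes the
generated stream reproducible and easier to compare between runs.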
 680
 681`notemodify`
 682^^^^^^^^^^^^
 683Included in a `commit` `<notes_ref>` command to add a new note
 684annotating a `<commit-ish>` or change this annotation contents.
 685Internally it is similar to filemodify 100644 on `<commit-ish>`
 686path (maybe split into subdirectories). It's not advised to
 687use any other commands to write to the `<notes_ref>` tree except
 688`filedeleteall` to delete all existing notes in this tree.
 689This command has two different means of specifying the content
 690of the note.
 691
 692External data format::
 693        The data content for the note was already supplied by a prior
 694        `blob` command.  The frontend just needs to connect it to the
 695        commit that is to be annotated.
 696+
 697....
 698        'N' SP <dataref> SP <commit-ish> LF
 699....
 700+
 701Here `<dataref>` can be either a mark reference (`:<idnum>`)
 702set by a prior `blob` command, or a full 40-byte SHA-1 of an
 703existing Git blob object.
 704
 705Inline data format::
 706        The data content for the note has not been supplied yet.
 707        The frontend wants to supply it as part of this modify
 708        command.
 709+
 710....
 711        'N' SP 'inline' SP <commit-ish> LF
 712        data
 713....
 714+
 715See below for a detailed description of the `data` command.
 716
 717In both formats `<commit-ish>` is any of the commit specification
 718expressions also accepted by `from` (see above).
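
As an illustration, a frontend written in Python might build the inline
form of `notemodify` as shown below.  This is only a minimal sketch: the
helper name `note_inline` is hypothetical, and the note body is framed
with the exact byte count form of `data`.

```python
def note_inline(commit_ish: str, text: str) -> bytes:
    """Build an inline `notemodify` command followed by its `data` payload."""
    raw = text.encode("utf-8")
    header = b"N inline " + commit_ish.encode("utf-8") + b"\n"
    # <count> covers only the note body; the LF after it is optional.
    return header + b"data %d\n" % len(raw) + raw + b"\n"
```

The returned bytes are meant to be written inside a `commit` command on
the notes ref, after the commit message.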
 719
 720`mark`
 721~~~~~~
 722Arranges for fast-import to save a reference to the current object, allowing
 723the frontend to recall this object at a future point in time, without
 724knowing its SHA-1.  Here the current object is the object creation
command the `mark` command appears within.  This can be `commit`,
`tag`, or `blob`, but `commit` is the most common usage.
 727
 728....
 729        'mark' SP ':' <idnum> LF
 730....
 731
 732where `<idnum>` is the number assigned by the frontend to this mark.
 733The value of `<idnum>` is expressed as an ASCII decimal integer.
 734The value 0 is reserved and cannot be used as
 735a mark.  Only values greater than or equal to 1 may be used as marks.
 736
 737New marks are created automatically.  Existing marks can be moved
 738to another object simply by reusing the same `<idnum>` in another
 739`mark` command.
 740
 741`tag`
 742~~~~~
 743Creates an annotated tag referring to a specific commit.  To create
 744lightweight (non-annotated) tags see the `reset` command below.
 745
 746....
 747        'tag' SP <name> LF
 748        'from' SP <commit-ish> LF
 749        'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
 750        data
 751....
 752
 753where `<name>` is the name of the tag to create.
 754
 755Tag names are automatically prefixed with `refs/tags/` when stored
 756in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
 757use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
 758corresponding ref as `refs/tags/RELENG-1_0-FINAL`.
 759
 760The value of `<name>` must be a valid refname in Git and therefore
 761may contain forward slashes.  As `LF` is not valid in a Git refname,
 762no quoting or escaping syntax is supported here.
 763
 764The `from` command is the same as in the `commit` command; see
 765above for details.
 766
 767The `tagger` command uses the same format as `committer` within
 768`commit`; again see above for details.
 769
 770The `data` command following `tagger` must supply the annotated tag
 771message (see below for `data` command syntax).  To import an empty
 772tag message use a 0 length data.  Tag messages are free-form and are
 773not interpreted by Git.  Currently they must be encoded in UTF-8,
 774as fast-import does not permit other encodings to be specified.
 775
 776Signing annotated tags during import from within fast-import is not
 777supported.  Trying to include your own PGP/GPG signature is not
 778recommended, as the frontend does not (easily) have access to the
 779complete set of bytes which normally goes into such a signature.
 780If signing is required, create lightweight tags from within fast-import with
 781`reset`, then create the annotated versions of those tags offline
 782with the standard 'git tag' process.
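
The layout of a `tag` command can be sketched in Python as follows.  The
function name `tag_command` and its parameters are hypothetical, and the
`when` string is passed through verbatim on the assumption that the
caller has already formatted it in the date format used by the stream.

```python
def tag_command(name: str, commit_ish: str, tagger: str, email: str,
                when: str, message: str) -> bytes:
    """Build a `tag` command for an annotated tag."""
    if "\n" in name:
        raise ValueError("LF is not valid in a tag name")
    raw = message.encode("utf-8")  # tag messages must be UTF-8
    lines = [
        f"tag {name}",  # fast-import adds the refs/tags/ prefix itself
        f"from {commit_ish}",
        f"tagger {tagger} <{email}> {when}",
        f"data {len(raw)}",  # exact byte count of the message
    ]
    return "\n".join(lines).encode("utf-8") + b"\n" + raw + b"\n"
```

Passing an empty `message` produces a 0 length `data`, which imports an
empty tag message.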
 783
 784`reset`
 785~~~~~~~
 786Creates (or recreates) the named branch, optionally starting from
 787a specific revision.  The reset command allows a frontend to issue
 788a new `from` command for an existing branch, or to create a new
 789branch from an existing commit without creating a new commit.
 790
 791....
 792        'reset' SP <ref> LF
 793        ('from' SP <commit-ish> LF)?
 794        LF?
 795....
 796
 797For a detailed description of `<ref>` and `<commit-ish>` see above
 798under `commit` and `from`.
 799
 800The `LF` after the command is optional (it used to be required).
 801
 802The `reset` command can also be used to create lightweight
 803(non-annotated) tags.  For example:
 804
====
        reset refs/tags/938
        from :938
====
 809
 810would create the lightweight tag `refs/tags/938` referring to
 811whatever commit mark `:938` references.
 812
 813`blob`
 814~~~~~~
 815Requests writing one file revision to the packfile.  The revision
 816is not connected to any commit; this connection must be formed in
 817a subsequent `commit` command by referencing the blob through an
 818assigned mark.
 819
 820....
 821        'blob' LF
 822        mark?
 823        data
 824....
 825
 826The mark command is optional here as some frontends have chosen
 827to generate the Git SHA-1 for the blob on their own, and feed that
 828directly to `commit`.  This is typically more work than it's worth
 829however, as marks are inexpensive to store and easy to use.
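
A frontend that relies on marks might emit each file revision roughly
like this.  The sketch below is written in Python; `blob_command` is a
hypothetical helper, not part of any Git tooling.

```python
def blob_command(raw: bytes, idnum: int) -> bytes:
    """Build a `blob` command that assigns mark `:<idnum>` to the new blob."""
    if idnum < 1:
        raise ValueError("mark 0 is reserved; use values >= 1")
    # The count after `data` is the exact size of the file revision.
    return b"blob\nmark :%d\ndata %d\n" % (idnum, len(raw)) + raw + b"\n"
```

A later `commit` command can then refer to the revision as `:<idnum>`
without the frontend ever computing a SHA-1.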
 830
 831`data`
 832~~~~~~
 833Supplies raw data (for use as blob/file content, commit messages, or
 834annotated tag messages) to fast-import.  Data can be supplied using an exact
 835byte count or delimited with a terminating line.  Real frontends
 836intended for production-quality conversions should always use the
 837exact byte count format, as it is more robust and performs better.
 838The delimited format is intended primarily for testing fast-import.
 839
 840Comment lines appearing within the `<raw>` part of `data` commands
 841are always taken to be part of the body of the data and are therefore
 842never ignored by fast-import.  This makes it safe to import any
 843file/message content whose lines might start with `#`.
 844
 845Exact byte count format::
 846        The frontend must specify the number of bytes of data.
 847+
 848....
 849        'data' SP <count> LF
 850        <raw> LF?
 851....
 852+
 853where `<count>` is the exact number of bytes appearing within
 854`<raw>`.  The value of `<count>` is expressed as an ASCII decimal
 855integer.  The `LF` on either side of `<raw>` is not
 856included in `<count>` and will not be included in the imported data.
 857+
 858The `LF` after `<raw>` is optional (it used to be required) but
 859recommended.  Always including it makes debugging a fast-import
 860stream easier as the next command always starts in column 0
 861of the next line, even if `<raw>` did not end with an `LF`.
 862
 863Delimited format::
 864        A delimiter string is used to mark the end of the data.
 865        fast-import will compute the length by searching for the delimiter.
 866        This format is primarily useful for testing and is not
 867        recommended for real data.
 868+
 869....
 870        'data' SP '<<' <delim> LF
 871        <raw> LF
 872        <delim> LF
 873        LF?
 874....
 875+
 876where `<delim>` is the chosen delimiter string.  The string `<delim>`
 877must not appear on a line by itself within `<raw>`, as otherwise
 878fast-import will think the data ends earlier than it really does.  The `LF`
immediately trailing `<raw>` is part of `<raw>`.  This is one of
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an LF as its last byte.
 882+
 883The `LF` after `<delim> LF` is optional (it used to be required).
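
The two framings can be compared side by side in a short Python sketch.
The helper names `data_exact` and `data_delimited` are hypothetical; the
second helper rejects input that the delimited format cannot represent.

```python
def data_exact(raw: bytes) -> bytes:
    """Frame `raw` using the exact byte count format."""
    # The trailing LF is optional but recommended; it is not counted.
    return b"data %d\n" % len(raw) + raw + b"\n"


def data_delimited(raw: bytes, delim: bytes = b"EOF") -> bytes:
    """Frame `raw` using the delimited format, or raise if it cannot be."""
    # Non-empty data must end with LF, because that LF is part of <raw>.
    if raw and not raw.endswith(b"\n"):
        raise ValueError("delimited data must end with LF")
    # The delimiter must not appear on a line by itself inside the data.
    if delim in raw.split(b"\n"):
        raise ValueError("delimiter appears on a line by itself in the data")
    return b"data <<" + delim + b"\n" + raw + delim + b"\n"
```

Because `data_exact` accepts any byte sequence while `data_delimited`
has to refuse some inputs, the exact byte count format is the safer
choice for real content.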
 884
 885`checkpoint`
 886~~~~~~~~~~~~
 887Forces fast-import to close the current packfile, start a new one, and to
 888save out all current branch refs, tags and marks.
 889
 890....
 891        'checkpoint' LF
 892        LF?
 893....
 894
 895Note that fast-import automatically switches packfiles when the current
 896packfile reaches --max-pack-size, or 4 GiB, whichever limit is
 897smaller.  During an automatic packfile switch fast-import does not update
 898the branch refs, tags or marks.
 899
 900As a `checkpoint` can require a significant amount of CPU time and
 901disk IO (to compute the overall pack SHA-1 checksum, generate the
 902corresponding index file, and update the refs) it can easily take
 903several minutes for a single `checkpoint` command to complete.
 904
 905Frontends may choose to issue checkpoints during extremely large
 906and long running imports, or when they need to allow another Git
 907process access to a branch.  However given that a 30 GiB Subversion
 908repository can be loaded into Git through fast-import in about 3 hours,
 909explicit checkpointing may not be necessary.
 910
 911The `LF` after the command is optional (it used to be required).
 912
 913`progress`
 914~~~~~~~~~~
 915Causes fast-import to print the entire `progress` line unmodified to
 916its standard output channel (file descriptor 1) when the command is
 917processed from the input stream.  The command otherwise has no impact
 918on the current import, or on any of fast-import's internal state.
 919
 920....
 921        'progress' SP <any> LF
 922        LF?
 923....
 924
 925The `<any>` part of the command may contain any sequence of bytes
 926that does not contain `LF`.  The `LF` after the command is optional.
 927Callers may wish to process the output through a tool such as sed to
 928remove the leading part of the line, for example:
 929
====
        frontend | git fast-import | sed 's/^progress //'
====
 933
 934Placing a `progress` command immediately after a `checkpoint` will
 935inform the reader when the `checkpoint` has been completed and it
 936can safely access the refs that fast-import updated.
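
One way a reader could wait for such a marker is sketched below in
Python.  Both function names and the `token` argument are hypothetical;
`output` stands for whatever binary stream is connected to fast-import's
standard output.

```python
from typing import BinaryIO


def checkpoint_with_marker(token: str) -> bytes:
    """Commands requesting a checkpoint, then a `progress` line to signal it."""
    return b"checkpoint\nprogress " + token.encode("utf-8") + b"\n"


def wait_for_progress(output: BinaryIO, token: str) -> bool:
    """Read fast-import's output until `progress <token>` appears.

    Returns False if the stream ends before the marker is seen.
    """
    expected = b"progress " + token.encode("utf-8") + b"\n"
    for line in output:
        if line == expected:
            return True
    return False
```

The frontend writes the result of `checkpoint_with_marker` to
fast-import, and the reader calls `wait_for_progress` with the same
token before touching the updated refs.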
 937
 938`get-mark`
 939~~~~~~~~~~
 940Causes fast-import to print the SHA-1 corresponding to a mark to
 941stdout or to the file descriptor previously arranged with the
 942`--cat-blob-fd` argument. The command otherwise has no impact on the
 943current import; its purpose is to retrieve SHA-1s that later commits
 944might want to refer to in their commit messages.
 945
 946....
 947        'get-mark' SP ':' <idnum> LF
 948....
 949
 950This command can be used anywhere in the stream that comments are
 951accepted.  In particular, the `get-mark` command can be used in the
 952middle of a commit but not in the middle of a `data` command.
 953
 954See ``Responses To Commands'' below for details about how to read
 955this output safely.
 956
 957`cat-blob`
 958~~~~~~~~~~
 959Causes fast-import to print a blob to a file descriptor previously
 960arranged with the `--cat-blob-fd` argument.  The command otherwise
 961has no impact on the current import; its main purpose is to
 962retrieve blobs that may be in fast-import's memory but not
 963accessible from the target repository.
 964
 965....
 966        'cat-blob' SP <dataref> LF
 967....
 968
 969The `<dataref>` can be either a mark reference (`:<idnum>`)
 970set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
 971ready to be written.
 972
 973Output uses the same format as `git cat-file --batch`:
 974
 975====
 976        <sha1> SP 'blob' SP <size> LF
 977        <contents> LF
 978====
 979
 980This command can be used anywhere in the stream that comments are
 981accepted.  In particular, the `cat-blob` command can be used in the
 982middle of a commit but not in the middle of a `data` command.
 983
 984See ``Responses To Commands'' below for details about how to read
 985this output safely.
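
Reading one response in the format shown above might look like the
following Python sketch.  The name `read_cat_blob` is hypothetical, and
`response` is assumed to be a binary stream attached to the descriptor
given with `--cat-blob-fd`.

```python
from typing import BinaryIO, Tuple


def read_cat_blob(response: BinaryIO) -> Tuple[str, bytes]:
    """Read one `cat-blob` response and return (sha1, contents)."""
    header = response.readline()
    sha1, kind, size = header.rstrip(b"\n").split(b" ")
    if kind != b"blob":
        raise ValueError("unexpected response: %r" % header)
    # Read exactly <size> bytes; the contents may themselves contain LFs.
    contents = response.read(int(size))
    # Consume the LF that follows <contents> so the stream stays in sync.
    if len(contents) != int(size) or response.read(1) != b"\n":
        raise ValueError("truncated cat-blob response")
    return sha1.decode("ascii"), contents
```

Reading by size rather than by line is what keeps the parser correct for
binary blobs.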
 986
 987`ls`
 988~~~~
 989Prints information about the object at a path to a file descriptor
 990previously arranged with the `--cat-blob-fd` argument.  This allows
 991printing a blob from the active commit (with `cat-blob`) or copying a
 992blob or tree from a previous commit for use in the current one (with
 993`filemodify`).
 994
 995The `ls` command can be used anywhere in the stream that comments are
 996accepted, including the middle of a commit.
 997
 998Reading from the active commit::
 999        This form can only be used in the middle of a `commit`.
1000        The path names a directory entry within fast-import's
1001        active commit.  The path must be quoted in this case.
1002+
1003....
1004        'ls' SP <path> LF
1005....
1006
1007Reading from a named tree::
1008        The `<dataref>` can be a mark reference (`:<idnum>`) or the
1009        full 40-byte SHA-1 of a Git tag, commit, or tree object,
1010        preexisting or waiting to be written.
1011        The path is relative to the top level of the tree
1012        named by `<dataref>`.
1013+
1014....
1015        'ls' SP <dataref> SP <path> LF
1016....
1017
1018See `filemodify` above for a detailed description of `<path>`.
1019
1020Output uses the same format as `git ls-tree <tree> -- <path>`:
1021
1022====
1023        <mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF
1024====
1025
1026The <dataref> represents the blob, tree, or commit object at <path>
1027and can be used in later 'get-mark', 'cat-blob', 'filemodify', or
1028'ls' commands.
1029
1030If there is no file or subtree at that path, 'git fast-import' will
1031instead report
1032
1033====
1034        missing SP <path> LF
1035====
1036
1037See ``Responses To Commands'' below for details about how to read
1038this output safely.
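
A frontend might decode a line of `ls` output along these lines.  This
Python sketch uses a hypothetical helper name, `parse_ls_line`, and
leaves the path as raw bytes without undoing any quoting.

```python
from typing import Optional, Tuple


def parse_ls_line(line: bytes) -> Optional[Tuple[str, str, str, bytes]]:
    """Parse one line of `ls` output.

    Returns (mode, type, dataref, path), or None for a `missing` report.
    """
    line = line.rstrip(b"\n")
    if line.startswith(b"missing "):
        return None
    # The path follows the only HT; split once so tabs in it survive.
    head, path = line.split(b"\t", 1)
    mode, kind, dataref = head.decode("ascii").split(" ")
    if kind not in ("blob", "tree", "commit"):
        raise ValueError("unexpected object type: " + kind)
    return mode, kind, dataref, path
```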
1039
1040`feature`
1041~~~~~~~~~
1042Require that fast-import supports the specified feature, or abort if
1043it does not.
1044
1045....
1046        'feature' SP <feature> ('=' <argument>)? LF
1047....
1048
1049The <feature> part of the command may be any one of the following:
1050
1051date-format::
1052export-marks::
1053relative-marks::
1054no-relative-marks::
1055force::
1056        Act as though the corresponding command-line option with
1057        a leading '--' was passed on the command line
1058        (see OPTIONS, above).
1059
1060import-marks::
1061import-marks-if-exists::
        Like --import-marks except in two respects: first, only one
        "feature import-marks" or "feature import-marks-if-exists"
        command is allowed per stream; second, an --import-marks=
        or --import-marks-if-exists command-line option overrides
        any of these "feature" commands in the stream.  In addition,
        "feature import-marks-if-exists", like the corresponding
        command-line option, silently skips a nonexistent file.
1069
1070get-mark::
1071cat-blob::
1072ls::
1073        Require that the backend support the 'get-mark', 'cat-blob',
1074        or 'ls' command respectively.
1075        Versions of fast-import not supporting the specified command
1076        will exit with a message indicating so.
1077        This lets the import error out early with a clear message,
1078        rather than wasting time on the early part of an import
1079        before the unsupported command is detected.
1080
1081notes::
1082        Require that the backend support the 'notemodify' (N)
1083        subcommand to the 'commit' command.
1084        Versions of fast-import not supporting notes will exit
1085        with a message indicating so.
1086
1087done::
1088        Error out if the stream ends without a 'done' command.
1089        Without this feature, errors causing the frontend to end
1090        abruptly at a convenient point in the stream can go
1091        undetected.  This may occur, for example, if an import
1092        front end dies in mid-operation without emitting SIGTERM
1093        or SIGKILL at its subordinate git fast-import instance.
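
A frontend could generate its `feature` commands with a small helper
such as the Python sketch below.  The name `feature_preamble` and the
file name `marks.txt` in the usage note are hypothetical.

```python
from typing import Mapping, Optional


def feature_preamble(features: Mapping[str, Optional[str]]) -> bytes:
    """Build `feature` commands; a value of None means no `=<argument>`."""
    lines = []
    for name, argument in features.items():
        if argument is None:
            lines.append(f"feature {name}")
        else:
            lines.append(f"feature {name}={argument}")
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

For example, `feature_preamble({"done": None, "export-marks": "marks.txt"})`
yields the two lines `feature done` and `feature export-marks=marks.txt`.
A stream that requests `done` must then end with a `done` command.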
1094
1095`option`
1096~~~~~~~~
1097Processes the specified option so that git fast-import behaves in a
1098way that suits the frontend's needs.
1099Note that options specified by the frontend are overridden by any
1100options the user may specify to git fast-import itself.
1101
1102....
1103    'option' SP <option> LF
1104....
1105
1106The `<option>` part of the command may contain any of the options
1107listed in the OPTIONS section that do not change import semantics,
1108without the leading '--' and is treated in the same way.
1109
Option commands must be the first commands on the input (not counting
feature commands); giving an option command after any non-option
command is an error.
1113
1114The following command-line options change import semantics and may therefore
1115not be passed as option:
1116
1117* date-format
1118* import-marks
1119* export-marks
1120* cat-blob-fd
1121* force
1122
1123`done`
1124~~~~~~
If the `done` feature is not in use, this command is treated as if EOF was read.
1126This can be used to tell fast-import to finish early.
1127
1128If the `--done` command-line option or `feature done` command is
1129in use, the `done` command is mandatory and marks the end of the
1130stream.
1131
1132Responses To Commands
1133---------------------
1134New objects written by fast-import are not available immediately.
1135Most fast-import commands have no visible effect until the next
1136checkpoint (or completion).  The frontend can send commands to
1137fill fast-import's input pipe without worrying about how quickly
1138they will take effect, which improves performance by simplifying
1139scheduling.
1140
1141For some frontends, though, it is useful to be able to read back
1142data from the current repository as it is being updated (for
1143example when the source material describes objects in terms of
1144patches to be applied to previously imported objects).  This can
1145be accomplished by connecting the frontend and fast-import via
1146bidirectional pipes:
1147
====
        mkfifo fast-import-output
        frontend <fast-import-output |
        git fast-import >fast-import-output
====
1153
1154A frontend set up this way can use `progress`, `get-mark`, `ls`, and
1155`cat-blob` commands to read information from the import in progress.
1156
1157To avoid deadlock, such frontends must completely consume any
1158pending output from `progress`, `ls`, `get-mark`, and `cat-blob` before
1159performing writes to fast-import that might block.
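
The write, flush, then read discipline can be sketched in Python for a
single `get-mark` request.  The function and parameter names are
hypothetical, and the sketch assumes the response is the object name
followed by a single `LF`.

```python
from typing import BinaryIO


def get_mark(to_fast_import: BinaryIO, from_fast_import: BinaryIO,
             idnum: int) -> str:
    """Ask fast-import for the object name behind mark `:<idnum>`."""
    to_fast_import.write(b"get-mark :%d\n" % idnum)
    # Flush so the request is not left sitting in a local buffer while
    # this process blocks waiting for the answer.
    to_fast_import.flush()
    # Consume the whole response before writing anything else.
    reply = from_fast_import.readline()
    if not reply.endswith(b"\n"):
        raise EOFError("fast-import closed its output")
    return reply[:-1].decode("ascii")
```

Here `to_fast_import` would be the pipe feeding fast-import's standard
input and `from_fast_import` the stream opened on its output.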
1160
1161Crash Reports
1162-------------
1163If fast-import is supplied invalid input it will terminate with a
1164non-zero exit status and create a crash report in the top level of
1165the Git repository it was importing into.  Crash reports contain
1166a snapshot of the internal fast-import state as well as the most
recent commands that led up to the crash.
1168
1169All recent commands (including stream comments, file changes and
1170progress commands) are shown in the command history within the crash
1171report, but raw file data and commit messages are excluded from the
1172crash report.  This exclusion saves space within the report file
1173and reduces the amount of buffering that fast-import must perform
1174during execution.
1175
1176After writing a crash report fast-import will close the current
1177packfile and export the marks table.  This allows the frontend
1178developer to inspect the repository state and resume the import from
1179the point where it crashed.  The modified branches and tags are not
1180updated during a crash, as the import did not complete successfully.
1181Branch and tag information can be found in the crash report and
1182must be applied manually if the update is needed.
1183
1184An example crash:
1185
====
        $ cat >in <<END_OF_INPUT
        # my very first test commit
        commit refs/heads/master
        committer Shawn O. Pearce <spearce> 19283 -0400
        # who is that guy anyway?
        data <<EOF
        this is my commit
        EOF
        M 644 inline .gitignore
        data <<EOF
        .gitignore
        EOF
        M 777 inline bob
        END_OF_INPUT

        $ git fast-import <in
        fatal: Corrupt mode: M 777 inline bob
        fast-import: dumping crash report to .git/fast_import_crash_8434

        $ cat .git/fast_import_crash_8434
        fast-import crash report:
            fast-import process: 8434
            parent process     : 1391
            at Sat Sep 1 00:58:12 2007

        fatal: Corrupt mode: M 777 inline bob

        Most Recent Commands Before Crash
        ---------------------------------
          # my very first test commit
          commit refs/heads/master
          committer Shawn O. Pearce <spearce> 19283 -0400
          # who is that guy anyway?
          data <<EOF
          M 644 inline .gitignore
          data <<EOF
        * M 777 inline bob

        Active Branch LRU
        -----------------
            active_branches = 1 cur, 5 max

          pos  clock name
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           1)      0 refs/heads/master

        Inactive Branches
        -----------------
        refs/heads/master:
          status      : active loaded dirty
          tip commit  : 0000000000000000000000000000000000000000
          old tree    : 0000000000000000000000000000000000000000
          cur tree    : 0000000000000000000000000000000000000000
          commit clock: 0
          last pack   :


        -------------------
        END OF CRASH REPORT
====
1247
1248Tips and Tricks
1249---------------
1250The following tips and tricks have been collected from various
1251users of fast-import, and are offered here as suggestions.
1252
1253Use One Mark Per Commit
1254~~~~~~~~~~~~~~~~~~~~~~~
1255When doing a repository conversion, use a unique mark per commit
1256(`mark :<n>`) and supply the --export-marks option on the command
1257line.  fast-import will dump a file which lists every mark and the Git
1258object SHA-1 that corresponds to it.  If the frontend can tie
1259the marks back to the source repository, it is easy to verify the
1260accuracy and completeness of the import by comparing each Git
1261commit to the corresponding source revision.
1262
1263Coming from a system such as Perforce or Subversion this should be
1264quite simple, as the fast-import mark can also be the Perforce changeset
1265number or the Subversion revision number.
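
Loading the exported marks for such a comparison could look like the
Python sketch below.  It assumes each line of the marks file has the
form `:<idnum> SP <sha1>`, and the name `parse_marks` is hypothetical.

```python
from typing import Dict


def parse_marks(text: str) -> Dict[int, str]:
    """Map each mark number to its object name."""
    marks = {}
    for line in text.splitlines():
        if not line:
            continue
        # Assumed layout: ":<idnum> <sha1>"
        mark, sha1 = line.split(" ", 1)
        marks[int(mark.lstrip(":"))] = sha1
    return marks
```

If the marks are Subversion revision numbers, `parse_marks(...)[1234]`
would be the Git commit imported from revision 1234.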
1266
1267Freely Skip Around Branches
1268~~~~~~~~~~~~~~~~~~~~~~~~~~~
1269Don't bother trying to optimize the frontend to stick to one branch
1270at a time during an import.  Although doing so might be slightly
1271faster for fast-import, it tends to increase the complexity of the frontend
1272code considerably.
1273
The branch LRU built into fast-import tends to behave very well, and the
1275cost of activating an inactive branch is so low that bouncing around
1276between branches has virtually no impact on import performance.
1277
1278Handling Renames
1279~~~~~~~~~~~~~~~~
1280When importing a renamed file or directory, simply delete the old
1281name(s) and modify the new name(s) during the corresponding commit.
1282Git performs rename detection after-the-fact, rather than explicitly
1283during a commit.
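
Expressed as stream commands, a rename handled this way is one
`filedelete` and one `filemodify`.  The Python sketch below uses a
hypothetical helper, `rename_commands`, and only handles paths that need
no quoting.

```python
def rename_commands(old_path: str, new_path: str, dataref: str,
                    mode: str = "100644") -> bytes:
    """Express a rename as a `filedelete` plus a `filemodify`."""
    for path in (old_path, new_path):
        if "\n" in path or path.startswith('"'):
            raise ValueError("path needs quoting, which this sketch omits")
    # D removes the old name; M adds the same content under the new name.
    return f"D {old_path}\nM {mode} {dataref} {new_path}\n".encode("utf-8")
```

The `dataref` is the mark or SHA-1 of the file content, which is
unchanged for a pure rename.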
1284
1285Use Tag Fixup Branches
1286~~~~~~~~~~~~~~~~~~~~~~
Some other SCM systems let the user create a tag from multiple
files which are not from the same commit/changeset, or create
tags which are a subset of the files available in the repository.
1290
1291Importing these tags as-is in Git is impossible without making at
1292least one commit which ``fixes up'' the files to match the content
1293of the tag.  Use fast-import's `reset` command to reset a dummy branch
1294outside of your normal branch space to the base commit for the tag,
1295then commit one or more file fixup commits, and finally tag the
1296dummy branch.
1297
For example, since all normal branches are stored under `refs/heads/`,
name the tag fixup branch `TAG_FIXUP`.  This way it is impossible for
1300the fixup branch used by the importer to have namespace conflicts
1301with real branches imported from the source (the name `TAG_FIXUP`
1302is not `refs/heads/TAG_FIXUP`).
1303
1304When committing fixups, consider using `merge` to connect the
1305commit(s) which are supplying file revisions to the fixup branch.
1306Doing so will allow tools such as 'git blame' to track
1307through the real commit history and properly annotate the source
1308files.
1309
After fast-import terminates, the frontend will need to do `rm .git/TAG_FIXUP`
1311to remove the dummy branch.
1312
1313Import Now, Repack Later
1314~~~~~~~~~~~~~~~~~~~~~~~~
1315As soon as fast-import completes the Git repository is completely valid
1316and ready for use.  Typically this takes only a very short time,
1317even for considerably large projects (100,000+ commits).
1318
1319However repacking the repository is necessary to improve data
1320locality and access performance.  It can also take hours on extremely
large projects (especially if -f and a large --window parameter are
used).  Since repacking is safe to run alongside readers and writers,
1323run the repack in the background and let it finish when it finishes.
1324There is no reason to wait to explore your new Git project!
1325
1326If you choose to wait for the repack, don't try to run benchmarks
1327or performance tests until repacking is completed.  fast-import outputs
1328suboptimal packfiles that are simply never seen in real use
1329situations.
1330
1331Repacking Historical Data
1332~~~~~~~~~~~~~~~~~~~~~~~~~
1333If you are repacking very old imported data (e.g. older than the
1334last year), consider expending some extra CPU time and supplying
1335--window=50 (or higher) when you run 'git repack'.
1336This will take longer, but will also produce a smaller packfile.
1337You only need to expend the effort once, and everyone using your
1338project will benefit from the smaller repository.
1339
1340Include Some Progress Messages
1341~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1342Every once in a while have your frontend emit a `progress` message
1343to fast-import.  The contents of the messages are entirely free-form,
1344so one suggestion would be to output the current month and year
1345each time the current commit date moves into the next month.
1346Your users will feel better knowing how much of the data stream
1347has been processed.
1348
1349
1350Packfile Optimization
1351---------------------
1352When packing a blob fast-import always attempts to deltify against the last
1353blob written.  Unless specifically arranged for by the frontend,
1354this will probably not be a prior version of the same file, so the
1355generated delta will not be the smallest possible.  The resulting
1356packfile will be compressed, but will not be optimal.
1357
1358Frontends which have efficient access to all revisions of a
1359single file (for example reading an RCS/CVS ,v file) can choose
1360to supply all revisions of that file as a sequence of consecutive
1361`blob` commands.  This allows fast-import to deltify the different file
1362revisions against each other, saving space in the final packfile.
1363Marks can be used to later identify individual file revisions during
1364a sequence of `commit` commands.
1365
1366The packfile(s) created by fast-import do not encourage good disk access
1367patterns.  This is caused by fast-import writing the data in the order
1368it is received on standard input, while Git typically organizes
1369data within packfiles to make the most recent (current tip) data
1370appear before historical data.  Git also clusters commits together,
1371speeding up revision traversal through better cache locality.
1372
1373For this reason it is strongly recommended that users repack the
1374repository with `git repack -a -d` after fast-import completes, allowing
1375Git to reorganize the packfiles for faster data access.  If blob
1376deltas are suboptimal (see above) then also adding the `-f` option
1377to force recomputation of all deltas can significantly reduce the
1378final packfile size (30-50% smaller can be quite typical).
1379
1380
1381Memory Utilization
1382------------------
1383There are a number of factors which affect how much memory fast-import
1384requires to perform an import.  Like critical sections of core
1385Git, fast-import uses its own memory allocators to amortize any overheads
1386associated with malloc.  In practice fast-import tends to amortize any
1387malloc overheads to 0, due to its use of large block allocations.
1388
1389per object
1390~~~~~~~~~~
1391fast-import maintains an in-memory structure for every object written in
1392this execution.  On a 32 bit system the structure is 32 bytes,
1393on a 64 bit system the structure is 40 bytes (due to the larger
1394pointer sizes).  Objects in the table are not deallocated until
1395fast-import terminates.  Importing 2 million objects on a 32 bit system
1396will require approximately 64 MiB of memory.
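
The estimate is simple multiplication, as the Python sketch below shows
(`object_table_mib` is a hypothetical name).  Two million objects at 32
bytes each is 64,000,000 bytes, or roughly 61 MiB, so the figure above
can be read as a rounded estimate.

```python
def object_table_mib(objects: int, pointer_bits: int = 32) -> float:
    """Estimate the object table size in MiB for a given object count."""
    # Per-object structure sizes quoted above: 32 bytes on 32 bit
    # systems, 40 bytes on 64 bit systems.
    bytes_per_object = {32: 32, 64: 40}[pointer_bits]
    return objects * bytes_per_object / 2**20
```

The same import on a 64 bit system needs 80,000,000 bytes for the
table, or about 76 MiB.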
1397
1398The object table is actually a hashtable keyed on the object name
1399(the unique SHA-1).  This storage configuration allows fast-import to reuse
1400an existing or already written object and avoid writing duplicates
1401to the output packfile.  Duplicate blobs are surprisingly common
1402in an import, typically due to branch merges in the source.
1403
1404per mark
1405~~~~~~~~
1406Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
1407bytes, depending on pointer size) per mark.  Although the array
1408is sparse, frontends are still strongly encouraged to use marks
1409between 1 and n, where n is the total number of marks required for
1410this import.
1411
1412per branch
1413~~~~~~~~~~
1414Branches are classified as active and inactive.  The memory usage
1415of the two classes is significantly different.
1416
1417Inactive branches are stored in a structure which uses 96 or 120
1418bytes (32 bit or 64 bit systems, respectively), plus the length of
1419the branch name (typically under 200 bytes), per branch.  fast-import will
1420easily handle as many as 10,000 inactive branches in under 2 MiB
1421of memory.
1422
1423Active branches have the same overhead as inactive branches, but
1424also contain copies of every tree that has been recently modified on
1425that branch.  If subtree `include` has not been modified since the
1426branch became active, its contents will not be loaded into memory,
1427but if subtree `src` has been modified by a commit since the branch
1428became active, then its contents will be loaded in memory.
1429
1430As active branches store metadata about the files contained on that
1431branch, their in-memory storage size can grow to a considerable size
1432(see below).
1433
1434fast-import automatically moves active branches to inactive status based on
1435a simple least-recently-used algorithm.  The LRU chain is updated on
1436each `commit` command.  The maximum number of active branches can be
1437increased or decreased on the command line with --active-branches=.
1438
1439per active tree
1440~~~~~~~~~~~~~~~
1441Trees (aka directories) use just 12 bytes of memory on top of the
1442memory required for their entries (see ``per active file'' below).
1443The cost of a tree is virtually 0, as its overhead amortizes out
1444over the individual file entries.
1445
1446per active file entry
1447~~~~~~~~~~~~~~~~~~~~~
1448Files (and pointers to subtrees) within active trees require 52 or 64
1449bytes (32/64 bit platforms) per entry.  To conserve space, file and
1450tree names are pooled in a common string table, allowing the filename
1451``Makefile'' to use just 16 bytes (after including the string header
1452overhead) no matter how many times it occurs within the project.
1453
1454The active branch LRU, when coupled with the filename string pool
1455and lazy loading of subtrees, allows fast-import to efficiently import
1456projects with 2,000+ branches and 45,114+ files in a very limited
1457memory footprint (less than 2.7 MiB per active branch).
1458
1459Signals
1460-------
1461Sending *SIGUSR1* to the 'git fast-import' process ends the current
1462packfile early, simulating a `checkpoint` command.  The impatient
1463operator can use this facility to peek at the objects and refs from an
1464import in progress, at the cost of some added running time and worse
1465compression.
1466
1467SEE ALSO
1468--------
1469linkgit:git-fast-export[1]
1470
1471GIT
1472---
1473Part of the linkgit:git[1] suite