git-fast-import(1)
==================

NAME
----
git-fast-import - Backend for fast Git data importers


SYNOPSIS
--------
frontend | 'git fast-import' [options]

DESCRIPTION
-----------
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs,
which parse a specific type of foreign source and feed the contents
stored there to 'git-fast-import'.

fast-import reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository.
When EOF is received on standard input, fast-import writes out
updated branch and tag refs, fully updating the current repository
with the newly imported data.

The fast-import backend itself can import into an empty repository (one that
has already been initialized by 'git-init') or incrementally
update an existing populated repository.  Whether or not incremental
imports are supported from a particular foreign source depends on
the frontend program in use.


OPTIONS
-------
--date-format=<fmt>::
        Specify the type of dates the frontend will supply to
        fast-import within `author`, `committer` and `tagger` commands.
        See ``Date Formats'' below for details about which formats
        are supported, and their syntax.

--force::
        Force updating modified existing branches, even if doing
        so would cause commits to be lost (as the new commit does
        not contain the old commit).

--max-pack-size=<n>::
        Maximum size of each output packfile, expressed in MiB.
        The default is 4096 (4 GiB) as that is the maximum allowed
        packfile size (due to file format limitations). Some
        importers may wish to lower this, such as to ensure the
        resulting packfiles fit on CDs.

--depth=<n>::
        Maximum delta depth, for blob and tree deltification.
        Default is 10.

--active-branches=<n>::
        Maximum number of branches to maintain active at once.
        See ``Memory Utilization'' below for details.  Default is 5.

--export-marks=<file>::
        Dumps the internal marks table to <file> when complete.
        Marks are written one per line as `:markid SHA-1`.
        Frontends can use this file to validate imports after they
        have been completed, or to save the marks table across
        incremental runs.  As <file> is only opened and truncated
        at checkpoint (or completion) the same path can also be
        safely given to \--import-marks.

--import-marks=<file>::
        Before processing any input, load the marks specified in
        <file>.  The input file must exist, must be readable, and
        must use the same format as produced by \--export-marks.
        Multiple options may be supplied to import more than one
        set of marks.  If a mark is defined to different values,
        the last file wins.

--export-pack-edges=<file>::
        After creating a packfile, print a line of data to
        <file> listing the filename of the packfile and the last
        commit on each branch that was written to that packfile.
        This information may be useful after importing projects
        whose total object set exceeds the 4 GiB packfile limit,
        as these commits can be used as edge points during calls
        to 'git-pack-objects'.

--quiet::
        Disable all non-fatal output, making fast-import silent when it
        is successful.  This option disables the output shown by
        \--stats.

--stats::
        Display some basic statistics about the objects fast-import has
        created, the packfiles they were stored into, and the
        memory used by fast-import during this run.  Showing this output
        is currently the default, but can be disabled with \--quiet.
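
As a rough illustration of how a frontend might use the options above, the
following Python sketch assembles a command line and parses a marks file.
The helper names `build_argv` and `parse_marks` are hypothetical and not part
of Git; only the option names and the `:markid SHA-1` line layout are taken
from the descriptions above.

```python
def build_argv(date_format="raw", export_marks=None, import_marks=(), quiet=False):
    """Assemble an argv list for 'git fast-import' (hypothetical helper)."""
    argv = ["git", "fast-import", "--date-format=" + date_format]
    # Several --import-marks options may be given; on conflict the last file wins.
    for path in import_marks:
        argv.append("--import-marks=" + path)
    if export_marks is not None:
        argv.append("--export-marks=" + export_marks)
    if quiet:
        argv.append("--quiet")
    return argv


def parse_marks(text):
    """Parse marks-file text, one ':markid SHA-1' entry per line, into a dict."""
    marks = {}
    for line in text.splitlines():
        if not line:
            continue
        mark, sha1 = line.split(" ", 1)
        if not mark.startswith(":"):
            raise ValueError("not a mark line: %r" % line)
        marks[int(mark[1:])] = sha1
    return marks
```

A frontend could pass the resulting list to `subprocess.run(argv, input=stream)`,
where `stream` holds the command stream as bytes, and later feed the exported
marks file to `parse_marks` to validate the import.
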


Performance
-----------
The design of fast-import allows it to import large projects with minimal
memory usage and processing time.  Assuming the frontend
is able to keep up with fast-import and feed it a constant stream of data,
import times for projects holding 10+ years of history and containing
100,000+ individual commits are generally completed in just 1-2
hours on quite modest (~$2,000 USD) hardware.

Most bottlenecks appear to be in foreign source data access (the
source just cannot extract revisions fast enough) or disk IO (fast-import
writes as fast as the disk will take the data).  Imports will run
faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).


Development Cost
----------------
A typical frontend for fast-import tends to weigh in at approximately 200
lines of Perl/Python/Ruby code.  Most developers have been able to
create working importers in just a couple of hours, even though it
is their first exposure to fast-import, and sometimes even to Git.  This is
an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).


Parallel Operation
------------------
Like 'git-push' or 'git-fetch', imports handled by fast-import are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
or any other Git operation (including 'git-prune', as loose objects
are never used by fast-import).

fast-import does not lock the branch or tag refs it is actively importing.
After the import, during its ref update phase, fast-import tests each
existing branch ref to verify the update will be a fast-forward
update (the commit stored in the ref is contained in the new
history of the commit to be written).  If the update is not a
fast-forward update, fast-import will skip updating that ref and instead
prints a warning message.  fast-import will always attempt to update all
branch refs, and does not stop on the first failure.

Branch updates can be forced with \--force, but it's recommended that
this only be used on an otherwise quiet repository.  Using \--force
is not necessary for an initial import into an empty repository.


Technical Discussion
--------------------
fast-import tracks a set of branches in memory.  Any branch can be created
or modified at any point during the import process by sending a
`commit` command on the input stream.  This design allows a frontend
program to process an unlimited number of branches simultaneously,
generating commits in the order they are available from the source
data.  It also simplifies the frontend programs considerably.

fast-import does not use or alter the current working directory, or any
file within it.  (It does however update the current Git repository,
as referenced by `GIT_DIR`.)  Therefore an import frontend may use
the working directory for its own purposes, such as extracting file
revisions from the foreign source.  This ignorance of the working
directory also allows fast-import to run very quickly, as it does not
need to perform any costly file update operations when switching
between branches.

Input Format
------------
With the exception of raw file data (which Git does not interpret)
the fast-import input format is text (ASCII) based.  This text based
format simplifies development and debugging of frontend programs,
especially when a higher level language such as Perl, Python or
Ruby is being used.

fast-import is very strict about its input.  Where we say SP below we mean
*exactly* one space.  Likewise LF means one (and only one) linefeed.
Supplying additional whitespace characters will cause unexpected
results, such as branch names or file names with leading or trailing
spaces in their name, or early termination of fast-import when it encounters
unexpected input.

Stream Comments
~~~~~~~~~~~~~~~
To aid in debugging frontends fast-import ignores any line that
begins with `#` (ASCII pound/hash) up to and including the line
ending `LF`.  A comment line may contain any sequence of bytes
that does not contain an LF and therefore may be used to include
any detailed debugging information that might be specific to the
frontend and useful when inspecting a fast-import data stream.
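
A frontend written in Python might emit such comment lines with a small
helper like the one below. This is only a sketch: the function name `comment`
is hypothetical, and the space after `#` is a readability choice rather than
something the format requires.

```python
def comment(text):
    """Return one stream comment line as bytes (hypothetical helper)."""
    data = text.encode("utf-8")
    if b"\n" in data:
        # A comment ends at the first LF, so an embedded LF would leak
        # the remainder of the text into the stream as a command.
        raise ValueError("comment must not contain LF")
    return b"# " + data + b"\n"
```
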

Date Formats
~~~~~~~~~~~~
The following date formats are supported.  A frontend should select
the format it will use for this import by passing the format name
in the \--date-format=<fmt> command line option.

`raw`::
        This is the Git native format and is `<time> SP <offutc>`.
        It is also fast-import's default format, if \--date-format was
        not specified.
+
The time of the event is specified by `<time>` as the number of
seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
written as an ASCII decimal integer.
+
The local offset is specified by `<offutc>` as a positive or negative
offset from UTC.  For example EST (which is 5 hours behind UTC)
would be expressed in `<offutc>` by ``-0500'' while UTC is ``+0000''.
The local offset does not affect `<time>`; it is used only as an
advisement to help formatting routines display the timestamp.
+
If the local offset is not available in the source material, use
``+0000'', or the most common local offset.  For example many
organizations have a CVS repository which has only ever been accessed
by users who are located in the same location and timezone.  In this
case a reasonable offset from UTC could be assumed.
+
Unlike the `rfc2822` format, this format is very strict.  Any
variation in formatting will cause fast-import to reject the value.

`rfc2822`::
        This is the standard email format as described by RFC 2822.
+
An example value is ``Tue Feb 6 11:22:18 2007 -0500''.  The Git
parser is accurate, but a little on the lenient side.  It is the
same parser used by 'git-am' when applying patches
received from email.
+
Some malformed strings may be accepted as valid dates.  In some of
these cases Git will still be able to obtain the correct date from
the malformed string.  There are also some types of malformed
strings which Git will parse wrong, and yet consider valid.
Seriously malformed strings will be rejected.
+
Unlike the `raw` format above, the timezone/UTC offset information
contained in an RFC 2822 date string is used to adjust the date
value to UTC prior to storage.  Therefore it is important that
this information be as accurate as possible.
+
If the source material uses RFC 2822 style dates,
the frontend should let fast-import handle the parsing and conversion
(rather than attempting to do it itself) as the Git parser has
been well tested in the wild.
+
Frontends should prefer the `raw` format if the source material
already uses UNIX-epoch format, can be coaxed to give dates in that
format, or its format is easily convertible to it, as there is no
ambiguity in parsing.

`now`::
        Always use the current time and timezone.  The literal
        `now` must always be supplied for `<when>`.
+
This is a toy format.  The current time and timezone of this system
is always copied into the identity string at the time it is being
created by fast-import.  There is no way to specify a different time or
timezone.
+
This particular format is supplied as it's short to implement and
may be useful to a process that wants to create a new commit
right now, without needing to use a working directory or
'git-update-index'.
+
If separate `author` and `committer` commands are used in a `commit`
the timestamps may not match, as the system clock will be polled
twice (once for each command).  The only way to ensure that both
author and committer identity information has the same timestamp
is to omit `author` (thus copying from `committer`) or to use a
date format other than `now`.
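
For frontends that hold timestamps as Python `datetime` objects, a value in
the `raw` format described above could be produced roughly as follows. The
function name `raw_date` is hypothetical; the sketch assumes the datetime
carries timezone information and that the offset is a whole number of minutes.

```python
from datetime import datetime, timedelta, timezone


def raw_date(when):
    """Format a timezone-aware datetime as '<time> SP <offutc>'."""
    offset = when.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    minutes = int(offset.total_seconds()) // 60
    sign = "-" if minutes < 0 else "+"
    hours, mins = divmod(abs(minutes), 60)
    # <time> is seconds since the UNIX epoch; the offset does not alter it.
    return "%d %s%02d%02d" % (int(when.timestamp()), sign, hours, mins)
```

With this helper, the RFC 2822 example value used above, 11:22:18 on
Feb 6 2007 at five hours behind UTC, becomes `1170778938 -0500`.
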

Commands
~~~~~~~~
fast-import accepts several commands to update the current repository
and control the current import process.  More detailed discussion
(with examples) of each command follows later.

`commit`::
        Creates a new branch or updates an existing branch by
        creating a new commit and updating the branch to point at
        the newly created commit.

`tag`::
        Creates an annotated tag object from an existing commit or
        branch.  Lightweight tags are not supported by this command,
        as they are not recommended for recording meaningful points
        in time.

`reset`::
        Reset an existing branch (or a new branch) to a specific
        revision.  This command must be used to change a branch to
        a specific revision without making a commit on it.

`blob`::
        Convert raw file data into a blob, for future use in a
        `commit` command.  This command is optional and is not
        needed to perform an import.

`checkpoint`::
        Forces fast-import to close the current packfile, generate its
        unique SHA-1 checksum and index, and start a new packfile.
        This command is optional and is not needed to perform
        an import.

`progress`::
        Causes fast-import to echo the entire line to its own
        standard output.  This command is optional and is not needed
        to perform an import.

`feature`::
        Require that fast-import supports the specified feature, or
        abort if it does not.

`option`::
        Specify any of the options listed under OPTIONS that do not
        change stream semantics to suit the frontend's needs. This
        command is optional and is not needed to perform an import.

`commit`
~~~~~~~~
Create or update a branch with a new commit, recording one logical
change to the project.

....
        'commit' SP <ref> LF
        mark?
        ('author' SP <name> SP LT <email> GT SP <when> LF)?
        'committer' SP <name> SP LT <email> GT SP <when> LF
        data
        ('from' SP <committish> LF)?
        ('merge' SP <committish> LF)?
        (filemodify | filedelete | filecopy | filerename | filedeleteall | notemodify)*
        LF?
....

where `<ref>` is the name of the branch to make the commit on.
Typically branch names are prefixed with `refs/heads/` in
Git, so importing the CVS branch symbol `RELENG-1_0` would use
`refs/heads/RELENG-1_0` for the value of `<ref>`.  The value of
`<ref>` must be a valid refname in Git.  As `LF` is not valid in
a Git refname, no quoting or escaping syntax is supported here.

A `mark` command may optionally appear, requesting fast-import to save a
reference to the newly created commit for future use by the frontend
(see below for format).  It is very common for frontends to mark
every commit they create, thereby allowing future branch creation
from any imported commit.

The `data` command following `committer` must supply the commit
message (see below for `data` command syntax).  To import an empty
commit message use a 0 length data.  Commit messages are free-form
and are not interpreted by Git.  Currently they must be encoded in
UTF-8, as fast-import does not permit other encodings to be specified.

Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`,
`filedeleteall` and `notemodify` commands
may be included to update the contents of the branch prior to
creating the commit.  These commands may be supplied in any order.
However it is recommended that a `filedeleteall` command precede
all `filemodify`, `filecopy`, `filerename` and `notemodify` commands in
the same commit, as `filedeleteall` wipes the branch clean (see below).

The `LF` after the command is optional (it used to be required).
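
To make the grammar above concrete, here is a minimal Python sketch that
serializes one `commit` command with inline file data. The helper names
`data` and `commit` are hypothetical, and the sketch assumes the `raw` date
format, paths that need no quoting, and regular `100644` files only.

```python
def data(payload):
    """Exact byte count form of the data command: 'data' SP <count> LF <raw> LF."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit(ref, committer, when, message, files, mark=None):
    """Build one commit command; 'files' maps path -> bytes (hypothetical helper)."""
    out = [b"commit " + ref.encode("utf-8") + b"\n"]
    if mark is not None:
        out.append(b"mark :%d\n" % mark)
    # 'author' is omitted, so fast-import copies the committer identity.
    out.append(b"committer " + committer.encode("utf-8") + b" "
               + when.encode("ascii") + b"\n")
    out.append(data(message.encode("utf-8")))
    for path, content in files.items():
        out.append(b"M 100644 inline " + path.encode("utf-8") + b"\n")
        out.append(data(content))
    out.append(b"\n")  # the optional trailing LF
    return b"".join(out)
```

Calling `commit("refs/heads/master", "Com M Itter <cm@example.com>",
"1170778938 -0500", "initial import\n", {"README": b"hello\n"}, mark=1)`
yields a byte string that can be written to the standard input of
fast-import.
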

`author`
^^^^^^^^
An `author` command may optionally appear, if the author information
might differ from the committer information.  If `author` is omitted
then fast-import will automatically use the committer's information for
the author portion of the commit.  See below for a description of
the fields in `author`, as they are identical to `committer`.

`committer`
^^^^^^^^^^^
The `committer` command indicates who made this commit, and when
they made it.

Here `<name>` is the person's display name (for example
``Com M Itter'') and `<email>` is the person's email address
(``cm@example.com'').  `LT` and `GT` are the literal less-than (\x3c)
and greater-than (\x3e) symbols.  These are required to delimit
the email address from the other fields in the line.  Note that
`<name>` is free-form and may contain any sequence of bytes, except
`LT` and `LF`.  It is typically UTF-8 encoded.

The time of the change is specified by `<when>` using the date format
that was selected by the \--date-format=<fmt> command line option.
See ``Date Formats'' above for the set of supported formats, and
their syntax.

`from`
^^^^^^
The `from` command is used to specify the commit to initialize
this branch from.  This revision will be the first ancestor of the
new commit.

Omitting the `from` command in the first commit of a new branch
will cause fast-import to create that commit with no ancestor. This
tends to be desired only for the initial commit of a project.
If the frontend creates all files from scratch when making a new
branch, a `merge` command may be used instead of `from` to start
the commit with an empty tree.
Omitting the `from` command on existing branches is usually desired,
as the current commit on that branch is automatically assumed to
be the first ancestor of the new commit.

As `LF` is not valid in a Git refname or SHA-1 expression, no
quoting or escaping syntax is supported within `<committish>`.

Here `<committish>` is any of the following:

* The name of an existing branch already in fast-import's internal branch
  table.  If fast-import doesn't know the name, it's treated as a SHA-1
  expression.

* A mark reference, `:<idnum>`, where `<idnum>` is the mark number.
+
The reason fast-import uses `:` to denote a mark reference is this character
is not legal in a Git branch name.  The leading `:` makes it easy
to distinguish between the mark 42 (`:42`) and the branch 42 (`42`
or `refs/heads/42`), or an abbreviated SHA-1 which happened to
consist only of base-10 digits.
+
Marks must be declared (via `mark`) before they can be used.

* A complete 40 byte or abbreviated commit SHA-1 in hex.

* Any valid Git SHA-1 expression that resolves to a commit.  See
  ``SPECIFYING REVISIONS'' in linkgit:git-rev-parse[1] for details.

The special case of restarting an incremental import from the
current branch value should be written as:
----
        from refs/heads/branch^0
----
The `{caret}0` suffix is necessary as fast-import does not permit a branch to
start from itself, and the branch is created in memory before the
`from` command is even read from the input.  Adding `{caret}0` will force
fast-import to resolve the commit through Git's revision parsing library,
rather than its internal branch table, thereby loading in the
existing value of the branch.

`merge`
^^^^^^^
Includes one additional ancestor commit.  If the `from` command is
omitted when creating a new branch, the first `merge` commit will be
the first ancestor of the current commit, and the branch will start
out with no files.  An unlimited number of `merge` commands per
commit are permitted by fast-import, thereby establishing an n-way merge.
However Git's other tools never create commits with more than 15
additional ancestors (forming a 16-way merge).  For this reason
it is suggested that frontends do not use more than 15 `merge`
commands per commit; 16, if starting a new, empty branch.

Here `<committish>` is any of the commit specification expressions
also accepted by `from` (see above).

`filemodify`
^^^^^^^^^^^^
Included in a `commit` command to add a new file or change the
content of an existing file.  This command has two different means
of specifying the content of the file.

External data format::
        The data content for the file was already supplied by a prior
        `blob` command.  The frontend just needs to connect it.
+
....
        'M' SP <mode> SP <dataref> SP <path> LF
....
+
Here `<dataref>` can be either a mark reference (`:<idnum>`)
set by a prior `blob` command, or a full 40-byte SHA-1 of an
existing Git blob object.

Inline data format::
        The data content for the file has not been supplied yet.
        The frontend wants to supply it as part of this modify
        command.
+
....
        'M' SP <mode> SP 'inline' SP <path> LF
        data
....
+
See below for a detailed description of the `data` command.

In both formats `<mode>` is the type of file entry, specified
in octal.  Git only supports the following modes:

* `100644` or `644`: A normal (not-executable) file.  The majority
  of files in most projects use this mode.  If in doubt, this is
  what you want.
* `100755` or `755`: A normal, but executable, file.
* `120000`: A symlink, the content of the file will be the link target.
* `160000`: A gitlink, SHA-1 of the object refers to a commit in
  another repository. Git links can only be specified by SHA or through
  a commit mark. They are used to implement submodules.

In both formats `<path>` is the complete path of the file to be added
(if not already existing) or modified (if already existing).

A `<path>` string must use UNIX-style directory separators (forward
slash `/`), may contain any byte other than `LF`, and must not
start with double quote (`"`).

If an `LF` or double quote must be encoded into `<path>` shell-style
quoting should be used, e.g. `"path/with\n and \" in it"`.

The value of `<path>` must be in canonical form. That is it must not:

* contain an empty directory component (e.g. `foo//bar` is invalid),
* end with a directory separator (e.g. `foo/` is invalid),
* start with a directory separator (e.g. `/foo` is invalid),
* contain the special component `.` or `..` (e.g. `foo/./bar` and
  `foo/../bar` are invalid).

It is recommended that `<path>` always be encoded using UTF-8.
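
The path rules above can be checked and applied in a frontend before a path
is written to the stream. The following Python sketch is one way to do it;
the names `is_canonical` and `quote_path` are hypothetical, and the set of
escapes (backslash, double quote and `LF`) is an assumption based on the
quoting example above rather than a complete description of what fast-import
accepts.

```python
def is_canonical(path):
    """Check the canonical-form rules listed above for a '/'-separated path."""
    parts = path.split("/")
    # Empty parts cover a leading '/', a trailing '/', and 'foo//bar'.
    return all(part not in ("", ".", "..") for part in parts)


def quote_path(path, force=False):
    """Quote a path only when the stream format requires it.

    Assumption: escaping backslash, double quote, and LF is sufficient here.
    """
    if not (force or "\n" in path or path.startswith('"')):
        return path
    escaped = path.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'
```

The `force` flag is meant for cases where quoting is needed for another
reason, such as a `filecopy` or `filerename` source path that contains SP.
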

`filedelete`
^^^^^^^^^^^^
Included in a `commit` command to remove a file or recursively
delete an entire directory from the branch.  If the file or directory
removal makes its parent directory empty, the parent directory will
be automatically removed too.  This cascades up the tree until the
first non-empty directory or the root is reached.

....
        'D' SP <path> LF
....

here `<path>` is the complete path of the file or subdirectory to
be removed from the branch.
See `filemodify` above for a detailed description of `<path>`.

`filecopy`
^^^^^^^^^^
Recursively copies an existing file or subdirectory to a different
location within the branch.  The source file or directory must
exist.  If the destination exists it will be completely replaced
by the content copied from the source.

....
        'C' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination.  See `filemodify` above for a detailed
description of what `<path>` may look like.  To use a source path
that contains SP the path must be quoted.

A `filecopy` command takes effect immediately.  Once the source
location has been copied to the destination any future commands
applied to the source location will not impact the destination of
the copy.

`filerename`
^^^^^^^^^^^^
Renames an existing file or subdirectory to a different location
within the branch.  The source file or directory must exist. If
the destination exists it will be replaced by the source.

....
        'R' SP <path> SP <path> LF
....

here the first `<path>` is the source location and the second
`<path>` is the destination.  See `filemodify` above for a detailed
description of what `<path>` may look like.  To use a source path
that contains SP the path must be quoted.

A `filerename` command takes effect immediately.  Once the source
location has been renamed to the destination any future commands
applied to the source location will create new files there and not
impact the destination of the rename.

Note that a `filerename` is the same as a `filecopy` followed by a
`filedelete` of the source location.  There is a slight performance
advantage to using `filerename`, but the advantage is so small
that it is never worth trying to convert a delete/add pair in
source material into a rename for fast-import.  This `filerename`
command is provided just to simplify frontends that already have
rename information and don't want to bother with decomposing it into a
`filecopy` followed by a `filedelete`.

`filedeleteall`
^^^^^^^^^^^^^^^
Included in a `commit` command to remove all files (and also all
directories) from the branch.  This command resets the internal
branch structure to have no files in it, allowing the frontend
to subsequently add all interesting files from scratch.

....
        'deleteall' LF
....

This command is extremely useful if the frontend does not know
(or does not care to know) what files are currently on the branch,
and therefore cannot generate the proper `filedelete` commands to
update the content.

Issuing a `filedeleteall` followed by the needed `filemodify`
commands to set the correct content will produce the same results
as sending only the needed `filemodify` and `filedelete` commands.
The `filedeleteall` approach may however require fast-import to use slightly
more memory per active branch (less than 1 MiB for even most large
projects); so frontends that can easily obtain only the affected
paths for a commit are encouraged to do so.
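
A frontend that only knows the complete file list of each revision can use
the `filedeleteall` approach described above. The Python sketch below emits
the file commands for such a snapshot-style commit; `snapshot_commands` is a
hypothetical helper and assumes that every blob was already sent and that no
path needs quoting.

```python
def snapshot_commands(files):
    """Return file commands that replace the whole tree with 'files'.

    'files' maps path -> (mode, dataref); dataref is a mark such as ':3'
    or a 40-character SHA-1. Paths are assumed to need no quoting.
    """
    lines = ["deleteall"]
    for path in sorted(files):
        mode, dataref = files[path]
        lines.append("M %s %s %s" % (mode, dataref, path))
    return "".join(line + "\n" for line in lines)
```

The returned text belongs after the `data` command (and any `from` or `merge`
lines) of the enclosing `commit`.
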

`notemodify`
^^^^^^^^^^^^
Included in a `commit` command to add a new note (annotating a given
commit) or change the content of an existing note.  This command has
two different means of specifying the content of the note.

External data format::
        The data content for the note was already supplied by a prior
        `blob` command.  The frontend just needs to connect it to the
        commit that is to be annotated.
+
....
        'N' SP <dataref> SP <committish> LF
....
+
Here `<dataref>` can be either a mark reference (`:<idnum>`)
set by a prior `blob` command, or a full 40-byte SHA-1 of an
existing Git blob object.

Inline data format::
        The data content for the note has not been supplied yet.
        The frontend wants to supply it as part of this modify
        command.
+
....
        'N' SP 'inline' SP <committish> LF
        data
....
+
See below for a detailed description of the `data` command.

In both formats `<committish>` is any of the commit specification
expressions also accepted by `from` (see above).

`mark`
~~~~~~
Arranges for fast-import to save a reference to the current object, allowing
the frontend to recall this object at a future point in time, without
knowing its SHA-1.  Here the current object is the object creation
command the `mark` command appears within.  This can be `commit`,
`tag`, or `blob`, but `commit` is the most common usage.

....
        'mark' SP ':' <idnum> LF
....

where `<idnum>` is the number assigned by the frontend to this mark.
The value of `<idnum>` is expressed as an ASCII decimal integer.
The value 0 is reserved and cannot be used as
a mark.  Only values greater than or equal to 1 may be used as marks.

New marks are created automatically.  Existing marks can be moved
to another object simply by reusing the same `<idnum>` in another
`mark` command.
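
Because mark numbers are chosen by the frontend, a frontend usually needs
some bookkeeping to map its own revision identifiers to marks. One possible
shape for that bookkeeping is sketched below in Python; the class
`MarkAllocator` and its methods are hypothetical names, not part of any Git
API.

```python
import itertools


class MarkAllocator:
    """Hand out mark numbers and remember which key each one belongs to."""

    def __init__(self):
        # 0 is reserved, so numbering starts at 1.
        self._next = itertools.count(1)
        self._by_key = {}

    def mark_for(self, key):
        """Return the mark for 'key' (e.g. a foreign revision id), allocating if new."""
        if key not in self._by_key:
            self._by_key[key] = next(self._next)
        return self._by_key[key]

    def ref(self, key):
        """Return the ':<idnum>' reference usable in 'from', 'merge', or 'M'."""
        return ":%d" % self._by_key[key]
```
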

`tag`
~~~~~
Creates an annotated tag referring to a specific commit.  To create
lightweight (non-annotated) tags see the `reset` command below.

....
        'tag' SP <name> LF
        'from' SP <committish> LF
        'tagger' SP <name> SP LT <email> GT SP <when> LF
        data
....

where `<name>` is the name of the tag to create.

Tag names are automatically prefixed with `refs/tags/` when stored
in Git, so importing the CVS branch symbol `RELENG-1_0-FINAL` would
use just `RELENG-1_0-FINAL` for `<name>`, and fast-import will write the
corresponding ref as `refs/tags/RELENG-1_0-FINAL`.

The value of `<name>` must be a valid refname in Git and therefore
may contain forward slashes.  As `LF` is not valid in a Git refname,
no quoting or escaping syntax is supported here.

The `from` command is the same as in the `commit` command; see
above for details.

The `tagger` command uses the same format as `committer` within
`commit`; again see above for details.

The `data` command following `tagger` must supply the annotated tag
message (see below for `data` command syntax).  To import an empty
tag message use a 0 length data.  Tag messages are free-form and are
not interpreted by Git.  Currently they must be encoded in UTF-8,
as fast-import does not permit other encodings to be specified.

Signing annotated tags during import from within fast-import is not
supported.  Trying to include your own PGP/GPG signature is not
recommended, as the frontend does not (easily) have access to the
complete set of bytes which normally goes into such a signature.
If signing is required, create lightweight tags from within fast-import with
`reset`, then create the annotated versions of those tags offline
with the standard 'git-tag' process.
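
As an illustration of the `tag` grammar above, the following Python sketch
serializes one annotated tag. The function name `annotated_tag` is
hypothetical, and the sketch assumes that `when` is already formatted in the
date format selected on the command line.

```python
def annotated_tag(name, committish, tagger, when, message):
    """Build a 'tag' command as bytes (hypothetical helper)."""
    body = message.encode("utf-8")
    return b"".join([
        b"tag " + name.encode("utf-8") + b"\n",
        b"from " + committish.encode("utf-8") + b"\n",
        b"tagger " + tagger.encode("utf-8") + b" " + when.encode("ascii") + b"\n",
        # Exact byte count form of 'data'; the count is in bytes, not characters.
        b"data %d\n" % len(body),
        body,
        b"\n",
    ])
```
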
 703
 704`reset`
 705~~~~~~~
 706Creates (or recreates) the named branch, optionally starting from
 707a specific revision.  The reset command allows a frontend to issue
 708a new `from` command for an existing branch, or to create a new
 709branch from an existing commit without creating a new commit.
 710
 711....
 712        'reset' SP <ref> LF
 713        ('from' SP <committish> LF)?
 714        LF?
 715....
 716
 717For a detailed description of `<ref>` and `<committish>` see above
 718under `commit` and `from`.
 719
 720The `LF` after the command is optional (it used to be required).
 721
 722The `reset` command can also be used to create lightweight
 723(non-annotated) tags.  For example:
 724
 725====
 726        reset refs/tags/938
 727        from :938
 728====
 729
 730would create the lightweight tag `refs/tags/938` referring to
 731whatever commit mark `:938` references.
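
A frontend could generate this pair of commands with a small helper.
The following is a minimal sketch in Python; the function name
`lightweight_tag` is hypothetical and not part of fast-import, and it
assumes the commit was given a numeric mark earlier in the stream.

```python
def lightweight_tag(name: str, mark: int) -> bytes:
    """Return a `reset` command creating refs/tags/<name> at mark :<mark>."""
    if "\n" in name:
        # LF is not valid in a refname and cannot be quoted or escaped.
        raise ValueError("tag name must not contain LF")
    # The trailing blank line is optional; it only aids readability.
    return f"reset refs/tags/{name}\nfrom :{mark}\n\n".encode("utf-8")


stream = lightweight_tag("938", 938)
```

The returned bytes can be written directly to the standard input of
fast-import.  The helper does not check that `name` is otherwise a
valid refname.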

`blob`
~~~~~~
Requests writing one file revision to the packfile.  The revision
is not connected to any commit; this connection must be formed in
a subsequent `commit` command by referencing the blob through an
assigned mark.

....
        'blob' LF
        mark?
        data
....

The mark command is optional here as some frontends have chosen
to generate the Git SHA-1 for the blob on their own, and feed that
directly to `commit`.  This is typically more work than it's worth,
however, as marks are inexpensive to store and easy to use.

`data`
~~~~~~
Supplies raw data (for use as blob/file content, commit messages, or
annotated tag messages) to fast-import.  Data can be supplied using an exact
byte count or delimited with a terminating line.  Real frontends
intended for production-quality conversions should always use the
exact byte count format, as it is more robust and performs better.
The delimited format is intended primarily for testing fast-import.

Comment lines appearing within the `<raw>` part of `data` commands
are always taken to be part of the body of the data and are therefore
never ignored by fast-import.  This makes it safe to import any
file/message content whose lines might start with `#`.

Exact byte count format::
        The frontend must specify the number of bytes of data.
+
....
        'data' SP <count> LF
        <raw> LF?
....
+
where `<count>` is the exact number of bytes appearing within
`<raw>`.  The value of `<count>` is expressed as an ASCII decimal
integer.  The `LF` on either side of `<raw>` is not
included in `<count>` and will not be included in the imported data.
+
The `LF` after `<raw>` is optional (it used to be required) but
recommended.  Always including it makes debugging a fast-import
stream easier as the next command always starts in column 0
of the next line, even if `<raw>` did not end with an `LF`.

Delimited format::
        A delimiter string is used to mark the end of the data.
        fast-import will compute the length by searching for the delimiter.
        This format is primarily useful for testing and is not
        recommended for real data.
+
....
        'data' SP '<<' <delim> LF
        <raw> LF
        <delim> LF
        LF?
....
+
where `<delim>` is the chosen delimiter string.  The string `<delim>`
must not appear on a line by itself within `<raw>`, as otherwise
fast-import will think the data ends earlier than it really does.  The `LF`
immediately trailing `<raw>` is part of `<raw>`.  This is one of
the limitations of the delimited format: it is impossible to supply
a data chunk which does not have an `LF` as its last byte.
+
The `LF` after `<delim> LF` is optional (it used to be required).
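
The two formats can be illustrated with a short Python sketch.  The
helper names `data_exact` and `data_delimited` are hypothetical, as is
the strategy of appending a counter to find an unused delimiter; the
point is only to show that `<count>` is a byte count and that the
delimiter must not occur as a whole line of `<raw>`.

```python
def data_exact(raw: bytes) -> bytes:
    """Format `raw` using the exact byte count format."""
    # <count> is the number of bytes (not characters) in <raw>.
    # The trailing LF is optional but recommended; it is not counted.
    return b"data %d\n" % len(raw) + raw + b"\n"


def data_delimited(raw: bytes, delim: bytes = b"EOF") -> bytes:
    """Format `raw` using the delimited format (testing only)."""
    if not raw.endswith(b"\n"):
        # The LF before <delim> is part of <raw>, so data that does not
        # end with LF cannot be expressed.  For simplicity this sketch
        # rejects empty data as well.
        raise ValueError("delimited data must end with LF")
    lines = set(raw.split(b"\n"))
    candidate, n = delim, 0
    while candidate in lines:
        # Append a counter until the delimiter is not a whole line of raw.
        n += 1
        candidate = delim + b"_%d" % n
    return b"data <<" + candidate + b"\n" + raw + candidate + b"\n"
```

Note that `data_exact` must be given encoded bytes: a commit message
containing non-ASCII characters has more bytes than characters, and
passing the character count as `<count>` would corrupt the stream.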

`checkpoint`
~~~~~~~~~~~~
Forces fast-import to close the current packfile, start a new one, and
save out all current branch refs, tags and marks.

....
        'checkpoint' LF
        LF?
....

Note that fast-import automatically switches packfiles when the current
packfile reaches \--max-pack-size, or 4 GiB, whichever limit is
smaller.  During an automatic packfile switch fast-import does not update
the branch refs, tags or marks.

As a `checkpoint` can require a significant amount of CPU time and
disk IO (to compute the overall pack SHA-1 checksum, generate the
corresponding index file, and update the refs), it can easily take
several minutes for a single `checkpoint` command to complete.

Frontends may choose to issue checkpoints during extremely large
and long running imports, or when they need to allow another Git
process access to a branch.  However, given that a 30 GiB Subversion
repository can be loaded into Git through fast-import in about 3 hours,
explicit checkpointing may not be necessary.

The `LF` after the command is optional (it used to be required).

`progress`
~~~~~~~~~~
Causes fast-import to print the entire `progress` line unmodified to
its standard output channel (file descriptor 1) when the command is
processed from the input stream.  The command otherwise has no impact
on the current import, or on any of fast-import's internal state.

....
        'progress' SP <any> LF
        LF?
....

The `<any>` part of the command may contain any sequence of bytes
that does not contain `LF`.  The `LF` after the command is optional.
Callers may wish to process the output through a tool such as sed to
remove the leading part of the line, for example:

====
        frontend | git fast-import | sed 's/^progress //'
====

Placing a `progress` command immediately after a `checkpoint` will
inform the reader when the `checkpoint` has been completed and it
can safely access the refs that fast-import updated.
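
As a sketch of this pattern, a frontend written in Python might pair
the two commands and filter the echoed line as shown below.  Both
function names and the wording of the progress text are hypothetical;
only the `checkpoint` and `progress` command syntax comes from the
descriptions above.

```python
def checkpoint_with_notice(label: str) -> bytes:
    """Return a `checkpoint` followed by a `progress` line naming it."""
    if "\n" in label:
        # <any> may contain any bytes except LF.
        raise ValueError("progress text must not contain LF")
    return f"checkpoint\nprogress checkpoint {label} done\n".encode("utf-8")


def strip_progress_prefix(line: str) -> str:
    """Mimic sed 's/^progress //' on one line of fast-import output."""
    prefix = "progress "
    return line[len(prefix):] if line.startswith(prefix) else line
```

A process reading the standard output of fast-import can wait for the
line produced by `checkpoint_with_notice` before it touches the refs.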

`feature`
~~~~~~~~~
Require that fast-import supports the specified feature, or abort if
it does not.

....
        'feature' SP <feature> LF
....

The `<feature>` part of the command may be any string matching
`^[a-zA-Z][a-zA-Z-]*$` and should be understood by fast-import.

Features work identically to their option counterparts, with the
exception of the import-marks feature; see below.

The following features are currently supported:

* date-format
* import-marks
* export-marks
* force

The import-marks feature behaves differently from the command-line
option of the same name in that only one "feature import-marks" is
allowed per stream.  Also, any \--import-marks= specified on the command
line will override those from the stream (if any).
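
A frontend can check a feature name against the pattern above before
writing the command.  This is a minimal Python sketch; the function
name `feature_command` is hypothetical, and the sketch only covers the
bare `<feature>` form shown in the grammar.

```python
import re

# Same character classes as the pattern above.  fullmatch() is used
# because `$` in Python would also match just before a trailing newline.
FEATURE_NAME = re.compile(r"[a-zA-Z][a-zA-Z-]*")


def feature_command(name: str) -> bytes:
    """Return a `feature` command, rejecting names the grammar forbids."""
    if not FEATURE_NAME.fullmatch(name):
        raise ValueError(f"invalid feature name: {name!r}")
    return f"feature {name}\n".encode("ascii")
```

Passing the check does not mean fast-import understands the feature;
an unknown but well-formed name still causes fast-import to abort.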

`option`
~~~~~~~~
Processes the specified option so that git fast-import behaves in a
way that suits the frontend's needs.
Note that options specified by the frontend are overridden by any
options the user may specify to git fast-import itself.

....
        'option' SP <option> LF
....

The `<option>` part of the command may contain any of the options
listed in the OPTIONS section that do not change import semantics,
without the leading '--', and is treated in the same way.

Option commands must be the first commands on the input (not counting
feature commands); giving an option command after any non-option
command is an error.

The following command-line options change import semantics and may
therefore not be passed as options:

* date-format
* import-marks
* export-marks
* force

Crash Reports
-------------
If fast-import is supplied invalid input it will terminate with a
non-zero exit status and create a crash report in the top level of
the Git repository it was importing into.  Crash reports contain
a snapshot of the internal fast-import state as well as the most
recent commands that led up to the crash.

All recent commands (including stream comments, file changes and
progress commands) are shown in the command history within the crash
report, but raw file data and commit messages are excluded from the
crash report.  This exclusion saves space within the report file
and reduces the amount of buffering that fast-import must perform
during execution.

After writing a crash report fast-import will close the current
packfile and export the marks table.  This allows the frontend
developer to inspect the repository state and resume the import from
the point where it crashed.  The modified branches and tags are not
updated during a crash, as the import did not complete successfully.
Branch and tag information can be found in the crash report and
must be applied manually if the update is needed.

An example crash:

====
        $ cat >in <<END_OF_INPUT
        # my very first test commit
        commit refs/heads/master
        committer Shawn O. Pearce <spearce> 19283 -0400
        # who is that guy anyway?
        data <<EOF
        this is my commit
        EOF
        M 644 inline .gitignore
        data <<EOF
        .gitignore
        EOF
        M 777 inline bob
        END_OF_INPUT

        $ git fast-import <in
        fatal: Corrupt mode: M 777 inline bob
        fast-import: dumping crash report to .git/fast_import_crash_8434

        $ cat .git/fast_import_crash_8434
        fast-import crash report:
            fast-import process: 8434
            parent process     : 1391
            at Sat Sep 1 00:58:12 2007

        fatal: Corrupt mode: M 777 inline bob

        Most Recent Commands Before Crash
        ---------------------------------
          # my very first test commit
          commit refs/heads/master
          committer Shawn O. Pearce <spearce> 19283 -0400
          # who is that guy anyway?
          data <<EOF
          M 644 inline .gitignore
          data <<EOF
        * M 777 inline bob

        Active Branch LRU
        -----------------
            active_branches = 1 cur, 5 max

          pos  clock name
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           1)      0 refs/heads/master

        Inactive Branches
        -----------------
        refs/heads/master:
          status      : active loaded dirty
          tip commit  : 0000000000000000000000000000000000000000
          old tree    : 0000000000000000000000000000000000000000
          cur tree    : 0000000000000000000000000000000000000000
          commit clock: 0
          last pack   :


        -------------------
        END OF CRASH REPORT
====

Tips and Tricks
---------------
The following tips and tricks have been collected from various
users of fast-import, and are offered here as suggestions.

Use One Mark Per Commit
~~~~~~~~~~~~~~~~~~~~~~~
When doing a repository conversion, use a unique mark per commit
(`mark :<n>`) and supply the \--export-marks option on the command
line.  fast-import will dump a file which lists every mark and the Git
object SHA-1 that corresponds to it.  If the frontend can tie
the marks back to the source repository, it is easy to verify the
accuracy and completeness of the import by comparing each Git
commit to the corresponding source revision.

Coming from a system such as Perforce or Subversion, this should be
quite simple, as the fast-import mark can also be the Perforce changeset
number or the Subversion revision number.
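
The exported marks file can be read back with a few lines of code.
The sketch below is in Python and assumes the one-mark-per-line
`:markid SHA-1` layout written by \--export-marks; the function names
`parse_marks` and `missing_revisions` are hypothetical, as is the
assumption that each mark equals the source revision number.

```python
def parse_marks(text: str) -> dict:
    """Map each mark number to its SHA-1 from an exported marks file."""
    marks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        mark, sha1 = line.split()
        if not mark.startswith(":"):
            raise ValueError(f"malformed mark line: {line!r}")
        marks[int(mark[1:])] = sha1
    return marks


def missing_revisions(marks: dict, revisions) -> list:
    """Return the source revision numbers that have no exported mark."""
    return [rev for rev in revisions if rev not in marks]
```

A verification script could call `missing_revisions` with the full
list of source revisions and then compare each remaining revision
against the commit named by `marks[rev]`.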

Freely Skip Around Branches
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Don't bother trying to optimize the frontend to stick to one branch
at a time during an import.  Although doing so might be slightly
faster for fast-import, it tends to increase the complexity of the frontend
code considerably.

The branch LRU built into fast-import tends to behave very well, and the
cost of activating an inactive branch is so low that bouncing around
between branches has virtually no impact on import performance.

Handling Renames
~~~~~~~~~~~~~~~~
When importing a renamed file or directory, simply delete the old
name(s) and modify the new name(s) during the corresponding commit.
Git performs rename detection after-the-fact, rather than explicitly
during a commit.

Use Tag Fixup Branches
~~~~~~~~~~~~~~~~~~~~~~
Some other SCMs let the user create a tag from multiple files which
are not from the same commit/changeset, or create tags which are
a subset of the files available in the repository.

Importing these tags as-is in Git is impossible without making at
least one commit which ``fixes up'' the files to match the content
of the tag.  Use fast-import's `reset` command to reset a dummy branch
outside of your normal branch space to the base commit for the tag,
then commit one or more file fixup commits, and finally tag the
dummy branch.

For example, since all normal branches are stored under `refs/heads/`,
name the tag fixup branch `TAG_FIXUP`.  This way it is impossible for
the fixup branch used by the importer to have namespace conflicts
with real branches imported from the source (the name `TAG_FIXUP`
is not `refs/heads/TAG_FIXUP`).

When committing fixups, consider using `merge` to connect the
commit(s) which are supplying file revisions to the fixup branch.
Doing so will allow tools such as 'git-blame' to track
through the real commit history and properly annotate the source
files.

After fast-import terminates, the frontend will need to do `rm .git/TAG_FIXUP`
to remove the dummy branch.

Import Now, Repack Later
~~~~~~~~~~~~~~~~~~~~~~~~
As soon as fast-import completes, the Git repository is completely valid
and ready for use.  Typically this takes only a very short time,
even for considerably large projects (100,000+ commits).

However, repacking the repository is necessary to improve data
locality and access performance.  It can also take hours on extremely
large projects (especially if -f and a large \--window parameter are
used).  Since repacking is safe to run alongside readers and writers,
run the repack in the background and let it finish when it finishes.
There is no reason to wait to explore your new Git project!

If you choose to wait for the repack, don't try to run benchmarks
or performance tests until repacking is completed.  fast-import outputs
suboptimal packfiles that are simply never seen in real use
situations.

Repacking Historical Data
~~~~~~~~~~~~~~~~~~~~~~~~~
If you are repacking very old imported data (e.g. older than the
last year), consider expending some extra CPU time and supplying
\--window=50 (or higher) when you run 'git-repack'.
This will take longer, but will also produce a smaller packfile.
You only need to expend the effort once, and everyone using your
project will benefit from the smaller repository.

Include Some Progress Messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Every once in a while have your frontend emit a `progress` message
to fast-import.  The contents of the messages are entirely free-form,
so one suggestion would be to output the current month and year
each time the current commit date moves into the next month.
Your users will feel better knowing how much of the data stream
has been processed.
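
One way to implement the month-based suggestion is sketched below in
Python.  The generator name `monthly_progress` is hypothetical, the
commit dates are assumed to be available as seconds since the epoch in
commit order, and months are computed in UTC for simplicity.

```python
from datetime import datetime, timezone


def monthly_progress(commit_times):
    """Yield a `progress` command whenever the commit month changes."""
    last = None
    for seconds in commit_times:
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        month = (when.year, when.month)
        if month != last:
            last = month
            # Numeric year and month avoid locale-dependent month names.
            yield f"progress {when:%Y-%m}\n".encode("ascii")
```

A real frontend would interleave these commands with its `commit`
commands rather than generate them in a separate pass.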


Packfile Optimization
---------------------
When packing a blob, fast-import always attempts to deltify against the last
blob written.  Unless specifically arranged for by the frontend,
this will probably not be a prior version of the same file, so the
generated delta will not be the smallest possible.  The resulting
packfile will be compressed, but will not be optimal.

Frontends which have efficient access to all revisions of a
single file (for example reading an RCS/CVS ,v file) can choose
to supply all revisions of that file as a sequence of consecutive
`blob` commands.  This allows fast-import to deltify the different file
revisions against each other, saving space in the final packfile.
Marks can be used to later identify individual file revisions during
a sequence of `commit` commands.
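
The grouping described above can be sketched in Python as follows.
The function name `blob_stream` and the `(path, revision, content)`
tuple layout are hypothetical; the sketch assumes all file revisions
fit in memory, which a real frontend may not be able to rely on.

```python
def blob_stream(revisions):
    """Emit `blob` commands so revisions of one file are consecutive.

    `revisions` is an iterable of (path, revision, content) tuples with
    `content` as bytes.  Returns the stream and a dict mapping
    (path, revision) to the mark assigned to that blob.
    """
    # sorted() is stable, so revisions of a path keep their input order.
    ordered = sorted(revisions, key=lambda item: item[0])
    chunks, marks = [], {}
    for mark, (path, revision, content) in enumerate(ordered, start=1):
        marks[(path, revision)] = mark
        header = b"blob\nmark :%d\ndata %d\n" % (mark, len(content))
        chunks.append(header + content + b"\n")
    return b"".join(chunks), marks
```

The returned `marks` dict lets later `commit` commands refer to each
file revision by mark.  Marks are assigned from 1 upward without gaps.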

The packfile(s) created by fast-import do not encourage good disk access
patterns.  This is caused by fast-import writing the data in the order
it is received on standard input, while Git typically organizes
data within packfiles to make the most recent (current tip) data
appear before historical data.  Git also clusters commits together,
speeding up revision traversal through better cache locality.

For this reason it is strongly recommended that users repack the
repository with `git repack -a -d` after fast-import completes, allowing
Git to reorganize the packfiles for faster data access.  If blob
deltas are suboptimal (see above) then also adding the `-f` option
to force recomputation of all deltas can significantly reduce the
final packfile size (30-50% smaller can be quite typical).


Memory Utilization
------------------
There are a number of factors which affect how much memory fast-import
requires to perform an import.  Like critical sections of core
Git, fast-import uses its own memory allocators to amortize any overheads
associated with malloc.  In practice fast-import tends to amortize any
malloc overheads to 0, due to its use of large block allocations.

per object
~~~~~~~~~~
fast-import maintains an in-memory structure for every object written in
this execution.  On a 32 bit system the structure is 32 bytes,
on a 64 bit system the structure is 40 bytes (due to the larger
pointer sizes).  Objects in the table are not deallocated until
fast-import terminates.  Importing 2 million objects on a 32 bit system
will require approximately 64 MB (about 61 MiB) of memory.
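
The arithmetic behind this estimate is simple enough to script.  The
Python sketch below uses only the per-object sizes quoted above; the
names `BYTES_PER_OBJECT` and `object_table_bytes` are hypothetical.

```python
# Per-object structure sizes quoted above, keyed by pointer width in bits.
BYTES_PER_OBJECT = {32: 32, 64: 40}
MIB = 1024 * 1024


def object_table_bytes(num_objects: int, pointer_bits: int) -> int:
    """Estimate the memory used by the object table, in bytes."""
    return num_objects * BYTES_PER_OBJECT[pointer_bits]


on_32_bit = object_table_bytes(2_000_000, 32)   # 64,000,000 bytes
on_64_bit = object_table_bytes(2_000_000, 64)   # 80,000,000 bytes
```

Dividing by `MIB` gives roughly 61 MiB for the 32 bit case and roughly
76 MiB for the 64 bit case.  The estimate covers the object table only,
not marks, branches or active trees.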

The object table is actually a hashtable keyed on the object name
(the unique SHA-1).  This storage configuration allows fast-import to reuse
an existing or already written object and avoid writing duplicates
to the output packfile.  Duplicate blobs are surprisingly common
in an import, typically due to branch merges in the source.

per mark
~~~~~~~~
Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark.  Although the array
is sparse, frontends are still strongly encouraged to use marks
between 1 and n, where n is the total number of marks required for
this import.

per branch
~~~~~~~~~~
Branches are classified as active and inactive.  The memory usage
of the two classes is significantly different.

Inactive branches are stored in a structure which uses 96 or 120
bytes (32 bit or 64 bit systems, respectively), plus the length of
the branch name (typically under 200 bytes), per branch.  fast-import will
easily handle as many as 10,000 inactive branches in under 2 MiB
of memory.

Active branches have the same overhead as inactive branches, but
also contain copies of every tree that has been recently modified on
that branch.  If subtree `include` has not been modified since the
branch became active, its contents will not be loaded into memory,
but if subtree `src` has been modified by a commit since the branch
became active, then its contents will be loaded in memory.

As active branches store metadata about the files contained on that
branch, their in-memory storage size can grow to a considerable size
(see below).

fast-import automatically moves active branches to inactive status based on
a simple least-recently-used algorithm.  The LRU chain is updated on
each `commit` command.  The maximum number of active branches can be
increased or decreased on the command line with \--active-branches=.

per active tree
~~~~~~~~~~~~~~~
Trees (aka directories) use just 12 bytes of memory on top of the
memory required for their entries (see ``per active file entry'' below).
The cost of a tree is virtually 0, as its overhead amortizes out
over the individual file entries.

per active file entry
~~~~~~~~~~~~~~~~~~~~~
Files (and pointers to subtrees) within active trees require 52 or 64
bytes (32/64 bit platforms) per entry.  To conserve space, file and
tree names are pooled in a common string table, allowing the filename
``Makefile'' to use just 16 bytes (after including the string header
overhead) no matter how many times it occurs within the project.

The active branch LRU, when coupled with the filename string pool
and lazy loading of subtrees, allows fast-import to efficiently import
projects with 2,000+ branches and 45,114+ files in a very limited
memory footprint (less than 2.7 MiB per active branch).


Author
------
Written by Shawn O. Pearce <spearce@spearce.org>.

Documentation
--------------
Documentation by Shawn O. Pearce <spearce@spearce.org>.

GIT
---
Part of the linkgit:git[1] suite