// Documentation/gitattributes.txt at commit 107642f (convert: add 'working-tree-encoding' attribute)
gitattributes(5)
================

NAME
----
gitattributes - defining attributes per path

SYNOPSIS
--------
$GIT_DIR/info/attributes, .gitattributes


DESCRIPTION
-----------

A `gitattributes` file is a simple text file that gives
`attributes` to pathnames.

Each line in a `gitattributes` file is of the form:

        pattern attr1 attr2 ...

That is, a pattern followed by an attributes list,
separated by whitespace. Leading and trailing whitespace is
ignored. Lines that begin with '#' are ignored. Patterns
that begin with a double quote are quoted in C style.
When the pattern matches the path in question, the attributes
listed on the line are given to the path.

Each attribute can be in one of these states for a given path:

Set::

        The path has the attribute with special value "true";
        this is specified by listing only the name of the
        attribute in the attribute list.

Unset::

        The path has the attribute with special value "false";
        this is specified by listing the name of the attribute
        prefixed with a dash `-` in the attribute list.

Set to a value::

        The path has the attribute with specified string value;
        this is specified by listing the name of the attribute
        followed by an equal sign `=` and its value in the
        attribute list.

Unspecified::

        No pattern matches the path, and nothing says whether
        the path has or does not have the attribute; the
        attribute for the path is said to be Unspecified.

When more than one pattern matches the path, a later line
overrides an earlier line.  This overriding is done per
attribute.  The rules by which patterns match paths are the
same as in `.gitignore` files; see linkgit:gitignore[5].
Unlike `.gitignore`, negative patterns are forbidden.
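
As a rough illustration of the line format and of per-attribute overriding, the Python sketch below parses lines into attribute states and lets later matching lines win one attribute at a time. It is a simplified model, not Git's implementation. `fnmatchcase` stands in for the gitignore-style matching rules, and quoted patterns and macro attributes are not handled. The function names `parse_line` and `attributes_for` are hypothetical. The `!` prefix is modelled as resetting an attribute to Unspecified.

```python
from fnmatch import fnmatchcase

UNSPECIFIED = None  # marker for the Unspecified state (hypothetical model)


def parse_line(line):
    """Parse one line into (pattern, {attribute: state}), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no attributes
    pattern, *tokens = line.split()
    states = {}
    for token in tokens:
        if token.startswith("-"):
            states[token[1:]] = False        # Unset
        elif token.startswith("!"):
            states[token[1:]] = UNSPECIFIED  # reset to Unspecified
        elif "=" in token:
            name, value = token.split("=", 1)
            states[name] = value             # Set to a value
        else:
            states[token] = True             # Set
    return pattern, states


def attributes_for(path, lines):
    """Apply matching lines in order; later lines override per attribute."""
    result = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        pattern, states = parsed
        # Simplification: fnmatchcase instead of gitignore matching rules.
        if fnmatchcase(path, pattern):
            result.update(states)
    # Attributes left Unspecified are reported as absent.
    return {name: s for name, s in result.items() if s is not UNSPECIFIED}
```

With the lines `*.txt text eol=lf` and `notes.txt -text`, the path `notes.txt` ends up with `text` Unset while still getting `eol=lf` from the first line, because overriding happens attribute by attribute.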

When deciding what attributes are assigned to a path, Git
consults the `$GIT_DIR/info/attributes` file (which has the highest
precedence), the `.gitattributes` file in the same directory as the
path in question, and those in its parent directories up to the toplevel of the
work tree (the further the directory that contains `.gitattributes`
is from the path in question, the lower its precedence). Finally,
global and system-wide files are considered (they have the lowest
precedence).
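
The lookup order can be sketched as a function that lists the files consulted for a path from lowest to highest precedence. Reading them in that order lets higher-precedence files override lower ones. This is an illustration only. The function name is hypothetical, and the system and global file locations are passed in rather than discovered. Ranking the system-wide file below the per-user global file is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def attribute_files(path, git_dir=".git", global_file=None, system_file=None):
    """Return the attribute files for `path`, lowest precedence first."""
    files = []
    # Lowest precedence: system-wide, then per-user global (assumed order).
    if system_file:
        files.append(system_file)
    if global_file:
        files.append(global_file)
    # In-tree files from the top of the work tree down to the directory of
    # `path`: the closer the directory, the higher the precedence.
    for directory in reversed(PurePosixPath(path).parents):
        files.append(str(directory / ".gitattributes"))
    # Highest precedence: $GIT_DIR/info/attributes.
    files.append(f"{git_dir}/info/attributes")
    return files
```

For `a/b/c.txt`, the sketch yields the system file, the global file, `.gitattributes`, `a/.gitattributes`, `a/b/.gitattributes` and finally `.git/info/attributes`.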

When the `.gitattributes` file is missing from the work tree, the
`.gitattributes` file in the index is used as a fall-back.  During the checkout process,
`.gitattributes` in the index is used and then the file in the
working tree is used as a fall-back.

If you wish to affect only a single repository (i.e., to assign
attributes to files that are particular to
one user's workflow for that repository), then
attributes should be placed in the `$GIT_DIR/info/attributes` file.
Attributes which should be version-controlled and distributed to other
repositories (i.e., attributes of interest to all users) should go into
`.gitattributes` files. Attributes that should affect all repositories
for a single user should be placed in a file specified by the
`core.attributesFile` configuration option (see linkgit:git-config[1]).
Its default value is $XDG_CONFIG_HOME/git/attributes. If $XDG_CONFIG_HOME
is either not set or empty, $HOME/.config/git/attributes is used instead.
Attributes for all users on a system should be placed in the
`$(prefix)/etc/gitattributes` file.
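
The fallback rule for the per-user file can be written as a small function. It only computes the default. An explicit `core.attributesFile` setting, which this sketch does not read, takes priority. The function name is hypothetical.

```python
import os


def default_attributes_file(environ=None):
    """Default location of the per-user attributes file (hypothetical helper)."""
    env = os.environ if environ is None else environ
    xdg = env.get("XDG_CONFIG_HOME", "")
    if xdg:  # set and non-empty
        return os.path.join(xdg, "git", "attributes")
    # Unset or empty XDG_CONFIG_HOME: fall back to $HOME/.config.
    return os.path.join(env["HOME"], ".config", "git", "attributes")
```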

Sometimes you need to override a setting of an attribute
for a path to the `Unspecified` state.  This can be done by listing
the name of the attribute prefixed with an exclamation point `!`.


EFFECTS
-------

Certain operations by Git can be influenced by assigning
particular attributes to a path.  Currently, the following
operations are attributes-aware.

Checking-out and checking-in
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These attributes affect how the contents stored in the
repository are copied to the working tree files when commands
such as 'git checkout' and 'git merge' run.  They also affect how
Git stores the contents you prepare in the working tree in the
repository upon 'git add' and 'git commit'.

`text`
^^^^^^

This attribute enables and controls end-of-line normalization.  When a
text file is normalized, its line endings are converted to LF in the
repository.  To control what line ending style is used in the working
directory, use the `eol` attribute for a single file and the
`core.eol` configuration variable for all text files.
Note that `core.autocrlf` overrides `core.eol`.

Set::

        Setting the `text` attribute on a path enables end-of-line
        normalization and marks the path as a text file.  End-of-line
        conversion takes place without guessing the content type.

Unset::

        Unsetting the `text` attribute on a path tells Git not to
        attempt any end-of-line conversion upon checkin or checkout.

Set to string value "auto"::

        When `text` is set to "auto", the path is marked for automatic
        end-of-line conversion.  If Git decides that the content is
        text, its line endings are converted to LF on checkin.
        When the file has been committed with CRLF, no conversion is done.

Unspecified::

        If the `text` attribute is unspecified, Git uses the
        `core.autocrlf` configuration variable to determine if the
        file should be converted.

Any other value causes Git to act as if `text` has been left
unspecified.
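
The decision described above can be summarised in code. The sketch models only what happens on checkin, where CRLF becomes LF when conversion applies. The text-detection heuristic in `looks_like_text` is a deliberate simplification of Git's own. For unspecified paths, the sketch treats `core.autocrlf=true` or `input` as equivalent to `text=auto`, which is a simplifying assumption. All names are hypothetical, including the `indexed_has_crlf` flag for "already committed with CRLF".

```python
def looks_like_text(content: bytes) -> bool:
    """Simplified heuristic (assumption): NUL bytes or lone CRs mean binary."""
    if b"\0" in content:
        return False
    # Every CR must be part of a CRLF pair.
    return content.count(b"\r") == content.count(b"\r\n")


def normalize_on_checkin(content, text, autocrlf="false", indexed_has_crlf=False):
    """Return the bytes stored in the repository for one file (hypothetical)."""
    if text not in (True, False, "auto"):
        # Unspecified (None) or an unrecognized value: defer to core.autocrlf,
        # which this sketch treats like text=auto when enabled.
        text = "auto" if autocrlf in ("true", "input") else False
    if text == "auto":
        convert = looks_like_text(content) and not indexed_has_crlf
    else:
        convert = text  # Set -> convert, Unset -> leave alone
    return content.replace(b"\r\n", b"\n") if convert else content
```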

`eol`
^^^^^

This attribute sets a specific line-ending style to be used in the
working directory.  It enables end-of-line conversion without any
content checks, effectively setting the `text` attribute.  Note that
setting this attribute on paths which are in the index with CRLF line
endings may cause the paths to be considered dirty.  Adding the path to
the index again will normalize the line endings in the index.

Set to string value "crlf"::

        This setting forces Git to normalize line endings for this
        file on checkin and convert them to CRLF when the file is
        checked out.

Set to string value "lf"::

        This setting forces Git to normalize line endings to LF on
        checkin and prevents conversion to CRLF when the file is
        checked out.
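
On checkout, the two `eol` values can be illustrated as follows, assuming the content coming from the repository has already been normalized to LF. The function name is hypothetical. The sketch leaves an existing CRLF alone rather than doubling its CR.

```python
import re


def checkout_eol(normalized: bytes, eol: str) -> bytes:
    """Convert repository content for the working tree (hypothetical helper)."""
    if eol == "crlf":
        # Every LF not already preceded by CR becomes CRLF.
        return re.sub(rb"(?<!\r)\n", b"\r\n", normalized)
    if eol == "lf":
        return normalized  # already LF in the repository; no conversion
    raise ValueError(f"unsupported eol value: {eol!r}")
```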

Backwards compatibility with `crlf` attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For backwards compatibility, the `crlf` attribute is interpreted as
follows:

------------------------
crlf            text
-crlf           -text
crlf=input      eol=lf
------------------------

End-of-line conversion
^^^^^^^^^^^^^^^^^^^^^^

While Git normally leaves file contents alone, it can be configured to
normalize line endings to LF in the repository and, optionally, to
convert them to CRLF when files are checked out.

If you simply want to have CRLF line endings in your working directory
regardless of the repository you are working with, you can set the
config variable `core.autocrlf` without using any attributes.

------------------------
[core]
        autocrlf = true
------------------------

This does not force normalization of text files, but does ensure
that text files that you introduce to the repository have their line
endings normalized to LF when they are added, and that files that are
already normalized in the repository stay normalized.

If you want to ensure that text files that any contributor introduces to
the repository have their line endings normalized, you can set the
`text` attribute to "auto" for _all_ files.

------------------------
*       text=auto
------------------------

The attributes allow fine-grained control over how the line endings
are converted.
Here is an example that will make Git normalize .txt, .vcproj and .sh
files, ensure that .vcproj files have CRLF and .sh files have LF in
the working directory, and prevent .jpg files from being normalized
regardless of their content.

------------------------
*               text=auto
*.txt           text
*.vcproj        text eol=crlf
*.sh            text eol=lf
*.jpg           -text
------------------------

NOTE: When `text=auto` conversion is enabled in a cross-platform
project using push and pull to a central repository, the text files
containing CRLFs should be normalized.

From a clean working directory:

-------------------------------------------------
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"
-------------------------------------------------

If any files that should not be normalized show up in 'git status',
unset their `text` attribute before running 'git add -u'.

------------------------
manual.pdf      -text
------------------------

Conversely, text files that Git does not detect can have normalization
enabled manually.

------------------------
weirdchars.txt  text
------------------------

If `core.safecrlf` is set to "true" or "warn", Git verifies whether
the conversion is reversible for the current setting of
`core.autocrlf`.  For "true", Git rejects irreversible
conversions; for "warn", Git only prints a warning but accepts
an irreversible conversion.  The safety check is meant to prevent such
a conversion from being applied to the files in the work tree, but there are a
few exceptions.  Even though...

- 'git add' itself does not touch the files in the work tree, the
  next checkout would, so the safety triggers;

- 'git apply' to update a text file with a patch does touch the files
  in the work tree, but the operation is about text files and CRLF
  conversion is about fixing the line ending inconsistencies, so the
  safety does not trigger;

- 'git diff' itself does not touch the files in the work tree, but it is
  often run to inspect the changes you intend to next 'git add'.  To
  catch potential problems early, the safety triggers.
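
The reversibility question can be modelled as a round trip: convert on checkin, convert back on checkout, and compare with the original. This is a simplified sketch that only looks at `core.autocrlf` and assumes every file is text. It is not how Git implements `core.safecrlf`, and the function names are hypothetical. A file with mixed line endings is the typical irreversible case.

```python
def checkin(content: bytes, autocrlf: str) -> bytes:
    # core.autocrlf=true or input: CRLF is normalized to LF when adding.
    if autocrlf in ("true", "input"):
        return content.replace(b"\r\n", b"\n")
    return content


def checkout(stored: bytes, autocrlf: str) -> bytes:
    # Only core.autocrlf=true converts LF back to CRLF on checkout.
    if autocrlf == "true":
        return stored.replace(b"\n", b"\r\n")
    return stored


def is_reversible(content: bytes, autocrlf: str) -> bool:
    """Would a checkin followed by a checkout reproduce `content`?"""
    return checkout(checkin(content, autocrlf), autocrlf) == content
```

For example, a file mixing CRLF and LF lines is not reversible with `core.autocrlf=true`, because checkout turns every line ending into CRLF.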


`working-tree-encoding`
^^^^^^^^^^^^^^^^^^^^^^^

Git recognizes files encoded in ASCII or one of its supersets (e.g.
UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
encodings (e.g. UTF-16) are interpreted as binary and consequently
built-in Git text processing tools (e.g. 'git diff') as well as most Git
web front ends do not visualize the contents of these files by default.

In these cases you can tell Git the encoding of a file in the working
directory with the `working-tree-encoding` attribute. If a file with this
attribute is added to Git, then Git reencodes the content from the
specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
content in its internal data structure (called "the index"). On checkout
the content is reencoded back to the specified encoding.
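
The two re-encoding steps can be sketched with Python's codecs standing in for Git's iconv-based conversion. The sketch ignores any interaction with the `text` and `eol` attributes, and the function names are hypothetical.

```python
def to_index(worktree_bytes: bytes, encoding: str) -> bytes:
    """Checkin: re-encode from the working-tree encoding to UTF-8."""
    return worktree_bytes.decode(encoding).encode("utf-8")


def to_worktree(index_bytes: bytes, encoding: str) -> bytes:
    """Checkout: re-encode the stored UTF-8 back to the declared encoding."""
    # Raises UnicodeDecodeError if the stored bytes are not valid UTF-8.
    return index_bytes.decode("utf-8").encode(encoding)
```

Passing bytes that are not valid UTF-8 to `to_worktree` raises `UnicodeDecodeError`. This mirrors the checkout failure that occurs when content was stored as-is by a client without `working-tree-encoding` support.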

Please note that using the `working-tree-encoding` attribute may have a
number of pitfalls:

- Alternative Git implementations (e.g. JGit or libgit2) and older Git
  versions (as of March 2018) do not support the `working-tree-encoding`
  attribute. If you decide to use the `working-tree-encoding` attribute
  in your repository, then it is strongly recommended to ensure that all
  clients working with the repository support it.

  For example, Microsoft Visual Studio resource files (`*.rc`) or
  PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
  If you declare `*.ps1` files as UTF-16 and you add `foo.ps1` with
  a Git client that supports `working-tree-encoding`, then `foo.ps1` will be
  stored as UTF-8 internally. A client without `working-tree-encoding`
  support will check out `foo.ps1` as a UTF-8 encoded file. This will
  typically cause trouble for the users of this file.

  If a Git client that does not support the `working-tree-encoding`
  attribute adds a new file `bar.ps1`, then `bar.ps1` will be
  stored "as-is" internally (in this example probably as UTF-16).
  A client with `working-tree-encoding` support will interpret the
  internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
  That operation will fail and cause an error.

- Reencoding content requires resources that might slow down certain
  Git operations (e.g. 'git checkout' or 'git add').

Use the `working-tree-encoding` attribute only if you cannot store a file
in UTF-8 encoding and if you want Git to be able to process the content
as text.

As an example, use the following attributes if your '*.ps1' files are
UTF-16 encoded with byte order mark (BOM) and you want Git to perform
automatic line ending conversion based on your platform.

------------------------
*.ps1           text working-tree-encoding=UTF-16
------------------------

Use the following attributes if your '*.ps1' files are UTF-16 little
endian encoded without BOM and you want Git to use Windows line endings
in the working directory. Please note that it is highly recommended to
explicitly define the line endings with `eol` if the `working-tree-encoding`
attribute is used, to avoid ambiguity.

------------------------
*.ps1           text working-tree-encoding=UTF-16LE eol=crlf
------------------------
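
The byte-level difference between the two declarations can be illustrated with Python's codecs. The `utf-16` codec writes a byte order mark in the machine's native byte order, while `utf-16-le` writes none. This only illustrates the byte layouts. It does not describe Git's iconv-based conversion, whose choice of byte order for `UTF-16` may differ.

```python
import codecs

text = "Get-Date\r\n"  # CRLF line ending, as eol=crlf would produce

with_bom = text.encode("utf-16")        # BOM first, then native byte order
without_bom = text.encode("utf-16-le")  # little endian, no BOM

print(with_bom[:2] == codecs.BOM_UTF16)  # True: starts with a BOM
print(without_bom[:2])                   # b'G\x00': first character, no BOM
```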

You can get a list of all available encodings on your platform with the
following command:

------------------------
iconv --list
------------------------

If you do not know the encoding of a file, then you can use the `file`
command to guess the encoding:

------------------------
file foo.ps1
------------------------


`ident`
^^^^^^^

When the attribute `ident` is set for a path, Git replaces
`$Id$` in the blob object with `$Id:`, followed by the
40-character hexadecimal blob object name, followed by a dollar
sign `$` upon checkout.  Any byte sequence that begins with
`$Id:` and ends with `$` in the worktree file is replaced
with `$Id$` upon check-in.
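
The expansion and collapse can be sketched in Python. The blob object name is the SHA-1 of a `blob <size>` header, a NUL byte and the content, which is what `git hash-object` computes for a blob. Two points are assumptions of this sketch: that a space appears on each side of the hash, and that an expanded keyword never spans a line break. The function names are hypothetical.

```python
import hashlib
import re


def blob_name(content: bytes) -> str:
    """SHA-1 object name of a blob: header, NUL byte, then the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def ident_smudge(blob: bytes) -> bytes:
    """Checkout: expand $Id$ using the name of the blob being checked out."""
    name = blob_name(blob).encode("ascii")
    # Spaces around the hash are an assumption about the exact format.
    return blob.replace(b"$Id$", b"$Id: " + name + b" $")


def ident_clean(worktree: bytes) -> bytes:
    """Check-in: collapse any $Id:...$ back to $Id$ (single line assumed)."""
    return re.sub(rb"\$Id:[^$\n]*\$", b"$Id$", worktree)
```

Because the hash is computed over the collapsed form, `ident_clean(ident_smudge(blob))` gives back the original blob.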


`filter`
^^^^^^^^

A `filter` attribute can be set to a string value that names a
filter driver specified in the configuration.

A filter driver consists of a `clean` command and a `smudge`
command, either of which can be left unspecified.  Upon
checkout, when the `smudge` command is specified, the command is
fed the blob object from its standard input, and its standard
output is used to update the worktree file.  Similarly, the
`clean` command is used to convert the contents of worktree file
upon checkin. By default these commands process only a single
blob and terminate. If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example `git add --all`. If a
long running `process` filter is configured then it always takes
precedence over a configured single blob filter. See the section
below for the description of the protocol used to communicate with
a `process` filter.

One use of the content filtering is to massage the content into a shape
that is more convenient for the platform, filesystem, and the user to use.
For this mode of operation, the key phrase here is "more convenient" and
not "turning something unusable into usable".  In other words, the intent
is that if someone unsets the filter driver definition, or does not have
the appropriate filter program, the project should still be usable.

Another use of the content filtering is to store the content that cannot
be directly used in the repository (e.g. a UUID that refers to the true
content stored outside Git, or encrypted content) and turn it into a
usable form upon checkout (e.g. download the external content, or decrypt
the encrypted content).

These two kinds of filters behave differently, and by default, a filter is taken as
the former, massaging the contents into more convenient shape.  A missing
filter driver definition in the config, or a filter driver that exits with
a non-zero status, is not an error but makes the filter a no-op pass-through.

You can declare that a filter turns content that is unusable by itself
into usable content by setting the `filter.<driver>.required` configuration
variable to `true`.
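
The pass-through behaviour and the effect of `required` can be sketched as follows. The sketch runs an argument list directly, whereas Git hands the configured command string to the shell. The function name `run_filter` and its error messages are hypothetical.

```python
import subprocess
import sys


def run_filter(argv, content: bytes, required: bool = False) -> bytes:
    """Run a clean/smudge command over `content`: stdin in, stdout out."""
    if argv is None:  # no driver definition in the config
        if required:
            raise RuntimeError("required filter is not configured")
        return content  # no-op pass-through
    proc = subprocess.run(argv, input=content, stdout=subprocess.PIPE)
    if proc.returncode != 0:
        if required:
            raise RuntimeError(f"required filter exited with {proc.returncode}")
        return content  # a failing non-required filter is not an error
    return proc.stdout


if __name__ == "__main__":
    # A hypothetical broken driver: always exits with status 1.
    broken = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_filter(broken, b"data"))  # b'data' (pass-through)
```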

Note: Whenever the clean filter is changed, the repository should be
renormalized by running `git add --renormalize .`

For example, in .gitattributes, you would assign the `filter`
attribute for paths.

------------------------
*.c     filter=indent
------------------------

Then you would define a "filter.indent.clean" and "filter.indent.smudge"
configuration in your .git/config to specify a pair of commands to
modify the contents of C programs when the source files are checked
in ("clean" is run) and checked out (no change is made because the
command is "cat").

------------------------
[filter "indent"]
        clean = indent
        smudge = cat
------------------------

For best results, `clean` should not alter its output further if it is
run twice ("clean->clean" should be equivalent to "clean"), and
multiple `smudge` commands should not alter `clean`'s output
("smudge->smudge->clean" should be equivalent to "clean").  See the
section on merging below.

The "indent" filter is well-behaved in this regard: it will not modify
input that is already correctly indented.  In this case, the lack of a
smudge filter means that the clean filter _must_ accept its own output
without modifying it.
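
A clean filter with the "clean->clean equals clean" property can be as simple as one that strips trailing whitespace. The sketch is hypothetical and is not the `indent` program. A script built from it could serve as a `filter.<driver>.clean` command by calling `main()`, which reads the blob on standard input and writes the result to standard output.

```python
import re
import sys


def clean(data: bytes) -> bytes:
    """Strip trailing spaces and tabs from every line (idempotent)."""
    return re.sub(rb"[ \t]+(?=\r?\n|\Z)", b"", data)


def main() -> None:
    """Filter entry point for a hypothetical clean script."""
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
```

Running `clean` on its own output changes nothing, so it is safe to use without a smudge filter.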

If a filter _must_ succeed in order to make the stored contents usable,
you can declare that the filter is `required`, in the configuration:

------------------------
[filter "crypt"]
        clean = openssl enc ...
        smudge = openssl enc -d ...
        required
------------------------

The sequence "%f" on the filter command line is replaced with the name of
the file the filter is working on.  A filter might use this in keyword
substitution.  For example:

------------------------
[filter "p4"]
        clean = git-p4-filter --clean %f
        smudge = git-p4-filter --smudge %f
------------------------

Note that "%f" is the name of the path that is being worked on. Depending
on the version that is being filtered, the corresponding file on disk may
not exist, or may have different contents. So, smudge and clean commands
should not try to access the file on disk, but only act as filters on the
content provided to them on standard input.
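
The `%f` substitution can be sketched as string replacement with shell quoting, so that a path containing spaces reaches the command as a single argument. Git quotes the path for the shell itself. This sketch uses `shlex.quote`, which adds quotes only when needed, so the exact quoting may differ from Git's. The function name is hypothetical.

```python
import shlex


def expand_filter_command(template: str, path: str) -> str:
    """Replace each %f in a filter command line with the shell-quoted path."""
    return template.replace("%f", shlex.quote(path))
```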

Long Running Filter Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the filter command (a string value) is defined via
`filter.<driver>.process` then Git can process all blobs with a
single filter invocation for the entire life of a single Git
command. This is achieved by using a protocol based on the packet
format (pkt-line, see technical/protocol-common.txt) over standard
input and standard output as follows. All packets, except for the
"*CONTENT" packets and the "0000" flush packet, are considered
text and therefore are terminated by a LF.
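
Here is a minimal sketch of pkt-line framing. Each packet starts with four hexadecimal digits giving the packet length, including those four bytes, and "0000" is the flush packet. The 65516-byte payload limit assumed here follows from Git's 65520-byte maximum packet size. The helper names are hypothetical, and empty "0004" packets are rejected for simplicity.

```python
import io

MAX_PKT_DATA = 65516  # assumed: 65520-byte packet limit minus 4-byte header
FLUSH = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload with its 4-hex-digit length (header included)."""
    if not payload or len(payload) > MAX_PKT_DATA:
        raise ValueError("payload must be 1..65516 bytes")
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt(stream):
    """Read one packet; return its payload, or None for a flush packet."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended inside a packet header")
    length = int(header, 16)
    if length == 0:
        return None  # "0000" flush packet
    return stream.read(length - 4)


def parse_pkts(data: bytes):
    """Split a byte string into payloads, with None marking flush packets."""
    stream = io.BytesIO(data)
    packets = []
    while stream.tell() < len(data):
        packets.append(read_pkt(stream))
    return packets
```

For instance, the welcome message `git-filter-client` plus its LF is 18 bytes, so it is framed as `0016git-filter-client` followed by the LF.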

Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter has started,
Git sends a welcome message ("git-filter-client"), a list of supported
protocol version numbers, and a flush packet. Git expects to read a welcome
response message ("git-filter-server"), exactly one protocol version number
from the previously sent list, and a flush packet. All further
communication will be based on the selected version. The remaining
protocol description below documents "version=2". Please note that
"version=42" in the example below does not exist and is only there
to illustrate how the protocol would look with more than one
version.

After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet. Git expects to read a list of desired
capabilities, which must be a subset of the supported capabilities list,
and a flush packet as response:
------------------------
packet:          git> git-filter-client
packet:          git> version=2
packet:          git> version=42
packet:          git> 0000
packet:          git< git-filter-server
packet:          git< version=2
packet:          git< 0000
packet:          git> capability=clean
packet:          git> capability=smudge
packet:          git> capability=not-yet-invented
packet:          git> 0000
packet:          git< capability=clean
packet:          git< capability=smudge
packet:          git< 0000
------------------------
Supported filter capabilities in version 2 are "clean", "smudge",
and "delay".
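
The filter's side of the handshake reduces to two decisions. It picks version 2 from the offered list, and it answers with the subset of offered capabilities it implements. This sketch works on already-decoded packet payloads with the trailing LF removed and leaves out pkt-line framing and I/O. The function name and the default capability set are hypothetical.

```python
def negotiate(welcome, capabilities, supported=("clean", "smudge")):
    """Return (version reply, capability reply) as lists of payload lines."""
    if not welcome or welcome[0] != "git-filter-client":
        raise ValueError("missing git-filter-client welcome")
    if "version=2" not in welcome[1:]:
        raise ValueError("client does not offer version=2")
    offered = {line.split("=", 1)[1] for line in capabilities
               if line.startswith("capability=")}
    # Reply only with capabilities that Git offered and this filter supports.
    chosen = [cap for cap in supported if cap in offered]
    return (["git-filter-server", "version=2"],
            ["capability=" + cap for cap in chosen])
```

Fed the example exchange above, the sketch answers with `version=2` and with `capability=clean` and `capability=smudge`, ignoring `not-yet-invented`.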

Afterwards Git sends a list of "key=value" pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
to filter relative to the repository root. Right after the flush packet
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content. Please note that the filter
must not send any response before it has received the content and the
final flush packet. Also note that the "value" of a "key=value" pair
can contain the "=" character whereas the key would never contain
that character.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> 0000
packet:          git> CONTENT
packet:          git> 0000
------------------------
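
A key never contains "=" but a value may, so such a list can be parsed by splitting each line at its first "=". The function name is hypothetical, and the sketch assumes the packets have already been decoded into text lines.

```python
def parse_kv_list(lines):
    """Turn lines such as "pathname=dir/a=b.dat" into a dict."""
    result = {}
    for line in lines:
        # partition() splits at the first "=" only, so values keep theirs.
        key, sep, value = line.rstrip("\n").partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {line!r}")
        result[key] = value
    return result
```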

The filter is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the filter does not experience
problems then the list must contain a "success" status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of "key=value" pairs terminated with a flush packet
is expected. The filter can change the status in the second list
or keep the status as is with an empty list. Please note that the
empty list must be terminated with a flush packet regardless.

------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------

If the result content is empty then the filter is expected to respond
with a "success" status and a flush packet to signal the empty content.
------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< 0000  # empty content!
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------
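
The response layout for a successful command, including the empty-content case, can be modelled as a list of packet payloads. In this sketch `None` stands for a flush packet and pkt-line encoding is left out. The content is split into chunks of at most 65516 bytes, which is assumed to match the pkt-line payload limit. The function name is hypothetical.

```python
FLUSH = None          # stands for a "0000" flush packet in this sketch
MAX_PKT_DATA = 65516  # assumed maximum payload per pkt-line


def success_response(content: bytes):
    """Payload sequence a filter sends after successfully filtering content."""
    frames = [b"status=success\n", FLUSH]
    # Zero or more content packets: empty content produces none.
    for start in range(0, len(content), MAX_PKT_DATA):
        frames.append(content[start:start + MAX_PKT_DATA])
    frames.append(FLUSH)  # end of content
    frames.append(FLUSH)  # empty second list: keep "status=success"
    return frames
```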

In case the filter cannot or does not want to process the content,
it is expected to respond with an "error" status.
------------------------
packet:          git< status=error
packet:          git< 0000
------------------------

If the filter experiences an error during processing, then it can
send the status "error" after the content was (partially or
completely) sent.
------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
packet:          git< 0000
packet:          git< status=error
packet:          git< 0000
------------------------

In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
then it is expected to respond with an "abort" status at any point
in the protocol.
------------------------
packet:          git< status=abort
packet:          git< 0000
------------------------

Git neither stops nor restarts the filter process in case the
"error"/"abort" status is set. However, Git sets its exit code
according to the `filter.<driver>.required` flag, mimicking the
behavior of the `filter.<driver>.clean` / `filter.<driver>.smudge`
mechanism.

If the filter dies during the communication or does not adhere to
the protocol then Git will stop the filter process and restart it
with the next file that needs to be processed. Depending on the
`filter.<driver>.required` flag Git will interpret that as an error.

After the filter has processed a command it is expected to wait for
a "key=value" list containing the next command. Git will close
the command pipe on exit. The filter is expected to detect EOF
and exit gracefully on its own. Git will wait until the filter
process has stopped.

Delay
^^^^^

If the filter supports the "delay" capability, then Git can send the
flag "can-delay" after the filter command and pathname. This flag
denotes that the filter can delay filtering the current blob (e.g. to
compensate for network latencies) by responding with no content but with
the status "delayed" and a flush packet.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> can-delay=1
packet:          git> 0000
packet:          git> CONTENT
packet:          git> 0000
packet:          git< status=delayed
packet:          git< 0000
------------------------

If the filter supports the "delay" capability then it must support the
"list_available_blobs" command. If Git sends this command, then the
filter is expected to return a list of pathnames representing blobs
that have been delayed earlier and are now available.
The list must be terminated with a flush packet followed
by a "success" status that is also terminated with a flush packet. If
no blobs for the delayed paths are available yet, then the filter is
expected to block the response until at least one blob becomes
available. The filter can tell Git that it has no more delayed blobs
by sending an empty list. As soon as the filter responds with an empty
list, Git stops asking. All blobs that Git has not received at this
point are considered missing and will result in an error.

------------------------
packet:          git> command=list_available_blobs
packet:          git> 0000
packet:          git< pathname=path/testfile.dat
packet:          git< pathname=path/otherfile.dat
packet:          git< 0000
packet:          git< status=success
packet:          git< 0000
------------------------
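
The bookkeeping a filter needs for the "delay" capability can be sketched as a small class. It is a hypothetical, non-blocking model in which `None` stands for a flush packet. A real filter would block in `list_available_blobs` while blobs are still pending. This sketch instead returns an empty list at once, which Git would read as "no more delayed blobs".

```python
class DelayedBlobs:
    """Tracks smudge requests answered with status=delayed (hypothetical)."""

    def __init__(self):
        self.pending = set()  # delayed, not yet ready
        self.ready = set()    # delayed and now available

    def delay(self, pathname):
        """Record a delayed blob and return the reply payloads."""
        self.pending.add(pathname)
        return [b"status=delayed\n", None]

    def mark_ready(self, pathname):
        self.pending.discard(pathname)
        self.ready.add(pathname)

    def list_available_blobs(self):
        """Reply to command=list_available_blobs (non-blocking sketch)."""
        frames = [b"pathname=%s\n" % p.encode() for p in sorted(self.ready)]
        self.ready.clear()  # Git will request these blobs again
        return frames + [None, b"status=success\n", None]
```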

After Git has received the pathnames, it will request the corresponding
blobs again. These requests contain a pathname and an empty content
section. The filter is expected to respond with the smudged content
in the usual way as explained above.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> 0000
packet:          git> 0000  # empty content!
packet:          git< status=success
packet:          git< 0000
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------

Example
^^^^^^^

A long running filter demo implementation can be found in
`contrib/long-running-filter/example.pl` located in the Git
core repository. If you develop your own long running filter
process then the `GIT_TRACE_PACKET` environment variable can be
very helpful for debugging (see linkgit:git[1]).

Please note that you cannot use an existing `filter.<driver>.clean`
or `filter.<driver>.smudge` command with `filter.<driver>.process`
because the former two use a different inter-process communication
protocol than the latter one.


Interaction between checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the check-in codepath, the worktree file is first converted
with `filter` driver (if specified and corresponding driver
defined), then the result is processed with `ident` (if
specified), and then finally with `text` (again, if specified
and applicable).

In the check-out codepath, the blob content is first converted
with `text`, then with `ident`, and finally fed to `filter`.
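
Treating each conversion as a function from bytes to bytes, the two orders are mirror images of each other. The parameter names are placeholders for this sketch; Git does not expose these steps as separate callables.

```python
def check_in(worktree, filter_clean, ident_clean, text_clean):
    """Check-in order: filter driver, then ident, then text/eol."""
    return text_clean(ident_clean(filter_clean(worktree)))


def check_out(blob, text_smudge, ident_smudge, filter_smudge):
    """Checkout order: text/eol, then ident, then the filter driver."""
    return filter_smudge(ident_smudge(text_smudge(blob)))
```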


Merging branches with differing checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have added attributes to a file that cause the canonical
repository format for that file to change, such as adding a
clean/smudge filter or text/eol/ident attributes, merging anything
where the attribute is not in place would normally cause merge
conflicts.

To prevent these unnecessary merge conflicts, Git can be told to run a
virtual check-out and check-in of all three stages of a file when
resolving a three-way merge by setting the `merge.renormalize`
configuration variable.  This prevents changes caused by check-in
conversion from causing spurious merge conflicts when a converted file
is merged with an unconverted file.

As long as a "smudge->clean" results in the same output as a "clean"
even on files that are already smudged, this strategy will
automatically resolve all filter-related conflicts.  Filters that do
not act in this way may cause additional merge conflicts that must be
resolved manually.


Generating diff text
~~~~~~~~~~~~~~~~~~~~

`diff`
^^^^^^

The attribute `diff` affects how Git generates diffs for particular
files. It can tell Git whether to generate a textual patch for the path
or to treat the path as a binary file.  It can also affect what line is
shown on the hunk header `@@ -k,l +n,m @@` line, tell Git to use an
external command to generate the diff, or ask Git to convert binary
files to a text format before generating the diff.

Set::

        A path to which the `diff` attribute is set is treated
        as text, even when it contains byte values that
        normally never appear in text files, such as NUL.

Unset::

        A path to which the `diff` attribute is unset will
        generate `Binary files differ` (or a binary patch, if
        binary patches are enabled).

Unspecified::

        A path to which the `diff` attribute is unspecified
        first gets its contents inspected, and if it looks like
        text and is smaller than `core.bigFileThreshold`, it is treated
        as text. Otherwise it generates `Binary files differ`.

String::

        Diff is shown using the specified diff driver.  Each driver may
        specify one or more options, as described in the following
        section. The options for the diff driver "foo" are defined
        by the configuration variables in the "diff.foo" section of the
        Git config file.
 754
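The four states above can be summarized as a small decision function.
This Python sketch is a simplified model, not Git's implementation: it
assumes the content check looks for a NUL byte in the first 8000 bytes
and that `core.bigFileThreshold` has its 512 MiB default, and it does
not model diff driver options such as `diff.*.binary`:

------------------------
FIRST_FEW_BYTES = 8000                           # assumed sniffed prefix size
DEFAULT_BIG_FILE_THRESHOLD = 512 * 1024 * 1024   # assumed 512 MiB default


def looks_binary(data: bytes) -> bool:
    """Simplified content check: a NUL byte near the start means binary."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def diff_mode(attr, data: bytes, big_file_threshold=DEFAULT_BIG_FILE_THRESHOLD):
    """Map a `diff` attribute state to how the path is diffed.

    attr is True (Set), False (Unset), None (Unspecified) or a str
    naming a diff driver.
    """
    if attr is True:
        return "text"
    if attr is False:
        return "binary"
    if isinstance(attr, str):
        return "driver:" + attr
    # Unspecified: inspect the contents and the size.
    if not looks_binary(data) and len(data) < big_file_threshold:
        return "text"
    return "binary"
------------------------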

Defining an external diff driver
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The definition of a diff driver is done in `gitconfig`, not in the
`gitattributes` file, so strictly speaking this manual page is the
wrong place to talk about it.  However...

To define an external diff driver `jcdiff`, add a section to your
`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:

----------------------------------------------------------------
[diff "jcdiff"]
        command = j-c-diff
----------------------------------------------------------------

When Git needs to show you a diff for a path whose `diff`
attribute is set to `jcdiff`, it calls the command you specified
with the above configuration, i.e. `j-c-diff`, with 7
parameters, just as the `GIT_EXTERNAL_DIFF` program is called.
See linkgit:git[1] for details.

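As an illustration, here is a minimal Python sketch of what a
hypothetical `j-c-diff` script could look like.  It assumes the seven
parameters arrive in the order documented in linkgit:git[1] (path,
old-file, old-hex, old-mode, new-file, new-hex, new-mode), handles the
single-parameter call used for unmerged paths, and simply prints a
unified diff of the two temporary files using Python's `difflib`:

------------------------
import difflib
import sys


def external_diff(argv):
    """Build diff output from the arguments Git passes to an external
    diff command (see GIT_EXTERNAL_DIFF in git(1))."""
    if len(argv) == 1:
        # Unmerged paths are passed with a single parameter, the path.
        return f"* Unmerged path {argv[0]}\n"
    path, old_file, old_hex, old_mode, new_file, new_hex, new_mode = argv
    with open(old_file, encoding="utf-8", errors="replace") as f:
        old_lines = f.readlines()
    with open(new_file, encoding="utf-8", errors="replace") as f:
        new_lines = f.readlines()
    header = (f"diff {path} ({old_hex[:7]} {old_mode}"
              f" -> {new_hex[:7]} {new_mode})\n")
    hunks = difflib.unified_diff(old_lines, new_lines,
                                 fromfile="a/" + path, tofile="b/" + path)
    return header + "".join(hunks)


if __name__ == "__main__":
    sys.stdout.write(external_diff(sys.argv[1:]))
------------------------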

Defining a custom hunk-header
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each group of changes (called a "hunk") in the textual diff output
is prefixed with a line of the form:

        @@ -k,l +n,m @@ TEXT

This is called a 'hunk header'.  The "TEXT" portion is by default a line
that begins with a letter, an underscore, or a dollar sign; this
matches what GNU 'diff -p' output uses.  This default selection, however,
is not suited for some contents, and you can use a customized pattern
to make a selection.

First, in `.gitattributes`, you would assign the `diff` attribute
for paths.

------------------------
*.tex   diff=tex
------------------------

Then, you would define a `diff.tex.xfuncname` configuration variable to
specify a regular expression that matches a line that you would
want to appear as the hunk header "TEXT". Add a section to your
`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:

------------------------
[diff "tex"]
        xfuncname = "^(\\\\(sub)*section\\{.*)$"
------------------------

Note: a single level of backslashes is eaten by the
configuration file parser, so you would need to double the
backslashes; the pattern above picks a line that begins with a
backslash, followed by zero or more occurrences of `sub`, then
`section`, then an open brace, up to the end of the line.

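To see what the configured value does, the following Python sketch
removes one level of backslashes from the string exactly as written in
the config file above, compiles the result, and looks backwards from
the start of a hunk for the nearest matching line.  It is a simplified
model (Git uses POSIX extended regular expressions rather than Python's
`re` module), and the helper name `hunk_header_text()` is made up for
this example:

------------------------
import re

# diff.tex.xfuncname exactly as written inside the quotes in the config file.
CONFIG_VALUE = r'^(\\\\(sub)*section\\{.*)$'

# The config parser eats one level of backslashes ("\\" becomes "\").
PATTERN = CONFIG_VALUE.replace("\\\\", "\\")

XFUNCNAME = re.compile(PATTERN)


def hunk_header_text(lines, hunk_start):
    """Return the "TEXT" part for a hunk starting at index hunk_start:
    the nearest earlier line that matches, or "" when none does."""
    for line in reversed(lines[:hunk_start]):
        match = XFUNCNAME.match(line)
        if match:
            # Use the first capture group if there is one, else the match.
            return match.group(1) if match.re.groups else match.group(0)
    return ""
------------------------
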
There are a few built-in patterns to make this easier, and `tex`
is one of them, so you do not have to write the above in your
configuration file (you still need to enable this with the
attribute mechanism, via `.gitattributes`).  The following built-in
patterns are available:

- `ada` suitable for source code in the Ada language.

- `bibtex` suitable for files with BibTeX coded references.

- `cpp` suitable for source code in the C and C++ languages.

- `csharp` suitable for source code in the C# language.

- `css` suitable for cascading style sheets.

- `fortran` suitable for source code in the Fortran language.

- `fountain` suitable for Fountain documents.

- `html` suitable for HTML/XHTML documents.

- `java` suitable for source code in the Java language.

- `matlab` suitable for source code in the MATLAB language.

- `objc` suitable for source code in the Objective-C language.

- `pascal` suitable for source code in the Pascal/Delphi language.

- `perl` suitable for source code in the Perl language.

- `php` suitable for source code in the PHP language.

- `python` suitable for source code in the Python language.

- `ruby` suitable for source code in the Ruby language.

- `tex` suitable for source code for LaTeX documents.


Customizing word diff
^^^^^^^^^^^^^^^^^^^^^

You can customize the rules that `git diff --word-diff` uses to
split words in a line, by specifying an appropriate regular expression
in the `diff.*.wordRegex` configuration variable.  For example, in TeX
a backslash followed by a sequence of letters forms a command, but
several such commands can be run together without intervening
whitespace.  To separate them, use a regular expression in your
`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:

------------------------
[diff "tex"]
        wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
------------------------

A built-in pattern is provided for all languages listed in the
previous section.

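As a rough illustration of how such a regular expression splits a line,
the following Python sketch applies a hand translation of the
`wordRegex` value above: one level of backslashes has been removed as
the config parser would do, and the POSIX `space` character class,
which Python's `re` module does not support, is written as `\s`.
Python tries alternatives left to right, whereas a POSIX engine prefers
the longest match; for the sample input used here both give the same
result.  The function name is hypothetical:

------------------------
import re

# Hand translation of diff.tex.wordRegex to Python regular expression syntax.
TEX_WORD_REGEX = re.compile(r"\\[a-zA-Z]+|[{}]|\\.|[^\\{}\s]+")


def split_words(line):
    """Split a line into the 'words' a word diff would compare.
    Anything between matches is treated as whitespace and ignored."""
    return TEX_WORD_REGEX.findall(line)
------------------------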

Performing text diffs of binary files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes it is desirable to see the diff of a text-converted
version of some binary files. For example, a word processor
document can be converted to an ASCII text representation, and
the diff of the text shown. Even though this conversion loses
some information, the resulting diff is useful for human
viewing (but cannot be applied directly).

The `textconv` config option is used to define a program for
performing such a conversion. The program should take a single
argument, the name of a file to convert, and produce the
resulting text on stdout.

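For instance, a crude but generic textconv program could print the runs
of printable ASCII characters found in a binary file, much like the
`strings` utility.  The Python sketch below does that; the script is
hypothetical (you would save it somewhere on your `PATH` and point
`textconv` at it), and the minimum run length of four characters is an
arbitrary choice:

------------------------
import re
import sys

# Four or more consecutive printable ASCII characters.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_runs(data: bytes):
    """Return the runs of printable ASCII text found in data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def main(argv):
    # Git passes the name of the file to convert as the only argument.
    with open(argv[1], "rb") as f:
        for run in printable_runs(f.read()):
            print(run)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
------------------------
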
For example, to show the diff of the EXIF information of a
file instead of the binary information (assuming you have the
exif tool installed), add the following section to your
`$GIT_DIR/config` file (or `$HOME/.gitconfig` file):

------------------------
[diff "jpg"]
        textconv = exif
------------------------

NOTE: The text conversion is generally a one-way conversion;
in this example, we lose the actual image contents and focus
just on the text data. This means that diffs generated by
textconv are _not_ suitable for applying. For this reason,
only `git diff` and the `git log` family of commands (i.e.,
log, whatchanged, show) will perform text conversion.
`git format-patch` will never generate this output. If you want to
send somebody a text-converted diff of a binary file (e.g.,
because it quickly conveys the changes you have made), you
should generate it separately and send it as a comment _in
addition to_ the usual binary diff that you might send.

Because text conversion can be slow, especially when doing a
large number of them with `git log -p`, Git provides a mechanism
to cache the output and use it in future diffs.  To enable
caching, set the `cachetextconv` variable in your diff driver's
config. For example:

------------------------
[diff "jpg"]
        textconv = exif
        cachetextconv = true
------------------------

This will cache the result of running "exif" on each blob
indefinitely. If you change the textconv config variable for a
diff driver, Git will automatically invalidate the cache entries
and re-run the textconv filter. If you want to invalidate the
cache manually (e.g., because your version of "exif" was updated
and now produces better output), you can remove the cache
with `git update-ref -d refs/notes/textconv/jpg` (where
"jpg" is the name of the diff driver, as in the example above).

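The invalidation rule above can be pictured with a toy model.  The
Python class below is hypothetical and is not how Git stores the cache
(Git keeps it in the notes ref mentioned above); it only illustrates the
behaviour described: results are remembered per blob, and all of them
are dropped when the configured textconv command changes:

------------------------
class TextconvCache:
    """Toy model of the caching rules described above; Git itself keeps
    the cache under refs/notes/textconv/<driver>."""

    def __init__(self):
        self.command = None      # textconv command the entries came from
        self.entries = {}        # blob ID -> converted text
        self.runs = 0            # how often the converter actually ran

    def convert(self, command, blob_id, run):
        if command != self.command:
            # The textconv config changed: every cached entry is stale.
            self.command = command
            self.entries.clear()
        if blob_id not in self.entries:
            self.runs += 1
            self.entries[blob_id] = run(blob_id)
        return self.entries[blob_id]
------------------------
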
Choosing textconv versus external diff
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to show differences between binary or specially-formatted
blobs in your repository, you can choose to use either an external diff
command, or to use textconv to convert them to a diff-able text format.
Which method you choose depends on your exact situation.

The advantage of using an external diff command is flexibility. You are
not bound to find line-oriented changes, nor is it necessary for the
output to resemble unified diff. You are free to locate and report
changes in the most appropriate way for your data format.

A textconv, by comparison, is much more limiting. You provide a
transformation of the data into a line-oriented text format, and Git
uses its regular diff tools to generate the output. There are several
advantages to choosing this method:

1. Ease of use. It is often much simpler to write a binary-to-text
   transformation than it is to perform your own diff. In many cases,
   existing programs can be used as textconv filters (e.g., exif,
   odt2txt).

2. Git diff features. By performing only the transformation step
   yourself, you can still utilize many of Git's diff features,
   including colorization, word-diff, and combined diffs for merges.

3. Caching. Textconv caching can speed up repeated diffs, such as those
   you might trigger by running `git log -p`.


Marking files as binary
^^^^^^^^^^^^^^^^^^^^^^^

Git usually guesses correctly whether a blob contains text or binary
data by examining the beginning of the contents. However, sometimes you
may want to override its decision, either because a blob contains binary
data later in the file, or because the content, while technically
composed of text characters, is opaque to a human reader. For example,
many PostScript files contain only ASCII characters, but produce noisy
and meaningless diffs.

The simplest way to mark a file as binary is to unset the diff
attribute in the `.gitattributes` file:

------------------------
*.ps -diff
------------------------

This will cause Git to generate `Binary files differ` (or a binary
patch, if binary patches are enabled) instead of a regular diff.

However, one may also want to specify other diff driver attributes. For
example, you might want to use `textconv` to convert PostScript files to
an ASCII representation for human viewing, but otherwise treat them as
binary files. You cannot specify both `-diff` and `diff=ps` attributes.
The solution is to use the `diff.*.binary` config option:

------------------------
[diff "ps"]
        textconv = ps2ascii
        binary = true
------------------------

Performing a three-way merge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`merge`
^^^^^^^

The attribute `merge` affects how three versions of a file are
merged when a file-level merge is necessary during `git merge`
and other commands such as `git revert` and `git cherry-pick`.

Set::

        The built-in 3-way merge driver is used to merge the
        contents in a way similar to the 'merge' command of the `RCS`
        suite.  This is suitable for ordinary text files.

Unset::

        Take the version from the current branch as the
        tentative merge result, and declare that the merge has
        conflicts.  This is suitable for binary files that do
        not have well-defined merge semantics.

Unspecified::

        By default, this uses the same built-in 3-way merge
        driver as is the case when the `merge` attribute is set.
        However, the `merge.default` configuration variable can name
        a different merge driver to be used with paths for which the
        `merge` attribute is unspecified.

String::

        3-way merge is performed using the specified custom
        merge driver.  The built-in 3-way merge driver can be
        explicitly specified by asking for the "text" driver; the
        built-in "take the current branch" driver can be
        requested with "binary".

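Put together, the four states select a driver as in this small Python
sketch.  The function and its arguments are hypothetical;
`merge_default` stands for the value of the `merge.default`
configuration variable:

------------------------
def merge_driver(attr, merge_default=None):
    """Pick the merge driver for a path from its `merge` attribute.

    attr is True (Set), False (Unset), None (Unspecified) or a str.
    """
    if attr is True:
        return "text"                    # built-in 3-way merge
    if attr is False:
        return "binary"                  # take the current branch, conflict
    if attr is None:
        return merge_default or "text"   # merge.default, else built-in
    return attr                          # "text", "binary", "union" or custom
------------------------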

Built-in merge drivers
^^^^^^^^^^^^^^^^^^^^^^

There are a few built-in low-level merge drivers defined that
can be asked for via the `merge` attribute.

text::

        Usual 3-way file level merge for text files.  Conflicted
        regions are marked with conflict markers `<<<<<<<`,
        `=======` and `>>>>>>>`.  The version from your branch
        appears before the `=======` marker, and the version
        from the merged branch appears after the `=======`
        marker.

binary::

        Keep the version from your branch in the work tree, but
        leave the path in the conflicted state for the user to
        sort out.

union::

        Run 3-way file level merge for text files, but take
        lines from both versions, instead of leaving conflict
        markers.  This tends to leave the added lines in the
        resulting file in random order and the user should
        verify the result. Do not use this if you do not
        understand the implications.


Defining a custom merge driver
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The definition of a merge driver is done in the `.git/config`
file, not in the `gitattributes` file, so strictly speaking this
manual page is the wrong place to talk about it.  However...

To define a custom merge driver `filfre`, add a section to your
`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:

----------------------------------------------------------------
[merge "filfre"]
        name = feel-free merge driver
        driver = filfre %O %A %B %L %P
        recursive = binary
----------------------------------------------------------------

The `merge.*.name` variable gives the driver a human-readable
name.

The `merge.*.driver` variable's value is used to construct a
command to run to merge the ancestor's version (`%O`), the current
version (`%A`) and the other branch's version (`%B`).  These
three tokens are replaced with the names of temporary files that
hold the contents of these versions when the command line is
built. Additionally, `%L` will be replaced with the conflict marker
size (see below).

The merge driver is expected to leave the result of the merge in
the file named with `%A` by overwriting it, and exit with zero
status if it managed to merge them cleanly, or non-zero if there
were conflicts.

The `merge.*.recursive` variable specifies what other merge
driver to use when the merge driver is called for an internal
merge between common ancestors, when there is more than one.
When left unspecified, the driver itself is used for both
internal merge and the final merge.

The merge driver can learn the pathname in which the merged result
will be stored via the placeholder `%P`.

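As an example of the calling convention, here is a sketch of a
hypothetical driver script, written in Python, for files whose lines
form an unordered set (a list of names, say).  It reads the three files
named by `%O`, `%A` and `%B`, drops a line if either side deleted it,
keeps lines that either side added, writes the result back to `%A`, and
exits with zero status because it never reports conflicts.  It could be
wired up with something like `driver = python3 merge-line-set.py %O %A %B`,
where the script name is made up:

------------------------
import sys


def merge_line_sets(base, ours, theirs):
    """Three-way merge of files whose lines form an unordered set."""
    base_set, ours_set, theirs_set = set(base), set(ours), set(theirs)

    def keep(line):
        if line in base_set:
            # An original line survives only if neither side deleted it.
            return line in ours_set and line in theirs_set
        return True          # added by at least one side

    merged, seen = [], set()
    for line in ours + theirs:
        if line not in seen and keep(line):
            merged.append(line)
            seen.add(line)
    return merged


def main(argv):
    # Called as: <script> %O %A %B (any further arguments are ignored).
    base_path, ours_path, theirs_path = argv[:3]

    def read(path):
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()

    merged = merge_line_sets(read(base_path), read(ours_path), read(theirs_path))
    with open(ours_path, "w", encoding="utf-8") as f:   # result goes to %A
        f.write("".join(line + "\n" for line in merged))
    return 0                 # clean merge: this driver never conflicts


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
------------------------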

`conflict-marker-size`
^^^^^^^^^^^^^^^^^^^^^^

This attribute controls the length of conflict markers left in
the work tree file during a conflicted merge.  Only setting
the value to a positive integer has any meaningful effect.

For example, this line in `.gitattributes` can be used to tell the merge
machinery to leave much longer (instead of the usual 7-character-long)
conflict markers when merging the file `Documentation/git-merge.txt`
results in a conflict.

------------------------
Documentation/git-merge.txt     conflict-marker-size=32
------------------------

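To make the effect concrete, the Python sketch below renders one
conflicted region with markers of a given length.  The labels after the
opening and closing markers are placeholders (Git fills in branch or
commit names), the function name is hypothetical, and the sketch simply
rejects sizes that are not positive integers:

------------------------
def format_conflict(ours, theirs, size=7, ours_label="ours", theirs_label="theirs"):
    """Render one conflicted region using markers `size` characters long."""
    if not isinstance(size, int) or size <= 0:
        raise ValueError("conflict marker size must be a positive integer")
    lines = ["<" * size + " " + ours_label, *ours,
             "=" * size, *theirs,
             ">" * size + " " + theirs_label]
    return "".join(line + "\n" for line in lines)
------------------------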

Checking whitespace errors
~~~~~~~~~~~~~~~~~~~~~~~~~~

`whitespace`
^^^^^^^^^^^^

The `core.whitespace` configuration variable allows you to define what
'diff' and 'apply' should consider whitespace errors for all paths in
the project (see linkgit:git-config[1]).  This attribute gives you finer
control per path.

Set::

        Notice all types of potential whitespace errors known to Git.
        The tab width is taken from the value of the `core.whitespace`
        configuration variable.

Unset::

        Do not notice anything as an error.

Unspecified::

        Use the value of the `core.whitespace` configuration variable to
        decide what to notice as an error.

String::

        Specify a comma-separated list of common whitespace problems to
        notice in the same format as the `core.whitespace` configuration
        variable.

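The following Python sketch models the four states for two of the
problems Git knows about, `blank-at-eol` and `space-before-tab`.  It
assumes, as for `core.whitespace`, that a string value is applied on top
of the default set (`blank-at-eol`, `space-before-tab`, `blank-at-eof`)
and that a leading `-` disables an entry; the function names are
hypothetical and only these two line-level checks are implemented:

------------------------
# Rules assumed to be enabled by default, as for core.whitespace.
DEFAULT_RULES = frozenset({"blank-at-eol", "space-before-tab", "blank-at-eof"})


def parse_whitespace_value(value, base=DEFAULT_RULES):
    """Parse a list such as "-space-before-tab,tab-in-indent"."""
    rules = set(base)
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("-"):
            rules.discard(item[1:])
        else:
            rules.add(item)
    return rules


def effective_rules(attr, core_whitespace=""):
    """attr is True (Set), False (Unset), None (Unspecified) or a str."""
    if attr is True:          # every check this sketch implements
        return {"blank-at-eol", "space-before-tab"}
    if attr is False:         # notice nothing
        return set()
    if attr is None:          # fall back to core.whitespace
        return parse_whitespace_value(core_whitespace)
    return parse_whitespace_value(attr)


def line_errors(line, rules):
    """Report the line-level problems this sketch knows about."""
    errors = []
    if "blank-at-eol" in rules and line != line.rstrip(" \t"):
        errors.append("blank-at-eol")
    indent = line[: len(line) - len(line.lstrip(" \t"))]
    if "space-before-tab" in rules and " \t" in indent:
        errors.append("space-before-tab")
    return errors
------------------------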

Creating an archive
~~~~~~~~~~~~~~~~~~~

`export-ignore`
^^^^^^^^^^^^^^^

Files and directories with the attribute `export-ignore` won't be added to
archive files.

`export-subst`
^^^^^^^^^^^^^^

If the attribute `export-subst` is set for a file, then Git will expand
several placeholders when adding this file to an archive.  The
expansion depends on the availability of a commit ID, i.e., if
linkgit:git-archive[1] has been given a tree instead of a commit or a
tag, then no replacement will be done.  The placeholders are the same
as those for the option `--pretty=format:` of linkgit:git-log[1],
except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
in the file.  E.g. the string `$Format:%H$` will be replaced by the
commit hash.

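The substitution can be pictured with the following Python sketch.  It
is only a model: the mapping from placeholders to values is supplied by
the caller instead of being read from a commit, only the placeholders
present in that mapping are handled, and escapes such as `%%` are
ignored.  Passing `None` models the case where linkgit:git-archive[1]
was given a tree, so nothing is replaced.  The function name is
hypothetical:

------------------------
import re

FORMAT_MARKER = re.compile(r"\$Format:([^$]*)\$")


def expand_export_subst(text, commit):
    """Expand $Format:...$ markers.

    commit maps placeholders such as "%H" or "%an" to values, or is None
    when there is no commit (e.g. the archive was made from a tree)."""
    if commit is None:
        return text
    # Try longer placeholders first so that "%an" wins over a "%a" entry.
    keys = sorted(commit, key=len, reverse=True)
    placeholder = re.compile("|".join(re.escape(k) for k in keys)) if keys else None

    def expand(match):
        spec = match.group(1)
        if placeholder is None:
            return spec
        return placeholder.sub(lambda m: commit[m.group(0)], spec)

    return FORMAT_MARKER.sub(expand, text)
------------------------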

Packing objects
~~~~~~~~~~~~~~~

`delta`
^^^^^^^

Delta compression will not be attempted for blobs for paths with the
attribute `delta` set to false.


Viewing files in GUI tools
~~~~~~~~~~~~~~~~~~~~~~~~~~

`encoding`
^^^^^^^^^^

The value of this attribute specifies the character encoding that should
be used by GUI tools (e.g. linkgit:gitk[1] and linkgit:git-gui[1]) to
display the contents of the relevant file. Note that due to performance
considerations linkgit:gitk[1] does not use this attribute unless you
manually enable per-file encodings in its options.

If this attribute is not set or has an invalid value, the value of the
`gui.encoding` configuration variable is used instead
(see linkgit:git-config[1]).

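The fallback rule can be sketched in Python as follows.  Python's
`codecs.lookup()` stands in for whatever encoding lookup a GUI tool
actually performs, the function name is hypothetical, and the
`gui_encoding` argument models the `gui.encoding` configuration
variable:

------------------------
import codecs


def decode_for_display(data: bytes, encoding_attr=None, gui_encoding="utf-8"):
    """Decode file contents using the `encoding` attribute if it names a
    known encoding, and gui.encoding otherwise."""
    encoding = gui_encoding
    if encoding_attr:
        try:
            encoding = codecs.lookup(encoding_attr).name
        except LookupError:
            pass            # invalid value: keep the gui.encoding fallback
    return data.decode(encoding, errors="replace")
------------------------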

USING MACRO ATTRIBUTES
----------------------

You do not want any end-of-line conversions applied to, nor textual diffs
produced for, any binary file you track.  You would need to specify e.g.

------------
*.jpg -text -diff
------------

but that may become cumbersome when you have many attributes.  Using
macro attributes, you can define an attribute that, when set, also
sets or unsets a number of other attributes at the same time.  The
system knows a built-in macro attribute, `binary`:

------------
*.jpg binary
------------

Setting the "binary" attribute also unsets the "text" and "diff"
attributes as above.  Note that macro attributes can only be "Set",
though setting one might have the effect of setting or unsetting other
attributes or even returning other attributes to the "Unspecified"
state.


DEFINING MACRO ATTRIBUTES
-------------------------

Custom macro attributes can be defined only in top-level gitattributes
files (`$GIT_DIR/info/attributes`, the `.gitattributes` file at the
top level of the working tree, or the global or system-wide
gitattributes files), not in `.gitattributes` files in working tree
subdirectories.  The built-in macro attribute "binary" is equivalent
to:

------------
[attr]binary -diff -merge -text
------------

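A macro definition names a list of attributes to apply along with the
macro itself.  The following Python sketch parses an `[attr]` line and
expands a Set macro in an attribute list; it is a simplified model that
does not guard against a macro that refers to itself, and the function
names are hypothetical:

------------------------
def parse_macro(line):
    """Parse "[attr]name attr1 attr2 ..." into (name, [attr1, attr2, ...])."""
    head, *tokens = line.split()
    if not head.startswith("[attr]"):
        raise ValueError("not a macro definition: " + line)
    return head[len("[attr]"):], tokens


def expand(tokens, macros):
    """Expand Set macro attributes in an attribute list, recursively."""
    out = []
    for token in tokens:
        out.append(token)
        if token in macros:          # only a Set macro ("binary") expands
            out.extend(expand(macros[token], macros))
    return out
------------------------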

EXAMPLE
-------

If you have these three `gitattributes` files:

----------------------------------------------------------------
(in $GIT_DIR/info/attributes)

a*      foo !bar -baz

(in .gitattributes)
abc     foo bar baz

(in t/.gitattributes)
ab*     merge=filfre
abc     -foo -bar
*.c     frotz
----------------------------------------------------------------

the attributes given to path `t/abc` are computed as follows:

1. By examining `t/.gitattributes` (which is in the same
   directory as the path in question), Git finds that the first
   line matches.  The `merge` attribute is set.  It also finds that
   the second line matches, and attributes `foo` and `bar`
   are unset.

2. Then it examines `.gitattributes` (which is in the parent
   directory), and finds that the first line matches, but
   `t/.gitattributes` file already decided how `merge`, `foo`
   and `bar` attributes should be given to this path, so it
   leaves `foo` and `bar` unset.  Attribute `baz` is set.

3. Finally it examines `$GIT_DIR/info/attributes`.  This file
   is used to override the in-tree settings.  The first line is
   a match, and `foo` is set, `bar` is reverted to unspecified
   state, and `baz` is unset.

As a result, the attribute assignment to `t/abc` becomes:

----------------------------------------------------------------
foo     set to true
bar     unspecified
baz     set to false
merge   set to string value "filfre"
frotz   unspecified
----------------------------------------------------------------

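The same computation can be written as a short Python sketch.  It
assumes the precedence order described above (`$GIT_DIR/info/attributes`
first, then the `.gitattributes` files from the directory of the path
upwards), lets the first decision for each attribute win, scans each
file from its last line to its first so that later lines override
earlier ones, and only handles patterns without a slash, which it
matches against the basename.  `!attr` is represented as `None`
(Unspecified); the function names are hypothetical:

------------------------
from fnmatch import fnmatchcase
import posixpath


def parse_state(token):
    """Turn "foo", "-foo", "!foo" or "foo=val" into (name, state)."""
    if token.startswith("-"):
        return token[1:], False          # Unset
    if token.startswith("!"):
        return token[1:], None           # back to Unspecified
    name, sep, value = token.partition("=")
    return (name, value) if sep else (name, True)


def resolve(path, files):
    """files: (directory, lines) pairs, highest precedence first.
    A directory of "" means the file applies to the whole tree."""
    decided = {}
    basename = posixpath.basename(path)
    for directory, lines in files:
        if directory and not path.startswith(directory + "/"):
            continue                     # the file does not cover this path
        for line in reversed(lines):     # later lines override earlier ones
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            pattern, *tokens = line.split()
            if not fnmatchcase(basename, pattern):
                continue
            for token in reversed(tokens):
                name, state = parse_state(token)
                decided.setdefault(name, state)   # first decision wins
    return decided


# The three files from the example, highest precedence first.
FILES = [
    ("", ["a*      foo !bar -baz"]),                        # info/attributes
    ("t", ["ab*     merge=filfre", "abc     -foo -bar", "*.c     frotz"]),
    ("", ["abc     foo bar baz"]),                          # .gitattributes
]
------------------------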

SEE ALSO
--------
linkgit:git-check-attr[1].

GIT
---
Part of the linkgit:git[1] suite