gitattributes(5)
================

NAME
----
gitattributes - defining attributes per path

SYNOPSIS
--------
$GIT_DIR/info/attributes, .gitattributes


DESCRIPTION
-----------

A `gitattributes` file is a simple text file that gives
`attributes` to pathnames.

Each line in a `gitattributes` file is of the form:

        pattern attr1 attr2 ...

That is, a pattern followed by an attributes list,
separated by whitespace. Leading and trailing whitespace is
ignored. Lines that begin with '#' are ignored. Patterns
that begin with a double quote are quoted in C style.
When the pattern matches the path in question, the attributes
listed on the line are given to the path.

Each attribute can be in one of these states for a given path:

Set::

        The path has the attribute with special value "true";
        this is specified by listing only the name of the
        attribute in the attribute list.

Unset::

        The path has the attribute with special value "false";
        this is specified by listing the name of the attribute
        prefixed with a dash `-` in the attribute list.

Set to a value::

        The path has the attribute with specified string value;
        this is specified by listing the name of the attribute
        followed by an equal sign `=` and its value in the
        attribute list.

Unspecified::

        When no pattern matches the path and nothing says whether
        the path has or does not have the attribute, the
        attribute for the path is said to be Unspecified.

When more than one pattern matches the path, a later line
overrides an earlier line.  This overriding is done per
attribute.  The rules by which the pattern matches paths are the
same as in `.gitignore` files; see linkgit:gitignore[5].
Unlike `.gitignore`, negative patterns are forbidden.

When deciding what attributes are assigned to a path, Git
consults the `$GIT_DIR/info/attributes` file (which has the highest
precedence), the `.gitattributes` file in the same directory as the
path in question, and its parent directories up to the toplevel of the
work tree (the further the directory that contains `.gitattributes`
is from the path in question, the lower its precedence). Finally,
global and system-wide files are considered (they have the lowest
precedence).

When the `.gitattributes` file is missing from the work tree, the
path in the index is used as a fall-back.  During the checkout process,
`.gitattributes` in the index is used and then the file in the
working tree is used as a fall-back.

If you wish to affect only a single repository (i.e., to assign
attributes to files that are particular to
one user's workflow for that repository), then
attributes should be placed in the `$GIT_DIR/info/attributes` file.
Attributes which should be version-controlled and distributed to other
repositories (i.e., attributes of interest to all users) should go into
`.gitattributes` files. Attributes that should affect all repositories
for a single user should be placed in a file specified by the
`core.attributesFile` configuration option (see linkgit:git-config[1]).
Its default value is $XDG_CONFIG_HOME/git/attributes. If $XDG_CONFIG_HOME
is either not set or empty, $HOME/.config/git/attributes is used instead.
Attributes for all users on a system should be placed in the
`$(prefix)/etc/gitattributes` file.

Sometimes you would need to override a setting of an attribute
for a path to `Unspecified` state.  This can be done by listing
the name of the attribute prefixed with an exclamation point `!`.
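
As a rough illustration of the four states and the per-attribute override
rule, the following Python sketch parses attribute lists and combines them.
The names `parse_attr_token` and `resolve` are hypothetical and not part of
Git. The sketch assumes the attribute lists of all matching lines have
already been collected and ordered from lowest to highest precedence, and it
does no pattern matching at all.

```python
def parse_attr_token(token):
    """Map one entry of an attribute list to (name, state).

    State is True (Set), False (Unset), a string (Set to a value),
    or None (forced back to Unspecified with '!').
    """
    if token.startswith("-"):
        return token[1:], False
    if token.startswith("!"):
        return token[1:], None
    name, sep, value = token.partition("=")
    if sep:
        return name, value
    return token, True


def resolve(matching_lines):
    """Combine the attribute lists of matching lines, lowest precedence first."""
    states = {}
    for attrs in matching_lines:
        for token in attrs.split():
            name, state = parse_attr_token(token)
            states[name] = state  # a later entry overrides, per attribute
    return states


print(resolve(["text=auto diff", "-diff eol=lf", "!eol"]))
```

An attribute that never appears in any matching line is simply absent from
the result, which corresponds to the Unspecified state as well.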


EFFECTS
-------

Certain operations by Git can be influenced by assigning
particular attributes to a path.  Currently, the following
operations are attributes-aware.

Checking-out and checking-in
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These attributes affect how the contents stored in the
repository are copied to the working tree files when commands
such as 'git checkout' and 'git merge' run.  They also affect how
Git stores the contents you prepare in the working tree in the
repository upon 'git add' and 'git commit'.

`text`
^^^^^^

This attribute enables and controls end-of-line normalization.  When a
text file is normalized, its line endings are converted to LF in the
repository.  To control what line ending style is used in the working
directory, use the `eol` attribute for a single file and the
`core.eol` configuration variable for all text files.
Note that `core.autocrlf` overrides `core.eol`.

Set::

        Setting the `text` attribute on a path enables end-of-line
        normalization and marks the path as a text file.  End-of-line
        conversion takes place without guessing the content type.

Unset::

        Unsetting the `text` attribute on a path tells Git not to
        attempt any end-of-line conversion upon checkin or checkout.

Set to string value "auto"::

        When `text` is set to "auto", the path is marked for automatic
        end-of-line conversion.  If Git decides that the content is
        text, its line endings are converted to LF on checkin.
        When the file has been committed with CRLF, no conversion is done.

Unspecified::

        If the `text` attribute is unspecified, Git uses the
        `core.autocrlf` configuration variable to determine if the
        file should be converted.

Any other value causes Git to act as if `text` has been left
unspecified.

`eol`
^^^^^

This attribute sets a specific line-ending style to be used in the
working directory.  It enables end-of-line conversion without any
content checks, effectively setting the `text` attribute.  Note that
setting this attribute on paths which are in the index with CRLF line
endings may cause the paths to be considered dirty.  Adding the path to
the index again will normalize the line endings in the index.

Set to string value "crlf"::

        This setting forces Git to normalize line endings for this
        file on checkin and convert them to CRLF when the file is
        checked out.

Set to string value "lf"::

        This setting forces Git to normalize line endings to LF on
        checkin and prevents conversion to CRLF when the file is
        checked out.
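
The effect of the two `eol` values can be pictured with a deliberately
simplified Python model. It assumes the blob in the repository is already
normalized to LF and leaves out every safety check Git performs; the
function names are hypothetical.

```python
def to_repository(worktree_bytes):
    """Check-in: normalize CRLF line endings to LF."""
    return worktree_bytes.replace(b"\r\n", b"\n")


def to_worktree(blob_bytes, eol):
    """Check-out: apply the `eol` style to an LF-normalized blob."""
    if eol == "crlf":
        return blob_bytes.replace(b"\n", b"\r\n")
    return blob_bytes  # eol=lf: LF is kept, nothing is converted to CRLF


blob = to_repository(b"first\r\nsecond\r\n")
print(blob)
print(to_worktree(blob, "crlf"))
print(to_worktree(blob, "lf"))
```

In this model, checking a file in and out again with `eol=crlf` reproduces
a CRLF working tree file, while the repository copy always holds LF.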

Backwards compatibility with `crlf` attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For backwards compatibility, the `crlf` attribute is interpreted as
follows:

------------------------
crlf            text
-crlf           -text
crlf=input      eol=lf
------------------------

End-of-line conversion
^^^^^^^^^^^^^^^^^^^^^^

While Git normally leaves file contents alone, it can be configured to
normalize line endings to LF in the repository and, optionally, to
convert them to CRLF when files are checked out.

If you simply want to have CRLF line endings in your working directory
regardless of the repository you are working with, you can set the
config variable "core.autocrlf" without using any attributes.

------------------------
[core]
        autocrlf = true
------------------------

This does not force normalization of text files, but does ensure
that text files that you introduce to the repository have their line
endings normalized to LF when they are added, and that files that are
already normalized in the repository stay normalized.

If you want to ensure that text files that any contributor introduces to
the repository have their line endings normalized, you can set the
`text` attribute to "auto" for _all_ files.

------------------------
*       text=auto
------------------------

The attributes allow fine-grained control over how line endings
are converted.
Here is an example that will make Git normalize .txt, .vcproj and .sh
files, ensure that .vcproj files have CRLF and .sh files have LF in
the working directory, and prevent .jpg files from being normalized
regardless of their content.

------------------------
*               text=auto
*.txt           text
*.vcproj        text eol=crlf
*.sh            text eol=lf
*.jpg           -text
------------------------

NOTE: When `text=auto` conversion is enabled in a cross-platform
project using push and pull to a central repository, the text files
containing CRLFs should be normalized.

From a clean working directory:

-------------------------------------------------
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"
-------------------------------------------------

If any files that should not be normalized show up in 'git status',
unset their `text` attribute before running 'git add -u'.

------------------------
manual.pdf      -text
------------------------

Conversely, text files that Git does not detect can have normalization
enabled manually.

------------------------
weirdchars.txt  text
------------------------

If `core.safecrlf` is set to "true" or "warn", Git verifies if
the conversion is reversible for the current setting of
`core.autocrlf`.  For "true", Git rejects irreversible
conversions; for "warn", Git only prints a warning but accepts
an irreversible conversion.  The safety triggers to prevent such
a conversion done to the files in the work tree, but there are a
few exceptions.  Even though...

- 'git add' itself does not touch the files in the work tree, the
  next checkout would, so the safety triggers;

- 'git apply' to update a text file with a patch does touch the files
  in the work tree, but the operation is about text files and CRLF
  conversion is about fixing the line ending inconsistencies, so the
  safety does not trigger;

- 'git diff' itself does not touch the files in the work tree, it is
  often run to inspect the changes you intend to next 'git add'.  To
  catch potential problems early, safety triggers.


`working-tree-encoding`
^^^^^^^^^^^^^^^^^^^^^^^

Git recognizes files encoded in ASCII or one of its supersets (e.g.
UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
encodings (e.g. UTF-16) are interpreted as binary and consequently
built-in Git text processing tools (e.g. 'git diff') as well as most Git
web front ends do not visualize the contents of these files by default.

In these cases you can tell Git the encoding of a file in the working
directory with the `working-tree-encoding` attribute. If a file with this
attribute is added to Git, then Git reencodes the content from the
specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
content in its internal data structure (called "the index"). On checkout
the content is reencoded back to the specified encoding.
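
The two conversions can be sketched in Python as follows. This is only an
illustration that uses Python's own codecs and is not how Git implements
the conversion; the function names and the sample content are made up,
and the codec name `utf-16-le` is chosen so that no byte order mark is
involved.

```python
def to_index(worktree_bytes, encoding):
    """Check-in: reencode from the working-tree encoding to UTF-8."""
    return worktree_bytes.decode(encoding).encode("utf-8")


def to_worktree(index_bytes, encoding):
    """Check-out: reencode the stored UTF-8 back to the working-tree encoding."""
    return index_bytes.decode("utf-8").encode(encoding)


original = "Write-Host 'héllo'\r\n".encode("utf-16-le")
stored = to_index(original, "utf-16-le")

print(stored)
# The conversion is only safe if the round trip reproduces the original bytes.
print(to_worktree(stored, "utf-16-le") == original)
```

Comparing the result of the round trip with the original bytes, as the last
line does, is a simple way to spot an encoding for which the conversion
loses information.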

Please note that using the `working-tree-encoding` attribute may have a
number of pitfalls:

- Alternative Git implementations (e.g. JGit or libgit2) and older Git
  versions (as of March 2018) do not support the `working-tree-encoding`
  attribute. If you decide to use the `working-tree-encoding` attribute
  in your repository, then it is strongly recommended to ensure that all
  clients working with the repository support it.

  For example, Microsoft Visual Studio resource files (`*.rc`) or
  PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
  If you declare `*.ps1` files as UTF-16 and you add `foo.ps1` with
  a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
  stored as UTF-8 internally. A client without `working-tree-encoding`
  support will checkout `foo.ps1` as a UTF-8 encoded file. This will
  typically cause trouble for the users of this file.

  If a Git client that does not support the `working-tree-encoding`
  attribute adds a new file `bar.ps1`, then `bar.ps1` will be
  stored "as-is" internally (in this example probably as UTF-16).
  A client with `working-tree-encoding` support will interpret the
  internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
  That operation will fail and cause an error.

- Reencoding content to non-UTF encodings can cause errors as the
  conversion might not be UTF-8 round trip safe. If you suspect your
  encoding to not be round trip safe, then add it to
  `core.checkRoundtripEncoding` to make Git check the round trip
  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
  set) is known to have round trip issues with UTF-8 and is checked by
  default.

- Reencoding content requires resources that might slow down certain
  Git operations (e.g. 'git checkout' or 'git add').

Use the `working-tree-encoding` attribute only if you cannot store a file
in UTF-8 encoding and if you want Git to be able to process the content
as text.

As an example, use the following attributes if your '*.ps1' files are
UTF-16 encoded with byte order mark (BOM) and you want Git to perform
automatic line ending conversion based on your platform.

------------------------
*.ps1           text working-tree-encoding=UTF-16
------------------------

Use the following attributes if your '*.ps1' files are UTF-16 little
endian encoded without BOM and you want Git to use Windows line endings
in the working directory. Please note, it is highly recommended to
explicitly define the line endings with `eol` if the `working-tree-encoding`
attribute is used to avoid ambiguity.

------------------------
*.ps1           text working-tree-encoding=UTF-16LE eol=crlf
------------------------

You can get a list of all available encodings on your platform with the
following command:

------------------------
iconv --list
------------------------

If you do not know the encoding of a file, then you can use the `file`
command to guess the encoding:

------------------------
file foo.ps1
------------------------


`ident`
^^^^^^^

When the attribute `ident` is set for a path, Git replaces
`$Id$` in the blob object with `$Id:`, followed by the
40-character hexadecimal blob object name, followed by a dollar
sign `$` upon checkout.  Any byte sequence that begins with
`$Id:` and ends with `$` in the worktree file is replaced
with `$Id$` upon check-in.
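
A minimal Python sketch of this substitution is shown here. The function
names are hypothetical, the blob object name is passed in rather than
computed, and the single spaces placed around the name as well as the
restriction of a match to one line are assumptions of this sketch.

```python
import re


def expand_ident(blob_bytes, blob_name):
    """Check-out: replace `$Id$` with `$Id: <blob name> $`."""
    expanded = b"$Id: " + blob_name.encode("ascii") + b" $"
    return blob_bytes.replace(b"$Id$", expanded)


def collapse_ident(worktree_bytes):
    """Check-in: replace anything from `$Id:` to the next `$` with `$Id$`."""
    return re.sub(rb"\$Id:[^$\n]*\$", b"$Id$", worktree_bytes)


name = "ab" * 20  # stand-in for a 40-character hexadecimal blob object name
checked_out = expand_ident(b"/* $Id$ */\n", name)
print(checked_out)
print(collapse_ident(checked_out))
```

Collapsing the expanded form gives back the original `$Id$`, so the content
stored in the repository does not depend on the blob name.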


`filter`
^^^^^^^^

A `filter` attribute can be set to a string value that names a
filter driver specified in the configuration.

A filter driver consists of a `clean` command and a `smudge`
command, either of which can be left unspecified.  Upon
checkout, when the `smudge` command is specified, the command is
fed the blob object from its standard input, and its standard
output is used to update the worktree file.  Similarly, the
`clean` command is used to convert the contents of worktree file
upon checkin. By default these commands process only a single
blob and terminate. If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example `git add --all`. If a
long running `process` filter is configured then it always takes
precedence over a configured single blob filter. See the section
below for the description of the protocol used to communicate with
a `process` filter.

One use of the content filtering is to massage the content into a shape
that is more convenient for the platform, filesystem, and the user to use.
For this mode of operation, the key phrase here is "more convenient" and
not "turning something unusable into usable".  In other words, the intent
is that if someone unsets the filter driver definition, or does not have
the appropriate filter program, the project should still be usable.

Another use of the content filtering is to store the content that cannot
be directly used in the repository (e.g. a UUID that refers to the true
content stored outside Git, or an encrypted content) and turn it into a
usable form upon checkout (e.g. download the external content, or decrypt
the encrypted content).

These two filters behave differently, and by default, a filter is taken as
the former, massaging the contents into more convenient shape.  A missing
filter driver definition in the config, or a filter driver that exits with
a non-zero status, is not an error but makes the filter a no-op passthru.

You can declare that a filter turns a content that by itself is unusable
into a usable content by setting the filter.<driver>.required configuration
variable to `true`.

Note: Whenever the clean filter is changed, the repo should be renormalized:
$ git add --renormalize .

For example, in .gitattributes, you would assign the `filter`
attribute for paths.

------------------------
*.c     filter=indent
------------------------

Then you would define a "filter.indent.clean" and "filter.indent.smudge"
configuration in your .git/config to specify a pair of commands to
modify the contents of C programs when the source files are checked
in ("clean" is run) and checked out (no change is made because the
command is "cat").

------------------------
[filter "indent"]
        clean = indent
        smudge = cat
------------------------

For best results, `clean` should not alter its output further if it is
run twice ("clean->clean" should be equivalent to "clean"), and
multiple `smudge` commands should not alter `clean`'s output
("smudge->smudge->clean" should be equivalent to "clean").  See the
section on merging below.

The "indent" filter is well-behaved in this regard: it will not modify
input that is already correctly indented.  In this case, the lack of a
smudge filter means that the clean filter _must_ accept its own output
without modifying it.
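
The "clean->clean" property is easy to check for a concrete filter. The
Python sketch below uses a made-up clean step that strips trailing spaces
and tabs (it is not the `indent` program), wires it up the way a single
blob filter works, reading the content from one stream and writing the
result to another, and then verifies that a second run changes nothing.

```python
import io


def clean(data):
    """Strip trailing spaces and tabs from every line (a made-up clean step)."""
    return b"\n".join(line.rstrip(b" \t") for line in data.split(b"\n"))


def run_filter(stdin, stdout):
    """Act as a single-blob filter: read all content, write the result."""
    stdout.write(clean(stdin.read()))


source = io.BytesIO(b"int x;  \n\treturn 0;\t\n")
result = io.BytesIO()
run_filter(source, result)

once = result.getvalue()
print(once)
print(clean(once) == once)  # running clean again must not change anything
```

A real filter command would be given `sys.stdin.buffer` and
`sys.stdout.buffer` instead of the in-memory streams used here.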

If a filter _must_ succeed in order to make the stored contents usable,
you can declare that the filter is `required`, in the configuration:

------------------------
[filter "crypt"]
        clean = openssl enc ...
        smudge = openssl enc -d ...
        required
------------------------

Sequence "%f" on the filter command line is replaced with the name of
the file the filter is working on.  A filter might use this in keyword
substitution.  For example:

------------------------
[filter "p4"]
        clean = git-p4-filter --clean %f
        smudge = git-p4-filter --smudge %f
------------------------

Note that "%f" is the name of the path that is being worked on. Depending
on the version that is being filtered, the corresponding file on disk may
not exist, or may have different contents. So, smudge and clean commands
should not try to access the file on disk, but only act as filters on the
content provided to them on standard input.

Long Running Filter Process
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the filter command (a string value) is defined via
`filter.<driver>.process` then Git can process all blobs with a
single filter invocation for the entire life of a single Git
command. This is achieved by using a packet format (pkt-line,
see technical/protocol-common.txt) based protocol over standard
input and standard output as follows. All packets, except for the
"*CONTENT" packets and the "0000" flush packet, are considered
text and therefore are terminated by a LF.

Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter has started,
Git sends a welcome message ("git-filter-client"), a list of supported
protocol version numbers, and a flush packet. Git expects to read a welcome
response message ("git-filter-server"), exactly one protocol version number
from the previously sent list, and a flush packet. All further
communication will be based on the selected version. The remaining
protocol description below documents "version=2". Please note that
"version=42" in the example below does not exist and is only there
to illustrate what the protocol would look like with more than one
version.

After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet. Git expects to read a list of desired
capabilities, which must be a subset of the supported capabilities list,
and a flush packet as response:
------------------------
packet:          git> git-filter-client
packet:          git> version=2
packet:          git> version=42
packet:          git> 0000
packet:          git< git-filter-server
packet:          git< version=2
packet:          git< 0000
packet:          git> capability=clean
packet:          git> capability=smudge
packet:          git> capability=not-yet-invented
packet:          git> 0000
packet:          git< capability=clean
packet:          git< capability=smudge
packet:          git< 0000
------------------------
Supported filter capabilities in version 2 are "clean", "smudge",
and "delay".
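
On the wire, each text packet in the trace above is a pkt-line: four
hexadecimal digits giving the total length (including those four digits),
followed by the text and a LF. The Python sketch below builds the two
responses of the filter side of the handshake. The function names are
hypothetical, and the sketch assumes a filter that implements only "clean"
and "smudge".

```python
FLUSH = b"0000"  # flush packet: length zero, no payload


def pkt_line(text):
    """Encode a text packet: 4 hex digits of total length, the text, a LF."""
    payload = text.encode("utf-8") + b"\n"
    return b"%04x" % (len(payload) + 4) + payload


def welcome_response(offered_versions):
    """Filter side: answer the welcome message by selecting version 2."""
    if 2 not in offered_versions:
        raise ValueError("Git did not offer protocol version 2")
    return pkt_line("git-filter-server") + pkt_line("version=2") + FLUSH


def capability_response(offered, implemented=("clean", "smudge")):
    """Filter side: request only capabilities that Git offered."""
    chosen = [cap for cap in offered if cap in implemented]
    return b"".join(pkt_line("capability=" + cap) for cap in chosen) + FLUSH


print(welcome_response([2, 42]))
print(capability_response(["clean", "smudge", "not-yet-invented"]))
```

Filtering the offered list guarantees that the response is a subset of what
Git supports, as the protocol requires.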

Afterwards Git sends a list of "key=value" pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
to filter relative to the repository root. Right after the flush packet
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content. Please note that the filter
must not send any response before it has received the content and the
final flush packet. Also note that the "value" of a "key=value" pair
can contain the "=" character whereas the key would never contain
that character.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> 0000
packet:          git> CONTENT
packet:          git> 0000
------------------------
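
Reading such a request could look like the following Python sketch. The
helper names are hypothetical, there is no handling of a closed pipe or of
malformed packets, and the sample bytes are simply the request above
encoded as pkt-lines.

```python
import io


def read_packet(stream):
    """Return the payload of one packet, or None for the "0000" flush packet."""
    length = int(stream.read(4), 16)
    if length == 0:
        return None
    return stream.read(length - 4)


def read_until_flush(stream):
    """Collect packet payloads up to, but not including, the next flush packet."""
    packets = []
    while True:
        payload = read_packet(stream)
        if payload is None:
            return packets
        packets.append(payload)


def read_request(stream):
    """Read the "key=value" list and then the content of one request."""
    pairs = {}
    for payload in read_until_flush(stream):
        # Split at the first "=" only, because the value may contain "=".
        key, _, value = payload.decode("utf-8").rstrip("\n").partition("=")
        pairs[key] = value
    content = b"".join(read_until_flush(stream))
    return pairs, content


wire = io.BytesIO(
    b"0013command=smudge\n"
    b"001fpathname=path/testfile.dat\n"
    b"0000"
    b"000bCONTENT"
    b"0000"
)
print(read_request(wire))
```

The function returns only after the final flush packet has been read, which
matches the rule that the filter must not respond any earlier.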

The filter is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the filter does not experience
problems then the list must contain a "success" status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of "key=value" pairs terminated with a flush packet
is expected. The filter can change the status in the second list
or keep the status as is with an empty list. Please note that the
empty list must be terminated with a flush packet regardless.

------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------
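
A successful response can be produced along these lines. This is a Python
sketch with hypothetical helper names; the constant 65516 reflects the
pkt-line limit of 65520 bytes per packet minus the four length digits, which
is why larger content has to be split over several packets.

```python
import io

MAX_PAYLOAD = 65516  # 65520-byte packet limit minus the 4-byte length header


def write_packet(stream, payload):
    """Write one packet: 4 hex digits of total length, then the payload."""
    stream.write(b"%04x" % (len(payload) + 4) + payload)


def write_flush(stream):
    """Write the "0000" flush packet."""
    stream.write(b"0000")


def write_success(stream, content):
    """Send the status list, the content, and an empty second list."""
    write_packet(stream, b"status=success\n")
    write_flush(stream)
    for start in range(0, len(content), MAX_PAYLOAD):
        write_packet(stream, content[start:start + MAX_PAYLOAD])
    write_flush(stream)  # end of content
    write_flush(stream)  # empty second list: keep "status=success"


out = io.BytesIO()
write_success(out, b"SMUDGED_CONTENT")
print(out.getvalue())
```

For empty content the loop sends no content packets at all, so the response
consists of the status list followed by two flush packets.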

If the result content is empty then the filter is expected to respond
with a "success" status and a flush packet to signal the empty content.
------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< 0000  # empty content!
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------

In case the filter cannot or does not want to process the content,
it is expected to respond with an "error" status.
------------------------
packet:          git< status=error
packet:          git< 0000
------------------------

If the filter experiences an error during processing, then it can
send the status "error" after the content was (partially or
completely) sent.
------------------------
packet:          git< status=success
packet:          git< 0000
packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
packet:          git< 0000
packet:          git< status=error
packet:          git< 0000
------------------------

In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
then it is expected to respond with an "abort" status at any point
in the protocol.
------------------------
packet:          git< status=abort
packet:          git< 0000
------------------------

Git neither stops nor restarts the filter process in case the
"error"/"abort" status is set. However, Git sets its exit code
according to the `filter.<driver>.required` flag, mimicking the
behavior of the `filter.<driver>.clean` / `filter.<driver>.smudge`
mechanism.

If the filter dies during the communication or does not adhere to
the protocol then Git will stop the filter process and restart it
with the next file that needs to be processed. Depending on the
`filter.<driver>.required` flag Git will interpret that as an error.

After the filter has processed a command it is expected to wait for
a "key=value" list containing the next command. Git will close
the command pipe on exit. The filter is expected to detect EOF
and exit gracefully on its own. Git will wait until the filter
process has stopped.

Delay
^^^^^

If the filter supports the "delay" capability, then Git can send the
flag "can-delay" after the filter command and pathname. This flag
denotes that the filter can delay filtering the current blob (e.g. to
compensate for network latencies) by responding with no content but with
the status "delayed" and a flush packet.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> can-delay=1
packet:          git> 0000
packet:          git> CONTENT
packet:          git> 0000
packet:          git< status=delayed
packet:          git< 0000
------------------------

If the filter supports the "delay" capability then it must support the
"list_available_blobs" command. If Git sends this command, then the
filter is expected to return a list of pathnames representing blobs
that have been delayed earlier and are now available.
The list must be terminated with a flush packet followed
by a "success" status that is also terminated with a flush packet. If
no blobs for the delayed paths are available yet, then the filter is
expected to block the response until at least one blob becomes
available. The filter can tell Git that it has no more delayed blobs
by sending an empty list. As soon as the filter responds with an empty
list, Git stops asking. All blobs that Git has not received at this
point are considered missing and will result in an error.

------------------------
packet:          git> command=list_available_blobs
packet:          git> 0000
packet:          git< pathname=path/testfile.dat
packet:          git< pathname=path/otherfile.dat
packet:          git< 0000
packet:          git< status=success
packet:          git< 0000
------------------------
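
The response to "list_available_blobs" can be assembled as in this Python
sketch. The function names are hypothetical, and how the filter keeps track
of which delayed blobs have become available is left out.

```python
def pkt_line(text):
    """Encode a text packet: 4 hex digits of total length, the text, a LF."""
    payload = text.encode("utf-8") + b"\n"
    return b"%04x" % (len(payload) + 4) + payload


def available_blobs_response(ready_paths):
    """List the now-available pathnames, then report a "success" status."""
    out = b"".join(pkt_line("pathname=" + path) for path in ready_paths)
    out += b"0000"  # end of the pathname list
    out += pkt_line("status=success") + b"0000"
    return out


print(available_blobs_response(["path/testfile.dat"]))
print(available_blobs_response([]))  # empty list: no more delayed blobs
```

Because an empty list tells Git to stop asking, a filter should send it
only once every delayed blob has been reported.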

After Git has received the pathnames, it will request the corresponding
blobs again. These requests contain a pathname and an empty content
section. The filter is expected to respond with the smudged content
in the usual way as explained above.
------------------------
packet:          git> command=smudge
packet:          git> pathname=path/testfile.dat
packet:          git> 0000
packet:          git> 0000  # empty content!
packet:          git< status=success
packet:          git< 0000
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< 0000  # empty list, keep "status=success" unchanged!
------------------------

Example
^^^^^^^

A long running filter demo implementation can be found in
`contrib/long-running-filter/example.pl` located in the Git
core repository. If you develop your own long running filter
process then the `GIT_TRACE_PACKET` environment variable can be
very helpful for debugging (see linkgit:git[1]).

Please note that you cannot use an existing `filter.<driver>.clean`
or `filter.<driver>.smudge` command with `filter.<driver>.process`
because the former two use a different inter-process communication
protocol than the latter one.


Interaction between checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the check-in codepath, the worktree file is first converted
with `filter` driver (if specified and corresponding driver
defined), then the result is processed with `ident` (if
specified), and then finally with `text` (again, if specified
and applicable).

In the check-out codepath, the blob content is first converted
with `text`, and then `ident` and fed to `filter`.
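
The ordering of the two codepaths can be summarized in a short Python
sketch. The `steps` mapping, which associates an attribute name with a pair
of conversion functions, is a hypothetical structure used only for this
illustration; the traced functions leave the data untouched and merely
record the order in which they are called.

```python
def check_in(data, steps):
    """Worktree to repository: `filter`, then `ident`, then `text`."""
    for name in ("filter", "ident", "text"):
        if name in steps:  # skip attributes that are not specified
            data = steps[name][0](data)
    return data


def check_out(data, steps):
    """Repository to worktree: `text`, then `ident`, then `filter`."""
    for name in ("text", "ident", "filter"):
        if name in steps:
            data = steps[name][1](data)
    return data


order = []


def traced(label):
    """Build a conversion that records its label and returns data unchanged."""
    def step(data):
        order.append(label)
        return data
    return step


steps = {name: (traced(name + ":in"), traced(name + ":out"))
         for name in ("text", "ident", "filter")}
check_in(b"data", steps)
check_out(b"data", steps)
print(order)
```

The check-out order is the exact reverse of the check-in order, so each
conversion undoes its counterpart in the opposite sequence.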


Merging branches with differing checkin/checkout attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have added attributes to a file that cause the canonical
repository format for that file to change, such as adding a
clean/smudge filter or text/eol/ident attributes, merging anything
where the attribute is not in place would normally cause merge
conflicts.

To prevent these unnecessary merge conflicts, Git can be told to run a
virtual check-out and check-in of all three stages of a file when
resolving a three-way merge by setting the `merge.renormalize`
configuration variable.  This prevents changes caused by check-in
conversion from causing spurious merge conflicts when a converted file
is merged with an unconverted file.

As long as a "smudge->clean" results in the same output as a "clean"
even on files that are already smudged, this strategy will
automatically resolve all filter-related conflicts.  Filters that do
not act in this way may cause additional merge conflicts that must be
resolved manually.
 721
 722
 723Generating diff text
 724~~~~~~~~~~~~~~~~~~~~
 725
 726`diff`
 727^^^^^^
 728
 729The attribute `diff` affects how Git generates diffs for particular
 730files. It can tell Git whether to generate a textual patch for the path
 731or to treat the path as a binary file.  It can also affect what line is
 732shown on the hunk header `@@ -k,l +n,m @@` line, tell Git to use an
 733external command to generate the diff, or ask Git to convert binary
 734files to a text format before generating the diff.
 735
 736Set::
 737
        A path to which the `diff` attribute is set is treated
        as text, even when it contains byte values that
        normally never appear in text files, such as NUL.
 741
 742Unset::
 743
 744        A path to which the `diff` attribute is unset will
 745        generate `Binary files differ` (or a binary patch, if
 746        binary patches are enabled).
 747
 748Unspecified::
 749
        A path to which the `diff` attribute is unspecified
        first gets its contents inspected, and if it looks like
        text and is smaller than `core.bigFileThreshold`, it is treated
        as text. Otherwise it generates `Binary files differ`.
 754
 755String::
 756
 757        Diff is shown using the specified diff driver.  Each driver may
 758        specify one or more options, as described in the following
 759        section. The options for the diff driver "foo" are defined
 760        by the configuration variables in the "diff.foo" section of the
 761        Git config file.
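
The states above could be requested in a `.gitattributes` file as
sketched here.  The patterns are illustrative only, and `tex` is one
of the built-in drivers described later in this page:

------------------------
# Set: always treat as text
*.txt   diff
# Unset: always treat as binary
*.dat   -diff
# String: use the diff driver named "tex"
*.tex   diff=tex
------------------------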
 762
 763
 764Defining an external diff driver
 765^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 766
 767The definition of a diff driver is done in `gitconfig`, not
 768`gitattributes` file, so strictly speaking this manual page is a
 769wrong place to talk about it.  However...
 770
 771To define an external diff driver `jcdiff`, add a section to your
 772`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:
 773
 774----------------------------------------------------------------
 775[diff "jcdiff"]
 776        command = j-c-diff
 777----------------------------------------------------------------
 778
When Git needs to show you a diff for a path with the `diff`
attribute set to `jcdiff`, it calls the command you specified
with the above configuration, i.e. `j-c-diff`, with 7
parameters, just as the `GIT_EXTERNAL_DIFF` program is called.
See linkgit:git[1] for details.
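
The driver only takes effect for paths that name it through the
`diff` attribute.  A minimal sketch, assuming a hypothetical `*.jc`
file extension:

------------------------
*.jc    diff=jcdiff
------------------------

As documented in linkgit:git[1], the seven parameters are, in order:
`path old-file old-hex old-mode new-file new-hex new-mode`.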
 784
 785
 786Defining a custom hunk-header
 787^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 788
 789Each group of changes (called a "hunk") in the textual diff output
 790is prefixed with a line of the form:
 791
 792        @@ -k,l +n,m @@ TEXT
 793
This is called a 'hunk header'.  The "TEXT" portion is by default a line
that begins with an alphabetic character, an underscore or a dollar sign;
this matches what GNU 'diff -p' output uses.  This default selection,
however, is not suited for some contents, and you can use a customized
pattern to make a selection.
 799
 800First, in .gitattributes, you would assign the `diff` attribute
 801for paths.
 802
 803------------------------
 804*.tex   diff=tex
 805------------------------
 806
 807Then, you would define a "diff.tex.xfuncname" configuration to
 808specify a regular expression that matches a line that you would
 809want to appear as the hunk header "TEXT". Add a section to your
 810`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:
 811
 812------------------------
 813[diff "tex"]
 814        xfuncname = "^(\\\\(sub)*section\\{.*)$"
 815------------------------
 816
Note.  A single level of backslashes is eaten by the
configuration file parser, so you would need to double the
backslashes; the pattern above picks a line that begins with a
backslash, and zero or more occurrences of `sub` followed by
`section` followed by an open brace, to the end of line.
 822
There are a few built-in patterns to make this easier, and `tex`
is one of them, so you do not have to write the above in your
configuration file (you still need to enable this with the
attribute mechanism, via `.gitattributes`).  The following built-in
patterns are available:
 828
 829- `ada` suitable for source code in the Ada language.
 830
 831- `bibtex` suitable for files with BibTeX coded references.
 832
 833- `cpp` suitable for source code in the C and C++ languages.
 834
 835- `csharp` suitable for source code in the C# language.
 836
 837- `css` suitable for cascading style sheets.
 838
 839- `fortran` suitable for source code in the Fortran language.
 840
 841- `fountain` suitable for Fountain documents.
 842
 843- `html` suitable for HTML/XHTML documents.
 844
 845- `java` suitable for source code in the Java language.
 846
 847- `matlab` suitable for source code in the MATLAB language.
 848
 849- `objc` suitable for source code in the Objective-C language.
 850
 851- `pascal` suitable for source code in the Pascal/Delphi language.
 852
 853- `perl` suitable for source code in the Perl language.
 854
 855- `php` suitable for source code in the PHP language.
 856
 857- `python` suitable for source code in the Python language.
 858
 859- `ruby` suitable for source code in the Ruby language.
 860
 861- `tex` suitable for source code for LaTeX documents.
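
To use a built-in pattern, only the attribute assignment is needed;
no configuration section has to be written.  For instance (the file
patterns here are examples, not defaults shipped by Git):

------------------------
*.c     diff=cpp
*.h     diff=cpp
*.py    diff=python
*.rb    diff=ruby
------------------------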
 862
 863
 864Customizing word diff
 865^^^^^^^^^^^^^^^^^^^^^
 866
 867You can customize the rules that `git diff --word-diff` uses to
 868split words in a line, by specifying an appropriate regular expression
 869in the "diff.*.wordRegex" configuration variable.  For example, in TeX
 870a backslash followed by a sequence of letters forms a command, but
 871several such commands can be run together without intervening
 872whitespace.  To separate them, use a regular expression in your
 873`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:
 874
 875------------------------
 876[diff "tex"]
 877        wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
 878------------------------
 879
 880A built-in pattern is provided for all languages listed in the
 881previous section.
 882
 883
 884Performing text diffs of binary files
 885^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 886
 887Sometimes it is desirable to see the diff of a text-converted
 888version of some binary files. For example, a word processor
 889document can be converted to an ASCII text representation, and
 890the diff of the text shown. Even though this conversion loses
 891some information, the resulting diff is useful for human
 892viewing (but cannot be applied directly).
 893
 894The `textconv` config option is used to define a program for
 895performing such a conversion. The program should take a single
 896argument, the name of a file to convert, and produce the
 897resulting text on stdout.
 898
 899For example, to show the diff of the exif information of a
 900file instead of the binary information (assuming you have the
 901exif tool installed), add the following section to your
 902`$GIT_DIR/config` file (or `$HOME/.gitconfig` file):
 903
 904------------------------
 905[diff "jpg"]
 906        textconv = exif
 907------------------------
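
As with other diff drivers, the `jpg` driver is only consulted for
paths whose `diff` attribute names it.  A matching `.gitattributes`
entry might look like this (the `*.jpg` pattern is an assumption
about how the image files are named):

------------------------
*.jpg   diff=jpg
------------------------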
 908
 909NOTE: The text conversion is generally a one-way conversion;
 910in this example, we lose the actual image contents and focus
 911just on the text data. This means that diffs generated by
 912textconv are _not_ suitable for applying. For this reason,
 913only `git diff` and the `git log` family of commands (i.e.,
 914log, whatchanged, show) will perform text conversion. `git
 915format-patch` will never generate this output. If you want to
 916send somebody a text-converted diff of a binary file (e.g.,
 917because it quickly conveys the changes you have made), you
 918should generate it separately and send it as a comment _in
 919addition to_ the usual binary diff that you might send.
 920
 921Because text conversion can be slow, especially when doing a
 922large number of them with `git log -p`, Git provides a mechanism
 923to cache the output and use it in future diffs.  To enable
 924caching, set the "cachetextconv" variable in your diff driver's
 925config. For example:
 926
 927------------------------
 928[diff "jpg"]
 929        textconv = exif
 930        cachetextconv = true
 931------------------------
 932
 933This will cache the result of running "exif" on each blob
 934indefinitely. If you change the textconv config variable for a
 935diff driver, Git will automatically invalidate the cache entries
 936and re-run the textconv filter. If you want to invalidate the
 937cache manually (e.g., because your version of "exif" was updated
 938and now produces better output), you can remove the cache
 939manually with `git update-ref -d refs/notes/textconv/jpg` (where
 940"jpg" is the name of the diff driver, as in the example above).
 941
 942Choosing textconv versus external diff
 943^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 944
 945If you want to show differences between binary or specially-formatted
 946blobs in your repository, you can choose to use either an external diff
 947command, or to use textconv to convert them to a diff-able text format.
 948Which method you choose depends on your exact situation.
 949
 950The advantage of using an external diff command is flexibility. You are
 951not bound to find line-oriented changes, nor is it necessary for the
 952output to resemble unified diff. You are free to locate and report
 953changes in the most appropriate way for your data format.
 954
 955A textconv, by comparison, is much more limiting. You provide a
 956transformation of the data into a line-oriented text format, and Git
 957uses its regular diff tools to generate the output. There are several
 958advantages to choosing this method:
 959
1. Ease of use. It is often much simpler to write a binary to text
   transformation than it is to perform your own diff. In many cases,
   existing programs can be used as textconv filters (e.g., exif,
   odt2txt).

2. Git diff features. By performing only the transformation step
   yourself, you can still utilize many of Git's diff features,
   including colorization, word-diff, and combined diffs for merges.

3. Caching. Textconv caching can speed up repeated diffs, such as those
   you might trigger by running `git log -p`.
 971
 972
 973Marking files as binary
 974^^^^^^^^^^^^^^^^^^^^^^^
 975
 976Git usually guesses correctly whether a blob contains text or binary
 977data by examining the beginning of the contents. However, sometimes you
 978may want to override its decision, either because a blob contains binary
 979data later in the file, or because the content, while technically
 980composed of text characters, is opaque to a human reader. For example,
 981many postscript files contain only ASCII characters, but produce noisy
 982and meaningless diffs.
 983
 984The simplest way to mark a file as binary is to unset the diff
 985attribute in the `.gitattributes` file:
 986
 987------------------------
 988*.ps -diff
 989------------------------
 990
 991This will cause Git to generate `Binary files differ` (or a binary
 992patch, if binary patches are enabled) instead of a regular diff.
 993
 994However, one may also want to specify other diff driver attributes. For
 995example, you might want to use `textconv` to convert postscript files to
 996an ASCII representation for human viewing, but otherwise treat them as
 997binary files. You cannot specify both `-diff` and `diff=ps` attributes.
 998The solution is to use the `diff.*.binary` config option:
 999
1000------------------------
1001[diff "ps"]
1002  textconv = ps2ascii
1003  binary = true
1004------------------------
1005
1006Performing a three-way merge
1007~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1008
1009`merge`
1010^^^^^^^
1011
1012The attribute `merge` affects how three versions of a file are
1013merged when a file-level merge is necessary during `git merge`,
1014and other commands such as `git revert` and `git cherry-pick`.
1015
1016Set::
1017
        The built-in 3-way merge driver is used to merge the
        contents in a way similar to the 'merge' command of the `RCS`
        suite.  This is suitable for ordinary text files.
1021
1022Unset::
1023
        Take the version from the current branch as the
        tentative merge result, and declare that the merge has
        conflicts.  This is suitable for binary files that do
        not have well-defined merge semantics.
1028
1029Unspecified::
1030
        By default, this uses the same built-in 3-way merge
        driver as is the case when the `merge` attribute is set.
        However, the `merge.default` configuration variable can name
        a different merge driver to be used with paths for which the
        `merge` attribute is unspecified.
1036
1037String::
1038
        3-way merge is performed using the specified custom
        merge driver.  The built-in 3-way merge driver can be
        explicitly specified by asking for the "text" driver; the
        built-in "take the current branch" driver can be
        requested with "binary".
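
As a sketch, the Set and Unset states could be written in
`.gitattributes` like this (the patterns are illustrative):

------------------------
# Set: built-in 3-way merge
*.txt   merge
# Unset: keep the current branch's version and report a conflict
*.png   -merge
------------------------

For paths left Unspecified, a different driver can be named in the
configuration.  The fragment below assumes a custom driver called
`filfre`, the name used as an example later in this page:

------------------------
[merge]
        default = filfre
------------------------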
1044
1045
1046Built-in merge drivers
1047^^^^^^^^^^^^^^^^^^^^^^
1048
1049There are a few built-in low-level merge drivers defined that
1050can be asked for via the `merge` attribute.
1051
1052text::
1053
1054        Usual 3-way file level merge for text files.  Conflicted
1055        regions are marked with conflict markers `<<<<<<<`,
1056        `=======` and `>>>>>>>`.  The version from your branch
1057        appears before the `=======` marker, and the version
1058        from the merged branch appears after the `=======`
1059        marker.
1060
1061binary::
1062
1063        Keep the version from your branch in the work tree, but
1064        leave the path in the conflicted state for the user to
1065        sort out.
1066
1067union::
1068
1069        Run 3-way file level merge for text files, but take
1070        lines from both versions, instead of leaving conflict
1071        markers.  This tends to leave the added lines in the
1072        resulting file in random order and the user should
1073        verify the result. Do not use this if you do not
1074        understand the implications.
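
Each of these drivers is requested by name through the `merge`
attribute.  A minimal sketch with hypothetical paths:

------------------------
*.c             merge=text
*.jpg           merge=binary
ChangeLog       merge=union
------------------------

The `union` line is shown only to illustrate the syntax; the caveat
above about line ordering applies to any file merged that way.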
1075
1076
1077Defining a custom merge driver
1078^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1079
1080The definition of a merge driver is done in the `.git/config`
1081file, not in the `gitattributes` file, so strictly speaking this
1082manual page is a wrong place to talk about it.  However...
1083
1084To define a custom merge driver `filfre`, add a section to your
1085`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:
1086
1087----------------------------------------------------------------
1088[merge "filfre"]
1089        name = feel-free merge driver
1090        driver = filfre %O %A %B %L %P
1091        recursive = binary
1092----------------------------------------------------------------
1093
1094The `merge.*.name` variable gives the driver a human-readable
1095name.
1096
The `merge.*.driver` variable's value is used to construct a
command to run to merge the ancestor's version (`%O`), the current
version (`%A`) and the other branch's version (`%B`).  These
three tokens are replaced with the names of temporary files that
hold the contents of these versions when the command line is
built. Additionally, `%L` will be replaced with the conflict marker
size (see below).
1104
1105The merge driver is expected to leave the result of the merge in
1106the file named with `%A` by overwriting it, and exit with zero
1107status if it managed to merge them cleanly, or non-zero if there
1108were conflicts.
1109
The `merge.*.recursive` variable specifies what other merge
driver to use when the merge driver is called for an internal
merge between common ancestors, when there is more than one.
When left unspecified, the driver itself is used for both
the internal merge and the final merge.
1115
1116The merge driver can learn the pathname in which the merged result
1117will be stored via placeholder `%P`.
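
Once defined, the driver is selected per path through the `merge`
attribute.  A minimal sketch, assuming the files to be merged this
way match a hypothetical `*.dat` pattern:

------------------------
*.dat   merge=filfre
------------------------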
1118
1119
1120`conflict-marker-size`
1121^^^^^^^^^^^^^^^^^^^^^^
1122
This attribute controls the length of conflict markers left in
the work tree file during a conflicted merge.  Only setting
the value to a positive integer has any meaningful effect.
1126
1127For example, this line in `.gitattributes` can be used to tell the merge
1128machinery to leave much longer (instead of the usual 7-character-long)
1129conflict markers when merging the file `Documentation/git-merge.txt`
1130results in a conflict.
1131
1132------------------------
1133Documentation/git-merge.txt     conflict-marker-size=32
1134------------------------
1135
1136
1137Checking whitespace errors
1138~~~~~~~~~~~~~~~~~~~~~~~~~~
1139
1140`whitespace`
1141^^^^^^^^^^^^
1142
1143The `core.whitespace` configuration variable allows you to define what
1144'diff' and 'apply' should consider whitespace errors for all paths in
1145the project (See linkgit:git-config[1]).  This attribute gives you finer
1146control per path.
1147
1148Set::
1149
1150        Notice all types of potential whitespace errors known to Git.
1151        The tab width is taken from the value of the `core.whitespace`
1152        configuration variable.
1153
1154Unset::
1155
1156        Do not notice anything as error.
1157
1158Unspecified::
1159
1160        Use the value of the `core.whitespace` configuration variable to
1161        decide what to notice as error.
1162
1163String::
1164
        Specify a comma-separated list of common whitespace problems to
        notice in the same format as the `core.whitespace` configuration
        variable.
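
The following `.gitattributes` sketch shows the Set, Unset and String
states side by side.  The patterns are hypothetical; the names in the
list are values accepted by `core.whitespace`:

------------------------
# Set: notice every kind of whitespace error
*.sh    whitespace
# Unset: notice nothing
*.diff  -whitespace
# String: notice only the listed problems
*.c     whitespace=trailing-space,space-before-tab,indent-with-non-tab
------------------------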
1168
1169
1170Creating an archive
1171~~~~~~~~~~~~~~~~~~~
1172
1173`export-ignore`
1174^^^^^^^^^^^^^^^
1175
1176Files and directories with the attribute `export-ignore` won't be added to
1177archive files.
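
For example, a project might keep housekeeping files and a test
directory out of its release archives with entries like these (the
`/tests` directory is a hypothetical name):

------------------------
.gitignore      export-ignore
.gitattributes  export-ignore
/tests          export-ignore
------------------------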
1178
1179`export-subst`
1180^^^^^^^^^^^^^^
1181
1182If the attribute `export-subst` is set for a file then Git will expand
1183several placeholders when adding this file to an archive.  The
1184expansion depends on the availability of a commit ID, i.e., if
1185linkgit:git-archive[1] has been given a tree instead of a commit or a
1186tag then no replacement will be done.  The placeholders are the same
1187as those for the option `--pretty=format:` of linkgit:git-log[1],
1188except that they need to be wrapped like this: `$Format:PLACEHOLDERS$`
1189in the file.  E.g. the string `$Format:%H$` will be replaced by the
1190commit hash.
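
A minimal sketch, assuming a hypothetical tracked file named
`VERSION` at the top level of the project.  The `.gitattributes`
entry would be:

------------------------
VERSION export-subst
------------------------

and the contents of `VERSION` could carry placeholders such as:

------------------------
commit: $Format:%H$
date:   $Format:%cd$
------------------------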
1191
1192
1193Packing objects
1194~~~~~~~~~~~~~~~
1195
1196`delta`
1197^^^^^^^
1198
1199Delta compression will not be attempted for blobs for paths with the
1200attribute `delta` set to false.
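
"Set to false" here corresponds to the Unset state, written with a
leading dash.  A sketch for files that are already compressed and
unlikely to delta well (the patterns are examples only):

------------------------
*.jpg   -delta
*.gz    -delta
------------------------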
1201
1202
1203Viewing files in GUI tools
1204~~~~~~~~~~~~~~~~~~~~~~~~~~
1205
1206`encoding`
1207^^^^^^^^^^
1208
1209The value of this attribute specifies the character encoding that should
1210be used by GUI tools (e.g. linkgit:gitk[1] and linkgit:git-gui[1]) to
1211display the contents of the relevant file. Note that due to performance
1212considerations linkgit:gitk[1] does not use this attribute unless you
1213manually enable per-file encodings in its options.
1214
1215If this attribute is not set or has an invalid value, the value of the
1216`gui.encoding` configuration variable is used instead
1217(See linkgit:git-config[1]).
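
A sketch of both settings follows.  The `*.po` pattern and the
`utf-8` value are illustrative; which encoding names are accepted
depends on the GUI tool.  In `.gitattributes`:

------------------------
*.po    encoding=utf-8
------------------------

and, as the fallback, in the configuration file:

------------------------
[gui]
        encoding = utf-8
------------------------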
1218
1219
1220USING MACRO ATTRIBUTES
1221----------------------
1222
1223You do not want any end-of-line conversions applied to, nor textual diffs
1224produced for, any binary file you track.  You would need to specify e.g.
1225
1226------------
1227*.jpg -text -diff
1228------------
1229
but that may become cumbersome when you have many attributes.  Using
1231macro attributes, you can define an attribute that, when set, also
1232sets or unsets a number of other attributes at the same time.  The
1233system knows a built-in macro attribute, `binary`:
1234
1235------------
1236*.jpg binary
1237------------
1238
1239Setting the "binary" attribute also unsets the "text" and "diff"
1240attributes as above.  Note that macro attributes can only be "Set",
1241though setting one might have the effect of setting or unsetting other
1242attributes or even returning other attributes to the "Unspecified"
1243state.
1244
1245
1246DEFINING MACRO ATTRIBUTES
1247-------------------------
1248
1249Custom macro attributes can be defined only in top-level gitattributes
1250files (`$GIT_DIR/info/attributes`, the `.gitattributes` file at the
1251top level of the working tree, or the global or system-wide
1252gitattributes files), not in `.gitattributes` files in working tree
1253subdirectories.  The built-in macro attribute "binary" is equivalent
1254to:
1255
1256------------
1257[attr]binary -diff -merge -text
1258------------
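
A custom macro is defined with the same `[attr]` syntax and then used
like any other attribute.  The sketch below assumes a hypothetical
macro named `generated` and would have to live in one of the
top-level files listed above:

------------
[attr]generated -diff -merge
*.min.js        generated
------------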
1259
1260
1261EXAMPLE
1262-------
1263
If you have these three `gitattributes` files:
1265
1266----------------------------------------------------------------
1267(in $GIT_DIR/info/attributes)
1268
1269a*      foo !bar -baz
1270
1271(in .gitattributes)
1272abc     foo bar baz
1273
1274(in t/.gitattributes)
1275ab*     merge=filfre
1276abc     -foo -bar
1277*.c     frotz
1278----------------------------------------------------------------
1279
1280the attributes given to path `t/abc` are computed as follows:
1281
1. By examining `t/.gitattributes` (which is in the same
   directory as the path in question), Git finds that the first
   line matches.  `merge` attribute is set.  It also finds that
   the second line matches, and attributes `foo` and `bar`
   are unset.

2. Then it examines `.gitattributes` (which is in the parent
   directory), and finds that the first line matches, but
   `t/.gitattributes` file already decided how `merge`, `foo`
   and `bar` attributes should be given to this path, so it
   leaves `foo` and `bar` unset.  Attribute `baz` is set.

3. Finally it examines `$GIT_DIR/info/attributes`.  This file
   is used to override the in-tree settings.  The first line is
   a match, and `foo` is set, `bar` is reverted to unspecified
   state, and `baz` is unset.
1298
1299As the result, the attributes assignment to `t/abc` becomes:
1300
1301----------------------------------------------------------------
1302foo     set to true
1303bar     unspecified
1304baz     set to false
1305merge   set to string value "filfre"
1306frotz   unspecified
1307----------------------------------------------------------------
1308
1309
1310SEE ALSO
1311--------
1312linkgit:git-check-attr[1].
1313
1314GIT
1315---
1316Part of the linkgit:git[1] suite