Git is to some extent character encoding agnostic.

 - The contents of the blob objects are uninterpreted sequences
   of bytes.  There is no encoding translation at the core
   level.

 - Path names are encoded in UTF-8 normalization form C. This
   applies to tree objects, the index file, ref names, as well as
   path names in command line arguments, environment variables
   and config files (`.git/config` (see linkgit:git-config[1]),
   linkgit:gitignore[5], linkgit:gitattributes[5] and
   linkgit:gitmodules[5]).
+
Note that Git at the core level treats path names simply as
sequences of non-NUL bytes; there are no path name encoding
conversions (except on Mac and Windows). Therefore, using
non-ASCII path names will mostly work even on platforms and file
systems that use legacy extended ASCII encodings. However,
repositories created on such systems will not work properly on
UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
Additionally, many Git-based tools simply assume path names to
be UTF-8 and will fail to display other encodings correctly.

 - Commit log messages are typically encoded in UTF-8, but other
   extended ASCII encodings are also supported. This includes
   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
   EUC-x, CP9xx etc.).
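
As a rough illustration of the points above (a Python sketch, not part
of Git; the file name, the commit subject and the helper
`is_valid_utf8` are made up for the example), the same visible path
name can correspond to several different byte sequences, and only
ASCII-compatible encodings leave plain ASCII text unchanged:

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file name, used only for illustration.
name = "caf\u00e9.txt"

# The same visible name in UTF-8 NFC, in UTF-8 NFD, and in ISO-8859-1.
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")

print(nfc_bytes)     # b'caf\xc3\xa9.txt'
print(nfd_bytes)     # b'cafe\xcc\x81.txt'
print(latin1_bytes)  # b'caf\xe9.txt'

# Compared as byte sequences, these are three different path names,
# and the ISO-8859-1 form is not valid UTF-8 at all.
print(nfc_bytes == nfd_bytes)       # False
print(is_valid_utf8(latin1_bytes))  # False

# Extended ASCII encodings leave ASCII text unchanged; UTF-16 does not,
# and it introduces NUL bytes.
subject = "Fix typo"  # hypothetical commit subject
ascii_compatible = all(
    subject.encode(enc) == subject.encode("ascii")
    for enc in ("utf-8", "iso-8859-1", "cp1252")
)
utf16_has_nul = b"\x00" in subject.encode("utf-16-le")
print(ascii_compatible, utf16_has_nul)  # True True
```

Because the core compares path names byte by byte, a tool that assumes
UTF-8 would treat the first two names as distinct files and could not
display the third one correctly.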

Although we encourage encoding commit log messages in UTF-8,
both the core and Git Porcelain are designed not to force UTF-8
on projects.  If all participants of a particular project find
it more convenient to use legacy encodings, Git does not forbid
it.  However, there are a few things to keep in mind.

. 'git commit' and 'git commit-tree' issue
  a warning if the commit log message given to them does not look
  like a valid UTF-8 string, unless you explicitly say your
  project uses a legacy encoding.  The way to say this is to
  have `i18n.commitencoding` in the `.git/config` file, like this:
+
------------
[i18n]
        commitencoding = ISO-8859-1
------------
+
Commit objects created with the above setting record the value
of `i18n.commitencoding` in their `encoding` header.  This is to
help other people who look at them later.  Lack of this header
implies that the commit log message is encoded in UTF-8.

. 'git log', 'git show', 'git blame' and friends look at the
  `encoding` header of a commit object, and try to re-code the
  log message into UTF-8 unless otherwise specified.  You can
  specify the desired output encoding with
  `i18n.logoutputencoding` in the `.git/config` file, like this:
+
------------
[i18n]
        logoutputencoding = ISO-8859-1
------------
+
If you do not have this configuration variable, the value of
`i18n.commitencoding` is used instead.
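
The two steps above can be sketched roughly as follows.  This is a
Python illustration under stated assumptions, not Git's own
implementation: the helpers `split_commit`, `message_encoding`,
`output_encoding` and `recode_message` are hypothetical, the
configuration is modelled as a plain dictionary, and the commit
object is hand-written sample data.

```python
def split_commit(raw: bytes):
    """Split a raw commit object body into (headers, message)."""
    header_block, _, message = raw.partition(b"\n\n")
    headers = {}
    for line in header_block.split(b"\n"):
        if line.startswith(b" "):
            continue  # continuation of a multi-line header; ignored here
        key, _, value = line.partition(b" ")
        headers.setdefault(key.decode("ascii"), value)
    return headers, message


def message_encoding(headers) -> str:
    # Lack of an `encoding` header implies UTF-8.
    return headers.get("encoding", b"UTF-8").decode("ascii")


def output_encoding(config) -> str:
    # i18n.logoutputencoding, else i18n.commitencoding, else UTF-8.
    return (
        config.get("i18n.logoutputencoding")
        or config.get("i18n.commitencoding")
        or "UTF-8"
    )


def recode_message(raw: bytes, config) -> bytes:
    """Re-code the log message from its recorded encoding for output."""
    headers, message = split_commit(raw)
    text = message.decode(message_encoding(headers))
    return text.encode(output_encoding(config))


# Hand-written sample commit with an ISO-8859-1 log message.
raw = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1112911993 -0700\n"
    b"committer C O Mitter <committer@example.com> 1112911993 -0700\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"R\xe9sum\xe9 parser fix\n"
)

print(recode_message(raw, {}))
# b'R\xc3\xa9sum\xc3\xa9 parser fix\n'
print(recode_message(raw, {"i18n.logoutputencoding": "ISO-8859-1"}))
# b'R\xe9sum\xe9 parser fix\n'
```

With no configuration the message comes out as UTF-8; with
`i18n.logoutputencoding` set to the commit's own encoding the stored
bytes pass through unchanged.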

Note that we deliberately chose not to re-code the commit log
message to UTF-8 when a commit is made, which would force UTF-8
at the commit object level, because re-coding to UTF-8 is not
necessarily a reversible operation.
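
One way a round trip can lose information is sketched below in
Python.  This is an assumption chosen for illustration; the text above
does not say which cases were in mind.  The sample bytes are made up,
and the example relies on byte 0x81 having no character assigned in
Python's `cp1252` codec.

```python
# Hypothetical message bytes labelled as CP1252, containing a stray 0x81.
original = b"caf\xe9 \x81 menu"

# A strict conversion refuses the message outright.
try:
    original.decode("cp1252")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# Forcing the conversion substitutes U+FFFD for the stray byte.
as_utf8 = original.decode("cp1252", errors="replace").encode("utf-8")

# Converting back cannot recover the byte that was replaced.
restored = as_utf8.decode("utf-8").encode("cp1252", errors="replace")

print(strict_decode_ok)      # False
print(restored)              # b'caf\xe9 ? menu'
print(restored == original)  # False
```

Storing the bytes untouched and recording the encoding in the commit
header, as described above, avoids this kind of loss.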