diff-highlight
==============

Line-oriented diffs are great for reviewing code, because for most
hunks, you want to see the old and the new segments of code next to each
other. Sometimes, though, when an old line and a new line are very
similar, it's hard to immediately see the difference.

You can use "--color-words" to highlight only the changed portions of
lines. However, this can often be hard to read for code, as it loses
the line structure, and you end up with oddly formatted bits.

Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments. It's currently very
simple and stupid about doing these tasks. In particular:

  1. It will only highlight hunks in which the number of removed and
     added lines is the same, and it will pair lines within the hunk by
     position (so the first removed line is compared to the first added
     line, and so forth). This is simple and tends to work well in
     practice. More complex changes don't highlight well, and the "same
     number of removed and added lines" restriction tends to exclude
     them anyway. Even when we do try to highlight them, they usually
     end up unhighlighted because of our "don't highlight if the whole
     line would be highlighted" rule.

  2. It will find the common prefix and suffix of two lines, and
     consider everything in the middle to be "different". It could
     instead do a real diff of the characters between the two lines and
     find common subsequences. However, the point of the highlight is to
     call attention to a certain area. Even if some small subset of the
     highlighted area actually didn't change, that's OK. In practice it
     ends up being more readable to just have a single blob on the line
     showing the interesting bit.
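
The two rules above can be illustrated with a short Python sketch. It is
not the script's own code: it assumes plain, uncolored diff lines (each
starting with its "-" or "+" marker), marks the differing middle with
the "-{...}" / "+{...}" notation used in the "Bugs" section instead of
terminal colors, and uses a simplified form of the "whole line would be
highlighted" rule (leave the pair alone when the shared prefix and
suffix contain only whitespace). The names highlight_pair and
highlight_hunk are hypothetical.

---------------------------------------------
# Sketch only: assumes plain, uncolored diff lines that start with '-' or '+'.
# Highlights are shown as -{...} / +{...} instead of terminal colors.


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def highlight_pair(old, new):
    """Mark the differing middle of one removed/added line pair."""
    a, b = old[1:], new[1:]  # compare content only, not the '-'/'+' marker
    p = common_prefix_len(a, b)
    # Common suffix, never allowed to overlap the common prefix.
    s = 0
    while s < len(a) - p and s < len(b) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    # Simplified "whole line would be highlighted" rule: if the shared
    # prefix and suffix are only whitespace, leave the pair alone.
    if not a[:p].strip() and not a[len(a) - s:].strip():
        return old, new

    def mark(sign, text):
        mid = text[p:len(text) - s]
        if not mid:  # nothing changed on this side (pure insertion/deletion)
            return sign + text
        return sign + text[:p] + sign + "{" + mid + "}" + text[len(text) - s:]

    return mark("-", a), mark("+", b)


def highlight_hunk(removed, added):
    """Pair lines by position, but only if the hunk is balanced."""
    if len(removed) != len(added):
        return removed, added  # unequal counts: no highlighting at all
    pairs = [highlight_pair(o, n) for o, n in zip(removed, added)]
    return [o for o, _ in pairs], [n for _, n in pairs]
---------------------------------------------

Fed the four-line hunk from item 2 of the "Bugs" section below,
highlight_hunk pairs the lines by position and produces the same
"t-{wo}" / "t+{hree 3}" highlighting shown there, while "one" versus
"two 2" stays unhighlighted because the whole line would have been
marked.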

The goal of the script is therefore not to be exact about highlighting
changes, but to call attention to areas of interest without being
visually distracting. Non-diff lines and existing diff coloration are
preserved; the intent is that the output should look exactly the same as
the input, except for the occasional highlight.

Use
---

You can try out the diff-highlight program with:

---------------------------------------------
git log -p --color | /path/to/diff-highlight
---------------------------------------------

If you want to use it all the time, drop it in your $PATH and put the
following in your git configuration:

---------------------------------------------
[pager]
        log = diff-highlight | less
        show = diff-highlight | less
        diff = diff-highlight | less
---------------------------------------------


Color Config
------------

You can configure the highlight colors and attributes using git's
config. The colors for "old" and "new" lines can be specified
independently. There are two "modes" of configuration:

  1. You can specify a "highlight" color and a matching "reset" color.
     This will retain any existing colors in the diff, and apply the
     "highlight" and "reset" colors before and after the highlighted
     portion.

  2. You can specify a "normal" color and a "highlight" color. In this
     case, existing colors are dropped from that line. The non-highlighted
     bits of the line get the "normal" color, and the highlights get the
     "highlight" color.

If no "new" colors are specified, they default to the "old" colors. If
no "old" colors are specified, the default is to reverse the foreground
and background for highlighted portions.
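
To make the difference between the two modes concrete, here is a rough
Python sketch that colors one line already split into an unchanged
start, a highlighted middle, and an unchanged end. It is a
simplification, not the script's implementation: the arguments are
assumed to be raw terminal escape sequences (git can translate
configured color values into escape sequences with
"git config --get-color"), the "reverse" default is assumed to be the
ANSI reverse-video pair \x1b[7m / \x1b[27m, only one side ("old" or
"new") is handled per call, and the name apply_colors is hypothetical.

---------------------------------------------
import re

RESET = "\x1b[m"
# SGR escape sequences such as "\x1b[31m" or "\x1b[m".
COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")


def apply_colors(start, mid, end, normal=None,
                 highlight="\x1b[7m", reset="\x1b[27m"):
    """Color one line given as unchanged start, changed mid, unchanged end.

    Assumed defaults: reverse video on, then reverse video off.
    """
    if normal is not None:
        # Mode 2: drop the line's existing colors and repaint it,
        # resetting after each segment so nothing leaks across.
        start, mid, end = (COLOR_RE.sub("", part) for part in (start, mid, end))
        return (normal + start + RESET
                + highlight + mid + RESET
                + normal + end + RESET)
    # Mode 1: keep existing colors; just bracket the changed part.
    return start + highlight + mid + reset + end
---------------------------------------------

In mode 1 the line keeps whatever red or green git already gave it and
only the middle is wrapped; in mode 2 the existing escapes are stripped,
so the "normal" and "highlight" colors fully determine how the line
looks.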

Examples:

---------------------------------------------
# Underline highlighted portions
[color "diff-highlight"]
oldHighlight = ul
oldReset = noul
---------------------------------------------

---------------------------------------------
# Varying background intensities
[color "diff-highlight"]
oldNormal = "black #f8cbcb"
oldHighlight = "black #ffaaaa"
newNormal = "black #cbeecb"
newHighlight = "black #aaffaa"
---------------------------------------------


Bugs
----

Because diff-highlight relies on heuristics to guess which parts of
changes are important, there are some cases where the highlighting is
more distracting than useful. Fortunately, these cases are rare in
practice, and when they do occur, the worst case is simply a little
extra highlighting. This section documents some cases known to be
suboptimal, in case somebody feels like working on improving the
heuristics.

1. Two changes on the same line get highlighted as a single blob. For
   example, highlighting:

----------------------------------------------
-foo(buf, size);
+foo(obj->buf, obj->size);
----------------------------------------------

   yields (where the inside of "+{}" would be highlighted):

----------------------------------------------
-foo(buf, size);
+foo(+{obj->buf, obj->}size);
----------------------------------------------

   whereas a more semantically meaningful output would be:

----------------------------------------------
-foo(buf, size);
+foo(+{obj->}buf, +{obj->}size);
----------------------------------------------

   Note that doing this right would probably involve a set of
   content-specific boundary patterns, similar to word-diff. Otherwise
   you get junk like:

-----------------------------------------------------
-this line has some -{i}nt-{ere}sti-{ng} text on it
+this line has some +{fa}nt+{a}sti+{c} text on it
-----------------------------------------------------

   which is less readable than the current output.

2. The multi-line matching assumes that lines in the pre- and post-image
   match by position. This is often the case, but can be fooled when a
   line is removed from the top and a new one added at the bottom (or
   vice versa). Unless the lines in the middle are also changed, diffs
   will show this as two separate hunks, neither of which gets
   highlighted (which is good). But if the lines in the middle are
   changed, the highlighting can be misleading. Here's a pathological
   case:

-----------------------------------------------------
-one
-two
-three
-four
+two 2
+three 3
+four 4
+five 5
-----------------------------------------------------

   which gets highlighted as:

-----------------------------------------------------
-one
-t-{wo}
-three
-f-{our}
+two 2
+t+{hree 3}
+four 4
+f+{ive 5}
-----------------------------------------------------

   because it matches "two" to "three 3", and so forth. It would be
   nicer as:

-----------------------------------------------------
-one
-two
-three
-four
+two +{2}
+three +{3}
+four +{4}
+five 5
-----------------------------------------------------

   which would probably involve pre-matching the lines into pairs
   according to some heuristic.
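
Both items above are open problems. As a possible starting point, here
is a Python sketch of one approach to each; both are assumptions, not
the author's design. prematch is a made-up greedy heuristic for item 2
that pairs each added line with the unused removed line sharing the
longest common prefix, subject to a hypothetical min_prefix threshold.
word_highlight addresses item 1 by splitting lines with an assumed
boundary pattern (a run of word characters, or any single other
character) and diffing the resulting tokens with Python's
difflib.SequenceMatcher, which swaps the prefix/suffix rule for a real
token-level diff. Both functions work on line content without the
leading "-" or "+".

---------------------------------------------
import difflib
import re

# Hypothetical sketches for the two bugs above; neither is the script's code.


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def prematch(removed, added, min_prefix=2):
    """Bug 2: greedily pair each added line with the unused removed line
    that shares the longest common prefix (at least min_prefix chars)."""
    used, pairs = set(), []
    for new in added:
        best, best_len = None, min_prefix - 1
        for i, old in enumerate(removed):
            n = common_prefix_len(old, new)
            if i not in used and n > best_len:
                best, best_len = i, n
        if best is not None:
            used.add(best)
            pairs.append((removed[best], new))
    return pairs


# Bug 1: assumed boundary pattern, a run of word characters or any
# single other character.
TOKEN_RE = re.compile(r"\w+|\W")


def word_highlight(old, new):
    """Mark the tokens of new that a token-level diff reports as changed."""
    a, b = TOKEN_RE.findall(old), TOKEN_RE.findall(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = "".join(b[j1:j2])
        if tag == "equal":
            out.append(chunk)
        elif chunk:  # "replace" or "insert"; pure deletions add nothing
            out.append("+{" + chunk + "}")
    return "".join(out)
---------------------------------------------

On the examples above, prematch pairs "two" with "two 2", "three" with
"three 3" and "four" with "four 4", leaving "one" and "five 5"
unpaired, and word_highlight turns the foo() example into
"foo(+{obj->}buf, +{obj->}size);" while marking "fantastic" as one whole
word rather than as character fragments. The greedy pairing ignores
line order, so crossing pairs are possible; a real heuristic would
probably need to respect it.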