diff-highlight
==============

Line-oriented diffs are great for reviewing code, because for most
hunks, you want to see the old and the new segments of code next to each
other. Sometimes, though, when an old line and a new line are very
similar, it's hard to immediately see the difference.

You can use "--color-words" to highlight only the changed portions of
lines. However, this can often be hard to read for code, as it loses
the line structure, and you end up with oddly formatted bits.

Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments. It's currently very
simple and stupid about doing these tasks. In particular:

  1. It will only highlight hunks in which the number of removed and
     added lines is the same, and it will pair lines within the hunk by
     position (so the first removed line is compared to the first added
     line, and so forth). This is simple and tends to work well in
     practice. More complex changes don't highlight well, so we tend to
     exclude them due to the "same number of removed and added lines"
     restriction. Or even if we do try to highlight them, they end up
     not highlighting because of our "don't highlight if the whole line
     would be highlighted" rule.

  2. It will find the common prefix and suffix of two lines, and
     consider everything in the middle to be "different". It could
     instead do a real diff of the characters between the two lines and
     find common subsequences. However, the point of the highlight is to
     call attention to a certain area. Even if some small subset of the
     highlighted area actually didn't change, that's OK. In practice it
     ends up being more readable to just have a single blob on the line
     showing the interesting bit.

The goal of the script is therefore not to be exact about highlighting
changes, but to call attention to areas of interest without being
visually distracting.
Non-diff lines and existing diff coloration are
preserved; the intent is that the output should look exactly the same as
the input, except for the occasional highlight.

Use
---

You can try out the diff-highlight program with:

---------------------------------------------
git log -p --color | /path/to/diff-highlight
---------------------------------------------

If you want to use it all the time, drop it in your $PATH and put the
following in your git configuration:

---------------------------------------------
[pager]
	log = diff-highlight | less
	show = diff-highlight | less
	diff = diff-highlight | less
---------------------------------------------


Color Config
------------

You can configure the highlight colors and attributes using git's
config. The colors for "old" and "new" lines can be specified
independently. There are two "modes" of configuration:

  1. You can specify a "highlight" color and a matching "reset" color.
     This will retain any existing colors in the diff, and apply the
     "highlight" and "reset" colors before and after the highlighted
     portion.

  2. You can specify a "normal" color and a "highlight" color. In this
     case, existing colors are dropped from that line. The non-highlighted
     bits of the line get the "normal" color, and the highlights get the
     "highlight" color.

If no "new" colors are specified, they default to the "old" colors. If
no "old" colors are specified, the default is to reverse the foreground
and background for highlighted portions.

Examples:

---------------------------------------------
# Underline highlighted portions
[color "diff-highlight"]
	oldHighlight = ul
	oldReset = noul
---------------------------------------------

---------------------------------------------
# Varying background intensities
[color "diff-highlight"]
	oldNormal = "black #f8cbcb"
	oldHighlight = "black #ffaaaa"
	newNormal = "black #cbeecb"
	newHighlight = "black #aaffaa"
---------------------------------------------


Using diff-highlight as a module
--------------------------------

If you want to pre- or post-process the highlighted lines as part of
another Perl script, you can use the DiffHighlight module. You can
either "require" it or just cat the module together with your script (to
avoid run-time dependencies).

Your script may set up one or more of the following variables:

  - $DiffHighlight::line_cb - this should point to a function which is
    called whenever DiffHighlight has lines (which may contain
    highlights) to output. The default function prints each line to
    stdout. Note that the function may be called with multiple lines.

  - $DiffHighlight::flush_cb - this should point to a function which
    flushes the output (because DiffHighlight believes it has completed
    processing a logical chunk of input). The default function flushes
    stdout.

The script may then feed lines, one at a time, to DiffHighlight::handle_line().
When lines are done processing, they will be fed to $line_cb. Note that
DiffHighlight may queue up many input lines (to analyze a whole hunk)
before calling $line_cb. After providing all lines, call
DiffHighlight::flush() to flush any unprocessed lines.

If you just want to process stdin, DiffHighlight::highlight_stdin()
is a convenience helper which will loop and flush for you.


Bugs
----

Because diff-highlight relies on heuristics to guess which parts of
changes are important, there are some cases where the highlighting is
more distracting than useful. Fortunately, these cases are rare in
practice, and when they do occur, the worst case is simply a little
extra highlighting. This section documents some cases known to be
sub-optimal, in case somebody feels like working on improving the
heuristics.

1. Two changes on the same line get highlighted in a blob. For example,
   highlighting:

----------------------------------------------
-foo(buf, size);
+foo(obj->buf, obj->size);
----------------------------------------------

   yields (where the inside of "+{}" would be highlighted):

----------------------------------------------
-foo(buf, size);
+foo(+{obj->buf, obj->}size);
----------------------------------------------

   whereas a more semantically meaningful output would be:

----------------------------------------------
-foo(buf, size);
+foo(+{obj->}buf, +{obj->}size);
----------------------------------------------

   Note that doing this right would probably involve a set of
   content-specific boundary patterns, similar to word-diff. Otherwise
   you get junk like:

-----------------------------------------------------
-this line has some -{i}nt-{ere}sti-{ng} text on it
+this line has some +{fa}nt+{a}sti+{c} text on it
-----------------------------------------------------

   which is less readable than the current output.

2. The multi-line matching assumes that lines in the pre- and post-image
   match by position. This is often the case, but can be fooled when a
   line is removed from the top and a new one added at the bottom (or
   vice versa). Unless the lines in the middle are also changed, diffs
   will show this as two hunks, and it will not get highlighted at all
   (which is good).
   But if the lines in the middle are changed, the
   highlighting can be misleading. Here's a pathological case:

-----------------------------------------------------
-one
-two
-three
-four
+two 2
+three 3
+four 4
+five 5
-----------------------------------------------------

   which gets highlighted as:

-----------------------------------------------------
-one
-t-{wo}
-three
-f-{our}
+two 2
+t+{hree 3}
+four 4
+f+{ive 5}
-----------------------------------------------------

   because it matches "two" to "three 3", and so forth. It would be
   nicer as:

-----------------------------------------------------
-one
-two
-three
-four
+two +{2}
+three +{3}
+four +{4}
+five 5
-----------------------------------------------------

   which would probably involve pre-matching the lines into pairs
   according to some heuristic.
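
To make the positional pairing and the common prefix/suffix matching
concrete, here is a minimal Python sketch of the idea. It is not the
actual Perl implementation: it ignores color codes and diff headers, the
function names are hypothetical, and the "don't highlight if the whole
line would be highlighted" rule is simplified to "skip the pair when it
shares neither a prefix nor a suffix". Under those assumptions it
reproduces the marking shown for the pathological case above.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def common_suffix_len(a, b, skip):
    """Length of the longest common suffix, never overlapping the
    first `skip` characters (the already-matched prefix)."""
    n = 0
    while n < len(a) - skip and n < len(b) - skip and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def mark(line, start, end, sign):
    """Wrap line[start:end] in sign{...}, e.g. -{...} or +{...}."""
    if start >= end:
        return sign + line  # nothing in the middle to mark
    return sign + line[:start] + sign + "{" + line[start:end] + "}" + line[end:]


def highlight_hunk(removed, added):
    """Pair removed/added lines by position and mark the differing middle.

    `removed` and `added` hold line contents without the leading -/+.
    """
    old = ["-" + line for line in removed]
    new = ["+" + line for line in added]
    if len(removed) != len(added):
        return old + new  # unequal counts: leave the hunk untouched
    for i, (a, b) in enumerate(zip(removed, added)):
        p = common_prefix_len(a, b)
        s = common_suffix_len(a, b, p)
        if p == 0 and s == 0:
            continue  # simplified "whole line would be highlighted" rule
        old[i] = mark(a, p, len(a) - s, "-")
        new[i] = mark(b, p, len(b) - s, "+")
    return old + new


if __name__ == "__main__":
    hunk = highlight_hunk(
        ["one", "two", "three", "four"],
        ["two 2", "three 3", "four 4", "five 5"],
    )
    print("\n".join(hunk))
```

Because pairs are formed purely by index, "two" is compared against
"three 3" rather than "two 2"; a pre-matching heuristic would have to
replace the zip() step.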