From: Phillip Wood
Date: Fri, 23 Nov 2018 11:16:56 +0000 (+0000)
Subject: diff --color-moved-ws: optimize allow-indentation-change
X-Git-Tag: v2.21.0-rc0~70^2~2
X-Git-Url: https://git.lorimer.id.au/gitweb.git/diff_plain/7a4252c4df49fe07bf91dbb5be2c6012f6a65329?hp=b0a2ba47761fa7bffb5a33e5a76f85da50a00ba5

diff --color-moved-ws: optimize allow-indentation-change

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood
Reviewed-by: Stefan Beller
Signed-off-by: Junio C Hamano
---

diff --git a/diff.c b/diff.c
index b648f67413..4ee58012e5 100644
--- a/diff.c
+++ b/diff.c
@@ -831,20 +831,23 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 int n)
 {
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, cl = l->len;
+	int al = cur->es->len, bl = match->es->len, cl = l->len;
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-
+	const char *orig_a = a;
 	int wslen;
 
 	/*
-	 * We need to check if 'cur' is equal to 'match'.
-	 * As those are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy matching
-	 * so we do have to check if they are equal.
+	 * We need to check if 'cur' is equal to 'match'. As those
+	 * are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy
+	 * matching so we do have to check if they are equal. Here we
+	 * just check the lengths. We delay calling memcmp() to check
+	 * the contents until later as if the length comparison for a
+	 * and c fails we can avoid the call all together.
 	 */
-	if (strcmp(a, b))
+	if (al != bl)
 		return 1;
 
 	if (!pmb->wsd.string)
@@ -872,7 +875,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 		al -= wslen;
 	}
 
-	if (al != cl || memcmp(a, c, al))
+	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
 		return 1;
 
 	return 0;
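
The length-first idea in this patch can be shown in isolation. The following is only a minimal sketch, not code from diff.c: struct sized_line and both functions are hypothetical names made up for illustration, and the indentation adjustment that the real cmp_in_block_with_wsd() applies to 'a' before comparing it with 'c' is left out. It assumes each line carries its length, as emitted_diff_symbol does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a line whose length is already known. */
struct sized_line {
	const char *text;
	size_t len;
};

/*
 * Return 0 if the two lines are byte-for-byte equal, 1 otherwise.
 * The length test comes first, so lines of different length are
 * rejected without reading either buffer.
 */
static int sized_line_differs(const struct sized_line *x,
			      const struct sized_line *y)
{
	assert(x && y);
	if (x->len != y->len)
		return 1;
	return memcmp(x->text, y->text, x->len) != 0;
}

/*
 * Three-way variant: both length checks run before any memcmp(), so
 * a length mismatch on either pair avoids touching the contents.
 */
static int three_lines_differ(const struct sized_line *a,
			      const struct sized_line *b,
			      const struct sized_line *c)
{
	assert(a && b && c);
	if (a->len != b->len || a->len != c->len)
		return 1;
	return memcmp(a->text, b->text, a->len) != 0 ||
	       memcmp(a->text, c->text, a->len) != 0;
}
```

Because the lengths are plain integers already held in the structs, the common rejection path costs two comparisons and no memory reads from the line buffers, which is the effect the commit message measures.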