rerere: adjust 'forget' to multi-variant world order
Because conflicts with the same contents inside conflict blocks
enclosed by "<<<<<<<" and ">>>>>>>" can now have multiple variants
to help three-way merge to adjust to the differences outside the
conflict blocks, "rerere forget $path" needs to be taught that there
may be multiple recorded resolutions that share the same conflict
hash (which groups the conflicts with "the same contents inside
conflict blocks"), among which there are some that would not be
relevant to the conflict we are looking at. These "other variants"
that happen to share the same conflict hash should not be cleared,
and the variant that would apply to the current conflict may not be
the zero-th one (which is the only one that is cleared by the
current code).
After finding the conflict hash, iterate over the existing variants
and try to resolve the conflict using each of them to find the one
that "cleanly" resolves the current conflict. That is the one we
want to forget and record the preimage for, so that the user can
record the corrected resolution.
The merge() helper function is given an existing rerere ID (i.e. the
name of the .git/rr-cache/* subdirectory, and the variant number)
that identifies one <preimage, postimage> pair, try to see if the
conflicted state in the given path can be resolved by using the pair,
and if this succeeds, then update the conflicted path with the
result in the working tree.
To implement rerere_forget() in the multiple variant world, we'd
need a helper to do the "see if a <preimage, postimage> pair cleanly
resolves a conflicted state we have in-core" part, without actually
touching any file in the working tree, in order to identify which
variant(s) to remove. Split the logic to do so into a separate
helper function try_merge() out of merge().
"rerere forget" is the only user of handle_cache() helper, which in
turn is the only user of rerere_io that reads from an in-core buffer
whose getline method is implemented as rerere_mem_getline(). Gather
them together.
This enables the multiple-variant support for real. Multiple
conflicts of the same shape can have differences in contexts where
they appear, interfering the replaying of recorded resolution of one
conflict to another, and in such a case, their resolutions are
recorded as different variants under the same conflict ID.
We still need to adjust garbage collection codepaths for this
change, but the basic "replay" functionality is functional with
this change.
t4200: rerere a merge with two identical conflicts
When the context of multiple identical conflicts are different, two
seemingly the same conflict resolution cannot be safely applied.
In such a case, at least we should be able to record these two
resolutions separately in the rerere database, and reuse them when
we see the same conflict later.
The shape of the conflict in a path determines the conflict ID. The
preimage and postimage pair that was recorded for the conflict ID
previously may or may not replay well for the conflict we just saw.
Currently, we punt when the previous resolution does not cleanly
replay, but ideally we should then be able to record the currently
conflicted path by assigning a new 'variant', and then record the
resolution the user is going to make.
Introduce a mechanism to have more than one variant for a given
conflict ID; we do not actually assign any variant other than 0th
variant yet at this step.
We record the preimage only when there is no directory to record the
conflict we encountered, i.e. when $GIT_DIR/rr-cache/$ID does not
exist. As the plan is to allow multiple <preimage,postimage> pairs
as variants for the same conflict ID eventually, this logic needs to
go.
As the first step in that direction, stop the "did we create the
directory? Then we record the preimage" logic. Instead, we record
if a preimage does not exist when we saw a conflict in a path. Also
make sure that we remove a stale postimage, which most likely is
totally unrelated to the resolution of this new conflict, when we
create a new preimage under $ID when $GIT_DIR/rr-cache/$ID already
exists.
In later patches, we will further update this logic to be "do we
have <preimage,postimage> pair that cleanly resolve the current
conflicts? If not, record a new preimage as a new variant", but
that does not happen at this stage yet.
rerere: handle leftover rr-cache/$ID directory and postimage files
If by some accident there is only $GIT_DIR/rr-cache/$ID directory
existed, we wouldn't have recorded a preimage for a conflict that
is newly encountered, which would mean after a manual resolution,
we wouldn't have recorded it by storing the postimage, because the
logic used to be "if there is no rr-cache/$ID directory, then we are
the first so record the preimage". Instead, record preimage if we
do not have one.
In addition, if there is only $GIT_DIR/rr-cache/$ID/postimage
without corresponding preimage, we would have tried to call into
merge() and punted.
These would have been a situation frustratingly hard to recover
from.
rerere: scan $GIT_DIR/rr-cache/$ID when instantiating a rerere_id
This will help fixing bootstrap corner-case issues, e.g. having an
empty $GIT_DIR/rr-cache/$ID directory would fail to record a
preimage, in later changes in this series.
The plan is to keep assigning the backward compatible conflict ID
based on the hash of the (normalized) text of conflicts, keep using
that conflict ID as the directory name under $GIT_DIR/rr-cache/, but
allow each conflicted path to use a separate "variant" to record
resolutions, i.e. having more than one <preimage,postimage> pairs
under $GIT_DIR/rr-cache/$ID/ directory. As the first step in that
direction, separate the shared "conflict ID" out of the rerere_id
structure.
The plan is to keep information per $ID in rerere_dir, that can be
shared among rerere_id that is per conflicted path.
When we are done with rerere(), which can be directly called from
other programs like "git apply", "git commit" and "git merge", the
shared rerere_dir structures can be freed entirely, so they are not
reference-counted and they are not freed when we release rerere_id's
that reference them.
This shouldn't overflow, as we are copying a sha1 hex into a
41-byte buffer. But it does not hurt to use a bound-checking
function, which protects us and makes auditing for overflows
easier.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rerere: use "struct rerere_id" instead of "char *" for conflict ID
This gives a thin abstraction between the conflict ID that is a hash
value obtained by inspecting the conflicts and the name of the
directory under $GIT_DIR/rr-cache/, in which the previous resolution
is recorded to be replayed. The plan is to make sure that the
presence of the directory does not imply the presense of a previous
resolution and vice-versa, and later allow us to have more than one
pair of <preimage, postimage> for a given conflict ID.
rerere: refactor "replay" part of do_plain_rerere()
Extract the body of a loop that attempts to replay recorded
resolution for each conflicted path into a helper function, not
because I want to call it from multiple places later, but because
the logic has become too deeply nested and hard to read.
rerere: fix benign off-by-one non-bug and clarify code
rerere_io_putconflict() wants to use a limited fixed-sized buf[] on
stack repeatedly to formulate a longer string, but its implementation
is doubly confusing:
* When it knows that the whole thing fits in buf[], it wants to
fill early part of buf[] with conflict marker characters,
followed by a LF and a NUL. It miscounts the size of the buffer
by 1 and does not use the last byte of buf[].
* When it needs to show only the early part of a long conflict
marker string (because the whole thing does not fit in buf[]), it
adjusts the number of bytes shown in the current round in a
strange-looking way. It makes sure that this round does not emit
all bytes and leaves at least one byte to the next round, so that
"it all fits" case will pick up the rest and show the terminating
LF. While this is correct, one needs to stop and think for a
while to realize why it is correct without an explanation.
Fix the benign off-by-one, and add comments to explain the
strange-looking size adjustment.
rerere: do not leak mmfile[] for a path with multiple stage #1 entries
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Plug the leak by reading only the first one of the same stage
entries.
Strictly speaking, this fix does change the semantics, in that we
used to use the last stage #1 entry as the common ancestor when
doing the plain-vanilla three-way merge, but with the leak fix, we
will use the first stage #1 entry. But it is not a grave backward
compatibility breakage. Either way, we are arbitrarily picking one
of multiple stage #1 entries and using it, ignoring others, and
there is no meaning in the ordering of these stage #1 entries.
handle_cache() loops 3 times starting from an index entry that is
unmerged, while ignoring an entry for a path that is different from
what we are looking for.
As the index is sorted, once we see a different path, we know we saw
all stages for the path we are interested in. Just loop while we
see the same path and then break, instead of continuing for 3 times.
As the nature of the conflict marker line determines if there should
be a SP and label after it, the caller shouldn't have to pass the
parameter redundantly.
rerere: write out each record of MERGE_RR in one go
Instead of writing the hash for a conflict, a HT, and the path
with three separate write_in_full() calls, format them into a
single record into a strbuf and write it out in one go.
As a more recent "rerere remaining" codepath abuses the .util field
of the merge_rr data to store a sentinel token, make sure that
codepath does not call into this function (of course, "remaining" is
a read-only operation and currently does not call it).
The MERGE_RR file records a collection of NUL-terminated entries,
each of which consists of
- a hash that identifies the conflict
- a HT
- the pathname
We used to read this piece-by-piece, and worse yet, read the
pathname part a byte at a time into a fixed buffer of size PATH_MAX.
Instead, read a whole entry using strbuf_getwholeline() and parse
out the fields. This way, we issue fewer read(2) calls and more
importantly we do not have to limit the pathname to PATH_MAX.
The merge_rr string list stores the conflict ID (a hexadecimal
string that is used to index into $GIT_DIR/rr-cache) in the .util
field of its elements, and when do_plain_rerere() resolves a
conflict, the field is cleared. Also, when rerere_forget()
recomputes the conflict ID to updates the preimage file, the
conflict ID for the path is updated.
We forgot to free the existing conflict ID when we did these two
operations.
When ac49f5ca (rerere "remaining", 2011-02-16) split out a new
helper function check_one_conflict() out of find_conflict()
function, so that the latter will use the returned value from the
new helper to update the loop control variable that is an index into
active_cache[], the new variable incremented the index by one too
many when it found a path with only stage #1 entry at the very end
of active_cache[].
This "strange" return value does not have any effect on the loop
control of two callers of this function, as they all notice that
active_nr+2 is larger than active_nr just like active_nr+1 is, but
nevertheless it puzzles the readers when they are trying to figure
out what the function is trying to do.
In fact, there is no need to do an early return. The code that
follows after skipping the stage #1 entry is fully prepared to
handle a case where the entry is at the very end of active_cache[].
Help future readers from unnecessary confusion by dropping an early
return. We skip the stage #1 entry, and if there are stage #2 and
stage #3 entries for the same path, we diagnose the path as
THREE_STAGED (otherwise we say PUNTED), and then we skip all entries
for the same path.
Merge branch 'jc/diff-no-index-d-f' into maint-2.3
The usual "git diff" when seeing a file turning into a directory
showed a patchset to remove the file and create all files in the
directory, but "git diff --no-index" simply refused to work. Also,
when asked to compare a file and a directory, imitate POSIX "diff"
and compare the file with the file with the same name in the
directory, instead of refusing to run.
* jc/diff-no-index-d-f:
diff-no-index: align D/F handling with that of normal Git
diff-no-index: DWIM "diff D F" into "diff D/F F"
Merge branch 'tb/connect-ipv6-parse-fix' into maint
An earlier update to the parser that disects a URL broke an
address, followed by a colon, followed by an empty string (instead
of the port number), e.g. ssh://example.com:/path/to/repo.
* tb/connect-ipv6-parse-fix:
connect.c: ignore extra colon after hostname
The "git push --signed" protocol extension did not limit what the
"nonce" that is a server-chosen string can contain or how long it
can be, which was unnecessarily lax. Limit both the length and the
alphabet to a reasonably small space that can still have enough
entropy.
* jc/push-cert:
push --signed: tighten what the receiving end can ask to sign
"diff-highlight" (in contrib/) used to show byte-by-byte
differences, which meant that multi-byte characters can be chopped
in the middle. It learned to pay attention to character boundaries
(assuming the UTF-8 payload).
* jk/colors:
diff-highlight: do not split multibyte characters
* jk/test-annoyances:
t5551: make EXPENSIVE test cheaper
t5541: move run_with_cmdline_limit to test-lib.sh
t: pass GIT_TRACE through Apache
t: redirect stderr GIT_TRACE to descriptor 4
t: translate SIGINT to an exit
The old wording was somehow implying that <start> and <end> were not
regular expressions. Also, the common case is to use a plain function
name here so <funcname> makes sense (the fact that it is a regular
expression is documented in line-range-format.txt).
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_date_basic(): let the system handle DST conversion
The function parses the input to compute the broken-down time in
"struct tm", and the GMT timezone offset. If the timezone offset
does not exist in the input, the broken-down time is turned into the
number of seconds since epoch both in the current timezone and in
GMT and the offset is computed as their difference.
However, we forgot to make sure tm.tm_isdst is set to -1 (i.e. let
the system figure out if DST is in effect in the current timezone
when turning the broken-down time to the number of seconds since
epoch); it is done so at the beginning of the function, but a call
to match_digit() in the function can lead to a call to gmtime_r() to
clobber the field.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Diagnosed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_date_basic(): return early when given a bogus timestamp
When the input does not have GMT timezone offset, the code computes
it by computing the local and GMT time for the given timestamp. But
there is no point doing so if the given timestamp is known to be a
bogus one.
Changed inaccurate count of "rough rules" from three to the more
generic 'a few'.
Signed-off-by: Julian Gindi <juliangindi@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ignore an extra ':' at the end of the hostname in URL's like
"ssh://example.com:/path/to/repo"
The colon is meant to separate a port number from the hostname.
If the port is empty, the colon should be ignored, see RFC 3986.
It had been working for URLs with ssh:// scheme, but was unintentionally
broken in 86ceb3, "allow ssh://user@[2001:db8::1]/repo.git"
Reported-by: Reid Woodbury Jr. <reidw@rawsound.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the input is UTF-8 and Perl is operating on bytes instead of
characters, a diff that changes one multibyte character to another
that shares an initial byte sequence will result in a broken diff
display as the common byte sequence prefix will be separated from
the rest of the bytes in the multibyte character.
For example, if a single line contains only the unicode character
U+C9C4 (encoded as UTF-8 0xEC, 0xA7, 0x84) and that line is then
changed to the unicode character U+C9C0 (encoded as UTF-8 0xEC,
0xA7, 0x80), when operating on bytes diff-highlight will show only
the single byte change from 0x84 to 0x80 thus creating invalid UTF-8
and a broken diff display.
Fix this by putting Perl into character mode when splitting the line
and then back into byte mode after the split is finished.
The utf8::xxx functions require Perl 5.8 so we require that as well.
Also, since we are mucking with code in the split_line function, we
change a '*' quantifier to a '+' quantifier when matching the $COLOR
expression which has the side effect of speeding everything up while
eliminating useless '' elements in the returned array.
Reported-by: Yi EungJun <semtlenori@gmail.com> Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
push --signed: tighten what the receiving end can ask to sign
Instead of blindly trusting the receiving side to give us a sensible
nonce to sign, limit the length (max 256 bytes) and the alphabet
(alnum and a few selected punctuations, enough to encode in base64)
that can be used in nonce.
howto: document more tools for recovery corruption
Long ago, I documented a corruption recovery I did and gave
some C code that I used to help find a flipped bit. I had
to fix a similar case recently, and I ended up writing a few
more tools. I hope nobody ever has to use these, but it
does not hurt to share them, just in case.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merges with an absurd number of parents are still a bad idea because
they do not render well in tools like gitk, but if they are present
in the repository being imported into git then there's no need to
avoid reproducing them faithfully.
In olden times, before v1.6.0-rc0~194 (2008-06-27), git commit-tree
and higher-level tools built on top of it were limited to writing 16
parents for a commit. Nowadays normal git operations are happy to
write more parents when asked, so the motivation for this note in the
fast-import documentation is gone and we can remove it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb.conf.txt: say "build-time", not "built-time"
"build-time" is used everywhere else.
Signed-off-by: Jérôme Zago <git-patch@agt-the-walker.net> Reviewed-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b3256eb (standardize and improve lookup rules for external local
repos), enter_repo() was modified to use a different precedence
ordering of suffixes for DWIM of the repository path, and to ensure
that the repository path is actually valid instead of just testing
for existence.
However, the documentation was not modified to reflect these
changes. Fix the documentation to match the code.
Documentation contributed by Jeff King.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Paul Tan <pyokagan@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cherry-pick: fix docs describing handling of empty commits
Commit b27cfb0 (git-cherry-pick: Add keep-redundant-commits
option, 2012-04-20), added the --keep-redundant-commits
option, and switched the default behavior (without that
option) to silently ignore empty commits. Later, the second
half of that commit was reverted in ac2b0e8 (cherry-pick:
regression fix for empty commits, 2012-05-29), but the
documentation added for --keep-redundant-commits was never
updated to match. Let's do so now.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
docs: clarify what git-rebase's "-p" / "--preserve-merges" does
Ignoring a merge can be read as ignoring the changes a merge commit
introduces altogether, as if the entire side branch the merge commit
merged was removed from the history. But that is not what happens
if "-p" is not specified. What happens is that the individual
commits a merge commit introduces are replayed in order, and only
any possible merge conflict resolutions or manual amendments to the
merge commit are ignored.
Get this straight in the docs.
Also, do not say that merge commits are *tried* to be recreated. As that is
true almost everywhere it is better left unsaid.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse-options.h: OPTION_{BIT,SET_INT} do not store pointer to defval
When 20d1c652 (parse-options: remove unused OPT_SET_PTR, 2014-03-30)
removed OPT_SET_PTR, the comment in the header that describes what
the option did to defval field was left behind by mistake. Remove
it.
Signed-off-by: Ivan Ukhov <ivan.ukhov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
An failure early in the "git clone" that started creating the
working tree and repository could have resulted in some directories
and files left without getting cleaned up.
* jk/cleanup-failed-clone:
clone: drop period from end of die_errno message
clone: initialize atexit cleanup handler earlier
"git fetch" that fetches a commit using the allow-tip-sha1-in-want
extension could have failed to fetch all the requested refs.
* jk/fetch-pack:
fetch-pack: remove dead assignment to ref->new_sha1
fetch_refs_via_pack: free extra copy of refs
filter_ref: make a copy of extra "sought" entries
filter_ref: avoid overwriting ref->old_sha1 with garbage
Merge branch 'tg/fix-check-order-with-split-index' into maint
The split-index mode introduced at v2.3.0-rc0~41 was broken in the
codepath to protect us against a broken reimplementation of Git
that writes an invalid index with duplicated index entries, etc.
* tg/fix-check-order-with-split-index:
read-cache: fix reading of split index
Merge branch 'jk/prune-with-corrupt-refs' into maint
"git prune" used to largely ignore broken refs when deciding which
objects are still being used, which could spread an existing small
damage and make it a larger one.
* jk/prune-with-corrupt-refs:
refs.c: drop curate_packed_refs
repack: turn on "ref paranoia" when doing a destructive repack
prune: turn on ref_paranoia flag
refs: introduce a "ref paranoia" flag
t5312: test object deletion code paths in a corrupted repository
Merge branch 'js/completion-ctags-pattern-substitution-fix' into maint
The code that reads from the ctags file in the completion script
(in contrib/) did not spell ${param/pattern/string} substitution
correctly, which happened to work with bash but not with zsh.
* js/completion-ctags-pattern-substitution-fix:
contrib/completion: escape the forward slash in __git_match_ctag
diff-no-index: align D/F handling with that of normal Git
When a commit changes a path P that used to be a file to a directory
and creates a new path P/X in it, "git show" would say that file P
was removed and file P/X was created for such a commit.
However, if we compare two directories, D1 and D2, where D1 has a
file D1/P in it and D2 has a directory D2/P under which there is a
file D2/P/X, and ask "git diff --no-index D1 D2" to show their
differences, we simply get a refusal "file/directory conflict".
Surely, that may be what GNU diff does, but we can do better and it
is easy to do so.
docs: clarify "preserve" option wording for git-pull
The "also" sounds as if "preserve" does a rebase as an additional
step that "true" would not do, but that is not the case. Clarify
this by omitting "also", and rewording the sentence a bit.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The help text for the --force-with-lease option to git-push
does not parse cleanly. Clean up the wording and syntax to
be more sensible. Also remove redundant information in the
"--force-with-lease alone" description.
Signed-off-by: Phil Hord <hordp@cisco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --no-index" was supposed to be a poor-man's approach to
allow using Git diff goodies outside of a Git repository, without
having to patch mainstream diff implementations.
Unlike a POSIX diff that treats "diff D F" (or "diff F D") as a
request to compare D/F and F (or F and D/F) when D is a directory
and F is a file, however, we did not accept such a command line and
instead barfed with "file/directory conflict".
Imitate what POSIX diff does and append the basename of the file
after the name of the directory before comparing.
The expected call sequence is for the caller to use match_pathspec()
repeatedly on a set of pathspecs, accumulating the "hits" in a
separate array, and then call this function to diagnose a pathspec
that never matched anything, as that can indicate a typo from the
command line, e.g. "git commit Maekfile".
Many builtin commands use this function from builtin/ls-files.c,
which is not a very healthy arrangement. ls-files might have been
the first command to feel the need for such a helper, but the need
is shared by everybody who uses the "match and then report" pattern.
Move it to dir.c where match_pathspec() is defined.
At the first look, a user may think the default version is "23". Even
with UNIX background, there's no reference anywhere close that may
indicate this is glob or regex.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>