It looks like in 89c3b0ad43 (name-hash: add perf test for lazy_init_name_hash,
2017-03-23) p0004 was not created with the execute unix rights.
Let's fix that.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lazy-init codepath will not be exercised uniless threaded. Skip
the entire test on a single-core box. Also replace a hard-coded
constant of 2000 (number of cache entries to manifacture for tests)
with a variable with a human readable name.
Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add check for the end of the entries for the thread partition.
Add test for lazy init name hash with specific directory structure
The lazy init hash name was causing a buffer overflow when the last
entry in the index was multiple folder deep with parent folders that
did not have any files in them.
This adds a test for the boundary condition of the thread partitions
with the folder structure that was triggering the buffer overflow.
The fix was to check if it is the last entry for the thread partition
in the handle_range_dir and not try to use the next entry in the cache.
Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Created t/perf/p0004-lazy-init-name-hash.sh test
to demonstrate correctness and performance gains
with the multithreaded version of lazy_init_name_hash().
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add t/helper/test-lazy-init-name-hash.c test code
to demonstrate performance times for lazy_init_name_hash()
using the original single-threaded and the new multi-threaded
code paths.
Includes a --dump option to dump the created hashmaps to
stdout. You can use this to run both code paths and
confirm that they generate the same hashmaps.
Includes a --analyze option to analyze performance of both
code paths over a range of index sizes to help you find a
lower bound for the LAZY_THREAD_COST in name-hash.c.
For example, passing "-a 4000" will set "istate.cache_nr"
to 4000 and then try the multi-threaded code -- probably
giving 2 threads with 2000 entries each. It will then
run both the single-threaded (1x4000) and the multi-threaded
(2x2000) and compare the times. It will then repeat the
test with 8000, 12000, and etc. so that you can see the
cross over.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
name-hash: perf improvement for lazy_init_name_hash
Improve performance of lazy_init_name_hash() when
ignore_case is set. Teach name-hash to build the
istate.name_hash and istate.dir_hash simultaneously
using a forward-diving technique on the pathname
of the index_entry, rather than adding name_hash
entries and then searching backwards in the pathname
for parent directories.
This borrows algorithm ideas from clear_ce_flags_{1,dir}.
Multiple threads are used with the new algorithm to
speed hashmap construction.
This new code path is only used when threads are present
(a compiler settings) and when the index is large enough
to warrant the pthread complexity.
The code in clear_ce_flags_dir() uses a linear search to
find the adjacent index entries with the same prefix; a
binary search is used here handle_range_dir() to further
speed things up.
The size of LAZY_THREAD_COST was determined from rough
analysis using:
t/helper/test-lazy-init-name-hash --analyze
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach hashmap to allow rehashes to be suppressed.
This is useful when hashmaps are accessed by multiple
threads. It still requires the caller to properly
manage their locking. This just prevents unexpected
rehashing during inserts and deletes.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashmap: allow memihash computation to be continued
Add variant of memihash() to allow the hash computation to
be continued. There are times when we compute the hash on
a full path and then the hash on just the path to the parent
directory. This can be expensive on large repositories.
With this, we can hash the parent directory first. And then
continue the computation to include the "/filename".
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
name-hash: specify initial size for istate.dir_hash table
Specify an initial size for the istate.dir_hash HashMap matching
the size of the istate.name_hash.
Previously hashmap_init() was given 0, causing a 64 bucket
hashmap to be created. When working with very large
repositories, this would cause numerous rehash() calls to
realloc and rebalance the hashmap. This is especially true
when the worktree is deep, with many directories containing
a few files.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/execv-dashed-external' into maint
Typing ^C to pager, which usually does not kill it, killed Git and
took the pager down as a collateral damage in certain process-tree
structure. This has been fixed.
* jk/execv-dashed-external:
execv_dashed_external: wait for child on signal death
execv_dashed_external: stop exiting with negative code
execv_dashed_external: use child_process struct
Merge branch 'jk/archive-zip-userdiff-config' into maint
"git archive" did not read the standard configuration files, and
failed to notice a file that is marked as binary via the userdiff
driver configuration.
Merge branch 'dt/disable-bitmap-in-auto-gc' into maint
It is natural that "git gc --auto" may not attempt to pack
everything into a single pack, and there is no point in warning
when the user has configured the system to use the pack bitmap,
leading to disabling further "gc".
* dt/disable-bitmap-in-auto-gc:
repack: die on incremental + write-bitmap-index
auto gc: don't write bitmaps for incremental repacks
Leakage of lockfiles in the config subsystem has been fixed.
* nd/config-misc-fixes:
config.c: handle lock file in error case in git_config_rename_...
config.c: rename label unlock_and_out
config.c: handle error case for fstat() calls
Compression setting for producing packfiles were spread across
three codepaths, one of which did not honor any configuration.
Unify these so that all of them honor core.compression and
pack.compression variables the same way.
Merge branch 'jk/make-tags-find-sources-tweak' into maint
Update the procedure to generate "tags" for developer support.
* jk/make-tags-find-sources-tweak:
Makefile: exclude contrib from FIND_SOURCE_FILES
Makefile: match shell scripts in FIND_SOURCE_FILES
Makefile: exclude test cruft from FIND_SOURCE_FILES
Makefile: reformat FIND_SOURCE_FILES
Some platforms no longer understand "latin-1" that is still seen in
the wild in e-mail headers; replace them with "iso-8859-1" that is
more widely known when conversion fails from/to it.
* jc/latin-1:
utf8: accept "latin-1" as ISO-8859-1
utf8: refactor code to decide fallback encoding
The `perforce` and `perforce-server` package were moved from brew [1][2]
to cask [3]. Teach TravisCI the new location.
Perforce updates their binaries without version bumps. That made the
brew install (legitimately!) fail due to checksum mismatches. The
workaround is not necessary anymore as Cask [4] allows to disable the
checksum test for individual formulas.
When looking for documentation for a specific function, you may be tempted
to run
git -C Documentation grep index_name_pos
only to find the file technical/api-in-core-index.txt, which doesn't
help for understanding the given function. It would be better to not find
these functions in the documentation, such that people directly dive into
the code instead.
In the previous patches we have documented
* index_name_pos()
* remove_index_entry_at()
* add_[file_]to_index()
in cache.h
We already have documentation for:
* add_index_entry()
* read_index()
Which leaves us with a TODO for:
* cache -> the_index macros
* refresh_index()
* discard_index()
* ie_match_stat() and ie_modified(); how they are different and when to
use which.
* write_index() that was renamed to write_locked_index
* cache_tree_invalidate_path()
* cache_tree_update()
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The version of the "replace isatty() hack" that got merged a few
weeks ago did not actually reflect the latest iteration of the patch
series: v3 was sent out with these changes, as requested by the
reviewer Johannes Sixt:
- reworded the comment about "recycling handles"
- moved the reassignment of the `console` variable before the dup2()
call so that it is valid at all times
- removed the "handle = INVALID_HANDLE_VALUE" assignment, as the local
variable `handle` is not used afterwards anyway
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kh/tutorial-grammofix:
doc: omit needless "for"
doc: make the intent of sentence clearer
doc: add verb in front of command to run
doc: add articles (grammar)
Merge branch 'dt/smart-http-detect-server-going-away' into maint
When the http server gives an incomplete response to a smart-http
rpc call, it could lead to client waiting for a full response that
will never come. Teach the client side to notice this condition
and abort the transfer.
An improvement counterproposal has failed.
cf. <20161114194049.mktpsvgdhex2f4zv@sigill.intra.peff.net>
* dt/smart-http-detect-server-going-away:
upload-pack: optionally allow fetching any sha1
remote-curl: don't hang when a server dies before any output
Merge branch 'gv/p4-multi-path-commit-fix' into maint
"git p4" that tracks multile p4 paths imported a single changelist
that touches files in these multiple paths as one commit, followed
by many empty commits. This has been fixed.
Even though an fix was attempted in Git 2.9.3 days, but running
"git difftool --dir-diff" from a subdirectory never worked. This
has been fixed.
* jk/difftool-in-subdir:
difftool: rename variables for consistency
difftool: chdir as early as possible
difftool: sanitize $workdir as early as possible
difftool: fix dir-diff index creation when in a subdirectory
Merge branch 'jc/push-default-explicit' into maint
A lazy "git push" without refspec did not internally use a fully
specified refspec to perform 'current', 'simple', or 'upstream'
push, causing unnecessary "ambiguous ref" errors.
* jc/push-default-explicit:
push: test pushing ambiguously named branches
push: do not use potentially ambiguous default refspec
Merge branch 'jk/index-pack-wo-repo-from-stdin' into maint
"git index-pack --stdin" needs an access to an existing repository,
but "git index-pack file.pack" to generate an .idx file that
corresponds to a packfile does not.
* jk/index-pack-wo-repo-from-stdin:
index-pack: skip collision check when not in repository
t: use nongit() function where applicable
index-pack: complain when --stdin is used outside of a repo
t5000: extract nongit function to test-lib-functions.sh
Merge branch 'jk/parseopt-usage-msg-opt' into maint
The function usage_msg_opt() has been updated to say "fatal:"
before the custom message programs give, when they want to die
with a message about wrong command line options followed by the
standard usage string.
* jk/parseopt-usage-msg-opt:
parse-options: print "fatal:" before usage_msg_opt()
Merge branch 'jk/quote-env-path-list-component' into maint
A recent update to receive-pack to make it easier to drop garbage
objects made it clear that GIT_ALTERNATE_OBJECT_DIRECTORIES cannot
have a pathname with a colon in it (no surprise!), and this in turn
made it impossible to push into a repository at such a path. This
has been fixed by introducing a quoting mechanism used when
appending such a path to the colon-separated list.
* jk/quote-env-path-list-component:
t5615-alternate-env: double-quotes in file names do not work on Windows
t5547-push-quarantine: run the path separator test on Windows, too
tmp-objdir: quote paths we add to alternates
alternates: accept double-quoted paths
Merge branch 'sb/sequencer-abort-safety' into maint
Unlike "git am --abort", "git cherry-pick --abort" moved HEAD back
to where cherry-pick started while picking multiple changes, when
the cherry-pick stopped to ask for help from the user, and the user
did "git reset --hard" to a different commit in order to re-attempt
the operation.
* sb/sequencer-abort-safety:
Revert "sequencer: remove useless get_dir() function"
sequencer: remove useless get_dir() function
sequencer: make sequencer abort safer
t3510: test that cherry-pick --abort does not unsafely change HEAD
am: change safe_to_abort()'s not rewinding error into a warning
am: fix filename in safe_to_abort() error message
"git pull --rebase", when there is no new commits on our side since
we forked from the upstream, should be able to fast-forward without
invoking "git rebase", but it didn't.
Merge branch 'ak/commit-only-allow-empty' into maint
"git commit --allow-empty --only" (no pathspec) with dirty index
ought to be an acceptable way to create a new commit that does not
change any paths, but it was forbidden, perhaps because nobody
needed it so far.
* ak/commit-only-allow-empty:
commit: remove 'Clever' message for --only --amend
commit: make --only --allow-empty work without paths
Merge branch 'jk/stash-disable-renames-internally' into maint
When diff.renames configuration is on (and with Git 2.9 and later,
it is enabled by default, which made it worse), "git stash"
misbehaved if a file is removed and another file with a very
similar content is added.
* jk/stash-disable-renames-internally:
stash: prefer plumbing over git-diff
Merge branch 'jk/http-walker-limit-redirect' into maint
Update the error messages from the dumb-http client when it fails
to obtain loose objects; we used to give sensible error message
only upon 404 but we now forbid unexpected redirects that needs to
be reported with something sensible.
* jk/http-walker-limit-redirect:
http-walker: complain about non-404 loose object errors
http: treat http-alternates like redirects
http: make redirects more obvious
remote-curl: rename shadowed options variable
http: always update the base URL for redirects
http: simplify update_url_from_redirect
Merge branch 'jc/renormalize-merge-kill-safer-crlf' into maint
Fix a corner case in merge-recursive regression that crept in
during 2.10 development cycle.
* jc/renormalize-merge-kill-safer-crlf:
convert: git cherry-pick -Xrenormalize did not work
merge-recursive: handle NULL in add_cacheinfo() correctly
cherry-pick: demonstrate a segmentation fault
The output from "git worktree list" was made in readdir() order,
and was unstable.
* nd/worktree-list-fixup:
worktree list: keep the list sorted
worktree.c: get_worktrees() takes a new flag argument
get_worktrees() must return main worktree as first item even on error
worktree: reorder an if statement
worktree.c: zero new 'struct worktree' on allocation
Merge branch 'hv/submodule-not-yet-pushed-fix' into maint
The code in "git push" to compute if any commit being pushed in the
superproject binds a commit in a submodule that hasn't been pushed
out was overly inefficient, making it unusable even for a small
project that does not have any submodule but have a reasonable
number of refs.
* hv/submodule-not-yet-pushed-fix:
submodule_needs_pushing(): explain the behaviour when we cannot answer
batch check whether submodule needs pushing into one call
serialize collection of refs that contain submodule changes
serialize collection of changed submodules
Merge branch 'dt/empty-submodule-in-merge' into maint
An empty directory in a working tree that can simply be nuked used
to interfere while merging or cherry-picking a change to create a
submodule directory there, which has been fixed..
* dt/empty-submodule-in-merge:
submodules: allow empty working-tree dirs in merge/cherry-pick
Update the isatty() emulation for Windows by updating the previous
hack that depended on internals of (older) MSVC runtime.
* js/mingw-isatty:
mingw: replace isatty() hack
mingw: fix colourization on Cygwin pseudo terminals
mingw: adjust is_console() to work with stdin
mingw: intercept isatty() to handle /dev/null as Git expects it
The character width table has been updated to match Unicode 9.0
* bb/unicode-9.0:
unicode_width.h: update the width tables to Unicode 9.0
update_unicode.sh: remove the plane filter
update_unicode.sh: automatically download newer definition files
update_unicode.sh: pin the uniset repo to a known good commit
update_unicode.sh: remove an unnecessary subshell level
update_unicode.sh: move it into contrib/update-unicode
There are some "gray areas" around when to omit braces from
a conditional or loop body. Since that seems to have
resulted in some arguments, let's be a little more clear
about our preferred style.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7810: avoid assumption about invalid regex syntax
A few of the tests want to check that "git grep -P -E" will
override -P with -E, and vice versa. To do so, we use a
regex with "\x{..}", which is valid in PCRE but not defined
by POSIX (for basic or extended regular expressions).
However, POSIX declares quite a lot of syntax, including
"\x", as "undefined". That leaves implementations free to
extend the standard if they choose. At least one, musl libc,
implements "\x" in the same way as PCRE. Our tests check
that "-E" complains about "\x", which fails with musl.
We can fix this by finding some construct which behaves
reliably on both PCRE and POSIX, but differently in each
system.
One such construct is the use of backslash inside brackets.
In PCRE, "[\d]" interprets "\d" as it would outside the
brackets, matching a digit. Whereas in POSIX, the backslash
must be treated literally, and we match either it or a
literal "d". Moreover, implementations are not free to
change this according to POSIX, so we should be able to rely
on it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
execv_dashed_external: wait for child on signal death
When you hit ^C to interrupt a git command going to a pager,
this usually leaves the pager running. But when a dashed
external is in use, the pager ends up in a funny state and
quits (but only after eating one more character from the
terminal!). This fixes it.
Explaining the reason will require a little background.
When git runs a pager, it's important for the git process to
hang around and wait for the pager to finish, even though it
has no more data to feed it. This is because git spawns the
pager as a child, and thus the git process is the session
leader on the terminal. After it dies, the pager will finish
its current read from the terminal (eating the one
character), and then get EIO trying to read again.
When you hit ^C, that sends SIGINT to git and to the pager,
and it's a similar situation. The pager ignores it, but the
git process needs to hang around until the pager is done. We
addressed that long ago in a3da882120 (pager: do
wait_for_pager on signal death, 2009-01-22).
But when you have a dashed external (or an alias pointing to
a builtin, which will re-exec git for the builtin), there's
an extra process in the mix. For instance, running:
$ git -c alias.l=log l
will end up with a process tree like:
git (parent)
\
git-log (child)
\
less (pager)
If you hit ^C, SIGINT goes to all of them. The pager ignores
it, and the child git process will end up in wait_for_pager().
But the parent git process will die, and the usual EIO
trouble happens.
So we really want the parent git process to wait_for_pager(),
but of course it doesn't know anything about the pager at
all, since it was started by the child. However, we can
have it wait on the git-log child, which in turn is waiting
on the pager. And that's what this patch does.
There are a few design decisions here worth explaining:
1. The new feature is attached to run-command's
clean_on_exit feature. Partly this is convenience,
since that feature already has a signal handler that
deals with child cleanup.
But it's also a meaningful connection. The main reason
that dashed externals use clean_on_exit is to bind the
two processes together. If somebody kills the parent
with a signal, we propagate that to the child (in this
instance with SIGINT, we do propagate but it doesn't
matter because the original signal went to the whole
process group). Likewise, we do not want the parent
to go away until the child has done so.
In a traditional Unix world, we'd probably accomplish
this binding by just having the parent execve() the
child directly. But since that doesn't work on Windows,
everything goes through run_command's more spawn-like
interface.
2. We do _not_ automatically waitpid() on any
clean_on_exit children. For dashed externals this makes
sense; we know that the parent is doing nothing but
waiting for the child to exit anyway. But with other
children, it's possible that the child, after getting
the signal, could be waiting on the parent to do
something (like closing a descriptor). If we were to
wait on such a child, we'd end up in a deadlock. So
this errs on the side of caution, and lets callers
enable the feature explicitly.
3. When we send children the cleanup signal, we send all
the signals first, before waiting on any children. This
is to avoid the case where one child might be waiting
on another one to exit, causing a deadlock. We inform
all of them that it's time to die before reaping any.
In practice, there is only ever one dashed external run
from a given process, so this doesn't matter much now.
But it future-proofs us if other callers start using
the wait_after_clean mechanism.
There's no automated test here, because it would end up racy
and unportable. But it's easy to reproduce the situation by
running the log command given above and hitting ^C.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
execv_dashed_external: stop exiting with negative code
When we try to exec a git sub-command, we pass along the
status code from run_command(). But that may return -1 if we
ran into an error with pipe() or execve(). This tends to
work (and end up as 255 due to twos-complement wraparound
and truncation), but in general it's probably a good idea to
avoid negative exit codes for portability.
We can easily translate to the normal generic "128" code we
get when syscalls cause us to die.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we run a dashed external, we use the one-liner
run_command_v_opt() to do so. Let's switch to using a
child_process struct, which has two advantages:
1. We can drop all of the allocation and cleanup code for
building our custom argv array, and just rely on the
builtin argv_array (at the minor cost of doing a few
extra mallocs).
2. We have access to the complete range of child_process
options, not just the ones that the "_opt()" form can
forward.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_exec_path: do not return the result of getenv()
The result of getenv() is not guaranteed by POSIX to last
beyond another call to getenv(), or setenv(), etc. We
should duplicate the string before returning to the caller
to avoid any surprises.
We already keep a cached pointer to avoid repeatedly leaking
the result of system_path(). We can use the same pointer
here to avoid allocating and leaking for each call.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_exec_path: avoid Coverity warning about unfree()d result
Technically, it is correct that git_exec_path() returns a possibly
malloc()ed string returned from system_path(), and it is sometimes
not allocated. Cache the result in a static variable and make sure
that we call system_path() only once, which plugs a potential leak.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
blame: output porcelain "previous" header for each file
It's possible for content currently found in one file to
have originated in two separate files, each of which may
have been modified in some single older commit. The
--porcelain output generates an incorrect "previous" header
in this case, whereas --line-porcelain gets it right. The
problem is that the porcelain output tries to omit repeated
details of commits, and treats "previous" as a property of
the commit, when it is really a property of the blamed block
of lines.
Let's look at an example. In a case like this, you might see
this output from --line-porcelain:
The "filename" fields tell us that the two lines are from
two different files. But notice that the filename also
appears in the "previous" field, which tells us where to
start a re-blame. The second content line never appeared in
file_one at all, so we would obviously need to re-blame from
file_two (or possibly even some other file, if had just been
renamed to file_two in SOME_SHA1).
So far so good. Now here's what --porcelain looks like:
We've dropped the author and committer fields from the
second line, as they would just be repeats. But we can't
omit "filename", because it depends on the actual block of
blamed lines, not just the commit. This is handled by
emit_porcelain_details(), which will show the filename
either if it is the first mention of the commit _or_ if the
commit has multiple paths in it.
But we don't give "previous" the same handling. It's written
inside emit_one_suspect_detail(), which bails early if we've
already seen that commit. And so the output above is wrong;
a reader would assume that the correct place to re-blame
line two is from file_one, but that's obviously nonsense.
Let's treat "previous" the same as "filename", and show it
fresh whenever we know we are in a confusing case like this.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
You can already ask blame for full sha1s with "-l" or with
"--abbrev=40". But for consistency with other parts of Git,
we should support "--no-abbrev".
Worse, blame already accepts --no-abbrev, but it's totally
broken. When we see --no-abbrev, the abbrev variable is set
to 0, which is then used as a printf precision. For regular
sha1s, that means we print nothing at all (which is very
wrong). For boundary commits we decrement it to "-1", which
printf interprets as "no limit" (which is almost correct,
except it misses the 39-length magic explained in the
previous commit).
Let's detect --no-abbrev and behave as if --abbrev=40 was
given.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The blame command internally adds 1 to any requested sha1
abbreviation length, and then subtracts it when outputting a
boundary commit. This lets regular and boundary sha1s line
up visually, but it misses one corner case.
When the requested length is 40, we bump the value to 41.
But since we only have 40 characters, that's all we can show
(fortunately the truncation is done by a printf precision
field, so it never tries to read past the end of the
buffer). So a normal sha1 shows 40 hex characters, and a
boundary sha1 shows "^" plus 40 hex characters. The result
is misaligned.
The "-l" option to show long sha1s gets around this by
skipping the "abbrev" variable entirely and just always
using GIT_SHA1_HEXSZ. This avoids the "+1" issue, but it
does mean that boundary commits only have 39 characters
printed. This is somewhat odd, but it does look good
visually: the results are aligned and left-justified. The
alternative would be to allocate an extra column that would
contain either an extra space or the "^" boundary marker.
As this is by definition the human-readable view, it's
probably not that big a deal either way (and of course
--porcelain, etc, correctly produce correct 40-hex sha1s).
But for consistency, this patch teaches --abbrev=40 to
produce the same output as "-l" (always left-aligned, with
40-hex for normal sha1s, and "^" plus 39-hex for
boundaries).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generate the squash commit message incrementally running
a sed script once for each commit. It parses "This is
a combination of <N> commits" from the first line of the
existing message, adds one to <N>, and uses the result as
the number of our current message.
Since f2d17068fd (i18n: rebase-interactive: mark comments of
squash for translation, 2016-06-17), the first line may be
localized, and sed uses a pretty liberal regex, looking for:
/^#.*([0-9][0-9]*)/
The "[0-9][0-9]*" tries to match double digits, but it
doesn't quite work. The first ".*" is greedy, so if you
have:
This is a combination of 10 commits.
it will eat up "This is a combination of 1", leaving "0" to
match the first "[0-9]" digit, and then skipping the
optional match of "[0-9]*".
As a result, the count resets every 10 commits, and a
15-commit squash would end up as:
# This is a combination of 5 commits.
# This is the 1st commit message:
...
# This is the commit message #2:
... and so on ..
# This is the commit message #10:
...
# This is the commit message #1:
...
# This is the commit message #2:
... etc, up to 5 ...
We can fix this by making the ".*" less greedy. Instead of
depending on ".*?" working portably, we can just limit the
match to non-digit characters, which accomplishes the same
thing.
Reported-by: Brandon Tolsch <btolsch@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 4aff646d17 (archive-zip: mark text files in archives,
2015-03-05), the zip archiver will look at the userdiff
driver to decide whether a file is text or binary. This
usually doesn't need to look any further than the attributes
themselves (e.g., "-diff", etc). But if the user defines a
custom driver like "diff=foo", we need to look at
"diff.foo.binary" in the config. Prior to this patch, we
didn't actually load it.