archive-tar: write extended headers for file sizes >= 8GB
The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.
These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).
This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.
Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.
If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.
But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.
This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:
1. You have a file between 8GB and 64GB.
2. Your tar implementation _doesn't_ know about pax
extended headers.
3. Your tar implementation _does_ parse 12-byte sizes from
the ustar header without a delimiter.
It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.
Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. For values larger than this, we have to add pax
extended headers to specify the real data, and git does not
yet know how to do so.
Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.
We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:
1. being able to read pax headers
2. being able to handle huge sizes or mtimes
3. supporting a "t" format we can parse
So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:
dd if=/dev/zero seek=64G bs=1 count=1 of=huge
touch -d @68719476737 huge
tar cf - --format=pax |
head -c 2048
using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.
Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then took only
the first 2k of it. That should be enough to generate 2MB of
data before hitting an inflate error, which is plenty for us
to generate the tar header (and then die of SIGPIPE while
streaming the rest out).
The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.
Helped-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is sometimes useful to be able to read exactly N bytes from a
pipe. Doing this portably turns out to be surprisingly difficult
in shell scripts.
We want a solution that:
- is portable
- never reads more than N bytes due to buffering (which
would mean those bytes are not available to the next
program to read from the same pipe)
- handles partial reads by looping until N bytes are read
(or we see EOF)
- is resilient to stray signals giving us EINTR while
trying to read (even though we don't send them, things
like SIGWINCH could cause apparently-random failures)
Some possible solutions are:
- "head -c" is not portable, and implementations may
buffer (though GNU head does not)
- "read -N" is a bash-ism, and thus not portable
- "dd bs=$n count=1" does not handle partial reads. GNU dd
has iflags=fullblock, but that is not portable
- "dd bs=1 count=$n" fixes the partial read problem (all
reads are 1-byte, so there can be no partial response).
It does make a lot of write() calls, but for our tests
that's unlikely to matter. It's fairly portable. We
already use it in our tests, and it's unlikely that
implementations would screw up any of our criteria. The
most unknown one would be signal handling.
- perl can do a sysread() loop pretty easily. On my Linux
system, at least, it seems to restart the read() call
automatically. If that turns out not to be portable,
though, it would be easy for us to handle it.
That makes the perl solution the least bad (because we
conveniently omitted "length of code" as a criterion).
It's also what t9300 is currently using, so we can just pull
the implementation from there.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:
def foo
do_foo_stuff()
+ common_ending()
+end
+
+def bar
+ do_bar_stuff()
+
common_ending()
end
when the new heuristic is in use. In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.
Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.
A couple of bugs around core.autocrlf have been fixed.
* tb/core-eol-fix:
convert.c: ident + core.autocrlf didn't work
t0027: test cases for combined attributes
convert: allow core.autocrlf=input and core.eol=crlf
t0027: make commit_chk_wrnNNO() reliable
Merge branch 'ar/diff-args-osx-precompose' into maint
Many commands normalize command line arguments from NFD to NFC
variant of UTF-8 on OSX, but commands in the "diff" family did
not, causing "git diff $path" to complain that no such path is
known to Git. They have been taught to do the normalization.
* ar/diff-args-osx-precompose:
diff: run arguments through precompose_argv
"git rebase -i", after it fails to auto-resolve the conflict, had
an unnecessary call to "git rerere" from its very early days, which
was spotted recently; the call has been removed.
* js/rebase-i-dedup-call-to-rerere:
rebase -i: remove an unnecessary 'rerere' invocation
t2300: run git-sh-setup in an environment that better mimics the real life
When we run scripted Porcelains, "git" potty has set up the $PATH by
prepending $GIT_EXEC_PATH, the path given by "git --exec-path=$there
$cmd", etc. already. Because of this, scripted Porcelains can
dot-source shell script library like git-sh-setup with simple dot
without specifying any path.
t2300 however dot-sources git-sh-setup without adjusting $PATH like
the real "git" potty does. This has not been a problem so far, but
once git-sh-setup wants to rely on the $PATH adjustment, just like
any scripted Porcelains already do, it would become one. It cannot
for example dot-source another shell library without specifying the
full path to it by prefixing $(git --exec-path).
In t5500::check_prot_host_port_path(), diagport is not a variable
used elsewhere and the function is not recursively called so this
can simply lose the "local", which may not be supported by shell
(besides, the function liberally clobbers other variables without
making them "local").
t7403::reset_submodule_urls() overrides the "root" variable used
in the test framework for no good reason; its use is not about
temporarily relocating where the test repositories are created.
This assignment can be made not to clobber the variable by moving
them into the subshells it already uses. Its value is always
$TRASH_DIRECTORY, so we could use it instead there, and this
function that is called only once and its two subshells may not be
necessary (instead, the caller can use "git -C $there config" and
set a value that is derived from $TRASH_DIRECTORY), but this is a
minimum fix that is needed to lose "local".
Helped-by: John Keeping <john@keeping.me.uk> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio pointed out `relative_path` was using bashisms via the
local variables. As the longer term goal is to rewrite most of the
submodule code in C, do it now.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 48308681 (2016-02-29, git submodule update: have a dedicated helper
for cloning), the helper communicated errors back only via exit code,
and dance with printing '#unmatched' in case of error was left to
git-submodule.sh as it uses the output of the helper and pipes it into
shell commands. This change makes the helper consistent by never
printing '#unmatched' in the helper but always handling these piping
issues in the actual shell script.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
is unportable when some-program is actually a shell
function, like test_must_fail (on some shells FOO remains
set after the function returns, and on others it does not).
We sometimes get around this by using env, like:
test_must_fail env FOO=BAR some-program
But that only works because test_must_fail's arguments are
themselves a command which can be run. You can't run:
env FOO=BAR test_must_fail some-program
because env does not know about our shell functions. So
there is no equivalent for test_commit, for example, and one
must resort to:
(
FOO=BAR
export FOO
test_commit
)
which is a bit verbose. Let's add a version of "env" that
works _inside_ the shell, by creating a subshell, exporting
variables from its argument list, and running the command.
Its use is demonstrated on a currently-unportable case in
t4014.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct faulty recommendation to use "git submodule deinit ." when
de-initialising all submodules, which would result in a strange
error message in a pathological corner case.
* sb/submodule-deinit-all:
submodule deinit: require '--all' instead of '.' for all submodules
Merge branch 'jk/test-send-sh-x-trace-elsewhere' into maint
Running tests with '-x' option to trace the individual command
executions is a useful way to debug test scripts, but some tests
that capture the standard error stream and check what the command
said can be broken with the trace output mixed in. When running
our tests under "bash", however, we can redirect the trace output
to another file descriptor to keep the standard error of programs
being tested intact.
* jk/test-send-sh-x-trace-elsewhere:
test-lib: set BASH_XTRACEFD automatically
Merge branch 'js/name-rev-use-oldest-ref' into maint
"git describe --contains" often made a hard-to-justify choice of
tag to give name to a given commit, because it tried to come up
with a name with smallest number of hops from a tag, causing an old
commit whose close descendant that is recently tagged were not
described with respect to an old tag but with a newer tag. It did
not help that its computation of "hop" count was further tweaked to
penalize being on a side branch of a merge. The logic has been
updated to favor using the tag with the oldest tagger date, which
is a lot easier to explain to the end users: "We describe a commit
in terms of the (chronologically) oldest tag that contains the
commit."
* js/name-rev-use-oldest-ref:
name-rev: include taggerdate in considering the best name
rebase -i: remove an unnecessary 'rerere' invocation
Interactive rebase uses 'git cherry-pick' and 'git merge' to replay
commits. Both invoke the 'rerere' machinery when they fail due to merge
conflicts. Note that all code paths with these two commands also invoke
the shell function die_with_patch when the commands fail.
Since commit 629716d2 ("rerere: do use multiple variants") the second
operation of the rerere machinery can be observed by a duplicated
message "Recorded preimage for 'file'". This second operation records
the same preimage as the first one and, hence, only wastes cycles.
Remove the 'git rerere' invocation from die_with_patch.
Shell function die_with_patch can be called after the failure of
"git commit", too, which also calls into the rerere machinery, but it
does so only after a successful commit to record the resolution.
Therefore, it is wrong to call 'git rerere' from die_with_patch after
"git commit" fails.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch-all" has been sped up, by taking advantage
of the fact that it does not have to read a list of objects, in two
ways.
* jk/cat-file-buffered-batch-all:
cat-file: default to --buffer when --batch-all-objects is used
cat-file: avoid noop calls to sha1_object_info_extended
"git fast-import --export-marks" would overwrite the existing marks
file even when it makes a dump from its custom die routine.
Prevent it from doing so when we have an import-marks file but
haven't finished reading it.
* fc/fast-import-broken-marks-file:
fast-import: do not truncate exported marks file
Backticks are emphasized through monospaced styling in the HTML
version of Git documentation. But they were left unstyled in the
manual pages.
To make the man pages more comfortably read, `MAN_BOLD_LITERAL` was
added by 5121a6d (Documentation: option to render literal text as
bold for manpages, 2009-03-27). It allowed the user to build the
manpages with literals in bold style.
For precaution it was not set by default back then.
Since 79c461d (docs: default to more modern toolset, 2010-11-19), it
is assumed ASCIIDOC 8 and at least docbook-xsl 1.73 are used, so the
need for compatibility concern is much lessor now.
Remove `MAN_BOLD_LITERAL`, and typeset literals as bold by default .
Add `NO_MAN_BOLD_LITERAL`, a new Makefile option, disabling this
feature when defined.
Signed-off-by: Erwan MATHONIERE <erwan.mathoniere@grenoble-inp.org> Signed-off-by: Samuel GROOT <samuel.groot@grenoble-inp.org> Signed-off-by: Tom RUSSELLO <tom.russello@grenoble-inp.org> Signed-off-by: Matthieu MOY <matthieu.moy@grenoble-inp.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: move 'ifdef DEVELOPER' after config.mak* inclusion
The DEVELOPER knob was introduced in 658df95 (add DEVELOPER makefile
knob to check for acknowledged warnings, 2016-02-25), and works well
when used as "make DEVELOPER=1", and when the configure script was not
used.
However, the advice given in CodingGuidelines to add DEVELOPER=1 to
config.mak does not: config.mak is included after testing for
DEVELOPER in the Makefile, and at least GNU Make's manual specifies
"Conditional directives are parsed immediately", hence the config.mak
declaration is not visible at the time the conditional is evaluated.
Also, when using the configure script to generate a
config.mak.autogen, the later file contained a "CFLAGS = <flags>"
initialization, which overrode the "CFLAGS += -W..." triggered by
DEVELOPER.
This patch fixes both issues.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
CLI commands which are mentioned in the readme are now formatted with
the Markdown code syntax to make the documentation more readable.
Signed-off-by: Benjamin Dopplinger <b.dopplinger@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation: add instructions to help setup gmail 2FA
For those who use two-factor authentication with gmail, git-send-email
will not work unless it is setup with an app-specific password. The
example for setting up git-send-email for use with gmail will now
include information on generating and storing the app-specific password.
Signed-off-by: Michael Rappazzo <rappazzo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, .git and optionally any files whose name starts with a
dot are now marked as hidden, with a core.hideDotFiles knob to
customize this behaviour.
Merge branch 'kf/gpg-sig-verification-doc' into maint
Documentation for "git merge --verify-signatures" has been updated
to clarify that the signature of only the commit at the tip is
verified. Also the phrasing used for signature and key validity is
adjusted to align with that used by OpenPGP.
* va/i18n-misc-updates:
i18n: unpack-trees: avoid substituting only a verb in sentences
i18n: builtin/pull.c: split strings marked for translation
i18n: builtin/pull.c: mark placeholders for translation
i18n: git-parse-remote.sh: mark strings for translation
i18n: branch: move comment for translators
i18n: branch: unmark string for translation
i18n: builtin/rm.c: remove a comma ',' from string
i18n: unpack-trees: mark strings for translation
i18n: builtin/branch.c: mark option for translation
i18n: index-pack: use plural string instead of normal one
mingw: make isatty() recognize MSYS2's pseudo terminals (/dev/pty*)
MSYS2 emulates pseudo terminals via named pipes, and isatty() returns 0
for such file descriptors. Therefore, some interactive functionality
(such as launching a pager, asking if a failed unlink should be repeated
etc.) doesn't work when run in a terminal emulator that uses MSYS2's
ptys (such as mintty).
However, MSYS2 uses special names for its pty pipes ('msys-*-pty*'),
which allows us to distinguish them from normal piped input / output.
On startup, check if stdin / stdout / stderr are connected to such pipes
using the NtQueryObject API from NTDll.dll. If the names match, adjust
the flags in MSVCRT's ioinfo structure accordingly.
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24) converted cases of "sprintf" to
"xsnprintf", but accidentally left one as just "snprintf".
This meant that we could silently truncate the resulting
buffer instead of flagging an error.
In practice, this is impossible to achieve, as we are
formatting a ustar checksum, which can be at most 7
characters. But the point of xsnprintf is to document and
check for "should be impossible" conditions; this site was
just accidentally mis-converted during f2f0267.
Noticed-by: Paul Green <Paul.Green@stratus.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>