Git over HTTPS has a high request startup latency, since the SSL
negotiation can take up to a second. In order to reduce this latency,
connections should be left open to the Git server across requests
(or invocations of the git commandline).
Reduce SSL startup latency by running a daemon job that keeps
connections open to a Git server. The daemon job
(git-remote-persistent-https--proxy) is started on the first request
through the client binary (git-remote-persistent-https) and remains
running for 24 hours after the last request, or until a new daemon
binary is placed in the PATH. The client determines the daemon's
HTTP address by communicating over a UNIX socket with the daemon.
From there, the rest of the Git protocol work is delegated to the
"git-remote-http" binary, with the environment's http_proxy set to
the daemon.
Accessing /pub/scm/linux/kernel/git/torvalds/linux repository hosted
at kernel.googlesource.com with "git ls-remote" over https:// and
persistent-https:// 5 times shows that the first request takes about
the same time (0.193s vs 0.208s---there is a slight set-up cost for
the local proxy); as expected, the other four requests are much
faster (~0.18s vs ~0.08s).
Incidentally, this also has the benefit of HTTP keep-alive working
across Git command invocations. Its common for servers to use a 5
minute keep-alive on an HTTP 1.1 connection. Git-over-HTTP commonly
uses Transfer-Encoding: chunked on replies, so keep-alive will
generally just work, even though a pack stream's length isn't known
in advance. Because the helper is an external process holding that
connection open, we also benefit from being able to reuse an
existing TCP connection to the server. The same "git ls-remote"
test against http:// vs persistent-https:// URL shows that the
former takes ~0.09s while the first request for the latter is about
0.134s with set-up cost, and subsequent requests are ~0.065s,
shaving around one RTT to the server.
Signed-off-by: Colby Ranger <cranger@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 9765b6a (rebase: align variable content, 2011-02-06), the code
to error out was moved up one level. Unfortunately, one reference
to a function parameter wasn't rewritten as it should, leading to
the wrong parameter being errored on.
This error was propagated by 71786f5 (rebase: factor out reference
parsing, 2011-02-06) and merged in 78c6e0f (Merge branch
'mz/rebase', 2011-04-28).
Correct this by reporting $onto_name istead.
Reported-By: Manuela Hutter <manuelah@opera.com> Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 6cf378f (docs: stop using asciidoc no-inline-literal),
we no longer support asciidoc versions less than 8.4.1,
which introduced inline literals. Note this in the INSTALL
document.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The earlier "--keep-redundant-commit" series broke "cherry-pick"
that is given a commit whose change is already in the current
history. Such a cherry-pick would result in an empty change, and
should stop with an error, telling the user that conflict resolution
may have made the result empty (which is exactly what is happening),
but we silently dropped the change on the floor without any message
nor non-zero exit code.
submodules: print "registered for path" message only once
Since 2cd9de3e (submodule add: always initialize .git/config entry) the
message "Submodule '\$name' (\$url) registered for path '\$sm_path'" is
printed every time cmd_init() is called, e.g. each time "git submodule
update" is used with the --init option.
This was not intended and leads to bogus output which can confuse users
and build systems. Apart from that the $url variable was not set after the
first run which did the actual initialization and only "()" was printed
in subsequent runs where "($url)" was meant to inform the user about the
upstream repo.
Fix that by moving the say command in question into the if block where the
url is initialized, restoring the behavior that was in place before the 2cd9de3e commit. While at it also remove the comment which still describes
the logic used before 2cd9de3e and add a comment about how things work now.
Reported-by: Nicolas Viennot and Sid Nair <nicolas@viennot.com> Reported-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
By Jonathan Nieder
via Eric Wong
* git://bogomips.org/git-svn:
git-svn: make Git::SVN::Fetcher a separate file
git-svn: rename SVN::Git::* packages to Git::SVN::*
git-svn: move Git::SVN::Prompt into its own file
This test is pretty old and did not follow some of our more
modern best practices. In particular:
1. It chdir'd all over the place, leaving later tests to
deal with the fallout. Do our chdirs in subshells
instead.
2. It did not use test_must_fail.
3. It did not use test_line_count.
4. It checked for the non-existence of a ref by looking in the
.git/refs directory (since we pack refs during clone
these days, this will always be succeed, making the
test useless).
Note that one call to "-e .git/refs/..." remains,
because it is checking for the existence of a symbolic
ref, not a ref itself.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
By Vitor Antunes
* va/git-p4-test:
git-p4: Clean up branch test cases
git-p4: Verify detection of "empty" branch creation
git-p4: Test changelists touching two branches
Fixes quite a lot of brokenness when ident information needs to be taken
from the system and cleans up the code.
By Jeff King
* jk/ident-gecos-strbuf: (22 commits)
format-patch: do not use bogus email addresses in message ids
ident: reject bogus email addresses with IDENT_STRICT
ident: rename IDENT_ERROR_ON_NO_NAME to IDENT_STRICT
format-patch: use GIT_COMMITTER_EMAIL in message ids
ident: let callers omit name with fmt_indent
ident: refactor NO_DATE flag in fmt_ident
ident: reword empty ident error message
format-patch: refactor get_patch_filename
ident: trim whitespace from default name/email
ident: use a dynamic strbuf in fmt_ident
ident: use full dns names to generate email addresses
ident: report passwd errors with a more friendly message
drop length limitations on gecos-derived names and emails
ident: don't write fallback username into git_default_name
fmt_ident: drop IDENT_WARN_ON_NO_NAME code
format-patch: use default email for generating message ids
ident: trim trailing newline from /etc/mailname
move git_default_* variables to ident.c
move identity config parsing to ident.c
fmt-merge-msg: don't use static buffer in record_person
...
The way "fetch-pack" that is given multiple references to fetch tried to
remove duplicates was very inefficient.
By Jeff King
* jk/fetch-pack-remove-dups-optim:
fetch-pack: sort incoming heads list earlier
fetch-pack: avoid quadratic loop in filter_refs
fetch-pack: sort the list of incoming refs
add sorting infrastructure for list refs
fetch-pack: avoid quadratic behavior in remove_duplicates
fetch-pack: sort incoming heads
Avoid unnecessary temporary allocations while looking for matching refs
inside refs API.
By René Scharfe (3) and Junio C Hamano (1)
* rs/refs-string-slice:
refs: do not create ref_entry when searching
refs: use strings directly in find_containing_dir()
refs: convert parameter of create_dir_entry() to length-limited string
refs: convert parameter of search_ref_dir() to length-limited string
Tighten constness of some local variables in a callchain.
By Michael Haggerty
* mh/fetch-pack-constness:
cmd_fetch_pack(): respect constness of argv parameter
cmd_fetch_pack(): combine the loop termination conditions
cmd_fetch_pack(): handle non-option arguments outside of the loop
cmd_fetch_pack(): declare dest to be const
Do not autosquash in case of an implied interactive rebase
The option to autosquash is only used in case of an interactive rebase.
When merges are preserved, rebase uses an interactive rebase internally,
but in this case autosquash should still be disabled.
Signed-off-by: Vincent van Ravesteijn <vfr@lyx.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce cost of deletion in levenstein distance (4 -> 3)
Before this patch, a character deletion has the same cost as 2 swaps, or
4 additions, so Git prefers suggesting a completely scrambled command
name to removing a character. For example, "git tags" suggests "stage",
but not "tag".
By setting the deletion cost to 3, we keep it higher than swaps or
additions, but prefer 1 deletion to 2 swaps. "git tags" now suggests
"tag" in addition to staged.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-p4: Verify detection of "empty" branch creation
Current implementation of new branch parent detection works on the
principle that the new branch is a complete integration, with no
changes, of the original files.
This test shows this deficiency in the particular case when the new
branch is created from a subset of the original files.
Signed-off-by: Vitor Antunes <vitor.hda@gmail.com> Acked-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to modify two different branches in P4 in a single
changelist. git-p4 correctly detects this and commits the relevant
changes to the different branches separately. This test proves that and
avoid future regressions in this behavior.
Signed-off-by: Vitor Antunes <vitor.hda@gmail.com> Acked-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch removes a chunk of code (the Git::SVN::Fetcher consumer of
libsvn's tree delta protocol) from git-svn.perl and documents its
interface so the hurried reader does not have to read that code right
away.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Eric Wong <normalperson@yhbt.net>
git-svn: rename SVN::Git::* packages to Git::SVN::*
Using names in the Git:: namespace means these cannot conflict with a
hypothetical binding teaching Subversion to interact with git
repositories.
Currently the packages are private to git-svn.perl so the choice of
name isn't likely to make much difference. This change is mainly
meant as preparation for splitting out the packages in question as
modules on the public search path.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Eric Wong <normalperson@yhbt.net>
git-svn.perl is very long (around 6500 lines) and although it is
nicely split into modules, some new readers do not even notice --- it
is too distracting to see all this functionality collected in a single
file.
Splitting it into multiple files would make it easier for people
to read individual modules straight through and to experiment with
components separately.
Let's start with Git::SVN::Prompt. For simplicity, we install this as
a module in the standard search path, just like the existing Git and
Git::I18N modules. In the process, add a manpage explaining its
interface and that it is not likely to be useful for other projects to
avoid confusion.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Eric Wong <normalperson@yhbt.net>
"git grep -e '$pattern'", unlike the case where the patterns are read from
a file, did not treat individual lines in the given pattern argument as
separate regular expressions as it should.
When a submodule repository uses alternate object store mechanism, some
commands that were started from the superproject did not notice it and
failed with "No such object" errors. The subcommands of "git submodule"
command that recursed into the submodule in a separate process were OK;
only the ones that cheated and peeked directly into the submodule's
repository from the primary process were affected.
By Heiko Voigt
* hv/submodule-alt-odb:
teach add_submodule_odb() to look for alternates
fmt-merge-message: add empty line between tag and signature verification
When adding the information from a tag, put an empty line between the
message of the tag and the commented-out signature verification
information.
At least for the kernel workflow, I often end up re-formatting the message
that people send me in the tag data. In that situation, putting the tag
message and the tag signature verification back-to-back then means that
normal editor "reflow parapgraph" command will get confused and think that
the signature is a continuation of the last message paragraph.
So I always end up having to first add an empty line, and then go back and
reflow the last paragraph. Let's just do it in git directly.
The extra vertical space also makes the verification visually stand out
more from the user-supplied message, so it looks a bit more readable to me
too, but that may be just an odd personal preference.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It turns out that the comment is a remnant from older days when the
heading said ".git/config" (which is indeed relative to the top of the
worktree).
It was only when the heading was changed to refer more precisely to
<git dir>/config (see v1.5.3.2~18, AsciiDoc tweak to avoid leading
dot, 2007-09-14) that the parenthesis stopped making sense. Remove
it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
format-patch: do not use bogus email addresses in message ids
We can ask git_committer_info to be strict about coming up
with an email, which will die automatically on a poorly
configured machine. This is better than letting invalid
message-ids into the wild.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ident: reject bogus email addresses with IDENT_STRICT
If we come up with a hostname like "foo.(none)" because the
user's machine is not fully qualified, we should reject this
in strict mode (e.g., when we are making a commit object),
just as we reject an empty gecos username.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/maint-status-porcelain-z-b' into maint
"git status --porcelain" ignored "--branch" option by mistake. The output
for "git status --branch -z" was also incorrect and did not terminate the
record for the current branch name with NUL as asked.
By Jeff King
* jk/maint-status-porcelain-z-b:
status: respect "-b" for porcelain format
status: fix null termination with "-b"
status: refactor null_termination option
commit: refactor option parsing
ident: rename IDENT_ERROR_ON_NO_NAME to IDENT_STRICT
Callers who ask for ERROR_ON_NO_NAME are not so much
concerned that the name will be blank (because, after all,
we will fall back to using the username), but rather it is a
check to make sure that low-quality identities do not end up
in things like commit messages or emails (whereas it is OK
for them to end up in things like reflogs).
When future commits add more quality checks on the identity,
each of these callers would want to use those checks, too.
Rather than modify each of them later to add a new flag,
let's refactor the flag.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
format-patch: use GIT_COMMITTER_EMAIL in message ids
Before commit 43ae9f4, we generated the tail of a message id
by calling git_committer_info and parsing the email out of
the result. 43ae9f4 changed to use ident_default_email
directly, so we didn't have to bother with parsing. As a
side effect, it meant we no longer used GIT_COMMITTER_EMAIL
at all.
In general, this is probably reasonable behavior. Either the
default email is sane on your system, or you are using
user.email to provide something sane. The exception is if
you rely on GIT_COMMITTER_EMAIL being set all the time to
override the bogus generated email.
This is unlikely to match anybody's real-life setup, but we
do use it in the test environment. And furthermore, it's
what we have always done, and the change in 43ae9f4 was
about cleaning up, not fixing any bug; we should be
conservative and keep the behavior identical.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callers want to see all of "$name <$email> $date", but
a few want only limited parts, omitting the date, or even
the name. We already have IDENT_NO_DATE to handle the date
part, but there's not a good option for getting just the
email. Callers have to done one of:
1. Call ident_default_email; this does not respect
environment variables, nor does it promise to trim
whitespace or other crud from the result.
2. Call git_{committer,author}_info; this returns the name
and email, leaving the caller to parse out the wanted
bits.
This patch adds IDENT_NO_NAME; it stops short of adding
IDENT_NO_EMAIL, as no callers want it (nor are likely to),
and it complicates the error handling of the function.
When no name is requested, the angle brackets (<>) around
the email address are also omitted.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a short-hand, we extract this flag into the local
variable "name_addr_only". It's more accurate to simply
negate this and refer to it as "want_date", which will be
less confusing when we add more NO_* flags.
While we're touching this part of the code, let's move the
call to ident_default_date() only when we are actually going
to use it, not when we have NO_DATE set, or when we get a
date from the environment.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's on point in printing the name, since it is by
definition the empty string if we have reached this code
path. Instead, let's be more clear that we are complaining
about the empty name, but still show the email address that
it is attached to (since that may provide some context to
the user).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid sorting if references are added to ref_cache in order
The old code allowed many references to be efficiently added to a
single directory, because it just appended the references to the
containing directory unsorted without doing any searching (and
therefore without requiring any intermediate sorting). But the old
code was inefficient when a large number of subdirectories were added
to a directory, because the directory always had to be searched to see
if the new subdirectory already existed, and this search required the
directory to be sorted first. The same was repeated for every new
subdirectory, so the time scaled like O(N^2), where N is the number of
subdirectories within a single directory.
In practice, references are often added to the ref_cache in
lexicographic order, for example when reading the packed-refs file.
So build some intelligence into add_entry_to_dir() to optimize for the
case of references and/or subdirectories being added in lexicographic
order: if the existing entries were already sorted, and the new entry
comes after the last existing entry, then adjust ref_dir::sorted to
reflect the fact that the ref_dir is still sorted.
Thanks to Peff for pointing out the performance regression that
inspired this change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If stderr isn't a tty, we shouldn't be printing incremental progress
messages. In particular, this affects 'git checkout -f . >&logfile'
unless you provided -q. And git-new-workdir has no way to provide -q.
It would probably be better to have progress.c check isatty(2) all the time,
but that wouldn't allow things like 'git push --progress' to force progress
reporting to on, so I won't try to solve the general case right now.
Actual fix suggested by Jeff King.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
osxkeychain: pull make config from top-level directory
The default compiler and cflags were mostly "works for me"
when I built the original version. We need to be much less
careful here than usual, because we know we are building
only on OS X. But it's only polite to at least respect the
CFLAGS and CC definitions that the user may have provided
earlier.
While we're at it, let's update our definitions and rules to
be more like the top-level Makefile; default our CFLAGS to
include -O2, and make sure we use CFLAGS and LDFLAGS when
linking.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 4435968 started sorting heads fed to fetch-pack so
that later commits could use more optimized algorithms;
commit 7db8d53 switched the remove_duplicates function to
such an algorithm.
Of course, the sorting is more effective if you do it
_before_ the algorithm in question.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff: import new 32-bit version of count_masked_bytes()
Import the latest 32-bit implementation of count_masked_bytes() from
Linux (arch/x86/include/asm/word-at-a-time.h). It's shorter and avoids
overflows and negative numbers.
This fixes test failures on 32-bit, where negative partial results had
been shifted right using the "wrong" method (logical shift right instead
of arithmetic short right). The compiler is free to chose the method,
so it was only wrong in the sense that it didn't work as intended by us.
Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff: avoid more compiler warnings with XDL_FAST_HASH on 32-bit machines
Hide literals that can cause compiler warnings for 32-bit architectures in
expressions that evaluate to small numbers there. Some compilers warn that
0x0001020304050608 won't fit into a 32-bit long, others that shifting right
by 56 bits clears a 32-bit value completely.
The correct values are calculated in the 64-bit case, which is all that matters
in this if-branch.
Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff: avoid compiler warnings with XDL_FAST_HASH on 32-bit machines
Import macro REPEAT_BYTE from Linux (arch/x86/include/asm/word-at-a-time.h)
to avoid 64-bit integer literals, which cause some 32-bit compilers to
print warnings.
Reported-by: Øyvind A. Holm <sunny@sunbase.org> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The search_ref_dir() function is about looking up an existing ref_entry in
a sorted array of ref_entry stored in dir->entries, but it still allocates
a new ref_entry and frees it before returning. This is only because the
call to bsearch(3) was coded in a suboptimal way. Unlike the comparison
function given to qsort(3), the first parameter to its comparison function
does not need to point at an object that is shaped like an element in the
array.
Introduce a new comparison function that takes a counted string as the key
and an element in an array of ref_entry and give it to bsearch(), so that
we do not have to allocate a new ref_entry that we will never return to
the caller anyway.
refs: use strings directly in find_containing_dir()
Convert the parameter subdirname of search_for_subdir() to a
length-limted string and then simply pass the interesting slice of the
refname from find_containing_dir(), thereby avoiding to duplicate the
string.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a list of refs that we want to compare against the
"match" array. The current code searches the match list
linearly, giving quadratic behavior over the number of refs
when you want to fetch all of them.
Instead, we can compare the lists as we go, giving us linear
behavior.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having the list sorted means we can avoid some quadratic
algorithms when comparing lists.
These should typically be sorted already, but they do come
from the remote, so let's be extra careful. Our ref-sorting
implementation does a mergesort, so we do not have to care
about performance degrading in the common case that the list
is already sorted.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch-pack: avoid quadratic behavior in remove_duplicates
We remove duplicate entries from the list of refs we are
fed in fetch-pack. The original algorithm is quadratic over
the number of refs, but since the list is now guaranteed to
be sorted, we can do it in linear time.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason to preserve the incoming order of the
heads we're requested to fetch. By having them sorted, we
can replace some of the quadratic algorithms with linear
ones.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_fetch_pack(): respect constness of argv parameter
The old code cast away the constness of the strings passed to the
function in argument argv[], which could result in their being
modified by filter_refs(). Fix by copying reference names from argv
and putting them into our own array (similarly to how refnames passed
to stdin were already handled).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
avoid segfault when reading header of malformed commits
If a commit object has a header line at the end of the
buffer that is missing its newline (or if it appears so
because the content on the header line contains a stray
NUL), then git will segfault.
Interestingly, this case is explicitly handled and we do
correctly scan the final line for the header we are looking
for. But if we don't find it, we will dereference NULL while
trying to look at the next line.
Git will never generate such a commit, but it's good to be
defensive. We could die() in such a case, but since it's
easy enough to handle it gracefully, let's just issue a
warning and continue (so you could still view such a commit
with "git show", though you might be missing headers after
the NUL).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pretty: avoid buffer overflow in format_person_part
When we parse the name and email from a commit to
pretty-print them, we usually can just put the result
directly into our strbuf result. However, if we are going to
use the mailmap, then we must first copy them into a
NUL-terminated buffer to feed to the mailmap machinery.
We did so by using strlcpy into a static buffer, but we used
it wrong. We fed it the length of the substring we wanted to
copy, but never checked that that length was less than the
size of the destination buffer.
The simplest fix is to just use snprintf to copy the
substring properly while still respecting the destination
buffer's size. It might seem like replacing the static
buffer with a strbuf would help, but we need to feed a
static buffer to the mailmap machinery anyway, so there's
not much benefit to handling arbitrary sizes.
A more ideal solution would be for mailmap to grow an
interface that:
1. Takes a pointer and length combination, instead of
assuming a NUL-terminated string.
2. Returns a pointer to the mailmap's allocated string,
rather than copying it into the buffer.
Then we could avoid the need for an extra buffer entirely.
However, doing this would involve a lot of refactoring of
mailmap and of string_list (which mailmap uses to store the
map itself). For now, let's do the simplest thing to fix the
bug.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 4b340cf split the logic to parse an ident line out of
pretty.c's format_person_part. But in doing so, it
accidentally introduced an off-by-one error that caused it
to think that single-character names were invalid.
This manifested itself as the "%an" format failing to show
anything at all for a single-character name.
Reported-by: Brian Turner <bturner@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_patch_filename function expects a commit argument
and uses it to get the sanitized subject line when making a
patch filename. However, we also want to use this same
function for the cover letter, which does not have a commit
object. The current solution is to create a fake commit with
the subject "cover letter". Instead, let's make the
get_patch_filename interface more flexibile, and allow
passing a direct subject.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually these values get fed to fmt_ident, which will trim
any cruft anyway, but there are a few code paths which use
them directly. Let's clean them up for the benefit of those
callers. Furthermore, fmt_ident will look at the pre-trimmed
value and decide whether to invoke ERROR_ON_NO_NAME; this
check can be fooled by a name consisting only of spaces.
Note that we only bother to clean up when we are pulling the
information from gecos or from system files. Any other value
comes from a config file, where we will have cleaned up
accidental whitespace already.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we accept arbitrary-sized names and email
addresses, the only remaining limit is in the actual
formatting of the names into a buffer. The current limit is
1000 characters, which is not likely to be reached, but
using a strbuf is one less error condition we have to worry
about.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ident: use full dns names to generate email addresses
When we construct an email address from the username and
hostname, we generate the host part of the email with this
procedure:
1. add the result of gethostname
2. if it has a dot, ok, it's fully qualified
3. if not, then look up the unqualified hostname via
gethostbyname; take the domain name of the result and
append it to the hostname
Step 3 can actually produce a bogus result, as the name
returned by gethostbyname may not be related to the hostname
we fed it (e.g., consider a machine "foo" with names
"foo.one.example.com" and "bar.two.example.com"; we may have
the latter returned and generate the bogus name
"foo.two.example.com").
This patch simply uses the full hostname returned by
gethostbyname. In the common case that the first part is the
same as the unqualified hostname, the behavior is identical.
And in the case that it is not the same, we are much more
likely to be generating a valid name.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ident: report passwd errors with a more friendly message
When getpwuid fails, we give a cute but cryptic message.
While it makes sense if you know that getpwuid or identity
functions are being called, this code is triggered behind
the scenes by quite a few git commands these days (e.g.,
receive-pack on a remote server might use it for a reflog;
the current message is hard to distinguish from an
authentication error). Let's switch to something that gives
a little more context.
While we're at it, we can factor out all of the
cut-and-pastes of the "you don't exist" message into a
wrapper function. Rather than provide xgetpwuid, let's make
it even more specific to just getting the passwd entry for
the current uid. That's the only way we use getpwuid anyway,
and it lets us make an even more specific error message.
The current message also fails to mention errno. While the
usual cause for getpwuid failing is that the user does not
exist, mentioning errno makes it easier to diagnose these
problems. Note that POSIX specifies that errno remain
untouched if the passwd entry does not exist (but will be
set on actual errors), whereas some systems will return
ENOENT or similar for a missing entry. We handle both cases
in our wrapper.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
drop length limitations on gecos-derived names and emails
When we pull the user's name from the GECOS field of the
passwd file (or generate an email address based on their
username and hostname), we put the result into a
static buffer. While it's extremely unlikely that anybody
ever hit these limits (after all, in such a case their
parents must have hated them), we still had to deal with the
error cases in our code.
Converting these static buffers to strbufs lets us simplify
the code and drop some error messages from the documentation
that have confused some users.
The conversion is mostly mechanical: replace string copies
with strbuf equivalents, and access the strbuf.buf directly.
There are a few exceptions:
- copy_gecos and copy_email are the big winners in code
reduction (since they no longer have to manage the
string length manually)
- git_ident_config wants to replace old versions of
the default name (e.g., if we read the config multiple
times), so it must reset+add to the strbuf instead of
just adding
Note that there is still one length limitation: the
gethostname interface requires us to provide a static
buffer, so we arbitrarily choose 1024 bytes for the
hostname.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ident: don't write fallback username into git_default_name
The fmt_ident function gets a flag that tells us whether to
die if the name field is blank. If it is blank and we don't
die, then we fall back to the username from the passwd file.
The current code writes the value into git_default_name.
However, that's not necessarily correct, as the empty value
might have come from git_default_name, or it might have been
passed in. This leads to two potential problems:
1. If we are overriding an empty name in the passed-in
value, then we may be overwriting a perfectly good name
(from gitconfig or gecos) in the git_default_name
buffer. Later calls to fmt_ident will end up using the
fallback name, even though a better name was available.
2. If we override an empty gecos name, we end up with the
fallback name in git_default_name. A later call that
uses IDENT_ERROR_ON_NO_NAME will see the fallback name
and think that it is a good name, instead of producing
an error. In other words, a blank gecos name would
cause an error with this code:
because in the latter case, the first call has polluted
the name buffer.
Instead, let's make the fallback a per-invocation variable.
We can just use the pw->pw_name string directly, since it
only needs to persist through the rest of the function (and
we don't do any other getpwent calls).
Note that while this solves (1) for future invocations of
fmt_indent, the current invocation might use the fallback
when it could in theory load a better value from
git_default_name. However, by not passing
IDENT_ERROR_ON_NO_NAME, the caller is indicating that it
does not care too much about the name, anyway, so we don't
bother; this is primarily about protecting future callers
who do care.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
format-patch: use default email for generating message ids
We try to generate a sane message id for cover letters and
threading by appending some changing bits to the front of
the user's email address. The current code parses the email
out of the results of git_committer_info, but we can do this
much more easily by just calling ident_default_email
ourselves.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use fgets to read the /etc/mailname file, which means we
will typically end up with an extra newline in our
git_default_email. Most of the time this doesn't matter, as
fmt_ident will skip it as cruft, but there is one code path
that accesses it directly (in http-push.c:lock_remote).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason anybody outside of ident.c should access
these directly (they should use the new accessors which make
sure the variables are initialized), so we can make them
file-scope statics.
While we're at it, move user_ident_explicitly_given into
ident.c; while still globally visible, it makes more sense
to reside with the ident code.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason for this to be in config, except that once
upon a time all of the config parsing was there. It makes
more sense to keep the ident code together.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fmt-merge-msg: don't use static buffer in record_person
The record_person function just parses out the "name" field
of the person line in a commit and adds it to a string_list.
The only reason we need an extra buffer is that the
string_list functions require a NUL-terminated string.
Instead of the static buffer, we can just allocate a
temporary NUL-terminated copy. In addition to removing a
useless limit, this removes the only user of MAX_GITNAME
outside of ident.c.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function sets up the default name, email, and date, and
is not publicly available. Let's split it into three public
functions so that callers can get just the parts they need.
While we're at it, let's change the interface to simple
accessors. The original function was called only by fmt_ident,
and contained logic for "if we already have some other
value, don't load the default" which properly belongs in
fmt_ident.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading patterns from a file, we pass the lines as allocated string
buffers to append_grep_pat() and never free them. That's not a problem
because they are needed until the program ends anyway.
However, now that the function duplicates the pattern string, we can
reuse the strbuf after calling that function. This simplifies the code
a bit and plugs a minor memory leak.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>