utf8.c: fix strbuf_utf8_replace() consuming data beyond input string
The main loop in strbuf_utf8_replace() could summed up as:
while ('src' is still valid) {
1) advance 'src' to copy ANSI escape sequences
2) advance 'src' to copy/replace visible characters
}
The problem is after #1, 'src' may have reached the end of the string
(so 'src' points to NUL) and #2 will continue to copy that NUL as if
it's a normal character. Because the output is stored in a strbuf,
this NUL accounted in the 'len' field as well. Check after #1 and
break the loop if necessary.
The test does not look obvious, but the combination of %>>() should
make a call trace like this
where %C(auto)%d would insert a color reset escape sequence in the end
of the string given to strbuf_utf8_replace() and show_log() uses
fwrite() to send everything to stdout (including the incorrect NUL
inserted by strbuf_utf8_replace)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
read-cache: check for leading symlinks when refreshing index
Don't add paths with leading symlinks to the index while refreshing; we
only track those symlinks themselves. We already ignore them while
preloading (see read_index_preload.c).
Reported-by: Nikolay Avdeev <avdeev@math.vsu.ru> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit c9a42c4 (bundle: allow rev-list options to exclude annotated
tags, 2009-01-02), support for excluding annotated tags outside the
specified date range was added. However, the wrong order of parameters
was chosen when calling memchr().
Fix this by swapping the character to search for with the maximum length
parameter. Also cover this behavior with an additional test.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you list stashes, you can provide arbitrary git-log
options to change the display. However, adding just "-p"
does nothing, because each stash is actually a merge commit.
This implementation detail is easy to forget, leading to
confused users who think "-p" is not working. We can make
this easier by defaulting to "--first-parent -m", which will
show the diff against the working tree. This omits the
index portion of the stash entirely, but it's simple and it
matches what "git stash show" provides.
People who are more clueful about stash's true form can use
"--cc" to override the "-m", and the "--first-parent" will
then do nothing. For diffs, it only affects non-combined
diffs, so "--cc" overrides it. And for the traversal, we are
walking the linear reflog anyway, so we do not even care
about the parents.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
branch.c: replace `git_config()` with `git_config_get_string()
Use `git_config_get_string()` instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow. While we are at
it, return -1 if we find no value for the queried variable. Original code
returned 0 for all cases, which was checked by `add_branch_desc()` in
fmt-merge-msg.c resulting in addition of a spurious newline to the `out`
strbuf. Now, the newline addition is skipped as -1 is returned to the caller
if no value is found.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whitespace breakages are checked while the patch is being parsed.
Disable them at the beginning of parse_chunk(), where each
individual patch is parsed, immediately after we learn the name of
the file the patch applies to and before we start parsing the diff
contained in the patch.
One may naively think that we should be able to not just skip the
whitespace checks but simply fast-forward to the next patch without
doing anything once use_patch() tells us that this patch is not
going to be used. But in reality we cannot really skip much of the
parsing in order to do such a "fast-forward", primarily because
parsing "@@ -k,l +m,n @@" lines and counting the input lines is how
we determine the boundaries of individual patches.
apply: use the right attribute for paths in non-Git patches
We parse each patchfile and find the name of the path the patch
applies to, and then use that name to consult the attribute system
to find the whitespace rules to be used, and also the target file
(either in the working tree or in the index) to replay the changes
against.
Unlike a Git-generated patch, a non-Git patch is taken to have the
pathnames relative to the current working directory. The names
found in such a patch are modified by prepending the prefix by the
prefix_patches() helper function introduced in 56185f49 (git-apply:
require -p<n> when working in a subdirectory., 2007-02-19).
However, this prefixing is done after the patch is fully parsed and
affects only what target files are patched. Because the attributes
are checked against the names found in the patch during the parsing,
not against the final pathname, the whitespace check that is done
during parsing ends up using attributes for a wrong path for non-Git
patches.
Fix this by doing the prefix much earlier, immediately after the
header part of each patch is parsed and we learn the name of the
path the patch affects.
Add tests for `git_config_get_string_const()`, check whether it
dies printing the line number and the file name if a NULL
value is retrieved for the given key.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Semantic errors (for example, for alias.* variables NULL values are
not allowed) in configuration files cause a die printing the line
number and file name of the offending value.
Add a test documenting that such errors cause a die printing the
accurate line number and file name.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.
Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
config: add `git_die_config()` to the config-set API
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.
It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,
if (!git_config_get_value(key, &value)){
if (!strcmp(value, "foo"))
git_config_die(key, "value: `%s` is illegal", value);
else
/* do work */
}
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.
Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.
Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.
Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a callback returns a negative value to `git_config*()` family,
they call `die()` while printing the line number and the file name.
Currently the printed line number is off by one, thus printing the
wrong line number.
Make `linenr` point to the line we just parsed during the call
to callback to get accurate line number in error messages.
Commit-message-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
imap-send doc: omit confusing "to use imap-send" modifier
It wouldn't make sense for these configuration variables to be
required for Git in general to function. 'Required' in this context
means required for git imap-send to work.
Noticed while trying to figure out what the sentence describing
imap.tunnel meant.
[jn: expanded to also simplify explanation of imap.folder and
imap.host in the same way]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pretty.c: make git_pretty_formats_config return -1 on git_config_string failure
`git_pretty_formats_config()` continues without checking git_config_string's
return value which can lead to a SEGFAULT. Instead return -1 when
git_config_string fails signalling `git_config()` to die printing the location
of the erroneous variable.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git archive's tar format uses extended pax headers to encode metadata
into the archive. Most tar implementations correctly treat these as
metadata, but some that do not understand the pax format extract these
as files instead. Apply the tar.umask setting to these entries to
prevent tampering by other users.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make tests pass on msysgit by mostly disabling ones that are
infeasible on that platform.
* sk/mingw-tests-workaround:
t800[12]: work around MSys limitation
t9902: mingw-specific fix for gitfile link files
t4210: skip command-line encoding tests on mingw
MinGW: disable legacy encoding tests
t0110/MinGW: skip tests that pass arbitrary bytes on the command line
MinGW: Skip test redirecting to fd 4
If the user provides an empty format with "--format=", we
end up putting in extra whitespace that the user cannot
prevent. This comes from two places:
1. If the format is missing a terminating newline, we add
one automatically. This makes sense for --format=%h, but
not for a truly empty format.
2. We add an extra newline between the pretty-printed
format and a diff or diffstat. If the format is empty,
there's no point in doing so if there's nothing to
separate.
With this patch, one can get a diff with no other cruft out
of "diff-tree --format= $commit".
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, we treated "--pretty=" or "--format=" as "give me
the default format". This was not planned nor documented,
but only what happened to work due to our parsing of
"--pretty" (which should give the default format).
Let's instead let these be an actual empty userformat.
Otherwise one must write out the annoyingly long
"--pretty=tformat:" to get the same behavior.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
revision: drop useless string offset when parsing "--pretty"
Once upon a time, we parsed pretty options by looking for
"--pretty" at the start of the string, and then feeding the
rest (including an "=") to get_commit_format. Later, commit 48ded91 (log --pretty: do not accept bogus "--prettyshort",
2008-05-25) split this into a separate check for "--pretty"
versus "--pretty=".
However, when parsing "--pretty", we still passed "arg+8" to
get_commit_format. This is useless, since it will always
point to the NUL terminator at the end of the string. We can
simply pass NULL instead; both parameters are treated the
same by get_commit_format.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit --amend: test specifies authorship but forgets to check
The test case "--amend option copies authorship" specifies that the
git-commit option `--amend` uses the authorship of the replaced
commit for the new commit. Add the omitted check that this property
actually holds.
Signed-off-by: Fabian Ruch <bafain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remotes are stored as an array, so looking one up or adding one without
duplication is an O(n) operation. Reading an entire config file full of
remotes is O(n^2) in the number of remotes. For a repository with tens of
thousands of remotes, the running time can hit multiple minutes.
Hash tables are way faster. So we add a hashmap from remote name to
struct remote and use it for all lookups. The time to add a new remote to
a repo that already has 50,000 remotes drops from ~2 minutes to < 1
second.
We retain the old array of remotes so iterators proceed in config-file
order.
Signed-off-by: Patrick Reynolds <patrick.reynolds@github.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the `config_set` C API as a set of simple commands in order to
facilitate testing. Add tests for the `config_set` API as well as for
`git_config_get_*()` family for the usual config files.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
add `config_set` API for caching config-like files
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.
Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.
Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.
Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Feeding the result of a real_path() call to real_path() again doesn't
change that result -- the returned path won't get any more real. Avoid
such a double call in set_git_dir_init() and for the same reason stop
calling real_path() before feeding paths to set_git_work_tree(), as the
latter already takes care of that.
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a PATH_MAX-sized buffer, which can be too small on some
file systems, use strbuf_getcwd(), which handles any path getcwd()
returns. Also preserve the errno set by strbuf_getcwd() instead of
setting it to ENAMETOOLONG; that way a more appropriate error message
can be shown based on the actual reason for failing.
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add strbuf_getcwd(), which puts the current working directory into a
strbuf. Because it doesn't use a fixed-size buffer it supports
arbitrarily long paths, provided the platform's getcwd() does as well.
At least on Linux and FreeBSD it handles paths longer than PATH_MAX
just fine.
Suggested-by: Karsten Blees <karsten.blees@gmail.com> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
t4013: test diff-tree's --stdin commit formatting
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
Once upon a time, git-log was just "rev-list | diff-tree",
and we did not bother to test it separately. These days git-log
is implemented internally, but we want to make sure that the
rev-list to diff-tree pipeline continues to function. Let's
add a basic sanity test.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'jk/alloc-commit-id-maint' into maint
* jk/alloc-commit-id-maint:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
We generally want to avoid lookup_unknown_object, because it
results in allocating more memory for the object than may be
strictly necessary.
In this case, it is used to check whether we have an
already-parsed object before calling parse_object, to save
us from reading the object from disk. Using lookup_object
would be fine for that purpose, but we can take it a step
further. Since this code was written, parse_object already
learned the "check lookup_object" optimization, so we can
simply call parse_object directly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of the "index" field of struct commit is that
every allocated commit would have one. It is supposed to be
an invariant that whenever object->type is set to
OBJ_COMMIT, we have a unique index.
Commit 969eba6 (commit: push commit_index update into
alloc_commit_node, 2014-06-10) covered this case for
newly-allocated commits. However, we may also allocate an
"unknown" object via lookup_unknown_object, and only later
convert it to a commit. We must make sure that we set the
commit index when we switch the type field.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.
While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only way that "obj" can be non-NULL is if it came from
one of the lookup_* functions. These functions always ensure
that the object has the expected type (and return NULL
otherwise), so there is no need for us to set the type.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the allocator functions for tree, blobs, etc are all
very similar, we originally used a macro to avoid repeating
ourselves. Since the prior commit, though, the heavy lifting
is done by an inline helper function. The macro does still
save us a few lines, but at some readability cost. It
obfuscates the function definitions (and makes them hard to
find via grep).
Much worse, though, is the fact that it isn't used
consistently for all allocators. Somebody coming later may
be tempted to modify DEFINE_ALLOCATOR, but they would miss
alloc_commit_node, which is treated specially.
Let's just drop the macro and write everything out
explicitly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>