Use API to read blob data in smaller chunks in more places to reduce the
memory footprint.
By Nguyễn Thái Ngọc Duy (6) and Junio C Hamano (1)
* nd/stream-more:
update-server-info: respect core.bigfilethreshold
fsck: use streaming API for writing lost-found blobs
show: use streaming API for showing blobs
parse_object: avoid putting whole blob in core
cat-file: use streaming API to print blobs
Add more large blob test cases
streaming: make streaming-write-entry to be more reusable
Be more specific if upstream branch is not tracked
If the branch configured as upstream didn't have a local tracking
branch, git said "Upstream branch not found". We can be more helpful,
and separate the cases when upstream is not configured, and when it is
configured, but the upstream branch is not tracked in a local branch.
The following configuration leads to the second scenario:
Instead of just saying that no upstream exists for such branch,
which is true but not very helpful, check that there's no
refs/heads/barnhc_wiht_tpyo and tell it to the user.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide branch name in error message when using @{u}
When using @{u} or @{upstream} it is common to omit the branch name,
implying current branch. If the upstream is not configured, the error
message was "No upstream branch found for ''".
When resolving '@{u}', branch_get() is called, which almost always
returns a description of a branch. This allows us to use a branch name
in the error message, even if the user said something like '@{u}'.
The only case when branch_get() returns NULL is when HEAD points to so
something which is not a branch. Of course this also means that no
upstream is configured, but it is better to directly say that HEAD
does not point to a branch.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1507: add tests to document @{upstream} behaviour
In preparation for future changes, add tests which show error messages
with @{upstream} in various conditions:
- test branch@{u} with . as remote
- check error message for branch@{u} on a branch with
* no upstream,
* on a branch with a configured upstream which doesn't have a
remote-tracking branch
- check error message for branch@{u} when branch 'branch' does not
exist
- check error message for @{u} without the branch name
Right now the messages are very similar, but various cases can and
will be distinguished.
Note: test_i18ncmp is not used, because currently error output is not
internationalized. test_cmp will be switched to test_i18ncmp in a later
patch, when error messages are internationalized.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-import doc: cat-blob and ls responses need to be consumed quickly
If fast-import's command pipe and the frontend's cat-blob/ls response
pipe are both filled, there can be a deadlock. Luckily all existing
frontends consume any pending cat-blob/ls responses completely before
writing the next command.
Document the requirements so future frontend authors and users can be
spared from the problem, too. It is not always easy to catch that
kind of bug by testing.
To set the scene, add some words of explanation to help the novice
understand that "cat-blob" and "ls" output are meant for consumption
by the frontend.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodules: recursive fetch also checks new tags for submodule commits
Since 88a21979c (fetch/pull: recurse into submodules when necessary) all
fetched commits are examined if they contain submodule changes (unless
configuration or command line options inhibit that). If a newly recorded
submodule commit is not present in the submodule, a fetch is run inside
it to download that commit.
Checking new refs was done in an else branch where it wasn't executed for
tags. This normally isn't a problem because tags are only fetched with
the branches they live on, then checking the new commits in the fetched
branches for submodule commits will also process all tags. But when a
specific tag is fetched (or the refspec contains refs/tags/) commits only
reachable by tags won't be searched for submodule commits, which is a bug.
Fix that by moving the code outside the if/else construct to handle new
tags just like any other ref. The performance impact of adding tags that
most of the time lie on a branch which is checked anyway for new submodule
commit should be minimal, as since 6859de4 (fetch: avoid quadratic loop
checking for updated submodules) all ref-tips are collected first and then
fed to a single rev-list.
Spotted-by: Jeff King <peff@peff.net> Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
test: use test_i18ncmp for "Patch format detection failed" message
v1.7.8.5~2 (am: don't infloop for an empty input file, 2012-02-25)
added a check for the human-readable message "Patch format detection
failed." but we forgot to suppress that check when running tests with
git configured to write output in another language.
Noticed by running tests with GETTEXT_POISON=YesPlease.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
test: do not rely on US English tracking-info messages
When v1.7.9.2~28^2 (2012-02-02) marked "Your branch is behind" and
friends for translation, it forgot to adjust tests not to check those
messages when tests are being run with git configured to write its
output in another language.
With this patch applied, t2020 and t6040 pass again with
GETTEXT_POISON=YesPlease.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Explained-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply: document buffer ownership rules across functions
In general, the private functions in this file were not very
much documented; even though what each of them do is reasonably
self explanatory, the ownership rules for various buffers and
data structures were not very obvious.
gitweb: Refinement highlightning in combined diffs
The highlightning of combined diffs is currently disabled. This is
because output from a combined diff is much harder to highlight because
it is not obvious which removed and added lines should be compared.
Current code requires that the number of added lines is equal to the
number of removed lines and only skips first +/- character, treating
second +/- as a line content, Thus, it is not possible to simply use
existing algorithm unchanged for combined diffs.
Let's start with a simple case: only highlight changes that come from
one parent, i.e. when every removed line has a corresponding added line
for the same parent. This way the highlightning cannot get wrong. For
example, following diffs would be highlighted:
- removed line for first parent
+ added line for first parent
context line
-removed line for second parent
+added line for second parent
or
- removed line for first parent
-removed line for second parent
+ added line for first parent
+added line for second parent
but following output will not:
- removed line for first parent
-removed line for second parent
+added line for second parent
++added line for both parents
In other words, we require that pattern of '-'-es in pre-image matches
pattern of '+'-es in post-image.
Further changes may introduce more intelligent approach that better
handles combined diffs.
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reading diff output is sometimes very hard, even if it's colored,
especially if lines differ only in few characters. This is often true
when a commit fixes a typo or renames some variables or functions.
This commit teaches gitweb to highlight characters that are different
between old and new line with a light green/red background. This should
work in the similar manner as in Trac or GitHub.
The algorithm that compares lines is based on contrib/diff-highlight.
Basically, it works by determining common prefix/suffix of corresponding
lines and highlightning only the middle part of lines. For more
information, see contrib/diff-highlight/README.
Combined diffs are not supported but a following commit will change it.
Since we need to pass esc_html()'ed or esc_html_hl_regions()'ed lines to
format_diff_lines(), so it was taught to accept preformatted lines
passed as a reference.
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: Push formatting diff lines to print_diff_chunk()
Now lines are formatted closer to place where we actually use HTML
formatted output.
This means that we put raw lines in the @chunk accumulator, rather than
formatted lines. Because we still need to know class (type) of line
when accumulating data to post-process and print, process_diff_line()
subroutine was retired and replaced by diff_line_class() used in
git_patchset_body() and new restructured format_diff_line() used in
print_diff_chunk().
As a side effect, we have to pass \%from and \%to down to callstack.
This is a preparation patch for diff refinement highlightning. It's not
meant to change gitweb output.
[jn: wrote commit message]
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: Use print_diff_chunk() for both side-by-side and inline diffs
This renames print_sidebyside_diff_chunk() to print_diff_chunk() and
makes use of it for both side-by-side and inline diffs. Now diff lines
are always accumulated before they are printed. This opens the
possibility to preprocess diff output before it's printed, which is
needed for diff refinement highlightning (implemented in incoming
patches).
If print_diff_chunk() was left as is, the new function
print_inline_diff_lines() could reorder diff lines. It first prints all
context lines, then all removed lines and finally all added lines. If
the diff output consisted of mixed added and removed lines, gitweb would
reorder these lines. This is true for combined diff output, for
example:
- removed line for first parent
+ added line for first parent
-removed line for second parent
++added line for both parents
would be rendered as:
- removed line for first parent
-removed line for second parent
+ added line for first parent
++added line for both parents
To prevent gitweb from reordering lines, print_diff_chunk() calls
print_diff_lines() as soon as it detects that both added and removed
lines are present and there was a class change, and at the end of chunk.
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, print_sidebyside_diff_chunk() does two things: it
accumulates diff lines and prints them. Accumulation may be used to
perform additional operations on diff lines, so it makes sense to split
these two things. Thus, whole code that formats and prints diff lines
in the 'side-by-side' manner is moved out of print_sidebyside_diff_chunk()
to a separate subroutine and two conditions that control printing
diff liens are merged.
Thanks to that, we can easily (in later patches) replace call to that
subroutine with a call to more generic print_diff_lines() that will
control whether 'inline' or 'side-by-side' diff should be printed.
As a side effect, context lines are printed just before printing added
and removed lines, and at the end of chunk (previously, they were
printed immediately on the class change). However, this doesn't change
gitweb output.
The outcome of this patch is that print_sidebyside_diff_chunk() is now
much shorter and easier to read.
While at it, drop the '# assume that it is change' comment. According
to Jakub Narębski:
What I meant here when I was writing it that they are lines that
changed between two versions, like '!' in original (not unified)
context format.
We can omit this comment.
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: Pass esc_html_hl_regions() options to esc_html()
With this change, esc_html_hl_regions() accepts options and passes them
down to esc_html(). This may be needed if a caller wants to pass
-nbsp=>1 to esc_html().
The idea and implementation example of this change was described in 337da8d2 (gitweb: Introduce esc_html_match_hl and esc_html_hl_regions,
2012-02-27). While other suggestions may be more useful in some cases,
there is no need to implement them at the moment. The
esc_html_hl_regions() interface may be changed later if it's needed.
[mk: extracted from larger patch and wrote commit message]
Signed-off-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: esc_html_hl_regions(): Don't create empty <span> elements
If $end is equal to or less than $begin, esc_html_hl_regions()
generates an empty <span> element. It normally shouldn't be visible in
the web browser, but it doesn't look good when looking at page source.
It also minimally increases generated page size for no special reason.
Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitweb: Use descriptive names in esc_html_hl_regions()
The $s->[0] and $s->[1] variables look a bit cryptic. Let's rename them
to $begin and $end so that it's clear what they do.
Suggested-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
I tentatively named the release notes "1.7.11" but this may have to
be renamed to "1.8" or some other name later. Let's see how well
we would do during this cycle.
gitweb: Fix unintended "--no-merges" for regular Atom feed
The print_feed_meta() subroutine generates links for feeds with and
without merges, in RSS and Atom formats. However because %href_params
was not properly reset, it generated links with "--no-merges" for all
except the very first link.
Before:
<link rel="alternate" title="[..] - Atom feed" href="/?p=.git;a=atom;opt=--no-merges" type="application/atom+xml" />
<link rel="alternate" title="[..] - Atom feed (no merges)" href="/?p=.git;a=atom;opt=--no-merges" type="application/atom+xml" />
After:
<link rel="alternate" title="[..] - Atom feed" href="/?p=.git;a=atom" type="application/atom+xml" />
<link rel="alternate" title="[..] - Atom feed (no merges)" href="/?p=.git;a=atom;opt=--no-merges" type="application/atom+xml" />
Signed-off-by: Sebastian Pipping <sebastian@pipping.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
revision: insert unsorted, then sort in prepare_revision_walk()
Speed up prepare_revision_walk() by adding commits without sorting
to the commit_list and at the end sort the list in one go. Thanks
to mergesort() working behind the scenes, this is a lot faster for
large numbers of commits than the current insert sort.
Also introduce and use commit_list_reverse(), to keep the ordering
of commits sharing the same commit date unchanged. That's because
commit_list_insert_by_date() sorts commits with descending date,
but adds later entries with the same date entries last, while
commit_list_insert() always inserts entries at the top. The
following commit_list_sort_by_date() keeps the order of entries
sharing the same date.
Jeff's test case, in a repo with lots of refs, was to run:
# make a new commit on top of HEAD, but not yet referenced
sha1=`git commit-tree HEAD^{tree} -p HEAD </dev/null`
# now do the same "connected" test that receive-pack would do
git rev-list --objects $sha1 --not --all
With a git.git with a ref for each revision, master needs (best of
five):
real 0m2.210s
user 0m2.188s
sys 0m0.016s
And with this patch:
real 0m0.480s
user 0m0.456s
sys 0m0.020s
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit: use mergesort() in commit_list_sort_by_date()
Replace the insertion sort in commit_list_sort_by_date() with a
call to the generic mergesort function. This sets the stage for
using commit_list_sort_by_date() for larger lists, as shown in
the next patch.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a generic bottom-up mergesort implementation for singly linked
lists. It was inspired by Simon Tatham's webpage on the topic[1], but
not so much by his implementation -- for no good reason, really, just a
case of NIH.
The allocations made by unpack_nondirectories() using create_ce_entry()
are never freed.
In the non-merge case, we duplicate them using add_entry() and later
only look at the first allocated element (src[0]), perhaps even only
by mistake. Split out the actual addition from add_entry() into the
new helper do_add_entry() and call this non-duplicating function
instead of add_entry() to avoid the leak.
Valgrind reports this for the command "git archive v1.7.9" without
the patch:
==13372== LEAK SUMMARY:
==13372== definitely lost: 230,986 bytes in 2,325 blocks
==13372== indirectly lost: 0 bytes in 0 blocks
==13372== possibly lost: 98 bytes in 1 blocks
==13372== still reachable: 2,259,198 bytes in 3,243 blocks
==13372== suppressed: 0 bytes in 0 blocks
And with the patch applied:
==13375== LEAK SUMMARY:
==13375== definitely lost: 65 bytes in 1 blocks
==13375== indirectly lost: 0 bytes in 0 blocks
==13375== possibly lost: 0 bytes in 0 blocks
==13375== still reachable: 2,364,417 bytes in 3,245 blocks
==13375== suppressed: 0 bytes in 0 blocks
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
unpack-trees: don't perform any index operation if we're not merging
src[0] points to the index entry in the merge case and to the first
tree to unpack in the non-merge case. We only want to mark the index
entry, so check first if we're merging.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store references hierarchically in a tree that matches the
pseudo-directory structure of the reference names. Add a new kind of
ref_entry (with flag REF_DIR) to represent a whole subdirectory of
references. Sort ref_dirs one subdirectory at a time.
NOTE: the dirs can now be sorted as a side-effect of other function
calls. Therefore, it would be problematic to do something from a
each_ref_fn callback that could provoke the sorting of a directory
that is currently being iterated over (i.e., the directory containing
the entry that is being processed or any of its parents).
This is a bit far-fetched, because a directory is always sorted just
before being iterated over. Therefore, read-only accesses cannot
trigger the sorting of a directory whose iteration has already
started. But if a callback function would add a reference to a parent
directory of the reference in the iteration, then try to resolve a
reference under that directory, a re-sort could be triggered and cause
the iteration to work incorrectly.
Nevertheless...add a comment in refs.h warning against modifications
during iteration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This purely textual change is in preparation for storing references
hierarchically, when the old ref_array structure will represent one
"directory" of references. Rename functions that deal with this
structure analogously, and also rename the structure's "refs" member
to "entries".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reorder definitions in file: first check_refname_format() and helper
functions, then the functions for managing the ref_entry and ref_array
data structures, then ref_cache, then the more "business-logicky"
stuff. No code is changed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
var doc: advertise current DEFAULT_PAGER and DEFAULT_EDITOR settings
Document the default pager and editor chosen at compile time in the
git-var(1) manpage so users curious about what command _this_ copy of
git will fall back to when EDITOR, VISUAL, and PAGER are unset can
find the answer quickly.
In builds leaving those settings uncustomized, this patch makes the
manpage continue to say "usually vi" and "usually less" so the
formatted documentation is usable for a wide audience including users
of custom builds that change those settings. If you would like your
copy of the docs to be less noncommittal, you will need to set
DEFAULT_PAGER=less and DEFAULT_EDITOR=vi explicitly.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl: main test case for the OS command line overflow
This is main test case for the original problem that triggered this
patch series. We create a repo with 50k tags and then test whether
git-clone over the smart HTTP protocol succeeds.
Note that we construct the repo in a slightly different way than the
original script used to reproduce the problem. This is because the
original script just created 50k tags all pointing to the same commit,
so if there was a bug where remote-curl.c was not passing all the refs
to fetch-pack we wouldn't know. The clone would succeed even if only one
tag was passed, because all the other tags were pointing at the same SHA
and would be considered present.
Instead we create a repo with 50k independent (dangling) commits and
then tag each of those commits with a unique tag. This way if one of the
tags is not given to fetch-pack, later stages of the clone would
complain about it.
This allows us to test both that the command line overflow was fixed, as
well as that it was fixed in a way that doesn't leave out any of the
refs.
Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
These test cases focus only on testing the parsing of refs on stdin,
without bothering with the rest of the fetch-pack machinery. We pass in
the refs using different combinations of command line and stdin and then
we watch fetch-pack's stdout to see whether it prints all the refs we
specified (but we ignore their order).
Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can throw an arbitrary number of refs at fetch-pack using
its --stdin option, we use it in the remote-curl helper to bypass the
OS command line length limit.
Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The syntax for the use of mark references in fast-import
demands either a SP (space) or LF (end-of-line) after
a mark reference. Fast-import does not complain when garbage
appears after a mark reference in some cases.
Factor out parsing of mark references and complain if
errant characters are found. Also be a little more careful
when parsing "inline" and SHA1s, complaining if extra
characters appear or if the form of the dataref is unrecognized.
Buggy input can cause fast-import to produce the wrong output,
silently, without error. This makes it difficult to track
down buggy generators of fast-import streams. An example is
seen in the last line of this commit command:
commit refs/heads/S2
committer Name <name@example.com> 1112912893 -0400
data <<COMMIT
commit message
COMMIT
from :1M 100644 :103 hello.c
It is missing a newline and should be:
[...]
from :1
M 100644 :103 hello.c
What fast-import does is to produce a commit with the same
contents for hello.c as in refs/heads/S2^. What the buggy
program was expecting was the contents of blob :103. While
the resulting commit graph looked correct, the contents in
some commits were wrong.
Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check if we even have a parameter before checking its value. Running
this command without any arguments may not make a lot of sense, but
reacting with a segmentation fault is unduly harsh.
While we're at it, avoid casting argv by declaring it const right away.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Print out a trailing newline when --show-prefix is run with cwd
at the top level of the tree which results in an empty prefix.
Behavior is now like --show-cdup.
Fixes an expected failure in t1501.
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a Makefile to run subtree tests. This is largely copied
from the standard test suite with irrelevant targets removed
and some paths altered to account for where subtree tests live.
Signed-off-by: David A. Greene <greened@obbligato.org>
Build git-subtree in its contrib directory and install from there.
The main Makefile no longer discovers subcommands build in the main
build area so we cannot count on it to install git-subtree. The user
should make && make install in contrib/subtree to install git-subtree.
Change the rule to install the git-subtree manpage. The main
Documentation area doesn't directly support installing documentation
from other directories so the user will have to do that from within
contrib/subtree for now.
Signed-off-by: David A. Greene <greened@obbligato.org>
rebase -i continue: don't skip commits that only change submodules
When git-rebase--interactive stops due to a conflict and the only change
to be committed is in a submodule, the test for whether there is
anything to be committed ignores the staged submodule change. This
leads rebase to skip creating the commit for the change.
While unstaged submodule changes should be ignored to avoid needing to
update submodules during a rebase, it is safe to remove the
--ignore-submodules option to diff-index because --cached ensures that
it is only checking the index. This was discussed in [1] and a test is
included to ensure that unstaged changes are still ignored correctly.
Avoid bug in Solaris xpg4/sed as used in submodule
The sed provided by Solaris in /usr/xpg4/bin has a bug whereby an
unanchored regex using * for zero or more repetitions sees two
separate matches fed to the substitution engine in some cases.
This is evidenced by:
$ for sed in /usr/xpg4/bin/sed /usr/bin/sed /opt/csw/gnu/sed; do \
echo 'ab' | $sed -e 's|[a]*|X|g'; \
done
XXbX
XbX
XbX
This bug was triggered during a git submodule clone operation as
exercised in the setup stage of t5526-fetch-submodules when using the
default SANE_TOOL_PATH for Solaris. It led to paths such as
..../.. being used in the submodule .git gitdir reference.
Using the expression 's|\([^/]*\(/*\)\)|..\2|g' provides the desired
result with all three three tested sed implementations but is harder
to read. As we do not need to handle fully qualified paths though,
the expression could actually be [^/]+ which isn't properly handled
either. Instead, use [^/][^/]*, as suggested by Andreas Schwab, which
works on all three tested sed implementations.
The new expression is semantically different than the original one.
It will not place a leading '..' on a fully qualified path as the
original expression did. All of the paths being passed through this
regex are relative and did not rely on this behaviour so it's a safe
change.
Signed-off-by: Ben Walton <bwalton@artsci.utoronto.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/cache-tree:
t0090: be prepared that 'wc -l' writes leading blanks
reset: update cache-tree data when appropriate
commit: write cache-tree data when writing index anyway
Refactor cache_tree_update idiom from commit
Test the current state of the cache-tree optimization
Add test-scrap-cache-tree
Merge branch 'cb/maint-t5541-make-server-port-portable' into maint-1.7.8
* cb/maint-t5541-make-server-port-portable:
t5541: check error message against the real port number used
remote-curl: Fix push status report when all branches fail
Merge branch 'tr/maint-bundle-boundary' into maint-1.7.8
* tr/maint-bundle-boundary:
bundle: keep around names passed to add_pending_object()
t5510: ensure we stay in the toplevel test dir
t5510: refactor bundle->pack conversion
Merge branch 'tr/maint-bundle-long-subject' into maint-1.7.8
* tr/maint-bundle-long-subject:
t5704: match tests to modern style
strbuf: improve strbuf_get*line documentation
bundle: use a strbuf to scan the log for boundary commits
bundle: put strbuf_readline_fd in strbuf.c with adjustments
run-command: treat inaccessible directories as ENOENT
When execvp reports EACCES, it can be one of two things:
1. We found a file to execute, but did not have
permissions to do so.
2. We did not have permissions to look in some directory
in the $PATH.
In the former case, we want to consider this a
permissions problem and report it to the user as such (since
getting this for something like "git foo" is likely a
configuration error).
In the latter case, there is a good chance that the
inaccessible directory does not contain anything of
interest. Reporting "permission denied" is confusing to the
user (and prevents our usual "did you mean...?" lookup). It
also prevents git from trying alias lookup, since we do so
only when an external command does not exist (not when it
exists but has an error).
This patch detects EACCES from execvp, checks whether we are
in case (2), and if so converts errno to ENOENT. This
behavior matches that of "bash" (but not of simpler shells
that use execvp more directly, like "dash").
Test stolen from Junio.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
compat/mingw.[ch]: Change return type of exec functions to int
The POSIX standard specifies a return type of int for all six exec
functions. In addition, all exec functions return -1 on error, and
simply do not return on success. However, the current emulation of
the exec functions on mingw are declared with a void return type.
This would cause a problem should any code attempt to call the
exec function in a non-void context. In particular, if an exec
function were used in a conditional it would fail to compile.
In order to improve the fidelity of the emulation, we change the
return type of the mingw_execv[p] functions to int and return -1
on error.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
push: error out when the "upstream" semantics does not make sense
The user can say "git push" without specifying any refspec. When using
the "upstream" semantics via the push.default configuration, the user
wants to update the "upstream" branch of the current branch, which is the
branch at a remote repository the current branch is set to integrate with,
with this command.
However, there are cases that such a "git push" that uses the "upstream"
semantics does not make sense:
- The current branch does not have branch.$name.remote configured. By
definition, "git push" that does not name where to push to will not
know where to push to. The user may explicitly say "git push $there",
but again, by definition, no branch at repository $there is set to
integrate with the current branch in this case and we wouldn't know
which remote branch to update.
- The current branch does have branch.$name.remote configured, but it
does not specify branch.$name.merge that names what branch at the
remote this branch integrates with. "git push" knows where to push in
this case (or the user may explicitly say "git push $remote" to tell us
where to push), but we do not know which remote branch to update.
- The current branch does have its remote and upstream branch configured,
but the user said "git push $there", where $there is not the remote
named by "branch.$name.remote". By definition, no branch at repository
$there is set to integrate with the current branch in this case, and
this push is not meant to update any branch at the remote repository
$there.
The first two cases were already checked correctly, but the third case was
not checked and we ended up updating the branch named branch.$name.merge
at repository $there, which was totally bogus.
add--interactive: ignore unmerged entries in patch mode
When "add -p" sees an unmerged entry, it shows the combined
diff and then immediately skips the hunk. This can be
confusing in a variety of ways, depending on whether there
are other changes to stage (in which case you get the
superfluous combined diff output in between other hunks) or
not (in which case you get the combined diff and the program
exits immediately, rather than seeing "No changes").
The current behavior was not planned, and is just what the
implementation happens to do. Instead, let's explicitly
remove unmerged entries from our list of modified files, and
print a warning that we are ignoring them.
We can cheaply find which entries are unmerged by adding
"--raw" output to the "diff-files --numstat" we already run.
There is one non-obvious thing we must change when parsing
this combined output. Before this patch, when we saw a
numstat line for a file that did not have index changes, we
would create a new record with 'unchanged' in the 'INDEX'
field. Because "--raw" comes before "--numstat", we must
move this special-case down to the raw-line case (and it is
sufficient to move it rather than handle it in both places,
since any file which has a --numstat will also have a --raw
entry).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use SHELL_PATH from build system in run_command.c:prepare_shell_cmd
During the testing of the 1.7.10 rc series on Solaris for OpenCSW, it
was discovered that t7006-pager was failing due to finding a bad "sh"
in PATH after a call to execvp("sh", ...). This call was setup by
run_command.c:prepare_shell_cmd.
The PATH in use at the time saw /opt/csw/bin given precedence to
traditional Solaris paths such as /usr/bin and /usr/xpg4/bin. A
package named schilyutils (Joerg Schilling's utilities) was installed
on the build system and it delivered a modified version of the
traditional Solaris /usr/bin/sh as /opt/csw/bin/sh. This version of
sh suffers from many of the same problems as /usr/bin/sh.
The command-specific pager test failed due to the broken "sh" handling
^ as a pipe character. It tried to fork two processes when it
encountered "sed s/^/foo:/" as the pager command. This problem was
entirely dependent on the PATH of the user at runtime.
Possible fixes for this issue are:
1. Use the standard system() or popen() which both launch a POSIX
shell on Solaris as long as _POSIX_SOURCE is defined.
2. The git wrapper could prepend SANE_TOOL_PATH to PATH thus forcing
all unqualified commands run to use the known good tools on the
system.
3. The run_command.c:prepare_shell_command() could use the same
SHELL_PATH that is in the #! line of all all scripts and not rely
on PATH to find the sh to run.
Option 1 would preclude opening a bidirectional pipe to a filter
script and would also break git for Windows as cmd.exe is spawned from
system() (cf. v1.7.5-rc0~144^2, "alias: use run_command api to execute
aliases, 2011-01-07).
Option 2 is not friendly to users as it would negate their ability to
use tools of their choice in many cases. Alternately, injecting
SANE_TOOL_PATH such that it takes precedence over /bin and /usr/bin
(and anything with lower precedence than those paths) as
git-sh-setup.sh does would not solve the problem either as the user
environment could still allow a bad sh to be found. (Many OpenCSW
users will have /opt/csw/bin leading their PATH and some subset would
have schilyutils installed.)
Option 3 allows us to use a known good shell while still honouring the
users' PATH for the utilities being run. Thus, it solves the problem
while not negatively impacting either users or git's ability to run
external commands in convenient ways. Essentially, the shell is a
special case of tool that should not rely on SANE_TOOL_PATH and must
be called explicitly.
With this patch applied, any code path leading to
run_command.c:prepare_shell_cmd can count on using the same sane shell
that all shell scripts in the git suite use. Both the build system
and run_command.c will default this shell to /bin/sh unless
overridden.
Signed-off-by: Ben Walton <bwalton@artsci.utoronto.ca> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/git-commit: rephrase the "initial-ness" of templates
The description of "commit -t <file>" said the file is used "as the
initial version" of the commit message, but in the context of an SCM,
"version" is a loaded word that can needlesslyl confuse readers.
Explain the purpose of the mechanism without using "version".
fetch-pack: new --stdin option to read refs from stdin
If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.
This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.
Here is an easy way to trigger this problem:
git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
tag=$(perl -e 'print "bla" x 30')
for i in `seq 50000`; do
echo $sha refs/tags/$tag-$i >> .git/packed-refs
done
Then share this repo over the smart HTTP protocol and try cloning it:
$ git clone http://localhost/.../too-many-refs/.git
Cloning into 'too-many-refs'...
fatal: cannot exec 'fetch-pack': Argument list too long
50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.
This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.
Four possible solutions were discussed on the Git mailing list (in no
particular order):
1) Call fetch-pack multiple times with smaller batches of refs.
This was dismissed as inefficient and inelegant.
2) Add option --refs-fd=$n to pass a an fd from where to read the refs.
This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.
3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
This was not favored because of the temp file requirement.
4) Add option --stdin to pass the refs on stdin, one per line.
In the end this option was chosen as the most efficient and most
desirable from scripting perspective.
There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.
If we are going to sneak refs on stdin line by line, it would have to be
done very carefully in the presence of --stateless-rpc, because when
reading refs line by line we might read ahead too much data into our
buffer and eat some of the remote protocol data which is also coming on
stdin.
One way to solve this would be to refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.
In the end we settled on the following solution:
If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.
However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.
This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.
Signed-off-by: Ivan Todoroski <grnch@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitk: fix tabbed preferences construction when using tcl 8.4
In 8.5 the incr command creates the target variable if it does not exist
but in 8.4 using incr on a non-existing variable raises an error. Ensure
we have created our counter variable when creating the tabbed dialog for
non-themed preferences.
Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Portuguese Portuguese translations from Marco Sousa via Jiang Xin
* 'master' of git://github.com/git-l10n/git-po:
l10n: Add the Dutch translation team and initialize nl.po
l10n: Inital Portuguese Portugal language (pt_PT)
l10n: Improve zh_CN translation for Git 1.7.10-rc3