parse-options: Allow abbreviated options when unambiguous
When there is an option "--amend", the option parser now recognizes
"--am" for that option, provided that there is no other option beginning
with "--am".
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
parse-options: make some arguments optional, add callbacks.
* add the possibility to use callbacks to parse some options, this can
help implementing new options kinds with great flexibility. struct option
gains a callback pointer and a `defval' where callbacks user can put
either integers or pointers. callbacks also can use the `value' pointer
for anything, preferably to the pointer to the final storage for the value
though.
* add a `flag' member to struct option to make explicit that this option may
have an optional argument. The semantics depends on the option type. For
INTEGERS, it means that if the switch is not used in its
--long-form=<value> form, and that there is no token after it or that the
token does not starts with a digit, then it's assumed that the switch has
no argument. For STRING or CALLBACK it works the same, except that the
condition is that the next atom starts with a dash. This is needed to
implement backward compatible behaviour with existing ways to parse the
command line. Its use for new options is discouraged.
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The option parser takes argc, argv, an array of struct option
and a usage string. Each of the struct option elements in the array
describes a valid option, its type and a pointer to the location where the
value is written. The entry point is parse_options(), which scans through
the given argv, and matches each option there against the list of valid
options. During the scan, argv is rewritten to only contain the
non-option command line arguments and the number of these is returned.
Aggregation of single switches is allowed:
-rC0 is the same as -r -C 0 (supposing that -C wants an arg).
Every long option automatically support the option with the same name,
prefixed with 'no-' to unset the switch. It assumes that initial value for
strings are "NULL" and for integers is "0".
Long options are supported either with '=' or without:
--some-option=foo is the same as --some-option foo
Acked-by: Kristian Høgsberg <krh@redhat.com> Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
include $PATH in generating list of commands for "help -a"
Git had previously been using the $PATH for scripts--a previous
patch moved exec'ed commands to also use the $PATH. For consistency
"help -a" should also list commands in the $PATH.
The main commands are still listed from the git_exec_path(), but
the $PATH is walked and other git commands (probably extensions) are
listed.
Signed-off-by: Scott R Parish <srp@srparish.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to correctly set up $PATH for non-c based git commands.
Since we already do this, we can just use that $PATH and execvp,
instead of looping over the paths with execve.
This patch adds a setup_path() function to exec_cmd.c, which sets
the $PATH order correctly for our search order. execv_git_cmd() is
stripped down to setting up argv and calling execvp(). git.c's
main() only only needs to call setup_path().
Signed-off-by: Scott R Parish <srp@srparish.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remove unused/unneeded "pattern" argument of list_commands
list_commands() currently accepts and ignores a "pattern" argument,
and then hard codes a prefix as well as some magic numbers. This
hardcodes the prefix inside of the function and removes the magic
numbers.
Signed-off-by: Scott R Parish <srp@srparish.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct handling of upload-pack in builtin-fetch-pack
The field in the args was being ignored in favor of a static constant
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Thanked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Try to avoid a lot of work scanning for excluded files,
by caching some more information when setting up the exclusion
data structure.
Speeds up 'git runstatus' on a repository containing the Qt sources by 30% and
reduces the amount of instructions executed (as measured by valgrind) by a
factor of 2.
* maint:
RelNotes-1.5.3.5: describe recent fixes
merge-recursive.c: mrtree in merge() is not used before set
sha1_file.c: avoid gcc signed overflow warnings
Fix a small memory leak in builtin-add
honor the http.sslVerify option in shell scripts
merge-recursive.c: mrtree in merge() is not used before set
The called function merge_trees() sets its *result, to which the
address of the variable mrtree in merge() function is passed,
only when index_only is set. But that is Ok as the function
uses the value in the variable only under index_only iteration.
However, recent gcc does not realize this. Work it around by
adding a fake initializer.
sha1_file.c: In check_packed_git_:
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
for a piece of code that tries to make sure that off_t is large
enough to hold more than 2^32 offset. The test tried to make
sure these do not wrap-around:
/* make sure we can deal with large pack offsets */
off_t x = 0x7fffffffUL, y = 0xffffffffUL;
if (x > (x + 1) || y > (y + 1)) {
but gcc assumes it can do whatever optimization it wants for a
signed overflow (undefined behaviour) and warns about this
construct.
Follow Linus's suggestion to check sizeof(off_t) instead to work
around the problem.
No longer talk about Cogito since it's deprecated. Some scripts (such as
git-reset or git-branch) have undergone builtinification so adjust the text
to reflect this.
Fix a typo in the description of git-show-branch (merges are indicated by a
`-', not by a `.').
git-pull/git-push do not seem to use the dumb git-ssh-fetch/git-ssh-upload
(the text was probably missing a word).
Adjust a link that wasn't rendered properly because it was wrapped.
Signed-off-by: Benoit Sigoure <tsuna@lrde.epita.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-fetch: do not fail when remote branch disappears
When the branch named with branch.$name.merge is not covered by
the fetch configuration for the remote repository named with
branch.$name.remote, we automatically add that branch to the set
of branches to be fetched. However, if the remote repository
does not have that branch (e.g. it used to exist, but got
removed), this is not a reason to fail the git-fetch itself.
The situation however will be noticed if git-fetch was called by
git-pull, as the resulting FETCH_HEAD would not have any entry
that is marked for merging.
Acked-By: Daniel Barkalow <barkalow@iabervon.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of git://git.kernel.org/pub/scm/gitk/gitk: (34 commits)
gitk: Use the UI font for the diff/old version/new version radio buttons
gitk: Simplify the code for finding commits
gitk: Fix a couple more bugs in the path limiting
gitk: Fix some bugs with path limiting in the diff display
gitk: Use the status window for other functions
gitk: Integrate the reset progress bar in the main frame
gitk: Ensure tabstop setting gets restored by Cancel button
gitk: Limit diff display to listed paths by default
gitk: Fix Tcl error: can't unset findcurline
gitk: Get rid of the diffopts variable
gitk: Fix bug where the last few commits would sometimes not be visible
gitk: Add a font chooser
gitk: Keep track of font attributes ourselves instead of using font actual
gitk: Use named fonts instead of the font specification
gitk: Fix bug causing Tcl error when changing find match type
gitk: Fix the tab setting in the diff display window
gitk: Add progress bars for reading in stuff and for finding
gitk: Fix a couple of bugs
gitk: Simplify highlighting interface and combine with Find function
gitk: Fix bug in generating patches
...
gitk: Use the UI font for the diff/old version/new version radio buttons
This makes the radio buttons for selecting whether to see the full diff,
the old version or the new version use the same font as the other user
interface elements.
This unifies findmore and findmorerev, and adds the ability to do
a search with or without wrap around from the end of the list of
commits to the beginning (or vice versa for reverse searches).
findnext and findprev are gone, and the buttons and keys for searching
all call dofind now. dofind doesn't unmark the matches to start with.
Shift-up and shift-down are back by popular request, and the searches
they do don't wrap around. The other keys that do searches (/, ?,
return, M-f) do wrapping searches except for M-g.
Usually you cannot revert a merge because you do not know which
side of the merge should be considered the mainline (iow, what
change to reverse).
With this patch, cherry-pick and revert learn -m (--mainline)
option that lets you specify the parent number (starting from 1)
of the mainline, so that you can:
git revert -m 1 $merge
to reverse the changes introduced by the $merge commit relative
to its first parent, and:
git cherry-pick -m 2 $merge
to replay the changes introduced by the $merge commit relative
to its second parent.
Bisect run: "skip" current commit if script exit code is 125.
This is incompatible with previous versions because an exit code
of 125 used to mark current commit as "bad". But hopefully this exit
code is not much used by test scripts or other programs. (126 and 127
are used by POSIX compliant shells to mean "found but not
executable" and "command not found", respectively.)
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also fix "bisect bad" and "bisect good" short usage description.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bisect: refactor "bisect_{bad,good,skip}" into "bisect_state".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bisect: refactor some logging into "bisect_write".
Also use "die" instead of "echo >&2 something ; exit 1".
And simplify "bisect_replay".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bisect: implement "bisect skip" to mark untestable revisions.
When there are some "skip"ped revisions, we add the '--bisect-all'
option to "git rev-list --bisect-vars". Then we filter out the
"skip"ped revisions from the result of the rev-list command, and we
modify the "bisect_rev" var accordingly.
We don't always use "--bisect-all" because it is slower
than "--bisect-vars" or "--bisect".
When we cannot find for sure the first bad commit because of
"skip"ped commits, we print the hash of each possible first bad
commit and then we exit with code 2.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Do the fuzzy rename detection limits with the exact renames removed
When we do the fuzzy rename detection, we don't care about the
destinations that we already handled with the exact rename detector.
And, in fact, the code already knew that - but the rename limiter, which
used to run *before* exact renames were detected, did not.
This fixes it so that the rename detection limiter now bases its
decisions on the *remaining* rename counts, rather than the original
ones.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix ugly magic special case in exact rename detection
For historical reasons, the exact rename detection had populated the
filespecs for the entries it compared, and the rest of the similarity
analysis depended on that. I hadn't even bothered to debug why that was
the case when I re-did the rename detection, I just made the new one
have the same broken behaviour, with a note about this special case.
This fixes that fixme. The reason the exact rename detector needed to
fill in the file sizes of the files it checked was that the _inexact_
rename detector was broken, and started comparing file sizes before it
filled them in.
Fixing that allows the exact phase to do the sane thing of never even
caring (since all *it* cares about is really just the SHA1 itself, not
the size nor the contents).
It turns out that this also indirectly fixes a bug: trying to populate
all the filespecs will run out of virtual memory if there is tons and
tons of possible rename options. The fuzzy similarity analysis does the
right thing in this regard, and free's the blob info after it has
generated the hash tables, so the special case code caused more trouble
than just some extra illogical code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do exact rename detection regardless of rename limits
Now that the exact rename detection is linear-time (with a very small
constant factor to boot), there is no longer any reason to limit it by
the number of files involved.
In some trivial testing, I created a repository with a directory that
had a hundred thousand files in it (all with different contents), and
then moved that directory to show the effects of renaming 100,000 files.
With the new code, that resulted in
[torvalds@woody big-rename]$ time ~/git/git show -C | wc -l
400006
real 0m2.071s
user 0m1.520s
sys 0m0.576s
ie the code can correctly detect the hundred thousand renames in about 2
seconds (the number "400006" comes from four lines for each rename:
diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1
similarity index 100%
rename from really-big-dir/file-1-1-1-1-1
rename to moved-big-dir/file-1-1-1-1-1
and the extra six lines is from a one-liner commit message and all the
commit information and spacing).
Most of those two seconds weren't even really the rename detection, it's
really all the other stuff needed to get there.
With the old code, this wouldn't have been practically possible. Doing
a pairwise check of the ten billion possible pairs would have been
prohibitively expensive. In fact, even with the rename limiter in
place, the old code would waste a lot of time just on the diff_filespec
checks, and despite not even trying to find renames, it used to look
like:
[torvalds@woody big-rename]$ time git show -C | wc -l 1400006
real 0m12.337s
user 0m12.285s
sys 0m0.192s
ie we used to take 12 seconds for this load and not even do any rename
detection! (The number 1400006 comes from fourteen lines per file moved:
seven lines each for the delete and the create of a one-liner file, and
the same extra six lines of commit information).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do linear-time/space rename logic for exact renames
This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
to renames that have the same hash (time O(n+m) except for unrealistic
hash collissions, which we just cull aggressively).
Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case none-the-less. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.
In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too. In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
copy vs rename detection: avoid unnecessary O(n*m) loops
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).
That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).
Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.
This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection. In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out "exact content match" phase of rename detection
This makes the exact content match a separate function of its own.
Partly to cut down a bit on the size of the diffcore_rename() function
(which is too complex as it is), and partly because there are smarter
ways to do this than an O(m*n) loop over it all, and that function
should be rewritten to take that into account.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code generating perl/Makefile from Makefile.PL was causing trouble
because it didn't considered NO_PERL_MAKEMAKER and ran makemaker
unconditionally, rewriting perl.mak. Makemaker is FUBAR in ActiveState Perl,
and perl/Makefile has a replacement for it.
Besides, a changed Git.pm is *NOT* a reason to rebuild all the perl scripts,
so remove the dependency too.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* db/fetch-pack: (60 commits)
Define compat version of mkdtemp for systems lacking it
Avoid scary errors about tagged trees/blobs during git-fetch
fetch: if not fetching from default remote, ignore default merge
Support 'push --dry-run' for http transport
Support 'push --dry-run' for rsync transport
Fix 'push --all branch...' error handling
Fix compilation when NO_CURL is defined
Added a test for fetching remote tags when there is not tags.
Fix a crash in ls-remote when refspec expands into nothing
Remove duplicate ref matches in fetch
Restore default verbosity for http fetches.
fetch/push: readd rsync support
Introduce remove_dir_recursively()
bundle transport: fix an alloc_ref() call
Allow abbreviations in the first refspec to be merged
Prevent send-pack from segfaulting when a branch doesn't match
Cleanup unnecessary break in remote.c
Cleanup style nit of 'x == NULL' in remote.c
Fix memory leaks when disconnecting transport instances
Ensure builtin-fetch honors {fetch,transfer}.unpackLimit
...
git-remote: fix "Use of uninitialized value in string ne"
martin f krafft <madduck@madduck.net> writes:
> piper:~> git remote show origin
> * remote origin
> URL: ssh://git.madduck.net/~/git/etc/mailplate.git
> Use of uninitialized value in string ne at /usr/local/stow/git/bin/git-remote line 248.
This is because there might not be branch.<name>.remote defined but
the code unconditionally dereferences $branch->{$name}{'REMOTE'} and
compares with another string.
Tested-by: Martin F Krafft <madduck@madduck.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
First, paths ending in a slash were not matching anything. This fixes
path_filter to handle paths ending in a slash (such entries have to
match a directory, and can't match a file, e.g., foo/bar/ can't match
a plain file called foo/bar).
Secondly, clicking in the file list pane (bottom right) was broken
because $treediffs($ids) contained all the files modified by the
commit, not just those within the file list. This fixes that too.
gitk: Fix some bugs with path limiting in the diff display
First, we weren't putting "--" between the ids and the paths in the
git diff-tree/diff-index/diff-files command, so if there was a tag
and a file with the same name, we could get an ambiguity in the
command. This puts the "--" in to make it clear that the paths are
paths.
Secondly, this implements the path limiting for merge diffs as well
as the normal 2-way diffs.
gitk: Integrate the reset progress bar in the main frame
This makes the reset function use a progress bar in the same location
as the progress bars for reading in commits and for finding commits,
instead of a progress bar in a separate detached window. The progress
bar for resetting is red.
This also puts "Resetting" in the status window while the reset is in
progress. The setting of the status window is done through an
extension of the interface used for setting the watch cursor.
gitk: Ensure tabstop setting gets restored by Cancel button
We weren't restoring the tabstop setting if the user pressed the
Cancel button in the Edit/Preferences window. Also improved the
label for the checkbox (made it "Tab spacing" rather than the laconic
"tabstop") and moved it above the "Display nearby tags" checkbox.
gitk: Limit diff display to listed paths by default
When the user has specified a list of paths, either on the command line
or when creating a view, gitk currently displays the diffs for all files
that a commit has modified, not just the ones that match the path list.
This is different from other git commands such as git log. This change
makes gitk behave the same as these other git commands by default, that
is, gitk only displays the diffs for files that match the path list.
There is now a checkbox labelled "Limit diffs to listed paths" in the
Edit/Preferences pane. If that is unchecked, gitk will display the
diffs for all files as before.
When gitk is run with the --merge flag, it will get the list of unmerged
files at startup, intersect that with the paths listed on the command line
(if any), and use that as the list of paths.
Use PRIuMAX instead of 'unsigned long long' in show-index
Elsewhere in Git we already use PRIuMAX and cast to uintmax_t when
we need to display a value that is 'very big' and we're not exactly
sure what the largest display size is for this platform.
This particular fix is needed so we can do the incredibly crazy
temporary hack of:
allowing us to more easily look for locations where we are passing
a pointer to an 8 byte value to a function that expects a 4 byte
value. This can occur on some platforms where sizeof(long) == 8
and sizeof(size_t) == 4.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
Describe more 1.5.3.5 fixes in release notes
Fix diffcore-break total breakage
Fix directory scanner to correctly ignore files without d_type
Improve receive-pack error message about funny ref creation
fast-import: Fix argument order to die in file_change_m
git-gui: Don't display CR within console windows
git-gui: Handle progress bars from newer gits
git-gui: Correctly report failures from git-write-tree
gitk.txt: Fix markup.
send-pack: respect '+' on wildcard refspecs
git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
git-gui: Don't crash when starting gitk from a browser session
git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
git-gui: Handle starting on mapped shares under Cygwin
git-gui: Display message box when we cannot find git in $PATH
git-gui: Avoid using bold text in entire gui for some fonts
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.
In particular, there was a pattern of doing single commits with renames
that looked basically like
- rename "filename.h" -> "filename_64.h"
- create new "filename.c" that includes "filename_32.h" or
"filename_64.h" depending on whether we're 32-bit or 64-bit.
which was preparatory for smushing the two trees together.
Now, there's two issues here:
- "filename.c" *remained*. Yes, it was a rename, but there was a new file
created with the old name in the same commit. This was important,
because we wanted each commit to compile properly, so that it was
bisectable, so splitting the rename into one commit and the "create
helper file" into another was *not* an option.
So we need to break associations where the contents change too much.
Fine. We have the -B flag for that. When we break things up, then the
rename detection will be able to figure out whether there are better
alternatives.
- "git log --follow" didn't with with -B.
Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:
however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!
In particular:
- it used to do
if (base_size < MINIMUM_BREAK_SIZE)
return 0; /* we do not break too small filepair */
which basically says "don't bother to break small files". But that
"base_size" is the *smaller* of the two sizes, which means that if some
large file was rewritten into one that just includes another file, we
would look at the (small) result, and decide that it's smaller than the
break size, so it cannot be worth it to break it up! Even if the other
side was ten times bigger and looked *nothing* like the samell file!
That's clearly bogus. I replaced "base_size" with "max_size", so that
we compare the *bigger* of the filepair with the break size.
- It calculated a "merge_score", which was the score needed to merge it
back together if nothing else wanted it. But even if it was *so*
different that we would never want to merge it back, we wouldn't
consider it a break! That makes no sense. So I added
if (*merge_score_p > break_score)
return 1;
to make it clear that if we wouldn't want to merge it at the end, it
was *definitely* a break.
- It compared the whole "extent of damage", counting all inserts and
deletes, but it based this score on the "base_size", and generated the
damage score with
but that makes no sense either, since quite often, this will result in
a number that is *bigger* than MAX_SCORE! Why? Because base_size is
(again) the smaller of the two files we compare, and when you start out
from a small file and add a lot (or start out from a large file and
remove a lot), the base_size is going to be much smaller than the
damage!
Again, the fix was to replace "base_size" with "max_size", at which
point the damage actually becomes a sane percentage of the whole.
With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now
actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history
git log --stat -B -C
on my kernel tree with the old code and the new code.
There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:
Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertizing: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.
So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix directory scanner to correctly ignore files without d_type
On Fri, 19 Oct 2007, Todd T. Fries wrote:
> If DT_UNKNOWN exists, then we have to do a stat() of some form to
> find out the right type.
That happened in the case of a pathname that was ignored, and we did
not ask for "dir->show_ignored". That test used to be *together*
with the "DTYPE(de) != DT_DIR", but splitting the two tests up
means that we can do that (common) test before we even bother to
calculate the real dtype.
Of course, that optimization only matters for systems that don't
have, or don't fill in DTYPE properly.
I also clarified the real relationship between "exclude" and
"dir->show_ignored". It used to do
if (exclude != dir->show_ignored) {
..
which wasn't exactly obvious, because it triggers for two different
cases:
- the path is marked excluded, but we are not interested in ignored
files: ignore it
- the path is *not* excluded, but we *are* interested in ignored
files: ignore it unless it's a directory, in which case we might
have ignored files inside the directory and need to recurse
into it).
so this splits them into those two cases, since the first case
doesn't even care about the type.
I also made a the DT_UNKNOWN case a separate helper function,
and added some commentary to the cases.
Linus
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Avoid a dup2(2) in apply_filter() - start_command() can do it for us.
When apply_filter() runs the external (clean or smudge) filter program, it
needs to pass the writable end of a pipe as its stdout. For this purpose,
it used to dup2(2) the file descriptor explicitly to stdout. Now we use
the facilities of start_command() to do it for us.
Furthermore, the path argument of a subordinate function, filter_buffer(),
was not used, so here we replace it to pass the fd instead.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
t0021-conversion.sh: Test that the clean filter really cleans content.
This test uses a rot13 filter, which is its own inverse. It tested only
that the content was the same as the original after both the 'clean' and
the 'smudge' filter were applied. This way it would not detect whether
any filter was run at all. Hence, here we add another test that checks
that the repository contained content that was processed by the 'clean'
filter.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
upload-pack: Run rev-list in an asynchronous function.
This gets rid of an explicit fork().
Since upload-pack has to coordinate two processes (rev-list and
pack-objects), we cannot use the normal finish_async(), but have to monitor
the process explicitly. Hence, there are no changes at this front.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
upload-pack: Move the revision walker into a separate function.
This allows us later to use start_async() with this function, and at
the same time is a nice cleanup that makes a long function
(create_pack_file()) shorter.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use the asyncronous function infrastructure in builtin-fetch-pack.c.
We run the sideband demultiplexer in an asynchronous function.
Note that earlier there was a check in the child process that closed
xd[1] only if it was different from xd[0]; this test is no longer needed
because git_connect() always returns two different file descriptors
(see ec587fde0a76780931c7ac32474c8c000aa45134).
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add infrastructure to run a function asynchronously.
This adds start_async() and finish_async(), which runs a function
asynchronously. Communication with the caller happens only via pipes.
For this reason, this implementation forks off a child process that runs
the function.
[sp: Style nit fixed by removing unnecessary block on if condition
inside of start_async()]
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>