cvsimport: complete the cvsps run before starting the import
We now capture the output of cvsps to a tempfile, and then read it in.
cvsps 2.1 works quite a bit "in memory", and only prints its patchset
info once it has finished talking with cvs, while apparently retaining
all the memory it has allocated. With this patch, cvsps is finished and
reaped before cvsimport starts working (and growing). So the footprint
of the whole process is much lower.
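
The idea of finishing the producer before the consumer starts can be sketched roughly as follows. This is not the cvsimport code: the helper name run_to_tempfile is hypothetical, and a tiny Python child process stands in for cvsps.

```python
import subprocess
import sys
import tempfile


def run_to_tempfile(cmd):
    """Run cmd to completion, spooling its stdout to a temporary file.

    By the time this returns, the child has exited and been reaped, so
    its memory is released before the caller starts processing output.
    (Hypothetical helper, for illustration only.)
    """
    out = tempfile.TemporaryFile(mode="w+")
    subprocess.run(cmd, stdout=out, check=True)  # blocks until the child exits
    out.seek(0)
    return out


# Stand-in for cvsps: any command that prints lines will do here.
fake_cvsps = [sys.executable, "-c", "print('PatchSet 1'); print('PatchSet 2')"]

with run_to_tempfile(fake_cvsps) as patchsets:
    lines = [line.rstrip("\n") for line in patchsets]
```

Reading from a pipe instead would keep both processes alive at once, which is the situation the patch avoids.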
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
cvsimport: ignore CVSPS_NO_BRANCH and impossible branches
cvsps output often contains references to CVSPS_NO_BRANCH, commits
that it could not trace to a branch. Ignore that branch.
Additionally, cvsps will sometimes draw circular relationships
between branches -- where two branches are recorded as opening
from the other. In those cases, and where the ancestor branch
hasn't been seen, ignore it.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When extra command line arguments are given to a command that
was alias-expanded, the code generated a wrong argument list,
leaving the original alias in the result, and forgetting to
terminate the new argv list.
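
A minimal sketch of the intended argument list, written in Python rather than C. The function expand_alias and the alias table are hypothetical. A Python list needs no terminator, so only the first half of the bug (the leftover alias word) can be shown here.

```python
def expand_alias(argv, aliases):
    """Return argv with argv[0] replaced by its alias expansion, if any.

    Hypothetical helper: 'aliases' maps an alias name to its expansion
    string, which is split on whitespace for simplicity.
    """
    if not argv or argv[0] not in aliases:
        return list(argv)
    # The alias word itself is dropped; the extra command line arguments
    # are appended after the expanded words.
    return aliases[argv[0]].split() + list(argv[1:])


expanded = expand_alias(["l", "-n", "3"], {"l": "log --stat"})
```

In C the same list would additionally need a trailing NULL entry, which is the second thing the fix restores.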
P4import currently creates a git tag for every commit it imports.
When importing from a large repository too many tags can be created
for git to manage, so this provides an option to shut that feature
off if necessary.
Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
* git://git.bogomips.org/git-svn: (25 commits)
git-svn: rebuild convenience and bugfixes
git-svn: svn (command-line) 1.0.x compatibility
git-svn: tests no longer fail if LC_ALL is not a UTF-8 locale
git-svn: bugfix and optimize the 'log' command
git-svn: Eliminate temp file usage in libsvn_get_file()
git-svn: fix several small bugs, enable branch optimization
git-svn: avoid creating some small files
git-svn: make the $GIT_DIR/svn/*/revs directory obsolete
git-svn: add support for Perl SVN::* libraries
git-svn: add 'log' command, a facsimile of basic `svn log'
git-svn: add UTF-8 message test
git-svn: add some functionality to better support branches in svn
git-svn: add --shared and --template= options to pass to init-db
git-svn: add --repack and --repack-flags= options
git-svn: minor cleanups, extra error-checking
git-svn: Move all git-svn-related paths into $GIT_DIR/svn
git-svn: support manually placed initial trees from fetch
git-svn: optimize --branch and --branch-all-ref
git-svn: --branch-all-refs / -B support
git-svn: support -C<num> passing to git-diff-tree
...
Revisions with long commit messages were being skipped, since
the 'git-svn-id' metadata line was at the end and git-log uses a
32k buffer to print the commits.
Also the last 'git-svn-id' metadata line in a commit is always
the valid one, so make sure we use that, as well.
Made the verbose flag work by passing the correct option switch
('--summary') to git-log.
Finally, optimize -r/--revision argument handling by passing
the appropriate limits to revision
git-svn: fix several small bugs, enable branch optimization
Share the repack counter between branches when doing
multi-fetch.
Pass the -d flag to git repack by default. That's the
main reason we will want automatic pack generation, to
save space and improve disk cache performance. I won't
add -a by default since it can generate extremely large
packs that make RAM-starved systems unhappy.
We no longer generate the .git/svn/$GIT_SVN_ID/info/uuid
file, either. It was never read in the first place.
Check for and create .rev_db if we need to during fetch (in case
somebody manually blew away their .rev_db and wanted to start
over. Mainly makes debugging easier).
Croak with $? instead of $! if there's an error closing pipes
git-svn: make the $GIT_DIR/svn/*/revs directory obsolete
This is a very intrusive change, so I've beefed up the tests
significantly. Added a 'full-test' target to the Makefile,
to test different possible configurations. This is intended
for maintainers only. Users should only be concerned with
'test' succeeding.
We now have a very simple custom database format for handling
mapping of svn revisions => git commits. Of course, we're
not really using it yet, either.
Also disabled automatic branch-finding on new trees for now.
It's too easily broken. revisions_eq() function should be
helpful for branch detection.
Also removed an extra assertion in fetch_cmd() that wasn't
correctly done. This bug was found by full-test.
This means we no longer have to deal with having bloated SVN
working copies around and we get a nice performance increase as
well because we don't have to exec the SVN binary and start a
new server connection each time.
Of course we have to manually manage memory with SVN::Pool
whenever we can, and hack around cases where SVN just eats
memory despite pools (I blame Perl, too). I would like to
keep memory usage as stable as possible during long fetch/commit
processes since I still use computers with only 256-512M RAM.
commit should always be faster with the SVN library code. The
SVN::Delta interface is leaky (or I'm not using it with pools
correctly), so I'm forking on every commit, but that doesn't
seem to hurt performance too much (at least on normal Unix/Linux
systems where fork() is pretty cheap).
fetch should be faster in most common cases, but probably not all.
fetches will be faster where client/server delta generation is
the bottleneck and not bandwidth. Of course, full-files are
generated server-side via deltas, too. Full files are always
transferred when they're updated, just like git-svnimport and
unlike command-line svn. I'm also hacking around memory leaks
(see comments) here by using some more forks.
I've tested fetch with http://, https://, file://, and svn://
repositories, so we should be reasonably covered in terms of
error handling for fetching.
Of course, we'll keep plain command-line svn compatibility as a
fallback for people running SVN 1.1 (I'm looking into library
support for 1.1.x SVN, too). If you want to force command-line
SVN usage, set GIT_SVN_NO_LIB=1 in your environment.
We also require two simultaneous connections (just like
git-svnimport), but this shouldn't be a problem for most
servers.
Less important commands:
show-ignore is slower because it requires repository
access, but -r/--revision <num> can be specified.
graft-branches may use more memory, but it's a
short-term process and is funky-filename-safe.
git-svn: add 'log' command, a facsimile of basic `svn log'
This quick feature should make it easy to look up svn log
messages when svn users refer to -r/--revision numbers.
The following features from `svn log' are supported:
--revision=<n>[:<n>] - is supported, non-numeric args are not:
HEAD, NEXT, BASE, PREV, etc ...
-v/--verbose - just maps to --raw (in git log), so
it's completely incompatible with
the --verbose output in svn log
--limit=<n> - is NOT the same as --max-count,
doesn't count merged/excluded commits
--incremental - supported (trivial :P)
New features:
--show-commit - shows the git commit sha1, as well
--oneline - our version of --pretty=oneline
Any other arguments are passed directly to `git log'
git-svn: add some functionality to better support branches in svn
New commands:
graft-branches - The most interesting command of the bunch. It
detects branches in SVN via various techniques (currently
regexes and file copies). It can be later extended to handle
svk and other properties people may use to track merges in svk.
Basically, merge tracking is not standardized at all in the SVN
world, and git grafts are perfect for dealing with this
situation.
Existing branch support (via tree matches) is only handled at
fetch time.
The following two were originally implemented as shell scripts
several months ago, but I just decided to streamline things a
bit and added them to the main script.
multi-init - supports git-svnimport-like command-line syntax for
importing repositories that are laid out as recommended by the
SVN folks. This is a bit more tolerant than the git-svnimport
command-line syntax and doesn't require the user to figure out
where the repository URL ends and where the repository path
begins.
multi-fetch - runs fetch on all known SVN branches we're
tracking. This will NOT discover new branches (unlike
git-svnimport), so multi-init will need to be re-run (it's
idempotent).
Consider these three to be auxiliary commands (like
show-ignore, and rebuild) so their behavior won't receive as
much testing or scrutiny as the core commands (fetch and
commit).
git-svn: Move all git-svn-related paths into $GIT_DIR/svn
Since GIT_SVN_ID usage is probably going to become more
widespread <evil grin>, we won't run the chance of somebody
having a GIT_SVN_ID name that conflicts with one of the default
directories that already exist in $GIT_DIR (branches/tags).
git-svn: support manually placed initial trees from fetch
Sometimes I don't feel like downloading an entire tree again when
I actually decide a branch is worth tracking, so some users can
get around it more easily with this.
By breaking the pipe read once we've seen a commit twice.
This should make -B/--branch-all-ref faster and usable on a
frequent basis.
We use topological order now for calling git-rev-list, and any
commit we've seen before should imply that all parents have been
seen (at least I hope that's the case for --topo-order).
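
The early exit described above might look roughly like this. The function name new_commits and the commit names are made up, and the sketch assumes the input arrives in topological order (children before parents).

```python
def new_commits(rev_list, seen):
    """Collect commits from rev_list until one is already in 'seen'.

    Assumes topological order, so that a previously seen commit implies
    all of its parents were seen too. Hypothetical helper.
    """
    fresh = []
    for sha in rev_list:
        if sha in seen:
            break  # stop reading; everything after this is already known
        seen.add(sha)
        fresh.append(sha)
    return fresh


# An iterator stands in for the pipe from git-rev-list --topo-order.
pipe = iter(["c3", "c2", "c1", "c0"])
already_seen = {"c1", "c0"}
fresh = new_commits(pipe, already_seen)
```

After the call, the rest of the iterator is left unread, which corresponds to breaking the pipe read early.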
git-svn: don't allow commit if svn tree is not current
If new revisions are fetched, that implies we haven't merged,
acked, or nacked them yet, and attempting to write the tree
we're committing means we'd silently clobber the newly fetched
changes.
If we read the maximum size of our buffer into $buf, and the
last character is '\015', there's a chance that the next character is
'\012', which means our regex won't work correctly. At the
worst case, this could introduce an extra newline into the code.
We'll now read an extra character if we see '\015' is the last
character in $buf.
We also forgot to recalculate the length of $buf after doing the
newline substitution, causing some files to appear truncated.
We'll do that now and force byte semantics in length() for good
measure.
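
A sketch of the boundary handling in Python, assuming the substitution in question maps '\015\012' to '\012' (the message above does not say which direction it runs). The function name is hypothetical. It loops rather than reading exactly one extra byte, so that a run of '\015' bytes at the boundary is also covered.

```python
import io


def read_converted_chunks(stream, bufsize):
    """Yield (chunk, length) pairs with CR LF pairs converted to LF.

    Hypothetical helper, not the git-svn code.
    """
    while True:
        buf = stream.read(bufsize)
        if not buf:
            break
        # If the chunk ends in CR, keep pulling single bytes so that a
        # CR LF pair is never split across two chunks.
        while buf.endswith(b"\r"):
            extra = stream.read(1)
            if not extra:
                break
            buf += extra
        buf = buf.replace(b"\r\n", b"\n")
        # The length is taken after the substitution, and on a bytes
        # object, so it is a byte count rather than a character count.
        yield buf, len(buf)


data = b"ab\r\ncd\r\nef"
chunks = list(read_converted_chunks(io.BytesIO(data), 3))
converted = b"".join(chunk for chunk, _ in chunks)
```

With a 3-byte buffer every CR in this input lands on a chunk boundary, which is the case the fix is about.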
shared repository - add a few missing calls to adjust_shared_perm().
There were a few calls to adjust_shared_perm() that were
missing:
- init-db creates refs, refs/heads, and refs/tags before
reading from templates that could specify sharedrepository in
the config file;
- updating config file created it under user's umask without
adjusting;
- updating refs created it under user's umask without
adjusting;
- switching branches created .git/HEAD under user's umask
without adjusting.
This moves adjust_shared_perm() from sha1_file.c to path.c,
since a few SIMPLE_PROGRAM need to call repository configuration
functions which in turn need to call adjust_shared_perm().
sha1_file.c needs to link with SHA1 computation library which
is usually not linked to SIMPLE_PROGRAM.
Even when invoked with -n flag, git-rm removed the matching
paths anyway. Also includes the missing check spotted by
SungHyun Nam, which caused it to segfault. Now we refuse to run
without any paths.
* git://git.kernel.org/pub/scm/gitk/gitk:
gitk: Re-read the descendent/ancestor tag & head info on update
gitk: Show branch name(s) as well, if "show nearby tags" is enabled
gitk: Show nearby tags
gitk: Add a goto next/previous highlighted commit function
gitk: Provide ability to highlight based on relationship to selected commit
gitk: Fix bug in highlight stuff when no line is selected
gitk: Move "pickaxe" find function to highlight facility
gitk: Improve the text window search function
gitk: First cut at a search function in the patch/file display window
gitk: Highlight paths of interest in tree view as well
gitk: Highlight entries in the file list as well
gitk: Make a row of controls for controlling highlighting
* vb/sendemail:
send-email: a bit more careful domain regexp.
send-email: be more lenient and just catch obvious mistakes.
Cleanup git-send-email.perl:extract_valid_email
A few style fixes to get the code in line with the rest.
- asterisk to make a type a pointer to something goes in front
of the variable, not at the end of the base type.
E.g. a pointer to an integer is "int *ip", not "int* ip".
- open parenthesis for function parameter list, unlike
syntactic constructs, comes immediately after the function
name. E.g. "if (foo) bar();" not "if(foo) bar ();".
- "else" does not come on the same line as the closing brace of
corresponding "if".
The style is mostly a matter of personal taste, and people may
disagree, but consistency is important.
This updates the ref locking code to use creat-rename locking
code we use for the index file, so that it can borrow the code
to clean things up upon signals and program termination.
The framework to create lockfiles that are removed at exit is
first used to reliably write the index file, but it is
applicable to other things, so stop calling it "cache_file".
This also rewords a few remaining error messages that called the
index file "cache file".
This ifdef's out more functions that are not used while !USE_MULTI
in http code. Also the dependency of http related objects on http.h
header file was missing in the Makefile.
git-format-patch: add --output-directory long option again
Additionally, this notices and complains about an -o option without
a directory, a duplicated -o option, or -o and --stdout given
together. It also delays the creation of the directory until all
arguments are parsed, so that the command does not leave an
empty directory behind when it exits after seeing an unrelated
invalid option.
[jc: originally from Dennis Stosberg but with minor fixes, and
documentation updates from Dennis.]
send-email: be more lenient and just catch obvious mistakes.
This cleans up the pattern matching subroutine by introducing
two variables to hold regexp to approximately match local-part
and domain in the e-mail address. It is meant to catch obvious
mistakes with a cheap check.
The patch also moves "scalar" to force Email::Valid->address()
to work in !wantarray environment to extract_valid_address;
earlier it was in the caller of the subroutine, which was way
too error prone.
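
A cheap check built from two pattern variables could look something like the following Python sketch. The patterns and the function name rough_address are assumptions made for illustration. They are not the expressions used in git-send-email.perl.

```python
import re

# Deliberately loose patterns: good enough to catch obvious mistakes,
# not an attempt at full address validation. (Assumed for illustration.)
LOCAL_PART = r"[^\s<>@]+"
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

ADDRESS = re.compile("(" + LOCAL_PART + "@" + DOMAIN + ")")


def rough_address(text):
    """Return the first thing in text that looks like an address, or None."""
    m = ADDRESS.search(text)
    return m.group(1) if m else None


good = rough_address("A U Thor <author@example.com>")
bad = rough_address("no address here")
```

Keeping the two halves in separate variables makes it easy to tighten one without touching the other.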
This disables alias "foo" from being used for git-foo, and when
we do use alias we check the built-in and then existing command
names first and then alias as the fallback. This avoids the
problem of common commands used in scripts getting clobbered by
user specific aliases.
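
The lookup order can be sketched as follows. The function resolve and its arguments are hypothetical and stand in for whatever tables the real code consults.

```python
def resolve(cmd, builtins, external, aliases):
    """Decide how to run cmd: built-in first, then an existing command,
    and only then an alias. (Hypothetical helper.)"""
    if cmd in builtins:
        return ("builtin", cmd)
    if cmd in external:
        return ("external", cmd)
    if cmd in aliases:
        return ("alias", aliases[cmd])
    return ("unknown", cmd)


builtins = {"log", "diff"}
external = {"fetch"}
aliases = {"log": "log --stat", "fetch": "fetch -v", "l": "log --oneline"}

shadowed = resolve("log", builtins, external, aliases)
fallback = resolve("l", builtins, external, aliases)
```

An alias named after an existing command is simply never reached, so scripts calling that command see its normal behavior.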
This trivial patch not only simplifies the name hashing, it actually
improves packing for both git and the kernel.
The git archive pack shrinks from 6824090->6622627 bytes (a 3%
improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
mere 0.5% improvement, but still, it's an improvement from making the
hashing much simpler!)
We just create a 32-bit hash, where we "age" previous characters by two
bits, so the last characters in a filename count most. So when we then
compare the hashes in the sort routine, filenames that end the same way
sort the same way.
It takes the subdirectory into account (unless the filename is > 16
characters), but files with the same name within the same subdirectory
will obviously sort closer than files in different subdirectories.
And, incidentally (which is why I tried the hash change in the first
place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
And no, it's not a "good hash" in the sense of being secure or unique, but
that's not what we're looking for. The whole "hash" thing is misnamed
here. It's not so much a hash as a "sorting number".
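
One way to picture the "aging" is the Python sketch below. The exact expression (shift the old value right by two bits, add the new character in the top byte) is an assumption chosen to match the description above, including the 16-character horizon of a 32-bit value. It is not copied from the patch.

```python
def name_hash(name):
    """32-bit 'sorting number': later characters weigh more.

    Each step shifts the old value right by two bits and puts the new
    character in the top byte, so a character's influence fades out
    after 16 steps. (Assumed form, for illustration.)
    """
    h = 0
    for c in name.encode():
        h = ((h >> 2) + (c << 24)) & 0xFFFFFFFF
    return h


names = ["Makefile", "builtin-rev-list.c", "cache.h", "rev-list.c"]
by_hash = sorted(names, key=name_hash)
gap = name_hash("builtin-rev-list.c") - name_hash("rev-list.c")
```

Under this form the two rev-list files differ only in the low bits contributed by the "builtin-" prefix, so they end up next to each other when sorted.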
[jc: rolled in simplification for computing the sorting number
for thin pack base objects]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Anton Blanchard spotted that watching the checkout stage of a clone
on a slow terminal takes ages because it forgot to clear the
"once a second happened" flag, so instead of updating once a
second, it updates the percentage output for every file it checks
out after the first second has passed.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/tree-2:
fetch.c: do not call process_tree() from process_tree().
tree_entry(): new tree-walking helper function
adjust to the rebased series by Linus.
Remove "tree->entries" tree-entry list from tree parser
Switch "read_tree_recursive()" over to tree-walk functionality
Make "tree_entry" have a SHA1 instead of a union of object pointers
Add raw tree buffer info to "struct tree"
Remove last vestiges of generic tree_entry_list
Convert fetch.c: process_tree() to raw tree walker
Convert "mark_tree_uninteresting()" to raw tree walker
Remove unused "zeropad" entry from tree_list_entry
fsck-objects: avoid unnecessary tree_entry_list usage
Remove "tree->entries" tree-entry list from tree parser
builtin-read-tree.c: avoid tree_entry_list in prime_cache_tree_rec()
Switch "read_tree_recursive()" over to tree-walk functionality
Make "tree_entry" have a SHA1 instead of a union of object pointers
Make "struct tree" contain the pointer to the tree buffer
* sp/reflog:
fetch.c: do not pass uninitialized lock to unlock_ref().
Test that git-branch -l works.
Verify git-commit provides a reflog message.
Enable ref log creation in git checkout -b.
Create/delete branch ref logs.
Include ref log detail in commit, reset, etc.
Change order of -m option to update-ref.
Correct force_write bug in refs.c
Change 'master@noon' syntax to 'master@{noon}'.
Log ref updates made by fetch.
Force writing ref if it doesn't exist.
Added logs/ directory to repository layout.
General ref log reading improvements.
Fix ref log parsing so it works properly.
Support 'master@2 hours ago' syntax
Log ref updates to logs/refs/<ref>
Convert update-ref to use ref_lock API.
Improve abstraction of ref lock/write.
read-tree --reset: update working tree file for conflicted paths.
The earlier "git reset --hard" simplification stopped removing
leftover working tree files from a failed automerge, when
switching back to the HEAD version that does not have the
paths.
This patch, instead of removing the unmerged paths from the
index, drops them down to stage#0 but marks them with mode=0
(the same "to be deleted" marker we internally use for paths
deleted by the merge). one_way_merge() function and the
functions it calls already know what to do with them -- if the
tree we are reading has the path the working tree file is
overwritten, and if it doesn't the working tree file is
removed.
gitk: Show branch name(s) as well, if "show nearby tags" is enabled
This is a small extension to the code that reads the complete commit
graph, to make it compute descendent heads as well as descendent tags.
We don't exclude descendent heads that are descendents of other
descendent heads as we do for tags, since it is useful to know all the
branches that a commit is on.
This adds a feature to the diff display window where it will show
the tags that this commit follows (is a descendent of) and precedes
(is an ancestor of). Specifically, it will show the tags for all
tagged descendents that are not a descendent of another tagged
descendent of this commit, and the tags for all tagged ancestors
that are not ancestors of another tagged ancestor of this commit.
To do this, gitk reads the complete commit graph using git rev-list
and performs a couple of traversals of the tree. This is done in
the background, but since it can be time-consuming, there is an option
to turn it off in the `edit preferences' window.
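
The "follows" half of the rule can be sketched in Python as below. gitk itself is written in Tcl and surely computes this more efficiently. The function names and the toy graph are made up, and the "precedes" half is the same computation run over a child map instead of a parent map.

```python
def ancestors(commit, parents):
    """All commits reachable from commit through parent links."""
    seen = set()
    stack = list(parents.get(commit, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def follows_tags(commit, parents, tagged):
    """Tagged ancestors that are not ancestors of another tagged ancestor."""
    candidates = ancestors(commit, parents) & tagged
    return {
        t for t in candidates
        if not any(t in ancestors(u, parents) for u in candidates if u != t)
    }


# Toy history: c has parents a and t2; both lead back to t1, then root.
parents = {"c": ["a", "t2"], "a": ["t1"], "t2": ["t1"], "t1": ["root"], "root": []}
tagged = {"t1", "t2", "root"}
nearest = follows_tags("c", parents, tagged)
```

Here t1 is reachable directly through a, but it is dropped because it is also an ancestor of the tagged commit t2.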
fetch.c: do not call process_tree() from process_tree().
This function reads a freshly fetched tree object, and schedules
the objects pointed to by it for further fetching, so doing
lookup_tree() and process_tree() recursively from there does not
make much sense. We need to use process() on it to make sure we
fetch it first, and leave the recursive processing to later
stages.
When adding packs, skip the pack if we already have it in the packed_git
list. This might happen if we are re-preparing our packs because of a
missing object.
This patch causes read_sha1_file and sha1_object_info to re-examine the
list of packs if an object cannot be found. It works by re-running
prepare_packed_git() after an object fails to be found.
It does not attempt to clean up the old pack list. Old packs which are in
use can continue to be used (until unused by lru selection). New packs
are placed at the front of the list and will thus be examined before old
packs.
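
The retry and the front-of-list insertion can be sketched as below. This is an illustration in Python with hypothetical names. The scan callable stands in for looking at the pack directory on disk.

```python
class PackList:
    """Toy model of a pack list that is re-examined on a lookup miss."""

    def __init__(self, scan):
        self.scan = scan  # callable returning {pack_name: set_of_object_ids}
        self.packs = []   # (name, objects) pairs, searched front to back

    def prepare(self):
        known = {name for name, _ in self.packs}
        for name, objects in self.scan().items():
            if name in known:
                continue  # already in the list: skip it
            self.packs.insert(0, (name, objects))  # new packs go first

    def _lookup(self, oid):
        for name, objects in self.packs:
            if oid in objects:
                return name
        return None

    def find(self, oid):
        name = self._lookup(oid)
        if name is None:
            self.prepare()  # a pack may have appeared since the last scan
            name = self._lookup(oid)
        return name


on_disk = {"pack-a": {"obj1"}}
pl = PackList(lambda: dict(on_disk))
pl.prepare()
first = pl.find("obj1")

on_disk["pack-b"] = {"obj2"}  # e.g. a repack or fetch added a pack
second = pl.find("obj2")
order = [name for name, _ in pl.packs]
```

Old entries are never removed in this sketch either, matching the note above that the old pack list is not cleaned up.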
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>