Tree reloading allows fast-import to swap out the least-recently used
branch by simply deallocating the data structures from memory that
were associated with that branch. Later if the branch becomes active
again it can lazily recreate those structures on demand by reloading
the necessary trees from the pack file it originally wrote them to.
The reloading process is implemented by mmap'ing the pack into
memory and using a much tighter variant of the pack reading code
contained in sha1_file.c. This was a blatant copy from sha1_file.c
but the unpacking functions were significantly simplified and are
actually now in a form that should make it easier to map only the
necessary regions of a pack rather than the entire file.
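
The swap-out and lazy-reload idea can be sketched roughly as follows. This is an illustration in Python, not the fast-import code: the class name, the load_tree callback (standing in for "reload the trees from the pack") and the load_count counter are all assumptions made for the example.

```python
from collections import OrderedDict


class BranchCache:
    """Keep at most max_active branch trees in memory (hypothetical names)."""

    def __init__(self, max_active, load_tree):
        self.max_active = max_active
        self.load_tree = load_tree       # stand-in for reloading trees from the pack
        self.active = OrderedDict()      # branch name -> in-memory tree, oldest first
        self.load_count = 0              # how many times a branch had to be loaded

    def get(self, name):
        if name in self.active:
            self.active.move_to_end(name)    # mark as most recently used
            return self.active[name]
        if len(self.active) >= self.max_active:
            self.active.popitem(last=False)  # drop the least-recently used branch
        tree = self.load_tree(name)          # lazily recreate the structures
        self.load_count += 1
        self.active[name] = tree
        return tree
```

With a limit of two active branches, touching a, b, a, c evicts b, and a later access to b has to load it again.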
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Tags received from the frontend are generated in memory in a simple
linked list in the order that the tag commands were sent by the
frontend. If multiple different tag objects for the same tag name
get generated the last one sent by the frontend will be the one
that gets written out at termination. Multiple tag objects for
the same name will cause all older tags of the same name to be lost.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the branch load count exceeds the number of branches created then
the frontend is causing fast-import to page branches into and out of
memory due to the way it's ordering its commits. Performance can
likely be increased if the frontend were to alter its commit
sequence such that it stays on one branch before switching to another
branch, then never returns to the prior branch.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Marks are now saved when the mark directive gets used by the frontend
and may be used in place of a SHA1 expression to locate a previous
SHA1 which fast-import may have generated. This is particularly
useful with commits where the frontend does not (easily) have the
ability to compute the SHA1 for an arbitrary commit but needs it
to generate a branch or tag from that commit.
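
A minimal sketch of a mark table, assuming marks are small integers and that a reference to one is spelled ":<n>" (the spelling later fast-import documentation uses; this message does not state it). The class and method names are hypothetical.

```python
class MarkTable:
    """Map frontend-chosen mark numbers to SHA1s generated earlier (illustrative)."""

    def __init__(self):
        self._marks = {}

    def set(self, idnum, sha1):
        # Called when the frontend attaches a mark to an object it just sent.
        self._marks[idnum] = sha1

    def resolve(self, ref):
        """Return a SHA1 for either a ':<n>' mark reference or a literal SHA1."""
        if ref.startswith(":"):
            idnum = int(ref[1:])
            if idnum not in self._marks:
                raise KeyError(f"mark :{idnum} not declared")
            return self._marks[idnum]
        return ref
```

The frontend can then name a commit by its mark when creating a branch or tag, without ever computing the commit's SHA1 itself.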
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Converted fast-import to accept standard command line parameters.
The following command line options are now accepted before the
pack name:
  --objects=n          # replaces the object count after the pack name
  --depth=n            # delta chain depth to use (default is 10)
  --active-branches=n  # maximum number of branches to keep in memory
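
For illustration only, the same option layout expressed with Python's argparse. The program name and the None defaults for --objects and --active-branches are assumptions; only the default depth of 10 comes from the list above.

```python
import argparse


def parse_args(argv):
    """Parse options that precede the pack name (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="fast-import-sketch")
    parser.add_argument("--objects", type=int, default=None, metavar="n")
    parser.add_argument("--depth", type=int, default=10, metavar="n")
    parser.add_argument("--active-branches", type=int, default=None, metavar="n")
    parser.add_argument("pack_name")
    return parser.parse_args(argv)
```

For example, parse_args(["--depth=20", "out.pack"]) yields a depth of 20 and a pack name of "out.pack".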
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fixed segfault in fast-import after growing a tree.
Growing a tree caused all subtrees to be deallocated and put back
into the free list yet those subtrees' contents were still actively
in use. Consequently they were doled out again and got stomped
on elsewhere. Releasing a tree is now performed in two parts,
either releasing only the content array or releasing the content
array and recursively releasing the subtree(s).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a frontend is smart enough to import a symlink then we should
let them do so. We'll assume that they were smart enough to first
generate a blob to hold the link target, as that's how symlinks
get represented in GIT.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Frontend clients can now send a text stream to fast-import rather
than a binary stream. This should facilitate developing frontend
software as the data stream is easier to view, manipulate and debug
by hand and Mark-I eyeball.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When accepting revision SHA1 IDs from the frontend verify the SHA1
actually refers to a blob and is known to exist. It's an error
to use a SHA1 in a tree if the blob doesn't exist as this would
cause git-fsck-objects to report a missing blob should the pack get
closed without the blob being appended into it or a subsequent pack.
So right now we'll just ask that the frontend "pre-declare" any
blobs it wants to use in a tree before it can use them.
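
The pre-declaration rule amounts to a lookup before a SHA1 is accepted into a tree. A rough sketch, with a hypothetical ObjectTable class and error messages that are not taken from fast-import:

```python
class ObjectTable:
    """Remember which objects the frontend has declared so far (illustrative)."""

    def __init__(self):
        self._types = {}   # SHA1 hex string -> object type name

    def declare(self, sha1, obj_type):
        self._types[sha1] = obj_type

    def require_blob(self, sha1):
        """Return sha1 if it names a known blob, otherwise raise ValueError."""
        obj_type = self._types.get(sha1)
        if obj_type is None:
            raise ValueError(f"blob not found: {sha1}")
        if obj_type != "blob":
            raise ValueError(f"not a blob (actually a {obj_type}): {sha1}")
        return sha1
```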
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The tree of the current commit can be altered by file_change commands
before the commit gets written to the pack. The file changes are
rather primitive as they simply allow removal of a tree entry or
setting/adding a tree entry.
Currently trees and commits aren't being deltafied when written to
the pack and branch reloading from the current pack doesn't work,
so at most 5 branches can be worked with at any one time.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Implemented branch handling and basic tree support in fast-import.
This provides the basic data structures needed to store trees in
memory while we are processing them for a branch. What we are
attempting to do is track one complete tree for each branch that
the frontend has registered with us through the 'newb' (new_branch)
command. When the frontend edits that tree through 'updf' or 'delf'
commands we'll mark the affected tree(s) as being dirty and recompute
their objects during 'comt' (commit).
Currently the protocol is decidedly _not_ user friendly. I crashed
fast-import by giving it bad input data from Perl. I may try to
improve upon it, or at least upon its error handling.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactored fast-import's internals for future additions.
Too many global variables were being used and not enough
code was reusable to process trees and commits so this is
a simple refactoring of the existing blob processing code
to get into a state that will be easier to handle trees
and commits in.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Cleaned up memory allocation for object_entry structs.
Although it's easy to ask the user to tell us how many objects they
will need, it's probably better to dynamically grow the object table
in large units. But if the user can give us a hint as to roughly
how many objects then we can still use it during startup.
Also stopped printing the SHA1 strings to stdout as no user is
currently making use of that facility.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparison.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Split the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
(1 << i) < hspace is compared in the `int` space rather than in the
unsigned one. The result will be wrong if hspace is between 0x40000000
and 0x80000000.
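
The effect can be reproduced by emulating 32-bit arithmetic in Python. This is only a model: it assumes a two's-complement wrap for 1 << 31 (in C, shifting into the sign bit is not well defined), and the helper names are made up for the example.

```python
def as_int32(value):
    """Wrap an integer the way a 32-bit two's-complement int would hold it."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def as_uint32(value):
    """Wrap an integer the way a 32-bit unsigned int would hold it."""
    return value & 0xFFFFFFFF


def first_shift_not_below(hspace, wrap, limit=32):
    """Smallest i in [0, limit) where wrap(1 << i) < hspace is false, else None."""
    for i in range(limit):
        if not (wrap(1 << i) < hspace):
            return i
    return None
```

For hspace = 0x50000000 the unsigned comparison stops at i = 31, while the signed one sees 1 << 31 as a negative number and never stops within 32 bits.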
Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
git-send-email: Don't set author_not_sender from Cc: lines
When an mbox-style patch contains a Cc: line in the header,
git-send-email will check the address against the sender specified
on the command line. If they don't match, author_not_sender will
be set to the address obtained from the Cc line.
When this happens, git-send-email inserts a From: line at the
beginning of the message body with the address obtained from the
Cc line in the header, and the sender might be accused of forging
patch authors.
This patch fixes this by only updating author_not_sender when
processing From: lines, not when processing Cc: lines.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Verify we know how to read a pack before trying to use it.
If the pack format were to ever change or be extended in the future
there is no assurance that just because the pack file lives in
objects/pack and doesn't end in .idx that we can read and decompress
its contents properly.
If we encounter what we think is a pack file and it isn't or we don't
recognize its version then die and suggest to the user that they
upgrade to a newer version of GIT which can handle that pack file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
The little helper write_or_die() won't come back with bad news about
full disks or broken pipes. It either succeeds or terminates the
program, making additional error handling unnecessary.
This patch adds the new function and uses it to replace two similar
ones (the one in tar-tree originally has been copied from cat-file
btw.). I chose to add the fd parameter which both lacked to make
write_or_die() just as flexible as write() and thus suitable for
lib-ification.
There is a regression: error messages emitted by this function don't
show the program name, while the replaced two functions did. That's
acceptable, I think; a lot of other functions do the same.
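
The contract (succeed or terminate, with the fd passed in) might look like this in Python. It is a sketch of the idea, not a port of the C function: the error text and the exit status of 1 are assumptions.

```python
import os
import sys


def write_or_die(fd, data):
    """Write all of data to fd, or terminate; failure is never reported to the caller."""
    view = memoryview(data)
    while len(view) > 0:
        try:
            written = os.write(fd, view)   # may write fewer bytes than requested
        except OSError as err:
            sys.stderr.write(f"fatal: write error ({err.strerror})\n")
            sys.exit(1)
        view = view[written:]
```

Callers simply invoke write_or_die(fd, buf) and carry on; there is no return value to check.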
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
In the name of Standardization, this cleanses the last usage string of
mystical creatures. But they still dwell deep within the source and in
some debug messages, it is said.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch avoids problems if vc-git.el is installed and activated, but
the git executable is not available, for example
http://list-archive.xemacs.org/xemacs-beta/200608/msg00062.html
Signed-off-by: Ville Skyttä <scop@xemacs.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/racy:
Remove the "delay writing to avoid runtime penalty of racy-git avoidance"
Add check program "git-check-racy"
Documentation/technical/racy-git.txt
avoid nanosleep(2)
It is now possible for a project to have individual clone/fetch URLs.
They are provided in a new file 'cloneurl' added below the project's
$GIT_DIR directory.
If there is no cloneurl file, the concatenation of the git base URLs
with the project name is used.
This is a merge of Jakub Narebski and David Rientjes
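
The lookup order can be sketched as below. This is Python rather than gitweb's Perl, the function and parameter names are hypothetical, and it assumes one URL per line in the cloneurl file and a "/" between base URL and project name.

```python
import os


def project_clone_urls(git_dir, project, base_urls):
    """Return clone/fetch URLs for a project (illustrative sketch)."""
    path = os.path.join(git_dir, "cloneurl")
    try:
        with open(path, encoding="utf-8") as fh:
            # Assumption: one URL per line, blank lines ignored.
            return [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        # No cloneurl file: fall back to "<base URL>/<project name>".
        return [base.rstrip("/") + "/" + project for base in base_urls]
```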
gitweb: Show project's git URL on summary page
with Aneesh Kumar
gitweb: Add support for cloneurl.
gitweb: Support multiple clone urls
patches.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Use the href() function instead of string concatenation to generate
most URLs to our own CGI.
This is a work in progress, not everything has been converted yet.
Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: provide function to format the URL for an action link.
Provide a new function which can be used to generate a URL for the CGI.
This makes it possible to consolidate the URL generation in order to make
it easier to change the encoding of actions into URLs.
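
The benefit of one URL-building function is that the encoding lives in a single place. A rough Python analogue follows; gitweb itself is Perl, and the parameter names and the "&" separator here are assumptions, not gitweb's actual encoding.

```python
from urllib.parse import urlencode


def href(base_url, **params):
    """Build a CGI URL from named parameters, skipping those set to None."""
    ordered = [(key, value) for key, value in params.items() if value is not None]
    if not ordered:
        return base_url
    return base_url + "?" + urlencode(ordered)
```

A call such as href("/gitweb.cgi", p="git.git", a="summary") replaces hand-written string concatenation at each call site.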
Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
finish_connect(): thinkofix
git-mv: succeed even if source is a prefix of destination
Solaris does not support C99 format strings before version 10
All callers but one have ignored the return value from this
function, but that one caller, builtin-tar-tree.c::remote_tar(),
assumed it returns non-zero on failure and zero on success. The
implementation however was returning either the waited pid
(which must be the same as its input) or -1 (an error).
Fix this thinko, while getting rid of an assignment of return
value from waitpid() into a variable of type int.
On Solaris nanosleep(2) is not available in libc; you need to
link with -lrt to get it.
The purpose of the loop is to wait until the next filesystem
timestamp granularity, and the code uses subsecond sleep in the
hope that it can shorten the delay to 0.5 seconds on average
instead of a full second. It is probably not worth depending on
an extra library for this.
We might want to yank out the whole "racy-git avoidance is
costly later at runtime, so let's delay writing the index out"
codepath later, but that is a separate issue and needs some
testing on large trees to figure it out. After playing with the
kernel tree, I have a feeling that the whole thing may not be
worth it.
git-apply --binary: clean up and prepare for --reverse
This cleans up the implementation of "git-apply --binary", and
implements reverse application of binary patches (when git-diff
is converted to emit reversible binary patches).
Earlier, the types of encoding (either deflated literal or
deflated delta) were stored in is_binary field in struct patch,
which meant that we cannot store more than one fragment that
differ in the encoding for a patch. This moves the information
to a field in struct fragment that is otherwise unused for
binary patches, and makes it possible to hang two (or more, but
two is enough) hunks for a binary patch.
The original "binary patch" output from git-diff is internally
parsed into an "is_binary" patch with one fragment. Upcoming
reversible binary patch output will have two fragments, the
first one being the forward patch and the second one the reverse
patch.
On Solaris and the BSDs the definition of "struct sockaddr_storage"
is not available from "netinet/in.h". On Solaris "sys/socket.h" is
enough, at least OpenBSD needs "sys/types.h", too.
Using "sys/types.h" and "sys/socket.h" seems to be a more portable
way.
Signed-off-by: Dennis Stosberg <dennis@stosberg.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
I've always found it difficult to figure out the git URL to clone from
a gitweb URL because git:// and http:// differ on many sites,
including kernel.org.
I found this enhancement at http://dev.laptop.org/git when I was on
the git channel, and thought that it'd be nice if all public gitweb
sites showed their git URL on their pages.
This patch allows us to change the home link string. The current
default is "projects" as we all see on gitweb now.
e.g. kernel.org might set this variable to "git://git.kernel.org/pub/scm/"
Signed-off-by: Yasushi SHOJI <yashi@atmark-techno.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Separate printing difftree in git_commit into git_difftree_body
Separate printing difftree in git_commit into separate
git_difftree_body subroutine. Add support for "C" (copied) status. For
"M" and "C" add parameter 'fp' (filename parent) to the "diff" link;
currently not supported by git_blobdiff ("blobdiff" action).
Reindented, realigned, added comments.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Change appearance of marker of refs pointing to given object
Change git_get_references to include type of ref in the %refs value, which
means putting everything after 'refs/' as a ref name, not only last
part of the name. Instead of separating refs pointing to the same
object by " / " separator, use anonymous array reference to store all
refs pointing to given object.
Use 'git-ls-remote .' if $projectroot/$project/info/refs does not
exist. (Perhaps it should be used always.)
Refs are now in separate span elements. Class is dependent on the ref
type: currently known classes are 'tag', 'head', 'remote', and 'ref'
(last one for HEAD and other refs in the main directory). There is
encompassing span element of class refs, just in case of unknown ref
type.
This might be considered a cleaner separation: git_get_references
only fills the %refs hash and takes no part in formatting the ref marker.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Separate main part of git_history into git_history_body
Separates main part of git_history into git_history_body subroutine,
and makes output more similar to git_shortlog. Adds "diff to current"
link only for history of regular file (blob).
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
gitweb: Separate ref parsing in git_get_refs_list into parse_ref
Note that for each ref there are usually two calls to git subroutines:
first to get the type of ref, second to parse ref if ref is of commit
or tag type.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
Rename some subroutines to better reflect what they do.
Some renames were not performed because subroutine name
reflects hash key.
Subroutine naming guidelines:
* git_ prefix for subroutines related to git commands,
git repository, or to gitweb actions
* git_get_ prefix for inner subroutines calling git command
or reading some file in the repository and returning some output
* parse_ prefix for subroutines parsing some text (or reading and
parsing some text) into hash or list
* format_ prefix for subroutines formatting, post-processing
or generating some HTML/text fragment
* _get_ infix for subroutines which return result
* _print_ infix for subroutines which print fragment of output
* _body suffix for subroutines which output main part (body)
of related action (usually table)
* _nav suffix for subroutines related to navigation bars
* _div suffix for subroutines returning or printing div element
* subroutine names should not be based on how the result is obtained,
as this might change easily
Add a newline before appending "Signed-off-by: " line
When the last line of the commit log message does not end with
"^[-A-Za-z]+: [^@]+@", append a newline after it to separate
the body of the commit log message from the run of sign-off and
ack lines. e.g. "Signed-off-by: A U Thor <au.thor@example.com>" or
"Acked-by: Me <myself@example.org>".
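
A sketch of the rule in Python, using the pattern quoted above. The function name is hypothetical, and stripping trailing newlines before the check is an assumption of this example.

```python
import re

# Pattern from the message above: a "Key: name <user@" style trailer line.
TRAILER_RE = re.compile(r"^[-A-Za-z]+: [^@]+@")


def append_signoff(message, signoff):
    """Append signoff, adding a blank line unless the log already ends in a trailer."""
    body = message.rstrip("\n")
    last_line = body.split("\n")[-1]
    separator = "\n" if TRAILER_RE.match(last_line) else "\n\n"
    return body + separator + signoff + "\n"
```

A message ending in ordinary prose gets a blank line before the sign-off; one already ending in an "Acked-by:" line gets the sign-off directly underneath.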
sample commit-msg hook: no silent exit on duplicate Signed-off-by lines
git-commit would silently exit if duplicate Signed-off-by
lines were found. Users of git-commit would not know it,
unless they checked '$?'. This patch makes git-commit
actually print out a message that nothing was committed
since duplicate Signed-off-by lines were found.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: Junio C Hamano <junkio@cox.net>