This introduces an API to plug custom filters to an input stream.
The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream(). After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.
This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.
streaming_write_entry(): use streaming API in write_entry()
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.
The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbound size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.
Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and let the streaming_write_entry() read the filtered
result. But that is outside the scope of this series.
streaming: a new API to read from the object store
Given an object name, use open_istream() to get a git_istream handle
that you can read_istream() from as if you are using read(2) to read
the contents of the object, and close it with close_istream() when
you are done.
Currently, we do not do anything fancy--it just calls read_sha1_file()
and keeps the contents in memory as a whole, and carve it out as you
request with read_istream().
sha1_object_info_extended(): hint about objects in delta-base cache
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us a quick access if we read_sha1_file() it right now,
which is a piece of useful information.
sha1_object_info_extended(): expose a bit more info
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked). The new interface wraps its implementation and exposes a bit
more pieces of information that the interface used to discard, namely:
- where the object is stored (loose? cached? packed?)
- if packed, where in which packfile?
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* In the earlier round, this used u.pack.delta to record the length of
the delta chain, but the caller is not necessarily interested in the
length of the delta chain per-se, but may only want to know if it is a
delta against another object or is stored as a deflated data. Calling
packed_object_info_detail() involves walking the reverse index chain to
compute the store size of the object and is unnecessarily expensive.
We could resurrect the code if a new caller wants to know, but I doubt
it.
Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming
* jc/convert:
convert: make it harder to screw up adding a conversion attribute
convert: make it safer to add conversion attributes
convert: give saner names to crlf/eol variables, types and functions
convert: rename the "eol" global variable to "core_eol"
* jc/bigfile:
Bigfile: teach "git add" to send a large file straight to a pack
index_fd(): split into two helper functions
index_fd(): turn write_object and format_check arguments into one flag
* jc/replacing:
read_sha1_file(): allow selective bypassing of replacement mechanism
inline lookup_replace_object() calls
read_sha1_file(): get rid of read_sha1_file_repl() madness
t6050: make sure we test not just commit replacement
Declare lookup_replace_object() in cache.h, not in commit.h
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28),
the extra parameter added in f2e872aa (Work around EMFILE when there are
too many pack files, 2010-11-01) is not used anymore.
Remove it.
Signed-off-by: Junio C Hamano <gitster@pobox.com> Acked-by: Shawn O. Pearce <spearce@spearce.org>
read_sha1_file(): allow selective bypassing of replacement mechanism
The way "object replacement" mechanism was tucked to the read_sha1_file()
interface was suboptimal in a couple of ways:
- Callers that want it to die with useful diagnosis upon seeing a corrupt
object does not have a way to say that they do not want any object
replacement.
- Callers who do not want it to die but want to handle the errors
themselves are told to arrange to call read_object(), but the function
does not use the replacement mechanism, and also it is a file scope
static function that not many callers can call to begin with.
This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask
for object replacement mechanism to kick in.
Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".
In a repository without object replacement, lookup_replace_object() should
be a no-op. Check the flag "read_replace_refs" on the side of the caller,
and bypess a function call when we know we are not dealing with replacement.
Also, even when we are set up to replace objects, if we do not find any
replacement defined, flip that flag off to avoid function call overhead
for all the later object accesses.
As this change the semantics of the flag from "do we need read the
replacement definition?" to "do we need to check with the lookup table?"
the flag needs to be renamed later to something saner, e.g. "use_replace",
when the codebase is calmer, but not now.
read_sha1_file(): get rid of read_sha1_file_repl() madness
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is. Worse yet, no sane
interface to return the underlying object without replacement is provided.
Remove the function and make only the few callers that want the name of
the replacement object find it themselves.
add, merge, diff: do not use strcasecmp to compare config variable names
The config machinery already makes section and variable names
lowercase when parsing them, so using strcasecmp for comparison just
feels wasteful. No noticeable change intended.
Noticed-by: Jay Soffian <jaysoffian@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bigfile: teach "git add" to send a large file straight to a pack
When adding a new content to the repository, we have always slurped
the blob in its entirety in-core first, and computed the object name
and compressed it into a loose object file. Handling large binary
files (e.g. video and audio asset for games) has been problematic
because of this design.
At the middle level of "git add" callchain is an internal API
index_fd() that takes an open file descriptor to read from the
working tree file being added with its size. Teach it to call out to
fast-import when adding a large blob.
The write-out codepath in entry.c::write_entry() should be taught to
stream, instead of reading everything in core. This should not be so
hard to implement, especially if we limit ourselves only to loose
object files and non-delta representation in packfiles.
* jh/dirstat-lines:
Mark dirstat error messages for translation
Improve error handling when parsing dirstat parameters
New --dirstat=lines mode, doing dirstat analysis based on diffstat
Allow specifying --dirstat cut-off percentage as a floating point number
Add config variable for specifying default --dirstat behavior
Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
Make --dirstat=0 output directories that contribute < 0.1% of changes
Add several testcases for --dirstat and friends
* jn/setup-revisions-glob-and-friends-passthru:
revisions: allow --glob and friends in parse_options-enabled commands
revisions: split out handle_revision_pseudo_opt function
Merge branch 'jc/fix-diff-files-unmerged' into maint
* jc/fix-diff-files-unmerged:
diff-files: show unmerged entries correctly
diff: remove often unused parameters from diff_unmerge()
diff.c: return filepair from diff_unmerge()
test: use $_z40 from test-lib
* vh/config-interactive-singlekey-doc:
git-reset.txt: better docs for '--patch'
git-checkout.txt: better docs for '--patch'
git-stash.txt: better docs for '--patch'
git-add.txt: document 'interactive.singlekey'
config.txt: 'interactive.singlekey; is used by...
* jc/maint-add-p-overlapping-hunks:
t3701: add-p-fix makes the last test to pass
"add -p": work-around an old laziness that does not coalesce hunks
add--interactive.perl: factor out repeated --recount option
t3701: Editing a split hunk in an "add -p" session
add -p: 'q' should really quit
* dm/http-cleanup:
t5541-http-push: add test for chunked
http-push: refactor curl_easy_setup madness
http-push: use const for strings in signatures
http: make curl callbacks match contracts from curl header
* jn/ctags:
gitweb: Mark matched 'ctag' / contents tag (?by_tag=foo)
gitweb: Change the way "content tags" ('ctags') are handled
gitweb: Restructure projects list generation
Do not strip empty lines / trailing spaces from a commit message template
Templates should be just that: A form that the user fills out, and forms
have blanks. If people are attached to not having extra whitespace in the
editor, they can simply clean up their templates.
Added test with editor adding even more whitespace.
Signed-off-by: Boris Faure <billiob@gmail.com>
Based-on-patch-by:Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert: make it harder to screw up adding a conversion attribute
The current internal API requires the callers of setup_convert_check() to
supply the git_attr_check structures (hence they need to know how many to
allocate), but they grab the same set of attributes for given path.
Define a new convert_attrs() API that fills a higher level information that
the callers (convert_to_git and convert_to_working_tree) really want, and
move the common code to interact with the attributes system to it.
convert: make it safer to add conversion attributes
The places that need to pass an array of "struct git_attr_check" needed to
be careful to pass a large enough array and know what index each element
lied. Make it safer and easier to code these.
Besides, the hard-coded sequence of initializing various attributes was
too ugly after we gained more than a few attributes.
convert: give saner names to crlf/eol variables, types and functions
Back when the conversion was only about the end-of-line convention, it
might have made sense to call what we do upon seeing CR/LF simply an
"action", but these days the conversion routines do a lot more than just
tweaking the line ending. Raname "action" to "crlf_action".
The function that decides what end of line conversion to use on the output
codepath was called "determine_output_conversion", as if there is no other
kind of output conversion. Rename it to "output_eol"; it is a function
that returns what EOL convention is to be used.
A function that decides what "crlf_action" needs to be used on the input
codepath, given what conversion attribute is set to the path and global
end-of-line convention, was called "determine_action". Rename it to
"input_crlf_action".
convert: rename the "eol" global variable to "core_eol"
Yes, it is clear that "eol" wants to mean some sort of end-of-line thing,
but as the name of a global variable, it is way too short to describe what
kind of end-of-line thing it wants to represent. Besides, there are many
codepaths that want to use their own local "char *eol" variable to point
at the end of the current line they are processing.
This global variable holds what we read from core.eol configuration
variable. Name it as such.
Split out the case where we do not know the size of the input (hence we
read everything into a strbuf before doing anything) to index_pipe(), and
the other case where we mmap or read the whole data to index_bulk().
Kacper Kornet noticed that a $variable in "word" in the above construct is
not substituted by his pdksh. Modern POSIX compliant shells (e.g. dash,
ksh, bash) all seem to interpret POSIX "2.6.2 Parameter Expansion" that
says "word shall be subjected to tilde expansion, parameter expansion,
command substitution, and arithmetic expansion" in ${parameter<op>word},
to mean that the word is expanded as if it appeared in dq pairs, so if the
word were "'$variable'" (sans dq) it would expand to a single quote, the
value of the $variable and then a single quote.
Johannes Sixt reports that the behavior of quoting at the right of :- when
the ${...:-...} expansion appears in double-quotes was debated recently at
length at the Austin group. We can avoid this issue and future-proof the
test by a slight rewrite.
Helped-by: Johannes Sixt <j.sixt@viscovery.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove gitweb/gitweb.cgi and other legacy targets from main Makefile
Now that there is gitweb/Makefile, let's leave only "gitweb" and
"install-gitweb" targets in main Makefile. Those targets just
delegate to gitweb's Makefile.
Requested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c0cb4ed (git-instaweb: Configure it to work with new gitweb
structure, 2010-05-28) git-instaweb does not re-create gitweb.cgi
etc., but makes use of installed gitweb. Therefore simplify
git-instaweb dependency on gitweb subsystem in main Makefile from
'gitweb/gitweb.cgi gitweb/static/gitweb.css gitweb/static/gitweb.js'
to simply 'gitweb'.
This is preparation for splitting gitweb.perl script, and for
splitting gitweb.js (to be reassembled / combined on build). This way
we don't have to duplicate parts of gitweb/Makefile in main
Makefile... it is also more correct description of git-instaweb
dependency.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: teach verify_parents to check for extra parents
Currently verify_parents only makes sure that the earlier parents of
HEAD match the commits given, and does not care if there are more
parents. This makes it harder than one would like to check that, for
example, parent reduction works correctly when making an octopus.
Fix it by checking that HEAD^(n+1) is not a valid commit name.
Noticed while working on a new test that was supposed to create a
fast-forward one commit ahead but actually created a merge.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge: make branch.<name>.mergeoptions correctly override merge.<option>
The parsing of the additional command line parameters supplied to
the branch.<name>.mergeoptions configuration variable was implemented
at the wrong stage. If any merge-related variable came after we read
branch.<name>.mergeoptions, the earlier value was overwritten.
We should first read all the merge.* configuration, override them by
reading from branch.<name>.mergeoptions and then finally read from
the command line.
This patch should fix it, even though I now strongly suspect that
branch.<name>.mergeoptions that gives a single command line that
needs to be parsed was likely to be an ill-conceived idea to begin
with. Sigh...
Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: eliminate unnecessary setup test assertions
Most of git's tests write files and define shell functions and
variables that will last throughout a test script at the top of
the script, before all test assertions:
. ./test-lib.sh
VAR='some value'
export VAR
>empty
fn () {
do something
}
test_expect_success 'setup' '
... nontrivial commands go here ...
'
Two scripts use a different style with this kind of trivial code
enclosed by a test assertion; fix them. The usual style is easier to
read since there is less indentation to keep track of and no need to
worry about nested quotes; and on the other hand, because the commands
in question are trivial, it should not make the test suite any worse
at catching future bugs in git.
While at it, make some other small tweaks:
- spell function definitions with a space before () for consistency
with other scripts;
- use the self-contained command "git mktree </dev/null" in
preference to "git write-tree" which looks at the index when
writing an empty tree.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-diff-files-unmerged:
diff-files: show unmerged entries correctly
diff: remove often unused parameters from diff_unmerge()
diff.c: return filepair from diff_unmerge()
test: use $_z40 from test-lib
* nd/struct-pathspec:
pathspec: rename per-item field has_wildcard to use_wildcard
Improve tree_entry_interesting() handling code
Convert read_tree{,_recursive} to support struct pathspec
Reimplement read_tree_recursive() using tree_entry_interesting()
While refactoring the options parser in bc3c79a (fast-import: add
(non-)relative-marks feature, 2009-12-04), it was made too lenient
for options that take no argument, fix that.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 09c9957 (send-pack: avoid deadlock when pack-object
dies early, 2011-04-25) attempted to fix a hang in the
stateless rpc case by closing a file descriptor early, but
we still need that descriptor.
Basically the deadlock can happen when pack-objects fails,
and the descriptor to upstream is left open. We never send
the pack, so the upstream is left waiting for us to say
something, and we are left waiting for upstream to close the
connection.
In the non-rpc case, our descriptor points straight to the
upstream. We hand it off to run-command, which takes
ownership and closes the descriptor after pack-objects
finishes (whether it succeeds or not).
Commit 09c9957 tried to emulate that in the rpc case. That
isn't right, though. We actually have a descriptor going
back to the remote-helper, and we need to keep using it
after pack-objects is finished. Closing it early completely
breaks pushing via smart-http.
We still need to do something on error to signal the
remote-helper that we won't be sending any pack data
(otherwise we get the deadlock). In an ideal world, we
would send a special packet back that says "Sorry, there was
an error". But the remote-helper doesn't understand any such
packet, so the best we can do is close the descriptor and
let it report that we hung up unexpectedly.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Describe '-p' as a short form of '--patch' in synopsis. Also include a better
explanation of this option and additionally refer the reader to the patch mode
description of git-add documentation.
Helped-by: Jeff King <peff@peff.net> Mentored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>