ci: exercise the whole test suite with uncommon code in pack-objects
Some recent optimizations have been added to pack-objects to reduce
memory usage and some code paths are split into two: one for common
use cases and one for rare ones. Make sure the rare cases are tested
with Travis since it requires manual test configuration that is
unlikely to be done by developers.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: reorder members to shrink struct object_entry
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).
This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).
Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.
See 87po42cwql.fsf@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: shrink delta_size field in struct object_entry
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.
If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: shrink size field in struct object_entry
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.
Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like
make test GIT_TEST_OE_SIZE=4
which forces this code on all objects larger than 3 bytes.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: clarify the use of object_entry::size
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.
"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: don't check size when the object is bad
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.
This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: shrink z_delta_size field in struct object_entry
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: refer to delta objects by index instead of pointer
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.
This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.
If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running
make test GIT_TEST_FULL_IN_PACK_ARRAY=1
to take advantage of other special case tests.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: move in_pack_pos out of struct object_entry
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: use bitfield for object_entry::depth
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: turn type and in_pack_type to bitfields
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects: a bit of document about struct object_entry
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert: add round trip check based on 'core.checkRoundtripEncoding'
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].
Add 'core.checkRoundtripEncoding', which contains a comma separated
list of encodings, to define for what encodings Git should check the
conversion round trip if they are used in the 'working-tree-encoding'
attribute.
Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'.
convert: add tracing for 'working-tree-encoding' attribute
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.
Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git reencodes
the content to a canonical UTF-8 representation. On checkout Git will
reverse this operation.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
utf8: add function to detect a missing UTF-16/32 BOM
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.
The Unicode standard instructs to assume big-endian if there in no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.
utf8: add function to detect prohibited UTF-16/32 BOM
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.
This function is used in a subsequent commit.
[1] http://unicode.org/faq/utf_bom.html#bom10
Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
utf8: teach same_encoding() alternative UTF encoding names
The function same_encoding() could only recognize alternative names for
UTF-8 encodings. Teach it to recognize all kinds of alternative UTF
encoding names (e.g. utf16).
While we are at it, fix a crash that would occur if same_encoding() was
called with a NULL argument and a non-NULL argument.
This function is used in a subsequent commit.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: store the main ref store inside the repository struct
This moves the 'main_ref_store', which was a global variable in refs.c
into the repository struct.
This patch does not deal with the parts in the refs subsystem which deal
with the submodules there. A later patch needs to get rid of the submodule
exposure in the refs API, such as 'get_submodule_ref_store(path)'.
Acked-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object: add repository argument to lookup_replace_object
Add a repository argument to allow callers of lookup_replace_object
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object: add repository argument to do_lookup_replace_object
Add a repository argument to allow the do_lookup_replace_object caller
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object: add repository argument to prepare_replace_object
Add a repository argument to allow the prepare_replace_object caller
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: add repository argument to for_each_replace_ref
Add a repository argument to allow for_each_replace_ref callers to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: add repository argument to get_main_ref_store
Add a repository argument to allow the get_main_ref_store caller
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object: check_replace_refs is safe in multi repo environment
In e1111cef23 (inline lookup_replace_object() calls, 2011-05-15) a shortcut
for checking the object replacement was added by setting check_replace_refs
to 0 once the replacements were evaluated to not exist. This works fine in
with the assumption of only one repository in existence.
The assumption won't hold true any more when we work on multiple instances
of a repository structs (e.g. one struct per submodule), as the first
repository to be inspected may have no replacements and would set the
global variable. Other repositories would then completely omit their
evaluation of replacements.
This reverts back the meaning of the flag `check_replace_refs` of
"Do we need to check with the lookup table?" to "Do we need to read
the replacement definition?", adding the bypassing logic to
lookup_replace_object after the replacement definition was read.
As with the original patch, delay the renaming of the global variable
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replace-object: eliminate replace objects prepared flag
Make the oidmap a pointer.
That way we eliminate the need for the global boolean
variable 'replace_object_prepared' as we can put this information
into the pointer being NULL or not.
Another advantage of this is that we would more quickly catch
code that tries to access replace-map without initializing it.
This also allows the '#include "oidmap.h"' introduced in a previous
patch to be replaced by the forward declaration of 'struct oidmap;'.
Keeping the type opaque discourages circumventing accessor functions;
not dragging in other headers avoids some compile time overhead.
One disadvantage of this is change is performance as we need to
pay the overhead for a malloc. The alternative of moving the
global variable into the object store is less modular code.
Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-store: move lookup_replace_object to replace-object.h
lookup_replace_object is a low-level function that most users of the
object store do not need to use directly.
Move it to replace-object.h to avoid a dependency loop in an upcoming
change to its inline definition that will make use of repository.h.
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The relationship between an object X and another object Y that
replaces the object X is defined only within the scope of a
single repository.
The exception in reachability rule around these replacement objects
is also local to a repository (i.e. if traversal from refs reaches
X, then both X and Y are reachable and need to be kept from gc).
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SubmittingPatches: mention the git contacts command
Instead of just mentioning 'git blame' and 'git shortlog', which make it
quite hard for new contributors to pick out the appropriate list of
people to cc on their patch series, mention the 'git contacts' utility,
which makes it much easier to get a reasonable list of contacts for a
change.
This should help new contributors pick out a reasonable cc list by
simply using a single command.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the mem_pool type which encapsulates all the information necessary to
manage a pool of memory. This change moves the existing variables in
fast-import used to support the global memory pool to use this structure. It
also renames variables that are no longer used by memory pools to reflect their
more scoped usage.
These changes allow for the multiple instances of a memory pool to
exist and be reused outside of fast-import. In a future commit the
mem_pool type will be moved to its own file.
Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is part of a patch series to extract the memory pool logic in
fast-import into a more generalized version. The existing mem_pool type
maps more closely to a "block of memory" (mp_block) in the more
generalized memory pool. This commit renames the mem_pool to mp_block to
reduce churn in future patches.
Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge branch 'svn/authors-prog-2' of git://bogomips.org/git-svn
* 'svn/authors-prog-2' of git://bogomips.org/git-svn:
git-svn: allow empty email-address using authors-prog and authors-file
git-svn: search --authors-prog in PATH too
mingw/msvc: use the new-style RUNTIME_PREFIX helper
This change also allows us to stop overriding argv[0] with the absolute
path of the executable, allowing us to preserve e.g. the case of the
executable's file name.
This fixes https://github.com/git-for-windows/git/issues/1496 partially.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
The RUNTIME_PREFIX feature comes from Git for Windows, but it was
enhanced to allow support for other platforms. While changing the
original idea, the concept was also improved by not forcing argv[0] to
be adjusted.
Let's allow the same for Windows by implementing a helper just as for
the other platforms.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable Git to resolve its own binary location using a variety of
OS-specific and generic methods, including:
- procfs via "/proc/self/exe" (Linux)
- _NSGetExecutablePath (Darwin)
- KERN_PROC_PATHNAME sysctl on BSDs.
- argv0, if absolute (all, including Windows).
This is used to enable RUNTIME_PREFIX support for non-Windows systems,
notably Linux and Darwin. When configured with RUNTIME_PREFIX, Git will
do a best-effort resolution of its executable path and automatically use
this as its "exec_path" for relative helper and data lookups, unless
explicitly overridden.
Small incidental formatting cleanup of "exec_cmd.c".
Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Robbie Iannucci <iannucci@google.com>
Thanks-to: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Broaden the RUNTIME_PREFIX flag to configure Git's Perl scripts to
locate the Git installation's Perl support libraries by resolving
against the script's path, rather than hard-coding that path at
build-time. Hard-coding at build time worked on previous
RUNTIME_PREFIX configurations (i.e., Windows) because the Perl
scripts were run within a virtual filesystem whose paths were
consistent regardless of the location of the actual installation.
This will no longer be the case for non-Windows RUNTIME_PREFIX users.
When enabled, RUNTIME_PREFIX now requires Perl's system paths to be
expressed relative to a common installation directory in the Makefile,
and uses that relationship to locate support files based on the known
starting point of the script being executed, much like RUNTIME_PREFIX
does for the Git binary.
This change enables Git's Perl scripts to work when their Git installation
is relocated or moved to another system, even when they are not in a
virtual filesystem environment.
Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Thanks-to: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the generated Perl script headers are emitted by commands in
the Makefile. This mechanism restricts options to introduce alternative
header content, needed by Perl runtime prefix support, and obscures the
origin of the Perl script header.
Change the Makefile to generate a header by processing a template file and
move the header content into the "perl/" subdirectory. The generated
header content will now be stored in the "GIT-PERL-HEADER" file. This
allows the content of the Perl header to be controlled by changing the path
of the template in the Makefile.
Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Thanks-to: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsmonitor currently only flags the index as dirty if the extension is being
added or removed. This is a performance optimization that recognizes you can
stat() a lot of files in less time than it takes to write out an updated index.
This patch makes a small enhancement and flags the index dirty if we end up
having to stat() all files and scan the entire working directory. The assumption
being that must be expensive or you would not have turned on the feature.
Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The topic appears to inflict severe regression in renaming merges,
even though the promise of it was that it would improve them.
We do not yet know which exact change in the topic was wrong, but in
the meantime, let's play it safe and revert it out of 'master'
before real Git-using projects are harmed.
fsmonitor: fix incorrect buffer size when printing version number
This is a trivial bug fix for passing the incorrect size to snprintf() when
outputting the version. It should be passing the size of the destination buffer
rather than the size of the value being printed.
Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/perf: add scripts to bisect performance regressions
The new bisect_regression script can be used to automatically bisect
performance regressions. It will pass the new bisect_run_script to
`git bisect run`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ab/doc-hash-brokenness:
doc hash-function-transition: clarify what SHAttered means
doc hash-function-transition: clarify how older gits die on NewHash
The commit-graph file provides quick access to commit data, including
the OID of the root tree for each commit in the graph. When performing
a deep commit-graph walk, we may not need to load most of the trees
for these commits.
Delay loading the tree object for a commit loaded from the graph
until requested via get_commit_tree(). Do not lazy-load trees for
commits not in the graph, since that requires duplicate parsing
and the relative peformance improvement when trees are not needed
is small.
On the Linux repository, performance tests were run for the following
command:
git log --graph --oneline -1000
Before: 0.92s
After: 0.66s
Rel %: -28.3%
Adding '-- kernel/' to the command requires loading the root tree
for every commit that is walked. There was no measureable performance
change as a result of this patch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
treewide: replace maybe_tree with accessor methods
In anticipation of making trees load lazily, create a Coccinelle
script (contrib/coccinelle/commit.cocci) to ensure that all
references to the 'maybe_tree' member of struct commit are either
mutations or accesses through get_commit_tree() or
get_commit_tree_oid().
Apply the Coccinelle script to create the rest of the patch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While walking the commit graph, we load struct commit objects into
the object cache. During this process, we also load struct tree
objects for the root tree of each of these commits. We load these
objects even if we are only computing commit reachability information,
such as a merge base or ahead/behind information.
Create get_commit_tree() as a first step to removing direct
references to the 'maybe_tree' member of struct commit.
Create get_commit_tree_oid() as a shortcut for several references
to "&commit->maybe_tree->object.oid" in the codebase.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the commit-graph file to walk commit history removes the large
cost of parsing commits during the walk. This exposes a performance
issue: lookup_tree() takes a large portion of the computation time,
even when Git never uses those trees.
In anticipation of lazy-loading these trees, rename the 'tree' member
of struct commit to 'maybe_tree'. This serves two purposes: it hints
at the future role of possibly being NULL even if the commit has a
valid tree, and it allows for unambiguous transformation from simple
member access (i.e. commit->maybe_tree) to method access.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.
For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph: read only from specific pack-indexes
Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit: integrate commit graph with commit parsing
Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.
If core.commitGraph is false, then do not check graph files.
In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.
By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.
Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps given by loose objects or
previously-missed packfiles.
Also automatically add commits from the existing graph file, if it
exists.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 20d2a30f (Makefile: replace perl/Makefile.PL with simple make rules)
removed a target that allowed Makefiles from contrib/ to get the correct
install path. This introduces a new target for main Makefile and fixes
installation for Mediawiki module.
v2: Pass prefix as that can have influence as well, add single quotes
for _SQ variant.
The mechanism to use parse-options API to automate the command line
completion continues to get extended and polished.
* nd/parseopt-completion-more:
completion: use __gitcomp_builtin in _git_cherry
completion: use __gitcomp_builtin in _git_ls_tree
completion: delete option-only completion commands
completion: add --option completion for most builtin commands
completion: factor out _git_xxx calling code
completion: mention the oldest version we need to support
git.c: add hidden option --list-parseopt-builtins
git.c: move cmd_struct declaration up
Code to find the length to uniquely abbreviate object names based
on packfile content, which is a relatively recent addtion, has been
optimized to use the same fan-out table.
* ds/bsearch-hash:
sha1_name: use bsearch_pack() in unique_in_pack()
sha1_name: use bsearch_pack() for abbreviations
packfile: define and use bsearch_pack()
sha1_name: convert struct min_abbrev_data to object_id
* ws/rebase-p:
rebase: remove merges_option and a blank line
rebase: remove unused code paths from git_rebase__interactive__preserve_merges
rebase: remove unused code paths from git_rebase__interactive
rebase: add and use git_rebase__interactive__preserve_merges
rebase: extract functions out of git_rebase__interactive
rebase: reindent function git_rebase__interactive
rebase: update invocation of rebase dot-sourced scripts
rebase-interactive: simplify pick_on_preserving_merges
"diff-highlight" filter (in contrib/) learned to undertand "git log
--graph" output better.
* jk/diff-highlight-graph-fix:
diff-highlight: detect --graph by indent
diff-highlight: use flush() helper consistently
diff-highlight: test graphs with --color
diff-highlight: test interleaved parallel lines of history
diff-highlight: prefer "echo" to "cat" in tests
diff-highlight: use test_tick in graph test
diff-highlight: correct test graph diagram
* nd/remove-ignore-env-field:
repository.h: add comment and clarify repo_set_gitdir
repository: delete ignore_env member
sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
repository.c: delete dead functions
repository.c: move env-related setup code back to environment.c
repository: initialize the_repository in main()
From the output of ls-files, we remove all but the leftmost path
component and then we eliminate duplicates. We do this in a while loop,
which is a performance bottleneck when the number of iterations is large
(e.g. for 60000 files in linux.git).
$ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git
real 0m11.876s
user 0m4.685s
sys 0m6.808s
Replacing the loop with the cut command improves performance
significantly:
$ COMP_WORDS=(git status -- ar) COMP_CWORD=3; time _git
real 0m1.372s
user 0m0.263s
sys 0m0.167s
The measurements were done with Msys2 bash, which is used by Git for
Windows.
When filtering the ls-files output we take care not to touch absolute
paths. This is redundant, because ls-files will never output absolute
paths. Remove the unnecessary operations.
The issue was reported here:
https://github.com/git-for-windows/git/issues/1533
Signed-off-by: Clemens Buchacher <drizzd@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config.c: treat type specifiers singularly
Internally, we represent `git config`'s type specifiers as a bitset
using OPT_BIT. 'bool' is 1<<0, 'int' is 1<<1, and so on. This technique
allows for the representation of multiple type specifiers in the `int
types` field, but this multi-representation is left unused.
In fact, `git config` will not accept multiple type specifiers at a
time, as indicated by:
$ git config --int --bool some.section
error: only one type at a time.
This patch uses `OPT_SET_INT` to prefer the _last_ mentioned type
specifier, so that the above command would instead be valid, and a
synonym of:
$ git config --bool some.section
This change is motivated by two urges: (1) it does not make sense to
represent a singular type specifier internally as a bitset, only to
complain when there are multiple bits in the set. `OPT_SET_INT` is more
well-suited to this task than `OPT_BIT` is. (2) a future patch will
introduce `--type=<type>`, and we would like not to complain in the
following situation:
$ git config --int --type=int
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>