fsck: document and test sorted skipList input
authorÆvar Arnfjörð Bjarmason <avarab@gmail.com>
Mon, 3 Sep 2018 14:49:21 +0000 (14:49 +0000)
committerJunio C Hamano <gitster@pobox.com>
Wed, 12 Sep 2018 22:17:46 +0000 (15:17 -0700)
Ever since the skipList support was first added in cd94c6f91 ("fsck:
git receive-pack: support excluding objects from fsck'ing",
2015-06-22) the documentation for the format has that the file is a
sorted list of object names.

Thus, anyone using the feature would have thought the list needed to
be sorted. E.g. I recently in conjunction with my fetch.fsck.*
implementation in 1362df0d41 ("fetch: implement fetch.fsck.*",
2018-07-27) wrote some code to ship a skipList, and went out of my way
to sort it.

Doing so seems intuitive, since it contains fixed-width records, and
has no support for comments, so one might expect it to be binary
searched in-place on-disk.

However, as documented here this was never a requirement, so let's
change the documentation. Since this is a file format change let's
also document what was said about this in the past, so e.g. someone
like myself reading the new docs can see this never needed to be
sorted ("why do I have all this code to sort this thing...").

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/config.txt
t/t5504-fetch-receive-strict.sh
index eb66a119753726b0260acd456f5728351dc05ba3..fd1b5837d0f44eeea81f6bb691eff80fa7388848 100644 (file)
@@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 
 fsck.skipList::
-       The path to a sorted list of object names (i.e. one SHA-1 per
+       The path to a list of object names (i.e. one SHA-1 per
        line) that are known to be broken in a non-fatal way and should
        be ignored. This feature is useful when an established project
        should be accepted despite early commits containing errors that
@@ -1725,6 +1725,14 @@ Unlike variables like `color.ui` and `core.editor` the
 fall back on the `fsck.skipList` configuration if they aren't set. To
 uniformly configure the same fsck settings in different circumstances
 all three of them they must all set to the same values.
++
+Older versions of Git (before 2.20) documented that the object names
+list should be sorted. This was never a requirement, the object names
+can appear in any order, but when reading the list we track whether
+the list is sorted for the purposes of an internal binary search
+implementation, which can save itself some work with an already sorted
+list. Unless you have a humongous list there's no reason to go out of
+your way to pre-sort the list.
 
 gc.aggressiveDepth::
        The depth parameter used in the delta compression
index cbae31f3301b25e439b59b8925dcc21b9c5db1e4..fa56052f0f68bcd396a9688a5174a18a8d8cfd6a 100755 (executable)
@@ -142,6 +142,25 @@ test_expect_success 'fsck with no skipList input' '
        test_i18ngrep "missingEmail" err
 '
 
+test_expect_success 'setup sorted and unsorted skipLists' '
+       cat >SKIP.unsorted <<-EOF &&
+       0000000000000000000000000000000000000004
+       0000000000000000000000000000000000000002
+       $commit
+       0000000000000000000000000000000000000001
+       0000000000000000000000000000000000000003
+       EOF
+       sort SKIP.unsorted >SKIP.sorted
+'
+
+test_expect_success 'fsck with sorted skipList' '
+       git -c fsck.skipList=SKIP.sorted fsck
+'
+
+test_expect_success 'fsck with unsorted skipList' '
+       git -c fsck.skipList=SKIP.unsorted fsck
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
        git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
        test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&