Tolerate zlib deflation with window size < 32KB
author Roberto Tyley <roberto.tyley@guardian.co.uk>
Sun, 7 Aug 2011 18:46:13 +0000 (19:46 +0100)
committer Junio C Hamano <gitster@pobox.com>
Thu, 11 Aug 2011 20:02:47 +0000 (13:02 -0700)
Git currently reports loose objects as 'corrupt' if they've been
deflated using a window size less than 32KB, because the
experimental_loose_object() function doesn't recognise the header
byte as a zlib header. This patch makes the function tolerant of
all valid window sizes (15-bit down to 8-bit) without sacrificing
its accuracy in distinguishing the standard loose-object format
from the experimental (now abandoned) format.

On memory-constrained systems zlib may use a much smaller window
size. Working on Agit, I found that Android uses a 4KB window,
giving a header byte of 0x48, not 0x78. Consequently all loose
objects it generates appear 'corrupt' to Git, which is why Agit is a
read-only Git client at this time - I don't want my client to
generate Git repos that other clients treat as broken :(
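
To illustrate where 0x48 and 0x78 come from: RFC 1950 stores the
compression method (8 for deflate) in the low nibble of the first
byte and log2(window size) - 8 in the high nibble. A minimal sketch
in C, where zlib_cmf_byte() is a hypothetical helper written for this
description and not part of Git or zlib:

```c
#include <assert.h>

/*
 * Hypothetical helper: first byte (CMF) of a zlib stream per RFC 1950.
 * Low nibble  = 8 (deflate), high nibble = window_bits - 8.
 * window_bits is log2 of the window size, so 15 => 32KB, 12 => 4KB.
 */
unsigned char zlib_cmf_byte(int window_bits)
{
	assert(window_bits >= 8 && window_bits <= 15);
	return (unsigned char)(((window_bits - 8) << 4) | 0x08);
}
```

With this, zlib_cmf_byte(15) gives 0x78 (32KB window) and
zlib_cmf_byte(12) gives 0x48 (4KB window).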

This patch makes Git tolerant of different deflate settings - it
might appear that it changes experimental_loose_object() to the point
where it could incorrectly identify the experimental format as the
standard one, but the two criteria (bitmask & checksum) can only
give a false result for an experimental object where both of the
following are true:

1) object size is exactly 8 bytes when uncompressed (bitmask)
2) ([single-byte in-pack git type&size header] * 256
   + [1st byte of the following zlib header]) % 31 = 0 (checksum)

As it happens, for all possible combinations of valid object type
(1-4) and window bits (0-7), the only time when the checksum will be
divisible by 31 is for 0x1838 - i.e. object type *1*, a Commit - which,
due to the fields all Commit objects must contain, could never be as
small as 8 bytes in size.
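
The search over those combinations can be sketched in C as follows.
This is an illustration written for this description, not the code
from the gist referenced below; count_false_positive_words() is a
hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical helper: for every experimental header byte 0ttt1000
 * (type 1-4, size nibble 8) followed by a zlib first byte 0www1000
 * (www 0-7), count the 16-bit words divisible by 31.
 * The last matching word is stored in *match.
 */
int count_false_positive_words(unsigned int *match)
{
	int count = 0;
	unsigned int type, www;

	assert(match);
	for (type = 1; type <= 4; type++) {
		for (www = 0; www <= 7; www++) {
			unsigned int byte0 = (type << 4) | 0x08; /* 0ttt1000 */
			unsigned int byte1 = (www << 4) | 0x08;  /* 0www1000 */
			unsigned int word = (byte0 << 8) | byte1;

			if (word % 31 == 0) {
				count++;
				*match = word;
			}
		}
	}
	return count;
}
```

Of the 32 combinations, only one word passes the checksum: 0x1838
(type 1, www = 3).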

Given this, the combination of the two criteria (bitmask & checksum)
always correctly determines the buffer format, and is more tolerant
than the previous version.
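
As a standalone illustration of the two criteria, the check can be
written as a small predicate. looks_like_zlib_header() is a
hypothetical name used only here; in the patch the same test sits
inside experimental_loose_object(), which returns the opposite sense
(0 for the standard format).

```c
#include <assert.h>

/*
 * Hypothetical predicate: returns 1 if the first two bytes of map
 * look like an RFC 1950 zlib header with any window size.
 *   bitmask:  bit 7 clear and low nibble == 8 (deflate)
 *   checksum: first 16-bit big-endian word divisible by 31
 */
int looks_like_zlib_header(const unsigned char *map)
{
	unsigned int word;

	assert(map);
	word = ((unsigned int)map[0] << 8) + map[1];
	return (map[0] & 0x8F) == 0x08 && word % 31 == 0;
}
```

For example, the byte pairs {0x78, 0x9c} (32KB window) and
{0x48, 0x89} (4KB window) both pass, while {0x38, 0x78} (an
experimental-format 8-byte blob followed by a zlib stream) passes the
bitmask but fails the checksum.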

The alternative to this patch is simply removing support for the
experimental format, which I am also totally cool with.

References:

Android uses a 4KB window for deflation:
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53

Code snippet searching for false positives with the zlib checksum:
https://gist.github.com/1118177

Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
19 files changed:
sha1_file.c
t/t1013-loose-object-format.sh [new file with mode: 0755]
t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6 [new file with mode: 0644]
t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435 [new file with mode: 0644]
t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd [new file with mode: 0644]
t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 [new file with mode: 0644]
t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e [new file with mode: 0644]
t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696 [new file with mode: 0644]
t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd [new file with mode: 0644]
t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb [new file with mode: 0644]
t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 [new file with mode: 0644]
t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 [new file with mode: 0644]
t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730 [new file with mode: 0644]
t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa [new file with mode: 0644]
t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f [new file with mode: 0644]
t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832 [new file with mode: 0644]
t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8 [new file with mode: 0644]
t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 [new file with mode: 0644]
t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a [new file with mode: 0644]
diff --git a/sha1_file.c b/sha1_file.c
index 697f4a43c5ef0bdc220a15545d94bc31ad1f248e..475d215c14d25212ef8d5b3849b650666cb53b17 100644 (file)
@@ -1217,14 +1217,34 @@ static int experimental_loose_object(unsigned char *map)
        unsigned int word;
 
        /*
-        * Is it a zlib-compressed buffer? If so, the first byte
-        * must be 0x78 (15-bit window size, deflated), and the
-        * first 16-bit word is evenly divisible by 31. If so,
-        * we are looking at the official format, not the experimental
-        * one.
+        * We must determine if the buffer contains the standard
+        * zlib-deflated stream or the experimental format based
+        * on the in-pack object format. Compare the header byte
+        * for each format:
+        *
+        * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
+        * Experimental pack-based : Stttssss : ttt = 1,2,3,4
+        *
+        * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
+        * in standard loose-object format, UNLESS it is a Git-pack
+        * format object *exactly* 8 bytes in size when inflated.
+        *
+        * However, RFC1950 also specifies that the 1st 16-bit word
+        * must be divisible by 31 - this checksum tells us our buffer
+        * is in the standard format, giving a false positive only if
+        * the 1st word of the Git-pack format object happens to be
+        * divisible by 31, ie:
+        *      ((byte0 * 256) + byte1) % 31 = 0
+        *   =>        0ttt10000www1000 % 31 = 0
+        *
+        * As it happens, this case can only arise for www=3 & ttt=1
+        * - ie, a Commit object, which would have to be 8 bytes in
+        * size. As no Commit can be that small, we find that the
+        * combination of these two criteria (bitmask & checksum)
+        * can always correctly determine the buffer format.
         */
        word = (map[0] << 8) + map[1];
-       if (map[0] == 0x78 && !(word % 31))
+       if ((map[0] & 0x8F) == 0x08 && !(word % 31))
                return 0;
        else
                return 1;
diff --git a/t/t1013-loose-object-format.sh b/t/t1013-loose-object-format.sh
new file mode 100755 (executable)
index 0000000..f45702c
--- /dev/null
@@ -0,0 +1,68 @@
+#!/bin/sh
+#
+# Copyright (c) 2011 Roberto Tyley
+#
+
+test_description='Correctly identify and parse loose object headers
+
+There are two file formats for loose objects - the original standard
+format, and the experimental format introduced with Git v1.4.3, later
+deprecated with v1.5.3. Although Git no longer writes the
+experimental format, objects in both formats must be read, with the
+format for a given file being determined by the header.
+
+Detecting file format based on header is not entirely trivial, not
+least because the first byte of a zlib-deflated stream will vary
+depending on how much memory was allocated for the deflation window
+buffer when the object was written out (for example 4KB on Android,
+rather than 32KB on a normal PC).
+
+The loose objects used as test vectors have been generated with the
+following Git versions:
+
+standard format: Git v1.7.4.1
+experimental format: Git v1.4.3 (legacyheaders=false)
+standard format, deflated with 4KB window size: Agit/JGit on Android
+'
+
+. ./test-lib.sh
+LF='
+'
+
+assert_blob_equals() {
+       printf "%s" "$2" >expected &&
+       git cat-file -p "$1" >actual &&
+       test_cmp expected actual
+}
+
+test_expect_success setup '
+       cp -R "$TEST_DIRECTORY/t1013/objects" .git/ &&
+       git --version
+'
+
+test_expect_success 'read standard-format loose objects' '
+       git cat-file tag 8d4e360d6c70fbd72411991c02a09c442cf7a9fa &&
+       git cat-file commit 6baee0540ea990d9761a3eb9ab183003a71c3696 &&
+       git ls-tree 7a37b887a73791d12d26c0d3e39568a8fb0fa6e8 &&
+       assert_blob_equals "257cc5642cb1a054f08cc83f2d943e56fd3ebe99" "foo$LF"
+'
+
+test_expect_success 'read experimental-format loose objects' '
+       git cat-file tag 76e7fa9941f4d5f97f64fea65a2cba436bc79cbb &&
+       git cat-file commit 7875c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 &&
+       git ls-tree 95b1625de3ba8b2214d1e0d0591138aea733f64f &&
+       assert_blob_equals "2e65efe2a145dda7ee51d1741299f848e5bf752e" "a" &&
+       assert_blob_equals "9ae9e86b7bd6cb1472d9373702d8249973da0832" "ab" &&
+       assert_blob_equals "85df50785d62d3b05ab03d9cbf7e4a0b49449730" "abcd" &&
+       assert_blob_equals "1656f9233d999f61ef23ef390b9c71d75399f435" "abcdefgh" &&
+       assert_blob_equals "1e72a6b2c4a577ab0338860fa9fe87f761fc9bbd" "abcdefghi" &&
+       assert_blob_equals "70e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd" "abcdefghijklmnop" &&
+       assert_blob_equals "bd15045f6ce8ff75747562173640456a394412c8" "abcdefghijklmnopqrstuvwx"
+'
+
+test_expect_success 'read standard-format objects deflated with smaller window buffer' '
+       git cat-file tag f816d5255855ac160652ee5253b06cd8ee14165a &&
+       git cat-file tag 149cedb5c46929d18e0f118e9fa31927487af3b6
+'
+
+test_done
diff --git a/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6 b/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6
new file mode 100644 (file)
index 0000000..472fd14
Binary files /dev/null and b/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6 differ
diff --git a/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435 b/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435
new file mode 100644 (file)
index 0000000..c379d74
Binary files /dev/null and b/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435 differ
diff --git a/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd b/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd
new file mode 100644 (file)
index 0000000..9370630
Binary files /dev/null and b/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd differ
diff --git a/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 b/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
new file mode 100644 (file)
index 0000000..bdcf704
Binary files /dev/null and b/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 differ
diff --git a/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e b/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e
new file mode 100644 (file)
index 0000000..ad62c43
Binary files /dev/null and b/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e differ
diff --git a/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696 b/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696
new file mode 100644 (file)
index 0000000..3d2f033
Binary files /dev/null and b/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696 differ
diff --git a/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd b/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd
new file mode 100644 (file)
index 0000000..b3f71a6
Binary files /dev/null and b/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd differ
diff --git a/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb b/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
new file mode 100644 (file)
index 0000000..af4e9a7
Binary files /dev/null and b/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb differ
diff --git a/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 b/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09
new file mode 100644 (file)
index 0000000..3dd28be
Binary files /dev/null and b/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 differ
diff --git a/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 b/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8
new file mode 100644 (file)
index 0000000..2b97b26
Binary files /dev/null and b/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 differ
diff --git a/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730 b/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730
new file mode 100644 (file)
index 0000000..6dff746
Binary files /dev/null and b/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730 differ
diff --git a/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa b/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa
new file mode 100644 (file)
index 0000000..cb41e92
Binary files /dev/null and b/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa differ
diff --git a/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f b/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f
new file mode 100644 (file)
index 0000000..7ac46b4
Binary files /dev/null and b/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f differ
diff --git a/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832 b/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832
new file mode 100644 (file)
index 0000000..9d8316d
Binary files /dev/null and b/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832 differ
diff --git a/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8 b/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8
new file mode 100644 (file)
index 0000000..eebf239
Binary files /dev/null and b/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8 differ
diff --git a/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 b/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
new file mode 100644 (file)
index 0000000..134cf19
Binary files /dev/null and b/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 differ
diff --git a/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a b/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
new file mode 100644 (file)
index 0000000..26b75ae
Binary files /dev/null and b/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a differ