Date: Fri, 9 Nov 2007 08:28:38 -0800 (PST)
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: corrupt object on git-gc
Abstract: Some tricks to reconstruct blob objects in order to fix
 a corrupted repository.

On Fri, 9 Nov 2007, Yossi Leybovich wrote:
>
> Did not help still the repository look for this object?
> Any one know how can I track this object and understand which file is it

So exactly *because* the SHA1 hash is cryptographically secure, the hash
itself doesn't actually tell you anything. In order to fix a corrupt
object you basically have to find the "original source" for it.

The easiest way to do that is almost always to have backups, and find the
same object somewhere else. Backups really are a good idea, and git makes
it pretty easy (if nothing else, just clone the repository somewhere else,
and make sure that you do *not* use a hard-linked clone, and preferably
not the same disk/machine).

But since you don't seem to have backups right now, the good news is that
especially with a single blob being corrupt, these things *are* somewhat
debuggable.

First off, move the corrupt object away, and *save* it. The most common
cause of corruption so far has been memory corruption, but even so, there
are people who would be interested in seeing the corruption - but it's
basically impossible to judge the corruption until we can also see the
original object, so right now the corrupt object is useless, but it's very
interesting for the future, in the hope that you can re-create a
non-corrupt version.

So:

> ib]$ mv .git/objects/4b/9458b3786228369c63936db65827de3cc06200 ../

This is the right thing to do, although it's usually best to save it under
its full SHA1 name (you just dropped the "4b" from the result ;).
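
As a rough sketch of keeping the full name: the helper below rebuilds the
loose-object path from a full SHA1. `loose_object_path` is a hypothetical
name, not a git command, and it assumes the object is stored loose (not in
a pack) under the default `.git` directory.

```shell
# Hypothetical helper: print the loose-object path for a full 40-hex SHA1.
# Assumes the default .git layout and a loose (unpacked) object.
loose_object_path() {
	sha=$1
	dir=${sha%"${sha#??}"}	# first two hex digits (the fan-out directory)
	file=${sha#??}		# remaining 38 hex digits (the file name)
	printf '.git/objects/%s/%s\n' "$dir" "$file"
}

# Example (not run here): move the corrupt copy out of the repository,
# keeping the full SHA1 in the saved file's name.
# sha=4b9458b3786228369c63936db65827de3cc06200
# mv "$(loose_object_path "$sha")" "../corrupt-$sha"
```

The "corrupt-" prefix in the example is only an illustration; any name that
preserves all 40 hex digits will do.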

Let's see what that tells us:

> ib]$ git-fsck --full
> broken link from    tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8
>              to    blob 4b9458b3786228369c63936db65827de3cc06200
> missing blob 4b9458b3786228369c63936db65827de3cc06200

Ok, I removed the "dangling commit" messages, because they are just
messages about the fact that you probably have rebased etc, so they're not
at all interesting. But what remains is still very useful. In particular,
we now know which tree points to it!

Now you can do

	git ls-tree 2d9263c6d23595e7cb2a21e5ebbb53655278dff8

which will show something like

	100644 blob 8d14531846b95bfa3564b58ccfb7913a034323b8	.gitignore
	100644 blob ebf9bf84da0aab5ed944264a5db2a65fe3a3e883	.mailmap
	100644 blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c	COPYING
	100644 blob ee909f2cc49e54f0799a4739d24c4cb9151ae453	CREDITS
	040000 tree 0f5f709c17ad89e72bdbbef6ea221c69807009f6	Documentation
	100644 blob 1570d248ad9237e4fa6e4d079336b9da62d9ba32	Kbuild
	100644 blob 1c7c229a092665b11cd46a25dbd40feeb31661d9	MAINTAINERS
	...

and you should now have a line that looks like

	100644 blob 4b9458b3786228369c63936db65827de3cc06200	my-magic-file

in the output. This already tells you a *lot*: it tells you what file the
corrupt blob came from!

Now, it doesn't tell you quite enough, though: it doesn't tell what
*version* of the file didn't get correctly written! You might be really
lucky, and it may be the version that you already have checked out in your
working tree, in which case fixing this problem is really simple: just do

	git hash-object -w my-magic-file

and if it outputs the missing SHA1 (4b945..) you're now all done!

But that's the really lucky case, so let's assume that it was some older
version that was broken. How do you tell which version it was?
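
(An aside before answering that: whichever candidate file you end up with,
you can check it against the missing SHA1 without writing anything into
the repository. The sketch below computes a blob ID by hand, as the SHA1
of a "blob <size>" header, a NUL byte, and the raw file contents.
`blob_sha1` is a hypothetical helper name; it assumes a SHA-1 repository,
that no clean filter or line-ending conversion applies to the path, and
that `sha1sum` from GNU coreutils is installed.)

```shell
# Hypothetical helper: print the blob ID git would assign to a file's raw
# bytes. Assumes SHA-1 object names and no filters or CRLF conversion.
blob_sha1() {
	size=$(wc -c < "$1" | tr -d '[:space:]')	# byte count of the file
	# Hash the object header ("blob <size>" plus NUL) followed by the contents.
	{ printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Example (not run here): write the object only if the candidate matches.
# missing=4b9458b3786228369c63936db65827de3cc06200
# [ "$(blob_sha1 my-magic-file)" = "$missing" ] && git hash-object -w my-magic-file
```

Running `git hash-object` without `-w` gives the same kind of dry-run check
and also honours any conversions configured for the path, so prefer it
when git is at hand.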

The easiest way to do it is to do

	git log --raw --all --full-history -- subdirectory/my-magic-file

and that will show you the whole log for that file (please realize that
the tree you had may not be the top-level tree, so you need to figure out
which subdirectory it was in on your own), and because you're asking for
raw output, you'll now get something like

	commit abc
	Author:
	Date:
	  ..
	:100644 100644 4b9458b... newsha... M	subdirectory/my-magic-file


	commit xyz
	Author:
	Date:

	  ..
	:100644 100644 oldsha... 4b9458b... M	subdirectory/my-magic-file

and this actually tells you what the *previous* and *subsequent* versions
of that file were! So now you can look at those ("oldsha" and "newsha"
respectively), and hopefully you have done commits often, and can
re-create the missing my-magic-file version by looking at those older and
newer versions!

If you can do that, you can now recreate the missing object with

	git hash-object -w <recreated-file>

and your repository is good again!

(Btw, you could have ignored the fsck, and started with doing a

	git log --raw --all

and just looked for the sha of the missing object (4b9458b..) in that
whole thing. It's up to you - git does *have* a lot of information, it is
just missing one particular blob version.)

Trying to recreate trees and especially commits is *much* harder. So you
were lucky that it's a blob. It's quite possible that you can recreate the
thing.

			Linus