1 Concerning Git's Packing Heuristics 2 =================================== 3 4 Oh, here's a really stupid question: 5 6 Where do I go 7 to learn the details 8 of git's packing heuristics? 9 10Be careful what you ask! 11 12Followers of the git, please open the git IRC Log and turn to 13February 10, 2006. 14 15It's a rare occasion, and we are joined by the King Git Himself, 16Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor 17and seeks enlightenment. Others are present, but silent. 18 19Let's listen in! 20 21 <njs`> Oh, here's a really stupid question -- where do I go to 22 learn the details of git's packing heuristics? google avails 23 me not, reading the source didn't help a lot, and wading 24 through the whole mailing list seems less efficient than any 25 of that. 26 27It is a bold start! A plea for help combined with a simultaneous 28tri-part attack on some of the tried and true mainstays in the quest 29for enlightenment. Brash accusations of google being useless. Hubris! 30Maligning the source. Heresy! Disdain for the mailing list archives. 31Woe. 32 33 <pasky> yes, the packing-related delta stuff is somewhat 34 mysterious even for me ;) 35 36Ah! Modesty after all. 37 38 <linus> njs, I don't think the docs exist. That's something where 39 I don't think anybody else than me even really got involved. 40 Most of the rest of git others have been busy with (especially 41 Junio), but packing nobody touched after I did it. 42 43It's cryptic, yet vague. Linus in style for sure. Wise men 44interpret this as an apology. A few argue it is merely a 45statement of fact. 46 47 <njs`> I guess the next step is "read the source again", but I 48 have to build up a certain level of gumption first :-) 49 50Indeed! On both points. 51 52 <linus> The packing heuristic is actually really really simple. 53 54Bait... 55 56 <linus> But strange. 57 58And switch. That ought to do it! 59 60 <linus> Remember: git really doesn't follow files. So what it does is 61 - generate a list of all objects 62 - sort the list according to magic heuristics 63 - walk the list, using a sliding window, seeing if an object 64 can be diffed against another object in the window 65 - write out the list in recency order 66 67The traditional understatement: 68 69 <njs`> I suspect that what I'm missing is the precise definition of 70 the word "magic" 71 72The traditional insight: 73 74 <pasky> yes 75 76And Bable-like confusion flowed. 77 78 <njs`> oh, hmm, and I'm not sure what this sliding window means either 79 80 <pasky> iirc, it appeared to me to be just the sha1 of the object 81 when reading the code casually ... 82 83 ... which simply doesn't sound as a very good heuristics, though ;) 84 85 <njs`> .....and recency order. okay, I think it's clear I didn't 86 even realize how much I wasn't realizing :-) 87 88Ah, grasshopper! And thus the enlightenment begins anew. 89 90 <linus> The "magic" is actually in theory totally arbitrary. 91 ANY order will give you a working pack, but no, it's not 92 ordered by SHA1. 93 94 Before talking about the ordering for the sliding delta 95 window, let's talk about the recency order. That's more 96 important in one way. 97 98 <njs`> Right, but if all you want is a working way to pack things 99 together, you could just use cat and save yourself some 100 trouble... 101 102Waaait for it.... 103 104 <linus> The recency ordering (which is basically: put objects 105 _physically_ into the pack in the order that they are 106 "reachable" from the head) is important. 107 108 <njs`> okay 109 110 <linus> It's important because that's the thing that gives packs 111 good locality. It keeps the objects close to the head (whether 112 they are old or new, but they are _reachable_ from the head) 113 at the head of the pack. So packs actually have absolutely 114 _wonderful_ IO patterns. 115 116Read that again, because it is important. 117 118 <linus> But recency ordering is totally useless for deciding how 119 to actually generate the deltas, so the delta ordering is 120 something else. 121 122 The delta ordering is (wait for it): 123 - first sort by the "basename" of the object, as defined by 124 the name the object was _first_ reached through when 125 generating the object list 126 - within the same basename, sort by size of the object 127 - but always sort different types separately (commits first). 128 129 That's not exactly it, but it's very close. 130 131 <njs`> The "_first_ reached" thing is not too important, just you 132 need some way to break ties since the same objects may be 133 reachable many ways, yes? 134 135And as if to clarify: 136 137 <linus> The point is that it's all really just any random 138 heuristic, and the ordering is totally unimportant for 139 correctness, but it helps a lot if the heuristic gives 140 "clumping" for things that are likely to delta well against 141 each other. 142 143It is an important point, so secretly, I did my own research and have 144included my results below. To be fair, it has changed some over time. 145And through the magic of Revisionistic History, I draw upon this entry 146from The Git IRC Logs on my father's birthday, March 1: 147 148 <gitster> The quote from the above linus should be rewritten a 149 bit (wait for it): 150 - first sort by type. Different objects never delta with 151 each other. 152 - then sort by filename/dirname. hash of the basename 153 occupies the top BITS_PER_INT-DIR_BITS bits, and bottom 154 DIR_BITS are for the hash of leading path elements. 155 - then if we are doing "thin" pack, the objects we are _not_ 156 going to pack but we know about are sorted earlier than 157 other objects. 158 - and finally sort by size, larger to smaller. 159 160In one swell-foop, clarification and obscurification! Nonetheless, 161authoritative. Cryptic, yet concise. It even solicits notions of 162quotes from The Source Code. Clearly, more study is needed. 163 164 <gitster> That's the sort order. What this means is: 165 - we do not delta different object types. 166 - we prefer to delta the objects with the same full path, but 167 allow files with the same name from different directories. 168 - we always prefer to delta against objects we are not going 169 to send, if there are some. 170 - we prefer to delta against larger objects, so that we have 171 lots of removals. 172 173 The penultimate rule is for "thin" packs. It is used when 174 the other side is known to have such objects. 175 176There it is again. "Thin" packs. I'm thinking to myself, "What 177is a 'thin' pack?" So I ask: 178 179 <jdl> What is a "thin" pack? 180 181 <gitster> Use of --objects-edge to rev-list as the upstream of 182 pack-objects. The pack transfer protocol negotiates that. 183 184Woo hoo! Cleared that _right_ up! 185 186 <gitster> There are two directions - push and fetch. 187 188There! Did you see it? It is not '"push" and "pull"'! How often the 189confusion has started here. So casually mentioned, too! 190 191 <gitster> For push, git-send-pack invokes git-receive-pack on the 192 other end. The receive-pack says "I have up to these commits". 193 send-pack looks at them, and computes what are missing from 194 the other end. So "thin" could be the default there. 195 196 In the other direction, fetch, git-fetch-pack and 197 git-clone-pack invokes git-upload-pack on the other end 198 (via ssh or by talking to the daemon). 199 200 There are two cases: fetch-pack with -k and clone-pack is one, 201 fetch-pack without -k is the other. clone-pack and fetch-pack 202 with -k will keep the downloaded packfile without expanded, so 203 we do not use thin pack transfer. Otherwise, the generated 204 pack will have delta without base object in the same pack. 205 206 But fetch-pack without -k will explode the received pack into 207 individual objects, so we automatically ask upload-pack to 208 give us a thin pack if upload-pack supports it. 209 210OK then. 211 212Uh. 213 214Let's return to the previous conversation still in progress. 215 216 <njs`> and "basename" means something like "the tail of end of 217 path of file objects and dir objects, as per basename(3), and 218 we just declare all commit and tag objects to have the same 219 basename" or something? 220 221Luckily, that too is a point that gitster clarified for us! 222 223If I might add, the trick is to make files that _might_ be similar be 224located close to each other in the hash buckets based on their file 225names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and 226"Makefile" all landed in the same bucket due to their common basename, 227"Makefile". However, now they land in "close" buckets. 228 229The algorithm allows not just for the _same_ bucket, but for _close_ 230buckets to be considered delta candidates. The rationale is 231essentially that files, like Makefiles, often have very similar 232content no matter what directory they live in. 233 234 <linus> I played around with different delta algorithms, and with 235 making the "delta window" bigger, but having too big of a 236 sliding window makes it very expensive to generate the pack: 237 you need to compare every object with a _ton_ of other objects. 238 239 There are a number of other trivial heuristics too, which 240 basically boil down to "don't bother even trying to delta this 241 pair" if we can tell before-hand that the delta isn't worth it 242 (due to size differences, where we can take a previous delta 243 result into account to decide that "ok, no point in trying 244 that one, it will be worse"). 245 246 End result: packing is actually very size efficient. It's 247 somewhat CPU-wasteful, but on the other hand, since you're 248 really only supposed to do it maybe once a month (and you can 249 do it during the night), nobody really seems to care. 250 251Nice Engineering Touch, there. Find when it doesn't matter, and 252proclaim it a non-issue. Good style too! 253 254 <njs`> So, just to repeat to see if I'm following, we start by 255 getting a list of the objects we want to pack, we sort it by 256 this heuristic (basically lexicographically on the tuple 257 (type, basename, size)). 258 259 Then we walk through this list, and calculate a delta of 260 each object against the last n (tunable paramater) objects, 261 and pick the smallest of these deltas. 262 263Vastly simplified, but the essence is there! 264 265 <linus> Correct. 266 267 <njs`> And then once we have picked a delta or fulltext to 268 represent each object, we re-sort by recency, and write them 269 out in that order. 270 271 <linus> Yup. Some other small details: 272 273And of course there is the "Other Shoe" Factor too. 274 275 <linus> - We limit the delta depth to another magic value (right 276 now both the window and delta depth magic values are just "10") 277 278 <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO 279 patterns, because the things you want are near by, but to 280 actually reconstruct them you may have to jump all over in 281 random ways. 282 283 <linus> - When we write out a delta, and we haven't yet written 284 out the object it is a delta against, we write out the base 285 object first. And no, when we reconstruct them, we actually 286 get nice IO patterns, because: 287 - larger objects tend to be "more recent" (Linus' law: files grow) 288 - we actively try to generate deltas from a larger object to a 289 smaller one 290 - this means that the top-of-tree very seldom has deltas 291 (ie deltas in _practice_ are "backwards deltas") 292 293Again, we should reread that whole paragraph. Not just because 294Linus has slipped Linus's Law in there on us, but because it is 295important. Let's make sure we clarify some of the points here: 296 297 <njs`> So the point is just that in practice, delta order and 298 recency order match each other quite well. 299 300 <linus> Yes. There's another nice side to this (and yes, it was 301 designed that way ;): 302 - the reason we generate deltas against the larger object is 303 actually a big space saver too! 304 305 <njs`> Hmm, but your last comment (if "we haven't yet written out 306 the object it is a delta against, we write out the base object 307 first"), seems like it would make these facts mostly 308 irrelevant because even if in practice you would not have to 309 wander around much, in fact you just brute-force say that in 310 the cases where you might have to wander, don't do that :-) 311 312 <linus> Yes and no. Notice the rule: we only write out the base 313 object first if the delta against it was more recent. That 314 means that you can actually have deltas that refer to a base 315 object that is _not_ close to the delta object, but that only 316 happens when the delta is needed to generate an _old_ object. 317 318 <linus> See? 319 320Yeah, no. I missed that on the first two or three readings myself. 321 322 <linus> This keeps the front of the pack dense. The front of the 323 pack never contains data that isn't relevant to a "recent" 324 object. The size optimization comes from our use of xdelta 325 (but is true for many other delta algorithms): removing data 326 is cheaper (in size) than adding data. 327 328 When you remove data, you only need to say "copy bytes n--m". 329 In contrast, in a delta that _adds_ data, you have to say "add 330 these bytes: 'actual data goes here'" 331 332 *** njs` has quit: Read error: 104 (Connection reset by peer) 333 334 <linus> Uhhuh. I hope I didn't blow njs` mind. 335 336 *** njs` has joined channel #git 337 338 <pasky> :) 339 340The silent observers are amused. Of course. 341 342And as if njs` was expected to be omniscient: 343 344 <linus> njs - did you miss anything? 345 346OK, I'll spell it out. That's Geek Humor. If njs` was not actually 347connected for a little bit there, how would he know if missed anything 348while he was disconnected? He's a benevolent dictator with a sense of 349humor! Well noted! 350 351 <njs`> Stupid router. Or gremlins, or whatever. 352 353It's a cheap shot at Cisco. Take 'em when you can. 354 355 <njs`> Yes and no. Notice the rule: we only write out the base 356 object first if the delta against it was more recent. 357 358 I'm getting lost in all these orders, let me re-read :-) 359 So the write-out order is from most recent to least recent? 360 (Conceivably it could be the opposite way too, I'm not sure if 361 we've said) though my connection back at home is logging, so I 362 can just read what you said there :-) 363 364And for those of you paying attention, the Omniscient Trick has just 365been detailed! 366 367 <linus> Yes, we always write out most recent first 368 369For the other record: 370 371 <pasky> njs`: http://pastebin.com/547965 372 373The 'net never forgets, so that should be good until the end of time. 374 375 <njs`> And, yeah, I got the part about deeper-in-history stuff 376 having worse IO characteristics, one sort of doesn't care. 377 378 <linus> With the caveat that if the "most recent" needs an older 379 object to delta against (hey, shrinking sometimes does 380 happen), we write out the old object with the delta. 381 382 <njs`> (if only it happened more...) 383 384 <linus> Anyway, the pack-file could easily be denser still, but 385 because it's used both for streaming (the git protocol) and 386 for on-disk, it has a few pessimizations. 387 388Actually, it is a made-up word. But it is a made-up word being 389used as setup for a later optimization, which is a real word: 390 391 <linus> In particular, while the pack-file is then compressed, 392 it's compressed just one object at a time, so the actual 393 compression factor is less than it could be in theory. But it 394 means that it's all nice random-access with a simple index to 395 do "object name->location in packfile" translation. 396 397 <njs`> I'm assuming the real win for delta-ing large->small is 398 more homogenous statistics for gzip to run over? 399 400 (You have to put the bytes in one place or another, but 401 putting them in a larger blob wins on compression) 402 403 Actually, what is the compression strategy -- each delta 404 individually gzipped, the whole file gzipped, somewhere in 405 between, no compression at all, ....? 406 407 Right. 408 409Reality IRC sets in. For example: 410 411 <pasky> I'll read the rest in the morning, I really have to go 412 sleep or there's no hope whatsoever for me at the today's 413 exam... g'nite all. 414 415Heh. 416 417 <linus> pasky: g'nite 418 419 <njs`> pasky: 'luck 420 421 <linus> Right: large->small matters exactly because of compression 422 behaviour. If it was non-compressed, it probably wouldn't make 423 any difference. 424 425 <njs`> yeah 426 427 <linus> Anyway: I'm not even trying to claim that the pack-files 428 are perfect, but they do tend to have a nice balance of 429 density vs ease-of use. 430 431Gasp! OK, saved. That's a fair Engineering trade off. Close call! 432In fact, Linus reflects on some Basic Engineering Fundamentals, 433design options, etc. 434 435 <linus> More importantly, they allow git to still _conceptually_ 436 never deal with deltas at all, and be a "whole object" store. 437 438 Which has some problems (we discussed bad huge-file 439 behaviour on the git lists the other day), but it does mean 440 that the basic git concepts are really really simple and 441 straightforward. 442 443 It's all been quite stable. 444 445 Which I think is very much a result of having very simple 446 basic ideas, so that there's never any confusion about what's 447 going on. 448 449 Bugs happen, but they are "simple" bugs. And bugs that 450 actually get some object store detail wrong are almost always 451 so obious that they never go anywhere. 452 453 <njs`> Yeah. 454 455Nuff said. 456 457 <linus> Anyway. I'm off for bed. It's not 6AM here, but I've got 458 three kids, and have to get up early in the morning to send 459 them off. I need my beauty sleep. 460 461 <njs`> :-) 462 463 <njs`> appreciate the infodump, I really was failing to find the 464 details on git packs :-) 465 466And now you know the rest of the story.