Where do I go
to learn the details
- of git's packing heuristics?
+ of Git's packing heuristics?
Be careful what you ask!
-Followers of the git, please open the git IRC Log and turn to
+Followers of the Git, please open the Git IRC Log and turn to
February 10, 2006.
It's a rare occasion, and we are joined by the King Git Himself,
Let's listen in!
<njs`> Oh, here's a really stupid question -- where do I go to
- learn the details of git's packing heuristics? google avails
+ learn the details of Git's packing heuristics? google avails
me not, reading the source didn't help a lot, and wading
through the whole mailing list seems less efficient than any
of that.
<linus> njs, I don't think the docs exist. That's something where
I don't think anybody else than me even really got involved.
- Most of the rest of git others have been busy with (especially
+ Most of the rest of Git others have been busy with (especially
Junio), but packing nobody touched after I did it.
It's cryptic, yet vague. Linus in style for sure. Wise men
And switch. That ought to do it!
- <linus> Remember: git really doesn't follow files. So what it does is
+ <linus> Remember: Git really doesn't follow files. So what it does is
- generate a list of all objects
- sort the list according to magic heuristics
- walk the list, using a sliding window, seeing if an object
<pasky> yes
-And Bable-like confusion flowed.
+And Babel-like confusion flowed.
<njs`> oh, hmm, and I'm not sure what this sliding window means either
(type, basename, size)).
Then we walk through this list, and calculate a delta of
- each object against the last n (tunable paramater) objects,
+ each object against the last n (tunable parameter) objects,
and pick the smallest of these deltas.
Vastly simplified, but the essence is there!
- we actively try to generate deltas from a larger object to a
smaller one
- this means that the top-of-tree very seldom has deltas
- (ie deltas in _practice_ are "backwards deltas")
+ (i.e. deltas in _practice_ are "backwards deltas")
Again, we should reread that whole paragraph. Not just because
Linus has slipped Linus's Law in there on us, but because it is
<njs`> (if only it happened more...)
<linus> Anyway, the pack-file could easily be denser still, but
- because it's used both for streaming (the git protocol) and
+ because it's used both for streaming (the Git protocol) and
for on-disk, it has a few pessimizations.
Actually, it is a made-up word. But it is a made-up word being
do "object name->location in packfile" translation.
<njs`> I'm assuming the real win for delta-ing large->small is
- more homogenous statistics for gzip to run over?
+ more homogeneous statistics for gzip to run over?
(You have to put the bytes in one place or another, but
putting them in a larger blob wins on compression)
In fact, Linus reflects on some Basic Engineering Fundamentals,
design options, etc.
- <linus> More importantly, they allow git to still _conceptually_
+ <linus> More importantly, they allow Git to still _conceptually_
never deal with deltas at all, and be a "whole object" store.
Which has some problems (we discussed bad huge-file
- behaviour on the git lists the other day), but it does mean
- that the basic git concepts are really really simple and
+ behaviour on the Git lists the other day), but it does mean
+ that the basic Git concepts are really really simple and
straightforward.
It's all been quite stable.
Bugs happen, but they are "simple" bugs. And bugs that
actually get some object store detail wrong are almost always
- so obious that they never go anywhere.
+ so obvious that they never go anywhere.
<njs`> Yeah.
<njs`> :-)
<njs`> appreciate the infodump, I really was failing to find the
- details on git packs :-)
+ details on Git packs :-)
And now you know the rest of the story.