From: linux@horizon•com
To: git@vger•kernel.org
Cc: linux@horizon•com
Subject: Re: [PATCH] provide advance warning of some future pack default changes
Date: 14 Dec 2007 06:28:14 -0500
Message-ID: <20071214112814.11083.qmail@science.horizon.com>
>+ * From v1.5.5, the pack.indexversion config option will default to 2,
>+ which is slightly more efficient, and makes repacking more immune to
>+ data corruptions. Git older than version 1.5.2 may revert to version 1
>+ of the pack index with a manual "git index-pack" to be able to directly
>+ access corresponding pack files.
You might want to mention that it's slightly more TIME efficient,
but takes 16% more space (28 bytes per object rather than 24).
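
For a rough sense of the numbers, here is a minimal sketch of the index
sizes implied by the two layouts documented in the patch below. The
function names are made up for illustration, and the byte counts assume
a 1024-byte fan-out table, a 40-byte trailer, and (for v2) an 8-byte
header plus an optional table of 8-byte large offsets.

```python
FANOUT_BYTES = 256 * 4   # 256 four-byte fan-out entries
TRAILER_BYTES = 20 + 20  # packfile checksum + idxfile checksum


def idx_size_v1(num_objects):
    """Bytes in a v1 index: fan-out, then 24 bytes (offset + name) per object."""
    return FANOUT_BYTES + 24 * num_objects + TRAILER_BYTES


def idx_size_v2(num_objects, num_large_offsets=0):
    """Bytes in a v2 index: header, fan-out, 28 bytes (name + CRC32 + offset)
    per object, plus 8 bytes for every offset that needs the large table."""
    header = 4 + 4  # magic number + version
    return (header + FANOUT_BYTES + 28 * num_objects
            + 8 * num_large_offsets + TRAILER_BYTES)


if __name__ == "__main__":
    n = 1_000_000
    v1, v2 = idx_size_v1(n), idx_size_v2(n)
    print(f"v1: {v1} bytes, v2: {v2} bytes, ratio: {v2 / v1:.4f}")
```

For large packs the fixed-size parts stop mattering and the ratio tends
toward 28/24, i.e. about one sixth more space.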
If it helps, I documented the v2 index file format (a lot stolen
from commit c553ca25bd60dc9fd50b8bc7bd329601b81cee66 message).
(Public domain, copyright abandoned, if it breaks you get to keep both
pieces, yadda yadda.)
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index e5b31c8..a80baa4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,9 +1,9 @@
GIT pack format
===============
-= pack-*.pack file has the following format:
+= pack-*.pack files have the following format:
- - The header appears at the beginning and consists of the following:
+ - A header appears at the beginning and consists of the following:
4-byte signature:
The signature is: {'P', 'A', 'C', 'K'}
@@ -34,18 +34,14 @@ GIT pack format
- The trailer records 20-byte SHA1 checksum of all of the above.
-= pack-*.idx file has the following format:
+= Original (version 1) pack-*.idx files have the following format:
- The header consists of 256 4-byte network byte order
integers. N-th entry of this table records the number of
objects in the corresponding pack, the first byte of whose
- object name are smaller than N. This is called the
+ object name is less than or equal to N. This is called the
'first-level fan-out' table.
- Observation: we would need to extend this to an array of
- 8-byte integers to go beyond 4G objects per pack, but it is
- not strictly necessary.
-
- The header is followed by sorted 24-byte entries, one entry
per object in the pack. Each entry is:
@@ -55,10 +51,6 @@ GIT pack format
20-byte object name.
- Observation: we would definitely need to extend this to
- 8-byte integer plus 20-byte object name to handle a packfile
- that is larger than 4GB.
-
- The file is concluded with a trailer:
A copy of the 20-byte SHA1 checksum at the end of
@@ -68,31 +60,30 @@ GIT pack format
Pack Idx file:
- idx
- +--------------------------------+
- | fanout[0] = 2 |-.
- +--------------------------------+ |
+ -- +--------------------------------+
+fanout | fanout[0] = 2 (for example) |-.
+table +--------------------------------+ |
| fanout[1] | |
+--------------------------------+ |
| fanout[2] | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
- | fanout[255] | |
- +--------------------------------+ |
-main | offset | |
-index | object name 00XXXXXXXXXXXXXXXX | |
-table +--------------------------------+ |
- | offset | |
- | object name 00XXXXXXXXXXXXXXXX | |
- +--------------------------------+ |
- .-| offset |<+
- | | object name 01XXXXXXXXXXXXXXXX |
- | +--------------------------------+
- | | offset |
- | | object name 01XXXXXXXXXXXXXXXX |
- | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- | | offset |
- | | object name FFXXXXXXXXXXXXXXXX |
- | +--------------------------------+
+ | fanout[255] = total objects |---.
+ -- +--------------------------------+ | |
+main | offset | | |
+index | object name 00XXXXXXXXXXXXXXXX | | |
+table +--------------------------------+ | |
+ | offset | | |
+ | object name 00XXXXXXXXXXXXXXXX | | |
+ +--------------------------------+<+ |
+ .-| offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | +--------------------------------+ |
+ | | offset | |
+ | | object name 01XXXXXXXXXXXXXXXX | |
+ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
+ | | offset | |
+ | | object name FFXXXXXXXXXXXXXXXX | |
+ --| +--------------------------------+<--+
trailer | | packfile checksum |
| +--------------------------------+
| | idxfile checksum |
@@ -116,3 +107,40 @@ Pack file entry: <+
20-byte base object name SHA1 (the size above is the
size of the delta data that follows).
delta data, deflated.
+
+
+= Version 2 pack-*.idx files support packs larger than 4 GiB, and
+ have some other reorganizations. They have the format:
+
+ - A 4-byte magic number '\377tOc' which is an unreasonable
+ fanout[0] value.
+
+ - A 4-byte version number (= 2)
+
+ - A 256-entry fan-out table just like v1.
+
+ - A table of sorted 20-byte SHA1 object names. These are
+ packed together without offset values to reduce the cache
+ footprint of the binary search for a specific object name.
+
+ - A table of 4-byte CRC32 values of the packed object data.
+ This is new in v2 so compressed data can be copied directly
+ from pack to pack during repacking without undetected
+ data corruption.
+
+ - A table of 4-byte offset values (in network byte order).
+ These are usually 31-bit pack file offsets, but large
+ offsets are encoded as an index into the next table with
+ the msbit set.
+
+ - A table of 8-byte offset entries (empty for pack files less
+ than 2 GiB). Pack files are organized with heavily used
+ objects toward the front, so most object references should
+ not need to refer to this table.
+
+ - The same trailer as a v1 pack-*.idx file:
+
+ A copy of the 20-byte SHA1 checksum at the end of
+ corresponding packfile.
+
+ 20-byte SHA1-checksum of all of the above.
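
To make the v2 layout above concrete, here is a minimal sketch that
writes a tiny v2 index into memory and looks an object up in it. This is
not git's implementation: build_idx_v2 and lookup are hypothetical
helpers, and the code assumes all tables are big-endian (network byte
order) and that the fan-out table is cumulative as described for v1.

```python
import hashlib
import struct

IDX_MAGIC = b"\377tOc"  # 4-byte magic, an unreasonable fanout[0] value
MSBIT = 0x80000000


def build_idx_v2(entries, pack_checksum=b"\0" * 20):
    """Serialize {20-byte name: (crc32, pack_offset)} as v2 index bytes."""
    names = sorted(entries)
    fanout = [0] * 256
    for name in names:
        fanout[name[0]] += 1
    for i in range(1, 256):
        fanout[i] += fanout[i - 1]  # names whose first byte is <= i
    small, large = [], []
    for name in names:
        offset = entries[name][1]
        if offset < MSBIT:
            small.append(offset)              # plain 31-bit offset
        else:
            small.append(MSBIT | len(large))  # index into the 8-byte table
            large.append(offset)
    body = IDX_MAGIC + struct.pack(">I", 2)
    body += struct.pack(">256I", *fanout)
    body += b"".join(names)
    body += struct.pack(f">{len(names)}I", *(entries[n][0] for n in names))
    body += struct.pack(f">{len(names)}I", *small)
    body += struct.pack(f">{len(large)}Q", *large)
    body += pack_checksum
    return body + hashlib.sha1(body).digest()


def lookup(idx, name):
    """Return (crc32, pack_offset) for a 20-byte name, or None if absent."""
    if idx[:4] != IDX_MAGIC or struct.unpack_from(">I", idx, 4)[0] != 2:
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack_from(">256I", idx, 8)
    total = fanout[255]
    names_at = 8 + 256 * 4
    crcs_at = names_at + 20 * total
    offsets_at = crcs_at + 4 * total
    large_at = offsets_at + 4 * total
    # The fan-out table limits the binary search to one first-byte bucket.
    lo = fanout[name[0] - 1] if name[0] else 0
    end = hi = fanout[name[0]]
    while lo < hi:
        mid = (lo + hi) // 2
        if idx[names_at + 20 * mid:names_at + 20 * mid + 20] < name:
            lo = mid + 1
        else:
            hi = mid
    if lo == end or idx[names_at + 20 * lo:names_at + 20 * lo + 20] != name:
        return None
    crc = struct.unpack_from(">I", idx, crcs_at + 4 * lo)[0]
    offset = struct.unpack_from(">I", idx, offsets_at + 4 * lo)[0]
    if offset & MSBIT:  # large offset: follow the index into the 8-byte table
        offset = struct.unpack_from(">Q", idx, large_at + 8 * (offset & ~MSBIT))[0]
    return crc, offset
```

The search only touches the packed name table until it finds a match,
which is the cache-footprint point made above; the CRC32 and offset
tables are read once, at the matching position.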