Pile Format

The on-disk pile keeps every blob and branch in one append-only file. The layout works like a simple write-ahead log: new data is only ever appended, so blobs and branches share a single file that remains resilient to crashes. The pile backs local repositories and acts as a durable content-addressed store. Because existing bytes are never mutated, the file can be memory mapped for fast reads and safely shared between threads.

While large databases often avoid mmap due to pitfalls with partial writes and page cache thrashing [1], it works well for the pile's narrow usage pattern of append-only writes and read-only access to existing bytes.

Design Rationale

This format emphasizes simplicity over sophisticated on-disk structures. Appending new blobs rather than rewriting existing data keeps corruption windows small and avoids complicated page management. Storing everything in a single file makes a pile easy to back up or replicate over simple transports while still allowing it to be memory mapped for fast reads. Aligning every entry to 64 bytes places it at the start of a cache line on most current CPUs, which improves concurrent access patterns and allows safe typed views with the zerocopy crate.

Hash verification only happens when blobs are read. Opening even a very large pile is therefore fast while still catching corruption before data is used.
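The read-time check can be sketched roughly as follows. This is a minimal illustration, not the pile's actual code: the function name verify_on_read is hypothetical, and the hash function is passed in by the caller because this section does not name the one the pile uses.

```rust
/// Returns the payload only if its hash matches the one stored in the header.
/// `hash_fn` is supplied by the caller: the concrete hash function is an
/// assumption left open in this sketch.
pub fn verify_on_read<'a, F>(
    payload: &'a [u8],
    expected: &[u8; 32],
    hash_fn: F,
) -> Option<&'a [u8]>
where
    F: Fn(&[u8]) -> [u8; 32],
{
    if hash_fn(payload) == *expected {
        // The bytes match the recorded hash and can be handed out.
        Some(payload)
    } else {
        // Corruption is caught here, before the data is used.
        None
    }
}
```

Nothing in this path runs when the file is opened, which is why the cost of hashing is paid per blob rather than per pile.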

Every record begins with a 16 byte magic marker that identifies whether it stores a blob or a branch. The sections below illustrate the layout of each type.

Usage

A pile typically lives as a .pile file on disk. Repositories open it through Pile::open which performs any necessary recovery and returns a handle for appending new blobs or branches. Multiple threads may share the same handle thanks to internal synchronisation, making a pile a convenient durable store for local development.

Blob Storage

                             8 byte  8 byte
            ┌────16 byte───┐┌──────┐┌──────┐┌────────────32 byte───────────┐
          ┌ ┌──────────────┐┌──────┐┌──────┐┌──────────────────────────────┐
 header   │ │magic number A││ time ││length││             hash             │
          └ └──────────────┘└──────┘└──────┘└──────────────────────────────┘
            ┌────────────────────────────64 byte───────────────────────────┐
          ┌ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
          │ │                                                              │
 payload  │ │              bytes (64byte aligned and padded)               │
          │ │                                                              │
          └ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┘

Each blob entry starts with a 64 byte header (16 + 8 + 8 + 32) that records its creation timestamp, the length of the payload and its hash. The payload is padded so the next record begins on a 64 byte boundary.
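As a rough sketch of the layout above, the header can be mirrored by a fixed-size struct and the padding computed by rounding the payload length up to a multiple of 64. The struct, its field names, and both helper functions are hypothetical; only the field sizes come from the diagram, and the unit of the timestamp and the byte order of the integers are not specified here.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the blob header; only the field sizes are taken
/// from the diagram, the names are illustrative.
#[repr(C)]
pub struct BlobHeader {
    pub magic: [u8; 16], // magic number A
    pub timestamp: u64,  // creation time (unit assumed, not specified)
    pub length: u64,     // payload length in bytes, before padding
    pub hash: [u8; 32],  // hash of the payload
}

// Compile-time check that the header fills exactly one 64 byte slot.
const _: () = assert!(size_of::<BlobHeader>() == 64);

/// Rounds a payload length up to the next multiple of 64.
/// The mask trick works because 64 is a power of two.
pub fn padded_len(len: u64) -> u64 {
    (len + 63) & !63
}

/// Total bytes a blob record occupies: header plus padded payload.
pub fn blob_record_len(payload_len: u64) -> u64 {
    size_of::<BlobHeader>() as u64 + padded_len(payload_len)
}
```

Under these assumptions a 100 byte payload is padded to 128 bytes, so the whole record takes 192 bytes and the next record again starts on a 64 byte boundary.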

Branch Storage

            ┌────16 byte───┐┌────16 byte───┐┌────────────32 byte───────────┐
          ┌ ┌──────────────┐┌──────────────┐┌──────────────────────────────┐
 header   │ │magic number B││  branch id   ││             hash             │
          └ └──────────────┘└──────────────┘└──────────────────────────────┘

Branch entries map a branch identifier to the hash of a blob.
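A branch record can be read with plain slicing, as in the sketch below. The type and function names are hypothetical. The sketch also assumes that a branch entry is exactly the 64 byte header shown above with no payload, and that a later entry for the same branch id supersedes an earlier one; neither is stated explicitly in this section.

```rust
use std::collections::HashMap;

/// Hypothetical view of one branch record; names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BranchRecord {
    pub branch_id: [u8; 16],
    pub hash: [u8; 32],
}

/// Splits one 64 byte record into its fields, skipping the 16 byte marker.
/// Returns None if the slice is not exactly 64 bytes long.
pub fn parse_branch(record: &[u8]) -> Option<BranchRecord> {
    if record.len() != 64 {
        return None;
    }
    let mut branch_id = [0u8; 16];
    branch_id.copy_from_slice(&record[16..32]);
    let mut hash = [0u8; 32];
    hash.copy_from_slice(&record[32..64]);
    Some(BranchRecord { branch_id, hash })
}

/// Replays records in file order into a map from branch id to blob hash.
/// Assumption: the most recently appended record for a branch wins.
pub fn branch_heads(records: &[BranchRecord]) -> HashMap<[u8; 16], [u8; 32]> {
    let mut heads = HashMap::new();
    for record in records {
        heads.insert(record.branch_id, record.hash);
    }
    heads
}
```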

Recovery

When [Pile::try_open] scans an existing file, it checks that every header uses a known marker and that the whole record fits within the file. It does not verify any hashes. If a truncated or unknown block is found, the function reports the number of bytes that were valid up to that point using [OpenError::CorruptPile].
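The scan described above might look roughly like the following. It is a sketch under stated assumptions, not the code behind [Pile::try_open]: the marker values are placeholders, the length field is assumed to be little-endian, the padding of the final blob is assumed to be present in the file, and CorruptAt is a hypothetical stand-in for the real error type.

```rust
/// Placeholder markers: the real magic numbers are not given in this section.
const MAGIC_BLOB: [u8; 16] = [0xAA; 16];
const MAGIC_BRANCH: [u8; 16] = [0xBB; 16];
const HEADER_LEN: usize = 64;

/// Hypothetical error carrying the length of the valid prefix.
#[derive(Debug, PartialEq, Eq)]
pub struct CorruptAt {
    pub valid_len: usize,
}

/// Walks the records in `file` and returns the total length on success.
/// Only markers and record sizes are checked; no hashes are computed.
pub fn scan(file: &[u8]) -> Result<usize, CorruptAt> {
    let mut pos = 0;
    while pos < file.len() {
        // A header that does not fit means the last write was cut short.
        let header = match file.get(pos..pos + HEADER_LEN) {
            Some(h) => h,
            None => return Err(CorruptAt { valid_len: pos }),
        };
        let body_len: u64 = if header[..16] == MAGIC_BLOB[..] {
            // The length follows the 16 byte marker and the 8 byte time.
            let mut raw = [0u8; 8];
            raw.copy_from_slice(&header[24..32]);
            let payload = u64::from_le_bytes(raw); // byte order assumed
            // Round up to a multiple of 64, rejecting absurd lengths.
            match payload.checked_add(63) {
                Some(v) => v & !63,
                None => return Err(CorruptAt { valid_len: pos }),
            }
        } else if header[..16] == MAGIC_BRANCH[..] {
            0 // assumed: branch records have no payload
        } else {
            return Err(CorruptAt { valid_len: pos });
        };
        // The whole record, including padding, must fit in the file.
        let remaining = (file.len() - pos - HEADER_LEN) as u64;
        if body_len > remaining {
            return Err(CorruptAt { valid_len: pos });
        }
        pos += HEADER_LEN + body_len as usize;
    }
    Ok(pos)
}
```

The position reported on failure is always the start of the first bad record, so everything before it consists of complete records.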

The convenience wrapper [Pile::open] re-runs the same validation and, if corruption is encountered, truncates the file to the valid length. This recovers from interrupted writes by discarding incomplete data. Hash verification happens lazily, only when individual blobs are loaded, so that opening a large pile remains fast.
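The truncation step itself needs nothing beyond the standard library. The helper below is a hypothetical sketch of that step, not the body of [Pile::open]; it assumes the valid length has already been determined by a scan.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Cuts a file back to `valid_len` bytes, dropping any incomplete tail.
/// Hypothetical helper; the name and signature are illustrative.
pub fn truncate_to_valid(path: &Path, valid_len: u64) -> io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    // Never grow the file: set_len would zero-fill past the current end.
    if file.metadata()?.len() > valid_len {
        file.set_len(valid_len)?;
        // Flush so the shorter length survives a crash right after recovery.
        file.sync_all()?;
    }
    Ok(())
}
```

Because the pile is append-only, cutting the tail cannot damage earlier records; only the interrupted write is lost.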

For more details on interacting with a pile see the Pile struct documentation.