# Formal Verification Roadmap
This roadmap captures the initial strategy for driving the triblespace crates
toward comprehensive formal verification. It unifies model checking, symbolic
execution, fuzzing, and deterministic simulation so we can reason about both the
low-level data structures and high-level repository workflows with stronger
correctness guarantees.
## Verification Stack Overview
- Model checking with Kani explores bounded but exhaustive state spaces for invariants that must never be violated.
- Symbolic execution with Miri exposes undefined behaviour (UB) and aliasing issues across regular unit tests without requiring new harnesses.
- Coverage-guided fuzzing stresses APIs with randomized input sequences to uncover emergent behaviours that formal proofs might miss.
- Deterministic simulations replay realistic repository workflows so we can audit higher-level semantics and regression-test subtle interplays between subsystems.
Each technique complements the others; together they provide layered assurance that keeps regressions from reaching downstream users.
## Goals
- Protect the fundamental algebraic properties of `TribleSet`, `PATCH`, and the repository commit graph.
- Exercise serialization, deserialization, and zero-copy data views under adversarial inputs.
- Detect behavioural regressions in query heuristics, constraint solving, and workspace merging before they reach downstream users.
- Integrate the tooling into CI so proofs and regression checks run automatically for every change.
- Preserve a contributor-friendly workflow where verification steps are discoverable, well documented, and quick to reproduce locally.
## Current Foundation
`proofs/` already contains Kani harnesses for query, value, and variable-set behaviour. They provide examples of bounded nondeterministic data generation (`kani::any`, `Value::new`) and assume/guarantee reasoning that new harnesses can reuse. `./scripts/preflight.sh` is the aggregation point for formatting and tests; adding verification steps here keeps contributor workflows consistent.
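The sketch below mirrors that assume/guarantee shape. `clamp_to_width` is a hypothetical stand-in for a fixed-width encoder, not a crate API; only `#[kani::proof]`, `kani::any`, and `kani::assume` are taken from Kani itself.

```rust
#[cfg(kani)]
mod width_sketch {
    /// Hypothetical helper standing in for a fixed-width schema encoder.
    fn clamp_to_width(input: u64, width_bits: u32) -> u64 {
        input & ((1u64 << width_bits) - 1)
    }

    #[kani::proof]
    fn clamped_values_respect_declared_width() {
        // Bounded nondeterministic data, mirroring kani::any usage in proofs/.
        let input: u64 = kani::any();
        let width_bits: u32 = kani::any();
        // Assume: restrict the search space to the widths we care about.
        kani::assume(width_bits >= 1 && width_bits <= 63);

        let encoded = clamp_to_width(input, width_bits);

        // Guarantee: the encoded value never exceeds the declared width.
        assert!(encoded < (1u64 << width_bits));
    }
}
```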
## Invariant Catalogue
The roadmap anchors future work around the following invariants. Each row tracks the subsystem we care about, the guarantees we want to encode, and a rough sketch of how to exercise them in Kani, Miri, or fuzzing harnesses.
| Area | Key invariants | Candidate harness or check |
|---|---|---|
| `TribleSet` (`src/trible/tribleset.rs`) | Union/intersection/difference maintain canonical ordering across all six PATCH indexes; iterators only yield deduplicated `Trible`s; insert never drops an ordering. | Extend the existing variable-set harnesses with nondeterministic inserts, and add a dedicated `tribleset_harness.rs` validating round-trips across every ordering (see the sketch after this table). |
| `PATCH` & `ByteTable` (`src/patch/*.rs`) | Cuckoo displacement respects `MAX_RETRIES` without losing entries; `Branch::modify_child` grows tables when required and preserves `leaf_count`/`segment_count`; `table_grow` copies every occupant exactly once. | Introduce a `patch_harness.rs` that stress-tests `plan_insert`, `table_insert`, and `Branch::grow`, plus a micro-fuzzer that drives inserts/removals across random table sizes. |
| Value schemas (`src/value/*.rs`) | Schema encoders respect declared byte widths; `TryFromValue` conversions and `ValueSchema::validate` reject truncated buffers; zero-copy views stay aligned. | Reuse `value_harness.rs`, adding per-schema helpers plus a Miri regression suite that loads slices at every alignment. |
| Query engine (`src/query/*.rs`) | Constraint solver never aliases conflicting bindings; planner outputs cover all join permutations referenced by `pattern!`; influence tracking matches selected variables. | Expand `proofs/query_harness.rs` with minimal counterexamples, and fuzz constraint graphs via `cargo fuzz`. |
| Repository & commits (`src/repo/*.rs`, `proofs/commit_harness.rs`) | Branch heads remain append-only; `Workspace::pull` never forgets reachable blobs; selector algebra matches Git semantics. | Add bounded commit DAG generators in `commit_harness.rs` plus deterministic simulation traces covering merges and garbage collection. |
| Storage primitives (`src/blob`, `src/repo`, `src/patch/leaf.rs`) | Blob handles stay reference counted; pile headers remain within reserved capacity; byte slices from archives stay valid for the life of the store. | Combine Miri tests for aliasing with nightly fuzzers that replay repository sync transcripts. |
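To make the `TribleSet` row concrete, here is one possible shape for `tribleset_harness.rs`. It checks the dedup/canonical-ordering invariant against a toy `insert_model` helper; the real harness would construct `Trible`s and call `TribleSet`'s own insert and set operations instead.

```rust
#[cfg(kani)]
mod tribleset_sketch {
    /// Hypothetical model of a single ordering: entries stay sorted and
    /// deduplicated, as each of the six PATCH indexes should.
    fn insert_model(index: &mut Vec<u8>, x: u8) {
        match index.binary_search(&x) {
            Ok(_) => {}                       // already present: deduplicate
            Err(pos) => index.insert(pos, x), // keep canonical order
        }
    }

    #[kani::proof]
    fn inserts_stay_deduplicated_and_order_independent() {
        let a: u8 = kani::any();
        let b: u8 = kani::any();

        // Insertion order must not change the canonical form.
        let mut left = Vec::new();
        insert_model(&mut left, a);
        insert_model(&mut left, b);

        let mut right = Vec::new();
        insert_model(&mut right, b);
        insert_model(&mut right, a);
        assert_eq!(left, right);

        // Re-inserting an existing entry never changes the index.
        let before = left.clone();
        insert_model(&mut left, a);
        assert_eq!(left, before);
    }
}
```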
## Expansion Plan
## Phase 1 – Harden the Existing Kani Coverage
- Catalogue crate-level invariants and map them to concrete Kani harnesses. Start with:
  - `TribleSet` operations preserving canonical ordering and deduplication.
  - Join heuristics in `atreides` ensuring variable bindings never alias conflicting values.
  - Repository merge logic maintaining append-only pile semantics.
- Extract shared helpers for generating bounded arbitrary data (e.g. `Vec::bounded_any`) so harnesses remain expressive without exploding the search space (a sketch of such a helper follows this list).
- Adopt a per-module harness layout (`proofs/<module>_harness.rs`) registered from `proofs/mod.rs` to make maintenance predictable.
- Configure `scripts/verify.sh` to run targeted `cargo kani --harness <name>` invocations in parallel, then wire it into CI with caching to keep runtimes manageable.
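One possible shape for the shared generator; the free-function form, the `bounded_any` name, and the cap shown are illustrative.

```rust
// Shared generator sketch for bounded arbitrary data.
#[cfg(kani)]
pub fn bounded_any<T: kani::Arbitrary>(max_len: usize) -> Vec<T> {
    // Pick a nondeterministic length, then constrain it so proofs stay tractable.
    let len: usize = kani::any();
    kani::assume(len <= max_len);

    let mut items = Vec::with_capacity(len);
    for _ in 0..len {
        items.push(kani::any());
    }
    items
}

// Harness usage would pair the cap with an explicit unwind bound, e.g.:
//
//     #[kani::proof]
//     #[kani::unwind(5)]
//     fn tribleset_roundtrip() {
//         let entries: Vec<u8> = bounded_any(4);
//         // drive TribleSet inserts with `entries` ...
//     }
```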
## Phase 2 – Symbolic Execution with Miri
- Enable `cargo miri test` for the default test suite to surface undefined behaviour (UB) and aliasing bugs that regular tests may miss.
- Gate flaky or unsupported tests with `cfg(miri)` guards so the suite stays deterministic under the interpreter (two common gating patterns are sketched after this list).
- Document the workflow in `scripts/preflight.sh` and optionally expose a dedicated `scripts/miri.sh` for local runs when developers need deeper debugging.
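Two common gating patterns; the test and module names below are illustrative.

```rust
#[test]
#[cfg_attr(miri, ignore)] // too slow under the interpreter, still runs natively
fn replays_large_repository_transcript() {
    // ... exercise the pile/repo end to end ...
}

#[cfg(not(miri))]
mod native_only_tests {
    // Tests relying on file-backed I/O or syscalls Miri cannot emulate.
}
```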
## Contributor Workflow
- Run `./scripts/preflight.sh` before every commit; it aggregates formatting, testing, and (eventually) targeted verification checks.
- Use `cargo kani --harness <NAME>` locally when iterating on a new proof. Start from the harness templates in `proofs/` so generators and assumptions stay consistent.
- Execute `cargo miri test` after modifying unsafe code, pointer logic, or concurrency primitives; it catches UB bugs that normal tests cannot surface.
- Kick off fuzz targets with `cargo fuzz run <TARGET>` when touching boundary code (deserializers, planners, repository sync). Store new corpus inputs in version control if they expose bugs or tricky behaviours.
- Record findings, gaps, and future work in `INVENTORY.md` so the roadmap evolves alongside the implementation effort.
## Phase 3 – Fuzzing and Property Testing
- Introduce a `cargo fuzz` workspace targeting:
  - PATCH encoders/decoders with binary corpus seeds generated from integration tests (a minimal target sketch follows this list).
  - Join-order heuristics to explore combinations of constraint graphs and filter predicates.
  - Repository sync workflows by fuzzing sequences of commits, pulls, and merges.
- Reuse structured generators from `proptest` where deterministic shrinking is valuable, and bridge them into fuzz harnesses when possible to keep the state space constrained.
- Automate nightly or on-demand fuzz campaigns via CI artifacts, storing any found counterexamples alongside minimised reproducers.
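A minimal fuzz-target shape for the first sub-bullet, assuming the standard `cargo fuzz init` layout; `decode_patch_archive` is a placeholder for whichever PATCH deserialization entry point the target ends up calling.

```rust
// fuzz/fuzz_targets/patch_decode.rs (layout assumed from `cargo fuzz init`)
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // The decoder must reject malformed input gracefully; any panic, UB, or
    // hang found here is minimised and stored as a corpus entry.
    let _ = decode_patch_archive(data);
});

/// Placeholder standing in for the real PATCH deserialization entry point.
fn decode_patch_archive(bytes: &[u8]) -> Result<(), ()> {
    if bytes.len() < 4 {
        return Err(());
    }
    Ok(())
}
```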
## Phase 4 – Deterministic Simulation Testing
- Model repository replication scenarios with deterministic event queues to explore conflict resolution, garbage collection, and concurrent writers (a minimal driver sketch follows this list).
- Encode the simulations as regular unit tests backed by recorded execution traces so they can double as documentation for expected behaviour.
- Capture simulation scenarios discovered during fuzzing to prevent regressions.
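A minimal driver sketch for the approach above; the `Event` variants and the `SimRepo` stand-in are illustrative, not existing crate types.

```rust
// Deterministic simulation sketch: events drain from a fixed queue in a fixed
// order, so a failing trace can be replayed verbatim as a unit test.
use std::collections::VecDeque;

#[allow(dead_code)] // merge endpoints kept for illustration
#[derive(Debug, Clone)]
enum Event {
    Commit { branch: &'static str, payload: u64 },
    Merge { from: &'static str, into: &'static str },
    Gc,
}

#[derive(Default)]
struct SimRepo {
    commits: Vec<(String, u64)>,
}

impl SimRepo {
    fn apply(&mut self, event: &Event) {
        match event {
            Event::Commit { branch, payload } => {
                self.commits.push((branch.to_string(), *payload));
            }
            Event::Merge { .. } | Event::Gc => {
                // A real simulation would merge heads / collect garbage here.
            }
        }
    }
}

#[test]
fn recorded_trace_replays_deterministically() {
    // A recorded trace doubles as documentation of expected behaviour.
    let trace: VecDeque<Event> = VecDeque::from(vec![
        Event::Commit { branch: "main", payload: 1 },
        Event::Commit { branch: "feature", payload: 2 },
        Event::Merge { from: "feature", into: "main" },
        Event::Gc,
    ]);

    let mut repo = SimRepo::default();
    for event in &trace {
        repo.apply(event);
    }

    // Invariant: commits are append-only, so the count never shrinks.
    assert_eq!(repo.commits.len(), 2);
}
```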
## Milestones & Reporting
- Track coverage for each invariant in a shared dashboard (CI summary or `INVENTORY.md`) so contributors can quickly spot gaps.
- Celebrate major wins, such as a new harness landing or a bug found via verification, in the CHANGELOG to reinforce the value of the effort.
- Review and refresh this roadmap at least once per release cycle to keep the guidance aligned with the architecture.
## Tooling Integration
- Track verification status in CI badges and documentation so contributors know which guarantees currently hold.
- Extend `INVENTORY.md` with follow-up work items whenever new invariants or subsystems are identified.
- Keep verification-specific configuration (Kani property files, fuzz corpora, deterministic seeds) under version control to make runs reproducible.
## Next Steps
- Break the invariant catalogue into GitHub issues that reference the planned harness files (`proofs/tribleset_harness.rs`, `proofs/patch_harness.rs`, etc.).
- Prototype the PATCH harness that drives `Branch::modify_child` through insertion/growth cycles so we can assert the displacement planner and `table_grow` never drop entries; wire the run into `scripts/verify.sh` (a model-level sketch of that invariant follows this list).
- Evaluate CI capacity to determine how frequently Kani proofs, `cargo miri`, and fuzzers can run without blocking contributors, documenting the cadence directly in `INVENTORY.md`.
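A model-level sketch of the "growing never drops entries" invariant from the second item above. The toy table below is not the real `ByteTable`; it only encodes the property the future `patch_harness.rs` should assert against the actual `table_grow`.

```rust
#[cfg(kani)]
mod patch_grow_sketch {
    /// Hypothetical grow step: copy every occupant into a table twice the size.
    fn table_grow(old: &[Option<u8>]) -> Vec<Option<u8>> {
        let mut grown = vec![None; old.len() * 2];
        let mut next = 0;
        for slot in old {
            if let Some(byte) = slot {
                grown[next] = Some(*byte);
                next += 1;
            }
        }
        grown
    }

    #[kani::proof]
    #[kani::unwind(9)]
    fn grow_preserves_every_occupant() {
        // A small bounded table keeps the proof tractable.
        let old: [Option<u8>; 4] = kani::any();
        let grown = table_grow(&old);

        // No entry is dropped or duplicated by the grow step.
        let before = old.iter().filter(|s| s.is_some()).count();
        let after = grown.iter().filter(|s| s.is_some()).count();
        assert_eq!(before, after);
    }
}
```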
This roadmap should evolve alongside the codebase—update it whenever new verification opportunities or obstacles appear.