Formal Verification Roadmap

This roadmap captures the initial strategy for driving the triblespace crates toward comprehensive formal verification. It unifies model checking, symbolic execution, fuzzing, and deterministic simulation so we can reason about both the low-level data structures and high-level repository workflows with stronger correctness guarantees.

Verification Stack Overview

  • Model checking with Kani explores bounded but exhaustive state spaces for invariants that must never be violated.
  • Symbolic execution with Miri exposes undefined behaviour (UB) and aliasing issues across regular unit tests without requiring new harnesses.
  • Coverage-guided fuzzing stresses APIs with randomized input sequences to uncover emergent behaviours that formal proofs might miss.
  • Deterministic simulations replay realistic repository workflows so we can audit higher-level semantics and regression-test subtle interplays between subsystems.

Each technique complements the others; together they provide layered assurance that keeps regressions from reaching downstream users.

Goals

  • Protect the fundamental algebraic properties of TribleSet, PATCH, and the repository commit graph.
  • Exercise serialization, deserialization, and zero-copy data views under adversarial inputs.
  • Detect behavioural regressions in query heuristics, constraint solving, and workspace merging before they reach downstream users.
  • Integrate the tooling into CI so proofs and regression checks run automatically for every change.
  • Preserve a contributor-friendly workflow where verification steps are discoverable, well documented, and quick to reproduce locally.

Current Foundation

  • proofs/ already contains Kani harnesses for query, value, and variable-set behaviour. They provide examples of bounded nondeterministic data generation (kani::any, Value::new) and assume/guarantee reasoning that new harnesses can reuse; a minimal sketch of that shape follows this list.
  • ./scripts/preflight.sh is the aggregation point for formatting and tests; adding verification steps here keeps contributor workflows consistent.
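
The sketch below mirrors the shape those harnesses share: bounded nondeterministic input from kani::any, a kani::assume precondition, and an assertion encoding the guarantee. The canonicalize function is a hypothetical stand-in rather than a crate API; real harnesses construct Values and TribleSets instead.

```rust
// Hypothetical illustration of the proofs/ harness shape; canonicalize is a
// stand-in for the crate logic a real harness would exercise.
#[cfg(kani)]
mod harness_shape_sketch {
    /// Placeholder normalisation step over a fixed-width value.
    fn canonicalize(bytes: [u8; 32]) -> [u8; 32] {
        bytes
    }

    #[kani::proof]
    fn canonicalize_is_idempotent() {
        // Bounded nondeterminism: Kani explores every 32-byte array.
        let raw: [u8; 32] = kani::any();
        // Assume the precondition the schema demands (here: a tag byte in range).
        kani::assume(raw[0] < 16);
        // Guarantee: normalisation is idempotent for every admissible input.
        assert_eq!(canonicalize(raw), canonicalize(canonicalize(raw)));
    }
}
```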

Invariant Catalogue

The roadmap anchors future work around the following invariants. Each entry tracks the subsystem we care about, the guarantees we want to encode, and a rough sketch of how to exercise them in Kani, Miri, or fuzzing harnesses.

  • TribleSet (src/trible/tribleset.rs)
    Key invariants: Union/intersection/difference maintain canonical ordering across all six PATCH indexes; iterators only yield deduplicated Tribles; insert never drops an ordering.
    Candidate harness or check: Extend the existing variableset harnesses with nondeterministic inserts, and add a dedicated tribleset_harness.rs validating round-trips across every ordering.
  • PATCH & ByteTable (src/patch/*.rs)
    Key invariants: Cuckoo displacement respects MAX_RETRIES without losing entries; Branch::modify_child grows tables when required and preserves leaf_count/segment_count; table_grow copies every occupant exactly once.
    Candidate harness or check: Introduce a patch_harness.rs that stress-tests plan_insert, table_insert, and Branch::grow, plus a micro-fuzzer that drives inserts/removals across random table sizes.
  • Value schemas (src/value/*.rs)
    Key invariants: Schema encoders respect declared byte widths; TryFromValue conversions and ValueSchema::validate reject truncated buffers; zero-copy views stay aligned.
    Candidate harness or check: Reuse value_harness.rs, adding per-schema helpers plus a Miri regression suite that loads slices at every alignment.
  • Query engine (src/query/*.rs)
    Key invariants: Constraint solver never aliases conflicting bindings; planner outputs cover all join permutations referenced by pattern!; influence tracking matches selected variables.
    Candidate harness or check: Expand proofs/query_harness.rs with minimal counterexamples, and fuzz constraint graphs via cargo fuzz.
  • Repository & commits (src/repo/*.rs, proofs/commit_harness.rs)
    Key invariants: Branch heads remain append-only; Workspace::pull never forgets reachable blobs; selector algebra matches Git semantics.
    Candidate harness or check: Add bounded commit DAG generators in commit_harness.rs plus deterministic simulation traces covering merges and garbage collection.
  • Storage primitives (src/blob, src/repo, src/patch/leaf.rs)
    Key invariants: Blob handles stay reference counted; pile headers remain within reserved capacity; byte slices from archives stay valid for the life of the store.
    Candidate harness or check: Combine Miri tests for aliasing with nightly fuzzers that replay repository sync transcripts.
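
Picking up the first catalogue entry, the sketch below shows the property shape a tribleset_harness.rs could assert. Because the real TribleSet API is not reproduced here, a BTreeSet stands in for the canonical-ordering and deduplication contract; the actual harness would call the crate's insert/union/iteration methods instead.

```rust
// Model-level sketch of the TribleSet ordering/deduplication invariant; the
// BTreeSet is a stand-in so the example stays self-contained.
#[cfg(kani)]
mod tribleset_sketch {
    use std::collections::BTreeSet;

    #[kani::proof]
    #[kani::unwind(8)]
    fn union_preserves_ordering_and_dedup() {
        // Bounded nondeterministic contents for two small inputs.
        let a: [u8; 3] = kani::any();
        let b: [u8; 3] = kani::any();
        let left: BTreeSet<u8> = a.iter().copied().collect();
        let right: BTreeSet<u8> = b.iter().copied().collect();

        // Stand-in for TribleSet::union.
        let mut merged = left.clone();
        merged.extend(right.iter().copied());

        // Iteration yields strictly ascending, deduplicated elements.
        let items: Vec<u8> = merged.iter().copied().collect();
        assert!(items.windows(2).all(|w| w[0] < w[1]));
        // No element reachable from either input is dropped by the union.
        assert!(left.iter().all(|x| merged.contains(x)));
        assert!(right.iter().all(|x| merged.contains(x)));
    }
}
```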

Expansion Plan

Phase 1 – Harden the Existing Kani Coverage

  1. Catalogue crate-level invariants and map them to concrete Kani harnesses. Start with:
    • TribleSet operations preserving canonical ordering and deduplication.
    • Join heuristics in atreides ensuring variable bindings never alias conflicting values.
    • Repository merge logic maintaining append-only pile semantics.
  2. Extract shared helpers for generating bounded arbitrary data (e.g. Vec::bounded_any) so harnesses remain expressive without exploding the search space; a candidate helper is sketched after this list.
  3. Adopt a per-module harness layout (proofs/<module>_harness.rs) registered from proofs/mod.rs to make maintenance predictable.
  4. Configure scripts/verify.sh to run targeted cargo kani --harness <name> invocations in parallel, then wire it into CI with caching to keep runtimes manageable.
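
For step 2, the helper below is one possible shape for a shared generator module; the bounded_any name and its location are assumptions rather than existing code. Harnesses that call it still need an #[kani::unwind] bound at least one larger than the length cap.

```rust
// Hypothetical shared generator for Kani harnesses: nondeterministic contents
// with a bounded length so the search space stays tractable.
#[cfg(kani)]
pub mod generators {
    pub fn bounded_any<T: kani::Arbitrary>(max_len: usize) -> Vec<T> {
        let len: usize = kani::any();
        kani::assume(len <= max_len);
        let mut out = Vec::with_capacity(len);
        for _ in 0..len {
            out.push(kani::any());
        }
        out
    }
}
```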

Phase 2 – Symbolic Execution with Miri

  1. Enable cargo miri test for the default test suite to surface undefined behaviour (UB) and aliasing bugs that regular tests may miss.
  2. Gate flaky or unsupported tests with cfg(miri) guards so the suite stays deterministic under the interpreter; the idiom is shown after this list.
  3. Document the workflow in scripts/preflight.sh and optionally expose a dedicated scripts/miri.sh for local runs when developers need deeper debugging.
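
Step 2 usually comes down to the cfg_attr idiom below: tests that depend on facilities the interpreter handles poorly (real file I/O, wall-clock timing) are ignored under Miri, while pure in-memory tests stay enabled. The test names here are illustrative only.

```rust
// Sketch of cfg(miri) gating; `miri` is the cfg flag the interpreter sets.
#[cfg(test)]
mod tests {
    #[test]
    #[cfg_attr(miri, ignore)] // file-backed I/O is slow or unsupported under Miri
    fn replays_large_pile_from_disk() {
        // ... exercise file-backed storage here ...
    }

    #[test]
    fn in_memory_round_trip() {
        // Pure in-memory logic stays enabled so Miri can check it for UB.
        let bytes = vec![1u8, 2, 3];
        assert_eq!(bytes.iter().sum::<u8>(), 6);
    }
}
```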

Contributor Workflow

  1. Run ./scripts/preflight.sh before every commit; it aggregates formatting, testing, and (eventually) targeted verification checks.
  2. Use cargo kani --harness <NAME> locally when iterating on a new proof. Start from the harness templates in proofs/ so generators and assumptions stay consistent.
  3. Execute cargo miri test after modifying unsafe code, pointer logic, or concurrency primitives; it catches UB bugs that normal tests cannot surface.
  4. Kick off fuzz targets with cargo fuzz run <TARGET> when touching boundary code (deserializers, planners, repository sync). Store new corpus inputs in version control if they expose bugs or tricky behaviours.
  5. Record findings, gaps, and future work in INVENTORY.md so the roadmap evolves alongside the implementation effort.

Phase 3 – Fuzzing and Property Testing

  1. Introduce a cargo fuzz workspace targeting:
    • PATCH encoders/decoders with binary corpus seeds generated from integration tests (a target sketch follows this list).
    • Join-order heuristics to explore combinations of constraint graphs and filter predicates.
    • Repository sync workflows by fuzzing sequences of commits, pulls, and merges.
  2. Reuse structured generators from proptest where deterministic shrinking is valuable, and bridge them into fuzz harnesses when possible to keep the state space constrained.
  3. Automate nightly or on-demand fuzz campaigns via CI artifacts, storing any found counterexamples alongside minimised reproducers.
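
A target for the PATCH encoder/decoder bullet might look like the sketch below. The decode_patch function is a placeholder for whatever PATCH decoding entry point the workspace exposes; the important property is that arbitrary bytes may produce errors but never panics or UB.

```rust
// Sketch of fuzz/fuzz_targets/patch_decode.rs for cargo fuzz (libFuzzer).
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Errors are acceptable; panics and UB are findings.
    let _ = decode_patch(data);
});

// Placeholder so the sketch is self-contained; a real target would call into
// the triblespace crate instead.
fn decode_patch(data: &[u8]) -> Result<usize, ()> {
    if data.len() < 4 {
        return Err(());
    }
    Ok(data.len())
}
```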

Phase 4 – Deterministic Simulation Testing

  1. Model repository replication scenarios with deterministic event queues to explore conflict resolution, garbage collection, and concurrent writers; a miniature example follows this list.
  2. Encode the simulations as regular unit tests backed by recorded execution traces so they can double as documentation for expected behaviour.
  3. Capture simulation scenarios discovered during fuzzing to prevent regressions.
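
The test below illustrates the pattern in miniature: an event schedule derived from a fixed seed, applied through an explicit queue, with the assertion that replaying the same seed reproduces the same history. The Event variants and the xorshift generator are stand-ins for real repository operations and whatever RNG the simulations adopt.

```rust
// Minimal deterministic-simulation sketch: seed -> schedule -> replayable run.
use std::collections::VecDeque;

#[derive(Clone, Debug, PartialEq)]
enum Event {
    Commit(u64),
    Merge(u64, u64),
}

/// Derive an event schedule from a seed using a tiny xorshift PRNG so the
/// trace depends on nothing but the seed.
fn schedule(seed: u64, len: usize) -> VecDeque<Event> {
    let mut state = seed.max(1);
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    (0..len)
        .map(|_| {
            let a = next();
            if a % 3 == 0 {
                Event::Merge(a, next())
            } else {
                Event::Commit(a)
            }
        })
        .collect()
}

/// Stand-in for applying events to a repository; here the run just records
/// the order in which events were processed.
fn run(mut queue: VecDeque<Event>) -> Vec<Event> {
    let mut history = Vec::new();
    while let Some(event) = queue.pop_front() {
        history.push(event);
    }
    history
}

#[test]
fn same_seed_reproduces_the_same_history() {
    assert_eq!(run(schedule(42, 16)), run(schedule(42, 16)));
}
```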

Milestones & Reporting

  • Track coverage for each invariant in a shared dashboard (CI summary or INVENTORY.md) so contributors can quickly spot gaps.
  • Celebrate major wins—like a new harness landing or a bug found via verification—in the CHANGELOG to reinforce the value of the effort.
  • Review and refresh this roadmap at least once per release cycle to keep the guidance aligned with the architecture.

Tooling Integration

  • Track verification status in CI badges and documentation so contributors know which guarantees currently hold.
  • Extend INVENTORY.md with follow-up work items whenever new invariants or subsystems are identified.
  • Keep verification-specific configuration (Kani property files, fuzz corpora, deterministic seeds) under version control to make runs reproducible.

Next Steps

  1. Break the invariant catalogue into GitHub issues that reference the planned harness files (proofs/tribleset_harness.rs, proofs/patch_harness.rs, etc.).
  2. Prototype the PATCH harness that drives Branch::modify_child through insertion/growth cycles so we can assert the displacement planner and table_grow never drop entries; wire the run into scripts/verify.sh. A model of the core assertion is sketched below.
  3. Evaluate CI capacity to determine how frequently Kani proofs, cargo miri, and fuzzers can run without blocking contributors, documenting the cadence directly in INVENTORY.md.
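
For the prototype in step 2, the model below isolates the core assertion: growing a table must neither drop nor duplicate occupants. The toy grow function and Option-based slots are stand-ins; the real harness would drive plan_insert, Branch::modify_child, and table_grow from the crate.

```rust
// Model-level sketch of the "no occupant is lost" property behind table_grow.
#[cfg(kani)]
mod patch_grow_sketch {
    /// Toy grow step: copy every occupied slot into a table twice the size.
    /// A real table would rehash each occupant into its new bucket.
    fn grow(old: &[Option<u8>]) -> Vec<Option<u8>> {
        let mut new = vec![None; old.len() * 2];
        for (i, slot) in old.iter().enumerate() {
            if slot.is_some() {
                new[i] = *slot;
            }
        }
        new
    }

    #[kani::proof]
    #[kani::unwind(10)]
    fn grow_neither_drops_nor_duplicates_occupants() {
        let slots: [Option<u8>; 4] = kani::any();
        let grown = grow(&slots);
        let before = slots.iter().filter(|s| s.is_some()).count();
        let after = grown.iter().filter(|s| s.is_some()).count();
        assert_eq!(before, after);
    }
}
```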

This roadmap should evolve alongside the codebase—update it whenever new verification opportunities or obstacles appear.