Introduction

Welcome to the Tribles Book. This first chapter is your map to Trible Space—the problems it tackles, the core ideas behind the triblespace crate, and the kinds of projects it was built to support. By the end of these opening pages you should be able to recognize when Trible Space is the right tool, whether you are prototyping something new, extending an existing system, or adapting it for your own research purposes.

Why Trible Space exists

Trible Space exists because teams need to steward complex, diverse, and interconnected datasets without losing context. Research groups, startups, and digital libraries must pair large binary payloads with fine-grained facts, synchronize findings across laptops, servers, and mobile devices, and prove that derived results trace back to their inputs. Combining a conventional database, object store, and version-control workflow rarely delivers that outcome: blobs drift away from their provenance, merges become brittle, and auditing which observations justified a decision or model is tedious.

Trible Space closes those gaps with a single substrate that stores heavyweight assets and the relationships that explain them. Think of it as a library catalog for blobs and a lab notebook for the annotations, measurements, and discussions that give those blobs meaning. Because facts and payloads travel together, features like version control, verifiability, and provenance fall naturally out of the data model instead of bolting on as afterthoughts.

That same structure also lets you support offline edits, reconcile concurrent changes safely, and ship datasets to partners with the evidence needed to trust them.

To deliver those outcomes, Trible Space blends ideas from databases, version control, and content-addressed storage. Information is encoded as fixed-width tribles: 64-byte entity–attribute–value facts. Each trible stores two 16-byte extrinsic identifiers plus a 32-byte typed value. When a value exceeds the inline slot it becomes a schema-qualified hash pointing to an immutable content-addressed blob. The blob holds the heavyweight payload, while the trible remains a compact fact that fits neatly into indexes, caches, and query engines. Because both tribles and blobs are immutable, you can keep them in memory, on disk, or in remote object stores without transformation. Content hashes serve as identifiers, so every payload has a stable address and integrity is easy to verify when data is shared or synchronized.
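
Schematically, a trible can be pictured as a plain struct. The sketch below is purely illustrative (the field names are invented and do not match the crate's internal types), but it shows how the 64 bytes break down:

// Illustrative sketch of the 64-byte trible layout; field names are invented
// for clarity and do not mirror the crate's internal representation.
#[repr(C)]
struct RawTrible {
    entity: [u8; 16],    // extrinsic entity identifier
    attribute: [u8; 16], // extrinsic attribute identifier
    value: [u8; 32],     // inline typed value, or a schema-qualified blob hash
}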

This design unlocks capabilities that are difficult to achieve together in traditional stacks:

  • Trustworthy collaboration – hashes and immutable histories provide the audit trail needed to review changes, merge branches, and reproduce results across teams.
  • Content-addressed storage – values are stored and looked up by their contents, making caches and replicas safe even when they live on untrusted infrastructure.
  • Flexible querying – the query engine blends indexes on the fly, letting a single query range across trible sets, succinct indexes, and familiar Rust collections such as hash maps in one pass.

Taken together, these traits make it feasible to build systems with rich histories, reproducible computations, and verifiable data exchange while keeping the developer experience approachable.

Who this book is for

If you are new to Tribles, the opening chapters build vocabulary and provide a guided tour of the core data structures. Developers who already understand the problem space can skim ahead to detailed sections on schema design, query semantics, and the Atreides join algorithm. The book also points to the API documentation whenever you are ready to explore the crate directly.

How to read this book

The book is organized so you can either read it front-to-back or jump straight to the material that answers your questions. Each chapter layers new ideas onto the previous ones:

  1. Getting Started walks through installing the tooling, creating your first trible store, and issuing simple queries.
  2. Architecture and Query Engine explain how the runtime is structured so you can reason about performance and extensibility.
  3. Later sections explore schema design, incremental queries, repository workflows, and formal verification so you can grow from experiments to production systems.

Inline links lead to deeper resources and code samples. The Glossary offers quick refreshers on terminology, while Developing Locally covers how to set up a development environment and contribute back to the project. Whenever the book introduces a new concept, look for references to the crate documentation so you can inspect the corresponding APIs and examples.

By the end of this chapter you should have a mental model for why Trible Space is structured the way it is. From there, head to the chapters that match your goals—whether that means learning to query data effectively or integrating Tribles into a larger system.

Getting Started

This chapter walks you through creating a brand-new repository, committing your first entity, and understanding the pieces involved. It assumes you have Rust installed and are comfortable with running cargo commands from a terminal.

1. Add the dependencies

Create a new binary crate (for example with cargo new tribles-demo) and add the dependencies needed for the example. The triblespace crate provides the database, ed25519-dalek offers an implementation of the signing keys used for authentication, and rand supplies secure randomness.

cargo add triblespace ed25519-dalek rand

2. Build the example program

The walkthrough below mirrors the quick-start program featured in the README. It defines the attributes your application needs, stages and queries book data, publishes the first commit with automatic retries, and finally shows how to use try_push when you want to inspect and reconcile a conflict manually.

use triblespace::prelude::*;
use triblespace::prelude::blobschemas::LongString;
use triblespace::repo::{memoryrepo::MemoryRepo, Repository};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

mod literature {
    use triblespace::prelude::*;
    use triblespace::prelude::blobschemas::LongString;
    use triblespace::prelude::valueschemas::{Blake3, GenId, Handle, R256, ShortString};

    attributes! {
        /// The title of a work.
        ///
        /// Small doc paragraph used in the book examples.
        "A74AA63539354CDA47F387A4C3A8D54C" as pub title: ShortString;

        /// A quote from a work.
        // For quick prototypes you can omit the id and let the macro derive it
        // from the name and schema:
        pub quote: Handle<Blake3, LongString>;

        /// The author of a work.
        "8F180883F9FD5F787E9E0AF0DF5866B9" as pub author: GenId;

        /// The first name of an author.
        "0DBB530B37B966D137C50B943700EDB2" as pub firstname: ShortString;

        /// The last name of an author.
        "6BAA463FD4EAF45F6A103DB9433E4545" as pub lastname: ShortString;

        /// The number of pages in the work.
        "FCCE870BECA333D059D5CD68C43B98F0" as pub page_count: R256;

        /// A pen name or alternate spelling for an author.
        "D2D1B857AC92CEAA45C0737147CA417E" as pub alias: ShortString;
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Repositories manage shared history; MemoryRepo keeps everything in-memory
    // for quick experiments. Swap in a `Pile` when you need durable storage.
    let storage = MemoryRepo::default();
    let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));
    let branch_id = repo
        .create_branch("main", None)
        .expect("create branch");
    let mut ws = repo.pull(*branch_id).expect("pull workspace");

    // Workspaces stage TribleSets before committing them. The entity! macro
    // returns sets that merge cheaply into our current working set.
    let author_id = ufoid();
    let mut library = TribleSet::new();

    library += entity! { &author_id @
        literature::firstname: "Frank",
        literature::lastname: "Herbert",
    };

    let book_id = ufoid();
    library += entity! { &book_id @
        literature::title: "Dune",
        literature::author: &author_id,
        literature::quote: ws.put::<LongString, _>(
            "Deep in the human unconscious is a pervasive need for a logical \
             universe that makes sense. But the real universe is always one \
             step beyond logic."
        ),
        literature::quote: ws.put::<LongString, _>(
            "I must not fear. Fear is the mind-killer. Fear is the little-death \
             that brings total obliteration. I will face my fear. I will permit \
             it to pass over me and through me. And when it has gone past I will \
             turn the inner eye to see its path. Where the fear has gone there \
             will be nothing. Only I will remain."
        ),
    };

    ws.commit(library, Some("import dune"));

    // `checkout(..)` returns the accumulated TribleSet for the branch.
    let catalog = ws.checkout(..)?;
    let title = "Dune";

    // Use `_?ident` when you need a fresh variable scoped to this macro call
    // without declaring it in the find! projection list.
    for (f, l, quote) in find!(
        (first: String, last: Value<_>, quote),
        pattern!(&catalog, [
            { _?author @
                literature::firstname: ?first,
                literature::lastname: ?last
            },
            {
                literature::title: title,
                literature::author: _?author,
                literature::quote: ?quote
            }
        ])
    ) {
        let quote: View<str> = ws.get(quote)?;
        let quote = quote.as_ref();
        println!("'{quote}'\n - from {title} by {f} {}.", l.from_value::<&str>());
    }

    // Use `push` when you want automatic retries that merge concurrent history
    // into the workspace before publishing.
    repo.push(&mut ws).expect("publish initial library");

    // Stage a non-monotonic update that we plan to reconcile manually.
    ws.commit(
        entity! { &author_id @ literature::firstname: "Francis" },
        Some("use pen name"),
    );

    // Simulate a collaborator racing us with a different update.
    let mut collaborator = repo
        .pull(*branch_id)
        .expect("pull collaborator workspace");
    collaborator.commit(
        entity! { &author_id @ literature::firstname: "Franklin" },
        Some("record legal first name"),
    );
    repo.push(&mut collaborator)
        .expect("publish collaborator history");

    // `try_push` returns a conflict workspace when the CAS fails, letting us
    // inspect divergent history and decide how to merge it.
    if let Some(mut conflict_ws) = repo
        .try_push(&mut ws)
        .expect("attempt manual conflict resolution")
    {
        let conflict_catalog = conflict_ws.checkout(..)?;

        for (first,) in find!(
            (first: Value<_>),
            pattern!(&conflict_catalog, [{
                _?author @ literature::firstname: ?first
            }, {
                literature::author: _?author
            }])
        ) {
            println!("Collaborator kept the name '{}'.", first.from_value::<&str>());
        }

        ws.merge(&mut conflict_ws)
            .expect("merge conflicting history");

        ws.commit(
            entity! { &author_id @ literature::alias: "Francis" },
            Some("keep pen-name as an alias"),
        );

        repo.push(&mut ws)
            .expect("publish merged aliases");
    }

    Ok(())
}

3. Run the program

Compile and execute the example with cargo run. On success it prints the stored quotes, simulates a conflicting push from a collaborator, and publishes the merged history to the main branch. Because the example uses the in-memory MemoryRepo, nothing is written to disk; swap in a pile-backed repository when you want the data to survive the process.

cargo run

If you later switch to a pile-backed repository, the pile file it creates persists between runs; delete it when you are done experimenting to avoid accidentally reading stale state.

Understanding the pieces

  • Branch setup. Repository::create_branch registers the branch and returns an ExclusiveId guard. Dereference the guard (or call ExclusiveId::release) to obtain the Id that Repository::pull expects when creating a Workspace.
  • Minting attributes. The attributes! macro names the fields that can be stored in the repository. Attribute identifiers are global—if two crates use the same identifier they will read each other's data—so give them meaningful project-specific names.
  • Committing data. The entity! macro builds a set of attribute/value assertions. When paired with the ws.commit call it records a transaction in the workspace that becomes visible to others once pushed.
  • Publishing changes. Repository::push merges any concurrent history into the workspace and retries automatically, making it ideal for monotonic updates where you are happy to accept the merged result.
  • Manual conflict resolution. Repository::try_push performs a single optimistic attempt and returns a conflict workspace when the compare-and-set fails. Inspect that workspace when you want to reason about the competing history—such as non-monotonic edits—before merging and retrying.
  • Closing repositories. When working with pile-backed repositories it is important to close them explicitly so buffered data is flushed and any errors are reported while you can still decide how to handle them. Calling repo.close()?; surfaces those errors; if the repository were simply dropped, failures could only be logged or turn into a panic. Alternatively, you can recover the underlying pile with Repository::into_storage and call Pile::close() yourself, as sketched below.
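
As a rough sketch of that lifecycle (the pile module path and the open constructor shown here are assumptions; consult the crate documentation for the exact API):

use triblespace::repo::Repository;
// The pile type's path and constructor are assumptions for this sketch.
use triblespace::repo::pile::Pile;
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pile = Pile::open("example.pile")?; // assumed constructor
    let mut repo = Repository::new(pile, SigningKey::generate(&mut OsRng));

    // ... create branches, pull workspaces, commit and push as usual ...

    // Close explicitly so buffered writes are flushed and errors surface here
    // instead of being lost when the value is dropped.
    repo.close()?;
    Ok(())
}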

See the crate documentation for additional modules and examples.

Switching signing identities

The setup above generates a single signing key for brevity, but collaborating authors typically hold individual keys. Call Repository::set_signing_key before branching or pulling when you need a different default identity, or use Repository::create_branch_with_key and Repository::pull_with_key to choose a specific key per branch or workspace. The Managing signing identities section covers this workflow in more detail.
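
As a hedged sketch of how those calls fit together (the exact signatures and argument order may differ from the current API):

use triblespace::repo::{memoryrepo::MemoryRepo, Repository};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

fn main() {
    let alice = SigningKey::generate(&mut OsRng);
    let bob = SigningKey::generate(&mut OsRng);

    // Alice is the default identity for branches and workspaces created here.
    let mut repo = Repository::new(MemoryRepo::default(), alice);
    let branch_id = repo.create_branch("main", None).expect("create branch");

    // Switch the default identity for everything created afterwards...
    repo.set_signing_key(SigningKey::generate(&mut OsRng));

    // ...or choose a key for a single workspace without changing the default.
    let _bob_ws = repo
        .pull_with_key(*branch_id, bob)
        .expect("pull workspace as bob");
}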

Developing Locally

Tribles is developed with stable Rust tooling and a small collection of helper scripts. This chapter walks through preparing a development environment, running tests, and rebuilding the documentation so you can iterate with confidence.

Prerequisites

  1. Install a recent Rust toolchain from rustup.rs. The default stable channel is what the project targets.

  2. Clone the repository and switch into it:

    git clone https://github.com/TribleSpace/triblespace-rs.git
    cd triblespace-rs
    
  3. (Optional) Install mdbook with cargo install mdbook if you would like to preview the rendered documentation locally.

Everyday workflows

The repository includes several scripts that keep formatting, tests, and book builds in sync:

  • ./scripts/devtest.sh executes the most relevant unit and integration tests for fast feedback while you are iterating.
  • ./scripts/preflight.sh runs formatting, the full test suite, and rebuilds this book. Run it before committing to ensure your branch is in good shape.
  • ./scripts/build_book.sh regenerates the documentation after you modify Markdown chapters or code snippets.

You can always fall back to the standard Cargo commands (cargo fmt, cargo test, etc.) if you prefer to run specific tools by hand.

Rebuilding the book

Once mdbook is installed you can rebuild the documentation with:

./scripts/build_book.sh

This script compiles the chapters into book/book, allowing you to open the HTML output in a browser. Rebuilding regularly helps catch stale examples and keeps the published copy aligned with the codebase.

Architecture Overview

Trible Space is designed to keep data management simple, safe and fast. The README introduces these goals in more detail, emphasizing a lean design with predictable performance and straightforward developer experience. This chapter explains how the pieces fit together and why they are organised this way.

Design Goals

A full discussion of the motivation behind Trible Space can be found in the Philosophy section. At a high level we want a self‑contained data store that offers:

  • Simplicity – minimal moving parts and predictable behaviour.
  • Developer Experience – a clear API that avoids complex servers or background processes.
  • Safety and Performance – sound data structures backed by efficient content addressed blobs.

These goals grew out of earlier "semantic" technologies that attempted to model knowledge as graphs. While systems like RDF promised great flexibility, in practice they often became difficult to host, query and synchronise. Trible Space keeps the idea of describing the world with simple statements but stores them in a form that is easy to exchange and reason about.

Architectural Layers

The system is organised into a small set of layers that compose cleanly:

  1. Data model – the immutable trible structures that encode facts.
  2. Stores – generic blob and branch storage traits that abstract over persistence backends.
  3. Repository – the coordination layer that combines stores into a versioned history.
  4. Workspaces – the in‑memory editing surface used by applications and tools.

Each layer has a tight, well defined boundary. Code that manipulates tribles never needs to know if bytes ultimately land on disk or in memory, and repository level operations never reach inside the data model. This separation keeps interfaces small, allows incremental optimisation and makes it easy to swap pieces during experimentation.

Data Model

The fundamental unit of information is a Trible. Its 64 byte layout is described in Trible Structure. A Trible links a subject entity to an attribute and value. Multiple tribles are stored in a TribleSet, which behaves like a hashmap with three columns — subject, attribute and value.

The 64 byte boundary allows tribles to live comfortably on cache lines and makes deduplication trivial. Because tribles are immutable, the runtime can copy, hash and serialise them without coordinating with other threads. Higher level features like schema checking and query planning are therefore free to assume that every fact they observe is stable for the lifetime of a query.

Trible Sets

TribleSets provide fast querying and cheap copy‑on‑write semantics. They can be merged, diffed and searched entirely in memory. When durability is needed the set is serialised into a blob and tracked by the repository layer.

To keep joins skew‑resistant, each set maintains all six orderings of entity, attribute and value. The trees reuse the same leaf nodes so a trible is stored only once, avoiding a naïve six‑fold memory cost while still letting the search loop pick the most selective permutation using the constraint heuristics.
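
In practice merging two independently built sets is just an addition. The snippet below is a small sketch using the entity! macro from the earlier chapters:

use triblespace::prelude::*;
use triblespace::examples::literature;

fn main() {
    let author = ufoid();

    let mut local = TribleSet::new();
    local += entity! { &author @ literature::firstname: "Frank" };

    let mut remote = TribleSet::new();
    remote += entity! { &author @ literature::lastname: "Herbert" };

    // Union the two sets; structural sharing keeps this cheap and any
    // duplicate tribles collapse into a single copy.
    local += remote;

    // `local` now holds both facts about the author.
}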

Blob Storage

All persistent data lives in a BlobStore. Each blob is addressed by the hash of its contents, so identical data occupies space only once and readers can verify integrity by recomputing the hash. The trait exposes simple get and put operations, leaving caching and eviction strategies to the backend. Implementations decide where bytes reside: an in‑memory MemoryBlobStore, an on‑disk Pile described in Pile Format or a remote object store. Because handles are just 32‑byte hashes, repositories can copy or cache blobs without coordination. Trible sets, user blobs and commit records all share this mechanism.

Content addressing also means that blob stores can be layered. Applications commonly use a fast local cache backed by a slower durable store. Only the outermost layer needs to implement eviction; inner layers simply re-use the same hash keys, so cache misses fall through cleanly.
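
Conceptually the contract looks something like the following. This is a simplified sketch, not the crate's actual trait, which is generic over hash protocols and blob schemas and returns typed handles:

// Simplified sketch of the get/put contract described above.
type BlobHash = [u8; 32];

trait BlobStoreSketch {
    /// Store `bytes` and return the content hash that addresses them.
    fn put(&mut self, bytes: Vec<u8>) -> BlobHash;

    /// Look a blob up by its hash. Identical content always maps to the same
    /// hash, so any cache or replica can answer this request.
    fn get(&self, hash: &BlobHash) -> Option<Vec<u8>>;
}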

Branch Store

A BranchStore keeps track of the tips of each branch. Updates use a simple compare‑and‑set operation so concurrent writers detect conflicts. Both the in‑memory and pile repositories implement this trait.

Branch stores are intentionally dumb. They neither understand commits nor the shape of the working tree. Instead they focus on a single atomic pointer per branch. This reduces the surface area for race conditions and keeps multi‑writer deployments predictable even on eventually consistent filesystems.

Because only this single operation mutates repository state, nearly all other logic is value oriented and immutable. Conflicts surface only at the branch store update step, which simplifies concurrent use and reasoning about changes.
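
A toy model of that compare-and-set contract makes the conflict behaviour concrete (types and names below are illustrative, not the crate's API):

use std::collections::HashMap;

// Toy branch store: one atomic pointer per branch, updated only when the
// caller's expectation still matches the stored head.
type BranchId = u128;
type CommitHash = [u8; 32];

struct ToyBranchStore {
    heads: HashMap<BranchId, CommitHash>,
}

impl ToyBranchStore {
    /// Move `branch` from `expected` to `new`. Returns false when another
    /// writer already advanced the head, signalling the caller to merge the
    /// divergent history and retry.
    fn compare_and_set(
        &mut self,
        branch: BranchId,
        expected: Option<CommitHash>,
        new: CommitHash,
    ) -> bool {
        if self.heads.get(&branch).copied() == expected {
            self.heads.insert(branch, new);
            true
        } else {
            false
        }
    }
}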

Repository

The Repository combines a blob store with a branch store. Commits store a trible set blob along with a parent link and signature. Because everything is content addressed, multiple repositories can share blobs or synchronize through a basic file copy.

Repository logic performs a few critical duties:

  • Validation – ensure referenced blobs exist and signatures line up with the claimed authorship.
  • Blob synchronization – upload staged data through the content-addressed blob store, which skips already-present bytes and reports integrity errors.
  • History traversal – provide iterators that let clients walk parent chains efficiently.

All of these operations rely only on hashes and immutable blobs, so repositories can be mirrored easily and verified offline.

Workspaces

A Workspace represents mutable state during editing. Checking out or branching yields a workspace backed by a fresh MemoryBlobStore. Commits are created locally and only become visible to others when pushed, as described in Repository Workflows.

Workspaces behave like sandboxes. They host application caches, pending trible sets and user blobs. Because they speak the same blob language as repositories, synchronisation is just a matter of copying hashes from the workspace store into the shared store once a commit is finalised.

Commits and History

TribleSets written to blobs form immutable commits. Each commit references its parent, creating an append‑only chain signed by the author. This is the durable history shared between repositories.

Because commits are immutable, rollback and branching are cheap. Diverging histories can coexist until a user merges them by applying query language operations over the underlying trible sets. The repository simply tracks which commit each branch tip points to.
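
The ingredients of a commit can be pictured like this (a sketch only; the crate's actual commit encoding differs and this struct merely names the parts described above):

// Sketch of what a commit record carries.
struct CommitSketch {
    content: [u8; 32],        // hash of the serialized TribleSet blob
    parent: Option<[u8; 32]>, // previous commit hash; None for the first commit
    signature: [u8; 64],      // author's Ed25519 signature over the commit
    message: Option<String>,  // optional human-readable note, e.g. "import dune"
}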

Putting It Together

+-----------------------------------------------+
|                   Repository                  |
|   BlobStore (content addressed)               |
|   BranchStore (compare-and-set head)          |
+----------------------------+------------------+
           ^ push/try_push        pull
           |                         |
           |                         v
+----------------------------+------------------+
|                   Workspace                   |
|   base_blobs reader (view of repo blobs)      |
|   MemoryBlobStore (staged blobs)              |
|   current head (latest commit reference)      |
+----------------------------+------------------+
        ^             ^             |
        |             |         checkout
     commit       add_blob          |
        |             |             v
+----------------------------+------------------+
|                  Application                  |
+-----------------------------------------------+

Repository::pull reads the branch metadata, loads the referenced commit, and couples that history with a fresh MemoryBlobStore staged area plus a reader for the repository's existing blobs. Workspace methods then stage edits locally: Workspace::put (the helper that adds blobs) writes application data into the in-memory store while Workspace::commit converts the new TribleSet into blobs and advances the current head pointer. Applications hydrate their views with Workspace::checkout, which gathers the selected commits and returns the assembled trible set to the caller. When the changes are ready to publish, Repository::try_push enumerates the staged blobs, uploads them into the repository blob store, creates updated branch metadata, and performs the compare-and-set branch update before clearing the staging area. Because every blob is addressed by its hash, repositories can safely share data through any common storage without coordination.

The boundaries between layers encourage modular tooling. A CLI client can operate entirely within a workspace while a sync service automates pushes and pulls between repositories. As long as components honour the blob and branch store contracts they can evolve independently without risking the core guarantees of Trible Space.

Query Engine

Queries describe the patterns you want to retrieve. The engine favors extreme simplicity and aims for predictable latency and skew resistance without any tuning. Every constraint implements the Constraint trait so operators, sub-languages, and even alternative data sources can compose cleanly. Query evaluation is expressed as a negotiation between constraints, but the contract stays tiny: constraints report which variables they touch, estimate how many candidates remain for each variable, and can enumerate concrete values on demand. Those estimates feed directly into the depth-first search; there is no standalone planner to build or cache.

The constraint API mirrors the mental model you would use when reasoning about a query by hand. Constraints expose their VariableSet, provide estimate methods so the engine can choose the next variable to bind, and implement propose to extend partial assignments. Composite constraints, such as unions, may also override confirm to tighten previously gathered proposals. This cooperative protocol keeps the engine agnostic to where the data comes from—be it in-memory indices, remote stores, or bespoke application predicates.

Search Loop

The engine executes a query as a depth-first search over partial bindings. The loop integrates the cardinality heuristics directly instead of running a separate planning phase. Picking a different heuristic simply yields another member of the Atreides join family:

  1. Initialisation – When a Query is constructed it asks the root constraint for its variable set. The engine records each variable's influence so it knows which estimates to refresh when bindings change, computes an initial estimate for every variable, and sorts the yet-unbound list using those numbers.
  2. Propose (and confirm) – The engine pops the most selective variable from the sorted list and calls propose to collect candidate values that respect the current binding. Intersection constraints such as those built by and! reorder their children by increasing estimate and call confirm on the remaining branches so inconsistent candidates are filtered out before the engine commits to them. Constraints that observe the partial assignment simply avoid proposing or confirming values they already know will fail.
  3. Bind or Backtrack – If propose yielded values, the engine binds one, marks that variable as touched, and recurses. If no candidates remain it backtracks: the binding is undone, the variable returns to the unbound set, and the touched marker ensures dependent estimates refresh before the next attempt.
  4. Yield – Once every variable is bound the post-processing closure runs and the tuple is produced as a result row.

Between iterations the engine refreshes estimates for any variables influenced by the newly bound values. Because constraints only observe the bindings they explicitly depend on, unrelated sub-queries interleave naturally and new constraint types can participate without custom planner hooks.

Queries as Schemas

You might notice that trible.space does not define a global ontology or schema beyond associating attributes with a ValueSchema or BlobSchema. This is deliberate. The semantic web taught us that per-value typing, while desirable, was awkward in RDF: literal datatypes are optional, custom types need globally scoped IRIs and there is no enforcement, so most data degenerates into untyped strings. Trying to regain structure through global ontologies and class hierarchies made schemas rigid and reasoning computationally infeasible. Real-world data often arrives with missing, duplicate or additional fields, which clashes with these global, class-based constraints.

Our approach is to be sympathetic to edge cases and have the system deal only with the data it declares itself capable of handling. These application-specific schema declarations are exactly the shapes and constraints expressed by our queries[1]. Data not conforming to these queries is simply ignored by definition, as a query only returns data satisfying its constraints.[2]

Join Strategy

The query engine uses the Atreides family of worst-case optimal join algorithms. These algorithms leverage the same cardinality estimates surfaced through Constraint::estimate to guide the depth-first search over variable bindings, providing skew-resistant and predictable performance. Because the engine refreshes those estimates inside the search loop, the binding order adapts whenever a constraint updates its influence set—there is no separate planning artifact to maintain. For a detailed discussion, see the Atreides Join chapter.

Query Languages

Instead of a single query language, the engine exposes small composable constraints that combine with logical operators such as and and or. These constraints are simple yet flexible, enabling a wide variety of operators while still allowing the engine to explore the search space efficiently.

The query engine and data model are flexible enough to support many query styles, including graph, relational and document-oriented queries. Constraints may originate from the database itself (such as attribute lookups), from custom application logic, or from entirely external sources.

For example, the pattern! and entity! macros—available at the crate root and re-exported via triblespace::prelude (for instance with use triblespace::prelude::*;)—generate constraints for a given trible pattern in a query-by-example style reminiscent of SPARQL or GraphQL but tailored to a document-graph data model. It would also be possible to layer a property-graph language like Cypher or a relational language like Datalog on top of the engine.[3]

use std::collections::HashSet;

use triblespace::examples::literature;
use triblespace::prelude::*;
use triblespace::prelude::valueschemas::ShortString;

fn main() {
    let mut kb = TribleSet::new();

    let author = ufoid();
    let book = ufoid();

    kb += entity! { &author @
        literature::firstname: "Frank",
        literature::lastname: "Herbert",
    };
    kb += entity! { &book @
        literature::author: &author,
        literature::title: "Dune",
    };

    let mut allowed = HashSet::<Value<ShortString>>::new();
    allowed.insert("Frank".to_value());

    let results: Vec<_> = find!((title: Value<_>, firstname: Value<_>),
        and!(
            allowed.has(firstname),
            pattern!(&kb, [{
                _?person @
                    literature::firstname: ?firstname,
                    literature::lastname: "Herbert",
            }, {
                literature::author: _?person,
                literature::title: ?title,
            }])
        )
    )
    .collect();

    assert_eq!(results.len(), 1);
    assert_eq!(results[0].0, "Dune".to_value());
}

The snippet above demonstrates how typed attribute constraints, user-defined predicates (the HashSet::has filter), and reusable namespaces can mix seamlessly within a single query.

Great care has been taken to ensure that query languages with different styles and semantics can coexist and even be mixed with other languages and data models within the same query. For practical examples of the current facilities, see the Query Language chapter.

[1] Note that this query-schema isomorphism isn't necessarily true in all databases or query languages, e.g., it does not hold for SQL.

[2] In RDF terminology: We challenge the classical A-Box & T-Box dichotomy by replacing the T-Box with a "Q-Box", which is descriptive and open rather than prescriptive and closed. This Q-Box naturally evolves with new and changing requirements, contexts and applications.

[3] SQL would be a bit more challenging, as it is surprisingly imperative with its explicit JOINs and ORDER BYs, and its lack of a clear declarative semantics. This makes it harder to implement on top of a constraint-based query engine tailored towards a more declarative and functional style.

The Atreides Family of Worst-case Optimal Join Algorithms

The query engine reasons about data by solving a set of constraints over variables. Instead of constructing a traditional left-deep or bushy join plan, it performs a guided depth-first search that binds one variable at a time. The approach draws on the broader theory of worst-case optimal joins and lets us navigate the search space directly rather than materialising intermediate results.

Constraints as the search frontier

Every constraint implements the Constraint trait, which exposes four abilities that shape the search:

  1. estimate – predicts how many results remain for a variable under the current partial binding.
  2. propose – enumerates candidate values for a variable.
  3. confirm – filters a set of candidates without re-enumerating them.
  4. influence – reports which other variables need their estimates refreshed when this variable changes.

Traditional databases rely on a query planner to combine statistics into a join plan. Atreides instead consults the constraints directly while it searches. Each constraint can base its estimates on whatever structure it maintains—hash maps, precomputed counts, or even constant values for predicates that admit at most one match—so long as it can provide a quick upper bound. Whenever a binding changes, the engine asks the influenced constraints for fresh estimates. Those estimates are cached per variable and reused until another binding invalidates them, keeping the guidance loop responsive as the search progresses.

Because the heuristics are derived entirely from the constraints themselves, we do not need a separate query planner or multiple join implementations. Any custom constraint can participate in the same search by providing sensible estimates, proposal generation, confirmation, and influence tracking.
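
A simplified sketch of what such a constraint needs to provide is shown below. The names and signatures are illustrative; the crate's actual Constraint trait differs, but the shape of the negotiation is the same:

// Simplified sketch of the four roles described above.
type VariableId = usize;
type Candidate = [u8; 32];

/// A partial assignment of values to variables, built up during the search.
struct Binding {
    bound: Vec<Option<Candidate>>,
}

trait ConstraintSketch {
    /// Which variables this constraint talks about at all.
    fn variables(&self) -> Vec<VariableId>;
    /// Upper bound on the candidates left for `var` under `binding`.
    fn estimate(&self, var: VariableId, binding: &Binding) -> usize;
    /// Enumerate concrete candidates for `var` under `binding`.
    fn propose(&self, var: VariableId, binding: &Binding, out: &mut Vec<Candidate>);
    /// Filter candidates proposed by another constraint without re-enumerating.
    fn confirm(&self, var: VariableId, binding: &Binding, candidates: &mut Vec<Candidate>);
    /// Which variables need fresh estimates once `var` becomes bound.
    fn influence(&self, var: VariableId) -> Vec<VariableId>;
}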

A spectrum of Atreides variants

The Atreides "family" refers to the spectrum of heuristics a constraint can use when implementing Constraint::estimate. Each variant exposes the same guided depth-first search, but with progressively tighter cardinality guidance. In practice they all revisit their estimates when bindings change; what differs is what quantity they approximate:

  • Row-count Join (Jessica) estimates the remaining search volume for the entire constraint. If one variable is bound but two others are not, Jessica multiplies the candidate counts for the unbound pair (|b| × |c|) and reports that larger product. The number can wildly overshoot the next variable's frontier, yet it often tracks the overall work the constraint will perform.
  • Distinct-value Join (Paul) narrows the focus to a single variable at a time. It returns the smallest proposal buffer the constraint could produce for any still-unbound variable, ignoring later confirmation filters. This is the behaviour exercised by Query::new today, which keeps the tightest candidate list on hand while the search walks forward.
  • Partial-binding Join (Ghanima) goes further by measuring the size of the actual proposal the composite constraint can deliver for the current binding and chosen variable. For an and constraint this corresponds to the intersection of its children after they have applied their own filtering, revealing how many candidates truly survive the local checks.
  • Exact-result Join (Leto) is an idealised limit where a constraint predicts how many of those proposed values extend all the way to full results once the remaining variables are also bound. Although no constraint currently achieves this omniscience, the interface supports it conceptually.

All four share the same implementation machinery; the difference lies in how aggressively estimate compresses the constraint's knowledge. Even when only partial information is available the search still functions, but better estimates steer the traversal directly toward the surviving tuples.

Every constraint can decide which rung of this ladder it occupies. Simple wrappers that only track total counts behave like Jessica, those that surface their tightest per-variable proposals behave like Paul, and structures capable of intersecting their children on the fly approach Ghanima's accuracy. The engine does not need to know which variant it is running—estimate supplies whatever fidelity the data structure can provide, and influence ensures that higher quality estimates refresh when relevant bindings change.

When a query starts, Query::new collects the initial estimates and influence sets, sorts the unbound variables so the tightest constraints are considered first, and caches per-variable proposal buffers that can be reused across backtracking steps. The engine then walks the search space as follows:

  1. Inspect the unbound variables.
  2. Refresh the cached estimates for any variables whose constraints were influenced by the latest binding.
  3. Pick the next variable to bind by sorting the unbound set on two criteria:
    • the base‑2 logarithm of the estimate (smaller estimates are tried first),
    • the number of other variables the constraints could influence (ties favour the most connected variable, which tends to prune the search faster).
  4. Ask the relevant constraints to propose candidates for that variable. Composite constraints enumerate the tightest member and call confirm on the rest so that each candidate is checked without materialising cross products.
  5. Push the candidates onto a stack and recurse until every variable is bound or the stack runs empty, in which case the engine backtracks.

Traditional databases rely on sorted indexes to make the above iteration tractable. Atreides still performs random lookups when confirming each candidate, but the cardinality hints let it enumerate the most selective constraint sequentially and probe only a handful of values in the wider ones. Because the search is depth-first, the memory footprint stays small and the engine can stream results as soon as they are found.

Consider a query that relates ?person to ?parent and ?city. The search begins with all three variables unbound. If ?city only has a handful of possibilities, its estimate will be the smallest, so the engine binds ?city first. Each city candidate is checked against the parent and person constraints before the search continues, quickly rejecting infeasible branches before the higher-cardinality relationships are explored.

Per-variable estimates in practice

Suppose we want to answer the following query:

(find [?person ?parent ?city]
  [?person :lives-in ?city]
  [?person :parent ?parent]
  [?parent :lives-in ?city])

There are three variables and three constraints. Every constraint can provide a cardinality hint for each variable it touches, and the combined query records the tightest estimate for each variable:

Variable   Contributing constraints (individual estimates)               Stored estimate
?person    ?person :lives-in ?city (12), ?person :parent ?parent (40)    12
?parent    ?person :parent ?parent (40), ?parent :lives-in ?city (6)     6
?city      ?person :lives-in ?city (12), ?parent :lives-in ?city (6)     6

The estimates are scoped to individual variables even when no single constraint covers the whole tuple. The engine chooses the variable with the tightest bound, ?parent, and asks the constraints that mention it for proposals. Each candidate parent immediately passes through the ?parent :lives-in ?city constraint, which usually narrows the possible cities to a handful. Those cities, in turn, constrain the possible ?person bindings. If a branch fails — for example because no child of the selected parent lives in the same city — the engine backtracks and tries the next parent. The smallest estimated constraints therefore guide the search towards promising combinations and keep the depth-first traversal from thrashing through unrelated values.
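
A tiny illustration of that selection rule, using the numbers from the table above (purely illustrative; a real tie-break would also weigh how connected each variable is):

fn main() {
    // Each variable keeps the tightest (smallest) estimate any constraint
    // offers for it; the engine binds the variable with the smallest stored
    // estimate first.
    let per_constraint: [(&str, &[usize]); 3] = [
        ("?person", &[12, 40]),
        ("?parent", &[40, 6]),
        ("?city", &[12, 6]),
    ];

    let next = per_constraint
        .iter()
        .map(|(var, estimates)| (*var, *estimates.iter().min().unwrap()))
        .min_by_key(|&(_, stored)| stored);

    // `?parent` and `?city` tie at 6; the engine would prefer the more
    // connected variable, which in this query is `?parent`.
    println!("bind next: {next:?}");
}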

Implementation notes

  • During iteration the engine keeps a stack of bound variables, a Binding structure holding the active assignments, and a touched_variables set that marks which estimates need refreshing before the next decision point.
  • Highly skewed data still behaves predictably: even if one attribute dominates the dataset, the other constraints continue to bound the search space tightly and prevent runaway exploration.
  • The per-variable proposal buffers allow repeated proposals without reallocating, which is especially helpful when backtracking over large domains.

Why worst-case optimal?

Worst-case optimal join algorithms guarantee that their running time never exceeds the size of the output by more than a constant factor, even for pathological inputs. The Atreides family retains this property because the search always explores bindings in an order that honours the cardinality bounds supplied by the constraints. As a result the engine remains robust under heavy skew, sparse joins, and high-dimensional queries without requiring a bespoke join plan for each case.

This combination of simple heuristics, incremental estimates, and a disciplined search strategy keeps the implementation straightforward while delivering the performance characteristics we need for real-world workloads.

Query Language

This chapter introduces the core query facilities provided by triblespace. A query is described in a small declarative language that states which values should match instead of spelling out the iteration strategy. When you read a query, you are effectively looking at a logical statement about the data: if the constraints can all be satisfied, then the variable bindings are produced as results. The declarative style gives the engine freedom to reorder work and choose efficient execution strategies.

Every macro shown here is a convenience wrapper around a concrete Constraint implementation. When you need finer control—or want to assemble constraints manually outside the provided macros—reach for the corresponding builder types in tribles::query.

Declaring a query

The find! macro builds a Query by declaring variables and a constraint expression. The macro mirrors Datalog syntax: the head ((...)) lists the variables you want back, and the body describes the conditions they must meet. A minimal invocation looks like this:

#![allow(unused)]
fn main() {
let results = find!((a), a.is(1.into())).collect::<Vec<_>>();
}

find! returns an Iterator over tuples of the bound variables. Matches can be consumed lazily or collected into common collections:

#![allow(unused)]
fn main() {
for (a,) in find!((a), a.is(1.into())) {
    println!("match: {a}");
}
}

The head tuple supports pattern destructuring, so adding more variables is as simple as expanding the list: find!((a, b, c), ...) yields (a, b, c) tuples. Variables declared in the head can be reused multiple times inside the constraint to express joins. When a variable appears in several clauses the engine ensures every occurrence binds to the same value. Repeating a variable in two patterns, for example, restricts the result set to entities that satisfy both attribute assignments simultaneously. The order of declarations defines the shape of the tuple in the iterator, so reorganising the head changes how you destructure results.

Typed variables

Variables optionally include a concrete type to convert the underlying value. The constraint phase still works with untyped Value instances; conversion happens when the tuple is emitted. These conversions use FromValue and panic if decoding to the requested type fails. Bind the variable as a Value<_> and reach for TryFromValue yourself when you want to surface conversion errors instead of panicking.

#![allow(unused)]
fn main() {
find!((x: i32, y: Value<ShortString>),
      and!(x.is(1.into()), y.is("foo".to_value())))
}

The first variable is read as an i32 and the second as a short string if the conversion succeeds. The query engine walks all possible assignments that satisfy the constraint and yields tuples of the declared variables in the order they appear in the head.

Collecting results

Any type that implements FromIterator can collect the results of a query. Vec<_> is common for tests and examples, while HashSet<_> is useful when the match order is irrelevant. When you only need the first result, call iterator adapters such as next, find, or try_fold to avoid materializing the full result set.
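
For instance, a quick sketch (assuming the projected values implement the Hash and Eq traits that HashSet requires):

#![allow(unused)]
fn main() {
use std::collections::HashSet;

// Collect into whichever container fits, or stop at the first match.
let all: Vec<_> = find!((a), a.is(1.into())).collect();
let unique: HashSet<_> = find!((a), a.is(1.into())).collect();
let first = find!((a), a.is(1.into())).next();
}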

Built-in constraints

find! queries combine a small set of constraint operators to form a declarative language for matching tribles. Each operator implements Constraint and can therefore be mixed and nested freely.

  • and! – Require every sub-constraint to hold. Builds an IntersectionConstraint.
  • or! – Accept any satisfied alternative. Produces a UnionConstraint whose branches must reference the same variables.
  • ignore! – Drop variables from a sub-query. Removes the listed variables from planning so a constraint can contribute only along the remaining bindings.
  • temp! – Mint hidden helper variables. Allocates fresh bindings for the nested expression so the helpers can join across patterns without being projected.
  • pattern! – Match attribute assignments in a collection. Expands to a TriblePattern-backed constraint that relates attributes and values for the same entity.
  • pattern_changes! – Track attribute updates incrementally. Builds a TriblePattern constraint that yields newly added triples from a change set because incremental evaluation stays monotonic; see Incremental Queries for the broader evaluation workflow.
  • .is(...) – Pin a variable to a constant. Wraps a ConstantConstraint that compares the binding against a literal value.
  • has – Check membership in a collection. Collections such as HashSet expose .has(...) when they implement ContainsConstraint; triple stores like TribleSet instead participate through pattern!.

Any data structure that can iterate its contents, test membership, and report its size can implement ContainsConstraint. Membership constraints are particularly handy for single-column collections such as sets or map key views, while multi-position sources like TribleSet rely on pattern! to keep entity, attribute, and value bindings aligned.

Constant matches (is)

Call Variable::is when you need a binding to equal a specific value. The method returns a ConstantConstraint that checks whether the solver can assign the variable to the provided Value. Constant constraints behave like any other clause: combine them with and! to narrow a variable after other constraints have proposed candidates, or place them inside or! branches to accept multiple literals.

#![allow(unused)]
fn main() {
find!((title: Value<_>),
      and!(dataset.has(title), title.is("Dune".to_value())));
}

The snippet above keeps only the rows where title equals "Dune". Because is constrains the variable's value rather than projecting a new binding, it is also handy for helpers such as temp! when you want to filter hidden bindings without exposing them in the result tuple.

pattern! and pattern_changes! construct constant constraints for literal values automatically, so you often get the same behaviour simply by writing the desired value in the pattern:

#![allow(unused)]
fn main() {
find!((friend: Value<_>),
      pattern!(&dataset,
               [{ _?person @ social::friend: ?friend,
                  social::city: "Caladan" }]));
}

Repeating .is(...) on the same variable with different values causes the query to fail—just as conflicting pattern! clauses would—so prefer or! (or switch to a membership helper such as .has(...)) when you want to accept several constants.

Intersections (and!)

and! combines multiple constraints that must all hold simultaneously. Each sub-clause can introduce new bindings or further narrow existing ones, and the solver is free to reorder the work to reduce the search space. When a sub-constraint fails to produce a candidate that is compatible with the current bindings, the whole conjunction rejects that branch and moves on. The macro accepts any number of arguments, so and!(...) is often a convenient way to keep related clauses together without nesting additional find! calls:

#![allow(unused)]
fn main() {
let favourites = favourite_titles(); // e.g. a HashSet<Id> built elsewhere
find!((book: Value<_>, author: Value<_>),
      and!(favourites.has(book),
           pattern!(&dataset,
                    [{ ?book @ literature::title: "Dune",
                       literature::author: ?author }])));
}

Here the membership test over favourites and the attribute pattern from dataset run as part of the same conjunction. The solver joins them on their shared bindings (book and author) so only tuples that satisfy every clause make it into the result set. Because and! simply returns a constraint, you can nest it inside other combinators such as temp!, ignore!, or or! to structure queries however you like.

Alternatives (or!)

Use or! to express alternatives. Each branch behaves like an independent constraint and may introduce additional bindings that participate in the surrounding query, provided every branch mentions the same set of variables:

#![allow(unused)]
fn main() {
find!((alias: Value<_>),
      temp!((entity),
            or!(pattern!(&dataset,
                         [{ ?entity @ profile::nickname: ?alias }]),
                pattern!(&dataset,
                         [{ ?entity @ profile::display_name: ?alias }]))));
}

Each branch contributes every match it can produce given the current bindings. In the example above, people who have both a nickname and a display name yield two rows—one for each attribute—because the solver keeps the union of all solutions to preserve the query's monotonic semantics. Branches that cannot match simply contribute nothing. Because each branch is still a full constraint, combine or! with temp! when you need hidden helpers or wrap portions in ignore! to drop positions that do not matter for a particular alternative. If two branches reference different variables the macro panics at runtime, so keep the variable sets aligned even when some branches ignore portions of a constraint.

Ignoring bindings (ignore!)

Ignored variables are handy when a sub-expression references fields you want to drop. The IgnoreConstraint subtracts the listed bindings from the constraint's VariableSet, so the planner never attempts to join them with the outer query, project them into the results, or even solve for those positions. From the solver's perspective those slots vanish completely—it keeps evaluating the remaining bindings while treating the ignored ones as don't-care wildcards.

Triple-based constraints, for example, always propose entity/attribute/value combinations; wrapping them in ignore!((value), ...) continues to constrain the entity and attribute while discarding the value column entirely. Clauses that reference at least one surviving variable still run and continue to narrow those bindings. If a branch mentions only ignored variables there is nothing left to relate to the outer query, so the planner has no variable along which to schedule it; the inner constraint is never consulted and the expression behaves as though it were omitted.

The identifiers you list inside ignore! expand to fresh bindings scoped to the nested expression, but subtracting them from the outer plan means the solver never unifies those bindings—or even asks the constraint to propose values for them. Even if you repeat the same name across multiple clauses, each occurrence behaves like an independent wildcard. Reach for temp! when you want a hidden variable to participate in the surrounding plan without being projected; reach for ignore! when you want to use a multi-column constraint while only keeping some of its positions.

This inertness is a strict scoping rule rather than existential quantification. If you need to assert that some related fact exists without reifying its value, structure the pattern so the surviving variables capture that dependency. The macro automatically uses the ambient context that find! or matches! provides, so typical invocations only specify the variable list and nested constraint:

#![allow(unused)]
fn main() {
find!((person: Value<_>),
      ignore!((street_value),
              pattern!(&dataset, [{ ?person @ contacts::street: ?street_value }])));
}

Here the pattern still constrains person because the triple ties the entity to an attribute, yet the actual street string is ignored. Had the block mentioned only street_value, the entire expression would have evaporated—there would be no remaining link to the rest of the query—so the outer query would not learn anything. Reusing street_value elsewhere in the ignored expression also does not force the sub-clauses to agree, because each occurrence is treated as its own wildcard; introduce a temp! binding when you need the same hidden value to appear in multiple places.

Temporary variables (temp!)

Real queries often need helper bindings that participate in the joins but do not show up in the result tuple. Wrap the relevant constraint with temp!((...vars...), expr) to mint hidden variables and evaluate expr with them in scope:

#![allow(unused)]
fn main() {
find!((person: Value<_>),
      temp!((friend),
            and!(pattern!(&dataset,
                          [{ _?p @ social::person: ?person, social::friend: ?friend }]),
                 pattern!(&dataset,
                          [{ ?friend @ social::city: "Caladan" }]))));
}

The helper binding friend links the two patterns, ensuring the same entity is used across both clauses without expanding the result tuple. temp! can create multiple variables at once (temp!((street, city), ...)). Like ignore!, you always wrap the hidden bindings in a tuple, so each invocation reads temp!((...vars...), ...). Here social would be a namespace module exporting the person, friend, and city attributes. The variables adopt the value schemas implied by the constraints they appear in, so no extra annotations are required. When working outside the query macros, call VariableContext::next_variable directly instead.

When the helper variable lives entirely within a single pattern, consider using _?alias instead of temp!. Both pattern! and pattern_changes! support _?ident placeholders that mint fresh bindings scoped to that one macro invocation. They behave like non-projected variables: you can reuse the _?ident multiple times inside the pattern to relate attributes, but the binding vanishes once control leaves the macro. Reach for temp! when the helper must span several constraints or when you need to reuse the same hidden binding across multiple patterns.

Example

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::examples::{self, literature};

let dataset = examples::dataset();

for (title,) in find!((title: Value<_>),
                     and!(dataset.has(title), title.is("Dune".to_value()))) {
    println!("Found {}", title.from_value::<&str>());
}
}

This query searches the example dataset for the book titled "Dune". The variables and constraint can be adapted to express more complex joins and filters. For instance, you can introduce additional variables to retrieve both the title and the author while sharing the same dataset predicate:

#![allow(unused)]
fn main() {
for (title, author) in find!((title: Value<_>, author: Value<_>),
                             and!(title.is("Dune".to_value()),
                                  pattern!(&dataset,
                                           [{ _?book @ literature::title: ?title,
                                              literature::author: ?author }]))) {
    println!("{title} was written by {}", author.from_value::<&str>());
}
}

The extra variables participate in the join automatically; no explicit loop nesting or indexing is required.

Attribute patterns (pattern!)

The pattern! macro provides a concise way to match entities by attribute assignments. It expands to a constraint that can be used directly inside find!.

Important: in pattern! values prefixed with ? refer to variables declared in the surrounding find! head while string/number literals and more complex expressions are treated as literal values. Use _?name when you need a fresh variable that is scoped to a single macro invocation; you can reference it multiple times within the same pattern without adding it to the find! head. Parenthesised expressions remain supported for explicit literals.

#![allow(unused)]
fn main() {
let mut kb = TribleSet::new();
let e = ufoid();
kb += entity! { &e @ literature::firstname: "William", literature::lastname: "Shakespeare" };

let results: Vec<_> = find!((ee: Id), pattern!(&kb, [{ ?ee @ literature::firstname: "William" }])).collect();
assert_eq!(results.len(), 1);
}

Patterns may contain multiple clauses and reuse _? bindings to relate attributes without introducing extra columns in the result set. A single _?person variable can connect several attribute/value pairs while staying scoped to the pattern:

#![allow(unused)]
fn main() {
let mut kb = TribleSet::new();
let e = ufoid();
kb += entity! { &e @ literature::firstname: "Frank", literature::lastname: "Herbert" };

let author_last_names: Vec<_> = find!((last: Value<_>),
    pattern!(&kb, [{ _?person @ literature::firstname: "Frank", literature::lastname: ?last }])
).collect();
}

Here _?person remains scoped to the pattern while ensuring both attributes are drawn from the same entity. When a pattern references collections other than a TribleSet, ensure the collection implements TriblePattern so that the macro can materialize the requested triples.

To share a hidden binding across multiple patterns, declare it once with temp! and reference it with ?name from each clause:

#![allow(unused)]
fn main() {
let mut kb = TribleSet::new();
let alice = ufoid();
let bob = ufoid();
kb += entity! { &alice @ social::name: "Alice", social::friend: &bob };
kb += entity! { &bob @ social::name: "Bob" };

let results: Vec<_> = find!((friend_name: Value<_>),
    temp!((friend),
          and!(pattern!(&kb, [{ _?person @ social::friend: ?friend,
                                  social::name: ?friend_name }]),
               pattern!(&kb, [{ ?friend @ social::name: "Bob" }]))))
.collect();
}

The _?person variable is still local to the first pattern, while friend joins the two constraints without changing the projected results. As above, social denotes a namespace that defines the name and friend attributes.

matches!

Sometimes you only want to check whether a constraint has any solutions. The matches! macro mirrors the find! syntax but returns a boolean:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;

assert!(matches!((x), x.is(1.into())));
assert!(!matches!((x), and!(x.is(1.into()), x.is(2.into()))));
}

Internally, matches! stops as soon as the first result is found. It is a lightweight alternative to find! when the mere existence of a match matters more than the actual bindings.
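The same shape accepts pattern! constraints. The sketch below reuses the example dataset and the literature namespace from the earlier examples; because only existence is checked, no tuples are materialized:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::examples::{self, literature};

let dataset = examples::dataset();

// True as soon as a single matching entity is found.
if matches!((author),
            pattern!(&dataset, [{ ?author @ literature::lastname: "Herbert" }])) {
    println!("the dataset mentions at least one Herbert");
}
}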

Custom constraints

Every building block implements the Constraint trait. You can implement this trait on your own types to integrate custom data sources or query operators with the solver. Collections that want to power pattern! implement TriblePattern so they can materialize the entity/attribute/value triples a pattern asks for. Membership-style helpers such as has(...) work with anything that implements ContainsConstraint, making it easy to join against pre-existing indexes, caches, or service clients without copying data into a TribleSet.

#![allow(unused)]
fn main() {
use std::collections::HashSet;

use triblespace::prelude::*;
use triblespace::prelude::valueschemas::ShortString;
use triblespace::query::hashsetconstraint::SetConstraint;

struct ExternalTags<'a> {
    tags: &'a HashSet<String>,
}

impl<'a> ContainsConstraint<'a, ShortString> for ExternalTags<'a> {
    type Constraint = SetConstraint<ShortString, &'a HashSet<String>, String>;

    fn has(self, variable: Variable<ShortString>) -> Self::Constraint {
        SetConstraint::new(variable, self.tags)
    }
}

let tags: HashSet<String> = ["rust", "datalog"].into_iter().map(String::from).collect();
let external = ExternalTags { tags: &tags };
let matches: Vec<_> =
    find!((tag: Value<ShortString>), external.has(tag)).collect();
}

The example wraps an external HashSet so it can be queried directly. A TriblePattern implementation follows the same shape: create a constraint type that reads from your backing store and return it from pattern. The query engine drives both traits through Constraint, so any data source that can estimate, propose, and confirm candidate values can participate in find!.

Regular path queries

The path! macro lets you search for graph paths matching a regular expression over edge attributes. It expands to a RegularPathConstraint and can be combined with other constraints. Invoke it through a namespace module (social::path!) to implicitly resolve attribute names:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;

mod social {
  use triblespace::prelude::*;
  use triblespace::prelude::valueschemas::*;
  attributes! {
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" as follows: GenId;
    "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB" as likes: GenId;
  }
}
let mut kb = TribleSet::new();
let a = fucid(); let b = fucid(); let c = fucid();
kb += entity!{ &a @ social::follows: &b };
kb += entity!{ &b @ social::likes: &c };

let results: Vec<_> = find!((s: Value<_>, e: Value<_>),
    path!(&kb, s (social::follows | social::likes)+ e)).collect();
}

You can omit the hex literal in attributes! when you only need local or short‑lived attributes—the macro then derives a deterministic id from the name and schema just like Attribute::from_name. Stick with explicit ids when the attributes form part of a shared protocol.

The middle section uses a familiar regex syntax to describe the allowed edge sequences. Because the expression is parsed during macro expansion, it is validated at compile time, and editors with Rust macro-expansion support can surface errors and highlighting inline. Paths reference attributes from a single namespace; to traverse edges across multiple namespaces, create a new namespace that re-exports the desired attributes and invoke path! through it.

The endpoints of the path behave like ordinary variables. Bind them in the find! head to join the traversal with additional constraints—for example, restricting the starting entity or projecting the destination's attributes. If you want to follow the path but keep one endpoint unprojected, wrap the traversal in temp! so the hidden binding can participate in follow-up clauses:

#![allow(unused)]
fn main() {
let interesting_post = fucid();
let influencers = find!((start: Value<_>),
    temp!((end),
          and!(path!(&kb, start social::follows+ end),
               pattern!(&kb, [{ ?end @ social::likes: interesting_post.to_value() }]))))
    .collect::<Vec<_>>();
}

Combining path! with other constraints like this enables expressive graph queries while staying in the same declarative framework as the rest of the chapter.

Incremental Queries

The query engine normally evaluates a pattern against a complete TribleSet, recomputing every match from scratch. Applications that ingest data continuously often only need to know which results are introduced by new tribles. Tribles supports this with semi‑naive evaluation, a classic incremental query technique. Instead of running the whole query again, we focus solely on the parts of the query that can see the newly inserted facts and reuse the conclusions we already derived from the base dataset.

Delta evaluation

Given a base dataset and a set of newly inserted tribles, the engine runs the original query multiple times. Each run restricts a different triple constraint to the delta while the remaining constraints see the full set. The union of these runs yields exactly the new solutions. The process is:

  1. compute a delta TribleSet containing only the inserted tribles,
  2. for every triple in the query, evaluate a variant where that triple matches against delta,
  3. union all per‑triple results to obtain the incremental answers.

Because each variant touches only one triple from the delta, the work grows with the number of constraints and the size of the delta set rather than the size of the full dataset.

In practice the base dataset is often cached or already available to the application. Maintaining a delta set alongside it lets the engine quickly answer “what changed?” without re-deriving prior results. If the delta is empty the engine can skip evaluation entirely, making idle updates effectively free.
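The bookkeeping itself is plain set algebra. A minimal sketch, assuming base holds the prior state and fresh holds the facts received since the last evaluation:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;

let base = TribleSet::new();   // prior state, usually cached by the application
let fresh = TribleSet::new();  // facts that arrived since the last evaluation

// Union produces the new full snapshot; difference isolates exactly the
// inserted tribles, even when `fresh` overlaps with `base`.
let mut updated = base.clone();
updated += fresh;
let delta = updated.difference(&base);

// An empty `delta` means the incremental evaluation can be skipped entirely.
}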

Monotonicity and CALM

Removed results are not tracked. Tribles follow the CALM principle: a program whose outputs are monotonic in its inputs needs no coordination. Updates simply add new facts and previously derived conclusions remain valid. When conflicting information arises, applications append fresh tribles describing their preferred view instead of retracting old ones. Stores may forget obsolete data, but semantically tribles are never deleted.
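As a small sketch of that style, using the literature namespace from the earlier examples, a correction is expressed by appending a fact rather than removing one; which value readers prefer is an application-level convention:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::examples::literature;

let book = ufoid();
let mut kb = TribleSet::new();
kb += entity! { &book @ literature::title: "Dune" };

// Later: the revised title is appended. Both facts remain queryable, so
// conclusions derived from the first one stay valid and no coordination
// between writers is required.
kb += entity! { &book @ literature::title: "Dune (Folio Edition)" };
}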

Exclusive IDs and absence checks

Exclusive identifiers tighten the blast radius of non-monotonic logic without abandoning CALM. Holding an ExclusiveId proves that no other writer can add tribles for that entity, so checking for the absence of a triple about that entity becomes stable: once you observe a missing attribute, no concurrent peer will later introduce it. This permits existence/absence queries in the narrow scope of entities you own while keeping global queries monotonic.

Even with that safety net, prefer monotonic reads and writes when possible because they compose cleanly across repositories. Absence checks should be reserved for workflows where the ExclusiveId guarantees a closed world for the entity—such as asserting a default value when none exists or verifying invariants before emitting additional facts. Outside that boundary, stick to append-only predicates so derived results remain valid as new data arrives from other collaborators.

Example

The pattern_changes! macro expresses these delta queries. It behaves like pattern! but takes the current TribleSet and a precomputed changeset. The macro unions variants of the query where each triple is constrained to that changeset, matching only the newly inserted tribles. It keeps the incremental flow compatible with the familiar find! interface, so callers can upgrade existing queries without changing how they collect results.

#![allow(unused)]
fn main() {
    let storage = MemoryRepo::default();
    let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));
    let branch_id = repo.create_branch("main", None).expect("branch");
    let mut ws = repo.pull(*branch_id).expect("pull");

    // Commit initial data
    let shakespeare = ufoid();
    let hamlet = ufoid();
    let mut base = TribleSet::new();
    base += entity! { &shakespeare @ literature::firstname: "William", literature::lastname: "Shakespeare" };
    base += entity! { &hamlet @ literature::title: "Hamlet", literature::author: &shakespeare };
    ws.commit(base.clone(), None);
    let c1 = ws.head().unwrap();

    // Commit a new book
    let macbeth = ufoid();
    let mut change = TribleSet::new();
    change += entity! { &macbeth @ literature::title: "Macbeth", literature::author: &shakespeare };
    ws.commit(change.clone(), None);
    let c2 = ws.head().unwrap();

    // Compute updated state and delta between commits
    let base_state = ws.checkout(c1).expect("base");
    let updated = ws.checkout(c2).expect("updated");
    let delta = updated.difference(&base_state);

    // Find new titles by Shakespeare
    let results: Vec<_> = find!(
        (author: Value<_>, book: Value<_>, title: Value<_>),
        pattern_changes!(&updated, &delta, [
            { ?author @ literature::firstname: "William", literature::lastname: "Shakespeare" },
            { ?book @ literature::author: ?author, literature::title: ?title }
        ])
    )
    .map(|(_, b, t)| (b, t))
    .collect();

    println!("{results:?}");
}

The example stages an initial commit, records a follow-up commit, and then compares their checkouts. Feeding the latest checkout alongside the difference between both states allows pattern_changes! to report just the additional solutions contributed by the second commit. This mirrors how an application can react to a stream of incoming events: reuse the same query, but swap in a fresh delta set each time new data arrives.

Delta maintenance always comes down to set algebra. pattern_changes! cares only that you hand it a TribleSet containing the fresh facts, and every convenient workflow for producing that set leans on the same few operations:

  • take two snapshots and difference them to discover what was added;
  • union the new facts into whatever baseline you keep cached;
  • intersect a candidate subset when you need to focus a change further.

Workspaces showcase this directly. Each checkout materializes a TribleSet, so comparing two history points is just another snapshot diff: take the newer checkout, difference it against the older one to obtain the delta, and hand both sets to pattern_changes!. That matches the local buffering story as well. Keep a baseline TribleSet for the current state, accumulate incoming facts in a staging set, and union the staging set with the baseline to produce the updated snapshot you pass as the first argument. The delta comes from difference(&updated, &old) or from the staging set itself when you only stage fresh facts. Reusing the same set helpers keeps the bookkeeping short, avoids custom mirrors of the data, and stays efficient no matter where the updates originate.

Comparing history points

Workspace::checkout accepts commit selectors which can describe ranges in repository history. Checking out a range like a..b walks the history from b back toward a, unioning the contents of every commit that appears along the way but excluding commits already returned by the a selector. When commits contain only the tribles they introduce, that checkout matches exactly the fresh facts added after a. Feeding that delta into pattern_changes! lets us ask, “What new matches did commit b introduce over a?”
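A minimal sketch of that range-based alternative, reusing ws and the commit handles c1 and c2 from the example above:

#![allow(unused)]
fn main() {
// Facts introduced by commits reachable from `c2` but not from `c1`.
let delta = ws.checkout(c1..c2).expect("delta");

// Full state at `c2`: the union of `c2` and all of its ancestors.
let updated = ws.checkout(ancestors(c2)).expect("updated");

// `(&updated, &delta)` can now be handed to `pattern_changes!` exactly as before.
}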

Trade‑offs

  • Applications must compute and supply the delta set; the engine does not track changes automatically.
  • Queries must remain monotonic since deletions are ignored.
  • Each triple incurs an extra variant, so highly selective constraints keep incremental evaluation efficient.
  • Delta sets that grow unboundedly lose their advantage. Regularly draining or compacting the changeset keeps semi-naive evaluation responsive.

Schemas

Trible Space stores data in strongly typed values and blobs. A schema describes the language‑agnostic byte layout for these types: [Value]s always occupy exactly 32 bytes while [Blob]s may be any length. Schemas translate those raw bytes to concrete application types and decouple persisted data from a particular implementation. This separation lets you refactor to new libraries or frameworks without rewriting what's already stored or coordinating live migrations. The crate ships with a collection of ready‑made schemas located in triblespace::core::value::schemas and triblespace::core::blob::schemas.

When data crosses the FFI boundary or is consumed by a different language, the schema is the contract both sides agree on. Consumers only need to understand the byte layout and identifier to read the data—they never have to link against your Rust types. Likewise, the Rust side can evolve its internal representations—add helper methods, change struct layouts, or introduce new types—without invalidating existing datasets.

Why 32 bytes?

Storing arbitrary Rust types requires a portable representation. Instead of human‑readable identifiers like RDF's URIs, Tribles uses a fixed 32‑byte array for all values. This size provides enough entropy to embed intrinsic identifiers—typically cryptographic hashes—when a value references data stored elsewhere in a blob. Keeping the width constant avoids platform‑specific encoding concerns and makes it easy to reason about memory usage.

Conversion traits

Schemas define how to convert between raw bytes and concrete Rust types. The conversion traits ToValue/FromValue and their fallible counterparts live on the schema types rather than on Value itself, avoiding orphan‑rule issues when supporting external data types. The Value wrapper treats its bytes as opaque; schemas may validate them or reject invalid patterns during conversion.

Fallible conversions (TryFromValue / TryToValue) are particularly useful for schemas that must validate invariants, such as checking that a timestamp falls within a permitted range or ensuring reserved bits are zeroed. Returning a domain‑specific error type keeps validation logic close to the serialization code.

#![allow(unused)]
fn main() {
use triblespace::core::value::schemas::shortstring::ShortString;
use triblespace::core::value::{TryFromValue, TryToValue, Value};

struct Username(String);

impl TryToValue<ShortString> for Username {
    type Error = &'static str;

    fn try_to_value(&self) -> Result<Value<ShortString>, Self::Error> {
        if self.0.is_empty() {
            Err("username must not be empty")
        } else {
            self.0
                .as_str()
                .try_to_value::<ShortString>()
                .map_err(|_| "username too long or contains NULs")
        }
    }
}

impl TryFromValue<'_, ShortString> for Username {
    type Error = &'static str;

    fn try_from_value(value: &Value<ShortString>) -> Result<Self, Self::Error> {
        String::try_from_value(value)
            .map(Username)
            .map_err(|_| "invalid utf-8 or too long")
    }
}
}

Schema identifiers

Every schema declares a unique 128‑bit identifier via the shared ConstMetadata::id() hook (for example, <ShortString as ConstMetadata>::id()). Persisting these IDs keeps serialized data self-describing, so other tooling can make sense of the payload without linking against your Rust types. Dynamic language bindings (like the Python crate) inspect the stored schema identifier to choose the correct decoder, while internal metadata stored inside Trible Space can use the same IDs to describe which schema governs a value, blob, or hash protocol.

Identifiers also make it possible to derive deterministic attribute IDs when you ingest external formats. Helpers such as Attribute::<S>::from_name("field") combine the schema ID with the source field name to create a stable attribute, so re-importing the same data always targets the same column. The attributes! macro can use the same derivation when you omit the 128-bit id literal, which is useful for quick experiments or internal attributes; for schemas that will be shared across binaries or languages, prefer explicit ids so the column remains stable even if the attribute name later changes.
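A minimal sketch of that derivation, assuming Attribute and the ShortString schema are re-exported through the prelude as in the earlier examples:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::prelude::valueschemas::ShortString;

// Hashing the field name together with the ShortString schema id yields a
// deterministic attribute, so re-importing the same source data always
// targets the same column.
let imported_title = Attribute::<ShortString>::from_name("title");
let imported_again = Attribute::<ShortString>::from_name("title");
// Both calls derive the same attribute id.
}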

Built‑in value schemas

The crate provides the following value schemas out of the box:

  • GenId – an abstract 128 bit identifier.
  • ShortString – a UTF-8 string up to 32 bytes.
  • U256BE / U256LE – 256-bit unsigned integers.
  • I256BE / I256LE – 256-bit signed integers.
  • R256BE / R256LE – 256-bit rational numbers.
  • F256BE / F256LE – 256-bit floating point numbers.
  • Hash and Handle – cryptographic digests and blob handles (see hash.rs).
  • ED25519RComponent, ED25519SComponent and ED25519PublicKey – signature fields and keys.
  • NsTAIInterval to encode time intervals.
  • UnknownValue as a fallback when no specific schema is known.

#![allow(unused)]
fn main() {
use triblespace::core::metadata::ConstMetadata;
use triblespace::core::value::schemas::shortstring::ShortString;
use triblespace::core::value::{ToValue, ValueSchema};

let v = "hi".to_value::<ShortString>();
let raw_bytes = v.raw; // Persist alongside the schema's metadata id.
let schema_id = <ShortString as ConstMetadata>::id();
}

Built‑in blob schemas

The crate also ships with these blob schemas:

  • LongString for arbitrarily long UTF‑8 strings.
  • SimpleArchive which stores a raw sequence of tribles.
  • SuccinctArchiveBlob which stores the SuccinctArchive index type for offline queries. The SuccinctArchive helper exposes high-level iterators while the SuccinctArchiveBlob schema is responsible for the serialized byte layout.
  • UnknownBlob for data of unknown type.

#![allow(unused)]
fn main() {
use triblespace::metadata::ConstMetadata;
use triblespace::blob::schemas::longstring::LongString;
use triblespace::blob::{Blob, BlobSchema, ToBlob};

let b: Blob<LongString> = "example".to_blob();
let schema_id = <LongString as ConstMetadata>::id();
}

Defining new schemas

Custom formats implement [ValueSchema] or [BlobSchema]. A unique identifier serves as the schema ID. The example below defines a little-endian u64 value schema and a simple blob schema for arbitrary bytes.

#![allow(unused)]

fn main() {
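// Imports are elided for brevity; see examples/custom_schema.rs for the full list.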
pub struct U64LE;

impl ConstMetadata for U64LE {
    fn id() -> Id {
        id_hex!("0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A")
    }

    fn describe() -> (
        triblespace::core::trible::TribleSet,
        triblespace::core::blob::MemoryBlobStore<triblespace::core::value::schemas::hash::Blake3>,
    ) {
        (
            triblespace::core::trible::TribleSet::new(),
            triblespace::core::blob::MemoryBlobStore::new(),
        )
    }
}

impl ValueSchema for U64LE {
    type ValidationError = Infallible;
}

impl ToValue<U64LE> for u64 {
    fn to_value(self) -> Value<U64LE> {
        let mut raw = [0u8; VALUE_LEN];
        raw[..8].copy_from_slice(&self.to_le_bytes());
        Value::new(raw)
    }
}

impl FromValue<'_, U64LE> for u64 {
    fn from_value(v: &Value<U64LE>) -> Self {
        u64::from_le_bytes(v.raw[..8].try_into().unwrap())
    }
}

pub struct BytesBlob;

impl ConstMetadata for BytesBlob {
    fn id() -> Id {
        id_hex!("B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0")
    }
}

impl BlobSchema for BytesBlob {}

impl ToBlob<BytesBlob> for Bytes {
    fn to_blob(self) -> Blob<BytesBlob> {
        Blob::new(self)
    }
}

impl TryFromBlob<BytesBlob> for Bytes {
    type Error = Infallible;
    fn try_from_blob(b: Blob<BytesBlob>) -> Result<Self, Self::Error> {
        Ok(b.bytes)
    }
}

}

See examples/custom_schema.rs for the full source.
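With those implementations in place, a round trip through the new value schema looks like the sketch below, assuming the same conversion helpers used with ShortString earlier in this chapter:

#![allow(unused)]
fn main() {
// Round-trip a u64 through the U64LE schema defined above.
let v: Value<U64LE> = 42u64.to_value();
let back = v.from_value::<u64>();
// `back` is 42 again; the unused 24 bytes of the value stay zeroed.
}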

Versioning and evolution

Schemas form part of your persistence contract. When evolving them consider the following guidelines:

  1. Prefer additive changes. Introduce a new schema identifier when breaking compatibility. Consumers can continue to read the legacy data while new writers use the replacement ID.
  2. Annotate data with migration paths. Store both the schema ID and a logical version number if the consumer needs to know which rules to apply. UnknownValue/UnknownBlob allow you to safely defer decoding until a newer binary is available.
  3. Keep validation centralized. Place invariants in your schema conversions so migrations cannot accidentally create invalid values.

By keeping schema identifiers alongside stored values and blobs you can roll out new representations incrementally: ship readers that understand both IDs, update your import pipelines, and finally switch writers once everything recognizes the replacement schema.
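A minimal sketch of such a dual-ID reader; here stored_schema_id stands in for an identifier loaded from storage next to the raw value bytes:

#![allow(unused)]
fn main() {
use triblespace::core::metadata::ConstMetadata;
use triblespace::core::value::schemas::shortstring::ShortString;

// In a real reader this id would be read back from storage.
let stored_schema_id = <ShortString as ConstMetadata>::id();

if stored_schema_id == <ShortString as ConstMetadata>::id() {
    // Decode with the ShortString rules.
} else {
    // Unknown or newer schema: defer decoding (treat it as UnknownValue)
    // until a reader that understands the replacement id ships.
}
}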

Portability & Common Formats

Tribles aims to be a durable, language-agnostic database. That means the bytes we store must remain meaningful when they cross process, platform, or version boundaries. Rust's native representations are not stable enough for that job, so we model portable schemas that describe how a value should be encoded into a 32-byte buffer.

Why 32 Bytes?

The 32-byte window is a deliberate compromise. It is small enough to move quickly through memory and across networks, yet it offers enough entropy to hold intrinsic identifiers such as hashes. When a payload cannot fit within those 32 bytes we store the larger content in a blob and reference it from the value via its identifier. The uniform size also keeps the storage layer simple: a Value is always the same size regardless of its schema.

Schemas as Contracts

A schema describes which bit patterns are meaningful for a particular value. The Value type is parameterised by such a schema and remains agnostic about whether the underlying bytes currently satisfy the contract. Validation lives in the schema through the ValueSchema trait, while conversions to concrete Rust types use ToValue, FromValue, TryToValue, and TryFromValue. Because conversion traits are implemented for the schema instead of the Value type itself, we avoid Rust's orphan rule and allow downstream crates to add their own adapters.

Schemas carry two optional identifiers:

  • Value schema ID – Uniquely distinguishes the schema that governs the 32-byte value buffer.
  • Blob schema ID – Identifies the schema of any external blob a value may reference.

These identifiers let us document schemas inside the knowledge graph itself. They also provide stable lookup keys for registries or tooling that need to understand how bytes should be interpreted.

Working Safely with Values

Serialisation is effectively a controlled form of transmutation: we reinterpret bytes through the lens of a schema. Crossing from one schema to another will not cause undefined behaviour, but it may produce nonsensical data if the target schema expects a different layout. Validation routines and conversion helpers exist to guard against that outcome.

When you define a new schema:

  1. Create a zero-sized marker type and implement ValueSchema for it.
  2. Add conversions between the schema and your Rust types via the conversion traits mentioned above.
  3. Validate inputs when only some bit patterns are acceptable.

With those pieces in place, values can round-trip between storage and strongly typed Rust code while remaining portable and future-proof.

Importing Other Data Formats

Import pipelines let you bring external datasets into a tribles repository without hand-writing schemas or entity identifiers every time. This chapter introduces the import namespace, explains how the JSON importers map foreign fields onto attributes, and outlines how you can extend the same patterns to new formats.

Import Namespace Overview

The triblespace_core::import module collects conversion helpers that translate structured documents into raw tribles. Today the namespace ships with two JSON importers:

  • JsonImporter generates fresh entity identifiers for every object it visits.
  • DeterministicJsonImporter derives entity identifiers by hashing the encoded attribute/value pairs so the same input always reproduces the same entities.

Both variants accept encoder callbacks for JSON primitives. Those closures turn strings, numbers, and booleans into Value instances and can allocate blobs or perform validation before handing the data back to the importer. The valueschemas::Boolean helper stores false as all-zero bytes and true as all ones so JSON flags round-trip without ambiguity when you wire the boolean encoder to it.

Both importers accumulate statements internally. After feeding one or more JSON documents through import_value or import_str, call data() to inspect the tribles that were emitted and metadata() to retrieve attribute descriptors. When you want to start a fresh batch without recomputing attribute hashes, clear_data() drops the accumulated statements while leaving the attribute caches intact. Call clear() to reset both the staged data and the attribute caches when you need a completely fresh run.

Mapping JSON Fields to Attributes

Attributes are derived through Attribute::from_name, which hashes the JSON field name together with the ValueSchema selected for that primitive. The importers cache the resulting RawIds per field and schema so the hash only has to be computed once per run. Arrays are treated as multi-valued fields: every item is encoded and stored under the same attribute identifier, producing one trible per element.

After an import completes the importers regenerate metadata from their cached attribute maps. The metadata() accessor returns tribles that link each derived attribute id to its field name, value schema, and optional blob schema. Merge those descriptors into your repository alongside the imported data when you want queries to discover the original JSON field names or project datasets by schema without repeating the derivation logic.

Nested objects recurse automatically. The parent receives a GenId attribute that points at the child entity, allowing the importer to represent the entire object graph as a connected set of tribles. Because those GenId attributes are also derived from the parent field names they remain stable even when you import related documents in separate batches.

Managing Entity Identifiers

JsonImporter::new defaults to ufoid() for identifier generation, but the constructor is parameterized so you can inject your own policy—for example, a fucid() generator or any other closure that returns an ExclusiveId. The custom generator is applied consistently to every object the importer touches, including nested documents.

DeterministicJsonImporter takes a different approach. It buffers the encoded attribute/value pairs for each object, sorts them, and feeds the resulting byte stream into a user-supplied hash protocol. The first 16 bytes of that digest become the entity identifier, ensuring identical JSON inputs produce identical IDs even across separate runs. Once the identifier is established, the importer writes the cached pairs into its internal trible set via Trible::new, so subsequent calls to data() expose the deterministic statements alongside the metadata generated for every derived attribute.

This hashing step also changes how repeated structures behave. When a JSON document contains identical nested objects—common in fixtures such as citm_catalog or Twitter exports—the deterministic importer emits the same identifier for each recurrence. Only the first copy reaches the underlying TribleSet; later occurrences are recognised as duplicates and skipped during the merge. The nondeterministic importer must still mint a fresh identifier for every repetition, so it inserts and deduplicates a full set of triples each time. Even if the ID generator itself is fast, that extra merge work makes the deterministic importer benchmark faster on datasets with significant repetition.

Working with Encoder Callbacks

Encoder callbacks receive borrowed references to the raw JSON values. Because the closures are generic over a lifetime you can capture external resources—like a blob store connection or a schema registry—without allocating reference-counted wrappers. Callers can stage binary payloads in whichever blob backend they prefer and return handles that will be persisted alongside the tribles.

The callbacks report failures through EncodeError. You can construct an error with a simple message or wrap an existing error type. The importer surfaces the field name alongside the original error so schema mismatches remain easy to diagnose while keeping the hot path lightweight.

Extending the Importers

To support a new external format, implement a module in the import namespace that follows the same pattern: decode the source data, derive attributes with Attribute::from_name, and hand encoded values to Trible::new. Reuse the lifetime-parameterized encoder callbacks so callers can plug in existing blob stores or validation logic. If the format supplies stable identifiers, offer a constructor that accepts a custom generator or hash protocol so downstream systems can choose between ephemeral and deterministic imports.

Descriptive Structural Typing and the find! Idiom

This chapter documents the mental model and idioms we recommend when working with tribles. The model is intentionally descriptive: queries declare the shape of the data you want to see rather than prescribing a single concrete Rust type for every entity. This gives you the flexibility to keep the full graph around and to materialize only the view you need, when you need it.

Reading the chapter sequentially should equip you to:

  1. talk about entities in terms of the fields they expose rather than the structs you wish they were,
  2. design APIs that carry just enough information (a workspace, a checkout, an id) to ask for more data later, and
  3. know when it is worth materializing a typed projection and when the descriptive view is already the best representation.

Key ideas at a glance

  • Attributes are typed fields (unlike untyped RDF predicates).
  • An entity in tribles is a structural record: a bag of typed fields.
  • attributes! accepts explicit ids for shared columns and can derive deterministic ids from a name + schema when you omit the hex literal; use the derived form for quick, local attributes and reserve explicit ids for published protocols.
  • find! patterns are descriptive type checks / projections: they select entities that match a requested shape.
  • entity! constructs ad‑hoc entities (like struct literals).
  • Reified kinds/tags are attached via metadata::tag (GenId); projects often export canonical KIND_* constants you can pattern-match directly against.
  • Prefer passing the Workspace + the checkout result (TribleSet) and an entity id around — only materialize a concrete Rust view when required.
  • Strongly prefer operating on the tuples returned by find!; wrapper structs should exist only as grudging adapters for APIs that demand them.

Why "descriptive" not "prescriptive"?

In a prescriptive system you define a named struct (type), commit to it, and force conversions at boundaries. Tribles instead let you describe the fields you need at call sites. That keeps code resilient to schema evolution and avoids unnecessary unfolding of the graph.

In linguistic terms: instead of insisting every entity be declared as CategoryX, you ask "show me entities that have fields A and B" and work with those. If an entity also has field C that's fine — it simply matches the descriptive pattern.

Type theory mapping (short)

  • Structural typing: types are shapes of fields, not names.
  • Width subtyping: records with more fields subsume records with fewer.
  • Intersection types: requiring both patterns A and B is like A & B.
  • Row polymorphism: patterns naturally allow additional (unspecified) fields to exist.

1. Use Workspace as your core I/O handle

The Workspace is the primary object for interacting with a repository. It lets you open a branch, commit, push, checkout history, and — importantly — read blob handles (LongString) cheaply.

Pattern: open a workspace for the configured branch, checkout the HEAD ancestors to produce a TribleSet content snapshot for efficient read-only pattern matching, and use the same Workspace to lazily read blobs when you need them.

This avoids duplicating memory and allows cheap zero-copy access to LongString blobs.

Manager-owned repository and workspace DI

At runtime, prefer to give a long-lived manager (session, exporter, service) ownership of a Repository<Pile<Blake3>>. Downstream code can depend on that manager in one of two shapes:

  1. Accept a &mut Repository<_> and open/pull the workspace you need inside the function. This works well for tasks that need to coordinate multiple checkouts or want to control the retry loop themselves, and the mutable borrow is typically short-lived: you only need it while creating or pushing workspaces.
  2. Ask the manager to mint a &mut Workspace<_> for the duration of a task (e.g. an update, render, or event-handling callback) and pass that mutable reference down. The manager remains responsible for merging or dropping the workspace when the task completes.

Both approaches avoid constructing piles or repositories ad-hoc and keep lifecycle management centralized. Importantly, they let multiple tasks hold distinct mutable workspaces over the same repository while only requiring a single mutable borrow of the repository when you create or push those workspaces. Library functions should therefore accept a &mut Repository<_> or a &mut Workspace<_> (optionally paired with a TribleSet checkout for read-only helpers) rather than opening new repositories inside hot paths.

Example (pseudocode):

#![allow(unused)]
fn main() {
// manager owns a Repository for the process/session lifetime
let mut repo = manager.repo_mut();
let branch_id = manager.default_branch_id;

// option 1: task pulls its own workspace
let mut ws = repo.pull(branch_id)?;
let content = ws.checkout(ws.head())?;

// option 2: manager provides a workspace to a task callback
manager.with_workspace(branch_id, |ws| {
    let snapshot = ws.checkout(ws.head())?;
    render(snapshot);
    Ok(())
})?;
}

This pattern keeps startup/teardown centralized and eliminates the common hot-loop anti-pattern of repeatedly opening ephemeral repository instances.

2. Use find! as a descriptive type / projection

find! is not just a query language; it is the place where you declare the shape of the data you expect. Treat find! patterns as lightweight, inline type declarations. If an entity doesn't match, find! won't return it — no error, just absence.

When your project defines canonical tag ids (GenId constants) prefer to match the tag directly in the pattern rather than binding a short-string and filtering afterwards. Pattern clauses can be composed: you can match on tags and required attributes, and even join related entities in a single find! invocation.

Example: find plan snapshot ids (match tag directly)

#![allow(unused)]
fn main() {
// Match entities that have the canonical plan snapshot tag attached.
for (e,) in find!((e: Id), triblespace::pattern!(&content, [{ ?e @ metadata::tag: (KIND_PLAN_SNAPSHOT) }])) {
    // `e` is a plan snapshot entity id; follow-up finds can read other fields
}
}

Worked example: composing a structural pattern

#![allow(unused)]
fn main() {
// Grab all active plans with a title and owner.
for (plan_id, title, owner) in find!(
    (plan_id: Id, title: Value<ShortString>, owner: Id),
    triblespace::pattern!(&content, [
        { ?plan_id @ metadata::tag: (KIND_PLAN) },
        { ?plan_id @ plan::status: (plan::STATUS_ACTIVE) },
        { ?plan_id @ plan::title: ?title },
        { ?plan_id @ plan::owner: ?owner },
    ])
) {
    // `title` is already typed as ShortString.
    // `owner` can drive a follow-up find! to pull account info as needed.
}
}

find! returns tuples of the values you requested in the head of the query. Nothing more, nothing less. The example above reads as "give me the Id bound to plan_id, the short string bound to title, and another Id bound to owner for every entity matching these clauses." Because the matching is descriptive, adding a new attribute such as plan::color does not invalidate the call site — you only see the data you asked for.

3. Lazy, ad‑hoc conversions only where needed

If a function needs a few fields for an operation, ask for them with find! inside the function. If later you perform an operation that needs different fields, you can perform another small find! there. Don't materialize large subgraphs unless a single operation needs them.

The recommended function signature is minimal and focused on the tribles primitives. Conversions into bespoke structs are almost always a smell; they obscure which fields are actually used and quickly devolve into an unofficial schema. Treat any adapter as an opt-in shim that exists purely at integration boundaries where consumers refuse to speak tribles primitives.

#![allow(unused)]
fn main() {
fn handle_plan_update(ws: &mut Workspace<Pile<Blake3>>, plan_id: Id) -> io::Result<()> {
    // ad-hoc find! calls to read the fields we need
    let checkout = ws.checkout(ws.head())?;

    if let Some((title,)) =
        find!((title: Value<ShortString>), triblespace::pattern!(&checkout, [{ ?plan_id @ plan::title: ?title }]))
            .next()
    {
        // The returned tuple is already typed. Convert to an owned String only if
        // external APIs demand ownership.
        process_title(title.from_value::<&str>());
    }

    Ok(())
}
}

If you cannot avoid exposing a typed facade (for example because an external API insists on receiving a struct), keep the struct tiny, document that it is a legacy shim, and derive it straight from a find! tuple:

#![allow(unused)]
fn main() {
struct PlanSummary<'a> {
    id: Id,
    title: &'a str,
}

fn load_plan_summary<'a>(ws: &'a mut Workspace<Pile<Blake3>>, plan_id: Id) -> io::Result<Option<PlanSummary<'a>>> {
    let content = ws.checkout(ws.head())?;
    Ok(find!(
        (title: Value<ShortString>),
        triblespace::pattern!(&content, [{ ?plan_id @ plan::title: ?title }])
    )
    .next()
    .map(|(title,)| PlanSummary {
        id: plan_id,
        title: title.from_value::<&str>(),
    }))
}
}

The struct above is merely a view with borrowed references; it is not a blessed schema. Resist the temptation to evolve it into a "real" model type — extend the find! pattern first and only mirror those changes here when an external interface forces your hand.

4. Read LongString as &str (zero-copy)

Blob schema types in tribles are intentionally zerocopy. Prefer the typed View API which returns a borrowed &str without copying when possible.

#![allow(unused)]
fn main() {
let view = ws
    .get::<View<str>, LongString>(handle)
    .map_err(|e| ...)?; // `handle` is a Value<Handle<Blake3, LongString>>
let s: &str = view.as_ref(); // zero-copy view tied to the workspace lifetime
// If you need an owned String: let owned = view.to_string();
}

Note: a View borrows data that is managed by the Workspace; avoid returning &str that outlives the workspace or the View.

When you do need ownership, convert at the edge of the system. Internal helpers should continue to pass views or typed handles around so that the call site that triggers a blob fetch is easy to spot. Cloning the in-memory blob after it has been pulled is cheap; the expensive part is fetching it from storage (potentially a remote) in the first place.

5. Structural sharing and normalization patterns

When persisting graphs that contain many repeated or immutable pieces (e.g. steps in a plan), prefer structural sharing:

  • Store canonical step entities (LongString blobs for their text).
  • Create a lightweight "link" entity per plan that references the step ids and metadata like order and status.

On update, create new step entities only for truly new step text and add a new snapshot entity that references the steps. This keeps history immutable and easy to reason about.
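A minimal sketch of that layout, using a hypothetical plan namespace declared inline; a real project would keep the step text in LongString blobs and reference them by handle:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;

mod plan {
  use triblespace::prelude::*;
  use triblespace::prelude::valueschemas::*;
  attributes! {
    "D0D0D0D0D0D0D0D0D0D0D0D0D0D0D0D0" as step_title: ShortString;
    "D1D1D1D1D1D1D1D1D1D1D1D1D1D1D1D1" as references_step: GenId;
    "D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2D2" as status: ShortString;
  }
}

let step = ufoid();
let snapshot = ufoid();
let mut content = TribleSet::new();

// Canonical step entity, shared by every snapshot that contains this step.
content += entity! { &step @ plan::step_title: "compile the report" };

// Lightweight snapshot entity: one per update, linking to the shared steps.
content += entity! { &snapshot @ plan::references_step: &step, plan::status: "active" };
}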

6. Push/merge retry loop for writers

When pushing writes, use the standard push/merge loop to handle concurrent writers. Two options are available:

  • Manual conflict handling with try_push (single attempt; returns a conflicting workspace on CAS failure):
#![allow(unused)]
fn main() {
ws.commit(content, Some("plan-update"));
let mut current_ws = ws;
while let Some(mut incoming) = repo.try_push(&mut current_ws)? {
    incoming.merge(&mut current_ws)?;
    current_ws = incoming;
}
}
  • Automatic retries with push (convenience wrapper that merges and retries until success or error):
#![allow(unused)]
fn main() {
ws.commit(content, Some("plan-update"));
// `push` will handle merge+retry internally; it returns Ok(()) on success
// or an error if the operation ultimately failed.
repo.push(&mut ws)?;
}

Practical anti‑patterns

  • Do not unfold the graph or convert it into nested Rust structs. It wastes CPU and memory and loses the benefits of tribles’ flexible reifications.
  • Avoid holding repo locks across async/await points. Acquire workspaces, do the minimal synchronous I/O you need, then release locks before awaiting.
  • Don’t assume presence of a field; be explicit about optional vs required semantics using Option / Result in typed adapters.
  • Don't create ephemeral Repository instances every time you need to read or write data. Instead, own a long-lived Repository in a manager and expose workspace-opening helpers.
  • Don't explicitly convert Values via from_value/to_value; use typed find! patterns find!((field: <field_type>), ...) and ws.get::<View<...>, _>(handle) for blob reads.
  • Don't create helper functions for queries; use find! patterns directly in the function that needs them. This keeps the shape of the data explicit and close to where it is used and avoids unnecessary unfolding of the graph.
  • Don't convert the structures returned by find! into other Rust structs. Work with the returned tuples directly.
  • find! should be thought of as a fundamental language primitive, like if/match/for. It is not a "database query" that returns rows to be converted into structs; it is a way to describe the shape of the data you want to see. ORM-like abstractions that try to map find! results into structs are an anti-pattern.
  • Avoid reading blobs eagerly; prefer lazy reads via ws.get::<View<...>, _>(handle). Allocate owned data only when necessary.

Glossary

  • Workspace: the repo handle that opens branches, reads blobs, commits and pushes.
  • TribleSet: the in-memory content snapshot returned by Workspace::checkout.
  • find!: the macro you use to discover entities matching a pattern (a descriptive type declaration).
  • entity!: construct an ad‑hoc entity into a TribleSet for commit.
  • LongString: zero-copy blob schema for potentially-large text.

Closing notes

This chapter captures the pragmatic type story we use in tribles: describe the fields you need at the place you need them, keep the full graph, and materialize small views lazily.

Reviewers' checklist (quick)

  • Prefer find!/pattern! projections for the fields needed by the function.
  • Avoid converting graph into rust structs.
  • Use ws.get::<View<...>, _>(handle) for zero-copy blob reads; allocate only when an owned value is required.
  • Match canonical tag ids via metadata::tag (KIND_* constants).
  • Manager-owned repo: long-lived Repository instances should be owned by a session/exporter/manager; library code should accept a Workspace or TribleSet rather than opening piles itself.
  • Use push/merge retry loops for writers; avoid holding repo locks across async/await points.

The sections below contain copy‑pasteable recipes for common operations.

Idioms & code recipes

This section contains pragmatic, copy‑pasteable snippets and patterns you can reuse. The examples intentionally use the tribles macros (attributes!, find!, pattern!, entity!) directly — that is the intended style.

Reviewer checklist

When reviewing code that touches tribles, look for these items:

  • Does the code use find! to select only the fields it needs, rather than unfolding the entire graph?
  • Are blob reads kept lazy (only read LongString when necessary)?
  • Are push flows using the push/merge retry loop to avoid losing concurrent updates?
  • Is the code avoiding holding the repo's Mutex across awaits and long blocking operations?

Further reading and references

  • See the tribles macros: attributes!, find!, pattern!, entity! in the tribles code for exact usage.
  • Type theory: "row polymorphism", "structural typing", "width subtyping" if you want the formal background.

The Type Algebra of TribleSpace

Queries as Types, Data as Proofs

TribleSpace grew out of a pragmatic goal: keep data declarative, composable, and statically checkable in Rust. Along the way we discovered that the core operations already form a type algebra. The macros used to define attributes, entities, and queries map directly onto familiar type-theoretic constructions, yielding an isomorphism between relations over triples and types over records.

Attributes Introduce Atomic Types

Each attribute introduced with attributes! defines an atomic type — a unary relation between an entity identifier and the attribute’s value:

"A74AA..." as pub title : ShortString

Formally this is a function title : Id → ValueTitle, or, in relational terms, the set { (id, value) }. In the codebase the macro emits either a pub const (when you supply the 128-bit id) or a static LazyLock (when you omit it) of type Attribute<S>, so the generated binding already carries the ValueSchema that governs the value column. The derived form hashes the attribute name together with the schema metadata via Attribute::from_name, which is convenient for quick experiments; shared protocols should still pin explicit ids so collaborators and other languages read the same column. These atomic pieces are the building blocks for everything else.

Entities as Intersection Types

An entity! expression forms a record type by intersecting atomic ones. Semantically it is the meet (∧) of its constituent relations:

Entity{A, B}  ≡  { A : ValueA } ∧ { B : ValueB }

At runtime entity! expands to a small TribleSet containing those facts; the procedural macro literally emits a fresh set, populates it with Trible::new calls, and returns the set by value. At the type level it represents their conjunction. Records are therefore intersection types: every additional field refines the shape without invalidating existing data.

Merge Drives Dataset Composition

The += operator delegates to TribleSet::union, exposed through the AddAssign implementation. union performs set union on the six internal indexes that back a TribleSet. When the entity identifiers are disjoint the effect is classic dataset union; when they coincide we get field conjunction—record extension. This dual role is what keeps TribleSpace’s algebra compact.

One Operator, Two Readings

Everything that looks like “add another field” or “add another record” derives from the single law

union(facts₁, facts₂) = facts₁ ∪ facts₂.

The surrounding context determines how to read the result:

Context          Effect          Algebraic Face
same entity ID   extend record   ∧ (field conjunction)
different IDs    add record      ∨ (set union)

The behaviour follows from the idempotent, commutative nature of TribleSet::union; no special cases are required in the implementation.
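The sketch below shows both readings, using the literature namespace from the example dataset:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::examples::literature;

let frank = ufoid();
let other = ufoid();
let mut kb = TribleSet::new();

// Same entity id: the second union extends the record (field conjunction).
kb += entity! { &frank @ literature::firstname: "Frank" };
kb += entity! { &frank @ literature::lastname: "Herbert" };

// Different entity id: the union adds a new record (set union).
kb += entity! { &other @ literature::firstname: "William" };
}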

Patterns as Row-Type Predicates

A find!/pattern! pair behaves like a row-type predicate:

pattern!([A, B])  ≈  ∀r. {A, B} ⊆ r.

Read this as “find every entity whose row type r is a supertype of this shape.” The macro expands to an IntersectionConstraint built from TriblePattern constraints, so queries literally evaluate the conjunction of the row predicates.

Patterns compose intersectionally as well: combining two patterns is equivalent to requiring both row predicates simultaneously, mirroring intersection types at the query level.

The Lattice Perspective

TribleSets form a join-semilattice under union:

union(a, b) = a ∪ b
a ≤ b  ⇔  a ⊆ b.

Projection, join, and filtering are lattice homomorphisms: they preserve joins and the partial order. Because the same algebra handles both data and metadata, the implementation remains uniform and performant—merging facts is always the same low-level operation.

Practical Payoffs

Seeing the primitives through a type-theoretic lens clarifies several ergonomic choices:

  • Queries as proofs. Successful find! rows witness that an entity inhabits the requested type; absence is simply failure to prove the predicate.
  • Descriptive schemas. Structural typing drops the need for globally declared records—the type system is implicit in the patterns you write.
  • Composable extensions. Adding new attributes is monotonic: existing entities continue to satisfy prior predicates, and refined queries simply intersect more row constraints.

Summary Table

Level            Concept             Operation
Attribute        atomic type         unary relation
Entity           conjunction         record formation
Dataset          union               set composition
Query            sub-row predicate   type constraint
Implementation   lattice             union

With a single associative, commutative, idempotent union, we obtain both row extension and dataset union, and a unified logical framework that bridges data engineering with type theory. That economy of primitives allows the system to feel simple on the surface yet provide rich type theory underneath.

Repository Workflows

Working with a Tribles repository feels familiar to Git users, but the types make data ownership and lifecycle explicit. Keep the following vocabulary in mind when exploring the API:

  • Repository – top-level object that tracks history through BlobStore and BranchStore implementations.
  • Workspace – mutable view of a branch, similar to Git's working directory and index combined. Workspaces buffer commits and custom blobs until you push them back to the repository.
  • BlobStore – storage backend for commits and payload blobs.
  • BranchStore – records branch metadata and head pointers.

Both stores can be in memory, on disk or backed by a remote service. The examples in examples/repo.rs and examples/workspace.rs showcase these APIs and are a great place to start if you are comfortable with Git but new to Tribles.

Opening a repository

Repositories are constructed from any storage that implements the appropriate traits. The choice largely depends on your deployment scenario:

  1. Pick or compose a storage backend (see Storage Backends and Composition).
  2. Create a signing key for the identity that will author commits.
  3. Call Repository::new(storage, signing_key) to obtain a handle.

Most applications perform the above steps once during start-up and then reuse the resulting Repository. If initialization may fail (for example when opening an on-disk pile), bubble the error to the caller so the process can retry or surface a helpful message to operators.

Storage Backends and Composition

Repository accepts any storage that implements both the BlobStore and BranchStore traits, so you can combine backends to fit your deployment. The crate ships with a few ready-made options:

  • MemoryRepo stores everything in memory and is ideal for tests or short-lived tooling where persistence is optional.
  • Pile persists blobs and branch metadata in a single append-only file. It is the default choice for durable local repositories and integrates with the pile tooling described in Pile Format.
  • ObjectStoreRemote connects to object_store endpoints (S3, local filesystems, etc.). It keeps all repository data in the remote service and is useful when you want a shared blob store without running a dedicated server.
  • HybridStore lets you split responsibilities, e.g. storing blobs on disk while keeping branch heads in memory or another backend. Any combination that satisfies the trait bounds works.

Backends that need explicit shutdown can implement StorageClose. When the repository type exposes that trait bound you can call repo.close()? to flush and release resources instead of relying on Drop to run at an unknown time. This is especially handy for automation where the process may terminate soon after completing a task.

use triblespace::core::repo::hybridstore::HybridStore;
use triblespace::core::repo::memoryrepo::MemoryRepo;
use triblespace::core::repo::objectstore::ObjectStoreRemote;
use triblespace::core::repo::Repository;
use triblespace::core::value::schemas::hash::Blake3;
use url::Url;

let blob_remote: ObjectStoreRemote<Blake3> =
    ObjectStoreRemote::with_url(&Url::parse("s3://bucket/prefix")?)?;
let branch_store = MemoryRepo::default();
let storage = HybridStore::new(blob_remote, branch_store);
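// `signing_key` is an ed25519 SigningKey, generated as shown in
// "Managing signing identities" below.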
let mut repo = Repository::new(storage, signing_key);

// Work with repo as usual …
// repo.close()?; // if the underlying storage supports StorageClose

Branching

A branch records a line of history and carries the metadata that identifies who controls updates to that history. Creating one writes initial metadata to the underlying store and returns an ExclusiveId guarding the branch head. Dereference that ID when you need a plain Id for queries or workspace operations.

Typical steps for working on a branch look like:

  1. Create a repository backed by blob and branch stores via Repository::new.
  2. Initialize or look up a branch ID with helpers like Repository::create_branch. When interacting with an existing branch call Repository::pull directly.
  3. Commit changes in the workspace using Workspace::commit.
  4. Push the workspace with Repository::push (or handle conflicts manually via Repository::try_push) to publish those commits.

The example below demonstrates bootstrapping a new branch and opening multiple workspaces on it. Each workspace holds its own staging area, so remember to push before sharing work or starting another task.

#![allow(unused)]
fn main() {
let mut repo = Repository::new(pile, SigningKey::generate(&mut OsRng));
let branch_id = repo.create_branch("main", None).expect("create branch");

let mut ws = repo.pull(*branch_id).expect("pull branch");
let mut ws2 = repo.pull(ws.branch_id()).expect("open branch");
}

After committing changes you can push the workspace back. push will retry on contention and attempt to merge, while try_push performs a single attempt and returns Ok(Some(conflict_ws)) when the branch head moved. Choose the latter when you need explicit conflict handling:

#![allow(unused)]
fn main() {
ws.commit(change, Some("initial commit"));
repo.push(&mut ws)?;
}

Managing signing identities

The key passed to Repository::new becomes the default signing identity for branch metadata and commits. Collaborative projects often need to switch between multiple authors or assign a dedicated key to automation. You can adjust the active identity in three ways:

  • Repository::set_signing_key replaces the repository's default key. Subsequent calls to helpers such as Repository::create_branch or Repository::pull use the new key for any commits created from those workspaces.
  • Repository::create_branch_with_key signs a branch's metadata with an explicit key, allowing each branch to advertise the author responsible for updating it.
  • Repository::pull_with_key opens a workspace that will sign its future commits with the provided key, regardless of the repository default.

The snippet below demonstrates giving an automation bot its own identity while letting a human collaborator keep theirs:

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::repo::Repository;

let alice = SigningKey::generate(&mut OsRng);
let automation = SigningKey::generate(&mut OsRng);

// Assume `pile` was opened earlier, e.g. via `Pile::open` as shown in previous sections.
let mut repo = Repository::new(pile, alice.clone());

// Create a dedicated branch for the automation pipeline using its key.
let automation_branch = repo
    .create_branch_with_key("automation", None, automation.clone())?
    .release();

// Point automation jobs at their dedicated identity by default.
repo.set_signing_key(automation.clone());
let mut bot_ws = repo.pull(automation_branch)?;

// Humans can opt into their own signing identity even while automation remains
// the repository default.
let mut human_ws = repo.pull_with_key(automation_branch, alice.clone())?;

human_ws and bot_ws now operate on the same branch but will sign their commits with different keys. This pattern is useful when rotating credentials or running scheduled jobs under a service identity while preserving authorship in the history. You can swap identities at any time; existing workspaces keep the key they were created with until you explicitly call Workspace::set_signing_key.
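If a workspace should adopt a different identity later, for example when a human hands a long-running task over to the automation bot, you can rebind it explicitly. A minimal sketch, assuming Workspace::set_signing_key accepts the same SigningKey type used above:

// Hedged sketch: rebind an existing workspace to another identity mid-session.
human_ws.set_signing_key(automation.clone());
// Commits created from `human_ws` after this call are signed with the automation key.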

Inspecting History

You can explore previous commits using Workspace::checkout which returns a TribleSet with the union of the specified commit contents. Passing a single commit returns just that commit. To include its history you can use the ancestors helper. Commit ranges are supported for convenience. The expression a..b yields every commit reachable from b that is not reachable from a, treating missing endpoints as empty (..b) or the current HEAD (a.. and ..). These selectors compose with filters, so you can slice history to only the entities you care about.

#![allow(unused)]
fn main() {
let history = ws.checkout(commit_a..commit_b)?;
let full = ws.checkout(ancestors(commit_b))?;
}

The history_of helper builds on the filter selector to retrieve only the commits affecting a specific entity. Commit selectors are covered in more detail in the next chapter:

#![allow(unused)]
fn main() {
let entity_changes = ws.checkout(history_of(my_entity))?;
}

Working with Custom Blobs

Workspaces keep a private blob store that mirrors the repository's backing store. This makes it easy to stage large payloads alongside the trible sets you plan to commit. The Workspace::put helper stores any type implementing ToBlob and returns a typed handle you can embed like any other value. Handles are Copy, so you can commit them and reuse them to fetch the blob later.

The example below stages a quote and an archived TribleSet, commits both, then retrieves them again with strongly typed and raw views. In practice you might use this pattern to attach schema migrations, binary artifacts, or other payloads that should travel with the commit:

#![allow(unused)]
fn main() {
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::blob::Blob;
use triblespace::examples::{self, literature};
use triblespace::prelude::*;
use triblespace::repo::{self, memoryrepo::MemoryRepo, Repository};
use blobschemas::{LongString, SimpleArchive};

let storage = MemoryRepo::default();
let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));
let branch_id = repo.create_branch("main", None).expect("create branch");
let mut ws = repo.pull(*branch_id).expect("pull branch");

// Stage rich payloads before creating a commit.
let quote_handle = ws.put("Fear is the mind-killer".to_owned());
let archive_handle = ws.put(&examples::dataset());

// Embed the handles inside the change set that will be committed.
let mut change = triblespace::entity! {
    literature::title: "Dune (annotated)",
    literature::quote: quote_handle.clone(),
};
change += triblespace::entity! { repo::content: archive_handle.clone() };

ws.commit(change, Some("Attach annotated dataset"));
// Single-attempt push. Use `push` to let the repository merge and retry automatically.
repo.try_push(&mut ws).expect("try_push");

// Fetch the staged blobs back with the desired representation.
let restored_quote: String = ws
    .get(quote_handle)
    .expect("load quote");
let restored_set: TribleSet = ws
    .get(archive_handle)
    .expect("load dataset");
let archive_bytes: Blob<SimpleArchive> = ws
    .get(archive_handle)
    .expect("load raw blob");
std::fs::write("dataset.car", archive_bytes.bytes.as_ref()).expect("persist archive");
}

Rust infers the blob schema for both put and get from the handles and the assignment context, so the calls stay concise without explicit turbofish annotations.

Blobs staged this way stay local to the workspace until you push the commit. Workspace::get searches the workspace-local store first and falls back to the repository if necessary, so the handles remain valid after you publish the commit. This round trip lets you persist logs, archives, or other auxiliary files next to your structured data without inventing a separate storage channel.

Merging and Conflict Handling

When pushing a workspace another client might have already updated the branch. There are two ways to handle this:

  • Repository::try_push — a single-attempt push that uploads local blobs and attempts a CAS update once. If the branch advanced concurrently it returns Ok(Some(conflict_ws)) so callers can merge and retry explicitly:
#![allow(unused)]
fn main() {
ws.commit(content, Some("codex-turn"));
let mut current_ws = ws;
while let Some(mut incoming) = repo.try_push(&mut current_ws)? {
    // Merge the local staged changes into the incoming workspace and retry.
    incoming.merge(&mut current_ws)?;
    current_ws = incoming;
}
}
  • Repository::push — a convenience wrapper that performs the merge-and-retry loop for you. Call this when you prefer the repository to handle conflicts automatically; it either succeeds (returns Ok(())) or returns an error.
#![allow(unused)]
fn main() {
ws.commit(content, Some("codex-turn"));
repo.push(&mut ws)?; // will internally merge and retry until success
}

Troubleshooting: Workspace::merge succeeds only when both workspaces share a blob store. Merging a workspace pulled from a different pile or remote returns MergeError::DifferentRepos. Decide which repository will own the combined history, transfer the other branch's reachable blobs into it with repo::transfer(reachable(...)), create a branch for that imported head, and merge locally once both workspaces target the same store. The programmatic merge-import example below walks through exactly this flow.

After a successful push the branch may have advanced further than the head supplied, because the repository refreshes its view after releasing the lock. An error indicating a corrupted pile does not necessarily mean the push failed; the update might have been written before the corruption occurred.

This snippet is taken from examples/workspace.rs. The examples/repo.rs example demonstrates the same pattern with two separate workspaces. The returned Workspace already contains the remote commits, so after merging your changes you push that new workspace to continue.

Typical CLI Usage

There is a small command line front-end in the trible repository. It exposes push and merge operations over simple commands and follows the same API presented in the examples. The tool is currently experimental and may lag behind the library, but it demonstrates how repository operations map onto a CLI.

Diagram

A simplified view of the push/merge cycle:


        ┌───────────┐         pull          ┌───────────┐
        │ local ws  │◀───────────────────── │   repo    │
        └─────┬─────┘                       └───────────┘
              │
              │ commit
              │
              ▼
        ┌───────────┐         push          ┌───────────┐
        │ local ws  │ ─────────────────────▶│   repo    │
        └─────┬─────┘                       └─────┬─────┘
              │                                   │
              │ merge                             │ conflict?
              └──────▶┌─────────────┐◀────────────┘
                      │ conflict ws │
                      └───────┬─────┘
                              │             ┌───────────┐
                              └────────────▶│   repo    │
                                     push   └───────────┘

Each push either succeeds or returns a workspace containing the other changes. Merging incorporates your commits and the process repeats until no conflicts remain.

Troubleshooting push, branch, and pull failures

Repository::push, Repository::create_branch, and Repository::pull surface errors from the underlying blob and branch stores. These APIs intentionally do not hide storage issues, because diagnosing an I/O failure or a corrupt commit usually requires operator intervention. The listings below group the error variants by API along with common causes and remediation steps.

Repository::push

  • PushError::StorageBranches – Enumerating branch metadata in the backing store failed. Check connectivity and credentials for the branch store (for example, the object-store bucket, filesystem directory, or HTTP endpoint).
  • PushError::StorageReader – Creating a blob reader failed before any transfer started. The blob store may be offline, misconfigured, or returning permission errors.
  • PushError::StorageGet – Fetching existing commit metadata failed. The underlying store returned an error or the metadata blob could not be decoded, which often signals corruption or truncated uploads. Inspect the referenced blob in the store to confirm it exists and is readable.
  • PushError::StoragePut – Uploading new content or metadata blobs failed. Look for transient network failures, insufficient space, or rejected writes in the blob store logs. Retrying after fixing the storage issue will re-send the missing blobs.
  • PushError::BranchUpdate – Updating the branch head failed. Many backends implement optimistic compare-and-swap semantics; stale heads or concurrent writers therefore surface here as update errors. Refresh the workspace and retry after resolving any store-side errors.
  • PushError::BadBranchMetadata – The branch metadata could not be parsed. Inspect the stored metadata blobs for corruption or manual edits and repair them before retrying the push.

Branch creation APIs

  • BranchError::StorageReader – Creating a blob reader failed. Treat this like PushError::StorageReader: verify the blob store connectivity and credentials.
  • BranchError::StorageGet – Reading branch metadata during initialization failed. Check for corrupted metadata blobs or connectivity problems.
  • BranchError::StoragePut – Persisting branch metadata failed. Inspect store logs for rejected writes or quota issues.
  • BranchError::BranchHead – Retrieving the current head of the branch failed. This usually points to an unavailable branch store or inconsistent metadata.
  • BranchError::BranchUpdate – Updating the branch entry failed. Resolve branch-store errors and ensure no other writers are racing the update before retrying.
  • BranchError::AlreadyExists – A branch with the requested name already exists. Choose a different name or delete the existing branch before recreating it.
  • BranchError::BranchNotFound – The specified base branch does not exist. Verify the branch identifier and that the base branch has not been deleted.

Repository::pull

  • PullError::BranchNotFound – The branch is missing from the repository. Check the branch name/ID and confirm that it has not been removed.
  • PullError::BranchStorage – Accessing the branch store failed. This mirrors BranchError::BranchHead and usually indicates an unavailable or misconfigured backend.
  • PullError::BlobReader – Creating a blob reader failed before commits could be fetched. Ensure the blob store is reachable and that the credentials grant read access.
  • PullError::BlobStorage – Reading commit or metadata blobs failed. Investigate missing objects, network failures, or permission problems in the blob store.
  • PullError::BadBranchMetadata – The branch metadata is malformed. Inspect and repair the stored metadata before retrying the pull.

Remote Stores

Remote deployments use the ObjectStoreRemote backend to speak to any service supported by the object_store crate (S3, Google Cloud Storage, Azure Blob Storage, HTTP-backed stores, the local filesystem, and the in-memory memory:/// adapter). ObjectStoreRemote implements both BlobStore and BranchStore, so the rest of the repository API continues to work unchanged – the only difference is the URL you pass to with_url.

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::prelude::*;
use triblespace::core::repo::objectstore::ObjectStoreRemote;
use triblespace::core::repo::Repository;
use triblespace::core::value::schemas::hash::Blake3;
use url::Url;

fn open_remote_repo(raw_url: &str) -> anyhow::Result<()> {
    let url = Url::parse(raw_url)?;
    let storage = ObjectStoreRemote::<Blake3>::with_url(&url)?;
    let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));

    let branch_id = repo.create_branch("main", None)?;
    let mut ws = repo.pull(*branch_id)?;
    ws.commit(TribleSet::new(), Some("initial commit"));

    while let Some(mut incoming) = repo.try_push(&mut ws)? {
        incoming.merge(&mut ws)?;
        ws = incoming;
    }

    Ok(())
}

ObjectStoreRemote writes directly through to the backing service. It implements StorageClose, but the implementation is a no-op, so dropping the repository handle is usually sufficient. Call repo.close() if you prefer an explicit shutdown step.

Credential configuration follows the object_store backend you select. For example, S3 endpoints consume AWS access keys or IAM roles, while memory:///foo provides a purely in-memory store for local testing. Once the URL resolves, repositories backed by piles and remote stores share the same workflow APIs.
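The helper above behaves identically against every supported backend; only the URL changes. A small, hypothetical driver (the bucket name and prefix are placeholders) might exercise it like this:

fn main() -> anyhow::Result<()> {
    // `memory:///` keeps all repository data in process memory — handy for tests.
    open_remote_repo("memory:///scratch")?;

    // An S3-backed deployment only differs in the URL; credentials come from the environment.
    // open_remote_repo("s3://my-bucket/trible-repo")?;

    Ok(())
}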

Attaching a Foreign History (merge-import)

Sometimes you want to graft an existing branch from another pile into your current repository without rewriting its commits. Tribles supports a conservative, schema‑agnostic import followed by a single merge commit:

  1. Copy all reachable blobs from the source branch head into the target pile by streaming the reachable walker into repo::transfer. The traversal scans every 32‑byte aligned chunk and enqueues any candidate that dereferences in the source.
  2. Create a single merge commit that has two parents: your current branch head and the imported head. No content is attached to the merge; it simply ties the DAGs together.

This yields a faithful attachment of the foreign history — commits and their content are copied verbatim, and a one‑off merge connects both histories.

The trible CLI exposes this as:

trible branch merge-import \
  --from-pile /path/to/src.pile --from-name source-branch \
  --to-pile   /path/to/dst.pile --to-name   self

Internally this uses the reachable walker in combination with repo::transfer plus Workspace::merge_commit. Because the traversal scans aligned 32‑byte chunks, it is forward‑compatible with new formats as long as embedded handles remain 32‑aligned.

Sidebar — Choosing a copy routine

  • repo::transfer pairs the reachability walker (or any other iterator you provide) with targeted copies, returning (old_handle, new_handle) pairs for the supplied handles. Feed it the reachable iterator when you only want live blobs, the output of potential_handles when scanning metadata, or a collected list from BlobStoreList::blobs() when duplicating an entire store.
  • MemoryBlobStore::keep (and other BlobStoreKeep implementations) retain whichever handles you stream to them, making it easy to drop unreachable blobs once you've walked your roots.

Reachable copy keeps imports minimal; the transfer helper lets you rewrite specific handles while duplicating data into another store.

Programmatic example (Rust)

The same flow can be used directly from Rust when you have two piles on disk and want to attach the history of one branch to another:

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::prelude::*;
use triblespace::core::repo::{self, pile::Pile, Repository};
use triblespace::core::value::schemas::hash::Blake3;
use triblespace::core::value::schemas::hash::Handle;

fn merge_import_example(
    src_path: &std::path::Path,
    src_branch_id: triblespace::id::Id,
    dst_path: &std::path::Path,
    dst_branch_id: triblespace::id::Id,
) -> anyhow::Result<()> {
    // 1) Open source (read) and destination (write) piles
    let mut src = Pile::open(src_path)?;
    src.restore()?;
    let mut dst = Pile::open(dst_path)?;
    dst.restore()?;

    // 2) Resolve source head commit handle
    let src_head: Value<Handle<Blake3, blobschemas::SimpleArchive>> =
        src.head(src_branch_id)?.ok_or_else(|| anyhow::anyhow!("source head not found"))?;

    // 3) Conservatively copy all reachable blobs from source → destination
    let reader = src.reader()?;
    let mapping: Vec<_> = repo::transfer(
        &reader,
        &mut dst,
        repo::reachable(&reader, [src_head.transmute()]),
    )
    .collect::<Result<_, _>>()?;
    eprintln!("copied {} reachable blobs", mapping.len());

    // 4) Attach via a single merge commit in the destination branch
    let mut repo = Repository::new(dst, SigningKey::generate(&mut OsRng));
    let mut ws = repo.pull(dst_branch_id)?;
    ws.merge_commit(src_head)?; // parents = { current HEAD, src_head }

    // 5) Push with standard conflict resolution
    while let Some(mut incoming) = repo.try_push(&mut ws)? {
        incoming.merge(&mut ws)?;
        ws = incoming;
    }
    Ok(())
}

Commit Selectors

Commit selectors describe which commits to load from a workspace. They give callers a reusable vocabulary for requests such as "let me work with the changes from last week" or "show the commits that touched this entity". The selector itself only decides which commits participate; the data behind those commits is materialized into a TribleSet by Workspace::checkout so the rest of the system can query it like any other dataset.

At checkout time the Workspace::checkout method accepts any type implementing the CommitSelector trait and returns a TribleSet built from the selected commits. Selectors can be as small as a single commit handle or as expressive as a filtered slice of history. This chapter walks through the available building blocks, how they compose, and how they relate to Git's revision grammar.

Range semantics

Range selectors mirror Git's two‑dot syntax. A selector of the form a..b starts from b and walks its reachable ancestors. The walk continues until it encounters a commit selected by a, at which point the descent along that branch stops. The start boundary is exclusive while the end boundary is inclusive: commits selected by a are omitted from the result, but the commit(s) provided by b are included alongside any additional ancestors reached through other branches. The shorthands behave as follows:

  • ..b is equivalent to empty()..b and gathers b plus all of its ancestors.
  • a.. defaults the end boundary to HEAD, collecting HEAD and its ancestors until the walk meets a.
  • .. expands to HEAD and every ancestor reachable from it.

Because the range semantics differ slightly from Git, you can wrap the start boundary in ancestors to reproduce Git's set-difference behaviour when parity is required: ancestors(a)..b matches git log a..b.

#![allow(unused)]
fn main() {
// Check out the entire history of the current branch
let history = ws.checkout(ancestors(ws.head()))?;

// Equivalent to `git log feature..main`
let delta = ws.checkout(ancestors(feature_tip)..main_tip)?;
}

Ranges are concise and map directly onto the ancestry walks exposed by the repository. Combinations such as "ancestors of B that exclude commits reachable from A" fall out naturally from existing selectors (ancestors(A)..B). When a query needs additional refinement, layer selectors like filter, reach for helpers such as symmetric_diff, or implement a small CommitSelector that post-processes the resulting CommitSet with union, intersection, or difference before handing it back to checkout.
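For example, symmetric_diff surfaces the commits where two branches diverge without any custom code. A minimal sketch, assuming main_tip and feature_tip are commit handles and that symmetric_diff is exported alongside the other selector helpers:

#![allow(unused)]
fn main() {
use triblespace::repo::symmetric_diff;

// Commits reachable from exactly one of the two tips — the history where the
// branches diverge, similar in spirit to `git log main...feature`.
let divergence = ws.checkout(symmetric_diff(main_tip, feature_tip))?;
}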

Short-circuiting at the boundary avoids re-walking history that previous selectors already covered, but it still requires visiting every reachable commit when the start selector is empty. Long-lived queries that continuously ingest history can avoid that re-walk by carrying forward a specific commit as the new start boundary. If a prior run stopped at previous_head, the next iteration can use the range previous_head..new_head to gather only the commits introduced since the last checkout.
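A minimal sketch of that incremental pattern, assuming previous_head was recorded at the end of the previous run and new_head is the commit handle observed now:

#![allow(unused)]
fn main() {
// Only the commits introduced since the last run are materialized.
let delta = ws.checkout(previous_head..new_head)?;
// ... process `delta`, then persist `new_head` as the next run's start boundary.
}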

Implemented selectors

CommitSelector is implemented for:

  • CommitHandle – a single commit.
  • Vec<CommitHandle> and &[CommitHandle] – explicit lists of commits.
  • ancestors(commit) – a commit and all of its ancestors.
  • nth_ancestor(commit, n) – follows the first-parent chain n steps.
  • parents(commit) – direct parents of a commit; this and nth_ancestor are shown in the sketch after this list.
  • symmetric_diff(a, b) – commits reachable from either a or b but not both.
  • Set combinators that operate on two selectors:
    • union(left, right) – commits returned by either selector.
    • intersect(left, right) – commits returned by both selectors.
    • difference(left, right) – commits from left that are not also returned by right.
  • Standard ranges: a..b, a.., ..b and .. that stop walking once the start boundary is encountered.
  • filter(selector, predicate) – retains commits for which predicate returns true.
  • history_of(entity) – commits touching a specific entity (built on filter).
  • time_range(start, end) – commits whose timestamps intersect the inclusive range.
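Most of these selectors appear in later snippets; nth_ancestor and parents do not, so the following hedged sketch covers both. Here tip is an assumed commit handle, and the import path mirrors the other selector helpers:

#![allow(unused)]
fn main() {
use triblespace::repo::{nth_ancestor, parents};

// Two steps up the first-parent chain from `tip`.
let grandparent_view = ws.checkout(nth_ancestor(tip, 2))?;

// Just the direct parents of `tip`.
let parent_commits = ws.checkout(parents(tip))?;
}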

The range primitives intentionally diverge from Git's subtraction semantics. a..b walks the history from b toward the start boundary and stops as soon as it rediscovers a commit yielded by a. Workspace checkouts frequently reuse an earlier selector—such as previous_head..new_head—so short-circuiting at the boundary saves re-walking the entire ancestor closure every time the selector runs. When you need Git's behaviour you can wrap the start in ancestors, trading the extra reachability work for parity with git log.

Because selectors already operate on CommitSet patches, composing new behaviour is largely a matter of combining those sets. The existing selectors in this chapter are implemented using the same building blocks that are available to library users, making it straightforward to prototype project-specific combinators without altering the Workspace::checkout API.

Set combinators

union, intersect, and difference wrap two other selectors and forward the results through the equivalent set operations exposed by PATCH. Reach for these helpers when you want to combine selectors without writing a custom CommitSelector implementation. Each helper accepts any selector combination and returns the corresponding CommitSet:

#![allow(unused)]
fn main() {
use triblespace::repo::{ancestors, difference, intersect, union};

// Everything reachable from either branch tip.
let combined = ws.checkout(union(ancestors(main), ancestors(feature)))?;

// Only the commits both branches share.
let shared = ws.checkout(intersect(ancestors(main), ancestors(feature)))?;

// Feature-only commits without the mainline history.
let feature_delta = ws.checkout(difference(ancestors(feature), ancestors(main)))?;
}

Composing selectors

Selectors implement the CommitSelector trait, so they can wrap one another to express complex logic. The pattern is to start with a broad set—often ancestors(ws.head())—and then refine it. The first snippet below layers a time window with an entity filter before handing the selector to Workspace::checkout, and the follow-up demonstrates the built-in intersect selector to combine two existing selectors.

#![allow(unused)]
fn main() {
use hifitime::Epoch;
use triblespace::repo::{filter, history_of, intersect, time_range};

let cutoff = Epoch::from_unix_seconds(1_701_696_000.0); // 2023-12-04 13:20 UTC
let recent = filter(time_range(cutoff, Epoch::now().unwrap()), |_, payload| {
    payload.iter().any(|trible| trible.e() == &my_entity)
});

let relevant = ws.checkout(recent)?;

// Start from the result and zero in on a single entity.
let entity_history = ws.checkout(history_of(my_entity))?;

let recent_entity_commits = ws.checkout(intersect(
    time_range(cutoff, Epoch::now().unwrap()),
    history_of(my_entity),
))?;
}

Filtering commits

The filter selector wraps another selector and keeps only the commits for which a user-provided closure returns true. The closure receives the commit metadata and its payload, allowing inspection of authors, timestamps, or the data itself. Selectors compose, so you can further narrow a range:

#![allow(unused)]
fn main() {
use hifitime::Epoch;
use triblespace::core::repo::{filter, time_range};

let since = Epoch::from_unix_seconds(1_609_459_200.0); // 2021-01-01
let now = Epoch::now().unwrap();
let recent = ws.checkout(filter(time_range(since, now), |_, payload| {
    payload.iter().any(|t| t.e() == &my_entity)
}))?;
}

Higher level helpers can build on this primitive. For example history_of(entity) filters ancestors(HEAD) to commits touching a specific entity:

#![allow(unused)]
fn main() {
let changes = ws.checkout(history_of(my_entity))?;
}

When debugging a complicated selector, start by checking out the wider range and logging the commit metadata. Verifying the intermediate results catches off-by-one errors early and helps spot situations where a filter excludes or includes more history than expected.
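One way to do that without leaving the selector machinery is to wrap the broad range in filter with a closure that logs each candidate and keeps everything. This is a debugging sketch only; it assumes the import paths and the shape of the filter callback match the earlier snippets in this chapter:

#![allow(unused)]
fn main() {
use triblespace::core::repo::{ancestors, filter};

// Log every candidate commit in the wider range while keeping all of them,
// then tighten the predicate once the output looks right.
let inspected = ws.checkout(filter(ancestors(ws.head()), |_, payload| {
    eprintln!("candidate commit with {} tribles", payload.iter().count());
    true
}))?;
}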

Git Comparison

The table below summarizes Git's revision grammar. Each row links back to the official documentation. Forms that rely on reflogs or reference objects other than commits are listed for completeness but are unlikely to be implemented.

Git Syntax      Planned Equivalent          Reference      Status
A               commit(A)                   gitrevisions   Implemented
A^ / A^N        nth_parent(A, N)            gitrevisions   Not planned
A~N             nth_ancestor(A, N)          gitrevisions   Implemented
A^@             parents(A)                  gitrevisions   Implemented
A^!             A minus parents(A)          gitrevisions   Unimplemented
A^-N            A minus nth_parent(A, N)    gitrevisions   Not planned
A^0             commit(A)                   gitrevisions   Implemented
A^{}            deref_tag(A)                gitrevisions   Unimplemented
A^{type}        object_of_type(A, type)     gitrevisions   Not planned: non-commit object
A^{/text}       search_from(A, text)        gitrevisions   Not planned: requires commit message search
:/text          search_repo(text)           gitrevisions   Not planned: requires repository search
A:path          blob_at(A, path)            gitrevisions   Not planned: selects a blob not a commit
:[N:]path       index_blob(path, N)         gitrevisions   Not planned: selects from the index
A..B            range(A, B)                 gitrevisions   Implemented
A...B           symmetric_diff(A, B)        gitrevisions   Implemented
^A              exclude(reachable(A))       gitrevisions   Unimplemented
A@{upstream}    upstream_of(A)              gitrevisions   Not planned: depends on remote config
A@{push}        push_target_of(A)           gitrevisions   Not planned: depends on remote config
A@{N}           reflog(A, N)                gitrevisions   Not planned: relies on reflog history
A@{<date>}      reflog_at(A, date)          gitrevisions   Not planned: relies on reflog history
@{N}            reflog(HEAD, N)             gitrevisions   Not planned: relies on reflog history
@{-N}           previous_checkout(N)        gitrevisions   Not planned: relies on reflog history

Only a subset of Git's revision grammar will likely be supported. Selectors relying on reflog history, remote configuration, or searching commits and blobs add complexity with little benefit for workspace checkout. They are listed above for completeness but remain unplanned for now.

Note: range(A, B) differs subtly from Git's two-dot syntax. It walks parents from B until a commit from A is encountered instead of subtracting the entire ancestor closure of A. Use ancestors(A)..B for Git's behaviour.

TimeRange

Commits record when they were made via a timestamp attribute of type NsTAIInterval. When creating a commit this interval defaults to (now, now) but other tools could provide a wider range if the clock precision is uncertain. The TimeRange selector uses this interval to gather commits whose timestamps fall between two Epoch values:

#![allow(unused)]
fn main() {
use hifitime::Epoch;
use triblespace::repo::time_range;

let since = Epoch::from_unix_seconds(1_609_459_200.0); // 2021-01-01
let now = Epoch::now().unwrap();
let tribles = ws.checkout(time_range(since, now))?;
}

This walks the history from HEAD and returns only those commits whose timestamp interval intersects the inclusive range.

Internally it uses filter(ancestors(HEAD), ..) to check each commit's timestamp range.

Garbage Collection and Forgetting

Repositories grow over time as commits, branch metadata, and user blobs accumulate. Because every blob is content addressed and immutable, nothing is ever overwritten and there is no automatic reclamation when branches move or objects become orphaned. To keep disk usage in check a repository can periodically forget blobs that are no longer referenced.

Forgetting is deliberately conservative. It only removes local copies, so re-synchronising from a peer or pushing a commit that references a "forgotten" blob will transparently restore it. Forgetting therefore complements the monotonic model: history never disappears globally, but any node can opt out of retaining data it no longer needs.

The main challenge is deciding which blobs are still reachable without reconstructing every TribleSet. The sections below outline how the repository module solves that problem and how you can compose the building blocks in your own tools.

Understanding the Roots

The walk begins with a root set—the handles you know must stay alive. In a typical repository this includes the metadata blob for each branch (which in turn names the commit heads), tags, or any additional anchors your deployment requires. Roots are cheap to enumerate: walk the branch store via BranchStore::branches and load each branch head, or read the subset of metadata relevant to the retention policy you are enforcing. Everything reachable from those handles will be retained by the traversal; everything else is eligible for forgetting.

Conservative Reachability

Every commit and branch metadata record is stored as a SimpleArchive. The archive encodes a canonical TribleSet as 64-byte tribles, each containing a 32-byte value column. The blob store does not track which handles correspond to archives, so the collector treats every blob identically: it scans the raw bytes in 32-byte chunks and treats each chunk as a candidate handle. Chunks that are not value columns—for example the combined entity/attribute half of a trible or arbitrary attachment bytes—are discarded when the candidate lookup fails. If a chunk matches the hash of a blob in the store we assume it is a reference, regardless of the attribute type. With 32-byte hashes the odds of a random collision are negligible, so the scan may keep extra blobs but will not drop a referenced one.

Content blobs that are not SimpleArchive instances (for example large binary attachments) therefore behave as leaves: the traversal still scans them, but because no additional lookups succeed they contribute no further handles. They become reachable when some archive references their handle and are otherwise eligible for forgetting.

Traversal Algorithm

  1. Enumerate all branches and load their metadata blobs.
  2. Extract candidate handles from the metadata. This reveals the current commit head along with any other referenced blobs.
  3. Recursively walk the discovered commits and content blobs. Each blob is scanned in 32-byte steps; any chunk whose lookup succeeds is enqueued instead of deserialising the archive.
  4. Stream the discovered handles into whatever operation you need. The reachable helper returns an iterator of handles, so you can retain them, transfer them into another store, or collect them into whichever structure your workflow expects.

Because the traversal is purely additive you can compose additional filters or instrumentation as needed—for example to track how many objects are held alive by a particular branch or to export a log of missing blobs for diagnostics.

Automating the Walk

The repository module already provides most of the required plumbing. The reachable helper exposes the traversal as a reusable iterator so you can compose other operations along the way, while transfer duplicates whichever handles you feed it. The in-memory MemoryBlobStore can retain live blobs, duplicate them into a scratch store, and report how many handles were touched without writing bespoke walkers:

#![allow(unused)]
fn main() {
use triblespace::core::blob::memoryblobstore::MemoryBlobStore;
use triblespace::core::repo::{self, BlobStoreKeep, BlobStoreList, BranchStore};
use triblespace::core::value::schemas::hash::Blake3;

let mut store = MemoryBlobStore::<Blake3>::default();
// ... populate the store or import data ...

let mut branch_store = /* your BranchStore implementation */;
let reader = store.reader()?;

// Collect the branch metadata handles we want to keep alive.
let mut roots = Vec::new();
for branch_id in branch_store.branches()? {
    if let Some(meta) = branch_store.head(branch_id?)? {
        roots.push(meta.transmute());
    }
}

// Trim unreachable blobs in-place.
store.keep(repo::reachable(&reader, roots.clone()));

// Optionally copy the same reachable blobs into another store.
let mut scratch = MemoryBlobStore::<Blake3>::default();
let visited = repo::reachable(&reader, roots.clone()).count();
let mapping: Vec<_> = repo::transfer(
    &reader,
    &mut scratch,
    repo::reachable(&reader, roots),
)
.collect::<Result<_, _>>()?;

println!("visited {} blobs, copied {}", visited, mapping.len());
println!("rewrote {} handles", mapping.len());
}

In practice you will seed the walker with the handles extracted from branch metadata or other root sets instead of iterating the entire store. The helper takes any IntoIterator of handles, so once branch heads (and other roots) have been identified, they can be fed directly into the traversal without writing custom queues or visitor logic. Passing the resulting iterator to MemoryBlobStore::keep or repo::transfer makes it easy to implement mark-and-sweep collectors or selective replication pipelines without duplicating traversal code.

When you already have metadata represented as a TribleSet, the potential_handles helper converts its value column into the conservative stream of Handle<H, UnknownBlob> instances expected by these operations.
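A hedged sketch of that path, reusing the store and reader from the example above and assuming potential_handles is exported next to the other repo helpers, borrows the TribleSet, and yields handles that reachable and keep accept (branch_metadata is a hypothetical in-memory TribleSet):

#![allow(unused)]
fn main() {
// `branch_metadata` is an in-memory TribleSet describing the roots you care about.
let roots = repo::potential_handles(&branch_metadata);

// Retain everything reachable from those conservatively extracted handles.
store.keep(repo::reachable(&reader, roots));
}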

Operational Tips

  • Schedule forgetting deliberately. Trigger it after large merges or imports rather than on every commit so you amortise the walk over meaningful changes.
  • Watch available storage. Because forgetting only affects the local node, replicating from a peer may temporarily reintroduce forgotten blobs. Consider monitoring disk usage and budgeting headroom for such bursts.
  • Keep a safety margin. If you are unsure whether a handle should be retained, include it in the root set. Collisions between 32-byte handles are effectively impossible, so cautious root selection simply preserves anything that might be referenced.

Future Work

The public API for triggering garbage collection is still evolving. The composition-friendly walker introduced above is one building block; future work could layer additional convenience helpers or integrate with external retention policies. Conservative reachability by scanning SimpleArchive bytes remains the foundation for safe space reclamation.

Glossary

This chapter collects the core terms that appear throughout the book. Skim it when you encounter unfamiliar terminology or need a refresher on how concepts relate to one another in Trible Space.

Attribute

A property that describes some aspect of an entity. Attributes occupy the middle position in a trible and carry the ValueSchema (or blob-handle schema) that interprets and validates the value. Modules mint them with the attributes! macro, so they behave like detached struct fields: each attribute remains independently typed even when many are combined to describe the same entity, preserving its individual semantics. Provide an explicit 128-bit id in the macro when you need a canonical column shared across crates or languages; omit the literal to derive a deterministic id from the attribute name and value schema (the macro calls Attribute::from_name for you), which is handy for short-lived or internal attributes.

Blob

An immutable chunk of binary data addressed by the hash of its contents. Blobs store payloads that do not fit in the fixed 32-byte value slot—long strings, media assets, archived TribleSets, commit metadata, and other large artifacts. Each blob is tagged with a BlobSchema so applications can decode it back into native types.

Blob Store

An abstraction that persists blobs. Implementations back local piles, in-memory workspaces, or remote object stores while presenting a common BlobStore interface that handles hashing, deduplication, and retrieval.

Commit

A signed snapshot of repository state. Commits archive a TribleSet describing the workspace contents and store metadata such as parent handles, timestamps, authors, signatures, and optional messages. The metadata itself lives in a SimpleArchive blob whose hash becomes the commit handle.

Commit Selector

A query primitive that walks a repository’s commit graph to identify commits of interest. Selectors power history traversals such as parents, nth_ancestor, ranges like a..b, and helpers such as history_of(entity).

Entity

The first position in a trible. Entities identify the subject making a statement and group the attributes asserted about it. They are represented by stable identifiers minted from namespaces or ID owners—not by hashing their contents—so multiple facts about the same subject cohere without revealing how the identifier was derived. Ownership controls who may mint new facts for a given identifier.

PATCH

The Persistent Adaptive Trie with Cuckoo-compression and Hash-maintenance. Each PATCH stores all six permutations of a trible set in a 256-ary trie whose nodes use byte-oriented cuckoo hash tables and copy-on-write semantics. Shared leaves keep permutations deduplicated, rolling hashes let set operations skip unchanged branches, and queries only visit the segments relevant to their bindings, mirroring the behaviour described in the deep-dive chapter.

Pile

An append-only collection of blobs and branch records stored in a single file. Piles act as durable backing storage for repositories, providing a write-ahead-log style format that can be memory mapped, repaired after crashes, and safely shared between threads.

Repository

The durable record that ties blob storage, branch metadata, and namespaces together. A repository coordinates synchronization, replication, and history traversal across commits while enforcing signatures and branch ownership.

Schema

The set of attribute declarations and codecs that document and enforce the shape of data in Trible Space. Schemas assign language-agnostic meaning to the raw bytes—they are not the concrete Rust types—so any implementation that understands the schema can interpret the payloads consistently. Value schemas map the fixed 32-byte payload of a trible to native types, while blob schemas describe arbitrarily long payloads so tribles referencing those blobs stay portable.

Trible

A three-part tuple of entity, attribute, and value stored in a fixed 64-byte layout. Tribles capture atomic facts, and query engines compose them into joins and higher-order results.

Trible Space

The overall storage model that organises tribles across blobs, PATCHes, and repositories. It emphasises immutable, content-addressed data, monotonic set semantics, and familiar repository workflows.

Value

The third position in a trible. Values store a fixed 32-byte payload interpreted through the attribute’s schema. They often embed identifiers for related entities or handles referencing larger blobs.

Workspace

A mutable working area for preparing commits. Workspaces track staged trible sets and maintain a private blob store so large payloads can be uploaded before publishing. Once a commit is finalised it becomes immutable like the rest of Trible Space.

Documentation Improvement Ideas

This chapter is a roadmap for the next iteration of the book. Each subsection summarises a gap we discovered while reviewing the crate and outlines the minimal content that would help readers apply the feature in practice. When you pick one of these items up, try to produce a runnable example (or at least executable pseudocode) so the section teaches a concrete workflow rather than a theory sketch.

High-priority topics

The following themes unblock common deployment or operational scenarios and should be tackled first when planning documentation work:

Remote object stores

repo::objectstore::ObjectStoreRemote::with_url wires the repository into object_store services such as S3, local filesystems or Azure storage. The future chapter should walk through credential configuration, namespace selection, and pairing the remote backend with other stores (for example via HybridStore). It also needs to call out how branch updates rely on PutMode::Update/UpdateVersion retries, how conflicts bubble up to callers, and how listings stream through BlockingIter so readers know what consistency guarantees to expect. (See src/repo/objectstore.rs, lines 108–316.)

Hybrid storage recipes

repo::hybridstore::HybridStore mixes a blob store with a separate branch store. Documenting a few reference layouts—remote blobs with local branches, piles with in-memory branches, or even two-tier caches—will help teams evaluate trade-offs quickly. (See src/repo/hybridstore.rs, lines 1–86.)

Signature verification

Both repo::commit::verify and repo::branch::verify expose helpers for validating signed metadata before accepting remote history. A hands-on example should explain when to perform verification, how to surface failures to callers, and which key material needs to be distributed between collaborators. (See src/repo/commit.rs, lines 84–122, and src/repo/branch.rs, lines 95–136.)

Repository migration helpers

repo::transfer rewrites whichever handles you feed it and returns the old and new identifiers so callers can update references. A migration recipe could show how to collect handles from BlobStoreList::blobs() for full copies or from reachable when only live data should be duplicated. Highlight how the helper fits into a scripted maintenance window. (See src/repo.rs, lines 394–516.)

Conservative GC tooling

The garbage-collection chapter covers the high-level approach, but it should also reference concrete APIs such as repo::reachable, repo::transfer, and MemoryBlobStore::keep. Describing how to compute and retain the reachable set in code makes it easier to embed the GC workflow into automated jobs. (See src/repo.rs, lines 394–516, and src/blob/memoryblobstore.rs, lines 169–210.)

Emerging capabilities

These topics are less urgent but still deserve coverage so that readers can reuse advanced building blocks without digging through source code.

Succinct archive indexes

blob::schemas::succinctarchive::SuccinctArchive converts a TribleSet into compressed wavelet matrices, exposes helpers such as distinct_in and enumerate_in, implements TriblePattern, and serialises via ordered, compressed or cached Universe implementations. A dedicated section should walk through building an archive from a set, choosing a universe, storing it as a blob, and querying it directly through SuccinctArchiveConstraint so readers can reuse the on-disk index without round-tripping through TribleSet conversions. (See src/blob/schemas/succinctarchive.rs, lines 100–529; src/blob/schemas/succinctarchive/universe.rs, lines 16–265; and src/blob/schemas/succinctarchive/succinctarchiveconstraint.rs, lines 9–200.)

Extensible path engines

Regular path queries run through RegularPathConstraint, which delegates edge checks to the PathEngine trait. The book should document how the built-in ThompsonEngine constructs NFAs from a TribleSet and demonstrate how to plug in alternative engines backed by other graph stores so readers can extend the regex-based traversal beyond in-memory datasets. (See src/query/regularpathconstraint.rs, lines 1–200.)

How to keep this list fresh

Treat these notes as a living backlog. Whenever a new subsystem lands, ask yourself whether it needs a discoverability guide, a tutorial or a troubleshooting section. Update this chapter with the gaps you observe, and link to the relevant modules so future contributors can jump straight into the implementation.

Formal Verification Roadmap

This roadmap captures the initial strategy for driving the triblespace crates toward comprehensive formal verification. It unifies model checking, symbolic execution, fuzzing, and deterministic simulation so we can reason about both the low-level data structures and high-level repository workflows with stronger correctness guarantees.

Verification Stack Overview

  • Model checking with Kani explores bounded but exhaustive state spaces for invariants that must never be violated.
  • Symbolic execution with Miri exposes undefined behaviour (UB) and aliasing issues across regular unit tests without requiring new harnesses.
  • Coverage-guided fuzzing stresses APIs with randomized input sequences to uncover emergent behaviours that formal proofs might miss.
  • Deterministic simulations replay realistic repository workflows so we can audit higher-level semantics and regression-test subtle interplays between subsystems.

Each technique complements the others; together they provide layered assurance that keeps regressions from reaching downstream users.

Goals

  • Protect the fundamental algebraic properties of TribleSet, PATCH, and the repository commit graph.
  • Exercise serialization, deserialization, and zero-copy data views under adversarial inputs.
  • Detect behavioural regressions in query heuristics, constraint solving, and workspace merging before they reach downstream users.
  • Integrate the tooling into CI so proofs and regression checks run automatically for every change.
  • Preserve a contributor-friendly workflow where verification steps are discoverable, well documented, and quick to reproduce locally.

Current Foundation

  • proofs/ already contains Kani harnesses for query, value, and variable-set behaviour. They provide examples of bounded nondeterministic data generation (kani::any, Value::new) and assume/guarantee reasoning that new harnesses can reuse.
  • ./scripts/preflight.sh is the aggregation point for formatting and tests; adding verification steps here keeps contributor workflows consistent.

Invariant Catalogue

The roadmap anchors future work around the following invariants. Each entry names the subsystem we care about, the guarantees we want to encode, and a rough sketch of how to exercise them in Kani, Miri, or fuzzing harnesses.

  • TribleSet (src/trible/tribleset.rs)
    Key invariants: Union/intersection/difference maintain canonical ordering across all six PATCH indexes; iterators only yield deduplicated Tribles; insert never drops an ordering.
    Candidate harness or check: Extend the existing variableset harnesses with nondeterministic inserts, and add a dedicated tribleset_harness.rs validating round-trips across every ordering.
  • PATCH & ByteTable (src/patch/*.rs)
    Key invariants: Cuckoo displacement respects MAX_RETRIES without losing entries; Branch::modify_child grows tables when required and preserves leaf_count/segment_count; table_grow copies every occupant exactly once.
    Candidate harness or check: Introduce a patch_harness.rs that stress-tests plan_insert, table_insert, and Branch::grow, plus a micro-fuzzer that drives inserts/removals across random table sizes.
  • Value schemas (src/value/*.rs)
    Key invariants: Schema encoders respect declared byte widths; TryFromValue conversions and ValueSchema::validate reject truncated buffers; zero-copy views stay aligned.
    Candidate harness or check: Reuse value_harness.rs, adding per-schema helpers plus a Miri regression suite that loads slices at every alignment.
  • Query engine (src/query/*.rs)
    Key invariants: Constraint solver never aliases conflicting bindings; planner outputs cover all join permutations referenced by pattern!; influence tracking matches selected variables.
    Candidate harness or check: Expand proofs/query_harness.rs with minimal counterexamples, and fuzz constraint graphs via cargo fuzz.
  • Repository & commits (src/repo/*.rs, proofs/commit_harness.rs)
    Key invariants: Branch heads remain append-only; Workspace::pull never forgets reachable blobs; selector algebra matches Git semantics.
    Candidate harness or check: Add bounded commit DAG generators in commit_harness.rs plus deterministic simulation traces covering merges and garbage collection.
  • Storage primitives (src/blob, src/repo, src/patch/leaf.rs)
    Key invariants: Blob handles stay reference counted; pile headers remain within reserved capacity; byte slices from archives stay valid for the life of the store.
    Candidate harness or check: Combine Miri tests for aliasing with nightly fuzzers that replay repository sync transcripts.

Expansion Plan

Phase 1 – Harden the Existing Kani Coverage

  1. Catalogue crate-level invariants and map them to concrete Kani harnesses. Start with:
    • TribleSet operations preserving canonical ordering and deduplication.
    • Join heuristics in atreides ensuring variable bindings never alias conflicting values.
    • Repository merge logic maintaining append-only pile semantics.
  2. Extract shared helpers for generating bounded arbitrary data (e.g. Vec::bounded_any) so harnesses remain expressive without exploding the search space.
  3. Adopt a per-module harness layout (proofs/<module>_harness.rs) registered from proofs/mod.rs to make maintenance predictable.
  4. Configure scripts/verify.sh to run targeted cargo kani --harness <name> invocations in parallel, then wire it into CI with caching to keep runtimes manageable.

Phase 2 – Symbolic Execution with Miri

  1. Enable cargo miri test for the default test suite to surface undefined behaviour (UB) and aliasing bugs that regular tests may miss.
  2. Gate flaky or unsupported tests with cfg(miri) guards so the suite stays deterministic under the interpreter (see the sketch after this list).
  3. Document the workflow in scripts/preflight.sh and optionally expose a dedicated scripts/miri.sh for local runs when developers need deeper debugging.
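A minimal example of such a guard, using standard Rust attribute syntax and a hypothetical test name:

#[test]
#[cfg_attr(miri, ignore)] // skipped under Miri: exercises file I/O the interpreter cannot model
fn pile_roundtrip_on_disk() {
    // ... regular test body ...
}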

Contributor Workflow

  1. Run ./scripts/preflight.sh before every commit; it aggregates formatting, testing, and (eventually) targeted verification checks.
  2. Use cargo kani --harness <NAME> locally when iterating on a new proof. Start from the harness templates in proofs/ so generators and assumptions stay consistent.
  3. Execute cargo miri test after modifying unsafe code, pointer logic, or concurrency primitives; it catches UB bugs that normal tests cannot surface.
  4. Kick off fuzz targets with cargo fuzz run <TARGET> when touching boundary code (deserializers, planners, repository sync). Store new corpus inputs in version control if they expose bugs or tricky behaviours.
  5. Record findings, gaps, and future work in INVENTORY.md so the roadmap evolves alongside the implementation effort.

Phase 3 – Fuzzing and Property Testing

  1. Introduce a cargo fuzz workspace targeting:
    • PATCH encoders/decoders with binary corpus seeds generated from integration tests.
    • Join-order heuristics to explore combinations of constraint graphs and filter predicates.
    • Repository sync workflows by fuzzing sequences of commits, pulls, and merges.
  2. Reuse structured generators from proptest where deterministic shrinking is valuable, and bridge them into fuzz harnesses when possible to keep the state space constrained.
  3. Automate nightly or on-demand fuzz campaigns via CI artifacts, storing any found counterexamples alongside minimised reproducers.

Phase 4 – Deterministic Simulation Testing

  1. Model repository replication scenarios with deterministic event queues to explore conflict resolution, garbage collection, and concurrent writers.
  2. Encode the simulations as regular unit tests backed by recorded execution traces so they can double as documentation for expected behaviour.
  3. Capture simulation scenarios discovered during fuzzing to prevent regressions.

Milestones & Reporting

  • Track coverage for each invariant in a shared dashboard (CI summary or INVENTORY.md) so contributors can quickly spot gaps.
  • Celebrate major wins—like a new harness landing or a bug found via verification—in the CHANGELOG to reinforce the value of the effort.
  • Review and refresh this roadmap at least once per release cycle to keep the guidance aligned with the architecture.

Tooling Integration

  • Track verification status in CI badges and documentation so contributors know which guarantees currently hold.
  • Extend INVENTORY.md with follow-up work items whenever new invariants or subsystems are identified.
  • Keep verification-specific configuration (Kani property files, fuzz corpora, deterministic seeds) under version control to make runs reproducible.

Next Steps

  1. Break the invariant catalogue into GitHub issues that reference the planned harness files (proofs/tribleset_harness.rs, proofs/patch_harness.rs, etc.).
  2. Prototype the PATCH harness that drives Branch::modify_child through insertion/growth cycles so we can assert the displacement planner and table_grow never drop entries; wire the run into scripts/verify.sh.
  3. Evaluate CI capacity to determine how frequently Kani proofs, cargo miri, and fuzzers can run without blocking contributors, documenting the cadence directly in INVENTORY.md.

This roadmap should evolve alongside the codebase—update it whenever new verification opportunities or obstacles appear.

Deep Dive

This section contains in-depth discussions about Trible Space internals.

Philosophy

Triblespace was designed to feel approachable without sacrificing rigor. This chapter collects the guiding values that shape everything from the storage format to the public APIs. Treat it as a companion to the rest of the deep-dive sections: when you wonder "why does it work this way?", the answer usually traces back to one of these principles.

Clarity before cleverness

We favour predictable, mechanically simple components over opaque heuristics. Each subsystem should be understandable on its own, with behaviours that are obvious when composed with the rest of the stack. When a trade-off appears between a clever optimisation and debuggability, we err on the side of the latter and document the costs so future work can revisit the decision with better evidence.

Productive developer experience

APIs should read like regular Rust. Where backends demand asynchronous capabilities—such as object-store repositories—we wrap them in blocking entry points so typical workflows stay synchronous while still supporting advanced integrations. Well-documented patterns and composable macros let readers experiment in a REPL or test harness without extra scaffolding, and examples in the book mirror the crates users import so copy-and-paste snippets behave as advertised.

Soundness and data integrity

The engine must reject malformed data early, surface explicit error paths, and make invariants easy to audit. Safety checks live close to the data structures that rely on them, and proofs or tests accompany subtle invariants when feasible. Correctness remains the baseline for every optimisation.

Performance with headroom

Efficient data structures keep the core fast, but we prioritise predictable latency over micro-benchmarks that complicate maintenance. Hot paths receive focused tuning backed by benchmarks so we can understand the impact of each change.

Practical implications

These principles surface in day-to-day workflows:

  • Documentation favours runnable snippets and end-to-end walkthroughs, lowering the barrier to experimentation.
  • Internal abstractions expose minimal, intentional APIs, reducing the amount of context a contributor needs before making a change.
  • Tooling—such as the preflight.sh convenience script, targeted Kani verification harnesses, and runnable doc tests—keeps quality checks accessible so they are run regularly rather than only in CI.

Taken together, the philosophy is simple: build a reliable system whose pieces are easy to reason about, teach, and extend.

Identifiers for Distributed Systems

Distributed systems are assembled from independently authored pieces of data. Keeping those pieces addressable requires names that survive replication, concurrent edits, and local conventions. We have found it useful to categorize identifier schemes along two axes:

              Abstract                  Semantic
Intrinsic     Hash, Signature, PubKey   Embeddings
Extrinsic     UUID, UFOID, FUCID        Names, DOI, URL

The rows describe how an identifier is minted while the columns describe what it tries to communicate. Classifying an identifier along both axes makes its trade-offs explicit and helps decide how to combine multiple schemes for a given workflow.

Abstract vs. Semantic Identifiers

Semantic identifiers

Semantic identifiers (names, URLs, descriptive labels, embeddings) carry meaning about the thing they reference. That context makes them convenient for humans and for search workflows where the identifier itself narrows the scope of possible matches. Their usefulness comes with caveats:

  • Semantics drift. A name that once felt accurate can become misleading as an entity evolves.
  • Semantic identifiers are rarely unique. They work best as entry points into a richer dataset rather than as the source of truth for identity.
  • Without scoping, semantics clash. Two communities can reasonably reuse the same word for different concepts, so the identifier must be tied to a namespace, deployment, or audience.

Distributed systems reconcile those tensions by mapping many local semantic names to a persistent abstract identifier. Users remain free to adopt whatever terminology makes sense to them while the system maintains a shared canonical identity.

Embeddings deserve a special mention. They encode meaning in a machine-friendly form that can be compared for similarity instead of exact equality. That makes them great for recommendations and clustering but still unsuitable as primary identifiers: two distinct entities can legitimately share similar embeddings, and embeddings can change whenever the underlying model is retrained.

Abstract identifiers

Abstract identifiers (UUIDs, UFOIDs, FUCIDs, hashes, signatures) strip all meaning away in favor of uniqueness. They can be minted without coordination, usually by drawing from a high-entropy space and trusting probability to keep collisions effectively impossible. Abstract identifiers shine when you need:

  • Stable handles that survive across replicas and through refactors.
  • Globally unique names without a centralized registrar.
  • Cheap, constant-time generation so every component can allocate identifiers on demand.

Because they carry no inherent semantics, abstract identifiers are almost always paired with richer metadata. They provide the skeleton that keeps references consistent while semantic identifiers supply the narrative that humans consume.

Intrinsic vs. Extrinsic Identifiers

The intrinsic/extrinsic axis captures whether an identifier can be recomputed from the entity itself or whether it is assigned externally.

Intrinsic identifiers

Intrinsic identifiers (cryptographic hashes, digital signatures, content-based addresses) are derived from the bytes they describe. They function as fingerprints: if two values share the same intrinsic identifier then they are bit-for-bit identical. This property gives us:

  • Immutability. Changing the content produces a different identifier, which immediately signals tampering or corruption.
  • Self-validation. Replicas can verify received data locally instead of trusting a third party.
  • Stronger adversarial guarantees. Because an attacker must find collisions deliberately, intrinsic identifiers rely on cryptographic strength rather than purely statistical rarity.

Extrinsic identifiers

Extrinsic identifiers (names, URLs, DOIs, UUIDs, UFOIDs, FUCIDs) are assigned by policy instead of by content. They track a conceptual entity as it evolves through versions, formats, or migrations. In other words, extrinsic identifiers carry the "story" of a thing while intrinsic identifiers nail down individual revisions.

Thinking about the classic ship of Theseus thought experiment makes the distinction concrete: the restored ship and the reconstructed ship share the same extrinsic identity (they are both "Theseus' ship") but have different intrinsic identities because their planks differ.

Embeddings as Semantic Intrinsic Identifiers

Embeddings blur our neat taxonomy. They are intrinsic because they are computed from the underlying data, yet they are overtly semantic because similar content produces nearby points in the embedding space. That duality makes them powerful for discovery:

  • Systems can exchange embeddings as a "lingua franca" without exposing raw documents.
  • Expensive feature extraction can happen once and power many downstream indexes, decentralizing search infrastructure.
  • Embeddings let us compare otherwise incomparable artifacts (for example, a caption and an illustration) by projecting them into a shared space.

Despite those advantages, embeddings should still point at a durable abstract identifier rather than act as the identifier. Collisions are expected, model updates can shift the space, and floating-point representations can lose determinism across hardware.

High-Entropy Identifiers

For a truly distributed system, the creation of identifiers must avoid the bottlenecks and overhead associated with a central coordinating authority. At the same time, we must ensure that these identifiers are unique.

To guarantee uniqueness, we use abstract identifiers containing a large amount of entropy, making collisions statistically irrelevant. However, the entropy requirements differ based on the type of identifier:

  • Extrinsic abstract identifiers need enough entropy to prevent accidental collisions in normal operation.
  • Intrinsic abstract identifiers must also resist adversarial forging attempts, requiring significantly higher entropy.

From an information-theoretic perspective, the length of an identifier determines the maximum amount of entropy it can encode. For example, a 128-bit identifier can represent 2^128 unique values, which is sufficient to make collisions statistically negligible even for large-scale systems.
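
To put a rough number on that claim, the sketch below applies the birthday approximation (collision probability ≈ n² / 2^129 for n uniformly random 128-bit identifiers); the trillion-identifier figure is just an illustrative assumption.

fn main() {
    // Birthday approximation: for n uniformly random 128-bit identifiers the
    // probability of any collision is roughly n^2 / 2^129.
    let n: f64 = 1e12; // an (assumed) trillion identifiers minted system-wide
    let p = n * n / 2f64.powi(129);
    println!("collision probability ≈ {p:e}"); // on the order of 1e-15
}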

For intrinsic identifiers, 256 bits is widely considered sufficient when modern cryptographic hash functions (e.g., SHA-256) are used. These hash functions provide strong guarantees of collision resistance, preimage resistance, and second-preimage resistance. Even in the event of weaknesses being discovered in a specific algorithm, it is more practical to adopt a new hash function than to increase the bit size of identifiers.

Additionally, future advances such as quantum computing are unlikely to undermine this length. Grover's algorithm would halve the effective security of a 256-bit hash, reducing it to 2^128 operations—still infeasible with current or theoretical technology. As a result, 256 bits remains a future-proof choice for intrinsic identifiers.

Such 256-bit intrinsic identifiers are represented by the types triblespace::core::value::schemas::hash::Hash and triblespace::core::value::schemas::hash::Handle.

Not every workflow needs cryptographic strength. We therefore ship three high-entropy abstract identifier families—RNGID, UFOID, and FUCID—that keep 128 bits of global uniqueness while trading off locality, compressibility, and predictability to suit different scenarios.

Comparison of Identifier Types

                           RNGID      UFOID                         FUCID
Global entropy             128 bits   96 bits random + timestamp    128 bits
Locality                   None       High (time-ordered)           High (monotonic counter)
Compression friendliness   None       Low                           High
Predictability             None       Low (reveals mint time)       High (per-source sequence)
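
The sketch below mints one identifier from each family. ufoid() and fucid() appear throughout this book; the rngid() constructor is written here by analogy and should be treated as an assumption—check the crate documentation for the exact name.

#![allow(unused)]
fn main() {
use triblespace::prelude::*;

let time_ordered = ufoid();   // 96 bits of randomness plus a timestamp for locality
let compressible = fucid();   // monotonic per-source counter, compresses well
let unpredictable = rngid();  // assumed constructor: 128 bits of pure randomness
}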

Example: Scientific Publishing

Consider the case of published scientific papers. Each artifact, such as a .html or .pdf file, should be identified by its abstract intrinsic identifier, typically a cryptographic hash of its content. This ensures that any two entities referencing the same hash are referring to the exact same version of the artifact, providing immutability and validation.

Across different versions of the same paper, an abstract extrinsic identifier can tie these artifacts together as part of one logical entity. The identifier provides continuity regardless of changes to the paper’s content over time.

Semantic (human-readable) identifiers, such as abbreviations in citations or bibliographies, are scoped to individual papers and provide context-specific usability for readers. These names do not convey identity but serve as a way for humans to reference the persistent abstract identifiers that underlie the system.

Sadly the identifiers used in practice, such as DOIs, fail to align with these principles and strengths. They attempt to provide global extrinsic semantic identifiers for scientific papers, an ultimately flawed approach. They lack the associated guarantees of intrinsic identifiers and bring all the challenges of semantic identifiers. With their scope defined too broadly and their authority centralized, they fail to live up to the potential of distributed systems.

ID Ownership

In distributed systems, consistency requires monotonicity due to the CALM principle. However, this is not necessary for single writer systems. By assigning each ID an owner, we ensure that only the current owner can write new information about an entity associated with that ID. This allows for fine-grained synchronization and concurrency control.

To create a transaction, you can uniquely own all entities involved and write new data for them simultaneously. Since there can only be one owner for each ID at any given time, you can be confident that no other information has been written about the entities in question.

By default, all minted ExclusiveIds are associated with the thread they are dropped from. These IDs can be found in queries via the local_ids function.

Once the IDs are back in scope you can either work with them directly as ExclusiveIds or move them into an explicit IdOwner for a longer lived transaction. The example below shows both approaches in action:

#![allow(unused)]
fn main() {
use triblespace::examples::literature;
use triblespace::prelude::*;

let mut kb = TribleSet::new();
{
    let isaac = ufoid();
    let jules = ufoid();
    kb += entity! { &isaac @
        literature::firstname: "Isaac",
        literature::lastname: "Asimov",
    };
    kb += entity! { &jules @
        literature::firstname: "Jules",
        literature::lastname: "Verne",
    };
} // `isaac` and `jules` fall back to this thread's implicit IdOwner here.

let mut txn_owner = IdOwner::new();
let mut updates = TribleSet::new();

for (author, name) in find!(
    (author: ExclusiveId, name: String),
    and!(
        local_ids(author),
        pattern!(&kb, [{
            ?author @ literature::firstname: ?name
        }])
    )
) {
    // `author` is an ExclusiveId borrowed from the implicit thread owner.
    let author_id = txn_owner.insert(author);

    {
        let borrowed = txn_owner
            .borrow(&author_id)
            .expect("the ID was inserted above");
        updates += entity! { &borrowed @ literature::lastname: name.clone() };
    } // `borrowed` drops here and returns the ID to `txn_owner`.
}
}

Sometimes you want to compare two attributes without exposing the comparison variable outside the pattern. Prefixing the binding with _?, such as _?name, allocates a scoped variable local to the macro invocation. Both pattern! and pattern_changes! will reuse the same generated query variable whenever the _? form appears again, letting you express equality constraints inline without touching the outer find! signature.
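
The following sketch illustrates the scoped form; it assumes that listing several attributes under the same ?author @ binding and reusing _?same both work as described above, so treat the exact macro syntax as an approximation rather than a definitive reference.

#![allow(unused)]
fn main() {
use triblespace::examples::literature;
use triblespace::prelude::*;

let kb = TribleSet::new();

// Both occurrences of `_?same` expand to one generated query variable, so this
// matches authors whose first and last names are equal without adding a
// binding to the `find!` signature.
for (author,) in find!(
    (author: Id),
    pattern!(&kb, [{
        ?author @
        literature::firstname: _?same,
        literature::lastname: _?same
    }])
) {
    let _ = author;
}
}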

Binding the variable as an ExclusiveId means the closure that find! installs will run the FromValue implementation for ExclusiveId. FromValue simply unwraps TryFromValue, which invokes Id::aquire and would panic if the current thread did not own the identifier. The local_ids constraint keeps the query safe by only enumerating IDs already owned by this thread. In the example we immediately move the acquired guard into txn_owner, enabling subsequent calls to IdOwner::borrow that yield OwnedIds. Dropping an OwnedId automatically returns the identifier to its owner so you can borrow it again later. If you only need the ID for a quick update you can skip the explicit owner entirely, bind the variable as a plain Id, and call Id::aquire when exclusive access is required.

Ownership and Eventual Consistency

While a simple grow set like the history stored in a Head already constitutes a conflict-free replicated data type (CRDT), it is also limited in expressiveness. To provide richer semantics while guaranteeing conflict-free mergeability, we allow only "owned" IDs to be used in the entity position of newly generated tribles. Because owned IDs are Send but not Sync, owning a set of them essentially constitutes a single-writer transaction domain, which permits some non-monotonic operations, such as if-does-not-exist checks, over the set of contained entities. Note that this does not make operations that would break CALM (consistency as logical monotonicity) safe, e.g. delete.

Tribles

The trible module defines the Trible struct, the smallest unit of knowledge the system stores. Instances of Trible live inside TribleSets, which index each fact in several complementary ways so that queries can be answered with as little work as possible.

┌────────────────────────────64 byte───────────────────────────┐
┌──────────────┐┌──────────────┐┌──────────────────────────────┐
│  entity-id   ││ attribute-id ││        inlined value         │
└──────────────┘└──────────────┘└──────────────────────────────┘
└────16 byte───┘└────16 byte───┘└────────────32 byte───────────┘
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶

At a high level a trible is a three-tuple consisting of an entity, an attribute, and a value. The entity and attribute are both 128‑bit abstract extrinsic identifiers as described in crate::id, while the value is an arbitrary 256‑bit crate::value::Value. The value width deliberately matches the size of an intrinsic identifier so larger payloads can be referenced via blobs without inflating the inlined representation.

Abstract identifiers

Entities and attributes are purely extrinsic; their identifiers do not encode any meaning beyond uniqueness. An entity may accrue additional tribles over time and attributes simply name relationships without prescribing a schema. This keeps the format agnostic to external ontologies and minimises accidental coupling between datasets.

The value slot can carry any 256‑bit payload. Its size is dictated by the need to embed an intrinsic identifier for out‑of‑line data. When a fact exceeds this space the value typically stores a blob handle pointing to the larger payload.

Tribles are stored as a contiguous 64‑byte array with the entity occupying the first 16 bytes, the attribute the next 16, and the value the final 32 bytes. The name "trible" is a portmanteau of triple and byte and is pronounced like "tribble" from Star Trek – hence the project's mascot, Robert the tribble. This rigid layout keeps the representation friendly to SIMD optimisations and allows the storage layer to compute sizes deterministically.

Index permutations

TribleSets index each fact under all six permutations of entity (E), attribute (A) and value (V) so any combination of bound variables can be resolved efficiently. Regardless of which columns a query fixes the planner can reach matching leaves with a handful of comparisons:

┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐
│ EAV │  │ EVA │  │ AEV │  │ AVE │  │ VEA │  │ VAE │
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   │        │        │        │        │        │
┌───────────────────────────────────────────────────────┐
│            order-specific inner nodes                 │
└───────────────────────────────────────────────────────┘ 
   │        │        │        │        │        │
   ▼        ▼        ▼        ▼        ▼        ▼

┌───────────────────────────────────────────────────────┐
│                   SHARED LEAVES                       │
│     single canonical E–A–V tribles used by all        │
└───────────────────────────────────────────────────────┘

Each permutation maintains its own inner nodes, but all six share leaf nodes containing the 64‑byte trible. This avoids a naïve six‑fold memory cost while still letting the query planner pick the most selective ordering, keeping joins resistant to skew even when cardinalities vary widely.

Advantages

  • A total order over tribles enables efficient storage and canonicalisation.
  • Simple byte‑wise segmentation supports indexing and querying without an interning mechanism, keeping memory usage low and parallelisation easy while avoiding the need for garbage collection.
  • Schemas describe the value portion directly, making serialisation and deserialisation straightforward.
  • The fixed 64‑byte layout makes it easy to estimate the physical size of a dataset as a function of the number of tribles stored, as the sketch after this list illustrates.
  • The minimalistic design aims to minimise entropy while retaining collision resistance, making it likely that a similar format would emerge through convergent evolution and could serve as a universal data interchange format.
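
As a quick illustration of the size estimate mentioned above, the sketch below multiplies out the fixed record size; the ten-million-trible figure is an arbitrary assumption and the result covers only the serialized tribles themselves, not index overhead.

fn main() {
    // Every trible occupies exactly 64 bytes, so raw (un-indexed) storage is a
    // simple product of the trible count and the fixed record size.
    const TRIBLE_SIZE_BYTES: u64 = 64;
    let trible_count: u64 = 10_000_000;
    let bytes = trible_count * TRIBLE_SIZE_BYTES;
    println!("{trible_count} tribles ≈ {:.1} MiB", bytes as f64 / (1024.0 * 1024.0));
}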

Set operations and monotonic semantics

TribleSets provide familiar set-theoretic helpers such as TribleSet::union, TribleSet::intersection and TribleSet::difference. union consumes the right-hand operand and merges its contents into the receiver in place, while intersection and difference each produce a fresh TribleSet without mutating their inputs. Together these helpers make it straightforward to merge datasets, locate their overlap or identify the facts that still need to propagate between replicas while keeping the original sources intact.

This design reflects the crate's commitment to CALM-friendly, monotonic semantics. New information can be added freely, but existing facts are never destroyed. Consequently, difference is intended for comparing snapshots (e.g. "which facts are present in the remote set that I have not yet indexed?") rather than for destructive deletion. This keeps workflows declarative and convergent: sets can be combined in any order without introducing conflicts, and subtraction simply reports the gaps that remain to be filled.
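
The sketch below exercises those helpers under the signatures described above (an in-place union plus by-reference intersection and difference); if the crate's argument conventions differ slightly, adjust accordingly.

#![allow(unused)]
fn main() {
use triblespace::examples::literature;
use triblespace::prelude::*;

let isaac = ufoid();

let mut replica_a = TribleSet::new();
replica_a += entity! { &isaac @ literature::firstname: "Isaac" };

let mut replica_b = TribleSet::new();
replica_b += entity! { &isaac @ literature::lastname: "Asimov" };

// `union` merges the right-hand set into the receiver in place.
let mut merged = replica_a.clone();
merged.union(replica_b.clone());

// `intersection` and `difference` return fresh sets without mutating inputs.
let shared = merged.intersection(&replica_a);   // facts both sides already know
let pending = merged.difference(&replica_a);    // facts replica_a has not indexed yet
}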

Direction and consistency

In many triple stores the direction of an edge is chosen incidentally—there is no intrinsic preference for hasColor over colorOf. This ambiguity often leads to confusion, duplication, or both as different writers pick different conventions. Common mitigations either mirror every edge automatically (as done by OWL and RDF through inverseOf, doubling storage or demanding runtime inference) or devolve into bikeshedding about the "correct" orientation.

tribles avoids that trap by giving edge direction explicit semantics: the arrow points from the entity making the claim to the entity being described. The observer owns the identifier and is responsible for the consistency of the facts it asserts—see ID Ownership. This rule naturally fits the distributed setting where each entity has a single authoritative writer.

Viewed another way, edges always flow from describing to described entities, while cycles represent consensus between the parties involved. For example, hasColor must point from the object that exhibits the colour to the entity representing that colour. The orientation is therefore a consequence of the statement's meaning, not an arbitrary modelling choice.

Blobs

Blobs are immutable sequences of bytes used whenever data no longer fits into the fixed 256‑bit value slot of a trible. Instead of treating these payloads as untyped binary blobs, Tribles keeps track of their structure via BlobSchema. Much like ValueSchema drives how values are serialized into a trible, a BlobSchema defines how to encode and decode rich data into a byte sequence.

When to reach for blobs

Values and tribles capture compact facts – identifiers, timestamps, counters – in a fixed width. Whenever information grows beyond that footprint, blobs carry the payload while tribles continue to reference it. Common use cases include documents, media assets, serialized entity archives, or even domain specific binary formats. Because blobs are content addressed, the same payload stored twice automatically deduplicates to the same handle. In the in-memory implementation this falls straight out of the code: MemoryBlobStore::insert keeps a BTreeMap keyed by the handle and simply reuses the existing entry when the same digest shows up again.
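
A small sketch of that deduplication behaviour, assuming the handles returned by put can be compared for equality:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use valueschemas::{Blake3, Handle};
use blobschemas::LongString;

let mut store: MemoryBlobStore<Blake3> = MemoryBlobStore::new();

// The same payload hashes to the same digest, so the second `put` resolves to
// the handle that already exists instead of storing a duplicate entry.
let first: Value<Handle<Blake3, LongString>> = store.put("the same payload").unwrap();
let second: Value<Handle<Blake3, LongString>> = store.put("the same payload").unwrap();
assert_eq!(first, second);
}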

Handles, schemas, and stores

Blobs live in a BlobStore. The store provides persistent storage and a content hash, determined by the selected HashProtocol, that acts as a stable handle. Handles can be embedded into tribles just like any other value so they benefit from the existing querying machinery. A handle couples the blob's hash with its BlobSchema so consumers always know how to deserialize the referenced bytes.

Converting Rust types to blobs is infallible in practice, so the ToBlob and TryFromBlob traits are the most common helpers. The TryToBlob and FromBlob variants have been dropped to keep the API surface small without losing ergonomics.

End‑to‑end example

The following example demonstrates creating blobs, archiving a TribleSet and signing its contents:

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::examples::literature;
use triblespace::repo;
use valueschemas::{Handle, Blake3};
use blobschemas::{SimpleArchive, LongString};
use rand::rngs::OsRng;
use ed25519_dalek::{Signature, Signer, SigningKey};

// Build a BlobStore and fill it with some data.
let mut memory_store: MemoryBlobStore<Blake3> = MemoryBlobStore::new();

let book_author_id = fucid();
let quote_a: Value<Handle<Blake3, LongString>> = memory_store
    .put("Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic.")
    .unwrap();
let quote_b = memory_store
    .put("I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.")
    .unwrap();

let set = entity!{
   literature::title: "Dune",
   literature::author: &book_author_id,
   literature::quote: quote_a,
   literature::quote: quote_b
};

// Serialize the TribleSet and store it as another blob. The resulting
// handle points to the archived bytes and keeps track of its schema.
let archived_set_handle: Value<Handle<Blake3, SimpleArchive>> = memory_store.put(&set).unwrap();

let mut csprng = OsRng;
let commit_author_key: SigningKey = SigningKey::generate(&mut csprng);
let signature: Signature = commit_author_key.sign(
    &memory_store
        .reader()
        .unwrap()
        .get::<Blob<SimpleArchive>, SimpleArchive>(archived_set_handle)
        .unwrap()
        .bytes,
);

// Store the handle in another TribleSet so the archived content can be
// referenced alongside metadata and cryptographic proofs.
let _meta_set = entity!{
   repo::content: archived_set_handle,
   repo::short_message: "Initial commit",
   repo::signed_by: commit_author_key.verifying_key(),
   repo::signature_r: signature,
   repo::signature_s: signature,
};
}

Blobs complement tribles and values by handling large payloads while keeping the core data structures compact. Embedding handles into entities ties together structured metadata and heavyweight data without breaking immutability or introducing duplication. This division of labor lets tribles focus on querying relationships while BlobStores take care of storage concerns such as hashing, deduplication, and retrieval.

PATCH

The Persistent Adaptive Trie with Cuckoo-compression and Hash-maintenance (PATCH) is Trible Space’s workhorse for set operations. It combines three core ideas:

  1. Persistence. Updates clone only the modified path, so existing readers keep a consistent view while writers continue mutating. The structure behaves like an immutable value with copy-on-write updates.
  2. Adaptive width. Every node is conceptually 256-ary, yet the physical footprint scales with the number of occupied children.
  3. Hash maintenance. Each subtree carries a 128-bit fingerprint that allows set operations to skip identical branches early.

Together these properties let PATCH evaluate unions, intersections, and differences quickly while staying cache friendly and safe to clone.

Node layout

Traditional Adaptive Radix Trees (ART) use specialised node types (Node4, Node16, Node48, …) to balance space usage against branching factor. PATCH instead stores every branch in the same representation:

  • The Branch header tracks the first depth where the node diverges (end_depth) and caches a pointer to a representative child leaf (childleaf). These fields give PATCH its path compression — a branch can cover several key bytes, and we only expand into child tables once the children disagree below end_depth.
  • Children live in a byte-oriented cuckoo hash table backed by a single slice of Option<Head>. Each bucket holds two slots and the table grows in powers of two up to 256 entries.

Insertions reuse the generic modify_child helper, which drives the cuckoo loop and performs copy-on-write if a branch is shared. When the existing allocation is too small we allocate a larger table with the same layout, migrate the children, and update the owning pointer in place. Because every branch uses the same structure we avoid the tag soup and pointer chasing that ARTs rely on while still adapting to sparse and dense fan-out.

Resizing strategy

PATCH relies on two hash functions: an identity map and a pseudo-random permutation sampled once at startup. Both hashes feed a simple compressor that masks off the unused high bits for the current table size. Doubling the table therefore only exposes one more significant bit, so each child either stays in its bucket or moves to the partner bucket index + old_bucket_count.
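
The toy sketch below mimics the compressor described above (it is not the crate's implementation): masking a hash down to the current bucket count means doubling the table exposes exactly one more bit, so each entry either keeps its index or moves to the partner bucket.

// Reduce a hashed byte to a bucket index by masking off unused high bits.
fn bucket_index(hash: u8, bucket_count: usize) -> usize {
    (hash as usize) & (bucket_count - 1)
}

fn main() {
    let hash = 0b1010_1101u8;
    let old_buckets = 8;
    let new_buckets = 16;

    let old_index = bucket_index(hash, old_buckets);
    let new_index = bucket_index(hash, new_buckets);

    // After doubling, the entry either keeps its index or lands exactly
    // `old_buckets` positions higher—no other destination is possible.
    assert!(new_index == old_index || new_index == old_index + old_buckets);
}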

The byte_table_resize_benchmark demonstrates how densely the table can fill before resizing. The benchmark inserts all byte values repeatedly and records the occupancy that forced each power-of-two table size to grow:

ByteTable resize fill - random: 0.863, sequential: 0.972
Per-size fill (random)
  size   2: 1.000  # path compression keeps two-entry nodes fully occupied
  size   4: 0.973
  size   8: 0.899
  size  16: 0.830
  size  32: 0.749
  size  64: 0.735
  size 128: 0.719
  size 256: 1.000  # identity hash maps all 256 children without resizing
Per-size fill (sequential)
  size   2: 1.000  # path compression keeps two-entry nodes fully occupied
  size   4: 1.000
  size   8: 0.993
  size  16: 1.000
  size  32: 0.928
  size  64: 0.925
  size 128: 0.927
  size 256: 1.000  # identity hash maps all 256 children without resizing

Random inserts average roughly 86 % table fill while sequential inserts stay near 97 % before the next doubling. Small nodes stay compact because the path-compressed header only materialises a table when needed, while the largest table reaches full occupancy without growing past 256 entries. These predictable fill factors keep memory usage steady without ART’s specialised node types.

Hash maintenance

Every leaf stores a SipHash-2-4 fingerprint of its key, and each branch XORs its children’s 128-bit hashes together. On insert or delete the previous hash contribution is XORed out and the new value XORed in, so updates run in constant time. Set operations such as difference compare these fingerprints first: matching hashes short-circuit because the subtrees are assumed identical, while differing hashes force a walk to compute the result. SipHash collisions are astronomically unlikely for these 128-bit values, so the shortcut is safe in practice.
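
A simplified model of that bookkeeping (not the crate's actual code) shows why updates are constant time: XOR-ing a fingerprint in adds its contribution, and XOR-ing it in again removes it.

/// Toy branch node that maintains the XOR of its children's fingerprints.
#[derive(Default)]
struct BranchHash(u128);

impl BranchHash {
    fn insert(&mut self, child_fingerprint: u128) {
        self.0 ^= child_fingerprint; // add the child's contribution
    }

    fn remove(&mut self, child_fingerprint: u128) {
        self.0 ^= child_fingerprint; // the same XOR cancels it out
    }
}

fn main() {
    let mut branch = BranchHash::default();
    branch.insert(0xDEAD_BEEF);
    branch.insert(0x1234_5678);
    branch.remove(0xDEAD_BEEF);
    // Only the second child's contribution remains.
    assert_eq!(branch.0, 0x1234_5678);
}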

Consumers can reorder or segment keys through the KeySchema and KeySegmentation traits. Prefix queries reuse the schema’s tree ordering to walk just the matching segments. Because every update is implemented with copy-on-write semantics, cloning a tree is cheap and retains structural sharing: multiple workspaces can branch, mutate independently, and merge results without duplicating entire datasets.

Pile Format

The on-disk pile keeps every blob and branch in one append-only file. This layout provides a simple write-ahead log style database where new data is only appended. It allows both blob and branch storage in a single file while remaining resilient to crashes. The pile backs local repositories and acts as a durable content‑addressed store. The pile file can be memory mapped for fast reads and is safely shared between threads because existing bytes are never mutated.

While large databases often avoid mmap due to pitfalls with partial writes and page cache thrashing [1], this design works well for the pile's narrow usage pattern.

Design Rationale

This format emphasizes simplicity over sophisticated on-disk structures. Appending new blobs rather than rewriting existing data keeps corruption windows small and avoids complicated page management. Storing everything in a single file makes a pile easy to back up or replicate over simple transports while still allowing it to be memory mapped for fast reads. The 64 byte alignment ensures each entry begins on a cache line boundary, which improves concurrent access patterns and allows safe typed views with the zerocopy crate.

Immutability Assumptions

A pile is treated as an immutable append-only log. Once a record sits below a process's applied offset, its bytes are assumed permanent. The implementation does not guard against mutations; modifying existing bytes is undefined behavior. Only the tail beyond the applied offset might hide a partial append after a crash, so validation and repair only operate on that region. Each record's validation state is cached for the lifetime of the process under this assumption.

Hash verification only happens when blobs are read. Opening even a very large pile is therefore fast while still catching corruption before data is used.

Every record begins with a 16 byte magic marker that identifies whether it stores a blob or a branch. The sections below illustrate the layout of each type.

Usage

A pile typically lives as a .pile file on disk. Repositories open it through Pile::open and then call refresh to load existing records or restore to repair after a crash. Multiple threads may share the same handle thanks to internal synchronisation, making a pile a convenient durable store for local development.

Blob appends use a single O_APPEND write. Each handle remembers the last offset it processed and, after appending, scans any gap left by concurrent writes before advancing this applied_length. Writers may race and duplicate blobs, but content addressing keeps the data consistent. Each handle tracks hashes of pending appends separately so repeated writes are deduplicated until a refresh. A restore clears this in-memory set to discard truncated appends.

Branch updates only record the referenced hash and do not verify that the corresponding blob exists in the pile, so a pile may act as a head-only store when blob data resides elsewhere. Updating branch heads requires a brief critical section: flush → refresh → lock → refresh → append → unlock. The initial refresh acquires a shared lock so it cannot race with restore, which takes an exclusive lock before truncating a corrupted tail.

Filesystems lacking atomic write/vwrite appends—such as some network or FUSE-based implementations—cannot safely host multiple writers and are not supported. Using such filesystems risks pile corruption.
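
The following sketch shows the basic open-and-refresh flow described above; it assumes refresh and restore return Results and that a failed refresh signals the truncated tail discussed under Recovery below.

use triblespace::repo::pile::Pile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) the pile and bring this handle's view up to date.
    let mut pile = Pile::open("/tmp/example.pile")?;

    // `refresh` scans records appended since we last looked; if it reports a
    // truncated or unknown record, `restore` trims the file back to the last
    // valid entry so normal operation can resume.
    if pile.refresh().is_err() {
        pile.restore()?;
    }

    Ok(())
}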

Blob Storage

                             8 byte  8 byte
            ┌────16 byte───┐┌──────┐┌──────┐┌────────────32 byte───────────┐
          ┌ ┌──────────────┐┌──────┐┌──────┐┌──────────────────────────────┐
 header   │ │magic number A││ time ││length││             hash             │
          └ └──────────────┘└──────┘└──────┘└──────────────────────────────┘
            ┌────────────────────────────64 byte───────────────────────────┐
          ┌ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
          │ │                                                              │
 payload  │ │              bytes (64byte aligned and padded)               │
          │ │                                                              │
          └ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┘

Each blob entry records its creation timestamp, the length of the payload (which may be zero) and its hash. The payload is padded so the next record begins on a 64 byte boundary. The Pile Blob Metadata chapter explains how to query these fields through the PileReader API.

Branch Storage

            ┌────16 byte───┐┌────16 byte───┐┌────────────32 byte───────────┐
          ┌ ┌──────────────┐┌──────────────┐┌──────────────────────────────┐
 header   │ │magic number B││  branch id   ││             hash             │
          └ └──────────────┘└──────────────┘└──────────────────────────────┘

Branch entries map a branch identifier to the hash of a blob.

Recovery

Calling refresh scans an existing file to ensure every header uses a known marker and that the whole record fits. It does not verify any hashes. If a truncated or unknown block is found, the function reports the number of bytes that were valid so far using ReadError::CorruptPile.

If the file shrinks between scans into data that has already been applied, the process aborts immediately. Previously returned Bytes handles would dangle and continuing could cause undefined behavior, so truncation into validated data is treated as unrecoverable.

refresh holds a shared file lock while scanning. This prevents a concurrent restore call from truncating the file out from under the reader.

The restore helper re-runs the same validation and truncates the file to the valid length if corruption is encountered. This recovers from interrupted writes by discarding incomplete data. Hash verification happens lazily only when individual blobs are loaded so that opening a large pile remains fast.

For more details on interacting with a pile see the Pile struct documentation.

Pile Blob Metadata

Every blob stored in a pile begins with a header that records the moment it was written and the length of the payload that follows. The Pile implementation surfaces that information so tools can answer questions such as "when did this blob arrive?" without re-parsing the file on disk.

BlobMetadata

Blob metadata is exposed through the BlobMetadata struct. It is a simple value type containing two public fields:

  • timestamp: the write time stored in the blob header. The pile records the number of milliseconds since the Unix epoch when the blob was appended.
  • length: the size of the blob payload in bytes. Padding that aligns entries to 64 byte boundaries is excluded from this value, so it matches the slice returned by PileReader::get.

Looking up blob metadata

PileReader::metadata accepts the same Value<Handle<_, _>> that other blob store APIs use. It returns Some(BlobMetadata) when the blob exists and the stored bytes hash to the expected value; otherwise it yields None.

Readers operate on the snapshot that was current when they were created. Call Pile::refresh and request a new reader to observe blobs appended afterwards.

use anybytes::Bytes;
use triblespace::blob::schemas::UnknownBlob;
use triblespace::blob::Blob;
use triblespace::repo::pile::Pile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut pile = Pile::open("/tmp/example.pile")?;

    let blob = Blob::<UnknownBlob>::new(Bytes::from_static(b"hello world"));
    let handle = pile.put(blob)?;

    let reader = pile.reader()?;
    if let Some(meta) = reader.metadata(handle) {
        println!(
            "Blob length: {} bytes, appended at {} ms",
            meta.length, meta.timestamp
        );
    }

    Ok(())
}

Failure cases

metadata returns None in a few situations:

  • the handle does not correspond to any blob stored in the pile;
  • the reader snapshot predates the blob (refresh the pile and create a new reader to see later writes);
  • validation previously failed because the on-disk bytes did not match the recorded hash, for example after the pile file was corrupted before this process opened it.

When None is returned, callers can treat it the same way they would handle a missing blob from get: the data is considered absent from the snapshot they are reading.

For more detail on how the metadata is laid out on disk, see the Pile Format chapter.