Introduction
Welcome to the Tribles Book. This first chapter is your map to Trible
Space—the problems it tackles, the core ideas behind the triblespace crate, and
the kinds of projects it was built to support. By the end of these opening
pages you should be able to recognize when Trible Space is the right tool,
whether you are prototyping something new, extending an existing system, or
adapting it for your own research purposes.
Why Trible Space exists
Trible Space exists because teams need to steward complex, diverse, and interconnected datasets without losing context. Research groups, startups, and digital libraries must pair large binary payloads with fine-grained facts, synchronize findings across laptops, servers, and mobile devices, and prove that derived results trace back to their inputs. Combining a conventional database, object store, and version-control workflow rarely delivers that outcome: blobs drift away from their provenance, merges become brittle, and auditing which observations justified a decision or model is tedious.
Trible Space closes those gaps with a single substrate that stores heavyweight assets and the relationships that explain them. Think of it as a library catalog for blobs and a lab notebook for the annotations, measurements, and discussions that give those blobs meaning. Because facts and payloads travel together, features like version control, verifiability, and provenance fall naturally out of the data model instead of bolting on as afterthoughts.
That same structure also lets you support offline edits, reconcile concurrent changes safely, and ship datasets to partners with the evidence needed to trust them.
To deliver those outcomes, Trible Space blends ideas from databases, version control, and content-addressed storage. Information is encoded as fixed-width tribles: 64-byte entity–attribute–value facts. Each trible stores two 16-byte extrinsic identifiers plus a 32-byte typed value. When a value exceeds the inline slot it becomes a schema-qualified hash pointing to an immutable content-addressed blob. The blob holds the heavyweight payload, while the trible remains a compact fact that fits neatly into indexes, caches, and query engines. Because both tribles and blobs are immutable, you can keep them in memory, on disk, or in remote object stores without transformation. Content hashes serve as identifiers, so every payload has a stable address and integrity is easy to verify when data is shared or synchronized.
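To make the 64-byte layout concrete, here is a purely illustrative sketch of the division described above; it is not the crate's internal `Trible` type, just a picture of how the bytes split up.

```rust
/// Illustrative sketch of the 64-byte trible layout described above.
/// This is not the crate's internal representation, only a picture of
/// how the bytes divide up.
#[repr(C)]
struct RawTrible {
    entity: [u8; 16],    // extrinsic entity id
    attribute: [u8; 16], // extrinsic attribute id
    value: [u8; 32],     // inline typed value, or a schema-qualified blob hash
}

// The whole fact fits in a single 64-byte cache line.
const _: () = assert!(std::mem::size_of::<RawTrible>() == 64);

fn main() {
    println!("trible size: {} bytes", std::mem::size_of::<RawTrible>());
}
```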
This design unlocks capabilities that are difficult to achieve together in traditional stacks:
- Trustworthy collaboration – hashes and immutable histories provide the audit trail needed to review changes, merge branches, and reproduce results across teams.
- Content-addressed storage – values are stored and looked up by their contents, making caches and replicas safe even when they live on untrusted infrastructure.
- Flexible querying – the query engine blends indexes on the fly, letting a single query range across trible sets, succinct indexes, and familiar Rust collections such as hash maps in one pass.
Taken together, these traits make it feasible to build systems with rich histories, reproducible computations, and verifiable data exchange while keeping the developer experience approachable.
Who this book is for
If you are new to Tribles, the opening chapters build vocabulary and provide a guided tour of the core data structures. Developers who already understand the problem space can skim ahead to detailed sections on schema design, query semantics, and the Atreides join algorithm. The book also points to the API documentation whenever you are ready to explore the crate directly.
How to read this book
The book is organized so you can either read it front-to-back or jump straight to the material that answers your questions. Each chapter layers new ideas onto the previous ones:
- Getting Started walks through installing the tooling, creating your first trible store, and issuing simple queries.
- Architecture and Query Engine explain how the runtime is structured so you can reason about performance and extensibility.
- Later sections explore schema design, incremental queries, repository workflows, and formal verification so you can grow from experiments to production systems.
Inline links lead to deeper resources and code samples. The Glossary offers quick refreshers on terminology, while Developing Locally covers how to set up a development environment and contribute back to the project. Whenever the book introduces a new concept, look for references to the crate documentation so you can inspect the corresponding APIs and examples.
By the end of this chapter you should have a mental model for why Trible Space is structured the way it is. From there, head to the chapters that match your goals—whether that means learning to query data effectively or integrating Tribles into a larger system.
Getting Started
This chapter walks you through creating a brand-new repository, committing
your first entity, and understanding the pieces involved. It assumes you have
Rust installed and are comfortable
with running cargo commands from a terminal.
1. Add the dependencies
Create a new binary crate (for example with cargo new tribles-demo) and add
the dependencies needed for the example. The triblespace crate provides the
database, ed25519-dalek offers an implementation of the signing keys used for
authentication, and rand supplies secure randomness.
```sh
cargo add triblespace ed25519-dalek rand
```
2. Build the example program
The walkthrough below mirrors the quick-start program featured in the
README. It defines the attributes your application needs, stages and queries
book data, publishes the first commit with automatic retries, and finally shows
how to use try_push when you want to inspect and reconcile a conflict
manually.
```rust
use triblespace::prelude::*;
use triblespace::prelude::blobschemas::LongString;
use triblespace::repo::{memoryrepo::MemoryRepo, Repository};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

mod literature {
    use triblespace::prelude::*;
    use triblespace::prelude::blobschemas::LongString;
    use triblespace::prelude::valueschemas::{Blake3, GenId, Handle, R256, ShortString};

    attributes! {
        /// The title of a work.
        ///
        /// Small doc paragraph used in the book examples.
        "A74AA63539354CDA47F387A4C3A8D54C" as pub title: ShortString;
        /// A quote from a work.
        // For quick prototypes you can omit the id and let the macro derive it
        // from the name and schema:
        pub quote: Handle<Blake3, LongString>;
        /// The author of a work.
        "8F180883F9FD5F787E9E0AF0DF5866B9" as pub author: GenId;
        /// The first name of an author.
        "0DBB530B37B966D137C50B943700EDB2" as pub firstname: ShortString;
        /// The last name of an author.
        "6BAA463FD4EAF45F6A103DB9433E4545" as pub lastname: ShortString;
        /// The number of pages in the work.
        "FCCE870BECA333D059D5CD68C43B98F0" as pub page_count: R256;
        /// A pen name or alternate spelling for an author.
        "D2D1B857AC92CEAA45C0737147CA417E" as pub alias: ShortString;
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Repositories manage shared history; MemoryRepo keeps everything in-memory
    // for quick experiments. Swap in a `Pile` when you need durable storage.
    let storage = MemoryRepo::default();
    let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));
    let branch_id = repo
        .create_branch("main", None)
        .expect("create branch");
    let mut ws = repo.pull(*branch_id).expect("pull workspace");

    // Workspaces stage TribleSets before committing them. The entity! macro
    // returns sets that merge cheaply into our current working set.
    let author_id = ufoid();
    let mut library = TribleSet::new();
    library += entity! { &author_id @
        literature::firstname: "Frank",
        literature::lastname: "Herbert",
    };
    library += entity! { &author_id @
        literature::title: "Dune",
        literature::author: &author_id,
        literature::quote: ws.put::<LongString, _>(
            "Deep in the human unconscious is a pervasive need for a logical \
             universe that makes sense. But the real universe is always one \
             step beyond logic."
        ),
        literature::quote: ws.put::<LongString, _>(
            "I must not fear. Fear is the mind-killer. Fear is the little-death \
             that brings total obliteration. I will face my fear. I will permit \
             it to pass over me and through me. And when it has gone past I will \
             turn the inner eye to see its path. Where the fear has gone there \
             will be nothing. Only I will remain."
        ),
    };
    ws.commit(library, Some("import dune"));

    // `checkout(..)` returns the accumulated TribleSet for the branch.
    let catalog = ws.checkout(..)?;

    let title = "Dune";
    // Use `_?ident` when you need a fresh variable scoped to this macro call
    // without declaring it in the find! projection list.
    for (f, l, quote) in find!(
        (first: String, last: Value<_>, quote),
        pattern!(&catalog, [
            { _?author @
                literature::firstname: ?first,
                literature::lastname: ?last },
            { literature::title: title,
              literature::author: _?author,
              literature::quote: ?quote }
        ])
    ) {
        let quote: View<str> = ws.get(quote)?;
        let quote = quote.as_ref();

        println!("'{quote}'\n - from {title} by {f} {}.", l.from_value::<&str>());
    }

    // Use `push` when you want automatic retries that merge concurrent history
    // into the workspace before publishing.
    repo.push(&mut ws).expect("publish initial library");

    // Stage a non-monotonic update that we plan to reconcile manually.
    ws.commit(
        entity! { &author_id @ literature::firstname: "Francis" },
        Some("use pen name"),
    );

    // Simulate a collaborator racing us with a different update.
    let mut collaborator = repo
        .pull(*branch_id)
        .expect("pull collaborator workspace");
    collaborator.commit(
        entity! { &author_id @ literature::firstname: "Franklin" },
        Some("record legal first name"),
    );
    repo.push(&mut collaborator)
        .expect("publish collaborator history");

    // `try_push` returns a conflict workspace when the CAS fails, letting us
    // inspect divergent history and decide how to merge it.
    if let Some(mut conflict_ws) = repo
        .try_push(&mut ws)
        .expect("attempt manual conflict resolution")
    {
        let conflict_catalog = conflict_ws.checkout(..)?;

        for (first,) in find!(
            (first: Value<_>),
            pattern!(&conflict_catalog, [{
                literature::author: &author_id,
                literature::firstname: ?first
            }])
        ) {
            println!(
                "Collaborator kept the name '{}'.",
                first.from_value::<&str>()
            );
        }

        ws.merge(&mut conflict_ws)
            .expect("merge conflicting history");
        ws.commit(
            entity! { &author_id @ literature::alias: "Francis" },
            Some("keep pen-name as an alias"),
        );
        repo.push(&mut ws)
            .expect("publish merged aliases");
    }

    Ok(())
}
```
3. Run the program
Compile and execute the example with `cargo run`. On success it prints the
quotes it finds for *Dune*, publishes the commits to the `main` branch, and
then walks through the manual conflict-resolution path with `try_push`.
```sh
cargo run
```
Because the walkthrough uses the in-memory `MemoryRepo`, nothing is written to
disk and every run starts from a clean slate. If you swap in a durable `Pile`
backend instead, delete its pile file when you are done experimenting to avoid
accidentally reading stale state.
Understanding the pieces
- Branch setup. `Repository::create_branch` registers the branch and returns an `ExclusiveId` guard. Dereference the guard (or call `ExclusiveId::release`) to obtain the `Id` that `Repository::pull` expects when creating a `Workspace`.
- Minting attributes. The `attributes!` macro names the fields that can be stored in the repository. Attribute identifiers are global—if two crates use the same identifier they will read each other's data—so give them meaningful project-specific names.
- Committing data. The `entity!` macro builds a set of attribute/value assertions. When paired with the `ws.commit` call it records a transaction in the workspace that becomes visible to others once pushed.
- Publishing changes. `Repository::push` merges any concurrent history into the workspace and retries automatically, making it ideal for monotonic updates where you are happy to accept the merged result.
- Manual conflict resolution. `Repository::try_push` performs a single optimistic attempt and returns a conflict workspace when the compare-and-set fails. Inspect that workspace when you want to reason about the competing history—such as non-monotonic edits—before merging and retrying.
- Closing repositories. When working with pile-backed repositories it is important to close them explicitly so buffered data is flushed and any errors are reported while you can still decide how to handle them. Calling `repo.close()?;` surfaces those errors; if the repository were only dropped, failures would have to be logged or panic instead. Alternatively, you can recover the underlying pile with `Repository::into_storage` and call `Pile::close()` yourself.
See the crate documentation for additional modules and examples.
Switching signing identities
The setup above generates a single signing key for brevity, but collaborating
authors typically hold individual keys. Call Repository::set_signing_key
before branching or pulling when you need a different default identity, or use
Repository::create_branch_with_key and Repository::pull_with_key to choose a
specific key per branch or workspace. The Managing signing identities
section covers this workflow in more detail.
Developing Locally
Tribles is developed with stable Rust tooling and a small collection of helper scripts. This chapter walks through preparing a development environment, running tests, and rebuilding the documentation so you can iterate with confidence.
Prerequisites
- Install a recent Rust toolchain from rustup.rs. The default `stable` channel is what the project targets.
- Clone the repository and switch into it:

  ```sh
  git clone https://github.com/TribleSpace/triblespace-rs.git
  cd triblespace-rs
  ```

- (Optional) Install `mdbook` with `cargo install mdbook` if you would like to preview the rendered documentation locally.
Everyday workflows
The repository includes several scripts that keep formatting, tests, and book builds in sync:
- `./scripts/devtest.sh` executes the most relevant unit and integration tests for fast feedback while you are iterating.
- `./scripts/preflight.sh` runs formatting, the full test suite, and rebuilds this book. Run it before committing to ensure your branch is in good shape.
- `./scripts/build_book.sh` regenerates the documentation after you modify Markdown chapters or code snippets.
You can always fall back to the standard Cargo commands (cargo fmt,
cargo test, etc.) if you prefer to run specific tools by hand.
Rebuilding the book
Once mdbook is installed you can rebuild the documentation with:
```sh
./scripts/build_book.sh
```
This script compiles the chapters into book/book, allowing you to open the
HTML output in a browser. Rebuilding regularly helps catch stale examples and
keeps the published copy aligned with the codebase.
Architecture Overview
Trible Space is designed to keep data management simple, safe and fast. The README introduces these goals in more detail, emphasizing a lean design with predictable performance and straightforward developer experience. This chapter explains how the pieces fit together and why they are organised this way.
Design Goals
A full discussion of the motivation behind Trible Space can be found in the Philosophy section. At a high level we want a self‑contained data store that offers:
- Simplicity – minimal moving parts and predictable behaviour.
- Developer Experience – a clear API that avoids complex servers or background processes.
- Safety and Performance – sound data structures backed by efficient content addressed blobs.
These goals grew out of earlier "semantic" technologies that attempted to model knowledge as graphs. While systems like RDF promised great flexibility, in practice they often became difficult to host, query and synchronise. Trible Space keeps the idea of describing the world with simple statements but stores them in a form that is easy to exchange and reason about.
Architectural Layers
The system is organised into a small set of layers that compose cleanly:
- Data model – the immutable trible structures that encode facts.
- Stores – generic blob and branch storage traits that abstract over persistence backends.
- Repository – the coordination layer that combines stores into a versioned history.
- Workspaces – the in‑memory editing surface used by applications and tools.
Each layer has a tight, well defined boundary. Code that manipulates tribles never needs to know if bytes ultimately land on disk or in memory, and repository level operations never reach inside the data model. This separation keeps interfaces small, allows incremental optimisation and makes it easy to swap pieces during experimentation.
Data Model
The fundamental unit of information is a Trible. Its 64 byte layout is described in Trible Structure. A Trible links a subject entity to an attribute and value. Multiple tribles are stored in a TribleSet, which behaves like a hashmap with three columns — subject, attribute and value.
The 64 byte boundary allows tribles to live comfortably on cache lines and makes deduplication trivial. Because tribles are immutable, the runtime can copy, hash and serialise them without coordinating with other threads. Higher level features like schema checking and query planning are therefore free to assume that every fact they observe is stable for the lifetime of a query.
Trible Sets
TribleSets provide fast querying and cheap copy‑on‑write semantics. They can be merged, diffed and searched entirely in memory. When durability is needed the set is serialised into a blob and tracked by the repository layer.
To keep joins skew‑resistant, each set maintains all six orderings of entity, attribute and value. The trees reuse the same leaf nodes so a trible is stored only once, avoiding a naïve six‑fold memory cost while still letting the search loop pick the most selective permutation using the constraint heuristics.
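As a minimal sketch of that in-memory workflow (using the `examples::literature` attributes that ship with the crate and appear throughout this book), two independently built sets merge with `+=` and can be queried immediately:

```rust
use triblespace::examples::literature;
use triblespace::prelude::*;

fn main() {
    let author = ufoid();

    // Facts staged in two separate in-memory sets...
    let mut names = TribleSet::new();
    names += entity! { &author @ literature::firstname: "Frank" };

    let mut more = TribleSet::new();
    more += entity! { &author @ literature::lastname: "Herbert" };

    // ...merge cheaply; the immutable leaf nodes are shared rather than copied.
    names += more;

    let rows: Vec<_> = find!((first: Value<_>, last: Value<_>),
        pattern!(&names, [{
            _?a @
            literature::firstname: ?first,
            literature::lastname: ?last
        }]))
        .collect();
    assert_eq!(rows.len(), 1);
}
```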
Blob Storage
All persistent data lives in a BlobStore. Each blob is addressed by the hash of its contents, so identical data occupies space only once and readers can verify integrity by recomputing the hash. The trait exposes simple get and put operations, leaving caching and eviction strategies to the backend. Implementations decide where bytes reside: an in‑memory MemoryBlobStore, an on‑disk Pile described in Pile Format or a remote object store. Because handles are just 32‑byte hashes, repositories can copy or cache blobs without coordination. Trible sets, user blobs and commit records all share this mechanism.
Content addressing also means that blob stores can be layered. Applications commonly use a fast local cache backed by a slower durable store. Only the outermost layer needs to implement eviction; inner layers simply re-use the same hash keys, so cache misses fall through cleanly.
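The workspace calls from the quick-start illustrate the content-addressed contract: `put` hashes a payload into a handle and `get` resolves the handle back into a typed view. The sketch below reuses exactly those calls under the same assumptions as the earlier example.

```rust
use triblespace::prelude::*;
use triblespace::prelude::blobschemas::LongString;
use triblespace::repo::{memoryrepo::MemoryRepo, Repository};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut repo = Repository::new(MemoryRepo::default(), SigningKey::generate(&mut OsRng));
    let branch_id = repo.create_branch("main", None).expect("create branch");
    let mut ws = repo.pull(*branch_id).expect("pull workspace");

    // `put` hashes the payload and returns a handle; identical payloads map
    // to identical handles, which is what makes caches and replicas safe.
    let handle = ws.put::<LongString, _>("a heavyweight payload");

    // `get` resolves the handle back into a typed view of the bytes.
    let view: View<str> = ws.get(handle)?;
    assert_eq!(view.as_ref(), "a heavyweight payload");
    Ok(())
}
```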
Branch Store
A BranchStore keeps track of the tips of each branch. Updates use a simple compare‑and‑set operation so concurrent writers detect conflicts. Both the in‑memory and pile repositories implement this trait.
Branch stores are intentionally dumb. They neither understand commits nor the shape of the working tree. Instead they focus on a single atomic pointer per branch. This reduces the surface area for race conditions and keeps multi‑writer deployments predictable even on eventually consistent filesystems.
Because only this single operation mutates repository state, nearly all other logic is value oriented and immutable. Conflicts surface only at the branch store update step, which simplifies concurrent use and reasoning about changes.
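The contract is easiest to see in miniature. The toy type below is not the crate's `BranchStore` trait; it only illustrates the compare-and-set rule that lets racing writers detect each other.

```rust
use std::sync::Mutex;

// Toy stand-in for a branch store: one atomic pointer per branch.
// The real trait tracks commit handles, not integers.
struct ToyBranchStore {
    head: Mutex<Option<u64>>,
}

impl ToyBranchStore {
    /// Succeeds only if the stored head still equals `expected`; otherwise the
    /// caller learns the current head and must merge before retrying.
    fn compare_and_set(&self, expected: Option<u64>, new: u64) -> Result<(), Option<u64>> {
        let mut head = self.head.lock().unwrap();
        if *head == expected {
            *head = Some(new);
            Ok(())
        } else {
            Err(*head)
        }
    }
}

fn main() {
    let store = ToyBranchStore { head: Mutex::new(None) };
    assert!(store.compare_and_set(None, 1).is_ok()); // first writer wins
    assert_eq!(store.compare_and_set(None, 2), Err(Some(1))); // stale writer learns the new head
}
```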
Repository
The Repository combines a blob store with a branch store. Commits store a trible set blob along with a parent link and signature. Because everything is content addressed, multiple repositories can share blobs or synchronize through a basic file copy.
Repository logic performs a few critical duties:
- Validation – ensure referenced blobs exist and signatures line up with the claimed authorship.
- Blob synchronization – upload staged data through the content-addressed blob store, which skips already-present bytes and reports integrity errors.
- History traversal – provide iterators that let clients walk parent chains efficiently.
All of these operations rely only on hashes and immutable blobs, so repositories can be mirrored easily and verified offline.
Workspaces
A Workspace represents mutable state during editing. Checking out or branching yields a workspace backed by a fresh MemoryBlobStore. Commits are created locally and only become visible to others when pushed, as described in Repository Workflows.
Workspaces behave like sandboxes. They host application caches, pending trible sets and user blobs. Because they speak the same blob language as repositories, synchronisation is just a matter of copying hashes from the workspace store into the shared store once a commit is finalised.
Commits and History
TribleSets written to blobs form immutable commits. Each commit references its parent, creating an append‑only chain signed by the author. This is the durable history shared between repositories.
Because commits are immutable, rollback and branching are cheap. Diverging histories can coexist until a user merges them by applying query language operations over the underlying trible sets. The repository simply tracks which commit each branch tip points to.
Putting It Together
```
+-----------------------------------------------+
|                  Repository                   |
|        BlobStore (content addressed)          |
|        BranchStore (compare-and-set head)     |
+----------------------------+------------------+
         ^  push/try_push    |  pull
         |                   |
         |                   v
+----------------------------+------------------+
|                  Workspace                    |
|    base_blobs reader (view of repo blobs)     |
|    MemoryBlobStore (staged blobs)             |
|    current head (latest commit reference)     |
+----------------------------+------------------+
      ^           ^          |
      |           |          |  checkout
   commit      add_blob      |
      |           |          v
+----------------------------+------------------+
|                 Application                   |
+-----------------------------------------------+
```
Repository::pull reads the branch metadata, loads the referenced commit, and couples that history with a fresh MemoryBlobStore staged area plus a reader for the repository's existing blobs. Workspace methods then stage edits locally: Workspace::put (the helper that adds blobs) writes application data into the in-memory store while Workspace::commit converts the new TribleSet into blobs and advances the current head pointer. Applications hydrate their views with Workspace::checkout, which gathers the selected commits and returns the assembled trible set to the caller. When the changes are ready to publish, Repository::try_push enumerates the staged blobs, uploads them into the repository blob store, creates updated branch metadata, and performs the compare-and-set branch update before clearing the staging area. Because every blob is addressed by its hash, repositories can safely share data through any common storage without coordination.
The boundaries between layers encourage modular tooling. A CLI client can operate entirely within a workspace while a sync service automates pushes and pulls between repositories. As long as components honour the blob and branch store contracts they can evolve independently without risking the core guarantees of Trible Space.
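The following sketch compresses that flow into one program using only the calls from the Getting Started chapter (an in-memory repository and the bundled `examples::literature` attributes), so each layer boundary appears in order: pull, commit, checkout, push.

```rust
use triblespace::examples::literature;
use triblespace::prelude::*;
use triblespace::repo::{memoryrepo::MemoryRepo, Repository};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Repository = blob store + branch store.
    let mut repo = Repository::new(MemoryRepo::default(), SigningKey::generate(&mut OsRng));
    let branch_id = repo.create_branch("main", None).expect("create branch");

    // pull: branch head + fresh staging area -> Workspace.
    let mut ws = repo.pull(*branch_id).expect("pull workspace");

    // commit: stage a TribleSet locally; nothing is shared yet.
    let author = ufoid();
    ws.commit(
        entity! { &author @ literature::lastname: "Herbert" },
        Some("add author"),
    );

    // checkout: hydrate the accumulated history into a TribleSet.
    let _catalog = ws.checkout(..)?;

    // push: upload the staged blobs and advance the branch head.
    repo.push(&mut ws).expect("publish");
    Ok(())
}
```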
Query Engine
Queries describe the patterns you want to retrieve. The engine favors extreme
simplicity and aims for predictable latency and skew resistance without any
tuning. Every constraint implements the
Constraint trait so operators, sub-languages, and
even alternative data sources can compose cleanly. Query evaluation is
expressed as a negotiation between constraints, but the contract stays tiny:
constraints report which variables they touch, estimate how many candidates
remain for each variable, and can enumerate concrete values on demand. Those
estimates feed directly into the depth-first search; there is no standalone
planner to build or cache.
The constraint API mirrors the mental model you would use when reasoning about
a query by hand. Constraints expose their
VariableSet, provide estimate methods so the
engine can choose the next variable to bind, and implement propose to extend
partial assignments. Composite constraints, such as unions, may also override
confirm to tighten previously gathered proposals. This cooperative protocol
keeps the engine agnostic to where the data comes from—be it in-memory
indices, remote stores, or bespoke application predicates.
Search Loop
The engine executes a query as a depth-first search over partial bindings. The loop integrates the cardinality heuristics directly instead of running a separate planning phase. Picking a different heuristic simply yields another member of the Atreides join family:
- Initialisation – When a `Query` is constructed it asks the root constraint for its variable set. The engine records each variable's `influence` so it knows which estimates to refresh when bindings change, computes an initial `estimate` for every variable, and sorts the yet-unbound list using those numbers.
- Propose (and confirm) – The engine pops the most selective variable from the sorted list and calls `propose` to collect candidate values that respect the current binding. Intersection constraints such as those built by `and!` reorder their children by increasing estimate and call `confirm` on the remaining branches so inconsistent candidates are filtered out before the engine commits to them. Constraints that observe the partial assignment simply avoid proposing or confirming values they already know will fail.
- Bind or Backtrack – If `propose` yielded values, the engine binds one, marks that variable as touched, and recurses. If no candidates remain it backtracks: the binding is undone, the variable returns to the unbound set, and the touched marker ensures dependent estimates refresh before the next attempt.
- Yield – Once every variable is bound the post-processing closure runs and the tuple is produced as a result row.
Between iterations the engine refreshes estimates for any variables influenced by the newly bound values. Because constraints only observe the bindings they explicitly depend on, unrelated sub-queries interleave naturally and new constraint types can participate without custom planner hooks.
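A compact way to internalise the loop is to sketch it outside the crate. The trait and types below are illustrative stand-ins for the real `Constraint` API (influence tracking and estimate caching are omitted); they only show the propose/confirm/bind/backtrack shape described above.

```rust
// Toy sketch of the guided depth-first search, not the crate's implementation.
type Var = usize;
type Val = u64;

trait ToyConstraint {
    fn touches(&self, var: Var) -> bool;
    fn estimate(&self, var: Var, binding: &[Option<Val>]) -> usize;
    fn propose(&self, var: Var, binding: &[Option<Val>]) -> Vec<Val>;
    fn confirm(&self, var: Var, binding: &[Option<Val>], candidates: &mut Vec<Val>);
}

fn solve(constraints: &[&dyn ToyConstraint], binding: &mut [Option<Val>], out: &mut Vec<Vec<Val>>) {
    // Pick the unbound variable with the smallest current estimate.
    let mut next: Option<(Var, usize)> = None;
    for var in 0..binding.len() {
        if binding[var].is_some() {
            continue;
        }
        let mut best = usize::MAX;
        for c in constraints {
            if c.touches(var) {
                best = best.min(c.estimate(var, binding));
            }
        }
        if next.map_or(true, |(_, e)| best < e) {
            next = Some((var, best));
        }
    }

    // Yield: every variable is bound, so emit a result row.
    let Some((var, _)) = next else {
        out.push(binding.iter().map(|b| b.unwrap()).collect());
        return;
    };

    // Propose from the tightest touching constraint, confirm with the rest,
    // so no cross product is ever materialised.
    let mut proposer: Option<(usize, usize)> = None;
    for (i, c) in constraints.iter().enumerate() {
        if c.touches(var) {
            let e = c.estimate(var, binding);
            if proposer.map_or(true, |(_, best)| e < best) {
                proposer = Some((i, e));
            }
        }
    }
    let Some((proposer, _)) = proposer else { return };
    let mut candidates = constraints[proposer].propose(var, binding);
    for (i, c) in constraints.iter().enumerate() {
        if i != proposer && c.touches(var) {
            c.confirm(var, binding, &mut candidates);
        }
    }

    // Bind or backtrack: recurse on each candidate, then undo the binding.
    for val in candidates {
        binding[var] = Some(val);
        solve(constraints, binding, out);
    }
    binding[var] = None;
}
```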
Queries as Schemas
You might notice that trible.space does not define a global ontology or schema
beyond associating attributes with a
ValueSchema or
BlobSchema. This is deliberate. The semantic web
taught us that per-value typing, while desirable, was awkward in RDF: literal
datatypes are optional, custom types need globally scoped IRIs and there is no
enforcement, so most data degenerates into untyped strings. Trying to regain
structure through global ontologies and class hierarchies made schemas rigid
and reasoning computationally infeasible. Real-world data often arrives with
missing, duplicate or additional fields, which clashes with these global,
class-based constraints.
Our approach is to be sympathetic to edge cases and have the system deal only with the data it declares itself capable of handling. These application-specific schema declarations are exactly the shapes and constraints expressed by our queries[^1]. Data not conforming to these queries is simply ignored by definition, as a query only returns data satisfying its constraints.[^2]
Join Strategy
The query engine uses the Atreides family of worst-case optimal join
algorithms. These algorithms leverage the same cardinality estimates surfaced
through Constraint::estimate to guide the depth-first search over variable
bindings, providing skew-resistant and predictable performance. Because the
engine refreshes those estimates inside the search loop, the binding order
adapts whenever a constraint updates its influence set—there is no separate
planning artifact to maintain. For a detailed discussion, see the Atreides
Join
chapter.
Query Languages
Instead of a single query language, the engine exposes small composable
constraints that combine with logical operators such as and and or. These
constraints are simple yet flexible, enabling a wide variety of operators while
still allowing the engine to explore the search space efficiently.
The query engine and data model are flexible enough to support many query styles, including graph, relational and document-oriented queries. Constraints may originate from the database itself (such as attribute lookups), from custom application logic, or from entirely external sources.
For example, the pattern! and
entity! macros—available at the crate root and re-exported
via triblespace::prelude (for instance with
use triblespace::prelude::*;)—generate constraints for a given trible pattern in
a query-by-example style reminiscent of SPARQL or GraphQL but tailored to a
document-graph data model. It would also be possible to layer a property-graph
language like Cypher or a relational language like Datalog on top of the
engine.[^3]
```rust
use std::collections::HashSet;

use triblespace::examples::literature;
use triblespace::prelude::*;
use triblespace::prelude::valueschemas::ShortString;

fn main() {
    let mut kb = TribleSet::new();

    let author = ufoid();
    let book = ufoid();

    kb += entity! { &author @
        literature::firstname: "Frank",
        literature::lastname: "Herbert",
    };
    kb += entity! { &book @
        literature::author: &author,
        literature::title: "Dune",
    };

    let mut allowed = HashSet::<Value<ShortString>>::new();
    allowed.insert("Frank".to_value());

    let results: Vec<_> = find!(
        (title: Value<_>, firstname: Value<_>),
        and!(
            allowed.has(firstname),
            pattern!(&kb, [
                { _?person @
                    literature::firstname: ?firstname,
                    literature::lastname: "Herbert",
                },
                { literature::author: _?person,
                  literature::title: ?title,
                }
            ])
        )
    )
    .collect();

    assert_eq!(results.len(), 1);
    assert_eq!(results[0].0, "Dune".to_value());
}
```
The snippet above demonstrates how typed attribute constraints, user-defined
predicates (the HashSet::has filter), and reusable namespaces can mix
seamlessly within a single query.
Great care has been taken to ensure that query languages with different styles and semantics can coexist and even be mixed with other languages and data models within the same query. For practical examples of the current facilities, see the Query Language chapter.
[^1]: Note that this query-schema isomorphism isn't necessarily true in all databases or query languages, e.g., it does not hold for SQL.
[^2]: In RDF terminology: We challenge the classical A-Box & T-Box dichotomy by replacing the T-Box with a "Q-Box", which is descriptive and open rather than prescriptive and closed. This Q-Box naturally evolves with new and changing requirements, contexts and applications.
[^3]: SQL would be a bit more challenging, as it is surprisingly imperative with its explicit JOINs and ORDER BYs, and its lack of a clear declarative semantics. This makes it harder to implement on top of a constraint-based query engine tailored towards a more declarative and functional style.
The Atreides Family of Worst-case Optimal Join Algorithms
The query engine reasons about data by solving a set of constraints over variables. Instead of constructing a traditional left-deep or bushy join plan, it performs a guided depth-first search that binds one variable at a time. The approach draws on the broader theory of worst-case optimal joins and lets us navigate the search space directly rather than materialising intermediate results.
Constraints as the search frontier
Every constraint implements the Constraint trait,
which exposes four abilities that shape the search:
- `estimate` – predicts how many results remain for a variable under the current partial binding.
- `propose` – enumerates candidate values for a variable.
- `confirm` – filters a set of candidates without re-enumerating them.
- `influence` – reports which other variables need their estimates refreshed when this variable changes.
Traditional databases rely on a query planner to combine statistics into a join plan. Atreides instead consults the constraints directly while it searches. Each constraint can base its estimates on whatever structure it maintains—hash maps, precomputed counts, or even constant values for predicates that admit at most one match—so long as it can provide a quick upper bound. Whenever a binding changes, the engine asks the influenced constraints for fresh estimates. Those estimates are cached per variable and reused until another binding invalidates them, keeping the guidance loop responsive as the search progresses.
Because the heuristics are derived entirely from the constraints themselves, we do not need a separate query planner or multiple join implementations. Any custom constraint can participate in the same search by providing sensible estimates, proposal generation, confirmation, and influence tracking.
A spectrum of Atreides variants
The Atreides "family" refers to the spectrum of heuristics a constraint can use
when implementing Constraint::estimate. Each
variant exposes the same guided depth-first search, but with progressively
tighter cardinality guidance. In practice they all revisit their estimates when
bindings change; what differs is what quantity they approximate:
- Row-count Join (Jessica) estimates the remaining search volume for the entire constraint. If one variable is bound but two others are not, Jessica multiplies the candidate counts for the unbound pair (|b| × |c|) and reports that larger product. The number can wildly overshoot the next variable's frontier, yet it often tracks the overall work the constraint will perform.
- Distinct-value Join (Paul) narrows the focus to a single variable at a time. It returns the smallest proposal buffer the constraint could produce for any still-unbound variable, ignoring later confirmation filters. This is the behaviour exercised by `Query::new` today, which keeps the tightest candidate list on hand while the search walks forward.
- Partial-binding Join (Ghanima) goes further by measuring the size of the actual proposal the composite constraint can deliver for the current binding and chosen variable. For an `and` constraint this corresponds to the intersection of its children after they have applied their own filtering, revealing how many candidates truly survive the local checks.
- Exact-result Join (Leto) is an idealised limit where a constraint predicts how many of those proposed values extend all the way to full results once the remaining variables are also bound. Although no constraint currently achieves this omniscience, the interface supports it conceptually.
All four share the same implementation machinery; the difference lies in how
aggressively estimate compresses the constraint's knowledge. Even when only
partial information is available the search still functions, but better
estimates steer the traversal directly toward the surviving tuples.
Every constraint can decide which rung of this ladder it occupies. Simple
wrappers that only track total counts behave like Jessica, those that surface
their tightest per-variable proposals behave like Paul, and structures capable
of intersecting their children on the fly approach Ghanima's accuracy. The
engine does not need to know which variant it is running—estimate supplies
whatever fidelity the data structure can provide, and influence ensures that
higher quality estimates refresh when relevant bindings change.
Guided depth-first search
When a query starts, Query::new collects the
initial estimates and influence sets, sorts the unbound variables so the
tightest constraints are considered first, and caches per-variable proposal
buffers that can be reused across backtracking steps. The engine then walks the
search space as follows:
- Inspect the unbound variables.
- Refresh the cached estimates for any variables whose constraints were influenced by the latest binding.
- Pick the next variable to bind by sorting the unbound set on two criteria:
  - the base‑2 logarithm of the estimate (smaller estimates are tried first),
  - the number of other variables the constraints could influence (ties favour the most connected variable, which tends to prune the search faster).
- Ask the relevant constraints to `propose` candidates for that variable. Composite constraints enumerate the tightest member and call `confirm` on the rest so that each candidate is checked without materialising cross products.
- Push the candidates onto a stack and recurse until every variable is bound or the stack runs empty, in which case the engine backtracks.
Traditional databases rely on sorted indexes to make the above iteration tractable. Atreides still performs random lookups when confirming each candidate, but the cardinality hints let it enumerate the most selective constraint sequentially and probe only a handful of values in the wider ones. Because the search is depth-first, the memory footprint stays small and the engine can stream results as soon as they are found.
Consider a query that relates ?person to ?parent and ?city. The search
begins with all three variables unbound. If ?city only has a handful of
possibilities, its estimate will be the smallest, so the engine binds ?city
first. Each city candidate is checked against the parent and person constraints
before the search continues, quickly rejecting infeasible branches before the
higher-cardinality relationships are explored.
Per-variable estimates in practice
Suppose we want to answer the following query:
```
(find [?person ?parent ?city]
  [?person :lives-in ?city]
  [?person :parent ?parent]
  [?parent :lives-in ?city])
```
There are three variables and three constraints. Every constraint can provide a cardinality hint for each variable it touches, and the combined query records the tightest estimate for each variable:
| Variable | Contributing constraints (individual estimates) | Stored estimate |
|---|---|---|
| `?person` | `?person :lives-in ?city` (12), `?person :parent ?parent` (40) | 12 |
| `?parent` | `?person :parent ?parent` (40), `?parent :lives-in ?city` (6) | 6 |
| `?city` | `?person :lives-in ?city` (12), `?parent :lives-in ?city` (6) | 6 |
The estimates are scoped to individual variables even when no single constraint
covers the whole tuple. The engine chooses the variable with the tightest bound,
?parent, and asks the constraints that mention it for proposals. Each
candidate parent immediately passes through the ?parent :lives-in ?city
constraint, which usually narrows the possible cities to a handful. Those
cities, in turn, constrain the possible ?person bindings. If a branch fails —
for example because no child of the selected parent lives in the same city — the
engine backtracks and tries the next parent. The smallest estimated constraints
therefore guide the search towards promising combinations and keep the
depth-first traversal from thrashing through unrelated values.
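For concreteness, the same query could be phrased with the crate's `find!` and `pattern!` macros roughly as follows. The `family` namespace and its `lives_in`/`parent` attributes are invented for this sketch; their ids are derived from the attribute names, as `attributes!` allows when no hex id is given.

```rust
use triblespace::prelude::*;

mod family {
    use triblespace::prelude::*;
    use triblespace::prelude::valueschemas::GenId;

    attributes! {
        // Illustrative attributes; ids are derived from the names.
        pub lives_in: GenId;
        pub parent: GenId;
    }
}

fn main() {
    let mut kb = TribleSet::new();
    let alice = ufoid();
    let bob = ufoid();
    let caladan = ufoid();
    kb += entity! { &bob @ family::lives_in: &caladan };
    kb += entity! { &alice @ family::lives_in: &caladan, family::parent: &bob };

    // The three constraints from the table, joined on their shared variables.
    let rows: Vec<_> = find!((person: Value<_>, parent: Value<_>, city: Value<_>),
        pattern!(&kb, [
            { ?person @ family::lives_in: ?city, family::parent: ?parent },
            { ?parent @ family::lives_in: ?city }
        ]))
        .collect();
    assert_eq!(rows.len(), 1);
}
```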
Implementation notes
- During iteration the engine keeps a stack of bound variables, a `Binding` structure holding the active assignments, and a `touched_variables` set that marks which estimates need refreshing before the next decision point.
- Highly skewed data still behaves predictably: even if one attribute dominates the dataset, the other constraints continue to bound the search space tightly and prevent runaway exploration.
- The per-variable proposal buffers allow repeated proposals without reallocating, which is especially helpful when backtracking over large domains.
Why worst-case optimal?
Worst-case optimal join algorithms guarantee that their running time never exceeds the size of the output by more than a constant factor, even for pathological inputs. The Atreides family retains this property because the search always explores bindings in an order that honours the cardinality bounds supplied by the constraints. As a result the engine remains robust under heavy skew, sparse joins, and high-dimensional queries without requiring a bespoke join plan for each case.
This combination of simple heuristics, incremental estimates, and a disciplined search strategy keeps the implementation straightforward while delivering the performance characteristics we need for real-world workloads.
Query Language
This chapter introduces the core query facilities provided by triblespace. A
query is described in a small declarative language that states which values
should match instead of spelling out the iteration strategy. When you read a
query, you are effectively looking at a logical statement about the data: if
the constraints can all be satisfied, then the variable bindings are produced
as results. The declarative style gives the engine freedom to reorder work and
choose efficient execution strategies.
Every macro shown here is a convenience wrapper around a concrete
Constraint implementation. When you need finer
control—or want to assemble constraints manually outside the provided
macros—reach for the corresponding builder types in
tribles::query.
Declaring a query
The find! macro builds a
Query by declaring variables and a constraint
expression. The macro mirrors Datalog syntax: the head ((...)) lists the
variables you want back, and the body describes the conditions they must meet.
A minimal invocation looks like this:
```rust
#![allow(unused)]
fn main() {
    let results = find!((a), a.is(1.into())).collect::<Vec<_>>();
}
```
find! returns an Iterator over tuples of the bound
variables. Matches can be consumed lazily or collected into common
collections:
```rust
#![allow(unused)]
fn main() {
    for (a,) in find!((a), a.is(1.into())) {
        println!("match: {a}");
    }
}
```
The head tuple supports pattern destructuring, so adding more variables is as
simple as expanding the list: find!((a, b, c), ...) yields (a, b, c) tuples.
Variables declared in the head can be reused multiple times inside the
constraint to express joins. When a variable appears in several clauses the
engine ensures every occurrence binds to the same value. Repeating a variable in
two patterns, for example, restricts the result set to entities that satisfy
both attribute assignments simultaneously. The order of declarations defines the
shape of the tuple in the iterator, so reorganising the head changes how you
destructure results.
Typed variables
Variables optionally include a concrete type to convert the underlying value.
The constraint phase still works with untyped Value
instances; conversion happens when the tuple is emitted. These conversions use
FromValue and panic if decoding to the requested
type fails. Bind the variable as a Value<_> and reach
for TryFromValue yourself when you want to
surface conversion errors instead of panicking.
```rust
#![allow(unused)]
fn main() {
    find!((x: i32, y: Value<ShortString>),
        and!(x.is(1.into()), y.is("foo".to_value())));
}
```
The first variable is read as an i32 and the second as a short string if the
conversion succeeds. The query engine walks all possible assignments that
satisfy the constraint and yields tuples of the declared variables in the order
they appear in the head.
Collecting results
Any type that implements FromIterator can collect
the results of a query. Vec<_> is common for tests and examples, while
HashSet<_> is useful when the match order is irrelevant. When you only need
the first result, call iterator adapters such as next, find, or try_fold
to avoid materializing the full result set.
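As a short sketch (borrowing the `examples::dataset()` helper used later in this chapter), both styles look like this:

```rust
use std::collections::HashSet;
use triblespace::examples::{self, literature};
use triblespace::prelude::*;

fn main() {
    let dataset = examples::dataset();

    // Collect into a HashSet when the match order is irrelevant.
    let titles: HashSet<_> = find!((title: Value<_>),
        pattern!(&dataset, [{ _?book @ literature::title: ?title }]))
        .collect();

    // Or stop after the first match instead of materializing every row.
    let any_title = find!((title: Value<_>),
        pattern!(&dataset, [{ _?book @ literature::title: ?title }]))
        .next();

    println!("{} distinct titles, at least one match: {}",
        titles.len(), any_title.is_some());
}
```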
Built-in constraints
find! queries combine a small set of constraint operators to form a
declarative language for matching tribles. Each operator implements
Constraint and can therefore be mixed and nested
freely.
| Macro | Purpose | Notes |
|---|---|---|
| `and!` | Require every sub-constraint to hold | Builds an `IntersectionConstraint`. |
| `or!` | Accept any satisfied alternative | Produces a `UnionConstraint` whose branches must reference the same variables. |
| `ignore!` | Drop variables from a sub-query | Removes the listed variables from planning so a constraint can contribute only along the remaining bindings. |
| `temp!` | Mint hidden helper variables | Allocates fresh bindings for the nested expression so the helpers can join across patterns without being projected. |
| `pattern!` | Match attribute assignments in a collection | Expands to a `TriblePattern`-backed constraint that relates attributes and values for the same entity. |
| `pattern_changes!` | Track attribute updates incrementally | Builds a `TriblePattern` constraint that yields newly added triples from a change set because incremental evaluation stays monotonic; see Incremental Queries for the broader evaluation workflow. |
| `.is(...)` | Pin a variable to a constant | Wraps a `ConstantConstraint` that compares the binding against a literal value. |
| `has` | Check membership in a collection | Collections such as `HashSet` expose `.has(...)` when they implement `ContainsConstraint`; triple stores like `TribleSet` instead participate through `pattern!`. |
Any data structure that can iterate its contents, test membership, and report
its size can implement ContainsConstraint. Membership constraints are
particularly handy for single-column collections such as sets or map key views,
while multi-position sources like TribleSet rely on pattern! to keep entity,
attribute, and value bindings aligned.
Constant matches (is)
Call Variable::is when you need a binding to
equal a specific value. The method returns a
ConstantConstraint
that checks whether the solver can assign the variable to the provided
Value. Constant constraints behave like any other
clause: combine them with and! to narrow a variable after other constraints
have proposed candidates, or place them inside or! branches to accept
multiple literals.
```rust
#![allow(unused)]
fn main() {
    find!((title: Value<_>),
        and!(dataset.has(title), title.is("Dune".to_value())));
}
```
The snippet above keeps only the rows where title equals "Dune". Because
is constrains the variable's value rather than projecting a new binding, it
is also handy for helpers such as temp! when you want to filter hidden
bindings without exposing them in the result tuple.
pattern! and pattern_changes! construct constant constraints for literal
values automatically, so you often get the same behaviour simply by writing the
desired value in the pattern:
```rust
#![allow(unused)]
fn main() {
    find!((friend: Value<_>),
        pattern!(&dataset, [{
            _?person @
            social::friend: ?friend,
            social::city: "Caladan"
        }]));
}
```
Repeating .is(...) on the same variable with different values causes the
query to fail—just as conflicting pattern! clauses would—so prefer or! (or
switch to a membership helper such as .has(...)) when you want to accept
several constants.
Intersections (and!)
and! combines multiple constraints that must all hold
simultaneously. Each sub-clause can introduce new bindings or further narrow
existing ones, and the solver is free to reorder the work to reduce the search
space. When a sub-constraint fails to produce a candidate that is compatible
with the current bindings, the whole conjunction rejects that branch and moves
on. The macro accepts any number of arguments, so and!(...) is often a
convenient way to keep related clauses together without nesting additional
find! calls:
```rust
#![allow(unused)]
fn main() {
    let favourites = favourite_titles(); // e.g. a HashSet<Id> built elsewhere

    find!((book: Value<_>, author: Value<_>),
        and!(
            favourites.has(book),
            pattern!(&dataset, [{
                ?book @
                literature::title: "Dune",
                literature::author: ?author
            }])
        ));
}
```
Here the membership test over favourites and the attribute pattern from
dataset run as part of the same conjunction. The solver joins them on their
shared bindings (book and author) so only tuples that satisfy every clause
make it into the result set. Because and! simply returns a constraint, you
can nest it inside other combinators such as temp!, ignore!, or or! to
structure queries however you like.
Alternatives (or!)
Use or! to express alternatives. Each branch behaves
like an independent constraint and may introduce additional bindings that
participate in the surrounding query, provided every branch mentions the same
set of variables:
```rust
#![allow(unused)]
fn main() {
    find!((alias: Value<_>),
        temp!((entity),
            or!(
                pattern!(&dataset, [{ ?entity @ profile::nickname: ?alias }]),
                pattern!(&dataset, [{ ?entity @ profile::display_name: ?alias }])
            )));
}
```
Each branch contributes every match it can produce given the current bindings.
In the example above, people who have both a nickname and a display name yield
two rows—one for each attribute—because the solver keeps the union of all
solutions to preserve the query's monotonic semantics. Branches that cannot
match simply contribute nothing. Because each branch is still a full constraint,
combine or! with temp! when you need hidden helpers or wrap portions in
ignore! to drop positions that do not matter for a particular alternative. If
two branches reference different variables the macro panics at runtime, so keep
the variable sets aligned even when some branches ignore portions of a
constraint.
Ignoring bindings (ignore!)
Ignored variables are handy when a sub-expression references fields you want to
drop. The IgnoreConstraint
subtracts the listed bindings from the constraint's
VariableSet, so the planner never attempts to
join them with the outer query, project them into the results, or even solve
for those positions. From the solver's perspective those slots vanish
completely—it keeps evaluating the remaining bindings while treating the
ignored ones as don't-care wildcards. Triple-based constraints, for example,
always propose entity/attribute/value combinations; wrapping them in
ignore!((value), ...) continues to constrain the entity and attribute while
discarding the value column entirely. Clauses that reference at least one
surviving variable still run and continue to narrow those bindings. If a branch
mentions only ignored variables there is nothing left to relate to the outer
query, so the planner has no variable along which to schedule it; the inner
constraint is never consulted and the expression behaves as though it were
omitted.
The identifiers you list inside ignore! expand to fresh bindings scoped to
the nested expression, but subtracting them from the outer plan means the solver
never unifies those bindings—or even asks the constraint to propose values for
them. Even if you repeat the same name across multiple clauses, each occurrence
behaves like an independent wildcard. Reach for temp! when you
want a hidden variable to participate in the surrounding plan without being
projected; reach for ignore! when you want to use a multi-column constraint
while only keeping some of its positions.
This inertness is a strict scoping rule rather than existential quantification.
If you need to assert that some related fact exists without reifying its value,
structure the pattern so the surviving variables capture that dependency. The
macro automatically uses the ambient context that find! or matches!
provides, so typical invocations only specify the variable list and nested
constraint:
```rust
#![allow(unused)]
fn main() {
    find!((person: Value<_>),
        ignore!((street_value),
            pattern!(&dataset, [{ ?person @ contacts::street: ?street_value }])));
}
```
Here the pattern still constrains person because the triple ties the entity to
an attribute, yet the actual street string is ignored. Had the block mentioned
only street_value, the entire expression would have evaporated—there would be
no remaining link to the rest of the query—so the outer query would not learn
anything. Reusing street_value elsewhere in the ignored expression also does
not force the sub-clauses to agree, because each occurrence is treated as its
own wildcard; introduce a temp! binding when you need the same hidden value to
appear in multiple places.
Temporary variables (temp!)
Real queries often need helper bindings that participate in the joins but do
not show up in the result tuple. Wrap the relevant constraint with
temp!((...vars...), expr) to mint hidden variables and evaluate expr with
them in scope:
```rust
#![allow(unused)]
fn main() {
    find!((person: Value<_>),
        temp!((friend),
            and!(
                pattern!(&dataset, [{
                    _?p @
                    social::person: ?person,
                    social::friend: ?friend
                }]),
                pattern!(&dataset, [{ ?friend @ social::city: "Caladan" }])
            )));
}
```
The helper binding friend links the two patterns, ensuring the same entity is
used across both clauses without expanding the result tuple. temp! can create
multiple variables at once (temp!((street, city), ...)). Like ignore!, you
always wrap the hidden bindings in a tuple, so each invocation reads
temp!((...vars...), ...). Here social would be a namespace module exporting
the person, friend, and city attributes. The variables adopt the value
schemas implied by the constraints they appear in, so no extra annotations are
required. When working outside the query macros, call
VariableContext::next_variable
directly instead.
When the helper variable lives entirely within a single pattern, consider using
_?alias instead of temp!. Both pattern! and
pattern_changes! support _?ident placeholders that
mint fresh bindings scoped to that one macro invocation. They behave like
non-projected variables: you can reuse the _?ident multiple times inside the
pattern to relate attributes, but the binding vanishes once control leaves the
macro. Reach for temp! when the helper must span several constraints or when
you need to reuse the same hidden binding across multiple patterns.
Example
```rust
#![allow(unused)]
fn main() {
    use triblespace::prelude::*;
    use triblespace::examples::{self, literature};

    let dataset = examples::dataset();

    for (title,) in find!((title: Value<_>),
        and!(dataset.has(title), title.is("Dune".to_value())))
    {
        println!("Found {}", title.from_value::<&str>());
    }
}
```
This query searches the example dataset for the book titled "Dune". The variables and constraint can be adapted to express more complex joins and filters. For instance, you can introduce additional variables to retrieve both the title and the author while sharing the same dataset predicate:
```rust
#![allow(unused)]
fn main() {
    for (title, author) in find!((title: Value<_>, author: Value<_>),
        and!(
            title.is("Dune".to_value()),
            pattern!(&dataset, [{
                _?book @
                literature::title: ?title,
                literature::author: ?author
            }])
        ))
    {
        println!("{title} was written by {}", author.from_value::<&str>());
    }
}
```
The extra variables participate in the join automatically; no explicit loop nesting or indexing is required.
Attribute patterns (pattern!)
The pattern! macro provides a concise way to match entities by attribute
assignments. It expands to a constraint that can be used directly inside
find!.
Important: in pattern! values prefixed with ? refer to variables declared
in the surrounding find! head while string/number literals and more complex
expressions are treated as literal values. Use _?name when you need a fresh
variable that is scoped to a single macro invocation; you can reference it
multiple times within the same pattern without adding it to the find! head.
Parenthesised expressions remain supported for explicit literals.
```rust
#![allow(unused)]
fn main() {
    let mut kb = TribleSet::new();
    let e = ufoid();
    kb += entity! { &e @
        literature::firstname: "William",
        literature::lastname: "Shakespeare"
    };

    let results: Vec<_> = find!((ee: Id),
        pattern!(&kb, [{ ?ee @ literature::firstname: "William" }]))
        .collect();

    assert_eq!(results.len(), 1);
}
```
Patterns may contain multiple clauses and reuse _? bindings to relate
attributes without introducing extra columns in the result set. A single
_?person variable can connect several attribute/value pairs while staying
scoped to the pattern:
```rust
#![allow(unused)]
fn main() {
    let mut kb = TribleSet::new();
    let e = ufoid();
    kb += entity! { &e @
        literature::firstname: "Frank",
        literature::lastname: "Herbert"
    };

    let author_last_names: Vec<_> = find!((last: Value<_>),
        pattern!(&kb, [{
            _?person @
            literature::firstname: "Frank",
            literature::lastname: ?last
        }])
    ).collect();
}
```
Here _?person remains scoped to the pattern while ensuring both attributes are
drawn from the same entity. When a pattern references collections other than a
TribleSet, ensure the collection implements
TriblePattern so that the macro can materialize
the requested triples.
To share a hidden binding across multiple patterns, declare it once with
temp! and reference it with ?name from each clause:
```rust
#![allow(unused)]
fn main() {
    let mut kb = TribleSet::new();
    let alice = ufoid();
    let bob = ufoid();
    kb += entity! { &alice @ social::name: "Alice", social::friend: &bob };
    kb += entity! { &bob @ social::name: "Bob" };

    let results: Vec<_> = find!((friend_name: Value<_>),
        temp!((friend),
            and!(
                pattern!(&kb, [{
                    _?person @
                    social::friend: ?friend,
                    social::name: ?friend_name
                }]),
                pattern!(&kb, [{ ?friend @ social::name: "Bob" }])
            )))
        .collect();
}
```
The _?person variable is still local to the first pattern, while friend
joins the two constraints without changing the projected results. As above,
social denotes a namespace that defines the name and friend attributes.
matches!
Sometimes you only want to check whether a constraint has any solutions. The
matches! macro mirrors the find! syntax but returns a boolean:
```rust
#![allow(unused)]
fn main() {
    use triblespace::prelude::*;

    assert!(matches!((x), x.is(1.into())));
    assert!(!matches!((x), and!(x.is(1.into()), x.is(2.into()))));
}
```
Internally, matches! stops as soon as the first result is found. It is a
lightweight alternative to find! when the mere existence of a match matters
more than the actual bindings.
Custom constraints
Every building block implements the
Constraint trait. You can implement this trait on
your own types to integrate custom data sources or query operators with the
solver. Collections that want to power pattern! implement
TriblePattern so they can materialize the
entity/attribute/value triples a pattern asks for. Membership-style helpers
such as has(...) work with anything that implements
ContainsConstraint, making it easy to join
against pre-existing indexes, caches, or service clients without copying data
into a TribleSet.
#![allow(unused)] fn main() { use std::collections::HashSet; use triblespace::prelude::*; use triblespace::prelude::valueschemas::ShortString; use triblespace::query::hashsetconstraint::SetConstraint; struct ExternalTags<'a> { tags: &'a HashSet<String>, } impl<'a> ContainsConstraint<'a, ShortString> for ExternalTags<'a> { type Constraint = SetConstraint<ShortString, &'a HashSet<String>, String>; fn has(self, variable: Variable<ShortString>) -> Self::Constraint { SetConstraint::new(variable, self.tags) } } let tags: HashSet<String> = ["rust", "datalog"].into_iter().map(String::from).collect(); let external = ExternalTags { tags: &tags }; let matches: Vec<_> = find!((tag: Value<ShortString>), external.has(tag)).collect(); }
The example wraps an external HashSet so it can be queried directly. A
TriblePattern implementation follows the same shape: create a constraint
type that reads from your backing store and return it from pattern. The query
engine drives both traits through Constraint, so any data source that can
estimate, propose, and confirm candidate values can participate in find!.
Regular path queries
The path! macro lets you search for graph paths matching a regular
expression over edge attributes. It expands to a
RegularPathConstraint and can be
combined with other constraints. Invoke it through a namespace module
(social::path!) to implicitly resolve attribute names:
#![allow(unused)] fn main() { use triblespace::prelude::*; mod social { use triblespace::prelude::*; use triblespace::prelude::valueschemas::*; attributes! { "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" as follows: GenId; "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB" as likes: GenId; } } let mut kb = TribleSet::new(); let a = fucid(); let b = fucid(); let c = fucid(); kb += entity!{ &a @ social::follows: &b }; kb += entity!{ &b @ social::likes: &c }; let results: Vec<_> = find!((s: Value<_>, e: Value<_>), path!(&kb, s (social::follows | social::likes)+ e)).collect(); }
You can omit the hex literal in attributes! when you only need local or
short‑lived attributes—the macro then derives a deterministic id from the name
and schema just like Attribute::from_name. Stick with explicit ids when the
attributes form part of a shared protocol.
The middle section uses a familiar regex syntax to describe allowed edge
sequences. Editors with Rust macro expansion support provide highlighting and
validation of the regular expression at compile time. Paths reference
attributes from a single namespace; to traverse edges across multiple
namespaces, create a new namespace that re-exports the desired attributes and
invoke path! through it.
The endpoints of the path behave like ordinary variables. Bind them in the
find! head to join the traversal with additional constraints—for example,
restricting the starting entity or projecting the destination's attributes. If
you want to follow the path but keep one endpoint unprojected, wrap the
traversal in temp! so the hidden binding can participate in follow-up
clauses:
#![allow(unused)] fn main() { let interesting_post = fucid(); let influencers = find!((start: Value<_>), temp!((end), and!(path!(&kb, start social::follows+ end), pattern!(&kb, [{ ?end @ social::likes: interesting_post.to_value() }])))) .collect::<Vec<_>>(); }
Combining path! with other constraints like this enables expressive graph
queries while staying in the same declarative framework as the rest of the
chapter.
Incremental Queries
The query engine normally evaluates a pattern against a complete
TribleSet, recomputing every match from scratch. Applications that
ingest data continuously often only need to know which results are
introduced by new tribles. Tribles supports this with semi‑naive
evaluation, a classic incremental query technique. Instead of running
the whole query again, we focus solely on the parts of the query that
can see the newly inserted facts and reuse the conclusions we already
derived from the base dataset.
Delta evaluation
Given a base dataset and a set of newly inserted tribles, the engine runs the original query multiple times. Each run restricts a different triple constraint to the delta while the remaining constraints see the full set. The union of these runs yields exactly the new solutions. The process is:
- compute a delta TribleSet containing only the inserted tribles,
- for every triple in the query, evaluate a variant where that triple matches against delta,
- union all per‑triple results to obtain the incremental answers.
Because each variant touches only one triple from the delta, the work grows with the number of constraints and the size of the delta set rather than the size of the full dataset.
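Schematically, for a query Q over triple constraints c₁ … cₙ and a freshly inserted delta Δ (the notation here is ours, not part of the crate API):

new_matches(Q, Δ) = ⋃ᵢ Q(c₁, …, cᵢ|Δ, …, cₙ)

where cᵢ|Δ denotes the i-th triple constraint restricted to match only tribles in Δ while every other constraint sees the full dataset.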
In practice the base dataset is often cached or already available to the application. Maintaining a delta set alongside it lets the engine quickly answer “what changed?” without re-deriving prior results. If the delta is empty the engine can skip evaluation entirely, making idle updates effectively free.
Monotonicity and CALM
Removed results are not tracked. Tribles follow the CALM principle: a program whose outputs are monotonic in its inputs needs no coordination. Updates simply add new facts and previously derived conclusions remain valid. When conflicting information arises, applications append fresh tribles describing their preferred view instead of retracting old ones. Stores may forget obsolete data, but semantically tribles are never deleted.
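As a small illustration of this append-only style, the sketch below records a newer assessment for an entity instead of deleting the old one. It assumes a hypothetical review namespace with a ShortString status attribute:

// Hypothetical `review::status` attribute; `report` is an entity we describe twice.
let report = ufoid();
let mut kb = TribleSet::new();
kb += entity! { &report @ review::status: "approved" };
// The assessment changes later: append the newer view instead of retracting.
kb += entity! { &report @ review::status: "needs-revision" };
// Both facts coexist; consumers prefer the one from the latest commit they trust.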
Exclusive IDs and absence checks
Exclusive identifiers tighten the blast radius of non-monotonic logic
without abandoning CALM. Holding an ExclusiveId proves that no other
writer can add tribles for that entity, so checking for the absence of a
triple about that entity becomes stable: once you observe a missing
attribute, no concurrent peer will later introduce it. This permits
existence/absence queries in the narrow scope of entities you own while
keeping global queries monotonic.
Even with that safety net, prefer monotonic reads and writes when possible
because they compose cleanly across repositories. Absence checks should be
reserved for workflows where the ExclusiveId guarantees a closed world
for the entity—such as asserting a default value when none exists or
verifying invariants before emitting additional facts. Outside that
boundary, stick to append-only predicates so derived results remain valid
as new data arrives from other collaborators.
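The sketch below illustrates such a narrow closed-world check, reusing the literature namespace from earlier examples. It assumes e is an ExclusiveId returned by ufoid() and held by this writer, and kb is the TribleSet we are extending:

let missing_title = !matches!(
    (ent, t),
    and!(ent.is(e.to_value()), pattern!(&kb, [{ ?ent @ literature::title: ?t }]))
);
if missing_title {
    // Safe default: nobody else can add a title for `e` behind our back.
    kb += entity! { &e @ literature::title: "Untitled" };
}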
Example
The pattern_changes! macro expresses these delta queries. It behaves
like pattern! but takes the current TribleSet and a precomputed
changeset. The macro unions variants of the query where each triple is
constrained to that changeset, matching only the newly inserted tribles.
It keeps the incremental flow compatible with the familiar find!
interface, so callers can upgrade existing queries without changing how
they collect results.
#![allow(unused)] fn main() { let storage = MemoryRepo::default(); let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng)); let branch_id = repo.create_branch("main", None).expect("branch"); let mut ws = repo.pull(*branch_id).expect("pull"); // Commit initial data let shakespeare = ufoid(); let hamlet = ufoid(); let mut base = TribleSet::new(); base += entity! { &shakespeare @ literature::firstname: "William", literature::lastname: "Shakespeare" }; base += entity! { &hamlet @ literature::title: "Hamlet", literature::author: &shakespeare }; ws.commit(base.clone(), None); let c1 = ws.head().unwrap(); // Commit a new book let macbeth = ufoid(); let mut change = TribleSet::new(); change += entity! { &macbeth @ literature::title: "Macbeth", literature::author: &shakespeare }; ws.commit(change.clone(), None); let c2 = ws.head().unwrap(); // Compute updated state and delta between commits let base_state = ws.checkout(c1).expect("base"); let updated = ws.checkout(c2).expect("updated"); let delta = updated.difference(&base_state); // Find new titles by Shakespeare let results: Vec<_> = find!( (author: Value<_>, book: Value<_>, title: Value<_>), pattern_changes!(&updated, &delta, [ { ?author @ literature::firstname: "William", literature::lastname: "Shakespeare" }, { ?book @ literature::author: ?author, literature::title: ?title } ]) ) .map(|(_, b, t)| (b, t)) .collect(); println!("{results:?}"); }
The example stages an initial commit, records a follow-up commit, and
then compares their checkouts. Feeding the latest checkout alongside the
difference between both states allows pattern_changes! to report just
the additional solutions contributed by the second commit. This mirrors
how an application can react to a stream of incoming events: reuse the
same query, but swap in a fresh delta set each time new data arrives.
Delta maintenance always comes down to set algebra. pattern_changes!
cares only that you hand it a TribleSet containing the fresh facts,
and every convenient workflow for producing that set leans on the same
few operations:
- take two snapshots and difference them to discover what was added;
- union the new facts into whatever baseline you keep cached;
- intersect a candidate subset when you need to focus a change further.
Workspaces showcase this directly. Each checkout materializes a
TribleSet, so comparing two history points is just another snapshot
diff: take the newer checkout, difference it against the older one to
obtain the delta, and hand both sets to pattern_changes!. That matches
the local buffering story as well. Keep a baseline TribleSet for the
current state, accumulate incoming facts in a staging set, and union the
staging set with the baseline to produce the updated snapshot you pass as
the first argument. The delta comes from difference(&updated, &old) or
from the staging set itself when you only stage fresh facts. Reusing the
same set helpers keeps the bookkeeping short, avoids custom mirrors of
the data, and stays efficient no matter where the updates originate.
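A minimal sketch of that buffering workflow, using only the set operations named above and the literature namespace from earlier:

let shakespeare = ufoid();
let hamlet = ufoid();

let mut baseline = TribleSet::new();  // cached current state
baseline += entity! { &shakespeare @ literature::lastname: "Shakespeare" };

let mut staging = TribleSet::new();   // freshly arrived facts for this batch
staging += entity! { &hamlet @ literature::title: "Hamlet", literature::author: &shakespeare };

let mut updated = baseline.clone();
updated += staging;                          // union: baseline ∪ staging
let delta = updated.difference(&baseline);   // exactly the newly added tribles
// `updated` and `delta` are the two arguments pattern_changes! expects.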
Comparing history points
Workspace::checkout accepts commit selectors
which can describe ranges in repository history. Checking out a range
like a..b walks the history from b back toward a, unioning the
contents of every commit that appears along the way but excluding commits
already returned by the a selector. When commits contain only the
tribles they introduce, that checkout matches exactly the fresh facts
added after a. Feeding that delta into pattern_changes! lets us ask,
“What new matches did commit b introduce over a?”
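Reusing ws, c1, and c2 from the example above, and assuming each commit only contains the tribles it introduces, a sketch of that workflow looks like this:

let updated = ws.checkout(ancestors(c2)).expect("updated"); // full state as of c2
let delta = ws.checkout(c1..c2).expect("delta");            // facts introduced after c1
let new_titles: Vec<_> = find!(
    (book: Value<_>, title: Value<_>),
    pattern_changes!(&updated, &delta, [
        { ?book @ literature::title: ?title }
    ])
)
.collect();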
Trade‑offs
- Applications must compute and supply the delta set; the engine does not track changes automatically.
- Queries must remain monotonic since deletions are ignored.
- Each triple incurs an extra variant, so highly selective constraints keep incremental evaluation efficient.
- Delta sets that grow unboundedly lose their advantage. Regularly draining or compacting the changeset keeps semi-naive evaluation responsive.
Schemas
Trible Space stores data in strongly typed values and blobs. A schema
describes the language‑agnostic byte layout for these types: Values always
occupy exactly 32 bytes while Blobs may be any length. Schemas translate
those raw bytes to concrete application types and decouple persisted data from a
particular implementation. This separation lets you refactor to new libraries or
frameworks without rewriting what's already stored or coordinating live
migrations. The crate ships with a collection of ready‑made schemas located in
triblespace::core::value::schemas and
triblespace::core::blob::schemas.
When data crosses the FFI boundary or is consumed by a different language, the schema is the contract both sides agree on. Consumers only need to understand the byte layout and identifier to read the data—they never have to link against your Rust types. Likewise, the Rust side can evolve its internal representations—add helper methods, change struct layouts, or introduce new types—without invalidating existing datasets.
Why 32 bytes?
Storing arbitrary Rust types requires a portable representation. Instead of human‑readable identifiers like RDF's URIs, Tribles uses a fixed 32‑byte array for all values. This size provides enough entropy to embed intrinsic identifiers—typically cryptographic hashes—when a value references data stored elsewhere in a blob. Keeping the width constant avoids platform‑specific encoding concerns and makes it easy to reason about memory usage.
Conversion traits
Schemas define how to convert between raw bytes and concrete Rust types. The
conversion traits ToValue/FromValue and their fallible counterparts live on
the schema types rather than on Value itself, avoiding orphan‑rule issues when
supporting external data types. The Value wrapper treats its bytes as opaque;
schemas may validate them or reject invalid patterns during conversion.
Fallible conversions (TryFromValue / TryToValue) are particularly useful for
schemas that must validate invariants, such as checking that a timestamp falls
within a permitted range or ensuring reserved bits are zeroed. Returning a
domain‑specific error type keeps validation logic close to the serialization
code.
#![allow(unused)] fn main() { use tribles::value::schemas::shortstring::ShortString; use tribles::value::{TryFromValue, TryToValue, Value}; struct Username(String); impl TryToValue<ShortString> for Username { type Error = &'static str; fn try_to_value(&self) -> Result<Value<ShortString>, Self::Error> { if self.0.is_empty() { Err("username must not be empty") } else { self.0 .as_str() .try_to_value::<ShortString>() .map_err(|_| "username too long or contains NULs") } } } impl TryFromValue<'_, ShortString> for Username { type Error = &'static str; fn try_from_value(value: &Value<ShortString>) -> Result<Self, Self::Error> { String::try_from_value(value) .map(Username) .map_err(|_| "invalid utf-8 or too long") } } }
Schema identifiers
Every schema declares a unique 128‑bit identifier via the shared
ConstMetadata::id() hook (for example, <ShortString as ConstMetadata>::id()).
Persisting these IDs keeps serialized data self-describing so other tooling can
make sense of the payload without linking against your Rust types. Dynamic
language bindings (like the Python crate) inspect the stored schema identifier
to choose the correct decoder, while internal metadata stored inside Trible
Space can use the same IDs to describe which schema governs a value, blob, or
hash protocol.
Identifiers also make it possible to derive deterministic attribute IDs when you
ingest external formats. Helpers such as Attribute::<S>::from_name("field")
combine the schema ID with the source field name to create a stable attribute so
re-importing the same data always targets the same column.
The attributes! macro can use the same derivation when you omit the 128-bit id
literal, which is useful for quick experiments or internal attributes; for
schemas that will be shared across binaries or languages, prefer explicit ids so
the column remains stable even if the attribute name later changes.
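As a sketch, deriving an attribute for an imported field might look like the following. It assumes Attribute is reachable through the prelude (adjust the import to the actual module path if it is not) and uses the ShortString schema shown elsewhere in this chapter:

use triblespace::prelude::*;
use triblespace::prelude::valueschemas::ShortString;

// Hashing the field name together with the schema id yields a stable column,
// so re-importing the same document always lands on the same attribute.
let title_attr = Attribute::<ShortString>::from_name("title");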
Built‑in value schemas
The crate provides the following value schemas out of the box:
- GenId – an abstract 128-bit identifier.
- ShortString – a UTF-8 string up to 32 bytes.
- U256BE/U256LE – 256-bit unsigned integers.
- I256BE/I256LE – 256-bit signed integers.
- R256BE/R256LE – 256-bit rational numbers.
- F256BE/F256LE – 256-bit floating point numbers.
- Hash and Handle – cryptographic digests and blob handles (see hash.rs).
- ED25519RComponent, ED25519SComponent and ED25519PublicKey – signature fields and keys.
- NsTAIInterval – encodes time intervals.
- UnknownValue – a fallback when no specific schema is known.
#![allow(unused)] fn main() { use triblespace::core::metadata::ConstMetadata; use triblespace::core::value::schemas::shortstring::ShortString; use triblespace::core::value::{ToValue, ValueSchema}; let v = "hi".to_value::<ShortString>(); let raw_bytes = v.raw; // Persist alongside the schema's metadata id. let schema_id = <ShortString as ConstMetadata>::id(); }
Built‑in blob schemas
The crate also ships with these blob schemas:
- LongString for arbitrarily long UTF‑8 strings.
- SimpleArchive which stores a raw sequence of tribles.
- SuccinctArchiveBlob which stores the SuccinctArchive index type for offline queries. The SuccinctArchive helper exposes high-level iterators while the SuccinctArchiveBlob schema is responsible for the serialized byte layout.
- UnknownBlob for data of unknown type.
#![allow(unused)] fn main() { use triblespace::metadata::ConstMetadata; use triblespace::blob::schemas::longstring::LongString; use triblespace::blob::{Blob, BlobSchema, ToBlob}; let b: Blob<LongString> = "example".to_blob(); let schema_id = <LongString as ConstMetadata>::id(); }
Defining new schemas
Custom formats implement ValueSchema or BlobSchema. A unique identifier
serves as the schema ID. The example below defines a little-endian u64 value
schema and a simple blob schema for arbitrary bytes.
#![allow(unused)] fn main() { pub struct U64LE; impl ConstMetadata for U64LE { fn id() -> Id { id_hex!("0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A") } fn describe() -> ( triblespace::core::trible::TribleSet, triblespace::core::blob::MemoryBlobStore<triblespace::core::value::schemas::hash::Blake3>, ) { ( triblespace::core::trible::TribleSet::new(), triblespace::core::blob::MemoryBlobStore::new(), ) } } impl ValueSchema for U64LE { type ValidationError = Infallible; } impl ToValue<U64LE> for u64 { fn to_value(self) -> Value<U64LE> { let mut raw = [0u8; VALUE_LEN]; raw[..8].copy_from_slice(&self.to_le_bytes()); Value::new(raw) } } impl FromValue<'_, U64LE> for u64 { fn from_value(v: &Value<U64LE>) -> Self { u64::from_le_bytes(v.raw[..8].try_into().unwrap()) } } pub struct BytesBlob; impl ConstMetadata for BytesBlob { fn id() -> Id { id_hex!("B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0") } } impl BlobSchema for BytesBlob {} impl ToBlob<BytesBlob> for Bytes { fn to_blob(self) -> Blob<BytesBlob> { Blob::new(self) } } impl TryFromBlob<BytesBlob> for Bytes { type Error = Infallible; fn try_from_blob(b: Blob<BytesBlob>) -> Result<Self, Self::Error> { Ok(b.bytes) } } }
See examples/custom_schema.rs for the full
source.
Versioning and evolution
Schemas form part of your persistence contract. When evolving them consider the following guidelines:
- Prefer additive changes. Introduce a new schema identifier when breaking compatibility. Consumers can continue to read the legacy data while new writers use the replacement ID.
- Annotate data with migration paths. Store both the schema ID and a logical version number if the consumer needs to know which rules to apply.
- UnknownValue/UnknownBlob allow you to safely defer decoding until a newer binary is available.
- Keep validation centralized. Place invariants in your schema conversions so migrations cannot accidentally create invalid values.
By keeping schema identifiers alongside stored values and blobs you can roll out new representations incrementally: ship readers that understand both IDs, update your import pipelines, and finally switch writers once everything recognizes the replacement schema.
Portability & Common Formats
Tribles aims to be a durable, language-agnostic database. That means the bytes we store must remain meaningful when they cross process, platform, or version boundaries. Rust's native representations are not stable enough for that job, so we model portable schemas that describe how a value should be encoded into a 32-byte buffer.
Why 32 Bytes?
The 32-byte window is a deliberate compromise. It is small enough to move
quickly through memory and across networks, yet it offers enough entropy to hold
intrinsic identifiers such as hashes. When a payload cannot fit within those 32
bytes we store the larger content in a blob and reference it from the value via
its identifier. The uniform size also keeps the storage layer simple: a Value
is always the same size regardless of its schema.
Schemas as Contracts
A schema describes which bit patterns are meaningful for a particular value. The
Value type is parameterised by such a schema and remains
agnostic about whether the underlying bytes currently satisfy the contract.
Validation lives in the schema through the ValueSchema
trait, while conversions to concrete Rust types use ToValue, FromValue,
TryToValue, and TryFromValue. Because conversion traits are implemented
for the schema instead of the Value type itself, we avoid Rust's orphan rule
and allow downstream crates to add their own adapters.
Schemas carry two optional identifiers:
- Value schema ID – Uniquely distinguishes the schema that governs the 32-byte value buffer.
- Blob schema ID – Identifies the schema of any external blob a value may reference.
These identifiers let us document schemas inside the knowledge graph itself. They also provide stable lookup keys for registries or tooling that need to understand how bytes should be interpreted.
Working Safely with Values
Serialisation is effectively a controlled form of transmutation: we reinterpret bytes through the lens of a schema. Crossing from one schema to another will not cause undefined behaviour, but it may produce nonsensical data if the target schema expects a different layout. Validation routines and conversion helpers exist to guard against that outcome.
When you define a new schema:
- Create a zero-sized marker type and implement ValueSchema for it.
- Add conversions between the schema and your Rust types via the conversion traits mentioned above.
- Validate inputs when only some bit patterns are acceptable.
With those pieces in place, values can round-trip between storage and strongly typed Rust code while remaining portable and future-proof.
Importing Other Data Formats
Import pipelines let you bring external datasets into a tribles repository without
hand-writing schemas or entity identifiers every time. This chapter introduces the
import namespace, explains how the JSON importers map foreign fields onto
attributes, and outlines how you can extend the same patterns to new formats.
Import Namespace Overview
The triblespace_core::import module collects conversion helpers that translate
structured documents into raw tribles. Today the namespace ships with two JSON
importers:
- JsonImporter generates fresh entity identifiers for every object it visits.
- DeterministicJsonImporter derives entity identifiers by hashing the encoded attribute/value pairs so the same input always reproduces the same entities.
Both variants accept encoder callbacks for JSON primitives. Those closures turn
strings, numbers, and booleans into Value instances and can allocate blobs or
perform validation before handing the data back to the importer. The
valueschemas::Boolean helper stores false as all-zero bytes and true as
all ones so JSON flags round-trip without ambiguity when you wire the boolean
encoder to it.
Both importers accumulate statements internally. After feeding one or more
JSON documents through import_value or import_str, call data() to inspect
the tribles that were emitted and metadata() to retrieve attribute
descriptors. When you want to start a fresh batch without recomputing attribute
hashes, clear_data() drops the accumulated statements while leaving the
attribute caches intact. Call clear() to reset both the staged data and the
attribute caches when you need a completely fresh run.
Mapping JSON Fields to Attributes
Attributes are derived through Attribute::from_name, which hashes the JSON
field name together with the ValueSchema selected for that primitive. The
importers cache the resulting RawIds per field and schema so the hash only has
to be computed once per run. Arrays are treated as multi-valued fields: every
item is encoded and stored under the same attribute identifier, producing one
trible per element.
After an import completes the importers regenerate metadata from their cached
attribute maps. The metadata()
accessor returns tribles that link each derived attribute id to its field name,
value schema, and optional blob schema. Merge those descriptors into your
repository alongside the imported data when you want queries to discover the
original JSON field names or project datasets by schema without repeating the
derivation logic.
Nested objects recurse automatically. The parent receives a GenId attribute
that points at the child entity, allowing the importer to represent the entire
object graph as a connected set of tribles. Because those GenId attributes are
also derived from the parent field names they remain stable even when you import
related documents in separate batches.
Managing Entity Identifiers
JsonImporter::new defaults to ufoid() for identifier generation, but the
constructor is parameterized so you can inject your own policy—for example, a
fucid() generator or any other closure that returns an ExclusiveId. The
custom generator is applied consistently to every object the importer touches,
including nested documents.
DeterministicJsonImporter takes a different approach. It buffers the encoded
attribute/value pairs for each object, sorts them, and feeds the resulting byte
stream into a user-supplied hash protocol. The first 16 bytes of that digest
become the entity identifier, ensuring identical JSON inputs produce identical
IDs even across separate runs. Once the identifier is established, the importer
writes the cached pairs into its internal trible set via Trible::new, so
subsequent calls to data() expose the deterministic statements alongside the
metadata generated for every derived attribute.
This hashing step also changes how repeated structures behave. When a JSON
document contains identical nested objects—common in fixtures such as
citm_catalog or Twitter exports—the deterministic importer emits the same
identifier for each recurrence. Only the first copy reaches the underlying
TribleSet; later occurrences are recognised as duplicates and skipped during
the merge. The nondeterministic importer must still mint a fresh identifier for
every repetition, so it inserts and deduplicates a full set of triples each
time. Even if the ID generator itself is fast, that extra merge work makes the
deterministic importer benchmark faster on datasets with significant repetition.
Working with Encoder Callbacks
Encoder callbacks receive borrowed references to the raw JSON values. Because the closures are generic over a lifetime you can capture external resources—like a blob store connection or a schema registry—without allocating reference-counted wrappers. Callers can stage binary payloads in whichever blob backend they prefer and return handles that will be persisted alongside the tribles.
The callbacks report failures through EncodeError. You can construct an error
with a simple message or wrap an existing error type. The importer surfaces the
field name alongside the original error so schema mismatches remain easy to
diagnose while keeping the hot path lightweight.
Extending the Importers
To support a new external format, implement a module in the import namespace
that follows the same pattern: decode the source data, derive attributes with
Attribute::from_name, and hand encoded values to Trible::new. Reuse the
lifetime-parameterized encoder callbacks so callers can plug in existing blob
stores or validation logic. If the format supplies stable identifiers, offer a
constructor that accepts a custom generator or hash protocol so downstream
systems can choose between ephemeral and deterministic imports.
Descriptive Structural Typing and the find! Idiom
This chapter documents the mental model and idioms we recommend when working with tribles. The model is intentionally descriptive: queries declare the shape of the data you want to see rather than prescribing a single concrete Rust type for every entity. This gives you the flexibility to keep the full graph around and to materialize only the view you need, when you need it.
Reading the chapter sequentially should equip you to:
- talk about entities in terms of the fields they expose rather than the structs you wish they were,
- design APIs that carry just enough information (a workspace, a checkout, an id) to ask for more data later, and
- know when it is worth materializing a typed projection and when the descriptive view is already the best representation.
Key ideas at a glance
- Attributes are typed fields (unlike untyped RDF predicates).
- An entity in tribles is a structural record: a bag of typed fields.
- attributes! accepts explicit ids for shared columns and can derive deterministic ids from a name + schema when you omit the hex literal; use the derived form for quick, local attributes and reserve explicit ids for published protocols.
- find! patterns are descriptive type checks / projections: they select entities that match a requested shape.
- entity! constructs ad‑hoc entities (like struct literals).
- Reified kinds/tags are attached via metadata::tag (GenId); projects often export canonical KIND_* constants you can pattern-match directly against.
- Prefer passing the Workspace + the checkout result (TribleSet) and an entity id around — only materialize a concrete Rust view when required.
- Strongly prefer operating on the tuples returned by find!; wrapper structs should exist only as grudging adapters for APIs that demand them.
Why "descriptive" not "prescriptive"?
In a prescriptive system you define a named struct (type), commit to it, and force conversions at boundaries. Tribles instead let you describe the fields you need at call sites. That keeps code resilient to schema evolution and avoids unnecessary unfolding of the graph.
In linguistic terms: instead of insisting every entity be declared as CategoryX, you ask "show me entities that have fields A and B" and work with those. If an entity also has field C that's fine — it simply matches the descriptive pattern.
Type theory mapping (short)
- Structural typing: types are shapes of fields, not names.
- Width subtyping: records with more fields subsume records with fewer.
- Intersection types: requiring both patterns A and B is like A & B.
- Row polymorphism: patterns naturally allow additional (unspecified) fields to exist.
Core idioms and recommended patterns
1. Use Workspace as your core I/O handle
The Workspace is the primary object for interacting with a repository. It lets you open a branch, commit, push, checkout history, and — importantly — read blob handles (LongString) cheaply.
Pattern: open a workspace for the configured branch, checkout the HEAD ancestors to produce a TribleSet content snapshot for efficient read-only pattern matching, and use the same Workspace to lazily read blobs when you need them.
This avoids duplicating memory and allows cheap zero-copy access to LongString blobs.
Manager-owned repository and workspace DI
At runtime, prefer to give a long-lived manager (session, exporter, service)
ownership of a Repository<Pile<Blake3>>. Downstream code can depend on that
manager in one of two shapes:
- Accept a &mut Repository<_> and open/pull the workspace you need inside the function. This works well for tasks that need to coordinate multiple checkouts or want to control the retry loop themselves, and the mutable borrow is typically short-lived: you only need it while creating or pushing workspaces.
- Ask the manager to mint a &mut Workspace<_> for the duration of a task (e.g. an update, render, or event-handling callback) and pass that mutable reference down. The manager remains responsible for merging or dropping the workspace when the task completes.
Both approaches avoid constructing piles or repositories ad-hoc and keep
lifecycle management centralized. Importantly, they let multiple tasks hold
distinct mutable workspaces over the same repository while only requiring a
single mutable borrow of the repository when you create or push those
workspaces. Library functions should therefore accept a &mut Repository<_>
or a &mut Workspace<_> (optionally paired with a TribleSet checkout for
read-only helpers) rather than opening new repositories inside hot paths.
Example (pseudocode):
#![allow(unused)] fn main() { // manager owns a Repository for the process/session lifetime let mut repo = manager.repo_mut(); let branch_id = manager.default_branch_id; // option 1: task pulls its own workspace let mut ws = repo.pull(branch_id)?; let content = ws.checkout(ws.head())?; // option 2: manager provides a workspace to a task callback manager.with_workspace(branch_id, |ws| { let snapshot = ws.checkout(ws.head())?; render(snapshot); Ok(()) })?; }
This pattern keeps startup/teardown centralized and eliminates the common hot-loop anti-pattern of repeatedly opening ephemeral repository instances.
2. Use find! as a descriptive type / projection
find! is not just a query language; it is the place where you declare the shape of the data you expect. Treat find! patterns as lightweight, inline type declarations. If an entity doesn't match, find! won't return it — no error, just absence.
When your project defines canonical tag ids (GenId constants) prefer to match the tag directly in the pattern rather than binding a short-string and filtering afterwards. Pattern clauses can be composed: you can match on tags and required attributes, and even join related entities in a single find! invocation.
Example: find plan snapshot ids (match tag directly)
#![allow(unused)] fn main() { // Match entities that have the canonical plan snapshot tag attached. for (e,) in find!((e: Id), triblespace::pattern!(&content, [{ ?e @ metadata::tag: (KIND_PLAN_SNAPSHOT) }])) { // `e` is a plan snapshot entity id; follow-up finds can read other fields } }
Worked example: composing a structural pattern
#![allow(unused)] fn main() { // Grab all active plans with a title and owner. for (plan_id, title, owner) in find!( (plan_id: Id, title: ShortString, owner: Id), tribles::pattern!(&content, [ { ?plan_id @ metadata::tag: (KIND_PLAN) }, { ?plan_id plan::status: (plan::STATUS_ACTIVE) }, { ?plan_id plan::title: ?title }, { ?plan_id plan::owner: ?owner }, ]) ) { // `title` is already typed as ShortString. // `owner` can drive a follow-up find! to pull account info as needed. } }
find! returns tuples of the values you requested in the head of the query.
Nothing more, nothing less. The example above reads as "give me the Id
bound to plan_id, the short string bound to title, and another Id
bound to owner for every entity matching these clauses." Because the
matching is descriptive, adding a new attribute such as plan::color does
not invalidate the call site — you only see the data you asked for.
3. Lazy, ad‑hoc conversions only where needed
If a function needs a few fields for an operation, ask for them with find! inside the function. If later you perform an operation that needs different fields, you can perform another small find! there. Don't materialize large subgraphs unless a single operation needs them.
The recommended function signature is minimal and focused on the tribles primitives. Conversions into bespoke structs are almost always a smell; they obscure which fields are actually used and quickly devolve into an unofficial schema. Treat any adapter as an opt-in shim that exists purely at integration boundaries where consumers refuse to speak tribles primitives.
#![allow(unused)] fn main() { fn handle_plan_update(ws: &mut Workspace<Pile<Blake3>>, plan_id: Id) -> io::Result<()> { // ad-hoc find! calls to read the fields we need let checkout = ws.checkout()?; if let Some((title,)) = find!((title: ShortString), tribles::pattern!(&checkout, [{ ?plan_id plan::title: ?title }])) .next() { // The returned tuple is already typed. Convert to an owned String only if // external APIs demand ownership. process_title(title.from_value::<&str>()); } Ok(()) } }
If you cannot avoid exposing a typed facade (for example because an external API insists on receiving a struct), keep the struct tiny, document that it is a legacy shim, and derive it straight from a find! tuple:
#![allow(unused)] fn main() { struct PlanSummary<'a> { id: Id, title: &'a str, } fn load_plan_summary<'a>(ws: &'a mut Workspace<Pile<Blake3>>, plan_id: Id) -> io::Result<Option<PlanSummary<'a>>> { let content = ws.checkout()?; Ok(find!( (title: ShortString), tribles::pattern!(&content, [{ ?plan_id plan::title: ?title }]) ) .next() .map(|(title,)| PlanSummary { id: plan_id, title: title.from_value::<&str>(), })) } }
The struct above is merely a view with borrowed references; it is not a
blessed schema. Resist the temptation to evolve it into a "real" model type —
extend the find! pattern first and only mirror those changes here when an
external interface forces your hand.
4. Read LongString as &str (zero-copy)
Blob schema types in tribles are intentionally zero-copy. Prefer the typed View API, which returns a borrowed &str without copying when possible.
#![allow(unused)] fn main() { let view = ws .get::<View<str>, LongString>(handle) .map_err(|e| ...)?; // `handle` is a Value<Handle<Blake3, LongString>> let s: &str = view.as_ref(); // zero-copy view tied to the workspace lifetime // If you need an owned String: let owned = view.to_string(); }
Note: a View borrows data that is managed by the Workspace; avoid returning
&str that outlives the workspace or the View.
When you do need ownership, convert at the edge of the system. Internal helpers should continue to pass views or typed handles around so that the call site that triggers a blob fetch is easy to spot. Cloning the in-memory blob after it has been pulled is cheap; the expensive part is fetching it from storage (potentially a remote) in the first place.
5. Structural sharing and normalization patterns
When persisting graphs that contain many repeated or immutable pieces (e.g. steps in a plan), prefer structural sharing:
- Store canonical step entities (LongString blobs for their text).
- Create a lightweight "link" entity per plan that references the step ids and metadata like order and status.
On update, create new step entities only for truly new step text and add a new snapshot entity that references the steps. This keeps history immutable and easy to reason about.
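A minimal sketch of that layout, assuming ws is a Workspace as in the earlier snippets and a hypothetical plan namespace with step_text (a LongString handle), step (GenId link), and title attributes:

// Canonical step entity: the text lives in a LongString blob, shared by reference.
// `plan` is a hypothetical namespace introduced for this sketch.
let step = ufoid();
let text_handle = ws.put("Calibrate the sensor".to_owned());

// Lightweight snapshot entity that links to the step it reuses.
let snapshot = ufoid();
let mut change = TribleSet::new();
change += entity! { &step @ plan::step_text: text_handle };
change += entity! { &snapshot @ plan::title: "Calibration v2", plan::step: &step };
ws.commit(change, Some("new snapshot, reused steps"));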
6. Push/merge retry loop for writers
When pushing writes, use the standard push/merge loop to handle concurrent writers. Two options are available:
- Manual conflict handling with try_push (single attempt; returns a conflicting workspace on CAS failure):
#![allow(unused)] fn main() { ws.commit(content, Some("plan-update")); let mut current_ws = ws; while let Some(mut incoming) = repo.try_push(&mut current_ws)? { incoming.merge(&mut current_ws)?; current_ws = incoming; } }
- Automatic retries with push (a convenience wrapper that merges and retries until success or error):
#![allow(unused)] fn main() { ws.commit(content, Some("plan-update")); // `push` will handle merge+retry internally; it returns Ok(()) on success // or an error if the operation ultimately failed. repo.push(&mut ws)?; }
Practical anti‑patterns
- Do not unfold the graph or convert it into nested Rust structs. It wastes CPU and memory and loses the benefits of tribles’ flexible reifications.
- Avoid holding repo locks across async/await points. Acquire workspaces, do the minimal synchronous I/O you need, then release locks before awaiting.
- Don’t assume presence of a field; be explicit about optional vs required semantics using Option / Result in typed adapters.
- Don't create ephemeral Repository instances every time you need to read or write data. Instead, own a long-lived Repository in a manager and expose workspace-opening helpers.
- Don't explicitly convert Values via from_value/to_value; use typed find! patterns (find!((field: <field_type>), ...)) and ws.get::<View<...>, _>(handle) for blob reads.
- Don't create helper functions for queries; use find! patterns directly in the function that needs them. This keeps the shape of the data explicit and close to where it is used and avoids unnecessary unfolding of the graph.
- Don't convert the structures returned by find! into other Rust structs. Work with the returned tuples directly. find! should be thought of as a fundamental language primitive, like if/match/for. It is not a "database query" that returns rows to be converted into structs; it is a way to describe the shape of the data you want to see. ORM-like abstractions that try to map find! results into structs are an anti-pattern.
- Avoid reading blobs eagerly; prefer lazy reads via ws.get::<View<...>, _>(handle). Allocate owned data only when necessary.
Glossary
- Workspace: the repo handle that opens branches, reads blobs, commits and pushes.
- TribleSet: the in-memory content snapshot returned by Workspace::checkout.
- find!: the macro you use to discover entities matching a pattern (a descriptive type declaration).
- entity!: construct an ad‑hoc entity into a TribleSet for commit.
- LongString: zero-copy blob schema for potentially-large text.
Closing notes
This chapter captures the pragmatic type story we use in tribles: describe the fields you need at the place you need them, keep the full graph, and materialize small views lazily.
Reviewers' checklist (quick)
- Prefer find!/pattern! projections for the fields needed by the function.
- Avoid converting graph into rust structs.
- Use ws.get::<View<...>, _>(handle) for zero-copy blob reads; allocate only when an owned value is required.
- Match canonical tag ids via metadata::tag (KIND_* constants).
- Manager-owned repo: long-lived Repository instances should be owned by a session/exporter/manager; library code should accept a Workspace or TribleSet rather than opening piles itself.
- Use push/merge retry loops for writers; avoid holding repo locks across async/await points.
The sections below contain copy‑pasteable recipes for common operations.
Idioms & code recipes
This section contains pragmatic, copy‑pasteable snippets and patterns you can reuse. The examples intentionally use the tribles macros (attributes!, find!, pattern!, entity!) directly — that is the intended style.
Reviewer checklist
When reviewing code that touches tribles, look for these items:
- Does the code use find! to select only the fields it needs, rather than unfolding the entire graph?
- Are blob reads kept lazy (only read LongString when necessary)?
- Are push flows using the push/merge retry loop to avoid losing concurrent updates?
- Is the code avoiding holding the repo's Mutex across awaits and long blocking operations?
Further reading and references
- See the tribles macros: attributes!, find!, pattern!, entity! in the tribles code for exact usage.
- Type theory: "row polymorphism", "structural typing", "width subtyping" if you want the formal background.
The Type Algebra of TribleSpace
Queries as Types, Data as Proofs
TribleSpace grew out of a pragmatic goal: keep data declarative, composable, and statically checkable in Rust. Along the way we discovered that the core operations already form a type algebra. The macros used to define attributes, entities, and queries map directly onto familiar type-theoretic constructions, yielding an isomorphism between relations over triples and types over records.
Attributes Introduce Atomic Types
Each attribute introduced with attributes! defines an atomic type — a unary relation between an entity identifier and the attribute’s value:
"A74AA..." as pub title : ShortString
Formally this is a function title : Id → ValueTitle, or, in relational terms, the set { (id, value) }.
In the codebase the macro emits either a pub const (when you supply the 128-bit id) or a static LazyLock (when you omit it) of type Attribute<S>, so the generated binding already carries the ValueSchema that governs the value column.
The derived form hashes the attribute name together with the schema metadata via Attribute::from_name, which is convenient for quick experiments; shared protocols should still pin explicit ids so collaborators and other languages read the same column.
These atomic pieces are the building blocks for everything else.
Entities as Intersection Types
An entity! expression forms a record type by intersecting atomic ones.
Semantically it is the meet (∧) of its constituent relations:
Entity{A, B} ≡ { A : ValueA } ∧ { B : ValueB }
At runtime entity! expands to a small TribleSet containing those facts; the procedural macro literally emits a fresh set, populates it with Trible::new calls, and returns the set by value.
At the type level it represents their conjunction.
Records are therefore intersection types: every additional field refines the shape without invalidating existing data.
Merge Drives Dataset Composition
The += operator delegates to TribleSet::union, exposed through the AddAssign implementation.
union performs set union on the six internal indexes that back a TribleSet.
When the entity identifiers are disjoint the effect is classic dataset union; when they coincide we get field conjunction—record extension.
This dual role is what keeps TribleSpace’s algebra compact.
One Operator, Two Readings
Everything that looks like “add another field” or “add another record” derives from the single law
union(facts₁, facts₂) = facts₁ ∪ facts₂.
The surrounding context determines how to read the result:
| Context | Effect | Algebraic Face |
|---|---|---|
| same entity ID | extend record | ∧ (field conjunction) |
| different IDs | add record | ∨ (set union) |
The behaviour follows from the idempotent, commutative nature of TribleSet::union; no special cases are required in the implementation.
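The following sketch shows both readings with the literature namespace used throughout this book:

let frank = ufoid();
let mut data = TribleSet::new();
data += entity! { &frank @ literature::firstname: "Frank" };

// Same entity id: the union extends the record (field conjunction).
data += entity! { &frank @ literature::lastname: "Herbert" };

// Different entity id: the union adds another record (dataset union).
let ursula = ufoid();
data += entity! { &ursula @ literature::firstname: "Ursula", literature::lastname: "Le Guin" };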
Patterns as Row-Type Predicates
A find!/pattern! pair behaves like a row-type predicate:
pattern!([A, B]) ≈ ∀r. {A, B} ⊆ r.
Read this as “find every entity whose row type r is a supertype of this shape.”
The macro expands to an IntersectionConstraint built from TriblePattern constraints, so queries literally evaluate the conjunction of the row predicates.
Patterns compose intersectionally as well: combining two patterns is equivalent to requiring both row predicates simultaneously, mirroring intersection types at the query level.
The Lattice Perspective
TribleSets form a join-semilattice under union:
union(a, b) = a ∪ b
a ≤ b ⇔ a ⊆ b.
Projection, join, and filtering are lattice homomorphisms: they preserve joins and the partial order. Because the same algebra handles both data and metadata, the implementation remains uniform and performant—merging facts is always the same low-level operation.
Practical Payoffs
Seeing the primitives through a type-theoretic lens clarifies several ergonomic choices:
- Queries as proofs. Successful find! rows witness that an entity inhabits the requested type; absence is simply failure to prove the predicate.
- Descriptive schemas. Structural typing drops the need for globally declared records—the type system is implicit in the patterns you write.
- Composable extensions. Adding new attributes is monotonic: existing entities continue to satisfy prior predicates, and refined queries simply intersect more row constraints.
Summary Table
| Level | Concept | Operation |
|---|---|---|
| Attribute | atomic type | unary relation |
| Entity | conjunction | record formation |
| Dataset | union | set composition |
| Query | sub-row predicate | type constraint |
| Implementation | lattice union | ∪ |
With a single associative, commutative, idempotent union, we obtain both row extension and dataset union, and a unified logical framework that bridges data engineering with type theory. That economy of primitives allows the system to feel simple on the surface yet provide rich type theory underneath.
Repository Workflows
Working with a Tribles repository feels familiar to Git users, but the types make data ownership and lifecycle explicit. Keep the following vocabulary in mind when exploring the API:
- Repository – top-level object that tracks history through BlobStore and BranchStore implementations.
- Workspace – mutable view of a branch, similar to Git's working directory and index combined. Workspaces buffer commits and custom blobs until you push them back to the repository.
- BlobStore – storage backend for commits and payload blobs.
- BranchStore – records branch metadata and head pointers.
Both stores can be in memory, on disk or backed by a remote service. The
examples in examples/repo.rs and examples/workspace.rs showcase these APIs
and are a great place to start if you are comfortable with Git but new to
Tribles.
Opening a repository
Repositories are constructed from any storage that implements the appropriate traits. The choice largely depends on your deployment scenario:
- Pick or compose a storage backend (see Storage Backends and Composition).
- Create a signing key for the identity that will author commits.
- Call Repository::new(storage, signing_key) to obtain a handle.
Most applications perform the above steps once during start-up and then reuse
the resulting Repository. If initialization may fail (for example when opening
an on-disk pile), bubble the error to the caller so the process can retry or
surface a helpful message to operators.
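A minimal start-up sketch using the in-memory backend (swap in a durable store from the next section for production deployments):

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::repo::{memoryrepo::MemoryRepo, Repository};

let storage = MemoryRepo::default();
let signing_key = SigningKey::generate(&mut OsRng);
let mut repo = Repository::new(storage, signing_key);
// Reuse `repo` for the lifetime of the process instead of reopening it per task.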
Storage Backends and Composition
Repository accepts any storage that implements both the BlobStore and
BranchStore traits, so you can combine backends to fit your deployment. The
crate ships with a few ready-made options:
- MemoryRepo stores everything in memory and is ideal for tests or short-lived tooling where persistence is optional.
- Pile persists blobs and branch metadata in a single append-only file. It is the default choice for durable local repositories and integrates with the pile tooling described in Pile Format.
- ObjectStoreRemote connects to object_store endpoints (S3, local filesystems, etc.). It keeps all repository data in the remote service and is useful when you want a shared blob store without running a dedicated server.
- HybridStore lets you split responsibilities, e.g. storing blobs on disk while keeping branch heads in memory or another backend. Any combination that satisfies the trait bounds works.
Backends that need explicit shutdown can implement StorageClose. When the
repository type exposes that trait bound you can call repo.close()? to flush
and release resources instead of relying on Drop to run at an unknown time.
This is especially handy for automation where the process may terminate soon
after completing a task.
use triblespace::core::repo::hybridstore::HybridStore;
use triblespace::core::repo::memoryrepo::MemoryRepo;
use triblespace::core::repo::objectstore::ObjectStoreRemote;
use triblespace::core::repo::Repository;
use triblespace::core::value::schemas::hash::Blake3;
use url::Url;
let blob_remote: ObjectStoreRemote<Blake3> =
ObjectStoreRemote::with_url(&Url::parse("s3://bucket/prefix")?)?;
let branch_store = MemoryRepo::default();
let storage = HybridStore::new(blob_remote, branch_store);
let mut repo = Repository::new(storage, signing_key);
// Work with repo as usual …
// repo.close()?; // if the underlying storage supports StorageClose
Branching
A branch records a line of history and carries the metadata that identifies who
controls updates to that history. Creating one writes initial metadata to the
underlying store and returns an ExclusiveId guarding the
branch head. Dereference that ID when you need a plain Id for
queries or workspace operations.
Typical steps for working on a branch look like:
- Create a repository backed by blob and branch stores via Repository::new.
- Initialize or look up a branch ID with helpers like Repository::create_branch. When interacting with an existing branch call Repository::pull directly.
- Commit changes in the workspace using Workspace::commit.
- Push the workspace with Repository::push (or handle conflicts manually via Repository::try_push) to publish those commits.
The example below demonstrates bootstrapping a new branch and opening multiple workspaces on it. Each workspace holds its own staging area, so remember to push before sharing work or starting another task.
#![allow(unused)] fn main() { let mut repo = Repository::new(pile, SigningKey::generate(&mut OsRng)); let branch_id = repo.create_branch("main", None).expect("create branch"); let mut ws = repo.pull(*branch_id).expect("pull branch"); let mut ws2 = repo.pull(ws.branch_id()).expect("open branch"); }
After committing changes you can push the workspace back. push will retry on
contention and attempt to merge, while try_push performs a single attempt and
returns Ok(Some(conflict_ws)) when the branch head moved. Choose the latter
when you need explicit conflict handling:
#![allow(unused)] fn main() { ws.commit(change, Some("initial commit")); repo.push(&mut ws)?; }
Managing signing identities
The key passed to Repository::new becomes the default signing identity for
branch metadata and commits. Collaborative projects often need to switch
between multiple authors or assign a dedicated key to automation. You can
adjust the active identity in three ways:
- Repository::set_signing_key replaces the repository's default key. Subsequent calls to helpers such as Repository::create_branch or Repository::pull use the new key for any commits created from those workspaces.
- Repository::create_branch_with_key signs a branch's metadata with an explicit key, allowing each branch to advertise the author responsible for updating it.
- Repository::pull_with_key opens a workspace that will sign its future commits with the provided key, regardless of the repository default.
The snippet below demonstrates giving an automation bot its own identity while letting a human collaborator keep theirs:
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::repo::Repository;
let alice = SigningKey::generate(&mut OsRng);
let automation = SigningKey::generate(&mut OsRng);
// Assume `pile` was opened earlier, e.g. via `Pile::open` as shown in previous sections.
let mut repo = Repository::new(pile, alice.clone());
// Create a dedicated branch for the automation pipeline using its key.
let automation_branch = repo
.create_branch_with_key("automation", None, automation.clone())?
.release();
// Point automation jobs at their dedicated identity by default.
repo.set_signing_key(automation.clone());
let mut bot_ws = repo.pull(automation_branch)?;
// Humans can opt into their own signing identity even while automation remains
// the repository default.
let mut human_ws = repo.pull_with_key(automation_branch, alice.clone())?;
human_ws and bot_ws now operate on the same branch but will sign their
commits with different keys. This pattern is useful when rotating credentials or
running scheduled jobs under a service identity while preserving authorship in
the history. You can swap identities at any time; existing workspaces keep the
key they were created with until you explicitly call
Workspace::set_signing_key.
Inspecting History
You can explore previous commits using Workspace::checkout which returns a
TribleSet with the union of the specified commit contents. Passing a single
commit returns just that commit. To include its history you can use the
ancestors helper. Commit ranges are supported for convenience. The expression
a..b yields every commit reachable from b that is not reachable from a,
treating missing endpoints as empty (..b) or the current HEAD (a.. and
..). These selectors compose with filters, so you can slice history to only
the entities you care about.
#![allow(unused)] fn main() { let history = ws.checkout(commit_a..commit_b)?; let full = ws.checkout(ancestors(commit_b))?; }
The history_of helper builds on the filter selector to
retrieve only the commits affecting a specific entity. Commit selectors are
covered in more detail in the next chapter:
#![allow(unused)] fn main() { let entity_changes = ws.checkout(history_of(my_entity))?; }
Working with Custom Blobs
Workspaces keep a private blob store that mirrors the repository's backing
store. This makes it easy to stage large payloads alongside the trible sets you
plan to commit. The Workspace::put helper stores any type
implementing ToBlob and returns a typed handle you can
embed like any other value. Handles are Copy, so you can commit them and reuse
them to fetch the blob later.
The example below stages a quote and an archived TribleSet, commits both, then
retrieves them again with strongly typed and raw views. In practice you might
use this pattern to attach schema migrations, binary artifacts, or other payloads
that should travel with the commit:
#![allow(unused)] fn main() { use ed25519_dalek::SigningKey; use rand::rngs::OsRng; use triblespace::blob::Blob; use triblespace::examples::{self, literature}; use triblespace::prelude::*; use triblespace::repo::{self, memoryrepo::MemoryRepo, Repository}; use blobschemas::{LongString, SimpleArchive}; let storage = MemoryRepo::default(); let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng)); let branch_id = repo.create_branch("main", None).expect("create branch"); let mut ws = repo.pull(*branch_id).expect("pull branch"); // Stage rich payloads before creating a commit. let quote_handle = ws.put("Fear is the mind-killer".to_owned()); let archive_handle = ws.put(&examples::dataset()); // Embed the handles inside the change set that will be committed. let mut change = triblespace::entity! { literature::title: "Dune (annotated)", literature::quote: quote_handle.clone(), }; change += triblespace::entity! { repo::content: archive_handle.clone() }; ws.commit(change, Some("Attach annotated dataset")); // Single-attempt push. Use `push` to let the repository merge and retry automatically. repo.try_push(&mut ws).expect("try_push"); // Fetch the staged blobs back with the desired representation. let restored_quote: String = ws .get(quote_handle) .expect("load quote"); let restored_set: TribleSet = ws .get(archive_handle) .expect("load dataset"); let archive_bytes: Blob<SimpleArchive> = ws .get(archive_handle) .expect("load raw blob"); std::fs::write("dataset.car", archive_bytes.bytes.as_ref()).expect("persist archive"); }
Rust infers the blob schema for both put and get from the handles and the
assignment context, so the calls stay concise without explicit turbofish
annotations.
Blobs staged this way stay local to the workspace until you push the commit.
Workspace::get searches the workspace-local store first and falls back to the
repository if necessary, so the handles remain valid after you publish the
commit. This round trip lets you persist logs, archives, or other auxiliary
files next to your structured data without inventing a separate storage
channel.
Merging and Conflict Handling
When pushing a workspace, another client might have already updated the branch. There are two ways to handle this:
Repository::try_push – a single-attempt push that uploads local blobs and attempts a CAS update once. If the branch advanced concurrently it returns Ok(Some(conflict_ws)) so callers can merge and retry explicitly:
ws.commit(content, Some("codex-turn"));
let mut current_ws = ws;
while let Some(mut incoming) = repo.try_push(&mut current_ws)? {
    // Merge the local staged changes into the incoming workspace and retry.
    incoming.merge(&mut current_ws)?;
    current_ws = incoming;
}
Repository::push – a convenience wrapper that performs the merge-and-retry loop for you. Call this when you prefer the repository to handle conflicts automatically; it either succeeds (returns Ok(())) or returns an error.
ws.commit(content, Some("codex-turn"));
repo.push(&mut ws)?; // will internally merge and retry until success
Troubleshooting:
Workspace::merge succeeds only when both workspaces share a blob store. Merging a workspace pulled from a different pile or remote returns MergeError::DifferentRepos. Decide which repository will own the combined history, transfer the other branch's reachable blobs into it with repo::transfer(reachable(...)), create a branch for that imported head, and merge locally once both workspaces target the same store (the merge-import walkthrough later in this chapter shows the full flow).
After a successful push the branch may have advanced further than the head supplied, because the repository refreshes its view after releasing the lock. An error indicating a corrupted pile does not necessarily mean the push failed; the update might have been written before the corruption occurred.
This snippet is taken from examples/workspace.rs.
The examples/repo.rs example demonstrates the same
pattern with two separate workspaces. The returned Workspace already contains
the remote commits, so after merging your changes you push that new workspace to
continue.
Typical CLI Usage
There is a small command line front-end in the
trible repository. It exposes push
and merge operations over simple commands and follows the same API presented in
the examples. The tool is currently experimental and may lag behind the library,
but it demonstrates how repository operations map onto a CLI.
Diagram
A simplified view of the push/merge cycle:
┌───────────┐          pull           ┌───────────┐
│ local ws  │◀─────────────────────── │   repo    │
└─────┬─────┘                         └───────────┘
      │
      │ commit
      │
      ▼
┌───────────┐          push           ┌───────────┐
│ local ws  │ ──────────────────────▶ │   repo    │
└─────┬─────┘                         └─────┬─────┘
      │                                     │
      │ merge                               │ conflict?
      └──────▶ ┌─────────────┐ ◀────────────┘
               │ conflict ws │
               └──────┬──────┘
                      │          ┌───────────┐
                      └─────────▶│   repo    │
                          push   └───────────┘
Each push either succeeds or returns a workspace containing the other changes. Merging incorporates your commits and the process repeats until no conflicts remain.
Troubleshooting push, branch, and pull failures
Repository::push, Repository::create_branch, and Repository::pull surface
errors from the underlying blob and branch stores. These APIs intentionally do
not hide storage issues, because diagnosing an I/O failure or a corrupt commit
usually requires operator intervention. The table below lists the error variants
along with common causes and remediation steps.
| API | Error variant | Likely causes and guidance |
|---|---|---|
| Repository::push | PushError::StorageBranches | Enumerating branch metadata in the backing store failed. Check connectivity and credentials for the branch store (for example, the object-store bucket, filesystem directory, or HTTP endpoint). |
| Repository::push | PushError::StorageReader | Creating a blob reader failed before any transfer started. The blob store may be offline, misconfigured, or returning permission errors. |
| Repository::push | PushError::StorageGet | Fetching existing commit metadata failed. The underlying store returned an error or the metadata blob could not be decoded, which often signals corruption or truncated uploads. Inspect the referenced blob in the store to confirm it exists and is readable. |
| Repository::push | PushError::StoragePut | Uploading new content or metadata blobs failed. Look for transient network failures, insufficient space, or rejected writes in the blob store logs. Retrying after fixing the storage issue will re-send the missing blobs. |
| Repository::push | PushError::BranchUpdate | Updating the branch head failed. Many backends implement optimistic compare-and-swap semantics; stale heads or concurrent writers therefore surface here as update errors. Refresh the workspace and retry after resolving any store-side errors. |
| Repository::push | PushError::BadBranchMetadata | The branch metadata could not be parsed. Inspect the stored metadata blobs for corruption or manual edits and repair them before retrying the push. |
| Branch creation APIs | BranchError::StorageReader | Creating a blob reader failed. Treat this like PushError::StorageReader: verify the blob store connectivity and credentials. |
| Branch creation APIs | BranchError::StorageGet | Reading branch metadata during initialization failed. Check for corrupted metadata blobs or connectivity problems. |
| Branch creation APIs | BranchError::StoragePut | Persisting branch metadata failed. Inspect store logs for rejected writes or quota issues. |
| Branch creation APIs | BranchError::BranchHead | Retrieving the current head of the branch failed. This usually points to an unavailable branch store or inconsistent metadata. |
| Branch creation APIs | BranchError::BranchUpdate | Updating the branch entry failed. Resolve branch-store errors and ensure no other writers are racing the update before retrying. |
| Branch creation APIs | BranchError::AlreadyExists | A branch with the requested name already exists. Choose a different name or delete the existing branch before recreating it. |
| Branch creation APIs | BranchError::BranchNotFound | The specified base branch does not exist. Verify the branch identifier and that the base branch has not been deleted. |
| Repository::pull | PullError::BranchNotFound | The branch is missing from the repository. Check the branch name/ID and confirm that it has not been removed. |
| Repository::pull | PullError::BranchStorage | Accessing the branch store failed. This mirrors BranchError::BranchHead and usually indicates an unavailable or misconfigured backend. |
| Repository::pull | PullError::BlobReader | Creating a blob reader failed before commits could be fetched. Ensure the blob store is reachable and that the credentials grant read access. |
| Repository::pull | PullError::BlobStorage | Reading commit or metadata blobs failed. Investigate missing objects, network failures, or permission problems in the blob store. |
| Repository::pull | PullError::BadBranchMetadata | The branch metadata is malformed. Inspect and repair the stored metadata before retrying the pull. |
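In application code these failures usually surface as a Result that you log and act on. A minimal sketch, assuming the error enums implement Debug (the variant names above suggest they do):
if let Err(err) = repo.push(&mut ws) {
    // The Debug output names the variant from the table above, which is
    // usually enough to choose a remediation: storage variants point at the
    // backend, while branch-update conflicts call for a pull, merge, and retry.
    eprintln!("push failed: {err:?}");
}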
Remote Stores
Remote deployments use the ObjectStoreRemote
backend to speak to any service supported by the
object_store crate (S3,
Google Cloud Storage, Azure Blob Storage, HTTP-backed stores, the local
filesystem, and the in-memory memory:/// adapter). ObjectStoreRemote
implements both BlobStore and BranchStore, so the rest of the repository API
continues to work unchanged – the only difference is the URL you pass to
with_url.
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::prelude::*;
use triblespace::core::repo::objectstore::ObjectStoreRemote;
use triblespace::core::repo::Repository;
use triblespace::core::value::schemas::hash::Blake3;
use url::Url;
fn open_remote_repo(raw_url: &str) -> anyhow::Result<()> {
let url = Url::parse(raw_url)?;
let storage = ObjectStoreRemote::<Blake3>::with_url(&url)?;
let mut repo = Repository::new(storage, SigningKey::generate(&mut OsRng));
let branch_id = repo.create_branch("main", None)?;
let mut ws = repo.pull(*branch_id)?;
ws.commit(TribleSet::new(), Some("initial commit"));
while let Some(mut incoming) = repo.try_push(&mut ws)? {
incoming.merge(&mut ws)?;
ws = incoming;
}
Ok(())
}
ObjectStoreRemote writes directly through to the backing service. It
implements StorageClose, but the implementation is a no-op, so dropping the
repository handle is usually sufficient. Call repo.close() if you prefer an
explicit shutdown step.
Credential configuration follows the object_store backend you select. For
example, S3 endpoints consume AWS access keys or IAM roles, while
memory:///foo provides a purely in-memory store for local testing. Once the
URL resolves, repositories backed by piles and remote stores share the same
workflow APIs.
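As a rough illustration, the open_remote_repo helper above works across backends purely by changing the URL. Which schemes are available depends on the object_store features enabled in your build, so treat these as examples rather than an exhaustive list:
// Illustrative URLs; configure credentials for the backend you actually use.
open_remote_repo("memory:///scratch")?;            // in-memory store for tests
open_remote_repo("file:///var/lib/tribles/repo")?; // local filesystem
open_remote_repo("s3://my-bucket/prefix")?;        // S3-compatible object store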
Attaching a Foreign History (merge-import)
Sometimes you want to graft an existing branch from another pile into your current repository without rewriting its commits. Tribles supports a conservative, schema‑agnostic import followed by a single merge commit:
- Copy all reachable blobs from the source branch head into the target pile by streaming the reachable walker into repo::transfer. The traversal scans every 32‑byte aligned chunk and enqueues any candidate that dereferences in the source.
- Create a single merge commit that has two parents: your current branch head and the imported head. No content is attached to the merge; it simply ties the DAGs together.
This yields a faithful attachment of the foreign history — commits and their content are copied verbatim, and a one‑off merge connects both histories.
The trible CLI exposes this as:
trible branch merge-import \
--from-pile /path/to/src.pile --from-name source-branch \
--to-pile /path/to/dst.pile --to-name self
Internally this uses the reachable walker in combination with
repo::transfer plus Workspace::merge_commit. Because the traversal scans
aligned 32‑byte chunks, it is forward‑compatible with new formats as long as
embedded handles remain 32‑aligned.
Sidebar — Choosing a copy routine
repo::transfer pairs the reachability walker (or any other iterator you provide) with targeted copies, returning (old_handle, new_handle) pairs for the supplied handles. Feed it the reachable iterator when you only want live blobs, the output of potential_handles when scanning metadata, or a collected list from BlobStoreList::blobs() when duplicating an entire store. MemoryBlobStore::keep (and other BlobStoreKeep implementations) retain whichever handles you stream to them, making it easy to drop unreachable blobs once you've walked your roots. Reachable copy keeps imports minimal; the transfer helper lets you rewrite specific handles while duplicating data into another store.
Programmatic example (Rust)
The same flow can be used directly from Rust when you have two piles on disk and want to attach the history of one branch to another:
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use triblespace::prelude::*;
use triblespace::core::repo::{self, pile::Pile, Repository};
use triblespace::core::value::schemas::hash::Blake3;
use triblespace::core::value::schemas::hash::Handle;
fn merge_import_example(
src_path: &std::path::Path,
src_branch_id: triblespace::id::Id,
dst_path: &std::path::Path,
dst_branch_id: triblespace::id::Id,
) -> anyhow::Result<()> {
// 1) Open source (read) and destination (write) piles
let mut src = Pile::open(src_path)?;
src.restore()?;
let mut dst = Pile::open(dst_path)?;
dst.restore()?;
// 2) Resolve source head commit handle
let src_head: Value<Handle<Blake3, blobschemas::SimpleArchive>> =
src.head(src_branch_id)?.ok_or_else(|| anyhow::anyhow!("source head not found"))?;
// 3) Conservatively copy all reachable blobs from source → destination
let reader = src.reader()?;
let mapping: Vec<_> = repo::transfer(
&reader,
&mut dst,
repo::reachable(&reader, [src_head.transmute()]),
)
.collect::<Result<_, _>>()?;
eprintln!("copied {} reachable blobs", mapping.len());
// 4) Attach via a single merge commit in the destination branch
let mut repo = Repository::new(dst, SigningKey::generate(&mut OsRng));
let mut ws = repo.pull(dst_branch_id)?;
ws.merge_commit(src_head)?; // parents = { current HEAD, src_head }
// 5) Push with standard conflict resolution
while let Some(mut incoming) = repo.try_push(&mut ws)? {
incoming.merge(&mut ws)?;
ws = incoming;
}
Ok(())
}
Commit Selectors
Commit selectors describe which commits to load from a workspace. They give
callers a reusable vocabulary for requests such as "let me work with the
changes from last week" or "show the commits that touched this entity". The
selector itself only decides which commits participate; the data behind
those commits is materialized into a TribleSet by Workspace::checkout so the
rest of the system can query it like any other dataset.
At checkout time the Workspace::checkout method accepts any type implementing
the CommitSelector trait and returns a TribleSet built from the selected
commits. Selectors can be as small as a single commit handle or as expressive as
a filtered slice of history. This chapter walks through the available building
blocks, how they compose, and how they relate to Git's revision grammar.
Range semantics
Range selectors mirror Git's two‑dot syntax. A selector of the form a..b
starts from b and walks its reachable ancestors. The walk continues until it
encounters a commit selected by a, at which point the descent along that
branch stops. The start boundary is exclusive while the end boundary is
inclusive: commits selected by a are omitted from the result, but the
commit(s) provided by b are included alongside any additional ancestors
reached through other branches. The shorthands behave as follows:
- ..b is equivalent to empty()..b and gathers b plus all of its ancestors.
- a.. defaults the end boundary to HEAD, collecting HEAD and its ancestors until the walk meets a.
- .. expands to HEAD and every ancestor reachable from it.
Because the range semantics differ slightly from Git, you can wrap the start
boundary in ancestors to reproduce Git's set-difference behaviour when parity
is required: ancestors(a)..b matches git log a..b.
// Check out the entire history of the current branch
let history = ws.checkout(ancestors(ws.head()))?;

// Equivalent to `git log feature..main`
let delta = ws.checkout(ancestors(feature_tip)..main_tip)?;
Ranges are concise and map directly onto the ancestry walks exposed by the
repository. Combinations such as "ancestors of B that exclude commits reachable
from A" fall out naturally from existing selectors (ancestors(A)..B). When a
query needs additional refinement, layer selectors like filter, reach for
helpers such as symmetric_diff, or implement a small CommitSelector that
post-processes the resulting CommitSet with union, intersection, or
difference before handing it back to checkout.
Short-circuiting at the boundary avoids re-walking history that previous
selectors already covered, but it still requires visiting every reachable
commit when the start selector is empty. Long-lived queries that continuously
ingest history can avoid that re-walk by carrying forward a specific commit as
the new start boundary. If a prior run stopped at previous_head, the next
iteration can use the range previous_head..new_head to gather only the
commits introduced since the last checkout.
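A minimal sketch of that incremental pattern, where previous_head is whatever handle you persisted from the last run and process is a stand-in for your own handler:
let new_head = ws.head();
// Only the commits added since the previous run are materialized.
let fresh = ws.checkout(previous_head..new_head)?;
process(&fresh); // hypothetical downstream handler
previous_head = new_head; // remember the boundary for the next iteration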
Implemented selectors
CommitSelector is implemented for:
- CommitHandle – a single commit.
- Vec<CommitHandle> and &[CommitHandle] – explicit lists of commits.
- ancestors(commit) – a commit and all of its ancestors.
- nth_ancestor(commit, n) – follows the first-parent chain n steps.
- parents(commit) – direct parents of a commit.
- symmetric_diff(a, b) – commits reachable from either a or b but not both.
- Set combinators that operate on two selectors:
  - union(left, right) – commits returned by either selector.
  - intersect(left, right) – commits returned by both selectors.
  - difference(left, right) – commits from left that are not also returned by right.
- Standard ranges: a..b, a.., ..b, and .. that stop walking once the start boundary is encountered.
- filter(selector, predicate) – retains commits for which predicate returns true.
- history_of(entity) – commits touching a specific entity (built on filter).
- time_range(start, end) – commits whose timestamps intersect the inclusive range.
The range primitives intentionally diverge from Git's subtraction semantics.
a..b walks the history from b toward the start boundary and stops as soon as
it rediscovers a commit yielded by a. Workspace checkouts frequently reuse an
earlier selector—such as previous_head..new_head—so short-circuiting at the
boundary saves re-walking the entire ancestor closure every time the selector
runs. When you need Git's behaviour you can wrap the start in
ancestors, trading the extra reachability work for parity with git log.
Because selectors already operate on CommitSet patches, composing new
behaviour is largely a matter of combining those sets. The existing selectors in
this chapter are implemented using the same building blocks that are available
to library users, making it straightforward to prototype project-specific
combinators without altering the Workspace::checkout API.
Set combinators
union, intersect, and difference wrap two other selectors and forward the
results through the equivalent set operations exposed by PATCH. Reach for these
helpers when you want to combine selectors without writing a custom
CommitSelector implementation. Each helper accepts any selector combination
and returns the corresponding CommitSet:
use tribles::repo::{ancestors, difference, intersect, union};

// Everything reachable from either branch tip.
let combined = ws.checkout(union(ancestors(main), ancestors(feature)))?;

// Only the commits both branches share.
let shared = ws.checkout(intersect(ancestors(main), ancestors(feature)))?;

// Feature-only commits without the mainline history.
let feature_delta = ws.checkout(difference(ancestors(feature), ancestors(main)))?;
Composing selectors
Selectors implement the CommitSelector trait, so they can wrap one another to
express complex logic. The pattern is to start with a broad
set—often ancestors(ws.head())—and then refine it. The first snippet below
layers a time window with an entity filter before handing the selector to
Workspace::checkout, and the follow-up demonstrates the built-in
intersect selector to combine two existing selectors.
use hifitime::Epoch;
use tribles::repo::{filter, history_of, intersect, time_range};

let cutoff = Epoch::from_unix_seconds(1_701_696_000.0); // early December 2023
let recent = filter(time_range(cutoff, Epoch::now().unwrap()), |_, payload| {
    payload.iter().any(|trible| trible.e() == &my_entity)
});
let relevant = ws.checkout(recent)?;

// Start from the result and zero in on a single entity.
let entity_history = ws.checkout(history_of(my_entity))?;
let recent_entity_commits = ws.checkout(intersect(
    time_range(cutoff, Epoch::now().unwrap()),
    history_of(my_entity),
))?;
Filtering commits
The filter selector wraps another selector and keeps only the commits for
which a user-provided closure returns true. The closure receives the commit
metadata and its payload, allowing inspection of authors, timestamps, or the
data itself. Selectors compose, so you can further narrow a range:
use hifitime::Epoch;
use triblespace::core::repo::{filter, time_range};

let since = Epoch::from_unix_seconds(1_609_459_200.0); // 2021-01-01
let now = Epoch::now().unwrap();
let recent = ws.checkout(filter(time_range(since, now), |_, payload| {
    payload.iter().any(|t| t.e() == &my_entity)
}))?;
Higher level helpers can build on this primitive. For example history_of(entity) filters
ancestors(HEAD) to commits touching a specific entity:
let changes = ws.checkout(history_of(my_entity))?;
When debugging a complicated selector, start by checking out the wider range and logging the commit metadata. Verifying the intermediate results catches off-by-one errors early and helps spot situations where a filter excludes or includes more history than expected.
Git Comparison
The table below summarizes Git's revision grammar. Each row links back to the official documentation. Forms that rely on reflogs or reference objects other than commits are listed for completeness but are unlikely to be implemented.
| Git Syntax | Planned Equivalent | Reference | Status |
|---|---|---|---|
| A | commit(A) | gitrevisions | Implemented |
| A^/A^N | nth_parent(A, N) | gitrevisions | Not planned |
| A~N | nth_ancestor(A, N) | gitrevisions | Implemented |
| A^@ | parents(A) | gitrevisions | Implemented |
| A^! | A minus parents(A) | gitrevisions | Unimplemented |
| A^-N | A minus nth_parent(A, N) | gitrevisions | Not planned |
| A^0 | commit(A) | gitrevisions | Implemented |
| A^{} | deref_tag(A) | gitrevisions | Unimplemented |
| A^{type} | object_of_type(A, type) | gitrevisions | Not planned: non-commit object |
| A^{/text} | search_from(A, text) | gitrevisions | Not planned: requires commit message search |
| :/text | search_repo(text) | gitrevisions | Not planned: requires repository search |
| A:path | blob_at(A, path) | gitrevisions | Not planned: selects a blob not a commit |
| :[N:]path | index_blob(path, N) | gitrevisions | Not planned: selects from the index |
| A..B | range(A, B) | gitrevisions | Implemented |
| A...B | symmetric_diff(A, B) | gitrevisions | Implemented |
| ^A | exclude(reachable(A)) | gitrevisions | Unimplemented |
| A@{upstream} | upstream_of(A) | gitrevisions | Not planned: depends on remote config |
| A@{push} | push_target_of(A) | gitrevisions | Not planned: depends on remote config |
| A@{N} | reflog(A, N) | gitrevisions | Not planned: relies on reflog history |
| A@{<date>} | reflog_at(A, date) | gitrevisions | Not planned: relies on reflog history |
| @{N} | reflog(HEAD, N) | gitrevisions | Not planned: relies on reflog history |
| @{-N} | previous_checkout(N) | gitrevisions | Not planned: relies on reflog history |
Only a subset of Git's revision grammar will likely be supported. Selectors relying on reflog history, remote configuration, or searching commits and blobs add complexity with little benefit for workspace checkout. They are listed above for completeness but remain unplanned for now.
Note:
range(A, B) differs subtly from Git's two-dot syntax. It walks parents from B until a commit from A is encountered instead of subtracting the entire ancestor closure of A. Use ancestors(A)..B for Git's behaviour.
TimeRange
Commits record when they were made via a timestamp attribute of type
NsTAIInterval. When creating a commit this
interval defaults to (now, now) but other tools could provide a wider range
if the clock precision is uncertain. The TimeRange selector uses this interval
to gather commits whose timestamps fall between two Epoch values:
use hifitime::Epoch;
use triblespace::repo::time_range;

let since = Epoch::from_unix_seconds(1_609_459_200.0); // 2021-01-01
let now = Epoch::now().unwrap();
let tribles = ws.checkout(time_range(since, now))?;
This walks the history from HEAD and returns only those commits whose
timestamp interval intersects the inclusive range.
Internally it uses filter(ancestors(HEAD), ..) to check each commit's
timestamp range.
Garbage Collection and Forgetting
Repositories grow over time as commits, branch metadata, and user blobs accumulate. Because every blob is content addressed and immutable, nothing is ever overwritten and there is no automatic reclamation when branches move or objects become orphaned. To keep disk usage in check a repository can periodically forget blobs that are no longer referenced.
Forgetting is deliberately conservative. It only removes local copies, so re-synchronising from a peer or pushing a commit that references a "forgotten" blob will transparently restore it. Forgetting therefore complements the monotonic model: history never disappears globally, but any node can opt out of retaining data it no longer needs.
The main challenge is deciding which blobs are still reachable without
reconstructing every TribleSet. The sections below outline how the repository
module solves that problem and how you can compose the building blocks in your
own tools.
Understanding the Roots
The walk begins with a root set—the handles you know must stay alive. In a
typical repository this includes the metadata blob for each branch (which in
turn names the commit heads), tags, or any additional anchors your deployment
requires. Roots are cheap to enumerate: walk the branch store via
BranchStore::branches
and load each branch head, or read the subset of metadata relevant to the
retention policy you are enforcing. Everything reachable from those handles
will be retained by the traversal; everything else is eligible for forgetting.
Conservative Reachability
Every commit and branch metadata record is stored as a SimpleArchive. The
archive encodes a canonical TribleSet as 64-byte tribles, each containing a
32-byte value column. The blob store does not track which handles correspond to
archives, so the collector treats every blob identically: it scans the raw bytes
in 32-byte chunks and treats each chunk as a candidate handle. Chunks that are
not value columns—for example the combined entity/attribute half of a trible or
arbitrary attachment bytes—are discarded when the candidate lookup fails. If a
chunk matches the hash of a blob in the store we assume it is a reference,
regardless of the attribute type. With 32-byte hashes the odds of a random
collision are negligible, so the scan may keep extra blobs but will not drop a
referenced one.
Content blobs that are not SimpleArchive instances (for example large binary
attachments) therefore behave as leaves: the traversal still scans them, but
because no additional lookups succeed they contribute no further handles. They
become reachable when some archive references their handle and are otherwise
eligible for forgetting.
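The sketch below illustrates the chunking idea only; it is not the crate's implementation, which additionally checks each candidate against the blob store before following it.
// Enumerate every aligned 32-byte window of a blob as a candidate handle.
fn candidate_chunks(bytes: &[u8]) -> impl Iterator<Item = [u8; 32]> + '_ {
    bytes.chunks_exact(32).map(|chunk| {
        let mut candidate = [0u8; 32];
        candidate.copy_from_slice(chunk);
        candidate
    })
}
// A collector keeps only candidates whose lookup succeeds and enqueues the
// corresponding blobs for further scanning.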
Traversal Algorithm
- Enumerate all branches and load their metadata blobs.
- Extract candidate handles from the metadata. This reveals the current commit head along with any other referenced blobs.
- Recursively walk the discovered commits and content blobs. Each blob is scanned in 32-byte steps; any chunk whose lookup succeeds is enqueued instead of deserialising the archive.
- Stream the discovered handles into whatever operation you need. The reachable helper returns an iterator of handles, so you can retain them, transfer them into another store, or collect them into whichever structure your workflow expects.
Because the traversal is purely additive you can compose additional filters or instrumentation as needed—for example to track how many objects are held alive by a particular branch or to export a log of missing blobs for diagnostics.
Automating the Walk
The repository module already provides most of the required plumbing. The
reachable
helper exposes the traversal as a reusable iterator so you can compose other
operations along the way, while
transfer
duplicates whichever handles you feed it. The in-memory MemoryBlobStore can
retain live blobs, duplicate them into a scratch store, and report how many
handles were touched without writing bespoke walkers:
use triblespace::core::blob::memoryblobstore::MemoryBlobStore;
use triblespace::core::repo::{self, BlobStoreKeep, BlobStoreList, BranchStore};
use triblespace::core::value::schemas::hash::Blake3;

let mut store = MemoryBlobStore::<Blake3>::default();
// ... populate the store or import data ...
let mut branch_store = /* your BranchStore implementation */;
let reader = store.reader()?;

// Collect the branch metadata handles we want to keep alive.
let mut roots = Vec::new();
for branch_id in branch_store.branches()? {
    if let Some(meta) = branch_store.head(branch_id?)? {
        roots.push(meta.transmute());
    }
}

// Trim unreachable blobs in-place.
store.keep(repo::reachable(&reader, roots.clone()));

// Optionally copy the same reachable blobs into another store.
let mut scratch = MemoryBlobStore::<Blake3>::default();
let visited = repo::reachable(&reader, roots.clone()).count();
let mapping: Vec<_> = repo::transfer(
    &reader,
    &mut scratch,
    repo::reachable(&reader, roots),
)
.collect::<Result<_, _>>()?;
println!("visited {} blobs, copied {}", visited, mapping.len());
println!("rewrote {} handles", mapping.len());
In practice you will seed the walker with the handles extracted from branch
metadata or other root sets instead of iterating the entire store. The helper
takes any IntoIterator of handles, so once branch heads (and other roots) have
been identified, they can be fed directly into the traversal without writing
custom queues or visitor logic. Passing the resulting iterator to
MemoryBlobStore::keep or repo::transfer makes it easy to implement
mark-and-sweep collectors or selective replication pipelines without duplicating
traversal code.
When you already have metadata represented as a TribleSet, the
potential_handles
helper converts its value column into the conservative stream of
Handle<H, UnknownBlob> instances expected by these operations.
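A hedged sketch of how that might plug into the keep call above; the exact module path and signature of potential_handles are assumptions here, so check the API docs before copying this verbatim:
// Derive conservative roots from metadata you already hold in memory.
// `metadata_set` is an illustrative TribleSet of branch/commit metadata.
let roots = repo::potential_handles(&metadata_set);
store.keep(repo::reachable(&reader, roots));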
Operational Tips
- Schedule forgetting deliberately. Trigger it after large merges or imports rather than on every commit so you amortise the walk over meaningful changes.
- Watch available storage. Because forgetting only affects the local node, replicating from a peer may temporarily reintroduce forgotten blobs. Consider monitoring disk usage and budgeting headroom for such bursts.
- Keep a safety margin. If you are unsure whether a handle should be retained, include it in the root set. Collisions between 32-byte handles are effectively impossible, so cautious root selection simply preserves anything that might be referenced.
Future Work
The public API for triggering garbage collection is still evolving. The
composition-friendly walker introduced above is one building block; future work
could layer additional convenience helpers or integrate with external retention
policies. Conservative reachability by scanning SimpleArchive bytes remains
the foundation for safe space reclamation.
Glossary
This chapter collects the core terms that appear throughout the book. Skim it when you encounter unfamiliar terminology or need a refresher on how concepts relate to one another in Trible Space.
Attribute
A property that describes some aspect of an entity. Attributes occupy the
middle position in a trible and carry the ValueSchema (or blob-handle schema)
that interprets and validates the value. Modules mint them with the
attributes! macro, so they behave like detached struct fields: each attribute
remains independently typed even when many are combined to describe the same
entity, preserving its individual semantics. Provide an explicit 128-bit id in
the macro when you need a canonical column shared across crates or languages;
omit the literal to derive a deterministic id from the attribute name and value
schema (the macro calls Attribute::from_name for you), which is handy for
short-lived or internal attributes.
Blob
An immutable chunk of binary data addressed by the hash of its contents. Blobs
store payloads that do not fit in the fixed 32-byte value slot—long strings,
media assets, archived TribleSets, commit metadata, and other large
artifacts. Each blob is tagged with a BlobSchema so applications can decode it
back into native types.
Blob Store
An abstraction that persists blobs. Implementations back local piles, in-memory
workspaces, or remote object stores while presenting a common BlobStore
interface that handles hashing, deduplication, and retrieval.
Commit
A signed snapshot of repository state. Commits archive a TribleSet describing
the workspace contents and store metadata such as parent handles, timestamps,
authors, signatures, and optional messages. The metadata itself lives in a
SimpleArchive blob whose hash becomes the commit handle.
Commit Selector
A query primitive that walks a repository’s commit graph to identify commits of
interest. Selectors power history traversals such as parents,
nth_ancestor, ranges like a..b, and helpers such as history_of(entity).
Entity
The first position in a trible. Entities identify the subject making a statement and group the attributes asserted about it. They are represented by stable identifiers minted from namespaces or ID owners—not by hashing their contents—so multiple facts about the same subject cohere without revealing how the identifier was derived. Ownership controls who may mint new facts for a given identifier.
PATCH
The Persistent Adaptive Trie with Cuckoo-compression and Hash-maintenance. Each PATCH stores all six permutations of a trible set in a 256-ary trie whose nodes use byte-oriented cuckoo hash tables and copy-on-write semantics. Shared leaves keep permutations deduplicated, rolling hashes let set operations skip unchanged branches, and queries only visit the segments relevant to their bindings, mirroring the behaviour described in the deep-dive chapter.
Pile
An append-only collection of blobs and branch records stored in a single file. Piles act as durable backing storage for repositories, providing a write-ahead-log style format that can be memory mapped, repaired after crashes, and safely shared between threads.
Repository
The durable record that ties blob storage, branch metadata, and namespaces together. A repository coordinates synchronization, replication, and history traversal across commits while enforcing signatures and branch ownership.
Schema
The set of attribute declarations and codecs that document and enforce the shape of data in Trible Space. Schemas assign language-agnostic meaning to the raw bytes—they are not the concrete Rust types—so any implementation that understands the schema can interpret the payloads consistently. Value schemas map the fixed 32-byte payload of a trible to native types, while blob schemas describe arbitrarily long payloads so tribles referencing those blobs stay portable.
Trible
A three-part tuple of entity, attribute, and value stored in a fixed 64-byte layout. Tribles capture atomic facts, and query engines compose them into joins and higher-order results.
Trible Space
The overall storage model that organises tribles across blobs, PATCHes, and repositories. It emphasises immutable, content-addressed data, monotonic set semantics, and familiar repository workflows.
Value
The third position in a trible. Values store a fixed 32-byte payload interpreted through the attribute’s schema. They often embed identifiers for related entities or handles referencing larger blobs.
Workspace
A mutable working area for preparing commits. Workspaces track staged trible sets and maintain a private blob store so large payloads can be uploaded before publishing. Once a commit is finalised it becomes immutable like the rest of Trible Space.
Documentation Improvement Ideas
This chapter is a roadmap for the next iteration of the book. Each subsection summarises a gap we discovered while reviewing the crate and outlines the minimal content that would help readers apply the feature in practice. When you pick one of these items up, try to produce a runnable example (or at least executable pseudocode) so the section teaches a concrete workflow rather than a theory sketch.
High-priority topics
The following themes unblock common deployment or operational scenarios and should be tackled first when planning documentation work:
Remote object stores
repo::objectstore::ObjectStoreRemote::with_url wires the repository into
object_store services such
as S3, local filesystems or Azure storage. The future chapter should walk
through credential configuration, namespace selection, and pairing the remote
backend with other stores (for example via HybridStore). It also needs to call
out how branch updates rely on PutMode::Update/UpdateVersion retries, how
conflicts bubble up to callers, and how listings stream through
BlockingIter so readers know what consistency guarantees to expect (see src/repo/objectstore.rs).
Hybrid storage recipes
repo::hybridstore::HybridStore mixes a blob store with a separate branch
store. Documenting a few reference layouts—remote blobs with local branches,
piles with in-memory branches, or even two-tier caches—will help teams evaluate
trade-offs quickly (see src/repo/hybridstore.rs).
Signature verification
Both repo::commit::verify and repo::branch::verify expose helpers for
validating signed metadata before accepting remote history. A hands-on example
should explain when to perform verification, how to surface failures to callers,
and which key material needs to be distributed between collaborators (see src/repo/commit.rs and src/repo/branch.rs).
Repository migration helpers
repo::transfer rewrites whichever handles you feed it and returns the old and
new identifiers so callers can update references. A migration recipe could show
how to collect handles from BlobStoreList::blobs() for full copies or from
reachable when only live data should be duplicated. Highlight how the helper
fits into a scripted maintenance window (see src/repo.rs).
Conservative GC tooling
The garbage-collection chapter covers the high-level approach, but it should
also reference concrete APIs such as repo::reachable, repo::transfer, and
MemoryBlobStore::keep. Describing how to compute and retain the reachable set
in code makes it easier to embed the GC workflow into automated jobs (see src/repo.rs and src/blob/memoryblobstore.rs).
Emerging capabilities
These topics are less urgent but still deserve coverage so that readers can reuse advanced building blocks without digging through source code.
Succinct archive indexes
blob::schemas::succinctarchive::SuccinctArchive converts a TribleSet into
compressed wavelet matrices, exposes helpers such as distinct_in and
enumerate_in, implements TriblePattern, and serialises via ordered,
compressed or cached Universe implementations. A dedicated section should walk
through building an archive from a set, choosing a universe, storing it as a
blob, and querying it directly through SuccinctArchiveConstraint so readers
can reuse the on-disk index without round-tripping through TribleSet
conversions (see src/blob/schemas/succinctarchive.rs and its universe and succinctarchiveconstraint submodules).
Extensible path engines
Regular path queries run through RegularPathConstraint, which delegates edge
checks to the PathEngine trait. The book should document how the built-in
ThompsonEngine constructs NFAs from a TribleSet and demonstrate how to plug
in alternative engines backed by other graph stores so readers can extend the
regex-based traversal beyond in-memory datasets (see src/query/regularpathconstraint.rs).
How to keep this list fresh
Treat these notes as a living backlog. Whenever a new subsystem lands, ask yourself whether it needs a discoverability guide, a tutorial or a troubleshooting section. Update this chapter with the gaps you observe, and link to the relevant modules so future contributors can jump straight into the implementation.
Formal Verification Roadmap
This roadmap captures the initial strategy for driving the triblespace crates
toward comprehensive formal verification. It unifies model checking, symbolic
execution, fuzzing, and deterministic simulation so we can reason about both the
low-level data structures and high-level repository workflows with stronger
correctness guarantees.
Verification Stack Overview
- Model checking with Kani explores bounded but exhaustive state spaces for invariants that must never be violated.
- Symbolic execution with Miri exposes undefined behaviour (UB) and aliasing issues across regular unit tests without requiring new harnesses.
- Coverage-guided fuzzing stresses APIs with randomized input sequences to uncover emergent behaviours that formal proofs might miss.
- Deterministic simulations replay realistic repository workflows so we can audit higher-level semantics and regression-test subtle interplays between subsystems.
Each technique complements the others; together they provide layered assurance that keeps regressions from reaching downstream users.
Goals
- Protect the fundamental algebraic properties of TribleSet, PATCH, and the repository commit graph.
- Exercise serialization, deserialization, and zero-copy data views under adversarial inputs.
- Detect behavioural regressions in query heuristics, constraint solving, and workspace merging before they reach downstream users.
- Integrate the tooling into CI so proofs and regression checks run automatically for every change.
- Preserve a contributor-friendly workflow where verification steps are discoverable, well documented, and quick to reproduce locally.
Current Foundation
- proofs/ already contains Kani harnesses for query, value, and variable-set behaviour. They provide examples of bounded nondeterministic data generation (kani::any, Value::new) and assume/guarantee reasoning that new harnesses can reuse.
- ./scripts/preflight.sh is the aggregation point for formatting and tests; adding verification steps here keeps contributor workflows consistent.
Invariant Catalogue
The roadmap anchors future work around the following invariants. Each row tracks the subsystem we care about, the guarantees we want to encode, and a rough sketch of how to exercise them in Kani, Miri, or fuzzing harnesses.
| Area | Key invariants | Candidate harness or check |
|---|---|---|
| TribleSet (src/trible/tribleset.rs) | Union/intersection/difference maintain canonical ordering across all six PATCH indexes; iterators only yield deduplicated Tribles; insert never drops an ordering. | Extend the existing variableset harnesses with nondeterministic inserts, and add a dedicated tribleset_harness.rs validating round-trips across every ordering. |
| PATCH & ByteTable (src/patch/*.rs) | Cuckoo displacement respects MAX_RETRIES without losing entries; Branch::modify_child grows tables when required and preserves leaf_count/segment_count; table_grow copies every occupant exactly once. | Introduce a patch_harness.rs that stress-tests plan_insert, table_insert, and Branch::grow, plus a micro-fuzzer that drives inserts/removals across random table sizes. |
| Value schemas (src/value/*.rs) | Schema encoders respect declared byte widths; TryFromValue conversions and ValueSchema::validate reject truncated buffers; zero-copy views stay aligned. | Reuse value_harness.rs, adding per-schema helpers plus a Miri regression suite that loads slices at every alignment. |
| Query engine (src/query/*.rs) | Constraint solver never aliases conflicting bindings; planner outputs cover all join permutations referenced by pattern!; influence tracking matches selected variables. | Expand proofs/query_harness.rs with minimal counterexamples, and fuzz constraint graphs via cargo fuzz. |
| Repository & commits (src/repo/*.rs, proofs/commit_harness.rs) | Branch heads remain append-only; Workspace::pull never forgets reachable blobs; selector algebra matches Git semantics. | Add bounded commit DAG generators in commit_harness.rs plus deterministic simulation traces covering merges and garbage collection. |
| Storage primitives (src/blob, src/repo, src/patch/leaf.rs) | Blob handles stay reference counted; pile headers remain within reserved capacity; byte slices from archives stay valid for the life of the store. | Combine Miri tests for aliasing with nightly fuzzers that replay repository sync transcripts. |
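For readers who have not written a Kani harness before, the shape is always the same: draw bounded nondeterministic inputs with kani::any, constrain them with kani::assume, and assert the invariant. The sketch below shows only that shape with a toy property; the real harnesses in proofs/ construct tribles and sets instead of plain integers.
// Shape of a bounded Kani harness; the property here is a placeholder.
#[cfg(kani)]
#[kani::proof]
fn insert_preserves_membership() {
    let x: u8 = kani::any();
    kani::assume(x < 128); // keep the state space bounded

    // Stand-in for "insert into a TribleSet and check the invariant".
    let mut set: Vec<u8> = Vec::new();
    set.push(x);
    assert!(set.contains(&x));
}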
Expansion Plan
Phase 1 – Harden the Existing Kani Coverage
- Catalogue crate-level invariants and map them to concrete Kani harnesses.
  Start with:
  - TribleSet operations preserving canonical ordering and deduplication.
  - Join heuristics in atreides ensuring variable bindings never alias conflicting values.
  - Repository merge logic maintaining append-only pile semantics.
- Extract shared helpers for generating bounded arbitrary data (e.g.
  Vec::bounded_any) so harnesses remain expressive without exploding the search space.
- Adopt a per-module harness layout (proofs/<module>_harness.rs) registered from
  proofs/mod.rs to make maintenance predictable.
- Configure scripts/verify.sh to run targeted cargo kani --harness <name>
  invocations in parallel, then wire it into CI with caching to keep runtimes manageable.
Phase 2 – Symbolic Execution with Miri
- Enable cargo miri test for the default test suite to surface undefined
  behaviour (UB) and aliasing bugs that regular tests may miss.
- Gate flaky or unsupported tests with cfg(miri) guards so the suite stays
  deterministic under the interpreter.
- Document the workflow in scripts/preflight.sh and optionally expose a
  dedicated scripts/miri.sh for local runs when developers need deeper debugging.
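One common way to gate such tests is cfg_attr, sketched below; whether to ignore or fully cfg-out a test depends on how much of it Miri can execute.
// Illustrative: keep the test in the normal suite but skip it under Miri.
#[test]
#[cfg_attr(miri, ignore)]
fn roundtrips_large_pile() {
    // ... exercise file-backed storage that the interpreter cannot emulate ...
}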
Contributor Workflow
- Run ./scripts/preflight.sh before every commit; it aggregates formatting,
  testing, and (eventually) targeted verification checks.
- Use cargo kani --harness <NAME> locally when iterating on a new proof. Start
  from the harness templates in proofs/ so generators and assumptions stay consistent.
- Execute cargo miri test after modifying unsafe code, pointer logic, or
  concurrency primitives; it catches UB bugs that normal tests cannot surface.
- Kick off fuzz targets with cargo fuzz run <TARGET> when touching boundary code
  (deserializers, planners, repository sync). Store new corpus inputs in version
  control if they expose bugs or tricky behaviours.
- Record findings, gaps, and future work in INVENTORY.md so the roadmap evolves
  alongside the implementation effort.
Phase 3 – Fuzzing and Property Testing
- Introduce a cargo fuzz workspace targeting:
  - PATCH encoders/decoders with binary corpus seeds generated from integration tests.
  - Join-order heuristics to explore combinations of constraint graphs and filter predicates.
  - Repository sync workflows by fuzzing sequences of commits, pulls, and merges.
- Reuse structured generators from proptest where deterministic shrinking is
  valuable, and bridge them into fuzz harnesses when possible to keep the state
  space constrained.
Phase 4 – Deterministic Simulation Testing
- Model repository replication scenarios with deterministic event queues to explore conflict resolution, garbage collection, and concurrent writers.
- Encode the simulations as regular unit tests backed by recorded execution traces so they can double as documentation for expected behaviour.
- Capture simulation scenarios discovered during fuzzing to prevent regressions.
Milestones & Reporting
- Track coverage for each invariant in a shared dashboard (CI summary or
  INVENTORY.md) so contributors can quickly spot gaps.
- Celebrate major wins, like a new harness landing or a bug found via
  verification, in the CHANGELOG to reinforce the value of the effort.
- Review and refresh this roadmap at least once per release cycle to keep the guidance aligned with the architecture.
Tooling Integration
- Track verification status in CI badges and documentation so contributors know which guarantees currently hold.
- Extend INVENTORY.md with follow-up work items whenever new invariants or
  subsystems are identified.
Next Steps
- Break the invariant catalogue into GitHub issues that reference the planned
  harness files (proofs/tribleset_harness.rs, proofs/patch_harness.rs, etc.).
- Prototype the PATCH harness that drives Branch::modify_child through
  insertion/growth cycles so we can assert the displacement planner and
  table_grow never drop entries; wire the run into scripts/verify.sh.
- Evaluate CI capacity to determine how frequently Kani proofs, cargo miri, and
  fuzzers can run without blocking contributors, documenting the cadence
  directly in INVENTORY.md.
This roadmap should evolve alongside the codebase—update it whenever new verification opportunities or obstacles appear.
Deep Dive
This section contains in-depth discussions about Trible Space internals.
Philosophy
Triblespace was designed to feel approachable without sacrificing rigor. This chapter collects the guiding values that shape everything from the storage format to the public APIs. Treat it as a companion to the rest of the deep-dive sections: when you wonder "why does it work this way?", the answer usually traces back to one of these principles.
Clarity before cleverness
We favour predictable, mechanically simple components over opaque heuristics. Each subsystem should be understandable on its own, with behaviours that are obvious when composed with the rest of the stack. When a trade-off appears between a clever optimisation and debuggability, we err on the side of the latter and document the costs so future work can revisit the decision with better evidence.
Productive developer experience
APIs should read like regular Rust. Where backends demand asynchronous capabilities—such as object-store repositories—we wrap them in blocking entry points so typical workflows stay synchronous while still supporting advanced integrations. Well-documented patterns and composable macros let readers experiment in a REPL or test harness without extra scaffolding, and examples in the book mirror the crates users import so copy-and-paste snippets behave as advertised.
Soundness and data integrity
The engine must reject malformed data early, surface explicit error paths, and make invariants easy to audit. Safety checks live close to the data structures that rely on them, and proofs or tests accompany subtle invariants when feasible. Correctness remains the baseline for every optimisation.
Performance with headroom
Efficient data structures keep the core fast, but we prioritise predictable latency over micro-benchmarks that complicate maintenance. Hot paths receive focused tuning backed by benchmarks so we can understand the impact of each change.
Practical implications
These principles surface in day-to-day workflows:
- Documentation favours runnable snippets and end-to-end walkthroughs, lowering the barrier to experimentation.
- Internal abstractions expose minimal, intentional APIs, reducing the amount of context a contributor needs before making a change.
- Tooling, such as the preflight.sh convenience script, targeted Kani
  verification harnesses, and runnable doc tests, keeps quality checks
  accessible so they are run regularly rather than only in CI.
Taken together, the philosophy is simple: build a reliable system whose pieces are easy to reason about, teach, and extend.
Identifiers for Distributed Systems
Distributed systems are assembled from independently authored pieces of data. Keeping those pieces addressable requires names that survive replication, concurrent edits, and local conventions. We have found it useful to categorize identifier schemes along two axes:
| | Abstract | Semantic |
|---|---|---|
| Intrinsic | Hash, Signature, PubKey | Embeddings |
| Extrinsic | UUID, UFOID, FUCID | Names, DOI, URL |
The rows describe how an identifier is minted while the columns describe what it tries to communicate. Classifying an identifier along both axes makes its trade-offs explicit and helps decide how to combine multiple schemes for a given workflow.
Abstract vs. Semantic Identifiers
Semantic identifiers
Semantic identifiers (names, URLs, descriptive labels, embeddings) carry meaning about the thing they reference. That context makes them convenient for humans and for search workflows where the identifier itself narrows the scope of possible matches. Their usefulness comes with caveats:
- Semantics drift. A name that once felt accurate can become misleading as an entity evolves.
- Semantic identifiers are rarely unique. They work best as entry points into a richer dataset rather than as the source of truth for identity.
- Without scoping, semantics clash. Two communities can reasonably reuse the same word for different concepts, so the identifier must be tied to a namespace, deployment, or audience.
Distributed systems reconcile those tensions by mapping many local semantic names to a persistent abstract identifier. Users remain free to adopt whatever terminology makes sense to them while the system maintains a shared canonical identity.
Embeddings deserve a special mention. They encode meaning in a machine-friendly form that can be compared for similarity instead of exact equality. That makes them great for recommendations and clustering but still unsuitable as primary identifiers: two distinct entities can legitimately share similar embeddings, and embeddings can change whenever the underlying model is retrained.
Abstract identifiers
Abstract identifiers (UUIDs, UFOIDs, FUCIDs, hashes, signatures) strip all meaning away in favor of uniqueness. They can be minted without coordination, usually by drawing from a high-entropy space and trusting probability to keep collisions effectively impossible. Abstract identifiers shine when you need:
- Stable handles that survive across replicas and through refactors.
- Globally unique names without a centralized registrar.
- Cheap, constant-time generation so every component can allocate identifiers on demand.
Because they carry no inherent semantics, abstract identifiers are almost always paired with richer metadata. They provide the skeleton that keeps references consistent while semantic identifiers supply the narrative that humans consume.
Intrinsic vs. Extrinsic Identifiers
The intrinsic/extrinsic axis captures whether an identifier can be recomputed from the entity itself or whether it is assigned externally.
Intrinsic identifiers
Intrinsic identifiers (cryptographic hashes, digital signatures, content-based addresses) are derived from the bytes they describe. They function as fingerprints: if two values share the same intrinsic identifier then they are bit-for-bit identical. This property gives us:
- Immutability. Changing the content produces a different identifier, which immediately signals tampering or corruption.
- Self-validation. Replicas can verify received data locally instead of trusting a third party.
- Stronger adversarial guarantees. Because an attacker must find collisions deliberately, intrinsic identifiers rely on cryptographic strength rather than purely statistical rarity.
Extrinsic identifiers
Extrinsic identifiers (names, URLs, DOIs, UUIDs, UFOIDs, FUCIDs) are assigned by policy instead of by content. They track a conceptual entity as it evolves through versions, formats, or migrations. In other words, extrinsic identifiers carry the "story" of a thing while intrinsic identifiers nail down individual revisions.
Thinking about the classic ship of Theseus thought experiment makes the distinction concrete: the restored ship and the reconstructed ship share the same extrinsic identity (they are both "Theseus' ship") but have different intrinsic identities because their planks differ.
Embeddings as Semantic Intrinsic Identifiers
Embeddings blur our neat taxonomy. They are intrinsic because they are computed from the underlying data, yet they are overtly semantic because similar content produces nearby points in the embedding space. That duality makes them powerful for discovery:
- Systems can exchange embeddings as a "lingua franca" without exposing raw documents.
- Expensive feature extraction can happen once and power many downstream indexes, decentralizing search infrastructure.
- Embeddings let us compare otherwise incomparable artifacts (for example, a caption and an illustration) by projecting them into a shared space.
Despite those advantages, embeddings should still point at a durable abstract identifier rather than act as the identifier. Collisions are expected, model updates can shift the space, and floating-point representations can lose determinism across hardware.
High-Entropy Identifiers
For a truly distributed system, the creation of identifiers must avoid the bottlenecks and overhead associated with a central coordinating authority. At the same time, we must ensure that these identifiers are unique.
To guarantee uniqueness, we use abstract identifiers containing a large amount of entropy, making collisions statistically irrelevant. However, the entropy requirements differ based on the type of identifier:
- Extrinsic abstract identifiers need enough entropy to prevent accidental collisions in normal operation.
- Intrinsic abstract identifiers must also resist adversarial forging attempts, requiring significantly higher entropy.
From an information-theoretic perspective, the length of an identifier determines the maximum amount of entropy it can encode. For example, a 128-bit identifier can represent \(2^{128}\) unique values, which is sufficient to make collisions statistically negligible even for large-scale systems.
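As a rough sanity check (standard birthday-bound arithmetic, not a guarantee taken from the crate), the probability that \(n\) randomly drawn 128-bit identifiers contain a collision is approximately

\[
p_{\text{collision}} \approx \frac{n(n-1)}{2 \cdot 2^{128}} \approx \frac{n^{2}}{2^{129}},
\]

so even minting \(n = 10^{12}\) identifiers yields a collision probability on the order of \(10^{-15}\).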
For intrinsic identifiers, 256 bits is widely considered sufficient when modern cryptographic hash functions (e.g., SHA-256) are used. These hash functions provide strong guarantees of collision resistance, preimage resistance, and second-preimage resistance. Even in the event of weaknesses being discovered in a specific algorithm, it is more practical to adopt a new hash function than to increase the bit size of identifiers.
Additionally, future advances such as quantum computing are unlikely to undermine this length. Grover's algorithm would halve the effective security of a 256-bit hash, reducing it to \(2^{128}\) operations, which is still infeasible with current or theoretical technology. As a result, 256 bits remains a future-proof choice for intrinsic identifiers.
Such 256-bit intrinsic identifiers are represented by the types
triblespace::core::value::schemas::hash::Hash and
triblespace::core::value::schemas::hash::Handle.
Not every workflow needs cryptographic strength. We therefore ship three 128-bit high-entropy abstract identifier families (RNGID, UFOID, and FUCID) that trade off locality, compressibility, and predictability to suit different scenarios.
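A minimal sketch of minting them, assuming the constructors follow the naming of the ufoid() and fucid() helpers used elsewhere in this book (the rngid() spelling is an assumption):

```rust
use triblespace::prelude::*;

fn main() {
    // Each call mints a fresh 128-bit identifier without any coordination.
    let time_ordered = ufoid(); // UFOID: timestamp prefix gives temporal locality
    let compressible = fucid(); // FUCID: per-source counter compresses well in indexes
    let opaque = rngid();       // RNGID: pure randomness, no locality (assumed constructor)

    let _ = (time_ordered, compressible, opaque);
}
```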
Comparison of Identifier Types
| | RNGID | UFOID | FUCID |
|---|---|---|---|
| Global entropy | 128 bits | 96 bits random + timestamp | 128 bits |
| Locality | None | High (time-ordered) | High (monotonic counter) |
| Compression friendliness | None | Low | High |
| Predictability | None | Low (reveals mint time) | High (per-source sequence) |
Example: Scientific Publishing
Consider the case of published scientific papers. Each artifact, such as a .html
or .pdf file, should be identified by its abstract intrinsic identifier,
typically a cryptographic hash of its content. This ensures that any two
entities referencing the same hash are referring to the exact same version of
the artifact, providing immutability and validation.
Across different versions of the same paper, an abstract extrinsic identifier can tie these artifacts together as part of one logical entity. The identifier provides continuity regardless of changes to the paper’s content over time.
Semantic (human-readable) identifiers, such as abbreviations in citations or bibliographies, are scoped to individual papers and provide context-specific usability for readers. These names do not convey identity but serve as a way for humans to reference the persistent abstract identifiers that underlie the system.
Sadly, the identifiers used in practice, such as DOIs, fail to align with these principles. They attempt to provide global extrinsic semantic identifiers for scientific papers, an ultimately flawed approach: they lack the guarantees of intrinsic identifiers while inheriting all the challenges of semantic ones. With their scope defined too broadly and their authority centralized, they fail to live up to the potential of distributed systems.
ID Ownership
In distributed systems, consistency requires monotonicity due to the CALM principle. However, this restriction is unnecessary for single-writer systems. By assigning each ID an owner, we ensure that only the current owner can write new information about an entity associated with that ID. This allows for fine-grained synchronization and concurrency control.
To create a transaction, you can uniquely own all entities involved and write new data for them simultaneously. Since there can only be one owner for each ID at any given time, you can be confident that no other information has been written about the entities in question.
By default, all minted ExclusiveIds are associated with the thread they are dropped from.
These IDs can be found in queries via the local_ids function.
Once the IDs are back in scope you can either work with them directly as ExclusiveIds or move them into an explicit IdOwner for a longer-lived transaction. The example below shows both approaches in action:
```rust
#![allow(unused)]
fn main() {
    use triblespace::examples::literature;
    use triblespace::prelude::*;

    let mut kb = TribleSet::new();
    {
        let isaac = ufoid();
        let jules = ufoid();
        kb += entity! { &isaac @
            literature::firstname: "Isaac",
            literature::lastname: "Asimov",
        };
        kb += entity! { &jules @
            literature::firstname: "Jules",
            literature::lastname: "Verne",
        };
    }
    // `isaac` and `jules` fall back to this thread's implicit IdOwner here.

    let mut txn_owner = IdOwner::new();
    let mut updates = TribleSet::new();

    for (author, name) in find!(
        (author: ExclusiveId, name: String),
        and!(
            local_ids(author),
            pattern!(&kb, [{ ?author @ literature::firstname: ?name }])
        )
    ) {
        // `author` is an ExclusiveId borrowed from the implicit thread owner.
        let author_id = txn_owner.insert(author);
        {
            let borrowed = txn_owner
                .borrow(&author_id)
                .expect("the ID was inserted above");
            updates += entity! { &borrowed @ literature::lastname: name.clone() };
        }
        // `borrowed` drops here and returns the ID to `txn_owner`.
    }
}
```
Sometimes you want to compare two attributes without exposing the comparison
variable outside the pattern. Prefixing the binding with _?, such as
_?name, allocates a scoped variable local to the macro invocation. Both
pattern! and pattern_changes! will reuse the same generated query variable
whenever the _? form appears again, letting you express equality constraints
inline without touching the outer find! signature.
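For example, a hedged sketch (reusing the literature example namespace and the plain Id binding mentioned later in this section) that pairs up authors sharing a last name without ever exposing that name to the surrounding find!:

```rust
#![allow(unused)]
fn main() {
    use triblespace::examples::literature;
    use triblespace::prelude::*;

    let kb = TribleSet::new(); // imagine this holds the author data from above

    for (a, b) in find!(
        (a: Id, b: Id),
        pattern!(&kb, [
            { ?a @ literature::lastname: _?shared },
            { ?b @ literature::lastname: _?shared },
        ])
    ) {
        // Both clauses reuse the same scoped `_?shared` variable, so `a` and
        // `b` are constrained to agree on their lastname. Note that each
        // author also pairs with itself unless filtered afterwards.
        let _ = (a, b);
    }
}
```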
Binding the variable as an ExclusiveId means the
closure that find! installs will run the
FromValue implementation for ExclusiveId.
FromValue simply unwraps TryFromValue, which
invokes Id::aquire and would panic if the current
thread did not own the identifier. The
local_ids constraint keeps the query safe by only
enumerating IDs already owned by this thread. In the example we immediately
move the acquired guard into txn_owner, enabling subsequent calls to
IdOwner::borrow that yield
OwnedIds. Dropping an OwnedId automatically returns
the identifier to its owner so you can borrow it again later. If you only need
the ID for a quick update you can skip the explicit owner entirely, bind the
variable as a plain Id, and call
Id::aquire when exclusive access is required.
Ownership and Eventual Consistency
While a simple grow-only set like the history stored in a Head already constitutes a conflict-free replicated data type (CRDT), it is limited in expressiveness. To provide richer semantics while guaranteeing conflict-free mergeability we allow only "owned" IDs to be used in the entity position of newly generated tribles. Because owned IDs are [Send] but not [Sync], owning a set of them essentially constitutes a single-writer transaction domain, allowing some non-monotonic operations, such as if-does-not-exist, over the set of contained entities. Note that this does not make operations that would break CALM (consistency as logical monotonicity) safe, e.g. delete.
The trible module defines the Trible struct, the smallest unit of knowledge the system stores. Instances of Trible live inside TribleSets, which index each fact in several complementary ways so that queries can be answered with as little work as possible.
┌────────────────────────────64 byte───────────────────────────┐
┌──────────────┐┌──────────────┐┌──────────────────────────────┐
│ entity-id ││ attribute-id ││ inlined value │
└──────────────┘└──────────────┘└──────────────────────────────┘
└────16 byte───┘└────16 byte───┘└────────────32 byte───────────┘
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶
At a high level a trible is a three-tuple consisting of an entity, an attribute, and a value. The entity and attribute are both 128‑bit abstract extrinsic identifiers as described in [crate::id], while the value is an arbitrary 256‑bit [crate::value::Value]. The value width deliberately matches the size of an intrinsic identifier so larger payloads can be referenced via blobs without inflating the inlined representation.
Abstract identifiers
Entities and attributes are purely extrinsic; their identifiers do not encode any meaning beyond uniqueness. An entity may accrue additional tribles over time and attributes simply name relationships without prescribing a schema. This keeps the format agnostic to external ontologies and minimises accidental coupling between datasets.
The value slot can carry any 256‑bit payload. Its size is dictated by the need to embed an intrinsic identifier for out‑of‑line data. When a fact exceeds this space the value typically stores a blob handle pointing to the larger payload.
Tribles are stored as a contiguous 64‑byte array with the entity occupying the first 16 bytes, the attribute the next 16, and the value the final 32 bytes. The name "trible" is a portmanteau of triple and byte and is pronounced like "tribble" from Star Trek – hence the project's mascot, Robert the tribble. This rigid layout keeps the representation friendly to SIMD optimisations and allows the storage layer to compute sizes deterministically.
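For intuition only (this is a hypothetical struct, not the crate's internal representation), the layout can be pictured as a 64-byte repr(C) struct, which also makes dataset sizes easy to estimate:

```rust
// Hypothetical illustration of the on-the-wire layout described above.
#[repr(C)]
struct RawTrible {
    entity: [u8; 16],    // abstract extrinsic id of the describing entity
    attribute: [u8; 16], // abstract extrinsic id naming the relationship
    value: [u8; 32],     // inline value or schema-qualified blob handle
}

fn main() {
    assert_eq!(std::mem::size_of::<RawTrible>(), 64);

    // Dataset size is easy to estimate: n tribles occupy n * 64 bytes in the
    // canonical representation (indexes add overhead on top of this).
    let n: usize = 1_000_000;
    println!("{} tribles ~ {} MiB", n, (n * 64) / (1024 * 1024));
}
```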
Index permutations
TribleSets index each fact under all six permutations of entity (E),
attribute (A) and value (V) so any combination of bound variables can be
resolved efficiently. Regardless of which columns a query fixes the
planner can reach matching leaves with a handful of comparisons:
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│ EAV │ │ EVA │ │ AEV │ │ AVE │ │ VEA │ │ VAE │
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
│ │ │ │ │ │
┌───────────────────────────────────────────────────────┐
│ order-specific inner nodes │
└───────────────────────────────────────────────────────┘
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌───────────────────────────────────────────────────────┐
│ SHARED LEAVES │
│ single canonical E–A–V tribles used by all │
└───────────────────────────────────────────────────────┘
Each permutation maintains its own inner nodes, but all six share leaf nodes containing the 64‑byte trible. This avoids a naïve six‑fold memory cost while still letting the query planner pick the most selective ordering, keeping joins resistant to skew even when cardinalities vary widely.
Advantages
- A total order over tribles enables efficient storage and canonicalisation.
- Simple byte‑wise segmentation supports indexing and querying without an interning mechanism, keeping memory usage low and parallelisation easy while avoiding the need for garbage collection.
- Schemas describe the value portion directly, making serialisation and deserialisation straightforward.
- The fixed 64‑byte layout makes it easy to estimate the physical size of a dataset as a function of the number of tribles stored.
- The minimalistic design aims to minimise entropy while retaining collision resistance, making it likely that a similar format would emerge through convergent evolution and could serve as a universal data interchange format.
Set operations and monotonic semantics
TribleSets provide familiar set-theoretic helpers such as
TribleSet::union,
TribleSet::intersection
and
TribleSet::difference.
union consumes the right-hand operand and merges its contents into the
receiver in place, while intersection and difference each produce a fresh
TribleSet without mutating their inputs. Together these helpers make it
straightforward to merge datasets, locate their overlap or identify the facts
that still need to propagate between replicas while keeping the original
sources intact.
This design reflects the crate's commitment to CALM-friendly, monotonic
semantics. New information can be added freely, but existing facts are never
destroyed. Consequently, difference is intended for comparing snapshots
(e.g. "which facts are present in the remote set that I have not yet
indexed?") rather than for destructive deletion. This keeps workflows
declarative and convergent: sets can be combined in any order without
introducing conflicts, and subtraction simply reports the gaps that remain to
be filled.
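A small sketch of that snapshot-comparison workflow, assuming the literature example namespace and that intersection and difference take their argument by reference:

```rust
use triblespace::examples::literature;
use triblespace::prelude::*;

fn main() {
    let mut local = TribleSet::new();
    let mut remote = TribleSet::new();

    let frank = ufoid();
    local += entity! { &frank @ literature::firstname: "Frank" };
    remote += entity! { &frank @ literature::firstname: "Frank" };
    remote += entity! { &frank @ literature::lastname: "Herbert" };

    // Facts the remote replica has that we have not indexed yet.
    let missing = remote.difference(&local);
    // Facts both replicas already agree on.
    let _common = local.intersection(&remote);

    // `union` folds the right-hand set into the receiver in place.
    local.union(missing);
}
```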
Direction and consistency
In many triple stores the direction of an edge is chosen incidentally—there
is no intrinsic preference for hasColor over colorOf. This ambiguity often
leads to confusion, duplication, or both as different writers pick different
conventions. Common mitigations either mirror every edge automatically (as
done by OWL and RDF through inverseOf, doubling storage or demanding runtime
inference) or devolve into bikeshedding about the "correct" orientation.
tribles avoids that trap by giving edge direction explicit semantics: the
arrow points from the entity making the claim to the entity being described.
The observer owns the identifier and is responsible for the consistency of the
facts it asserts—see ID Ownership. This rule naturally fits the
distributed setting where each entity has a single authoritative writer.
Viewed another way, edges always flow from describing to described entities,
while cycles represent consensus between the parties involved. For example,
hasColor must point from the object that exhibits the colour to the entity
representing that colour. The orientation is therefore a consequence of the
statement's meaning, not an arbitrary modelling choice.
Blobs
Blobs are immutable sequences of bytes used whenever data no longer fits into
the fixed 256‑bit value slot of a trible. Instead of treating these payloads as
untyped binary blobs, Tribles keeps track of their structure via BlobSchema.
Much like ValueSchema drives how values are serialized into a trible, a
BlobSchema defines how to encode and decode rich data into a byte sequence.
When to reach for blobs
Values and tribles capture compact facts – identifiers, timestamps, counters –
in a fixed width. Whenever information grows beyond that footprint, blobs carry
the payload while tribles continue to reference it. Common use cases include
documents, media assets, serialized entity archives, or even domain specific
binary formats. Because blobs are content addressed, the same payload stored
twice automatically deduplicates to the same handle. In the in-memory
implementation this falls straight out of the code: MemoryBlobStore::insert
keeps a BTreeMap keyed by the handle and simply reuses the existing entry
when the same digest shows up again.
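A minimal sketch of that deduplication, using the same MemoryBlobStore, Blake3, and LongString building blocks as the end-to-end example below (handle equality is assumed to be plain value equality):

```rust
use triblespace::prelude::*;
use valueschemas::{Blake3, Handle};
use blobschemas::LongString;

fn main() {
    let mut store: MemoryBlobStore<Blake3> = MemoryBlobStore::new();

    let first: Value<Handle<Blake3, LongString>> =
        store.put("the spice must flow").unwrap();
    let second: Value<Handle<Blake3, LongString>> =
        store.put("the spice must flow").unwrap();

    // Identical bytes produce identical digests, so the store keeps a single
    // copy and hands back the same handle both times.
    assert_eq!(first, second);
}
```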
Handles, schemas, and stores
Blobs live in a BlobStore. The store provides persistent storage and a
content hash, determined by the selected HashProtocol, that acts as a stable
handle. Handles can be embedded into tribles just like any other value so they
benefit from the existing querying machinery. A handle couples the blob's hash
with its BlobSchema so consumers always know how to deserialize the
referenced bytes.
Converting Rust types to blobs is infallible in practice, so the ToBlob and TryFromBlob traits are the most common helpers. The TryToBlob and FromBlob variants were dropped to keep the API surface small without losing ergonomics.
End‑to‑end example
The following example demonstrates creating blobs, archiving a TribleSet and
signing its contents:
```rust
#![allow(unused)]
fn main() {
    use triblespace::prelude::*;
    use triblespace::examples::literature;
    use triblespace::repo;
    use valueschemas::{Handle, Blake3};
    use blobschemas::{SimpleArchive, LongString};
    use rand::rngs::OsRng;
    use ed25519_dalek::{Signature, Signer, SigningKey};

    // Build a BlobStore and fill it with some data.
    let mut memory_store: MemoryBlobStore<Blake3> = MemoryBlobStore::new();

    let book_author_id = fucid();
    let quote_a: Value<Handle<Blake3, LongString>> = memory_store
        .put("Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic.")
        .unwrap();
    let quote_b = memory_store
        .put("I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.")
        .unwrap();

    let set = entity! {
        literature::title: "Dune",
        literature::author: &book_author_id,
        literature::quote: quote_a,
        literature::quote: quote_b
    };

    // Serialize the TribleSet and store it as another blob. The resulting
    // handle points to the archived bytes and keeps track of its schema.
    let archived_set_handle: Value<Handle<Blake3, SimpleArchive>> =
        memory_store.put(&set).unwrap();

    let mut csprng = OsRng;
    let commit_author_key: SigningKey = SigningKey::generate(&mut csprng);
    let signature: Signature = commit_author_key.sign(
        &memory_store
            .reader()
            .unwrap()
            .get::<Blob<SimpleArchive>, SimpleArchive>(archived_set_handle)
            .unwrap()
            .bytes,
    );

    // Store the handle in another TribleSet so the archived content can be
    // referenced alongside metadata and cryptographic proofs.
    let _meta_set = entity! {
        repo::content: archived_set_handle,
        repo::short_message: "Initial commit",
        repo::signed_by: commit_author_key.verifying_key(),
        repo::signature_r: signature,
        repo::signature_s: signature,
    };
}
```
Blobs complement tribles and values by handling large payloads while keeping the core data structures compact. Embedding handles into entities ties together structured metadata and heavyweight data without breaking immutability or introducing duplication. This division of labor lets tribles focus on querying relationships while BlobStores take care of storage concerns such as hashing, deduplication, and retrieval.
PATCH
The Persistent Adaptive Trie with Cuckoo-compression and Hash-maintenance (PATCH) is Trible Space’s workhorse for set operations. It combines three core ideas:
- Persistence. Updates clone only the modified path, so existing readers keep a consistent view while writers continue mutating. The structure behaves like an immutable value with copy-on-write updates.
- Adaptive width. Every node is conceptually 256-ary, yet the physical footprint scales with the number of occupied children.
- Hash maintenance. Each subtree carries a 128-bit fingerprint that allows set operations to skip identical branches early.
Together these properties let PATCH evaluate unions, intersections, and differences quickly while staying cache friendly and safe to clone.
Node layout
Traditional Adaptive Radix Trees (ART) use specialised node types (Node4,
Node16, Node48, …) to balance space usage against branching factor. PATCH
instead stores every branch in the same representation:
- The Branch header tracks the first depth where the node diverges (end_depth) and caches a pointer to a representative child leaf (childleaf). These fields give PATCH its path compression: a branch can cover several key bytes, and we only expand into child tables once the children disagree below end_depth.
- Children live in a byte-oriented cuckoo hash table backed by a single slice of Option<Head>. Each bucket holds two slots and the table grows in powers of two up to 256 entries.
Insertions reuse the generic modify_child helper, which drives the cuckoo loop
and performs copy-on-write if a branch is shared. When the existing allocation
is too small we allocate a larger table with the same layout, migrate the
children, and update the owning pointer in place. Because every branch uses the
same structure we avoid the tag soup and pointer chasing that ARTs rely on while
still adapting to sparse and dense fan-out.
Resizing strategy
PATCH relies on two hash functions: an identity map and a pseudo-random
permutation sampled once at startup. Both hashes feed a simple compressor that
masks off the unused high bits for the current table size. Doubling the table
therefore only exposes one more significant bit, so each child either stays in its bucket or moves to the partner bucket at index + old_bucket_count.
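The following standalone sketch (not the crate's code) illustrates that property: the compressor masks the hash to the current bucket count, so doubling the table makes exactly one more bit significant.

```rust
fn bucket(hash: u8, bucket_count: usize) -> usize {
    // Mask off the bits the current table size cannot address yet.
    (hash as usize) & (bucket_count - 1)
}

fn main() {
    let hash = 0b1010_1101u8;
    let old_bucket_count = 8;
    let new_bucket_count = 16;

    let before = bucket(hash, old_bucket_count);
    let after = bucket(hash, new_bucket_count);

    // After doubling, only one new bit becomes significant: the entry either
    // stays put or lands in the partner bucket `before + old_bucket_count`.
    assert!(after == before || after == before + old_bucket_count);
    println!("before: {before}, after: {after}");
}
```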
The byte_table_resize_benchmark demonstrates how densely the table can fill
before resizing. The benchmark inserts all byte values repeatedly and records the
occupancy that forced each power-of-two table size to grow:
ByteTable resize fill - random: 0.863, sequential: 0.972
Per-size fill (random)
size 2: 1.000 # path compression keeps two-entry nodes fully occupied
size 4: 0.973
size 8: 0.899
size 16: 0.830
size 32: 0.749
size 64: 0.735
size 128: 0.719
size 256: 1.000 # identity hash maps all 256 children without resizing
Per-size fill (sequential)
size 2: 1.000 # path compression keeps two-entry nodes fully occupied
size 4: 1.000
size 8: 0.993
size 16: 1.000
size 32: 0.928
size 64: 0.925
size 128: 0.927
size 256: 1.000 # identity hash maps all 256 children without resizing
Random inserts average roughly 86 % table fill while sequential inserts stay near 97 % before the next doubling. Small nodes stay compact because the path-compressed header only materialises a table when needed, while the largest table reaches full occupancy without growing past 256 entries. These predictable fill factors keep memory usage steady without ART’s specialised node types.
Hash maintenance
Every leaf stores a SipHash-2-4 fingerprint of its key, and each branch XORs
its children’s 128-bit hashes together. On insert or delete the previous hash
contribution is XORed out and the new value XORed in, so updates run in constant
time. Set operations such as difference compare these fingerprints first:
matching hashes short-circuit because the subtrees are assumed identical, while
differing hashes force a walk to compute the result. SipHash collisions are
astronomically unlikely for these 128-bit values, so the shortcut is safe in
practice.
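A toy version of that bookkeeping, independent of PATCH's real internals (which use 128-bit SipHash fingerprints), maintains a set fingerprint by XORing per-element hashes in and out:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy incremental set fingerprint: the XOR of per-element hashes.
#[derive(Default)]
struct SetFingerprint(u64);

impl SetFingerprint {
    fn toggle<T: Hash>(&mut self, item: &T) {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        // XOR is its own inverse, so the same call inserts and removes
        // an element's contribution in constant time.
        self.0 ^= h.finish();
    }
}

fn main() {
    let mut a = SetFingerprint::default();
    let mut b = SetFingerprint::default();

    for item in ["entity", "attribute", "value"] {
        a.toggle(&item);
    }
    // The same elements in a different order yield the same fingerprint,
    // so two subtrees with equal hashes can be skipped during set operations.
    for item in ["value", "entity", "attribute"] {
        b.toggle(&item);
    }
    assert_eq!(a.0, b.0);
}
```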
Consumers can reorder or segment keys through the KeySchema
and KeySegmentation traits. Prefix queries reuse the
schema’s tree ordering to walk just the matching segments. Because every update
is implemented with copy-on-write semantics, cloning a tree is cheap and retains
structural sharing: multiple workspaces can branch, mutate independently, and
merge results without duplicating entire datasets.
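At the API surface this shows up as cheap clones of TribleSet, which is backed by PATCH. A small sketch, assuming the literature example namespace from earlier chapters:

```rust
use triblespace::examples::literature;
use triblespace::prelude::*;

fn main() {
    let mut workspace = TribleSet::new();
    let frank = ufoid();
    workspace += entity! { &frank @ literature::firstname: "Frank" };

    // Cloning copies no leaves; both sets share structure after this line.
    let snapshot = workspace.clone();

    // Later writes copy only the paths they touch, leaving `snapshot` intact.
    workspace += entity! { &frank @ literature::lastname: "Herbert" };
    let _ = snapshot;
}
```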
Pile Format
The on-disk pile keeps every blob and branch in one append-only file. This layout provides a simple write-ahead log style database where new data is only appended. It allows both blob and branch storage in a single file while remaining resilient to crashes. The pile backs local repositories and acts as a durable content‑addressed store. The pile file can be memory mapped for fast reads and is safely shared between threads because existing bytes are never mutated.
While large databases often avoid mmap due to pitfalls with partial writes
and page cache thrashing [1], this
design works well for the pile's narrow usage pattern.
Design Rationale
This format emphasizes simplicity over sophisticated on-disk structures.
Appending new blobs rather than rewriting existing data keeps corruption
windows small and avoids complicated page management. Storing everything in a
single file makes a pile easy to back up or replicate over simple transports
while still allowing it to be memory mapped for fast reads. The 64 byte
alignment ensures each entry begins on a cache line boundary, which improves
concurrent access patterns and allows safe typed views with the zerocopy
crate.
Immutability Assumptions
A pile is treated as an immutable append-only log. Once a record sits below a process's applied offset, its bytes are assumed permanent. The implementation does not guard against mutations; modifying existing bytes is undefined behavior. Only the tail beyond the applied offset might hide a partial append after a crash, so validation and repair only operate on that region. Each record's validation state is cached for the lifetime of the process under this assumption.
Hash verification only happens when blobs are read. Opening even a very large pile is therefore fast while still catching corruption before data is used.
Every record begins with a 16 byte magic marker that identifies whether it stores a blob or a branch. The sections below illustrate the layout of each type.
Usage
A pile typically lives as a .pile file on disk. Repositories open it through
Pile::open and then call refresh to load existing
records or restore to repair after a crash. Multiple
threads may share the same handle thanks to internal synchronisation, making a
pile a convenient durable store for local development. Blob appends use a single
O_APPEND write. Each handle remembers the last offset it processed and, after
appending, scans any gap left by concurrent writes before advancing this
applied_length. Writers may race and duplicate blobs, but content addressing
keeps the data consistent. Each handle tracks hashes of pending appends
separately so repeated writes are deduplicated until a refresh. A restore
clears this in-memory set to discard truncated appends. Branch updates only record
the referenced hash and do not verify that the corresponding blob exists in the
pile, so a pile may act as a head-only store when blob data resides elsewhere.
Updating branch heads requires a brief critical
section: flush → refresh → lock → refresh → append → unlock. The initial
refresh acquires a shared lock so it cannot race with restore, which takes an
exclusive lock before truncating a corrupted tail.
Filesystems lacking atomic write/writev appends, such as some network or FUSE-based implementations, cannot safely host multiple writers and are not supported. Using such filesystems risks pile corruption.
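A minimal open-and-recover sketch under the assumptions above (the exact error types returned by refresh and restore are glossed over here):

```rust
use triblespace::repo::pile::Pile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the pile file backing a local repository.
    let mut pile = Pile::open("/tmp/example.pile")?;

    // Pick up records appended by other handles since we last scanned.
    if pile.refresh().is_err() {
        // The unapplied tail contained a truncated or unknown record;
        // `restore` re-validates and truncates the file to the valid length.
        pile.restore()?;
    }

    Ok(())
}
```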
Blob Storage
8 byte 8 byte
┌────16 byte───┐┌──────┐┌──────┐┌────────────32 byte───────────┐
┌ ┌──────────────┐┌──────┐┌──────┐┌──────────────────────────────┐
header │ │magic number A││ time ││length││ hash │
└ └──────────────┘└──────┘└──────┘└──────────────────────────────┘
┌────────────────────────────64 byte───────────────────────────┐
┌ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
│ │ │
payload │ │ bytes (64byte aligned and padded) │
│ │ │
└ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┘
Each blob entry records its creation timestamp, the length of the payload (which may be zero) and
its hash. The payload is padded so the next record begins on a
64 byte boundary. The Pile Blob Metadata chapter explains how to
query these fields through the PileReader API.
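Putting the numbers above together, a blob record occupies a 64-byte header plus the payload rounded up to the next 64-byte boundary. A purely illustrative helper:

```rust
/// Illustrative size estimate for a blob record as laid out above:
/// a 64-byte header followed by the payload padded to a 64-byte boundary.
fn blob_record_size(payload_len: u64) -> u64 {
    let padded_payload = payload_len.div_ceil(64) * 64;
    64 + padded_payload
}

fn main() {
    assert_eq!(blob_record_size(0), 64); // zero-length payloads are allowed
    assert_eq!(blob_record_size(1), 128);
    assert_eq!(blob_record_size(64), 128);
    assert_eq!(blob_record_size(65), 192);
    println!("a 1 KiB blob occupies {} bytes on disk", blob_record_size(1024));
}
```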
Branch Storage
┌────16 byte───┐┌────16 byte───┐┌────────────32 byte───────────┐
┌ ┌──────────────┐┌──────────────┐┌──────────────────────────────┐
header │ │magic number B││ branch id ││ hash │
└ └──────────────┘└──────────────┘└──────────────────────────────┘
Branch entries map a branch identifier to the hash of a blob.
Recovery
Calling refresh scans an existing file to ensure
every header uses a known marker and that the whole record fits. It does not
verify any hashes. If a truncated or unknown block is found the function reports
the number of bytes that were valid so far using
[ReadError::CorruptPile].
If the file shrinks between scans into data that has already been applied, the
process aborts immediately. Previously returned Bytes handles would dangle and
continuing could cause undefined behavior, so truncation into validated data is
treated as unrecoverable.
refresh holds a shared file lock while scanning. This prevents a concurrent
restore call from truncating the file out from under
the reader.
The restore helper re-runs the same validation and
truncates the file to the valid length if corruption is encountered. This
recovers from interrupted writes by discarding incomplete data. Hash
verification happens lazily only when individual blobs are loaded so that
opening a large pile remains fast.
For more details on interacting with a pile see the Pile struct
documentation.
Pile Blob Metadata
Every blob stored in a pile begins with a header that records the moment it was
written and the length of the payload that follows. The Pile implementation
surfaces that information so tools can answer questions such as "when did this
blob arrive?" without re-parsing the file on disk.
BlobMetadata
Blob metadata is exposed through the BlobMetadata struct. It is
a simple value type containing two public fields:
- timestamp: the write time stored in the blob header. The pile records the number of milliseconds since the Unix epoch when the blob was appended.
- length: the size of the blob payload in bytes. Padding that aligns entries to 64 byte boundaries is excluded from this value, so it matches the slice returned by PileReader::get.
Looking up blob metadata
PileReader::metadata accepts the same Value<Handle<_, _>> that other blob
store APIs use. It returns Some(BlobMetadata) when the blob exists and the
stored bytes hash to the expected value; otherwise it yields None.
Readers operate on the snapshot that was current when they were created. Call
Pile::refresh and request a new reader to observe blobs appended
afterwards.
```rust
use anybytes::Bytes;
use triblespace::blob::schemas::UnknownBlob;
use triblespace::blob::Blob;
use triblespace::repo::pile::Pile;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut pile = Pile::open("/tmp/example.pile")?;
    let blob = Blob::<UnknownBlob>::new(Bytes::from_static(b"hello world"));
    let handle = pile.put(blob)?;

    let reader = pile.reader()?;
    if let Some(meta) = reader.metadata(handle) {
        println!(
            "Blob length: {} bytes, appended at {} ms",
            meta.length, meta.timestamp
        );
    }

    Ok(())
}
```
Failure cases
metadata returns None in a few situations:
- the handle does not correspond to any blob stored in the pile;
- the reader snapshot predates the blob (refresh the pile and create a new reader to see later writes);
- validation previously failed because the on-disk bytes did not match the recorded hash, for example after the pile file was corrupted before this process opened it.
When None is returned, callers can treat it the same way they would handle a
missing blob from get: the data is considered absent from the snapshot they are
reading.
For more detail on how the metadata is laid out on disk, see the Pile Format chapter.