Trible Structure - Tribles Book

The trible module defines the Trible struct, the smallest unit of knowledge the system stores. Instances of Tribles live inside TribleSets, which index each fact in several complementary ways so that queries can be answered with as little work as possible.

┌────────────────────────────64 byte───────────────────────────┐
┌──────────────┐┌──────────────┐┌──────────────────────────────┐
│  entity-id   ││ attribute-id ││        inlined value         │
└──────────────┘└──────────────┘└──────────────────────────────┘
└────16 byte───┘└────16 byte───┘└────────────32 byte───────────┘
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶

At a high level a trible is a three-tuple consisting of an entity, an attribute, and a value. The entity and attribute are both 128‑bit abstract extrinsic identifiers as described in [crate::id], while the value is an arbitrary 256‑bit [crate::value::Value]. The value width deliberately matches the size of an intrinsic identifier so larger payloads can be referenced via blobs without inflating the inlined representation.

Abstract identifiers

Entities and attributes are purely extrinsic; their identifiers do not encode any meaning beyond uniqueness. An entity may accrue additional tribles over time and attributes simply name relationships without prescribing a schema. This keeps the format agnostic to external ontologies and minimises accidental coupling between datasets.

The value slot can carry any 256‑bit payload. Its size is dictated by the need to embed an intrinsic identifier for out‑of‑line data. When a fact exceeds this space the value typically stores a blob handle pointing to the larger payload.

Tribles are stored as a contiguous 64‑byte array with the entity occupying the first 16 bytes, the attribute the next 16, and the value the final 32 bytes. The name "trible" is a portmanteau of triple and byte and is pronounced like "tribble" from Star Trek – hence the project's mascot, Robert the tribble. This rigid layout keeps the representation friendly to SIMD optimisations and allows the storage layer to compute sizes deterministically.

Index permutations

TribleSets index each fact under all six permutations of entity (E), attribute (A) and value (V) so any combination of bound variables can be resolved efficiently. Regardless of which columns a query fixes the planner can reach matching leaves with a handful of comparisons:

┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐
│ EAV │  │ EVA │  │ AEV │  │ AVE │  │ VEA │  │ VAE │
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   │        │        │        │        │        │
┌───────────────────────────────────────────────────────┐
│            order-specific inner nodes                 │
└───────────────────────────────────────────────────────┘ 
   │        │        │        │        │        │
   ▼        ▼        ▼        ▼        ▼        ▼

┌───────────────────────────────────────────────────────┐
│                   SHARED LEAVES                       │
│     single canonical E–A–V tribles used by all        │
└───────────────────────────────────────────────────────┘

Each permutation maintains its own inner nodes, but all six share leaf nodes containing the 64‑byte trible. This avoids a naïve six‑fold memory cost while still letting the query planner pick the most selective ordering, keeping joins resistant to skew even when cardinalities vary widely.

Advantages

A total order over tribles enables efficient storage and canonicalisation.
Simple byte‑wise segmentation supports indexing and querying without an interning mechanism, keeping memory usage low and parallelisation easy while avoiding the need for garbage collection.
Schemas describe the value portion directly, making serialisation and deserialisation straightforward.
The fixed 64‑byte layout makes it easy to estimate the physical size of a dataset as a function of the number of tribles stored.
The minimalistic design aims to minimise entropy while retaining collision resistance, making it likely that a similar format would emerge through convergent evolution and could serve as a universal data interchange format.

Set operations and monotonic semantics

TribleSets provide familiar set-theoretic helpers such as TribleSet::union, TribleSet::intersection and TribleSet::difference. union consumes the right-hand operand and merges its contents into the receiver in place, while intersection and difference each produce a fresh TribleSet without mutating their inputs. Together these helpers make it straightforward to merge datasets, locate their overlap or identify the facts that still need to propagate between replicas while keeping the original sources intact.

This design reflects the crate's commitment to CALM-friendly, monotonic semantics. New information can be added freely, but existing facts are never destroyed. Consequently, difference is intended for comparing snapshots (e.g. "which facts are present in the remote set that I have not yet indexed?") rather than for destructive deletion. This keeps workflows declarative and convergent: sets can be combined in any order without introducing conflicts, and subtraction simply reports the gaps that remain to be filled.

Direction and consistency

In many triple stores the direction of an edge is chosen incidentally—there is no intrinsic preference for hasColor over colorOf. This ambiguity often leads to confusion, duplication, or both as different writers pick different conventions. Common mitigations either mirror every edge automatically (as done by OWL and RDF through inverseOf, doubling storage or demanding runtime inference) or devolve into bikeshedding about the "correct" orientation.

tribles avoids that trap by giving edge direction explicit semantics: the arrow points from the entity making the claim to the entity being described. The observer owns the identifier and is responsible for the consistency of the facts it asserts—see ID Ownership. This rule naturally fits the distributed setting where each entity has a single authoritative writer.

Viewed another way, edges always flow from describing to described entities, while cycles represent consensus between the parties involved. For example, hasColor must point from the object that exhibits the colour to the entity representing that colour. The orientation is therefore a consequence of the statement's meaning, not an arbitrary modelling choice.