Schemas
TribleSpace stores data in strongly typed values and blobs. A schema
describes the language‑agnostic byte layout for these types: [Value]s always
occupy exactly 32 bytes while [Blob]s may be any length. Schemas translate
those raw bytes to concrete application types and decouple persisted data from a
particular implementation. This separation lets you refactor to new libraries or
frameworks without rewriting what's already stored or coordinating live
migrations. The crate ships with a collection of ready‑made schemas located in
triblespace::core::value::schemas and
triblespace::core::blob::schemas.
When data crosses the FFI boundary or is consumed by a different language, the schema is the contract both sides agree on. Consumers only need to understand the byte layout and identifier to read the data—they never have to link against your Rust types. Likewise, the Rust side can evolve its internal representations—add helper methods, change struct layouts, or introduce new types—without invalidating existing datasets.
Why 32 bytes?
Storing arbitrary Rust types requires a portable representation. Instead of human‑readable identifiers like RDF's URIs, TribleSpace uses a fixed 32‑byte array for all values. This size provides enough entropy to embed intrinsic identifiers, typically cryptographic hashes, when a value references data stored elsewhere in a blob. Keeping the width constant avoids platform‑specific encoding concerns and makes it easy to reason about memory usage.
Conversion traits
Schemas define how to convert between raw bytes and concrete Rust types. The
conversion traits ToValue/TryFromValue/TryToValue live on
the schema types rather than on Value itself, avoiding orphan‑rule issues when
supporting external data types. The Value wrapper treats its bytes as opaque;
schemas may validate them or reject invalid patterns during conversion.
Fallible conversions (TryFromValue / TryToValue) are particularly useful for
schemas that must validate invariants, such as checking that a timestamp falls
within a permitted range or ensuring reserved bits are zeroed. Returning a
domain‑specific error type keeps validation logic close to the serialization
code.
use triblespace::core::value::schemas::shortstring::ShortString;
use triblespace::core::value::{TryFromValue, TryToValue, Value};

struct Username(String);

impl TryToValue<ShortString> for Username {
    type Error = &'static str;

    fn try_to_value(self) -> Result<Value<ShortString>, Self::Error> {
        if self.0.is_empty() {
            Err("username must not be empty")
        } else {
            self.0
                .as_str()
                .try_to_value()
                .map_err(|_| "username too long or contains NULs")
        }
    }
}

impl TryFromValue<'_, ShortString> for Username {
    type Error = &'static str;

    fn try_from_value(value: &Value<ShortString>) -> Result<Self, Self::Error> {
        String::try_from_value(value)
            .map(Username)
            .map_err(|_| "invalid utf-8 or too long")
    }
}
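The "reserved bits are zeroed" style of validation can also be illustrated without the crate. The sketch below is a std-only model, not the triblespace API: encode_u64, decode_u64, and U64SlotError are hypothetical names, and the layout (little-endian u64 in the first 8 bytes, 24 zero bytes of padding) is an assumption chosen for the illustration.

```rust
use std::fmt;

/// Width of a value slot, as stated in the text (32 bytes).
const VALUE_LEN: usize = 32;

/// Hypothetical domain-specific error for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum U64SlotError {
    /// A reserved byte was non-zero; carries the index of the first offender.
    NonZeroPadding(usize),
}

impl fmt::Display for U64SlotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            U64SlotError::NonZeroPadding(i) => {
                write!(f, "reserved byte {} must be zero", i)
            }
        }
    }
}

impl std::error::Error for U64SlotError {}

/// Encode: little-endian u64 in bytes 0..8, remaining bytes zeroed.
fn encode_u64(n: u64) -> [u8; VALUE_LEN] {
    let mut raw = [0u8; VALUE_LEN];
    raw[..8].copy_from_slice(&n.to_le_bytes());
    raw
}

/// Decode, rejecting slots whose reserved bytes (8..32) are not all zero.
fn decode_u64(raw: &[u8; VALUE_LEN]) -> Result<u64, U64SlotError> {
    if let Some(offset) = raw[8..].iter().position(|&b| b != 0) {
        return Err(U64SlotError::NonZeroPadding(8 + offset));
    }
    let mut head = [0u8; 8];
    head.copy_from_slice(&raw[..8]);
    Ok(u64::from_le_bytes(head))
}
```

Keeping the check inside the decode function means callers cannot obtain a u64 from a slot that violates the invariant, which is the same property the fallible conversion traits are meant to provide.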
Schema identifiers
Every schema declares a unique 128‑bit identifier via the shared
ConstId::ID constant (for example, <ShortString as ConstId>::ID).
Persisting these IDs keeps serialized data self-describing so other tooling can
make sense of the payload without linking against your Rust types. Dynamic
language bindings (like the Python crate) inspect the stored schema identifier
to choose the correct decoder, while internal metadata stored inside
TribleSpace can use the same IDs to describe which schema governs a value, blob, or
hash protocol.
Identifiers also make it possible to derive deterministic attribute IDs when you
ingest external formats. Helpers such as Attribute::<S>::from_name("field")
combine the schema ID with the source field name to create a stable attribute so
re-importing the same data always targets the same column.
The attributes! macro can use the same derivation when you omit the 128-bit id
literal, which is useful for quick experiments or internal attributes. For
schemas that will be shared across binaries or languages, prefer explicit ids so
the column remains stable even if the attribute name later changes.
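The idea behind deterministic attribute IDs can be sketched in a few lines. This is an illustration only: derive_attribute_id is a hypothetical helper, and the 128-bit FNV-1a hash used here is a std-only stand-in for whatever derivation the crate actually performs. The point is that the same (schema ID, field name) pair always yields the same 16-byte ID.

```rust
/// FNV-1a 128-bit parameters (stand-in hash, not the crate's algorithm).
const FNV_OFFSET: u128 = 0x6c62272e07bb014262b821756295c58d;
const FNV_PRIME: u128 = 0x0000000001000000000000000000013b;

/// Hypothetical helper: mix the schema ID and the field name into a
/// stable 128-bit attribute ID.
fn derive_attribute_id(schema_id: [u8; 16], field_name: &str) -> [u8; 16] {
    let mut h = FNV_OFFSET;
    for &b in schema_id.iter().chain(field_name.as_bytes()) {
        h ^= b as u128;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h.to_be_bytes()
}
```

Because the result depends on the field name, renaming the field produces a different ID, which is why explicit ids are the safer choice for attributes that outlive a single experiment.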
Built‑in value schemas
The crate provides the following value schemas out of the box:
- GenId – an abstract 128-bit identifier.
- ShortString – a UTF-8 string up to 32 bytes.
- U256BE / U256LE – 256-bit unsigned integers.
- I256BE / I256LE – 256-bit signed integers.
- R256BE / R256LE – 256-bit rational numbers.
- F64 – IEEE-754 double-precision floating point number (little-endian).
- F256BE / F256LE – 256-bit floating point numbers.
- Hash and Handle – cryptographic digests and blob handles (see hash.rs).
- ED25519RComponent, ED25519SComponent and ED25519PublicKey – signature fields and keys.
- NsTAIInterval to encode time intervals.
- UnknownValue as a fallback when no specific schema is known.
use triblespace::prelude::*;
use triblespace::core::metadata::ConstId;
use triblespace::core::value::schemas::shortstring::ShortString;
use triblespace::core::value::{ToValue, ValueSchema};

let v: Value<ShortString> = "hi".to_value();
let raw_bytes = v.raw; // Persist alongside the schema's metadata id.
let schema_id = <ShortString as ConstId>::ID;
Built‑in blob schemas
The crate also ships with these blob schemas:
- LongString for arbitrarily long UTF‑8 strings.
- FileBytes for opaque file-backed byte payloads.
- SimpleArchive which stores a raw sequence of tribles.
- SuccinctArchiveBlob which stores the SuccinctArchive index type for offline queries. The SuccinctArchive helper exposes high-level iterators while the SuccinctArchiveBlob schema is responsible for the serialized byte layout.
- WasmCode for WebAssembly bytecode stored as a blob.
- UnknownBlob for data of unknown type.
use triblespace::core::metadata::ConstId;
use triblespace::core::blob::schemas::longstring::LongString;
use triblespace::core::blob::{Blob, BlobSchema, ToBlob};

let b: Blob<LongString> = "example".to_blob();
let schema_id = <LongString as ConstId>::ID;
Both value and blob schemas can emit optional discovery metadata. Calling
ConstDescribe::describe returns a rooted Fragment (exporting the schema id)
whose facts tag the schema entity with metadata::KIND_VALUE_SCHEMA or
metadata::KIND_BLOB_SCHEMA and may attach a metadata::name and
metadata::description (LongString handles). Persist the description blobs
alongside the metadata tribles if you want the text to remain readable.
Choosing the right schema
When defining an attribute, the schema determines how the 32-byte value slot is interpreted. Use this decision tree to pick the right one:
What are you storing?
│
├─ A reference to another entity?
│ └─ GenId
│
├─ A tag, category, or enum-like classifier?
│ └─ metadata::tag (GenId) — tags are entities with their own ID.
│ Use metadata::name to give them a human-readable label.
│ ⚠ Do NOT define a separate ShortString tag attribute —
│ use the canonical metadata::tag and mint tag IDs.
│
├─ A short label or display name?
│ ├─ Fits in 32 bytes (≤32 UTF-8 bytes)?
│ │ └─ ShortString
│ └─ Longer text?
│ └─ Handle<Blake3, LongString> (blob)
│
├─ A number?
│ ├─ Integer
│ │ ├─ Fits in 64 bits? → U256BE (zero-extended) or custom u64 schema
│ │ └─ Needs full 256 bits? → U256BE / I256BE
│ ├─ Floating point
│ │ ├─ Standard double? → F64
│ │ └─ Extended precision? → F256BE
│ └─ Rational? → R256BE
│
├─ A timestamp or time range?
│ └─ NsTAIInterval
│
├─ A cryptographic value?
│ ├─ Content hash? → Hash<Blake3>
│ ├─ Reference to a blob? → Handle<Blake3, BlobSchema>
│ └─ Signature? → ED25519RComponent / ED25519SComponent / ED25519PublicKey
│
├─ A file or binary payload?
│ └─ Handle<Blake3, FileBytes> (blob)
│
├─ A large structured dataset?
│ └─ Handle<Blake3, SimpleArchive> (blob, stores a TribleSet)
│
└─ Something else?
├─ Fits in 32 bytes? → define a custom ValueSchema
└─ Larger? → define a custom BlobSchema + use Handle
Rules of thumb:
- If two values should be joinable (appear in the same query variable), they must share a schema. Choose the most specific schema that covers both uses.
- Prefer ShortString over LongString when the text fits; inline values avoid a blob lookup.
- Use GenId for relationships between entities. Never store entity references as strings.
- When in doubt between a value schema and a blob, ask: "will I ever want to query or join on this directly?" If yes, it should be a value. If it's opaque content you just retrieve, use a blob handle.
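The ShortString versus LongString branch of the tree boils down to a length check on the UTF-8 bytes. The sketch below models that choice with plain std types; TextStorage and store_text are hypothetical names, and the zero-padding and no-NUL rule for inline strings are assumptions made for the illustration. In a real pile the Blob branch would be stored in a blob store and the 32-byte slot would hold its handle instead.

```rust
/// Width of a value slot, as stated in the text (32 bytes).
const VALUE_LEN: usize = 32;

/// Hypothetical model of where a piece of text ends up.
#[derive(Debug, PartialEq, Eq)]
enum TextStorage {
    /// Fits in the value slot: zero-padded UTF-8 bytes.
    Inline([u8; VALUE_LEN]),
    /// Too large: the bytes go into a blob (referenced by a handle).
    Blob(Vec<u8>),
}

/// Pick inline storage when the UTF-8 encoding fits in 32 bytes.
/// Note that the limit is in bytes, not characters.
fn store_text(text: &str) -> TextStorage {
    let bytes = text.as_bytes();
    // Assumption: inline strings are zero-padded, so embedded NULs
    // cannot be represented and force the blob path.
    if bytes.len() <= VALUE_LEN && !bytes.contains(&0) {
        let mut raw = [0u8; VALUE_LEN];
        raw[..bytes.len()].copy_from_slice(bytes);
        TextStorage::Inline(raw)
    } else {
        TextStorage::Blob(bytes.to_vec())
    }
}
```

A 17-character string of "é" is 34 bytes in UTF-8 and therefore lands in the blob branch, which is the kind of edge case the byte-based limit in the tree is warning about.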
Defining new schemas
Custom formats implement [ValueSchema] or [BlobSchema]. A unique identifier
serves as the schema ID. The example below defines a little-endian u64 value
schema and a simple blob schema for arbitrary bytes.
pub struct U64LE;

impl ConstId for U64LE {
    const ID: Id = id_hex!("0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A");
}

impl ConstDescribe for U64LE {}

impl ValueSchema for U64LE {
    type ValidationError = Infallible;
}

impl ToValue<U64LE> for u64 {
    fn to_value(self) -> Value<U64LE> {
        let mut raw = [0u8; VALUE_LEN];
        raw[..8].copy_from_slice(&self.to_le_bytes());
        Value::new(raw)
    }
}

impl TryFromValue<'_, U64LE> for u64 {
    type Error = std::convert::Infallible;

    fn try_from_value(v: &Value<U64LE>) -> Result<Self, std::convert::Infallible> {
        Ok(u64::from_le_bytes(v.raw[..8].try_into().unwrap()))
    }
}

pub struct BytesBlob;

impl ConstId for BytesBlob {
    const ID: Id = id_hex!("B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0");
}

impl ConstDescribe for BytesBlob {}

impl BlobSchema for BytesBlob {}

impl ToBlob<BytesBlob> for Bytes {
    fn to_blob(self) -> Blob<BytesBlob> {
        Blob::new(self)
    }
}

impl TryFromBlob<BytesBlob> for Bytes {
    type Error = Infallible;

    fn try_from_blob(b: Blob<BytesBlob>) -> Result<Self, Self::Error> {
        Ok(b.bytes)
    }
}
See examples/custom_schema.rs for the full
source.
Versioning and evolution
Schemas form part of your persistence contract. When evolving them consider the following guidelines:
- Prefer additive changes. Introduce a new schema identifier when breaking compatibility. Consumers can continue to read the legacy data while new writers use the replacement ID.
- Annotate data with migration paths. Store both the schema ID and a logical version number if the consumer needs to know which rules to apply. UnknownValue/UnknownBlob allow you to safely defer decoding until a newer binary is available.
- Keep validation centralized. Place invariants in your schema conversions so migrations cannot accidentally create invalid values.
By keeping schema identifiers alongside stored values and blobs you can roll out new representations incrementally: ship readers that understand both IDs, update your import pipelines, and finally switch writers once everything recognizes the replacement schema.
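A reader that understands both the legacy and the replacement ID might look roughly like the following. This is a std-only sketch: the two schema IDs, the Decoded enum, and the layouts (a little-endian counter in the first 8 bytes for the legacy schema, a big-endian counter in the last 8 bytes for the replacement) are all invented for illustration and do not correspond to schemas shipped with the crate.

```rust
/// Schema IDs are 128-bit; modelled here as plain u128 values.
type SchemaId = u128;

/// Hypothetical IDs for a legacy schema and its replacement.
const COUNTER_V1_LE: SchemaId = 0x1111_1111_1111_1111_1111_1111_1111_1111;
const COUNTER_V2_BE: SchemaId = 0x2222_2222_2222_2222_2222_2222_2222_2222;

#[derive(Debug, PartialEq, Eq)]
enum Decoded {
    /// Successfully decoded under one of the known schemas.
    Count(u64),
    /// Unrecognised schema: keep the raw bytes so a newer binary can decode them.
    Unknown { schema_id: SchemaId, raw: [u8; 32] },
}

/// Dispatch on the stored schema ID instead of guessing the layout.
fn decode(schema_id: SchemaId, raw: [u8; 32]) -> Decoded {
    let mut word = [0u8; 8];
    match schema_id {
        COUNTER_V1_LE => {
            word.copy_from_slice(&raw[..8]);
            Decoded::Count(u64::from_le_bytes(word))
        }
        COUNTER_V2_BE => {
            word.copy_from_slice(&raw[24..]);
            Decoded::Count(u64::from_be_bytes(word))
        }
        _ => Decoded::Unknown { schema_id, raw },
    }
}
```

Once every reader ships with both match arms, writers can switch to the replacement ID without a coordinated migration, and values under IDs the reader has never seen pass through untouched.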
Value formatters (WASM)
Binary formats are great for portability and performance, but they can be painful to inspect if you don’t know the schema ahead of time. TribleSpace supports an optional schema-level formatter mechanism: a value schema can point to a small sandboxed WebAssembly module that turns its raw 32 bytes into a human-readable string.
The formatter is stored as a blob (blobschemas::WasmCode) and referenced from
the schema identifier entity via the metadata attribute metadata::value_formatter.
The built-in runner lives behind the wasm feature flag (enabled by default in
the triblespace facade crate) and uses wasmi with tight limits (fuel, memory
pages, output size). Modules must not import anything and use the following
minimal ABI:
- memory (linear memory)
- format(w0: i64, w1: i64, w2: i64, w3: i64) -> i64
The format arguments are the raw 32 bytes split into 4×8-byte chunks
(little-endian). The return value packs the output pointer and output length:
- Success returns (output_len << 32) | output_ptr with output_ptr != 0.
- Failure returns (error_code << 32) | 0 (i.e. output_ptr == 0).
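The argument splitting and the packed return value can be worked through with plain integer arithmetic. The sketch below is a host-side model, not the crate's runner: split_value, pack_success, pack_error, unpack_result, and FormatResult are hypothetical names, and treating the pointer, length, and error code as unsigned 32-bit quantities is an assumption that goes beyond the formulas above.

```rust
/// Split the raw 32 bytes into four little-endian i64 words (w0..w3).
fn split_value(raw: &[u8; 32]) -> [i64; 4] {
    let mut words = [0i64; 4];
    for (i, chunk) in raw.chunks_exact(8).enumerate() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        words[i] = i64::from_le_bytes(buf);
    }
    words
}

/// Decoded form of the packed i64 returned by `format`.
#[derive(Debug, PartialEq, Eq)]
enum FormatResult {
    Output { ptr: u32, len: u32 },
    Error { code: u32 },
}

/// (output_len << 32) | output_ptr, with output_ptr != 0.
fn pack_success(ptr: u32, len: u32) -> i64 {
    (((len as u64) << 32) | ptr as u64) as i64
}

/// (error_code << 32) | 0.
fn pack_error(code: u32) -> i64 {
    ((code as u64) << 32) as i64
}

/// Low 32 bits are the pointer; a zero pointer signals failure and the
/// high 32 bits then carry the error code instead of the length.
fn unpack_result(ret: i64) -> FormatResult {
    let bits = ret as u64;
    let low = bits as u32;
    let high = (bits >> 32) as u32;
    if low != 0 {
        FormatResult::Output { ptr: low, len: high }
    } else {
        FormatResult::Error { code: high }
    }
}
```

For example, a 5-byte output at pointer 16 packs to (5 << 32) | 16 = 21474836496, and an error code of 3 packs to 3 << 32 = 12884901888 with a zero pointer in the low half.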
The core crate can optionally ship built-in formatters for its built-in value
schemas. Enable the wasm feature to have
ConstDescribe::describe (which is fallible) attach metadata::value_formatter entries for the
standard schemas. This feature requires the wasm32-unknown-unknown Rust
target at build time because the bundled formatters are compiled to WebAssembly
via the #[value_formatter] proc macro.