# Encodings
TribleSpace stores data in strongly typed values and blobs. An encoding
describes the language-agnostic byte layout for these types: [Inline]s always
occupy exactly 32 bytes, while [Blob]s may be any length. Encodings translate
those raw bytes to concrete application types and decouple persisted data from a
particular implementation. This separation lets you refactor to new libraries or
frameworks without rewriting what's already stored or coordinating live
migrations. The crate ships with a collection of ready-made encodings located in
`triblespace::core::inline::encodings` and
`triblespace::core::blob::encodings`.
When data crosses the FFI boundary or is consumed by a different language, the encoding is the contract both sides agree on. Consumers only need to understand the byte layout and identifier to read the data—they never have to link against your Rust types. Likewise, the Rust side can evolve its internal representations—add helper methods, change struct layouts, or introduce new types—without invalidating existing datasets.
## Why 32 bytes?
Storing arbitrary Rust types requires a portable representation. Instead of human‑readable identifiers like RDF's URIs, Tribles uses a fixed 32‑byte array for all values. This size provides enough entropy to embed intrinsic identifiers—typically cryptographic hashes—when a value references data stored elsewhere in a blob. Keeping the width constant avoids platform‑specific encoding concerns and makes it easy to reason about memory usage.
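The fixed width can be illustrated with a small standalone sketch (plain arrays, not the crate's actual types): a small integer is zero-extended into the 32-byte slot, and the unused bytes stay zero.

```rust
// Illustration only: every inline value occupies exactly 32 bytes, so
// small payloads are padded while larger data lives in a blob and is
// referenced by a 32-byte hash stored in the same kind of slot.
const INLINE_LEN: usize = 32;

/// Zero-extend a u64 into a fixed 32-byte slot (big-endian, value in the
/// low-order trailing bytes).
fn u64_to_slot(n: u64) -> [u8; INLINE_LEN] {
    let mut raw = [0u8; INLINE_LEN];
    raw[INLINE_LEN - 8..].copy_from_slice(&n.to_be_bytes());
    raw
}

fn main() {
    let slot = u64_to_slot(0x0102);
    assert_eq!(slot.len(), 32); // width is always constant
    assert_eq!(&slot[30..], &[0x01, 0x02]); // value sits in the tail
    assert!(slot[..24].iter().all(|&b| b == 0)); // padding is zero
    println!("ok");
}
```

Because every value is exactly 32 bytes, memory usage and index layout can be reasoned about without per-type size bookkeeping.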
## Conversion traits
Conversion goes through the `Encodes<Source>` trait, which lives on the
encoding (the encoding is the impl target; the source is the trait parameter).
This is the same direction as std's `From<T>` — and for the same reason: it
trivially satisfies Rust's orphan rule, so you can write
`impl Encodes<SomeForeignType> for MyLocalEncoding` without any "trait
position 0" gymnastics.
The ergonomic source-side methods `.to_inline()` / `.to_blob()` /
`.into_encoded()` are auto-derived blanket implementations — users never
implement them directly, the same way you never implement `Into<T>` in Rust:

| User implements | Auto-derived via blanket |
| --- | --- |
| `Encodes<T> for S` | `IntoEncoded<S> for T` (plus the `IntoInline` / `IntoBlob` aliases) |
For fallible conversions where the error type is part of the contract (parsing
a hex string into a hash, validating a timestamp range, rejecting reserved
bits), use `TryToInline` / `TryFromInline` — kept as separate traits because the
error type is per-source.
```rust
use triblespace::core::inline::encodings::shortstring::ShortString;
use triblespace::core::inline::{Inline, TryFromInline, TryToInline};

struct Username(String);

impl TryToInline<ShortString> for Username {
    type Error = &'static str;

    fn try_to_inline(self) -> Result<Inline<ShortString>, Self::Error> {
        if self.0.is_empty() {
            Err("username must not be empty")
        } else {
            self.0
                .as_str()
                .try_to_inline()
                .map_err(|_| "username too long or contains NULs")
        }
    }
}

impl TryFromInline<'_, ShortString> for Username {
    type Error = &'static str;

    fn try_from_inline(value: &Inline<ShortString>) -> Result<Self, Self::Error> {
        String::try_from_inline(value)
            .map(Username)
            .map_err(|_| "invalid utf-8 or too long")
    }
}
```
## Encoding identifiers
Every encoding declares a unique 128-bit identifier, accessible via the
`MetaDescribe::id` method (for example, `ShortString::id()`).
Persisting these IDs keeps serialized data self-describing, so other tooling can
make sense of the payload without linking against your Rust types. Dynamic
language bindings (like the Python crate) inspect the stored encoding identifier
to choose the correct decoder, while internal metadata stored inside Trible
Space can use the same IDs to describe which encoding governs a value, blob, or
hash protocol.
Identifiers also make it possible to derive deterministic attribute IDs when you
ingest external formats. Wrap the source field name in an entity-core fragment —
`Attribute::<S>::from(entity!{ metadata::name: <name handle>, metadata::value_encoding: <S as MetaDescribe>::id() })` —
to combine the encoding ID with the source field name and produce a stable
attribute, so re-importing the same data always targets the same column.
The `attributes!` macro applies the same derivation when you omit the 128-bit id
literal, which is useful for quick experiments or internal attributes; for
encodings that will be shared across binaries or languages, prefer explicit ids so
the column remains stable even if the attribute name later changes.
## Built‑in inline encodings
The crate provides the following inline encodings out of the box:
- `GenId` – an abstract 128-bit identifier.
- `ShortString` – a UTF-8 string up to 32 bytes.
- `U256BE`/`U256LE` – 256-bit unsigned integers.
- `I256BE`/`I256LE` – 256-bit signed integers.
- `R256BE`/`R256LE` – 256-bit rational numbers.
- `F64` – an IEEE-754 double-precision floating-point number (little-endian).
- `F256BE`/`F256LE` – 256-bit floating-point numbers.
- `Hash` and `Handle` – cryptographic digests and blob handles (see `hash.rs`).
- `ED25519RComponent`, `ED25519SComponent` and `ED25519PublicKey` – signature fields and keys.
- `NsTAIInterval` – encodes time intervals.
- `Boolean` – all-zero for false, all-0xFF for true.
- `LineLocation` – a `(start_line, start_col, end_line, end_col)` span encoded as four big-endian u64 values.
- `RangeU128` – a half-open `(start, end)` range of two big-endian u128 values.
- `RangeInclusiveU128` – an inclusive `(start, end)` range of two big-endian u128 values.
- `UnknownInline` – a fallback when no specific encoding is known.
```rust
use triblespace::prelude::*;
use triblespace::core::metadata::MetaDescribe;
use triblespace::core::inline::encodings::shortstring::ShortString;
use triblespace::core::inline::{InlineEncoding, IntoInline};

let v: Inline<ShortString> = "hi".to_inline();
let raw_bytes = v.raw;

// Persist alongside the encoding's metadata id.
let encoding_id = ShortString::id(); // derived via describe(&mut scratch).root()
```
## Built‑in blob encodings
The crate also ships with these blob encodings:
- `LongString` for arbitrarily long UTF-8 strings.
- `RawBytes` for opaque file-backed byte payloads.
- `SimpleArchive`, which stores a raw sequence of tribles.
- `SuccinctArchiveBlob`, which stores the `SuccinctArchive` index type for offline queries. The `SuccinctArchive` helper exposes high-level iterators while the `SuccinctArchiveBlob` encoding is responsible for the serialized byte layout.
- `WasmCode` for WebAssembly bytecode stored as a blob.
- `UnknownBlob` for data of unknown type.
```rust
use triblespace::core::metadata::MetaDescribe;
use triblespace::core::blob::encodings::longstring::LongString;
use triblespace::core::blob::{Blob, BlobEncoding, IntoBlob};

let b: Blob<LongString> = "example".to_blob();
let encoding_id = LongString::id(); // derived via describe(&mut scratch).root()
```
Both value and blob encodings can emit optional discovery metadata. Calling
`MetaDescribe::describe` returns a rooted `Fragment` (exporting the encoding id)
whose facts tag the encoding entity with `metadata::KIND_INLINE_ENCODING` or
`metadata::KIND_BLOB_ENCODING` and may attach a `metadata::name` and
`metadata::description` (`LongString` handles). Persist the description blobs
alongside the metadata tribles if you want the text to remain readable.
## Choosing the right encoding
When defining an attribute, the encoding determines how the 32-byte value slot is interpreted. Use this decision tree to pick the right one:
```text
What are you storing?
│
├─ A reference to another entity?
│   └─ GenId
│
├─ A tag, category, or enum-like classifier?
│   └─ metadata::tag (GenId) — tags are entities with their own ID.
│      Use metadata::name to give them a human-readable label.
│      ⚠ Do NOT define a separate ShortString tag attribute —
│      use the canonical metadata::tag and mint tag IDs.
│
├─ A short label or display name?
│   ├─ Fits in 32 bytes (≤32 UTF-8 bytes)?
│   │   └─ ShortString
│   └─ Longer text?
│       └─ Handle<LongString> (blob)
│
├─ A number?
│   ├─ Integer
│   │   ├─ Fits in 64 bits? → U256BE (zero-extended) or custom u64 encoding
│   │   └─ Needs full 256 bits? → U256BE / I256BE
│   ├─ Floating point
│   │   ├─ Standard double? → F64
│   │   └─ Extended precision? → F256BE
│   └─ Rational? → R256BE / R256LE
│
├─ A timestamp or time range?
│   └─ NsTAIInterval
│
├─ A cryptographic value?
│   ├─ Content hash? → Hash<Blake3>
│   ├─ Reference to a blob? → Handle<BlobEncoding>
│   └─ Signature? → ED25519RComponent / ED25519SComponent / ED25519PublicKey
│
├─ A file or binary payload?
│   └─ Handle<RawBytes> (blob)
│
├─ A large structured dataset?
│   └─ Handle<SimpleArchive> (blob, stores a TribleSet)
│
└─ Something else?
    ├─ Fits in 32 bytes? → define a custom InlineEncoding
    └─ Larger? → define a custom BlobEncoding + use Handle
```
Rules of thumb:
- If two values should be joinable (appear in the same query variable), they must share an encoding. Choose the most specific encoding that covers both uses.
- Prefer `ShortString` over `LongString` when the text fits — inline values avoid a blob lookup.
- Use `GenId` for relationships between entities. Never store entity references as strings.
- When in doubt between an inline encoding and a blob, ask: "will I ever want to query or join on this directly?" If yes, it should be inline. If it's opaque content you just retrieve, use a blob handle.
## Defining new encodings
Custom formats implement [InlineEncoding] or [BlobEncoding]. A unique identifier
serves as the encoding ID. The example below defines a little-endian u64
inline encoding and a simple blob encoding for arbitrary bytes.
```rust
pub struct U64LE;

impl MetaDescribe for U64LE {
    fn describe() -> triblespace::core::trible::Fragment {
        let id: Id = id_hex!("0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A");
        entity! { ExclusiveId::force_ref(&id) @
            metadata::name: "u64le",
            metadata::tag: metadata::KIND_INLINE_ENCODING,
        }
    }
}

impl InlineEncoding for U64LE {
    type ValidationError = Infallible;
    type Encoding = Self;
}

impl Encodes<u64> for U64LE {
    type Output = Inline<U64LE>;

    fn encode(source: u64) -> Inline<U64LE> {
        let mut raw = [0u8; INLINE_LEN];
        raw[..8].copy_from_slice(&source.to_le_bytes());
        Inline::new(raw)
    }
}

impl TryFromInline<'_, U64LE> for u64 {
    type Error = Infallible;

    fn try_from_inline(v: &Inline<U64LE>) -> Result<Self, Self::Error> {
        Ok(u64::from_le_bytes(v.raw[..8].try_into().unwrap()))
    }
}

pub struct BytesBlob;

impl MetaDescribe for BytesBlob {
    fn describe() -> triblespace::core::trible::Fragment {
        let id: Id = id_hex!("B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0");
        entity! { ExclusiveId::force_ref(&id) @
            metadata::name: "bytesblob",
            metadata::tag: metadata::KIND_BLOB_ENCODING,
        }
    }
}

impl BlobEncoding for BytesBlob {}

impl Encodes<Bytes> for BytesBlob {
    type Output = Blob<BytesBlob>;

    fn encode(source: Bytes) -> Blob<BytesBlob> {
        Blob::new(source)
    }
}

impl TryFromBlob<BytesBlob> for Bytes {
    type Error = Infallible;

    fn try_from_blob(b: Blob<BytesBlob>) -> Result<Self, Self::Error> {
        Ok(b.bytes)
    }
}
```
See `examples/custom_schema.rs` for the full source.
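The `U64LE` byte layout above can be exercised without the crate's wrapper types. This standalone sketch mirrors the encode/decode logic so you can see the round trip and the padding invariant directly:

```rust
const INLINE_LEN: usize = 32;

/// Mirror of the U64LE layout: the u64's little-endian bytes occupy the
/// first 8 bytes of the 32-byte slot; the remaining 24 bytes stay zero.
fn encode_u64le(source: u64) -> [u8; INLINE_LEN] {
    let mut raw = [0u8; INLINE_LEN];
    raw[..8].copy_from_slice(&source.to_le_bytes());
    raw
}

fn decode_u64le(raw: &[u8; INLINE_LEN]) -> u64 {
    u64::from_le_bytes(raw[..8].try_into().unwrap())
}

fn main() {
    let raw = encode_u64le(0xDEAD_BEEF);
    assert_eq!(decode_u64le(&raw), 0xDEAD_BEEF); // round trip
    assert!(raw[8..].iter().all(|&b| b == 0)); // padding stays zero
    println!("ok");
}
```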
## Versioning and evolution
Schemas form part of your persistence contract. When evolving them consider the following guidelines:
- Prefer additive changes. Introduce a new encoding identifier when breaking compatibility. Consumers can continue to read the legacy data while new writers use the replacement ID.
- Annotate data with migration paths. Store both the encoding ID and a logical version number if the consumer needs to know which rules to apply. `UnknownInline`/`UnknownBlob` allow you to safely defer decoding until a newer binary is available.
- Keep validation centralized. Place invariants in your encoding conversions so migrations cannot accidentally create invalid values.
By keeping encoding identifiers alongside stored values and blobs you can roll out new representations incrementally: ship readers that understand both IDs, update your import pipelines, and finally switch writers once everything recognizes the replacement encoding.
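A reader that understands both IDs can be sketched as a dispatch on the stored encoding identifier. The IDs and layouts below are placeholders, not the crate's real ones — a real reader would compare against the IDs published by each encoding's `MetaDescribe::id` — but the shape of the dispatch is the point: known IDs decode, unknown IDs defer rather than guess.

```rust
// Placeholder 128-bit encoding ids for a legacy layout and its replacement.
const U64LE_V1: [u8; 16] = [0x0A; 16];
const U64BE_V2: [u8; 16] = [0x0B; 16];

/// Hypothetical reader that understands both the legacy and replacement
/// layouts. Returning None defers decoding (the UnknownInline strategy).
fn decode(encoding_id: [u8; 16], raw: &[u8; 32]) -> Option<u64> {
    match encoding_id {
        // Legacy layout: little-endian u64 in the first 8 bytes.
        U64LE_V1 => Some(u64::from_le_bytes(raw[..8].try_into().unwrap())),
        // Replacement layout: big-endian u64 in the last 8 bytes.
        U64BE_V2 => Some(u64::from_be_bytes(raw[24..].try_into().unwrap())),
        // Unknown id: defer until a newer binary is available.
        _ => None,
    }
}

fn main() {
    let mut raw = [0u8; 32];
    raw[..8].copy_from_slice(&7u64.to_le_bytes());
    assert_eq!(decode(U64LE_V1, &raw), Some(7));
    assert_eq!(decode([0xFF; 16], &raw), None); // unknown id is deferred
    println!("ok");
}
```

Once all deployed readers recognize both arms, writers can switch to the replacement ID without a coordinated migration.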
## Inline formatters (WASM)
Binary formats are great for portability and performance, but they can be painful to inspect if you don’t know the encoding ahead of time. TribleSpace supports an optional encoding-level formatter mechanism: an inline encoding can point to a small sandboxed WebAssembly module that turns its raw 32 bytes into a human-readable string.
The formatter is stored as a blob (`blobencodings::WasmCode`) and referenced from
the encoding identifier entity via the metadata attribute `metadata::value_formatter`.
The built-in runner lives behind the `wasm` feature flag (enabled by default in
the `triblespace` facade crate) and uses `wasmi` with tight limits (fuel, memory
pages, output size). Modules must not import anything and must use the following
minimal ABI:
- `memory` (linear memory)
- `format(w0: i64, w1: i64, w2: i64, w3: i64) -> i64`
The format arguments are the raw 32 bytes split into 4×8-byte chunks
(little-endian). The return value packs the output pointer and output length:
- Success returns `(output_len << 32) | output_ptr` with `output_ptr != 0`.
- Failure returns `(error_code << 32) | 0` (i.e. `output_ptr == 0`).
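The return-value convention described above can be made concrete with a small pack/unpack sketch (helper names here are illustrative, not part of the crate's API): the high 32 bits carry the output length on success or the error code on failure, and the low 32 bits carry the output pointer, with 0 signalling failure.

```rust
/// Pack a successful result: high 32 bits = output length,
/// low 32 bits = output pointer (must be nonzero).
fn pack_ok(output_ptr: u32, output_len: u32) -> i64 {
    ((output_len as i64) << 32) | output_ptr as i64
}

/// Unpack the i64 returned by `format`: Ok((ptr, len)) on success,
/// Err(error_code) when the pointer half is zero.
fn unpack(ret: i64) -> Result<(u32, u32), u32> {
    let ptr = (ret & 0xFFFF_FFFF) as u32;
    let high = (ret >> 32) as u32;
    if ptr != 0 { Ok((ptr, high)) } else { Err(high) }
}

fn main() {
    assert_eq!(unpack(pack_ok(0x1000, 12)), Ok((0x1000, 12)));
    assert_eq!(unpack(5i64 << 32), Err(5)); // error_code = 5, ptr = 0
    println!("ok");
}
```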
The core crate can optionally ship built-in formatters for its built-in value
encodings. Enable the `wasm` feature to have
`MetaDescribe::describe` attach `metadata::value_formatter` entries for the
standard encodings. This feature requires the `wasm32-unknown-unknown` Rust
target at build time because the bundled formatters are compiled to WebAssembly
via the `#[value_formatter]` proc macro.