Encodings

TribleSpace stores data in strongly typed values and blobs. An encoding describes the language‑agnostic byte layout for these types: [Inline]s always occupy exactly 32 bytes while [Blob]s may be any length. Encodings translate those raw bytes to concrete application types and decouple persisted data from a particular implementation. This separation lets you refactor to new libraries or frameworks without rewriting what's already stored or coordinating live migrations. The crate ships with a collection of ready‑made encodings located in triblespace::core::inline::encodings and triblespace::core::blob::encodings.

When data crosses the FFI boundary or is consumed by a different language, the encoding is the contract both sides agree on. Consumers only need to understand the byte layout and identifier to read the data—they never have to link against your Rust types. Likewise, the Rust side can evolve its internal representations—add helper methods, change struct layouts, or introduce new types—without invalidating existing datasets.

Why 32 bytes?

Storing arbitrary Rust types requires a portable representation. Instead of human‑readable identifiers like RDF's URIs, Tribles uses a fixed 32‑byte array for all values. This size provides enough entropy to embed intrinsic identifiers—typically cryptographic hashes—when a value references data stored elsewhere in a blob. Keeping the width constant avoids platform‑specific encoding concerns and makes it easy to reason about memory usage.
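To make the fixed-width idea concrete, here is a standalone sketch (plain Rust, not the crate's API) showing how the same 32-byte slot holds both a small zero-extended scalar and a full 256-bit digest used as an intrinsic identifier:

```rust
fn main() {
    // A small integer: zero-extended into the fixed-width slot.
    let mut int_value = [0u8; 32];
    int_value[24..].copy_from_slice(&1234u64.to_be_bytes());

    // A reference to out-of-band data: 32 bytes is wide enough to hold an
    // intrinsic identifier such as a 256-bit cryptographic hash verbatim.
    let digest = [0xABu8; 32]; // stand-in for e.g. a BLAKE3 digest
    let hash_value: [u8; 32] = digest;

    // Both occupy exactly 32 bytes, so memory usage is trivially predictable.
    assert_eq!(std::mem::size_of_val(&int_value), 32);
    assert_eq!(std::mem::size_of_val(&hash_value), 32);
    assert_eq!(u64::from_be_bytes(int_value[24..].try_into().unwrap()), 1234);
}
```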

Conversion traits

Conversion goes through the Encodes<Source> trait, which lives on the encoding (the encoding is the impl target; the source is the trait parameter). This is the same direction as std's From<T> — and for the same reason: it trivially satisfies Rust's orphan rule, so you can write impl Encodes<SomeForeignType> for MyLocalEncoding without any coherence workarounds.

The ergonomic source-side methods .to_inline() / .to_blob() / .into_encoded() are provided by blanket implementations; users never implement them directly, just as you never implement Into<T> in Rust:

User implements:     Auto-derived via blanket:
  Encodes<T> for S     IntoEncoded<S> for T  (+ IntoInline / IntoBlob aliases)
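The mechanics of that derivation can be sketched in plain Rust. This is a minimal self-contained illustration of the blanket-impl pattern with hypothetical trait shapes mirroring Encodes / IntoEncoded, not the real triblespace definitions:

```rust
// Toy stand-ins for the real traits, to show the direction of derivation.
trait Encodes<Source> {
    type Output;
    fn encode(source: Source) -> Self::Output;
}

trait IntoEncoded<E: Encodes<Self>>: Sized {
    fn into_encoded(self) -> E::Output;
}

// One blanket impl: every (encoding, source) pair with an Encodes impl
// automatically gains the ergonomic source-side method.
impl<S, E: Encodes<S>> IntoEncoded<E> for S {
    fn into_encoded(self) -> E::Output {
        E::encode(self)
    }
}

struct U64LE; // a toy encoding

impl Encodes<u64> for U64LE {
    type Output = [u8; 32];
    fn encode(source: u64) -> [u8; 32] {
        let mut raw = [0u8; 32];
        raw[..8].copy_from_slice(&source.to_le_bytes());
        raw
    }
}

fn main() {
    // The user only wrote Encodes<u64> for U64LE; into_encoded comes for free.
    let raw = IntoEncoded::<U64LE>::into_encoded(7u64);
    assert_eq!(raw[0], 7);
    assert_eq!(&raw[8..], &[0u8; 24][..]);
}
```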

For fallible conversions where the error type is part of the contract (parsing a hex string into a hash, validating a timestamp range, rejecting reserved bits), use TryToInline / TryFromInline — kept as separate traits because the error type is per‑source.

#![allow(unused)]
fn main() {
use triblespace::core::inline::encodings::shortstring::ShortString;
use triblespace::core::inline::{TryFromInline, TryToInline, Inline};

struct Username(String);

impl TryToInline<ShortString> for Username {
    type Error = &'static str;

    fn try_to_inline(self) -> Result<Inline<ShortString>, Self::Error> {
        if self.0.is_empty() {
            Err("username must not be empty")
        } else {
            self.0
                .as_str()
                .try_to_inline()
                .map_err(|_| "username too long or contains NULs")
        }
    }
}

impl TryFromInline<'_, ShortString> for Username {
    type Error = &'static str;

    fn try_from_inline(value: &Inline<ShortString>) -> Result<Self, Self::Error> {
        String::try_from_inline(value)
            .map(Username)
            .map_err(|_| "invalid utf-8 or too long")
    }
}
}

Encoding identifiers

Every encoding declares a unique 128‑bit identifier, accessible via the MetaDescribe::id method (for example, ShortString::id()). Persisting these IDs keeps serialized data self‑describing, so other tooling can make sense of the payload without linking against your Rust types. Dynamic language bindings (like the Python crate) inspect the stored encoding identifier to choose the correct decoder, while internal metadata stored inside Trible Space can use the same IDs to describe which encoding governs a value, blob, or hash protocol.

Identifiers also make it possible to derive deterministic attribute IDs when you ingest external formats. Wrapping the source field name in an entity-core fragment — Attribute::<S>::from(entity!{ metadata::name: <name handle>, metadata::value_encoding: <S as MetaDescribe>::id() }) — combines the encoding ID with the source field name and produces a stable attribute, so re-importing the same data always targets the same column. The attributes! macro applies the same derivation when you omit the 128-bit ID literal, which is useful for quick experiments or internal attributes. For encodings that will be shared across binaries or languages, prefer explicit IDs so the column remains stable even if the attribute name later changes.

Built‑in inline encodings

The crate provides the following inline encodings out of the box:

  • GenId – an abstract 128-bit identifier.
  • ShortString – a UTF-8 string of up to 32 bytes.
  • U256BE / U256LE – 256-bit unsigned integers.
  • I256BE / I256LE – 256-bit signed integers.
  • R256BE / R256LE – 256-bit rational numbers.
  • F64 – an IEEE-754 double-precision floating-point number (little-endian).
  • F256BE / F256LE – 256-bit floating-point numbers.
  • Hash and Handle – cryptographic digests and blob handles (see hash.rs).
  • ED25519RComponent, ED25519SComponent and ED25519PublicKey – signature fields and keys.
  • NsTAIInterval – encodes time intervals.
  • Boolean – all-zero for false, all-0xFF for true.
  • LineLocation – a (start_line, start_col, end_line, end_col) span encoded as four big-endian u64 values.
  • RangeU128 – a half-open (start, end) range of two big-endian u128 values.
  • RangeInclusiveU128 – an inclusive (start, end) range of two big-endian u128 values.
  • UnknownInline – a fallback when no specific encoding is known.

#![allow(unused)]
fn main() {
use triblespace::prelude::*;
use triblespace::core::metadata::MetaDescribe;
use triblespace::core::inline::encodings::shortstring::ShortString;
use triblespace::core::inline::{IntoInline, InlineEncoding};

let v: Inline<ShortString> = "hi".to_inline();
let raw_bytes = v.raw; // Persist alongside the encoding's metadata id.
let encoding_id = ShortString::id(); // derived via describe(&mut scratch).root()
}
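The fixed layouts listed above are simple to pack by hand. Here is a standalone sketch (not the crate's API) of a LineLocation-style span laid out as four big-endian u64 words, per the description in the list:

```rust
// Pack a source span into the 32-byte value slot as four big-endian u64
// words, following the LineLocation layout described above.
fn pack_line_location(start_line: u64, start_col: u64, end_line: u64, end_col: u64) -> [u8; 32] {
    let mut raw = [0u8; 32];
    for (i, word) in [start_line, start_col, end_line, end_col].iter().enumerate() {
        raw[i * 8..(i + 1) * 8].copy_from_slice(&word.to_be_bytes());
    }
    raw
}

fn unpack_line_location(raw: &[u8; 32]) -> (u64, u64, u64, u64) {
    let w = |i: usize| u64::from_be_bytes(raw[i * 8..(i + 1) * 8].try_into().unwrap());
    (w(0), w(1), w(2), w(3))
}

fn main() {
    let raw = pack_line_location(3, 1, 7, 42);
    assert_eq!(unpack_line_location(&raw), (3, 1, 7, 42));
    // Big-endian word order keeps byte-wise comparison consistent with
    // (line, column) ordering.
    assert!(pack_line_location(3, 1, 7, 42) < pack_line_location(4, 0, 0, 0));
}
```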

Built‑in blob encodings

The crate also ships with these blob encodings:

  • LongString – arbitrarily long UTF‑8 strings.
  • RawBytes – opaque file-backed byte payloads.
  • SimpleArchive – stores a raw sequence of tribles.
  • SuccinctArchiveBlob – stores the SuccinctArchive index type for offline queries. The SuccinctArchive helper exposes high-level iterators while the SuccinctArchiveBlob encoding is responsible for the serialized byte layout.
  • WasmCode – WebAssembly bytecode stored as a blob.
  • UnknownBlob – data of unknown type.

#![allow(unused)]
fn main() {
use triblespace::core::metadata::MetaDescribe;
use triblespace::core::blob::encodings::longstring::LongString;
use triblespace::core::blob::{Blob, BlobEncoding, IntoBlob};

let b: Blob<LongString> = "example".to_blob();
let encoding_id = LongString::id(); // derived via describe(&mut scratch).root()
}

Both value and blob encodings can emit optional discovery metadata. Calling MetaDescribe::describe returns a rooted Fragment (exporting the encoding id) whose facts tag the encoding entity with metadata::KIND_INLINE_ENCODING or metadata::KIND_BLOB_ENCODING and may attach a metadata::name and metadata::description (LongString handles). Persist the description blobs alongside the metadata tribles if you want the text to remain readable.

Choosing the right encoding

When defining an attribute, the encoding determines how the 32-byte value slot is interpreted. Use this decision tree to pick the right one:

What are you storing?
│
├─ A reference to another entity?
│  └─ GenId
│
├─ A tag, category, or enum-like classifier?
│  └─ metadata::tag (GenId) — tags are entities with their own ID.
│     Use metadata::name to give them a human-readable label.
│     ⚠ Do NOT define a separate ShortString tag attribute —
│     use the canonical metadata::tag and mint tag IDs.
│
├─ A short label or display name?
│  ├─ Fits in 32 bytes (≤32 UTF-8 bytes)?
│  │  └─ ShortString
│  └─ Longer text?
│     └─ Handle<LongString>  (blob)
│
├─ A number?
│  ├─ Integer
│  │  ├─ Fits in 64 bits? → U256BE (zero-extended) or custom u64 encoding
│  │  └─ Needs full 256 bits? → U256BE / I256BE
│  ├─ Floating point
│  │  ├─ Standard double? → F64
│  │  └─ Extended precision? → F256BE
│  └─ Rational? → R256BE / R256LE
│
├─ A timestamp or time range?
│  └─ NsTAIInterval
│
├─ A cryptographic value?
│  ├─ Content hash? → Hash<Blake3>
│  ├─ Reference to a blob? → Handle<BlobEncoding>
│  └─ Signature? → ED25519RComponent / ED25519SComponent / ED25519PublicKey
│
├─ A file or binary payload?
│  └─ Handle<RawBytes>  (blob)
│
├─ A large structured dataset?
│  └─ Handle<SimpleArchive>  (blob, stores a TribleSet)
│
└─ Something else?
   ├─ Fits in 32 bytes? → define a custom InlineEncoding
   └─ Larger? → define a custom BlobEncoding + use Handle

Rules of thumb:

  • If two values should be joinable (appear in the same query variable), they must share an encoding. Choose the most specific encoding that covers both uses.
  • Prefer ShortString over LongString when the text fits — inline values avoid a blob lookup.
  • Use GenId for relationships between entities. Never store entity references as strings.
  • When in doubt between an inline encoding and a blob, ask: "will I ever want to query or join on this directly?" If yes, it should be inline. If it's opaque content you just retrieve, use a blob handle.
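The "U256BE (zero-extended)" branch of the tree above rests on a useful property that is easy to verify in a standalone sketch (plain Rust, not the crate's API): zero-extending into a big-endian slot keeps byte-wise ordering equal to numeric ordering, which is what makes such values joinable and comparable directly:

```rust
// Zero-extend a u64 into a 256-bit big-endian slot. Big-endian layout plus
// zero-extension means lexicographic byte order equals numeric order.
fn u64_to_u256be(n: u64) -> [u8; 32] {
    let mut raw = [0u8; 32];
    raw[24..].copy_from_slice(&n.to_be_bytes());
    raw
}

fn main() {
    assert!(u64_to_u256be(1) < u64_to_u256be(2));
    // Holds across byte boundaries too, unlike a little-endian layout.
    assert!(u64_to_u256be(255) < u64_to_u256be(256));
    assert_eq!(u64_to_u256be(0), [0u8; 32]);
}
```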

Defining new encodings

Custom formats implement [InlineEncoding] or [BlobEncoding] and declare a unique 128‑bit identifier that serves as the encoding ID. The example below defines a little-endian u64 inline encoding and a simple blob encoding for arbitrary bytes. Imports are abbreviated: the prelude import is assumed to cover the remaining items (Encodes, entity!, id_hex!, and friends); see examples/custom_schema.rs for the exact paths.

use std::convert::Infallible;
use triblespace::prelude::*;
use triblespace::core::metadata::MetaDescribe;

pub struct U64LE;

impl MetaDescribe for U64LE {
    fn describe() -> triblespace::core::trible::Fragment {
        let id: Id = id_hex!("0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A0A");
        entity! { ExclusiveId::force_ref(&id) @
            metadata::name: "u64le",
            metadata::tag:  metadata::KIND_INLINE_ENCODING,
        }
    }
}

impl InlineEncoding for U64LE {
    type ValidationError = Infallible;
    type Encoding = Self;
}

impl Encodes<u64> for U64LE {
    type Output = Inline<U64LE>;
    fn encode(source: u64) -> Inline<U64LE> {
        let mut raw = [0u8; INLINE_LEN];
        raw[..8].copy_from_slice(&source.to_le_bytes());
        Inline::new(raw)
    }
}

impl TryFromInline<'_, U64LE> for u64 {
    type Error = std::convert::Infallible;
    fn try_from_inline(v: &Inline<U64LE>) -> Result<Self, std::convert::Infallible> {
        Ok(u64::from_le_bytes(v.raw[..8].try_into().unwrap()))
    }
}

pub struct BytesBlob;

impl MetaDescribe for BytesBlob {
    fn describe() -> triblespace::core::trible::Fragment {
        let id: Id = id_hex!("B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0B0");
        entity! { ExclusiveId::force_ref(&id) @
            metadata::name: "bytesblob",
            metadata::tag:  metadata::KIND_BLOB_ENCODING,
        }
    }
}

impl BlobEncoding for BytesBlob {}

impl Encodes<Bytes> for BytesBlob {
    type Output = Blob<BytesBlob>;
    fn encode(source: Bytes) -> Blob<BytesBlob> {
        Blob::new(source)
    }
}

impl TryFromBlob<BytesBlob> for Bytes {
    type Error = Infallible;
    fn try_from_blob(b: Blob<BytesBlob>) -> Result<Self, Self::Error> {
        Ok(b.bytes)
    }
}

See examples/custom_schema.rs for the full source.

Versioning and evolution

Schemas form part of your persistence contract. When evolving them consider the following guidelines:

  1. Prefer additive changes. Introduce a new encoding identifier when breaking compatibility. Consumers can continue to read the legacy data while new writers use the replacement ID.
  2. Annotate data with migration paths. Store both the encoding ID and a logical version number if the consumer needs to know which rules to apply. UnknownInline/UnknownBlob allow you to safely defer decoding until a newer binary is available.
  3. Keep validation centralized. Place invariants in your encoding conversions so migrations cannot accidentally create invalid values.

By keeping encoding identifiers alongside stored values and blobs you can roll out new representations incrementally: ship readers that understand both IDs, update your import pipelines, and finally switch writers once everything recognizes the replacement encoding.
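The incremental-rollout pattern can be sketched in plain Rust. This is an illustration of guideline 1 (and the deferred-decoding role of UnknownInline/UnknownBlob from guideline 2); the IDs, layouts, and Decoded type here are hypothetical, not crate API:

```rust
// A reader that recognizes both a legacy encoding ID and its replacement,
// and defers anything it does not know instead of misreading it.
type EncodingId = [u8; 16];

const LEGACY_ID: EncodingId = [1; 16];      // old writers: little-endian u64
const REPLACEMENT_ID: EncodingId = [2; 16]; // new writers: big-endian, zero-extended

enum Decoded {
    Value(u64),
    // Keep the raw bytes around so a newer binary can decode them later
    // (the role UnknownInline / UnknownBlob play).
    Unknown([u8; 32]),
}

fn decode(id: EncodingId, raw: [u8; 32]) -> Decoded {
    match id {
        LEGACY_ID => Decoded::Value(u64::from_le_bytes(raw[..8].try_into().unwrap())),
        REPLACEMENT_ID => Decoded::Value(u64::from_be_bytes(raw[24..].try_into().unwrap())),
        _ => Decoded::Unknown(raw),
    }
}

fn main() {
    let mut raw = [0u8; 32];
    raw[..8].copy_from_slice(&99u64.to_le_bytes());
    assert!(matches!(decode(LEGACY_ID, raw), Decoded::Value(99)));
    // Unrecognized IDs are deferred rather than silently dropped.
    assert!(matches!(decode([9; 16], raw), Decoded::Unknown(_)));
}
```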

Inline formatters (WASM)

Binary formats are great for portability and performance, but they can be painful to inspect if you don’t know the encoding ahead of time. TribleSpace supports an optional encoding-level formatter mechanism: an inline encoding can point to a small sandboxed WebAssembly module that turns its raw 32 bytes into a human-readable string.

The formatter is stored as a blob (blobencodings::WasmCode) and referenced from the encoding identifier entity via the metadata attribute metadata::value_formatter.

The built-in runner lives behind the wasm feature flag (enabled by default in the triblespace facade crate) and uses wasmi with tight limits (fuel, memory pages, output size). Modules must not import anything and use the following minimal ABI:

  • memory (linear memory)
  • format(w0: i64, w1: i64, w2: i64, w3: i64) -> i64

The format arguments are the raw 32 bytes split into 4×8-byte chunks (little-endian). The return value packs the output pointer and output length:

  • Success returns (output_len << 32) | output_ptr with output_ptr != 0.
  • Failure returns (error_code << 32) | 0 (i.e. output_ptr == 0).
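A host embedding a formatter has to perform this packing and unpacking itself; a standalone sketch of the bit manipulation described above (the helper names are illustrative, not crate API):

```rust
// Split the raw 32 bytes into the 4 little-endian i64 call arguments.
fn to_words(raw: [u8; 32]) -> [i64; 4] {
    std::array::from_fn(|i| i64::from_le_bytes(raw[i * 8..(i + 1) * 8].try_into().unwrap()))
}

// Unpack the packed i64 return value: Ok((ptr, len)) on success,
// Err(error_code) when the pointer half is zero.
fn unpack_result(ret: i64) -> Result<(u32, u32), u32> {
    let ptr = ((ret as u64) & 0xFFFF_FFFF) as u32;
    let code_or_len = ((ret as u64) >> 32) as u32;
    if ptr != 0 { Ok((ptr, code_or_len)) } else { Err(code_or_len) }
}

fn main() {
    // Success: output of length 5 at pointer 0x100.
    assert_eq!(unpack_result(((5u64 << 32) | 0x100) as i64), Ok((0x100, 5)));
    // Failure: error code 7, pointer half is zero.
    assert_eq!(unpack_result((7u64 << 32) as i64), Err(7));

    let mut raw = [0u8; 32];
    raw[0] = 1;
    assert_eq!(to_words(raw)[0], 1);
}
```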

The core crate can optionally ship built-in formatters for its built-in value encodings. Enable the wasm feature to have MetaDescribe::describe (which is fallible) attach metadata::value_formatter entries for the standard encodings. This feature requires the wasm32-unknown-unknown Rust target at build time because the bundled formatters are compiled to WebAssembly via the #[value_formatter] proc macro.