Query Engine
Queries describe the patterns you want to retrieve. The engine favors extreme
simplicity and aims for predictable latency and skew resistance without any
tuning. Every constraint implements the `Constraint` trait, so operators,
sub-languages and even alternative data sources can compose cleanly.
Queries as Schemas
You might notice that trible.space does not define a global ontology or schema
beyond associating attributes with a `ValueSchema` or `BlobSchema`. This is
deliberate. The semantic web
taught us that per-value typing, while desirable, was awkward in RDF: literal
datatypes are optional, custom types need globally scoped IRIs and there is no
enforcement, so most data degenerates into untyped strings. Trying to regain
structure through global ontologies and class hierarchies made schemas rigid
and reasoning computationally infeasible. Real-world data often arrives with
missing, duplicate or additional fields, which clashes with these global,
class-based constraints.
Our approach is to be sympathetic to edge cases and have the system deal only with the data it declares itself capable of handling. These application-specific schema declarations are exactly the shapes and constraints expressed by our queries [1]. Data that does not conform to these queries is simply ignored by definition, as a query only returns data satisfying its constraints. [2]
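To make the "query as schema" idea concrete, here is a minimal sketch that does not use the engine's API. The `Fact` tuple, the `Person` struct, and the `query_people` and `sample_facts` functions are hypothetical names invented for illustration. The "schema" is simply the set of attributes the query asks for: entities lacking one are skipped, and attributes the query never mentions are ignored. Multi-valued attributes are collapsed to a single value for brevity.

```rust
use std::collections::BTreeMap;

/// Toy fact: (entity, attribute, value). A hypothetical stand-in,
/// not the actual trible layout.
type Fact = (u32, &'static str, &'static str);

/// The shape this application declares itself capable of handling.
#[derive(Debug, PartialEq)]
struct Person {
    id: u32,
    name: String,
    email: String,
}

/// Returns only the entities that carry every attribute the shape requires.
fn query_people(facts: &[Fact]) -> Vec<Person> {
    // Group attribute values by entity (a later duplicate overwrites an
    // earlier one, which is a simplification of this sketch).
    let mut by_entity: BTreeMap<u32, BTreeMap<&str, &str>> = BTreeMap::new();
    for &(entity, attribute, value) in facts {
        by_entity.entry(entity).or_default().insert(attribute, value);
    }
    by_entity
        .into_iter()
        .filter_map(|(id, attrs)| {
            // `?` drops entities with a missing field; extra fields are never read.
            Some(Person {
                id,
                name: attrs.get("name")?.to_string(),
                email: attrs.get("email")?.to_string(),
            })
        })
        .collect()
}

/// Sample data with an extra field (entity 1) and a missing field (entity 2).
fn sample_facts() -> Vec<Fact> {
    vec![
        (1, "name", "Paul"),
        (1, "email", "paul@example.org"),
        (1, "title", "Duke"),
        (2, "name", "Duncan"),
        (3, "email", "jessica@example.org"),
        (3, "name", "Jessica"),
    ]
}
```

Calling `query_people(&sample_facts())` yields entities 1 and 3. Entity 2 is not rejected or reported as an error; it simply does not match this particular query, and a different query with a looser shape could still return it.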
Join Strategy
The query engine uses the Atreides family of worst-case optimal join algorithms. These algorithms leverage cardinality estimates to guide a depth-first search over variable bindings, providing skew-resistant and predictable performance. For a detailed discussion, see the Atreides Join chapter.
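The following is a deliberately simplified sketch of a cardinality-guided depth-first search over variable bindings. It is not the Atreides implementation; the `Relation` type and the `propose`, `solve` and `triangles` functions are hypothetical names chosen for this illustration. At each step the search binds the unbound variable with the smallest estimated number of candidates, then intersects the proposals of every relation that mentions it. The estimate here is computed exactly by counting, whereas a real engine would use something much cheaper.

```rust
use std::collections::BTreeSet;

/// A partial assignment: one slot per query variable.
type Binding = Vec<Option<u32>>;

/// Toy binary relation between two distinct query variables.
struct Relation {
    vars: (usize, usize),
    pairs: BTreeSet<(u32, u32)>,
}

impl Relation {
    /// Values for `var` consistent with the partial binding, or `None`
    /// if this relation does not mention `var`.
    fn propose(&self, var: usize, b: &Binding) -> Option<BTreeSet<u32>> {
        let (x, y) = self.vars;
        if var != x && var != y {
            return None;
        }
        Some(
            self.pairs
                .iter()
                .filter(|&&(px, py)| {
                    b[x].map_or(true, |v| v == px) && b[y].map_or(true, |v| v == py)
                })
                .map(|&(px, py)| if var == x { px } else { py })
                .collect(),
        )
    }
}

/// Depth-first search that always binds the variable with the smallest estimate.
fn solve(rels: &[Relation], b: &mut Binding, out: &mut Vec<Vec<u32>>) {
    let unbound: Vec<usize> = (0..b.len()).filter(|&v| b[v].is_none()).collect();
    if unbound.is_empty() {
        // Every variable is bound: record one result row.
        out.push(b.iter().map(|v| v.unwrap()).collect());
        return;
    }
    let (var, candidates) = {
        let snapshot: &Binding = b;
        // Estimate per variable: the size of the smallest proposal for it.
        let var = unbound
            .into_iter()
            .min_by_key(|&v| {
                rels.iter()
                    .filter_map(|r| r.propose(v, snapshot).map(|s| s.len()))
                    .min()
                    .unwrap_or(usize::MAX)
            })
            .unwrap();
        // Candidates: the intersection of all proposals for the chosen variable.
        let mut proposals = rels.iter().filter_map(|r| r.propose(var, snapshot));
        let candidates = match proposals.next() {
            Some(first) => proposals.fold(first, |acc, p| &acc & &p),
            None => BTreeSet::new(), // unconstrained variable: nothing to enumerate
        };
        (var, candidates)
    };
    for value in candidates {
        b[var] = Some(value);
        solve(rels, b, out);
    }
    b[var] = None; // undo the binding before backtracking
}

/// Triangle query: edge(x, y), edge(y, z), edge(x, z) over variables 0, 1, 2.
fn triangles(edges: &[(u32, u32)]) -> Vec<Vec<u32>> {
    let pairs: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    let rels: Vec<Relation> = [(0, 1), (1, 2), (0, 2)]
        .into_iter()
        .map(|vars| Relation { vars, pairs: pairs.clone() })
        .collect();
    let mut out = Vec::new();
    solve(&rels, &mut vec![None; 3], &mut out);
    out
}
```

For the edges `(1, 2)`, `(2, 3)`, `(1, 3)` and `(3, 4)`, `triangles` returns the single binding `[1, 2, 3]`. Because candidates are intersected one variable at a time, branches such as `x = 1, y = 3` are abandoned as soon as no `z` satisfies both remaining relations, rather than after a full pairwise join has been materialized.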
Query Languages
Instead of a single query language, the engine exposes small composable
constraints that combine with logical operators such as `and` and `or`. These
constraints are simple yet flexible, enabling a wide variety of operators while
still allowing the engine to explore the search space efficiently.
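As a rough illustration of how small constraints can compose through logical operators, consider the sketch below. It is not the crate's `Constraint` trait: `ToyConstraint`, `Equals`, `InRange`, `And`, `Or` and `solutions` are hypothetical names, and the brute-force enumeration in `solutions` stands in for the engine's guided search only to keep the example short.

```rust
use std::ops::Range;

/// Hypothetical, much-simplified constraint: a predicate over a binding
/// that assigns one value to each query variable.
trait ToyConstraint {
    fn satisfied(&self, binding: &[i64]) -> bool;
}

/// Variable `var` must equal `value`.
struct Equals {
    var: usize,
    value: i64,
}

/// Variable `var` must lie in `lo..hi` (upper bound exclusive).
struct InRange {
    var: usize,
    lo: i64,
    hi: i64,
}

/// Logical conjunction and disjunction of two constraints.
struct And<A, B>(A, B);
struct Or<A, B>(A, B);

impl ToyConstraint for Equals {
    fn satisfied(&self, binding: &[i64]) -> bool {
        binding[self.var] == self.value
    }
}

impl ToyConstraint for InRange {
    fn satisfied(&self, binding: &[i64]) -> bool {
        (self.lo..self.hi).contains(&binding[self.var])
    }
}

impl<A: ToyConstraint, B: ToyConstraint> ToyConstraint for And<A, B> {
    fn satisfied(&self, binding: &[i64]) -> bool {
        self.0.satisfied(binding) && self.1.satisfied(binding)
    }
}

impl<A: ToyConstraint, B: ToyConstraint> ToyConstraint for Or<A, B> {
    fn satisfied(&self, binding: &[i64]) -> bool {
        self.0.satisfied(binding) || self.1.satisfied(binding)
    }
}

/// Enumerates all bindings of two variables over `domain` that satisfy `c`.
/// Brute force, for illustration only.
fn solutions<C: ToyConstraint>(c: &C, domain: Range<i64>) -> Vec<[i64; 2]> {
    let mut out = Vec::new();
    for x in domain.clone() {
        for y in domain.clone() {
            if c.satisfied(&[x, y]) {
                out.push([x, y]);
            }
        }
    }
    out
}
```

Because `And` and `Or` are themselves constraints, they nest freely. For instance, `And(InRange { var: 0, lo: 0, hi: 3 }, Or(Equals { var: 1, value: 1 }, Equals { var: 1, value: 5 }))` evaluated over the domain `0..6` has six solutions, from `[0, 1]` to `[2, 5]`.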
The query engine and data model are flexible enough to support many query styles, including graph, relational and document-oriented queries.
For example, the `namespace` module offers macros that
generate constraints for a given trible pattern in a query-by-example style
reminiscent of SPARQL or GraphQL but tailored to a document-graph data model.
It would also be possible to layer a property-graph language like Cypher or a
relational language like Datalog on top of the engine. [3]
Great care has been taken to ensure that query languages with different styles and semantics can coexist and even be mixed with other languages and data models within the same query. For practical examples of the current facilities, see the Query Language chapter.
[1] Note that this query-schema isomorphism does not necessarily hold in all databases or query languages; for example, it does not hold for SQL.
[2] In RDF terminology, we challenge the classical A-Box and T-Box dichotomy by replacing the T-Box with a "Q-Box", which is descriptive and open rather than prescriptive and closed. This Q-Box naturally evolves with new and changing requirements, contexts and applications.
[3] SQL would be a bit more challenging, as it is surprisingly imperative with its explicit JOINs and ORDER BYs and lacks clear declarative semantics. This makes it harder to implement on top of a constraint-based query engine tailored towards a more declarative and functional style.