Architecture

Internally, Crystalline uses an actor-like model in which each component owns a pool of async tasks and/or worker threads that handle messages received via channels. A component typically consists of a main task, which manages the component's lifetime and channel cleanup, and a pool of worker tasks that process the messages received from the channel. Worker tasks may in turn send messages to other components as required.

This design means that there will usually be many more async tasks running than logical threads. This is where Tokio proves its worth, by scheduling all of those tasks across the available threads.
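The component structure described above could be sketched roughly as follows. This is a minimal illustration, not Crystalline's actual code: it uses standard-library threads and channels instead of Tokio tasks so that it has no dependencies, and the names Message, Component, and worker_loop are hypothetical. The Component value plays the role of the main task: it owns the channel, and closing the channel is what lets the workers drain and exit.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical message type; the real message types are not shown in this document.
pub enum Message {
    Event(String),
}

/// Owns the sending half of the channel and the handles of the worker pool.
pub struct Component {
    sender: Option<Sender<Message>>,
    workers: Vec<JoinHandle<usize>>,
}

impl Component {
    /// Spawn `pool_size` workers that all pull messages from one shared channel.
    /// Processed results are forwarded to another component via `forward_to`.
    pub fn spawn(pool_size: usize, forward_to: Sender<String>) -> Self {
        let (sender, receiver) = mpsc::channel::<Message>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..pool_size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                let forward_to = forward_to.clone();
                thread::spawn(move || worker_loop(receiver, forward_to))
            })
            .collect();
        Component { sender: Some(sender), workers }
    }

    /// Queue a message; returns false if the channel is already closed.
    pub fn send(&self, msg: Message) -> bool {
        match &self.sender {
            Some(sender) => sender.send(msg).is_ok(),
            None => false,
        }
    }

    /// Close the channel, wait for every worker to drain it, and
    /// return the total number of messages that were handled.
    pub fn shutdown(mut self) -> usize {
        drop(self.sender.take());
        self.workers
            .drain(..)
            .map(|handle| handle.join().unwrap_or(0))
            .sum()
    }
}

fn worker_loop(receiver: Arc<Mutex<Receiver<Message>>>, forward_to: Sender<String>) -> usize {
    let mut handled = 0;
    loop {
        // The lock guard is dropped at the end of this statement, so other
        // workers can receive while this one processes its message.
        let next = receiver.lock().unwrap().recv();
        match next {
            Ok(Message::Event(body)) => {
                // Placeholder "processing": upper-case the body and pass it on.
                let _ = forward_to.send(body.to_uppercase());
                handled += 1;
            }
            // The channel is closed and empty, so the worker can exit.
            Err(_) => break,
        }
    }
    handled
}
```

In an async version the same shape would apply, with spawned tasks in place of threads and an async channel in place of std::sync::mpsc.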

Storage

Crystalline stores all data using Tantivy, a full-text search engine library.

An index in Crystalline largely consists of just a name and a retention policy; the actual data is stored in one or more buckets. These buckets are placed in hot, warm, or cold storage depending on their age and the retention policy for the index, as pictured below.

[Figure: Inputs, Indices, Buckets]
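As a rough illustration of how a bucket's age might map to a storage tier, consider the sketch below. The fields of RetentionPolicy (hot_days, warm_days, cold_days) and the Delete outcome are assumptions made for this example; the real retention policy format is not described here.

```rust
/// Where a bucket should live, or whether it has aged out entirely.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    Hot,
    Warm,
    Cold,
    Delete,
}

/// Hypothetical policy: how many days a bucket spends in each tier.
pub struct RetentionPolicy {
    pub hot_days: u64,
    pub warm_days: u64,
    pub cold_days: u64,
}

/// Classify a bucket by its age in whole days.
pub fn placement(age_days: u64, policy: &RetentionPolicy) -> Placement {
    // Tier boundaries are cumulative: warm starts when hot ends, and so on.
    let warm_from = policy.hot_days;
    let cold_from = warm_from.saturating_add(policy.warm_days);
    let delete_from = cold_from.saturating_add(policy.cold_days);

    if age_days < warm_from {
        Placement::Hot
    } else if age_days < cold_from {
        Placement::Warm
    } else if age_days < delete_from {
        Placement::Cold
    } else {
        Placement::Delete
    }
}
```

With a policy of 7 hot days, 30 warm days, and 365 cold days, a bucket moves to warm storage on day 7, to cold storage on day 37, and becomes eligible for deletion on day 402.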

Each bucket contains the events for a single 24-hour span in UTC and is given a directory name based on the range it covers. For example, the bucket for 1 January 2024 has the directory name 1704067200-1704153599, which in turn contains a Tantivy index with all events that occurred between those two timestamps.
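The directory naming scheme can be reproduced with a little integer arithmetic. The sketch below assumes timestamps are Unix seconds and that a bucket's name is the first and last second of its UTC day joined by a hyphen, as in the example above. The function names are illustrative, not taken from the Crystalline source.

```rust
/// Number of seconds in one UTC day (Unix time has no leap seconds).
const SECONDS_PER_DAY: u64 = 86_400;

/// Inclusive (start, end) Unix timestamps of the UTC day containing `timestamp`.
pub fn bucket_range(timestamp: u64) -> (u64, u64) {
    // Round down to midnight UTC, then extend to the last second of that day.
    let start = timestamp - timestamp % SECONDS_PER_DAY;
    (start, start + SECONDS_PER_DAY - 1)
}

/// Directory name for the bucket that would hold an event at `timestamp`.
pub fn bucket_dir_name(timestamp: u64) -> String {
    let (start, end) = bucket_range(timestamp);
    format!("{}-{}", start, end)
}
```

Any timestamp on 1 January 2024 UTC, for example 1704100000, maps to the directory name 1704067200-1704153599.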

The Tantivy indices themselves have a fixed schema with a minimal set of required fields and an optimised tokeniser to reduce storage requirements and improve performance.

When searching, all buckets for an index that fall within the search period are opened and searched in parallel, and the results are merged into a single set. For buckets that only partially overlap the search period, an additional range query is applied.
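The bucket selection step could look something like the sketch below. It only plans the search: it decides which buckets overlap the search period and, for partial overlaps, which sub-range an extra range query would need to cover. Opening the Tantivy indices, searching them in parallel, and merging the results are left out. The Coverage type and plan_search function are hypothetical names, and both ranges are assumed to be inclusive Unix timestamps.

```rust
/// How much of a bucket the search period covers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Coverage {
    /// The whole bucket is inside the search period; no extra filter is needed.
    Full,
    /// Only part of the bucket is inside; restrict results to `from..=to`.
    Partial { from: u64, to: u64 },
}

/// Select the buckets that overlap the inclusive search period and
/// work out which of them need an additional range query.
pub fn plan_search(
    buckets: &[(u64, u64)],
    search_start: u64,
    search_end: u64,
) -> Vec<((u64, u64), Coverage)> {
    buckets
        .iter()
        .copied()
        // Keep only buckets whose inclusive range intersects the search period.
        .filter(|&(start, end)| start <= search_end && end >= search_start)
        .map(|(start, end)| {
            let coverage = if search_start <= start && end <= search_end {
                Coverage::Full
            } else {
                Coverage::Partial {
                    from: start.max(search_start),
                    to: end.min(search_end),
                }
            };
            ((start, end), coverage)
        })
        .collect()
}
```

Only the first and last buckets of a search period can ever be partial, so at most two buckets per search need the additional range query.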

State

Crystalline primarily reacts to events received as HTTP requests, passing them on to the appropriate component for processing. A few additional operations run at regular intervals.

Scheduled operations

  • Config state synchronisation
    Crystalline stores all information about inputs, indices, users, etc. in a database mounted under /config (or /var/lib/crystalline/db when not in a container). When configuration changes are made via the API, they are persisted to the database but are not applied immediately.

    The running state is maintained by regularly polling the database and updating the application to match the state stored there.

  • Retention policy evaluation
    Crystalline periodically evaluates the current state of each index against its retention policy and automatically applies any required changes, such as migrating or deleting buckets.

  • Garbage collection
    Crystalline periodically runs garbage collection on temporary data such as cached search results.
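The config state synchronisation described above amounts to a reconciliation loop: on each poll, compare the state stored in the database with the running state and apply the difference. The comparison step might be sketched as below. The Change type and diff_state function are hypothetical, and each configured item (an input or an index, say) is reduced to a name mapped to an opaque settings string purely for illustration.

```rust
use std::collections::BTreeMap;

/// An action needed to bring the running state in line with the database.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    /// Present in the database but not running yet.
    Start(String),
    /// Running, but with settings that differ from the database.
    Update(String),
    /// Running, but no longer present in the database.
    Stop(String),
}

/// Compare the desired state (from the database) with the running state.
/// Keys are item names; values are their serialised settings.
pub fn diff_state(
    desired: &BTreeMap<String, String>,
    running: &BTreeMap<String, String>,
) -> Vec<Change> {
    let mut changes = Vec::new();

    // Anything in the database that is missing or out of date must be applied.
    for (name, wanted) in desired {
        match running.get(name) {
            None => changes.push(Change::Start(name.clone())),
            Some(current) if current != wanted => changes.push(Change::Update(name.clone())),
            Some(_) => {}
        }
    }

    // Anything still running that was removed from the database must be stopped.
    for name in running.keys() {
        if !desired.contains_key(name) {
            changes.push(Change::Stop(name.clone()));
        }
    }

    changes
}
```

When the two maps are identical, the function returns an empty list, so a poll that finds no configuration changes does no work.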