Docker

Crystalline is primarily made to be deployed to a container runtime such as Docker or Podman.

The container is built with 3 mountpoints to be configured. These are:

/data - This is where Crystalline will store all log events and where the index files are stored.
/config - This is where Crystalline will look for and save its configuration database.
/cache - This is where Crystalline will store temporary data such as search results. A tmpfs mountpoint is usually sufficient, but if individual searches produce a very large number of results, consider using a persistent storage device for this directory.

Example compose file

version: "3"
services:
  crystalline:
    image: codeberg.org/kryesh/crystalline:latest
    hostname: crystalline
    restart: unless-stopped
    volumes:
      - crystalline_data:/data
      - crystalline_config:/config
      - type: tmpfs
        target: /cache
    ports:
      - 8080:8080
    environment:
      - CRYSTALLINE_BATCH_SIZE=500
      - CRYSTALLINE_PARALLEL_BUCKET_READERS=4
      - CRYSTALLINE_PARALLEL_BUCKET_WRITERS=4
      - CRYSTALLINE_SCALE_FACTOR=4
      - CRYSTALLINE_WRITER_WORKER_MEM=16
      - CRYSTALLINE_WRITER_WORKER_THREADS=4
      - CRYSTALLINE_IDLE_WRITER_TIMEOUT=5
      - CRYSTALLINE_COMMIT_INTERVAL=5
      - CRYSTALLINE_SEARCH_JOB_TTL=7200
volumes:
  crystalline_data:
  crystalline_config:

Bootstrap

Crystalline can optionally be provided with a bootstrap config file. This is a JSON file that contains the configuration for your application. It allows you to define things like:

Indices
Inputs

If Crystalline starts and finds a database that has not been initialised, it will check the CRYSTALLINE_BOOTSTRAP_CONFIG environment variable for a path to a bootstrap config file. If this is set, Crystalline will use this file to initialise the database before starting up.

Example Bootstrap Config File

This example shows how you can define an index and input in your bootstrap config file (note that the uuid for the input index field must match the id of an index):

{
    "indices": [
        {
            "id": "65e636bb-dfa5-48b1-a827-162ab1b2f816",
            "name": "syslog",
            "storage_type": "Directory",
            "retention": {
                "Simple": {
                    "buckets": 30
                }
            }
        }
    ],
    "inputs": [
        {
            "id": "7475e32f-ca86-4324-8945-4fd23401d9ec",
            "label": "syslog",
            "index": "65e636bb-dfa5-48b1-a827-162ab1b2f816",
            "token_required": false
        }
    ]
}
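
The constraint noted above (each input's index must match the id of a defined index) can be checked before the file is deployed. The following is a minimal Python sketch, not part of Crystalline; the function name `find_dangling_inputs` is hypothetical, and the embedded config is the example from this section.

```python
import json


def find_dangling_inputs(config: dict) -> list:
    """Return the ids of inputs whose `index` does not match any index `id`."""
    index_ids = {index["id"] for index in config.get("indices", [])}
    return [
        item["id"]
        for item in config.get("inputs", [])
        if item.get("index") not in index_ids
    ]


BOOTSTRAP = json.loads("""
{
    "indices": [{"id": "65e636bb-dfa5-48b1-a827-162ab1b2f816", "name": "syslog",
                 "storage_type": "Directory",
                 "retention": {"Simple": {"buckets": 30}}}],
    "inputs": [{"id": "7475e32f-ca86-4324-8945-4fd23401d9ec", "label": "syslog",
                "index": "65e636bb-dfa5-48b1-a827-162ab1b2f816",
                "token_required": false}]
}
""")

# An empty list means every input points at a defined index.
print(find_dangling_inputs(BOOTSTRAP))
```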

Tiered Storage

Crystalline divides indexes into 3 tiers:

Hot tier: Index data that is frequently accessed
Warm tier: Index data that is less frequently accessed but still needs to be available for search
Cold tier: Index data that is rarely accessed

Crystalline will use directories under the /data mountpoint to store index data. The default configuration uses the following directory structure:

/data
├── hot
│   ├── index_a
│   └── index_b
├── warm
│   ├── index_a
│   └── index_b
└── cold
    ├── index_a
    └── index_b

If you want to use different storage for each of these tiers, then instead of attaching storage to /data, you can attach it to the appropriate hot, warm, or cold directory.

Ingesting Events

Events are ingested into Crystalline via input configurations. Once configured, events can be ingested via the following API endpoints:

/ingest/<input_uuid>/json - Accepts newline delimited JSON objects
/ingest/<input_uuid>/raw - Accepts newline delimited raw string events
/ingest/<input_uuid>/multiline - Accepts all lines of received text as a single event

Log sources

Fluent Bit

Logs can be sent to Crystalline via the Fluent Bit http output plugin.

Unauthenticated

Example Fluent Bit output stanza for an input that doesn't require a token, using the Crystalline default timestamp configuration:

[OUTPUT]
  Name http
  Match *
  URI /api/ingest/<input_uuid>/json
  Host crystalline
  Port 8080
  tls Off
  compress gzip
  Format json_lines
  Json_date_key timestamp
  Json_date_format epoch
  log_response_payload false

Using an input token

Example Fluent Bit output stanza for an input that requires a token, using the Crystalline default timestamp configuration:

[OUTPUT]
  Name http
  Match *
  URI /api/ingest/<input_uuid>/json
  Header X-Crystalline-Token <input_token>
  Host crystalline
  Port 8080
  tls Off
  compress gzip
  Format json_lines
  Json_date_key timestamp
  Json_date_format epoch
  log_response_payload false
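
For sources other than Fluent Bit, the same request can be built by hand. The following Python sketch mirrors the stanzas above: newline-delimited JSON posted to the json endpoint, with the X-Crystalline-Token header added only when a token is supplied. The function name `build_ingest_request`, the `prefix` parameter, and the Content-Type value are assumptions made for illustration; the path prefix defaults to the /api/ingest form used in the Fluent Bit URI.

```python
import json
import urllib.request


def build_ingest_request(base_url, input_uuid, events, token=None, prefix="/api/ingest"):
    """Build a POST request carrying newline-delimited JSON events."""
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    headers = {"Content-Type": "application/x-ndjson"}  # assumption: not stated in the docs
    if token is not None:
        headers["X-Crystalline-Token"] = token
    url = f"{base_url}{prefix}/{input_uuid}/json"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_ingest_request(
    "http://crystalline:8080",
    "7475e32f-ca86-4324-8945-4fd23401d9ec",
    [{"timestamp": 1700000000, "message": "hello"}],
)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
print(request.full_url)
```

The timestamp field is sent as Unix seconds so that it matches the default time extractor configuration.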

Parameters

Runtime parameters for Crystalline are defined via environment variables. The following table lists the available parameters and their default values:

Parameter	Default	Description
CRYSTALLINE_DB_DIR	`/config` (container) `/etc/crystalline` (host)	Directory containing the configuration database
CRYSTALLINE_DATA_DIR	`/data` (container) `/var/lib/crystalline/data` (host)	Directory containing stored data and index contents
CRYSTALLINE_CACHE_DIR	`/cache` (container) `/var/lib/crystalline/cache` (host)	Directory containing the cache files.
CRYSTALLINE_HTTP_IP	`0.0.0.0`	IP address to bind the HTTP server to
CRYSTALLINE_HTTP_PORT	`8080`	Port number to bind the HTTP server to
CRYSTALLINE_BATCH_SIZE	`500`	Number of documents to process in a single batch
CRYSTALLINE_SCALE_FACTOR	Number of available CPUs	Scaling factor for parallel processing
CRYSTALLINE_PARALLEL_BUCKET_READERS	`2`	Number of buckets that can be online for reading at the same time
CRYSTALLINE_PARALLEL_BUCKET_WRITERS	`2`	Number of buckets that can be online for writing at the same time
CRYSTALLINE_WRITER_WORKER_MEM	`16`	The amount of memory (in MB) to use per bucket writer worker thread.
CRYSTALLINE_WRITER_WORKER_THREADS	`2`	Number of threads a currently writing bucket can use for parallel processing.
CRYSTALLINE_IDLE_READER_TIMEOUT	`10`	(Not currently used) Timeout in seconds after which an idle bucket reader is closed.
CRYSTALLINE_IDLE_WRITER_TIMEOUT	`5`	Timeout in seconds after which an idle bucket writer is closed.
CRYSTALLINE_COMMIT_INTERVAL	`5`	Time in seconds before forcing bucket writers to commit their changes to disk.
CRYSTALLINE_SEARCH_JOB_TTL	`7200` (2 hours)	Time in seconds after which a search job is considered expired and will have its results deleted from the cache.
CRYSTALLINE_BOOTSTRAP_CONFIG	Not Set	Optionally set this to the path of a bootstrap JSON file to initialise the application configuration

Indices

Index configuration parameters:

Parameter	Description
id	A `uuid` that uniquely identifies the index.
name	A human readable name for the index.
storage_type	The type of storage to use for this index. Currently `Directory` is the only supported value.
retention	The retention policy for this index. See Retention Policies below for more details.

Retention Policies

Retention policies define how long data is persisted in an index, and which storage tiers are used to store that data.

A bucket refers to a 24-hour period of time in UTC, so the numbers configured in a retention policy map to days of retention.

The bucket for the current 24-hour period is always hot and cannot be deleted or moved to the warm or cold tiers. This means that retention policies refer to the number of previous days to keep, not the total number of days to keep.

Simple

The Simple retention policy contains a single parameter buckets which defines the number of buckets that will be kept. All buckets will be kept on the hot storage tier.

A value of 0 means that all buckets are kept in hot storage indefinitely.

HotWarm

The HotWarm retention policy contains two parameters:

primary - The number of buckets to keep on the hot storage tier.
secondary - The number of buckets to keep on the warm storage tier.

A value of 0 for the primary parameter means that buckets are immediately moved to the warm storage tier when they are no longer the current bucket.

A value of 0 for the secondary parameter means that all buckets are kept indefinitely on the warm storage tier.

HotWarmCold

The HotWarmCold retention policy contains three parameters:

primary - The number of buckets to keep on the hot storage tier.
secondary - The number of buckets to keep on the warm storage tier.
tertiary - The number of buckets to keep on the cold storage tier.

A value of 0 for the primary parameter means that buckets are immediately moved to the warm storage tier when they are no longer the current bucket.

A value of 0 for the secondary parameter means that buckets are immediately moved to the cold storage tier.

A value of 0 for the tertiary parameter means that all buckets are kept indefinitely on the cold storage tier.
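
The rules for the three policies can be summarised in one function. The following Python sketch is an interpretation of the text above, not Crystalline's implementation: it assumes a bucket's age is counted in whole days (0 for the current bucket), that a tier holding N buckets covers the next N days of age, and that a bucket past the last tier is deleted. The function name `tier_for_bucket` is hypothetical.

```python
def tier_for_bucket(age_days, tiers):
    """Return the tier name holding a bucket, or None once it has expired.

    age_days: 0 for the current bucket, 1 for the previous day, and so on.
    tiers: ordered (tier_name, bucket_count) pairs, e.g. [("hot", 7), ("warm", 30)].
    """
    if age_days == 0:
        return "hot"  # the current bucket is always hot
    remaining = age_days
    for position, (name, count) in enumerate(tiers):
        is_last = position == len(tiers) - 1
        if is_last and count == 0:
            return name  # 0 on the final tier: kept indefinitely
        if remaining <= count:
            return name
        remaining -= count  # 0 on an earlier tier: bucket moves straight on
    return None  # older than every tier allows: deleted


# HotWarmCold with primary=7, secondary=30, tertiary=0
hot_warm_cold = [("hot", 7), ("warm", 30), ("cold", 0)]
for age in (0, 7, 8, 37, 38, 1000):
    print(age, tier_for_bucket(age, hot_warm_cold))
```

A Simple policy is the single-tier case, for example `[("hot", 30)]`, and HotWarm is the two-tier case.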

Inputs

Input configuration parameters:

Parameter	Description	Default value
id	A `uuid` that uniquely identifies the input.	Not set
label	A human-readable name for the input.	Not set
index	The `uuid` of the index that this input will send events to	Not set
token_required	A boolean value indicating whether a token is required to access the input. If true, the input will only accept events with a valid token.	Not set
token	A string containing the token that should be used for authentication when sending events to this input. This field is optional and can be left blank if `token_required` is false.	Not set
time_extractor	The time extractor to use for this input. See Time Extractors for more information.	`{ "field": "timestamp", "format": "%s" }`

Time Extractors

Time extractors are used to extract a timestamp from a given log message. The extracted timestamp is then used for sorting and retention purposes.

There are 3 types of time extractors:

FieldTimeExtractor - extracts the timestamp from a JSON field in the log message using the provided format string.
RegexTimeExtractor - extracts the timestamp from the log message using a regular expression, and then parses it using the provided format string.
RegexFieldTimeExtractor - extracts the timestamp from a JSON field in the log message using a regular expression, and then parses it using the provided format string.

If an extractor fails to extract a timestamp from a log message, then the message will be assigned the current time as its timestamp.

Field Time Extractor

The FieldTimeExtractor is used to extract a timestamp from a JSON field in the log message using the provided format string. If no Time Extractor is specified for an input, the default is to use this type with the following configuration:

field: timestamp
format: %s

This means that by default, the extractor will look for a JSON field called timestamp, and attempt to parse it as a Unix timestamp with seconds precision. If no such field is found, or if parsing fails, then the current time will be used as the timestamp.

Regex Time Extractor

The RegexTimeExtractor is used to extract a timestamp from the log message using a regular expression, and then parse it using the provided format string. This type of extractor can be useful for parsing timestamps that are not in JSON format, or for which the field name is not known in advance.

It has two parameters to configure:

regex: A regular expression pattern used to match and extract the timestamp from the log message. The timestamp must be captured by a named group called timestamp.
format: A format string used to parse the extracted timestamp.

Regex Field Time Extractor

The RegexFieldTimeExtractor is used to extract a timestamp from a JSON field in the log message using a regular expression, and then parse it using the provided format string. This type of extractor can be useful for parsing timestamps that have additional characters or formatting around them, such as quotes or brackets.

It has three parameters to configure:

field: The name of the JSON field containing the timestamp.
regex: A regular expression pattern used to match and extract the timestamp from the field value. The timestamp must be captured by a named group called timestamp.
format: A format string used to parse the extracted timestamp.
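
The behaviour of the three extractors, including the fallback to the current time, can be sketched in Python. This is an illustration only, under several assumptions: the function names are hypothetical, format strings other than %s are handled by Python's strptime (whose syntax may differ from Crystalline's), timestamps without a timezone are treated as UTC, and Python writes the named group as (?P<timestamp>...).

```python
import re
import time
from datetime import datetime, timezone


def parse_timestamp(text, fmt):
    """Parse text with a strftime-style format; "%s" means Unix seconds."""
    if fmt == "%s":
        return float(text)
    parsed = datetime.strptime(text, fmt)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return parsed.timestamp()


def field_time(event, field="timestamp", fmt="%s", now=None):
    """Like FieldTimeExtractor: read a JSON field, else use the current time."""
    fallback = time.time() if now is None else now
    try:
        return parse_timestamp(str(event[field]), fmt)
    except (KeyError, ValueError):
        return fallback


def regex_time(raw, regex, fmt, now=None):
    """Like RegexTimeExtractor: capture the group named timestamp, then parse it."""
    fallback = time.time() if now is None else now
    found = re.search(regex, raw)
    if found is None:
        return fallback
    try:
        return parse_timestamp(found.group("timestamp"), fmt)
    except (IndexError, ValueError):
        return fallback


def regex_field_time(event, field, regex, fmt, now=None):
    """Like RegexFieldTimeExtractor: apply the regex to one JSON field."""
    if field not in event:
        return time.time() if now is None else now
    return regex_time(str(event[field]), regex, fmt, now=now)


print(field_time({"timestamp": "1700000000"}))
print(regex_time("[2024-01-01 00:00:00] boot", r"\[(?P<timestamp>[^\]]+)\]", "%Y-%m-%d %H:%M:%S"))
```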

Command Types

There are 2 primary types of commands, and a few sub-types.

Source commands produce a stream of events on their own. They must be used as the first command in a search.
Stage commands take a stream of events from another command and transform it into a new stream of events. They cannot be used as the first command in a search.
- Aggregation: some stage commands aggregate multiple events into a single event. These commands may block the stream until they have enough data to produce an output event.

Source Commands

Source commands produce a stream of events on their own.

They must be used as the first command in a search.

select

The select command is used to scan raw events from indices based on keywords or terms. It will likely be the most common command you use and will be the first command in most searches.

It will automatically check if events are valid JSON and parse the first level of keys as fields. It will also run any per-index extraction logic to extract additional fields from the event.

Syntax

The select command may be followed by multiple index arguments structured like this:

systemd(foo bar)

The search terms are a space separated list of keywords or terms to match against the raw event data. An event must contain all of the search terms in order for it to be returned. For the example above, the systemd index will be searched and only events containing both foo and bar tokens will be returned.

The select command will automatically break the contents between () into appropriate tokens, so you can use spaces or not as you see fit. It will also automatically strip symbols from the search terms, with the exception that any nested opening parenthesis must be followed by a corresponding closing parenthesis.

An index argument that contains a quoted string will only match events containing that exact phrase, symbols included. For example:

systemd("foo:bar")

will only return events containing the exact text foo:bar, while

systemd(foo:bar)

will return any event containing both foo and bar, because the argument is broken into two separate tokens and the symbol is removed.
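
The difference between quoted and unquoted index arguments can be sketched in Python. This is a rough model, not Crystalline's tokenizer: it assumes that tokens are runs of letters and digits, that unquoted matching is case-insensitive, and that a quoted argument is a plain substring test. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split on anything that is not a letter or digit (assumed tokenizer)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def argument_matches(raw_event, argument):
    """Return True if the event satisfies one index argument's search terms."""
    if len(argument) >= 2 and argument[0] == argument[-1] == '"':
        # Quoted: the exact phrase must appear, symbols included.
        return argument[1:-1] in raw_event
    # Unquoted: symbols are stripped and every resulting token must be present.
    event_tokens = set(tokenize(raw_event))
    return all(term in event_tokens for term in tokenize(argument))


print(argument_matches("unit foo:bar started", '"foo:bar"'))
print(argument_matches("bar then foo", "foo:bar"))
print(argument_matches("bar then foo", '"foo:bar"'))
```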

Duplicate index arguments

If a select command contains multiple instances of the same index argument, each argument will be processed independently, resulting in duplicate events being returned. For example:

| select systemd(foo bar) systemd(foo bar)

will return all events containing foo and bar twice.

Because of this behaviour, be careful when using multiple index arguments for the same index, as events matched by more than one argument will be returned more than once.

Example

| select systemd(foo bar) syslog("complex_literal:with-other.stuff")

rawselect

The rawselect command behaves almost identically to the select command, but without performing any additional processing such as JSON detection or additional field extraction.

Stage Commands

Stage commands take a stream of events from another command and transform it into a new stream of events.

They cannot be used as the first command in a search.

match

The match command is used to limit results to those that meet certain criteria. There can be multiple criteria, and all must be met for a result to be included in the output.

For fields that have multiple values, if any of the values match the criteria then the result will be included in the output.

Syntax

The match command accepts one or more expressions in the following format:

field=<expression>

The format accepts either = or != to indicate whether a field should match (or not) an expression.

The expression can be any of the following:

A string, denoted by double quotes ("). This will match results where the field value is exactly equal to the provided string:
```
field="value"
```
A regular expression, denoted by forward slashes (/):
```
field=/regex/
```
The regular expression must be in the format used by the Rust regex crate.
An identifier, which can be used to compare the values of two fields. This is useful for comparing a field with a value from another field in the same result:
```
field1=field2
```
A glob expression, denoted as either an identifier or a string either appended or prepended with an asterisk (*). This will match results where the field value either begins or ends with the provided string. There must be exactly one * in the expression and it can only appear at the beginning or end of the expression:
```
field=*value
field="*value"
field=value*
field="value*"
```
A wildcard (*), which will match any value for that field. This is useful when you want to check if the field exists, but don't care about its value:
```
field=*
```
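
The expression types above, together with the rule that a multivalued field matches if any of its values matches, can be modelled in Python as follows. This is a sketch only: expressions are represented as hypothetical (kind, pattern) pairs rather than parsed from query text, field-to-field comparison is left out, and Python's regex syntax stands in for the Rust regex crate's.

```python
import re


def value_matches(value, expression):
    """Test one field value against a (kind, pattern) expression."""
    kind, pattern = expression
    if kind == "string":
        return value == pattern  # exact equality
    if kind == "regex":
        return re.search(pattern, value) is not None  # unanchored search
    if kind == "glob":
        if pattern == "*":
            return True  # bare wildcard: any value
        if pattern.startswith("*"):
            return value.endswith(pattern[1:])
        if pattern.endswith("*"):
            return value.startswith(pattern[:-1])
        raise ValueError("a glob needs exactly one * at the start or end")
    raise ValueError(f"unknown expression kind: {kind}")


def field_matches(event, field, expression):
    """A field matches if it exists and any of its values matches."""
    values = event.get(field)
    if values is None:
        return False
    if not isinstance(values, list):
        values = [values]
    return any(value_matches(str(value), expression) for value in values)


event = {"foo": ["bar", "baz"]}
print(field_matches(event, "foo", ("string", "bar")))
print(field_matches(event, "foo", ("glob", "*az")))
print(field_matches(event, "missing", ("glob", "*")))
```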

Combining expressions

Multiple expressions can be combined using common boolean operations via and, or, xor and not. If no operator is specified for multiple expressions then they will be combined with an implicit and operation.

Expressions also support grouping using parentheses () to specify the order or grouping of operations. Without parentheses, there is no guarantee of the order of evaluation.

Example

field foo is exactly equal to the string bar, or has the same value as field baz:

| match foo="bar" OR foo=baz

field foo starts with bar and ends with baz (note that and is implicit here):

| match foo="bar*" foo="*baz"

field foo contains the string bar or baz, or the field x exists:

| match (foo=/bar/ OR foo=/baz/) OR x=*

field foo does not contain the string bar:

| match NOT foo=/bar/

| match foo!=/bar/

filter

The filter command is very similar to the match command, however rather than only returning events that match an expression, it instead returns all events; but only retains values of fields that match the given expression.

For example, if you have an event with a field called foo with the value ["bar", "baz"], and run filter foo = "bar", then the resulting event will still contain the field foo, but its value will be reduced to just ["bar"].

Syntax

The syntax for the filter command is the same as that of the match command; see the Syntax section of the match command for more details.

Combining expressions

Like the match command, the boolean operations and, or, xor and not are still valid; however unlike the match command they are a no-op and will simply execute the nested expressions as filters.

Example

Only retain values of the field foo that match the regular expression /bar/:

| filter foo=/bar/

Only retain values of the field foo that do not match the regular expression /bar/:

| filter foo!=/bar/
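
The contrast between match and filter can be shown with a small Python model. It is a sketch under assumptions: events are plain dicts of field name to list of values, the predicate is an ordinary Python function, and a field whose values are all removed is kept as an empty list (the documentation does not say what happens in that case). The function names are hypothetical.

```python
def match_event(event, field, predicate):
    """match: keep the whole event only if some value of the field passes."""
    if any(predicate(value) for value in event.get(field, [])):
        return event
    return None


def filter_event(event, field, predicate):
    """filter: always keep the event, but retain only the values that pass."""
    kept = dict(event)
    if field in kept:
        kept[field] = [value for value in kept[field] if predicate(value)]
    return kept


event = {"foo": ["bar", "baz"], "host": ["web01"]}
is_bar = lambda value: value == "bar"

print(match_event(event, "foo", is_bar))   # the event is returned unchanged
print(filter_event(event, "foo", is_bar))  # foo is reduced to the matching value
```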

rename

The rename command is used to rename fields in events. It can be particularly useful for handling JSON data where the field names are not valid identifiers for search queries.

Syntax

The rename command takes a list of source and destination field names, separated by the to keyword:

| rename <source> to <destination> [<source> to <destination> ...]

<source> can be either an identifier or a quoted string to allow for otherwise invalid names. <destination> must be an identifier and cannot be quoted.

Examples

Rename the invalid field name foo:bar to foobar:

| rename "foo:bar" to foobar

fields

The fields command is used to specify the fields that should be present in each event.

Syntax

The fields command accepts a list of field names:

| fields foo bar baz

This command will remove all fields from an event except for foo, bar, and baz; if any of these fields are not present in the event, they will be added with a null value.

Example

Only retain the _raw and _time fields for all events:

| fields _raw _time

extract

The extract command is used to extract new fields from existing ones using named capture groups in a regular expression.

Syntax

The extract command accepts multiple arguments in the following format:

field=/(?<new_field>.+)/

With the example above, the extract command will run the regular expression /(?<new_field>.+)/ on each value of the field field. If a match is found, it will either create or append the match to the new_field field.

Example

For an example ssh login event where the message field contains the following:

Accepted password for user from 192.168.0.10 port 60782 ssh2

The following command will extract authentication method, the username and the source IP address into new fields:

| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/
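
The named groups in this expression can be checked outside Crystalline. The following Python sketch applies the same pattern to the example message; the only change is that Python writes named groups as (?P<name>...) rather than (?<name>...).

```python
import re

MESSAGE = "Accepted password for user from 192.168.0.10 port 60782 ssh2"

# Same pattern as the extract command above, using Python's named-group syntax.
PATTERN = re.compile(
    r"^\w+\s(?P<auth_method>\w+)\s\w+\s(?P<user>\w+)\s\w+\s(?P<remote>[^\s]+)"
)

found = PATTERN.search(MESSAGE)
fields = found.groupdict() if found else {}
print(fields)
# {'auth_method': 'password', 'user': 'user', 'remote': '192.168.0.10'}
```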

extractjson

The extractjson command is similar to the extract command, but it instead parses the contents of all specified fields as JSON and adds each identified key-value pair as a new field on the event.

Syntax

The extractjson command accepts a list of field names to attempt to extract JSON from. If any of these fields are present, they will be parsed as JSON and each key-value pair in the resulting object will be added as a new field on the event. The new fields will be prefixed with the name of the original field that was extracted, followed by a _ character.

| extractjson <field> [<field> ...]

Example

For an example event where the field foo is a JSON string with the following value:

{"bar": "baz"}

This command would add a new field called foo_bar to the event, with the value of "baz":

| extractjson foo
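
The prefixing rule can be illustrated with a short Python sketch. It is an approximation of the behaviour described above: events are modelled as flat dicts, fields that are missing or do not contain a JSON object are skipped, and nested values are stored as-is because the documentation does not say how they are handled. The function name `extract_json` is hypothetical.

```python
import json


def extract_json(event, *field_names):
    """Add <field>_<key> entries for each JSON object found in the named fields."""
    result = dict(event)
    for name in field_names:
        try:
            parsed = json.loads(event[name])
        except (KeyError, TypeError, ValueError):
            continue  # field missing or not valid JSON: leave the event unchanged
        if isinstance(parsed, dict):
            for key, value in parsed.items():
                result[f"{name}_{key}"] = value
    return result


print(extract_json({"foo": '{"bar": "baz"}'}, "foo"))
```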

eval

The eval command is very flexible, and allows for a wide range of operations to be performed on field values. It can be used to perform mathematical calculations, string manipulation, and more.

Syntax

The eval command accepts arguments with the following structure:

new_field=<expression>

There can be multiple arguments for a given eval command, separated by spaces. The expressions will be evaluated in left-to-right order and subsequent expressions may refer to fields created by previous expressions.

There are a wide range of functions that can be used within the expression. These include:

conditionals - Control flow based on conditions
encoding - Functions for encoding and decoding data
maths - Mathematical operations
text - String manipulation functions
multivalue - Functions for working with multivalued fields
cryptography - Cryptographic functions such as hashing operations
time - Functions for working with time and dates

As well as the subcommands above, there are also primitive expressions field and literal that can be used to refer to existing fields or literal values respectively.

Eval subcommands that accept arguments can be arbitrarily nested, allowing for complex expressions to be built up.

Example

Create a new field foo with the literal value of bar on all events:

| eval foo="bar"

Create a new field baz with the value of whatever the foo field contains on all events:

| eval baz=foo

As an example of using conditionals, take the following search, which returns ssh logins from multiple indices:

| select systemd(accepted ssh) syslog(accepted sshd)

The systemd index uses capitals for field names while the syslog index uses lowercase. We can use eval to create a consistent set of fields across both indices that we can then extract fields from:

| select systemd(accepted ssh) syslog(accepted sshd)
| eval message=if(MESSAGE=*, MESSAGE, message)
| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/

An example of using nested subcommands to create a new field host from a field fqdn containing the value: hostname.subdomain.domain.tld:

| eval host=mvindex(split(fqdn, "."), 0)

This command splits the fqdn field on each period character and then extracts the first element of that array (the hostname). The result is a new field called host with the value of hostname.

encoding

HexEncode

The hexencode subcommand is used to encode data into hexadecimal format:

| eval encoded=hexencode(<expression>)

HexDecode

The hexdecode subcommand is used to decode data in hexadecimal format. It returns a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

| eval decoded=hexdecode(<expression>)

Base64Encode

The b64encode subcommand is used to encode data into the base 64 format:

| eval encoded=b64encode(<expression>)

Base64Decode

The b64decode subcommand is used to decode data in the base 64 format. It returns a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

| eval decoded=b64decode(<expression>)

UrlEncode

The urlencode subcommand is used to URL-encode a string. This function converts all values that are not allowed in a URL into their %-delimited hexadecimal representation:

| eval encoded=urlencode(<expression>)

UrlDecode

The urldecode subcommand is used to decode URL-encoded data. It returns a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

| eval decoded=urldecode(<expression>)
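
The replacement behaviour of the decode subcommands can be reproduced with Python's standard library. This sketch assumes that the replaced "unprintable characters" are byte sequences that are not valid UTF-8, which is what Python's errors="replace" mode substitutes with U+FFFD ('�'). The function names mirror the subcommands but are local Python helpers, not Crystalline code.

```python
import base64
import urllib.parse


def hexdecode(text):
    """Decode hexadecimal text, replacing invalid UTF-8 with U+FFFD."""
    return bytes.fromhex(text).decode("utf-8", errors="replace")


def b64decode(text):
    """Decode base 64 text, replacing invalid UTF-8 with U+FFFD."""
    return base64.b64decode(text).decode("utf-8", errors="replace")


def urldecode(text):
    """Decode percent-encoded text, replacing invalid UTF-8 with U+FFFD."""
    return urllib.parse.unquote(text, encoding="utf-8", errors="replace")


print(hexdecode("666f6f"))    # valid UTF-8 decodes cleanly
print(hexdecode("666f6fff"))  # the trailing 0xff byte is not valid UTF-8
print(b64decode("Zm9v"))
print(urldecode("a%20b"))
```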

conditionals

if

The if subcommand has the following syntax:

if(<match expression>, <eval expression if true>, <eval expression if false>)

<match expression> is an expression in the same format as that of the match command, and <eval expression if true> and <eval expression if false> are expressions to be evaluated if the match expression evaluates to true or false respectively.

Example

Take the following search which returns ssh logins from multiple indices:

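| select systemd(accepted ssh) syslog(accepted sshd)

The systemd index uses capitals for field names while the syslog index uses lowercase, so if can be used to pick whichever of the two fields is present: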

| select systemd(accepted ssh) syslog(accepted sshd)
| eval message=if(MESSAGE=*, MESSAGE, message)
| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/

maths

E

The e subcommand returns 2.718281828459045, Euler's number.

| eval e=e()

Pi

The pi subcommand returns the value of pi (3.141592653589793).

| eval pi=pi()

Sin

The sin subcommand calculates the sine of a number in radians.

| eval sin=sin(<expression>)

Cos

The cos subcommand calculates the cosine of a number in radians.

| eval cos=cos(<expression>)

Tan

The tan subcommand calculates the tangent of a number in radians.

| eval tan=tan(<expression>)

SinH

The sinh subcommand calculates the hyperbolic sine of a number in radians.

| eval sinh=sinh(<expression>)

CosH

The cosh subcommand calculates the hyperbolic cosine of a number in radians.

| eval cosh=cosh(<expression>)

TanH

The tanh subcommand calculates the hyperbolic tangent of a number in radians.

| eval tanh=tanh(<expression>)

ASin

The asin subcommand calculates the arcsine of a number in radians.

| eval asin=asin(<expression>)

ACos

The acos subcommand calculates the arccosine of a number in radians.

| eval acos=acos(<expression>)

ATan

The atan subcommand calculates the arctangent of a number in radians.

| eval atan=atan(<expression>)

ASinH

The asinh subcommand calculates the inverse hyperbolic sine of a number in radians.

| eval asinh=asinh(<expression>)

ACosH

The acosh subcommand calculates the inverse hyperbolic cosine of a number in radians.

| eval acosh=acosh(<expression>)

ATanH

The atanh subcommand calculates the inverse hyperbolic tangent of a number in radians.

| eval atanh=atanh(<expression>)

Exp

The exp subcommand calculates the base e exponential of a number. This is equivalent to e^x.

| eval exp=exp(<expression>)

Ln

The ln subcommand calculates the natural logarithm (base e) of a number.

| eval ln=ln(<expression>)

Sqrt

The sqrt subcommand calculates the square root of a number.

| eval sqrt=sqrt(<expression>)

Abs

The abs subcommand calculates the absolute value of a number.

| eval abs=abs(<expression>)

Ceil

The ceil subcommand calculates the ceiling (next highest integer) of a number.

| eval ceil=ceil(<expression>)

Floor

The floor subcommand calculates the floor (next lowest integer) of a number.

| eval floor=floor(<expression>)

Log

The log subcommand calculates the logarithm of a number in the provided base.

| eval result=log(<expression>, <base>)

Pow

The pow subcommand raises a number to the power of another number.

| eval result=pow(<expression>, <exponent>)

Nrt

The nrt subcommand calculates the nth root of a number.

| eval result=nrt(<expression>, <degree>)
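
The argument order of the two-argument subcommands above can be illustrated with Python's math module. This is a sketch of the arithmetic only; the helper names `log_base` and `nth_root` are local stand-ins, and Crystalline's handling of edge cases such as negative inputs is not covered.

```python
import math


def log_base(value, base):
    """Logarithm of value in the given base, like log(<expression>, <base>)."""
    return math.log(value, base)


def nth_root(value, degree):
    """The degree-th root of value, like nrt(<expression>, <degree>)."""
    return value ** (1.0 / degree)


print(log_base(8, 2))    # approximately 3
print(math.pow(2, 10))   # 1024.0, like pow(2, 10)
print(nth_root(27, 3))   # approximately 3
```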

Add

The add subcommand adds 2 numbers together.

| eval result=add(<expression>, <expression>)

Sub

The sub subcommand subtracts one number from another.

| eval result=sub(<expression>, <expression>)

Mul

The mul subcommand multiplies two numbers.

| eval result=mul(<expression>, <expression>)

Div

The div subcommand divides one number by another.

| eval result=div(<expression>, <expression>)

text

Len

The len subcommand is used to get the length of a string in bytes. This means that for UTF-8 encoded strings containing non-ASCII characters, the result may be larger than the number of characters.

| eval len=len(<expression>)
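
The difference between byte length and character count can be seen in a short Python example. Python's own len counts characters, so the byte length is taken from the UTF-8 encoding; the word used here is arbitrary.

```python
text = "na\u00efve"  # "naïve": the ï is one character but two bytes in UTF-8

byte_length = len(text.encode("utf-8"))  # what a byte-based len reports
char_length = len(text)                  # number of characters

print(byte_length, char_length)  # 6 5
```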

Lower

The lower subcommand converts all uppercase letters in a string to lowercase.

| eval lower=lower(<expression>)

Upper

The upper subcommand converts all lowercase letters in a string to uppercase.

| eval upper=upper(<expression>)

Trim

The trim subcommand removes leading and trailing whitespace from a string.

| eval trimmed=trim(<expression>)

Concatenate

The concat subcommand concatenates two or more strings into one. You can optionally specify a delimiter to be inserted between the strings with the sep="val" argument.

| eval abc=concat("a", "b", "c")
| eval a_b=concat("a", "b", sep="_")

LStrip

The lstrip subcommand removes all characters matching a pattern from the left side of a string until it encounters a character not in the pattern.

NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

| eval stripped=lstrip(<value expression>, <pattern expression>)

Example with the field foo containing the value abcFoocba:

| eval stripped=lstrip(foo, "abc")

The result will be Foocba.

RStrip

The rstrip subcommand removes all characters matching a pattern from the right side of a string until it encounters a character not in the pattern.

NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

| eval stripped=rstrip(<value expression>, <pattern expression>)

Example with the field foo containing the value abcFoocba:

| eval stripped=rstrip(foo, "abc")

The result will be abcFoo.

Strip

The strip subcommand removes all characters matching a pattern from both sides of a string until it encounters a character not in the pattern.

NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

| eval stripped=strip(<value expression>, <pattern expression>)

Example with the field foo containing the value abcFoocba:

| eval stripped=strip(foo, "abc")

The result will be Foo.
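
Python's built-in string methods treat their argument as a set of characters in the same way, so the three examples above can be reproduced directly. This shows the described semantics only and is not Crystalline code.

```python
value = "abcFoocba"

# The argument is a set of characters, so "abc", "cba" and "bca" behave the same.
left = value.lstrip("abc")
right = value.rstrip("cba")
both = value.strip("bca")

print(left, right, both)  # Foocba abcFoo Foo
```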

Split

The split subcommand will split a string into an array of substrings based on a delimiter.

| eval split=split(<value expression>, <delimiter expression>)

For example splitting up components of an FQDN:

| eval split=split("www.google.com", ".")

This will return a multivalue field with the following values ["www","google","com"].

SubStr

The substr subcommand returns a substring of a string based on a start and end index.

| eval sub=substr(<value expression>, <start index>, <end index>)

For example extracting foo from foobar:

| eval sub=substr("foobar", 0, 3)

Replace

The replace subcommand performs a find and replace on a string.

| eval edited=replace(<value expression>, <find expression>, <replace expression>)

For example replacing foo with bar in the value foobar resulting in barbar:

| eval edited=replace("foobar", "foo", "bar")

multivalue

Min

The mvmin subcommand returns the smallest value of a multivalued field. For a field foo with values [1, 2, 3] this example will set min to 1.

| eval min=mvmin(foo)

Max

The mvmax subcommand returns the largest value of a multivalued field. For a field foo with values [1, 2, 3] this example will set max to 3.

| eval max=mvmax(foo)

Dedup

The mvdedup subcommand returns the contents of a multivalued field with duplicates removed. For a field foo with values [1, 1, 3] this example will set unique to [1, 3].

| eval unique=mvdedup(foo)

Sort

The mvsort subcommand returns the contents of a multivalued field sorted in ascending order. For a field foo with values [3, 1, 2] this example will set sorted to [1, 2, 3].

| eval sorted=mvsort(foo)

Reverse

The mvrev subcommand returns the contents of a multivalued field in reverse order. For a field foo with values [1, 2, 3] this example will set reversed to [3, 2, 1].

| eval reversed=mvrev(foo)

Count

The mvcount subcommand returns the number of values in a multivalued field. For a field foo with values [1, 2, 3] it returns 3.

Join

The mvjoin subcommand returns a multivalue field containing the values of the first expression followed by the values of the second expression.

With field1 containing ["a","b"] and field2 containing ["c","d"], this example command will create a field merged that contains ["a","b","c","d"].

| eval merged = mvjoin(field1, field2)

Index

The mvindex subcommand returns the value at the specified index of a multivalued field.

With field1 containing ["a","b"], this example command will create a field first_value that contains "a".

| eval first_value = mvindex(field1, 0)

Range

The mvrange subcommand returns the values of a multivalued field within a start and end index range.

With field1 containing ["a","b","c"], this example command will create a field subset that contains ["b","c"].

| eval subset = mvrange(field1, 1, 2)
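
The multivalue subcommands above can be modelled on ordinary Python lists, which may help when reasoning about what a query will return. This is a sketch only: the function names simply reuse the subcommand names, mvdedup is assumed to keep the first occurrence of each value in its original position, and mvrange is assumed to treat the end index as inclusive because that is what the documented example implies.

```python
# List-based model of the multivalue subcommands. Not Crystalline's code.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mvdedup(values: Sequence[T]) -> List[T]:
    # dict.fromkeys keeps the first occurrence of each value, in order
    # (order preservation is an assumption about Crystalline).
    return list(dict.fromkeys(values))


def mvjoin(first: Sequence[T], second: Sequence[T]) -> List[T]:
    # Values of the second field follow the values of the first.
    return list(first) + list(second)


def mvindex(values: Sequence[T], index: int) -> T:
    # Zero-based lookup, matching mvindex(field1, 0) returning "a".
    return values[index]


def mvrange(values: Sequence[T], start: int, end: int) -> List[T]:
    # End index treated as inclusive to match the documented example.
    return list(values[start:end + 1])


foo = [3, 1, 2]
smallest, largest = min(foo), max(foo)  # mvmin, mvmax
ordered = sorted(foo)                   # mvsort
reversed_values = list(reversed(foo))   # mvrev
count = len(foo)                        # mvcount

print(mvdedup([1, 1, 3]), mvjoin(["a", "b"], ["c", "d"]))
print(mvindex(["a", "b"], 0), mvrange(["a", "b", "c"], 1, 2))
```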

cryptography

Md5

The md5 subcommand is used to calculate the md5 hash of the nested expression:

| eval hash=md5(<expression>)

Sha1

The sha1 subcommand is used to calculate the sha1 hash of the nested expression:

| eval hash=sha1(<expression>)

Sha224

The sha224 subcommand is used to calculate the sha224 hash of the nested expression:

| eval hash=sha224(<expression>)

Sha256

The sha256 subcommand is used to calculate the sha256 hash of the nested expression:

| eval hash=sha256(<expression>)

Sha384

The sha384 subcommand is used to calculate the sha384 hash of the nested expression:

| eval hash=sha384(<expression>)

Sha512

The sha512 subcommand is used to calculate the sha512 hash of the nested expression:

| eval hash=sha512(<expression>)

Sha3_224

The sha3_224 subcommand is used to calculate the sha3_224 hash of the nested expression:

| eval hash=sha3_224(<expression>)

Sha3_256

The sha3_256 subcommand is used to calculate the sha3_256 hash of the nested expression:

| eval hash=sha3_256(<expression>)

Sha3_384

The sha3_384 subcommand is used to calculate the sha3_384 hash of the nested expression:

| eval hash=sha3_384(<expression>)

Sha3_512

The sha3_512 subcommand is used to calculate the sha3_512 hash of the nested expression:

| eval hash=sha3_512(<expression>)
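
To cross-check a hash produced by one of these subcommands outside Crystalline, the same algorithms are available in Python's hashlib module. The sketch below assumes the input is hashed as UTF-8 bytes and that the result is a lowercase hexadecimal string; neither detail is stated in the documentation, and the digest helper is hypothetical.

```python
# Compute the same family of digests with Python's standard library.
import hashlib

# Algorithm names taken from the subcommand list above; each one is also
# a name accepted by hashlib.new.
ALGORITHMS = [
    "md5", "sha1", "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]


def digest(algorithm: str, value: str) -> str:
    """Hash the UTF-8 bytes of value and return a hex string (assumed format)."""
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


# Length in hex characters of each digest, for a quick sanity check.
digest_lengths = {name: len(digest(name, "foobar")) for name in ALGORITHMS}

for name in ALGORITHMS:
    print(name, digest(name, "foobar"))
```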

time

Now

The now subcommand returns the current date and time:

| eval now=now()

StrfTime

The strftime subcommand formats a timestamp into a string. The format variables that can be used are listed here.

Example to get the current year:

| eval year=strftime(now(), "%Y")

StrpTime

The strptime subcommand attempts to parse a string into a timestamp. The format variables that can be used are listed here.

The parsing format must include timezone information.

| eval 
    time=strptime("1970-01-01T00:00:00Z", "%+")
    time2=strptime("1970-01-01T00:00:00+0000", "%Y-%m-%dT%H:%M:%S%z")
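
The requirement for timezone information can be illustrated with Python's datetime module, where parsing with %z produces a timestamp that is unambiguous. This is an analogy rather than a description of Crystalline's parser: Python's format directives are not identical to the ones Crystalline accepts (for instance, Python's strptime has no %+ directive), so only the second format from the example above is used here.

```python
# Parse and format timestamps with explicit timezone information.
from datetime import datetime, timezone

# Same input and format string as the time2 example above.
parsed = datetime.strptime("1970-01-01T00:00:00+0000", "%Y-%m-%dT%H:%M:%S%z")

# Because the offset was parsed, the result maps to a single instant.
epoch_seconds = parsed.timestamp()

# Rough counterpart of strftime(now(), "%Y").
year = datetime.now(timezone.utc).strftime("%Y")

print(parsed.isoformat(), epoch_seconds, year)
```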

Aggregation Commands

Some stage commands aggregate multiple events into a single event.

These commands may block the stream until they have enough data to produce an output event.

stats

The stats command is used to calculate statistics over all events in a given search.

Syntax

The stats command is structured as follows:

A list of aggregation functions. Each function may take a field name as an argument, and may be given an alias with the as keyword to set the name of the resulting field.
- There must always be at least one aggregation function.
An optional by identifier, followed by at least one field name. Each unique combination of values of these fields results in a new aggregation group containing a copy of all specified aggregation functions.

| stats <aggregation>(<field>) as <output-name> by <field1>, <fieldN>

Aggregation Functions

List of available aggregation functions:

Takes a field name argument

sum - Sums the numeric values in the given field.
avg - Calculates the average numeric value of the given field.
min - Finds the smallest value in the given field.
max - Finds the largest value in the given field.
unique - Counts the number of unique values in the given field.
values - Returns a list of all unique values in the given field.

No field name argument

count - Increments a counter for each aggregation group.

Example

Count how many events are associated with each systemd unit:

| select systemd
| stats count() by SYSTEMD_UNIT

This will output rows with two columns, SYSTEMD_UNIT and count, containing the name of a systemd unit and the number of events associated with it respectively.

Get the number of events and the number of unique systemd units for each host:

| select systemd
| stats count() as event_count, unique(SYSTEMD_UNIT) as unique_units by HOSTNAME

This will output rows with three columns: HOSTNAME, event_count, and unique_units.
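
The grouping behaviour of the second example can be sketched in Python: one group is created per distinct HOSTNAME value, and each group keeps its own counter and its own set of seen units. The sample events and the stats_count_unique helper are hypothetical and exist only for illustration; the sketch also assumes every event carries both fields, since the documentation does not say how missing fields are handled.

```python
# Model of: stats count() as event_count, unique(SYSTEMD_UNIT) as unique_units by HOSTNAME
from collections import defaultdict

# Hypothetical sample events.
events = [
    {"HOSTNAME": "web1", "SYSTEMD_UNIT": "nginx.service"},
    {"HOSTNAME": "web1", "SYSTEMD_UNIT": "nginx.service"},
    {"HOSTNAME": "web1", "SYSTEMD_UNIT": "sshd.service"},
    {"HOSTNAME": "db1", "SYSTEMD_UNIT": "postgresql.service"},
]


def stats_count_unique(events, by, unique_field):
    """Group events by one field, counting events and distinct values per group."""
    # Each new group starts with its own copy of the aggregation state.
    groups = defaultdict(lambda: {"event_count": 0, "seen": set()})
    for event in events:
        group = groups[event[by]]
        group["event_count"] += 1
        group["seen"].add(event[unique_field])
    # Emit one output row per group.
    return [
        {by: key, "event_count": g["event_count"], "unique_units": len(g["seen"])}
        for key, g in groups.items()
    ]


rows = stats_count_unique(events, "HOSTNAME", "SYSTEMD_UNIT")
for row in rows:
    print(row)
```

All input events have to be seen before the counts are final, which is consistent with the note that aggregation commands may block the stream.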