Docker
Crystalline is primarily designed to be deployed to a container runtime such as Docker or Podman.
The container is built with 3 mountpoints to be configured. These are:
- `/data` - This is where Crystalline will store all log events and where the index files are stored.
- `/config` - This is where Crystalline will look for and save its configuration database.
- `/cache` - This is where Crystalline will store temporary data such as search results. This is usually fine to be a tmpfs mountpoint, but if individual searches produce a very large number of results then you may want to consider using a persistent storage device for this directory.
Example compose file
```yaml
version: "3"
services:
  crystalline:
    image: codeberg.org/kryesh/crystalline:latest
    hostname: crystalline
    restart: unless-stopped
    volumes:
      - crystalline_data:/data
      - crystalline_config:/config
      - type: tmpfs
        target: /cache
    ports:
      - 8080:8080
    environment:
      - CRYSTALLINE_BATCH_SIZE=500
      - CRYSTALLINE_PARALLEL_BUCKET_READERS=4
      - CRYSTALLINE_PARALLEL_BUCKET_WRITERS=4
      - CRYSTALLINE_SCALE_FACTOR=4
      - CRYSTALLINE_WRITER_WORKER_MEM=16
      - CRYSTALLINE_WRITER_WORKER_THREADS=4
      - CRYSTALLINE_IDLE_WRITER_TIMEOUT=5
      - CRYSTALLINE_COMMIT_INTERVAL=5
      - CRYSTALLINE_SEARCH_JOB_TTL=7200
volumes:
  crystalline_data:
  crystalline_config:
```
Bootstrap
Crystalline can optionally be provided with a bootstrap config file. This is a JSON file that contains the configuration for your application. It allows you to define things like:
- Indices
- Inputs
If Crystalline starts and finds a database that has not been initialised, it will check the `CRYSTALLINE_BOOTSTRAP_CONFIG` environment variable for a path to a bootstrap config file. If this is set, Crystalline will use it to initialise the database before starting up.
Example Bootstrap Config File
This example shows how you can define an index and input in your bootstrap config file (note that the uuid in the input's `index` field must match the `id` of an index):
```json
{
  "indices": [
    {
      "id": "65e636bb-dfa5-48b1-a827-162ab1b2f816",
      "name": "syslog",
      "storage_type": "Directory",
      "retention": {
        "Simple": {
          "buckets": 30
        }
      }
    }
  ],
  "inputs": [
    {
      "id": "7475e32f-ca86-4324-8945-4fd23401d9ec",
      "label": "syslog",
      "index": "65e636bb-dfa5-48b1-a827-162ab1b2f816",
      "token_required": false
    }
  ]
}
```
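To apply a file like this with the compose deployment above, one approach is to mount it into the container and point `CRYSTALLINE_BOOTSTRAP_CONFIG` at it. A minimal sketch, assuming a hypothetical `./bootstrap.json` host file and reusing the `/config` mountpoint (the exact path is illustrative, not required):

```yaml
services:
  crystalline:
    # ...image, ports, and other settings as in the compose example above...
    volumes:
      - crystalline_data:/data
      - crystalline_config:/config
      # hypothetical host file, mounted read-only into the container
      - ./bootstrap.json:/config/bootstrap.json:ro
    environment:
      # read on first start, when the configuration database is uninitialised
      - CRYSTALLINE_BOOTSTRAP_CONFIG=/config/bootstrap.json
```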
Tiered Storage
Crystalline divides indices into 3 tiers:
- Hot tier: Index data that is frequently accessed
- Warm tier: Index data that is less frequently accessed but still needs to be available for search
- Cold tier: Index data that is rarely accessed
Crystalline will use directories under the `/data` mountpoint to store index data. The default configuration uses the following directory structure:
```
/data
├── hot
│   ├── index_a
│   └── index_b
├── warm
│   ├── index_a
│   └── index_b
└── cold
    ├── index_a
    └── index_b
```
If you want to use different storage for each of these tiers, then instead of attaching storage to `/data`, you can attach it to the appropriate `hot`, `warm`, or `cold` directory.
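For example, a compose volumes stanza along these lines, replacing the single `/data` volume in the compose example (a sketch; the volume names are illustrative), would keep the hot tier on fast local storage while placing the cold tier on a larger, slower device:

```yaml
    volumes:
      - crystalline_config:/config
      # fast storage for the current and recent buckets
      - fast_nvme:/data/hot
      - warm_disk:/data/warm
      # large, slow storage for rarely accessed buckets
      - cold_archive:/data/cold
```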
Ingesting Events
Events are ingested into Crystalline via `input` configurations. Once configured, events can be ingested via the following API endpoints:
- `/ingest/<input_uuid>/json` - Accepts newline delimited JSON objects
- `/ingest/<input_uuid>/raw` - Accepts newline delimited raw string events
- `/ingest/<input_uuid>/multiline` - Accepts all lines of received text as a single event
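As a quick way to test an input, newline-delimited JSON can be posted with curl. A minimal sketch, assuming the hostname and port from the compose example and the `/api` path prefix used in the Fluent Bit examples below; the payload and the `<input_uuid>`/`<input_token>` placeholders are illustrative:

```sh
# Two NDJSON events; omit the token header if the input has token_required=false
printf '%s\n' \
  '{"timestamp":1700000000,"message":"hello"}' \
  '{"timestamp":1700000001,"message":"world"}' |
curl --data-binary @- \
  -H 'X-Crystalline-Token: <input_token>' \
  http://crystalline:8080/api/ingest/<input_uuid>/json
```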
Log sources
Fluent Bit
Logs can be sent to Crystalline via the Fluent Bit `http` output plugin.
Unauthenticated
Example Fluent Bit output stanza for an input that doesn't require a token, using the Crystalline default timestamp configuration:
```
[OUTPUT]
    Name                 http
    Match                *
    URI                  /api/ingest/<input_uuid>/json
    Host                 crystalline
    Port                 8080
    tls                  Off
    compress             gzip
    Format               json_lines
    Json_date_key        timestamp
    Json_date_format     epoch
    log_response_payload false
```
Using an input token
Example Fluent Bit output stanza for an input that requires a token, using the Crystalline default timestamp configuration:
```
[OUTPUT]
    Name                 http
    Match                *
    URI                  /api/ingest/<input_uuid>/json
    Header               X-Crystalline-Token <input_token>
    Host                 crystalline
    Port                 8080
    tls                  Off
    compress             gzip
    Format               json_lines
    Json_date_key        timestamp
    Json_date_format     epoch
    log_response_payload false
```
Parameters
Runtime parameters for Crystalline are defined via environment variables. The following table lists the available parameters and their default values:
Parameter | Default | Description |
---|---|---|
`CRYSTALLINE_DB_DIR` | `/config` (container), `/etc/crystalline` (host) | Directory containing the configuration database |
`CRYSTALLINE_DATA_DIR` | `/data` (container), `/var/lib/crystalline/data` (host) | Directory containing stored data and index contents |
`CRYSTALLINE_CACHE_DIR` | `/cache` (container), `/var/lib/crystalline/cache` (host) | Directory containing the cache files |
`CRYSTALLINE_HTTP_IP` | 0.0.0.0 | IP address to bind the HTTP server to |
`CRYSTALLINE_HTTP_PORT` | 8080 | Port number to bind the HTTP server to |
`CRYSTALLINE_BATCH_SIZE` | 500 | Number of documents to process in a single batch |
`CRYSTALLINE_SCALE_FACTOR` | Number of available CPUs | Scaling factor for parallel processing |
`CRYSTALLINE_PARALLEL_BUCKET_READERS` | 2 | Number of buckets that can be online for reading at the same time |
`CRYSTALLINE_PARALLEL_BUCKET_WRITERS` | 2 | Number of buckets that can be online for writing at the same time |
`CRYSTALLINE_WRITER_WORKER_MEM` | 16 | The amount of memory (in MB) to use per bucket writer worker thread |
`CRYSTALLINE_WRITER_WORKER_THREADS` | 2 | Number of threads a currently writing bucket can use for parallel processing |
`CRYSTALLINE_IDLE_READER_TIMEOUT` | 10 | (Not currently used) Timeout in seconds after which an idle bucket reader is closed |
`CRYSTALLINE_IDLE_WRITER_TIMEOUT` | 5 | Timeout in seconds after which an idle bucket writer is closed |
`CRYSTALLINE_COMMIT_INTERVAL` | 5 | Time in seconds before forcing bucket writers to commit their changes to disk |
`CRYSTALLINE_SEARCH_JOB_TTL` | 7200 (2 hours) | Time in seconds after which a search job is considered expired and will have its results deleted from the cache |
`CRYSTALLINE_BOOTSTRAP_CONFIG` | Not set | Optionally set this to the path of a bootstrap JSON file to initialise the application configuration |
Indices
Index configuration parameters:
Parameter | Description |
---|---|
`id` | A uuid that uniquely identifies the index. |
`name` | A human-readable name for the index. |
`storage_type` | The type of storage to use for this index. Currently `Directory` is the only supported value. |
`retention` | The retention policy for this index. See Retention Policies below for more details. |
Retention Policies
Retention policies define how long data is persisted in an index, and which storage tiers are used to store that data.
A `bucket` refers to a 24-hour period of time in UTC, so the numbers configured in a retention policy map to days of retention.
The bucket for the current 24-hour period is always hot and cannot be deleted or moved to the warm or cold tiers. This means that retention policies refer to the number of previous days to keep, not the total number of days to keep.
Simple
The `Simple` retention policy contains a single parameter, `buckets`, which defines the number of buckets that will be kept. All buckets will be kept on the hot storage tier.
A value of `0` means that all buckets are kept in hot storage indefinitely.
HotWarm
The `HotWarm` retention policy contains two parameters:
- `primary` - The number of buckets to keep on the hot storage tier.
- `secondary` - The number of buckets to keep on the warm storage tier.

A value of `0` for the `primary` parameter means that buckets are immediately moved to the warm storage tier when they are no longer the current bucket.
A value of `0` for the `secondary` parameter means that all buckets are kept indefinitely on the warm storage tier.
HotWarmCold
The `HotWarmCold` retention policy contains three parameters:
- `primary` - The number of buckets to keep on the hot storage tier.
- `secondary` - The number of buckets to keep on the warm storage tier.
- `tertiary` - The number of buckets to keep on the cold storage tier.

A value of `0` for the `primary` parameter means that buckets are immediately moved to the warm storage tier when they are no longer the current bucket.
A value of `0` for the `secondary` parameter means that buckets are immediately moved to the cold storage tier.
A value of `0` for the `tertiary` parameter means that all buckets are kept indefinitely on the cold storage tier.
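Following the shape of the `Simple` retention block in the bootstrap example above, a `HotWarmCold` policy would plausibly be written as follows (a sketch; the exact JSON shape is inferred from the `Simple` example and the numbers are illustrative). Here buckets stay hot for 7 days, move to warm for 30 days, and are then kept on cold indefinitely (`"tertiary": 0`):

```json
"retention": {
  "HotWarmCold": {
    "primary": 7,
    "secondary": 30,
    "tertiary": 0
  }
}
```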
Inputs
Input configuration parameters:
Parameter | Description | Default value |
---|---|---|
`id` | A uuid that uniquely identifies the input. | Not set |
`label` | A human-readable name for the input. | Not set |
`index` | The uuid of the index that this input will send events to. | Not set |
`token_required` | A boolean value indicating whether a token is required to access the input. If true, the input will only accept events with a valid token. | Not set |
`token` | A string containing the token that should be used for authentication when sending events to this input. This field is optional and can be left blank if `token_required` is false. | Not set |
`time_extractor` | The time extractor to use for this input. See Time Extractors for more information. | `{ "field": "timestamp", "format": "%s" }` |
Time Extractors
Time extractors are used to extract a timestamp from a given log message. The extracted timestamp is then used for sorting and retention purposes.
There are 3 types of time extractors:
- `FieldTimeExtractor` - extracts the timestamp from a JSON field in the log message using the provided format string.
- `RegexTimeExtractor` - extracts the timestamp from the log message using a regular expression, and then parses it using the provided format string.
- `RegexFieldTimeExtractor` - extracts the timestamp from a JSON field in the log message using a regular expression, and then parses it using the provided format string.
If an extractor fails to extract a timestamp from a log message, then it will be assigned the current time as its timestamp.
Field Time Extractor
The `FieldTimeExtractor` is used to extract a timestamp from a JSON field in the log message using the provided format string. If no time extractor is specified for an input, the default is to use this type with the following configuration:
- `field`: `timestamp`
- `format`: `%s`

This means that by default, the extractor will look for a JSON field called `timestamp` and attempt to parse it as a Unix timestamp with seconds precision. If no such field is found, or if parsing fails, then the current time will be used as the timestamp.
Regex Time Extractor
The `RegexTimeExtractor` is used to extract a timestamp from the log message using a regular expression, and then parse it using the provided format string. This type of extractor can be useful for parsing timestamps that are not in JSON format, or for which the field name is not known in advance.
It has two parameters to configure:
- `regex`: A regular expression pattern used to match and extract the timestamp from the log message. The timestamp must be captured by a named group called `timestamp`.
- `format`: A format string used to parse the extracted timestamp.
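As an illustration, a configuration for a classic syslog-style prefix might look like this (a sketch; the key names mirror the two parameters above, and the pattern and format are only an example, not taken from the source):

```json
{
  "regex": "(?<timestamp>\\w{3} +\\d+ \\d{2}:\\d{2}:\\d{2})",
  "format": "%b %d %H:%M:%S"
}
```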
Regex Field Time Extractor
The `RegexFieldTimeExtractor` is used to extract a timestamp from a JSON field in the log message using a regular expression, and then parse it using the provided format string. This type of extractor can be useful for parsing timestamps that have additional characters or formatting around them, such as quotes or brackets.
It has three parameters to configure:
- `field`: The name of the JSON field containing the timestamp.
- `regex`: A regular expression pattern used to match and extract the timestamp from the field value. The timestamp must be captured by a named group called `timestamp`.
- `format`: A format string used to parse the extracted timestamp.
Command Types
There are 2 primary types of commands, and a few sub-types.
- Source commands produce a stream of events on their own. They must be used as the first command in a search.
- Stage commands take a stream of events from another command and transform it into a new stream of events. They cannot be used as the first command in a search.
- Aggregation: some stage commands aggregate multiple events into a single event. These commands may block the stream until they have enough data to produce an output event.
Source Commands
Source commands produce a stream of events on their own.
They must be used as the first command in a search.
select
The `select` command is used to scan raw events from indices based on keywords or terms. It will likely be the most common command you use and will be the first command in most searches.
It will automatically check if events are valid JSON and parse the 1st level of keys as fields. It will also run any per-index extraction logic to extract additional fields from the event.
Syntax
The `select` command may be followed by multiple index arguments structured like this:

```
systemd(foo bar)
```
The search terms are a space-separated list of keywords or terms to match against the raw event data. An event must contain all of the search terms in order for it to be returned.
For the example above, the `systemd` index will be searched and only events containing both `foo` and `bar` tokens will be returned.
The `select` command will automatically break the contents between `()` into appropriate tokens, so you can use spaces or not as you see fit. It will also automatically strip symbols from the search terms, with the exception that any nested open parentheses must also be followed by corresponding closing parentheses.
An index argument that contains a quoted string will only match events containing that exact phrase, symbols included. For example:

```
systemd("foo:bar")
```

will only return events containing the exact text `foo:bar`, while

```
systemd(foo:bar)
```

will return any event containing both `foo` and `bar`, as it will be broken into two separate tokens and the symbol removed.
Duplicate index arguments
If a `select` command contains multiple instances of the same index argument, each argument will be processed independently, resulting in duplicate events being returned. For example:

```
| select systemd(foo bar) systemd(foo bar)
```

will return all events containing `foo` and `bar` twice.
Because of this behaviour, be careful when using multiple index arguments for the same index, as it may result in duplicated events being returned if events are matched by multiple keywords or terms.
Example
```
| select systemd(foo bar) syslog("complex_literal:with-other.stuff")
```
rawselect
The `rawselect` command behaves almost identically to the `select` command, but without performing any additional processing such as JSON detection or additional field extraction.
Stage Commands
Stage commands take a stream of events from another command and transform it into a new stream of events.
They cannot be used as the first command in a search.
match
The `match` command is used to limit results to those that meet certain criteria. There can be multiple criteria, and all must be met for a result to be included in the output.
For fields that have multiple values, if any of the values match the criteria then the result will be included in the output.
Syntax
The `match` command accepts one or more expressions in the following format:

```
field=<expression>
```

The format accepts either `=` or `!=` to indicate whether a field should match (or not match) an expression.
The expression can be any of the following:
- A string, denoted by double quotes (`"`). This will match results where the field value is exactly equal to the provided string: `field="value"`
- A regular expression, denoted by forward slashes (`/`): `field=/regex/`. The regular expression must be in the format used by the Rust `regex` crate (see here).
- An identifier, which can be used to compare the values of two fields. This is useful for comparing a field with a value from another field in the same result: `field1=field2`
- A glob expression, denoted as either an identifier or a string either appended or prepended with an asterisk (`*`). This will match results where the field value either begins or ends with the provided string. There must be exactly one `*` in the expression and it can only appear at the beginning or end of the expression: `field=*value`, `field="*value"`, `field=value*`, `field="value*"`
- A wildcard (`*`), which will match any value for that field. This is useful when you want to check if the field exists, but don't care about its value: `field=*`
Combining expressions
Multiple expressions can be combined using common boolean operations via `and`, `or`, `xor` and `not`. If no operator is specified for multiple expressions then they will be combined with an implicit `and` operation.
Expressions also support grouping using parentheses `()` to specify the order or grouping of operations; there is no guarantee of order of evaluation otherwise.
Example
Field `foo` contains either the string `bar`, or has the same value as field `baz`:

```
| match foo="bar" OR foo=baz
```

Field `foo` starts with `bar` and ends with `baz` (note that `and` is implicit here):

```
| match foo="bar*" foo="*baz"
```

Field `foo` contains the string `bar` or `baz`, or the field `x` exists:

```
| match (foo=/bar/ OR foo=/baz/) OR x=*
```

Field `foo` does not contain the string `bar`:

```
| match NOT foo=/bar/
| match foo!=/bar/
```
filter
The `filter` command is very similar to the `match` command; however, rather than only returning events that match an expression, it returns all events but only retains values of fields that match the given expression.
For example, if you have an event with a field called `foo` with the value `["bar", "baz"]`, and run `filter foo = "bar"`, then the resulting event will still contain the field `foo`, but its value will be reduced to just `["bar"]`.
Syntax
The syntax for the `filter` command is the same as that of the `match` command; see here for more details.
Combining expressions
Like the `match` command, the boolean operations `and`, `or`, `xor` and `not` are still valid; however, unlike the `match` command, they are a no-op and will simply execute the nested expressions as filters.
Example
Only retain values of the field `foo` that match the regular expression `/bar/`:

```
| filter foo=/bar/
```

Only retain values of the field `foo` that do not match the regular expression `/bar/`:

```
| filter foo!=/bar/
```
rename
The `rename` command is used to rename fields in events; it can be particularly useful for handling JSON data where the field names are not valid identifiers for search queries.
Syntax
The `rename` command takes a list of source and destination field names, separated by the `to` keyword:

```
| rename <source> to <destination> [<source> to <destination> ...]
```

`<source>` can be either an identifier or a quoted string to allow for otherwise invalid names. `<destination>` must be an identifier and cannot be quoted.
Examples
Rename the invalid field name `foo:bar` to `foobar`:

```
| rename "foo:bar" to foobar
```
fields
The `fields` command is used to specify the fields that should be present in each event.
Syntax
The `fields` command accepts a list of field names:

```
| fields foo bar baz
```

This command will remove all fields from an event except for `foo`, `bar`, and `baz`; if any of these fields are not present in the event, they will be added with a null value.
Example
Only retain the `_raw` and `_time` fields for all events:

```
| fields _raw _time
```
extract
The `extract` command is used to extract new fields from existing ones using named capture groups in a regular expression.
Syntax
The `extract` command accepts multiple arguments in the following format:

```
field=/(?<new_field>.+)/
```

With the example above, the `extract` command will run the regular expression `/(?<new_field>.+)/` on each value of the `field` field. If a match is found, it will either create or append the match to the `new_field` field.
Example
For an example ssh login event where the `message` field contains the following:

```
Accepted password for user from 192.168.0.10 port 60782 ssh2
```

The following command will extract the authentication method, the username and the source IP address into new fields:

```
| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/
```
extractjson
The `extractjson` command is similar to the `extract` command, but it instead parses the contents of all specified fields as JSON and adds each identified key-value pair as a new field on the event.
Syntax
The `extractjson` command accepts a list of field names to attempt to extract JSON from. If any of these fields are present, they will be parsed as JSON and each key-value pair in the resulting object will be added as a new field on the event. The new fields will be prefixed with the name of the original field that was extracted, followed by a `_` character.

```
| extractjson <field> [<field> ...]
```
Example
For an example event where the field `foo` is a JSON string with the following value:

```
{"bar": "baz"}
```

This command would add a new field called `foo_bar` to the event, with the value of `"baz"`:

```
| extractjson foo
```
eval
The `eval` command is very flexible, and allows for a wide range of operations to be performed on field values. It can be used to perform mathematical calculations, string manipulation, and more.
Syntax
The `eval` command accepts arguments with the following structure:

```
new_field=<expression>
```

There can be multiple arguments for a given eval command, separated by spaces. The expressions will be evaluated in left-to-right order, and subsequent expressions may refer to fields created by previous expressions.
There are a wide range of functions that can be used within the expression. These include:
- conditionals - Control flow based on conditions
- encoding - Functions for encoding and decoding data
- maths - Mathematical operations
- text - String manipulation functions
- multivalue - Functions for working with multivalued fields
- cryptography - Cryptographic functions such as hashing operations
- time - Functions for working with time and dates
As well as the subcommands above, there are also primitive expressions `field` and `literal` that can be used to refer to existing fields or literal values respectively.
Eval subcommands that accept arguments can be arbitrarily nested, allowing for complex expressions to be built up.
Example
Create a new field `foo` with the literal value of `bar` on all events:

```
| eval foo="bar"
```

Create a new field `baz` with the value of whatever the `foo` field contains on all events:

```
| eval baz=foo
```
For an example of using conditionals, take the following search which returns ssh logins from multiple indices:

```
| select systemd(accepted ssh) syslog(accepted sshd)
```

The `systemd` index uses capitals for field names while the `syslog` index uses lowercase. We can use eval to create a consistent set of fields across both indices that we can then extract fields from:

```
| select systemd(accepted ssh) syslog(accepted sshd)
| eval message=if(MESSAGE=*, MESSAGE, message)
| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/
```
An example of using nested subcommands to create a new field `host` from a field `fqdn` containing the value `hostname.subdomain.domain.tld`:

```
| eval host=mvindex(split(fqdn, "."), 0)
```

This command splits the `fqdn` field on each period character and then extracts the first element of that array (the hostname). The result is a new field called `host` with the value of `hostname`.
encoding
HexEncode
The `hexencode` subcommand is used to encode data into hexadecimal format:

```
| eval encoded=hexencode(<expression>)
```

HexDecode
The `hexdecode` subcommand is used to decode hex-encoded data. It will return a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

```
| eval decoded=hexdecode(<expression>)
```

Base64Encode
The `b64encode` subcommand is used to encode data into base64 format:

```
| eval encoded=b64encode(<expression>)
```

Base64Decode
The `b64decode` subcommand is used to decode base64-encoded data. It will return a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

```
| eval decoded=b64decode(<expression>)
```

UrlEncode
The `urlencode` subcommand is used to URL-encode a string. This function converts all characters that are not allowed in a URL into their %-delimited hexadecimal representation:

```
| eval encoded=urlencode(<expression>)
```

UrlDecode
The `urldecode` subcommand is used to decode URL-encoded data. It will return a UTF-8 string, so if binary data is decoded it will be converted and any unprintable characters will be replaced with '�':

```
| eval decoded=urldecode(<expression>)
```
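As a concrete illustration, assuming standard hex and base64 encodings as described above, a single eval can build three fields:

```
| eval hex=hexencode("hello") b64=b64encode("hello") round_trip=b64decode(b64encode("hello"))
```

`hex` would contain `68656c6c6f`, `b64` would contain `aGVsbG8=`, and `round_trip` would be `hello` again.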
conditionals
if
The `if` subcommand has the following syntax:

```
if(<match expression>, <eval expression if true>, <eval expression if false>)
```

`<match expression>` is an expression in the same format as that of the `match` command, and `<eval expression if true>` and `<eval expression if false>` are expressions to be evaluated if the match expression evaluates to true or false respectively.
Example
Take the following search which returns ssh logins from multiple indices:

```
| select systemd(accepted ssh) syslog(accepted sshd)
```

The `systemd` index uses capitals for field names while the `syslog` index uses lowercase. We can use eval to create a consistent set of fields across both indices that we can then extract fields from:

```
| select systemd(accepted ssh) syslog(accepted sshd)
| eval message=if(MESSAGE=*, MESSAGE, message)
| extract message=/^\w+\s(?<auth_method>\w+)\s\w+\s(?<user>\w+)\s\w+\s(?<remote>[^\s]+)/
```
maths
E
The `e` subcommand returns 2.718281828459045, Euler's number.

```
| eval e=e()
```

Pi
The `pi` subcommand returns the value of pi (3.141592653589793).

```
| eval pi=pi()
```

Sin
The `sin` subcommand calculates the sine of a number in radians.

```
| eval sin=sin(<expression>)
```

Cos
The `cos` subcommand calculates the cosine of a number in radians.

```
| eval cos=cos(<expression>)
```

Tan
The `tan` subcommand calculates the tangent of a number in radians.

```
| eval tan=tan(<expression>)
```

SinH
The `sinh` subcommand calculates the hyperbolic sine of a number in radians.

```
| eval sinh=sinh(<expression>)
```

CosH
The `cosh` subcommand calculates the hyperbolic cosine of a number in radians.

```
| eval cosh=cosh(<expression>)
```

TanH
The `tanh` subcommand calculates the hyperbolic tangent of a number in radians.

```
| eval tanh=tanh(<expression>)
```

ASin
The `asin` subcommand calculates the arcsine of a number in radians.

```
| eval asin=asin(<expression>)
```

ACos
The `acos` subcommand calculates the arccosine of a number in radians.

```
| eval acos=acos(<expression>)
```

ATan
The `atan` subcommand calculates the arctangent of a number in radians.

```
| eval atan=atan(<expression>)
```

ASinH
The `asinh` subcommand calculates the inverse hyperbolic sine of a number in radians.

```
| eval asinh=asinh(<expression>)
```

ACosH
The `acosh` subcommand calculates the inverse hyperbolic cosine of a number in radians.

```
| eval acosh=acosh(<expression>)
```

ATanH
The `atanh` subcommand calculates the inverse hyperbolic tangent of a number in radians.

```
| eval atanh=atanh(<expression>)
```

Exp
The `exp` subcommand calculates the base e exponential of a number. This is equivalent to `e^x`.

```
| eval exp=exp(<expression>)
```

Ln
The `ln` subcommand calculates the natural logarithm (base e) of a number.

```
| eval ln=ln(<expression>)
```

Sqrt
The `sqrt` subcommand calculates the square root of a number.

```
| eval sqrt=sqrt(<expression>)
```

Abs
The `abs` subcommand calculates the absolute value of a number.

```
| eval abs=abs(<expression>)
```

Ceil
The `ceil` subcommand calculates the ceiling (next highest integer) of a number.

```
| eval ceil=ceil(<expression>)
```

Floor
The `floor` subcommand calculates the floor (next lowest integer) of a number.

```
| eval floor=floor(<expression>)
```

Log
The `log` subcommand calculates the logarithm of a number in the provided base.

```
| eval result=log(<expression>, <base>)
```

Pow
The `pow` subcommand raises a number to the power of another number.

```
| eval result=pow(<expression>, <exponent>)
```

Nrt
The `nrt` subcommand calculates the nth root of a number.

```
| eval result=nrt(<expression>, <degree>)
```

Add
The `add` subcommand adds 2 numbers together.

```
| eval result=add(<expression>, <expression>)
```

Sub
The `sub` subcommand subtracts one number from another.

```
| eval result=sub(<expression>, <expression>)
```

Mul
The `mul` subcommand multiplies two numbers.

```
| eval result=mul(<expression>, <expression>)
```

Div
The `div` subcommand divides one number by another.

```
| eval result=div(<expression>, <expression>)
```
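Since eval subcommands can be nested, the maths functions compose naturally. A sketch assuming numeric fields `a` and `b` (illustrative names), computing the hypotenuse of a right triangle:

```
| eval hyp=sqrt(add(pow(a, 2), pow(b, 2)))
```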
text
Len
The `len` subcommand is used to get the length of a string in bytes. This means that for UTF-8 encoded strings containing non-ASCII characters, the result may not be what you expect.

```
| eval len=len(<expression>)
```

Lower
The `lower` subcommand converts all uppercase letters in a string to lowercase.

```
| eval lower=lower(<expression>)
```

Upper
The `upper` subcommand converts all lowercase letters in a string to uppercase.

```
| eval upper=upper(<expression>)
```

Trim
The `trim` subcommand removes leading and trailing whitespace from a string.

```
| eval trimmed=trim(<expression>)
```
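The text subcommands also nest. A sketch assuming a field `username` (an illustrative name), trimming whitespace and lowercasing in a single step:

```
| eval user_norm=lower(trim(username))
```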
Concatenate
The `concatenate` subcommand concatenates two or more strings into one. You can optionally specify a delimiter to be inserted between the strings with the `sep="val"` argument.

```
| eval abc=concat("a", "b", "c")
| eval a_b=concat("a", "b", sep="_")
```

LStrip
The `lstrip` subcommand removes all characters matching a pattern from the left side of a string until it encounters a character not in the pattern.
NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

```
| eval stripped=lstrip(<value expression>, <pattern expression>)
```

Example with the field `foo` containing the value `abcFoocba`:

```
| eval stripped=lstrip(foo, "abc")
```

The result will be `Foocba`.

RStrip
The `rstrip` subcommand removes all characters matching a pattern from the right side of a string until it encounters a character not in the pattern.
NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

```
| eval stripped=rstrip(<value expression>, <pattern expression>)
```

Example with the field `foo` containing the value `abcFoocba`:

```
| eval stripped=rstrip(foo, "abc")
```

The result will be `abcFoo`.

Strip
The `strip` subcommand removes all characters matching a pattern from either side of a string until it encounters a character not in the pattern.
NOTE: The order of the characters in the pattern does not matter, only that they are present in the string.

```
| eval stripped=strip(<value expression>, <pattern expression>)
```

Example with the field `foo` containing the value `abcFoocba`:

```
| eval stripped=strip(foo, "abc")
```

The result will be `Foo`.

Split
The `split` subcommand will split a string into an array of substrings based on a delimiter.

```
| eval split=split(<value expression>, <delimiter expression>)
```

For example, splitting up the components of an FQDN:

```
| eval split=split("www.google.com", ".")
```

This will return a multivalue field with the following values: `["www","google","com"]`.

SubStr
The `substr` subcommand returns a substring of a string based on a start and end index.

```
| eval sub=substr(<value expression>, <start index>, <end index>)
```

For example, extracting `foo` from `foobar`:

```
| eval sub=substr("foobar", 0, 3)
```

Replace
The `replace` subcommand performs a find and replace on a string.

```
| eval edited=replace(<value expression>, <find expression>, <replace expression>)
```

For example, replacing `foo` with `bar` in the value `foobar`, resulting in `barbar`:

```
| eval edited=replace("foobar", "foo", "bar")
```
multivalue
Min
The `mvmin` subcommand returns the smallest value of a multivalued field.
For a field `foo` with values `[1, 2, 3]`, this example will set `min` to `1`.

```
| eval min=mvmin(foo)
```

Max
The `mvmax` subcommand returns the largest value of a multivalued field.
For a field `foo` with values `[1, 2, 3]`, this example will set `max` to `3`.

```
| eval max=mvmax(foo)
```

Dedup
The `mvdedup` subcommand returns the contents of a multivalued field with duplicates removed.
For a field `foo` with values `[1, 1, 3]`, this example will set `unique` to `[1, 3]`.

```
| eval unique=mvdedup(foo)
```

Sort
The `mvsort` subcommand returns the contents of a multivalued field sorted in ascending order.
For a field `foo` with values `[3, 1, 2]`, this example will set `sorted` to `[1, 2, 3]`.

```
| eval sorted=mvsort(foo)
```

Reverse
The `mvrev` subcommand returns the contents of a multivalued field in reverse order.
For a field `foo` with values `[1, 2, 3]`, this example will set `reversed` to `[3, 2, 1]`.

```
| eval reversed=mvrev(foo)
```

Count
The `mvcount` subcommand returns the number of values in a multivalued field.
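Following the pattern of the other multivalue subcommands (the usage line here is an assumption), for a field `foo` with values `[1, 2, 3]` this example would set `count` to `3`:

```
| eval count=mvcount(foo)
```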
Join
The `mvjoin` subcommand returns a multivalue field with all the values of the second expression appended to the first expression.
With `field1` containing `["a","b"]` and `field2` containing `["c","d"]`, this example command will create a field `merged` that contains `["a","b","c","d"]`.

```
| eval merged = mvjoin(field1, field2)
```

Index
The `mvindex` subcommand returns the value at the specified index of a multivalued field.
With `field1` containing `["a","b"]`, this example command will create a field `first_value` that contains `"a"`.

```
| eval first_value = mvindex(field1, 0)
```

Range
The `mvrange` subcommand returns the values of a multivalued field within a start and end index range.
With `field1` containing `["a","b","c"]`, this example command will create a field `subset` that contains `["b","c"]`.

```
| eval subset = mvrange(field1, 1, 2)
```
cryptography
Md5
The `md5` subcommand is used to calculate the md5 hash of the nested expression:

```
| eval hash=md5(<expression>)
```

Sha1
The `sha1` subcommand is used to calculate the sha1 hash of the nested expression:

```
| eval hash=sha1(<expression>)
```

Sha224
The `sha224` subcommand is used to calculate the sha224 hash of the nested expression:

```
| eval hash=sha224(<expression>)
```

Sha256
The `sha256` subcommand is used to calculate the sha256 hash of the nested expression:

```
| eval hash=sha256(<expression>)
```

Sha384
The `sha384` subcommand is used to calculate the sha384 hash of the nested expression:

```
| eval hash=sha384(<expression>)
```

Sha512
The `sha512` subcommand is used to calculate the sha512 hash of the nested expression:

```
| eval hash=sha512(<expression>)
```

Sha3_224
The `sha3_224` subcommand is used to calculate the sha3_224 hash of the nested expression:

```
| eval hash=sha3_224(<expression>)
```

Sha3_256
The `sha3_256` subcommand is used to calculate the sha3_256 hash of the nested expression:

```
| eval hash=sha3_256(<expression>)
```

Sha3_384
The `sha3_384` subcommand is used to calculate the sha3_384 hash of the nested expression:

```
| eval hash=sha3_384(<expression>)
```

Sha3_512
The `sha3_512` subcommand is used to calculate the sha3_512 hash of the nested expression:

```
| eval hash=sha3_512(<expression>)
```
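For example (a sketch; `_raw` is the raw-event field used in the `fields` example above), hashing each raw event and a literal value in one eval:

```
| eval event_hash=sha256(_raw) literal_hash=md5("hello")
```

Assuming a standard MD5 implementation, `literal_hash` would be `5d41402abc4b2a76b9719d911017c592`.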
time
Now
The `now` subcommand returns the current date and time:

```
| eval now=now()
```

StrfTime
The `strftime` subcommand formats a timestamp into a string. The format variables that can be used are listed here.
Example to get the current year:

```
| eval year=strftime(now(), "%Y")
```

StrpTime
The `strptime` subcommand attempts to parse a string into a timestamp. The format variables that can be used are listed here.
The parsing format must include timezone information.

```
| eval
    time=strptime("1970-01-01T00:00:00Z", "%+")
    time2=strptime("1970-01-01T00:00:00+0000", "%Y-%m-%dT%H:%M:%S%z")
```
Aggregation Commands
Some stage commands aggregate multiple events into a single event.
These commands may block the stream until they have enough data to produce an output event.
stats
The `stats` command is used to calculate statistics over all events in a given search.
Syntax
The stats command is structured as follows:
- A list of aggregation functions, which may take a field name as an argument, and may have an alias specified with the `as` keyword to set the name of the resulting field.
  - There must always be at least one aggregation function.
- An optional `by` identifier, followed by at least one field name. All unique permutations of the values of these fields will result in a new aggregation group containing a copy of all specified aggregation functions.

```
| stats <aggregation>(<field>) as <output-name> by <field1>, <fieldN>
```
Aggregation Functions
List of available aggregation functions:

Takes a field name argument:
- `sum` - Sums the numeric values in the given field.
- `avg` - Calculates the average numeric value of the given field.
- `min` - Finds the smallest value in the given field.
- `max` - Finds the largest value in the given field.
- `unique` - Counts the number of unique values in the given field.
- `values` - Returns a list of all unique values in the given field.

No field name argument:
- `count` - Increments a counter for each aggregation group.
Example
Count how many events are associated with each systemd unit:

```
| select systemd
| stats count() by SYSTEMD_UNIT
```

will output rows with 2 columns, `SYSTEMD_UNIT` and `count`, containing the name of a systemd unit and the number of events associated with it respectively.

Get the number of events + the number of unique systemd units for each host:

```
| select systemd
| stats count() as event_count, unique(SYSTEMD_UNIT) as unique_units by HOSTNAME
```

will output rows with 3 columns: `HOSTNAME`, `event_count`, and `unique_units`.