Use regular expression functions to extract, match, replace, and split strings in SQL and SPL queries.
Regular expression functions
SLS supports the following regular expression functions, all using RE2 syntax.
|
Function |
Syntax |
Description |
SQL |
SPL |
|
regexp_extract_all(x, regular expression) |
Returns an array of all substrings that match a regular expression. |
√ |
× |
|
|
regexp_extract_all(x, regular expression, n) |
Returns the specified capturing group from all matches as an array. |
√ |
× |
|
|
regexp_extract(x, regular expression) |
Returns the first substring that matches a regular expression. |
√ |
√ |
|
|
regexp_extract(x, regular expression, n) |
Returns the n-th capturing group from the first match. |
√ |
√ |
|
|
regexp_extract_bool(x, regular expression) |
Converts the first match to BOOLEAN. Returns |
√ |
× |
|
|
regexp_extract_bool(x, regular expression, n) |
Converts the specified capturing group from the first match to BOOLEAN. Returns |
√ |
× |
|
|
regexp_extract_long(x, regular expression) |
Converts the first match to BIGINT. Returns |
√ |
× |
|
|
regexp_extract_long(x, regular expression, n) |
Converts the specified capturing group from the first match to BIGINT. Returns |
√ |
× |
|
|
regexp_extract_double(x, regular expression) |
Converts the first match to DOUBLE. Returns |
√ |
× |
|
|
regexp_extract_double(x, regular expression, n) |
Converts the specified capturing group from the first match to DOUBLE. Returns |
√ |
× |
|
|
regexp_extract_map(x, regular expression, keys) |
Maps provided keys to their corresponding capturing group matches. |
√ |
× |
|
|
regexp_extract_map(x, regular expression) |
Maps the first capturing group as keys to the second as values. |
√ |
× |
|
|
regexp_like(x, regular expression) |
Checks whether a string matches a regular expression. |
√ |
√ |
|
|
regexp_replace(x, regular expression) |
Removes all substrings that match a regular expression. |
√ |
√ |
|
|
regexp_replace(x, regular expression, replace string) |
Replaces all substrings that match a regular expression with a replacement string. |
√ |
√ |
|
|
regexp_split(x, regular expression) |
Splits a string by a regular expression and returns the resulting substrings in an array. |
√ |
× |
To extract a single quotation mark (') with a regular expression function, escape it by doubling to ''. Example 3 of the regexp_extract function demonstrates this approach.
regexp_extract_all function
Extracts all substrings matching a regular expression from a source string.
Syntax
-
Returns all matching substrings as an array.
regexp_extract_all(x, regular expression) -
Returns all matches of a specified capturing group as an array.
regexp_extract_all(x, regular expression, n)
Parameters
|
Parameter |
Description |
|
x |
The source string. It must be a VARCHAR. |
|
regular expression |
The regular expression. Must contain capturing groups when parameter n is specified. For example, |
|
n |
The capturing group index. Must be an integer greater than or equal to 1. |
Return value
ARRAY
Examples
-
Example 1: Extract all digits from the
server_protocolfield.-
Sample field
server_protocol:HTTP/2.0 -
Query and analysis statement (Test)
*| SELECT regexp_extract_all(server_protocol, '\d+') -
Query and analysis results: The query returns
["2","0"]in the_col0column, which are all the digit substrings extracted fromHTTP/2.0.
-
-
Example 2: Extract the 'Chrome' substring from the
http_user_agentfield and count Chrome requests.-
Sample field
http_user_agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1 -
Query and analysis statement (Test)
*| SELECT regexp_extract_all(http_user_agent, '(Chrome)',1) AS Chrome, count(*) AS count GROUP BY Chrome -
Query and analysis results: The
Chromecolumn returns["Chrome"]and thecountcolumn returns103440.
-
regexp_extract function
Extracts the first substring matching a regular expression from a source string.
Syntax
-
Returns the first substring that matches a regular expression.
regexp_extract(x, regular expression) -
Returns the substring that matches the nth capturing group.
regexp_extract(x, regular expression, n)
Parameters
|
Parameter |
Description |
|
x |
The source string. It must be of the VARCHAR type. |
|
regular expression |
The regular expression. Can contain capturing groups. For example, |
|
n |
The capturing group index (1-based). Must be a positive integer. |
Return value
Returns a VARCHAR string.
Examples
SQL
-
Example 1: Extract the first digit from the
server_protocolfield.-
Sample field
server_protocol:HTTP/2.0 -
Query statement (Test)
*|SELECT regexp_extract(server_protocol, '\d+') -
The query returns a column named
_col0with the value2.
-
-
Example 2: Extract the file name from the
request_urifield and count requests per file.-
Sample field
request_uri:/request/path-3/file-5 -
Query statement (Test)
* | SELECT regexp_extract(request_uri, '.*\/(file.*)', 1) AS file, count(*) AS count GROUP BY file -
Sample result: The file column is
file-5, and the count column is17127.
-
-
Example 3: Extract the single quotation mark (') and the digits from the
messagefield.-
Sample field
message:error'1232 -
Query statement
* | SELECT regexp_extract(message, '''\d+')NoteTo match a single quotation mark ('), you must escape it by doubling it to
''in the regular expression string. -
The result is
'1232.
-
SPL
-
Example 1: Extract the first digit from the server_protocol field.
-
Sample field
-
server_protocol:HTTP/2.0
-
Query statement
* | extend a = regexp_extract(server_protocol, '\d+')
-
Result
a: 2
-
Example 2: Extract the file part from the
request_urifield.-
Sample field
-
request_uri:/request/path-3/file-5
-
Query statement
* | extend a = regexp_extract(request_uri, '.*\/(file.*)',1)
-
Result
The result is a: file-5.
-
Example 3: Extract the single quotation mark (') and the digits from the
messagefield.-
Sample field
-
message:error'1232
-
Query statement
* | extend a = regexp_extract(message, '''\d+')
Note
To match a single quotation mark ('), you must escape it by doubling it to '' in the regular expression string.
-
Result
1 12-19 10:39:33
a: '1232
regexp_extract_bool function
Extracts the first matching substring and converts it to BOOLEAN ("true"/"false", case-insensitive). Returns null on failure.
Syntax
-
Converts the first match to BOOLEAN. Returns
nullon failure.regexp_extract_bool(x, regular expression) -
Converts the specified capturing group from the first match to BOOLEAN. Returns
nullon failure.regexp_extract_bool(x, regular expression, n)
Parameters
|
Parameter |
Description |
|
x |
The source string. This parameter is of the VARCHAR type. |
|
regular expression |
A regular expression that contains capturing groups. For example, |
|
n |
The index of the capturing group to extract. This parameter is a 1-based integer. |
Return value type
BOOLEAN type.
Examples
-
Extract a boolean value from a field.
-
Sample field
false -
Query and analysis statement (Test)
*| select regexp_extract_bool('false', '[a-zA-Z]+') -
Query and analysis results
Returns a single
_col0column with one row of data. The value isfalse.
-
regexp_extract_long function
Extracts a substring matching a regular expression and converts it to BIGINT. Returns null if the conversion fails.
Syntax
-
Extracts the first match and converts it to BIGINT.
regexp_extract_long(x, regular expression) -
Converts the specified capturing group from the first match to BIGINT.
regexp_extract_long(x, regular expression, n)
Parameters
|
Parameter |
Description |
|
x |
The source string. This parameter is of the VARCHAR type. |
|
regular expression |
A regular expression that contains capturing groups. For example, |
|
n |
The index of the capturing group to extract. The index is a 1-based integer. |
Return value type
BIGINT
Examples
-
Extract the number from the
timefield.-
Sample field
time:19/Dec/2024:06:16:06 -
Query statement (Test)
*|SELECT regexp_extract_long(time, '(\d{2})/', 1) -
Query and analysis results
Returns a column named
_col0that contains two rows, each with the value19.
-
regexp_extract_double function
Extracts a substring matching a regular expression and converts it to DOUBLE. Returns null if the conversion fails.
Syntax
-
Converts the first match to DOUBLE. Returns
nullon failure.regexp_extract_double(x, regular expression) -
Converts the specified capturing group from the first match to DOUBLE. Returns
nullon failure.regexp_extract_double(x, regular expression, n)
Parameters
|
Parameter |
Description |
|
x |
The source string. Its type must be VARCHAR. |
|
regular expression |
A regular expression that contains capturing groups. For example, |
|
n |
The index of the capturing group to extract. The index is an integer that starts at 1. |
Return value type
double
Examples
-
Extract the floating-point number from the
server_protocolfield.-
Sample field
server_protocol:HTTP/1.1 -
Query and analysis statement (Test)
*|SELECT regexp_extract_double(server_protocol, '\d+\.\d+') -
Query and analysis results
Returns a column named
_col0. This column contains two rows, each with the value1.1.
-
regexp_extract_map function
Extracts substrings matching capturing groups in a regular expression and returns them as a key-value map.
Syntax
-
Creates a map by pairing the provided keys with substrings matched by the corresponding capturing groups.
regexp_extract_map(x, regular expression, keys)
-
Creates a map by using a regular expression with two capturing groups that match the keys and values, respectively.
regexp_extract_map(x, regular expression)
Parameters
|
Parameter |
Description |
|
x |
The source string to search. The data type is VARCHAR. |
|
regular expression |
A regular expression that contains capturing groups. For example, |
|
keys |
The key names for the captured substrings. The data type is ARRAY(VARCHAR). The number of keys must match the number of capturing groups in the regular expression. |
Return value type
MAP(VARCHAR, VARCHAR)
Examples
-
Example 1: Extract the protocol name and version from the server_protocol field.
-
Sample field
server_protocol: 'HTTP/2.0' -
Query statement
select regexp_extract_map(server_protocol, '(\w+)/([\d\.]+)', array['name', 'version']) as protocol -
Query and analysis results
The query returns the following map in the protocol column:
{"name":"HTTP","version":"2.0"}.
-
-
Example 2: Extract all key-value pairs from the content field.
-
Sample field
content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+"' -
Query statement
select regexp_extract_map(content, '(\w+)="([^"]*)"') as args -
Output data
{"latency":"45.911µs","verb":"GET","userAgent":"kube-probe/1.30+","URI":"/healthz"}
-
regexp_like function
Checks whether a string matches a regular expression.
Syntax
regexp_like(x, regular expression)
Parameters
|
Parameter |
Description |
|
x |
The source string. Must be of type VARCHAR. |
|
regular expression |
The regular expression pattern to match. |
Return value type
BOOLEAN
Examples
SQL
Check if the server_protocol field contains digits.
-
Sample field
server_protocol:HTTP/2.0 -
Query and analysis statement (Test)
*| select regexp_like(server_protocol, '\d+') -
The query returns two rows. Both values are
true. This indicates that the value of the server_protocol field matches the\d+regular expression (contains digits).
SPL
Check if the server_protocol field contains digits.
-
Sample field
server_protocol:HTTP/2.0
-
Query and analysis statement
* |extend a = regexp_like(server_protocol, '\d+')
-
Query and analysis results
1 12-18 19:34:20 1.2.3.4
a: true
regexp_replace function
Removes or replaces substrings matching a regular expression.
Syntax
-
Removes all substrings from a source string that match a regular expression.
regexp_replace(x, regular expression) -
Replaces all substrings that match a regular expression with a replacement string.
regexp_replace(x, regular expression, replace string)
Parameters
|
Parameter |
Description |
|
x |
The source string to search. This parameter must be of the VARCHAR type. |
|
regular expression |
The regular expression pattern. |
|
replace string |
The replacement string. If omitted, matched substrings are removed. |
Return value type
VARCHAR
Examples
SQL
-
Example 1: In the
regionfield, replace region names that start withcnwith China, and then count the number of requests from China.-
Sample field
region:cn-shanghai -
Query and analysis statement (Test)
* | select regexp_replace(region, 'cn.*','China') AS region, count(*) AS count GROUP BY region -
Query and analysis results: The query returns a single row where the region column is
Chinaand the count column is168871.
-
-
Example 2: Remove the version number from the
server_protocolfield and count requests per protocol.-
Sample field
server_protocol:HTTP/2.0 -
Query and analysis statement (Test)
*| select regexp_replace(server_protocol, '.\d+') AS server_protocol, count(*) AS count GROUP BY server_protocol -
Query and analysis results: The query returns a single row where
server_protocolisHTTPandcountis168871.
-
SPL
-
Example 1: Replace all region names in the
regionfield value that start with 'cn' with China.-
Sample field
-
region:cn-shanghai
-
Query and analysis statement
* | extend a = regexp_replace(region, 'cn.*','China')
-
Query and analysis results
1 12-18 19:34:20 1.2.3.4 a: China -
Example 2: Delete the version number from the
server_protocolfield value.-
Sample field
-
server_protocol:HTTP/2.0
-
Query and analysis statement (Test)
* | extend a = regexp_replace(server_protocol, '.\d+')
-
Query and analysis results
The query returns a log entry that includes a timestamp, the destination IP address 1.2.3.4, and the server_protocol field with the version number removed.
regexp_split function
Splits a source string into an array of substrings using a regular expression as the delimiter.
Syntax
regexp_split(x, regular expression)
Parameters
|
Parameter |
Description |
|
x |
The source string. Its data type is VARCHAR. |
|
regular expression |
The regular expression pattern used as the delimiter. |
Return value type
array
Example
Use a forward slash (/) to split the value of the request_uri field.
-
Sample field
request_uri:/request/path-0/file-7 -
Query and analysis statement (Test)
* | SELECT regexp_split(request_uri,'/') -
The query and analysis results are
["","request","path-0","file-7"].