Regular Expression Functions

更新时间:
复制 MD 格式

Use regular expression functions to extract, match, replace, and split strings in SQL and SPL queries.

Regular expression functions

SLS supports the following regular expression functions, all using RE2 syntax.

Important If you want to use strings in analytic statements, you must enclose strings in single quotation marks (''). Strings that are not enclosed or enclosed in double quotation marks ("") indicate field names or column names. For example, 'status' indicates the status string, and status or "status" indicates the status log field.

Function

Syntax

Description

SQL

SPL

regexp_extract_all function

regexp_extract_all(x, regular expression)

Returns an array of all substrings that match a regular expression.

×

regexp_extract_all(x, regular expression, n)

Returns the specified capturing group from all matches as an array.

×

regexp_extract function

regexp_extract(x, regular expression)

Returns the first substring that matches a regular expression.

regexp_extract(x, regular expression, n)

Returns the n-th capturing group from the first match.

regexp_extract_bool function

regexp_extract_bool(x, regular expression)

Converts the first match to BOOLEAN. Returns null if the conversion fails.

×

regexp_extract_bool(x, regular expression, n)

Converts the specified capturing group from the first match to BOOLEAN. Returns null if the conversion fails.

×

regexp_extract_long function

regexp_extract_long(x, regular expression)

Converts the first match to BIGINT. Returns null if the conversion fails.

×

regexp_extract_long(x, regular expression, n)

Converts the specified capturing group from the first match to BIGINT. Returns null if the conversion fails.

×

regexp_extract_double function

regexp_extract_double(x, regular expression)

Converts the first match to DOUBLE. Returns null if the conversion fails.

×

regexp_extract_double(x, regular expression, n)

Converts the specified capturing group from the first match to DOUBLE. Returns null if the conversion fails.

×

regexp_extract_map function

regexp_extract_map(x, regular expression, keys)

Maps provided keys to their corresponding capturing group matches.

×

regexp_extract_map(x, regular expression)

Maps the first capturing group as keys to the second as values.

×

regexp_like function

regexp_like(x, regular expression)

Checks whether a string matches a regular expression.

regexp_replace function

regexp_replace(x, regular expression)

Removes all substrings that match a regular expression.

regexp_replace(x, regular expression, replace string)

Replaces all substrings that match a regular expression with a replacement string.

regexp_split function

regexp_split(x, regular expression)

Splits a string by a regular expression and returns the resulting substrings in an array.

×

Note

To extract a single quotation mark (') with a regular expression function, escape it by doubling to ''. Example 3 of the regexp_extract function demonstrates this approach.

regexp_extract_all function

Extracts all substrings matching a regular expression from a source string.

Syntax

  • Returns all matching substrings as an array.

    regexp_extract_all(x, regular expression)
  • Returns all matches of a specified capturing group as an array.

    regexp_extract_all(x, regular expression, n)

Parameters

Parameter

Description

x

The source string. It must be a VARCHAR.

regular expression

The regular expression. Must contain capturing groups when parameter n is specified. For example, (\d)(\d)(\d) specifies three capturing groups.

n

The capturing group index. Must be an integer greater than or equal to 1.

Return value

ARRAY

Examples

  • Example 1: Extract all digits from the server_protocol field.

    • Sample field

      server_protocol:HTTP/2.0
    • Query and analysis statement (Test)

      *| SELECT regexp_extract_all(server_protocol, '\d+')
    • Query and analysis results: The query returns ["2","0"] in the _col0 column, which are all the digit substrings extracted from HTTP/2.0.

  • Example 2: Extract the 'Chrome' substring from the http_user_agent field and count Chrome requests.

    • Sample field

      http_user_agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1
    • Query and analysis statement (Test)

      *| SELECT regexp_extract_all(http_user_agent, '(Chrome)',1) AS Chrome, count(*) AS count GROUP BY Chrome
    • Query and analysis results: The Chrome column returns ["Chrome"] and the count column returns 103440.

regexp_extract function

Extracts the first substring matching a regular expression from a source string.

Syntax

  • Returns the first substring that matches a regular expression.

    regexp_extract(x, regular expression)
  • Returns the substring that matches the nth capturing group.

    regexp_extract(x, regular expression, n)

Parameters

Parameter

Description

x

The source string. It must be of the VARCHAR type.

regular expression

The regular expression. Can contain capturing groups. For example, (\d)(\d)(\d) contains three capturing groups.

n

The capturing group index (1-based). Must be a positive integer.

Return value

Returns a VARCHAR string.

Examples

SQL

  • Example 1: Extract the first digit from the server_protocol field.

    • Sample field

      server_protocol:HTTP/2.0
    • Query statement (Test)

      *|SELECT regexp_extract(server_protocol, '\d+')
    • The query returns a column named _col0 with the value 2.

  • Example 2: Extract the file name from the request_uri field and count requests per file.

    • Sample field

      request_uri:/request/path-3/file-5
    • Query statement (Test)

      * | SELECT regexp_extract(request_uri, '.*\/(file.*)', 1) AS file, count(*) AS count GROUP BY file
    • Sample result: The file column is file-5, and the count column is 17127.

  • Example 3: Extract the single quotation mark (') and the digits from the message field.

    • Sample field

      message:error'1232
    • Query statement

      * | SELECT regexp_extract(message, '''\d+') 
      Note

      To match a single quotation mark ('), you must escape it by doubling it to '' in the regular expression string.

    • The result is '1232.

SPL

  • Example 1: Extract the first digit from the server_protocol field.

    • Sample field

server_protocol:HTTP/2.0
  • Query statement

* | extend a = regexp_extract(server_protocol, '\d+')
  • Result

a:  2
  • Example 2: Extract the file part from the request_uri field.

    • Sample field

request_uri:/request/path-3/file-5
  • Query statement

* | extend a = regexp_extract(request_uri, '.*\/(file.*)',1)
  • Result

The result is a: file-5.

  • Example 3: Extract the single quotation mark (') and the digits from the message field.

    • Sample field

message:error'1232
  • Query statement

* | extend a = regexp_extract(message, '''\d+') 

Note

To match a single quotation mark ('), you must escape it by doubling it to '' in the regular expression string.

  • Result

1  12-19 10:39:33
   a: '1232

regexp_extract_bool function

Extracts the first matching substring and converts it to BOOLEAN ("true"/"false", case-insensitive). Returns null on failure.

Syntax

  • Converts the first match to BOOLEAN. Returns null on failure.

    regexp_extract_bool(x, regular expression)
  • Converts the specified capturing group from the first match to BOOLEAN. Returns null on failure.

    regexp_extract_bool(x, regular expression, n)

Parameters

Parameter

Description

x

The source string. This parameter is of the VARCHAR type.

regular expression

A regular expression that contains capturing groups. For example, (\d)(\d)(\d) specifies three capturing groups.

n

The index of the capturing group to extract. This parameter is a 1-based integer.

Return value type

BOOLEAN type.

Examples

  • Extract a boolean value from a field.

    • Sample field

      false 
    • Query and analysis statement (Test)

      *| select regexp_extract_bool('false', '[a-zA-Z]+')
    • Query and analysis results

      Returns a single _col0 column with one row of data. The value is false.

regexp_extract_long function

Extracts a substring matching a regular expression and converts it to BIGINT. Returns null if the conversion fails.

Syntax

  • Extracts the first match and converts it to BIGINT.

    regexp_extract_long(x, regular expression)
  • Converts the specified capturing group from the first match to BIGINT.

    regexp_extract_long(x, regular expression, n)

Parameters

Parameter

Description

x

The source string. This parameter is of the VARCHAR type.

regular expression

A regular expression that contains capturing groups. For example, (\d)(\d)(\d) contains three capturing groups.

n

The index of the capturing group to extract. The index is a 1-based integer.

Return value type

BIGINT

Examples

  • Extract the number from the time field.

    • Sample field

      time:19/Dec/2024:06:16:06
    • Query statement (Test)

      *|SELECT regexp_extract_long(time, '(\d{2})/', 1) 
    • Query and analysis results

      Returns a column named _col0 that contains two rows, each with the value 19.

regexp_extract_double function

Extracts a substring matching a regular expression and converts it to DOUBLE. Returns null if the conversion fails.

Syntax

  • Converts the first match to DOUBLE. Returns null on failure.

    regexp_extract_double(x, regular expression)
  • Converts the specified capturing group from the first match to DOUBLE. Returns null on failure.

    regexp_extract_double(x, regular expression, n)

Parameters

Parameter

Description

x

The source string. Its type must be VARCHAR.

regular expression

A regular expression that contains capturing groups. For example, (\d)(\d)(\d) specifies three capturing groups.

n

The index of the capturing group to extract. The index is an integer that starts at 1.

Return value type

double

Examples

  • Extract the floating-point number from the server_protocol field.

    • Sample field

      server_protocol:HTTP/1.1
    • Query and analysis statement (Test)

      *|SELECT regexp_extract_double(server_protocol, '\d+\.\d+') 
    • Query and analysis results

      Returns a column named _col0. This column contains two rows, each with the value 1.1.

regexp_extract_map function

Extracts substrings matching capturing groups in a regular expression and returns them as a key-value map.

Syntax

  • Creates a map by pairing the provided keys with substrings matched by the corresponding capturing groups.

regexp_extract_map(x, regular expression, keys)
  • Creates a map by using a regular expression with two capturing groups that match the keys and values, respectively.

regexp_extract_map(x, regular expression)

Parameters

Parameter

Description

x

The source string to search. The data type is VARCHAR.

regular expression

A regular expression that contains capturing groups. For example, (\d)(\d)(\d) specifies three capturing groups.

keys

The key names for the captured substrings. The data type is ARRAY(VARCHAR). The number of keys must match the number of capturing groups in the regular expression.

Return value type

MAP(VARCHAR, VARCHAR)

Examples

  • Example 1: Extract the protocol name and version from the server_protocol field.

    • Sample field

      server_protocol: 'HTTP/2.0'
    • Query statement

      select regexp_extract_map(server_protocol, '(\w+)/([\d\.]+)', array['name', 'version']) as protocol
    • Query and analysis results

      The query returns the following map in the protocol column: {"name":"HTTP","version":"2.0"}.

  • Example 2: Extract all key-value pairs from the content field.

    • Sample field

      content: 'verb="GET" URI="/healthz" latency="45.911µs" userAgent="kube-probe/1.30+"'
    • Query statement

      select regexp_extract_map(content, '(\w+)="([^"]*)"') as args
    • Output data

      {"latency":"45.911µs","verb":"GET","userAgent":"kube-probe/1.30+","URI":"/healthz"}

regexp_like function

Checks whether a string matches a regular expression.

Syntax

regexp_like(x, regular expression)

Parameters

Parameter

Description

x

The source string. Must be of type VARCHAR.

regular expression

The regular expression pattern to match.

Return value type

BOOLEAN

Examples

SQL

Check if the server_protocol field contains digits.

  • Sample field

    server_protocol:HTTP/2.0
  • Query and analysis statement (Test)

    *| select regexp_like(server_protocol, '\d+')
  • The query returns two rows. Both values are true. This indicates that the value of the server_protocol field matches the \d+ regular expression (contains digits).

SPL

Check if the server_protocol field contains digits.

  • Sample field

server_protocol:HTTP/2.0
  • Query and analysis statement

* |extend a = regexp_like(server_protocol, '\d+')
  • Query and analysis results

1  12-18 19:34:20  1.2.3.4
        a: true

regexp_replace function

Removes or replaces substrings matching a regular expression.

Syntax

  • Removes all substrings from a source string that match a regular expression.

    regexp_replace(x, regular expression)
  • Replaces all substrings that match a regular expression with a replacement string.

    regexp_replace(x, regular expression, replace string)

Parameters

Parameter

Description

x

The source string to search. This parameter must be of the VARCHAR type.

regular expression

The regular expression pattern.

replace string

The replacement string. If omitted, matched substrings are removed.

Return value type

VARCHAR

Examples

SQL

  • Example 1: In the region field, replace region names that start with cn with China, and then count the number of requests from China.

    • Sample field

      region:cn-shanghai
    • Query and analysis statement (Test)

      * | select regexp_replace(region, 'cn.*','China') AS region, count(*) AS count GROUP BY region
    • Query and analysis results: The query returns a single row where the region column is China and the count column is 168871.

  • Example 2: Remove the version number from the server_protocol field and count requests per protocol.

    • Sample field

      server_protocol:HTTP/2.0
    • Query and analysis statement (Test)

      *| select regexp_replace(server_protocol, '.\d+') AS server_protocol, count(*) AS count GROUP BY server_protocol
    • Query and analysis results: The query returns a single row where server_protocol is HTTP and count is 168871.

SPL

  • Example 1: Replace all region names in the region field value that start with 'cn' with China.

    • Sample field

region:cn-shanghai
  • Query and analysis statement

* | extend a = regexp_replace(region, 'cn.*','China')
  • Query and analysis results

    1  12-18 19:34:20  1.2.3.4
       a: China
  • Example 2: Delete the version number from the server_protocol field value.

    • Sample field

server_protocol:HTTP/2.0
  • Query and analysis statement (Test)

* | extend a = regexp_replace(server_protocol, '.\d+')
  • Query and analysis results

The query returns a log entry that includes a timestamp, the destination IP address 1.2.3.4, and the server_protocol field with the version number removed.

regexp_split function

Splits a source string into an array of substrings using a regular expression as the delimiter.

Syntax

regexp_split(x, regular expression)

Parameters

Parameter

Description

x

The source string. Its data type is VARCHAR.

regular expression

The regular expression pattern used as the delimiter.

Return value type

array

Example

Use a forward slash (/) to split the value of the request_uri field.

  • Sample field

    request_uri:/request/path-0/file-7
  • Query and analysis statement (Test)

    * | SELECT regexp_split(request_uri,'/')
  • The query and analysis results are ["","request","path-0","file-7"].