SEARCH function

更新时间:
复制 MD 格式

SEARCH is a Boolean predicate for full-text search in a WHERE clause. It accepts a Domain-Specific Language (DSL) expression and pushes matching conditions down to an inverted index for accelerated execution.

Syntax

SEARCH('<search_expression>')
SEARCH('<search_expression>', '<default_field>')
SEARCH('<search_expression>', '<default_field>', '<default_operator>')
ParameterRequiredDescription
<search_expression>YesA SEARCH DSL query expression — a string literal
<default_field>NoColumn name to match against when a DSL term does not specify a field
<default_operator>NoDefault Boolean operator for bare multi-term expressions. Accepts and or or (case-insensitive). Default: or

Return type: BOOLEAN — TRUE when the row matches.

Query types at a glance

Each query type has different index requirements. Choosing the right type ensures condition pushdown to the inverted index.

Query typeSyntaxSemanticsIndex requirement
Termcol:termMatches a single term in tokenized resultsAny inverted index
ANYcol:ANY(t1 t2 ...)Matches rows containing any listed term (OR)Tokenizing index (english, chinese, unicode)
ALLcol:ALL(t1 t2 ...)Matches rows containing all listed terms (AND)Tokenizing index (english, chinese, unicode)
Phrasecol:"phrase"Matches terms in exact orderTokenizing index with position info (support_phrase)
Wildcardcol:prefix*, col:*mid*, col:?ingle* = any length, ? = one characterNon-tokenizing index (or tokenizing with lower_case)
Regexcol:/regex/Lucene-style regular expressionNon-tokenizing index only
EXACTcol:EXACT(text)Matches the entire column value, case-sensitiveNon-tokenizing index (no parser)
Multi-columncol1:term OR col2:ANY(...)Matches across multiple columnsPer-column inverted index
Variant sub-columnvariant_col.sub.path:termMatches a VARIANT sub-column by dot pathInverted index on the VARIANT column

Boolean combinations (AND, OR, NOT) work across all query types.

Supported syntax

Term queries

Match a single term in the tokenized results of a column. Case sensitivity depends on the lower_case index property. Create an inverted index with a suitable parser or analyzer for the column.

SELECT id, title FROM search_test_basic WHERE SEARCH('title:Machine');
SELECT id, title FROM search_test_basic WHERE SEARCH('title:Python');
SELECT id, title FROM search_test_basic WHERE SEARCH('category:Technology');

ANY queries

Match rows where the column contains any of the listed terms (OR semantics). Order and duplicates are ignored.

SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python javascript)');
SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(machine learning tutorial)');

-- A single-term ANY is equivalent to a term query.
SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ANY(python)');

ALL queries

Match rows where the column contains all listed terms (AND semantics). Order and duplicates are ignored.

SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(machine learning)');
SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(programming tutorial)');

-- A single-term ALL is equivalent to a term query.
SELECT id, title FROM search_test_basic WHERE SEARCH('tags:ALL(python)');

Boolean operations

Combine sub-expressions using AND, OR, and NOT. Explicit Boolean operators inside the DSL have the highest precedence.

Place all matchable conditions inside SEARCH to benefit from condition pushdown. Use standard WHERE conditions for non-indexed filters.

SELECT id, title FROM search_test_basic
WHERE SEARCH('title:Machine AND category:Technology');

SELECT id, title FROM search_test_basic
WHERE SEARCH('title:Python OR title:Data');

SELECT id, title FROM search_test_basic
WHERE SEARCH('category:Technology AND NOT title:Machine');

Complex nested expressions

Use parentheses to control Boolean precedence. Multilayer nesting is supported.

SELECT id, title FROM search_test_basic
WHERE SEARCH('(title:Machine OR title:Python) AND category:Technology');

SELECT id, title FROM search_test_basic
WHERE SEARCH('tags:ANY(python javascript) AND (category:Technology OR category:Programming)');

Phrase queries

Match a continuous, ordered sequence of terms based on the column's analyzer. Enclose the phrase in double quotation marks.

The target column must have a tokenizing inverted index with position information. To do this, configure a parser.

SELECT id, title FROM search_test_basic
WHERE SEARCH('content:"machine learning"');

Multi-column searches

Match across multiple columns in a single expression. Each column's behavior is based on its own index and tokenization configuration.

SELECT id, title FROM search_test_basic
WHERE SEARCH('title:Python OR tags:ANY(database mysql) OR author:Alice');

SELECT id, title FROM search_test_basic
WHERE SEARCH('tags:ALL(tutorial) AND category:Technology');

Wildcard queries

Use * to match a string of any length and ? to match a single character.

Wildcard queries work best with non-tokenizing indexes. On tokenizing indexes, enable lower_case for case-insensitive matching.

SELECT id, title FROM search_test_basic
WHERE SEARCH('firstname:Chris*');

-- Use the default field parameter with a wildcard.
SELECT id, firstname FROM people
WHERE SEARCH('Chris*', 'firstname');

Regular expression queries

Match using Lucene-style regular expressions. Enclose the pattern in forward slashes.

Only non-tokenizing inverted indexes support regular expression queries.

SELECT id, title FROM corpus
WHERE SEARCH('title:/data.+science/');

EXACT queries

Match the entire value of a column exactly. The match is case-sensitive and does not match partial terms.

Create a non-tokenizing inverted index on the column — do not set a parser.

SELECT id
FROM t
WHERE SEARCH('content:EXACT(machine learning)');

Variant sub-column queries

Access and match a VARIANT sub-column using dot notation. Non-existent sub-columns return no match.

Supports Boolean combinations, ANY/ALL, and nested paths.

SELECT id
FROM test_variant_search_subcolumn
WHERE SEARCH('properties.message:alpha');

Default field and default operator

When a bare term in the DSL has no explicit field prefix, SEARCH expands it to the default_field:

  • SEARCH('foo bar', 'tags') expands to tags:ANY(foo bar) (default operator: or)

  • SEARCH('foo bar', 'tags', 'and') expands to tags:ALL(foo bar)

Explicit Boolean operations in the DSL always take precedence over the default_operator.

NULL handling

SEARCH follows SQL three-value logic:

  • If all column values in the match are NULL, the result is UNKNOWN and is filtered out by WHERE.

  • Boolean short-circuit rules apply when combined with other sub-expressions: TRUE OR NULL = TRUE, FALSE OR NULL = NULL, NOT NULL = NULL.

Examples

The following examples show how different query types behave on the same table with two indexes: a non-tokenizing index and a tokenizing index.

-- Create both a non-tokenizing and a tokenizing inverted index on the same column.
CREATE TABLE t (
  id INT,
  content STRING,
  INDEX idx_untokenized(content) USING INVERTED,
  INDEX idx_tokenized(content)  USING INVERTED PROPERTIES("parser" = "standard")
);

-- EXACT matches the full value using the non-tokenizing index.
SELECT id, content FROM t WHERE SEARCH('content:EXACT(machine learning)') ORDER BY id;

-- EXACT does not match partial values.
SELECT id, content FROM t WHERE SEARCH('content:EXACT(machine)') ORDER BY id;

-- ANY and ALL use the tokenizing index.
SELECT id, content FROM t WHERE SEARCH('content:ANY(machine learning)') ORDER BY id;
SELECT id, content FROM t WHERE SEARCH('content:ALL(machine learning)') ORDER BY id;

-- Compare EXACT vs. ANY on the same terms.
SELECT id, content FROM t WHERE SEARCH('content:EXACT(deep learning)') ORDER BY id;
SELECT id, content FROM t WHERE SEARCH('content:ANY(deep learning)') ORDER BY id;

-- Combine EXACT and ANY in a single expression.
SELECT id, content
FROM t
WHERE SEARCH('content:EXACT(machine learning) OR content:ANY(intelligence)')
ORDER BY id;

-- Use the default field and default operator as a shorthand.
-- SEARCH('deep learning', 'tags', 'and') expands to tags:ALL(deep learning).
SELECT id, tags
FROM tag_dataset
WHERE SEARCH('deep learning', 'tags', 'and');

-- Combine a phrase query and a wildcard.
SELECT id, content FROM t
WHERE SEARCH('content:"deep learning" OR content:AI*')
ORDER BY id;

VARIANT column example

CREATE TABLE test_variant_search_subcolumn (
  id BIGINT,
  properties VARIANT<PROPERTIES("variant_max_subcolumns_count"="0")>,
  INDEX idx_properties (properties) USING INVERTED PROPERTIES (
    "parser" = "unicode",
    "lower_case" = "true",
    "support_phrase" = "true"
  )
);

-- Term query on a sub-column.
SELECT id
FROM test_variant_search_subcolumn
WHERE SEARCH('properties.message:alpha')
ORDER BY id;

-- AND query across the same sub-column.
SELECT id
FROM test_variant_search_subcolumn
WHERE SEARCH('properties.message:alpha AND properties.message:beta')
ORDER BY id;

-- ALL is equivalent to the AND form above.
SELECT id
FROM test_variant_search_subcolumn
WHERE SEARCH('properties.message:ALL(alpha beta)')
ORDER BY id;

-- OR across different sub-columns.
SELECT id
FROM test_variant_search_subcolumn
WHERE SEARCH('properties.message:hello OR properties.category:beta')
ORDER BY id;

Limitations

Range and list clauses, such as field:<range> and field:IN(...), are still degraded to plain term matching. Use standard SQL range or IN filters instead:

-- Range filter using SQL.
SELECT * FROM t WHERE created_at >= '2024-01-01';