Match phrase query
A match phrase query is similar to a match query, except that a match phrase query evaluates the positions of tokens. A row meets the query conditions only if the order and positions of the tokens in the row match the order and positions of the tokens that are contained in the keyword. If the tokenization method for the field that you want to query is fuzzy tokenization, match phrase query is performed at a lower latency than wildcard query.
Scenarios
You can use match phrase query to search for data that contains a specific phrase in which the words are arranged in a specific order. You can use match phrase query together with tokenization to perform full-text search in specific scenarios, such as big data analysis, content search, and personalized recommendation. For example, you can query sentences that contain a specific phrase in content search and locate messages that are arranged in a specific sequence in chat records.
Features
A match phrase query uses approximate matching and also evaluates the positions of tokens. For example, suppose that a TEXT column in a row contains "Hangzhou West Lake Scenic Area" and the keyword that you specify is "Hangzhou Scenic Area". A match query returns the row, but a match phrase query does not. The distance between "Hangzhou" and "Scenic Area" is 0 in the keyword but 2 in the column value, because the two words "West" and "Lake" lie between "Hangzhou" and "Scenic Area".
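The difference between the two query types can be illustrated with a minimal, self-contained sketch. It is not Tablestore code: the `tokenize`, `match`, and `match_phrase` functions are hypothetical helpers, the tokenizer is a simple word splitter that stands in for a real analyzer, and the match query is assumed to succeed when any keyword token occurs in the field.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (stand-in analyzer)."""
    return re.findall(r"\w+", text.lower())


def match(field_value, keyword):
    """Match query sketch: true if any keyword token occurs in the field."""
    field_tokens = set(tokenize(field_value))
    return any(tok in field_tokens for tok in tokenize(keyword))


def match_phrase(field_value, keyword):
    """Match phrase sketch: keyword tokens must be adjacent and in order."""
    field_tokens = tokenize(field_value)
    phrase = tokenize(keyword)
    n = len(phrase)
    if n == 0:
        return False
    # Slide a window of the phrase length over the field tokens.
    return any(
        field_tokens[i:i + n] == phrase
        for i in range(len(field_tokens) - n + 1)
    )


value = "Hangzhou West Lake Scenic Area"
print(match(value, "Hangzhou Scenic Area"))         # True
print(match_phrase(value, "Hangzhou Scenic Area"))  # False: "West Lake" intervenes
print(match_phrase(value, "West Lake"))             # True
```

The phrase check fails for "Hangzhou Scenic Area" because no window of three consecutive tokens in the field value equals the keyword tokens.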
When you use match phrase query, you must specify the name of the field that you want to query and the keyword.
When you perform a match phrase query, you can also specify the following settings: the weight of the field that you want to query, which is used to calculate the BM25-based keyword relevance score; the columns that you want to return; whether to return the total number of rows that meet the query conditions; and the method that is used to sort the returned rows.
API operation
You can call the Search or ParallelScan operation and set the query type to MatchPhraseQuery to perform a match phrase query.
Parameters
| Parameter | Description |
| --- | --- |
| fieldName | The name of the field that you want to match. You can perform match phrase queries on TEXT fields. |
| text | The keyword that is used to match the value of the field when you perform a match phrase query. If the field that you want to match is a TEXT field, the keyword is tokenized into multiple tokens based on the analyzer type that you specify when you create the search index. If you do not specify the analyzer type when you create the search index, single-word tokenization is performed. For more information, see Tokenization. For example, if you perform a match phrase query by using the phrase "this is", "..., this is tablestore" and "this is a table" are returned. "this table is ..." or "is this a table" is not returned. |
| query | The type of the query. Set the query parameter to matchPhraseQuery. |
| offset | The position from which the current query starts. |
| limit | The maximum number of rows that you want the current query to return. To query only the number of rows that meet the query conditions without specific data, set the limit parameter to 0. |
| getTotalCount | Specifies whether to return the total number of rows that meet the query conditions. The default value of this parameter is false, which specifies that the total number of rows that meet the query conditions is not returned. If you set this parameter to true, the query performance is compromised. |
| weight | The weight that you want to assign to the field that you want to query to calculate the BM25-based keyword relevance score. This parameter is used in full-text search scenarios. If you specify a higher weight for the field that you want to query, the BM25-based keyword relevance score for the field is higher. The value of this parameter is a positive floating point number. This parameter does not affect the number of rows that are returned. However, this parameter affects the BM25-based keyword relevance scores of the query results. |
| tableName | The name of the data table. |
| indexName | The name of the search index. |
| columnsToGet | Specifies whether to return all columns of each row that meets the query conditions. You can specify the returnAll and columns fields for the columnsToGet parameter. The default value of the returnAll field is false, which specifies that not all columns are returned. In this case, you can use the columns field to specify the columns that you want to return. If you do not specify the columns that you want to return, only the primary key columns are returned. If you set the returnAll field to true, all columns are returned. |
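To make the text, offset, limit, and getTotalCount parameters concrete, the following sketch runs the "this is" example against a small in-memory positional index. It is an illustration only and does not call Tablestore: `build_index` and `phrase_search` are hypothetical names, the tokenizer is a simple word splitter, and returning `None` when the total count is not requested is an assumption of this sketch rather than the behavior of the service.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (stand-in analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    """Build a positional index: token -> {row_id: [positions]}."""
    index = defaultdict(dict)
    for row_id, text in rows.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok].setdefault(row_id, []).append(pos)
    return index


def phrase_search(index, text, offset=0, limit=10, get_total_count=False):
    """Return the IDs of rows in which the tokens of `text` are adjacent and in order."""
    tokens = tokenize(text)
    hits = []
    if tokens:
        for row_id, starts in index.get(tokens[0], {}).items():
            # A row matches if, for some start position, the k-th token
            # of the phrase sits at position start + k.
            if any(
                all(start + k in index.get(tok, {}).get(row_id, ())
                    for k, tok in enumerate(tokens))
                for start in starts
            ):
                hits.append(row_id)
    hits.sort()
    return {
        "rows": hits[offset:offset + limit],
        # Assumption of this sketch: None means "count not requested".
        "total_count": len(hits) if get_total_count else None,
    }


rows = {
    1: "..., this is tablestore",
    2: "this is a table",
    3: "this table is ...",
    4: "is this a table",
}
index = build_index(rows)

print(phrase_search(index, "this is", get_total_count=True))
# {'rows': [1, 2], 'total_count': 2}
print(phrase_search(index, "this is", limit=0, get_total_count=True))
# {'rows': [], 'total_count': 2}
```

Rows 3 and 4 contain both tokens but not at adjacent positions in the required order, so they are excluded. The second call mirrors the table's note that a limit of 0 yields only the count and no row data.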
Notes
Search Index provides only basic BM25 relevance scoring and does not support custom relevance models.
Methods
You can use the Tablestore console, Tablestore CLI, or Tablestore SDKs to perform a match phrase query.
Before you perform a match phrase query, make sure that the following preparations are made:
Use an Alibaba Cloud account or a RAM user with the required permissions for Tablestore operations. To grant permissions to a RAM user, see Grant permissions to a RAM user by using a RAM policy.
If you use an SDK or a command-line tool, create an AccessKey for your Alibaba Cloud account or RAM user if you do not have one.
Create a data table.
Create a Search Index for the data table.
If you use an SDK, initialize the Tablestore Client.
If you use the command-line tool, download and start the tool, then configure the connection to your instance and select the target table. For more information, see Download the command-line tool, Start the tool and configure connection information, and Data table operations.
Billing
In VCU mode (formerly reserved mode), Search Index queries consume VCU compute resources. In CU mode (formerly pay-as-you-go mode), they consume read throughput. For more information, see Search Index metering and billing.
References
Search Index supports various query types for multi-dimensional data queries, including term query, terms query, match all query, match query, match phrase query, range query, prefix query, suffix query, wildcard query, token-based wildcard query, boolean query, geo query, nested query, vector search, and exists query.
When you query data, you can sort and paginate the result set or perform collapsing (deduplication).
For data analysis, such as finding the maximum or minimum value, calculating a sum, or counting rows, you can use the statistical aggregation or SQL query features.
To quickly export data regardless of the result set order, you can use the Parallel Scan feature.