Token-based wildcard query
In wildcard query (WildcardQuery) scenarios that use the *word* pattern, you can combine fuzzy tokenization and a match phrase query to perform high-performance fuzzy queries.
Background information
Fuzzy queries are a common requirement in database operations, such as querying filenames or mobile phone numbers. To perform fuzzy queries in Tablestore, you can use a wildcard query on a search index, which is similar to the LIKE function in MySQL. However, wildcard queries have limitations. The search query cannot exceed 32 characters, and performance degrades as the data volume increases.
To address these issues, search index supports token-based wildcard queries for improved fuzzy query performance. When you use a token-based wildcard query, there is no limit on the length of the search query. However, the original content is limited to a maximum of 1,024 characters. Content that exceeds this limit is truncated, and only the first 1,024 characters are retained.
Scenarios
Choose the appropriate method to perform fuzzy queries based on your scenario.
For wildcard query scenarios that use the
*word*pattern, you can use a token-based wildcard query. For example, you can use this method to find mobile phone numbers that contain123at any position by searching for"123".In most cases, this method improves performance by more than 10 times compared to a standard wildcard query.
Assume a data table contains a file_name column. In the search index, this column is of the Text type, and the analyzer is set to Fuzzy Analyzer. To find the row where the file_name column value is
2021 woRK@Hangzhou, you must use a match phrase query (MatchPhraseQuery). The search query must be a consecutive substring of the original text.If the search query is
2021,20,21,work,WORK,@,Hang,zhou,Hangzhou, or@Hangzhou, the query matches the row where the file_name column value is2021 woRK@Hangzhou.If the search query is
21work,2021Hangzhou,2120, or#Hangzhou, the query does not match the row where the file_name column value is2021 woRK@Hangzhou.
For other complex query scenarios, you can use a wildcard query to perform fuzzy queries. For more information, see Wildcard query.
Usage
Follow these steps to perform a fuzzy query using the token-based method:
When you create a search index, set the column type to Text and the analyzer to Fuzzy Analyzer. You can use the default configurations for the other parameters. For more information, see Create a search index.
NoteIf a search index already exists, you can use the dynamic schema modification feature to add a virtual column for the specified column. Set the virtual column type to Text and the analyzer to Fuzzy Analyzer. For more information, see Dynamically modify a schema and Virtual columns.
When you query data using the search index, use a MatchPhraseQuery. For more information, see Match phrase query.
Appendix: Use case
The following example demonstrates how to use a token-based wildcard query to perform a fuzzy query.
import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.ColumnValue;
import com.alicloud.openservices.tablestore.model.PrimaryKey;
import com.alicloud.openservices.tablestore.model.PrimaryKeyBuilder;
import com.alicloud.openservices.tablestore.model.PrimaryKeyValue;
import com.alicloud.openservices.tablestore.model.PutRowRequest;
import com.alicloud.openservices.tablestore.model.RowPutChange;
import com.alicloud.openservices.tablestore.model.search.CreateSearchIndexRequest;
import com.alicloud.openservices.tablestore.model.search.CreateSearchIndexResponse;
import com.alicloud.openservices.tablestore.model.search.FieldSchema;
import com.alicloud.openservices.tablestore.model.search.FieldType;
import com.alicloud.openservices.tablestore.model.search.IndexSchema;
import com.alicloud.openservices.tablestore.model.search.SearchQuery;
import com.alicloud.openservices.tablestore.model.search.SearchRequest;
import com.alicloud.openservices.tablestore.model.search.SearchResponse;
import com.alicloud.openservices.tablestore.model.search.query.QueryBuilders;
import java.util.Arrays;
import java.util.Collections;
import static org.junit.Assert.assertEquals;
public class TestFuzzy {
private static final String tableName = "analysis_test";
private static final String indexName = "analysis_test_index";
public void testFuzzyMatchPhrase(SyncClient client) throws Exception {
// Define the table schema.
IndexSchema indexSchema = new IndexSchema();
indexSchema.setFieldSchemas(Collections.singletonList(
// Note: If the name field was originally of the Keyword type, queries may fail after you change the field type to Text and configure an analyzer.
// To retain both Keyword and Text types, see the example for the virtual column feature. If you only need to use the name field to perform queries that match *abc*, use only the Text type field. The Keyword type is not required.
new FieldSchema("name", FieldType.TEXT).setAnalyzer(FieldSchema.Analyzer.Fuzzy)
));
// Create a search index.
{
CreateSearchIndexRequest request = new CreateSearchIndexRequest();
request.setTableName(tableName);
request.setIndexName(indexName);
request.setIndexSchema(indexSchema);
CreateSearchIndexResponse response = client.createSearchIndex(request);
}
// Write a row of data.
PrimaryKey primaryKey = PrimaryKeyBuilder.createPrimaryKeyBuilder()
.addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString("1"))
.addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(1))
.addPrimaryKeyColumn("pk3", PrimaryKeyValue.fromBinary(new byte[]{1, 2, 3}))
.build();
RowPutChange rowPutChange = new RowPutChange(tableName, primaryKey);
// Write an attribute column.
rowPutChange.addColumn("name", ColumnValue.fromString("Tuner1024x768P.mp4"));
PutRowRequest request = new PutRowRequest(rowPutChange);
client.putRow(request);
// Wait for the data to be synchronized to the search index.
Thread.sleep(1000 * 180);
// Demonstrate the query feature that matches *abc*.
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tun", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tun er", 0);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner102", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x7", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x768P.mp4", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "24x768P.mp4", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name", "24x76 8P.mp4", 0);
assertMatchPhraseQuery(client, tableName, indexName, "name", "24x7 P.mp4", 0);
}
// Use a virtual column.
public void testFuzzyMatchPhraseWithVirtualField(SyncClient client) throws Exception {
// Define the table schema.
IndexSchema indexSchema = new IndexSchema();
indexSchema.setFieldSchemas(Arrays.asList(
// The source field is of the Keyword type, which is convenient for exact match queries.
new FieldSchema("name", FieldType.KEYWORD).setIndex(true).setStore(true),
// Create a virtual column named name_virtual_text. Set the type of the virtual column to Text and the analyzer to Fuzzy. The source of this virtual column is the name field.
new FieldSchema("name_virtual_text", FieldType.TEXT).setIndex(true).setAnalyzer(FieldSchema.Analyzer.Fuzzy).setVirtualField(true).setSourceFieldName("name")
));
// Create a search index.
{
CreateSearchIndexRequest request = new CreateSearchIndexRequest();
request.setTableName(tableName);
request.setIndexName(indexName);
request.setIndexSchema(indexSchema);
CreateSearchIndexResponse response = client.createSearchIndex(request);
}
// Write a row of data.
PrimaryKey primaryKey = PrimaryKeyBuilder.createPrimaryKeyBuilder()
.addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString("1"))
.addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(1))
.addPrimaryKeyColumn("pk3", PrimaryKeyValue.fromBinary(new byte[]{1, 2, 3}))
.build();
RowPutChange rowPutChange = new RowPutChange(tableName, primaryKey);
// Write an attribute column.
rowPutChange.addColumn("name", ColumnValue.fromString("Tuner1024x768P.mp4"));
PutRowRequest request = new PutRowRequest(rowPutChange);
client.putRow(request);
// Wait for the data to be synchronized to the search index.
Thread.sleep(1000 * 180);
// Demonstrate the query scenario that matches *abc*.
// Note that the query field is the virtual column name_virtual_text, not name.
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tun", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tun er", 0);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner102", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x7", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x768P.mp4", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x768P.mp4", 1);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x76 8P.mp4", 0);
assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x7 P.mp4", 0);
}
// Show how to implement MatchPhraseQuery.
public static void assertMatchPhraseQuery(SyncClient client, String tableName, String indexName, String fieldName, String searchContent, long exceptCount) {
SearchRequest searchRequest = new SearchRequest();
searchRequest.setTableName(tableName);
searchRequest.setIndexName(indexName);
SearchQuery searchQuery = new SearchQuery();
// Use MatchPhraseQuery to query the tokenized field.
searchQuery.setQuery(QueryBuilders.matchPhrase(fieldName, searchContent).build());
searchQuery.setLimit(0);
// To demonstrate the feature, this example returns the total number of matched rows. If you do not need the total count, set this to false for better performance.
searchQuery.setGetTotalCount(true);
searchRequest.setSearchQuery(searchQuery);
SearchResponse response = client.search(searchRequest);
assertEquals(String.format("field:[%s], searchContent:[%s]", fieldName, searchContent), exceptCount, response.getTotalCount());
}
}FAQ
References
When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, fuzzy query, Boolean query, geo query, nested query, KNN vector query, and exists query. You can select query methods based on your business requirements to query data from multiple dimensions.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Perform sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (remove duplicates).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.