Token-based wildcard query

更新时间: 2026-01-20 21:42:55

In wildcard query (WildcardQuery) scenarios that use the *word* pattern, you can combine fuzzy tokenization and a match phrase query to perform high-performance fuzzy queries.

Background information

Fuzzy queries are a common requirement in database operations, such as querying filenames or mobile phone numbers. To perform fuzzy queries in Tablestore, you can use a wildcard query on a search index, which is similar to the LIKE function in MySQL. However, wildcard queries have limitations. The search query cannot exceed 32 characters, and performance degrades as the data volume increases.

To address these issues, search index supports token-based wildcard queries for improved fuzzy query performance. When you use a token-based wildcard query, there is no limit on the length of the search query. However, the original content is limited to a maximum of 1,024 characters. Content that exceeds this limit is truncated, and only the first 1,024 characters are retained.

Scenarios

Choose the appropriate method to perform fuzzy queries based on your scenario.

  • For wildcard query scenarios that use the *word* pattern, you can use a token-based wildcard query. For example, you can use this method to find mobile phone numbers that contain 123 at any position by searching for "123".

    In most cases, this method improves performance by more than 10 times compared to a standard wildcard query.

    Assume a data table contains a file_name column. In the search index, this column is of the Text type, and the analyzer is set to Fuzzy Analyzer. To find the row where the file_name column value is 2021 woRK@Hangzhou, you must use a match phrase query (MatchPhraseQuery). The search query must be a consecutive substring of the original text.

    • If the search query is 2021, 20, 21, work, WORK, @, Hang, zhou, Hangzhou, or @Hangzhou, the query matches the row where the file_name column value is 2021 woRK@Hangzhou.

    • If the search query is 21work, 2021Hangzhou, 2120, or #Hangzhou, the query does not match the row where the file_name column value is 2021 woRK@Hangzhou.

  • For other complex query scenarios, you can use a wildcard query to perform fuzzy queries. For more information, see Wildcard query.

Usage

Follow these steps to perform a fuzzy query using the token-based method:

  1. When you create a search index, set the column type to Text and the analyzer to Fuzzy Analyzer. You can use the default configurations for the other parameters. For more information, see Create a search index.

    Note

    If a search index already exists, you can use the dynamic schema modification feature to add a virtual column for the specified column. Set the virtual column type to Text and the analyzer to Fuzzy Analyzer. For more information, see Dynamically modify a schema and Virtual columns.

  2. When you query data using the search index, use a MatchPhraseQuery. For more information, see Match phrase query.

Appendix: Use case

The following example demonstrates how to use a token-based wildcard query to perform a fuzzy query.

import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.ColumnValue;
import com.alicloud.openservices.tablestore.model.PrimaryKey;
import com.alicloud.openservices.tablestore.model.PrimaryKeyBuilder;
import com.alicloud.openservices.tablestore.model.PrimaryKeyValue;
import com.alicloud.openservices.tablestore.model.PutRowRequest;
import com.alicloud.openservices.tablestore.model.RowPutChange;
import com.alicloud.openservices.tablestore.model.search.CreateSearchIndexRequest;
import com.alicloud.openservices.tablestore.model.search.CreateSearchIndexResponse;
import com.alicloud.openservices.tablestore.model.search.FieldSchema;
import com.alicloud.openservices.tablestore.model.search.FieldType;
import com.alicloud.openservices.tablestore.model.search.IndexSchema;
import com.alicloud.openservices.tablestore.model.search.SearchQuery;
import com.alicloud.openservices.tablestore.model.search.SearchRequest;
import com.alicloud.openservices.tablestore.model.search.SearchResponse;
import com.alicloud.openservices.tablestore.model.search.query.QueryBuilders;

import java.util.Arrays;
import java.util.Collections;

import static org.junit.Assert.assertEquals;


public class TestFuzzy {
    private static final String tableName = "analysis_test";
    private static final String indexName = "analysis_test_index";


    public void testFuzzyMatchPhrase(SyncClient client) throws Exception {
        // Define the table schema.
        IndexSchema indexSchema = new IndexSchema();
        indexSchema.setFieldSchemas(Collections.singletonList(
                // Note: If the name field was originally of the Keyword type, queries may fail after you change the field type to Text and configure an analyzer.
                // To retain both Keyword and Text types, see the example for the virtual column feature. If you only need to use the name field to perform queries that match *abc*, use only the Text type field. The Keyword type is not required.
                new FieldSchema("name", FieldType.TEXT).setAnalyzer(FieldSchema.Analyzer.Fuzzy)
        ));
        // Create a search index.
        {
            CreateSearchIndexRequest request = new CreateSearchIndexRequest();
            request.setTableName(tableName);
            request.setIndexName(indexName);
            request.setIndexSchema(indexSchema);
            CreateSearchIndexResponse response = client.createSearchIndex(request);
        }

        // Write a row of data.
        PrimaryKey primaryKey = PrimaryKeyBuilder.createPrimaryKeyBuilder()
                .addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString("1"))
                .addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(1))
                .addPrimaryKeyColumn("pk3", PrimaryKeyValue.fromBinary(new byte[]{1, 2, 3}))
                .build();
        RowPutChange rowPutChange = new RowPutChange(tableName, primaryKey);
        // Write an attribute column.
        rowPutChange.addColumn("name", ColumnValue.fromString("Tuner1024x768P.mp4"));
        PutRowRequest request = new PutRowRequest(rowPutChange);
        client.putRow(request);

        // Wait for the data to be synchronized to the search index.
        Thread.sleep(1000 * 180);

        // Demonstrate the query feature that matches *abc*.
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tun", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tun er", 0);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner102", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x7", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "Tuner1024x768P.mp4", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "24x768P.mp4", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "24x76 8P.mp4", 0);
        assertMatchPhraseQuery(client, tableName, indexName, "name", "24x7 P.mp4", 0);
    }

    // Use a virtual column.
    public void testFuzzyMatchPhraseWithVirtualField(SyncClient client) throws Exception {
        // Define the table schema.
        IndexSchema indexSchema = new IndexSchema();
        indexSchema.setFieldSchemas(Arrays.asList(
                // The source field is of the Keyword type, which is convenient for exact match queries.
                new FieldSchema("name", FieldType.KEYWORD).setIndex(true).setStore(true),
                // Create a virtual column named name_virtual_text. Set the type of the virtual column to Text and the analyzer to Fuzzy. The source of this virtual column is the name field.
                new FieldSchema("name_virtual_text", FieldType.TEXT).setIndex(true).setAnalyzer(FieldSchema.Analyzer.Fuzzy).setVirtualField(true).setSourceFieldName("name")
        ));
        // Create a search index.
        {
            CreateSearchIndexRequest request = new CreateSearchIndexRequest();
            request.setTableName(tableName);
            request.setIndexName(indexName);
            request.setIndexSchema(indexSchema);
            CreateSearchIndexResponse response = client.createSearchIndex(request);
        }

        // Write a row of data.
        PrimaryKey primaryKey = PrimaryKeyBuilder.createPrimaryKeyBuilder()
                .addPrimaryKeyColumn("pk1", PrimaryKeyValue.fromString("1"))
                .addPrimaryKeyColumn("pk2", PrimaryKeyValue.fromLong(1))
                .addPrimaryKeyColumn("pk3", PrimaryKeyValue.fromBinary(new byte[]{1, 2, 3}))
                .build();
        RowPutChange rowPutChange = new RowPutChange(tableName, primaryKey);
        // Write an attribute column.
        rowPutChange.addColumn("name", ColumnValue.fromString("Tuner1024x768P.mp4"));
        PutRowRequest request = new PutRowRequest(rowPutChange);
        client.putRow(request);

        // Wait for the data to be synchronized to the search index.
        Thread.sleep(1000 * 180);

        // Demonstrate the query scenario that matches *abc*.
        // Note that the query field is the virtual column name_virtual_text, not name.
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tun", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tun er", 0);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner102", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x7", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "Tuner1024x768P.mp4", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x768P.mp4", 1);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x76 8P.mp4", 0);
        assertMatchPhraseQuery(client, tableName, indexName, "name_virtual_text", "24x7 P.mp4", 0);
    }

    // Show how to implement MatchPhraseQuery.
    public static void assertMatchPhraseQuery(SyncClient client, String tableName, String indexName, String fieldName, String searchContent, long exceptCount) {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.setTableName(tableName);
        searchRequest.setIndexName(indexName);
        SearchQuery searchQuery = new SearchQuery();
        // Use MatchPhraseQuery to query the tokenized field.
        searchQuery.setQuery(QueryBuilders.matchPhrase(fieldName, searchContent).build());
        searchQuery.setLimit(0);
        // To demonstrate the feature, this example returns the total number of matched rows. If you do not need the total count, set this to false for better performance.
        searchQuery.setGetTotalCount(true);
        searchRequest.setSearchQuery(searchQuery);
        SearchResponse response = client.search(searchRequest);
        assertEquals(String.format("field:[%s], searchContent:[%s]", fieldName, searchContent), exceptCount, response.getTotalCount());
    }
}

FAQ

References

上一篇: Suffix query 下一篇: Geo query
阿里云首页 表格存储 相关技术圈