Feature functions available for rough sort

Search ranking is typically a two-stage process. The first stage, a rough sort, quickly filters the initial search results to find a smaller set of high-quality documents. The second stage, a fine sort, then applies more complex scoring logic to this set to produce the final ranking. A rough sort significantly impacts search performance, while a fine sort has a greater effect on ranking quality. Therefore, a rough sort should be simple and efficient, using only the most critical ranking factors. Both rough sort and fine sort are configured using sort expressions. This topic describes the feature functions that you can use in a rough sort.
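
The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how OpenSearch implements it. The function name two_stage_rank, the rough_keep parameter, and the text_rel and quality fields are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def two_stage_rank(
    docs: List[Doc],
    rough_score: Callable[[Doc], float],
    fine_score: Callable[[Doc], float],
    rough_keep: int,
) -> List[Doc]:
    """Rank docs with a cheap first pass and a costlier second pass."""
    # Stage 1 (rough sort): score every matched document with a cheap
    # expression and keep only the best `rough_keep` candidates.
    candidates = sorted(docs, key=rough_score, reverse=True)[:rough_keep]
    # Stage 2 (fine sort): apply the more expensive expression to the
    # surviving candidates only, and return them in final order.
    return sorted(candidates, key=fine_score, reverse=True)


# Hypothetical documents and scoring expressions, for illustration only.
docs = [
    {"id": 1, "text_rel": 0.9, "quality": 0.2},
    {"id": 2, "text_rel": 0.8, "quality": 0.9},
    {"id": 3, "text_rel": 0.1, "quality": 1.0},
]
ranked = two_stage_rank(
    docs,
    rough_score=lambda d: d["text_rel"],
    fine_score=lambda d: 0.5 * d["text_rel"] + 0.5 * d["quality"],
    rough_keep=2,
)
ranked_ids = [d["id"] for d in ranked]
```

In this sketch, document 3 is dropped by the rough sort and never reaches the fine sort, even though its quality value is the highest. This is why the rough sort expression should contain the most critical ranking factors.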

Feature functions

static_bm25: Calculate static text relevance

Syntax: static_bm25()
Parameters: None.
Return value: A float value, normally in the range [0.0, 1.0].
Scenario: Use this function in a rough sort expression to incorporate the static text relevance score. For example: static_bm25().
Notes:
- The default rough sort configuration already uses static_bm25().

Note

The static_bm25() score can exceed 1.0 when a query uses analysis features such as synonyms. For example, a query for index:'Apple' is expanded to query=index:'Apple' OR index:'apple'. For a document that matches both 'Apple' and 'apple', the scores of the two terms are accumulated, so the final rough sort score can be greater than 1.0.
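
The accumulation can be illustrated with a small Python sketch. It assumes that each expanded term contributes its own score and that the contributions are simply added. The helper accumulated_relevance and the per-term scores 0.7 and 0.6 are hypothetical, and the real static_bm25() formula is not shown here.

```python
from typing import Dict, Iterable


def accumulated_relevance(
    doc_terms: Iterable[str], term_scores: Dict[str, float]
) -> float:
    """Add up the score of every expanded query term found in the document."""
    present = set(doc_terms)
    return sum(score for term, score in term_scores.items() if term in present)


# Hypothetical per-term scores for the expanded query
# index:'Apple' OR index:'apple'. Each score on its own is within [0.0, 1.0].
expanded_query = {"Apple": 0.7, "apple": 0.6}

only_one = accumulated_relevance(["Apple", "pie"], expanded_query)
both = accumulated_relevance(["Apple", "apple", "pie"], expanded_query)
```

Here only_one stays within [0.0, 1.0], while both exceeds 1.0 because the document matches both branches of the expanded query.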

exact_match_boost: Get the maximum boost weight

Syntax: exact_match_boost()
Parameters: None.
Return value: An integer in the range [0, 99].
Scenario: You want to rank documents based on the boost weight of the matching term. For a query like query=default:'OpenSearch'^60 OR default:'opensearch'^50, a document containing "OpenSearch" will rank higher than a document containing "opensearch". The rough sort expression would be: exact_match_boost().
Notes:
- The fields referenced in the query must be configured as index fields.
- For query terms without a specified boost, the default boost value is 99.
- For exclusive applications, when used in a rough sort, the exact_match_boost function supports sum and max as optional parameters.
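
The behavior described in this section can be approximated with the following Python sketch. It is an assumption-laden illustration rather than the actual implementation: the helper max_matched_boost is hypothetical, terms are matched by exact case-sensitive string comparison, and returning 0 when nothing matches is an assumption based on the lower bound of the documented range.

```python
from typing import Dict, Iterable, Optional

# Documented default for query terms without an explicit boost.
DEFAULT_BOOST = 99


def max_matched_boost(
    doc_terms: Iterable[str], query_boosts: Dict[str, Optional[int]]
) -> int:
    """Return the largest boost among the query terms present in the document."""
    present = set(doc_terms)
    matched = [
        DEFAULT_BOOST if boost is None else boost
        for term, boost in query_boosts.items()
        if term in present
    ]
    # Assumption: a document that matches no boosted term scores 0.
    return max(matched, default=0)


# query=default:'OpenSearch'^60 OR default:'opensearch'^50
query = {"OpenSearch": 60, "opensearch": 50}

upper = max_matched_boost(["OpenSearch", "docs"], query)
lower = max_matched_boost(["opensearch", "docs"], query)
unboosted = max_matched_boost(["search"], {"search": None})
```

With these inputs, the document containing "OpenSearch" gets a higher value than the one containing "opensearch", which matches the ordering described in the scenario.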

timeliness: Calculate a timeliness score

Syntax: timeliness(pubtime)
Parameter: pubtime: The field to evaluate. Must be an integer, representing a Unix timestamp in seconds.
Return value: A float value in the range [0.0, 1.0]. A higher value indicates a more recent document. The function returns 0.0 if the timestamp is in the future.
Scenario: To factor in document recency based on the create_timestamp field, use the expression: timeliness(create_timestamp).
Notes:
- The pubtime field must be configured as an attribute field.

timeliness_ms: Calculate a timeliness score (millisecond timestamps)

Syntax: timeliness_ms(pubtime)
Parameter: pubtime: The field to evaluate. Must be an integer, representing a Unix timestamp in milliseconds.
Return value: A float value in the range [0.0, 1.0]. A higher value indicates a more recent document. The function returns 0.0 if the timestamp is in the future.
Scenario: To factor in document recency based on the create_timestamp field, use the expression: timeliness_ms(create_timestamp).
Notes:
- The pubtime field must be configured as an attribute field.
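
The documented contract of timeliness and timeliness_ms can be mimicked with the following Python sketch. The decay curve is not specified in this topic, so the linear decay and the one-year WINDOW_SECONDS constant are purely hypothetical. Only the [0.0, 1.0] range, the "more recent is higher" ordering, and the 0.0 result for future timestamps come from the descriptions above.

```python
import time
from typing import Optional

# Hypothetical decay window: documents older than this score 0.0.
WINDOW_SECONDS = 365 * 24 * 3600


def timeliness_sketch(pubtime_s: int, now_s: Optional[int] = None) -> float:
    """Score recency of a Unix timestamp given in seconds."""
    now_s = int(time.time()) if now_s is None else now_s
    if pubtime_s > now_s:
        return 0.0  # Documented: a timestamp in the future returns 0.0.
    age = now_s - pubtime_s
    # Assumed linear decay from 1.0 (just published) down to 0.0.
    return max(0.0, 1.0 - age / WINDOW_SECONDS)


def timeliness_ms_sketch(pubtime_ms: int, now_s: Optional[int] = None) -> float:
    """Same scoring for a Unix timestamp given in milliseconds."""
    return timeliness_sketch(pubtime_ms // 1000, now_s)


now = 1_700_000_000
fresh = timeliness_sketch(now, now)
future = timeliness_sketch(now + 10, now)
wrong_unit = timeliness_sketch(now * 1000, now)
```

In this sketch, wrong_unit is 0.0: a millisecond timestamp passed to the seconds variant looks like a date far in the future. If the real functions behave the same way, choosing the variant that matches the unit of the field matters.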

normalize: Normalize a numeric value

Overview: In relevance calculations, document quality is measured across different dimensions. The scores from these dimensions often have vastly different ranges. For example, a document's click count could be in the millions, while its text relevance score is between 0.0 and 1.0. These values are not directly comparable. To use them together in a formula, you must first normalize them to a common scale. The normalize function provides a simple way to do this. It supports three methods and automatically selects one based on the provided parameters: arctangent normalization (with only the value parameter), log normalization (with value and max), and linear normalization (with value, max, and min).
Syntax: normalize(value, max, min)
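Parameters:
- value: The value to normalize. This can be a field or, as in Scenario 4, the result of another function.
- max: Optional. The maximum of the value range.
- min: Optional. The minimum of the value range. Used together with max.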
Return value: A double value in the range [0.0, 1.0].
Scenario 1: To normalize the price field when its value range is unknown, use: normalize(price).

Scenario 2: To normalize the price field when only its maximum value (e.g., 100) is known, use: normalize(price, 100).

Scenario 3: To normalize the price field when both its maximum (100) and minimum (1) values are known, use: normalize(price, 100, 1).

Scenario 4: To normalize the result of the distance function to the range [0.0, 1.0], use: normalize(distance(longitude_in_doc, latitude_in_doc, longitude_in_query, latitude_in_query)).
Notes:
- Fields used as parameters must be configured as attribute fields.
- For arctangent normalization, if the value is less than 0, the function returns 0.0.
- For log normalization, the max value must be greater than 1.0.
- For linear normalization, the max value must be greater than the min value.
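
The way normalize selects a method from its arguments can be sketched in Python as follows. The exact formulas are not given in this topic, so the ones below are common textbook forms used as assumptions: atan(value) * 2 / π for arctangent, log(value) / log(max) for log, and (value - min) / (max - min) for linear. Clamping the result to [0.0, 1.0] and raising an error for invalid bounds are also assumptions.

```python
import math
from typing import Optional


def normalize_sketch(
    value: float,
    max_value: Optional[float] = None,
    min_value: Optional[float] = None,
) -> float:
    """Pick a normalization method based on which bounds are supplied."""
    if max_value is None:
        # Arctangent normalization: the value range is unknown.
        if value < 0:
            return 0.0  # Documented: negative values return 0.0.
        return math.atan(value) * 2.0 / math.pi
    if min_value is None:
        # Log normalization: only the maximum is known.
        if max_value <= 1.0:
            raise ValueError("max must be greater than 1.0")
        if value <= 1.0:
            return 0.0  # Assumption: log(value) <= 0 is clamped to 0.0.
        result = math.log(value) / math.log(max_value)
    else:
        # Linear normalization: both bounds are known.
        if max_value <= min_value:
            raise ValueError("max must be greater than min")
        result = (value - min_value) / (max_value - min_value)
    # Assumption: out-of-range inputs are clamped into [0.0, 1.0].
    return min(1.0, max(0.0, result))


arctan_score = normalize_sketch(1_000_000)      # like normalize(price)
log_score = normalize_sketch(10, 100)           # like normalize(price, 100)
linear_score = normalize_sketch(50, 100, 1)     # like normalize(price, 100, 1)
```

The constraints in the notes fit these assumed formulas: log(max) must be positive for the log form to be meaningful, and max - min must be positive for the linear form.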

category_score: Category prediction function

Description: This function returns a score indicating how well a document's category matches the query's predicted category.

Syntax: category_score(cate_id)
Parameter: cate_id: The field used as the category ID during model training. Must be an integer.
Return value: An integer in the range [0, 2].
Scenario: Use category_score(cate_id) in a sort expression. For more information, see Use the category prediction feature.
Notes:
- This function must be used with the category prediction algorithm.