Search ranking is typically a two-stage process. The first stage, a rough sort, quickly filters the initial search results to find a smaller set of high-quality documents. The second stage, a fine sort, then applies more complex scoring logic to this set to produce the final ranking. A rough sort significantly impacts search performance, while a fine sort has a greater effect on ranking quality. Therefore, a rough sort should be simple and efficient, using only the most critical ranking factors. Both rough sort and fine sort are configured using sort expressions. This topic describes the feature functions that you can use in a rough sort.
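The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration, not how the service is implemented: the function name two_stage_rank, the rough_keep parameter, and the sample fields text_score and clicks are all hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def two_stage_rank(
    docs: List[Doc],
    rough_score: Callable[[Doc], float],
    fine_score: Callable[[Doc], float],
    rough_keep: int,
) -> List[Doc]:
    """Rank docs with a cheap rough sort, then a costlier fine sort."""
    # Stage 1 (rough sort): score every matched document with a cheap
    # function and keep only the top rough_keep documents.
    shortlist = sorted(docs, key=rough_score, reverse=True)[:rough_keep]
    # Stage 2 (fine sort): apply the more complex score to the shortlist only.
    return sorted(shortlist, key=fine_score, reverse=True)


# Hypothetical sample data: text_score in [0, 1], clicks as a raw count.
docs = [
    {"id": 1, "text_score": 0.9, "clicks": 10},
    {"id": 2, "text_score": 0.8, "clicks": 500},
    {"id": 3, "text_score": 0.1, "clicks": 900},
]

ranked = two_stage_rank(
    docs,
    rough_score=lambda d: d["text_score"],
    fine_score=lambda d: 0.5 * d["text_score"] + 0.5 * min(d["clicks"], 1000) / 1000,
    rough_keep=2,
)
print([d["id"] for d in ranked])
```

Document 3 never reaches the fine sort because the rough sort drops it, which is why the rough sort expression should contain the factors that matter most.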
Feature functions
static_bm25: Calculate static text relevance
- Syntax: static_bm25()
- Parameters: None.
- Return value: A float value in the range [0.0, 1.0].
- Scenario: Use this function in a rough sort expression to incorporate the static text relevance score. For example: static_bm25().
- Notes:
  - The static_bm25() function is enabled by default in the default rough sort configuration.
  - When the static_bm25() score can exceed 1.0: If a query uses analysis features such as synonyms (for example, a query for index:'Apple' is expanded to query=index:'Apple' OR index:'apple'), the static_bm25() score is accumulated for documents that match both 'Apple' and 'apple', which can result in a final rough sort score greater than 1.0.
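To make the accumulation case concrete, the following Python sketch sums per-term scores for an expanded query. It is not the actual scoring code: the helper accumulated_score and the per-term values 0.75 and 0.5 are assumptions chosen for illustration only.

```python
from typing import Dict, Set


def accumulated_score(doc_terms: Set[str], per_term_score: Dict[str, float]) -> float:
    """Sum the (hypothetical) scores of every expanded query term the document contains."""
    return sum(score for term, score in per_term_score.items() if term in doc_terms)


# Hypothetical per-term scores for the expanded query
# index:'Apple' OR index:'apple'. Each one is within [0.0, 1.0].
per_term_score = {"Apple": 0.75, "apple": 0.5}

only_one = accumulated_score({"Apple", "pie"}, per_term_score)
both = accumulated_score({"Apple", "apple"}, per_term_score)

print(only_one)  # matches one variant only, stays within [0.0, 1.0]
print(both)      # matches both variants, the sum exceeds 1.0
```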
exact_match_boost: Get the maximum boost weight
- Syntax: exact_match_boost()
- Parameters: None.
- Return value: An integer in the range [0, 99].
- Scenario: You want to rank documents based on the boost weight of the matching term. For a query like query=default:'OpenSearch'^60 OR default:'opensearch'^50, a document containing "OpenSearch" ranks higher than a document containing "opensearch". The rough sort expression is: exact_match_boost().
- Notes:
  - The fields referenced in the query must be configured as index fields.
  - For query terms without a specified boost, the default boost value is 99.
  - For exclusive applications, when used in a rough sort, the exact_match_boost function supports sum and max as optional parameters.
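The ranking behavior in the scenario above can be sketched as taking the largest boost among the query terms that a document matches. The Python below is an illustration under assumptions: max_matched_boost is a hypothetical helper, matching is simplified to exact token membership, and returning 0 for a document that matches no term is assumed from the lower bound of the documented range.

```python
from typing import Dict, Set


def max_matched_boost(doc_terms: Set[str], query_boosts: Dict[str, int]) -> int:
    """Return the largest boost among query terms found in the document."""
    matched = [boost for term, boost in query_boosts.items() if term in doc_terms]
    # Assumption: a document that matches no boosted term scores 0.
    return max(matched, default=0)


# Boosts from the query default:'OpenSearch'^60 OR default:'opensearch'^50.
query_boosts = {"OpenSearch": 60, "opensearch": 50}

score_upper = max_matched_boost({"OpenSearch", "guide"}, query_boosts)
score_lower = max_matched_boost({"opensearch", "guide"}, query_boosts)
score_both = max_matched_boost({"OpenSearch", "opensearch"}, query_boosts)

print(score_upper, score_lower, score_both)
```

Sorting by this value in descending order places the document containing "OpenSearch" ahead of the one containing only "opensearch".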
timeliness: Calculate a timeliness score
- Syntax: timeliness(pubtime)
- Parameter:
  - pubtime: The field to evaluate. Must be an integer, representing a Unix timestamp in seconds.
- Return value: A float value in the range [0.0, 1.0]. A higher value indicates a more recent document. The function returns 0.0 if the timestamp is in the future.
- Scenario: To factor in document recency based on the create_timestamp field, use the expression: timeliness(create_timestamp).
- Notes:
  - The pubtime field must be configured as an attribute field.
timeliness_ms: Calculate a timeliness score
- Syntax: timeliness_ms(pubtime)
- Parameter:
  - pubtime: The field to evaluate. Must be an integer, representing a Unix timestamp in milliseconds.
- Return value: A float value in the range [0.0, 1.0]. A higher value indicates a more recent document. The function returns 0.0 if the timestamp is in the future.
- Scenario: To factor in document recency based on the create_timestamp field, use the expression: timeliness_ms(create_timestamp).
- Notes:
  - The pubtime field must be configured as an attribute field.
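The only difference between timeliness and timeliness_ms is the unit of the timestamp field, so it helps to be explicit about which unit a field stores when documents are indexed. The Python sketch below derives both forms from the same publication time. The variable names are hypothetical, and the date is an arbitrary example.

```python
from datetime import datetime, timezone

# Arbitrary example publication time, expressed in UTC.
published = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Value for a field passed to timeliness(): whole seconds since the Unix epoch.
pubtime_seconds = int(published.timestamp())

# Value for a field passed to timeliness_ms(): whole milliseconds since the Unix epoch.
pubtime_millis = pubtime_seconds * 1000

print(pubtime_seconds)  # 1704067200
print(pubtime_millis)   # 1704067200000
```

Whichever unit the field stores, use the function whose expected unit matches it.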
normalize: Normalize a numeric value
- Overview: In relevance calculations, document quality is measured across different dimensions. The scores from these dimensions often have vastly different ranges. For example, a document's click count could be in the millions, while its text relevance score is between 0.0 and 1.0. These values are not directly comparable. To use them together in a formula, you must first normalize them to a common scale. The normalize function provides a simple way to do this. It supports three methods and automatically selects one based on the provided parameters: arctangent normalization (with only the value parameter), log normalization (with value and max), and linear normalization (with value, max, and min).
- Syntax: normalize(value, max, min)
- Parameters:
  - value: The value to normalize. It can be a field or the result of another expression.
  - max: Optional. The maximum of value.
  - min: Optional. The minimum of value.
- Return value: A double value in the range [0.0, 1.0].
- Scenario 1: To normalize the price field when its value range is unknown, use: normalize(price).
- Scenario 2: To normalize the price field when only its maximum value (for example, 100) is known, use: normalize(price, 100).
- Scenario 3: To normalize the price field when both its maximum (100) and minimum (1) values are known, use: normalize(price, 100, 1).
- Scenario 4: To normalize the result of the distance function to the range [0.0, 1.0], use: normalize(distance(longitude_in_doc, latitude_in_doc, longitude_in_query, latitude_in_query)).
- Notes:
  - Fields used as parameters must be configured as attribute fields.
  - For arctangent normalization, if value is less than 0, the function returns 0.0.
  - For log normalization, max must be greater than 1.0.
  - For linear normalization, max must be greater than min.
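The way normalize picks a method from its argument count can be sketched in Python as follows. This topic does not give the exact formulas, so the ones below are assumptions based on common textbook forms: atan(value) * 2 / pi for arctangent, the logarithm of value in base max for log, and (value - min) / (max - min) for linear. Clamping the result to [0.0, 1.0] is also an assumption.

```python
import math
from typing import Optional


def normalize(value: float,
              max_value: Optional[float] = None,
              min_value: Optional[float] = None) -> float:
    """Illustrative stand-in for normalize(value, max, min); formulas are assumed."""
    if max_value is None:
        # Arctangent normalization: only value is given.
        if value < 0:
            return 0.0  # documented behavior for negative values
        score = math.atan(value) * 2 / math.pi
    elif min_value is None:
        # Log normalization: value and max are given.
        if max_value <= 1.0:
            raise ValueError("max must be greater than 1.0")
        score = math.log(value, max_value) if value > 0 else 0.0
    else:
        # Linear normalization: value, max, and min are given.
        if max_value <= min_value:
            raise ValueError("max must be greater than min")
        score = (value - min_value) / (max_value - min_value)
    # Assumption: results outside [0.0, 1.0] are clamped to the range.
    return min(1.0, max(0.0, score))


print(normalize(1))             # arctangent
print(normalize(10, 100))       # log
print(normalize(50.5, 100, 1))  # linear
```

The arctangent form needs no knowledge of the value range, which is why it suits the case where neither bound is known. The linear form spreads values most evenly but requires both bounds.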
category_score: Category prediction function
Description: This function returns a score indicating how well a document's category matches the query's predicted category.
- Syntax: category_score(cate_id)
- Parameter:
  - cate_id: The field used as the category ID during model training. Must be an integer.
- Return value: An integer in the range [0, 2].
- Scenario: Use category_score(cate_id) in a sort expression. For more information, see Use the category prediction feature.
- Notes:
  - This function must be used with the category prediction algorithm.