Hint

更新时间:
复制 MD 格式

OpenSearch Retrieval Engine Edition supports SQL hints — optimizer directives embedded in SQL comments — to fine-tune query execution beyond what the default query planner produces. Hints do not cause statement errors when invalid; the system silently ignores them. To confirm whether a hint took effect, check the execution plan after running the query.

If a hint does not behave as expected, see Conflict resolution for the rules the system applies when hints conflict or are invalid.

Prerequisites

Before you begin, ensure that you have:

  • OpenSearch Retrieval Engine Edition with HA3 version V3.7.3 or later for HASH_JOIN, LOOKUP_JOIN, NORMAL_AGG, NO_INDEX, and ATTR hints

  • HA3 version V3.8.0 or later for LOCAL_PARALLEL hints

Syntax

Place hints inside SQL block comments (/*+ ... */) immediately after SELECT or after the table name in the FROM clause.

SELECT [/*+ HintName(params) */]
  { * | projectItem [, projectItem ]* }
FROM tableExpression [/*+ HintName(params) */]

HintName: [a-zA-Z][a-zA-Z_]*

Params:
    Identifier[, Identifier]
  | Identifier=Identifier[, Identifier=Identifier]

Identifier: [a-zA-Z_][a-zA-Z_0-9]*

Multiple hints can appear in the same comment block, separated by commas:

SELECT /*+ LOOKUP_JOIN(tj_relation), NORMAL_AGG(distributionCheck='false', propScope='all') */
    ...

Built-in hints

OpenSearch Retrieval Engine Edition provides four built-in hints for controlling join algorithms and aggregation behavior.

HASH_JOIN

Forces the HashJoin operator for the specified tables, performing an inner join.

Syntax: HASH_JOIN(tableName1, tableName2, ...)

Example:

SELECT
    /*+ HASH_JOIN(tj_relation) */   -- force hash join at this level
    *
FROM
    (
        SELECT
            /*+ HASH_JOIN(tj_item_raw) */   -- force hash join for the inner subquery
            *
        FROM
            (
                SELECT
                    sum(tj_item.id) as sum0
                FROM
                    tj_item
                GROUP BY
                    tj_item.id
            ) B
            JOIN tj_item_raw ON B.sum0 = tj_item_raw.id
    ) D
    JOIN tj_relation ON D.sum0 = tj_relation.item_id

Both join operations use the hash join algorithm.

LOOKUP_JOIN

Forces the LookupJoin operator for the specified tables, performing an inner join.

Syntax: LOOKUP_JOIN(tableName1, tableName2, ...)

Example:

SELECT
    /*+ LOOKUP_JOIN(tj_relation) */   -- force lookup join at this level
    *
FROM
    (
        SELECT
            /*+ LOOKUP_JOIN(tj_item_raw) */   -- force lookup join for the inner subquery
            *
        FROM
            (
                SELECT
                    sum(tj_item.id) as sum0
                FROM
                    tj_item
                GROUP BY
                    tj_item.id
            ) B
            JOIN tj_item_raw ON B.sum0 = tj_item_raw.id
    ) D
    JOIN tj_relation ON D.sum0 = tj_relation.item_id

Both join operations use the lookup join algorithm.

NORMAL_AGG

Runs the aggregation you define in the first phase of the query, instead of splitting it across phases.

Syntax: NORMAL_AGG(distributionCheck='false'|'true', propScope='all')

ParameterDescriptionDefault
distributionCheckWhether to check data distribution. If set to 'true' and the group key is not the partition key, the hint is ignored.
propScopeScope of hint propagation. Set to 'all' to push the hint to the current aggregation node and all downstream nodes. Omit to push only to the nearest aggregation node.Nearest node

Example:

SELECT
    /*+ LOOKUP_JOIN(tj_relation), NORMAL_AGG(distributionCheck='false', propScope='all') */
    tj_relation.price
FROM
    (
        SELECT
            *
        FROM
            (
                SELECT
                    sum(tj_item.id) as sum0
                FROM
                    tj_item
                GROUP BY              -- (1) first aggregation
                    tj_item.id
            ) B
            JOIN tj_item_raw ON B.sum0 = tj_item_raw.id
    ) D
    JOIN tj_relation ON D.sum0 = tj_relation.item_id
GROUP BY                              -- (2) second aggregation, runs as normal aggregation
    tj_relation.price

The aggregation runs as a single-phase normal aggregation, not split across two phases.

NO_INDEX

Disables index optimization for specific fields during the filter phase. Use this to force full scans on fields where index use degrades performance.

Syntax: NO_INDEX(tableName='<table>', fields='<field1>, <field2>, ...')

ParameterDescription
tableNameName of the table containing the target fields
fieldsComma-separated list of fields for which to disable index optimization

Example:

SELECT
    /*+ LOOKUP_JOIN(tj_relation), NO_INDEX(tableName='tj_relation', fields='pk') */
    *
FROM
    (
        SELECT
            /*+ NO_INDEX(tableName='tj_item', fields='shop_id, reserve_price') */
            *
        FROM
            (
                SELECT
                    SUM(tj_item.id) as sum0
                FROM
                    tj_item
                    /*+ NO_INDEX(tableName='tj_item', fields='shop_id') */
                WHERE
                    tj_item.id = 100
                    AND tj_item.shop_id = 500       -- shop_id is not indexed here
                GROUP BY
                    tj_item.id
            ) B
            JOIN tj_item_raw ON B.sum0 = tj_item_raw.id
    ) D
    JOIN tj_relation /*+ NO_INDEX(tableName='tj_relation', fields='item_id') */
        ON D.sum0 = tj_relation.item_id
WHERE
    tj_relation.item_id = 900 AND tj_relation.pk = 100   -- pk and item_id are not indexed here

ATTR hints

ATTR hints pass auxiliary information directly to operators to accelerate queries. Use them when querying multiple fields or when you need fine-grained control over scan limits, aggregation group sizes, or join batch sizes.

SCAN_ATTR

Passes configuration to the scan operator.

Syntax: SCAN_ATTR(param1='value1', param2='value2', ...)

ParameterDescriptionDefault
localLimitMaximum number of rows the scan operator returns
batchSizeNumber of rows returned per batch. Use with localLimit.Auto-calculated based on table size
nestTableJoinTypeJoin method for nested tables. Valid values: inner, left.left
hashValuesComma-separated field values used to calculate hash values and identify target partitions. For partition-based queries.
partitionIdsIDs of partitions to query. Set to -1 to scan all partitions. For partition-based queries.

Place the SCAN_ATTR hint next to the table name it applies to (recommended):

SELECT
    company_id, company_name
FROM
    company /*+ SCAN_ATTR(localLimit='3', batchSize='2', nestTableJoinType='inner') */

Alternatively, place it after SELECT:

SELECT /*+ SCAN_ATTR(localLimit='3', batchSize='2', nestTableJoinType='inner') */
    company_id, company_name
FROM
    company

Partition-based queries (for debugging only):

Important

Use partition-based queries for debugging only. Do not use them in production.

-- Query by hash values: the system calculates hash values from a1 and a2
-- to identify the target partitions
SELECT /*+ SCAN_ATTR(hashValues='a1,a2') */ company_id, company_name
FROM company

-- Query by partition IDs: queries data from partition 1 and partition 2
SELECT /*+ SCAN_ATTR(partitionIds='1,2') */ company_id, company_name
FROM company

AGG_ATTR

Passes configuration to the aggregation operator. Place AGG_ATTR only after the SELECT keyword.

Syntax: AGG_ATTR(param1='value1', param2='value2', ...)

ParameterDescriptionDefault
groupKeyLimitMaximum number of data groups the aggregation operator processes200000
stopExceedLimitWhether to return an error when groupKeyLimit is reachedtrue
propScopeScope of hint effect. Default is the current node only.Current node

Example:

SELECT /*+ AGG_ATTR(groupKeyLimit='3', stopExceedLimit='false') */
    company_id, company_name
FROM
    company
GROUP BY
    company_id, company_name

JOIN_ATTR

Passes configuration to the join operator. Place JOIN_ATTR only after the SELECT keyword.

Syntax: JOIN_ATTR(param1='value1', param2='value2', ...)

ParameterDescriptionDefault
lookupTurncateThresholdMaximum number of results after a lookup inner join. The system stops joining once this limit is reached.
lookupBatchSizeNumber of rows per lookup join batch500
propScopeScope of hint effect. Default is the current node only.Current node
defaultValueDefault field values for left outer joins when right-table values are unavailable. Specify one default per data type, e.g., 'INTEGER:10,VARCHAR:aa'.

Control result size with `lookupTurncateThreshold`:

-- Return at most 30 results after the lookup inner join
SELECT /*+ JOIN_ATTR(lookupTurncateThreshold='30', lookupBatchSize='50') */
    id, daogou.company_id, company_name
FROM daogou
    JOIN company ON daogou.id = company.company_id

Combine JOIN_ATTR and SCAN_ATTR in the same statement:

SELECT /*+ JOIN_ATTR(lookupTurncateThreshold='3', lookupBatchSize='2') */
    id, daogou.company_id, company_name
FROM
    daogou /*+ SCAN_ATTR(localLimit='5', batchSize='1') */
    JOIN
        company
    ON
        daogou.id = company.company_id

Set default values for left outer joins:

-- When right-table values are missing, use 10 for INTEGER fields and 'aa' for VARCHAR fields
SELECT /*+ JOIN_ATTR(defaultValue='INTEGER:10,VARCHAR:aa') */
    id, daogou.company_id, company_name
FROM daogou
    JOIN company ON daogou.id = company.company_id

LOCAL_PARALLEL

Requires HA3 version V3.8.0 or later.

LOCAL_PARALLEL enables parallel query execution on the Searcher node. Specify it after the table name to set the number of threads for that table. The system automatically identifies which operations can run in parallel.

LOCAL_PARALLEL is incompatible with the parallel query method used in HA3 versions earlier than V3.8.0. Do not use both at the same time.

Syntax: LOCAL_PARALLEL(tableName='<table>', parallelNum='<n>')

ParameterDescription
tableNameTable to parallelize. Use catalogName.dbName.tableName format; catalogName and dbName are optional and default to system values.
parallelNumNumber of parallel threads

Examples:

-- Simple SELECT with parallel scan
SELECT i1
FROM t1 /*+ LOCAL_PARALLEL(tableName='t1', parallelNum='2') */

-- SELECT with aggregation
SELECT i1, COUNT(*)
FROM t1 /*+ LOCAL_PARALLEL(tableName='t1', parallelNum='2') */
GROUP BY i1

-- JOIN with parallel scan on one table
SELECT t1.id, t2.id
FROM tj_item /*+ LOCAL_PARALLEL(tableName='tj_item', parallelNum='2') */ AS t1
JOIN tj_shop AS t2
ON t1.id = t2.id

If parallel execution does not improve performance as expected, examine the query trace and execution plan to identify the bottleneck operation.

Conflict resolution

When multiple hints conflict, the system applies these rules:

ScenarioSystem behavior
Same hint type, different propScope values on the same nodeThe hint with the smallest propScope value takes effect
Same hint type, same propScope values on the same nodeNone of the conflicting hints are executed
Hint may cause incorrect execution plansThe system does not execute the hint

To verify whether a hint took effect, check the execution plan after running the query. If a hint is not reflected in the execution plan, review the conflict rules above to determine which condition applies.