Scroll iterative query demo


DeepPageingIterator lets you page through large result sets without tracking scroll IDs. Each call to hasNext() and next() automatically manages the underlying scroll session, so you can focus on processing results rather than session state.

Prerequisites

Before you begin, make sure you have:

  • An OpenSearch application with at least one searchable table

  • A RAM user with the required permissions (see Access authorization rules)

  • The OpenSearch SDK for Java V4.0.0 added to your project

Set up credentials

Store your AccessKey pair as environment variables. Do not hardcode credentials in source code.

Linux and macOS

export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>

Replace <access_key_id> and <access_key_secret> with the AccessKey ID and AccessKey secret of your RAM user.

Windows

  1. Add ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET as system environment variables, set to the AccessKey ID and AccessKey secret of your RAM user.

  2. Restart Windows for the changes to take effect.
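The sample program in this topic reads these variables with System.getenv. The following is a minimal sketch of a fail-fast check, so that a missing variable produces a clear message instead of a null credential reaching the client. CredentialLoader and requireEnv are hypothetical helper names, not part of the OpenSearch SDK.

```java
import java.util.Map;

public class CredentialLoader {

    // Hypothetical helper: returns a required variable, failing fast if it is missing or blank.
    public static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            String accessKeyId = requireEnv(env, "ALIBABA_CLOUD_ACCESS_KEY_ID");
            String accessKeySecret = requireEnv(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET");
            // Never print the credentials themselves; only confirm that both were found.
            System.out.println("AccessKey pair loaded.");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment as a Map (rather than calling System.getenv inside the helper) keeps the check easy to test with a substitute map.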

Important

The AccessKey pair of an Alibaba Cloud account has access to all API operations. Use a Resource Access Management (RAM) user for API calls and routine O&M. For details, see Create a RAM user and Create an AccessKey pair. If you use a RAM user, make sure the AliyunServiceRoleForOpenSearch role has the required permissions. See AliyunServiceRoleForOpenSearch.

Limitations

Scroll queries have the following constraints:

  • The aggregate, distinct, and rank clauses are not supported.

  • Sorting is supported on a single field only.

  • The start parameter in the config clause has no effect; the default value 0 is always used.
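These limitations can be checked before a request is sent. The following is a hypothetical pre-flight validator that works on plain strings; ScrollQueryCheck and validate are illustrative names, not SDK APIs, and clause names are assumed to be passed in lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ScrollQueryCheck {

    // Clauses that scroll queries do not support, per the limitations above.
    private static final Set<String> UNSUPPORTED = Set.of("aggregate", "distinct", "rank");

    // Returns a list of problems; an empty list means the plan fits the scroll limits.
    public static List<String> validate(Set<String> clauses, List<String> sortFields, int start) {
        List<String> problems = new ArrayList<>();
        for (String clause : clauses) {
            if (UNSUPPORTED.contains(clause)) {
                problems.add("clause not supported in scroll queries: " + clause);
            }
        }
        if (sortFields.size() > 1) {
            problems.add("scroll queries sort on one field only, got " + sortFields.size());
        }
        if (start != 0) {
            problems.add("start=" + start + " is ignored; scroll queries always use 0");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = validate(Set.of("query", "filter", "distinct"),
                List.of("id", "cate_id"), 10);
        problems.forEach(System.out::println);
    }
}
```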

How it works

The SDK builds the scroll session through a chain of objects:

  1. OpenSearch — initialized with your credentials and API endpoint

  2. OpenSearchClient — wraps the OpenSearch object

  3. SearcherClient — wraps the OpenSearchClient object

  4. Config — defines the application name, hits per page, return fields, and data format

  5. SearchParams — holds the query, filter, and sort conditions, plus a DeepPaging object

  6. DeepPageingIterator — drives the scroll loop; each next() call fetches the next page and advances the scroll ID automatically
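The scroll loop that DeepPageingIterator drives can be pictured with the sketch below. It is a hypothetical, in-memory model of the pattern, not the SDK's implementation: PageSource stands in for the search request, scroll IDs are simulated as numeric offsets, and the loop is assumed to stop when a page comes back empty.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ScrollIteratorSketch {

    // One page of results plus the scroll ID to send with the next request.
    public static final class Page {
        final List<String> docs;
        final String nextScrollId;
        Page(List<String> docs, String nextScrollId) {
            this.docs = docs;
            this.nextScrollId = nextScrollId;
        }
    }

    // Hypothetical transport: fetch the page for a scroll ID (null means first request).
    public interface PageSource {
        Page fetch(String scrollId);
    }

    // Yields pages until an empty page arrives, carrying the scroll ID forward itself.
    public static final class ScrollIterator implements Iterator<List<String>> {
        private final PageSource source;
        private String scrollId = null;
        private Page buffered = null;
        private boolean done = false;

        public ScrollIterator(PageSource source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            if (done) return false;
            if (buffered == null) {
                buffered = source.fetch(scrollId);
                if (buffered.docs.isEmpty()) {
                    done = true;
                    buffered = null;
                    return false;
                }
            }
            return true;
        }

        @Override
        public List<String> next() {
            if (!hasNext()) throw new NoSuchElementException();
            Page page = buffered;
            buffered = null;
            scrollId = page.nextScrollId; // advance the scroll ID for the caller
            return page.docs;
        }
    }

    // In-memory stand-in for the service: serves `total` docs, `hits` per page.
    public static PageSource fakeSource(int total, int hits) {
        return scrollId -> {
            int offset = scrollId == null ? 0 : Integer.parseInt(scrollId);
            List<String> docs = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + hits, total); i++) {
                docs.add("doc-" + i);
            }
            return new Page(docs, String.valueOf(offset + docs.size()));
        };
    }

    public static void main(String[] args) {
        Iterator<List<String>> pages = new ScrollIterator(fakeSource(12, 5));
        while (pages.hasNext()) {
            System.out.println(pages.next());
        }
    }
}
```

With 12 documents and 5 hits per page (the value config.setHits(5) uses in the example), the loop yields three pages of 5, 5, and 2 documents. The caller never handles a scroll ID, which is the property the SDK iterator provides.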

Determine whether an error has occurred based on the error code and message, not the status field. See Error codes.
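A minimal sketch of that rule, assuming the response has already been parsed into a status string and a list of error entries, each with a code and a message. SearchResponse, ApiError, and the code 12345 are hypothetical placeholders, not SDK types or a real OpenSearch error code.

```java
import java.util.List;

public class ErrorCheck {

    // Hypothetical model of one entry in a response's error list.
    public static final class ApiError {
        final int code;
        final String message;
        public ApiError(int code, String message) {
            this.code = code;
            this.message = message;
        }
    }

    // Hypothetical model of a parsed search response.
    public static final class SearchResponse {
        final String status;
        final List<ApiError> errors;
        public SearchResponse(String status, List<ApiError> errors) {
            this.status = status;
            this.errors = errors;
        }
    }

    // Decide failure from the error entries only; the status field is deliberately ignored.
    public static boolean hasError(SearchResponse response) {
        return response.errors != null && !response.errors.isEmpty();
    }

    // Build one "code: message" line per error for logging.
    public static String describe(SearchResponse response) {
        if (!hasError(response)) return "";
        StringBuilder sb = new StringBuilder();
        for (ApiError e : response.errors) {
            sb.append(e.code).append(": ").append(e.message).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder code and message for illustration only.
        SearchResponse resp = new SearchResponse("OK",
                List.of(new ApiError(12345, "placeholder message")));
        if (hasError(resp)) {
            System.out.print(describe(resp));
        }
    }
}
```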

Implement iterative scroll queries

The following example uses OpenSearch SDK for Java V4.0.0. DeepPageingIterator handles scroll ID management automatically, so you do not need to pass a scroll ID between requests.

package com.aliyun.opensearch;

import com.aliyun.opensearch.sdk.dependencies.com.google.common.collect.Lists;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.search.*;
import com.aliyun.opensearch.search.DeepPageingIterator;
import java.nio.charset.Charset;

public class TestScrollIterator {

    // Scroll queries do not support the aggregate, distinct, or rank clause,
    // and support sorting on a single field only.
    private static String appName = "Name of the OpenSearch application that you want to manage";
    private static String tableName = "Name of the table to which data is to be uploaded";
    private static String host = "Endpoint of the OpenSearch API in your region";

    public static void main(String[] args) {

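        // Read credentials from environment variables.
        // Set the environment variables before running this example.
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        if (accesskey == null || secret == null) {
            System.out.println("Set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET first.");
            return;
        }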

        // Print the file encoding and default charset for debugging.
        System.out.println(String.format("file.encoding: %s", System.getProperty("file.encoding")));
        System.out.println(String.format("defaultCharset: %s", Charset.defaultCharset().name()));

        // Build the client chain: OpenSearch -> OpenSearchClient -> SearcherClient.
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        OpenSearchClient serviceClient = new OpenSearchClient(openSearch);
        SearcherClient searcherClient = new SearcherClient(serviceClient);

        // Configure the query: application name, hits per page, return fields, and data format.
        Config config = new Config(Lists.newArrayList(appName));
        // The start parameter has no effect on scroll queries; the default value 0 is always used.
        config.setStart(0);
        // Return 5 documents per page.
        config.setHits(5);
        // Supported formats: JSON and FULLJSON.
        config.setSearchFormat(SearchFormat.FULLJSON);
        config.setFetchFields(Lists.newArrayList("id", "name", "phone", "int_arr", "literal_arr", "float_arr", "cate_id"));
        // Note: Set the rerank_size parameter via the setReRankSize method of the Rank class.

        // Define the query, filter, and sort conditions.
        SearchParams searchParams = new SearchParams(config);
        // To search across multiple index fields, specify all fields in one setQuery call.
        // Multiple setQuery calls overwrite each other; only the last one takes effect.
        searchParams.setQuery("name:'opensearch'");
        searchParams.setFilter("cate_id<=3");

        Sort sorter = new Sort();
        // Sort by the id field in descending order.
        sorter.addToSortFields(new SortField("id", Order.DECREASE));
        searchParams.setSort(sorter);

        // Attach a DeepPaging object to enable scroll queries.
        DeepPaging deep = new DeepPaging();
        searchParams.setDeepPaging(deep);

        // Create the iterator. It manages scroll IDs automatically.
        DeepPageingIterator pagesIterator = new DeepPageingIterator(searcherClient, searchParams);
        // Set the interval between page fetches, in milliseconds.
        // The default is 100 ms. Adjust based on your throughput needs.
        pagesIterator.setPagingIntervals(80);

        // Iterate through all pages. Each next() call fetches the next page.
        // Check error codes and messages to detect failures, not the status field.
        try {
            while (pagesIterator.hasNext()) {
                System.out.println("Page: " + pagesIterator.next());
            }
        } catch (Exception ex) {
            System.out.println("Error message: " + ex.getMessage());
        }
    }
}

Key parameters

| Parameter | Method | Description | Default |
|---|---|---|---|
| Hits per page | config.setHits(n) | Number of documents returned per page | |
| Data format | config.setSearchFormat(...) | Return format: JSON or FULLJSON | |
| Return fields | config.setFetchFields(...) | List of fields included in each result | |
| Query | searchParams.setQuery(...) | Query clause; specify all index fields in a single call | |
| Filter | searchParams.setFilter(...) | Filter condition applied to results | |
| Sort | searchParams.setSort(...) | Sort field and direction; scroll queries support one field only | |
| Paging interval | pagesIterator.setPagingIntervals(ms) | Delay between page fetches, in milliseconds | 100 ms |
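setPagingIntervals controls the pause between page fetches. The sketch below shows one way such a pause can be enforced: a hypothetical PagingThrottle that guarantees a minimum gap between the starts of consecutive fetches. This topic does not say whether the SDK measures the interval from the start or the end of the previous fetch, so that choice is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingThrottle {

    private final long intervalMillis;
    private long lastFetchNanos = -1; // -1 means no fetch has happened yet

    public PagingThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Blocks until at least intervalMillis have passed since the previous call returned.
    public void awaitTurn() throws InterruptedException {
        if (lastFetchNanos >= 0) {
            long elapsedMillis = (System.nanoTime() - lastFetchNanos) / 1_000_000;
            long remaining = intervalMillis - elapsedMillis;
            if (remaining > 0) {
                Thread.sleep(remaining);
            }
        }
        lastFetchNanos = System.nanoTime();
    }

    public static void main(String[] args) throws InterruptedException {
        PagingThrottle throttle = new PagingThrottle(80); // same value as the example above
        List<Long> stamps = new ArrayList<>();
        for (int page = 0; page < 3; page++) {
            throttle.awaitTurn();
            stamps.add(System.nanoTime());
            // A real loop would fetch and process one page here.
        }
        for (int i = 1; i < stamps.size(); i++) {
            System.out.println("gap ms: " + (stamps.get(i) - stamps.get(i - 1)) / 1_000_000);
        }
    }
}
```

Measuring from the start of the previous fetch means that time spent processing a page already counts toward the interval; a fixed Thread.sleep after each page is the simpler alternative when strict spacing is not needed.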

What's next