Scroll query demo

Regular search returns at most 5,000 documents. To retrieve larger result sets—for batch data export, analysis pipelines, or machine learning tasks—use scroll queries instead.

Prerequisites

Before you begin, ensure that you have:

  • An OpenSearch application with indexed data

  • An AccessKey pair for a Resource Access Management (RAM) user with the required permissions. See Create a RAM user and Access authorization rules

  • OpenSearch SDK for Java V4.0.0 added to your project dependencies

Important

Use a RAM user's AccessKey pair, not your Alibaba Cloud account's root credentials. The root AccessKey pair has unrestricted access to all APIs. Keep your AccessKey pair out of source code and version control. For setup details, see AliyunServiceRoleForOpenSearch.

Limitations

Constraint                      Detail
Supported return formats        fullJSON and JSON only
Unsupported clauses             aggregate, distinct, and rank
Max documents per scroll page   500
start parameter behavior        Ignored; always starts from position 0
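
These limits can be checked on the client before a request is sent. The following is a minimal sketch, assuming you want to cap the page size and detect unsupported clauses yourself. ScrollLimits and its methods are hypothetical names for illustration and are not part of the OpenSearch SDK.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the OpenSearch SDK.
final class ScrollLimits {
    // Limits taken from the table above.
    static final int MAX_HITS_PER_PAGE = 500;
    static final List<String> UNSUPPORTED_CLAUSES = List.of("aggregate", "distinct", "rank");

    /** Caps the requested page size at the documented scroll maximum. */
    static int clampHits(int requestedHits) {
        if (requestedHits < 1) {
            throw new IllegalArgumentException("hits must be at least 1");
        }
        return Math.min(requestedHits, MAX_HITS_PER_PAGE);
    }

    /** Returns true if the clause name is one that scroll queries do not support. */
    static boolean isUnsupportedClause(String clauseName) {
        return UNSUPPORTED_CLAUSES.contains(clauseName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(clampHits(1200));                 // prints 500
        System.out.println(isUnsupportedClause("distinct")); // prints true
        System.out.println(isUnsupportedClause("filter"));   // prints false
    }
}
```

The value returned by clampHits could then be passed to config.setHits in the demo code.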

Set up environment variables

Store your AccessKey pair as environment variables before running the demo code.

Linux and macOS

Replace <access_key_id> and <access_key_secret> with your RAM user's AccessKey ID and AccessKey secret.

export ALIBABA_CLOUD_ACCESS_KEY_ID=<access_key_id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<access_key_secret>

Windows

  1. Add ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET as environment variables, with your RAM user's AccessKey ID and AccessKey secret as their values.

  2. Restart Windows for the changes to take effect.

For details on creating an AccessKey pair, see Create an AccessKey pair.
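
To confirm that both variables are visible to the Java process before running the demo, a small check like the following may help. This is a sketch only: CredentialCheck and require are hypothetical names, and the only real API used is System.getenv.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the OpenSearch SDK.
final class CredentialCheck {
    /** Returns the value stored under name, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            require(env, "ALIBABA_CLOUD_ACCESS_KEY_ID");
            require(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET");
            // Never print the values themselves; only report that they exist.
            System.out.println("Both AccessKey variables are set.");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Failing early with a clear message is usually easier to diagnose than an authentication error returned by the service.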

How scroll queries work

A scroll query runs in three phases:

  1. Initial request — Send a query with a DeepPaging object to get the first batch of results and a scroll ID.

  2. Subsequent requests — Pass the scroll ID from the previous response to retrieve the next batch. Repeat until the result set is empty.

  3. Expiration — Each scroll ID has a validity period (default: 1 minute). Reset the expiry before each request if you need more time.
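
The request cycle described above can be sketched without the SDK. In the example below, ScrollSource, Page, and inMemory are hypothetical stand-ins for the search service, and the scroll ID is simply the next offset. The real service issues its own scroll IDs and also enforces the validity period, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the scroll loop; none of these types exist in the OpenSearch SDK.
final class ScrollLoopSketch {
    /** One response: a batch of documents plus the ID to send with the next request. */
    static final class Page {
        final List<String> docs;
        final String scrollId;

        Page(List<String> docs, String scrollId) {
            this.docs = docs;
            this.scrollId = scrollId;
        }
    }

    /** Hypothetical stand-in for the service. A null scroll ID marks the initial request. */
    interface ScrollSource {
        Page fetch(String scrollId);
    }

    /** In-memory fake: the scroll ID is the offset of the next unread document. */
    static ScrollSource inMemory(List<String> all, int pageSize) {
        return scrollId -> {
            int from = scrollId == null ? 0 : Integer.parseInt(scrollId);
            int to = Math.min(from + pageSize, all.size());
            return new Page(new ArrayList<>(all.subList(from, to)), String.valueOf(to));
        };
    }

    /** Repeats requests, passing each returned scroll ID back, until a batch is empty. */
    static List<String> readAll(ScrollSource source) {
        List<String> collected = new ArrayList<>();
        String scrollId = null;
        while (true) {
            Page page = source.fetch(scrollId);
            if (page.docs.isEmpty()) {
                break; // An empty batch means the result set is exhausted.
            }
            collected.addAll(page.docs);
            scrollId = page.scrollId;
        }
        return collected;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            docs.add("doc-" + i);
        }
        System.out.println(readAll(inMemory(docs, 5)).size()); // prints 12
    }
}
```

Stopping on the first empty batch avoids hard-coding the number of iterations, which is useful when the total number of matching documents is not known in advance.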

Demo code

The following example retrieves all documents matching name:'opensearch' with cate_id<=3, sorted by id in descending order. With 5 documents per page and 25 total documents, the loop runs 6 times—the last iteration returns an empty result set.

All requests use the DeepPaging object to pass the scroll ID and set the validity period.

package com.aliyun.opensearch;

import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.SearcherClient;
import com.aliyun.opensearch.sdk.dependencies.com.google.common.collect.Lists;
import com.aliyun.opensearch.sdk.dependencies.org.json.JSONObject;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.search.*;
import com.aliyun.opensearch.sdk.generated.search.general.SearchResult;
import com.aliyun.opensearch.search.SearchParamsBuilder;
import java.nio.charset.Charset;

public class testScroll {

  // Scroll queries do not support the aggregate, distinct, or rank clause,
  // and support sorting by a single field only.
  private static String appName = "Name of the OpenSearch application that you want to manage";
  private static String host = "Endpoint of the OpenSearch API in your region";

  public static void main(String[] args) {
    // Read credentials from environment variables.
    // Configure the environment variables before running this code.
    String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
    String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");

    System.out.println(
      String.format("file.encoding: %s", System.getProperty("file.encoding"))
    );
    System.out.println(
      String.format("defaultCharset: %s", Charset.defaultCharset().name())
    );

    // Initialize the client.
    OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
    OpenSearchClient serviceClient = new OpenSearchClient(openSearch);
    SearcherClient searcherClient = new SearcherClient(serviceClient);

    // Configure the query: application name, page size, return format, and fields.
    Config config = new Config(Lists.newArrayList(appName));
    config.setStart(0);     // The start parameter is ignored for scroll queries; position always starts at 0.
    config.setHits(5);      // Return 5 documents per page.
    config.setSearchFormat(SearchFormat.FULLJSON);
    config.setFetchFields(
      Lists.newArrayList("id", "name", "phone", "int_arr", "literal_arr", "float_arr", "cate_id")
    );

    SearchParams searchParams = new SearchParams(config);
    searchParams.setQuery("name:'opensearch'");
    searchParams.setFilter("cate_id<=3");

    Sort sorter = new Sort();
    sorter.addToSortFields(new SortField("id", Order.DECREASE)); // Sort by id, descending.
    searchParams.setSort(sorter);

    // Create a DeepPaging object to enable scroll queries.
    // Set the scroll ID validity period to 3 minutes (default: 1 minute).
    DeepPaging deep = new DeepPaging();
    deep.setScrollExpire("3m");
    searchParams.setDeepPaging(deep);

    SearchParamsBuilder paramsBuilder = SearchParamsBuilder.create(searchParams);

    // Step 1: Send the initial scroll query to get the first scroll ID.
    SearchResult searchResult;
    try {
      searchResult = searcherClient.execute(paramsBuilder);
      String result = searchResult.getResult();
      JSONObject obj = new JSONObject(result);

      // Step 2: Use the scroll ID from each response to fetch the next batch.
      // With 25 documents and 5 per page, the 6th iteration returns an empty result set.
      for (int i = 1; i <= 6; i++) {
        // Pass the scroll ID returned by the previous response.
        deep.setScrollId(
          new JSONObject(obj.get("result").toString())
            .get("scroll_id")
            .toString()
        );
        deep.setScrollExpire("3m"); // Reset the validity period before each request.
        searchResult = searcherClient.execute(paramsBuilder);
        result = searchResult.getResult();
        obj = new JSONObject(result);

        System.out.println("Results for Query No." + i + ": " + obj.get("result"));

        // Sleep 1 second between requests to stay within QPS limits.
        try {
          Thread.sleep(1000);
        } catch (InterruptedException e) {
          // Restore the interrupt flag and stop scrolling.
          Thread.currentThread().interrupt();
          break;
        }
      }
    } catch (OpenSearchException e) {
      e.printStackTrace();
    } catch (OpenSearchClientException e) {
      e.printStackTrace();
    }
  }
}

To detect failures, check the error code and message in the response rather than the status field. For error details, see Error codes.
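
As a sketch of that check, the example below works on error entries that have already been parsed from the response. It assumes each entry carries a code and a message field; that layout, the ResponseErrors class, and the placeholder values are assumptions for illustration, so confirm the actual structure against Error codes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the "code" and "message" keys are assumed.
final class ResponseErrors {
    /** Returns true if the response carried at least one error entry. */
    static boolean hasErrors(List<Map<String, Object>> errors) {
        return errors != null && !errors.isEmpty();
    }

    /** Formats each error entry as "code: message", one entry per line. */
    static String describe(List<Map<String, Object>> errors) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, Object> error : errors) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(error.get("code")).append(": ").append(error.get("message"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values, not a real OpenSearch error.
        Map<String, Object> error = new HashMap<>();
        error.put("code", 1000);
        error.put("message", "placeholder message");
        List<Map<String, Object>> errors = List.of(error);

        if (hasErrors(errors)) {
            System.out.println(describe(errors)); // prints 1000: placeholder message
        }
    }
}
```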

What's next