Cloud database query optimization-Enterprise Mobile Application Studio(EMAS)-阿里云帮助中心

The EMAS Serverless cloud database uses MongoDB. When the data volume is large, query operations can sometimes fail with an operation exceeded time limit error. This topic describes how to optimize queries to avoid this issue.

Set appropriate indexes

If your query operation includes filter conditions, such as equality tests and range filters, or sorting, add an index to the collection.

Note

When you create an index, remove non-selective equality test or range filter fields to reduce the index size.

Follow these guidelines to create an index for complex queries that include equality tests, range filters, and sorting.

The order of fields in the index must be: equality test fields, sort fields, and then range filter fields.
If the equality test includes multiple fields, their order can be interchanged. The ascending or descending order of these fields in the index does not matter.
If there are multiple sort fields, add them to the index in the same order as the sort operation. The ascending or descending order must also match.
If there are multiple range filter fields, place the field with the lower cardinality (the number of distinct values for the field in the collection) first.

Note

If a collection has multiple indexes, MongoDB might not select the correct one for a complex query. You can specify the index to use in your query statement.

The following example shows how to create a compound index that includes course, sex, class, name, birthmonth, and score. The fields course and sex are used for equality tests. They must be placed first, and their order can be interchanged. The fields birthmonth and score are used for range filters and must be placed last. Because the value range for birthmonth is 1 to 12 and the range for score is 0 to 100, place birthmonth before score. The fields class and name are used for sorting and should be placed in the middle. Their order and sort direction must match the sort operation.

// In the score table, query for records of male students born in the second half of the year with a math score greater than 80, and sort by class and name.
mpserverless.db.collection('score').find(
  {
    course: "Math",
    sex: "male",
    birthmonth: { $gt: 6 },
    score: { $gt: 80 },
  },
  {
    sort: { class: 1, name: 1 },
  },
);

Large data volume query optimization

If queries still time out for a large data volume after you set appropriate indexes, consider the following optimization methods.

Avoid using `skip`, especially with large values. The MongoDB server-side still scans the skipped data, so the time taken is linearly related to the number of skipped documents. Instead of `skip`, consider using sorting and range queries.
For very large datasets, query the data in segments. This involves splitting a single query into multiple query operations based on specific conditions.

Calculate count in segments

You can calculate the count in segments by combining `findOne`, `RangeQuery`, `Skip`, and `Sort`. This method requires an appropriate index.

The code example below shows how to use `_id` as the segmentation column. First, call the `findOne` method to query for the `_id` value of the 100,001st, 200,001st, 300,001st record, and so on. When fewer than 100,001 records remain, `findOne` returns an empty result. Then, use `count` to retrieve the number of remaining records. When you call `findOne`, specify a query condition where `_id` is greater than or equal to a specific value, sort by `_id` in ascending order, and skip 100,000 records. When you call `count`, specify that the `_id` value must be greater than the `_id` returned by the last `findOne` call.

module.exports = async (ctx) => {
    const skip = 100000;
    let count = 0;
    let minId = { $minKey: 1 };
    const collection = ctx.mpserverless.db.collection('collectionName');
    while (true) {
        const query = { _id: { $gte: minId } };
        const options = {
            skip,
            projection: { _id: 1 },
            sort: { _id: 1 },
        };
        const findOneResult = await collection.findOne(query, options);
        const { affectedDocs, success } = findOneResult;
        if (!success) {
            throw new Error("findOne return success false");
        }
        if (affectedDocs > 0) {
            const newId = findOneResult.result._id;
            minId = newId;
            count += skip;
        } else {
            break;
        }
    }
    const query = { _id: { $gte: minId } };
    const countResult = await collection.count(query);
    const { affectedDocs, success } = countResult;
    if (!success) {
        throw new Error("count return success false");
    }
    count += affectedDocs;
    return count;
};

Traverse an entire collection

Avoid using `find` with `Skip` and `Limit` to traverse an entire collection. As the number of skipped documents increases, the response time slows and can cause request timeouts. Instead, use `find` with `RangeQuery`, `Limit`, and `Sort`.

The following code provides an example. The data is sorted by `_id`. Each query specifies a condition where the `_id` is greater than the `_id` of the last record from the previous query result. This process is repeated until the end of the collection is reached.

module.exports = async (ctx) => {
    const pageSize = 100;
    let minId = { $minKey: 1 };
    const collection = ctx.mpserverless.db.collection('collectionName');
    while (true) {
        const query = { _id: { $gt: minId } };
        const options = {
            limit: pageSize,
            sort: { _id: 1 },
        };
        const findResult = await collection.find(query, options);
        const { affectedDocs, success } = findResult;
        if (!success) {
            throw new Error("find return success false");
        }
        if (affectedDocs > 0) {
            const data = findResult.result;
            const lastDoc = data[data.length -1];
            minId = lastDoc._id;
            // TODO: Add the logic to process the data list here.
        } else {
            break;
        }
    }
    return 'success';
};

Paged query optimization

Use paging features such as "Previous Page" and "Next Page" instead of jumping to a random page. Consider the paging feature in Google search results. When there are many pages of results, the total page count is not shown. Random page access is only supported for the first few pages, such as the first 10. For subsequent pages, random access is disabled. You can only move forward a small number of pages. This lets you quickly retrieve the next set of results by building on the previous query. You can use `find` with `RangeQuery`, a small `Skip` value, `Limit`, and `Sort`. For example, after you query the data for page 13, you can retrieve the data for page 17. To do this, use the last record from page 13 as the `RangeQuery` condition, skip three pages worth of documents, and then query for one page of data.