Troubleshoot high memory usage in ApsaraDB for MongoDB

Memory usage is a critical metric for ApsaraDB for MongoDB instances. Learn how to check memory usage of an ApsaraDB for MongoDB instance, identify common causes, and apply optimization strategies.

Background information

When an ApsaraDB for MongoDB process starts, it loads binary files and dependent system libraries into memory and manages memory allocation and deallocation for tasks such as client connection management, request processing, and the storage engine. By default, ApsaraDB for MongoDB uses Google's tcmalloc as its memory allocator. Memory is primarily consumed by the WiredTiger storage engine and by client connections and request processing.

Check memory usage

  • Analyze monitoring charts

    You can view the memory usage of ApsaraDB for MongoDB on the Monitoring Information page in the ApsaraDB for MongoDB console. The node composition of ApsaraDB for MongoDB instances varies based on the instance architecture. You can select a node to view its memory usage.

    • Replica set architecture: Includes a primary node, one or more secondary nodes, a hidden node, and optionally one or more read-only nodes.

    • Sharded cluster architecture: The memory usage for each shard is similar to that of a replica set. The Config Server stores configuration metadata. The memory usage of a mongos routing node is related to the size of aggregation result sets, the connection count, and the size of metadata.

  • Use the command line

    Connect to the instance by using the mongo shell and run the db.serverStatus().mem command to view and analyze memory usage. The following code provides a sample output:

{ "bits" : 64, "resident" : 13116, "virtual" : 20706, "supported" : true }
// resident indicates the physical memory used by the mongod process, in MB.
// virtual indicates the virtual memory used by the mongod process, in MB.
Note

For more information about the output fields, see the serverStatus command reference.
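
The resident value is easier to interpret as a share of the instance's memory. The following is a minimal sketch, not an official tool: summarizeMem and instanceMemoryMB are hypothetical names, and the 16 GB instance size is an assumption chosen for illustration because serverStatus does not report the instance specification.

```javascript
// Summarize the `mem` section of serverStatus.
// `instanceMemoryMB` is a hypothetical input: the memory of your instance
// specification, which is not part of the serverStatus output.
function summarizeMem(mem, instanceMemoryMB) {
  const residentPct = (mem.resident / instanceMemoryMB) * 100;
  return {
    residentMB: mem.resident,
    virtualMB: mem.virtual,
    // Round to one decimal place for display.
    residentPct: Math.round(residentPct * 10) / 10,
  };
}

// Values from the sample output, with an assumed 16 GB instance.
const sampleMem = { bits: 64, resident: 13116, virtual: 20706, supported: true };
const memSummary = summarizeMem(sampleMem, 16 * 1024);
console.log(memSummary);
```

In the mongo shell, you would pass db.serverStatus().mem instead of the sample object.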

Common causes

Engine memory

Most of the memory in an ApsaraDB for MongoDB instance is used for the storage engine cache. For compatibility and security, ApsaraDB for MongoDB sets the WiredTiger CacheSize to approximately 60% of the instance's total allocated memory. For more information, see Product specifications.

If the storage engine cache usage reaches 95% of the configured CacheSize, the instance is under high load, and threads handling user requests begin to evict clean pages. If the dirty data in the storage engine cache exceeds 20% of the CacheSize, user threads also evict dirty pages. During this process, users may experience noticeable request blocking. For more information, see Eviction parameters in the References section.

Check engine memory usage with these methods:

  • Check the memory usage of the WiredTiger storage engine

    In the mongo shell, run the db.serverStatus().wiredTiger.cache command. The output field bytes currently in the cache shows the amount of memory used by the cache. The following code provides a sample output:

    {
       ......
       "bytes belonging to page images in the cache":6511653424,
       "bytes belonging to the cache overflow table in the cache":65289,
       "bytes currently in the cache":8563140208,
       "bytes dirty in the cache cumulative":NumberLong("369249096605399"),
       ......
    }
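
To compare the cache against the 95% and 20% thresholds described earlier, the byte counts can be converted to ratios of the configured cache size. The sketch below assumes the WiredTiger statistics "maximum bytes configured" and "tracked dirty bytes in the cache", which appear in the same output but are not shown in the sample; cacheRatios and toNumber are hypothetical helper names, and only "bytes currently in the cache" uses a value from the sample.

```javascript
// Convert serverStatus values (plain numbers or Long-like objects) to Number.
function toNumber(value) {
  return typeof value === "number" ? value : Number(value.toString());
}

// Express cache usage and dirty data as fractions of the configured cache size.
function cacheRatios(cache) {
  const max = toNumber(cache["maximum bytes configured"]);
  const used = toNumber(cache["bytes currently in the cache"]);
  const dirty = toNumber(cache["tracked dirty bytes in the cache"]);
  return { usedRatio: used / max, dirtyRatio: dirty / max };
}

// "bytes currently in the cache" comes from the sample output;
// the other two values are made up for illustration.
const sampleCache = {
  "maximum bytes configured": 10 * 1024 ** 3,
  "bytes currently in the cache": 8563140208,
  "tracked dirty bytes in the cache": 512 * 1024 ** 2,
};
const ratios = cacheRatios(sampleCache);
console.log(ratios.usedRatio.toFixed(3), ratios.dirtyRatio.toFixed(3));
```

In the mongo shell, you would pass db.serverStatus().wiredTiger.cache instead of the sample object.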

Connection and request memory

High connection counts consume significant memory for these reasons:

  • Thread stack overhead: Each connection has a corresponding backend thread to process its requests. Each thread can consume up to 1 MB of stack space, though typical usage is in the range of tens to hundreds of kilobytes.

  • TCP connection kernel buffers: At the kernel level, each TCP connection has read and write buffers, which are determined by TCP kernel parameters such as tcp_rmem and tcp_wmem. You do not need to manage this memory. However, more concurrent connections and a larger default socket buffer increase TCP memory consumption.

  • tcmalloc memory management: Each request allocates temporary buffers for packet handling and sorting. After completion, these buffers return to tcmalloc's internal cache rather than immediately to the OS. tcmalloc gradually releases cached memory, but unreleased memory can accumulate to tens of gigabytes.
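
As a rough illustration of the thread stack overhead listed above, the upper bound grows linearly with the connection count. This is a sketch under stated assumptions: maxThreadStackBytes is a hypothetical helper, the connection count is made up, and the 1 MB figure is the per-thread maximum quoted above, so typical usage is considerably lower.

```javascript
// Upper bound on thread stack memory: one thread per connection,
// up to 1 MB of stack per thread (the maximum quoted in the text).
const MAX_STACK_BYTES_PER_THREAD = 1024 * 1024;

function maxThreadStackBytes(connectionCount) {
  return connectionCount * MAX_STACK_BYTES_PER_THREAD;
}

// Hypothetical connection count. In the mongo shell, the live value is
// available as db.serverStatus().connections.current.
const connections = 5000;
const stackGiB = maxThreadStackBytes(connections) / 1024 ** 3;
console.log(`${connections} connections: up to ${stackGiB.toFixed(2)} GiB of stack`);
```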

Investigation methods:

  • Check connection usage

  • Check the amount of memory retained by tcmalloc

    Run the db.serverStatus().tcmalloc command to check the amount of memory that tcmalloc has not returned to the operating system. The tcmalloc cache size can be calculated by using the following formula: tcmalloc cache = pageheap_free_bytes + total_free_bytes. The following code provides a sample output:

    {
       ......
       "tcmalloc":{
               "pageheap_free_bytes":NumberLong("3048677376"),
               "pageheap_unmapped_bytes":NumberLong("544994184"),
               "current_total_thread_cache_bytes":95717224,
               "total_free_bytes":NumberLong(1318185960),
               ......
       }
    }
    Note

    For more information, see the tcmalloc memory allocator reference.
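
The formula above can be applied to the sample values as follows. This is a minimal sketch: tcmallocCacheBytes and toNumber are hypothetical helper names, and the second field is read as total_free_bytes, the name MongoDB uses in the serverStatus output.

```javascript
// Convert serverStatus values (plain numbers or Long-like objects) to Number.
function toNumber(value) {
  return typeof value === "number" ? value : Number(value.toString());
}

// tcmalloc cache = pageheap_free_bytes + total_free_bytes
function tcmallocCacheBytes(stats) {
  return toNumber(stats.pageheap_free_bytes) + toNumber(stats.total_free_bytes);
}

// Values from the sample output.
const sampleStats = {
  pageheap_free_bytes: 3048677376,
  total_free_bytes: 1318185960,
};
const cacheBytes = tcmallocCacheBytes(sampleStats);
console.log((cacheBytes / 1024 ** 3).toFixed(2) + " GiB retained by tcmalloc");
```

For the sample values, the result is 4,366,863,336 bytes, or about 4.07 GiB. In the mongo shell, you would pass db.serverStatus().tcmalloc.tcmalloc instead of the sample object.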

Metadata memory

A large number of databases, collections, and indexes in an ApsaraDB for MongoDB instance consumes significant memory due to associated metadata. Earlier versions of ApsaraDB for MongoDB may have the following issues:

  • In ApsaraDB for MongoDB versions earlier than 4.0, a full logical backup may open a large number of file handles. If these handles are not returned to the operating system promptly, memory usage can increase rapidly.

  • In ApsaraDB for MongoDB 4.0 and earlier, deleting a large number of collections may not properly remove the corresponding file handles, which can cause a memory leak.

Index creation memory

During normal data writes, a secondary node maintains a buffer of up to approximately 256 MB for oplog application. However, replicating index creation operations on a secondary node can consume more memory.

  • In ApsaraDB for MongoDB versions earlier than 4.2, index creation supports the background option. When you specify {background:true}, the index is built in the background. The replication of this operation is serial and can consume up to 500 MB of memory.

  • In ApsaraDB for MongoDB 4.2 and later, the background option is deprecated. Secondary nodes can replicate index creation operations in parallel, which consumes more memory. Creating multiple indexes at the same time may cause an out of memory (OOM) error.

Plan cache memory usage

In some scenarios, a single query may have a large number of candidate execution plans, which can cause the plan cache to consume a significant amount of memory.

Check plan cache memory usage: In ApsaraDB for MongoDB 4.0 and later, you can run the db.serverStatus().metrics.query.planCacheTotalSizeEstimateBytes command to check the memory usage.

Optimization strategies

Memory optimization aims to balance resource utilization and performance, not to minimize usage at all costs.

ApsaraDB for MongoDB specifies the CacheSize, which cannot be modified. Optimize memory usage with these strategies:

  • Control concurrent connections. By default, MongoDB drivers create a connection pool of up to 100 connections per client. If many clients connect, reduce the pool size per client. Keep total persistent connections below 1,000 to avoid excessive memory and context-switching overhead.

  • Reduce per-request memory overhead by creating indexes to avoid collection scans and in-memory sorting.

  • If memory usage remains high after query and connection optimization, upgrade the instance memory specification to prevent OOM errors and performance degradation from excessive cache eviction.

  • Accelerate memory release by tcmalloc. If the memory usage of your database instance exceeds 80%, you can adjust tcmalloc-related parameters on the Parameter Settings page in the console.

    1. Enable the tcmallocAggressiveMemoryDecommit parameter. This setting effectively resolves memory retention issues.

    2. If the issue persists, gradually increase the tcmallocReleaseRate value (for example, from 1 to 3, then to 5).

    Important
    • Adjust during off-peak hours. These parameters may degrade database performance. Revert immediately if your business is affected.

    • Monitor CPU usage, response time (RT), and opCounters before and after adjustment to assess impact.

  • Reduce the number of databases and collections by removing unnecessary collections and indexes, consolidating collections, splitting the instance, or migrating to a sharded cluster. For more information, see Instance performance slows down or becomes abnormal due to an excessive number of databases and tables.
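
For the connection guidance in the first strategy, one way to pick a per-client pool size is to divide the connection budget by the number of clients. The sketch below is illustrative only: maxPoolSizePerClient and the client count are hypothetical, the budget of 1,000 comes from the guidance above, and maxPoolSize is the standard connection string option for limiting a driver's pool.

```javascript
// Largest per-client pool size that keeps the total number of pooled
// connections below the budget. Returns at least 1, so with very many
// clients the total can still exceed the budget.
function maxPoolSizePerClient(clientCount, totalBudget = 1000) {
  return Math.max(1, Math.floor((totalBudget - 1) / clientCount));
}

const clientCount = 40; // hypothetical number of application processes
const poolSize = maxPoolSizePerClient(clientCount);

// <host> and <port> are placeholders for your instance endpoint.
const uri = `mongodb://<host>:<port>/?maxPoolSize=${poolSize}`;
console.log(uri);
```

With 40 clients, the default pool size of 100 allows up to 4,000 connections, whereas a pool size of 24 caps the total at 960.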

Note

If you encounter other memory-related issues while using ApsaraDB for MongoDB, contact Alibaba Cloud technical support.

References

Eviction parameters

| Parameter | Default | Description |
| --- | --- | --- |
| eviction_target | 80% | When cache usage exceeds the eviction_target, background eviction threads start evicting clean pages. |
| eviction_trigger | 95% | When cache usage exceeds the eviction_trigger, user threads also start evicting clean pages. |
| eviction_dirty_target | 5% | When the dirty cache ratio exceeds the eviction_dirty_target, background eviction threads start evicting dirty pages. |
| eviction_dirty_trigger | 20% | When the dirty cache ratio exceeds the eviction_dirty_trigger, user threads also start evicting dirty pages. |
| eviction_updates_target | 2.5% | When the cache update ratio exceeds the eviction_updates_target, background eviction threads start evicting memory fragments related to small objects. |
| eviction_updates_trigger | 10% | When the cache update ratio exceeds the eviction_updates_trigger, user threads also start evicting memory fragments related to small objects. |
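
The thresholds for clean and dirty pages can be read as a simple decision rule. The following sketch encodes the default values listed above as fractions of CacheSize. The function evictionState and its field names are hypothetical, and the two input ratios are assumed to have been computed from the cache statistics beforehand.

```javascript
// Default thresholds from the eviction parameters, as fractions of CacheSize.
const EVICTION_DEFAULTS = {
  eviction_target: 0.80,
  eviction_trigger: 0.95,
  eviction_dirty_target: 0.05,
  eviction_dirty_trigger: 0.20,
};

// Report which threads are expected to evict which kind of page.
function evictionState(usedRatio, dirtyRatio, t = EVICTION_DEFAULTS) {
  return {
    backgroundEvictsClean: usedRatio > t.eviction_target,
    userThreadsEvictClean: usedRatio > t.eviction_trigger,
    backgroundEvictsDirty: dirtyRatio > t.eviction_dirty_target,
    userThreadsEvictDirty: dirtyRatio > t.eviction_dirty_trigger,
  };
}

// Moderate load: only background threads evict clean pages.
console.log(evictionState(0.85, 0.03));
// Heavy load: user threads take part in eviction and requests may block.
console.log(evictionState(0.97, 0.25));
```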

FAQ

Q: How do I increase the memory limit for aggregation operations in MongoDB?

A: ApsaraDB for MongoDB does not allow you to directly increase the memory limit for aggregation operations. Each stage in a MongoDB aggregation pipeline has a 100 MB memory limit. If a stage exceeds this limit, the system returns an error. To resolve this issue, explicitly specify the {allowDiskUse:true} option when you run the aggregation so that stages that exceed the limit write temporary data to disk. MongoDB 6.0 and later support the global parameter allowDiskUseByDefault, which applies this behavior by default: when an aggregation operation requires excessive memory, MongoDB automatically uses temporary disk space to avoid high memory consumption. For other strategies to reduce memory usage, see Optimization strategies.
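
As an illustration of the allowDiskUse option, the sketch below builds an aggregate command document with the option enabled. The helper aggregateWithDiskUse, the orders collection, and its fields are hypothetical. The command fields themselves (aggregate, pipeline, cursor, allowDiskUse) are those of the MongoDB aggregate command.

```javascript
// Build an `aggregate` command document with allowDiskUse enabled.
function aggregateWithDiskUse(collectionName, pipeline) {
  return {
    aggregate: collectionName,
    pipeline: pipeline,
    cursor: {},
    allowDiskUse: true,
  };
}

// Hypothetical collection and field names.
const command = aggregateWithDiskUse("orders", [
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
]);
console.log(JSON.stringify(command, null, 2));
```

In the mongo shell, you could run this document with db.runCommand(command), or pass the option directly, as in db.orders.aggregate(pipeline, { allowDiskUse: true }).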