System configuration optimization


Alibaba Cloud Linux 3 ships with a set of pre-tuned kernel parameters optimized for cloud workloads. This document describes those defaults and the common parameters you may need to adjust for your specific workload.

Important

Adjust kernel parameters only when you have observed data that justifies the change. Understand what each parameter does before modifying it — parameter behavior can vary across kernel versions and environment types.

Optimized configurations for Alibaba Cloud Linux 3

These parameters are pre-configured in Alibaba Cloud Linux 3. The values listed are the optimized defaults applied by the OS.
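You can compare a running system against the values in the tables below by reading each parameter from /proc/sys. A dotted sysctl name such as net.ipv4.tcp_syn_retries maps to the file /proc/sys/net/ipv4/tcp_syn_retries. The following Python sketch assumes that mapping holds for every name you check, so it does not handle interface names that themselves contain dots. The EXPECTED dictionary is an illustrative subset copied from this document. A missing file is reported as None, which is the expected result for the Alibaba Cloud Linux 3 custom parameters on kernels that do not provide them.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

PROC_SYS = Path("/proc/sys")

# Illustrative subset of the optimized values listed in this document.
EXPECTED = {
    "net.ipv4.tcp_synack_retries": "2",
    "net.ipv4.tcp_slow_start_after_idle": "0",
    "net.ipv4.tcp_syn_retries": "4",
    "net.ipv4.tcp_retries2": "8",
    "net.ipv4.tcp_max_tw_buckets": "5000",
    "net.ipv4.tcp_syncookies": "1",
    "net.ipv4.tcp_tw_timeout": "60",  # Alibaba Cloud Linux 3 custom parameter
}


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name to its file, e.g. net.ipv4.x -> net/ipv4/x.

    Assumes no name component contains a literal dot (such as a VLAN
    interface named eth0.100), which this simple mapping cannot express.
    """
    return root.joinpath(*name.split("."))


def read_sysctl(name: str, root: Path = PROC_SYS) -> Optional[str]:
    """Return the value with whitespace collapsed, or None if the file is absent."""
    try:
        return " ".join(sysctl_path(name, root).read_text().split())
    except FileNotFoundError:
        return None


def compare(expected: Dict[str, str], root: Path = PROC_SYS) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, actual)} for every parameter that differs."""
    diffs = {}
    for name, want in expected.items():
        have = read_sysctl(name, root)
        if have != want:
            diffs[name] = (want, have)
    return diffs


if __name__ == "__main__":
    for name, (want, have) in compare(EXPECTED).items():
        print(f"{name}: expected {want}, found {have}")
```

Multi-value parameters such as net.ipv4.ip_local_port_range are normalized to single spaces, so they can be compared with the values as written in the tables.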

Performance improvement

Parameter | Value | Description
net.ipv4.tcp_timeout_init | 1000 | Initial TCP retransmission timeout, in milliseconds. Minimum value: 2 HZ.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_timeout_init | 1000 | Initial timeout for SYN-ACK retransmission, in milliseconds. Minimum value: 2 HZ. After the first retransmission, the timeout doubles.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_timeout_max | 120000 | Maximum SYN-ACK retransmission timeout, in milliseconds. Minimum value: 2 HZ. Each retransmission doubles the timeout, starting from tcp_synack_timeout_init, up to this cap.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_ato_min | 40 | Minimum ACK timeout, in milliseconds. Valid values: 4–200 ms.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_init_cwnd | 10 | Initial TCP congestion window size, in segments.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_synack_retries | 2 | Number of SYN-ACK retransmissions when the server does not receive the final ACK of the handshake. With the 1,000 ms initial SYN-ACK timeout, the two retransmissions are sent after about 1 and 3 seconds, and the connection is dropped after approximately 7 seconds.
net.ipv4.tcp_slow_start_after_idle | 0 | Controls whether slow start restarts after a TCP connection becomes idle. 0 = disabled, which preserves the congestion window across idle periods; 1 = enabled. For long-lived connections with intermittent traffic bursts, set this to 0 to avoid throughput penalties after short idle gaps.
/sys/kernel/mm/transparent_hugepage/hugetext_enabled | 0 | Controls the Hugetext feature, which maps the code segments of binaries and dynamic libraries with huge pages to reduce iTLB misses. Valid values: 0 = disabled; 1 = huge pages for binaries and dynamic libraries only; 2 = executable anonymous huge pages only; 3 = both. Consider enabling Hugetext for workloads with large code segments, such as databases and large applications.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.
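The SYN-ACK timing in this table can be checked with some arithmetic. The following Python sketch is a simplified model and not the kernel's timer code. It assumes three things:

- the first retransmission fires after tcp_synack_timeout_init;
- each later timeout doubles until it reaches tcp_synack_timeout_max;
- after its last retransmission, the server waits one more timeout before dropping the half-open connection.

```python
def synack_schedule(init_ms: int, max_ms: int, retries: int):
    """Return (retransmission times in seconds, drop time in seconds).

    Simplified model of the SYN-ACK backoff: the timeout starts at init_ms
    (tcp_synack_timeout_init), doubles after every retransmission, and never
    exceeds max_ms (tcp_synack_timeout_max).
    """
    elapsed_ms = 0
    timeout_ms = init_ms
    sent_at = []
    for _ in range(retries):
        elapsed_ms += timeout_ms            # wait for the current timeout to expire
        sent_at.append(elapsed_ms / 1000)   # then retransmit the SYN-ACK
        timeout_ms = min(timeout_ms * 2, max_ms)
    # After the last retransmission the server waits one more timeout for the
    # final ACK, then drops the half-open connection.
    return sent_at, (elapsed_ms + timeout_ms) / 1000


if __name__ == "__main__":
    # Values from the table: tcp_synack_timeout_init=1000,
    # tcp_synack_timeout_max=120000, tcp_synack_retries=2.
    print(synack_schedule(1000, 120000, 2))  # ([1.0, 3.0], 7.0)
    # With more retries, the 120 s cap applies from the eighth timeout onward.
    print(synack_schedule(1000, 120000, 10)[1])  # 607.0
```

The first call reproduces the roughly 7-second figure given for net.ipv4.tcp_synack_retries. The second call shows when tcp_synack_timeout_max starts to matter.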

Resource utilization improvement

Parameter | Value | Description
net.ipv4.tcp_syn_retries | 4 | Number of SYN retransmissions when the client does not receive a SYN-ACK. With an initial retransmission timeout (RTO) of 1 second, the four retransmissions are sent within approximately 15 seconds, and the connection attempt times out after about 31 seconds.
net.ipv4.tcp_retries2 | 8 | Maximum number of retransmissions for an established TCP connection that stops receiving ACKs. With an initial RTO of 200 ms, the eight retransmissions are sent within approximately 51 seconds, and the connection times out after about 102 seconds.
net.ipv4.tcp_tw_timeout | 60 | Timeout for a TCP socket in TIME_WAIT state, in seconds. Valid values: 1–600 seconds. For more information, see Modify the TCP TIME-WAIT timeout period.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later.

net.ipv4.tcp_max_tw_buckets | 5000 | Maximum number of TCP sockets allowed in TIME_WAIT state at the same time. Capping TIME_WAIT sockets keeps them from exhausting the ephemeral ports defined by net.ipv4.ip_local_port_range, which would make new connect() calls fail. When the limit is reached, additional sockets skip TIME_WAIT and are closed immediately, which produces "TCP: time wait bucket table overflow" errors. Increase this value if these errors appear frequently. For details, see Why do many "TCP: time wait bucket table overflow" errors occur on a Linux ECS instance?
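The timeouts quoted for net.ipv4.tcp_syn_retries and net.ipv4.tcp_retries2 follow from exponential backoff: each retransmission waits twice as long as the previous one. The following Python sketch reproduces those figures under that simplified model. It assumes that the RTO doubles each time, limited only by the kernel's 120-second RTO ceiling (TCP_RTO_MAX). It also ignores RTT measurements that would change the real RTO.

```python
def backoff_timeline(initial_rto_s: float, retries: int, max_rto_s: float = 120.0):
    """Return (time of the last retransmission, time the attempt is abandoned).

    Each retransmission fires when the current RTO expires; the RTO then
    doubles, capped at max_rto_s. After the last retransmission, one more
    RTO elapses before the connection is given up.
    """
    elapsed = 0.0
    rto = initial_rto_s
    for _ in range(retries):
        elapsed += rto
        rto = min(rto * 2, max_rto_s)
    return elapsed, elapsed + rto


if __name__ == "__main__":
    # tcp_syn_retries = 4 with a 1 s initial RTO
    print(backoff_timeline(1.0, 4))  # (15.0, 31.0)
    # tcp_retries2 = 8 with a 200 ms initial RTO
    last, give_up = backoff_timeline(0.2, 8)
    print(round(last, 1), round(give_up, 1))  # 51.0 102.2
```

The 200 ms case is rounded because sums of 0.2-second multiples are not exact in binary floating point.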

Network security

Parameter | Value | Description
net.ipv4.conf.all.rp_filter | 0 | Reverse path filtering for all current network interface cards (NICs). Valid values: 0 = disabled; 1 = strict (discard a packet if its reverse path does not match the receiving interface); 2 = loose (discard a packet only if its source address is unreachable through any interface).
Warning

Setting this to 1 causes packet loss in multi-NIC systems where inbound and outbound traffic uses different NICs. Do not enable strict mode in multi-NIC environments.

net.ipv4.conf.default.rp_filter | 0 | Reverse path filtering applied to newly added NICs. Same valid values and warning as net.ipv4.conf.all.rp_filter.
net.ipv4.conf.default.arp_announce | 2 | Source IP selection for ARP requests sent from newly added NICs. Valid values: 0 = any local address on any interface; 1 = prefer a source IP in the same subnet as the destination; 2 = always use the best local address for the destination. This is a primary address on the outbound interface whose subnet includes the destination IP; if no such address exists, another local address is used.
net.ipv4.conf.all.arp_announce | 2 | Source IP selection for ARP requests sent from all current NICs. Same valid values as net.ipv4.conf.default.arp_announce.
net.ipv4.tcp_syncookies | 1 | SYN flood protection. Valid values: 0 = disabled; 1 = enabled (SYN cookies are sent only when the SYN backlog is full); 2 = unconditionally enabled (for testing only).
Important

SYN cookies are a fallback mechanism, not a solution for overloaded servers. If SYN flood warnings appear in your logs but the source is legitimate traffic rather than an attack, tune net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and net.ipv4.tcp_synack_retries instead. Note that SYN cookies disable TCP options such as window scaling and timestamps, which can degrade performance for some services.
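The difference between strict and loose reverse path filtering is easiest to see with asymmetric routing. The following Python sketch models the check with a longest-prefix-match lookup and is not the kernel's implementation. ROUTES and all addresses are hypothetical examples. The sketch also encodes one documented kernel rule: for a given NIC, the effective mode is the larger of conf.all.rp_filter and conf.<NIC>.rp_filter.

```python
import ipaddress

# Hypothetical routing table for a two-NIC instance: (destination, outbound NIC).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),  # default route
]


def route_out(addr, routes=ROUTES):
    """Longest-prefix match: return the NIC used to reach addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [(net.prefixlen, nic) for net, nic in routes if ip in net]
    return max(matches)[1] if matches else None


def effective_rp_filter(all_value: int, nic_value: int) -> int:
    """The kernel applies the larger of conf.all and conf.<NIC> rp_filter."""
    return max(all_value, nic_value)


def accepts(src, in_nic, mode, routes=ROUTES) -> bool:
    """Simplified source validation for a packet from src arriving on in_nic."""
    if mode == 0:
        return True                    # filtering disabled
    back = route_out(src, routes)
    if mode == 1:
        return back == in_nic          # strict: the reverse path must use in_nic
    return back is not None            # loose: src reachable through any NIC


if __name__ == "__main__":
    # A packet from 10.0.0.5 arrives on eth1, but the route back uses eth0.
    for mode in (0, 1, 2):
        print(mode, accepts("10.0.0.5", "eth1", mode))
    # 0 True / 1 False / 2 True: strict mode drops the asymmetric traffic.
```

This is the multi-NIC packet loss described in the warning for net.ipv4.conf.all.rp_filter. Inbound traffic on eth1 whose reverse route points at eth0 is discarded in strict mode but accepted in loose mode.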

Other common system configurations for Alibaba Cloud Linux 3

These parameters ship with upstream defaults. Use them as a reference when diagnosing performance or resource issues — adjust only when you have observed data that justifies a change.

Performance improvement

Parameter | Default value | Description
net.ipv4.ip_local_port_range | 32768 60999 | Ephemeral port range for outbound TCP/UDP connections. When most ports in this range are in use, the kernel's linear search for a free port increases CPU utilization. Widen this range if you observe high CPU utilization from port exhaustion, or if connect() calls start returning EADDRNOTAVAIL.
net.ipv4.tcp_rmem | 4096 131072 6291456 | Per-TCP-socket receive buffer size, in bytes: minimum, default, and maximum. The default is independent of instance type. Increase these values on high-memory instances with sustained high-bandwidth connections.
Important

Setting a very large maximum can consume significant memory, because each socket can use up to the maximum value. For example, 1 million sockets at 6 MiB each could require up to 6 million MiB (roughly 5.7 TiB) of buffer space.

net.ipv4.tcp_wmem | 4096 16384 4194304 | Per-TCP-socket send buffer size, in bytes: minimum, default, and maximum. Same tuning guidance as net.ipv4.tcp_rmem.
net.core.netdev_max_backlog | 1000 | Maximum length of the per-CPU socket buffer (skb) queue used for receive packet steering (RPS) and loopback or veth traffic. Increase this if you see dropped packets on high-throughput loopback or veth interfaces.
net.core.somaxconn | 4096 | Maximum listen backlog queue length per socket. For applications like NGINX that handle large numbers of short-lived connections, increase this value. To check whether tuning is needed, run ss -ntl and compare Recv-Q (current backlog) with Send-Q (socket backlog limit). If Recv-Q approaches Send-Q, increase this parameter.
net.core.rmem_max | 212992 | Maximum receive socket buffer size, in bytes. For TCP, this cap applies only when an application calls setsockopt(SO_RCVBUF) explicitly; otherwise net.ipv4.tcp_rmem controls the limit. For a UDP socket that serves many clients, increase this value.
net.core.wmem_max | 212992 | Maximum send socket buffer size, in bytes. For TCP, this cap applies only when an application calls setsockopt(SO_SNDBUF) explicitly; otherwise net.ipv4.tcp_wmem controls the limit.
/sys/block/<device>/queue/nomerges | 0 | Controls I/O merge behavior for the device. Valid values: 0 = all merge types enabled; 1 = only simple one-shot merges (complex merges disabled); 2 = all merges disabled. Most workloads benefit from merging. For workloads with purely random I/O, where mergeable requests are rare, set this to 2 to save the CPU cycles spent checking for merges.
/sys/block/<device>/queue/read_ahead_kb | 4096 | Read-ahead size for sequential reads, in KB. The kernel default is 128 KB; the tuned service increases it to 4,096 KB. For sequential workloads (large file reads, log processing), keep the higher value or increase it further. For random I/O workloads, reduce it to 128 KB to avoid prefetching data that will not be used.
/sys/block/<device>/queue/rq_affinity | 1 | Controls which CPU handles I/O completion.

For rq_affinity, the trade-offs between values are:

Value | Behavior | Best for
0 | Completion runs on the CPU that triggered the interrupt | Lowest latency for interrupt-heavy workloads
1 | Completion runs on any CPU in the same socket as the submitter (cache-friendly, but the first CPU in the group gets a higher load) | Most workloads (default and recommended)
2 | Completion runs on the exact CPU that submitted the I/O (balanced CPU load, slightly lower efficiency than 1) | High-concurrency workloads with many cores
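The /sys/block/<device>/queue attributes covered in this section (nomerges, read_ahead_kb, rq_affinity, and scheduler) are plain text files, so they are easy to inspect together. In the scheduler file, the active scheduler is the one shown in square brackets, for example [mq-deadline] kyber bfq none. The following Python sketch reads these attributes for every block device under /sys/block. It skips devices that lack any of the attributes, and it only reads values.

```python
from pathlib import Path


def active_scheduler(text: str) -> str:
    """Return the bracketed (active) entry from a queue/scheduler file."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler in {text!r}")


def queue_settings(device: str, sysfs: Path = Path("/sys/block")) -> dict:
    """Read the block-queue attributes discussed in this section for one device."""
    queue = sysfs / device / "queue"

    def read_int(name: str) -> int:
        return int((queue / name).read_text())

    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "nomerges": read_int("nomerges"),
        "read_ahead_kb": read_int("read_ahead_kb"),
        "rq_affinity": read_int("rq_affinity"),
    }


if __name__ == "__main__":
    for dev in sorted(p.name for p in Path("/sys/block").glob("*")):
        try:
            print(dev, queue_settings(dev))
        except (OSError, ValueError) as exc:
            print(dev, "skipped:", exc)
```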
Parameter | Default value | Description
/sys/block/<device>/queue/scheduler | mq-deadline (single queue) or none (multiple queues) | I/O scheduler. Alibaba Cloud Linux 3 supports mq-deadline, kyber, bfq, and none. The blk-mq layer selects mq-deadline for single-queue devices and none for multi-queue devices. For workloads that need low read latency, switch to kyber and configure its target latency values.
/sys/kernel/mm/pagecache_limit/enabled | 0 | Enables or disables the page cache limit feature system-wide. 0 = disabled; 1 = enabled.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.

/sys/fs/cgroup/memory/memory.pagecache_limit.enable | 0 | Enables or disables the page cache limit feature for a specific memcg. 0 = disabled for this memcg; 1 = enabled.
/sys/fs/cgroup/memory/memory.pagecache_limit.size | 0 | Page cache usage cap for the current memcg tree, in bytes. Valid values: 0 to the value of memory.limit_in_bytes for the current memcg. 0 disables the page cache limit for this memcg regardless of the global or per-memcg switch; a non-zero value sets the upper limit of page cache usage for the memcg tree.
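The net.core.somaxconn check in this table compares Recv-Q with Send-Q in ss -ntl output, and it can be scripted. The following Python sketch parses that output and flags listening sockets whose accept queue is at or above a threshold fraction of the backlog limit. The 0.8 threshold is an arbitrary assumption, not a documented value.

```python
import shutil
import subprocess


def saturated_listeners(ss_output: str, ratio: float = 0.8):
    """Return (local_address, recv_q, send_q) for listeners near their backlog limit.

    For LISTEN sockets, ss reports the current accept-queue length in Recv-Q
    and the backlog limit in Send-Q.
    """
    flagged = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # header row or unexpected line
        recv_q, send_q = int(fields[1]), int(fields[2])
        if send_q and recv_q >= ratio * send_q:
            flagged.append((fields[3], recv_q, send_q))
    return flagged


if __name__ == "__main__":
    if shutil.which("ss"):
        out = subprocess.run(["ss", "-ntl"], capture_output=True, text=True).stdout
        for addr, recv_q, send_q in saturated_listeners(out):
            print(f"{addr}: Recv-Q {recv_q} / Send-Q {send_q}")
```

Raising net.core.somaxconn alone may not change Send-Q. The kernel uses the smaller of the application's listen() backlog and somaxconn, so the application may also need a larger backlog, for example the backlog parameter of NGINX's listen directive.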

Network security

Parameter | Default value | Description
net.ipv4.conf.all.arp_ignore | 0 | Controls ARP reply behavior for all current NICs. Valid values: 0 = reply to ARP requests for any local IP, including loopback addresses, regardless of which NIC receives the request; 1 = reply only if the target IP is configured on the receiving NIC; 2 = reply only if the target IP is on the receiving NIC and the source IP is in the same subnet. For example, if eth0 receives an ARP request for the IP of eth1, eth0 replies with value 0 but not with value 1 or 2.
net.ipv4.conf.default.arp_ignore | 0 | ARP reply behavior for newly added NICs. Same valid values and behavior as net.ipv4.conf.all.arp_ignore.
net.ipv4.ip_forward | 0 | Enables or disables IPv4 packet forwarding. 0 = disabled; 1 = enabled. Enable this when the instance acts as a router or NAT gateway.
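The arp_ignore example in this table, where eth0 receives a request for the IP address of eth1, can be modelled in a few lines. The following Python sketch uses hypothetical addresses for eth0 and eth1 and covers only modes 0 to 2 as described above. It ignores loopback addresses and other kernel details.

```python
import ipaddress

# Hypothetical NIC configuration: NIC name -> addresses with prefix length.
NICS = {
    "eth0": [ipaddress.ip_interface("192.168.0.10/24")],
    "eth1": [ipaddress.ip_interface("192.168.0.11/24")],
}


def replies(mode: int, in_nic: str, target_ip: str, sender_ip: str, nics=NICS) -> bool:
    """Decide whether in_nic answers an ARP request for target_ip from sender_ip."""
    target = ipaddress.ip_address(target_ip)
    sender = ipaddress.ip_address(sender_ip)
    on_in_nic = [a for a in nics[in_nic] if a.ip == target]
    if mode == 0:
        # Any local address on any NIC qualifies.
        return any(a.ip == target for addrs in nics.values() for a in addrs)
    if mode == 1:
        # The target must be configured on the receiving NIC.
        return bool(on_in_nic)
    if mode == 2:
        # The target must be on the receiving NIC and the sender in its subnet.
        return any(sender in a.network for a in on_in_nic)
    raise ValueError("this sketch models modes 0-2 only")


if __name__ == "__main__":
    # eth0 receives a request for eth1's address, as in the table's example.
    for mode in (0, 1, 2):
        print(mode, replies(mode, "eth0", "192.168.0.11", "192.168.0.50"))
    # 0 True / 1 False / 2 False
```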

Resource utilization

Parameter | Default value | Description
net.ipv4.tcp_fin_timeout | 60 | Duration a TCP connection stays in FIN_WAIT2 state after the local side initiates a close, in seconds. The default of 60 seconds is appropriate for most workloads. If you observe a large number of FIN_WAIT2 connections (check with `netstat -ant | grep FIN_WAIT2 | wc -l`), reduce this value to reclaim ports faster. For more information, see Why does a Linux ECS instance have many TCP connections in the FIN_WAIT2 state?
net.ipv4.tcp_tw_reuse | 2 | Controls reuse of TIME_WAIT sockets for new connections. 0 = disabled; 1 = globally enabled; 2 = enabled for loopback traffic only.
net.ipv4.tcp_keepalive_time | 7200 | Idle time, in seconds, before TCP sends the first keepalive probe on a connection that has keepalive enabled. Keepalive probes confirm that the remote end of an idle connection is still reachable. The interval between subsequent probes is set by net.ipv4.tcp_keepalive_intvl.
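The netstat pipeline given for net.ipv4.tcp_fin_timeout depends on net-tools, which is not always installed. As an alternative, the following Python sketch counts connections per state directly from /proc/net/tcp and /proc/net/tcp6. The fourth column (st) of these files holds the state as a hexadecimal code; for example, 05 is FIN_WAIT2 and 06 is TIME_WAIT. The counts cover only sockets in the current network namespace.

```python
from collections import Counter
from pathlib import Path

# Hexadecimal codes used in the "st" column of /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(text: str) -> Counter:
    """Count sockets per TCP state in the contents of one /proc/net/tcp* file."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            st = fields[3].upper()
            counts[TCP_STATES.get(st, st)] += 1
    return counts


def count_all(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> Counter:
    """Sum the per-state counts over IPv4 and IPv6 sockets."""
    total = Counter()
    for path in paths:
        try:
            total.update(count_states(Path(path).read_text()))
        except OSError:
            pass  # for example, IPv6 is disabled
    return total


if __name__ == "__main__":
    states = count_all()
    print("FIN_WAIT2:", states["FIN_WAIT2"], "TIME_WAIT:", states["TIME_WAIT"])
```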

System limits

Parameter | Default value | Description
fs.aio-max-nr | 65536 | Maximum number of concurrent asynchronous I/O (AIO) requests system-wide. The kernel accumulates the nr_events argument of each io_setup() call into aio-nr. If aio-nr + nr_events > aio-max-nr, io_setup() returns -EAGAIN. Increase this for database or search workloads that rely heavily on Linux AIO. Monitor aio-nr to determine the right value for your environment.
fs.file-max | Set based on reserved memory at boot | Maximum number of file handles the kernel allows system-wide. Up to 10% of reserved memory can be used for file handles. The minimum is 8,192 (the NR_FILE value). Increase this only if processes fail with "too many open files" at the system level.
fs.nr_open | 1048576 | Maximum number of open file handles per process. The per-process limit set by ulimit -n (RLIMIT_NOFILE) cannot exceed this value. Increase fs.nr_open before raising ulimit -n beyond 1,048,576.
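The io_setup() rule for fs.aio-max-nr can be turned into a headroom check. The following Python sketch reads fs.aio-nr and fs.aio-max-nr from /proc/sys/fs and applies the condition described in this table. The nr_events value that a workload requests is an input you supply, and 4096 in the usage line is only an example.

```python
from pathlib import Path


def io_setup_would_fail(aio_nr: int, nr_events: int, aio_max_nr: int) -> bool:
    """io_setup() fails with EAGAIN when aio-nr + nr_events > aio-max-nr."""
    return aio_nr + nr_events > aio_max_nr


def aio_state(root: Path = Path("/proc/sys/fs")):
    """Return (aio_nr, aio_max_nr) for the running system."""
    aio_nr = int((root / "aio-nr").read_text())
    aio_max_nr = int((root / "aio-max-nr").read_text())
    return aio_nr, aio_max_nr


if __name__ == "__main__":
    try:
        nr, limit = aio_state()
    except OSError:
        print("fs.aio-nr is not available on this system")
    else:
        print(f"aio-nr {nr} / aio-max-nr {limit}, headroom {limit - nr}")
        # Would a new AIO context requesting 4096 events still fit?
        print("io_setup(4096) would fail:", io_setup_would_fail(nr, 4096, limit))
```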

Monitoring

Parameter | Default value | Description
net.netfilter.nf_conntrack_max | 262144 | Maximum number of connection tracking entries in the nf_conntrack hash table. Calculated as 4 × net.netfilter.nf_conntrack_buckets. Increase this if applications experience intermittent packet loss and the kernel log shows nf_conntrack: table full, dropping packet. For more information, see What do I do if applications on an ECS instance occasionally experience packet loss and the kernel log contains the "kernel: nf_conntrack: table full, dropping packet" error?
net.netfilter.nf_conntrack_tcp_timeout_time_wait | 120 | How long nf_conntrack tracks a TCP connection in TIME_WAIT state, in seconds.
net.netfilter.nf_conntrack_tcp_timeout_established | 432000 | How long nf_conntrack keeps the entry for an idle established TCP connection before removing it from the tracking table, in seconds. The default of 432,000 seconds is 5 days.
fs.inotify.max_queued_events | 16384 | Maximum number of events that can queue for an inotify instance before events are dropped. inotify is the kernel subsystem for monitoring file and directory events. Use the default unless your application processes file events in large batches.
fs.inotify.max_user_instances | 128 | Maximum number of inotify instances a user can create. This limit prevents runaway processes from consuming excessive memory by creating many monitoring instances. Use the default unless your application requires more instances.
fs.inotify.max_user_watches | 8192 | Maximum number of watches a user can add across all inotify instances. A watch is a (path, event mask) pair that tells inotify which events to report for a specific file or directory. Increase this if your application monitors a large number of files or directories.
/sys/block/<device>/queue/hang_threshold | 5000 | I/O hang detection threshold, in milliseconds. The kernel flags an I/O operation as hung if it does not complete within this time. Adjust this based on your storage and workload characteristics. For more information, see Detect I/O hangs in the file system and block layer.
Important

This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed.
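Before packets start being dropped, you can check how close the connection tracking table is to net.netfilter.nf_conntrack_max by comparing it with nf_conntrack_count. The following Python sketch reads both values, plus nf_conntrack_buckets, from /proc/sys/net/netfilter. These files exist only while the nf_conntrack module is loaded. The 80% warning threshold is an arbitrary assumption.

```python
from pathlib import Path


def conntrack_usage(root: Path = Path("/proc/sys/net/netfilter")) -> dict:
    """Read the nf_conntrack sizing sysctls and compute table utilization."""
    def read_int(name: str) -> int:
        return int((root / name).read_text())

    count = read_int("nf_conntrack_count")
    limit = read_int("nf_conntrack_max")
    return {
        "count": count,
        "max": limit,
        "buckets": read_int("nf_conntrack_buckets"),
        "used": count / limit,
    }


if __name__ == "__main__":
    try:
        usage = conntrack_usage()
    except OSError:
        print("nf_conntrack is not loaded")
    else:
        print(usage)
        if usage["used"] >= 0.8:  # assumed warning threshold
            print("conntrack table is above 80% of nf_conntrack_max")
```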