Alibaba Cloud Linux 3 ships with a set of pre-tuned kernel parameters optimized for cloud workloads. This document describes those defaults and the common parameters you may need to adjust for your specific workload.
Adjust kernel parameters only when you have observed data that justifies the change. Understand what each parameter does before modifying it — parameter behavior can vary across kernel versions and environment types.
Optimized configurations for Alibaba Cloud Linux 3
These parameters are pre-configured in Alibaba Cloud Linux 3. The values listed are the optimized defaults applied by the OS.
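To compare a running system against the values documented here, the current settings can be read from /proc/sys, where each dotted sysctl name maps to a file path. The following is a minimal sketch: the helper names (`sysctl_path`, `read_sysctl`) and the subset of parameters in `DOCUMENTED` are illustrative choices, not part of any Alibaba Cloud tooling. The simple dot-to-slash mapping assumes the name contains no interface name with dots in it.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")

# Illustrative subset of the values listed in the tables of this document.
DOCUMENTED = {
    "net.ipv4.tcp_synack_retries": "2",
    "net.ipv4.tcp_slow_start_after_idle": "0",
    "net.ipv4.tcp_syn_retries": "4",
    "net.ipv4.tcp_retries2": "8",
}


def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return PROC_SYS.joinpath(*name.split("."))


def read_sysctl(name):
    """Return the current value with whitespace normalized, or None if unreadable."""
    try:
        return " ".join(sysctl_path(name).read_text().split())
    except OSError:
        # Parameter missing on this kernel, or not running on Linux.
        return None


if __name__ == "__main__":
    for name, expected in DOCUMENTED.items():
        current = read_sysctl(name)
        if current is None:
            status = "unavailable"
        elif current == expected:
            status = "matches"
        else:
            status = "differs"
        print(f"{name}: current={current} documented={expected} ({status})")
```

A parameter reported as "unavailable" usually means the running kernel does not expose it, which is expected for the custom parameters outside Alibaba Cloud Linux 3.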
Performance improvement
| Parameter | Value | Description |
|---|---|---|
net.ipv4.tcp_timeout_init | 1000 | Initial TCP retransmission timeout, in milliseconds. Minimum value: 2 HZ. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_synack_timeout_init | 1000 | Initial timeout for SYN-ACK retransmission, in milliseconds. Minimum value: 2 HZ. After the first retransmission, the timeout doubles. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_synack_timeout_max | 120000 | Maximum SYN-ACK retransmission timeout, in milliseconds. Minimum value: 2 HZ. Each retransmission doubles the timeout, starting from tcp_synack_timeout_init, up to this cap. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_ato_min | 40 | Minimum ACK timeout, in milliseconds. Valid values: 4–200 ms. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_init_cwnd | 10 | Initial TCP congestion window size. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_synack_retries | 2 | Number of SYN-ACK retransmissions when the server does not receive the final ACK. On a good-quality network, with an initial timeout of 1 second, two retries take approximately 7 seconds before the connection is dropped. |
net.ipv4.tcp_slow_start_after_idle | 0 | Controls whether slow start restarts after a TCP connection becomes idle. 0 disables restart, preserving the congestion window across idle periods. 1 enables restart. For long-lived connections with intermittent traffic bursts, set this to 0 to avoid throughput penalties after short idle gaps. |
/sys/kernel/mm/transparent_hugepage/hugetext_enabled | 0 | Controls the Hugetext feature, which maps code segments of binaries and dynamic libraries using huge pages to reduce iTLB misses. Valid values: 0 = disabled; 1 = huge pages for binaries and dynamic libraries only; 2 = executable anonymous huge pages only; 3 = both. Enable Hugetext for workloads with large code segments, such as databases and large applications, to reduce iTLB misses and improve performance. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. |
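The interaction between tcp_synack_timeout_init, tcp_synack_timeout_max, and tcp_synack_retries described in the table can be worked through numerically. The sketch below is a simplified model, not kernel code: it assumes the timeout doubles after every send and is clamped at the maximum, and that the pending connection is dropped when the timer armed after the last retransmission expires. The function name `synack_timeouts` is hypothetical.

```python
def synack_timeouts(init_ms=1000, max_ms=120000, retries=2):
    """Timeout (ms) armed after each SYN-ACK send: the first send plus each retry.

    Simplified model: double after every send, never exceed max_ms.
    """
    timeouts = []
    timeout = init_ms
    for _ in range(retries + 1):
        timeouts.append(min(timeout, max_ms))
        timeout *= 2
    return timeouts


if __name__ == "__main__":
    defaults = synack_timeouts()
    print(defaults, "total ms:", sum(defaults))
    # A larger retry count shows the cap taking effect.
    print(synack_timeouts(retries=8))
```

With the documented defaults the timeouts are 1000, 2000, and 4000 ms, which adds up to the roughly 7 seconds quoted for tcp_synack_retries.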
Resource utilization improvement
| Parameter | Value | Description |
|---|---|---|
net.ipv4.tcp_syn_retries | 4 | Number of SYN retransmissions when the client does not receive a SYN-ACK. With an initial retransmission timeout (RTO) of 1 second, four retransmissions take approximately 15 seconds and the connection times out after about 31 seconds. |
net.ipv4.tcp_retries2 | 8 | Maximum retransmissions for an active TCP connection that stops receiving ACKs. With an initial RTO of 200 ms, eight retransmissions take approximately 51 seconds and the final timeout occurs after about 102 seconds. |
net.ipv4.tcp_tw_timeout | 60 | Timeout for a TCP socket in TIME_WAIT state, in seconds. Valid values: 1–600 seconds. For more information, see Modify the TCP TIME-WAIT timeout period. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. Deprecated in Alibaba Cloud Linux 4 and later. |
net.ipv4.tcp_max_tw_buckets | 5000 | Maximum number of TCP connections allowed in TIME_WAIT state simultaneously. When TIME_WAIT connections exhaust the port range defined by net.ipv4.ip_local_port_range, new connect() calls fail. Increase this value if you see TCP: time wait bucket table overflow errors. For details, see Why do many "TCP: time wait bucket table overflow" errors occur on a Linux ECS instance? |
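The timing figures given for net.ipv4.tcp_syn_retries and net.ipv4.tcp_retries2 follow from exponential backoff and can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the RTO doubles after each retransmission, a 120-second ceiling is applied (the upstream kernel's maximum RTO), and the retry count is treated as a literal number of retransmissions. The kernel's real accounting for tcp_retries2 is more involved, so treat the result as an estimate. The function name `backoff_timeline` is hypothetical.

```python
def backoff_timeline(initial_rto_s, retries, rto_max_s=120.0):
    """Return (time of the last retransmission, time the connection is given up)."""
    elapsed = 0.0
    rto = initial_rto_s
    for _ in range(retries):
        elapsed += rto                  # timer expires, segment is retransmitted
        rto = min(rto * 2, rto_max_s)   # back off for the next attempt
    # After the final retransmission one more timer must expire before giving up.
    return elapsed, elapsed + rto


if __name__ == "__main__":
    for label, rto, retries in (("tcp_syn_retries=4", 1.0, 4),
                                ("tcp_retries2=8", 0.2, 8)):
        last, give_up = backoff_timeline(rto, retries)
        print(f"{label}: last retransmission at {last:.1f}s, timeout at {give_up:.1f}s")
```

Raising either retry count by one roughly doubles the time to failure until the ceiling is reached, which is why small changes to these parameters have a large effect on failure detection time.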
Network security
| Parameter | Value | Description |
|---|---|---|
net.ipv4.conf.all.rp_filter | 0 | Reverse path filtering for all current network interface cards (NICs). Valid values: 0 = disabled; 1 = strict (discard packet if its reverse path does not match the receiving interface); 2 = loose (discard only if the source address is unreachable via any interface). > Warning Setting this to |
net.ipv4.conf.default.rp_filter | 0 | Reverse path filtering applied to newly added NICs. Same valid values and warning as net.ipv4.conf.all.rp_filter. |
net.ipv4.conf.default.arp_announce | 2 | Source IP selection for ARP requests sent from newly added NICs. Valid values: 0 = any local address on any interface; 1 = prefer a source IP in the same subnet as the destination; 2 = must use the IP of the outbound interface (no ARP sent if no suitable address exists). |
net.ipv4.conf.all.arp_announce | 2 | Source IP selection for ARP requests sent from all current NICs. Same valid values as net.ipv4.conf.default.arp_announce. |
net.ipv4.tcp_syncookies | 1 | SYN flood protection. Valid values: 0 = disabled; 1 = enabled (activates only when the SYN backlog is full); 2 = unconditionally enabled (testing only). > Important SYN cookies are a fallback mechanism, not a solution for overloaded servers. If SYN flood warnings appear in your logs but the source is legitimate traffic rather than an attack, tune |
Other common system configurations for Alibaba Cloud Linux 3
These parameters ship with upstream defaults. Use them as a reference when diagnosing performance or resource issues — adjust only when you have observed data that justifies a change.
Performance improvement
| Parameter | Default value | Description |
|---|---|---|
net.ipv4.ip_local_port_range | 32768 60999 | Ephemeral port range for outbound TCP/UDP connections. When most ports in this range are in use, the kernel's linear search for a free port increases CPU utilization. Widen this range if you observe high CPU from port exhaustion, or if connect() calls start returning EADDRNOTAVAIL. |
net.ipv4.tcp_rmem | 4096 131072 6291456 | Per-TCP-socket receive buffer size, in bytes: minimum, default, and maximum. The default is independent of instance type. Increase these values on high-memory instances with sustained high-bandwidth connections. > Important Setting a very large maximum can consume significant memory. Each socket can use up to the maximum value — for example, 1 million sockets at 6 MiB each could require up to 6 TiB of buffer space. |
net.ipv4.tcp_wmem | 4096 16384 4194304 | Per-TCP-socket send buffer size, in bytes: minimum, default, and maximum. Same tuning guidance as net.ipv4.tcp_rmem. |
net.core.netdev_max_backlog | 1000 | Maximum length of the per-CPU socket buffer (skb) queue used for receive packet steering (RPS) and loopback or veth traffic. Increase this if you see dropped packets on high-throughput loopback or veth interfaces. |
net.core.somaxconn | 4096 | Maximum listen backlog queue length per socket. For applications like NGINX that handle large numbers of short-lived connections, increase this value. To check whether tuning is needed, run ss -ntl and compare the Recv-Q (current backlog) against the Send-Q (socket backlog limit). If Recv-Q approaches Send-Q, increase this parameter. |
net.core.rmem_max | 212992 | Maximum receive socket buffer size, in bytes. For TCP, this cap applies only when an application calls setsockopt(SO_RCVBUF) explicitly; otherwise net.ipv4.tcp_rmem controls the limit. For UDP with many connections on a single socket, increase this value. |
net.core.wmem_max | 212992 | Maximum send socket buffer size, in bytes. For TCP, this cap applies only when an application calls setsockopt(SO_SNDBUF) explicitly; otherwise net.ipv4.tcp_wmem controls the limit. |
/sys/block/<device>/queue/nomerges | 0 | Controls I/O merge behavior for the device. Valid values: 0 = all merge types enabled; 1 = only simple one-shot merges (disables complex merges); 2 = all merges disabled. Most workloads benefit from merging. For workloads with purely random I/O where the chance of mergeable requests is low, set to 2 to save the CPU cycles spent checking for merges. |
/sys/block/<device>/queue/read_ahead_kb | 4096 | Read-ahead size for sequential reads, in KB. The kernel default is 128 KB; the tuned service increases it to 4,096 KB. For sequential workloads (large file reads, log processing), keep the higher value or increase further. For random I/O workloads, reduce to 128 KB to avoid prefetching data that will not be used. |
/sys/block/<device>/queue/rq_affinity | 1 | Controls which CPU handles I/O completion. |
For rq_affinity, the trade-offs between values are:
| Value | Behavior | Best for |
|---|---|---|
0 | Completion runs on the CPU that triggered the interrupt | Lowest latency for interrupt-heavy workloads |
1 | Completion runs on any CPU in the same socket as the submitter (cache-friendly, but the first CPU in the group gets higher load) | Most workloads — default and recommended |
2 | Completion runs on the exact CPU that submitted the I/O (balanced CPU load, slightly lower efficiency than 1) | High-concurrency workloads with many cores |
| Parameter | Default value | Description |
|---|---|---|
/sys/block/<device>/queue/scheduler | mq-deadline (single queue) or none (multiple queues) | I/O scheduler. Alibaba Cloud Linux 3 supports mq-deadline, kyber, bfq, and none. The blk-mq layer selects mq-deadline for single-queue devices and none for multi-queue devices. For workloads that need low read latency, switch to kyber and configure the target latency value. |
/sys/kernel/mm/pagecache_limit/enabled | 0 | Enables or disables the page cache limit feature system-wide. 0 = disabled; 1 = enabled. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. |
/sys/fs/cgroup/memory/memory.pagecache_limit.enable | 0 | Enables or disables the page cache limit feature for a specific memcg. 0 = disabled for this memcg; 1 = enabled. |
/sys/fs/cgroup/memory/memory.pagecache_limit.size | 0 | Page cache usage cap for the current memcg tree, in bytes. Valid values: 0 to the value of memory.limit_in_bytes for the current memcg. Setting to 0 disables the page cache limit feature for this memcg regardless of the global or per-memcg switch. A non-zero value sets the upper limit of page cache usage for the memcg tree. |
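The net.core.somaxconn check described in the table above (comparing Recv-Q against Send-Q for listening sockets in `ss -ntl` output) can be automated. The sketch below parses text in the column layout that `ss -ntl` prints; the `SAMPLE` text is made-up data for illustration, and the 0.8 threshold and the function name `busy_listeners` are assumptions rather than recommended values.

```python
# Made-up sample in the layout printed by `ss -ntl` (illustration only).
SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       4096    0.0.0.0:22          0.0.0.0:*
LISTEN  120     128     0.0.0.0:8080        0.0.0.0:*
"""


def busy_listeners(ss_output, threshold=0.8):
    """Return (local address, Recv-Q, Send-Q) for listeners whose backlog is nearly full."""
    busy = []
    for line in ss_output.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if send_q > 0 and recv_q / send_q >= threshold:
            busy.append((fields[3], recv_q, send_q))
    return busy


if __name__ == "__main__":
    for local, recv_q, send_q in busy_listeners(SAMPLE):
        print(f"{local}: backlog {recv_q}/{send_q}")
```

On a live system the input could come from `subprocess.run(["ss", "-ntl"], capture_output=True, text=True).stdout`. Note that the effective backlog is the smaller of the application's listen() argument and net.core.somaxconn, so a low Send-Q may need to be raised in the application as well.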
Network security
| Parameter | Default value | Description |
|---|---|---|
net.ipv4.conf.all.arp_ignore | 0 | Controls ARP reply behavior for all current NICs. Valid values: 0 = reply to ARP requests for any local IP, including loopback addresses, regardless of which NIC receives the request; 1 = reply only if the target IP is configured on the receiving NIC; 2 = reply only if the target IP is on the receiving NIC and the source IP is in the same subnet. For example, if eth0 receives an ARP request for the IP of eth1: with value 0, eth0 replies; with value 1 or 2, it does not. |
net.ipv4.conf.default.arp_ignore | 0 | ARP reply behavior for newly added NICs. Same valid values and behavior as net.ipv4.conf.all.arp_ignore. |
net.ipv4.ip_forward | 0 | Enables or disables IPv4 packet forwarding. 0 = disabled; 1 = enabled. Enable this when the instance acts as a router or NAT gateway. |
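The three arp_ignore modes described above can be summarized as a small decision function. This is a simplified model of the documented behavior for values 0 to 2 only; the function and argument names are hypothetical, and the kernel evaluates further conditions that are not modeled here.

```python
def replies_to_arp(arp_ignore, target_is_local, target_on_receiving_nic,
                   sender_in_same_subnet):
    """Decide whether the host answers an ARP request under the given arp_ignore mode."""
    if arp_ignore == 0:
        # Reply for any local address, whichever NIC received the request.
        return target_is_local
    if arp_ignore == 1:
        # Reply only if the target IP is configured on the receiving NIC.
        return target_on_receiving_nic
    if arp_ignore == 2:
        # As mode 1, and the sender must be in the same subnet.
        return target_on_receiving_nic and sender_in_same_subnet
    raise ValueError("only arp_ignore values 0, 1, and 2 are modeled")


if __name__ == "__main__":
    # eth0 receives a request for the IP configured on eth1.
    for mode in (0, 1, 2):
        answer = replies_to_arp(mode, target_is_local=True,
                                target_on_receiving_nic=False,
                                sender_in_same_subnet=True)
        print(f"arp_ignore={mode}: reply={answer}")
```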
Resource utilization
| Parameter | Default value | Description |
|---|---|---|
net.ipv4.tcp_fin_timeout | 60 | Duration a TCP connection stays in FIN_WAIT2 state after the local side initiates a close, in seconds. The default of 60 seconds is appropriate for most workloads. If you observe a large number of FIN_WAIT2 connections (check with `netstat -ant \| grep FIN_WAIT2 \| wc -l`), reduce this value to reclaim ports faster. For more information, see Why does a Linux ECS instance have many TCP connections in the FIN_WAIT2 state? |
net.ipv4.tcp_tw_reuse | 2 | Controls reuse of TIME_WAIT sockets for new connections. 0 = disabled; 1 = globally enabled; 2 = enabled for loopback only. |
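net.ipv4.tcp_keepalive_time | 7200 | Idle time before TCP sends the first keepalive probe on a connection that has keepalive enabled, in seconds. Keepalive probes confirm that the remote end of an idle connection is still reachable. The interval between subsequent probes is controlled separately by net.ipv4.tcp_keepalive_intvl. |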
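The FIN_WAIT2 check mentioned for net.ipv4.tcp_fin_timeout can also be done by counting socket states directly from /proc/net/tcp, where the fourth column holds the state as a hexadecimal code. The sketch below uses the state codes from the upstream kernel; the `SAMPLE` text is made-up data in the /proc/net/tcp layout with trailing columns trimmed, and the function name `count_states` is hypothetical.

```python
from collections import Counter

# State codes as defined by the upstream kernel (include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}

# Made-up sample in the /proc/net/tcp layout (trailing columns trimmed).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0016 00000000:0000 0A 00000000:00000000
   1: 0100007F:1F90 0100007F:C350 05 00000000:00000000
   2: 0100007F:1F90 0100007F:C351 05 00000000:00000000
   3: 0100007F:1F90 0100007F:C352 06 00000000:00000000
"""


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    for state, number in sorted(count_states(SAMPLE).items()):
        print(f"{state}: {number}")
```

On a live system, pass the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets) instead of the sample text.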
System limits
| Parameter | Default value | Description |
|---|---|---|
fs.aio-max-nr | 65536 | Maximum number of concurrent asynchronous I/O (AIO) requests system-wide. The kernel accumulates the nr_events argument of each io_setup() call into aio-nr. If aio-nr + nr_events > aio-max-nr, io_setup() returns -EAGAIN. Increase this for database or search workloads that rely heavily on Linux AIO. Monitor aio-nr to determine the right value for your environment. |
fs.file-max | Set based on reserved memory at boot | Maximum number of file handles the kernel allows system-wide. Up to 10% of reserved memory can be used for file handles. The minimum is 8,192 (the NR_FILE value). Increase this only if processes fail with "too many open files" at the system level. |
fs.nr_open | 1048576 | Maximum number of open file handles per process. The per-process limit set by ulimit -n (RLIMIT_NOFILE) cannot exceed this value. Increase fs.nr_open before raising ulimit -n beyond 1,048,576. |
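The admission rule described for fs.aio-max-nr (io_setup() fails when aio-nr + nr_events exceeds aio-max-nr) can be checked ahead of time. The sketch below restates that rule; the function names are hypothetical, and the inputs would normally be read from /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr.

```python
def io_setup_would_fail(aio_nr, nr_events, aio_max_nr=65536):
    """True if an io_setup(nr_events) call would be rejected with EAGAIN."""
    return aio_nr + nr_events > aio_max_nr


def aio_headroom(aio_nr, aio_max_nr=65536):
    """Largest nr_events that a new io_setup() call could still request."""
    return max(aio_max_nr - aio_nr, 0)


if __name__ == "__main__":
    current = 64512   # example reading of aio-nr (made-up value)
    print("headroom:", aio_headroom(current))
    print("1024 events rejected?", io_setup_would_fail(current, 1024))
    print("2048 events rejected?", io_setup_would_fail(current, 2048))
```

If the headroom is regularly close to zero while the workload is healthy, that is the observed data that justifies raising fs.aio-max-nr.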
Monitoring
| Parameter | Default value | Description |
|---|---|---|
net.netfilter.nf_conntrack_max | 262144 | Maximum number of connection tracking entries in the nf_conntrack hash table. Calculated as 4 × net.netfilter.nf_conntrack_buckets. Increase this if applications experience intermittent packet loss and the kernel log shows nf_conntrack: table full, dropping packet. For more information, see What do I do if applications on an ECS instance occasionally experience packet loss and the kernel log contains the "kernel: nf_conntrack: table full, dropping packet" error? |
net.netfilter.nf_conntrack_tcp_timeout_time_wait | 120 | How long nf_conntrack tracks a TCP connection in TIME_WAIT state, in seconds. |
net.netfilter.nf_conntrack_tcp_timeout_established | 432000 | How long nf_conntrack keeps an idle established TCP connection in the tracking table before removing the entry, in seconds. |
fs.inotify.max_queued_events | 16384 | Maximum number of events that can queue for an inotify instance before events are dropped. inotify is the kernel subsystem for monitoring file and directory events. Use the default unless your application processes file events in large batches. |
fs.inotify.max_user_instances | 128 | Maximum number of inotify instances a user can create. This limit prevents runaway processes from consuming excessive memory by creating many monitoring instances. Use the default unless your application requires more instances. |
fs.inotify.max_user_watches | 8192 | Maximum number of watches a user can add across all inotify instances. A watch is a (path, event mask) pair that tells inotify which events to report for a specific file or directory. Increase this if your application monitors a large number of files or directories. |
/sys/block/<device>/queue/hang_threshold | 5000 | I/O hang detection threshold, in milliseconds. The kernel flags an I/O operation as hung if it does not complete within this time. Adjust this based on your storage and workload characteristics. For more information, see Detect I/O hangs in the file system and block layer. > Important This is a custom feature in Alibaba Cloud Linux 3. Long-term maintenance is not guaranteed. |
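For net.netfilter.nf_conntrack_max, the relationship to the bucket count and the current table utilization can be checked before the "table full" error appears. This is a minimal sketch: the factor of 4 is the one stated in the table above, the 0.8 alert threshold is an arbitrary assumption, and the function names are hypothetical. On a live system the current entry count is available from /proc/sys/net/netfilter/nf_conntrack_count when the nf_conntrack module is loaded.

```python
def default_conntrack_max(buckets):
    """Default nf_conntrack_max derived from the bucket count, as documented above."""
    return 4 * buckets


def conntrack_usage(count, maximum):
    """Fraction of the connection tracking table currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return count / maximum


def needs_attention(count, maximum, threshold=0.8):
    """True when utilization reaches the (assumed) alert threshold."""
    return conntrack_usage(count, maximum) >= threshold


if __name__ == "__main__":
    maximum = default_conntrack_max(65536)
    print("default maximum:", maximum)
    print("250000 entries needs attention?", needs_attention(250000, maximum))
```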