In Alibaba Cloud Linux 2 (kernel version 4.19.36-12.al7 and later) and Alibaba Cloud Linux 3, the cgroup writeback feature is available for the cgroup v1 kernel interface. This feature lets you limit the rate of buffered I/O.
Background information
Control groups, referred to as cgroups in this topic, are a Linux kernel feature that lets you allocate resources. cgroups are available in two versions: cgroup v1 and cgroup v2. For more information, see the What are Control Groups section in the Resource Management Guide. This topic describes how to enable the cgroup writeback feature for cgroup v1 to limit the buffered I/O rate of processes.
Limits
After you enable the cgroup writeback feature, you must ensure that the mapping between the memory subsystem (memcg) and the I/O subsystem (blkcg) follows the rules described in this section. This is required to limit the buffered I/O rate for processes.
The cgroup writeback feature requires the memcg and blkcg to work in conjunction to limit the buffered I/O rate. By default, the control subsystems of cgroup v1 do not work together. Therefore, you must connect the memcg and blkcg based on a specific rule: each memcg must map to a single blkcg. The mapping can be one-to-one or many-to-one, but not one-to-many or many-to-many.
For example, to limit the buffered I/O rate for processes A and B, the following constraints apply.
If A and B belong to different memcgs, they can map to different blkcgs in a one-to-one relationship. For example, A belongs to memcg1 and blkcg1, and B belongs to memcg2 and blkcg0.
If A and B belong to different memcgs, they can also map to the same blkcg. For example, A belongs to memcg1 and B belongs to memcg2, but both A and B belong to blkcg2.
If A and B belong to the same memcg, they must be mapped to the same blkcg. For example, A and B can both belong to memcg0 and blkcg3.
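The mapping rule can be checked mechanically. The following is a minimal sketch, not part of the kernel feature: the helper name check_mapping and its input format (one "memcg blkcg" pair per line) are assumptions made for illustration. It reports every memcg that is paired with more than one blkcg.

```shell
#!/bin/sh
# check_mapping (hypothetical helper): reads "memcg blkcg" pairs on stdin,
# one pair per line. Prints each memcg that maps to more than one blkcg
# and returns non-zero if any such memcg is found.
check_mapping() {
    awk '
        NF >= 2 {
            if (!($1 in seen)) {
                seen[$1] = $2          # first blkcg recorded for this memcg
            } else if (seen[$1] != $2) {
                bad[$1] = 1            # same memcg, different blkcg
            }
        }
        END {
            n = 0
            for (m in bad) { print "one-to-many: " m; n++ }
            exit (n > 0)
        }
    '
}
```

For example, printf 'memcg1 blkcg2\nmemcg2 blkcg2\n' | check_mapping succeeds because many-to-one is allowed, while a memcg listed with two different blkcgs is reported.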
To prevent unexpected issues after you enable cgroup writeback, set the cgroup.procs interface for the blkcg before you limit the buffered I/O rate. You can write a process ID to this interface to ensure a unique blkcg mapping. You can also use tools to view the mapping between the memcg and blkcg. For more information, see Confirm the mapping between memcg and blkcg.
During operations and maintenance (O&M), a process might be moved to another cgroup. Based on the mapping rule, no issues occur if a process moves between memcgs. However, moving a process between blkcgs can break the rule, because the process's memcg could then map to more than one blkcg. To prevent this, the feature's code includes a safeguard: if a process in an active blkcg is moved to another blkcg, the mapping is reset to the root blkcg. Rate limiting then stops taking effect, because a rate limit threshold is typically not set on the root blkcg.
Although the kernel code includes a rule to prevent unexpected issues, you should avoid moving processes between blkcgs during operations.
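To see which blkcg and memcg a process currently belongs to, you can read its /proc/<pid>/cgroup file, where each cgroup v1 line has the form ID:controllers:path. The following is a minimal sketch. The helper name cgroup_of is an assumption made for illustration.

```shell
#!/bin/sh
# cgroup_of CONTROLLER [FILE] (hypothetical helper): print the cgroup path
# of a cgroup v1 controller. FILE defaults to /proc/self/cgroup.
# Each line looks like "11:blkio:/blkcg1" or "3:memory,blkio:/some/path".
cgroup_of() {
    awk -F: -v want="$1" '
        {
            n = split($2, ctrl, ",")           # controllers sharing a hierarchy
            for (i = 1; i <= n; i++)
                if (ctrl[i] == want) {
                    path = $0
                    sub(/^[^:]*:[^:]*:/, "", path)   # keep everything after the 2nd colon
                    print path
                    exit
                }
        }
    ' "${2:-/proc/self/cgroup}"
}
```

For example, cgroup_of blkio /proc/1234/cgroup prints the blkcg path of process 1234 (a placeholder PID). Comparing the result before and after an O&M operation shows whether the process was moved between blkcgs.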
Enable the cgroup writeback feature
The cgroup writeback feature is disabled by default for the cgroup v1 interface. Follow these steps to enable it.
Add the cgwb_v1 field using the grubby command to enable the feature.
In this example, the kernel version is 4.19.36-12.al7.x86_64. Replace this with your actual kernel version. You can run the uname -r command to check your kernel version.
    sudo grubby --update-kernel="/boot/vmlinuz-4.19.36-12.al7.x86_64" --args="cgwb_v1"
Restart the system for the changes to take effect.
    sudo reboot
Run the following command to read the /proc/cmdline kernel file and confirm that the kernel command-line parameters include the cgwb_v1 field. This enables the blkio.throttle.write_bps_device and blkio.throttle.write_iops_device interfaces in the blkcg to limit the buffered I/O rate.
    cat /proc/cmdline | grep cgwb_v1
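The steps above can be scripted so that the kernel version does not have to be typed by hand. The following is a minimal sketch. The helper names kernel_image and has_kernel_arg are assumptions made for illustration, and kernel_image assumes that the image is stored as /boot/vmlinuz-<release>, as in the example above.

```shell
#!/bin/sh
# kernel_image (hypothetical helper): print the image path of the running
# kernel, assuming the /boot/vmlinuz-<release> naming used in this topic.
kernel_image() {
    echo "/boot/vmlinuz-$(uname -r)"
}

# has_kernel_arg ARG [FILE] (hypothetical helper): succeed only if ARG appears
# as a whole whitespace-separated word in FILE (default: /proc/cmdline).
has_kernel_arg() {
    awk -v want="$1" '
        { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}
```

With these helpers, sudo grubby --update-kernel="$(kernel_image)" --args="cgwb_v1" targets the running kernel. After the restart, has_kernel_arg cgwb_v1 && echo enabled confirms the field without also matching longer words that merely contain cgwb_v1.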
In a Kubernetes (k8s) environment, after you enable the cgroup writeback feature, you must also merge the memory and blkio cgroup subsystems. This prevents rate limiting from failing if a process is moved.
Merge the memory and blkio cgroup subsystems.
Edit the system.conf file.
    sudo vim /etc/systemd/system.conf
Modify the JoinControllers configuration. For example:
    JoinControllers=cpu,cpuacct net_cls,net_prio memory,blkio
Press the Esc key to exit edit mode. Then, enter :wq to save and exit.
Run the following command to rebuild the initial RAM disk image. This ensures the systemd configuration changes take effect.
    sudo dracut /boot/initramfs-4.19.36-12.al7.x86_64.img 4.19.36-12.al7.x86_64 --force
Run the following command to restart the system.
    sudo reboot
Run the following command to verify that the memory and blkio subsystems are merged.
    ls /sys/fs/cgroup
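Before you rebuild the image and restart, you may want to confirm that the edited configuration really places memory and blkio in the same group. The following is a minimal sketch. The helper name controllers_joined is an assumption made for illustration. It only parses the JoinControllers line and does not check the mounted hierarchies.

```shell
#!/bin/sh
# controllers_joined A B [FILE] (hypothetical helper): succeed if an
# uncommented JoinControllers= line in FILE lists controllers A and B in
# the same comma-separated group. FILE defaults to /etc/systemd/system.conf.
controllers_joined() {
    awk -v a="$1" -v b="$2" '
        /^[ \t]*JoinControllers[ \t]*=/ {
            sub(/^[^=]*=/, "")                    # drop the key and the "="
            ng = split($0, group, /[ \t]+/)       # groups are space-separated
            for (g = 1; g <= ng; g++) {
                has_a = has_b = 0
                nc = split(group[g], c, ",")      # controllers are comma-separated
                for (i = 1; i <= nc; i++) {
                    if (c[i] == a) has_a = 1
                    if (c[i] == b) has_b = 1
                }
                if (has_a && has_b) ok = 1
            }
        }
        END { exit !ok }
    ' "${3:-/etc/systemd/system.conf}"
}
```

For example, controllers_joined memory blkio || echo "memory and blkio are not joined" flags a missing or commented-out setting.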
Verify that cgroup writeback is effective
This example simulates two I/O-generating processes to verify that the cgroup writeback feature is effective.
The dd command provides quick feedback. You can use the iostat command to view the results.
The dd command writes data sequentially. When sequential I/O is flushed, the system writes back data in 1 MB chunks. Therefore, set the blkio.throttle.write_bps_device threshold to a value of at least 1 MB (1048576). If you set a value less than 1 MB, the I/O process may hang.
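The threshold is written in bytes, so it helps to compute and check the value before you apply it. The following is a minimal sketch. The helper names mb_to_bytes and valid_write_bps are assumptions made for illustration, and 1 MB is taken as 1048576 bytes, as in the note above.

```shell
#!/bin/sh
# mb_to_bytes N (hypothetical helper): convert N MB to bytes,
# using 1 MB = 1048576 bytes.
mb_to_bytes() {
    echo $(( $1 * 1048576 ))
}

# valid_write_bps BYTES (hypothetical helper): succeed only if BYTES is at
# least 1 MB (1048576), the minimum recommended for sequential writeback.
valid_write_bps() {
    [ "$1" -ge 1048576 ]
}
```

For example, mb_to_bytes 10 prints 10485760, the value used for a 10 MB/s limit, and valid_write_bps 524288 fails because 512 KB is below the 1 MB minimum.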
Simulate two I/O-generating processes. As required by the limits, first set the cgroup.procs interface of the blkcg.
    sudo mkdir /sys/fs/cgroup/blkio/blkcg1
    sudo mkdir /sys/fs/cgroup/memory/memcg1
    sudo bash -c "echo $$ > /sys/fs/cgroup/blkio/blkcg1/cgroup.procs"    # $$ is the process ID of the current shell
    sudo bash -c "echo $$ > /sys/fs/cgroup/memory/memcg1/cgroup.procs"   # $$ is the process ID of the current shell
Use the blkio.throttle.write_bps_device interface in the blkcg to limit the buffered I/O rate.
    sudo bash -c "echo 254:48 10485760 > /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.write_bps_device"   # Set the disk write-back rate limit to 10 MB/s based on the device number.
Use the dd command without the oflag=sync parameter to generate cached asynchronous I/O.
    sudo dd if=/dev/zero of=/mnt/vdd/testfile bs=4k count=10000
Use the iostat tool to query the results. Check the wMB/s column in the output. If the rate is limited to approximately 10 MB/s, the cgroup writeback feature is working correctly.
    iostat -xdm 1 vdd
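The device number in the rate limit rule (254:48 in the example) differs between disks. The following is a minimal sketch of how the rule could be built from a device name by reading the dev file that the kernel exposes under /sys/class/block. The helper names dev_number and throttle_rule are assumptions made for illustration.

```shell
#!/bin/sh
# dev_number NAME [SYSFS_ROOT] (hypothetical helper): print "major:minor"
# for block device NAME, read from <SYSFS_ROOT>/class/block/NAME/dev.
# SYSFS_ROOT defaults to /sys.
dev_number() {
    cat "${2:-/sys}/class/block/$1/dev"
}

# throttle_rule NAME BYTES_PER_SEC [SYSFS_ROOT] (hypothetical helper):
# print the "major:minor rate" line expected by the throttle interfaces.
throttle_rule() {
    printf '%s %s\n' "$(dev_number "$1" "${3:-/sys}")" "$2"
}
```

For example, sudo bash -c "echo $(throttle_rule vdd 10485760) > /sys/fs/cgroup/blkio/blkcg1/blkio.throttle.write_bps_device" applies a 10 MB/s limit to the vdd disk without looking up the device number manually.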
Confirm the mapping between memcg and blkcg
You can use one of the following methods to confirm that the mapping between memcg and blkcg is one-to-one or many-to-one.
View the mapping between memcg and blkcg.
    sudo cat /sys/kernel/debug/bdi/bdi_wb_link
The following sample output shows that the memcg and blkcg have a one-to-one mapping.
    memory <---> blkio
    memcg1: 35 <---> blkcg1: 48
Use the ftrace kernel monitoring tool.
Enable the ftrace tool.
    sudo bash -c "echo 1 > /sys/kernel/debug/tracing/events/writeback/insert_memcg_blkcg_link/enable"
View the trace output.
    sudo cat /sys/kernel/debug/tracing/trace_pipe
The following sample output shows memcg_ino=35 blkcg_ino=48. This indicates that the memcg and blkcg have a one-to-one mapping.
    <...>-1537 [006] .... 99.511327: insert_memcg_blkcg_link: memcg_ino=35 blkcg_ino=48 old_blkcg_ino=0
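When the trace contains many events, it can be easier to reduce it to the distinct memcg and blkcg pairs. The following is a minimal sketch that assumes the trace line format shown in the sample output. The helper name link_pairs is an assumption made for illustration.

```shell
#!/bin/sh
# link_pairs (hypothetical helper): read insert_memcg_blkcg_link trace lines
# on stdin and print each distinct "memcg_ino blkcg_ino" pair once.
# Lines that do not match the sample format are ignored.
link_pairs() {
    sed -n 's/.*insert_memcg_blkcg_link: memcg_ino=\([0-9][0-9]*\) blkcg_ino=\([0-9][0-9]*\).*/\1 \2/p' |
        sort -u
}
```

For example, saving some trace output to a file and running link_pairs < trace.txt (trace.txt is a placeholder name) lists one line per pair. If the same memcg_ino appears with more than one blkcg_ino, the mapping is not one-to-one or many-to-one.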