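Fix for system crashes caused by an EROFS file system mount failure on Alibaba Cloud Linux 3
This topic describes the cause of and solution to system crashes that occur when mounting an EROFS file system on Alibaba Cloud Linux 3 (kernel-5.10.134-15.al8).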
Problem description
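On Alibaba Cloud Linux 3 instances that meet the following conditions, mounting an EROFS file system through a block device may cause the system to crash.
Image: Alibaba Cloud Linux 3.2104.
Kernel: kernel-5.10.134-15.al8.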
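Before attempting any reproduction that can crash the instance, the running kernel release can be compared against the affected build without mounting anything. The following is a minimal sketch, not an official check: the function name `is_affected_kernel` is hypothetical, and it assumes `uname -r` reports the release in the form `5.10.134-15.al8.<arch>`, as seen in the crash log in this topic.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given kernel release string
# matches the affected build (5.10.134-15.al8 on any architecture).
is_affected_kernel() {
    case "$1" in
        5.10.134-15.al8.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the currently running kernel.
if is_affected_kernel "$(uname -r)"; then
    echo "Running kernel $(uname -r) is the affected build."
else
    echo "Running kernel $(uname -r) is not the affected build."
fi
```

The pattern deliberately does not match later builds such as `5.10.134-15.1.al8`, because the release suffix must follow `-15` directly.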
Run the following commands to determine whether the current system is affected.
sudo yum install -y erofs-utils
mkdir -p test mnt
mkfs.erofs foo.erofs test
sudo mount -t erofs -o loop foo.erofs mnt
If the issue occurs, the system crashes and outputs the following call stack.
[ 225.747952] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000370
..
[ 225.752658] CPU: 3 PID: 5829 Comm: mount Kdump: loaded Not tainted 5.10.134-15.al8.aarch64 #1
[ 225.753089] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017
[ 225.753468] pstate: 62401005 (nZCv daif +PAN -UAO +TCO BTYPE=--)
[ 225.753775] pc : __erofs_bread+0x64/0x1d0 [erofs]
[ 225.754016] lr : erofs_read_metabuf+0x44/0x80 [erofs]
[ 225.754271] sp : ffff800013fcbb00
[ 225.754442] x29: ffff800013fcbb00 x28: ffff0000c5ac0000
[ 225.754711] x27: 0000000000000000 x26: ffff0000ef1dcf50
[ 225.754982] x25: ffff0000d9896b80 x24: 0000000000000000
[ 225.755271] x23: 0000000000000001 x22: ffff0000ef1dcdd8
[ 225.755540] x21: ffff800013fcbbb0 x20: 0000000000000000
[ 225.755810] x19: 0000000000000000 x18: 0000000000000000
[ 225.756079] x17: 0000000000000000 x16: 0000000000000000
[ 225.756347] x15: ffffffffffffffff x14: ffffff0000000000
[ 225.756618] x13: 00000000000003f3 x12: 0000000000000000
[ 225.756888] x11: 0000000000000040 x10: ffff800011d169b8
[ 225.757158] x9 : ffff800009124a84 x8 : ffff0000c7cd9e00
[ 225.757427] x7 : 0000000000000000 x6 : 000000000000003f
[ 225.757697] x5 : ffff0000c773f000 x4 : 0000000000000001
[ 225.757966] x3 : 0000000000000000 x2 : ffff0000ef1dcdd8
[ 225.758235] x1 : ffff800013fcbbb0 x0 : 0000000000000000
[ 225.758508] Call trace:
[ 225.758636] __erofs_bread+0x64/0x1d0 [erofs]
[ 225.758859] erofs_read_metabuf+0x44/0x80 [erofs]
[ 225.759112] erofs_read_superblock+0x60/0x264 [erofs]
[ 225.759370] erofs_fc_fill_super+0xf0/0x310 [erofs]
[ 225.759621] get_tree_bdev+0x15c/0x250
[ 225.760109] erofs_fc_get_tree+0x38/0x54 [erofs]
[ 225.760662] vfs_get_tree+0x2c/0xf0
[ 225.761157] do_new_mount+0x164/0x1d0
[ 225.761652] path_mount+0x1bc/0x570
[ 225.762133] __arm64_sys_mount+0x114/0x140
[ 225.762633] el0_svc_common+0x90/0x250
[ 225.763124] do_el0_svc+0x7c/0x90
[ 225.763579] el0_svc+0x1c/0x30
[ 225.764019] el0_sync_handler+0xa8/0xb0
[ 225.764498] el0_sync+0x168/0x180
[ 225.764949] Code: eb1b001f 540003e0 aa0103e0 97ffff6c (f941bb00)
[ 225.765543] ---[ end trace 50d06630866b5b03 ]---
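Root cause
In the kernel-5.10.134-15.al8 kernel, a new EROFS feature modified the __erofs_bread() function without correctly handling mounts from a generic block device, which results in a NULL pointer dereference.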
Solution
Install the kernel hotfix.
sudo yum install -y kernel-hotfix-18359162-5.10.134-15
Avoid using kernel version kernel-5.10.134-15.al8, for example by upgrading to kernel-5.10.134-15.1.al8 or later. For more information, see Change the kernel version.
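After applying a fix, it can be useful to confirm the state of the system. The following is a rough sketch that only queries the RPM database for the hotfix package named in the install command above and prints the running kernel release. The function name `hotfix_installed` is hypothetical, and a successful package query is assumed, not guaranteed, to mean that the hotfix is active in the running kernel.

```shell
#!/bin/sh
# Package name taken from the hotfix install command in this topic.
HOTFIX_PKG="kernel-hotfix-18359162-5.10.134-15"

# Hypothetical helper: returns 0 if the hotfix RPM is installed.
# rpm -q exits with a non-zero status when the package is absent.
hotfix_installed() {
    rpm -q "$HOTFIX_PKG" >/dev/null 2>&1
}

echo "Running kernel: $(uname -r)"
if hotfix_installed; then
    echo "Hotfix package $HOTFIX_PKG is installed."
else
    echo "Hotfix package $HOTFIX_PKG is not installed."
fi
```

If the kernel was upgraded instead of installing the hotfix, the printed release is the relevant value: it should no longer be 5.10.134-15.al8 after a reboot into the new kernel.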