Fix for system crashes caused by an EROFS file system mount fault in Alibaba Cloud Linux 3

This topic describes the cause of, and the solution to, system crashes triggered by a fault in EROFS file system mounting on Alibaba Cloud Linux 3 (kernel-5.10.134-15.al8).

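Problem description

On Alibaba Cloud Linux 3 instances that meet the following conditions, mounting an EROFS file system from a block device may cause the system to crash.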

  • Image: Alibaba Cloud Linux 3.2104.

  • Kernel: kernel-5.10.134-15.al8.
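The kernel condition above can be checked without mounting anything. The following is a minimal sketch, assuming that `uname -r` on an affected instance reports a release string of the form `5.10.134-15.al8.<arch>` (as in the `5.10.134-15.al8.aarch64` string shown in the crash log). The helper name `is_affected_kernel` is hypothetical and is not part of any Alibaba Cloud tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given kernel
# release string belongs to the affected build 5.10.134-15.al8.
# The pattern requires ".al8." directly after "-15", so later builds
# such as 5.10.134-15.1.al8 do not match.
is_affected_kernel() {
    case "$1" in
        5.10.134-15.al8.*) return 0 ;;
        *) return 1 ;;
    esac
}

release="$(uname -r)"
if is_affected_kernel "$release"; then
    echo "Running kernel $release is the affected build."
else
    echo "Running kernel $release is not the affected build."
fi
```

This only inspects the running kernel version. It does not tell you whether a hotfix has already been applied.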

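Run the following commands to check whether the current system is affected. On an affected system these commands trigger the crash, so run them only on an instance where downtime is acceptable.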

sudo yum install -y erofs-utils
mkdir -p test mnt
mkfs.erofs foo.erofs test
sudo mount -t erofs -o loop foo.erofs mnt

If the problem is present, the system crashes and reports the following call stack.

[  225.747952] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000370
..
[  225.752658] CPU: 3 PID: 5829 Comm: mount Kdump: loaded Not tainted 5.10.134-15.al8.aarch64 #1
[  225.753089] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017
[  225.753468] pstate: 62401005 (nZCv daif +PAN -UAO +TCO BTYPE=--)
[  225.753775] pc : __erofs_bread+0x64/0x1d0 [erofs]
[  225.754016] lr : erofs_read_metabuf+0x44/0x80 [erofs]
[  225.754271] sp : ffff800013fcbb00
[  225.754442] x29: ffff800013fcbb00 x28: ffff0000c5ac0000
[  225.754711] x27: 0000000000000000 x26: ffff0000ef1dcf50
[  225.754982] x25: ffff0000d9896b80 x24: 0000000000000000
[  225.755271] x23: 0000000000000001 x22: ffff0000ef1dcdd8
[  225.755540] x21: ffff800013fcbbb0 x20: 0000000000000000
[  225.755810] x19: 0000000000000000 x18: 0000000000000000
[  225.756079] x17: 0000000000000000 x16: 0000000000000000
[  225.756347] x15: ffffffffffffffff x14: ffffff0000000000
[  225.756618] x13: 00000000000003f3 x12: 0000000000000000
[  225.756888] x11: 0000000000000040 x10: ffff800011d169b8
[  225.757158] x9 : ffff800009124a84 x8 : ffff0000c7cd9e00
[  225.757427] x7 : 0000000000000000 x6 : 000000000000003f
[  225.757697] x5 : ffff0000c773f000 x4 : 0000000000000001
[  225.757966] x3 : 0000000000000000 x2 : ffff0000ef1dcdd8
[  225.758235] x1 : ffff800013fcbbb0 x0 : 0000000000000000
[  225.758508] Call trace:
[  225.758636]  __erofs_bread+0x64/0x1d0 [erofs]
[  225.758859]  erofs_read_metabuf+0x44/0x80 [erofs]
[  225.759112]  erofs_read_superblock+0x60/0x264 [erofs]
[  225.759370]  erofs_fc_fill_super+0xf0/0x310 [erofs]
[  225.759621]  get_tree_bdev+0x15c/0x250
[  225.760109]  erofs_fc_get_tree+0x38/0x54 [erofs]
[  225.760662]  vfs_get_tree+0x2c/0xf0
[  225.761157]  do_new_mount+0x164/0x1d0
[  225.761652]  path_mount+0x1bc/0x570
[  225.762133]  __arm64_sys_mount+0x114/0x140
[  225.762633]  el0_svc_common+0x90/0x250
[  225.763124]  do_el0_svc+0x7c/0x90
[  225.763579]  el0_svc+0x1c/0x30
[  225.764019]  el0_sync_handler+0xa8/0xb0
[  225.764498]  el0_sync+0x168/0x180
[  225.764949] Code: eb1b001f 540003e0 aa0103e0 97ffff6c (f941bb00)
[  225.765543] ---[ end trace 50d06630866b5b03 ]---

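Root cause

In kernel-5.10.134-15.al8, a new EROFS feature modified the __erofs_bread() function without correctly handling mounts on generic block devices, which results in a null pointer dereference.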

Solution

  • Install the kernel hotfix.

    sudo yum install -y kernel-hotfix-18359162-5.10.134-15
  • Avoid kernel-5.10.134-15.al8, for example by upgrading to kernel-5.10.134-15.1.al8 or later. For more information, see Change the kernel version.
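After applying either option, it can help to confirm which state the instance is in. The following is a minimal sketch, assuming that `rpm -q` is available and that the hotfix package name matches the one used in the install command above. The helper name `remediation_status` and its output labels are hypothetical. The check only confirms that the hotfix package is installed, not that the patch is active in the running kernel.

```shell
#!/bin/sh
# Package name taken from the install command in this topic.
HOTFIX_PKG="kernel-hotfix-18359162-5.10.134-15"

# Hypothetical helper.
#   $1 = kernel release string (for example, the output of `uname -r`)
#   $2 = "yes" if the hotfix package is installed, anything else otherwise
# Prints one of: hotfix-installed, unpatched, not-affected-build
remediation_status() {
    case "$1" in
        5.10.134-15.al8.*)
            if [ "$2" = "yes" ]; then
                echo "hotfix-installed"
            else
                echo "unpatched"
            fi
            ;;
        *)
            echo "not-affected-build"
            ;;
    esac
}

# Query the RPM database; treat a missing rpm binary as "not installed".
if command -v rpm >/dev/null 2>&1 && rpm -q "$HOTFIX_PKG" >/dev/null 2>&1; then
    hotfix="yes"
else
    hotfix="no"
fi

remediation_status "$(uname -r)" "$hotfix"
```

If the script prints `unpatched`, the instance is still running the affected kernel without the hotfix package, and mounting EROFS from a block device should be avoided until one of the two options has been applied.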