CVE-2024-35931

2024-05-19 00:00:00

ubuntu.com

linux kernel

amdgpu

pci error slot reset

ras recovery

system hang

AI Score: 6.5 Medium (Confidence: High)

EPSS: 0.0004 Low (Percentile: 9.1%)


In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu: Skip do PCI error slot reset during RAS recovery

Why: The PCI error slot reset may be triggered after an uncorrectable error (UE) is injected into the UMC multiple times, which caused a system hang.

[ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 557.373718] [drm] PCIE GART of 512M enabled.
[ 557.373722] [drm] PTB located at 0x0000031FED700000
[ 557.373788] [drm] VRAM is lost due to GPU reset!
[ 557.373789] [drm] PSP is resuming…
[ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
[ 557.547067] [drm] PCI error: detected callback, state(1)!!
[ 557.547069] [drm] No support for XGMI hive yet…
[ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
[ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
[ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
[ 557.610492] [drm] PCI error: slot reset callback!!
…
[ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
[ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu
[ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
[ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
[ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
[ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
[ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
[ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
[ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
[ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
[ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
[ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
[ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
[ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 560.843444] PKRU: 55555554
[ 560.846480] Call Trace:
[ 560.849225] <TASK>
[ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.867778] ? show_regs.part.0+0x23/0x29
[ 560.872293] ? __die_body.cold+0x8/0xd
[ 560.876502] ? die_addr+0x3e/0x60
[ 560.880238] ? exc_general_protection+0x1c5/0x410
[ 560.885532] ? asm_exc_general_protection+0x27/0x30
[ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.904520] process_one_work+0x228/0x3d0

How: In RAS recovery, a mode-1 reset is issued from RAS fatal error handling and is expected to reset all the nodes in a hive. There is no need to issue another mode-1 reset during this procedure.
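
The idea in the "How" paragraph can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the actual amdgpu patch: `struct gpu_dev`, `issue_mode1_reset`, `slot_reset`, and the `ras_recovery_in_progress` flag are hypothetical names invented for illustration. The two enum values mirror the kernel's `pci_ers_result_t` codes that appear in the log ("result = 3, need reset" and "result = 5, recovered").

```c
#include <stdbool.h>

/* Mirrors the pci_ers_result_t values seen in the log:
 * 3 = need reset, 5 = recovered. */
enum ers_result {
    ERS_RESULT_NEED_RESET = 3,
    ERS_RESULT_RECOVERED  = 5,
};

/* Hypothetical stand-in for per-device driver state. */
struct gpu_dev {
    bool ras_recovery_in_progress; /* true while RAS fatal error handling runs */
    int  mode1_resets_issued;      /* counter used only for illustration */
};

/* Hypothetical stand-in for issuing a mode-1 reset. */
static void issue_mode1_reset(struct gpu_dev *dev)
{
    dev->mode1_resets_issued++;
}

/* Sketch of a slot-reset callback with the guard described above:
 * if RAS recovery already owns the reset, report the slot as recovered
 * without issuing a second mode-1 reset. */
static enum ers_result slot_reset(struct gpu_dev *dev)
{
    if (dev->ras_recovery_in_progress)
        return ERS_RESULT_RECOVERED;

    issue_mode1_reset(dev);
    return ERS_RESULT_RECOVERED;
}
```

In this model, calling `slot_reset` while `ras_recovery_in_progress` is set leaves `mode1_resets_issued` unchanged, which corresponds to avoiding the overlapping reset that preceded the general protection fault in the trace.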

OS	OS Version	Architecture	Package	Package Version	Filename
ubuntu	18.04	noarch	linux	< any	UNKNOWN
ubuntu	20.04	noarch	linux	< any	UNKNOWN
ubuntu	22.04	noarch	linux	< any	UNKNOWN
ubuntu	23.10	noarch	linux	< any	UNKNOWN
ubuntu	24.04	noarch	linux	< any	UNKNOWN
ubuntu	14.04	noarch	linux	< any	UNKNOWN
ubuntu	16.04	noarch	linux	< any	UNKNOWN
ubuntu	18.04	noarch	linux-aws	< any	UNKNOWN
ubuntu	20.04	noarch	linux-aws	< any	UNKNOWN
ubuntu	22.04	noarch	linux-aws	< any	UNKNOWN