ESR information printed on a kernel exception

<1>[ 7766.006249] Unhandled fault at 0xffffff800188d408
<1>[ 7766.006256] Mem abort info:
<1>[ 7766.006259]   ESR = 0x86000003
<1>[ 7766.006264]   Exception class = IABT (current EL), IL = 32 bits
<1>[ 7766.006268]   SET = 0, FnV = 0
<1>[ 7766.006271]   EA = 0, S1PTW = 0
<1>[ 7766.006277] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000352033d5
<1>[ 7766.006281] [ffffff800188d408] pgd=000000009d7fe003, pud=000000009d7fe003, pmd=00000000625c6003, pte=0040080063544793
<0>[ 7766.006294] Internal error: level 3 address size fault: 86000003 [#1] PREEMPT SMP

Interpreting the ESR information

The ESR (Exception Syndrome Register, EL1) value printed in the exception above is 0x86000003. To decode it, look at the bit assignment of ESR_EL1.

ESR_EL1 is a 64-bit register. Start with the EC (Exception Class) field, which occupies the 6 bits [31:26].

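The meaning of the ISS (Instruction Specific Syndrome, bits [24:0]) field depends on the EC value. Bit [25], between the two, is IL (Instruction Length).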
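As a minimal sketch of the layout just described (EC in bits [31:26], IL in bit [25], ISS in bits [24:0]), the Python below splits a raw ESR value into those three fields. The helper name `split_esr` is hypothetical; the input is the ESR value from the log above.

```python
def split_esr(esr: int) -> dict:
    """Split a raw ESR_ELx value into EC, IL and ISS (hypothetical helper)."""
    return {
        "EC": (esr >> 26) & 0x3F,   # Exception Class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # Instruction Length, bit [25]
        "ISS": esr & 0x1FFFFFF,     # Instruction Specific Syndrome, bits [24:0]
    }


fields = split_esr(0x86000003)
print({name: hex(value) for name, value in fields.items()})
# prints {'EC': '0x21', 'IL': '0x1', 'ISS': '0x3'}
```

For 0x86000003 this gives EC = 0x21 and ISS = 0x3, which is what the rest of the analysis works from.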

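In this example EC is 0x21 (0b100001). The EC encoding table shows that 0b100001 is an Instruction Abort taken without a change in Exception level (the log prints it as "IABT (current EL)"), so the next step is to read the ISS encoding for an Instruction Abort.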

EC encoding (excerpt):

EC = 0b000000
  Meaning: Unknown reason.
  ISS: ISS encoding for exceptions with an unknown reason

EC = 0b000001
  Meaning: Trapped WF* instruction execution. Conditional WF* instructions that fail their condition code check do not cause an exception.
  ISS: ISS encoding for an exception from a WF* instruction

EC = 0b100001
  Meaning: Instruction Abort taken without a change in Exception level. Used for MMU faults generated by instruction accesses and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug-related exceptions.
  ISS: ISS encoding for an exception from an Instruction Abort
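The EC lookup can be sketched as a small dictionary. This is only a partial table: the first three entries come from the excerpt above, while 0b100000, 0b100100 and 0b100101 (the lower-EL Instruction Abort and the two Data Abort classes) are added from the Arm EC table for convenience. The names `EC_NAMES` and `ec_name` are hypothetical.

```python
# Partial EC table; any class not listed is reported as unknown to this sketch.
EC_NAMES = {
    0b000000: "Unknown reason",
    0b000001: "Trapped WF* instruction execution",
    0b100000: "Instruction Abort from a lower Exception level",
    0b100001: "Instruction Abort taken without a change in Exception level",
    0b100100: "Data Abort from a lower Exception level",
    0b100101: "Data Abort taken without a change in Exception level",
}


def ec_name(esr: int) -> str:
    """Map the EC field (bits [31:26]) of a raw ESR value to a description."""
    ec = (esr >> 26) & 0x3F
    return EC_NAMES.get(ec, f"EC 0x{ec:02x} (not in this partial table)")


print(ec_name(0x86000003))
# prints Instruction Abort taken without a change in Exception level
```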

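The field to focus on is IFSC; its values are listed in the table below. In this example IFSC is 3 (0b000011), i.e. "Address size fault, level 3", which matches the "level 3 address size fault" in the Internal error line of the log.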

ISS encoding for an exception from an Instruction Abort

Bits     Field
[24:13]  RES0
[12:11]  SET
[10]     FnV
[9]      EA
[8]      RES0
[7]      S1PTW
[6]      RES0
[5:0]    IFSC
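A minimal sketch that pulls the Instruction Abort ISS fields out of a raw ESR value, following the bit positions in the table above. It reproduces the "SET = 0, FnV = 0" and "EA = 0, S1PTW = 0" lines the kernel printed; the function name `decode_iabt_iss` is hypothetical and the RES0 bits are simply ignored.

```python
def decode_iabt_iss(esr: int) -> dict:
    """Decode the ISS of an Instruction Abort (EC 0x20/0x21) from a raw ESR value."""
    iss = esr & 0x1FFFFFF           # ISS is bits [24:0]
    return {
        "SET": (iss >> 11) & 0x3,   # bits [12:11]
        "FnV": (iss >> 10) & 0x1,   # bit [10]
        "EA": (iss >> 9) & 0x1,     # bit [9]
        "S1PTW": (iss >> 7) & 0x1,  # bit [7]
        "IFSC": iss & 0x3F,         # bits [5:0]
    }


print(decode_iabt_iss(0x86000003))
# prints {'SET': 0, 'FnV': 0, 'EA': 0, 'S1PTW': 0, 'IFSC': 3}
```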

IFSC, bits [5:0]

Instruction Fault Status Code.

IFSC      Meaning (excerpt)
0b000000  Address size fault, level 0 of translation or translation table base register.
0b000001  Address size fault, level 1.
0b000010  Address size fault, level 2.
0b000011  Address size fault, level 3.
0b000100  Translation fault, level 0.
0b000101  Translation fault, level 1.
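The pattern in the table is regular: for IFSC values 0b0000LL and 0b0001LL the low two bits are the translation table level. The sketch below relies on that; translation fault levels 2 and 3 (0b000110, 0b000111) are not in the excerpt but follow the same Arm encoding. Other fault types (access flag, permission, external abort, ...) are deliberately left out, and the name `ifsc_meaning` is hypothetical.

```python
def ifsc_meaning(ifsc: int) -> str:
    """Describe IFSC codes 0b000000-0b000111 (address size and translation faults only)."""
    level = ifsc & 0x3  # low two bits select the translation table level
    if ifsc >> 2 == 0b0000:
        # level 0 also covers a fault on the translation table base register
        return f"Address size fault, level {level}"
    if ifsc >> 2 == 0b0001:
        return f"Translation fault, level {level}"
    return f"IFSC 0b{ifsc:06b}: not covered by this sketch"


print(ifsc_meaning(0x86000003 & 0x3F))
# prints Address size fault, level 3
```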

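The "IL = 32 bits" in the log means the instruction length is 32 bits, i.e. one instruction is 4 bytes. IL is ESR bit [25]; AArch64 instructions are always 4 bytes, and for an Instruction Abort the Arm architecture defines IL as RES1, so it always reads as 32 bits here.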

For the full description of the ESR_EL1 register, see:

https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-?lang=en#fieldset_0-24_0_14-5_0

On a kernel exception, the kernel also prints the PGD/PUD/PMD/PTE entries for the faulting address:

<1>[ 7766.006281] [ffffff800188d408] pgd=000000009d7fe003, pud=000000009d7fe003, pmd=00000000625c6003, pte=0040080063544793

pgd = 000000009d7fe003
pud = 000000009d7fe003
pmd = 00000000625c6003
pte = 0040080063544793

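This kernel exception (KE) happened on an ARM64 machine with 2 GB of DRAM. The PGD/PUD/PMD descriptors look normal (with 4K pages and 39-bit VAs the kernel uses three translation levels, so the PUD is folded into the PGD and both print the same value). The PTE descriptor, however, is wrong: the physical address it describes is 0x80063544000 (descriptor bits [47:12]), while on a 2 GB machine physical addresses should be below 0xFFFFFFFF. An output address larger than the configured physical address size is what raises an address size fault, consistent with the level 3 (PTE level) fault reported above.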
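The address check above can be sketched in a few lines: for a 4 KB granule the output address of a table or page descriptor is bits [47:12]. This assumes no 52-bit physical address support (FEAT_LPA/LPA2), and the 32-bit bound is the article's expectation for this 2 GB board, not a general rule. `OA_MASK_4K` and `output_address` are hypothetical names.

```python
# Output-address bits [47:12] of a 4 KB-granule descriptor (assumes no 52-bit PA).
OA_MASK_4K = 0x0000_FFFF_FFFF_F000


def output_address(desc: int) -> int:
    """Physical address a table/page descriptor points to."""
    return desc & OA_MASK_4K


entries = {
    "pgd": 0x000000009D7FE003,
    "pud": 0x000000009D7FE003,
    "pmd": 0x00000000625C6003,
    "pte": 0x0040080063544793,
}
for name, desc in entries.items():
    oa = output_address(desc)
    flag = "" if oa <= 0xFFFFFFFF else "  <-- beyond 32 bits"
    print(f"{name}: 0x{oa:x}{flag}")
```

Only the pte line gets flagged, with 0x80063544000; the other three entries point below 4 GB.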

The Code: line in a kernel oops log

[  794.274311] Code: f946a2c9 12001eea 0b350157 9b1b2789 (39402529) 

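When the kernel oopses, for example on a data abort or an instruction abort, it prints the instruction that triggered the abort (in parentheses) together with the few instructions before it. From that instruction you can locate the corresponding source code.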
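A minimal sketch for pulling the instruction words out of a Code: line, assuming the arm64 format shown above (hex words separated by spaces, the faulting one wrapped in parentheses). The helper name `parse_code_line` is hypothetical.

```python
def parse_code_line(line: str):
    """Return (preceding_words, faulting_word) from an arm64 oops 'Code:' line."""
    words = line.split("Code:", 1)[1].split()
    preceding = [w for w in words if not w.startswith("(")]
    faulting = next(w.strip("()") for w in words if w.startswith("("))
    return preceding, faulting


ctx, bad = parse_code_line("[  794.274311] Code: f946a2c9 12001eea 0b350157 9b1b2789 (39402529) ")
print(ctx, bad)
# prints ['f946a2c9', '12001eea', '0b350157', '9b1b2789'] 39402529
```

Keeping the preceding words is useful later: searching for the whole sequence is less ambiguous than searching for the faulting word alone.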

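For example, if the oops happened in a function inside some .ko, search that function's disassembly for 39402529. The instruction and the ones before it are listed below; run llvm-symbolizer on the address in front of 39402529 (0x227c88 here), not on the instruction encoding itself, to get the source location:

llvm-symbolizer -e xxx.ko 0x227c88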

227c7c: 12001eea and w10, w23, #0xff
227c80: 0b350157 add w23, w10, w21, uxtb
227c84: 9b1b2789 madd x9, x28, x27, x9
227c88: 39402529 ldrb w9, [x9,#9]
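Searching the disassembly can be sketched as matching the tail of the Code: words against consecutive instruction encodings, which narrows down duplicate hits. This assumes objdump-style lines ("address: encoding mnemonic ..."); other lines, such as symbol headers, are skipped. Note that the listing here starts after f946a2c9, so only the last four words are matched. `find_fault_address` is a hypothetical helper.

```python
def find_fault_address(disasm: str, code_words: list) -> list:
    """Return addresses of the last word of every run of lines matching code_words."""
    rows = []
    for line in disasm.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].endswith(":"):
            continue
        try:
            addr = int(parts[0][:-1], 16)
        except ValueError:  # e.g. a "xxx.ko:  file format ..." header line
            continue
        rows.append((addr, parts[1]))
    n = len(code_words)
    return [
        rows[i][0]
        for i in range(n - 1, len(rows))
        if [enc for _, enc in rows[i - n + 1 : i + 1]] == code_words
    ]


listing = """227c7c: 12001eea and w10, w23, #0xff
227c80: 0b350157 add w23, w10, w21, uxtb
227c84: 9b1b2789 madd x9, x28, x27, x9
227c88: 39402529 ldrb w9, [x9,#9]"""
for addr in find_fault_address(listing, ["12001eea", "0b350157", "9b1b2789", "39402529"]):
    print(f"llvm-symbolizer -e xxx.ko 0x{addr:x}")
# prints llvm-symbolizer -e xxx.ko 0x227c88
```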

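Before that, check that the .ko or vmlinux you disassembled matches the image that hit the problem: compare the function size printed with the PC against the size of the same function in your disassembly. If they are equal, the binaries can be taken to match. In the example below, the function the PC points to is 0xb10 bytes long: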

[  794.235944] XXX_OSD_WindowDestroy+0xb0/0xb10 [xxx.ko]
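A small sketch for reading the symbol+offset/size[module] notation, so the size can be compared with the disassembled function. The regular expression assumes the format shown in the log line above; `parse_symbol` is a hypothetical name.

```python
import re

# name+0xOFFSET/0xSIZE, optionally followed by " [module]"
SYM_RE = re.compile(
    r"(?P<name>[\w.$]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_symbol(line: str):
    """Return (name, offset, size, module) from a 'func+0xoff/0xsize [mod]' line."""
    m = SYM_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/size found")
    return m["name"], int(m["off"], 16), int(m["size"], 16), m["module"]


print(parse_symbol("[  794.235944] XXX_OSD_WindowDestroy+0xb0/0xb10 [xxx.ko]"))
# prints ('XXX_OSD_WindowDestroy', 176, 2832, 'xxx.ko')
```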

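When searching the disassembled function for the faulting instruction, you may find more than one match. In that case, either analyze the surrounding assembly to decide which one it is, or, once you have confirmed that the function size printed with the PC equals the size of the disassembled function, add the offset to the function's start address and use the result to locate the source line. For example, the PC above is at offset 0xb0 inside XXX_OSD_WindowDestroy().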
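The base-plus-offset approach, with the size check as a guard, might look like the sketch below. The function start address is not shown in the article, so the 0x1000 used here is purely hypothetical; take the real value from the function's symbol in your disassembly. `fault_pc` is a hypothetical helper.

```python
def fault_pc(func_base: int, offset: int, reported_size: int, disasm_size: int) -> int:
    """Return func_base + offset, refusing if the binary does not match the crashed image."""
    if reported_size != disasm_size:
        raise ValueError(
            f"size mismatch: oops says 0x{reported_size:x}, disassembly has 0x{disasm_size:x}"
        )
    if not 0 <= offset < reported_size:
        raise ValueError("offset lies outside the function")
    return func_base + offset


# func_base is hypothetical; offset and size come from XXX_OSD_WindowDestroy+0xb0/0xb10.
print(hex(fault_pc(0x1000, 0xB0, 0xB10, 0xB10)))
# prints 0x10b0
```

The resulting address is what you would then pass to llvm-symbolizer.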

Original article: https://www.cnblogs.com/aspirs/p/15744470.html