Redhat Crash Utility-Ramdump

Redhat Crash Utility

edit by liaoye@2014/9/16  

http://blog.csdn.net/paul_liao



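The crash utility is an open-source ramdump analysis tool provided by Red Hat. Official site: http://people.redhat.com/anderson/ , where the source code can be downloaded and built. Ramdumps from Spreadtrum, Marvell and MTK platforms can be parsed with the crash utility; Qualcomm has its own tools, or trace32 can be used.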


Building the crash utility
1. Tools that need to be installed
sudo apt-get install libaio-dev  libncurses5-dev  zlib1g-dev liblzma-dev  flex bison byacc


2. Unpack the archive and build
tar zxvf crash-7.0.8.tar.gz
cd crash-7.0.8

make target=ARM

If a 64-bit target is needed:

make target=ARM64

3. Build the extension libraries

make extensions target=ARM64

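Capturing a ramdump on Spreadtrum
When the system hits a kernel panic, the ramdump is saved automatically under the sysdump folder in the log directory on the T-card (SD card). There are two files in total.

Before the crash utility can parse them, they must be merged into a single dump file: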
cat sysdump.core.0* > dump.bin
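
The cat command relies on the shell expanding the glob in sorted name order, which keeps the pieces in sequence. On a host without a POSIX shell, the same merge can be sketched in Python. This is only a minimal sketch: the function name merge_sysdump and its parameters are made up for illustration, and the only things taken from the text above are the sysdump.core.0* pattern and the dump.bin output name.

```python
import shutil
from pathlib import Path


def merge_sysdump(src_dir, out_name="dump.bin", pattern="sysdump.core.0*"):
    """Concatenate the sysdump pieces found in src_dir into one file.

    Pieces are joined in sorted name order, mirroring what
    `cat sysdump.core.0* > dump.bin` does in a shell.
    """
    src = Path(src_dir)
    parts = sorted(p for p in src.glob(pattern) if p.is_file())
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern} in {src}")

    out_path = src / out_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as piece:
                # copyfileobj streams in chunks, so a large dump
                # is never loaded into memory all at once.
                shutil.copyfileobj(piece, out)
    return out_path, [p.name for p in parts]
```

Calling merge_sysdump("path/to/sysdump") returns the path of the merged file and the list of pieces in the order they were joined, which is worth checking before handing dump.bin to crash.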

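Capturing a ramdump on Marvell
When the system hits a kernel panic, it automatically enters EMMD dump mode. If an SD card is detected, the screen shows "EMMD SD DUMP", the system saves the entire memory to the SD card and then powers off, and RAMDUMP0000.gz can be retrieved from the SD card. Otherwise the screen shows "EMMD USB DUMP", and the memory has to be dumped with the fastboot tool over a USB connection to a PC.
Linux:
# fastboot-linux-marvell dump dump.bin
Windows:
D:\fastboot_windows>fastboot-windows-marvell.exe dump dump.bin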

Capturing a ramdump on MTK

a. Enable the ramdump mechanism

The following code needs to be added:

diff --git a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
index 8b2b93a..2ec509f 100644
--- a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
+++ b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
@@ -457,6 +457,17 @@ static int __init mrdump_init(void)
        }
 
        atomic_notifier_chain_register(&panic_notifier_list, &mrdump_panic_blk);
+       //add this block
+
+       {
+               mrdump_enable = 1;
+
+               mrdump_plat->hw_enable(mrdump_enable);
+
+               mrdump_cb->machdesc.nr_cpus = NR_CPUS;
+
+               __inner_flush_dcache_all();
+       }
        return 0;
 }

Enable the following config options:

+CONFIG_MTK_AEE_POWERKEY_HANG_DETECT=y
+CONFIG_MTK_AEE_MRDUMP=y
+CONFIG_MTK_MRDUMP=y
+CONFIG_MTK_DBG_DUMP=y

In addition, CONFIG_MTK_AEE_IPANIC must be disabled. If it is enabled, sys_mini_dump is generated and sys_core_dump is not.

Run cat /sys/module/mrdump/parameters/enable on the device to confirm that the setting has taken effect.
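
The config requirements above are easy to get wrong when several defconfig fragments are merged, so it can help to check the final .config before flashing. The following is a minimal sketch, not part of any MTK tooling: the function names are hypothetical, and the option lists are simply copied from the text above.

```python
# Options the text above says must be enabled / disabled.
REQUIRED_ON = [
    "CONFIG_MTK_AEE_POWERKEY_HANG_DETECT",
    "CONFIG_MTK_AEE_MRDUMP",
    "CONFIG_MTK_MRDUMP",
    "CONFIG_MTK_DBG_DUMP",
]
REQUIRED_OFF = ["CONFIG_MTK_AEE_IPANIC"]


def parse_config(text):
    """Return {option: value} for each 'CONFIG_X=value' line.

    Comment lines such as '# CONFIG_X is not set' are skipped,
    so a disabled option is simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def check_mrdump_config(text):
    """Return a list of problems; an empty list means the config matches."""
    options = parse_config(text)
    problems = []
    for name in REQUIRED_ON:
        if options.get(name) != "y":
            problems.append(f"{name} should be =y")
    for name in REQUIRED_OFF:
        if options.get(name) in ("y", "m"):
            problems.append(f"{name} should be disabled")
    return problems
```

A typical use would be check_mrdump_config(open(".config").read()), where the location of the generated .config depends on the build tree and is not specified here.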

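b. Capture the ramdump

After a kernel panic or oops, the device reboots into lkramdump mode and dumps RAM to /data/No_Delete.rdmp, which is then collected into the mtklog/aee_exp/db* files. Export these with the gat tool and extract SYS_COREDUMP from them.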


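Capturing a ramdump on Qualcomm

After a kernel panic or oops, the device reboots into ramdump mode, and the ramdump is then exported with the QPST tool. Qualcomm provides the parsing tools linux ramdump parser and crashscope for simple analysis; more complex analysis requires trace32.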


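Using the crash utility
The official site provides detailed documentation at http://people.redhat.com/anderson/crash_whitepaper for reference. Below are some commonly used operations.

1. Enter the crash command line: ./crash-arm vmlinux dump.bin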
paul@paul-VirtualBox:~$ ./crash-arm  vmlinux  dump.bin 


crash-arm 7.0.5
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation

Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...

      KERNEL: vmlinux                           
    DUMPFILE: dump.bin
        CPUS: 1
        DATE: Wed Jan  1 10:26:26 2014
      UPTIME: 00:34:14
LOAD AVERAGE: 3.61, 3.59, 3.16
       TASKS: 650
    NODENAME: localhost
     RELEASE: 3.10.33
     VERSION: #4 SMP PREEMPT Wed Sep 10 14:44:32 CST 2014
     MACHINE: armv7l  (unknown Mhz)
      MEMORY: 512 MB
       PANIC: "c0 4233 (sh) Internal error: Oops: 805 [#1] PREEMPT SMP ARM" (check log for details)

         PID: 4233
     COMMAND: "sh"
        TASK: d37f7b40  [THREAD_INFO: cf512000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash-arm>
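crash-arm is the crash binary produced by the build, and dump.bin is the captured ramdump. vmlinux and dump.bin must come from the same kernel build; otherwise the dump cannot be parsed.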


2. Run the log command at the prompt to get the kmsg
crash-arm> log  
or
crash-arm> log > kmsg

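3. bt prints the call stack. The call stack information can be used to reconstruct the state at the time of the crash and track down the problem.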
crash-arm> bt
PID: 37     TASK: db34a640  CPU: 0   COMMAND: "kworker/u8:1"
 #0 [<c016ad38>] (try_to_suspend) from [<c0143a5c>]
 #1 [<c0143a5c>] (process_one_work) from [<c0144138>]
 #2 [<c0144138>] (worker_thread) from [<c0149c94>]
 #3 [<c0149c94>] (kthread) from [<c010f498>]

crash-arm> bt -f
PID: 37     TASK: db34a640  CPU: 0   COMMAND: "kworker/u8:1"
 #0 [<c016ad38>] (try_to_suspend) from [<c0143a5c>]
    [PC: c016ad38  LR: c0143a5c  SP: db391ee8  SIZE: 16]
    db391ee8: 00000838 c0a5f01c db367080 c0143a5c 
 #1 [<c0143a5c>] (process_one_work) from [<c0144138>]
    [PC: c0143a5c  LR: c0144138  SP: db391ef8  SIZE: 56]

    db391ef8: c2907600 c0a7be74 00000001 00000000 
    db391f08: 00000000 db367080 db80ec14 db367098 
    db391f18: db390000 db390000 c0ab39a3 00000001 
    db391f28: db80ec00 c0144138 

#2 [<c0144138>] (worker_thread) from [<c0149c94>]
    [PC: c0144138  LR: c0149c94  SP: db391f30  SIZE: 56]
    db391f30: c0144000 00000000 00000000 db390000 
    db391f40: db391f64 db8b3e98 00000000 db367080 
    db391f50: c0144000 00000000 00000000 00000000 
    db391f60: 00000000 c0149c94 

#3 [<c0149c94>] (kthread) from [<c010f498>]
    [PC: c0149c94  LR: c010f498  SP: db391f68  SIZE: 72]
    db391f68: 04000000 00000000 00000000 db367080 
    db391f78: 00000000 00000000 db391f80 db391f80 
    db391f88: 00000000 00000000 db391f90 db391f90 
    db391f98: db391fac db8b3e98 c0149bf0 00000000 
    db391fa8: 00000000 c010f498

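PC: program counter, points to the instruction currently being executed;
LR: link register, holds the return address, i.e. the instruction to execute after the current function returns;
SP: stack pointer. The Linux stack grows from high addresses toward low addresses.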


Let's analyze the meaning of the stack data of frame #1 (process_one_work) in the bt -f output above. First, disassembling vmlinux gives:
static void process_one_work(struct worker *worker, struct work_struct *work)
__releases(&pool->lock)
__acquires(&pool->lock)
{
c0143928:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
c014392c:   e1a05001    mov r5, r1
c0143930:   e5913000    ldr r3, [r1]
c0143934:   e24dd014    sub sp, sp, #20
c0143938:   e1a04000    mov r4, r0

...


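From the prologue we can see that, starting from the end of the frame (the highest address) and going backwards, the words are lr, fp, sl, r9, r8, r7, r6, r5, r4. The remaining words are data placed on the stack afterwards (the 20 bytes reserved by sub sp, sp, #20) and can be identified by comparing against the disassembly.
    c2907600 c0a7be74 00000001 00000000
    00000000 db367080 db80ec14 db367098
    db390000 db390000 c0ab39a3 00000001
    db80ec00 c0144138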
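
The mapping between stack words and saved registers can be written out mechanically. The sketch below is only an illustration with hypothetical names (decode_frame and the constants are not part of crash); it assumes the frame consists solely of the area reserved by sub sp, sp, #20 followed by the registers pushed in the prologue, which matches the SIZE: 56 reported by bt -f for this frame (5 + 9 words).

```python
# Frame #1 (process_one_work) as printed by bt -f: SP = 0xdb391ef8, SIZE = 56.
FRAME_SP = 0xDB391EF8
FRAME_WORDS = [
    0xC2907600, 0xC0A7BE74, 0x00000001, 0x00000000,
    0x00000000, 0xDB367080, 0xDB80EC14, 0xDB367098,
    0xDB390000, 0xDB390000, 0xC0AB39A3, 0x00000001,
    0xDB80EC00, 0xC0144138,
]
# push stores the lowest-numbered register at the lowest address.
PUSHED_REGS = ["r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "lr"]
LOCAL_BYTES = 20  # from "sub sp, sp, #20"


def decode_frame(sp, words, pushed_regs, local_bytes):
    """Label every 32-bit word of a frame as a local slot or a saved register."""
    n_locals = local_bytes // 4
    if n_locals + len(pushed_regs) != len(words):
        raise ValueError("frame size does not match the prologue")
    layout = []
    for i, value in enumerate(words):
        addr = sp + 4 * i
        if i < n_locals:
            label = f"local+{4 * i}"
        else:
            label = pushed_regs[i - n_locals]
        layout.append((addr, label, value))
    return layout


if __name__ == "__main__":
    for addr, label, value in decode_frame(
        FRAME_SP, FRAME_WORDS, PUSHED_REGS, LOCAL_BYTES
    ):
        print(f"{addr:08x}: {value:08x}  {label}")
```

The last word, labeled lr, is c0144138, which is the return address into worker_thread shown by bt for this frame.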

4. The struct command: using the call stack information above, the related data can be recovered, for example struct work_struct.


crash-arm> struct work_struct c0a5f02c
struct work_struct {
  data = {
    counter = 0
  }, 
  entry = {
    next = 0x0, 
    prev = 0xc0a5f034 <autosleep_lock+8>
  }, 
  func = 0xc0a5f034 <autosleep_lock+8>
}

5. whatis prints a function prototype
crash-arm> whatis try_to_suspend 
void try_to_suspend(struct work_struct *);


6. Extract the logcat
Load the external logcat.so extension:
crash-arm> extend logcat.so
crash-arm> logcat

7. help: for more commands, type help or see http://people.redhat.com/anderson/crash_whitepaper


Case study
1. A kernel panic can be triggered by adding a null pointer dereference, or by running echo c > /proc/sysrq-trigger. I made the following change in the code:
+++ kernel/power/autosleep.c
@@ -26,12 +30,16 @@
 static void try_to_suspend(struct work_struct *work)
 {
 	unsigned int initial_count, final_count;
+	int *p = 0;
 
 	if (!pm_get_wakeup_count(&initial_count, true))
 		goto out;
 
 	mutex_lock(&autosleep_lock);
 
+	if (work->func != NULL)
+		*p = 6;
+
 	if (!pm_save_wakeup_count(initial_count) ||
When work->func is not NULL (this is only an experiment; work->func is never NULL here), the write through pointer p, which points to address 0, triggers the panic.




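2. Run the log command. The exact location of the panic can be found in the parsed kmsg output:
PC is at try_to_suspend+0x38/0xe0
pc : [<c016ad38>]
0x38 is the offset into the function, and 0xe0 is the total size of try_to_suspend.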
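
The symbol+offset/size notation can be turned back into addresses with simple arithmetic: the function starts at pc minus the offset and ends at start plus the size. A minimal sketch follows; the function name locate_fault is hypothetical, and the regular expression only covers the "PC is at" line format shown above.

```python
import re

# Matches e.g. "PC is at try_to_suspend+0x38/0xe0".
PC_LINE = re.compile(r"PC is at ([\w.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)")


def locate_fault(pc_line, pc_value):
    """Derive the function's address range from a 'PC is at' line and the pc value."""
    match = PC_LINE.search(pc_line)
    if match is None:
        raise ValueError("line does not contain 'PC is at symbol+0xOFF/0xSIZE'")
    symbol = match.group(1)
    offset = int(match.group(2), 16)
    size = int(match.group(3), 16)
    start = pc_value - offset
    return {
        "symbol": symbol,
        "offset": offset,
        "size": size,
        "start": start,       # first instruction of the function
        "end": start + size,  # first address past the function
    }


if __name__ == "__main__":
    info = locate_fault("PC is at try_to_suspend+0x38/0xe0", 0xC016AD38)
    print(f"{info['symbol']}: {info['start']:08x}-{info['end']:08x}")
```

For the values above this gives a start address of c016ad00, which is where to begin reading in the disassembly of try_to_suspend.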

[   82.566833] c0 37 (kworker/u8:1) Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   82.577697] c0 37 (kworker/u8:1) pgd = c0104000
[   11.830322] c0 37 (kworker/u8:1) SEH:seh_api_ioctl_handler 6
[   82.582458] c0 37 (kworker/u8:1) [00000000] *pgd=00000000
[   82.587860] c0 37 (kworker/u8:1)
[   82.589965] c0 37 (kworker/u8:1) Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[   82.597259] c0 37 (kworker/u8:1) Modules linked in: audiostub cidatattydev gs_modem ccinetdev cci_datastub citty iml_module seh cploaddev msocketk geu galcore(O)
[   82.610107] c0 37 (kworker/u8:1) CPU: 0 PID: 37 Comm: kworker/u8:1 Tainted: G        W  O 3.10.33 #51
[   82.619354] c0 37 (kworker/u8:1) Workqueue: autosleep try_to_suspend
[   82.623901] c0 37 (kworker/u8:1) task: db34a640 ti: db390000 task.ti: db390000
[   82.631164] c0 37 (kworker/u8:1) PC is at try_to_suspend+0x38/0xe0
[   82.637359] c0 37 (kworker/u8:1) LR is at try_to_suspend+0x28/0xe0
[   82.643585] c0 37 (kworker/u8:1) pc : [<c016ad38>]    lr : [<c016ad28>]    psr: a00e0013
               sp : db391ee8  ip : 00000000  fp : 00000000
[   82.656921] c0 37 (kworker/u8:1) r10: db2a5400  r9 : 00000000  r8 : db390000
[   82.664001] c0 37 (kworker/u8:1) r7 : db80ec00  r6 : c0ab3d34  r5 : c0a5f01c  r4 : c0a5f01c
[   82.672393] c0 37 (kworker/u8:1) r3 : 00000000  r2 : 00000006  r1 : 200e0013  r0 : c0a5f02c
[   82.680755] c0 37 (kworker/u8:1) Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel


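3. Disassemble vmlinux
arm-linux-androideabi-objdump -C -S vmlinux > vmlinux-dump
Looking up address c016ad38 in the output shows that the panic occurred while executing the instruction below. From the kmsg we know that r3 : 00000000 and r2 : 00000006, and storing to address 0x0 is clearly illegal.
c016ad38:   15832000    strne   r2, [r3]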
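
As a cross-check, the instruction word 15832000 and the psr value a00e0013 from the kmsg can be decoded by hand. The sketch below is a simplified decoder written for this one case, not a general disassembler: it assumes the classic ARM (A32) single data transfer layout with an immediate offset, and the function names are hypothetical.

```python
# ARM condition codes, indexed by the top four bits of the instruction.
COND_NAMES = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]


def decode_single_data_transfer(word):
    """Split an A32 LDR/STR (immediate offset) word into its fields."""
    if (word >> 26) & 0x3 != 0b01:
        raise ValueError("not a single data transfer (LDR/STR) encoding")
    if (word >> 25) & 1:
        raise ValueError("register-offset form is not handled in this sketch")
    return {
        "cond": COND_NAMES[(word >> 28) & 0xF],
        "pre_indexed": bool((word >> 24) & 1),
        "add_offset": bool((word >> 23) & 1),
        "byte": bool((word >> 22) & 1),
        "writeback": bool((word >> 21) & 1),
        "load": bool((word >> 20) & 1),   # False means store
        "rn": (word >> 16) & 0xF,         # base register
        "rd": (word >> 12) & 0xF,         # register being stored/loaded
        "offset": word & 0xFFF,
    }


def psr_flags(psr):
    """Return the N, Z, C, V condition flags from a PSR value."""
    return {
        "N": bool((psr >> 31) & 1),
        "Z": bool((psr >> 30) & 1),
        "C": bool((psr >> 29) & 1),
        "V": bool((psr >> 28) & 1),
    }


if __name__ == "__main__":
    print(decode_single_data_transfer(0x15832000))
    print(psr_flags(0xA00E0013))
```

The decoded fields describe a conditional store (ne) of r2 to the address in r3 with offset 0, and the psr has Z clear, which agrees with the "NzCv" flags line in the kmsg and means the ne condition was true when the store executed.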

*p = 6 is executed only when work->func != NULL. Register R0 holds the argument of try_to_suspend(), a struct work_struct *. To see why R0~R3 are used to pass function arguments, look up the APCS standard.


if (work->func != NULL) 
*p = 6;
Running struct work_struct c0a5f02c recovers the struct work_struct as it was at the time of the panic, and it clearly shows that work->func is not NULL.
crash-arm> struct work_struct c0a5f02c
struct work_struct {
  data = {

    counter = 0
  }, 
  entry = {
    next = 0x0, 
    prev = 0xc0a5f034 <autosleep_lock+8>
  }, 
  func = 0xc0a5f034 <autosleep_lock+8>
}

The above is only a simple example for learning purposes; panics encountered in real debugging will not be this simple.



References:
http://blog.csdn.net/keyboardota/article/details/6799054
http://people.redhat.com/anderson/crash_whitepaper

Original article: https://www.cnblogs.com/yjbjingcha/p/6853199.html