Storing Linux kernel crash logs with pstore

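Notes:
  Reference blog posts:
  (1) "Linux pstore 实现自动“抓捕”内核崩溃日志" (Automatically capturing kernel crash logs with Linux pstore)
  (2) "1-Linux 保存kernel panic信息到flash" (Saving Linux kernel panic information to flash)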

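Table of contents

  • Storing Linux kernel crash logs with pstore
    • Background
    • Introduction
    • The ramoops approach
      • Enabling ramoops
      • ramoops write test
    • The mtdoops approach
      • Enabling mtdoops
      • mtdoops write test
    • mtdpstore
      • Enabling mtdpstore
      • mtdpstore write test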

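Background

My project runs embedded Linux on a linux-4.19 kernel. I had long wanted to store panic/oops logs on a SPI NOR or SPI NAND device so that abnormal behaviour could be analysed afterwards. The article "Linux pstore 实现自动“抓捕”内核崩溃日志" finally solved this long-standing problem, but pstore/blk is only available in linux-5.8 and later, so I decided to backport it to linux-4.19. I backported the pstore code from linux-5.15.105 to linux-4.19. The port went fairly smoothly, and my own tests show that the basic functionality works.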

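Introduction

If the kernel panics while the system is running, developers need the kernel error log to locate the problem. Very often, though, no debug serial console is attached when the problem occurs, and the log lives only in memory, so it is lost after the reboot. What is needed is a way to save the crash information to non-volatile storage when the system crashes.

This is very useful for low-probability problems that cannot be caught on the spot. It matters even more now that connected smart devices are becoming common: a remote device can capture its own crash log and send it to a server over the network, maintainers can locate and fix the problem from the collected logs, and the device can then be upgraded through OTA.

The kernel provides kmsg_dump_register() to register a dumper that is called on a panic or oops. There are now several mechanisms for capturing panic output, and the most recent one is pstore.

According to material found online, there were quite a few similar implementations before the pstore file system.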

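  • apanic
    Android's earliest scheme for recording panic information. It can be found in Android kernels based on Linux 2.6, but it was never submitted upstream and was later abandoned. No reason for dropping it can be found online; presumably it is because apanic only works with MTD NAND, whereas today's Android devices almost all use eMMC. apanic is probably short for Android Panic; on a kernel crash it dumps the log to MTD NAND.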

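  • ramoops
    This refers to the original ramoops implementation; in current kernels it has been merged into pstore as the pstore/ram backend. ramoops dumps the log to RAM, which therefore must retain its contents across a reboot.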

  • crashlog
    A kernel patch provided by OpenWrt that was never submitted upstream. It is also RAM-based and can only dump Panic/Oops logs.

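  • mtdoops
    A feature of the MTD subsystem that is very similar to pstore. It only dumps Panic/Oops logs and does not present them as files, so users have to parse the whole MTD partition themselves. (Because the two are so similar, mtdpstore was written as a replacement for mtdoops.)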

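  • kdump
    If pstore is a lightweight way to persist kernel crash logs, kdump is a heavyweight analysis tool. On a crash, kdump boots a capture kernel, which collects the contents of memory into a core dump file; after the reboot, the captured information is available in that file. netdump and diskdump are similar. kdump suits servers and other machines with plenty of resources and is very powerful, but it is a poor fit for embedded devices.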

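A pstore frontend determines which kind of log is dumped; a pstore backend determines which kind of device it is dumped to.

The following frontends are currently supported:

  • dmesg: dumps the kernel log held in log_buf at the time of a Panic/Oops

  • pmsg: an entry point for user space to store logs; Android uses it to store system logs

  • console: console log

  • ftrace: function trace information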

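The following backends are currently supported:

  • pstore/ram: persistent RAM, that is, memory that keeps its data across a reboot

  • pstore/blk: (v5.8 and later) any writable block device, such as a disk, USB stick, eMMC or NFTL NAND

  • mtd device: (v5.8 and later) MTD devices such as MTD NAND. (MTD support depends on the pstore/blk backend, so strictly speaking it is not an independent backend.)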

For details, see the kernel documentation:

  • Documentation/admin-guide/ramoops.rst
  • Documentation/admin-guide/pstore-blk.rst

The ramoops approach

Enabling ramoops

  • Kernel configuration
File systems --->
      [*] Miscellaneous filesystems  --->
               <*>   Persistent store support
                        (10240) Default kernel log storage space 
                        < >     DEFLATE (ZLIB) compression   
                        < >     LZO compression      
                        < >     LZ4 compression   
                        < >     LZ4HC compression   
                        [ ]     842 compression    
                        [ ]     zstd compression
                        [*]     Log kernel console messages 
                        [*]     Log user space messages
                        <*>     Log panic/oops to a RAM buffer  
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y

CONFIG_PSTORE_RAM=y

CONFIG_MAGIC_SYSRQ=y

CONFIG_PANIC_TIMEOUT=-1
  • Reserved memory configuration
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    ramoops@11000000{
            compatible = "ramoops";
            reg = <0x11000000 0x100000>;
            record-size     = <0x00020000>;
            console-size    = <0x00020000>;
            ftrace-size     = <0x00020000>;
    };
};
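
The sizes in the node above determine how many dmesg records the region can hold. The following is a minimal sketch of that arithmetic, assuming the pstore/ram behaviour of giving the dmesg records whatever remains after the console, ftrace and pmsg zones are carved out; the function name `ramoops_layout` is made up for this illustration, and pmsg-size is taken as 0 because the node does not set it.

```python
# Sketch: how a ramoops region is split, using the device tree values above.
# Assumption: dmesg records share the space left over after the console,
# ftrace and pmsg zones; pmsg-size is not set in the node, so it is 0 here.


def ramoops_layout(total, record_size, console_size=0, ftrace_size=0, pmsg_size=0):
    """Return (bytes left for dmesg records, number of dmesg records)."""
    dump_mem = total - console_size - ftrace_size - pmsg_size
    if dump_mem < 0:
        raise ValueError("zones do not fit in the reserved region")
    count = dump_mem // record_size if record_size else 0
    return dump_mem, count


if __name__ == "__main__":
    dump_mem, count = ramoops_layout(
        total=0x100000,        # reg = <0x11000000 0x100000>
        record_size=0x20000,   # record-size
        console_size=0x20000,  # console-size
        ftrace_size=0x20000,   # ftrace-size
    )
    print(f"dmesg area: {dump_mem:#x} bytes, {count} records")
```

With the values from this article the sketch reports a 0xc0000-byte dmesg area, enough for 6 records of 0x20000 bytes each.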

ramoops write test

# echo c > /proc/sysrq-trigger 
[   53.539402] sysrq: Trigger a crash
[   53.542844] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   53.551414] pgd = 78dc9424
[   53.554145] [00000000] *pgd=16b33835, *pte=00000000, *ppte=00000000
[   53.560683] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[   53.566709] Modules linked in:
[   53.569787] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #91
[   53.575544] Hardware name: arobot r8 family
[   53.579752] PC is at sysrq_handle_crash+0x1c/0x28
[   53.584471] LR is at sysrq_handle_crash+0x8/0x28
[   53.589102] pc : [<c030b818>]    lr : [<c030b804>]    psr: 600e0013
[   53.595382] sp : c6b35ea0  ip : 00000000  fp : 000c8008
[   53.600620] r10: 00000004  r9 : 00000000  r8 : 00000063
[   53.605858] r7 : c090d300  r6 : 00000008  r5 : c090d300  r4 : c091aa28
[   53.612399] r3 : 00000001  r2 : 00000000  r1 : c7ebb390  r0 : 00000063
.........

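Because the log data is stored in DDR, it does not survive a power cycle and can only be read back after an automatic reboot. CONFIG_PANIC_TIMEOUT must therefore be configured so that the system reboots by itself after a panic.

After the reboot, the data can be inspected as follows: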
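
The meaning of the panic timeout value can be summarised in a few lines. This is a small sketch, assuming the usual semantics of the `kernel.panic` sysctl (zero: no reboot, negative: immediate reboot, positive: reboot after that many seconds); the helper names are made up for this illustration.

```python
from pathlib import Path


def describe_panic_timeout(value):
    """Explain what a given panic timeout means for the reboot behaviour."""
    if value == 0:
        return "no automatic reboot; the kernel loops forever after a panic"
    if value < 0:
        return "reboot immediately after a panic"
    return f"reboot {value} s after a panic"


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Read the current runtime value (initialised from CONFIG_PANIC_TIMEOUT)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    # The value configured in this article.
    print("-1:", describe_panic_timeout(-1))
    try:
        current = read_panic_timeout()
        print(f"running system ({current}):", describe_panic_timeout(current))
    except (OSError, ValueError):
        print("could not read /proc/sys/kernel/panic on this system")
```

With CONFIG_PANIC_TIMEOUT=-1 the board reboots straight away, which keeps the DDR contents intact as long as power is not removed.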

# mount -t pstore pstore /sys/fs/pstore/
# cd /sys/fs/pstore/
# ls
console-ramoops-0  dmesg-ramoops-0    dmesg-ramoops-1
#
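
On a deployed device the records would normally be copied off the pstore mount soon after boot, for example before uploading them to a server. Below is a minimal sketch of such a collector. The destination directory `/data/crashlogs` and the function name `archive_pstore` are assumptions made for this illustration; removing a file under /sys/fs/pstore is how a record is erased from the backend, which frees the space for the next crash.

```python
import time
from pathlib import Path


def archive_pstore(pstore_dir="/sys/fs/pstore", dest_root="/data/crashlogs", remove=True):
    """Copy every pstore record into a timestamped directory.

    Returns the list of copied files. dest_root is a hypothetical location
    on a writable partition; adjust it for the real board.
    """
    src = Path(pstore_dir)
    records = sorted(p for p in src.iterdir() if p.is_file()) if src.is_dir() else []
    if not records:
        return []

    dest = Path(dest_root) / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)

    saved = []
    for rec in records:
        target = dest / rec.name
        target.write_bytes(rec.read_bytes())
        if remove:
            # Unlinking the pstore file erases the record in the backend.
            rec.unlink()
        saved.append(target)
    return saved
```

Calling `archive_pstore()` from an init script after the pstore mount would leave files such as dmesg-ramoops-0 under a dated directory and empty /sys/fs/pstore again.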

The mtdoops approach

Enabling mtdoops

  • Kernel configuration
File systems --->
      [*] Miscellaneous filesystems  --->
               <*>   Persistent store support
                        (10240) Default kernel log storage space 
                        < >     DEFLATE (ZLIB) compression   
                        < >     LZO compression      
                        < >     LZ4 compression   
                        < >     LZ4HC compression   
                        [ ]     842 compression    
                        [ ]     zstd compression
                        [*]     Log kernel console messages 
                        [*]     Log user space messages
                        < >     Log panic/oops to a RAM buffer  
                        < >     Log panic/oops to a block device  
Device Drivers  ---> 
      <*> Memory Technology Device (MTD) support  ---> 
            <*>   Log panic/oops to an MTD buffer  
            < >   Log panic/oops to an MTD buffer based on pstore 
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y

CONFIG_MTD_OOPS=y

CONFIG_MAGIC_SYSRQ=y
  • Partition configuration
    cmdline method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs mtdoops.mtddev=pstore";

blkparts = "mtdparts=spi0.0:64k(spl)ro,256k(uboot)ro,64k(dtb)ro,128k(pstore),3m(kernel)ro,4m(rootfs)ro,-(data)";

Device tree (ofpart) method:

bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs mtdoops.mtddev=pstore";
partition@60000 {
    label = "pstore";
    reg = <0x60000 0x20000>;
 };
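
The two partition descriptions must agree with each other: the 128k "pstore" entry in the mtdparts string has to land on the same offset and size as the device tree node. The following sketch computes the offsets from the mtdparts string. It handles only the subset of the syntax used in this article (sizes with k/m/g suffixes, an optional "ro" flag, "-" for the remaining space, no explicit "@offset"), and the function name `parse_mtdparts` is made up for this illustration.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_PART_RE = re.compile(r"(-|\d+)([kmg]?)\((\w+)\)(ro)?", re.IGNORECASE)


def parse_mtdparts(spec):
    """Parse 'mtdparts=<dev>:<size>(<name>)[ro],...' into (device, partitions)."""
    spec = spec.split("=", 1)[-1]            # drop the leading 'mtdparts='
    device, _, parts = spec.partition(":")
    offset, result = 0, []
    for index, item in enumerate(parts.split(",")):
        m = _PART_RE.fullmatch(item.strip())
        if not m:
            raise ValueError(f"unsupported partition spec: {item!r}")
        size_s, unit, name, ro = m.groups()
        # '-' means "the rest of the device", so the size is unknown here.
        size = None if size_s == "-" else int(size_s) * _UNITS[unit.lower()]
        result.append({"index": index, "name": name, "offset": offset,
                       "size": size, "ro": bool(ro)})
        if size is not None:
            offset += size
    return device, result


if __name__ == "__main__":
    dev, parts = parse_mtdparts(
        "mtdparts=spi0.0:64k(spl)ro,256k(uboot)ro,64k(dtb)ro,"
        "128k(pstore),3m(kernel)ro,4m(rootfs)ro,-(data)")
    for p in parts:
        size = "rest" if p["size"] is None else hex(p["size"])
        print(f"mtd{p['index']} {p['name']:<8} offset={p['offset']:#x} size={size}")
```

For the string used here, "pstore" comes out at offset 0x60000 with size 0x20000, matching `reg = <0x60000 0x20000>`. Assuming spi0.0 is the only MTD device, it is also partition index 3, and "rootfs" is index 5.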

mtdoops write test

# echo c > /proc/sysrq-trigger 
[55632.357502] sysrq: Trigger a crash
[55632.360984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[55632.369504] pgd = ddcf897d
[55632.372426] [00000000] *pgd=16b36835, *pte=00000000, *ppte=00000000
[55632.378878] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[55632.384897] Modules linked in:
[55632.387972] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #90
[55632.393727] Hardware name: arobot r8 family
[55632.397931] PC is at sysrq_handle_crash+0x1c/0x28
[55632.402648] LR is at sysrq_handle_crash+0x8/0x28
[55632.407276] pc : [<c030a5d8>]    lr : [<c030a5c4>]    psr: 600e0013
[55632.413553] sp : c6b2fea0  ip : 00000000  fp : 000c8008
[55632.418789] r10: 00000004  r9 : 00000000  r8 : 00000063
[55632.424025] r7 : c090d300  r6 : 00000008  r5 : c090d300  r4 : c091a9cc
[55632.430563] r3 : 00000001  r2 : 00000000  r1 : c7ebb390  r0 : 00000063
[55632.437103] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none

.............

After the reboot, read the log from the MTD partition:

# cat /dev/mtd3 > 1.txt
# cat 1.txt
............
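
Since mtdoops does not present its records as files, the raw dump has to be split by hand. The sketch below shows one way to do that. It rests on assumptions taken from the 4.19-era drivers/mtd/mtdoops.c rather than from this article: each record is 4096 bytes by default (the mtdoops.record_size parameter), starts with an 8-byte header made of a 32-bit counter and the 32-bit magic 0x5d005d00, is written in the CPU's byte order (little-endian on this ARM board), and unused space reads back as 0xff. Newer kernels use a different header, so check the driver source before relying on it.

```python
import struct

# Assumptions from 4.19-era drivers/mtd/mtdoops.c, not from this article.
MTDOOPS_KERNMSG_MAGIC = 0x5D005D00
HEADER_SIZE = 8      # u32 counter + u32 magic
RECORD_SIZE = 4096   # default value of mtdoops.record_size


def parse_mtdoops(image, record_size=RECORD_SIZE, byteorder="<"):
    """Split a raw mtdoops partition dump into (counter, text) records.

    Records are returned in counter order, i.e. oldest first
    (counter wrap-around is ignored in this sketch).
    """
    records = []
    for off in range(0, len(image) - HEADER_SIZE + 1, record_size):
        counter, magic = struct.unpack_from(byteorder + "II", image, off)
        if magic != MTDOOPS_KERNMSG_MAGIC or counter == 0xFFFFFFFF:
            continue  # erased or never-written slot
        body = image[off + HEADER_SIZE: off + record_size]
        # Strip the 0xff / NUL padding that follows the log text.
        text = body.rstrip(b"\xff\x00").decode("utf-8", errors="replace")
        records.append((counter, text))
    return sorted(records)
```

Given the dump created above, `parse_mtdoops(open("1.txt", "rb").read())` would return one entry per stored oops.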

mtdpstore

Enabling mtdpstore

  • Kernel configuration
File systems --->
      [*] Miscellaneous filesystems  --->
               <*>   Persistent store support
                        (10240) Default kernel log storage space 
                        < >     DEFLATE (ZLIB) compression   
                        < >     LZO compression      
                        < >     LZ4 compression   
                        < >     LZ4HC compression   
                        [ ]     842 compression    
                        [*]     zstd compression
                                Default pstore compression algorithm (zstd)  --->
                        [*]     Log kernel console messages 
                        [*]     Log user space messages
                        < >     Log panic/oops to a RAM buffer  
                        <*>     Log panic/oops to a block device
                         ( )            block device identifier
                         (64)          Size in Kbytes of kmsg dump log to store
                         (2)            Maximum kmsg dump reason to store    
                         (64)          Size in Kbytes of pmsg to store    
                         (64)          Size in Kbytes of console log to store 
Device Drivers  ---> 
      <*> Memory Technology Device (MTD) support  ---> 
            < >   Log panic/oops to an MTD buffer  
            <*>   Log panic/oops to an MTD buffer based on pstore 
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_BLK=y

CONFIG_MTD_PSTORE=y

CONFIG_MAGIC_SYSRQ=y
  • Partition configuration
    cmdline method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs pstore_blk.blkdev=pstore";

blkparts = "mtdparts=spi0.0:64k(spl)ro,256k(uboot)ro,64k(dtb)ro,128k(pstore),3m(kernel)ro,4m(rootfs)ro,-(data)";

Device tree (ofpart) method:

bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs pstore_blk.blkdev=pstore";
partition@60000 {
    label = "pstore";
    reg = <0x60000 0x20000>;
};

mtdpstore write test

# echo c > /proc/sysrq-trigger 
[  121.945495] sysrq: Trigger a crash
[  121.948979] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  121.957506] pgd = eabf0695
[  121.960430] [00000000] *pgd=16b33835, *pte=00000000, *ppte=00000000
[  121.966887] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[  121.972908] Modules linked in:
[  121.975982] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #90
[  121.981738] Hardware name: arobot r8 family
[  121.985942] PC is at sysrq_handle_crash+0x1c/0x28
[  121.990659] LR is at sysrq_handle_crash+0x8/0x28
[  121.995287] pc : [<c030a5d8>]    lr : [<c030a5c4>]    psr: 600e0013
[  122.001564] sp : c6b35ea0  ip : 00000000  fp : 000c8008
[  122.006800] r10: 00000004  r9 : 00000000  r8 : 00000063
[  122.012036] r7 : c090d300  r6 : 00000008  r5 : c090d300  r4 : c091a9cc
[  122.018574] r3 : 00000001  r2 : 00000000  r1 : c7ebb390  r0 : 00000063

......

After the reboot, inspect the pstore logs:

# mount -t pstore pstore /sys/fs/pstore/
# cd /sys/fs/pstore/
# ls
dmesg-pstore_blk-0  dmesg-pstore_blk-1
# 
# 
# head -n 5 dmesg-pstore_blk-0
Panic#2 Part1
<1>[    0.000000] Booting Linux on physical CPU 0x0
<7>[    0.000000] Linux version 4.19.123  (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #90 SMP PREEMPT 2023.03.30 19:36:54 a19d55d3c
<6>[    0.000000] OF: fdt: Machine model: Tina
<6>[    0.000000] Memory policy: Data cache writealloc
# 
# 
# head -n 5 dmesg-pstore_blk-1
Oops#1 Part1
<1>[    0.000000] Booting Linux on physical CPU 0x0
<7>[    0.000000] Linux version 4.19.123 (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #90 SMP PREEMPT 2023.03.30 19:36:54 a19d55d3c
<6>[    0.000000] OF: fdt: Machine model: Tina
<6>[    0.000000] Memory policy: Data cache writealloc
# 
# 
# tail  -n 5 dmesg-pstore_blk-0
<4>[  122.396718] [<c0101a0c>] (__irq_svc) from [<c0107dfc>] (arch_cpu_idle+0x1c/0x38)
<4>[  122.404135] [<c0107dfc>] (arch_cpu_idle) from [<c013be70>] (do_idle+0xdc/0x100)
<4>[  122.411462] [<c013be70>] (do_idle) from [<c013bfe0>] (cpu_startup_entry+0x18/0x1c)
<4>[  122.419047] [<c013bfe0>] (cpu_startup_entry) from [<101023ac>] (0x101023ac)
<4>[  123.434873] SMP: failed to stop secondary CPUs
# 
# 
# tail  -n 5 dmesg-pstore_blk-1
<4>[  122.206383] 5fc0: 00000000 00000001 000cb778 00000004 000c7d7c 00000020 000c82b8 000c8008
<4>[  122.214571] 5fe0: 00000000 beee988c 0001bb20 b6ebc056
<0>[  122.219640] Code: e59f2010 e5823000 f57ff04e e3a02000 (e5c23000) 
<4>[  122.225745] Disabling lock debugging due to kernel taint
<4>[  122.233525] ---[ end trace 03f2787ef5d29e4a ]---
# 

............
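
The dmesg records shown above have a regular shape: a first line such as "Panic#2 Part1" followed by kernel log lines with a `<N>` prefix and a timestamp. A small parser for that shape could look like the sketch below. The regular expressions are derived only from the sample output in this article, the function name `parse_pstore_dmesg` is made up for this illustration, and lines that do not match the pattern are simply skipped.

```python
import re

# Patterns derived from the sample records in this article.
HEADER_RE = re.compile(r"^(?P<reason>[A-Za-z ]+)#(?P<count>\d+) Part(?P<part>\d+)$")
LINE_RE = re.compile(r"^<(?P<level>\d+)>\[\s*(?P<time>\d+\.\d+)\] (?P<msg>.*)$")


def parse_pstore_dmesg(text):
    """Parse one dmesg-* record into its header fields and log entries."""
    lines = text.splitlines()
    header = HEADER_RE.match(lines[0]) if lines else None
    if not header:
        raise ValueError("missing 'Reason#N PartM' header line")

    entries = []
    for line in lines[1:]:
        m = LINE_RE.match(line)
        if m:
            # (prefix value, seconds since boot, message text)
            entries.append((int(m["level"]), float(m["time"]), m["msg"]))

    return {
        "reason": header["reason"],
        "count": int(header["count"]),
        "part": int(header["part"]),
        "entries": entries,
    }
```

For example, feeding it the contents of dmesg-pstore_blk-0 gives reason "Panic", count 2 and part 1, which makes it easy to tell the panic record from the oops record in dmesg-pstore_blk-1 without opening the files by hand.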

Copyright notice: this is an original article by the blogger 楓潇潇; copyright belongs to the original author.

Original link: https://blog.csdn.net/u013836909/article/details/129894795