Kernel Crash - kdump

Chetan_Tiwary_ · ‎11-17-2023

Kdump is a crucial tool for Linux system administrators and developers, enabling them to diagnose and address kernel crashes. To identify the cause of a kernel panic, you can use the kdump service to collect crash dumps, perform a root cause analysis, and troubleshoot the system.

By capturing the kernel's memory state at the time of a crash, kdump facilitates the identification of the root cause and the collection of detailed crash information. This information is essential for bug reporting, improving system uptime, and simplifying crash analysis.

kdump uses the kexec system call to boot into a secondary kernel, the capture kernel, without going through a full firmware reboot, so the memory of the crashed kernel is preserved. The capture kernel resides in a reserved area of system memory. Once booted, it extracts the contents of the crashed kernel's memory, forming a crash dump (vmcore), and saves it to a file.

Core Dump --> memory image of a crashed application (a user-space process).

Crash Dump --> memory image of the kernel at the time of an OS crash.

The kexec-tools package provides the kdump service.

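The amount of memory reserved for the capture kernel can be calculated automatically from the system's total memory. To enable this, add the crashkernel=auto setting to the GRUB_CMDLINE_LINUX parameter in the /etc/default/grub configuration file. (This applies to RHEL 8 and earlier; crashkernel=auto was removed in RHEL 9, which needs an explicit crashkernel value.)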

GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto no_timer_check net.ifnames=0 console=ttyS0"

Then regenerate the GRUB2 configuration with grub2-mkconfig -o /boot/grub2/grub.cfg and reboot so that the memory reservation takes effect.
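After the reboot it can be useful to confirm that the reservation really took effect. The following is a minimal read-only sketch, not an official procedure: crashkernel_arg is a hypothetical helper name, while /proc/cmdline and /sys/kernel/kexec_crash_size are standard kernel interfaces.

```shell
# crashkernel_arg is a hypothetical helper: it prints the value of the
# crashkernel= parameter found in a kernel command line string, if any.
crashkernel_arg() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Inspect the command line of the running kernel (read-only).
crashkernel_arg "$(cat /proc/cmdline 2>/dev/null)"

# Bytes reserved for the capture kernel; 0 means nothing is reserved.
if [ -r /sys/kernel/kexec_crash_size ]; then
    cat /sys/kernel/kexec_crash_size
fi
```

If the first command prints nothing, the parameter did not reach the kernel command line and the GRUB configuration should be rechecked.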

A crash dump can be stored in various ways: as a file on the local filesystem, written directly to a device, or sent over the network (e.g. NFS). By default, the vmcore file is saved in the /var/crash directory on the local filesystem. The location is set in /etc/kdump.conf with the path directive ( path /var/crash ).
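To see where dumps will land on a given machine, the path directive can be read back from the configuration file. This is a small sketch under assumptions: kdump_dump_path is a hypothetical helper, and it only looks at the path directive, not at network or raw-device targets.

```shell
# kdump_dump_path is a hypothetical helper: it prints the value of the
# "path" directive in a kdump.conf-style file, falling back to /var/crash.
kdump_dump_path() {
    local conf="${1:-/etc/kdump.conf}"
    local value=""
    if [ -r "$conf" ]; then
        # Use the last uncommented "path <dir>" line in the file.
        value=$(awk '$1 == "path" { p = $2 } END { print p }' "$conf")
    fi
    printf '%s\n' "${value:-/var/crash}"
}

kdump_dump_path
```

Commented-out lines such as "#path /var/crash" are ignored because their first field is not exactly "path".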

Kernel crashes are typically caused by unrecoverable errors such as OOM events, hung processes, or critical hardware failures. A crash can also be triggered deliberately with Magic SysRq, which is useful for testing kdump.
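A deliberate SysRq crash is how the sample dump analysed later in this post was produced (its PANIC line reads "sysrq: SysRq : Trigger a crash"). The sketch below wraps the two writes in a hypothetical helper, trigger_test_crash, that only prints what it would do unless it is given an explicit argument. The --crash-now flag is an invention of this example. Run the real thing only on a test machine, as root, with kdump already running.

```shell
# trigger_test_crash is a hypothetical helper. Without the explicit
# "--crash-now" argument it only prints the commands it would run.
trigger_test_crash() {
    if [ "${1:-}" != "--crash-now" ]; then
        echo "would run: echo 1 > /proc/sys/kernel/sysrq"
        echo "would run: echo c > /proc/sysrq-trigger"
        return 0
    fi
    echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
    echo c > /proc/sysrq-trigger      # panic the kernel immediately
}

trigger_test_crash   # dry run: prints the two commands, changes nothing
```

With kdump working, the machine boots the capture kernel after the panic, writes the vmcore, and then reboots.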

**************************Analyzing Crash Dumps **********************************************

# yum install crash
# yum install kernel-debuginfo

Open the dump with the crash utility, passing it the uncompressed kernel image with debug symbols (vmlinux, installed by kernel-debuginfo) followed by the vmcore file:

# crash /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux /var/crash/127.0.0.1-2023-11-18-14:05:33/vmcore

...
WARNING: kernel relocated [202MB]: patching 90160 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2023-11-18-11:51:55/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Sat Nov  18 11:51:55 2023
      UPTIME: 01:03:57
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 586
    NODENAME: localhost.localdomain
     RELEASE: 4.18.0-5.el8.x86_64
     VERSION: #1 SMP Sat Nov 18 11:51:55 UTC 2023
     MACHINE: x86_64  (2904 Mhz)
      MEMORY: 2.9 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 10635
     COMMAND: "bash"
        TASK: ffff8d6c84271800  [THREAD_INFO: ffff8d6c84271800]
         CPU: 1
       STATE: TASK_RUNNING (SYSRQ)

crash>

The PANIC line shows the cause of the kernel panic; here the crash was triggered manually through SysRq ( "sysrq: SysRq : Trigger a crash" ).
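Since each dump is written to its own timestamped directory, typing the full path gets tedious. The sketch below finds the newest vmcore and prints a matching crash command line. latest_vmcore is a hypothetical helper, and the vmlinux path assumes the dump came from the currently running kernel version, which will not always be true.

```shell
# latest_vmcore is a hypothetical helper: it prints the most recently
# modified vmcore found one directory below the given dump directory.
latest_vmcore() {
    local dir="${1:-/var/crash}"
    local f newest=""
    for f in "$dir"/*/vmcore; do
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    fi
}

core=$(latest_vmcore)
if [ -n "$core" ]; then
    # Assumption: the dump was taken from the running kernel version.
    echo "crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux $core"
else
    echo "no vmcore found under /var/crash"
fi
```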

At the same crash prompt, you can use commands such as log (kernel message buffer), ps (processes), bt (kernel stack traces), vm (virtual memory), and files (open files) to analyse the dump and carry out the RCA.

crash> log
... several lines omitted ...
EIP: 0060:[<c068124f>] EFLAGS: 00010096 CPU: 2
EIP is at sysrq_handle_crash+0xf/0x20
EAX: 00000063 EBX: 00000063 ECX: c09e1c8c EDX: 00000000
ESI: c0a09ca0 EDI: 00000286 EBP: 00000000 ESP: ef4dbf24

crash> files
PID: 5591   TASK: f196d560  CPU: 2   COMMAND: "bash"
ROOT: /    CWD: /root
 FD    FILE     DENTRY    INODE    TYPE  PATH
  0  f734f640  eedc2c6c  eecd6048  CHR   /pts/0
  1  efade5c0  eee14090  f00431d4  REG   /proc/sysrq-trigger
  2  f734f640  eedc2c6c  eecd6048  CHR   /pts/0

Refer : 1. Ch10s02 in RH342 for practice

2. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/system_design_g...

Trevor · ‎11-17-2023

Chetan -

You are the gift that just keeps on giving!!!

Very, very nice writeup on kdump. As usual, a very
succinct delivery that packs the punch of an entire
chapter of content!

Many thanks for another contribution. Also, thanks
for the references.

Trevor "Red Hat Evangelist" Chandler

Trevor · ‎11-18-2023

kdump is automatically installed on RHEL 9.

The command systemctl status kdump can be used to check the status of kdump.
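Besides the service status, the kernel itself reports whether a capture kernel is loaded. A minimal sketch, assuming a systemd-based system: crash_kernel_loaded is a hypothetical helper, and /sys/kernel/kexec_crash_loaded is the standard sysfs file it reads.

```shell
# crash_kernel_loaded is a hypothetical helper: it succeeds when the given
# sysfs-style file contains 1, meaning a capture kernel is loaded.
crash_kernel_loaded() {
    local f="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if crash_kernel_loaded; then
    echo "capture kernel loaded: kdump is armed"
else
    echo "capture kernel not loaded"
fi

# systemctl is-active prints the unit state (active, inactive, failed, ...).
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active kdump || true
fi
```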

In RHEL 9, the "crash" tool that is used to inspect crash dumps, is in the package
named "crash".

Once the crash tool is launched, running the ps, vm, log, or files commands (at the
crash tool prompt) should yield enough information for a methodical assessment of
the cause of the kernel panic.

The information provided by the crash tool subcommands:

ps -> display the status of the processes that existed at the time of the kernel panic

vm -> display the virtual memory layout of the current context (by default, the
task that was running when the kernel panicked)

files -> display the files that the current context had open at the time of the
kernel panic

log -> display the kernel message buffer (the dmesg output) saved in the dump

Trevor "Red Hat Evangelist" Chandler

Chetan_Tiwary_ · ‎11-19-2023

The kdump service starts late in the boot sequence, so before RHEL 8 a crash during the early boot stages left no dump and the crash information was lost. To address this limitation, RHEL 8 introduced "early kdump support", which loads the crash kernel and its initramfs early enough to capture such crashes.

Early kdump is compatible with the same dump targets and configuration parameters as standard kdump. The early kdump functionality is disabled by default.

To set up early kdump support :

1. Ensure that a kdump initramfs exists for the current kernel. If it does not, start the kdump service to create it.

2. Rebuild the initramfs of the booting kernel with early kdump support :

dracut -f --add earlykdump

3. Append the rd.earlykdump kernel boot parameter to the kernelopts line in the GRUB configuration.

4. Reboot, then verify in the logs that early kdump is enabled :

journalctl -x | grep early-kdump
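The prerequisites in steps 1 and 3 can be checked with a short read-only script. This is a sketch under assumptions: kdump_initramfs and has_earlykdump are hypothetical helpers, and the initramfs file name follows the usual RHEL naming (initramfs-<kernel version>kdump.img under /boot), which may differ on other distributions.

```shell
# kdump_initramfs is a hypothetical helper: it prints the conventional path
# of the kdump initramfs for a kernel version (default: the running kernel).
kdump_initramfs() {
    printf '/boot/initramfs-%skdump.img\n' "${1:-$(uname -r)}"
}

# has_earlykdump is a hypothetical helper: it succeeds when a kernel command
# line string contains the rd.earlykdump parameter as a separate word.
has_earlykdump() {
    case " $1 " in
        *" rd.earlykdump "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ ! -f "$(kdump_initramfs)" ]; then
    echo "step 1: no kdump initramfs yet; start the kdump service"
fi
if has_earlykdump "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "rd.earlykdump is on the running kernel command line"
else
    echo "step 3 pending: rd.earlykdump is not on the command line"
fi
```

On RHEL 8, one way to add the parameter in step 3 is grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="rd.earlykdump", followed by the reboot in step 4.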