Kdump is a crucial tool for Linux system administrators and developers, enabling them to diagnose and address kernel crashes. To identify the cause of a kernel panic, you can use the kdump service to collect crash dumps, perform a root cause analysis, and troubleshoot the system.
By capturing the kernel's memory state at the time of a crash, kdump facilitates the identification of the root cause and the collection of detailed crash information. This information is essential for bug reporting, improving system uptime, and simplifying crash analysis.
kdump uses the kexec system call to boot into a second kernel, the capture kernel, without going through a full system reset, so the contents of the crashed kernel's memory are preserved. The capture kernel, which resides in a reserved area of system memory, then extracts that memory as a crash dump (vmcore) and saves it to a file.
Core Dump --> memory image of a single application (process) at the time it crashed.
Crash Dump --> kernel memory image captured when the OS itself crashes.
The kexec-tools package provides the kdump service.
The memory reserved for the capture kernel can be sized automatically. To enable this, add the crashkernel=auto setting to the GRUB_CMDLINE_LINUX parameter in the /etc/default/grub configuration file (this applies to RHEL 8 and earlier; RHEL 9 no longer supports crashkernel=auto):
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto no_timer_check net.ifnames=0 console=ttyS0"
Then regenerate the GRUB2 configuration and reboot so that the memory reservation takes effect (on UEFI systems the grub.cfg path differs): grub2-mkconfig -o /boot/grub2/grub.cfg
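After the reboot, it can be worth confirming that the option actually reached the running kernel. The following is a minimal sketch, not part of kdump itself: crashkernel_value is a hypothetical helper name, and it only inspects a command-line string such as the contents of /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first crashkernel= option
# found in a kernel command line string. Returns non-zero when absent.
# Usage: crashkernel_value "<cmdline string>"
crashkernel_value() {
    for opt in $1; do
        case "$opt" in
            crashkernel=*)
                printf '%s\n' "${opt#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Inspect the command line of the currently running kernel.
if [ -r /proc/cmdline ]; then
    crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel is not set"
fi
```

If this prints "crashkernel is not set" after a reboot, the GRUB configuration was probably not regenerated at the path the system actually boots from.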
A crash dump can be stored in several ways: as a file on the local filesystem, written directly to a device, or sent over the network (e.g., via NFS). By default, the vmcore file is saved under the /var/crash directory on the local filesystem. This location is set in /etc/kdump.conf ( path /var/crash ).
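To see which directory a given configuration points at, something like the sketch below may help. kdump_path is a hypothetical helper name; it assumes the path directive is written as "path <directory>" on its own line and that commented-out lines start with "#".

```shell
#!/bin/sh
# Hypothetical helper: print the dump directory set by the "path" directive.
# Falls back to /var/crash when the file is missing or has no path line.
# Usage: kdump_path [config-file]
kdump_path() {
    conf=${1:-/etc/kdump.conf}
    # Commented lines have "#path" as the first field, so they do not match.
    dir=$(awk '$1 == "path" { p = $2 } END { if (p != "") print p }' "$conf" 2>/dev/null)
    printf '%s\n' "${dir:-/var/crash}"
}

kdump_path
```

For example, ls -l "$(kdump_path)" would list the saved dump directories, if any exist.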
Kernel crashes are typically caused by unrecoverable errors such as out-of-memory (OOM) events, hung processes, or critical hardware failures; they can also be triggered deliberately through Magic SysRq.
**************************Analyzing Crash Dumps **********************************************
# yum install crash
# yum install kernel-debuginfo
Next, open the vmcore with the crash utility:
# crash /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux /var/crash/127.0.0.1-2023-11-18-14:05:33/vmcore ( the uncompressed vmlinux image with debug symbols, followed by the actual vmcore )
...
WARNING: kernel relocated [202MB]: patching 90160 gdb minimal_symbol values
KERNEL: /usr/lib/debug/lib/modules/4.18.0-5.el8.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2023-11-18-11:51:55/vmcore [PARTIAL DUMP]
CPUS: 2
DATE: Sat Nov 18 11:51:55 2023
UPTIME: 01:03:57
LOAD AVERAGE: 0.00, 0.00, 0.00
TASKS: 586
NODENAME: localhost.localdomain
RELEASE: 4.18.0-5.el8.x86_64
VERSION: #1 SMP Sat Nov 18 11:51:55 UTC 2023
MACHINE: x86_64 (2904 Mhz)
MEMORY: 2.9 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 10635
COMMAND: "bash"
TASK: ffff8d6c84271800 [THREAD_INFO: ffff8d6c84271800]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
crash>
The PANIC line shows the cause of the kernel panic ( sysrq: SysRq : Trigger a crash ).
At the same crash prompt, you can use commands such as log ( kernel messages ), ps ( processes ), bt ( kernel stack traces ), vm ( virtual memory ), and files ( open files ) to analyse the crash and do the RCA.
crash> log
... several lines omitted ...
EIP: 0060:[<c068124f>] EFLAGS: 00010096 CPU: 2
EIP is at sysrq_handle_crash+0xf/0x20
EAX: 00000063 EBX: 00000063 ECX: c09e1c8c EDX: 00000000
ESI: c0a09ca0 EDI: 00000286 EBP: 00000000 ESP: ef4dbf24
crash> files
PID: 5591 TASK: f196d560 CPU: 2 COMMAND: "bash"
ROOT: / CWD: /root
FD  FILE      DENTRY    INODE     TYPE  PATH
0   f734f640  eedc2c6c  eecd6048  CHR   /pts/0
1   efade5c0  eee14090  f00431d4  REG   /proc/sysrq-trigger
2   f734f640  eedc2c6c  eecd6048  CHR   /pts/0
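The same commands can also be collected in one pass rather than typed at the prompt. The sketch below relies on the crash utility's -i option, which runs the commands in a file before accepting interactive input. write_crash_cmds, the VMLINUX and VMCORE variables, and the crash-report.txt output name are all assumptions made for illustration; substitute your own paths.

```shell
#!/bin/sh
# Hypothetical helper: write a crash command file that runs the commands
# discussed above and then leaves the crash session.
# Usage: write_crash_cmds <file>
write_crash_cmds() {
    cat > "$1" <<'EOF'
log
bt
ps
vm
files
exit
EOF
}

# Assumed locations; VMCORE must be set by the caller to a real dump file.
VMLINUX=${VMLINUX:-/usr/lib/debug/lib/modules/$(uname -r)/vmlinux}
VMCORE=${VMCORE:-}

cmds=$(mktemp)
write_crash_cmds "$cmds"

# Only run crash when a dump was named and the tool is installed.
if [ -n "$VMCORE" ] && command -v crash >/dev/null 2>&1; then
    crash -i "$cmds" "$VMLINUX" "$VMCORE" > crash-report.txt
fi
rm -f "$cmds"
```

Run it as, for example, VMCORE=/var/crash/<dump directory>/vmcore sh report.sh, where report.sh is whatever name you saved the script under. The resulting text file is handy to attach to a bug report.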
Refer : 1. Ch10s02 in RH342 for practice
You are the gift that just keeps on giving!!!
Very, very nice writeup on kdump. As usual, a very
succinct delivery that packs the punch of an entire
chapter of content!
Many thanks for another contribution. Also, thanks
for the references.
kdump is automatically installed on RHEL 9.
The command systemctl status kdump can be used to check the status of kdump.
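Besides the service status, the kernel itself reports whether a capture kernel is currently loaded, through /sys/kernel/kexec_crash_loaded (1 means loaded). A small sketch of that check follows; capture_kernel_loaded is a hypothetical helper name, and the optional file argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given file (default:
# /sys/kernel/kexec_crash_loaded) is readable and contains 1.
# Usage: capture_kernel_loaded [file]
capture_kernel_loaded() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if capture_kernel_loaded; then
    echo "capture kernel loaded"
else
    echo "capture kernel not loaded"
fi
```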
In RHEL 9, the "crash" tool that is used to inspect crash dumps, is in the package
Once the crash tool is launched, the ps, vm, log, and files commands (at the
crash tool prompt) should yield enough information to allow a methodical
assessment of the cause of the kernel panic.
The information provided by the crash tool subcommands:
ps -> display the processes that existed at the time of the kernel panic
vm -> display the virtual memory information of a task at the time of the kernel panic
files -> display the files that were open at the time of the kernel panic
log -> display the kernel message buffer of the crashed kernel
Prior to RHEL 8, the kdump service started only late in the boot sequence, so crash information from the early boot stages was lost. To address this limitation, RHEL 8 introduced early kdump support, which loads the capture kernel early enough to record such crashes.
Early kdump supports the same dump targets and configuration parameters as standard kdump. It is disabled by default.
To set up early kdump support :
1. Ensure that a kdump initramfs exists for the current kernel; if it does not, start the kdump service so that one is built.
2. Rebuild the initramfs of the booting kernel with early kdump support :
dracut -f --add earlykdump
3. Append the rd.earlykdump kernel boot parameter to the kernelopts line in the GRUB configuration.
4. Reboot, then verify in the logs that early kdump was enabled:
journalctl -x | grep early-kdump
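As an extra check alongside step 4, it can help to confirm that the parameter from step 3 is really on the running kernel's command line. A minimal sketch, assuming a hypothetical helper named has_boot_param that matches whole parameters only:

```shell
#!/bin/sh
# Hypothetical helper: succeed when the kernel command line string
# contains the exact parameter given as the second argument.
# Usage: has_boot_param "<cmdline string>" <param>
has_boot_param() {
    for opt in $1; do
        [ "$opt" = "$2" ] && return 0
    done
    return 1
}

if has_boot_param "$(cat /proc/cmdline 2>/dev/null)" rd.earlykdump; then
    echo "rd.earlykdump is present on the running kernel's command line"
else
    echo "rd.earlykdump is missing; early kdump will not run on this boot"
fi
```

If the parameter is missing, one way to add it on RHEL 8 is grubby with its --update-kernel and --args options, followed by another reboot.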