Happy Wednesday, everyone.
Whether you are just starting your Linux journey or already working in production, this one is for you. Today we look at a classic situation that appears in interviews and on real servers.
Most of us learn that kill -9 is the "ultimate" way to stop a process. But what if you use it correctly and the process still refuses to disappear?
This challenge helps you understand how Linux processes really work, which is a key part of the "Operate running systems" objective.
You are troubleshooting a legacy application. You run top and notice a process named legacy_app_worker.
You decide to stop it:
[root@server ~]# kill -9 4055
You check again with ps aux | grep 4055, but the process is still there:
root 4055 0.0 0.0 0 0 ? Z 10:00 0:00 [legacy_app_worker] <defunct>
You try kill -9 again. No error, no change. The entry refuses to go away.
You are looking at a zombie process. Your task is to explain it and clean it up.
Why did kill -9 not work here? (Hint: can you kill something that is already dead?)
What ps command would you run to show the PID, PPID, state, and command for process 4055 so you can see its parent process ID clearly?
If you are preparing for an exam, this is a great small challenge to understand processes beyond the basic "kill the PID" approach.
Let us see how you would explain and fix this. Post your answers below.
1) The kill -9 command did not work because the process is already 'dead'. The -9 or SIGKILL signal has no effect here. The process has already been removed from memory and the signal does not get processed. This is often a sign that the parent application has not handled the child process correctly.
2) The command ps -O ppid= 4055 will show the process id, the parent process id, the process state and the command that launched the process.
3) To remove the zombie process (without a reboot) we have two options. We can try sending the SIGCHLD signal to the parent process with kill -s SIGCHLD <parent process id>, which only helps if the parent actually handles that signal by calling wait(). If that fails, we can kill the parent process.
Bonus:
Whilst zombie processes are shown as not using any memory, they still use a small amount of memory. They also have a process ID (PID), which counts against the number of PIDs the OS can use. As the number of zombie processes increases, this can affect the operating system's ability to create new processes.
Just a typo in the answer: "amount" is spelled with a single m.
PS: A long, long time ago, Midnight Commander caused thousands of zombie processes on one of our community (student) servers.
Why kill -9 didn’t work:
A zombie process is already dead. Only its entry is left in the process table, so signals like kill -9 cannot stop it.
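For anyone who wants to see this on a test machine, here is a minimal sketch that creates a throwaway zombie and sends it kill -9. The helper names proc_state and spawn_zombie are made up for this example, and it assumes Linux with /proc mounted, bash, and a sleep that accepts fractional seconds. If it behaves as intended, the state should be reported as Z both before and after the signal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the one-letter state (R, S, D, T, Z, ...) that the
# kernel reports for a PID in /proc/<pid>/status.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: start a parent that never reaps its child and print
# "<parent_pid> <child_pid>". The inner shell forks a short sleep, prints both
# PIDs, then exec replaces the shell with a long sleep, which never calls
# wait() for the child it inherited.
spawn_zombie() {
    bash -c 'sleep 0.2 & echo "$$ $!"; exec sleep "$1"' _ "${1:-10}" 2>/dev/null
}

read -r parent zombie < <(spawn_zombie 10)
sleep 1                                    # give the child time to exit
echo "before kill -9: $(proc_state "$zombie")"
kill -9 "$zombie"                          # no error, but nothing left to kill
sleep 0.2
echo "after kill -9:  $(proc_state "$zombie")"
kill "$parent"                             # clean up the demo parent
```

The signal is accepted without complaint because the PID still exists, but there is no running code left to terminate, so only the parent (or whoever adopts the zombie) can remove the entry.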
Find the parent process:
ps -o pid,ppid,state,cmd -p 4055
Fix:
Find the parent PID (for example 4001) and restart or kill the parent process. Once the parent exits, the zombie is adopted by init (PID 1), which reaps it:
kill 4001
(or kill -SIGCHLD 4001)
Why clean zombies:
Too many zombie processes fill the process table and can stop new programs from starting, causing system problems even though they use no CPU and almost no memory.
Why doesn’t kill -9 work on a zombie
Because the zombie process is already dead. It has finished running and only its entry in the process table remains. kill -9 works on running processes, not on zombies.
Command to find the parent process of the zombie
Use: ps -eo pid,ppid,state,cmd | grep ' Z'
This shows the zombie’s PID and its parent PID (PPID)
Or use: pstree -p to see the parent in a tree view
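One caveat with grepping for ' Z' is that it can also match a line where " Z" happens to appear in the command text. A minimal sketch that matches on the state column only is below. The function name list_zombies is made up for this example, and it assumes a procps-style ps where a trailing "=" suppresses the column header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "pid ppid state command" lines on stdin and print
# only the zombies, matching on the third (state) column rather than the
# whole line.
list_zombies() {
    awk '$3 ~ /^Z/ { printf "zombie %s  parent %s  (%s)\n", $1, $2, $4 }'
}

# Feed it header-less ps output for every process on the system.
ps -eo pid=,ppid=,state=,comm= | list_zombies
```

On a system with no zombies this prints nothing, which is the healthy case.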
How to clean it up after finding the parent
Send SIGCHLD to the parent: kill -s SIGCHLD <parent_pid>
If that doesn’t work, restart or kill the parent:
kill -TERM <parent_pid> (Graceful stop)
kill -KILL <parent_pid> (Force stop)
If the parent dies, init (PID 1) adopts the zombie and cleans it up
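The escalation above could be sketched as one function. This is only an illustration under assumptions: reap_via_parent is a made-up name, it reads the parent from /proc (so Linux only), it waits one second between steps, and it will really terminate the parent, so it is something to try on a lab machine rather than blindly in production.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reap_via_parent ZOMBIE_PID
# Nudge the zombie's parent with SIGCHLD first, then escalate to SIGTERM and
# SIGKILL. Returns 0 once the zombie's /proc entry has disappeared.
reap_via_parent() {
    local zombie=$1 parent sig
    parent=$(awk '/^PPid:/ {print $2}' "/proc/$zombie/status" 2>/dev/null)
    # Refuse to signal PID 1 or an unknown parent.
    if [ -z "$parent" ] || [ "$parent" -le 1 ]; then
        echo "no signalable parent for $zombie" >&2
        return 1
    fi
    for sig in CHLD TERM KILL; do
        kill -0 "$parent" 2>/dev/null || break    # parent already gone
        echo "sending SIG$sig to parent $parent"
        kill -s "$sig" "$parent"
        sleep 1
        [ -d "/proc/$zombie" ] || return 0        # entry gone: zombie reaped
    done
    [ ! -d "/proc/$zombie" ]
}

# Usage, with the PID from the challenge:
#   reap_via_parent 4055
```

Whether the zombie vanishes after the parent dies depends on the adopting process calling wait(). A normal init or systemd does this, while a minimal container whose PID 1 is an ordinary program may not.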
Why remove zombies even if they use 0 CPU and memory
They occupy a process ID entry
Too many zombies can block new processes
They indicate a bug in the parent process
They clutter monitoring and can cause confusion
Large numbers can affect system stability
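To put a number on this, here is a small sketch that counts the current zombies and prints the kernel's PID ceiling next to it. The function name count_zombies is made up for this example, and it assumes Linux, where /proc/sys/kernel/pid_max holds the upper bound for PIDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count processes whose /proc/<pid>/status reports state Z.
count_zombies() {
    local n
    n=$(grep -l '^State:[[:space:]]*Z' /proc/[0-9]*/status 2>/dev/null | wc -l)
    echo $(( n ))        # normalise any padding from wc
}

pid_max=$(cat /proc/sys/kernel/pid_max 2>/dev/null || echo unknown)
echo "zombies: $(count_zombies), kernel.pid_max: $pid_max"
```

A handful of zombies is harmless. A count that keeps climbing is the signal to look at the parent that is not reaping its children.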
Red Hat
Learning Community
A collaborative learning environment, enabling open source skill development.