Hello Experts,
The server runs Red Hat Linux 5.4 (64 vCPUs and 128 GB RAM) with Oracle DB 10g. It occasionally hangs with a CPU spike (plateauing at 90-99 %sy), and I have to force a reboot of the VM guest.
After a hang, there is no detail in the messages file or in the Oracle DB alert log. Here is what I suspect: I recently changed the I/O scheduler from CFQ to Deadline on all disks. Seven hours later the system hung again and I rebooted the server. This time I checked the messages log and found EXT3 filesystem corruption on one disk. That disk holds Oracle DB data files with no activity (it only contains backup data, and it is not large).
I don't know why Oracle DB accesses the files on that disk, because no users are active on it. Anyway, after I removed the disk the system seems fine; it has been running normally for one week without issues.
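To confirm which scheduler each disk is actually using after a change like the one described above, the sysfs file `/sys/block/<disk>/queue/scheduler` lists the available schedulers with the active one in square brackets. Below is a minimal sketch that reads those files. It assumes a Python 3 interpreter is available (the stock Python on a release this old would need small syntax changes), and the function names are hypothetical helpers, not part of any tool.

```python
import glob
import re


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers_by_disk(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # Path looks like /sys/block/sda/queue/scheduler, so index 3 is the disk.
        disk = path.split("/")[3]
        try:
            with open(path) as handle:
                result[disk] = active_scheduler(handle.read())
        except OSError:
            # Device vanished or is unreadable; record it without a scheduler.
            result[disk] = None
    return result


if __name__ == "__main__":
    for disk, name in schedulers_by_disk().items():
        print(disk, name)
```

If a disk still reports `cfq` here, the change was not applied to it (for example, because it was set at runtime and lost on reboot).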
What I'd like to know:
- Can changing from the CFQ to the Deadline scheduler expose pre-existing disk issues that never showed up in the messages log?
- Can a corrupted disk like that cause a CPU spike?
Thank you
KL
@KL_ 1. Yes, I think so. For database systems, the Deadline I/O scheduler is a better choice than CFQ because Deadline puts a cap on how long any single request has to wait. This helps maintain good disk throughput, which is essential for database applications.
2. Yes, a corrupted disk can cause that: the kernel can get stuck in I/O, in journaling, or possibly in a spinlock (https://en.wikipedia.org/wiki/Spinlock). Did you see anything like this in dmesg?
When you ran fsck, did you get journal errors or any metadata errors? If so, that would confirm your findings.
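As a rough way to check the messages log or saved dmesg output for the kinds of problems mentioned above, here is a small sketch that filters for filesystem, journal, I/O, and lockup messages. The pattern list is an assumption for illustration only: exact kernel wording differs between versions, and the `journal` pattern is deliberately broad, so expect some false positives. The function names are hypothetical.

```python
import re
import sys

# Illustrative patterns only; adjust to the wording your kernel actually emits.
SUSPECT_PATTERNS = [
    re.compile(r"EXT3-fs error", re.IGNORECASE),
    re.compile(r"journal", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"soft lockup", re.IGNORECASE),
    re.compile(r"blocked for more than \d+ seconds", re.IGNORECASE),
]


def suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a log file, replacing undecodable bytes instead of failing."""
    with open(path, errors="replace") as handle:
        return suspect_lines(handle)


if __name__ == "__main__":
    # Default path is the usual syslog location; pass another file as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        for number, text in scan_file(target):
            print(number, text)
    except OSError as error:
        print("could not read", target, error)
```

Saving `dmesg` output to a file and passing that file as the argument works the same way. An empty result does not prove the disk is healthy, since a hard hang can prevent the kernel from writing anything before the reboot.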
Red Hat
Learning Community