Red Hat Linux Interview Series 11

Chetan_Tiwary_ · ‎09-28-2024

Q.) You have received a high-priority incident in your support queue about a particular Linux server being too slow to respond. As an admin, how will you handle, troubleshoot, and gather information about the issue?

Q.) Explain what is happening here ?

sysctl -a | grep -i Commit && grep -i Commit /proc/meminfo
vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0 
vm.overcommit_memory = 1 
vm.overcommit_ratio = 500 
CommitLimit:    23449596 kB
Committed_AS:   23578974 kB

Q.) How would you approach troubleshooting this issue:

fsck.ext4: Bad magic number in super-block while trying to open /dev/vdb 

/dev/vdb: The superblock could not be read or does not describe a correct ext4 filesystem

I'll be posting a series of Linux-related questions covering various skill levels. Feel free to share your insights and expertise. Your contributions will benefit learners at all stages, from those in current roles to those preparing for Linux interviews.

Trevor · ‎09-29-2024

I'd like to respond to the 2nd question, which shows some output based on the /proc/meminfo file.

Preliminary Information

/proc/meminfo is comprised of many data fields.

/proc/meminfo is a virtual file inside the /proc pseudo-filesystem that reports memory usage on the system. It contains real-time information about available and used memory, as well as buffers and shared memory used by the kernel.

The file contents of /proc/meminfo can provide statistics like:

- used and available memory

- swap space

- cache and buffers

- etc.

Memory overcommit is a feature of the Linux kernel that lets processes allocate more virtual memory than the system's physical RAM and swap can back, on the assumption that most processes never touch everything they request.

CommitLimit

- a memory statistic from the /proc/meminfo file

- reports the total amount of memory that can be allocated on the system (a ceiling, not the amount currently free)

- is based on the overcommit ratio (vm.overcommit_ratio)

- is only enforced when strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory)

The following represent kernel parameters:

- vm.overcommit_kbytes = 0

- vm.overcommit_memory = 1

- vm.overcommit_ratio = 500

vm.overcommit_kbytes

- is the counterpart of overcommit_ratio, expressed as an absolute size in kB rather than a percentage

- is reset to 0 when the overcommit_ratio kernel parameter is set, and vice versa

vm.overcommit_memory

- a kernel parameter that controls the memory overcommit behavior of the system's virtual memory manager

- It determines how the kernel answers memory allocation requests: whether it grants them freely, applies a heuristic, or holds them to a strict limit.

- With the help of this parameter, sysadmins can strike a balance between memory utilization, system stability, and the risk of out-of-memory situations.

The vm.overcommit_memory parameter defines 3 different modes of memory overcommit behavior:

1. Mode 0 (Heuristic, the default):

The kernel allows processes to allocate more memory than is physically available, assuming that most processes won't use all the memory they request, but it refuses obvious overcommits of address space. Actual memory allocation occurs on demand, and if the system runs out of memory, the OOM killer starts to kill processes to free up space.

2. Mode 1 (Always overcommit):

In this mode, the kernel performs no overcommit checks at all: allocation requests are granted whether or not memory exists to back them. This suits some scientific applications that allocate large sparse arrays, but it raises the risk of out-of-memory kills later, when the pages are actually touched.

3. Mode 2 (Strict):

This mode prevents overcommitment beyond a fixed limit. The kernel ensures that the total committed address space does not exceed CommitLimit, that is, swap plus a configurable share of physical RAM (set by vm.overcommit_ratio or vm.overcommit_kbytes). When a process requests memory in excess of that limit, the allocation request is denied, and the process is usually notified with an error.

The vm.overcommit_memory parameter is located in the /proc/sys/vm/ directory and can be configured using the sysctl command or by directly modifying the value in the /proc/sys/vm/overcommit_memory file.
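As a minimal sketch of the above, the following reads the current mode and prints a label for it. The helper name overcommit_mode_name and its labels are made up for illustration; the labels paraphrase the kernel documentation and are not something the kernel itself prints.

```shell
# Hypothetical helper: map a vm.overcommit_memory value to a short label.
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic overcommit" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting" ;;
        *) echo "unknown" ;;
    esac
}

# Read the live value straight from /proc (same value "sysctl vm.overcommit_memory" shows).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "vm.overcommit_memory = $mode ($(overcommit_mode_name "$mode"))"
fi

# To change the mode (root required, lost at reboot unless added to /etc/sysctl.conf):
#   sysctl -w vm.overcommit_memory=2
```

For the value in the question, overcommit_mode_name 1 prints "always overcommit".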

The overcommit policy is selected through the vm.overcommit_memory sysctl parameter.

The current overcommit limit and amount committed are viewable in /proc/meminfo as CommitLimit and Committed_AS respectively.

The overcommit amount can be set using the following parameters:

- vm.overcommit_ratio: Percentage

- vm.overcommit_kbytes: Absolute value

Note: Only one of these parameters can be specified at a time, and setting one disables the other.


Now that I've gotten some preliminaries out of the way, let me move on to responding to the question: What is happening here?

grep -i commitlimit /proc/meminfo

vm.overcommit_kbytes = 0

vm.overcommit_memory = 1

vm.overcommit_ratio = 500

CommitLimit: 23449596 kB

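In the information provided above, vm.overcommit_memory = 1 selects the always-overcommit mode, not a conservative one. CommitLimit is still computed from vm.overcommit_ratio (500, since vm.overcommit_kbytes is 0), but in this mode the kernel does not enforce it. That is why Committed_AS in the question (23578974 kB) can sit above CommitLimit (23449596 kB) without any allocation being refused.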
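To make the comparison concrete, here is a small sketch that subtracts Committed_AS from CommitLimit. The function name commit_headroom_kb is hypothetical; it reads meminfo-style text on standard input, so it can be fed either /proc/meminfo or the two lines from the question.

```shell
# Hypothetical helper: print CommitLimit minus Committed_AS (in kB).
# A negative result means more address space is committed than the limit.
commit_headroom_kb() {
    awk -F'[: \t]+' '
        $1 == "CommitLimit"  { limit = $2 }
        $1 == "Committed_AS" { used  = $2 }
        END { print limit - used }
    '
}

# The two values from the question:
printf 'CommitLimit: 23449596 kB\nCommitted_AS: 23578974 kB\n' | commit_headroom_kb

# On a live system:
#   commit_headroom_kb < /proc/meminfo
```

With the question's numbers this prints -129378, meaning the system has committed about 126 MB more than CommitLimit, which only mode 2 would have prevented.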

CommitLimit

- a data statistic gleaned from the /proc/meminfo file

- the total amount of memory that can be allocated on the Linux system, reported in kB

- with the default overcommit_ratio of 50, it works out to half of total RAM (not free RAM, and less any huge pages) plus swap; with a ratio of 500, as here, it is five times that RAM figure plus swap

- refers to the current overcommit limit

- CommitLimit = ([total RAM pages] - [total huge TLB pages]) * overcommit_ratio / 100 + [total swap pages]
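The formula above can be sketched in shell arithmetic. This is only an approximation under stated assumptions: it works in kB rather than pages, the function name commit_limit_kb and the sample machine are invented for illustration, and it covers only the case where vm.overcommit_kbytes is 0 (when that parameter is non-zero, the kernel uses it in place of the ratio term).

```shell
# Hypothetical helper.
# Usage: commit_limit_kb MEM_TOTAL_KB HUGETLB_KB OVERCOMMIT_RATIO SWAP_TOTAL_KB
commit_limit_kb() {
    echo $(( ($1 - $2) * $3 / 100 + $4 ))
}

# Made-up machine: 4000000 kB RAM, no huge pages, ratio 500, 2000000 kB swap.
commit_limit_kb 4000000 0 500 2000000

# The same calculation from live values, for comparison with CommitLimit in /proc/meminfo.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    mem=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    huge=$(awk '/^HugePages_Total:/ {n = $2} /^Hugepagesize:/ {s = $2} END {print n * s}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "estimated CommitLimit: $(commit_limit_kb "${mem:-0}" "${huge:-0}" "${ratio:-0}" "${swap:-0}") kB"
fi
```

The sample call prints 22000000: five times the RAM figure plus swap, which shows how a ratio of 500 inflates the limit far beyond physical memory.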

vm.overcommit_kbytes

- a kernel parameter

- is the counterpart of overcommit_ratio, expressed as an absolute size in kB

- is reset to 0 when the overcommit_ratio kernel parameter is set, and vice versa

vm.overcommit_memory

- a kernel parameter

- selects the overcommit mode (0, 1, or 2) that the Linux kernel applies to memory allocation requests

vm.overcommit_ratio

- a kernel parameter

- used to set the overcommit amount

Bonus coverage:

1) The grep command shown in the question does not display the four lines of output.
Only one line of output is displayed, and that's for the CommitLimit data statistic.

2) To display the remaining three lines of output shown in the question, that refer to
kernel parameters, I used the following command: # sysctl -a

Trevor "Red Hat Evangelist" Chandler

Chetan_Tiwary_ · ‎09-30-2024

@Trevor Thanks for pointing that out! I have modified the question accordingly. Your explanation is spot on!

Trevor · ‎10-01-2024

Here's my take on that first question: the slow server

My research shows that the three primary causes of a sluggish Linux system are:
1) CPU
2) RAM
3) Disk I/O

There are several tools that can be used to investigate how these areas of a system are
performing/functioning. Two of the more popular and useful tools used are: 1) top and
2) sar

One of the beauties about the top utility is that it provides a real-time look at what's
happening on a Linux system.

When looking at the information provided by the top utility, one of the key pieces of information to view first is the "load average". You will see three values shown for the load average. Those three values refer to the past one, five, and fifteen minutes of system operation. To put those three values into perspective, an ideal load average is lower than the number of CPUs in the Linux server. For example, with only one CPU in the Linux server, it's best if the load average is below 1. Generally speaking, if the 1-minute average is above the number of CPUs on the system, then the system is most likely CPU bound, although Linux also counts tasks blocked in uninterruptible I/O wait toward the load average, so a high value can point to disk trouble as well.
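The rule of thumb above can be checked quickly from the shell. In this sketch the function name load_status and its "high"/"ok" labels are made up for illustration; note that nproc reports the processing units available to the process (logical CPUs), which may differ from the physical CPU count.

```shell
# Hypothetical helper: compare a load average against a CPU count.
# Usage: load_status LOAD NCPUS   -> prints "high" or "ok"
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load + 0 > cpus + 0) print "high"; else print "ok"
    }'
}

# Live check: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min _ < /proc/loadavg
    cpus=$(nproc)
    echo "1-minute load $one_min on $cpus CPUs: $(load_status "$one_min" "$cpus")"
fi
```

A "high" result is only a prompt to look closer with top or sar, not a diagnosis in itself.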

Some other worthy information to look at in the output of the top utility would be the following for the CPU (for each CPU if multiple CPUs exist on the system):
- us: This percentage represents the amount of CPU consumed by user processes.
- sy: This percentage represents the amount of CPU consumed by system processes.
- id: This percentage represents how idle each CPU is.

Again, this information represents real-time statistics. The values for us, sy, and id will identify if the CPU (or CPUs) is bound by user processes or system processes.

Moving on to the sar utility, it is a tool that collects system data every 10 minutes by default. However, this collection interval can be changed by editing the /etc/cron.d/sysstat file. On newer releases where sysstat is driven by systemd timers rather than cron (for example, RHEL 8 and later), the interval is set in the sysstat-collect.timer unit instead.

The command sar -u provides information about all CPUs on the system, starting at midnight.

As is the case with the top utility, the main things to view in the sar -u command output are %user, %system, %iowait, and %idle. This information can tell you how far back the server has been having issues.

To check RAM performance, the sar -r command will provide the current day's memory usage.

The main things to look for in RAM usage are %memused and %commit. An important note about the %commit field: this field can show above 100% since the Linux kernel routinely overcommits RAM. If %commit is consistently over 100%, this result could be an indicator that the system needs more RAM.

For disk I/O performance, use sar -d, which reports I/O activity per block device. For this output, looking at %util and await will give you a good overall picture of disk I/O on the system. The %util field refers to the utilization of that device. The await field contains the average time an I/O request takes to be served, including the time it spends waiting in the queue. Await is measured in milliseconds. There is no one-size-fits-all await value that signals a problem; the threshold that matters will depend on the specific environment.

As a side note, to show readable disk device names (such as sda) in the output, add the lowercase -p option: # sar -dp
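Pulling the sar commands from this post together, a first-pass check might look like the sketch below. The wrapper name sar_triage is made up, and it assumes the sysstat package is installed and has already collected data for today; the sar options themselves are the standard ones discussed above.

```shell
# Hypothetical wrapper: print today's CPU, memory, and disk reports in one go.
sar_triage() {
    if ! command -v sar >/dev/null 2>&1; then
        echo "sar not found (is the sysstat package installed?)" >&2
        return 127
    fi
    sar -u       # CPU: %user, %system, %iowait, %idle
    sar -r       # memory: %memused, %commit
    sar -d -p    # disk: await, %util, with readable device names
}

# Run it by hand when investigating:
#   sar_triage | less
```

Reading the three reports side by side makes it easier to see which resource started misbehaving first and at what time.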

Whatever your tool of choice to glean performance information, you'll want to use something that provides information on the all-important resources: CPU, RAM, and disk.

Trevor "Red Hat Evangelist" Chandler
