Chetan_Tiwary_
Community Manager

Red Hat Linux Interview Series 11

Q.) You have received a high-priority incident in your support queue about a particular Linux server being too slow to respond. As an admin, how will you handle and troubleshoot the issue, and what information will you gather?

 

Q.) Explain what is happening here:

sysctl -a | grep -i Commit && grep -i Commit /proc/meminfo
vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0 
vm.overcommit_memory = 1 
vm.overcommit_ratio = 500 
CommitLimit:  23449596 kB
Committed_AS: 23578974 kB

Q.) How will you approach troubleshooting this issue?

fsck.ext4: Bad magic number in super-block while trying to open /dev/vdb 

/dev/vdb: The superblock could not be read or does not describe a correct ext4 filesystem

I'll be posting a series of Linux-related questions covering various skill levels. Feel free to share your insights and expertise. Your contributions will benefit learners at all stages, from those in current roles to those preparing for Linux interviews.

 

3 Replies
Trevor
Starfighter

I'd like to respond to that 2nd question, which shows some output based on the
/proc/meminfo file.

 

Preliminary Information
 
/proc/meminfo is a virtual file inside the /proc pseudo-filesystem that reports the amount of available and used memory. It contains real-time information about the system's memory usage, as well as the buffers and shared memory used by the kernel.
 
The contents of /proc/meminfo provide statistics such as:
- used and available memory
- swap space
- cache and buffers
- etc.
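A few of these fields can be read straight out of the file; a minimal sketch (values are in kB and vary by system):

```shell
# Print a handful of common /proc/meminfo fields
grep -E '^(MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree|Buffers|Cached):' /proc/meminfo
```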
 
Memory overcommit is a Linux kernel feature that allows processes to allocate more memory than is physically available, on the assumption that they will not use all of it at once.
 
CommitLimit
- a memory statistic from the /proc/meminfo file
- the total amount of memory (in kB) that can currently be committed on the system
- based on the overcommit ratio (vm.overcommit_ratio) plus swap
- this limit is only enforced if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory)
 
The following represent kernel parameters:
- vm.overcommit_kbytes = 0 
- vm.overcommit_memory = 1 
- vm.overcommit_ratio = 500 
 
vm.overcommit_kbytes
- the absolute-value counterpart of vm.overcommit_ratio
- setting one of the two automatically disables (zeroes) the other
 
vm.overcommit_memory
- a kernel parameter that controls the memory overcommit behavior of the Linux virtual memory manager
- determines how the kernel handles memory allocation requests relative to the physically available memory and swap
- lets sysadmins balance memory utilization, system stability, and the risk of out-of-memory situations
 
The vm.overcommit_memory parameter defines 3 different modes of memory overcommit behavior:
1. Mode 0 (Heuristic, the default):
The kernel allows processes to allocate more memory than is physically available, assuming that most processes won't use all the memory they request. Obviously excessive requests are refused, but otherwise memory is committed optimistically; if the system later runs out of physical memory, the OOM killer terminates processes to free up space.
2. Mode 1 (Always overcommit):
In this mode, the kernel grants every memory allocation request without checking whether it can be backed by physical memory or swap. This suits workloads that allocate large, sparsely used address ranges, but it carries the highest risk of out-of-memory situations.
3. Mode 2 (Strict):
This mode prevents overcommitment entirely. The kernel ensures that the total memory committed to all processes does not exceed CommitLimit, which is derived from vm.overcommit_ratio (or vm.overcommit_kbytes) and swap. When a process requests memory in excess of that limit, the allocation request is denied, and the process is notified with an error.
 
The vm.overcommit_memory parameter is located in the /proc/sys/vm/ directory and can be configured using the sysctl command or by directly modifying the value in the /proc/sys/vm/overcommit_memory file.
 
The overcommit policy is set via the sysctl vm.overcommit_memory command.
 
 
The current overcommit limit and amount committed are viewable in /proc/meminfo as CommitLimit and Committed_AS respectively.
 
The overcommit amount can be set using the following parameters:
- vm.overcommit_ratio: Percentage
- vm.overcommit_kbytes: Absolute value 
Note:  Only one of these parameters can be specified at a time, and setting one disables the other. 
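As a read-only sketch, both the policy and the amounts can be inspected through sysctl or directly under /proc (the write commands are shown only as comments because they require root and alter system behavior):

```shell
# Inspect the current overcommit policy and amounts (read-only)
cat /proc/sys/vm/overcommit_memory   # 0, 1, or 2
cat /proc/sys/vm/overcommit_ratio    # percentage; used when overcommit_kbytes is 0
cat /proc/sys/vm/overcommit_kbytes   # absolute kB; 0 when the ratio is in effect

# The resulting limit and the amount currently committed:
grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo

# To change the policy (root required; persist via a file in /etc/sysctl.d/):
#   sysctl -w vm.overcommit_memory=2
#   sysctl -w vm.overcommit_ratio=50
```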
 
 
Now that I've gotten some preliminaries out of the way, let me move on to responding
to the question:  What is happening here?
 
grep -i commitlimit /proc/meminfo 
vm.overcommit_kbytes = 0 
vm.overcommit_memory = 1 
vm.overcommit_ratio = 500 
CommitLimit: 23449596 kB
 
In the information provided above, the CommitLimit is computed from the kernel parameters vm.overcommit_ratio (500 here) and the swap space. Note that with vm.overcommit_memory = 1 (the "always overcommit" mode), this limit is reported but not enforced; it is only enforced in strict mode (2).
 
CommitLimit
- a data statistic gleaned from the /proc/meminfo file
- the total amount of virtual memory (in kB) that can currently be committed on the system
- refers to the current overcommit limit
- CommitLimit = ([total RAM pages] - [total huge TLB pages]) * overcommit_ratio / 100 + [total swap pages]
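That formula can be reproduced by hand; a rough sketch, assuming the kB units used by /proc/meminfo and ignoring vm.overcommit_kbytes (the Hugetlb field is missing on older kernels, so it defaults to 0 here):

```shell
# Recompute the mode-2 CommitLimit from its inputs
ratio=$(cat /proc/sys/vm/overcommit_ratio)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
huge_kb=$(awk '/^Hugetlb:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
huge_kb=${huge_kb:-0}   # older kernels do not report Hugetlb
echo $(( (mem_kb - huge_kb) * ratio / 100 + swap_kb ))
```

The result will not match the reported CommitLimit when vm.overcommit_kbytes is set, since that value then replaces the ratio-based term.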
 
vm.overcommit_kbytes
- a kernel parameter
- the absolute-value counterpart of vm.overcommit_ratio
- is zero when the overcommit_ratio kernel parameter is in effect
 
vm.overcommit_memory 
- a kernel parameter 
- influences how the Linux kernel handles memory allocation requests when the system is running low on available physical memory
 
vm.overcommit_ratio 
- a kernel parameter
- used to set the overcommit amount
 
 
Bonus coverage:
 
1) The grep command shown in the question does not display the four lines of output. Only one line of output is displayed, and that's for the CommitLimit data statistic.
2) To display the remaining three lines of output shown in the question, which refer to kernel parameters, I used the following command:  # sysctl -a
Trevor "Red Hat Evangelist" Chandler
Chetan_Tiwary_
Community Manager

@Trevor Thanks for pointing that out! I have modified the question accordingly. Your explanation is spot on!

Trevor
Starfighter

Here's my take on that first question:  the slow server 

My research shows that the three primary causes of a sluggish Linux system are:
1) CPU
2) RAM
3) Disk I/O

There are several tools that can be used to investigate how these areas of a system are performing. Two of the more popular and useful tools are: 1) top and 2) sar


One of the beauties about the top utility is that it provides a real-time look at what's
happening on a Linux system. 

When looking at the information provided by the top utility, one of the key pieces of information to view first is the "load average". You will see three values shown for the load average, referring to the past one, five, and fifteen minutes of system operation. To put those three values into perspective: an ideal load average is lower than the number of CPUs in the Linux server. For example, with only one CPU in the server, it's best if the load average is below 1. Generally speaking, if the 1-minute average is above the number of physical CPUs on the system, the system is most likely CPU bound.
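The rule of thumb above can be checked with a short sketch (assumes /proc/loadavg and the nproc command are available):

```shell
# Compare the 1-minute load average against the CPU count
read one five fifteen rest < /proc/loadavg
cpus=$(nproc)
echo "1-min load: $one, CPUs: $cpus"
if awk -v l="$one" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
    echo "load exceeds CPU count - likely CPU bound"
else
    echo "load within CPU capacity"
fi
```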

Some other worthy information to look at in the output of the top utility would be the following for the CPU (for each CPU if multiple CPUs exist on the system):
- us: This percentage represents the amount of CPU consumed by user processes.
- sy: This percentage represents the amount of CPU consumed by system processes.
- id: This percentage represents how idle each CPU is.

Again, this information represents real-time statistics. The values for us, sy, and id will
identify whether the CPU (or CPUs) is bound by user processes or system processes.

Moving on to the sar utility: it is a tool that collects system data every 10 minutes by default. This collection interval can be changed by editing the /etc/cron.d/sysstat file.

The command sar -u provides information about all CPUs on the system, starting at midnight.

As is the case with the top utility, the main things to view in the sar -u command output are %user, %system, %iowait, and %idle. This information can tell you how far back the server has been having issues.

To check RAM performance, the sar -r command will provide the last day's memory usage.

The main things to look for in RAM usage are %memused and %commit. An important note about the %commit field: it can show above 100%, since the Linux kernel routinely overcommits RAM. If %commit is consistently over 100%, that could be an indicator that the system needs more RAM.

For disk I/O performance, use sar -d, which gives you the disk I/O output by device. In this output, looking at %util and await will give you a good overall picture of disk I/O on the system. The %util field refers to the utilization of the device. The await field shows the average time, in milliseconds, that I/O requests spend being served, including time spent waiting in the queue. There is no one-size-fits-all threshold for await; the value that signals a problem depends on the specific environment.
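The sar checks above boil down to a few invocations; a sketch that samples live rather than reading the collected history, and assumes the sysstat package (it degrades gracefully if sar is absent):

```shell
# Live sampling with sar: 3 samples, 1 second apart
if command -v sar >/dev/null; then
    sar -u 1 3      # CPU: %user, %system, %iowait, %idle
    sar -r 1 3      # memory: %memused, %commit
    sar -d -p 1 3   # per-device I/O with readable names: %util, await
else
    echo "sysstat not installed"
fi
```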

As a side note, to get readable names for the disk device(s) in that output, use the command: # sar -d -p

Whatever your tool of choice for gleaning performance information, you'll want to use something that provides information on the all-important resources: CPU, RAM, and disk I/O.

Trevor "Red Hat Evangelist" Chandler