Chetan_Tiwary_
Community Manager
Community Manager
  • 717 Views

Red Hat Linux Interview Series 35


Q.) A nightly job fails when the server reboots during execution. Which of cron, anacron, or a systemd timer would you use, and why?

 

Q.) You need to move /var to another physical disk while the system is live and preserve SELinux contexts. How would you achieve this ?

 

Q.) app.log is filling up rapidly and causing high iowait and disk latency. How would you prevent this from happening again?

 

Bonus Q.) Detect unauthorized file deletion in /etc with auditd, auto-lock the offending user, and send alerts.

 

 

 

Level L2 and above

 

I'll be posting a series of Linux-related questions covering various skill levels. Feel free to share your insights and expertise. Your contributions will benefit learners at all stages, from those in current roles to those preparing for Linux interviews.

1 Solution

Accepted Solutions
Trevor
Starfighter Starfighter
Starfighter
  • 620 Views

Well, I guess I'll get started with that last (3rd) question first:

Q.) app.log is filling up rapidly and causing high iowait and disk latency. How would you prevent this from happening again?

Looking at disk latency, there are several common causes for this - processing of
large files, or data transfers, being a couple of contributors.  This somewhat points
to app.log being involved in some way.

Regarding high iowait, once again there are several possible causes - high disk
activity being responsible for this performance issue.

Since the issue mentions that app.log is filling up rapidly, and is the cause
of these two issues, I'm going to bypass the use of my arsenal of useful tools - iostat,
vmstat, top, iotop, sar, dd, ioping - and leap right to the assumption that the app.log
file is having the major, direct impact.  My assumption about the app.log file is that 
the size that it is growing to, is approaching the max size of the filesystem it resides
on. 

I don't have information on the size of the filesystem for /var/log, so I'm going to have
to make an assumption on that to complete my solution - this will certainly be sufficient
to make my point.  I will assume that /var/log is on a filesystem that is 20G in size.  Using
that assumption, if I limit the size of app.log to no more than 20% of that 20G, that 
app.log would have to be rotated, to start over from zero (0), each time it reached a 
size of 4G.
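
The arithmetic behind that 4G figure can be sketched as a small shell helper. The function names, and the reliance on GNU df, are assumptions made purely for illustration:

```shell
# Sketch: derive a logrotate size cap as a percentage of a filesystem's size.
# Function names and the 20% figure are illustrative assumptions.

# size_cap_gib TOTAL_GIB PERCENT -> prints the cap in whole GiB
size_cap_gib() {
    echo $(( $1 * $2 / 100 ))
}

# fs_size_gib MOUNTPOINT -> prints the filesystem size in whole GiB (GNU df)
fs_size_gib() {
    df --output=size -BG "$1" | tail -n 1 | tr -dc '0-9'
}

# Example (not run here): size_cap_gib "$(fs_size_gib /var/log)" 20
```

With the assumed 20G filesystem and the 20% figure, size_cap_gib 20 20 gives 4, which is where the maxsize value comes from.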

Note:  My use of that 20% figure is nothing carved in stone anywhere.  That figure is
solely based on this one sys admin's experience, and does not represent the view of the
Linux industry!!!

Did I mention rotate earlier? Yes I did!  Sounds like I might want to bring the logrotate
utility into this discussion.  I think I will!

If I use the logrotate utility, I can rotate "app.log" when it reaches 20% of the filesystem
size.  Rotating the file based on size, rather than on some frequency alone (e.g. hourly, daily,
monthly), helps keep app.log from growing to a size that would impact the performance of
disk I/O operations.  One caveat: the size is only checked when logrotate actually runs, which
by default is once a day, so a fast-growing log may call for running logrotate more often.

So, based on my assumption, and the intended remedy, here's my approach:

Assuming that the application that is writing to the app.log file is named "app" (very
original, huh), I'll create a configuration file named "app" in the /etc/logrotate.d directory
for logrotate to use for the "app" application and its log file (app.log).  Note that this
has to be a regular file: logrotate reads the files in /etc/logrotate.d, and a subdirectory
created there would not be picked up.

touch  /etc/logrotate.d/app

 

After creating that file, I'll add the necessary configuration to it:

/var/log/app.log
{
     maxsize 4G
     postrotate
          rsync -av    /var/log/app.log-*     remote_user@remote_host:/app_logs/
     endscript

     rotate 1
}

With this configuration file, among other things, I'm copying the rotated app.log file to
a remote Linux system, so that I can have an archive if needed down the road. 
"Better to have and not need, than to need and not have!"  - TL Chandler

Note:  That directive --    rotate 1    -- is an instruction to the logrotate utility that
specifies the number of rotated files to keep.  So, there will never be more than one
rotated app.log file on the filesystem.

Okay, now with that logrotate configuration file in place, I'm going to keep an eye
on the iowait and disk latency stats, to see if this approach has remedied the high iowait
and disk latency issue.  If not, then I'll go to my arsenal of tools to do some deep-dive investigation.

 

Trevor "Red Hat Evangelist" Chandler


8 Replies
Chetan_Tiwary_
Community Manager
Community Manager
  • 521 Views

@Trevor 

[Image attachment: Chetan_Tiwary__0-1750680133630.jpeg]

 

Trevor
Starfighter Starfighter
Starfighter
  • 396 Views

Wow!  What an honor!!!  Many, many, many thanks Chetan!!!  I'm 
absolutely in celebratory mode!!!!

Well, my celebration is going to have to be brief, because there's a lot
more work to get done.  I gotta ask for some clarification on that very 
nice bonus question.  That "unauthorized file deletion" spec is throwing
me a curve.  The only way that I can see an account deleting anything
in the /etc directory is if it's root, or some account in the sudoers
file.  Even if a nefarious actor gains access to either of these accounts,
they will have authorization to delete any content in the /etc directory. 
An auditd rule can easily be configured to log any content deleted from the
/etc directory.  However, with the deletion being possible only by accounts
with the proper authorization, it seems counterproductive to lock those
accounts.

Okay, what am I missing?  

Thanks for locating my crown!  You saved me from having to post a
reward

 

 

 

 

 

 

Trevor "Red Hat Evangelist" Chandler
Chetan_Tiwary_
Community Manager
Community Manager
  • 331 Views

@Trevor A great logical counter, and that's a characteristic of an insightful and experienced sysadmin / candidate!!

You are right, but here the scenario comes more from a standpoint that expands the definition of "unauthorized" beyond mere file system permissions to encompass actions that violate security policy, indicate a compromised account, or represent malicious insider activity. 

A candidate is being tested here for:

  • Security audit concepts
  • Risk assessment - a root or sudo-enabled user account can be compromised.
  • Strategy to contain the breach
  • Damage control
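
For anyone wanting a starting point on the bonus question, here is a rough sketch and not a vetted policy. The key name etc_delete and the helper offending_auids are assumptions picked for illustration. Restricting the rule to auid>=1000 also sidesteps the concern about locking root, since only accounts that logged in as regular users are considered:

```shell
# Sketch only: the key name etc_delete and the helper name are assumptions.
#
# Audit rule (run as root) to record deletions and renames under /etc by
# logged-in, non-system users:
#   auditctl -a always,exit -F arch=b64 -S unlink,unlinkat,rename,renameat,rmdir \
#            -F dir=/etc -F 'auid>=1000' -F 'auid!=unset' -k etc_delete

# offending_auids: read raw audit records on stdin and print the unique login
# UIDs (auid) found on records tagged key="etc_delete". The unset auid
# (4294967295) is skipped.
offending_auids() {
    awk '/key="etc_delete"/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^auid=[0-9]+$/) {
                split($i, kv, "=")
                if (kv[2] != 4294967295) print kv[2]
            }
    }' | sort -u
}

# Possible use (as root), locking each account and raising a syslog alert:
#   ausearch -k etc_delete --raw -ts recent | offending_auids | while read -r id; do
#       user=$(getent passwd "$id" | cut -d: -f1)
#       [ -n "$user" ] && usermod -L "$user" && logger -p authpriv.alert "Locked $user: deletion under /etc"
#   done
```

Keep in mind that usermod -L only locks the password, so key-based SSH logins would need separate handling, and the alert here goes to syslog rather than email.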
Winsock
Mission Specialist
Mission Specialist
  • 378 Views

Hopefully I'm allowed to touch base here and keep this baby rolling. It looks like one person answered and did a fantastic job. I'm not well versed in all things Red Hat, but rsync is ideal for file transfers: it handles interruptions, verifies integrity via checksums, supports compression, and it's as simple as typing rsync. Its main strength (correct me if I'm wrong) is the efficiency of delta transfers, which is less relevant here because rotated logs are typically new files. Rsync can also be resource heavy, especially for 4 GiB files, if that's a reachable size for your ever so clever app titled 'app'.

Offloading the logs to a remote host is a reactive measure that addresses the symptom, but not the underlying problem of why the log is growing so rapidly. So we still have continued I/O strain, potential remote issues, potentially missed diagnostics, and performance overhead. If the network to 'remote_host' is slow or unreliable, that could add to the I/O wait during the postrotate phase. There is also an SSH dependency: if access or authentication fails, the postrotate script fails, potentially leaving logrotate and the system in an inconsistent state.

Just to stir things up even more, the wildcard is brave, although this is a questionnaire and the environment seems easy enough to assume. There is zero error handling, which again, in a controlled environment where we're just messing around, who needs it. The verbose output might generate more than you want logrotate to chew on. Ultimately it just adds more I/O calls to the system.

Excessive writes still cause I/O wait, even if the log is rotated frequently.
Not sure of the entire background for our lovely remote host, but hopefully it's an audit and purge factory. If not, welp, we've got another issue.
Offloading the log brushes any application error, misconfig, attack, etc. off to the next host or into no man's land, the void.
The overhead of frequent rotations is a real factor in high-load environments, which hey, I'm assuming everything's game.

Let's start by checking out what ol' app has so much to talk about and why it won't be quiet.

No code block so I'll pretend Markdown exists. Nevermind that's awful we will go for bold.

head -n 100 /var/log/app.log
tail -n 100 /var/log/app.log

Maybe if we dig a little more we can systemd and journal it.

systemctl status app.service
journalctl -u app.service --since "1 hour ago" --no-pager | wc -l
journalctl -xeu app.service -b

 

Nerd it up a little bit and quantify the log rate. A thousand per minute is definitely a sign of overlogging.

grep "$(date '+%Y-%m-%d %H:%M')" /var/log/app.log | wc -l
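
If the log lines start with a timestamp such as 2025-06-23 10:15:42 (an assumption about app.log's format), a rough per-minute breakdown of the busiest minutes can be sketched like this. The helper name lines_per_minute is hypothetical:

```shell
# lines_per_minute FILE -> prints "COUNT YYYY-MM-DD HH:MM" for the ten busiest
# minutes, assuming every line begins with a "YYYY-MM-DD HH:MM:SS" timestamp.
lines_per_minute() {
    # Keep the first 16 characters (date plus hour:minute), then count repeats.
    cut -c1-16 "$1" | sort | uniq -c | sort -rn | head -n 10
}

# Example (not run here): lines_per_minute /var/log/app.log
```

This looks at the whole file rather than only the current minute, which helps when the burst already happened.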

 

Hopefully slide on by with adjusting the log level of the application. Or filtering in rsyslog.

Guess if not, we are in for a bit of a game. 

grep -i "error\|exception\|failed" /var/log/app.log | sort | uniq -c | sort -nr | head -n 10

Cross your fingers it's an error loop, pumping messages / stack traces. Evaluate the logs and determine the next best approach. Hopefully a simple configuration error or permissions issue.

We can review system resources at this point, 

top -b -n 1
free -h (or -m if you really want it all)

Check for rsyslog usage.


lsof /var/log/app.log
cat /etc/rsyslog.conf /etc/rsyslog.d/app.conf

Temporarily suppress repetitive logs assuming syslog, which is probably harder than log levels, nonetheless. 

# /etc/rsyslog.d/app.conf
if $programname == 'app' and $msg contains "error_string_not_creative_tonight" then stop

Or even better, property filtering, which I think is standard. Maybe not this mess, but standalone.

# /etc/rsyslog.d/app.conf
# ereregex (POSIX ERE) is used so that | works as alternation; plain regex is POSIX BRE
:programname, isequal, "app" {
    :msg, contains, "error_string_still_not_creative" stop
    :msg, ereregex, "DEBUG|TRACE" stop
    /var/log/app.log
}
& stop

Could be a lot prettier if we had code blocks or Markdown enabled.
Don't let me forget SELinux or restart rsyslog - shhhhh

 

restorecon -v /etc/rsyslog.d/app.conf

chcon -t var_log_t /var/log/app.log

ausearch -m avc -ts recent | grep "rsyslog" (or app perhaps)

systemctl restart rsyslog

If app writes directly to app.log we can potentially modify the app's log config or use a pipe to redirect the logs to rsyslog. No need to do so, just being a nerd, so we can skip this step.

Check for high %iowait / latency
df -T /var/log
findmnt /var/log
ioping -c 10 /var/log
iostat -x 1 10

~~ Optimize filesystem mount options - you get the idea I assume. Can provide more details upon request. ~~


Guess we continue resource checks depending on the output of the log, if we didn't already fix the issue via permissions, fs issues, connection, configuration, etc.


lsof -p 1993
free -h (or -m)
ss | ip | netstat -tuln - (Don't pipe those together, netstat is oldschool, ss is modern, ip route is ip route)
ss -tuln sport = :1993
ss -tunp | grep 1993
awk '{print $1}' /var/log/app.log | sort | uniq -c | sort -nr | head -n 10
tcpdump -i eth0 port 1993 -c 100

Maybe a game to be played after all? High traffic, sad face. Potential attack, happy face.

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" port port="1993" protocol="tcp" accept limit value="100/m"'
firewall-cmd --reload
semanage fcontext -a -t firewalld_var_run_t "/var/run/firewalld.*"
restorecon -R -v /var/run/firewalld

Might as well show you IPv6 too.

firewall-cmd --permanent --add-rich-rule='rule family="ipv6" source address="::/0" port port="1993" protocol="tcp" accept limit value="100/m"'
firewall-cmd --reload
semanage fcontext -a -t firewalld_var_run_t "/var/run/firewalld.*"
restorecon -R -v /var/run/firewalld

Could do more error handling, but I've spent enough time on this lol.

 

firewall-cmd --runtime-to-permanent

firewall-cmd --reload

semanage fcontext -a -t firewalld_var_run_t "/var/run/firewalld.*"

ausearch -m avc -ts recent | grep firewalld

Can also do some error handling, if interested let me know.

Traffic taken care of, this moves on to just maybe we get to play a game. Planning and preparation phase, establishing objectives, scope, and rules of engagement.

To be continued...

Welp, no fun here, we probably fixed the issue by now or know what the problem is, not sure if we are developing the app, but if we are, here we come source code. If not consultations at its best. Good luck.

Determine user and group.

systemctl cat app.service
ps auxZ | grep app

"[Service]
User=app
Group=app"

The entire post really spawned from this, my bad.

# /etc/logrotate.d/app
# Uses mail(1) for the alert, since sendmail has no -s (subject) option
/var/log/app.log {
    maxsize 4G
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    create 0640 USER GROUP
    postrotate
        /usr/bin/systemctl reload app.service > /var/log/logrotate_app.log 2>&1 || {
            echo "systemctl reload failed for app.service at $(date)" >> /var/log/logrotate_errors.log
            mail -s "Logrotate Reload Failure" aferguson@ordl.org < /var/log/logrotate_errors.log
            exit 1
        }
    endscript
}


restorecon -v /etc/logrotate.d/app
chcon -t var_log_t /var/log/logrotate_app.log /var/log/logrotate_errors.log
ausearch -m avc -ts recent | grep logrotate | audit2allow -M logrotate_app
semodule -i logrotate_app.pp

Yikes it looked halfway decent before the bold, I copy and pasted from blank page and some of the line spacing also got skewed. We definitely need a code block or ability to write in markdown.

Open Research and Development Laboratories
Enterprise Systems Architect
Kernel Engineer
|RH|FOSS|
Concurrency | Dynamics | Mutants | Memory Space
Chetan_Tiwary_
Community Manager
Community Manager
  • 331 Views

@Winsock what a detailed answer covering a wide range of things / concepts !! Thanks for sharing your inputs!

Trevor
Starfighter Starfighter
Starfighter
  • 333 Views

Okay, let's have a look at that first question:

Q.) A nightly job fails when the server reboots during execution. Which of the cron, anacron or systemd timer would you use and why ?

 

This is a pretty straightforward one.  Of the 3 schedulers that I have at my disposal,
I'm going to eliminate cron right away because it's designed for systems that are
consistently running.  With cron, if the system is down during a scheduled time, the job
is missed, and won't be executed until the next scheduled time - if there is a next
scheduled time.

So, that leaves us to investigate the remaining 2 schedulers: Anacron and 
Systemd Timer.

Anacron is intended for systems that may be powered off for periods, and ensures
that scheduled tasks are executed when the system is back online, even if they were
missed due to downtime.  If the system is off when a scheduled task is due, Anacron
will execute it as soon as the system is powered on.   Great!  Anacron is a candidate
for our scenario.

Well, it just so happens that Systemd Timer offers that same feature that Anacron
offers: when the timer is configured with Persistent=true, a job whose scheduled time
passed while the system was off is run once the system is powered on again.

So, since I have an option to use either Anacron or Systemd Timer for this scenario,
what additional criteria am I going to look at to make my selection?  Well, it depends
on which features are required for the scheduler.  In the case of Systemd Timer, it offers 
more advanced features than Anacron, some of which include:
- finer-grained scheduling
- better logging
- integration with the systemd ecosystem

Let me keep things simple, and conclude with this summary:

- If you need a simple way to schedule tasks on systems that may not be always
running, Anacron is a good option.
- If you need more advanced scheduling, better logging, and integration with systemd,
systemd timers are the preferred choice.
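
As a minimal sketch of the systemd timer option, the helper below writes a service and timer pair. The unit names, the script path /usr/local/bin/nightly-job.sh, the 02:00 schedule, and the helper name write_nightly_timer are all hypothetical:

```shell
# write_nightly_timer DIR: write nightly-job.service and nightly-job.timer
# into DIR. All unit names and paths are illustrative assumptions.
write_nightly_timer() {
    dir=$1
    cat > "$dir/nightly-job.service" <<'EOF'
[Unit]
Description=Nightly job (hypothetical example)

[Service]
Type=oneshot
ExecStart=/usr/local/bin/nightly-job.sh
EOF
    cat > "$dir/nightly-job.timer" <<'EOF'
[Unit]
Description=Run nightly-job.service every night

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Possible use (as root):
#   write_nightly_timer /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable --now nightly-job.timer
```

Persistent=true catches up on runs that were missed while the timer was inactive. As far as I understand it, a job that had already started and was then cut short by the reboot would not be re-triggered by this setting alone, so the job itself may still need to be safe to re-run.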

 

 

Trevor "Red Hat Evangelist" Chandler
Trevor
Starfighter Starfighter
Starfighter
  • 306 Views

Okay, it's time for that second question:

Q.) You need to move /var to another physical disk while the system is live and preserve SELinux contexts. How would you achieve this ?

We want to do 2 things:
- move (a directory)
- preserve (SELinux contexts of that moved directory)

The mv command will accomplish both of those!

Let's assume that my destination physical disk is mounted at /mnt/dest1.  
The command that will move /var to that physical disk, preserving the SELinux
context of /var, is:      #  mv   /var   /mnt/dest1

To verify that this actually achieved what we were wanting, execute the following
commands:

Before the mv command:      #  ls  -ldZ  /var

After the mv command:         #  ls  -ldZ  /mnt/dest1/var
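
One caveat worth flagging on a live system: across filesystems, mv is a copy followed by a delete, so services writing to /var would lose their files mid-flight, and /var itself would be left missing. A more cautious sketch is to copy first and then mount the new disk at /var. Everything named here (/dev/sdb1, /mnt/newvar, the copy_tree helper) is an assumption for illustration, not something given in the question:

```shell
# copy_tree SRC DST: copy the contents of SRC into DST, preserving ownership,
# modes, timestamps and links, and (where supported) xattrs and SELinux contexts.
copy_tree() {
    cp -a "$1/." "$2/"
}

# Outline of the surrounding steps (run as root; device and paths are assumptions):
#   mount /dev/sdb1 /mnt/newvar
#   copy_tree /var /mnt/newvar          # or: rsync -aAXH /var/ /mnt/newvar/
#   # briefly stop the services that write to /var, then repeat the copy
#   # add an /etc/fstab entry that mounts /dev/sdb1 at /var
#   umount /mnt/newvar && mount /var
#   restorecon -Rv /var
#   ls -ldZ /var
```

cp -a tries to preserve SELinux contexts and extended attributes; the restorecon pass at the end is a safety net in case anything came across mislabeled.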

 

That's my offering!

 

 

 

 

Trevor "Red Hat Evangelist" Chandler