iranzo (Cadet)
  • 1,034 Views

Automation for troubleshooting?

Hi

I wanted to know what sysadmins out there are using for automation in troubleshooting (pattern matching, package versions affected by a bug, misconfigurations, etc).


Thanks!

Pablo

8 Replies
Edu (Cadet)
  • 1,018 Views

Re: Automation for troubleshooting?

I am interested in this topic too.

Tjako (Flight Engineer)
  • 975 Views

Re: Automation for troubleshooting?

Hello,

It's rather difficult to use automation for troubleshooting; the only approach that is really useful is configuration management.

That means checking all systems with tooling like Ansible/Puppet/Chef/SaltStack to enforce working configurations and fixes for known bugs that were fixed in the past (and properly analysed with root cause analysis, etc.).

Besides that, log analysis tools like ELK can help you detect and pinpoint failures in complex automation chains.

Maybe someone has found or built a tool that tries to fix problems, but I expect it has a low success rate (and I would definitely validate the solution before implementing it).

 

TJ

iranzo (Cadet)
  • 960 Views

Re: Automation for troubleshooting?

Hi, full configuration management is an option, but it is not troubleshooting. There are already tools for troubleshooting:

- xsos (checks sosreport data or the current system; can report information on some network buffers or memory load)

- sarstats (can check historical sar data to automate 'peak' detection on load, etc.)

- lynis (can check on known security bugs or misconfigurations)

- last year we gave a talk at DevConf.cz (you can find it via Google), "Detect pitfalls on OSP deployments", about things we worked on for this.
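To make the 'peak' detection idea from the list above concrete, here is a minimal Python sketch. It does not reflect how sarstats works internally; the `find_peaks` function, the threshold and the sample values are all assumptions made up for illustration. It simply flags load samples that sit well above the average.

```python
from statistics import mean, pstdev


def find_peaks(samples, threshold_sigmas=2.0):
    """Return (timestamp, value) pairs above mean + N population std deviations.

    `samples` is a list of (timestamp, value) tuples, for example load
    averages exported from historical sar data (hypothetical input format).
    """
    values = [value for _, value in samples]
    if len(values) < 2:
        return []
    cutoff = mean(values) + threshold_sigmas * pstdev(values)
    return [(ts, value) for ts, value in samples if value > cutoff]


# Made-up load averages, one sample every ten minutes.
samples = [
    ("10:00", 0.8), ("10:10", 1.1), ("10:20", 0.9), ("10:30", 9.5),
    ("10:40", 1.0), ("10:50", 0.7), ("11:00", 1.2), ("11:10", 0.8),
]

for ts, value in find_peaks(samples):
    print(f"load peak at {ts}: {value}")
```

A fixed number of standard deviations is a crude cutoff; with real data you would probably tune it per metric or compare against the same hour on previous days.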

There are also other tools that can apply remediation.

Company wikis or knowledge bases usually store the information on known issues. Instead of looking it up by hand the next time an issue happens, those known issues are the kind of automated checks that can be performed. Once the known causes have been ruled out, you can go for the traditional step-by-step approach (and then document the result to feed it back into the detection process).
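As a rough sketch of what "turning the knowledge base into automated checks" could look like, the Python below keeps a small list of known-issue patterns and runs them against log lines. The `KnownIssue` structure, the `KB-0001`-style article IDs and the sample log lines are hypothetical, not taken from any of the tools mentioned above.

```python
import re
from dataclasses import dataclass


@dataclass
class KnownIssue:
    name: str
    pattern: str   # regular expression to search for in each log line
    article: str   # pointer to the wiki/KB entry (hypothetical IDs)


# Each documented issue becomes one rule; new findings get appended here.
KNOWN_ISSUES = [
    KnownIssue("oom-killer", r"Out of memory: Kill(ed)? process", "KB-0001"),
    KnownIssue("disk-full", r"No space left on device", "KB-0002"),
]


def check_known_issues(lines, issues=KNOWN_ISSUES):
    """Return (issue name, article, line number) for the first hit of each rule."""
    hits = []
    for issue in issues:
        rx = re.compile(issue.pattern)
        for number, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((issue.name, issue.article, number))
                break  # one hit per issue is enough to point at the article
    return hits


# Illustrative log excerpt, not real output.
log = [
    "kernel: eth0: link up",
    "kernel: Out of memory: Killed process 1234 (java)",
    "rsyslogd: write error: No space left on device",
]

for name, article, number in check_known_issues(log):
    print(f"line {number}: matches known issue '{name}', see {article}")
```

If nothing matches, you fall back to step-by-step debugging, and whatever you find becomes a new entry in `KNOWN_ISSUES`.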

 

Regards,

Lisenet (Starfighter)
  • 851 Views

Re: Automation for troubleshooting?

We use Instana to monitor and troubleshoot Java applications. We previously used AppDynamics, but it became too expensive.

You mentioned Lynis. We use Lynis for general security auditing and hardening (alongside OpenSCAP); however, I would not say that it's a troubleshooting tool.

americanada (Flight Engineer)
  • 846 Views

Re: Automation for troubleshooting?

I second Instana. It's a great tool for Java apps; I wish I had known about it, or that it had been around, years ago.

  • 966 Views

Re: Automation for troubleshooting?

Fixes can be applied by automation, but to find the cause and test the solution I go with the traditional way of debugging step by step.

DMB (Cadet)
  • 943 Views

Re: Automation for troubleshooting?

You can write quick scripts to collect information about the system, but I think these are hand-rolled by admins with an eye toward problems commonly encountered in their specific environments. They do, however, make data collection fast and consistent. You might also pipe the collected output through an awk script that highlights anomalies.
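Here is a minimal sketch of that "collect, then highlight anomalies" idea, written in Python rather than awk. It assumes the collected data is `df -P` style output and that a filesystem at or above 90% use counts as an anomaly; the function name, the threshold and the sample text are assumptions for illustration.

```python
def flag_full_filesystems(df_output, limit_percent=90):
    """Return (mount point, use%) for filesystems at or above the limit.

    Expects POSIX-format `df -P` text: a header line, then six
    whitespace-separated columns per filesystem.
    """
    flagged = []
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        use = fields[4].rstrip("%")
        if use.isdigit() and int(use) >= limit_percent:
            flagged.append((fields[5], int(use)))
    return flagged


# Made-up sample; on a live system the text could come from
# subprocess.run(["df", "-P"], capture_output=True, text=True).stdout
sample = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 51475068 48901314 2573754 95% /
/dev/sda2 103081248 20616249 82464999 20% /home
tmpfs 8123456 0 8123456 0% /dev/shm
"""

for mount, use in flag_full_filesystems(sample):
    print(f"WARNING: {mount} is {use}% full")
```

The same pattern (parse one command's output, compare against a threshold that makes sense in your environment) extends to memory, inode usage, and so on.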

bonnevil (Flight Engineer)
  • 860 Views

Re: Automation for troubleshooting?

One thing you might look at is Red Hat Insights.

Insights is a SaaS-based tool that's hosted on the Red Hat Customer Portal. You send a small amount of system metadata to Insights (less than an sosreport), and it analyses it for known issues, misconfigurations, security vulnerabilities (could be vulnerable packages, could be things like Spectre/Meltdown/L1TF) and so on. Support fairly frequently adds more "rules" for issues to check for to the Insights knowledge base, and they're typically tied to Customer Portal kbase articles. But the really cool thing is that, to automate remediation of most issues, you can generate Ansible Playbooks or bash scripts, with an estimate of how "risky" the remediation might be. You can see and control the metadata it'll send to the Insights service, too.

Yeah, I work at Red Hat, but I legitimately think it's a useful tool and it sounds like it would address at least part of what you're looking for.  If you've already got "Smart Management" subscriptions and a Satellite server, I think you should already have access to Insights, and you can set up the Satellite to act as a proxy to Insights and as a local UI to review reports and plan remediations.  There's more info at https://access.redhat.com/products/red-hat-insights and https://www.redhat.com/en/technologies/management/insights.
