Good watchdog for services?
I have a simple script I use to keep track of services running on some of my servers and restart them if they die. It's pretty easy and dead fast to implement on a server. I wonder if anyone has a simpler/better way than "my" script to watchdog services and restart them as necessary?
This is what I use now. I call it from /etc/crontab once every five minutes or so, depending on how critical the service is.
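server:~ # cat /usr/bin/service-watchdog
#!/bin/bash
# Name of the process to look for, as shown by ps
MYPROC=programname
# Count the processes whose command name matches $MYPROC
COUNT=$(UNIX95=1 ps -C "$MYPROC" -o pid= -o args= | wc -l)
if [ "$COUNT" -lt 1 ]
then
    # No matching process: log the restart with a timestamp, then restart
    echo "Service xxx is restarting at $(date)" >> /var/log/messages
    /etc/init.d/servicename restart
else
    echo "Service xxx is running."
fi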
programname is whatever the process is called
servicename is what the file for starting the service in /etc/init.d is called.
:woohoo:
Comments
-
I wrote a similar script a few years ago that used pgrep. If pgrep returned any value besides 1 (I don't think one would use this for init) then it would exit, otherwise, restart the service.
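
A pgrep-based check along those lines might look roughly like the sketch below. It is not the script described above, only an illustration: the function names needs_restart and watchdog are made up here, and programname and /etc/init.d/servicename are the same placeholders as in the original post. It relies on pgrep exiting with 1 when no process matches.

```bash
#!/bin/bash
# Sketch of a pgrep-based watchdog (hypothetical helper names).
# pgrep exits 0 when at least one process matches and 1 when none match;
# any other value means pgrep itself failed, so we do not restart on it.

# needs_restart NAME: succeeds only when pgrep reports "no matching process".
needs_restart() {
    pgrep -x "$1" > /dev/null
    [ $? -eq 1 ]
}

# watchdog NAME INITSCRIPT: restart the service when NAME is not running.
watchdog() {
    local proc=$1 initscript=$2
    if needs_restart "$proc"; then
        "$initscript" restart
    fi
}

# Only act when called with both arguments, e.g. from cron:
#   service-watchdog programname /etc/init.d/servicename
if [ $# -eq 2 ]; then
    watchdog "$1" "$2"
fi
```

The -x option makes pgrep match the process name exactly, which avoids restarting (or not restarting) because some unrelated process happens to contain the same string.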
-
Useful!
-
I'm kind of interested in what kind of apps you are running that need to be constantly restarted? I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart.
-
In my case, I was hosting Ventrilo for my WoW guild. It would die occasionally, so I wrote a script that did pgrep vent_srv, and if that returned 1 then I restarted it.
-
Khabi wrote: I'm kind of interested in what kind of apps you are running that need to be constantly restarted? I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart.
The services that die are mostly Novell's little gems. GroupWise and some Novell Open Enterprise services have funny little bugs that don't seem to be on any priority lists. I haven't ever had an open source service die on me that I couldn't fix myself.
-
Ah, Novell, I should have guessed.
-
I mostly use Webmin's "System and Server Status" on those systems where I have it available. As a plus, it also sends customizable mails when a critical service is down.
-
For keeping tabs on all the various servers I manage, I use Zenoss. It's a pretty slick tool for a busy admin.
-
If you are writing to /var/log/messages, it would be "cleaner" to use logger.
If the service to be restarted supports the "status" function, I have found that using it (and grepping its output) instead of ps+grep leads to fewer false positives. It also respects the logic that the application has for determining whether or not it is running. Of course, the app's own status check could be broken. In that case I usually use pidof instead of ps, since, for me, ps also leads to more false positives.
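
As a rough sketch of the two suggestions above, assuming the init script follows the LSB convention of exiting 0 from "status" when the service is running (checking the exit code rather than grepping the output). The function names and the service-watchdog log tag are made up for illustration:

```bash
#!/bin/bash
# Sketch: use the init script's own "status" action and log via logger.
# Assumption: "status" exits 0 when the service is running (LSB convention).

# service_is_up INITSCRIPT: true when "INITSCRIPT status" exits 0.
service_is_up() {
    "$1" status > /dev/null 2>&1
}

# check_and_restart INITSCRIPT: restart the service if its status check
# fails, and record the restart through syslog instead of appending to
# /var/log/messages by hand.
check_and_restart() {
    local initscript=$1
    if ! service_is_up "$initscript"; then
        logger -t service-watchdog "$initscript is not running, restarting"
        "$initscript" restart
    fi
}

# Only act when called with an init script path, e.g.:
#   service-watchdog /etc/init.d/servicename
if [ $# -eq 1 ]; then
    check_and_restart "$1"
fi
```

If the status action cannot be trusted, the body of service_is_up could be swapped for pidof programname > /dev/null, which also exits 0 only when a matching process is found.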
If you do not want to rely on cron, you can always stick an entry in inittab, but that is probably overkill.
-
You can try Nagios. It's far from simple to configure, but it is certainly one of the best "watchdogs" out there.
~nm
-
If the program creates a pid file you can use checkproc
Usage:
checkproc -p <pid file> <app name>
Example: checkproc -p /var/run/daemon.pid /usr/local/daemon/daemon.pl
I use this in my init scripts for all my perl daemons.
It could be easily used in a bash script to check the return / exit code.
Example
echo $?
Exit Codes:
0 = Running
1 = Dead but pid exists
3 = Dead and no pid
You may also want to read up on startproc and killproc. I suggest starting with the man pages. There are lots of useful options that can make your bash script very powerful. Then pop it into cron to automate it.
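
A minimal sketch of checking that exit code from a bash script, using only the three codes listed above. The function names describe_status and check_daemon are invented for this example, and checkproc has to be installed for check_daemon to work:

```bash
#!/bin/bash
# Sketch: act on checkproc's exit code (hypothetical helper names).

# describe_status CODE: translate the exit codes listed above into text.
describe_status() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead but pid exists" ;;
        3) echo "dead and no pid" ;;
        *) echo "unknown status $1" ;;
    esac
}

# check_daemon PIDFILE APP: run checkproc, print a readable result,
# and pass checkproc's exit code back to the caller.
check_daemon() {
    local pidfile=$1 app=$2 rc
    checkproc -p "$pidfile" "$app"
    rc=$?
    echo "$app: $(describe_status "$rc")"
    return "$rc"
}

# Only act when called with a pid file and a program path, e.g.:
#   check-daemon /var/run/daemon.pid /usr/local/daemon/daemon.pl
if [ $# -eq 2 ]; then
    check_daemon "$1" "$2"
fi
```

A cron-driven wrapper could then restart the daemon whenever check_daemon returns non-zero.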
Enjoy
--
Shawn
-
If you are running Fedora 9 or later, or Ubuntu 6.10 or later, the good ol' standard init system has been quietly replaced with Upstart.
They actually almost did too good of a job implementing this, since most people didn't even notice the change.
While I've never set it up to do this, it is supposed to be able to monitor services and restart them automagically if they die.
Its configuration (at least on Fedora) lives in /etc/event.d