Welcome to the new Linux Foundation Forum!

Good watchdog for services?

I have a simple script I use to keep track of services running on some of my servers and restart them if they die. It's pretty easy and dead fast to implement on a server. I wonder if anyone has a simpler or better way than "my" script to watchdog services and restart them as necessary?

This is what I use now. I call it from /etc/crontab once every five minutes or so, depending on how critical the service is.

server:~ # cat /usr/bin/service-watchdog

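#!/bin/bash
# Restart the service if no process named $MYPROC is running.
MYPROC=programname
# Count the processes whose command name matches $MYPROC.
COUNT=$(UNIX95=1 ps -C "$MYPROC" -o pid= -o args= | wc -l)
if [ "$COUNT" -lt 1 ]
then
    echo "Service xxx is restarting at $(date)" >> /var/log/messages
    /etc/init.d/servicename restart
else
    echo "Service xxx is running."
fi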

programname is the name of the process to watch.

servicename is the name of the script in /etc/init.d that starts the service.
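For reference, the /etc/crontab entry described above might look roughly like this. The five-minute schedule and the root user are taken from the description, and the redirect is an assumption added here so that cron does not mail the "is running" line on every run.

```
# /etc/crontab: run the watchdog every five minutes as root
# m   h  dom mon dow user  command
*/5   *  *   *   *   root  /usr/bin/service-watchdog > /dev/null
```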

:woohoo:

Comments

  • jaymz Posts: 2
    I wrote a similar script a few years ago that used pgrep. If pgrep returned any value besides 1 (I don't think one would use this for init), the script would exit; otherwise it would restart the service.
  • Noxn Posts: 2
    Useful!
  • Khabi Posts: 6
    I'm kind of interested in what kind of apps you are running that need to be constantly restarted? I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart. :)
  • You may want to disable the echo commands, as they clog up root's mailbox, and instead send a mail directly to root only when a service actually needs restarting.

    /etc/init.d/servicename restart 2> /tmp/restart
    mail root < /tmp/restart
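A slightly more defensive variant of the idea above is sketched here. The function name restart_and_mail and the MAILER variable are made up for illustration. The sketch assumes a mail command that accepts -s for the subject, uses mktemp instead of a fixed file in /tmp, and captures standard output as well as standard error.

```shell
#!/bin/bash
# Hypothetical helper: run a restart command, capture its output, mail it.
# MAILER defaults to "mail"; the variable exists only so it can be swapped out.
MAILER=${MAILER:-mail}

# restart_and_mail SERVICE RECIPIENT COMMAND...
restart_and_mail() {
    local service="$1" recipient="$2"
    shift 2
    local report status=0
    report=$(mktemp) || return 1
    # Capture stdout and stderr so the mail shows the full restart output.
    "$@" > "$report" 2>&1 || status=$?
    "$MAILER" -s "Restarted $service (exit status $status)" "$recipient" < "$report"
    rm -f "$report"
    return "$status"
}

# Example call (servicename is a placeholder):
# restart_and_mail servicename root /etc/init.d/servicename restart
```

Putting the exit status in the subject line makes a failed restart visible without opening the mail.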
  • jaymz Posts: 2
    In my case, I was hosting Ventrilo for my WoW guild. It would die occasionally, so I wrote a script that ran pgrep vent_srv, and if that returned 1, I restarted it.
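The pgrep approach described above could be sketched as follows. This is not the original script: the function names are made up, and the init script path in the usage comment is a placeholder. It relies on pgrep exiting with status 0 when at least one process matches and 1 when none does.

```shell
#!/bin/bash
# is_running NAME: succeed if at least one process is named exactly NAME.
is_running() {
    # -x requires the whole process name to match, not just a substring.
    pgrep -x "$1" > /dev/null
}

# restart_if_dead NAME COMMAND...: run COMMAND only when NAME is not running.
restart_if_dead() {
    local name="$1"
    shift
    if is_running "$name"; then
        return 0
    fi
    "$@"
}

# Example call (the init script path is hypothetical):
# restart_if_dead vent_srv /etc/init.d/ventrilo restart
```

Using -x avoids counting unrelated processes whose names merely contain the pattern.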
  • tuxmania Posts: 19
    Khabi wrote:
    I'm kind of interested in what kind of apps you are running that need to be constantly restarted? I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart. :)

    The services that die are mostly Novell's little gems. GroupWise and some Novell Open Enterprise services have funny little bugs that don't seem to be on any priority list. I haven't had any open source service die on me that I couldn't fix myself.
  • Khabi Posts: 6
    Ah, Novell, I should have guessed :)
  • I mostly use Webmin's "System and Server Status" on those systems where I have it available. As a plus, it also sends customizable mails when a critical service is down.
  • tuxmania Posts: 19
    For keeping tabs on all the various servers I manage, I use Zenoss. It's a pretty slick tool for a busy admin.
  • atreyu Posts: 216
    If you are writing to /var/log/messages, it would be "cleaner" to use logger.

    If the service to be restarted supports the "status" function, I have found that using it (and grepping its output) instead of ps+grep leads to fewer false positives. It also respects the application's own logic for determining whether or not it is running. Of course, that logic could be broken. In that case, I usually use pidof instead of ps, since, for me, ps also leads to more false positives.

    If you do not want to rely on cron, you can always stick an entry in inittab, but that is probably overkill.
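The logger and "status" suggestions above might be combined along these lines. This is only a sketch: service_ctl and check_and_restart are made-up names, and it assumes the init script's status action exits with 0 when the service is running. Scripts that do not follow that convention would need their output grepped instead, as described above.

```shell
#!/bin/bash
# service_ctl SERVICE ACTION: thin wrapper so the init script location
# is defined in one place.
service_ctl() {
    "/etc/init.d/$1" "$2"
}

# check_and_restart SERVICE: ask the service for its own status and
# restart it, with a syslog entry, only if it reports "not running".
check_and_restart() {
    local service="$1"
    if service_ctl "$service" status > /dev/null 2>&1; then
        return 0
    fi
    # logger writes to syslog; -t tags the entry with a program name.
    logger -t service-watchdog "Service $service is not running; restarting"
    service_ctl "$service" restart
}

# Example call (servicename is a placeholder):
# check_and_restart servicename
```

Going through logger leaves timestamps and log formatting to syslog instead of appending to /var/log/messages by hand.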
  • LegacyUser Posts: 0
    You might be interested in Monit
  • neomancer Posts: 3
    You can try Nagios... It's far from simple to configure, but it is certainly one of the best "watchdogs" out there.

    ~nm
  • SiduS Posts: 5
    If the program creates a pid file, you can use checkproc.

    Usage:
    checkproc -p <pid file> <app name>

    Example:
    checkproc -p /var/run/daemon.pid /usr/local/daemon/daemon.pl
    

    I use this in my init scripts for all my perl daemons.

    It could easily be used in a bash script by checking the return / exit code.

    Example:
    echo $?

    Exit Codes:
    0 = Running
    1 = Dead but pid exists
    3 = Dead and no pid

    You may also want to read up on startproc and killproc. I suggest starting with the man pages. There are lots of useful options that can make your bash script very powerful. Then pop it into cron to automate it.

    Enjoy

    --
    Shawn
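As a rough illustration of acting on the exit codes listed above, a script could map them to messages with a case statement. The function name is made up, only the three codes from the list are handled explicitly, and the paths in the usage comment are the placeholders from the example above.

```shell
#!/bin/bash
# describe_checkproc_status CODE: translate a checkproc exit code
# (as listed above) into a short message.
describe_checkproc_status() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        3) echo "dead, no pid file" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Example use (paths are the placeholders from the example above):
# checkproc -p /var/run/daemon.pid /usr/local/daemon/daemon.pl
# describe_checkproc_status $?
```

A watchdog would typically restart the daemon for codes 1 and 3 and do nothing for 0.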
  • isaac Posts: 17
    If you are running Fedora 9 or later, or Ubuntu 6.10 or later, the good ol' standard init system has been quietly replaced with upstart.

    They actually almost did too good of a job implementing this, since most people didn't even notice the change. ;)

    While I've never set it up to do this, it is supposed to be able to monitor services and restart them automagically if they die.

    Its configuration (at least on Fedora) lives in /etc/event.d