Good watchdog for services?

I have a simple script I use to keep track of services running on some of my servers and restart them if they die. It's pretty easy, and it's dead fast to implement on a server. Does anyone have a simpler or better way than "my" script to watchdog services and restart them as necessary?

This is what I use now. I call it from /etc/crontab once every five minutes or so, depending on how critical the service is.

server:~ # cat /usr/bin/service-watchdog
#!/bin/bash
# Restart a service when its process is no longer running.
MYPROC=programname
# Count the processes whose command name matches $MYPROC.
COUNT=$(UNIX95=1 ps -C "$MYPROC" -o pid= -o args= | wc -l)
if [ "$COUNT" -lt 1 ]
then
    echo "Service xxx is restarting at $(date)" >> /var/log/messages
    /etc/init.d/servicename restart
else
    echo "Service xxx is running."
fi

programname is whatever the process is called

servicename is what the file for starting the service in /etc/init.d is called.
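
For reference, a five-minute schedule in /etc/crontab could look like the line below. This is only a sketch: /etc/crontab, unlike a per-user crontab, takes a user field (root here is an assumption), and the redirect discards the script's "is running" output so cron does not mail it.

```text
# m   h  dom mon dow  user  command
*/5   *  *   *   *    root  /usr/bin/service-watchdog > /dev/null 2>&1
```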

Comments

  • jaymz
    jaymz Posts: 2
    I wrote a similar script a few years ago that used pgrep. If pgrep returned any value besides 1 (I don't think one would use this for init) then it would exit, otherwise, restart the service.
  • Noxn
    Noxn Posts: 2
    Useful!
  • Khabi
    Khabi Posts: 6
    I'm kind of interested in what kind of apps you are running that need to be constantly restarted. I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart. :)
  • jaymz
    jaymz Posts: 2
    In my case, I was hosting Ventrilo for my WoW guild. It would die occasionally, so I wrote a script that did pgrep vent_srv, and if that returned 1 then I restarted it.
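
The pgrep approach described above could be sketched roughly as follows. It assumes a procps-style pgrep that exits 0 when at least one process matches and 1 when none does; the function name restart_if_dead and the init script path in the example are hypothetical.

```shell
#!/bin/bash
# restart_if_dead NAME CMD [ARGS...]
# Runs CMD only when no process named exactly NAME is found.
# (Function name and calling convention are illustrative assumptions.)
restart_if_dead() {
    local name=$1
    shift
    # -x matches the process name exactly, so "vent" will not match "vent_srv".
    if pgrep -x "$name" > /dev/null; then
        return 0    # at least one match: nothing to do
    fi
    "$@"            # no match (or pgrep itself failed): run the restart command
}

# Example (hypothetical init script path):
# restart_if_dead vent_srv /etc/init.d/ventrilo restart
```

Note that this sketch also runs the restart command when pgrep fails for some other reason, which is slightly broader than restarting only on exit code 1.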
  • tuxmania
    tuxmania Posts: 19
    Khabi wrote:
    I'm kind of interested in what kind of apps you are running that need to be constantly restarted. I've done the same type of thing to restart dead services as a 'quick fix', but I would think the best course is to fix whatever is causing the application to restart. :)

    The services that die are mostly Novell's little gems. GroupWise and some Novell Open Enterprise services have funny little bugs that don't seem to be on any priority list. I have never had an open source service die on me that I couldn't fix myself.
  • Khabi
    Khabi Posts: 6
    Ah, Novell, I should have guessed :)
  • I mostly use Webmin's "System and Server Status" on those systems where I have it available. As a plus, it also sends customizable mails when a critical service is down.
  • tuxmania
    tuxmania Posts: 19
    For keeping tabs on all the various servers I manage, I use Zenoss. It's a pretty slick tool for a busy admin.
  • atreyu
    atreyu Posts: 216
    If you are writing to /var/log/messages, it would be "cleaner" to use logger.

    If the service to be restarted supports the "status" function, I have found that using it (and grepping its output) instead of ps+grep leads to fewer false positives. It also respects the logic that the application has for determining whether or not it is running. Of course, the app's own status logic could be broken. In that case I usually use pidof instead of ps, since, for me, ps also leads to more false positives.

    If you do not want to rely on cron, you can always stick an entry in inittab, but that is probably overkill.
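
A rough sketch of the two suggestions above (logger instead of appending to /var/log/messages, and the init script's own "status" instead of ps). It checks the exit status rather than grepping the output, which assumes the init script exits 0 from "status" only while the service is running; not every script does. The function names are hypothetical.

```shell
#!/bin/bash
# log_msg TEXT: send TEXT to syslog via logger; fall back to stderr if
# logger is missing or reports an error.
log_msg() {
    logger -t service-watchdog -- "$1" 2> /dev/null || echo "service-watchdog: $1" >&2
}

# check_service INITSCRIPT: ask the init script for its own status and
# restart the service when the status call reports failure.
# Assumption: "status" exits 0 only while the service is running.
check_service() {
    local script=$1
    if "$script" status > /dev/null 2>&1; then
        return 0
    fi
    log_msg "$script reports not running, restarting"
    "$script" restart
}

# Example (servicename is the placeholder used in the original post):
# check_service /etc/init.d/servicename
```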
  • neomancer
    neomancer Posts: 3
    You can try Nagios. It's far from simple to configure, but it is certainly one of the best "watchdogs" out there.

    ~nm
  • SiduS
    SiduS Posts: 5
    If the program creates a pid file, you can use checkproc.

    Usage:
    checkproc -p <pid file> <app name>

    Example:
    checkproc -p /var/run/daemon.pid /usr/local/daemon/daemon.pl
    

    I use this in my init scripts for all my perl daemons.

    It could be easily used in a bash script to check the return / exit code.

    Example
    echo $?

    Exit Codes:
    0 = Running
    1 = Dead but pid file exists
    3 = Dead and no pid file

    You may also want to read up on startproc and killproc. I suggest starting with the man pages. There are lots of useful options that can make your bash script very powerful. Then pop it into cron to automate it.

    Enjoy

    --
    Shawn
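
As a sketch of how the exit code might be used from a bash script: the function names below are hypothetical, only the three exit codes listed above are interpreted, and checkproc is assumed to be installed (it is not available on every distribution).

```shell
#!/bin/bash
# describe_status CODE: translate a checkproc exit code into text,
# using only the three codes listed in the post.
describe_status() {
    case $1 in
        0) echo "running" ;;
        1) echo "dead, pid file exists" ;;
        3) echo "dead, no pid file" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# watch_daemon PIDFILE PROGRAM INITSCRIPT: restart when checkproc does
# not report "running". Assumes checkproc is installed.
watch_daemon() {
    local rc=0
    checkproc -p "$1" "$2" || rc=$?
    [ "$rc" -eq 0 ] && return 0
    echo "$2: $(describe_status "$rc"), restarting"
    "$3" restart
}

# Example, with the paths from the post and a hypothetical init script:
# watch_daemon /var/run/daemon.pid /usr/local/daemon/daemon.pl /etc/init.d/daemon
```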
  • isaac
    isaac Posts: 17
    If you are running Fedora 9 or later, or Ubuntu 6.10 or later, the good ol' standard init system has been quietly replaced with upstart.

    They actually almost did too good of a job implementing this, since most people didn't even notice the change. ;)

    While I've never set it up to do this, it is supposed to be able to monitor services and restart them automagically if they die.

    Its configuration (at least on Fedora) lives in /etc/event.d
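
For illustration, a job file in /etc/event.d written in that era's Upstart syntax might look like the sketch below. The job name, runlevels, and daemon path are hypothetical, and it assumes the program stays in the foreground; check the Upstart documentation for your release before relying on it.

```text
# /etc/event.d/mydaemon (hypothetical job name)
start on runlevel 3
start on runlevel 5
stop on runlevel 0
stop on runlevel 6

# Restart the process whenever it exits unexpectedly.
respawn
exec /usr/local/bin/mydaemon
```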
