I recently had more than an hour of downtime following a crash on
RSSEpisodes.com, the Django-powered website I develop in my spare
time. I am still investigating the exact cause of the crash, but what I
did learn is that the FastCGI process was killed and was never
restarted.
No FastCGI process running means no website.
To ensure that this does not happen again, I have set up monitoring of
the FastCGI process, using djb's Daemontools. Daemontools comes
with a tool called supervise which monitors a daemon and restarts it
if it dies. Exactly what I want and super-easy. Here's how I set it up.
I'm using Gentoo on this particular server and here are the three steps
I needed to perform to install Daemontools:
# emerge daemontools
# rc-update add svscan default
# /etc/init.d/svscan start
Daemontools creates a directory /service -- in here you create one
directory for each service you want monitored. In this case it is only the
Django web application.
However, once svscan (the daemontools daemon) sees a directory below
/service it tries to supervise it, so we will create the web
application service directory somewhere else and then create a symbolic
link to the appropriate location. In the examples below this link is
named webapp.
The svscan daemon tries to execute a file named run in each
subdirectory of /service, so we will create a simple shell script that
invokes our FastCGI process:
# mkdir ~/service
<create the file ~/service/run>
The only file you need to create is shown below. The script, run, sets the PYTHONPATH environment variable and then executes manage.py runfcgi. Notice that it calls exec, so manage.py replaces the shell process instead of running as a child of it; that way supervise monitors manage.py itself. It also calls setuidgid as a security measure so that the webapp process executes as a non-privileged user.
This is ~/service/run:
#!/bin/sh
export PYTHONPATH=$HOME/extra/python/path
export WEBAPP=$HOME/path/to/webapp
exec setuidgid username $WEBAPP/manage.py runfcgi daemonize=false host=localhost port=4040
The important part of this script is daemonize=false. If you don't
specify this, the Python process detaches into the background and the
process that Daemontools started returns immediately. Daemontools will
conclude that the process it started has died and of course restart it,
again and again.
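One detail worth checking before linking the service into place: supervise starts the service by executing ./run, so the file has to be executable. Here is a minimal sketch, assuming the ~/service directory used in this post. prepare_run_script is a hypothetical helper name of my own, not part of Daemontools:

```shell
#!/bin/sh
# prepare_run_script is a hypothetical helper, not a Daemontools command.
# It makes DIR/run executable so that supervise is able to start it.
prepare_run_script() {
    dir="${1:-$HOME/service}"   # defaults to the directory used in this post
    if [ ! -f "$dir/run" ]; then
        echo "no run script in $dir" >&2
        return 1
    fi
    chmod 755 "$dir/run"
}
```

Calling prepare_run_script ~/service once after creating the file is enough; a plain chmod +x ~/service/run does the same job.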
Now we create a symbolic link in /service that points to ~/service;
I named it webapp. As soon as Daemontools sees this link, which should
happen within a few seconds, it runs the run script. We can use the
svstat utility to monitor the process's uptime:
# ln -s ~/service /service/webapp
# svstat /service/*
/service/webapp: up (pid 12078) 5 seconds
If manage.py dies it will restart automatically within a second!
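A service whose uptime never gets past a few seconds is probably being restarted over and over, which is what a missing daemonize=false would look like. The sketch below flags such services. It assumes the svstat output format shown in the sample above, and flapping_services is a hypothetical helper name, not a Daemontools tool:

```shell
#!/bin/sh
# flapping_services is a hypothetical helper, not part of Daemontools.
# It reads svstat output on stdin and prints every service that has been
# up for fewer than MIN seconds (default 10).
flapping_services() {
    min="${1:-10}"
    while IFS= read -r line; do
        case "$line" in
            *": up (pid "*)
                secs=${line##*") "}   # e.g. "5 seconds"
                secs=${secs%% *}      # e.g. "5"
                if [ "$secs" -lt "$min" ]; then
                    echo "${line%%:*}"
                fi
                ;;
        esac
    done
}
```

It is meant to be used as svstat /service/* | flapping_services 10, run a few times in a row; a service that shows up every time deserves a closer look.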
Whenever we want to restart the FastCGI process we simply kill it:
# kill 12078
<process is restarted by supervise>
You do this every time you update your code.
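Daemontools also ships svc, which sends signals to a supervised process by service directory, so there is no need to look up the pid first. A minimal sketch follows; restart_webapp is a hypothetical wrapper name, and /service/webapp is the link created earlier:

```shell
#!/bin/sh
# restart_webapp is a hypothetical wrapper; svc and svstat are Daemontools tools.
# "svc -t" sends TERM to the supervised process, after which supervise
# starts the run script again.
restart_webapp() {
    service="${1:-/service/webapp}"
    svc -t "$service" && svstat "$service"
}
```

Like the other commands in this post, it should be run as root. The svstat call simply prints the status right after the signal was sent.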
We give manage.py a host name and port number. These values should
match whatever your webserver expects. Since I'm using lighttpd to power
RSSEpisodes I have something like this in my lighttpd.conf:
fastcgi.server = (
    "/django.fcgi" =>
    ((
        "host" => "127.0.0.1",
        "port" => 4040,
        "check-local" => "disable",
    ))
)
alias.url += (
    "/media/" => "/path/to/django/contrib/admin/media/",
)
url.rewrite-once = (
    "^(/media.*)$" => "$1",
    "^(/.*)$" => "/django.fcgi$1",
)
That's all. Now that my Django process is securely monitored I can
investigate what caused the downtime in the first place.