How We Built It, Part 7: Cron and Automation

In Part 6: The Forms we built the ways people send data into the site. This post is about the work the site does on its own, with nobody watching — the scheduled jobs that keep the calendar fresh and the data safe. None of it is glamorous, and that's exactly why it matters: the best automation is the kind you forget is even running.

Two things run on a schedule on Chattanooga.Digital: Drupal's own cron, every hour, and a weekly off-site backup, every Sunday morning. Both are driven by one small container, and the whole arrangement is just a few lines of configuration.

What "cron" even is

Cron is one of the oldest ideas in server administration: a list of commands, each paired with a schedule, that something runs for you on time, forever. Each schedule is five fields: minute, hour, day-of-month, month, day-of-week. A line like 0 * * * * means "at minute 0 of every hour"; 0 3 * * 0 means "at 03:00 on Sunday" (day-of-week 0). That's the entire vocabulary.
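
To make the five-field layout concrete, here is a minimal sketch that labels each field of a schedule. The describe_schedule function is a hypothetical helper written for this post, not part of cron or of the site's code.

```shell
# Hypothetical helper: label the five schedule fields of a cron line.
describe_schedule() {
  # read splits on whitespace and does not glob-expand the * fields;
  # anything after the fifth field (the command) lands in cmd and is ignored
  read -r minute hour dom month dow cmd <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
}

describe_schedule '0 3 * * 0'
# prints: minute=0 hour=3 day-of-month=* month=* day-of-week=0
```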

The interesting question for a Dockerized site is where cron runs. We don't want it on the developer's laptop, and we don't want to wedge it into the web server. So it gets its own container.

The crond container

Back in Part 1 we noted that make up boots four containers — mariadb, php, nginx, and one called crond. That last one is the scheduler. It uses the exact same Drupal image as the web container, so it has Drush and the full codebase available, but its only job is to wake up on schedule and run things.

The schedule lives in one environment variable on that service in compose.yaml:

crond:
  image: wodby/drupal-cms:${DRUPAL_CMS_TAG}
  environment:
    CRONTAB: "0 * * * * drush -r /var/www/html/web cron\n0 3 * * 0 /usr/local/bin/backup-offsite.sh"
  command: sudo -E crond -f -d 0

The Wodby image reads that CRONTAB value on boot and installs it as the container's crontab. The \n separates two cron lines into two scheduled jobs:

0 * * * * drush -r /var/www/html/web cron — run Drupal's cron every hour, on the hour.
0 3 * * 0 /usr/local/bin/backup-offsite.sh — run our backup script at 03:00 every Sunday.
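
One way to check a CRONTAB value before deploying it is to preview how it splits into jobs. The sketch below is an illustration only: preview_crontab is a hypothetical helper, and it does not reproduce how the Wodby image installs the crontab. It accepts either a literal \n escape or a real newline as the separator.

```shell
# Hypothetical helper: print one "job:" line per entry in a CRONTAB-style value.
preview_crontab() {
  # %b expands a literal \n escape; real newlines pass through unchanged
  printf '%b\n' "$1" | while IFS= read -r line; do
    if [ -n "$line" ]; then
      printf 'job: %s\n' "$line"
    fi
  done
}

preview_crontab '0 * * * * drush -r /var/www/html/web cron\n0 3 * * 0 /usr/local/bin/backup-offsite.sh'
# prints:
# job: 0 * * * * drush -r /var/www/html/web cron
# job: 0 3 * * 0 /usr/local/bin/backup-offsite.sh
```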

That's the whole scheduler. Two lines. The rest of this post is about why each one exists and what it actually does.

Job one: hourly Drupal cron

Drupal has its own internal cron — a queue of maintenance tasks that core and contributed modules register. Running drush cron fires all of them at once. On most sites this handles unglamorous housekeeping: clearing expired caches, checking for security updates, running deferred queue work. On this site it does two things we care about a lot, both of which trace back to the calendar from Part 5.

Rolling the recurring-event window forward. A weekly meetup is stored once, as a recurrence rule, not as 500 separate rows. But the calendar has to render actual dated occurrences, so Smart Date's recurrence engine expands each rule into concrete instances out to a fixed horizon. The job that does the expanding is smart_date_recur_cron, and it runs as part of hourly Drupal cron. As real-world time advances, this is what keeps next month's occurrences appearing — without it, the calendar would slowly run dry at the edge of its window. (That window is deliberately capped at three months; the calendar post explains why a bigger horizon was a memory problem.)
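
The rolling window is easier to see with numbers. The sketch below is not Smart Date's algorithm; it is a simplified model that assumes a weekly rule, timestamps in epoch seconds, and a 90-day horizon standing in for "three months". expand_weekly is a hypothetical name.

```shell
WEEK=$(( 7 * 24 * 3600 ))
HORIZON=$(( 90 * 24 * 3600 ))   # assumption: "three months" modeled as 90 days

# Hypothetical helper: print each weekly occurrence t with now <= t < now + horizon.
# $1 = first occurrence, $2 = current time, $3 = horizon length (all in seconds).
expand_weekly() {
  t=$1
  end=$(( $2 + $3 ))
  while [ "$t" -lt "$end" ]; do
    if [ "$t" -ge "$2" ]; then
      echo "$t"
    fi
    t=$(( t + WEEK ))
  done
}

# At time 0 the window holds 13 occurrences; one week later it still holds 13,
# because the oldest one dropped off and a new one appeared at the far edge.
expand_weekly 0 0 "$HORIZON" | wc -l
expand_weekly 0 "$WEEK" "$HORIZON" | tail -n 1
```

Re-running the expansion as time advances is what keeps the far edge of the window populated; in this model, skipping the re-run would leave the list frozen at whatever the last run produced.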

Re-importing external feeds. Several events come from other organizations' public iCal feeds. Those imports are scheduled to refresh roughly every twelve hours, and the mechanism that decides "is it time to import again?" is, again, Drupal cron. Because cron runs hourly, the importer gets a chance to check its own clock every hour and pulls fresh data twice a day. New events that a partner org posts to their calendar show up here on their own, with no one touching our admin screen.
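
The "is it time yet?" check can be modeled in a few lines. This is a sketch of the general pattern (an hourly tick gating a twelve-hour interval), not the importer's actual code, which lives in Drupal; is_due and simulate_day are hypothetical names.

```shell
INTERVAL=$(( 12 * 3600 ))   # refresh roughly every twelve hours

# Hypothetical helper: succeed when at least $3 seconds separate $1 (last run) and $2 (now).
is_due() {
  [ $(( $2 - $1 )) -ge "$3" ]
}

# Simulate 24 hourly cron ticks and count how many of them trigger an import.
simulate_day() {
  last=0
  imports=0
  hour=1
  while [ "$hour" -le 24 ]; do
    now=$(( hour * 3600 ))
    if is_due "$last" "$now" "$INTERVAL"; then
      imports=$(( imports + 1 ))
      last=$now
    fi
    hour=$(( hour + 1 ))
  done
  echo "$imports"
}

simulate_day
# prints: 2
```

Most ticks do nothing; only the ones that land at or past the interval do real work, which is why an hourly schedule is cheap even for a twice-daily job.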

This is the quiet payoff of the hourly schedule: the calendar is a living thing that stays current by itself. You can watch it work locally — trigger a cron run by hand and see the same thing the scheduler does every hour:

docker compose exec php drush cron

Job two: the weekly off-site backup

The second cron line runs scripts/backup-offsite.sh every Sunday at 03:00. Its job is to get a complete, restorable copy of the site off the server, so a dead disk or a deleted stack can never take the content with it. It does four things in order:

Dumps the database with drush sql:dump --gzip into a dated file like db-chattanooga-prod-YYYYMMDD.sql.gz.
Archives the uploaded files — everything under sites/default/files — into files-chattanooga-prod-YYYYMMDD.tar.gz, skipping the regenerable caches (php/, styles/, css/, js/).
Uploads both to a GitHub Release tagged backups-chattanooga-prod, creating that release the first time if it doesn't exist yet.
Prunes old assets, keeping the four newest of each kind so the release doesn't grow forever.
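
The pruning step is the easiest to get wrong, so here is a minimal sketch of the selection logic alone. It assumes asset names follow the dated patterns above and leaves the actual deletion to whatever tool talks to GitHub; assets_to_prune is a hypothetical helper, not a function from backup-offsite.sh.

```shell
# Hypothetical helper: read asset names on stdin, print the ones to delete.
# $1 = name prefix for one kind of asset, $2 = how many of the newest to keep.
assets_to_prune() {
  # YYYYMMDD stamps sort lexically in date order, so a reverse sort puts newest first
  grep "^$1-" | LC_ALL=C sort -r | tail -n +"$(( $2 + 1 ))"
}

printf '%s\n' \
  db-chattanooga-prod-20240107.sql.gz \
  db-chattanooga-prod-20240114.sql.gz \
  db-chattanooga-prod-20240121.sql.gz \
  db-chattanooga-prod-20240128.sql.gz \
  db-chattanooga-prod-20240204.sql.gz \
  db-chattanooga-prod-20240211.sql.gz \
  files-chattanooga-prod-20240211.tar.gz \
  | assets_to_prune db-chattanooga-prod 4
# prints:
# db-chattanooga-prod-20240114.sql.gz
# db-chattanooga-prod-20240107.sql.gz
```

Filtering by prefix first means database dumps and file archives are counted separately, so each kind keeps its own four newest.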

Why a GitHub Release as the destination? It's free, it's already where the code lives, it's geographically separate from the server, and it's trivial to pull from — exactly the properties you want in a backup target. (For a client engagement we deliberately avoid a certain large cloud provider's storage service; a GitHub Release sidesteps that entirely.)

Two cron gotchas worth knowing

Automation that runs unattended fails in ways you only learn the hard way. Two from this project earned permanent comments in the script.

Cron-spawned jobs get a stripped environment. In production the GitHub token is a Docker Swarm secret, normally exported into the container's shell at boot. But a job launched by cron doesn't inherit that shell's environment — it starts clean. So backup-offsite.sh reads the token straight from the secret file instead of trusting an environment variable:

if [ -z "${GITHUB_BACKUP_TOKEN:-}" ] && [ -f /run/secrets/github_backup_token ]; then
  GITHUB_BACKUP_TOKEN=$(tr -d '\n\r' < /run/secrets/github_backup_token)
fi

Any new script that runs from cron and needs a secret has to do the same — environment inheritance can't be relied on in that context.
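
For scripts that need more than one secret, the same check can be wrapped in a small function. This is a sketch under assumptions: load_secret is a hypothetical helper that is not in the project's script, and it follows the snippet above in letting an existing environment variable take precedence over the file.

```shell
# Hypothetical helper: fill variable $1 from secret file $2 unless it is already set.
load_secret() {
  # Look up the current value of the variable whose name is in $1
  eval "current=\${$1:-}"
  if [ -z "$current" ] && [ -f "$2" ]; then
    # Strip trailing newline / carriage return, as the original snippet does
    value=$(tr -d '\n\r' < "$2")
    eval "$1=\$value"
    export "$1"
  fi
}

# Does nothing if the secret file is absent, e.g. on a laptop
load_secret GITHUB_BACKUP_TOKEN /run/secrets/github_backup_token
```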

Backups are written to /tmp, on purpose. The obvious place to stage the dump would be inside sites/default/files, but that directory is owned by the web user (UID 82) and the cron container runs as a different user that can't write there. /tmp is world-writable and, as a bonus, is wiped on restart — so a stray dump can never accidentally end up baked into a future snapshot of the site. The durable copy is always the one on the GitHub Release.
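
A common way to stage files under /tmp is a private, randomly named directory that the script removes on exit. The sketch below shows that pattern; the make_stage_dir name and the site-backup prefix are assumptions, and the real script may stage its files differently.

```shell
# Hypothetical helper: create a private staging directory under /tmp and print its path.
make_stage_dir() {
  mktemp -d "/tmp/site-backup.XXXXXX"
}

stage=$(make_stage_dir)
# Remove the staging directory when the script exits, whether it succeeded or failed
trap 'rm -rf "$stage"' EXIT
echo "staging in $stage"
```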

Production-only, by design

One last decision: the weekly backup only runs in production. Staging exists to be rebuilt from the recipes on every redeploy — it has no content worth preserving, so pushing weekly backups from it would just produce confusing, ambiguous release assets. The staging compose file keeps the backup script available for the occasional manual run, but its crontab contains only the hourly Drupal cron line. The Sunday backup line lives exclusively in the production configuration. The full reasoning is in docs/BACKUP_STRATEGY.md.

What's next

We now have a site that builds from recipes, serves real content, takes form submissions, and quietly maintains and backs itself up. The only thing left is to get it onto the public internet. In Part 8 we deploy it — Docker Swarm, Traefik, and the path from "running on my laptop" to "live at a real domain."

← Part 6: The Forms · Part 8: Deploy It →
