Linux - Automation: Advanced Bash Traps and Signal Handling for Robust Atomic Scripts

The ‘Zombie’ Resource Problem

We have all written a Bash script that creates a temporary directory, starts a background process, or mounts a filesystem, only to have the script crash halfway through. The result? A /tmp folder that never gets deleted, or a background process that keeps running forever, consuming CPU and locking files.

In professional automation, a script should be “atomic”: it either finishes completely or cleans up after itself however it was stopped (Ctrl+C, a failed command, or a system shutdown). The tool for this is the trap command. The one exception is SIGKILL (kill -9), which no process can catch.

How ‘trap’ Works: The Signal Pipeline

The trap command tells the shell to run a function or command when it receives a signal, or when one of Bash's pseudo-signals (EXIT, ERR) fires.

  • EXIT: The script is finishing (either naturally or via exit).
  • SIGINT: The user pressed Ctrl+C.
  • SIGTERM: A system manager (like systemd) is asking the script to stop.
  • ERR: A command within the script failed. This works without set -e; add set -E (errtrace) if the trap should also fire inside functions and subshells.
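To see the two pseudo-signals in action, here is a minimal sketch that runs a few commands in a child shell and captures what the traps print. The variable name demo_output is only for illustration, and set -e is deliberately left off.

```bash
#!/bin/bash
# Sketch: watch the ERR and EXIT pseudo-signals fire in a child shell.
# There is no `set -e` here; ERR fires on a failing command regardless.
demo_output=$(bash -c '
    trap "echo ERR trap fired" ERR
    trap "echo EXIT trap fired" EXIT
    false                      # returns non-zero, so the ERR trap runs
    echo "script continues"    # still reached because errexit is off
')
echo "$demo_output"
```

The EXIT trap runs last, after the final command of the child shell has completed.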

Implementation: The ‘Perfect’ Temporary Workspace

#!/bin/bash
# Advanced Bash Boilerplate: exit on error, fail on unset variables, fail on pipe error
set -euo pipefail

# Define the cleanup function
cleanup() {
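    # Capture the exit code of the last command before anything overwrites $?
    local exit_code=$?
    # Disarm the traps so cleanup runs only once and cannot be interrupted
    trap - EXIT
    trap '' INT TERM
    echo "Cleaning up... (Exit code: $exit_code)"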

    # Remove temporary files
    if [[ -d ${TEMP_DIR:-} ]]; then
        rm -rf "$TEMP_DIR"
    fi

    # If we started a background process, kill it
    if [[ -n ${BG_PID:-} ]]; then
        kill "$BG_PID" 2>/dev/null || true
    fi

    # Final exit with the original code
    exit "$exit_code"
}

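# Register the traps: cleanup runs once, on EXIT. INT and TERM simply exit
# with the conventional 128+signal status, which then fires the EXIT trap.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM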

# Create a temporary workspace
TEMP_DIR=$(mktemp -d)
echo "Working in $TEMP_DIR"

# Start a 'dummy' background task
sleep 100 &
BG_PID=$!

# Simulate a failure
if [[ ${1:-} == "fail" ]]; then
    echo "Simulating a crash!"
    exit 1
fi

echo "Task complete!"
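One way to check that the pattern behaves as intended is to run it in a child shell, force a failure, and inspect the result from the parent. The sketch below is a cut-down version of the script above, registering cleanup on EXIT only; path_file and workspace are hypothetical names used only for this check.

```bash
#!/bin/bash
# Sketch: run the workspace pattern in a child shell, force a failure,
# then confirm the exit code survived and the directory is gone.
# path_file is a hypothetical scratch file used only to report the path.
path_file=$(mktemp)
status=0

bash -c '
    set -euo pipefail
    cleanup() {
        local exit_code=$?
        trap - EXIT                      # make sure cleanup runs only once
        if [[ -d ${TEMP_DIR:-} ]]; then
            rm -rf "$TEMP_DIR"
        fi
        exit "$exit_code"
    }
    trap cleanup EXIT
    TEMP_DIR=$(mktemp -d)
    echo "$TEMP_DIR" > "$1"              # report the path to the parent
    exit 3                               # simulated crash
' _ "$path_file" || status=$?

workspace=$(cat "$path_file")
rm -f "$path_file"

if [[ -d $workspace ]]; then
    echo "leaked: $workspace (exit $status)"
else
    echo "cleaned up (exit $status)"
fi
```

Because cleanup re-exits with the saved code, the caller still sees the original failure status rather than the status of the last cleanup command.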

Advanced Logic: The ‘Double Trap’ and Lockfiles

Sometimes the cleanup itself takes time (e.g., unmounting a busy network drive). What if the user presses Ctrl+C a second time while it is running? One fix is to redefine the trap inside the cleanup function so that further signals are ignored until cleanup has finished.

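cleanup() {
    # Save the original exit status before any other command overwrites it
    local exit_code=$?
    # Ignore further signals during cleanup to prevent interruption
    trap '' INT TERM
    # Disarm EXIT so the final exit does not run cleanup a second time
    trap - EXIT
    echo "Performing slow cleanup..."
    # Report a failed unmount instead of aborting the rest of the cleanup
    umount /mnt/backup_storage || echo "Warning: unmount failed" >&2
    exit "$exit_code"
}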
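The effect of ignoring signals during cleanup can be observed without a real mount. In this sketch the child shell sends itself SIGTERM twice: once to trigger the shutdown, and once from inside cleanup to stand in for an impatient second signal. The echo lines replace the slow unmount and are purely illustrative.

```bash
#!/bin/bash
# Sketch: a second TERM arriving during cleanup is ignored.
status=0
result=$(bash -c '
    cleanup() {
        trap "" INT TERM       # ignore further signals from here on
        trap - EXIT            # do not re-enter cleanup
        echo "cleanup started"
        kill -TERM $$          # simulated second signal, mid-cleanup
        echo "cleanup finished"
    }
    trap cleanup EXIT
    trap "exit 143" TERM       # first TERM: exit, which fires the EXIT trap
    kill -TERM $$              # simulated first signal
    echo "not reached"
') || status=$?
echo "$result"
echo "exit status: $status"
```

Without the trap "" INT TERM line, the second signal would run the TERM handler again in the middle of cleanup.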

Implementing Mutual Exclusion (Locking)

To prevent a script from running twice at the same time:

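LOCKFILE="/tmp/myscript.lock"
# Opening the file on fd 3 does not lock anything by itself
exec 3>"$LOCKFILE"
# Take an exclusive, non-blocking lock with flock(1) from util-linux;
# this fails immediately if another instance already holds the lock
if ! flock -n 3; then
    echo "Error: Script is already running." >&2
    exit 1
fi
# The kernel releases the lock when the script exits and fd 3 closes,
# so no trap is needed. Leave the file in place: deleting it would let
# a new instance lock a fresh file while an old one still holds this one.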
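Note that a plain exec 3> redirection only opens the file; it succeeds even when another instance has the same file open. A common way to get real mutual exclusion is flock(1). The sketch below assumes flock from util-linux is installed, uses a throwaway lock file (DEMO_LOCK, a hypothetical name), and uses fd 9 instead of fd 3 simply to stay clear of descriptors a calling tool might already use.

```bash
#!/bin/bash
# Sketch: three lock attempts against one file, using flock(1).
DEMO_LOCK=$(mktemp)

# First instance: open the file on fd 9 and take the lock.
exec 9>"$DEMO_LOCK"
first=denied
flock -n 9 && first=acquired

# Second instance (a child shell with its own descriptor) is refused
# while the first still holds the lock.
second=acquired
bash -c 'exec 9>"$1"; flock -n 9' _ "$DEMO_LOCK" || second=denied

# Closing the descriptor releases the lock, as exiting the script would.
exec 9>&-
third=denied
bash -c 'exec 9>"$1"; flock -n 9' _ "$DEMO_LOCK" && third=acquired

rm -f "$DEMO_LOCK"
echo "$first $second $third"
```

Since the lock lives on the open file descriptor rather than on the file's existence, a crashed script cannot leave a stale lock behind.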

Real-World Use Case: Database Backups

When performing a database backup, you often need to lock tables or create a filesystem snapshot. If your script crashes while the tables are locked, your application will go offline.

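lock_database
# Unlock exactly once, on EXIT. INT and TERM must exit explicitly;
# otherwise the script would unlock and then keep running.
trap 'unlock_database' EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

# Perform the backup
tar -czf backup.tar.gz /var/lib/mysql/data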

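With this trap, if the disk fills up and the backup fails, the EXIT trap still runs unlock_database when the script exits, so the tables are not left locked and production does not hang. A dropped SSH connection arrives as SIGHUP rather than SIGINT or SIGTERM, so add HUP to the trapped signals if the script may run from an interactive session.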
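The hang-up case can be rehearsed safely with stand-ins. In this sketch lock_database and unlock_database are hypothetical stubs that only print a message, and kill -HUP $$ simulates the SSH session disappearing mid-backup.

```bash
#!/bin/bash
# Sketch: the unlock still runs when the session hangs up mid-backup.
status=0
result=$(bash -c '
    lock_database()   { echo "tables locked"; }     # hypothetical stub
    unlock_database() { echo "tables unlocked"; }   # hypothetical stub

    lock_database
    trap unlock_database EXIT
    trap "exit 129" HUP        # 128 + 1 (SIGHUP)
    trap "exit 130" INT
    trap "exit 143" TERM

    kill -HUP $$               # stands in for a dropped SSH connection
    echo "backup finished"     # not reached
') || status=$?
echo "$result"
echo "exit status: $status"
```

The signal handlers only call exit; the actual unlock lives in the EXIT trap, so it runs once no matter which path ends the script.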

Summary

Reliability in Linux administration isn’t about writing code that never fails; it’s about writing code that fails gracefully. With Bash traps and signal handling in place, your automation scripts are far less likely to leave your systems in an inconsistent state. This kind of defensive programming is what separates a “quick script” from a production-grade infrastructure tool, and it is an essential skill for anyone managing critical servers.

This post is licensed under CC BY 4.0 by the author.