Most shell scripts start innocent, just a few lines to glue things together. Blink twice, and it’s deploying infrastructure, rotating secrets, restarting servers, and possibly provisioning a small nation-state. It’s doing things for the people, by the people, held together by echo.
This isn’t another “bash scripting 101” tutorial. You already know how to loop over a list and grep things. This is about writing scripts that survive real-world conditions: bad input, missing dependencies, flaky networks, and humans.
Each section is a standalone upgrade, a small change with a big payoff. No boilerplate templates, no rules for the sake of rules. Just practical ways to make your scripts safer, cleaner, and less terrifying to revisit six months later.
To be clear, you shouldn’t write production-grade bash scripts. But if you’re going to do it anyway - and you are - at least do it well.
Let’s get into it.
1. Strict Mode: set -euo pipefail
Shell doesn’t assume anything is dangerous. You probably should.
Bash scripts don’t fail loudly by default. A command can break, a variable can be unset, and the script might just continue like nothing happened.
Strict mode gives you a safety net:
set -euo pipefail
- -e: Exit immediately if any command fails
- -u: Error on using unset variables
- -o pipefail: Fail if any command in a pipeline fails
Without it, a bad cp, a mistyped variable, or a broken pipe can slip by unnoticed, until the consequences show up in logs (or don’t).
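To make that concrete, here’s a throwaway sketch (the paths and variable names are invented) showing what each option catches:
#!/usr/bin/env bash
set -euo pipefail

cp /tmp/sorce.txt /tmp/dest.txt            # typo'd source path: cp fails, and -e stops the script right here
echo "Copied $FILE_COUNT files"            # $FILE_COUNT was never set: -u turns this into an error instead of printing blank
grep ERROR /var/log/app.log | head -n 5    # if grep can't read the file, pipefail makes the whole pipeline fail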
Strict mode doesn’t solve every problem, but it reduces surprises. And that’s a good start.
Add it at the top. Always.
2. Quoting Variables
One misplaced quote can turn a cleanup script into a delete-everything script.
Shell expands variables aggressively. If there’s a space, wildcard, or newline hiding inside a variable, Bash won’t warn you, it’ll just split and expand it.
Consider this:
rm -rf $TARGET_DIR/*
Looks harmless, right? But if TARGET_DIR="/important data", this becomes:
rm -rf /important data/*
And now you’ve run two separate commands:
rm -rf /important
rm -rf data/*
That’s not a bug. That’s Bash doing its job, with no questions asked.
The fix is simple:
rm -rf "$TARGET_DIR"/*
Quoting variables ensures that the shell treats them as a single argument, exactly as intended, even when the value contains spaces, tabs, or wildcards.
This is especially critical for paths, user input, flags, or anything derived from the environment. It’s a small habit that prevents catastrophic outcomes.
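The same rule applies to argument lists. If your script forwards its arguments to another command, an unquoted $@ re-splits anything containing spaces, while a quoted "$@" passes every argument through intact. A small sketch (the rsync wrapper is just an example):
#!/usr/bin/env bash
# wrapper.sh - forwards all of its arguments to rsync
rsync -av "$@"      # each argument stays a single argument, spaces and all
# rsync -av $@      # would split "my backups" into "my" and "backups"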
Always quote your variables, even when you think you don’t need to.
3. Dependency Checks
If your script assumes a tool is installed, it should also be the first to check.
Nothing derails a shell script faster than a missing binary. Your script might rely on curl, jq, docker, or awk, but unless you’re checking, you’re just hoping they’re there.
A simple pattern:
command -v curl >/dev/null 2>&1 || {
echo "Error: curl is not installed." >&2
exit 1
}
You can wrap this into a reusable function:
require() {
command -v "$1" >/dev/null 2>&1 || {
echo "Missing dependency: $1" >&2
exit 1
}
}
require curl
require jq
Failing early is better than failing halfway through. Especially when your script depends on tools that might not be standard on every system.
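A variation worth considering (my own sketch, not part of the pattern above): collect every missing tool first, then fail once with the full list, so nobody has to rerun the script once per dependency.
require_all() {
  local missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "Missing dependencies: ${missing[*]}" >&2
    exit 1
  fi
}

require_all curl jq docker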
Check dependencies before you need them, scripts shouldn’t assume anything but /bin/sh.
4. Logging to a File
If something breaks, the log should know before you do.
A good script logs what it’s doing. A great script logs everything, to both the terminal and a file, without duplicating effort.
Instead of manually redirecting each command’s output, you can route everything with a single line:
exec > >(tee -a "$LOG_FILE" | logger -t tag-name -s) 2>&1
This does three things:
- Appends all stdout/stderr to $LOG_FILE (tee -a)
- Streams output live to the terminal (logger -s echoes each line to stderr)
- Sends logs to syslog via logger, tagged with tag-name
It works at the script level, so every line - every error, every echo - gets captured. Perfect for debugging, postmortems, or just knowing what actually happened.
All you need before this is a defined log path:
LOG_FILE="/var/log/my-script.log"
No more wondering “what happened” after the script exits.
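If you don’t need syslog, a lighter variant (my simplification of the one-liner above) keeps just the file-plus-terminal behaviour:
exec > >(tee -a "$LOG_FILE") 2>&1   # output stays visible on the terminal, and a copy lands in $LOG_FILE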
Log early, log everything, because terminal output disappears fast.
5. Greppable & Parsable Output
Logs aren’t just for humans, they’re for scripts too.
Readable logs are good. Logs that can be searched, filtered, and parsed by tools? Much better.
A well-structured log line starts with a timestamp and log level (INFO, WARN, ERROR, and DEBUG), avoids unnecessary spaces, and uses consistent key=value pairs for metadata. This makes it easy for grep, awk, cut, or any log processor to extract exactly what they need - without guessing.
For example, output like this:
2025-04-16T13:37:00+0000:::[INFO]:::event=task_start,user_id=123,task_id=456
2025-04-16T13:37:01+0000:::[ERROR]:::event=file_open_failed,file=config.json,reason=NoSuchFile
Lets you quickly extract signals from the noise:
# Show all errors
grep ':::\[ERROR\]:::' script.log
# Filter logs by user ID
grep 'user_id=123' script.log
# Use awk to extract event names
awk -F':::' '{print $3}' script.log | awk -F',' '{for(i=1;i<=NF;i++) if($i ~ /^event=/) print $i}'
What really makes a good log line?
- No spaces between fields - use a delimiter
- Consistent key=value pairs for structured data
- Fixed field order (timestamp, level, then data)
- Unique and searchable field names (e.g., user_id, not just id)
- Machine readability first, human readability second
- No unstructured errors - wrap them with context (reason=timeout instead of “it broke”)
- Avoid redundancy - don’t repeat info the timestamp or log level already shows
Optionally, colored terminal output combined with file logging makes this friendlier to read and more professional.
You can wrap logging in a simple function for better formatting:
LOG_FILE="./script.log"
log() {
local level="$1"
shift
local ts
ts="$(date '+%Y-%m-%dT%H:%M:%S%z')"
local fields="$*"
local message="$ts:::$level:::$fields"
# Color-coded terminal output
case "$level" in
ERROR) echo -e "\033[0;31m$message\033[0m" >&2 ;;
WARN) echo -e "\033[0;33m$message\033[0m" >&2 ;;
*) echo "$message" >&2 ;;
esac
# Log to file
echo "$message" >> "$LOG_FILE"
}
This gives you:
- Colored terminal output
- Structured logs saved to file
- Greppable, parseable history
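A quick usage sketch (the event names and values are invented):
log INFO "event=backup_start,target=/data"
log WARN "event=slow_upload,duration_ms=9500"
log ERROR "event=backup_failed,reason=disk_full"
Note that this function writes the level without the square brackets shown in the earlier sample lines; tweak $message if you want the two formats to match.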
Structure your logs like someone else will have to debug them. Because one day, they will.
6. Add a Debug Mode
Sometimes you want to see everything. Sometimes you don’t.
A debug mode lets you toggle verbose output without editing your script every time. It’s helpful during development and when things go wrong, especially in longer scripts where silent failures can hide.
Start with a flag:
DEBUG=false
Add a logging function:
debug() {
if [ "$DEBUG" = true ]; then
echo "[DEBUG] $*"
fi
}
Use it like this:
debug "Copying files to $DEST_DIR"
cp "$SRC_FILE" "$DEST_DIR"
To enable debug mode, either export the variable or pass it as an argument:
DEBUG=true ./deploy.sh
Or parse it from flags:
while [[ $# -gt 0 ]]; do
case "$1" in
--debug) DEBUG=true ;;
esac
shift
done
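If you also want Bash’s own command tracing tied to the same flag, a small addition (my own sketch) does it:
if [ "$DEBUG" = true ]; then
  set -x   # print every command, with variables expanded, before it runs
fi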
This keeps your logs quiet by default, but gives you insight when you need it, without rewriting anything.
Debug mode is like turning the lights on before you panic.
7. Add a Progress Spinner
Because silence feels like failure.
Long-running commands can make users wonder if the script hung or crashed. A simple spinner adds just enough feedback to show that something’s happening without cluttering the terminal.
Here’s a minimal implementation:
spinner() {
local pid=$1
local delay=0.1
local spin='|/-\'
while kill -0 "$pid" 2>/dev/null; do
for i in $(seq 0 3); do
printf "\r[%c] Working..." "${spin:$i:1}"
sleep "$delay"
done
done
printf "\r[✓] Done. \n"
}
Use it like this:
long_task() {
sleep 5 # replace this with your actual command
}
long_task &
spinner $!
It’s a small UX upgrade, especially in scripts that handle provisioning, backups, or large data transfers. No output doesn’t have to mean no activity.
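One caveat not covered above: backgrounding the task makes its exit status easy to lose. A sketch that captures it, so strict mode and meaningful exit codes still work:
long_task &
pid=$!
spinner "$pid"
status=0
wait "$pid" || status=$?   # capture the task's exit status without tripping set -e
echo "long_task finished with status $status"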
A spinner tells the user: “I’m alive. Trust me.”
8. Create a Resumable Script
Because rerunning the whole thing shouldn’t feel like starting over.
Scripts that fail halfway shouldn’t force you to start from scratch. With a few patterns, you can make scripts idempotent or at least resume-aware.
The simplest way is to use marker files:
if [ ! -f /tmp/setup.step1.done ]; then
echo "Running step 1..."
# some long-running command
touch /tmp/setup.step1.done
fi
For multi-step workflows, track state between runs:
STEP_FILE=".script-progress"
mark_done() {
echo "$1" >> "$STEP_FILE"
}
is_done() {
grep -q "^$1$" "$STEP_FILE" 2>/dev/null
}
if ! is_done "step:download"; then
echo "Downloading files..."
# download logic
mark_done "step:download"
fi
You can also use checks based on the result of a command instead of marker files, whatever is more reliable in your context.
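For example, a sketch of command-based checks (the user name and download URL are made up):
# Only create the user if it doesn't exist yet
if ! id deploy >/dev/null 2>&1; then
  useradd deploy
fi

# Only download if the archive isn't already in place
if [ ! -f /opt/tool.tar.gz ]; then
  curl -fsSL -o /opt/tool.tar.gz "https://example.com/tool.tar.gz"
fi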
Now, if your script processes a long input file - say, a list of hosts or user IDs - resumability means not starting from the top again. Instead of deleting lines from the input file, a cleaner pattern is to track progress in a separate file. This way, the input remains untouched, and your script knows exactly where to resume.
INPUT_FILE="input.txt"
STATE_FILE=".progress"
# Start from the saved line number or default to 1
START_LINE=$( [ -f "$STATE_FILE" ] && cat "$STATE_FILE" || echo 1 )
LINE_NO=0
while IFS= read -r entity; do
LINE_NO=$((LINE_NO + 1))
# Skip lines that have already been processed
if [ "$LINE_NO" -lt "$START_LINE" ]; then
continue
fi
echo "[INFO] Processing entity=$entity"
# Simulate processing
sleep 2
echo "[INFO] Done entity=$entity"
# Save progress
echo $((LINE_NO + 1)) > "$STATE_FILE"
done < "$INPUT_FILE"
This way, if the script is interrupted, rerunning it resumes from the last saved line - no extra parsing, no data loss, and no need to modify the input.
Resumability makes scripts safer to rerun, easier to test, and more robust in environments where interruptions happen (CI, provisioning, SSH sessions, etc.).
Let your script remember what it already did, so you don’t have to.
9. Use trap for Clean-up
Because your script should clean up after itself, even when it crashes.
If your script creates temporary files, background jobs, or mounts anything, it should also clean up - no matter how it exits. That’s where trap comes in.
The trap builtin lets you register a clean-up function that runs on EXIT, INT, or ERR. Think of it as finally for shell.
TMPDIR=$(mktemp -d)
cleanup() {
echo "🧹 Cleaning up $TMPDIR"
rm -rf "$TMPDIR"
}
trap cleanup EXIT
With this, your script now deletes $TMPDIR even if:
- The script errors out (exit 1)
- The user hits Ctrl+C
- A command fails midway
You can also trap specific signals:
trap cleanup INT TERM ERR
For even better hygiene, use sanity checks:
[[ -n "${TMPDIR:-}" && -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
Trap isn’t just for temp files. Use it to:
- Kill background jobs
- Unmount things
- Stop services
- Log exit status
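A slightly fuller cleanup sketch covering those cases (the background job and mount point here are hypothetical):
TMPDIR=$(mktemp -d)
some_background_job &   # hypothetical worker process
WORKER_PID=$!

cleanup() {
  local status=$?
  echo "Exiting with status $status"                                   # log exit status
  [[ -n "${WORKER_PID:-}" ]] && kill "$WORKER_PID" 2>/dev/null || true  # stop the background job
  mountpoint -q /mnt/scratch && umount /mnt/scratch                     # unmount if still mounted
  [[ -n "${TMPDIR:-}" && -d "$TMPDIR" ]] && rm -rf "$TMPDIR"            # remove temp files
}
trap cleanup EXIT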
Robust scripts don’t just succeed, they fail gracefully. Let your script leave the system better than it found it.
10. Add Meaningful Exit Codes
Because not all failures are created equal.
Every script exits with a status code. By default, 0 means success, and anything else means failure. But if you’re building scripts for automation, chaining, or CI/CD, you should make exit codes meaningful.
Instead of a generic exit 1, define what each failure means:
EXIT_OK=0
EXIT_USAGE=64
EXIT_DEPENDENCY=65
EXIT_RUNTIME=66
Then use them with intent:
if [[ -z "${1:-}" ]]; then
echo "Missing input file" >&2
exit $EXIT_USAGE
fi
if ! command -v curl >/dev/null; then
echo "curl is not installed" >&2
exit $EXIT_DEPENDENCY
fi
When another script (or a human) runs this, the exit code tells them why it failed, not just that it failed.
You can even log it on exit:
trap 'echo "Exiting with code $?."' EXIT
Or check it in a parent script:
./setup.sh
if [[ $? -eq 65 ]]; then
echo "Dependency failure. Try installing missing tools."
fi
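One caveat: if that parent script uses the strict mode from section 1, a non-zero exit from ./setup.sh aborts it before the check ever runs. A sketch of a strict-mode-friendly version:
status=0
./setup.sh || status=$?    # capture the exit code without tripping set -e
if [[ $status -eq 65 ]]; then
  echo "Dependency failure. Try installing missing tools."
fi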
Stick to the conventional ranges where possible:
- 0: success
- 1–63: general or misuse errors
- 64–113: custom app-specific codes
Good scripts don’t just fail - they explain how and why they failed. Exit cleanly. Exit clearly. Exit with purpose.
BONUS: Create a Help Section
Because your future self (and everyone else) deserves a clue.
This might look obvious, but in practice it’s often skipped. When a script accepts arguments or flags, a --help option should be the first thing you implement. It’s how you turn a script from a mysterious black box into a self-documenting tool.
print_help() {
cat <<EOF
Usage: $0 [OPTIONS]
Options:
--debug Enable verbose output and command tracing
--resume Continue from previous checkpoint (uses .script.state)
--help Show this help message and exit
Examples:
$0 --debug
$0 --resume
EOF
}
Then wire it up:
if [[ "${1:-}" == "--help" ]]; then
print_help
exit 0
fi
It’s lightweight, takes 30 seconds to add, and makes your script instantly friendlier to anyone using or reviewing it - including you, six months from now.
Want to go further? Add a --version flag or print dynamic info like:
echo "Script: $0"
echo "Bash: $BASH_VERSION"
echo "Last modified: $(stat -c %y "$0")"
You already wrote the script. Give it a voice.
Make --help the first thing people try, not because they’re lost, but because it’s actually helpful.
Shell scripts have a way of sticking around longer than we expect. What starts as a quick fix often ends up at the heart of something important, automating deployments, gluing services, holding things together quietly in the background.
You don’t need to write perfect scripts. But you can write intentional ones, scripts that fail predictably, log meaningfully, clean up after themselves, and don’t make future-you wonder what past-you was thinking.
If you’ve made it this far, you probably care more than most. That’s a good sign. Because in the end, production-grade isn’t about complexity, it’s about care.
Thanks for reading. Now go refactor that one script you know you should.