Most shell scripts start innocent, just a few lines to glue things together. Blink twice, and it’s deploying infrastructure, rotating secrets, restarting servers, and possibly provisioning a small nation-state. It’s doing things for the people, by the people, held together by echo.
This isn’t another “bash scripting 101” tutorial. You already know how to loop over a list and grep things. This is about writing scripts that survive real-world conditions: bad input, missing dependencies, flaky networks, and humans.
Each section is a standalone upgrade, a small change with a big payoff. No boilerplate templates, no rules for the sake of rules. Just practical ways to make your scripts safer, cleaner, and less terrifying to revisit six months later.
To be clear, you shouldn’t write production-grade bash scripts. But if you’re going to do it anyway - and you are - at least do it well.
Let’s get into it.
1. Strict Mode: set -euo pipefail
Shell doesn’t assume anything is dangerous. You probably should.
Bash scripts don’t fail loudly by default. A command can break, a variable can be unset, and the script might just continue like nothing happened.
Strict mode gives you a safety net:
set -euo pipefail
- -e: Exit immediately if any command fails
- -u: Error on using unset variables
- -o pipefail: Fail if any command in a pipeline fails
Without it, a bad cp, a mistyped variable, or a broken pipe can slip by unnoticed, until the consequences show up in logs (or don’t).
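To make that concrete, here’s a throwaway sketch (the paths and variable names are invented) showing what each option catches:
#!/usr/bin/env bash
set -euo pipefail

cp /tmp/sorce.txt /tmp/dest.txt            # typo'd source path: cp fails, and -e stops the script right here
echo "Copied $FILE_COUNT files"            # $FILE_COUNT was never set: -u turns this into an error instead of printing blank
grep ERROR /var/log/app.log | head -n 5    # if grep can't read the file, pipefail makes the whole pipeline fail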
Strict mode doesn’t solve every problem, but it reduces surprises. And that’s a good start.
Add it at the top. Always.
2. Quoting Variables
One misplaced quote can turn a cleanup script into a delete-everything script.
Shell expands variables aggressively. If there’s a space, wildcard, or newline hiding inside a variable, Bash won’t warn you, it’ll just split and expand it.
Consider this:
rm -rf $TARGET_DIR/*
Looks harmless, right? But if TARGET_DIR="/important data", this becomes:
rm -rf /important data/*
And now you’ve run two separate commands:
rm -rf /important
rm -rf data/*
That’s not a bug. That’s Bash doing its job, with no questions asked.
The fix is simple:
rm -rf "$TARGET_DIR"/*
Quoting variables ensures that the shell treats them as a single argument, exactly as intended, even when the value contains spaces, tabs, or wildcards.
This is especially critical for paths, user input, flags, or anything derived from the environment. It’s a small habit that prevents catastrophic outcomes.
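The same rule applies to argument lists. If your script forwards its arguments to another command, an unquoted $@ re-splits anything containing spaces, while a quoted "$@" passes every argument through intact. A small sketch (the rsync wrapper is just an example):
#!/usr/bin/env bash
# wrapper.sh - forwards all of its arguments to rsync
rsync -av "$@"      # each argument stays a single argument, spaces and all
# rsync -av $@      # would split "my backups" into "my" and "backups"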
Always quote your variables, even when you think you don’t need to.
3. Dependency Checks
If your script assumes a tool is installed, it should also be the first to check.
Nothing derails a shell script faster than a missing binary. Your script might rely on curl, jq, docker, or awk, but unless you’re checking, you’re just hoping they’re there.
A simple pattern:
command -v curl >/dev/null 2>&1 || {
echo "Error: curl is not installed." >&2
exit 1
}
You can wrap this into a reusable function:
require() {
command -v "$1" >/dev/null 2>&1 || {
echo "Missing dependency: $1" >&2
exit 1
}
}
require curl
require jq
Failing early is better than failing halfway through. Especially when your script depends on tools that might not be standard on every system.
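A variation worth considering (my own sketch, not part of the pattern above): collect every missing tool first, then fail once with the full list, so nobody has to rerun the script once per dependency.
require_all() {
  local missing=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "Missing dependencies: ${missing[*]}" >&2
    exit 1
  fi
}

require_all curl jq docker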
Check dependencies before you need them, scripts shouldn’t assume anything but /bin/sh.
4. Logging to a File
If something breaks, the log should know before you do.
A good script logs what it’s doing. A great script logs everything, to both the terminal and a file, without duplicating effort.
Instead of manually redirecting each command’s output, you can route everything with a single line:
exec > >(tee -a "$LOG_FILE" | logger -t tag-name -s) 2>&1
This does three things:
- Appends all stdout/stderr to $LOG_FILE (tee -a)
- Streams output live to the terminal (logger -s echoes each line to stderr)
- Sends logs to syslog via logger, tagged with tag-name
It works at the script level, so every line - every error, every echo - gets captured. Perfect for debugging, postmortems, or just knowing what actually happened.
All you need before this is a defined log path:
LOG_FILE="/var/log/my-script.log"
No more wondering “what happened” after the script exits.
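If you don’t need syslog, a lighter variant (my simplification of the one-liner above) keeps just the file-plus-terminal behaviour:
exec > >(tee -a "$LOG_FILE") 2>&1   # output stays visible on the terminal, and a copy lands in $LOG_FILE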
Log early, log everything, because terminal output disappears fast.
5. Greppable & Parsable Output
Logs aren’t just for humans, they’re for scripts too.
Readable logs are good. Logs that can be searched, filtered, and parsed by tools? Much better.
A well-structured log line starts with a timestamp and log level (INFO, WARN, ERROR, and DEBUG), avoids unnecessary spaces, and uses consistent key=value pairs for metadata. This makes it easy for grep, awk, cut, or any log processor to extract exactly what they need - without guessing.
For example, output like this:
2025-04-16T13:37:00+0000:::[INFO]:::event=task_start,user_id=123,task_id=456
2025-04-16T13:37:01+0000:::[ERROR]:::event=file_open_failed,file=config.json,reason=NoSuchFile
Lets you quickly extract signals from the noise:
# Show all errors
grep ':::\[ERROR\]:::' script.log
# Filter logs by user ID
grep 'user_id=123' script.log
# Use awk to extract event names
awk -F':::' '{print $3}' script.log | awk -F',' '{for(i=1;i<=NF;i++) if($i ~ /^event=/) print $i}'
What really makes a good log line?
- No spaces between fields - use a delimiter
- Consistent key=value pairs for structured data
- Fixed field order (timestamp, level, then data)
- Unique and searchable field names (e.g., user_id, not just id)
- Machine readability first, human readability second
- No unstructured errors - wrap them with context (reason=timeout instead of “it broke”)
- Avoid redundancy - don’t repeat info the timestamp or log level already shows
Optionally, colored terminal output combined with file logging makes this friendlier to read and more professional.
You can wrap logging in a simple function for better formatting:
LOG_FILE="./script.log"
log() {
local level="$1"
shift
local ts
ts="$(date '+%Y-%m-%dT%H:%M:%S%z')"
local fields="$*"
local message="$ts:::$level:::$fields"
# Color-coded terminal output
case "$level" in
ERROR) echo -e "\033[0;31m$message\033[0m" >&2 ;;
WARN) echo -e "\033[0;33m$message\033[0m" >&2 ;;
*) echo "$message" >&2 ;;
esac
# Log to file
echo "$message" >> "$LOG_FILE"
}
This gives you:
- Colored terminal output
- Structured logs saved to file
- Greppable, parseable history
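A quick usage sketch (the event names and values are invented):
log INFO "event=backup_start,target=/data"
log WARN "event=slow_upload,duration_ms=9500"
log ERROR "event=backup_failed,reason=disk_full"
Note that this function writes the level without the square brackets shown in the earlier sample lines; tweak $message if you want the two formats to match.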
Structure your logs like someone else will have to debug them. Because one day, they will.
6. Add a Debug Mode
Sometimes you want to see everything. Sometimes you don’t.
A debug mode lets you toggle verbose output without editing your script every time. It’s helpful during development and when things go wrong, especially in longer scripts where silent failures can hide.
Start with a flag:
DEBUG=false
Add a logging function:
debug() {
if [ "$DEBUG" = true ]; then
echo "[DEBUG] $*"
fi
}
Use it like this:
debug "Copying files to $DEST_DIR"
cp "$SRC_FILE" "$DEST_DIR"
To enable debug mode, either export the variable or pass it as an argument:
DEBUG=true ./deploy.sh
Or parse it from flags:
while [[ $# -gt 0 ]]; do
case "$1" in
--debug) DEBUG=true ;;
esac
shift
done
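If you also want Bash’s own command tracing tied to the same flag, a small addition (my own sketch) does it:
if [ "$DEBUG" = true ]; then
  set -x   # print every command, with variables expanded, before it runs
fi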
This keeps your logs quiet by default, but gives you insight when you need it, without rewriting anything.
Debug mode is like turning the lights on before you panic.
7. Add a Progress Spinner
Because silence feels like failure.
Long-running commands can make users wonder if the script hung or crashed. A simple spinner adds just enough feedback to show that something’s happening without cluttering the terminal.
Here’s a minimal implementation:
spinner() {
local pid=$1
local delay=0.1
local spin='|/-\'
while kill -0 "$pid" 2>/dev/null; do
for i in $(seq 0 3); do
printf "\r[%c] Working..." "${spin:$i:1}"
sleep "$delay"
done
done
printf "\r[✓] Done. \n"
}
Use it like this:
long_task() {
sleep 5 # replace this with your actual command
}
long_task &
spinner $!
It’s a small UX upgrade, especially in scripts that handle provisioning, backups, or large data transfers. No output doesn’t have to mean no activity.
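One caveat not covered above: backgrounding the task makes its exit status easy to lose. A sketch that captures it, so strict mode and meaningful exit codes still work:
long_task &
pid=$!
spinner "$pid"
status=0
wait "$pid" || status=$?   # capture the task's exit status without tripping set -e
echo "long_task finished with status $status"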
A spinner tells the user: “I’m alive. Trust me.”
8. Create a Resumable Script
Because rerunning the whole thing shouldn’t feel like starting over.
Scripts that fail halfway shouldn’t force you to start from scratch. With a few patterns, you can make scripts idempotent or at least resume-aware.
The simplest way is to use marker files:
if [ ! -f /tmp/setup.step1.done ]; then
echo "Running step 1..."
# some long-running command
touch /tmp/setup.step1.done
fi
For multi-step workflows, track state between runs:
STEP_FILE=".script-progress"
mark_done() {
echo "$1" >> "$STEP_FILE"
}
is_done() {
grep -q "^$1$" "$STEP_FILE" 2>/dev/null
}
if ! is_done "step:download"; then
echo "Downloading files..."
# download logic
mark_done "step:download"
fi
You can also use checks based on the result of a command instead of marker files, whatever is more reliable in your context.
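For example, a sketch of command-based checks (the user name and download URL are made up):
# Only create the user if it doesn't exist yet
if ! id deploy >/dev/null 2>&1; then
  useradd deploy
fi

# Only download if the archive isn't already in place
if [ ! -f /opt/tool.tar.gz ]; then
  curl -fsSL -o /opt/tool.tar.gz "https://example.com/tool.tar.gz"
fi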
Now, if your script processes a long input file - say, a list of hosts or user IDs - resumability means not starting from the top again. Instead of deleting lines from the input file, a cleaner pattern is to track progress in a separate file. This way, the input remains untouched, and your script knows exactly where to resume.
INPUT_FILE="input.txt"
STATE_FILE=".progress"
# Start from the saved line number or default to 1
START_LINE=$( [ -f "$STATE_FILE" ] && cat "$STATE_FILE" || echo 1 )
LINE_NO=0
while IFS= read -r entity; do
LINE_NO=$((LINE_NO + 1))
# Skip lines that have already been processed
if [ "$LINE_NO" -lt "$START_LINE" ]; then
continue
fi
echo "[INFO] Processing entity=$entity"
# Simulate processing
sleep 2
echo "[INFO] Done entity=$entity"
# Save progress
echo $((LINE_NO + 1)) > "$STATE_FILE"
done < "$INPUT_FILE"
This way, if the script is interrupted, rerunning it resumes from the last saved line - no extra parsing, no data loss, and no need to modify the input.
Resumability makes scripts safer to rerun, easier to test, and more robust in environments where interruptions happen (CI, provisioning, SSH sessions, etc.).
Let your script remember what it already did, so you don’t have to.
9. Use trap for Clean-up
Because your script should clean up after itself, even when it crashes.
If your script creates temporary files, background jobs, or mounts anything, it should also clean up - no matter how it exits. That’s where trap comes in.
The trap builtin lets you register a clean-up function that runs on EXIT, INT, or ERR. Think of it as finally for shell.
TMPDIR=$(mktemp -d)
cleanup() {
echo "🧹 Cleaning up $TMPDIR"
rm -rf "$TMPDIR"
}
trap cleanup EXIT
With this, your script now deletes $TMPDIR even if:
- The script errors out (exit 1)
- The user hits Ctrl+C
- A command fails midway
You can also trap specific signals:
trap cleanup INT TERM ERR
For even better hygiene, use sanity checks:
[[ -n "${TMPDIR:-}" && -d "$TMPDIR" ]] && rm -rf "$TMPDIR"
Trap isn’t just for temp files. Use it to:
- Kill background jobs
- Unmount things
- Stop services
- Log exit status
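A slightly fuller cleanup sketch covering those cases (the background job and mount point here are hypothetical):
TMPDIR=$(mktemp -d)
some_background_job &   # hypothetical worker process
WORKER_PID=$!

cleanup() {
  local status=$?
  echo "Exiting with status $status"                                   # log exit status
  [[ -n "${WORKER_PID:-}" ]] && kill "$WORKER_PID" 2>/dev/null || true  # stop the background job
  mountpoint -q /mnt/scratch && umount /mnt/scratch                     # unmount if still mounted
  [[ -n "${TMPDIR:-}" && -d "$TMPDIR" ]] && rm -rf "$TMPDIR"            # remove temp files
}
trap cleanup EXIT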
Robust scripts don’t just succeed, they fail gracefully. Let your script leave the system better than it found it.
10. Add Meaningful Exit Codes
Because not all failures are created equal.
Every script exits with a status code. By default, 0 means success, and anything else means failure. But if you’re building scripts for automation, chaining, or CI/CD, you should make exit codes meaningful.
Instead of a generic exit 1, define what each failure means:
EXIT_OK=0
EXIT_USAGE=64
EXIT_DEPENDENCY=65
EXIT_RUNTIME=66
Then use them with intent:
if [[ -z "${1:-}" ]]; then
echo "Missing input file" >&2
exit $EXIT_USAGE
fi
if ! command -v curl >/dev/null; then
echo "curl is not installed" >&2
exit $EXIT_DEPENDENCY
fi
When another script (or a human) runs this, the exit code tells them why it failed, not just that it failed.
You can even log it on exit:
trap 'echo "Exiting with code $?."' EXIT
Or check it in a parent script:
./setup.sh
if [[ $? -eq 65 ]]; then
echo "Dependency failure. Try installing missing tools."
fi
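One caveat: if that parent script uses the strict mode from section 1, a non-zero exit from ./setup.sh aborts it before the check ever runs. A sketch of a strict-mode-friendly version:
status=0
./setup.sh || status=$?    # capture the exit code without tripping set -e
if [[ $status -eq 65 ]]; then
  echo "Dependency failure. Try installing missing tools."
fi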
Stick to the conventional ranges where possible:
- 0: success
- 1–63: general or misuse errors
- 64–113: custom app-specific codes
Good scripts don’t just fail - they explain how and why they failed. Exit cleanly. Exit clearly. Exit with purpose.
BONUS: Create a Help Section
Because your future self (and everyone else) deserves a clue.
This might look obvious, but in practice it’s often skipped. When a script accepts arguments or flags, a --help option should be the first thing you implement. It’s how you turn a script from a mysterious black box into a self-documenting tool.
print_help() {
cat <<EOF
Usage: $0 [OPTIONS]
Options:
--debug Enable verbose output and command tracing
--resume Continue from previous checkpoint (uses .script.state)
--help Show this help message and exit
Examples:
$0 --debug
$0 --resume
EOF
}
Then wire it up:
if [[ "${1:-}" == "--help" ]]; then
print_help
exit 0
fi
It’s lightweight, takes 30 seconds to add, and makes your script instantly friendlier to anyone using or reviewing it - including you, six months from now.
Want to go further? Add a --version flag or print dynamic info like:
echo "Script: $0"
echo "Bash: $BASH_VERSION"
echo "Last modified: $(stat -c %y "$0")"
You already wrote the script. Give it a voice.
Make --help the first thing people try, not because they’re lost, but because it’s actually helpful.
Shell scripts have a way of sticking around longer than we expect. What starts as a quick fix often ends up at the heart of something important, automating deployments, gluing services, holding things together quietly in the background.
You don’t need to write perfect scripts. But you can write intentional ones, scripts that fail predictably, log meaningfully, clean up after themselves, and don’t make future-you wonder what past-you was thinking.
If you’ve made it this far, you probably care more than most. That’s a good sign. Because in the end, production-grade isn’t about complexity, it’s about care.
Thanks for reading. Now go refactor that one script you know you should.