DS2002 Data Science Systems

Course materials and documentation for DS2002

View the Project on GitHub ksiller/ds2002-course

Scripting in bash

The goal of this activity is to familiarize you with scripting in bash. Bash scripting is essential for automating tasks, processing data, orchestrating workflows, and building reusable tools that can save time and reduce errors.

Note: Work through the examples below in your terminal (Codespace or local), experimenting with each command and its various options. If you encounter an error message, don’t be discouraged—errors are learning opportunities. Reach out to your peers or instructor for help when needed, and help each other when you can.

If the initial examples feel like a breeze, challenge yourself with activities in the Advanced Concepts section and explore the resource links at the end of this post.

In-class exercises

All scripts should be written in a way that takes into account several factors:

  1. Use the shebang / make the script executable
  2. Error out gracefully –> set -e / error codes –>   exit 1;
  3. Use input parameters
  4. Conditional logic
  5. Environment / full paths / env variables
  6. Logging
  7. Use comments

shebang

A well-formatted bash script begins with a “shebang” line:

#!/bin/bash

that points to the full path of the bash shell. This may differ from one environment to the next.

To make any bit of code executable, use chmod 755 against it.

Use full paths

Any binary executables used in a shell script should be invoked using their full paths. This is to avoid any ambiguity and preempt any errors of a shell not being able to find the command.

For example, when invoking the aws command-line in a script you would normally call

/usr/local/bin/aws

To determine the full path of an executable in a given system, use the which command:

which aws

Graceful Errors

In most bash shell scripts you may set -e near the head of the script. This flag tells the script that, upon any error, it should escape/exit the script and stop running. This is important since to proceed past an error may produce very bad results or unintended consequences.

Another option is to use conditional, such that when a specific line of the script fails to execute, the failed line can exit with a non-zero code. This can be a useful output for debugging

Sleep

If you need a deliberate pause in the middle of a script, simply sleep 5 for a 5-second pause, etc. This may be especially useful in the midst of try logic.

Input parameters

Remember that $0, $1, $2, etc. are reserved parameters bash understands as positional arguments when invoking from the command-line:

positional-args.sh

#!/bin/bash

echo "$0 <-- invoking script"
echo "$1 <-- first parameter"
echo "$2 <-- second parameter"

returns the following output:

$ ./positional-args.sh bananas blueberries

./positional-args.sh <-- invoking script
bananas <-- first parameter
blueberries <-- second parameter

If/Else conditional logic:

Start your if with a comparison, end with fi.

if [[ $VAR -gt 10 ]]
then
  echo "That number is greater than 10."
else
  echo "Your number is pretty small!"
  exit 0;
fi

Environment

env gives you all environment variables for your session. This may vary for an unattended script (without you around).

Add environment variables in bash:

export VARIABLE=value-of-variable

Use full paths to your binaries to avoid your unattended script being unable to locate a binary. Just because you can run it by hand does not mean it can run without you around.

Logging

A simple-yet-valuable step in your scripting is to log. You can log every action taken by the script, or limit logging to successes or failures.

A common format for logging might be a snippet like this:

# First establish the datetime:
NOW=$(date +"%m-%d-%Y-%H:%M:%S")
echo $NOW " OK - Successfully processed " $FILENAME >> /var/log/output.log

The result would be a single file building with each row as it is logged. Note the >> to append to a file instead of overwriting it!

Comment

One of the most useful habits you can develop as a programmer is adding comments to your code. This explains each chunk of code but might also justify why a particular choice has been made. This will be invaluable to you, when you come back to the code two years later, or when your code is shared with others.

Additional Practice

  1. Write a primary script in bash that does two things:
    • Invokes a bash script to retrieve the log file found in retrieve-file.sh.
    • Invokes a python3 script to parse that file and write the output
  2. Want a more challenging assignment? Write a python script that does both tasks. See the Python scripting section for Python-specific guidance.

Advanced Concepts (Optional)

Resources

Bash scripting tutorial