DS2002 Data Science Systems

Course materials and documentation for DS2002

View the Project on GitHub ksiller/ds2002-course

Linux Command Line Interface (CLI) and Filesystem

The goal of this activity is to familiarize you with the fundamental commands used in Unix-like environments (Linux and macOS). These commands are essential for navigating the filesystem, managing files and directories, and manipulating data efficiently.

In-class exercises

Go to your forked course repository and start a Codespace. If you haven't forked the repo yet, do it now and start the prebuild of your Codespace as described in Lab 00: Setup. While you're waiting for the build process to complete, start a Codespace from the course repo instead so you can begin the exercises.

Optional: If you set up software tools on your own computer and want an additional challenge, complete these exercises on your laptop using either the macOS Terminal (Mac) or Git Bash (Windows). If you use other terminal programs, you may need to modify some commands accordingly. Be aware that these commands will not work as-is in Windows PowerShell.

  1. Start working through CLI commands: Lab 01 (You may see a 404 error until the lab is released)

    If at any time you encounter an error message, don’t be discouraged—errors are learning opportunities. Help each other when you can, and reach out to your instructor for help when needed.

  2. Complete the lab and submit your work by the due date posted on Canvas. The Additional Practice section below provides more details on using some of the most common commands.

Additional Practice

Finding Help

A few places you can find explanations and examples for various commands:

  1. Use the man tool in the terminal! For instance, to learn about cp and all of its features, options, etc., type man cp and read the documentation. Use the up and down arrows to navigate, then press Q to return to the prompt.
  2. Look at the Linux Command Reference cheatsheet in Canvas Module 01 and the links to additional resources at the end of this page.

Getting Oriented to your Home Directory

Change directories to your home directory (see the “Navigating the file system” section below for more cd options):

cd ~

Alternatively, you can use cd $HOME or simply cd. All three options will take you to your home directory.

Learn the location of your home directory by issuing the pwd command. pwd is short for “print working directory”.

pwd

Listing Files and Directories (ls)

The ls command displays the contents of a directory. It’s one of the most frequently used commands for exploring the filesystem.

Basic listing:

ls

This shows only visible (non-hidden) files and directories in a simple list format, sorted alphabetically.

Long listing format (ls -l):

ls -l

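This displays detailed information about files and directories, including the file type and permissions (such as -rw-r--r--), the number of hard links, the owner and group, the size in bytes, the date and time of the last modification, and the name.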

Listing all files including hidden ones (ls -a or ls -la):

Hidden files and directories start with a . (such as .ssh or .bashrc). To see them:

ls -a

Or combine with the long format:

ls -la

Note that the -al flags (or options) do not have to be in any particular order, so ls -la and ls -al are equivalent.

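In the output, you'll see your hidden files plus two special entries: . (the current directory) and .. (its parent directory).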

Using wildcards with ls:

Wildcards allow you to match multiple files or directories based on patterns:

List all files ending with .pdf:

ls -al *.pdf

List files matching a pattern:

ls file?.txt

This finds files like file1.txt and file2.txt (where ? matches exactly one character), but not file10.txt (which has two characters in that position).
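To see the difference between ? and * for yourself, here is a minimal sketch. It assumes you are happy to work in a throwaway directory created with mktemp -d, so nothing in your repository is touched; the file names are made up for illustration.

```shell
# Work in a temporary directory so the experiment doesn't clutter anything
cd "$(mktemp -d)"

# Create three empty test files
touch file1.txt file2.txt file10.txt

# ? matches exactly one character: lists file1.txt and file2.txt only
ls file?.txt

# * matches any number of characters: lists all three files
ls file*.txt

# The shell expands the pattern before ls runs, so echo shows the same matches
echo file?.txt
```

Because the shell does the matching, wildcards work the same way with any command (cp, mv, rm, and so on), not just ls.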

Listing a specific directory:

You can list the contents of any directory without changing into it by providing the path:

ls -al /usr/bin

This lists all files in /usr/bin with detailed information, including hidden files.

cd (Change Directory)

Going back to the previous directory:

cd -

The - is a special argument that takes you back to the previous directory you were in. Useful for quickly switching between two directories.

Changing directly to a directory (absolute path):

Remember, a path designates a file or directory, separating subdirectories (sub-folders) with the / character.

cd /workspaces/ds2002-course/practice/01-env

An absolute path starts with / and specifies the complete path from the root of the filesystem. This works from any location.

What happens if the path you entered does not exist? Let’s find out.

cd bogus/path/

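In bash, you should see output like this:

bash: cd: bogus/path/: No such file or directory

Other shells word the error differently; zsh (the default on macOS), for example, prints cd: no such file or directory: bogus/path/.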

Changing directory using a relative path:

cd practice/01-env

A relative path doesn’t start with / and is relative to your current directory. If you’re in /workspaces/ds2002-course, then practice/01-env refers to /workspaces/ds2002-course/practice/01-env.

You can use .. to go up a directory, or even multiple directories. Let’s assume you’re in /workspaces/ds2002-course/practice/01-env/. (Run pwd to confirm)

Then execute this command:

cd ../../labs

The cd command took you two levels up to /workspaces/ds2002-course/ and then one directory down into labs. You can run the pwd command to confirm the full path of the directory you’re in now.

Keep experimenting with this so you get comfortable with the concept of relative and absolute paths.
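If you want a safe sandbox to experiment in, the sketch below builds a small, made-up directory tree inside a temporary directory (created with mktemp -d) that mirrors the practice/01-env and labs layout used above, then moves around it with absolute paths, relative paths, and cd -.

```shell
# Build a throwaway tree that mimics the course layout
base="$(mktemp -d)"
mkdir -p "$base/practice/01-env" "$base/labs"

cd "$base/practice/01-env"   # absolute path: works from any location
pwd

cd ../../labs                # up two levels, then down into labs
pwd

cd -                         # back to the previous directory (cd - also prints it)
pwd
```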

Note: Remember, if you ever get lost use the pwd command to print the current working directory you’re in. And you can execute cd without any arguments to go back to your home directory.

Creating new directories and files

Before proceeding with the activities, I highly recommend you change to your home directory, create a new subdirectory named cli_exercises, and work inside it. That will ensure that you're not polluting your forked Git repository.

cd                   # go to your home directory
mkdir cli_exercises
cd cli_exercises     # work inside the new directory

Create some test files using touch:

touch file1 file2

Add text contents to a file with echo:

You can use echo to pass some data into a file like this:

echo "Hi there everybody, my name is <YOUR NAME>" > file1

This command uses a "redirect" (>) to send the output of echo into file1, replacing anything already in the file. Without the redirect, echo simply prints its text to the screen and nothing is saved. Try it for yourself:

echo "Today is Friday"
echo "A man a plan a canal Panama"

mkdir (Make Directory)

Use the mkdir command to create new directories (folders). You can create single or multiple directories at once.

Create a single directory:

mkdir myproject

This creates a new directory called myproject in the current location. You can verify it was created with ls.

Create multiple directories at once:

mkdir dir1 dir2 dir3

This creates three directories (dir1, dir2, and dir3) in the current location.

Create nested directories (parent and child):

mkdir -p parent/child/grandchild

The -p flag creates parent directories as needed. If parent doesn’t exist, it will be created first, then child, then grandchild.

Note: mkdir will fail if you try to create a directory that already exists, unless you use the -p flag (which won’t error if the directory already exists).
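A quick way to see this behavior is the sketch below, which runs in a temporary directory (created with mktemp -d) and reuses the parent/child/grandchild names from the example above.

```shell
cd "$(mktemp -d)"

# -p creates every missing level in one go
mkdir -p parent/child/grandchild

# Running the same command again is harmless: -p doesn't complain about existing directories
mkdir -p parent/child/grandchild

# Plain mkdir on an existing directory fails and prints an error message
mkdir parent || echo "mkdir failed because parent already exists"
```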

touch (Create Empty File or Update Timestamp)

Use the touch command to create new empty files or update the access/modification timestamp of existing files. It’s commonly used to create placeholder files or trigger file-based operations.

Create a single empty file:

touch newfile.txt
ls -l newfile.txt

Output:

-rw-r--r-- 1 codespace codespace 0 Jan 15 15:30 newfile.txt

The result is a new empty file (0 bytes) named newfile.txt. If the file already exists, touch updates its access and modification times to the current time without changing its contents.

Create multiple files at once:

touch script.sh data.csv notes.md

This creates three empty files in one command. For any file that already exists, touch only updates its timestamps, which is useful for triggering file-based operations or resetting timestamps.
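To confirm that touch never changes an existing file's contents, you can try this small sketch in a temporary directory; the file names keep.txt, new1, and new2 are made up for illustration.

```shell
cd "$(mktemp -d)"

echo "hello" > keep.txt   # create a file with some content
touch keep.txt            # updates the access/modification times only
cat keep.txt              # the contents are unchanged: prints hello

touch new1 new2           # names that don't exist yet become empty files
ls -l new1 new2           # both show a size of 0
```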

Copying Files and Directories

The cp command copies files and directories. Unlike mv, which moves files, cp creates a duplicate while keeping the original.

Copy a file:

cp file1 file2

This creates a copy of file1 named file2 in the same directory. Both files will exist after the command completes.

Copy a directory (recursive):

For directories, we need the -r flag to indicate a recursive copy:

cp -r dir1 dir2

The -r (or -R) flag tells cp to copy directories recursively, including all files and subdirectories inside them.

Copy multiple files:

cp file1.txt file2.txt file3.txt destination/

This copies all three files into the destination directory.

Copy with wildcards:

cp *.txt backup/

This copies all .txt files in the current directory to the backup directory.

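Note: When copying a directory with cp -r, it's good practice to leave the trailing / off the source directory name. On macOS, cp -r dir1/ dir2 copies only the contents of dir1 rather than the directory itself.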

Note: cp overwrites existing files without warning (unless using -i for interactive mode). Use cp -i to prompt before overwriting.
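The sketch below, run in a temporary directory with made-up file names, shows both points from this section: cp silently replaces an existing destination file, and copying a directory only works with -r.

```shell
cd "$(mktemp -d)"

echo "original" > a.txt
echo "new" > b.txt

# cp replaces the contents of an existing destination file without asking
cp b.txt a.txt
cat a.txt            # prints: new

# Copying a directory requires -r; without it, cp refuses and prints an error
mkdir -p dir1 && touch dir1/inside.txt
cp dir1 dir2 || echo "cp without -r failed, as expected"
cp -r dir1 dir2
ls dir2              # prints: inside.txt
```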

Deleting Files and Directories

Use the rm command to delete files or directories. Warning: Deleted files cannot be easily recovered, so use with caution.

Delete a single file:

From the earlier activities, you should have several files in your current directory; confirm with the ls command. If you don't have a file named markdown.md, create one first with touch markdown.md. Now, let's delete it:

rm markdown.md

The file is removed from the filesystem and cannot be recovered through normal means. Confirm with ls.

Delete multiple files:

rm file10.txt notes.txt newfile.txt

Delete files matching a pattern:

To delete all files ending in .txt, run this:

rm *.txt

The pattern *.txt matches every file name ending in .txt. Use it with caution, since a single wildcard can delete many files at once.

Delete a directory (empty):

rmdir dir1

rmdir only removes empty directories. If the directory contains files, it will fail. You can delete multiple empty directories: rmdir dir2 dir3.

Delete a directory and its contents:

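rm -r newdir

The -r flag deletes the directory and everything inside it, recursively.

Delete directory with confirmation:

rm -ri newdir

Adding -i makes rm ask for confirmation before each deletion (answer y or n).

Force delete (no prompts):

rm -f script.sh

The -f flag never prompts and doesn't report an error if the file doesn't exist.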

Common combinations:

rm -rf directory/    # Recursive force delete (no prompts) - VERY DANGEROUS
rm -ri directory/    # Recursive with prompts (safer)
rm -r directory/     # Recursive delete (may prompt for write-protected files)
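The difference between rmdir and rm -r is easiest to see side by side. This sketch runs in a temporary directory; the directory names empty and full are made up for illustration.

```shell
cd "$(mktemp -d)"

mkdir empty full
touch full/data.txt

rmdir empty                      # works: the directory is empty
rmdir full || echo "rmdir refused: full is not empty"
rm -r full                       # removes the directory and everything in it
ls                               # prints nothing: both directories are gone
```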

Note: Safety tips:
  - Always double-check the path before running rm.
  - Use ls first to verify what you're about to delete.
  - Use the -i flag for interactive mode when unsure.
  - Consider adding alias rm='rm -i' to your .bashrc for safer defaults.
  - Never run rm -rf / or rm -rf ~. The first targets the entire filesystem, and the second deletes everything in your home directory!

Moving and Renaming Files

The mv command can both move and rename files/directories.

Renaming:

Let’s rename notes.md file (created by touch command above)

mv notes.md new_notes.md

The file has been renamed notes.md -> new_notes.md. You can confirm with the ls command.

Move a file to a directory:

Let's move new_notes.md into the myproject directory (if you haven't created it yet, run mkdir -p myproject first).

mv new_notes.md myproject/

When you run ls myproject to list the directory’s content, you should see the new_notes.md file inside myproject now.

Move and rename at the same time:

Let’s move it back and rename it at the same time:

mv myproject/new_notes.md renamed_notes.md

Move multiple files:

Let's create file1.txt, file2.txt, and file3.txt and then move them all at once:

touch file1.txt file2.txt file3.txt
mv file1.txt file2.txt file3.txt myproject/

You may also use a wildcard, like so:

mv file*.txt myproject/

Note: This wildcard pattern file*.txt will move all files starting with “file” and ending with “.txt”, including file1.txt, file2.txt, file10.txt, and file3.txt. The file notes.txt won’t be moved because it doesn’t start with “file”.

Move a directory:

It is just as easy to move entire directories:

mv myproject newdir

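What happens depends on whether newdir exists:
  - If newdir does not exist, myproject is simply renamed to newdir.
  - If newdir is an existing directory, myproject is moved inside it and becomes newdir/myproject.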
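Both cases can be reproduced with the sketch below, which runs in a temporary directory and reuses the myproject and newdir names from the example.

```shell
cd "$(mktemp -d)"

mkdir myproject
mv myproject newdir      # newdir doesn't exist yet: myproject is renamed to newdir
ls                       # prints: newdir

mkdir myproject
mv myproject newdir      # newdir exists now: myproject is moved inside it
ls newdir                # prints: myproject
```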

Note: Important notes:
  - mv overwrites existing files without warning (unless you use -i for interactive mode).
  - Use mv -i to prompt before overwriting.
  - mv preserves file permissions and timestamps when possible.
  - Moving files within the same filesystem is nearly instant, because only the directory entries are updated; the file data is not copied.

Practice creating, renaming, and deleting directories (see the sections above for more details on mkdir, mv, and rm):

mkdir mynewdir
ls -al
mv mynewdir another-newdir
rm -r another-newdir

Can you already guess the full path of a directory you create? Use cd and pwd to verify.

Working with Text Files

Creating and Editing Text Files with nano

nano is a simple text editor that is available on most Linux systems. To open nano with an empty, blank document, simply invoke the nano program:

nano

In the editor you'll see blank space where you write your content, and a list of commands at the bottom marked with the ^ character, which stands for the Control key. Try writing several lines of text, complete with paragraph breaks and punctuation. When you're done, press Ctrl+X (shown as ^X) to exit. Upper or lower case does not matter.

This will give you the following prompt:

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ? 

To save your buffer (your open document) just press the Y key. This will give you a final prompt:

File Name to write : 

Type any name you want for your file and press Enter. The file is saved to the directory you were in when you opened nano.

Note: Do not use word processors like Microsoft Word to edit raw data or code files. The word processor inserts hidden formatting instructions that can mess up your file’s contents.

cat (Concatenate)

Use the cat command to display the entire contents of a file on the screen. It’s best for small files. The name comes from “concatenate” because it can combine multiple files.

cat README.md

Your output may look like this:

# Getting Started

This is a sample README file.
It contains multiple lines of text.

cat prints the entire file contents to the terminal. Useful for quick viewing of small files, but for large files, use less instead to avoid overwhelming the terminal.

Output multiple files:

Let’s assume you have a file list1.txt that contains:

apple
banana

And list2.txt containing:

cherry
date

Run this command:

cat list1.txt list2.txt

And you should get:

apple
banana
cherry
date

cat can read multiple files in sequence and combine their contents. This is the “concatenate” functionality that gives cat its name. The files are combined in the order they appear in the command. When used with >, it redirects the combined output to a new file (see Redirecting Output section below).
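You can reproduce this example without opening an editor. The sketch below works in a temporary directory and uses printf (which writes each \n as a line break) to create list1.txt and list2.txt with the contents shown above, then shows that swapping the arguments swaps the order of the output.

```shell
cd "$(mktemp -d)"

# printf writes each line followed by a newline (\n)
printf 'apple\nbanana\n' > list1.txt
printf 'cherry\ndate\n' > list2.txt

cat list1.txt list2.txt   # apple, banana, cherry, date
cat list2.txt list1.txt   # same lines, but cherry and date come first
```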

less (is more)

Use the less command to view file contents one screen at a time with the ability to scroll up and down. It's ideal for reading large files without flooding the terminal. It is an advanced version of the more command, hence "less is more". Use the arrow keys (or Space for a full page) to scroll, and press q to return to the prompt.

less README.md

Example output (interactive view):

# Getting Started

This is a sample README file.
It contains multiple lines of text.
More content here...
(END)

grep (Global Regular Expression Print)

Use the grep command to search for patterns (text strings) within files. It’s extremely useful for finding specific content in files or filtering command output.

grep "pattern" filename

Example:

grep "README" *.md

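Your output may look like this:

README.md:# Getting Started with README

Each line of output shows the file name, a colon, and the line that matched. Note that *.md only matches Markdown files in the current directory; to search subdirectories too, use the -r option shown below.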

Useful grep options:

grep -i "pattern" file    # Case-insensitive search
grep -r "pattern" .       # Recursive search in current directory and all subdirectories
grep -n "pattern" file    # Show line numbers
grep -v "pattern" file    # Show lines that DON'T match (invert)

You can combine multiple options as is typical for shell commands. For example, try the following:

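grep -rin "linux" .

This searches the current directory and all of its subdirectories (-r), ignores case (-i), and shows line numbers (-n).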
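To see what each option changes, try this sketch on a tiny made-up file, os.txt, created in a temporary directory.

```shell
cd "$(mktemp -d)"
printf 'Linux is fun\nI use linux daily\nWindows too\n' > os.txt

grep "linux" os.txt       # case-sensitive: only "I use linux daily"
grep -i "linux" os.txt    # case-insensitive: the first two lines
grep -n "Windows" os.txt  # with line number: 3:Windows too
grep -vi "linux" os.txt   # lines that do NOT mention linux (any case): Windows too
```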

wc (Word Count)

Use the wc command to count lines, words, and characters in files. It’s useful for getting statistics about file content, checking file sizes, or verifying data.

Count lines, words, and characters:

wc README.md

Your output may look like this:

  42  156  1234 README.md

The output shows three numbers followed by the filename: the number of lines (42), words (156), and bytes (1234).

Count only lines:

wc -l README.md

Your output may look like this:

42 README.md

The -l flag counts only the number of lines. Useful for quickly checking how many lines a file contains.

Count only words:

wc -w README.md

Your output may look like this:

156 README.md

The -w flag counts only the number of words (separated by whitespace).

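Count only bytes (characters):

wc -c README.md

Your output may look like this:

1234 README.md

The -c flag counts the number of bytes in the file. For plain ASCII text this equals the number of characters; use wc -m to count characters in files that contain multibyte characters such as é.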

Count multiple files:

wc README.md practice/01-env/README.md practice/02-cli/README.md

Your output may look like this:

  42  156  1234 README.md
  28   89   567 practice/01-env/README.md
  70  245  1801 practice/02-cli/README.md
 140  490  3602 total

When given multiple files, wc shows statistics for each file and a total at the end. Useful for comparing file sizes or getting overall statistics.
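Here is a small sketch you can check by hand: the made-up file sample.txt has 2 lines, 3 words, and 14 bytes ("one two" plus a newline is 8 bytes, "three" plus a newline is 6). Reading the file with < makes wc print only the number, without the file name.

```shell
cd "$(mktemp -d)"
printf 'one two\nthree\n' > sample.txt

wc sample.txt       # 2 lines, 3 words, 14 bytes, then the file name
wc -l < sample.txt  # just the line count: 2
wc -w < sample.txt  # just the word count: 3
wc -c < sample.txt  # just the byte count: 14
```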

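Common use cases: counting the rows in a data file (for example, wc -l data.csv, where the count includes any header line), counting how many lines match a pattern (combined with grep, as shown below), and quickly checking whether a file is empty.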

Practice combining commands:

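View and work with files using pipes (hello.txt stands for any small text file you have created; mobydick.txt is downloaded in the Finding Files section below):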

cat hello.txt
cat hello.txt | wc
cat mobydick.txt | grep "Captain"
cat mobydick.txt | grep "Captain" | wc -l

See the Connecting Commands with Pipes section below for more details.

Compressing files

Note: Windows users with Git Bash have unzip available but not zip. I suggest you work with tar instead.

Compressing or decompressing archives like zips or tarballs is not too difficult:

To create a zip bundle, assuming we are in a directory with file1 and file2 we want to zip up:

zip archive.zip file1 file2

This creates a zip file named archive.zip containing the two files. To unzip, the command is quite simple:

unzip archive.zip

To create a tarball (the common nickname for a tar archive), we usually also compress it with gzip, using tar's -z option, to keep the archive as small as possible. Again assuming we have two files in the current directory named file1 and file2 that we want to put in the bundle:

tar -czvf archive.tar.gz file1 file2

The -czvf options mean: -c to create an archive, -z to compress it with gzip, -v for verbose output (list each file as it is processed), and -f to give the archive's file name, which must come right after the options.

To decompress the same archive:

tar -xzvf archive.tar.gz

The only difference in options is the use of -x, which means "extract".

Note: You can add files to a zip archive without re-creating it (zip archive.zip file3 adds file3 to the existing archive), and zip -d archive.zip file1 removes a file from it. tar can append files to an uncompressed .tar archive with -r, but it cannot update a compressed .tar.gz archive in place; for that you have to re-create the archive.
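The full round trip can be tried in a temporary directory with the sketch below. Besides the options covered above, it uses two standard tar options not mentioned in this section: -t lists an archive's contents without extracting, and -C extracts into a different directory. The file contents are made up.

```shell
cd "$(mktemp -d)"
echo "first"  > file1
echo "second" > file2

tar -czvf archive.tar.gz file1 file2   # create a gzip-compressed archive
tar -tzf archive.tar.gz                # -t lists the contents without extracting

mkdir extracted
tar -xzf archive.tar.gz -C extracted   # -C extracts into another directory
cat extracted/file1                    # prints: first
```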

Finding Files

Use the find command to search for files and directories in a directory hierarchy based on various criteria (name, size, type, modification date, etc.). It’s a powerful tool for locating files.

Let’s fetch a large text from a remote source so that we can search through it:

curl https://gist.githubusercontent.com/StevenClontz/4445774/raw/1722a289b665d940495645a5eaaad4da8e3ad4c7/mobydick.txt > mobydick.txt

To find files by name, use the find command with the -name option:

find . -name "mobydick.txt"

This searches the current directory (signified by the .) and all of its subdirectories for files named mobydick.txt. Note that the name must be an exact, case-sensitive match; use -iname instead of -name to ignore case.

To search across all home directories instead, change the starting path:

find /home -name "filename.txt"

Find files matching a pattern: Use the wildcard * character at the beginning, middle, or end of a term to extend matching. For example, if you only knew that moby was in the name of the file and nothing more, this command would work:

find . -name '*moby*'

Or, to find all text files by their suffix:

find . -name '*.txt'

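Command structure: find [starting path] [options], where the starting path (such as . for the current directory) tells find where to begin, and options such as -name, -type, and -size describe what to match. find always searches every subdirectory below the starting path as well.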

Other useful find examples:

find . -type f -name "*.py"    # Find all Python files
find . -type d -name "practice" # Find directories named "practice"
find . -size +1M                # Find files larger than 1MB
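To try these searches on a predictable tree, the sketch below builds a tiny, made-up project (proj) in a temporary directory. Note the quotes around "*.py": they stop the shell from expanding the pattern itself, so find receives it unchanged.

```shell
cd "$(mktemp -d)"
mkdir -p proj/src proj/practice
touch proj/README.md proj/src/main.py proj/src/util.py

find proj -name "README.md"          # exact name: proj/README.md
find proj -type f -name "*.py"       # regular files ending in .py (main.py and util.py)
find proj -type d -name "practice"   # directories only: proj/practice
```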

Sorting

The sort command arranges lines of text in a file. By default, it sorts alphabetically (lexicographically), but it can also sort numerically.

Sorting character sequences

A character sequence, also referred to as a “string”, is something like “Good morning”. It can contain digits but they are not interpreted as numbers in a mathematical sense. See the next section Sorting numbers.

Use the sort command to sort a list of fruit names alphabetically.

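Create a file fruits.txt containing:

banana
apple
strawberry
cherry
BLUEBERRY

Then run:

sort fruits.txt

The output should show this (the exact order can depend on your locale settings; in plain byte order, as used by the C locale, uppercase letters sort before lowercase ones):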

BLUEBERRY
apple
banana
cherry
strawberry

Case-insensitive sorting:

If you want to ignore case differences when sorting, use the -f flag:

sort -f fruits.txt

The output should show this:

apple
banana
BLUEBERRY
cherry
strawberry

Save case-insensitive sorted output:

sort -f fruits.txt > sorted_fruits.txt
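Because the default order depends on your locale settings, the sketch below sets LC_ALL=C for each sort so that the result is predictable on any system. It recreates fruits.txt in a temporary directory with printf.

```shell
cd "$(mktemp -d)"
printf 'banana\napple\nstrawberry\ncherry\nBLUEBERRY\n' > fruits.txt

# LC_ALL=C forces plain byte order, where uppercase letters sort before lowercase
LC_ALL=C sort fruits.txt      # BLUEBERRY comes first
LC_ALL=C sort -f fruits.txt   # -f ignores case: BLUEBERRY lands between banana and cherry
```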

Sorting numbers (string vs numerical sorting)

Use the sort command to understand the difference between string sorting and numerical sorting.

Create a file numbers.txt containing:

1
6
2
8
10
5

Run this command to sort the content of the file:

sort numbers.txt

The output should show this:

1
10
2
5
6
8

Numerical sorting:

sort -n numbers.txt

The output should show this:

1
2
5
6
8
10

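Key differences: without -n, sort compares lines character by character, so 10 comes before 2 because the character 1 sorts before 2. With -n, sort compares the numeric values, so 2 comes before 10.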
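The sketch below recreates numbers.txt in a temporary directory and compares the two orderings. It also uses -r, a standard sort option not covered above, which reverses the order.

```shell
cd "$(mktemp -d)"
printf '1\n6\n2\n8\n10\n5\n' > numbers.txt

LC_ALL=C sort numbers.txt   # string order: 1 10 2 5 6 8
sort -n numbers.txt         # numeric order: 1 2 5 6 8 10
sort -nr numbers.txt        # numeric, reversed: 10 8 6 5 2 1
```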

Utility Commands

These commands are used a bit less frequently but can help with basic tasks.

top

top (or htop, if it is installed) shows you current processes along with their memory and CPU usage. Both display the PID (process ID) of each process, so that you can monitor it or stop (kill) it. Press q to quit.

w

w shows who is currently logged in to your system and what they are doing. Typically, if you are on a laptop or desktop computer you own, you will be the only user. But large HPC systems may have hundreds of users logged in concurrently.

which

which shows you the path to a specific application (see the “Finding commands” section below for comprehensive coverage):

which python3

You may want to list the contents of the /usr/bin directory to get a sense of the many programs installed on your system:

ls -al /usr/bin

history

Do you remember all the commands you ran? If not, don’t worry. Use the history command to get a list in chronological order.

history

The end of your output may look like this:

 ...
   50  ls
   51  rm -ri newdir
   52  rm -f script.sh
   53  find . -name "*.md"
   54  find . -type f -name "*.py"
   55  find . -type d -name "practice"
   56  find . -size +1M
   57  which ls
   58  whereis ls
   59  whereis -b ls
   60  history

When viewing your history, notice the line number with each command. To repeat an item in your history, prefix that number with !:

!999

This will re-execute command number 999 from your history.

hostname

Use the hostname command to display the hostname (network name) of the system. It’s useful for identifying which machine you’re working on, especially in remote or cloud environments.

hostname

Your output may look like this:

codespaces-57da94

The output shows the unique hostname assigned to your codespace. In this case, codespaces-57da94 indicates this is a GitHub Codespaces instance with identifier 57da94. Each codespace gets a unique hostname.

uptime

Use the uptime command to see the current time, how long the system has been running, the number of logged-in users, and the system load averages. It's useful for a quick check of system health.

uptime

Your output may look like this:

 14:30:45 up 2 days,  3:15,  1 user,  load average: 0.05, 0.10, 0.15

The three load averages show the average system load over the last 1, 5, and 15 minutes.

Networking / Internet

The Linux OS has several built-in tools for helping check networking, or interacting with remote resources on the Internet.

ping

ping is a simple tool that, like its submarine counterpart, simply bounces a message off of a remote host and tells you if it is reachable:

$ ping google.com
PING google.com (142.251.167.138): 56 data bytes
64 bytes from 142.251.167.138: icmp_seq=0 ttl=57 time=6.479 ms
64 bytes from 142.251.167.138: icmp_seq=1 ttl=57 time=4.430 ms
64 bytes from 142.251.167.138: icmp_seq=2 ttl=57 time=4.407 ms
64 bytes from 142.251.167.138: icmp_seq=3 ttl=57 time=4.518 ms

Press Ctrl+C to stop the pings. Be aware that ping just verified two things for us:

  1. The host google.com is alive and well; and
  2. Our current host has an active Internet connection.

curl

curl is a basic tool for fetching something from the Internet - a file, web page, zip or tar bundle, CSV or JSON datafile, etc. You used curl above to fetch the Moby Dick text. Try it yourself with this list of songs:

curl http://nem2p-dp1-api.pods.uvarc.io/songs

By default, curl displays the contents of what was retrieved. In the case above, you can see the JSON values of a song list. If you wanted to “capture” the data file, you could redirect this command to a file, or use the -O flag (Oh, not zero) to save the file.

Note that you can't simply use curl to fetch password-protected resources (e.g. from Canvas or Gmail); those require you to log in first.

Another useful trick with curl is to find your public IP address:

$ curl ifconfig.me
199.111.240.7

ssh

ssh is the Secure Shell, a method for making secure connections into the terminal of another computer. This might be a computing instance running in the cloud, a supercomputer, or another machine.

SSH connections look very similar to email addresses, in the form USER@HOST. (This is no coincidence, since email and remote shell connections are both very early Internet tools.)

Try a connection using a password:

ssh ds2002@54.234.9.240

Connect using the password given to you in the Canvas instructions for this lab.

  1. Within the home directory of this shared user account, create a subdirectory named after your UVA computing ID, e.g. mst3k. Create a README.md file within that folder that includes your full name.
  2. Check the login status of other users with the command last -i.
  3. View the history of this account. Since all students are sharing a single account name, you’ll see the history of other students included.
  4. To leave the SSH session, type exit.

Access to this Linux instance will be revoked after Lab 01 closes.

date

The date command displays the system date and time. It’s useful for checking the current time, scheduling tasks, or timestamping operations.

date

Your output may look like this.

Sun Jan 11 14:30:45 UTC 2026

It shows the current date and time in the format: day of week, month, day, time (24-hour format), timezone, and year. UTC (Coordinated Universal Time) is the standard timezone used in many cloud environments.

What’s the operating system version?

Use the cat /etc/os-release command to display operating system identification information. It’s essential for understanding what Linux distribution and version you’re working with.

cat /etc/os-release

Your output may look like this:

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.22.2
PRETTY_NAME="Alpine Linux v3.22"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Finding commands

When you type a command like ls or python, the shell needs to find the executable file. These commands help you locate where commands are stored on your system.

which

Use the which command to find the location of an executable command in your PATH. It shows the full path to the command that would be executed.

which ls

Your output may look like this:

/usr/bin/ls

whereis

Use the whereis command to locate the binary, source, and manual page files for a command. It’s more comprehensive than which as it searches standard directories, not just PATH.

whereis ls

Your output may look like this:

ls: /usr/bin/ls /usr/share/man/man1/ls.1

Find only the binary:

whereis -b ls

Your output may look like this:

ls: /usr/bin/ls

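Key differences between which and whereis:
  - which searches only the directories listed in your PATH and prints the executable that would run when you type the command.
  - whereis searches a set of standard system locations (as well as PATH) and reports the binary, source, and manual page files it finds.

When to use each:
  - Use which to find out exactly which program runs when you type a command.
  - Use whereis when you also want to locate a command's manual pages or source files.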
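To see that which only looks in the directories listed in PATH, the sketch below creates a tiny script in a temporary directory and adds that directory to PATH for the current session. The script name hello-ds2002 is hypothetical, chosen so it won't clash with a real command; chmod, which makes the script executable, is covered in File Permissions below.

```shell
# A throwaway directory holding a tiny script (the name hello-ds2002 is made up)
bindir="$(mktemp -d)"
printf '#!/bin/sh\necho "hi from my script"\n' > "$bindir/hello-ds2002"
chmod +x "$bindir/hello-ds2002"

which hello-ds2002 || echo "not found: the directory isn't in PATH yet"

# Put the directory at the front of PATH (this session only)
PATH="$bindir:$PATH"
which hello-ds2002    # now prints the full path to the script
hello-ds2002          # and the shell can run it by name
```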

Connecting Commands with Pipes

A pipe (|) connects the output of one command to the input of another command. This allows you to chain commands together to perform complex operations by combining simple tools.

Basic pipe example:

Navigate to the top level directory of this repository, ds2002-course. Then run:

ls -l | grep "README"

Your output may look like this:

-rw-r--r-- 1 codespace codespace  1234 Jan 15 14:30 README.md

Count lines in a file:

cat README.md | wc -l

Your output may look like this:

42

Key concept: The pipe takes the standard output (stdout) of the left command and feeds it as standard input (stdin) to the right command. This is a fundamental way to combine commands in Linux/Unix systems.
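The sketch below chains grep, wc, and sort on a small stand-in for mobydick.txt. The file crew.txt and its three lines are made up, so the counts say nothing about the real book.

```shell
cd "$(mktemp -d)"
printf 'Captain Ahab\nthe white whale\nCaptain Peleg\n' > crew.txt

grep "Captain" crew.txt              # the two matching lines
grep "Captain" crew.txt | wc -l      # count them: 2
grep "Captain" crew.txt | sort -r    # the matches in reverse order: Peleg before Ahab
```

Each | hands the previous command's output to the next one, so no intermediate files are needed.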

Redirecting output

Output redirection allows you to send command output to a file instead of (or in addition to) displaying it on the screen. This is essential for saving results, creating logs, and processing data.

> (Overwrite redirect)

You can use > after a command to send output to a file. The output file will be overwritten if it exists. If it doesn’t exist, it will be created.

Basic redirect:

echo "Hello World" > greeting.txt

Check the file contents:

$ cat greeting.txt
Hello World

The > operator redirects the output of echo to greeting.txt. If the file already exists, its contents will be completely replaced.

Combine multiple files:

cat list1.txt list2.txt > biglist.txt

If list1.txt contains:

apple
banana

And list2.txt contains:

cherry
date

After running the command, biglist.txt will contain:

apple
banana
cherry
date

As before, cat reads the files in the order they appear in the command and combines their contents; the > operator then sends the combined output into biglist.txt instead of displaying it on the screen.

Redirect find output:

find . -name "*.md" > markdown_files.txt

Example output in file:

$ cat markdown_files.txt
./README.md
./practice/01-env/README.md
./practice/02-cli/README.md
./labs/lab01-cli.md
...

All markdown files found by the find command are saved to markdown_files.txt instead of being displayed.

>> (Append redirect)

Use >> to append command output to the end of a file without overwriting its existing content. If the file doesn't exist, it will be created.

Append to file:

echo "First line" > log.txt
echo "Second line" >> log.txt
echo "Third line" >> log.txt

Check the file contents:

cat log.txt

You should get:

First line
Second line
Third line

The first echo uses > to create/overwrite the file. Subsequent echo commands use >> to append, preserving previous content.

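Key differences > vs >>:
  - > overwrites: the file is emptied first, then the new output is written.
  - >> appends: the new output is added after the existing content.
  - Both create the file if it doesn't exist yet.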
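The difference fits in a few lines. This sketch runs in a temporary directory and uses a made-up log.txt.

```shell
cd "$(mktemp -d)"

echo "one" > log.txt     # creates log.txt containing "one"
echo "two" > log.txt     # > replaces the contents: only "two" is left
echo "three" >> log.txt  # >> appends: "two" followed by "three"
cat log.txt
```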

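Important notes:
  - The shell empties the target of > before the command runs, so never redirect a command's output onto its own input file: sort fruits.txt > fruits.txt leaves you with an empty fruits.txt. Write to a new file instead.
  - > and >> only capture a command's standard output; error messages still appear on the screen.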

Environment Variables

What are environment variables?

Environment variables are named values that store configuration information for your system and applications. They’re accessible to all programs running in your shell session and help programs know where to find things or how to behave.

View a specific environment variable:

echo $HOME

Your output may look like this:

/home/vscode

The $ symbol tells the shell to expand the variable name. $HOME is a built-in environment variable that contains the path to your home directory.

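Common built-in environment variables:
  - HOME: the path to your home directory
  - USER: your user name
  - PWD: your current working directory
  - PATH: the colon-separated list of directories the shell searches for commands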

Run the following echo commands one after another:

echo $HOME
echo $USER
echo $PWD
echo $PATH

Example output:

/home/vscode
vscode
/workspaces/ds2002-course
/usr/local/bin:/usr/bin:/bin
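Because $PATH is a single string of directories separated by colons, it can be hard to read. This one-line sketch pipes it through tr (a standard tool that replaces characters) to print one directory per line, in the order the shell searches them.

```shell
# Replace every ":" in PATH with a newline so each search directory gets its own line
echo "$PATH" | tr ':' '\n'
```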

View all environment variables:

Run this command to see all environment variables:

env

Example Output:

HOME=/home/vscode
USER=vscode
PATH=/usr/local/bin:/usr/bin:/bin
SHELL=/bin/bash
... # many more

The env command displays all environment variables currently set in your session. This is useful for debugging or understanding your system configuration.

Set an environment variable (current session only):

You can set your own environment variables like so:

export MY_VAR="Hello World"
echo $MY_VAR

Output:

Hello World

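Consider:
  - There must be no spaces around the = sign; with spaces, the shell no longer sees a single NAME=value assignment and reports an error.
  - Quotes are needed when the value contains spaces.
  - By convention, environment variable names are written in uppercase.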

Use environment variables in commands:

You can use environment variables as placeholders in your commands. They will be evaluated when you execute the command. Try this:

cd $HOME
ls $HOME

The shell replaces the variable with its value before executing the command.

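Why are environment variables useful? They let you configure programs without changing their code: for example, PATH tells the shell where to look for commands, and HOME tells programs where your files live. They are also a common way to pass settings such as file locations or API keys to scripts.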

Understanding KEY and VALUE:

Each environment variable is made of a KEY and a VALUE. The key is the variable name, and the value is what it stores. You retrieve a value by prefixing its key with $, as in $KEY.

Setting variables without export (local to current shell):

You can set a variable temporarily within your current session without using export:

FNAME="Waldo"
echo $FNAME

However, variables set this way are only available in the current shell session and won’t be accessible to child processes or scripts. To make a variable available to other processes, you must use export (see above).
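The difference between a plain assignment and export shows up as soon as you start a child process. This sketch uses bash -c to start a new bash process; the single quotes make sure the child shell, not the current one, expands $FNAME.

```shell
unset FNAME                                  # start clean in case FNAME already exists

FNAME="Waldo"                                # plain shell variable: only this shell sees it
echo "$FNAME"                                # prints: Waldo
bash -c 'echo "child sees: [$FNAME]"'        # prints: child sees: []

export FNAME                                 # now child processes inherit it
bash -c 'echo "child sees: [$FNAME]"'        # prints: child sees: [Waldo]
```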

Making environment variables persistent:

By default, environment variables set with export only exist in the current shell session and in the programs started from it. When you close the terminal or restart the computer they are gone, and a newly opened terminal will not have them.

To make an environment variable persist in your account, you can store it in a configuration file:

For your user account (persistent across sessions):

Assuming that bash is your default shell, you can edit a hidden file in your home directory, .bashrc, and insert the export command:

export FNAME="Waldo"

Upon your next login or when you start a new terminal session, that variable will be available. You can also reload it in your current session by running:

source ~/.bashrc

For system-wide configuration (requires root/sudo):

If you can become root or use the sudo command, there is also a system-wide file for these exports. Simply insert your KEY=VALUE environment variable there (no export needed). That file can be found at:

/etc/environment

This makes the variable available to all users on the system.

File Permissions

  1. Touch a file named permission_test and echo some content into it.

  2. Use ls -al to see it listed in your directory.

  3. Now change its permissions to 000 like this:

chmod 000 permission_test

Try to cat the contents of the file. You should get a permission denied message.

  4. Now change its permissions so that only you can read and write the file:

chmod 600 permission_test

Use ls -al again to see the permission bits for the file.

  5. Finally, let’s grant other members of your group read access, along with the access we already gave you:

chmod 640 permission_test

List the directory contents once more with ls -al and notice the permission bits for the file.

Notice the full set of characters in the far left column:

-rw-r-----   1 nmagee  staff     0B Jan 16 09:27 permission_test

The first character represents the type of object: regular file (-), directory (d), symbolic link (l), etc.

The next 9 characters form three groups of three, giving the permissions for the USER (i.e. the owner), the GROUP, and all OTHER users on the machine.

Each of those entities can have any combination of rwx permissions, which stands for READ, WRITE, and EXECUTE. This applies both to files and directories.

So rwxrwxrwx means the user, group, and other users all have full permissions to read, write, and execute the file/folder. Read more here about POSIX permissions.
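Numeric modes like 600 and 640 encode these same bits: each digit is the sum of read (4), write (2), and execute (1), for user, group, and other in that order, so 640 is rw- (4+2), r-- (4), --- (0). The sketch below checks this with stat -c, which assumes GNU stat as in Codespaces (macOS stat uses different options), and shows the equivalent symbolic form of chmod.

```shell
touch permission_test                    # harmless if the file already exists

# Octal form: user 6 (rw-), group 4 (r--), other 0 (---)
chmod 640 permission_test
stat -c '%a %A' permission_test          # prints: 640 -rw-r-----

# Symbolic form that sets exactly the same bits (o= removes all "other" permissions)
chmod u=rw,g=r,o= permission_test
stat -c '%a %A' permission_test          # prints: 640 -rw-r-----
```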

As practice, you should now determine what command is required to allow the USER and GROUP read/write permissions to a file, but no access to OTHER users.

Advanced Concepts (Optional)

If you’d like to dive a bit deeper, explore the following commands. The content in the Advanced section is not part of any quizzes.

The man command in detail (Manual Pages)

Use the man command to display the manual (help documentation) for commands. Manual pages are the built-in documentation system in Linux/Unix systems.

man ls

Example output (interactive view):

LS(1)                    User Commands                   LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by
       default).  Sort entries alphabetically if none of -cftuvSUX nor
       --sort is specified.

       Mandatory  arguments  to  long  options are mandatory for short
       options too.

       -a, --all
              do not ignore entries starting with .
...
(END)

Search for a keyword in manual pages:

man -k "list directory"

Your output may look like this:

ls (1)              - list directory contents
dir (1)             - list directory contents
vdir (1)            - list directory contents in long format

View specific section of manual:

man 1 ls

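Common manual sections:

1 - User commands (e.g. ls, cp)
2 - System calls
3 - Library functions
5 - File formats and conventions (e.g. /etc/passwd)
8 - System administration commands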

Quick reference (whatis):

whatis ls

Your output may look like this:

ls (1)              - list directory contents

Find manual page location:

man -w ls

Your output may look like this:

/usr/share/man/man1/ls.1.gz

Key concepts:

More touch

Create file with specific timestamp:

touch -t 202401151430 file.txt

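The -t flag allows you to set a specific timestamp (both the access and modification times). Format: [[CC]YY]MMDDhhmm[.ss] where:

CC - century, i.e. the first two digits of the year (optional)
YY - last two digits of the year (optional)
MM - month (01-12)
DD - day of the month (01-31)
hh - hour (00-23)
mm - minute (00-59)
ss - seconds (optional)

So 202401151430 means January 15, 2024 at 14:30.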
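To confirm the timestamp took effect, you can read it back. This sketch assumes GNU date (as in Codespaces), whose -r option prints a file’s last modification time; the format string is just one possible choice.

```shell
touch -t 202401151430 file.txt
date -r file.txt '+%Y-%m-%d %H:%M'    # prints: 2024-01-15 14:30
```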

Common use cases:

Advanced Sorting

Reverse sorting

Use the sort -r command to sort in descending order (reverse alphabetical or numerical order).

sort -r fruits.txt

The output should show this:

strawberry
cherry
banana
apple
BLUEBERRY

Reverse numerical sort:

sort -rn numbers.txt

The output should show this:

10
8
6
5
2
1

-rn combines reverse (-r) and numerical (-n) sorting to get highest numbers first.
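The -n flag matters because by default sort compares text character by character, so "10" lands right after "1" instead of after "8". This sketch feeds a few numbers through printf instead of numbers.txt; LC_ALL=C simply fixes the collation order so the result is predictable.

```shell
# Text comparison, reversed: "10" is treated as text that starts with "1"
printf '10\n2\n8\n1\n' | LC_ALL=C sort -r    # 8, 2, 10, 1 (one per line)

# Numeric comparison, reversed: largest value first
printf '10\n2\n8\n1\n' | sort -rn            # 10, 8, 2, 1 (one per line)
```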

Removing duplicates

Use the sort and uniq commands to sort and remove duplicate lines, keeping only unique entries.

Create a file duplicates.txt containing:

apple
banana
apple
cherry
banana
strawberry

Then remove the duplicates:

sort -u duplicates.txt

The output should show this:

apple
banana
cherry
strawberry

Alternative approach:

sort duplicates.txt | uniq

uniq also removes duplicates, but it only collapses repeated lines that are next to each other, so the input must be sorted first (hence the pipe from sort).
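Here is why the input has to be sorted first. The sketch uses printf to feed three lines straight into the pipeline, and adds uniq -c, which prefixes each line with the number of times it occurred.

```shell
# apple appears twice but not on adjacent lines, so uniq keeps both: apple, banana, apple
printf 'apple\nbanana\napple\n' | uniq

# After sort the two apples are adjacent and collapse into one: apple, banana
printf 'apple\nbanana\napple\n' | sort | uniq

# -c counts how many times each line occurred
printf 'apple\nbanana\napple\n' | sort | uniq -c
```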

Sorting by specific columns

Use the sort command with the -k option to sort structured data (like CSV or tab-separated files) by a specific column.

Create a file students.txt containing (tab-separated):

Alice	25	Math
Bob	22	Science
Charlie	25	History
Diana	23	Math

Sort by the first column (name):

sort students.txt

Sort by the second column (age) numerically:

sort -k2 -n students.txt

The output should show this:

Bob	22	Science
Diana	23	Math
Alice	25	Math
Charlie	25	History
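Strictly speaking, -k2 means "from column 2 to the end of the line". To compare on the age column alone, and to break ties on purpose rather than by the rest of the line, you can give each key an end column and its own flags. This sketch recreates students.txt with printf so it runs on its own; "$(printf '\t')" is just a portable way to pass a tab character to -t.

```shell
# Recreate the tab-separated example file
printf 'Alice\t25\tMath\nBob\t22\tScience\nCharlie\t25\tHistory\nDiana\t23\tMath\n' > students.txt

# Key 1: column 2 only, numeric, reversed (n and r apply to this key alone)
# Key 2: column 1 only, alphabetical, used when two ages are equal
sort -t "$(printf '\t')" -k2,2nr -k1,1 students.txt
```

With this data, Alice and Charlie (both 25) come first in alphabetical order, followed by Diana and then Bob.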

Sort by the third column (subject):

sort -k3 students.txt

The output should show this:

Charlie	25	History
Alice	25	Math
Diana	23	Math
Bob	22	Science

Specify a delimiter:

Create a CSV file data.csv:

Name,Age,City
Alice,25,New York
Bob,22,Los Angeles
Charlie,25,Chicago

Sort by the second column (Age) in a CSV:

sort -t',' -k2 -n data.csv

The output should show this:

Name,Age,City
Bob,22,Los Angeles
Alice,25,New York
Charlie,25,Chicago
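In the output above, the header line stays on top only because sort -n treats the non-numeric "Age" as zero; with a reverse sort such as -rn it ends up at the bottom instead. One common approach is to print the header separately with head and sort only the data lines, which tail -n +2 selects. The sketch recreates data.csv so it is self-contained; sorted_data.csv is an arbitrary output name.

```shell
# Recreate the example CSV
printf 'Name,Age,City\nAlice,25,New York\nBob,22,Los Angeles\nCharlie,25,Chicago\n' > data.csv

# Header first, untouched; then the data rows by Age (oldest first), ties broken by Name
{ head -n 1 data.csv; tail -n +2 data.csv | sort -t',' -k2,2nr -k1,1; } > sorted_data.csv
cat sorted_data.csv
```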

Sorting large files: Handling memory limitations

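Use the sort command with memory management options to sort very large files that might exceed available memory. When the data does not fit in memory, sort writes sorted chunks to temporary files and then merges them; -T sets the directory used for those temporary files:

sort -T /tmp largefile.txt > sorted_largefile.txt

Specify buffer size (-S sets how much main memory sort may use, here 1 GB):

sort -S 1G -T /tmp hugefile.txt > sorted_hugefile.txt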
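To watch these options work without a genuinely huge file, you can generate a shuffled list of numbers and give sort a deliberately small buffer, which should push it to spill to temporary files in the -T directory. This is a sketch assuming GNU coreutils (seq, shuf, and sort’s -S/-T options, as in Codespaces); the 200,000 lines and the 1 MB buffer are arbitrary sizes.

```shell
# A test input: 200,000 numbers in random order (overwrites largefile.txt)
seq 200000 | shuf > largefile.txt

# Small buffer + temporary files in /tmp
sort -n -S 1M -T /tmp largefile.txt > sorted_largefile.txt

# sort -c only checks the order; exit status 0 means the file is sorted
sort -n -c sorted_largefile.txt && echo "sorted OK"
```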

Combining multiple sort options

Use the sort command to combine multiple sorting options for complex sorting needs.

Sort numerically, in reverse, and remove duplicates:

sort -nru numbers.txt

Sort case-insensitively and remove duplicates:

sort -fu fruits.txt

Sort by column, numerically, in reverse:

sort -t',' -k2 -rn data.csv

Single-letter flags can be grouped, so sort -nru is equivalent to sort -n -r -u.

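Key advanced sorting options:

-r - reverse the sort order
-n - compare numerically
-u - keep only one of each set of equal lines
-f - ignore case (fold lowercase to uppercase)
-k - sort by a key (column); -k2 means from column 2 to the end of the line, -k2,2 means column 2 only
-t - set the column delimiter, e.g. -t','
-S - set the memory buffer size
-T - set the directory for temporary files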

Command Chaining with && and ||

The && (AND) and || (OR) operators chain commands based on their success or failure.

&& (AND operator)

Executes the next command only if the previous command succeeds (exits with status code 0).
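The exit status of the most recent command is stored in the special variable $?. The commands true and false, which always succeed and always fail, make handy stand-ins for real commands when you want to see how the operators behave.

```shell
# $? holds the exit status of the most recent command
true
echo "status after true: $?"                   # prints: status after true: 0

# && runs the right-hand command only after success, || only after failure
true && echo "true succeeded"                   # prints: true succeeded
false && echo "never printed"                   # skipped, because false failed
false || echo "false failed with status $?"    # prints: false failed with status 1
```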

mkdir newdir && cd newdir

Creates newdir and only changes into it if creation succeeds. If mkdir fails, cd won’t execute.

Multiple commands:

mkdir project && cd project && touch README.md && ls -l

Each command executes only if the previous one succeeds. If any fails, the chain stops.

|| (OR operator)

Executes the next command only if the previous command fails (exits with non-zero status code).

mkdir backup || echo "Directory already exists"

Attempts to create backup. If it fails, prints the message instead.

Fallback commands:

python3 script.py || python script.py || echo "Python not found"

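Tries python3, then python, then prints an error if both fail. Because || reacts to any non-zero exit status, the fallback also runs if script.py itself exits with an error, not only when the interpreter is missing.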

Combining && and ||

mkdir project && cd project && touch README.md || echo "Failed to set up project"

Creates directory, changes into it, creates file. If any step fails, prints error message.
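Note that A && B || C is not a strict if/else: the || part runs whenever the command just before it fails, which includes the case where A succeeds but B fails. Here true and false stand in for real commands.

```shell
# A succeeds, B fails: C still runs
true && false || echo "C ran"            # prints: C ran

# A fails: B is skipped, C runs
false && echo "B ran" || echo "C ran"    # prints: C ran
```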

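Key differences from pipes: a pipe (|) sends the output of one command into the input of the next and runs every command in the pipeline regardless of success, while && and || pass no data between commands and decide whether to run the next command based on the previous command’s exit status.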

Redirecting Input and Output (Advanced)

File Descriptors

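In Linux/Unix, every process has three standard file descriptors:

0 - standard input (stdin), redirected with <
1 - standard output (stdout), redirected with > (short for 1>)
2 - standard error (stderr), redirected with 2>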

Understanding these file descriptors allows you to control where input comes from and where output goes.
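A single command can write to stdout and stderr at the same time. In this sketch a brace group stands in for such a command: the first echo writes to stdout, and >&2 sends the second echo to stderr, so each redirection catches exactly one line. The file names out.txt and err.txt are arbitrary.

```shell
# fd 1 (stdout) goes to out.txt, fd 2 (stderr) goes to err.txt
{ echo "normal output"; echo "error message" >&2; } > out.txt 2> err.txt

cat out.txt    # prints: normal output
cat err.txt    # prints: error message
```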

Redirecting input with <

Create a text file unsorted.txt that contains:

banana
apple
strawberry
cherry
BLUEBERRY

Execute this command:

sort < unsorted.txt > sorted.txt

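Observe the content in the output file sorted.txt. It should contain:

BLUEBERRY
apple
banana
cherry
strawberry

This order assumes the C locale, where uppercase letters sort before lowercase ones; in a locale such as en_US.UTF-8, BLUEBERRY lands between banana and cherry instead.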

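This works for any command that reads from standard input. Try this:

grep "berry" < unsorted.txt > berries.txt

grep is case-sensitive, so berries.txt will contain strawberry but not BLUEBERRY.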

Redirecting stderr with 2>:

find /nonexistent 2> errors.log

Example output in errors.log:

find: '/nonexistent': No such file or directory

Redirecting both stdout and stderr:

Method 1: Redirect both separately

command > output.txt 2> errors.txt

Method 2: Redirect stderr to stdout with 2>&1

command > output.txt 2>&1

Example:

find . -name "*.txt" > results.txt 2>&1
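The order of redirections matters because the shell applies them left to right, and 2>&1 points stderr at wherever stdout points at that moment. This sketch reuses a brace group that writes one line to each stream; all.txt and only_stdout.txt are arbitrary file names.

```shell
# > first, then 2>&1: both lines end up in all.txt
{ echo "normal output"; echo "error message" >&2; } > all.txt 2>&1

# 2>&1 first: stderr is pointed at the current stdout (the terminal), then only stdout goes to the file
{ echo "normal output"; echo "error message" >&2; } 2>&1 > only_stdout.txt
# "error message" appears on the terminal; only_stdout.txt holds just "normal output"
```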

Redirect everything to /dev/null:

command > /dev/null 2>&1

Combining input and output redirection:

python script.py < input.txt > output.txt 2> errors.txt

Key concepts:

User and Group Information

In Codespaces, your terminal session runs in an isolated, single-user environment. The prompt shows @ksiller, the GitHub username of the codespace owner. But is that the user account on the system? Let’s find out with the whoami command.

whoami

Your output may look like this:

vscode

The whoami command displays the username of the current user. In Codespaces, the system assigns a user account vscode (or similar) regardless of your GitHub username. The @ksiller in the prompt is just a display name, not the actual system user.

What about the group? You can check with the groups command.

groups

In Codespaces, your output may look like this:

ds2002 vscode docker

Shows all groups that the current user belongs to. In this case, the user vscode belongs to the groups vscode, ds2002 and docker. These are just examples; the exact groups in your environment will likely differ. Groups are used for managing file permissions and access control.

Who else is in my group?

getent group ds2002   # replace ds2002 with your group name

Your output may look like this:

ds2002:x:1000:vscode

The getent command queries system databases (like /etc/group) to get information about groups, users, hosts, etc.
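Rather than typing a group name, you can let id -gn supply your primary group. This sketch assumes a Linux system with getent available, as in Codespaces; MYGROUP is a hypothetical variable name used only for readability.

```shell
# id -gn prints the name of your primary group (e.g. vscode); MYGROUP is just an example name
MYGROUP="$(id -gn)"

# Fields: group name : password placeholder : group ID : member list
getent group "$MYGROUP"
```

Users whose primary group this is are often not repeated in the member list, so the last field may be empty.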

Process Information

View running processes:

ps aux

Your output may look like this:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
vscode       1  0.0  0.1  12345  6789 ?        Ss   14:30   0:00 /bin/bash
vscode      42  0.1  2.3  45678 12345 ?        S    14:31   0:05 python script.py

Find a specific process:

ps aux | grep python

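Your output may look like this (the grep command itself may also appear as an extra line, since its own command line contains "python"):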

vscode      42  0.1  2.3  45678 12345 ?        S    14:31   0:05 python script.py

System Resources

Monitor system resources interactively:

htop

htop is an interactive process viewer (if installed); press q to quit. It shows CPU usage, memory usage, and running processes in real time, and is more user-friendly than top.

Check disk usage:

df -h

Your output may look like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        20G  5.2G   14G  28% /

Check disk usage of current directory:

du -sh .

Your output may look like this:

125M    .

File Permissions and Ownership

Change file permissions:

chmod 755 script.sh
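Mode 755 gives the owner rwx and everyone else r-x, the usual setting for scripts. This sketch writes a tiny hypothetical script.sh and shows that the execute bit is what lets you run it as ./script.sh.

```shell
# Write a two-line example script; the #! line says to run it with /bin/sh
printf '#!/bin/sh\necho "hello from script.sh"\n' > script.sh

chmod 755 script.sh
ls -l script.sh      # permissions column starts with -rwxr-xr-x
./script.sh          # prints: hello from script.sh
```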

Change file ownership:

chown user:group filename

Changes the owner and group of a file. Changing the owner requires sudo (superuser privileges), even for files you own; as the owner, you can change the group without sudo, but only to a group you belong to.

Create directory with specific permissions:

mkdir -m 755 mydir

These advanced commands help you understand and manage your system at a deeper level!

Resources