Course materials and documentation for DS2002
The goal of this activity is to familiarize you with scripting in Python. Python scripting is essential for automating tasks, processing data, orchestrating workflows, and building reusable tools that can save time and reduce errors.
Note: Work through the examples below in your terminal (Codespace or local), experimenting with each command and its various options. If you encounter an error message, don’t be discouraged—errors are learning opportunities. Reach out to your peers or instructor for help when needed, and help each other when you can.
If the initial examples feel like a breeze, challenge yourself with activities in the Advanced Concepts section and explore the resource links at the end of this post.
Scripting in Python is fairly similar to scripting in bash, but Python offers far more functionality through its libraries, classes, functions, and so on. A few things to note:
Unlike in bash, where $1 and $2 are available directly, passing parameters on the command line takes a little more setup in Python. Refer to Command line arguments in Python for a basic tutorial.
While bash and other low-level tools (grep, sed, awk, tr, perl, etc.) can parse plain-text “flat” files fairly efficiently, Python can ingest a data file and load it into memory for much more complex transformations. A library like pandas lets you use dataframes as a kind of staging database that you can query, scan, count, etc. Here’s a great pandas tutorial on Kaggle.

JupyterLab is pre-installed in your codespace environment. To start it:
jupyter lab --allow-root
Look in the terminal output for a URL of the form:
http://127.0.0.1:8888/lab?token=...
Copy the token that appears after “token=” in that URL. We’ll need it in the next step.
Note: Port 8888 is automatically forwarded in Codespaces, so you don’t need to manually configure port forwarding.
Alternatively, you can set up the software environment locally on your own computer; see the setup instructions.
python my_script.py # add command line args as needed if the script is written to handle them.
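For a script to accept command-line arguments, it has to read them itself. The following is a minimal sketch of what a my_script.py could look like, not a required layout. It shows two approaches from the standard library: sys.argv, which is the closest analogue to bash's $1, $2, and argparse, which adds named options and help text. The input_file argument, the --count option, and the function names are hypothetical and chosen only for illustration.

```python
import argparse
import sys


def positional_args(argv):
    """Return the arguments after the script name, like bash's $1, $2, ..."""
    # argv[0] is the script name (bash's $0); the rest are the user's arguments.
    return argv[1:]


def parse_args(argv):
    """Parse a hypothetical interface: one input file plus an optional --count."""
    parser = argparse.ArgumentParser(description="Example script (hypothetical interface).")
    parser.add_argument("input_file", help="path to the file to process")
    parser.add_argument("--count", type=int, default=1, help="number of repetitions")
    # argparse expects the argument list without the script name.
    return parser.parse_args(argv[1:])


def main(argv):
    # With no arguments, print a usage hint instead of letting argparse exit.
    if len(argv) < 2:
        print("usage: python my_script.py INPUT_FILE [--count N]")
        return 1
    print("Raw arguments:", positional_args(argv))
    args = parse_args(argv)
    print(f"input_file={args.input_file} count={args.count}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Assuming the sketch is saved as my_script.py, running python my_script.py data.csv --count 3 prints the raw argument list followed by input_file=data.csv count=3. For scripts with only one or two positional arguments, reading sys.argv directly is often enough; argparse becomes worthwhile once you want optional flags, type conversion, or an automatic --help message.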
Command line arguments in Python
Pandas tutorial on Kaggle