DS2002 Data Science Systems

Course materials and documentation for DS2002

View the Project on GitHub ksiller/ds2002-course

Git & GitHub

The goal of this activity is to familiarize you with version control using Git and GitHub. These tools are essential for tracking changes in your code, collaborating with others, managing project history, and contributing to open-source projects.

If the initial examples feel like a breeze, challenge yourself with activities in the Advanced Concepts section and explore the resource links at the end of this post.

In-class exercises

At your table, select one person to set up a new repository on GitHub. Work through these steps:

Step 1: Repository Setup

Step 2: Clone the Repository

Important: Make sure you are not inside an existing Git repository when running the git clone command. You don’t want to create nested Git repositories.
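
One way to check before cloning is to ask Git whether the current directory already belongs to a repository. The following is a minimal sketch; `inside_git_repo` is a hypothetical helper name, not a Git command.

```shell
#!/usr/bin/env bash
# inside_git_repo is a hypothetical helper: it succeeds when the given
# directory is part of an existing Git working tree.
inside_git_repo() {
  [ "$(git -C "$1" rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]
}

if inside_git_repo "$PWD"; then
  echo "Inside a Git repository: move elsewhere before running git clone."
else
  echo "Not inside a Git repository: safe to run git clone here."
fi
```

If the first message appears, `cd` to a directory outside the repository (for example your home directory) and run the check again.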

Step 3: Create Unique Files

Step 4: Verify on GitHub

Step 5: Pull Latest Changes

So far, so good. Let’s take it to the next level!

Step 6: Create Collision File

When collaborating, team members may work in parallel on local copies of the same file. The copies diverge, and the resulting version conflicts need to be resolved. Let’s simulate such a scenario.

Step 7: Resolving Merge Conflicts

The early bird gets the worm: If you are the first person to push the collision.txt file, you’re in luck—the push should go through without a hitch. However, the others will encounter an error message like this:

! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://github.com/YOUR_USERNAME/REPO_NAME.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.

To resolve the conflict:

Starting with the group member next to the first person who successfully pushed, go clockwise and perform the following steps:

  1. Pull with merge to reconcile the differences:
    git pull origin main --no-rebase
    

    (The --no-rebase flag explicitly requests a merge and avoids the divergent-branches warning in newer Git versions.)

    This starts a merge that ends in a merge commit once any conflicts are resolved.

  2. Git will pause and indicate that there are conflicts. VSCode (or your editor) will highlight the conflicting lines in collision.txt.

  3. Resolve the conflict: You want to append (not replace) the content so that everyone’s entry is included. The file should contain all group members’ entries, one per line:
    Alice, cat
    Bob, dog
    Carol, bird
    
  4. After resolving the conflict, stage the resolved file:
    git add collision.txt
    
  5. Complete the merge:
    git commit
    

    This creates the merge commit. Git opens your editor with a prefilled merge message; save and close it to finish.

  6. Push your changes:
    git push origin main
    
  7. The next person in the group should repeat steps 1-6 until everyone has successfully pushed their entry to the consolidated collision.txt file on GitHub.
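
The whole sequence can also be rehearsed on one machine without touching GitHub. The sketch below makes several assumptions: a local bare repository stands in for the GitHub remote, the directories `alice` and `bob` stand in for two group members, and `git commit --no-edit` replaces the interactive `git commit` so that the script runs unattended.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
cd "$sandbox"

# Fixed identity so commits work without a global Git configuration.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="Sandbox User" GIT_COMMITTER_EMAIL="sandbox@example.com"

# A bare repository stands in for the shared GitHub repository.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Alice sets up the project and pushes the first commit.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git remote add origin "$sandbox/remote.git"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin main
cd "$sandbox"

# Bob clones before anyone has created collision.txt.
git clone -q remote.git bob

# Alice pushes her entry first; this push goes through.
cd alice
echo "Alice, cat" > collision.txt
git add collision.txt
git commit -q -m "Add Alice's entry"
git push -q origin main

# Bob commits his own version and tries to push: the push is rejected.
cd "$sandbox/bob"
echo "Bob, dog" > collision.txt
git add collision.txt
git commit -q -m "Add Bob's entry"
if git push -q origin main 2>/dev/null; then
  push_result="accepted"
else
  push_result="rejected"
fi
echo "Bob's first push was $push_result"

# Pull with merge; the command exits non-zero because of the conflict.
git pull -q --no-rebase origin main >/dev/null 2>&1 || echo "Merge paused on a conflict"

# Resolve by keeping both entries, then stage, commit, and push.
printf 'Alice, cat\nBob, dog\n' > collision.txt
git add collision.txt
git commit -q --no-edit   # accept the default merge message
git push -q origin main
cat collision.txt
```

Everything lives in a temporary directory, so you can delete it afterwards and repeat the exercise as often as you like.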

Congratulations, you did it! You are ready for Lab 02.

Additional Practice

Setting up and Managing Repositories

Read git in Data Science for a brief introduction.

Then work through the Creating and Managing Git Repositories Exercises. These exercises will cover:

Working with branches

  1. List all branches:
    git branch
    

    This shows all local branches. The current branch is marked with an asterisk (*).

  2. Create a new branch:
    git switch -c feature-branch
    

    The -c flag creates a new branch and switches to it immediately. Alternatively, you can create a branch first with git branch feature-branch and then switch to it with git switch feature-branch.

  3. Switch to an existing branch:
    # be safe, make sure you are not losing anything
    git add .
    git commit -m "committing everything before getting files from other branches"
    # now it is safe to switch
    git switch main
    

    This switches you to the main branch. Make sure you’ve committed or stashed any changes before switching branches.
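
If you would rather stash than commit before switching, the sketch below shows one possible sequence in a throwaway repository. The file name `notes.txt` and the temporary directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"

# Fixed identity so commits and stashes work without global configuration.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="Sandbox User" GIT_COMMITTER_EMAIL="sandbox@example.com"

git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

git switch -q -c feature-branch        # create the branch and switch to it
echo "work in progress" >> notes.txt   # an uncommitted change

git stash -q         # shelve the change instead of committing it
git switch -q main   # the working tree is clean, so nothing can be lost
git branch           # lists feature-branch and main; * marks main

git switch -q feature-branch
git stash pop -q     # bring the shelved change back
current_branch="$(git branch --show-current)"
echo "On $current_branch with restored change: $(tail -n 1 notes.txt)"
```

Stashing keeps unfinished work out of the commit history, while committing, as in step 3, leaves a permanent record; either approach protects your changes.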

Pull requests

Pull requests (PRs) are a way to propose changes to a repository. When you create a pull request, you’re asking the repository maintainer to review and merge your changes into the main branch. Pull requests allow for code review, discussion, and collaboration before changes are integrated into the project.

  1. Create a new branch for your changes:
    git switch -c my-feature
    
  2. Make some changes:
    echo "## Features" >> README.md
    echo "- Feature 1" >> README.md
    git add README.md
    git commit -m "Add features section to README"
    
  3. Push the branch to GitHub:
    git push -u origin my-feature
    

    The -u flag sets up tracking between your local branch and the remote branch, so future git push and git pull commands know which remote branch to use.

  4. On GitHub:
    • Navigate to your repository
    • You should see a banner suggesting that you create a pull request
    • Click “Compare & pull request”
    • Add a description of your changes
    • Click “Create pull request”
  5. Review the pull request:
    • Check the “Files changed” tab to see your modifications
    • Add comments if needed
    • Merge the pull request when ready
  6. After merging, update your local repository:
    git switch main
    git pull origin main --no-rebase
    git branch -d my-feature
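
The Git side of this workflow can be tried locally. In the sketch below, a bare repository stands in for GitHub and a second clone named `maintainer` merges the branch in place of the “Merge pull request” button; both are assumptions made for illustration, and the review step itself only exists on GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
cd "$sandbox"

# Fixed identity so commits work without a global Git configuration.
export GIT_AUTHOR_NAME="Sandbox User" GIT_AUTHOR_EMAIL="sandbox@example.com"
export GIT_COMMITTER_NAME="Sandbox User" GIT_COMMITTER_EMAIL="sandbox@example.com"

# A bare repository stands in for GitHub.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

git init -q work
cd work
git symbolic-ref HEAD refs/heads/main   # name the first branch main
git remote add origin "$sandbox/remote.git"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main

# Feature branch, pushed with -u so that it tracks origin/my-feature.
git switch -q -c my-feature
echo "## Features" >> README.md
git add README.md
git commit -q -m "Add features section to README"
git push -q -u origin my-feature
upstream="$(git rev-parse --abbrev-ref 'my-feature@{upstream}')"
echo "my-feature tracks $upstream"

# Stand-in for merging the pull request: a second clone merges the
# branch into main and pushes the result.
git clone -q "$sandbox/remote.git" "$sandbox/maintainer"
git -C "$sandbox/maintainer" merge -q --no-ff -m "Merge my-feature" origin/my-feature
git -C "$sandbox/maintainer" push -q origin main

# Back in the working clone: update main, then delete the merged branch.
git switch -q main
git pull -q --no-rebase origin main
git branch -d my-feature
```

Because `git branch -d` refuses to delete a branch whose commits are not merged, it doubles as a safety check that the merge really happened.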
    

Advanced Concepts (Optional)

Working with branches and resolving merge conflicts

For an additional challenge, work through the scenario in the Advanced Git Demo.

Initializing a new repo and connecting it to GitHub with the GitHub CLI (gh)

You may already have a project in a directory on your computer (or in a codespace) that is not yet a Git repository. The following steps show you how to initialize it and connect it to GitHub.

Create a new local Git repository

  1. Create a new directory for your project:
    cd # go to your home directory, or any other directory that is NOT inside an existing repo
    mkdir my-git-project
    cd my-git-project
    
  2. Initialize a Git repository:
    git init
    
  3. Verify the repository was created:
    ls -la .git
    

    You should see the contents of the .git directory, which holds the repository metadata.

    Note: This repository only exists in your local environment; it is not on GitHub yet.

  4. Create the GitHub repository from the command line (requires the GitHub CLI):
    # Install GitHub CLI if not already installed
    # Then create the repository:
    gh repo create my-git-project --public --source=. --remote=origin --push
    

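    This single command creates the GitHub repository, registers it as the origin remote, and pushes your local commits. Make at least one commit first so that there is something to push.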
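
The local part of these steps can be sketched as follows, including a first commit so that the repository has content to upload. The temporary directory, the README text, and the identity passed with `-c` are placeholders chosen for illustration; the `gh` command is left commented out because it needs network access and an authenticated GitHub CLI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway location for the sketch; use your real project directory instead.
project="$(mktemp -d)/my-git-project"
mkdir -p "$project"
cd "$project"

git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main

echo "# my-git-project" > README.md
git add README.md
# -c sets an identity for this one command; omit it if yours is configured.
git -c user.name="Your Name" -c user.email="you@example.com" \
  commit -q -m "Initial commit"

commit_count="$(git rev-list --count HEAD)"
echo "Commits ready to push: $commit_count"

# With at least one commit in place, the GitHub CLI step can follow:
# gh repo create my-git-project --public --source=. --remote=origin --push
```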

Stashing, rebasing, etc.

If you want to explore additional Git features, review the Advanced git tutorial.

Creating a Repository from a Template

GitHub allows you to create new repositories from templates, which can include pre-configured files, workflows, and settings. This is useful for starting projects with best practices already in place.

Using the Secure Repository Template

The course repository includes a template URL for creating repositories with security best practices. Here’s how to use it:

Step 1: Get the template URL

The template URL is located in github-new-repo-from-template.txt in this directory (practice/03-git/). The URL format is:

https://github.com/new?owner=YOUR_USERNAME&template_name=secure-repository-supply-chain&template_owner=skills&name=YOUR_REPO_NAME&visibility=public

Step 2: Customize the URL

Replace the placeholders:

Step 3: Create the repository

  1. Copy the complete URL with your customizations
  2. Paste it into your browser’s address bar
  3. Press Enter
  4. GitHub will open the repository creation page with the template pre-selected
  5. Review the settings and click “Create repository”

Example:

If your username is johndoe and you want to create a repo called my-secure-project:

https://github.com/new?owner=johndoe&template_name=secure-repository-supply-chain&template_owner=skills&name=my-secure-project&visibility=public
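
If you prefer not to edit the URL by hand, a few lines of shell can assemble it. This is a small sketch; the variable names `owner`, `repo_name`, and `template_url` are hypothetical, and the values are the ones from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical variable names; replace the values with your own.
owner="johndoe"
repo_name="my-secure-project"

template_url="https://github.com/new?owner=${owner}&template_name=secure-repository-supply-chain&template_owner=skills&name=${repo_name}&visibility=public"
echo "$template_url"
```

Copy the printed URL into your browser’s address bar as described in Step 3.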

What you get:

The “secure-repository-supply-chain” template from GitHub Skills includes:

Alternative: Using GitHub’s Web Interface

You can also create a repository from a template using GitHub’s web interface:

  1. Go to the template repository: https://github.com/skills/secure-repository-supply-chain
  2. Click the green “Use this template” button
  3. Select “Create a new repository”
  4. Choose your owner, repository name, and visibility
  5. Click “Create repository”

Resources