Collaboration and testing

Whatever kind of code you are developing, whether it is data analysis, numerical, or just complex, it will eventually fail. The question is not if, but when, and how long it will take you to find out. Ideally, each commit you add to the main branch should introduce features without breaking what is already there. How can you be sure that this is the case? For programs that solve numerical problems, for example, even small errors are worth catching, as they might reveal an instability or an error that accumulates over iterations.

Manually checking that every part of the code works at each commit you push to main is impractical. That's where automated testing comes in. Tests are not optional add-ons; they are core infrastructure. Without them you are working blind, because you cannot monitor how your code behaves.

Example of a test

A good test is:

  • Targeted: It checks one specific behavior.
  • Deterministic: It gives the same result every time.
  • Fast: Tests should run in seconds, not minutes.

Avoid vague or overly general tests. Don’t just check that a function "runs". Assert specific numerical results. For numerical methods, this might mean checking convergence or expected errors.

Example: bisection algorithm

Let's look at a simple example: the bisection algorithm written in C. Imagine we wrote the following two source files, which will be used by some other program to find the roots of an equation.

bisection.c

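#include <math.h>
#include "bisection.h"

// Find a root of f in [a, b] by bisection.
// Assumes f(a) and f(b) have opposite signs; this is not checked.
double bisection(
  double (*f)(double),
  double a,
  double b,
  double tol,
  int max_iter)
{
  for (int i = 0; i < max_iter; i++) {
    double c = (a + b) / 2;
    // Stop as soon as the residual at the midpoint is small enough
    if (fabs(f(c)) < tol) {
      return c;
    }
    // Keep the half-interval where f changes sign
    if (f(a) * f(c) < 0) {
      b = c;
    } else {
      a = c;
    }
  }
  // Return the best guess if
  // max iterations are reached
  return (a + b) / 2;
}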

bisection.h

#ifndef BISECTION_H
#define BISECTION_H

double bisection(
  double (*f)(double),
  double a,
  double b,
  double tol,
  int max_iter);

#endif

Before going ahead to use this function, let's create a unit test for it. A unit test is a small, isolated test that verifies the functionality of a specific part of the code, usually a single function. It checks if the code behaves as expected under defined conditions. Let's create three unit tests:

test_bisection.c

#include <stdio.h>
#include <assert.h>
#include <math.h>
#include "bisection.h"

// Example function: x^2 - 2
double f(double x) {
    return x * x - 2;
}

// Test function for bisection method
void test_bisection_root() {
    double root = bisection(f, 0, 2, 1e-10, 1000);
    printf("Test 1: Checking if root is close to sqrt(2)...\n");
    assert(fabs(root - sqrt(2)) < 1e-10);  // Root should be close to sqrt(2)
    printf("Test 1 passed 👌.\n");
}

// Test case with a narrower bracket: [1, 1.5] still contains sqrt(2),
// so the returned value must lie inside the interval
void test_bisection_no_solution() {
    double root = bisection(f, 1, 1.5, 1e-5, 100);
    printf("Test 2: Checking if root is within the interval [1, 1.5]...\n");
    assert(root >= 1 && root <= 1.5);
    printf("Test 2 passed 👌.\n");
}

// Test case with the root very close to the interval's left boundary
void test_bisection_edge_case() {
    double root = bisection(f, sqrt(2.)-1.e-10, 2, 1e-8, 100);
    printf("Test 3: Checking if root is close to sqrt(2) in small interval...\n");
    assert(fabs(root - sqrt(2)) < 1e-8);
    printf("Test 3 passed 👌.\n");
}

int main() {
    test_bisection_root();
    test_bisection_no_solution();
    test_bisection_edge_case();

    printf("All tests passed 👌🏞️🏖️🚀.\n");
    return 0;
}

These tests check that the function behaves as expected in three different situations. Let's compile everything and check if it works:

> cc -c bisection.c
> cc -o test_bisection test_bisection.c bisection.o -lm
> ./test_bisection
Test 1: Checking if root is close to sqrt(2)...
Test 1 passed 👌.
Test 2: Checking if root is within the interval [1, 1.5]...
Test 2 passed 👌.
Test 3: Checking if root is close to sqrt(2) in small interval...
Test 3 passed 👌.
All tests passed 👌🏞️🏖️🚀.

Every test passed, so now you can commit to main. In the future, before pushing new commits to main, you have to make sure that all tests pass, including those that cover older features. Otherwise you are undoing the work of your past self, or of someone else.

Workflow in practice

We will now apply what we have learned in this course to a simple but structured repository.

To contribute to a public repository, the standard approach is to create a fork, i.e. a personal copy of the repository, clone it, and work on that copy. When you want to merge your work into the original public repository, you open a "pull request" to the maintainers, wait for a review, and eventually your changes are merged. This is the typical workflow in very large open-source projects, where the development process must be supervised by someone. Otherwise it's chaos.

In a small community, however, if people interact regularly every day, these additional layers of complication might be unnecessary. Everyone might be able to push directly to the main branch, at their own risk and responsibility. This, of course, is possible only in small, local communities. Let's dive into this situation by working together on a very simple molecular dynamics solver.

Molecular dynamics solver

To clone the repository with write permissions (being able to create branches, push to main, etc.) you have to be added to the repository by the maintainer and use the SSH protocol.

Note:

To use the SSH (Secure Shell) protocol you need an SSH key. SSH keys are based on public-key cryptography. An SSH key pair consists of two mathematically related keys, a public key and a private key, which together serve as the access credential. The private key is secret, known only to the user, and should be stored safely. The public key can be shared freely with any SSH server to which the user wishes to connect.

If you don't already have one, you can create an SSH key pair with the ssh-keygen command, which generates a public key <keyname>.pub and its private counterpart <keyname>, by default in the ~/.ssh/ directory.

Once your GitHub account has been granted write access to the repository, you will need to add the public key to your GitHub account so that your local machine can authenticate as you.

> cd
> git clone git@github.com:scarpma/md6.git
> cd md6

You can check the repository's README.md file to understand what the program does.

In this repository, unfortunately, there are no unit tests. However, another useful kind of test is the "end-to-end test". These tests verify the entire program’s behavior by running the full application in a real environment and checking whether the results match expectations.

In scientific computing, these tests can ensure that the output matches expected results, taken either:

  1. From a previous version
  2. From an externally provided reference:
    • existing analytical solutions
    • existing results in the literature

Check run_tests.sh to understand what kind of tests the repository runs.

We can compile and check if the checked out version of the repository works and passes all tests:

> make
> ./run_tests.sh
make: Nothing to be done for `all'.
===========================
Doing test test_fluid/
===========================

r_max=3.54835    BOXL=5.25    red. dens=0.746356
Initialize FCC lattice and random velocities.
Integration started:

Done!

Testing ...
col 1 max diff: 0
col 2 max diff: 0
col 3 max diff: 0
col 4 max diff: 0
col 5 max diff: 0
col 6 max diff: 0
===========================
ALL TEST PASSED
YOU'RE GOOD TO GO
===========================


===========================
Doing test test_solid/
===========================

r_max=3.54835    BOXL=5.25    red. dens=0.746356
Initialize FCC lattice and random velocities.
Integration started:

Done!

Testing ...
col 1 max diff: 0
col 2 max diff: 0
col 3 max diff: 0
col 4 max diff: 0
col 5 max diff: 0
col 6 max diff: 0
===========================
ALL TEST PASSED
YOU'RE GOOD TO GO
===========================

OK, everything is set up and we are ready to work.

Rules for collaboration

The rules to contribute to the repository are the following. Let’s say you are adding a new numerical solver to the project:

  1. Do not break the main branch! Push only if all tests pass.
  2. To implement your stuff, create a specific branch.
  3. Before merging to main:
    • Pull/rebase from main
    • Resolve any conflicts
    • Run full tests locally
    • Only then, push to main
  4. No pull requests, no reviews: just discipline.

If you break the main branch, you break it for everyone.

What to do

  1. Add a new function and related unit tests. Examples are:
    • a function to read back a VTK file produced by the code. What could be a unit test?
    • implement the computation of the Radial Distribution Function (RDF);
    • implement the velocity Verlet algorithm;
    • add a "step counter" that prints some info every 100 time steps
  2. Improve performance of the code:
    • remove the old, unnecessary writes to .xyz files
  3. Refactor part of the codebase
  4. Add more end-to-end tests
  5. Fix a bug, if you find one