Unix and Linux are, loosely speaking, two families of operating systems (OSs), the software that sits between the computer hardware and the programs you normally use.
If you've never heard of them, check the Wikipedia pages of Unix and Linux.
Historically, Unix is tied to the C programming language, in which it was written (a first for an OS: back then, OSs were written in assembly). Unix was also instrumental to the birth of the Internet (and, later, of the WWW).
Regarding HPC systems, the plot on the side, showing the share of top500 supercomputers by OS, leaves no doubt: if you want to do computer simulations, you will have to deal with a variant of Linux.
Share of top500 computers by OS
The benefits of Unix/Linux include:
A stable OS structure: you can use a Unix-like OS in much the same way people did in the late 1960s. Graphical interfaces on top have changed (as has the implementation under the hood), but the way you use a terminal (nowadays a virtual one) through a command shell, as well as the basic system library functions (more on these topics later), has not changed. This means that the software you write will very likely still be usable in 40 years (and 40-year-old software is still usable today) with few changes, if any at all, in most cases. Also, and very importantly, no one, strictly speaking, owns Linux, and it will keep being around as long as users and developers keep using it.
Portability: The software you write/download can be used on a wide range of Unix-based machines, without (too much) hassle. This includes your laptop as well as multimillion pound computing clusters.
Plenty of resources: The amount of freely accessible software, in particular free software where you have access to the source code (and can modify it to your needs) is simply staggering, and so are resources (forums, HOWTOs, online courses, code repositories for development). You can find almost anything you need to perform advanced computer simulations, analyse your data and present them.
There are several OSs based on Linux or on variants of Unix. But wait. We said that Unix and Linux are OSs! Well, the same name is used to describe several things. To be precise, Linux is an OS kernel: a piece of software that handles memory, decides when and which programs are given time to run on the CPU, and contains the drivers for your devices (hard disks, keyboard, mouse, ...).
On top of the kernel, one needs software libraries that implement basic facilities (like opening, reading and writing files) in a device-independent way. This lets you open and write a file calling the same code, regardless of whether the file is on a hard drive, on a USB device, etc. This is the role of the GNU C library, or glibc.
There are several ways to try out a Linux distribution. The next page will deal with this.
Left: schematics of an OS layer structure; Center: Tux, the Linux mascot; Right: the GNU logo. Do not let the peaceful, good-natured look of Tux trick you into thinking that Linux is a toy OS.
In any modern operating system, the kernel is the fundamental piece of software that acts as the bridge between the hardware
and the user-space applications running on it. It’s responsible for critical functions like managing hardware resources (CPU, memory, storage devices),
handling system security, and ensuring efficient communication between the hardware and software layers.
Examples of widely used kernels include:
Unix: The original kernel, developed in the 1970s, forms the foundation of many modern systems.
BSD Kernels: The family of kernels originating from the Berkeley Software Distribution (a variant of UNIX), used in operating systems like FreeBSD, OpenBSD, and NetBSD.
Linux: The kernel used in many operating systems, including GNU/Linux distributions like Ubuntu, Fedora, and Debian.
Windows NT: The kernel used in modern versions of Microsoft Windows, including Windows 10 and Windows Server.
Darwin: The open-source core of Apple's macOS, built around the XNU kernel.
Some early computer systems operated without what we would consider a modern kernel, relying on simpler architectures for hardware and software management.
For example, MS-DOS lacked a true kernel, operating in a single-tasking environment where programs had direct hardware access.
IBM’s OS/360 featured a more advanced system with a Supervisor that managed tasks and memory, but it was part of a larger monolithic structure rather than a distinct, modular kernel.
The AS/400 system (now known as IBM i) used the so-called Licensed Internal Code (LIC) to manage hardware, though its architecture was highly integrated, abstracting much of the kernel-like functionality into tightly coupled layers such as TIMI (Technology Independent Machine Interface), and even including an integrated relational database as part of the operating system itself.
To understand the principles behind how a modern kernel like Linux works, it's helpful to break it down into its main components:
The kernel is responsible for managing processes—the individual programs that run on your computer.
It ensures that each process gets a fair share of CPU time, coordinates multitasking, and handles communication between processes, providing, for example:
Process Scheduling: to decide which processes get CPU time and in what order. This ensures that system resources are used efficiently, allowing multiple processes to run concurrently (multitasking).
Process Isolation: so that one program cannot access or interfere with another’s memory space or data.
Example: When you have multiple programs running (e.g., a browser, music player, and text editor), the kernel manages these processes so they don't conflict and ensures each gets CPU time
through a mechanism called time-sharing.
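As a small illustration, here is a minimal C sketch (names and output are just illustrative) of how a user program asks the kernel to create a new process with the fork() system call; the kernel then schedules parent and child independently:
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();   /* ask the kernel to create a copy of this process */
    if (pid == 0) {
        printf("child:  my pid is %d\n", (int)getpid());
    } else {
        printf("parent: my pid is %d, the child's pid is %d\n", (int)getpid(), (int)pid);
        wait(NULL);       /* ask the kernel to tell us when the child has terminated */
    }
    return 0;
}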
The kernel also manages the system's memory: it ensures that each running process has the memory it needs and that no process exceeds its allocated memory. In particular, it handles:
Memory Allocation: reserving and releasing memory as processes request and free it, so that memory which is no longer needed is returned to the system.
Virtual Memory: swapping inactive memory to disk (in a special area called the swap space) when the system runs low on physical RAM.
Example: When you open a new tab in your browser, the kernel allocates memory to that process. If the system is running low on RAM, the kernel may move some data
from RAM to the swap space, freeing up memory for more immediate tasks.
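On a Linux machine you can peek at how the kernel is using RAM and swap with a couple of standard commands (the numbers shown here are purely illustrative):
> free -h            # physical memory and swap usage, in human-readable units
              total        used        free      shared  buff/cache   available
Mem:           15Gi       4.2Gi       6.1Gi       310Mi       5.2Gi        10Gi
Swap:         2.0Gi          0B       2.0Gi
> swapon --show      # where the swap space lives (a partition or a file)
NAME      TYPE      SIZE USED PRIO
/dev/sda2 partition   2G   0B   -2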
Device drivers are specialized pieces of code that allow the operating system to communicate with hardware devices like a printer, network card, or hard drive.
The kernel includes a wide range of device drivers and also provides hardware abstraction, so that user programs don't need to know the details of how
the hardware works.
Example: when you write to a file on a hard drive, the kernel takes care of communicating with the drive’s hardware through a device driver.
Example: when you type on your keyboard, the kernel uses a keyboard driver to receive the input and pass it along to the appropriate application, like your text editor.
The Linux kernel provides a unified file system interface that allows you to store and retrieve files from various storage devices, whether they're local hard drives, SSDs, or
remote file systems accessed over a network. The kernel supports many types of file systems (like ext4, FAT32, NTFS), which dictate how data is stored and retrieved on disk.
It also handles the organization of files into directories and maintains metadata like file permissions, timestamps, and ownership.
Example: When you run the ls command to list files in a directory, the kernel interacts with the file system to retrieve and display this information.
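You can also inspect the metadata the kernel keeps about a file with the stat command (output abridged and purely illustrative; the exact format depends on your system):
> stat file.txt
  File: file.txt
  Size: 43              Blocks: 8          IO Block: 4096   regular file
Access: (0644/-rw-r--r--)  Uid: ( 1000/  m.sega)   Gid: ( 1000/   users)
Modify: 2024-01-10 09:30:00.000000000 +0000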
The kernel provides implementations of networking protocols (e.g., TCP/IP) that allow computers to communicate over networks, such as the internet.
This also includes handling the routing of communications and packet filtering (the so-called "firewall").
In particular, the Linux kernel also provides a socket interface to user applications, allowing programs to send and receive data over the network as though they were reading from or writing to a file.
Example: When you load a webpage in your browser, the kernel handles sending requests to the web server and receiving the response over the network.
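As a minimal sketch of the socket interface (the server IP address below is only an example and may not respond), a tiny C client asks the kernel for a TCP socket, connects, and then just write()s and read()s on the resulting file descriptor as if it were a file:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* ask the kernel for a TCP socket */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                            /* standard web server port */
    inet_pton(AF_INET, "93.184.216.34", &addr.sin_addr);  /* example IP address, may change */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {  /* kernel's TCP/IP stack at work */
        perror("connect"); close(fd); return 1;
    }

    const char req[] = "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
    write(fd, req, strlen(req));                 /* send data as if writing to a file */

    char buf[512];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);  /* receive data as if reading a file */
    if (n > 0) { buf[n] = '\0'; printf("%s\n", buf); }

    close(fd);
    return 0;
}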
The kernel provides mechanisms for processes to communicate and share data with each other. These mechanisms include pipes, message queues, shared memory, and signals. In particular:
Pipes and Message Queues: allow one process to send data to another and let processes exchange structured messages.
Signals: are used to notify processes of asynchronous events, e.g., when a user presses Ctrl+C to terminate a program.
Example: When a program like a web server forks a child process to handle a new request, it can use pipes to send data back and forth between the parent and child processes.
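A minimal C sketch of the pipe mechanism (purely illustrative): the parent creates a pipe, forks a child, and the two processes exchange a message through the kernel:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    pipe(fds);                        /* fds[0] is the read end, fds[1] the write end */

    if (fork() == 0) {                /* child process */
        close(fds[1]);                /* the child only reads */
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("child received: %s\n", buf);
        return 0;
    }
    close(fds[0]);                    /* the parent only writes */
    const char msg[] = "hello through the kernel";
    write(fds[1], msg, strlen(msg));
    close(fds[1]);
    wait(NULL);                       /* wait for the child to finish */
    return 0;
}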
The Linux kernel is responsible for ensuring that users and processes can only access the resources they're allowed to. This includes managing file permissions, user authentication, and process isolation.
The kernel enforces user and group-based permissions, ensuring that users can only access the files and devices they're authorized to.
Example: If you try to delete a system file without the proper permissions, the kernel will prevent you from doing so, protecting the integrity of the system.
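A quick illustration with the standard permission commands (the file name, sizes and dates are just examples):
> ls -l notes.txt
-rw-r--r-- 1 m.sega users 120 Jan 10 09:30 notes.txt
> chmod go-r notes.txt      # remove read permission for group and others
> ls -l notes.txt
-rw------- 1 m.sega users 120 Jan 10 09:30 notes.txt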
This document provides some information on basic Unix commands. You should try them out in a terminal as you read; this usually helps you remember them faster.
Using a command-line interface is one of the most powerful ways to interact with a computer. Unix, Linux and macOS all provide so-called "shells", or command-line interpreters running in "terminals", that allow you to invoke ("run") other programs by typing commands on your keyboard. If you boot a Linux machine you can access the system through several terminals. Press ctrl+alt+Fn (Fn being a function key) to access them; use F7 or higher to switch back to the graphical user interface (GUI), if it has started. Otherwise, launch a terminal in a window in your GUI. Typical Linux terminals include xterm or similar applications. On macOS you will have to use the Terminal.app located in the Applications/Utilities folder.
Bash (the Bourne-again shell) is a popular shell, first released in 1989 as a free replacement for the older Bourne shell, and available (and often the default shell) in almost all variants of Unix/Linux and on macOS (where the largely compatible zsh is now the default).
Once you launch a terminal, the shell will run and show you the so-called prompt. When you see it, the shell is ready to accept commands. If you don't see it, you can still type, but what you type will not be interpreted as a command. My prompt is very simple: ">". Yours might be longer, showing some information on the directory you are in, on which computer you are, and so on. Let's look at two examples of typing with and without a prompt.
> pwd
/home/m.sega
> ls
Applications Documents
> echo "hello world"
hello world
In the snippet above I have given three commands. First, pwd (print working directory), which shows you where you are. Then, ls, which lists the content of the directory. Finally, echo "hello world", which just outputs what I have written to the screen (no, it's not that useless: it can expand variables too, it can be redirected to write to a file, and so on...).
There are some special sequences of characters that are expanded into directories, that you must be aware of. These include:
> ~ # your home directory
> ./ # also just . if not followed by another part of a path. The current directory
> ../ # also just .. if not followed by another part of a path. The parent directory
Now comes an example where the prompt disappears (because the system is waiting for input in the form of text):
> cat > file.txt
This is a test
^D
>
The command cat (concatenate) is used to concatenate the content of several files or, as in this case together with the "redirection to file" character > , to wait for keyboard input until ctrl+D is hit (appearing as ^D ). Until ctrl+D is pressed, cat will keep adding what you type to the file file.txt.
Let's use cat again to see the result, by printing the content of file.txt to the screen.
> cat file.txt
This is a test
>
Here are some categories of important commands to know.
type man <command> to obtain the manual page of (almost) any command. Learn how to interpret the output of man pages. For example the command ls is used to list contents of a directory.
LS(1) BSD General Commands Manual LS(1)
NAME
ls -- list directory contents
SYNOPSIS
ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1%] [file ...]
DESCRIPTION
For each operand that names a file of a type other than directory, ls displays its name as well as any requested, associated informa-
tion. For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well
as any requested, associated information.
If no operands are given, the contents of the current directory are displayed. If more than one operand is given, non-directory oper-
ands are displayed first; directory and non-directory operands are sorted separately and in lexicographical order.
The following options are available:
-@ Display extended attribute keys and sizes in long (-l) output.
-1 (The numeric digit ``one''.) Force output to be one entry per line. This is the default when output is not to a terminal.
-A List all entries except for . and ... Always set for the super-user.
-a Include directory entries whose names begin with a dot (.).
:
Hit <spacebar> to advance quickly or B to go back. Use the cursor keys to move line by line. Hit Q to exit.
Other programs have built-in help, which you can typically invoke by passing the options -h or --help , for example:
> man --help
yields
man, version 1.6g
usage: man [-adfhktwW] [section] [-M path] [-P pager] [-S list]
[-m system] [-p string] name ...
a : find all matching entries
c : do not use cat file
d : print gobs of debugging information
D : as for -d, but also display the pages
f : same as whatis(1)
h : print this help message
k : same as apropos(1)
K : search for a string in all pages
t : use troff to format pages for printing
w : print location of man page(s) that would be displayed
(if no name given: print directories that would be searched)
W : as for -w, but display filenames only
C file : use `file' as configuration file
M path : set search path for manual pages to `path'
P pager : use program `pager' to display pages
S list : colon separated section list
m system : search for alternate system's man pages
p string : string tells which preprocessors to run
e - [n]eqn(1) p - pic(1) t - tbl(1)
g - grap(1) r - refer(1) v - vgrind(1)
> cd # change directory to your home (when no other arguments are passed)
> cd ~ # same
> cd .. # go to the parent directory (mind the space)
> cd <directory> # go to <directory>. This has to be either an absolute path `/home/m.sega/test` or a relative one (without the leading slash)
> cd - # go back to the directory where you were before
Let's see an example
> pwd
/home/m.sega
> ls
Applications Documents
> ls -l Documents # -l for more info, including permissions, owner, group, size and last modification date.
total 165192
-rw-r--r--@ 1 sega staff 598402 Dec 3 2021 Minutes.pdf
drwxr-xr-x@ 23 sega staff 736 Aug 10 10:03 Articles
> pwd
/home/m.sega
> cd Documents/Articles # move to a relative path from where we are
> pwd
/home/m.sega/Documents/Articles
> cd - # go back to where we were before
> pwd
/home/m.sega
Figure out what the following options of ls are doing.
ls -a
ls -R
Other commands to deal with files and directories
> mkdir <dir> # creates the directory <dir>
> rmdir <dir> # removes the directory <dir>, if empty
> touch <file> # create an empty file named <file>
> rm <file> # remove the file <file>
> rm -rf <dir> # remove the directory and its contents (-f to force), recursively (-r)
> mv <source> <dest> # move file or directory <source> to <dest>. Used also to rename them.
> cp <source> <dest> # copy file <source> to <dest> <dest> could be a new or existing file
# (that will be overwritten), or a destination directory where the file
# is copied.
> touch sample_file # creates an empty file
> mkdir directory # creates an empty directory
> mv sample_file directory # moves the file into the directory
> mv directory/sample_file directory/renamed_file # renames the file
[!CAUTION]
rm is irreversible! There is no "bin". Once removed, a file no longer exists.
The simplest wildcards in bash are the asterisk '*' (or star) and the question mark ( ? ). The first matches an indefinite number of any characters, the second one any single character.
So, for example, if you have a series of files named data.1.dat, data.2.dat, ..., data.3277.dat, and you want to move them to a subdirectory, just type:
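> mkdir archive             # the destination subdirectory (the name is just an example)
> mv data.*.dat archive/    # '*' matches any sequence of characters, so all the data files move at once
(Here I assumed the subdirectory is called archive and has to be created first; mv data.?.dat archive/ would instead match only data.1.dat ... data.9.dat, since ? matches a single character.)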
Note: Bash does not care about how many consecutive '/' there are in a path, so './data/file.txt' is the same as './data////file.txt'. Therefore, if in doubt, always add a trailing '/' to variables that represent directories: /home/m.sega/data/////data.*.dat might be a valid path, but /home/m.sega/datadata.*.dat is not.
Understanding the difference between the variable-expanding double quotes, and the opposite behaviour of single quotes will be important when you write scripts.
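A minimal example (using the home directory from the examples above):
> echo "My home is $HOME"    # double quotes: the variable is expanded
My home is /home/m.sega
> echo 'My home is $HOME'    # single quotes: no expansion, the text is taken literally
My home is $HOME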
This is important when you want to generate a string that has the variable name in it (here: $HOME) , and not its associated value (here: /home/m.sega/)
Text files can be dumped to screen ( stdout ) with cat, or can be inspected with the commands less or more (they are equivalent, more or less :-). Use <spacebar> to move down quickly, "B" to move back, and "/" to search.
> less <file> # inspect the file
> cat <file1> <file2> ... # concatenate files together and print them to screen (unless redirection ">" is used )
> head <file> # show the first 10 lines
> head -n 3 <file> # show the first 3 lines
> tail <file> # show the last 10 lines
> tail -n 3 <file> # show the last 3 lines
One of the most powerful capabilities of a shell is the possibility to redirect the output of a command directly into another one. A series of commands, each feeding its output to the next, is called a pipeline.
Let's see an example with the ls and wc commands. The wc command, as the name suggests (word count! in the good old times, people had a need for short commands and plenty of humor) counts the characters, words and lines contained in a file, typed in, or passed through a pipeline.
> # we create first a file with some text in
> echo "When shall we three meet again? In thunder, lightning, or in rain?" > witch.txt
> wc witch.txt
1 12 67 witch.txt
# one line, 12 words, 67 characters.
If a command does not read from a file, but only accepts input from stdin, there's no problem. We can just use < , the opposite of the redirection-to-file character > . So, we could also count the words in witch.txt this way:
> wc < witch.txt
1 12 67
# same counts as before, but no file name is printed: wc only saw its standard input
# and if we want to store the output in a file, we could just do
> wc < witch.txt > count.dat
> cat count.dat
1 12 67
Another way to use wc , not very practical in everyday life, shows you how the command accepts input coming from the keyboard.
> # Counting the words typed
> wc
I am typing something
^D
1 4 22
# one line, four words, 22 characters
Knowing that wc accepts input also from the keyboard or from a file through < (the so-called standard input, or stdin ), we can use the "pipe" character | (a vertical bar) to combine commands and send the output of ls (or any other command) through wc:
> ls # let's just check what we have
Applications Documents
> ls | wc
2 2 23
# two lines, two words, 23 characters
As you can see, the directory content has been sent, one entry per line, to wc , which counted two lines. This is a simple example, but when you have 23472 files in your directory, it's easier to let wc do the job rather than counting yourself.
You can create a longer pipeline if needed. Let's say we want to count how many files match a given pattern. First, we need to get the strings that match a pattern. Your friend here is grep.
> ls
data1 data2 data3 trash1 trash2
> ls | grep data
data1
data2
> ls | grep data | wc -l # count lines
2
# yes, we could have done this also with the '-c' (count) option of grep
> ls | grep -c data
2
# but this was not the aim of this example...
Very important if you want to login to your nearest cluster:
> ssh <username>@<hostname> # open a secure shell (encrypted) connection to <hostname>
#example:
> ssh ucecxxx@myriad.rc.ucl.ac.uk # if you connect for the first time, it will ask if you trust the host.
# If so, answer 'yes' by spelling the whole word, not just 'y'
This content represents the very basics you need to know to make good use of bash. Read on if you want to learn about slightly more advanced topics.
To handle floating point math you need to use other programs within bash. I recommend awk:
> a=3.2 ; awk "BEGIN{ print $a*2}"
6.4
Note the double quotes. This way bash first expands the variable $a into 3.2, and then feeds it to awk. If you use single quotes, awk expects a to be one of its own variables, and expands $a into the value of the a-th column of the input. Most tutorials would recommend using the command bc instead, as in
> a=3.2 ; echo "$a * 2" | bc -l
6.4
This is indeed shorter. But the other mathematical functions of awk are probably easier to use/remember, and it allows you to write quick C-like code, if needed.
Sed is the stream editor. It processes text line by line using regular expressions. You can use it to manipulate text and the content of variables in bash.
> cat > file.txt
The cow is a coward
The cows are jumping
^D
> sed 's/cow/goat/' file.txt
The goat is a coward
The goats are jumping
> sed 's/cow/goat/g' file.txt
The goat is a goatard
The goats are jumping
> sed 's/cow\>/goat/g' file.txt
The goat is a coward
The cows are jumping
# as in other cases, you can let bash expand its own variables within sed, but you must use double quotes
> a='cow'
> sed "s/$a/goat/g" file.txt
The goat is a goatard
The goats are jumping
# If you need your results in a file, you can redirect stdout
> sed 's/cow/goat/' file.txt > newfile.txt
# Or, you can change the file in place with the -i option (not available in all sed implementations)
> sed -i 's/cow/goat/' file.txt
> cat file.txt
The goat is a coward
The goats are jumping
This page will show you how to stop/resume jobs, or put them in background / foreground.
Now that you know how to start commands in bash, it's time to learn how to handle the long-running ones. You could of course open a new terminal if a command takes too long to execute. But this is often cumbersome and might not be what you want (suppose you are chain-connecting through three servers to reach your target: if you open a new terminal on your machine you will have to repeat the whole sequence).
A typical use case is when you want to launch a graphical user interface (GUI) program from the terminal, and still be able to use the terminal (to launch other programs, check outputs and so on). Suppose the GUI program is called myprogram ; then, by putting an ampersand ( & ) symbol at the end, it will go directly to the background, leaving the terminal free for you to use.
> myprogram & # this will open a window
> ls # we can still use the terminal
Applications Documents
Mind that if myprogram is sending output to the terminal, it will overlap with what you see in your shell. If it keeps producing output, you might not be able to see what you are typing! The way around this is to redirect the output either to a file
> myprogram > log & # this will place all output in the file called log
> ls # we can still use the terminal
Applications Documents
or to the special file /dev/null , which discards the output
> myprogram > /dev/null & # this will discard all output
> ls # we can still use the terminal
Applications Documents
Note that unix systems have a different type of output channel, besides stdout , which is called the standard error ( stderr ). It will still end up in your terminal's output. So, you might want to catch or get rid of it as well. The way to do it is to first redirect stdout , and then stderr to stdout. It's easier than it sounds:
> ( myprogram > log 2>&1 ) & # this will put both stdout (1) and stderr (2) channels into log
> ls # we can still use the terminal
Applications Documents
To stop a running program, simply hit ctrl+z. An easy example with the sleep command (sleep just waits and does nothing, so there won't be much to see...):
> sleep 100000
^Z
[1]+ Stopped sleep 100000
>
You can put it into the background, so that it will keep running (sleep does nothing, but this is just an example...) while you are free to use the terminal.
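To do so, use the bg (background) command:
> bg
[1]+ sleep 100000 &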
You can see the list of jobs currently handled by your shell using the jobs command:
> jobs
[1]- Stopped vim test.c
[2]+ Stopped vim nothe_test.c
[3] Running sleep 100000 &
[4] Running vmd > /dev/null 2>&1 &
In the example above I have two (character interface) text editors open ( vim ) that have been stopped, as well as two commands running in background ( sleep and vmd )
I can call a process back into the foreground by using the fg command, stop it with ctrl+z, and then put it back into the background with bg :
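For example, using the job list above (the job numbers are the ones from my session):
> fg %3           # bring job 3 (the sleep command) to the foreground
sleep 100000
^Z
[3]+ Stopped sleep 100000
> bg %3           # and send it to the background again
[3]+ sleep 100000 &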
Vi (pronounced as distinct letters, /ˌviːˈaɪ/) is a screen-oriented text editor originally created for the Unix operating system.
Vim (vi improved) is a free and open-source, screen-based text editor program. It is an improved clone of Bill Joy's vi.
Vi is a modal editor: it operates in either insert mode (where typed text becomes part of the document) or command mode (where keystrokes are interpreted as commands that control the edit session).
To start vim, you can proceed in one of the following ways from a terminal:
> vim
# or
> vim sample.txt # to open a new or an existing file
# or
> vim ./ # to browse the local files. You can use any other target
# directory rather than the one where you are (./)
Let's choose the first option (just type vim ). You will see the following splash screen
~
~
~ VIM - Vi IMproved
~
~ version 8.2.5032
~ by Bram Moolenaar et al.
~ Vim is open source and freely distributable
~
~ type :help iccf<Enter> for information
~
~ type :q<Enter> to exit
~ type :help<Enter> or <F1> for on-line help
~ type :help version8<Enter> for version info
~
~
Don't worry about what all that means, bear with me for a while.
If you type i while in COMMAND mode (say, from the splash screen above) you will see the following
#
~ VIM - Vi IMproved
~
~ version 8.2.5032
~ by Bram Moolenaar et al.
~ Vim is open source and freely distributable
~
~ type :help iccf<Enter> for information
~
~ type :q<Enter> to exit
~ type :help<Enter> or <F1> for on-line help
~ type :help version8<Enter> for version info
-- INSERT --
notice the appearance of --INSERT-- in the bottom line
notice that the first line is empty (your cursor might appear there, static or blinking depending on your system settings). I placed a hashtag (#) to mark the cursor just this time.
all other lines start with a tilde (~). In vim this means that they are not really lines of text. It's not important here, but in case you're interested: they just don't exist, they're not even empty! An empty line still takes space, since it contains the LF (line feed) character, also written \n (Unicode U+000A, ASCII 10, hexadecimal 0x0a).
try to type something, it will appear in the first line, and the splash screen lines will disappear
now that you typed something, let's exit from INSERT mode by pressing the Escape (ESC) key (you can press it how many times you want, no worries). The --INSERT-- in the bottom line will disappear.
let's save what we have written to a file. Since we invoked vim without arguments, our text is in the so-called buffer, but it is not linked to any file on your hard drive.
to save the buffer to a file, type (once you're in COMMAND mode - hit ESC if you're not sure!) :w file.txt and press Enter
I typed something!
and even a second line!
~
~
~
~
~
:w file.txt
You will see the following
I typed something!
and even a second line!
~
~
~
"file.txt" [New] 2L, 43B written
The editor tells you in the bottom line that you have written 2 lines (2L), for a grand total of 43 Bytes to a new file named "file.txt"
You can keep switching to INSERT mode, edit what you need, and then save by going to COMMAND mode (ESC) and writing the file (:w) - no need to repeat the file name.
Notes:
You can move across your written lines both in INSERT and in COMMAND mode using the cursor keys (small arrows).
In some cases, the keybinding might be broken, or some configuration of vim might be wrong. In that case, you can always move around in COMMAND mode (hit ESC!) using the h j k l keys (left down up right). Learn them just in case it's needed
Great! Now you know what is needed to use vim at its most basic level.
This is very minimal, of course, and your editing skills will be very much enhanced if you learn some of what follows. You don't have to, but it's strongly recommended.
It will happen, sooner or later, that you try to open an already opened file, or that your vim crashed leaving behind a swap file. In this case, when opening a file with vim you will be warned by a message like this:
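It looks more or less like this (abridged; the exact text depends on your vim version):
E325: ATTENTION
Found a swap file by the name ".file.txt.swp"
While opening file "file.txt"
(1) Another program may be editing the same file.
(2) An edit session for this file crashed.
Swap file ".file.txt.swp" already exists!
[O]pen Read-Only, (E)dit anyway, (R)ecover, (D)elete it, (Q)uit, (A)bort: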
If you have saved your buffer before with :w (do it often) you don't have to worry. Just hit (E) for Edit anyway and proceed as usual, or hit (O) to open read-only. In the latter case, when you eventually want to save again, you will have to forcibly overwrite the file, typing :w! instead of simply :w. If instead you had unsaved changes that might have been added to the swap file by vim, you might hit (R)ecover and select which swap file to use. I personally never managed to use the recover function properly/usefully. If you do, teach me how, please.
:sp file3.txt → open another file (could be new) in a split window
:vsp file3.txt → open another file in a vertically split window
:wa → write all open files/buffers
:e ./ → open the local directory browser (you can use any other directory as the starting point). Browse with the arrow keys; use (Enter) to enter directories or open files
:sp ./ → open the local directory browser in a split window
The split and vertically split windows (:sp and :vsp) look like this:
You can split/vsplit as many buffers as you like, for example as in this case:
To move the focus between buffers, go to COMMAND mode and hit ctrl+[arrow], where [arrow] is one of the cursor keys, to move up/down/left/right through the buffers. If you have multiple buffers you can combine commands and, e.g., use 3+ctrl+[arrow-down] to move three buffers down from where you are.
The h j k l and arrow keys are sometimes not enough to move around big files. Here are some important commands
in COMMAND mode (remember, hit ESC to get there)
:1 → go to line 1 (remember to press enter)
:2 → go to line 2 (ok, you get how this goes on)
G → go to the end of the file (G stands for ... mhhh... Gototheendofthefile)
$ → go to the end of the line (you will understand why this symbol once you learn regular expressions)
^ → go to the beginning of the line (same comment as above)
w → move one word to the right (go to its beginning). Keep pressed to move forward also across lines
b → move one word to the left (go to its beginning). Keep pressed to move backward also across lines
} → move to the next paragraph
{ → move to the previous paragraph
If you prepend the (w) and (b) commands with a number, you will move by that amount of words, e.g., (3w) will move three words to the right.
Good. Now you also learned that (some) commands in vim can be combined.
You might have also noticed that some commands require a colon ( : ) at the beginning. This is a particular command mode which is known as Ex mode (from ex, the extended editor, of which vi, the father of vim, was the visual mode. Ex improved the life of many programmers back in 1976, freeing them from the vicious but ubiquitous line editor ed. If you think that vim is difficult to use, try ed once). The Ex-mode commands are mostly used for managing files (opening, saving, ...) and perform extended search/substitution queries. More on this later.
If you are in INSERT mode, you can of course hit the backspace key as many times as you want. If it's just about few characters, it's ok. This can be tiresome if you need to delete several words or paragraphs (sorry, no mouse selection! remember, you might be working on a remote connection, the remote host most likely does not know about your mouse)
in COMMAND mode, if you move the cursor to a specific line or character you can use the following simple commands to quickly cut your text
x → delete the current character
dw → delete the current word (starting from the character where you are)
db → delete the previous word
d} → delete the paragraph (was really poorly written)
dd → delete the current line
You can delete multiple objects (characters, words, lines) this way:
3x → delete 3 characters
3dw → delete 3 words
d3w → same as above
3dd → delete 3 lines
You might have noticed how the deletion command (d) is combined with the commands to move around (w), (b), ( } ), and so on, and that you can make longer chains of commands.
There is some mnemonics in this. For example (d3w) reads "delete 3 words". The equivalent command (3dw) reads "thrice delete a word" (now you see why I chose 3 and not 27 for this example)
The commands x and dd do not just delete a character or a line. They also store them temporarily in memory (until you type another similar command).
This can be used if you want to move lines of text around. For example:
go to a line, enter command mode (ESC), cut it (dd) and place it in memory
move to another line of the text, press (p) to paste what you have in memory below the current line. Press (P) to paste it above the current line
The paste commands also work for characters cut with (x). Pressing (p) will paste the characters after the location of your cursor; (P) will paste them before it.
Do you want to copy a line three times? Simple, hit (3p)
You can enter INSERT mode by pressing (i), (I - capital i), (a), (A), (o) or (O). They differ in the following way:
o → add a new line after the one you are in and start inserting there
O → add a new line before the one you are in and start inserting there
i → start inserting right before the character where you are (insert)
a → start inserting right after the character where you are (append)
I → start inserting at the beginning of the line
A → start inserting at the end of the line
If this seems superfluous, think that the various insert commands can be used in macros. Not superfluous anymore, uh? BTW, (o) and (O) are very handy in everyday use.
OK, I lied again. Vim has also other operating modes, the VISUAL, VISUAL-LINE and VISUAL-BLOCK modes. The actual mode will be listed in the bottom line of your editor as usual.
You can enter them from COMMAND mode in the following way:
v → enters VISUAL mode
V → enters VISUAL-LINE mode
ctrl+V → enters VISUAL BLOCK mode
To exit a VISUAL mode, hit (ESC) twice
In any of the VISUAL modes, after entering the mode, you can move around the text with the arrows (or h j k l ). The selected text will be highlighted. Once you have selected what you want, use any of the commands you know to process that chunk of text (e.g., (y), (x), ...).
The way the text is selected in the different modes is better explained with some screenshots
The VISUAL BLOCK mode is extremely handy if you want to move around columns of data. See also the trick!
One of the most powerful capabilities of vim is the ability to use regular expressions (RegExp) for search/replace queries. Note that RegExp will pop up not only when using vim but also in different contexts. Knowing them can give you a great advantage.
To search for a pattern, hit ( / ) followed by the pattern you are searching for. So, if you look for the pattern 'cow', type /cow and vim will put the cursor at the first occurrence of the three letters "cow". Hit (n) and it will go to the next occurrence. Hit (N) and it will move to the previous one. If you want to match both 'cow' and 'wow', just use the dot (.) as a wildcard character, so type /.ow
In Ex-command mode, :s/<pattern-A>/<pattern-B>/g replaces <pattern-A> with <pattern-B>, as many times (g) as it finds the matching pattern.
For example:
That cow is a coward!
~
~
:s/cow/goat/g
yields
That goat is a goatard!
~
~
If you instead use the whole-word delimiter at the end (\>)
That cow is a coward!
~
~
:s/cow\>/goat/g
you get
That goat is a coward!
~
~
If you add 'c' (check) as the third element of the search command, vim will prompt at every occurrence to ask you if you want to replace or not. So, for example, the previous command will become :s/cow\>/goat/gc
Note: vim RegExp syntax is slightly different from other implementations (like Perl's). For example, to match "one or more" occurrences, Perl uses +, while in vim's default ("magic") mode you have to escape it as \+ (or start the pattern with \v to get the "very magic" mode, which is closer to Perl). The world is not perfect.
At the heart of every computer system lies machine language, the most basic form of code that the CPU can understand. Machine language consists entirely of binary digits (0s and 1s) and directly controls the hardware. However, writing programs in machine language is extremely difficult for humans due to its complexity and lack of readability.
An example of machine language might look like this:
10110000 01100001
This sequence of binary numbers could represent an instruction for the CPU:
The first part, 10110000, could represent an instruction, such as MOV (move data).
The second part, 01100001, could represent a memory address or a register, such as moving the value a (ASCII value 97 or 01100001 in binary) into a specific register.
In this case, the machine code might be interpreted by the CPU as: “Move the value 97 into a specific register.”
Since machine language directly interacts with the CPU, each set of binary instructions is architecture-specific, meaning it depends on the type of processor being used. For example, a machine code instruction for an Intel processor will differ from one for an ARM processor, even if they are performing similar operations.
While this example is very basic, a complete program in machine language would consist of hundreds or thousands of such binary instructions.
To make programming more manageable, assembly language was developed. Assembly language uses mnemonic codes and symbols to represent machine language instructions, making it a more human-readable form of the underlying binary code. Each assembly instruction corresponds directly to a machine language instruction, but it still requires intimate knowledge of the computer’s architecture. An assembler is used to convert assembly language into machine code that the CPU can execute.
Here’s a simple example of assembly language that moves the value 5 into a register and then adds 10 to it:
MOV AX, 5 ; Move the value 5 into the AX register
ADD AX, 10 ; Add the value 10 to the AX register
MOV AX, 5: This instruction moves the value 5 into the AX register (a general-purpose register in x86 architecture).
ADD AX, 10: This instruction adds the value 10 to whatever is already in the AX register. After this, AX will hold the value 15.
Each line corresponds directly to a CPU operation and is much easier to read than raw machine code, but it still requires understanding of the processor’s architecture and available registers. An assembler would convert this assembly code into machine language for the CPU to execute.
As computing evolved, the need for even more user-friendly and efficient programming methods led to the development of high-level languages (like C, Fortran, etc.). These languages are designed to be easier for humans to read, write, and understand, allowing programmers to focus on solving problems rather than managing hardware details.
A compiler plays a critical role here, as it translates high-level language code into assembly or directly into machine language, bridging the gap between human-readable instructions and the binary code that a computer’s processor can execute. This progression from machine language to high-level languages through the use of compilers and assemblers is what allows modern programming to be both powerful and accessible.
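You can watch this progression yourself with GCC on a Unix-like system; here is a sketch (the file name hello.c is just an example):
> cat hello.c
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
> gcc -S hello.c              # compile only: produces the assembly file hello.s
> gcc -c hello.s -o hello.o   # assemble: produces machine code in the object file hello.o
> gcc hello.o -o hello        # link: produces the executable hello
> ./hello
hello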
A compiler is a specialized software tool that translates source code written in high-level programming languages (like C, C++, or Fortran) into machine code or a lower-level language that a computer’s processor can execute.
The compiler performs several stages of processing, including lexical analysis, syntax analysis, semantic analysis, optimization, and code generation.
Compilers also check for errors in the source code, such as syntax errors or type mismatches, and produce error messages to help developers fix problems. Modern compilers can also optimize the generated machine code to improve the performance and efficiency of the resulting program.
One of the most widely used compilers is GCC (GNU Compiler Collection), which supports several languages, including C, C++, and Fortran. GCC is known for its flexibility, performance, and open-source nature, making it the go-to compiler for many operating systems, including Linux.
Key Features of Compilers:
Translation: The compiler’s primary role is to translate the source code into machine language. This is done in stages:
Lexical Analysis: Breaking the source code into tokens.
Syntax Analysis: Ensuring the code follows the grammar of the language.
Semantic Analysis: Checking for meaningful and logical consistency.
Optimization: Improving the performance and efficiency of the code.
Code Generation: Producing machine-level code or assembly code.
Error Detection: Compilers help detect syntax errors and other issues in the source code before execution. They provide detailed error messages that help developers find and fix problems.
Optimization: Modern compilers like GCC include optimization techniques to improve the performance of the generated machine code. For instance, they may reduce the number of instructions, remove unnecessary calculations, or improve memory access patterns.
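With GCC, for example, the optimization level is selected with the -O family of flags (the file names below are just examples):
> gcc -O0 -o integrate integrate.c   # no optimization: easiest to debug
> gcc -O2 -o integrate integrate.c   # the typical, well-tested optimization level
> gcc -O3 -o integrate integrate.c   # more aggressive optimizations (e.g. vectorization)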
We now want to write a numerical code to compute
$$\int_0^1 f(x)\,dx$$
with $f(x) = x^3$. Obviously, we already know the solution:
$$\int_0^1 x^3\,dx = \frac{1}{4}$$
[!TIP]
Knowing exactly what a numerical code (and, in particular, a small part of a huge code) is expected to do is a key aspect of numerical science (sometimes, it can be regarded as a privilege!). In this way, we are sure that (a portion of) the code we implemented works.
We will implement the numerical integration by using the trapezoidal rule.
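Splitting $[a,b]$ into $N$ trapezoids of width $p = (b-a)/N$, with nodes $x_i = a + i\,p$, the rule approximates the integral as
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{N-1} \frac{f(x_i) + f(x_{i+1})}{2}\, p$$
which is exactly what the pseudocode below computes.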
function f(x)
    return x^3
end function

function trapezoidal_rule(f, a, b, N)
    p = (b-a) / N
    sum = 0.
    do i = 0 to N-1
        x1 = a + i * p
        x2 = a + (i+1) * p
        sum = sum + (f(x1)+f(x2)) * p * 0.5
    end do
    return sum
end function

program
    a = 0
    b = 1
    N = 1000
    if N>0: print "The number of trapezoids used is N"
    if N<0: print "The number of trapezoids used is negative"; stop
    if N=0: print "The number of trapezoids used is zero"; stop
    result = trapezoidal_rule(f, a, b, N)
    print "The integral of f(x) = x^3 from a to b is approximately: result"
end program
[!TIP]
Before implementing the code in C, we notice that there is a more clever way of coding the function trapezoidal_rule calling the function only once per iteration, as the f(x2) term at iteration i is the same as the term f(x1) at iteration i+1, and the only cases when this is not true are the end points, which we can handle separately:
FUNCTION trapezoidal_rule(f, a, b, N)
    p = (b-a) / N
    sum = 0.5 * (f(a) + f(b))   # End points contribution
    DO i = 1 to N-1
        x = a + i * p
        sum = sum + f(x)
    END DO
    RETURN sum * p
END FUNCTION
This way we have almost halved the number of function evaluations (from 2N down to N+1), roughly doubling the numerical efficiency of our code!
[!IMPORTANT]
Often, directly translating a mathematical formula into an algorithm does not yield
the best performance, as in the case of the trapezoidal integration.
When coding, always think about whether there is a cleverer way of implementing your algorithms,
one that minimises the number of calculations and reduces reads and writes to memory.
We are now ready to look at the C implementation. We first list the complete code, but don't worry, we will analyse it line by line later:
#include <stdio.h>
#include <math.h>

// Define the function to integrate: f(x) = x^3
double f(double x) {
    return pow(x, 3); // pow(a,b) computes a^b
}

// Trapezoidal rule for numerical integration
double trapezoidal_rule(double a, double b, int n) {
    double p = (b - a) / n;           // Width of each trapezoid
    double sum = 0.5 * (f(a) + f(b)); // End points contribution
    for (int i = 1; i < n; i++) {
        double x = a + i * p;
        sum += f(x);
    }
    return sum * p;
}

int main() {
    double a = 0.0; // Lower limit of integration
    double b = 1.0; // Upper limit of integration
    int n = 1000;   // Number of trapezoids (higher n for better accuracy)

    printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);

    // Check if n is a valid number of trapezoids
    if (n > 0) {
        printf("The number of trapezoids (%d) is positive.\n", n);
    } else if (n < 0) {
        printf("Error: The number of trapezoids is negative.\n");
    } else {
        printf("Error: The number of trapezoids is zero.\n");
    }

    // Perform numerical integration
    double result = trapezoidal_rule(a, b, n);
    printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
    return 0;
}
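Assuming you save the code above in a file called, say, trapezoid.c (the name is just an example), compiling and running it with GCC would look like this (-lm links the math library needed by pow()):
> gcc trapezoid.c -o trapezoid -lm
> ./trapezoid
This program performs numerical integration of f(x) = x^3 from a = 0.00 to b = 1.00 using 1000 trapezoids.
The number of trapezoids (1000) is positive.
The integral of f(x) = x^3 from 0.00 to 1.00 is approximately: 0.25000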
The preprocessor is a program that processes source code files and is commonly used with C, C++, and other programming languages. It handles tasks like including files, expanding macros, and conditionally compiling parts of the code, making the source code easier to read and more compact. It is usually a separate program from the compiler, for practical and historical reasons. You can learn more about its basic use in the section on building C programs and about more advanced usage in the C preprocessor section.
Our code starts with two preprocessor directives:
#include <stdio.h>
#include <math.h>
The line #include <stdio.h> tells the compiler to include the so-called "header" file (hence the .h extension) with the definitions of several variable types, macros, and various input/output (I/O) functions, such as reading from the keyboard (scanf) or writing to the console (printf).
[!CAUTION]
TODO: add link to relevant section to preprocessing/headers/libraries
The math.h header file provides a variety of mathematical functions, such as sin(), cos(), sqrt(), pow(), and more.
[!CAUTION]
TODO: add link to advanced math libraries, MKL etc. and what to choose to be performant.`
Since we have included stdio.h we have access to the printf function in C, which is used to print formatted output to the terminal (what is called standard output, or stdout, in Unix-like operating systems).
The basic usage of printf can be shown with this example:
printf("The sum of %d and %d is %d \n", 1, 2, 1+2);
which produces the following output
The sum of 1 and 2 is 3
[!NOTE]
In C, the semicolon ; is used as a statement terminator. Every executable statement in C must end with a semicolon, whether it’s a variable declaration, function call, or control statement. It tells the compiler where one statement ends and the next one begins.
printf() is a special kind of function that accepts an arbitrary number of arguments separated by commas. As you might have noticed, there are special character sequences %d (called format specifiers) within the first argument (the string "The sum of..."), which are replaced by the values (1, 2, 1+2) passed as next arguments. The special "escape sequence" \n tells the code to insert a newline. We will encounter and discuss several other useful escape sequences later on.
[!NOTE]
In this case %d tells printf() to format the output as an integer. Here d stands for decimal - as opposed to the octal (%o) or hexadecimal (%x) bases used to represent integer numbers.
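For example, the same integer printed in the three bases:
printf("%d %o %x\n", 255, 255, 255);   /* prints: 255 377 ff */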
In the specific case of our example, we have:
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
The format specifiers here are used to interpret the following arguments as floating point real numbers with two decimals (%.2f) besides the usual decimal integer (%d).
[!NOTE]
The computer stores everything in memory as a binary sequence (e.g. 0011 0101 0101 1101). It's up to you/the code to decide how to interpret those sequences. It could be an integer (e.g. binary 1101 is 13 decimal, interpreted from right to left as 1 x $2^0$ + 0 x $2^1$ + 1 x $2^2$ + 1 x $2^3$ = 13) or a real number (in this case the representation changes depending on the precision we want for exponent and mantissa, more on this TODO).
[!CAUTION]
TODO: add link to relevant section to asm/representation
So the general form is printf("some text %format_specifier1 some other text %format_specifier2 ...", var1, var2). Here some examples of format specifiers:
%d: For printing integers (printf("N = %d \n", 5);).
%f: For printing floating-point numbers (printf("x = %f \n", 2.3456);).
%.2f: For printing floating-point numbers with 2 decimal places.
%c: For printing a single character (printf("The chosen letter is %c \n", 'A');).
%s: For printing strings (printf("My name is %s \n", "Alice");).
[!WARNING]
As you might have noticed, in C single characters are represented with single quotes ('A') whereas strings use double quotes ("A string"). You might wonder what's the difference between 'A' and "A": the fact is that a character is encoded as an 8-bit value. For example, 'A' is binary 0100 0001 which is decimal 65, while a string is a null-terminated sequence of characters, where null is binary 0000 0000, or zero decimal.
[!WARNING]
Ensure the format specifier corresponds to the type of the variable.
If you pass the wrong type to the printf() function you might get surprising results, because of the way the sequence of bits is interpreted. For example
printf("%d\n", 1.0); // note that 1.0 is stored in memory as a floating point number, not as a decimal number
-2084713672
Note that depending on you architecture and compiler you might get a different result.
Some useful escape characters:
\n: Inserts a newline.
\t: Inserts a tab space.
\\: Inserts a literal backslash.
%%: Prints a literal %.
[!TIP]
You can split a long printf statement across multiple lines by simply breaking the string and continuing it on the next line using the \ (backslash) character or just closing the string and concatenating it with the next part.
Examples:
printf("Once upon a time, in a galaxy far, far away, a programmer wrote a code so long, \
that even their cat, sitting on the keyboard, couldn't stop it from running perfectly.\n");
printf("Once upon a time, in a galaxy far, far away, a programmer wrote a code so long, "
"that even their cat, sitting on the keyboard, couldn't stop it from running perfectly.\n");
In C, variables are declared with specific data types, which determine the kind of data they can hold, the size they occupy in memory, and the operations that can be performed on them. Understanding types is fundamental to writing efficient and correct programs.
Some of the basic data types are:
int (4 bytes, i.e. 32 bits, even on 64-bit machines!): Represents integer numbers, both positive and negative:
int age = -25;
The largest representable signed integer number is $2^{31} - 1$, or about 2 billion (one of the 32 bits is used for the sign).
char (1 byte): Represents a single character. Use single quotes ' for a character (double quotes " are for strings) :
char letter = 'A';
[!WARNING]
TODO: add a link to an advanced page that discusses type representation
If you ever wonder how many bytes are used to store a variable, you can use the sizeof() operator to determine the size of any type or variable in bytes. For example:
int a;
printf("Size of int: %d bytes\n", sizeof(a)); // Typically prints 4. More on printf later
Types can be automatically or explicitly converted in C:
Explicit type conversion (also called "casting") is when you manually convert a value from one type to another using the casting syntax.
Implicit type conversion (also called "type promotion"): the compiler automatically converts one data type to another when necessary. In expressions that involve different types (e.g., int and double), the smaller type is automatically promoted to the larger type.
The following example shows both kinds of conversion:
double pi;
int truncated;
pi = 4 * atan(1.); /* atan() is declared in math.h */
/* Explicit conversion of 'pi' (double) to 'int': */
truncated = (int)pi;
printf("The integer part of Pi is: %d\n", truncated); /* Output: 3 */
/* Implicit conversion of 'truncated' (int) back to 'double': */
pi = truncated;
printf("Result: %.2f\n", pi); /* Output: 3.00 */
[!WARNING]
Implicit conversion can lead to precision loss: when converting from a larger or more precise type (like double or long) to a smaller or less precise type (like int or float), the conversion can result in a loss of precision or data.
double pi = 3.14159;
int value = pi; // Implicit conversion truncates decimal part
The first line double f(double x) defines a function named f that takes a single argument x of type double (a floating-point number with double precision). The function is also declared to return a double value, which means that the result of the function will be a floating-point number.
The curly braces { and } mark the beginning and end of the function body. Everything inside the curly braces is part of the logic that the function will execute when called.
The line return pow(x, 3); returns the result of pow(x, 3). The pow(a, b) function, from the math.h library, raises the base a to the power of b. The result is returned as the output of the function f.
The line // pow(a,b) computes a^b is a comment explaining that the pow(a, b) function computes a raised to the power of b. Comments are ignored by the compiler and are meant to make the code easier to understand for humans.
[!NOTE]
When a variable is "passed" as an argument to a function in C, it's the value of this variable that's actually passed to the function.
This means that if you change the value of a variable passed to a function it won't change its value outside of it. This will be discussed in
TODO add link to chapter C_reference_vs_value.md
The following code block shows an example of an if statement:
if (n > 0) {
printf("The number of trapezoids is positive.\n", n);
} else if (n < 0) {
printf("Error: The number of trapezoids is negative.\n");
} else {
printf("Error: The number of trapezoids is zero.\n");
}
The if statement in C allows you to control the flow of your program based on conditions. It checks if a certain condition is true and executes a block of code accordingly. If the condition is false, it moves to the next condition or the else block.
if (n > 0):
This condition checks if the value of n is greater than 0.
If true, the program executes the code inside this block: it prints “The number of trapezoids is positive.”.
If false, it moves to the next condition.
else if (n < 0):
This checks if n is less than 0.
If true, the program prints an error message: “Error: The number of trapezoids is negative.”
If false, the program moves to the final else block.
else:
This block is executed if none of the previous conditions were true. Since the only other possibility is that n equals 0, the program prints: “Error: The number of trapezoids is zero.”
You can combine multiple conditions using logical && (AND) and || (OR) to make your checks more compact.
if (n > 0 && n < 100) {
printf("n is positive and less than 100.\n");
}
[!TIP]
For simple if-else conditions, consider using the ternary operator ?:. This can make the code more concise but should be used sparingly to avoid reducing readability.
int result = (n > 0) ? 1 : -1; // If n > 0, result is 1; otherwise, it's -1
[!WARNING]
Be Careful with Floating-Point Comparisons! Comparing floating-point numbers using == can be unreliable due to precision issues. Use a small threshold instead of direct equality checks.
if (fabs(a - b) < 0.00001) { // a and b are "close enough"
// Handle case where a is approximately equal to b
}
In the second function defined in the example code, we can see an example of for loop:
for (int i = 1; i < n; i++) {
double x = a + i * p;
sum += func(x);
}
The first line for (int i = 1; i < n; i++) sets up a for loop that will iterate from i = 1 to i < n. The loop does the following:
int i = 1; initializes the loop counter i to 1.
i < n; specifies the condition under which the loop continues to run. The loop will keep running as long as i is less than n.
i++ increments i by 1 after each iteration of the loop.
The curly braces { and } define the block of code that will be repeatedly executed for each iteration of the loop.
[!NOTE]
In C, operators like +=, -=, *=, /=, and %= are called compound assignment operators. They provide a shorthand way to perform an operation and then assign the result back to the variable (e.g., a+=1 is equivalent to a = a+1)
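A small illustration of a few compound assignment operators:
int a = 10;
a += 5; // a = a + 5 -> a is now 15
a *= 2; // a = a * 2 -> a is now 30
a %= 7; // a = a % 7 -> a is now 2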
The main function is a special one (with a reserved name) which serves as the entry point for the execution of the program. This means that if a main function is present, the compiler can turn the code into an executable, and the execution will begin from there:
[!NOTE]
This is at odds with many other programming languages that use different syntaxes to define their entry point,
like Fortran's or Pascal's program. In C, the function is the basic element used to organise ("structure") the
code, and this is why it is an example of a so-called structured programming language.
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n);
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
return 0;
}
main(){...} is the entry point of every C program. When you run the program, the code inside the main function is executed. The return type int indicates that the program will return an integer value when it finishes running (in this case, it returns 0 at the end to signal successful completion).
The line double result = trapezoidal_rule(f, a, b, n); calls the trapezoidal_rule function to perform the integration of f(x)=x3 from a = 0.0 to b = 1.0 using n=1000 trapezoids. The result of the integration is stored in the variable result.
You might have noticed that if you want to integrate a different function you need to change the definition of the function f. There are more flexible approaches: in C, it is possible to pass a function (like f) as an argument to another function (like trapezoidal_rule). This means that instead of passing a value, we tell the compiler where to find the function f. We will discuss this approach when talking about pointers in TODO Add links to relevant sections on pointers.
We slightly modify the example code written to numerically evaluate an integral with the trapezoid rule:
#include <stdio.h>
#include <math.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double x_values[], double f_values[]) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
// Store the x values and function evaluations in arrays
for (int i = 1; i < n; i++) {
x_values[i] = a + i * p;
f_values[i] = func(x_values[i]); // Store the function evaluation
sum += f_values[i];
}
return sum * p;
}
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Check if n is a valid number of trapezoids
if (n > 0) {
printf("The number of trapezoids is positive.\n");
} else if (n < 0) {
printf("Error: The number of trapezoids is negative.\n");
return 1;
} else {
printf("Error: The number of trapezoids is zero.\n");
return 1;
}
// Arrays to store x values and function evaluations at those points
double x_values[n]; // Array to store x points
double f_values[n]; // Array to store function evaluations f(x)
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n, x_values, f_values);
// Print the result of the integration
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
// Optionally, print out the x values and their corresponding f(x) values
printf("x values and f(x) evaluations:\n");
for (int i = 1; i < n; i++) {
printf("x[%d] = %.5f, f(x[%d]) = %.5f\n", i, x_values[i], i, f_values[i]);
}
return 0;
}
Two arrays are now introduced in the main() function:
x_values[n]: This array stores the x values where the function f(x) is evaluated during the integration.
f_values[n]: This array stores the computed values of the function f(x) at those x values.
In C, an array is a collection of elements of the same type stored in contiguous memory locations. Arrays are useful when you need to store multiple values of the same type and access them using an index.
type array_name[size];
where type is the data type of the elements (e.g., int, double, char, etc.), array_name is the name of the array and size is the number of elements in the array.
[!WARNING]
Indexing starts at 0: the first element is accessed using arr[0], not arr[1].
Accessing an index outside the array size (e.g., arr[5] when the array has only 5 elements) results in undefined behavior.
The function trapezoidal_rule() is modified to take two additional array parameters (x_values[] and f_values[]).
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double x_values[], double f_values[])
Inside the for loop, each x value is stored in x_values[], and the corresponding function evaluation is stored in f_values[].
for (int i = 1; i < n; i++) {
x_values[i] = a + i * p;
f_values[i] = func(x_values[i]); // Store the function evaluation
sum += f_values[i];
}
After the integration, the code optionally prints the stored x values and their corresponding function evaluations f(x).
printf("x values and f(x) evaluations:\n");
for (int i = 1; i < n; i++) {
printf("x[%d] = %.5f, f(x[%d]) = %.5f\n", i, x_values[i], i, f_values[i]);
}
Initializing an array means assigning initial values to its elements at the time of declaration. Proper initialization is crucial to avoid unpredictable behavior caused by garbage values.
When you declare an array, you can initialize it in several ways:
Full Initialization: you provide explicit values for all elements. int arr[3] = {1, 2, 3}; // Array of 3 integers, initialized to {1, 2, 3}
Partial Initialization: you provide values for some elements. The remaining elements are automatically initialized to 0. int arr[5] = {1, 2}; // Array of 5 integers, initialized to {1, 2, 0, 0, 0}
Zero Initialization: initializing only the first element to 0 sets the whole array to 0, since (as with any partial initialization) the remaining elements are automatically set to 0. int arr[4] = {0}; // Array of 4 integers, initialized to {0, 0, 0, 0}
[!WARNING]
In C, arrays are not automatically initialized when declared. If you declare an array without explicitly initializing it, the array elements will contain garbage values (random values left in memory). This can lead to unpredictable behavior in your program.
Remember to always initialise them!
Here is an example of array declaration and initialisation:
#include <stdio.h>
int main() {
float a[3]; // Uninitialized float array of size 3
int b[5], c[5]; // Uninitialized int arrays of size 5
float d[3] = {1.2, 2.0, 3.7}; // Initialized float array of size 3
int e[7] = {1}; // Initialized int array of size 7 (first element is 1, rest are 0)
return 0;
}
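Arrays can also have more than one dimension. The following example declares a 2D array, prints its elements, and passes it to functions that read and modify it: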
#include <stdio.h>
#define ROWS 3
#define COLS 4
// Function that takes a 2D array as a parameter
void print_array(int arr[ROWS][COLS]) {
printf("Array elements:\n");
for (int i = 0; i < ROWS; i++) {
for (int j = 0; j < COLS; j++) {
printf("%d ", arr[i][j]);
}
printf("\n");
}
}
// Function to increment each element by 1
void increment_array(int arr[][COLS], int rows_local) {
for (int i = 0; i < rows_local; i++) {
for (int j = 0; j < COLS; j++) {
arr[i][j] += 1;
}
}
}
int main() {
// Initialize a 2D array
int arr[ROWS][COLS] = {
{1, 2, 3, 4},
{5, 6, 7, 8},
{9, 10, 11, 12}
};
// Call the function to print the array
print_array(arr);
// Increment the elements
increment_array(arr, ROWS);
// Call the function again to print the modified array
printf("\nAfter incrementing each element by 1:\n");
print_array(arr);
return 0;
}
In C programming, #define is a preprocessor directive used to create symbolic constants or macros. It allows you to define names or constants that will be replaced by their corresponding values or expressions during the preprocessing stage, before the actual compilation of the code. This can help improve code readability, maintainability, and flexibility.
In our example:
#define ROWS 3
#define COLS 4
we are defining the number of rows and columns for the 2D array. Whenever ROWS (or COLS) appears in the code, it gets replaced with the value 3 (or 4).
In the declaration of increment_array,
void increment_array(int arr[][COLS], int rows_local)
you need to include one set of square brackets [] for each dimension of the array. For example, a 3D array would be passed as:
void function(int arr[][M][L], int N)
The size of the first dimension can be left empty, but the sizes of all subsequent dimensions (from the second one onward) must be explicitly declared. Since the size of the first dimension is not automatically known inside the function, you must pass it as an additional argument.
In our example, the variable rows_local was used for instructional purposes. However, a better approach would be to follow the method used in the print_array function, where macros are utilized for clarity:
void print_array(int arr[ROWS][COLS])
This way, ROWS and COLS are predefined constants (by using #define), which makes the code more readable and maintainable.
Accessing elements beyond the array limits leads to undefined behavior. In some cases, it may appear to work, but it may also crash the program or corrupt data.
int arr[5] = {1, 2, 3, 4, 5};
printf("Valid: %d\n", arr[4]); // Defined behavior
printf("Invalid: %d\n", arr[5]); // Undefined behavior: could crash or print garbage
The sizeof operator returns the size of a pointer when used on array parameters.
void print_size(int arr[]) {
printf("Size: %lu\n", sizeof(arr)); // Typically 8 bytes on 64-bit systems
}
int main() {
int data[10];
print_size(data);
return 0;
}
Output:
Size: 8
To get the actual size, pass it as a separate argument:
void print_size_fixed(int arr[], int size) {
printf("Size: %d\n", size);
}
We now propose a third variation of the example code, building on the version in which arrays were used. We read the number of trapezoids from the command line, and we dynamically allocate the memory to store x_values and f_values (if you didn't read the previous examples, don't worry: you will easily understand what these variables are):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double *x_values, double *f_values) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
// Store the x values and function evaluations in arrays
for (int i = 1; i < n; i++) {
x_values[i] = a + i * p;
f_values[i] = func(x_values[i]); // Store the function evaluation
sum += f_values[i];
}
return sum * p;
}
int main(int argc, char *argv[]) {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
// Ensure there is at least one command line argument for the number of trapezoids
if (argc < 2) {
fprintf("Usage: %s <number_of_trapezoids>\n", argv[0]);
return 1;
}
// Read the number of trapezoids from the command line
int n = atoi(argv[1]);
// Check if n is a valid number of trapezoids
if (n <= 0) {
fprintf("Error: The number of trapezoids must be a positive integer.\n");
return 1;
}
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Allocate memory for x_values and f_values dynamically
double *x_values = malloc(n * sizeof(double));
double *f_values = malloc(n * sizeof(double));
if (x_values == NULL || f_values == NULL) {
fprintf("Memory allocation failed\n");
return 1;
}
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n, x_values, f_values);
// Print the result of the integration
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
// Optionally, print out the x values and their corresponding f(x) values
printf("x values and f(x) evaluations:\n");
for (int i = 1; i < n; i++) {
printf("x[%d] = %.5f, f(x[%d]) = %.5f\n", i, x_values[i], i, f_values[i]);
}
// Free allocated memory
free(x_values);
free(f_values);
return 0;
}
In C, argc and argv are used to handle command line arguments passed to a program. They provide a way to pass information from the command line when starting the program.
argc (Argument Count):
Definition: argc is an integer that holds the number of command line arguments passed to the program, including the program’s name.
Example: If a program is executed with the command ./program arg1 arg2, argc will be 3 (the program name and two arguments).
argv (Argument Vector):
Definition: argv is an array of strings (character arrays) where each string is one of the command line arguments.
Example: For the same command ./program arg1 arg2, argv[0] will be "./program", argv[1] will be "arg1", and argv[2] will be "arg2".
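As a minimal, self-contained sketch (separate from the integration example), a program can simply echo all of its arguments:
#include <stdio.h>
int main(int argc, char *argv[]) {
// Print every command line argument, including the program name in argv[0]
for (int i = 0; i < argc; i++) {
printf("argv[%d] = %s\n", i, argv[i]);
}
return 0;
}
Running it as ./echo_args one two would print argv[0] = ./echo_args, argv[1] = one and argv[2] = two.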
In our example:
int main(int argc, char *argv[]) {
...
// Ensure there is at least one command line argument for the number of trapezoids
if (argc < 2) {
fprintf("Usage: %s <number_of_trapezoids>\n", argv[0]);
return 1;
}
Assuming the executable is called test, the code must be run as ./test <n>, where n is the number of trapezoids (for example, ./test 1000 to use n=1000 trapezoids).
The if statement (if (argc < 2)) checks if the correct number of arguments has been provided.
In C, atoi (ASCII to Integer) is a standard library function used to convert a string representing a number into its integer value. It is part of the stdlib.h library:
int atoi(const char *str);
where str is a pointer (read below) to a null-terminated string that contains the representation of an integer.
The input is a string containing numeric characters, optionally with leading white spaces. The output is the integer value corresponding to the numeric characters in the string.
[!WARNING]
Remember to add `#include <stdlib.h>`.
Non-Numeric Strings: If the string does not represent a valid number, atoi will return 0. For instance, atoi("abc") will return 0.
Error Handling: atoi does not provide error handling for invalid input or overflow.
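If you need to detect invalid input, the standard library function strtol (also declared in stdlib.h) is a more robust alternative, since it reports where parsing stopped; a minimal sketch:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
if (argc < 2) return 1;
char *end;
long n = strtol(argv[1], &end, 10); // convert in base 10; 'end' points past the last digit parsed
if (end == argv[1] || *end != '\0') { // no digits at all, or trailing junk after the number
fprintf(stderr, "'%s' is not a valid integer\n", argv[1]);
return 1;
}
printf("Parsed value: %ld\n", n);
return 0;
}
(For overflow detection one can additionally check errno against ERANGE after the call.)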
[!TIP]
Always check the input. In the example code, we have:
if (n <= 0) {
fprintf("Error: The number of trapezoids must be a positive integer.\n");
return 1;
}
In the version of the code with arrays, we declared the arrays as:
double x_values[n]; // Array to store x points
double f_values[n]; // Array to store function evaluations f(x)
with a size that was fixed in the source code. Now we do not know the size of these arrays a priori (n is only given at run time, from the command line). To overcome this issue, we make use of pointers and dynamic memory allocation.
In C, pointers are a fundamental feature that allows you to directly interact with memory. A pointer is a variable that holds the memory address of another variable. Instead of storing a value directly, a pointer stores the location where the value is stored.
To dynamically allocate memory, we use the function malloc (memory allocation), which allocates a specified number of bytes of memory and returns a pointer to the beginning of this memory block:
double *x_values = malloc(n * sizeof(double));
This declares a pointer x_values that can point to a double type. x_values will be used to store the address of dynamically allocated memory that will hold an array of double values.
[!NOTE]
When declaring a pointer, the asterisk * indicates that the
variable being declared is a pointer. In our example, it tells the compiler
that x_values is a pointer and it will store the address of a
double variable or an array of double values. In expressions, the asterisk
acts as the dereference operator: it accesses the value stored at the memory
address that the pointer is pointing to. In our case, *x_values
refers to the double value stored at the address contained in
x_values.
The argument of malloc is the correct number of bytes: indeed, it is given by the number of double elements n multiplied by the size (in bytes) of each double element (that is, n * sizeof(double)).
The pointer returned by malloc is assigned to x_values. This pointer now points to the start of a block of memory large enough to hold n double values. You can access and manipulate these values using pointer arithmetic or array indexing.
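For example, a small fragment (assuming stdlib.h has been included) showing that the two notations address the same elements:
double *v = malloc(3 * sizeof(double));
if (v != NULL) {
v[0] = 1.0; // array indexing
*(v + 1) = 2.0; // pointer arithmetic: the same as v[1] = 2.0
v[2] = v[0] + *(v + 1); // mix of the two
free(v);
}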
[!TIP]
Check for NULL: It is important to check if malloc returns NULL, which indicates that
the memory allocation has failed. In robust code, you would include a check to handle this case:
if (x_values == NULL) {
fprintf("Memory allocation failed\n");
return 1; // Exit or handle the error appropriately
}
Once the pointers x_values and f_values have been allocated, they are passed to the function trapezoidal_rule exactly as we did for the arrays. The difference now is in how the function trapezoidal_rule gets them as input variables:
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double *x_values, double *f_values) {
[!NOTE]
Function Pointer: double (*func)(double). The asterisk denotes that func is a pointer to a function that takes a double argument and returns a double. It enables the function to be passed as an argument or stored in a variable.
[!NOTE]
Why Use Pointers?
Efficiency: Pointers allow functions to modify variables directly without copying them.
Dynamic Memory Management: Pointers are essential for allocating and deallocating memory dynamically.
Data Structures: Many data structures like linked lists and trees use pointers to manage and navigate data.
When working with pointers, two key operators in C are the asterisk (*) and the ampersand (&). They play crucial roles in accessing and manipulating memory addresses.
The * operator is used to dereference a pointer, which means accessing the value stored at the memory address the pointer holds.
int x = 10;
int *ptr = &x; // ptr holds the address of x
printf("%d\n", *ptr); // Dereferencing ptr gives the value of x (10)
In this example, ptr stores the address of x, *ptr accesses the value at that address (which is 10 in this case).
Dereferencing allows you to manipulate or read the value stored at the location the pointer is pointing to.
The & operator is used to get the memory address of a variable.
[!WARNING]
& gives the address of the variable, not its value!
int x = 10;
int *ptr = &x; // &x gives the address of x, which is stored in ptr
In the last example, &x returns the memory address where the variable x is stored. This address is assigned to ptr, making ptr a pointer to x.
[!NOTE]
Pointers (*) and addresses (&) are closely related:
Use & to get the address of a variable.
Use * to access (or modify) the value at the address stored in a pointer.
In C, not only can you create pointers to variables, but you can also create pointers to functions. This feature allows for more flexibility in programming, especially when you need to pass functions as arguments to other functions, or when you want to select a specific function at runtime.
A function pointer is a variable that stores the address of a function.
Indeed, every variable and instruction in a program has a memory address. Similarly, a function’s entry point has its own memory address, which is linked to the function’s name (this is comparable to arrays, where the name of the array represents the address of its first element).
Just like any other pointer, you can dereference a function pointer to call the function it points to.
In our example, we have this function:
double f(double x)
and the corresponding function pointer is:
double (*func)(double);
Here, func is a pointer to a function that takes one double argument and returns a double. In the above code line, the pointer has not been assigned yet. To assign a function to a function pointer, you simply use the function’s name (without parentheses), which is equivalent to the function’s address:
func = f;
At this point, one can call the function through the pointer. You need to dereference the pointer just like with regular pointers, but with function call syntax:
double result = (*func)(2.3);
Alternatively, you can also call the function without dereferencing explicitly:
double result = func(2.3);
In our example, in the main, we are passing the function f as an argument:
double result = trapezoidal_rule(f, a, b, n, x_values, f_values);
Remember that the name of the function stores the memory address of the function itself. This means that the function trapezoidal_rule must be defined as:
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double *x_values, double *f_values)
Below is an example that shows how function pointers can be used. In this case, we have two functions: add and subtract. We then use a function pointer to select which function to call based on a condition.
#include <stdio.h>
// Functions
int add(int a, int b) {
return a + b;
}
int subtract(int a, int b) {
return a - b;
}
int main() {
int (*operation)(int, int); // Declare a function pointer
int x = 10, y = 5;
char op;
printf("Enter operation (+ or -): ");
scanf(" %c", &op);
// Assign function to pointer based on user input
if (op == '+') {
operation = add;
} else if (op == '-') {
operation = subtract;
} else {
printf("Invalid operation\n");
return 1;
}
// Call the function via pointer
int result = operation(x, y);
printf("Result: %d\n", result);
return 0;
}
In C, the ** symbol is used when you are dealing with pointers to pointers. This means you have a pointer that points to another pointer, which in turn points to some data. It’s often used in more complex scenarios such as dynamic memory allocation for multi-dimensional arrays, handling arrays of strings (array of character pointers), or passing pointers by reference to functions.
int x = 5;
int *ptr = &x; // ptr is a pointer to x (an int)
int **ptr2 = &ptr; // ptr2 is a pointer to ptr (a pointer to int)
printf("%d\n", **ptr2); // Dereferencing twice gives the value of x (5)
In this example:
ptr is a pointer to x: it stores the address of the integer x.
ptr2 is a pointer to ptr, so it stores the address of the pointer ptr.
When you dereference ptr2 once (i.e., *ptr2), you get the value of ptr, which is the address of x.
When you dereference it twice (i.e., **ptr2), you get the value stored in x.
A typical example is passing a pointer by reference, so that the called function can change what the caller's pointer points to:
#include <stdio.h>
void modifyPointer(int **ptr) {
static int y = 100;
*ptr = &y; // Modify the pointer itself to point to y
}
int main() {
int x = 10;
int *p = &x;
modifyPointer(&p); // Pass the address of p (pointer to pointer)
printf("%d\n", *p); // Outputs 100, since p now points to y
}
Note that the function trapezoidal_rule is now defined as follows:
double trapezoidal_rule(double (*func)(double), double a, double b, int n, double *x_values, double *f_values)
Some of the arguments are pointers (double *x_values and double *f_values). In C, functions receive a copy of their arguments (the so-called formal arguments). Any modification to these formal arguments acts only on the local copy. For example, if we change the value of a inside the function trapezoidal_rule, the value of a remains unchanged in the main program: we are just modifying the function's copy of a.
On the other hand, if we pass a pointer to a function, like double *x_values, the function creates a local copy of the pointer (which stores the address of the data, not the data itself). This means that if we change x_values itself (i.e., the address it holds), we are only modifying the local copy inside the function. But if we change the value stored at that address (i.e., *x_values), the change will be seen also outside of trapezoidal_rule, because we are directly changing the value stored at that memory location.
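A minimal sketch (with hypothetical function names, not part of the example code) illustrating the two situations:
#include <stdio.h>
void reassign(double *p) {
static double other = 99.0;
p = &other; // changes only the local copy of the pointer: invisible to the caller
}
void write_through(double *p) {
*p = 7.0; // changes the value stored at the caller's address
}
int main(void) {
double x = 1.0;
reassign(&x);
printf("%f\n", x); // still 1.000000
write_through(&x);
printf("%f\n", x); // now 7.000000
return 0;
}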
Simply put: there are no strings in C (C++ is a different story).
Strings in C are just sequences of char terminated by the special character \0 (that is, the binary zero).
This creates much confusion for beginners, especially if they are used to the treatment of strings in higher-level languages.
Here, we will try to shed some light through a series of examples. To clearly understand some of the differences between pointers and arrays used to
handle strings, we will have to check where they are stored in memory.
First, this is the typical layout of the memory in Linux:
The source file is a bit long, you can download it from the previous link.
Here we analyze its various sections:
#include <stdio.h>
#include <stdlib.h>
int main(void){
char a[10] = "Test1"; // this is writable, at most 10 char (80 bytes)
char *b = "Test2"; // this is read-only: the string literal lives in the
// read-only (text) data segment of the program and
// b simply points to it. The pointer itself can be
// reused in the same way as char *d below.
char c[] = "Test3"; // this is writable, will be allocated at
// run time reserving the required space
char *d ; // this points to an arbitrarily-long sequence
// of chars and its target needs to be allocated,
// either statically or dynamically.
char (*e)[10] ; // this is a pointer to an array of 10 chars; e will point
// to a memory region containing the string.
char *f[10] ; // these are 10 pointers to char, also more clearly written
// as (char*) f[10]
char ** g ; // this is a pointer to a pointer, similar to what we have with e,
// with the difference that what's pointed by **g (*g) is a pointer itself
// and can be allocated.
These are different ways of using pointers and arrays to represent and handle strings (one or more).
We will now see some examples of how to use them, and how and where the memory necessary to hold the content
of the strings is allocated.
The case char a[10] = "Test1";
printf("char a[10] = \"Test1\"\n");
printf("a points to the string: '%s' starting at %p (note the range of address 0x7, on the stack!)\n",a,a);
printf("a[4] = 'X'\n");
a[4]='X';
printf("a points to the string: '%s' starting at %p (note the same address) \n\n",a,a);
the output shows that the memory pointed at by a is writable: we can manipulate the 5th character in the string
without causing problems (it's allocated on the stack, see the address range 0x7):
char a[10] = "Test1"
a points to the string: 'Test1' starting at 0x7ffc385f6abe (note the range of address 0x7, on the stack!)
a[4] = 'X'
a points to the string: 'TestX' starting at 0x7ffc385f6abe (note the same address)
The case char *b = "Test2";
printf("char *b = \"Test2\"\n");
printf("b points the string: '%s' starting at %p \n",b,b);
printf("b = \"TestY\"\n");
b = "TestY";
printf("b points to the string: '%s' starting at %p (note the change of address and range, 0x5, on the heap!)\n\n",b,b);
// b[4]='X'; // this would segfaults because b is stored in the read-only memory section
printf("b[4]='X' would segfault, as b points to memory in the read-only text segment!\n");
printf("Note that b itself is at the address &b -> %p on the stack! It's the string %s pointed by b that's on the heap...\n\n",&b,b);
In this case, the memory is not writable: the string literal is stored in the read-only (text) segment of the program image rather than on the stack (note the 0x5 address range):
char *b = "Test2"
b points the string: 'Test2' starting at 0x560bf3c7a008
b = "TestY"
b points to the string: 'TestY' starting at 0x560bf3c7a127 (note the change of address and range, 0x5, on the heap!)
b[4]='X' would segfault, as b points to memory in the read-only text segment!
Note that b itself is at the address &b -> 0x7ffc385f6a40 on the stack! It's the string TestY pointed by b that's on the heap...
The case char c[] = "Test3";
printf("char c[] = \"Test3\"\n");
printf("c points to the string: '%s' starting at %p (note the address range, 0x7, on the stack! which is writable - compare with b)\n\n",c,c);
printf("c[4] = 'Z'\n");
c[4]='Z';
printf("c points to the string: '%s' starting at %p (note the same address, it's writable )\n\n",c,c);
In this case the string is allocated on the stack. In this sense, char *b and char c[] are different: it is not true, as often wrongly
claimed, that an array (char c[]) behaves exactly like a pointer (char *b):
char c[] = "Test3"
c points to the string: 'Test3' starting at 0x7ffc385f6ab8 (note the address range, 0x7, on the stack! which is writable - compare with b)
c[4] = 'Z'
c points to the string: 'TestZ' starting at 0x7ffc385f6ab8 (note the same address, it's writable )
The pointer to char case: char *d;
printf("d = &a[0]\n");
d = &a[0]; // d now points to a memory region that has been allocated statically (a).
// Equivalently, we could have written d = a ;
printf("d points to the string: '%s' starting at %p, which is the same as that of a (%p) \n",d,d,&a);
printf("d = a\n");
d = a; // d now points to a memory region that has been allocated statically (a).
printf("d points to the string: '%s' starting at %p, which is the same as that of a (%p) \n\n",d,d,&a); // note &a
printf("d = b\n");
d = b; // d now points to a memory region that has been allocated statically (b). Note
// that we do not need to pass the address of the first element of b (although we could).
printf("d points to the string: '%s' starting at %p, which is the same as that of b (%p) \n\n",d,d,b); // note b, not &b
printf("d = (char*) malloc(10* sizeof(char))\n");
d = (char*) malloc(10* sizeof(char)); // d now points to a memory region that has been allocated dynamically.
printf("d[0]='\\0'\n");
d[0]='\0'; // right after malloc the memory pointed to by d is uninitialised: printf() would go on
// printing until it finds a '\0' in memory! So we initialise it here to
// the empty string by setting its first character d[0].
//
printf("d points to the string: '%s' starting at %p \n\n",d,d); // empty
//
printf("sprintf(d,\"%%s\",\"Test3\")\n");
sprintf(d,"%s","Test3"); // with sprintf() we modify d element-by-element, address stays the same
printf("d points to the string: '%s' starting at %p (note the same address)\n\n",d,d);
printf("d = \"Test3\"\n");
d = "Test3"; // this way memory is allocated automatically as read-only
// d[4]='X'; // this would segfault because d now is read-only
printf("d points the string: '%s' starting at %p (note the change of address, now on the read-only text-segment)\n\n",d,d);
d = &a[0]
d points to the string: 'TestX' starting at 0x7ffc385f6abe, which is the same as that of a (0x7ffc385f6abe)
d = a
d points to the string: 'TestX' starting at 0x7ffc385f6abe, which is the same as that of a (0x7ffc385f6abe)
d = b
d points to the string: 'TestY' starting at 0x560bf3c7a127, which is the same as that of b (0x560bf3c7a127)
d = (char*) malloc(10* sizeof(char))
d[0]='\0'
d points to the string: '' starting at 0x560bf3c7d6b0
sprintf(d,"%s","Test3")
d points to the string: 'Test3' starting at 0x560bf3c7d6b0 (note the same address)
d = "Test3"
d points the string: 'Test3' starting at 0x560bf3c7a506 (note the change of address, now on the read-only text-segment)
The case char (*e)[10] ;
printf("e = &a\n");
e = &a;
printf("*e points to the string: '%s' starting at %p, which is the same as a (%p) \n",*e,e, a);
printf("Note the dereferencing operator * to obtain the content pointed by e\n\n");
printf("e[0][4] = 'Y'\n");
e[0][4]= 'Y' ;
printf("e is 'writable' in the sense that it's just a pointer to a writable memory area\n\n");
e = &a
*e points to the string: 'TestX' starting at 0x7ffc385f6abe, which is the same as a (0x7ffc385f6abe)
Note the dereferencing operator * to obtain the content pointed by e
e[0][4] = 'Y'
e is 'writable' in the sense that it's just a pointer to a writable memory area
The case char *f[10];
printf("f[0] = \"Test5\"\n");
printf("f[9] = \"Test6\"\n");
f[0] = "Test5";
f[9] = "Test6";
printf("f[0] points to the string: '%s' starting at %p\n",f[0],f[0]);
printf("its elements '%c' '%c' '%c' .... are at offsets of %ld bytes\n",f[0][0],f[0][1],f[0][2],
&f[0][1]-&f[0][0]);
printf("f[9] points to the string: '%s' starting at %p\n\n",f[9],f[9]);
printf("At this point, calling f[1] would most likely segfault, because we have not initialized it!\n\n");
printf("We can reuse each of the 10 pointers f[0]...f[9] to point somewhere,\n");
printf("for example to some dynamically allocated memory:\n");
f[0] = (char*) malloc(10*sizeof(char));
sprintf(f[0],"%s","Test7");
printf("f[0] = (char*) malloc(10*sizeof(char))\n");
printf("sprintf(f[0],\"%%s\",\"Test7\")\n");
printf("f[0] points to the string: '%s' starting at %p\n\n",f[0],f[0]);```
f[0] = "Test5"
f[9] = "Test6"
f[0] points to the string: 'Test5' starting at 0x560bf3c7a6f7
its elements 'T' 'e' 's' .... are at offsets of 1 bytes
f[9] points to the string: 'Test6' starting at 0x560bf3c7a6fd
At this point, calling f[1] would most likely segfault, because we have not initialized it!
We can reuse each of the 10 pointers f[0]...f[9] to point somewhere,
for example to some dynamically allocated memory:
f[0] = (char*) malloc(10*sizeof(char))
sprintf(f[0],"%s","Test7")
f[0] points to the string: 'Test7' starting at 0x560bf3c7d6d0
The case of the pointer to pointer char ** g;
//*g = "Test8"; // this would segfault because **g is not pointing to allocated memory yet.
g = (char**) malloc(10 * sizeof(char*));
*g = "Test8";
printf("*g points to the string: '%s' starting at %p\n", *g, *g);
printf("This is the same as &g[0][0] (%p), or, equivalently g[0] (%p), clear, uh?\n\n",&g[0][0],g[0]);
printf("Since we have allocated space for 10 pointers, we can use also, e.g., the ninth:\n");
g[0] = "Test9";
g[9] = "TestA";
printf("g[0] points to the string: '%s' starting at %p, which is the same as *g (%p)\n",g[0], g[0],*g);
printf("g[9] points to the string: '%s' starting at %p, which is the same as *(g+9) (%p), thanks to pointer algebra!\n",g[9], g[9],*(g+9));
*g points to the string: 'Test8' starting at 0x560bf3c7a909
This is the same as &g[0][0] (0x560bf3c7a909), or, equivalently g[0] (0x560bf3c7a909), clear, uh?
Since we have allocated space for 10 pointers, we can use also, e.g., the ninth:
g[0] points to the string: 'Test9' starting at 0x560bf3c7a9e1, which is the same as *g (0x560bf3c7a9e1)
g[9] points to the string: 'TestA' starting at 0x560bf3c7a9e7, which is the same as *(g+9) (0x560bf3c7a9e7), thanks to pointer algebra!
Let's go back to a simple declaration such as int var = 42;. In this case, a block of memory is reserved by the compiler to store an int value. An address is assigned to this block. To show the address, we must use the & operator:
printf("Address of var = %d\n", &var);
// It prints "Address of var = 1864086260"
The actual address is essentially arbitrary and can change at each execution.
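[!NOTE]
Strictly speaking, the portable format specifier for printing an address is %p, with the pointer cast to void *, e.g. printf("Address of var = %p\n", (void *)&var);. In these examples %d is used so that addresses appear as plain numbers, which makes the pointer arithmetic easier to follow, but the compiler may warn about the mismatch.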
To obtain the value stored at a given address (for example, at address &var), we must use the * operator:
printf("Value of var = %d", *(&var));
// It prints "Value of var = 42"
The address of a variable can be stored in another variable known as a pointer variable. The syntax for storing the address of a variable in a pointer is:
type *namePtr = &nameVar;
For example:
int var = 42;
int *ptr = &var;
or
int var = 42;
int *ptr;
ptr = &var;
The data type tells the compiler the type of the data stored at the address we are going to keep in the pointer. In our last example, ptr is a pointer to an int.
[!WARNING]
This doesn't mean that ptr will store an int value. A pointer to an integer (like ptr) can only store the address of variables of type int.
[!WARNING]
Note that the asterisk * is used both for declaring the pointer and for dereferencing it:
int var = 42;
int *ptr; // The * is used to declare a pointer
ptr = &var;
printf("Var = %d", *ptr); // The * is used to dereference the pointer (i.e., to show the value stored at address p = &var)
One can only add or subtract integers to/from pointers:
int array[4] = {5,10,15,20};
int *ptr = &array[0];
ptr += 3; // This is valid
ptr *= 3; // This is NOT valid: pointers cannot be multiplied
[!WARNING]
When you add (or subtract) an integer (say n) to a pointer, you're NOT actually adding (or subtracting) n bytes to the pointer's value. Instead, you are adding (or subtracting) n times the size in bytes of the data type being pointed to.
int a = 5;
int *ptr = &a; // Assume the address is 1000
int *newAddress = ptr + 3; // It is equal to: ptr + 3*sizeof(int) = 1000 + 3*4 = 1012
Let's look at the following code:
#include <stdio.h>
int main(){
int arr0D = 42;
int *p0 = &arr0D;
printf("-------------------------------\n");
printf("0D array (variable) and pointer\n");
printf("-------------------------------\n");
printf("Number of element of arr0D = %d \n", sizeof(arr0D)/sizeof(int));
printf("Sizeof(arr0D) = %d Bytes\n\n", sizeof(arr0D));
printf("arr0D = %d ; &arr0D = %d\n", arr0D , &arr0D);
printf("*p0 = %d ; p0 = %d\n\n", *p0 , p0 );
printf("*(p0+1) = %d ; p0+1 = %d\n\n", *(p0+1) , p0+1 );
whose output is:
-------------------------------
0D array (variable) and pointer
-------------------------------
Number of element of arr0D = 1
Sizeof(arr0D) = 4 Bytes
arr0D = 42 ; &arr0D = 1830580912
*p0 = 42 ; p0 = 1830580912
*(p0+1) = 1 ; p0+1 = 1830580916
We immediately see that, since p0 = &arr0D, p0 is storing the address of arr0D. To obtain the corresponding value stored at address &arr0D, we must dereference p0 (i.e., *p0). If we then add 1 to p0, we obtain the address &arr0D (i.e., 1830580912) PLUS 4 Bytes (i.e., 1830580916), which is past the end of the variable. If we dereference it with *(p0+1), we therefore obtain some garbage value.
#include <stdio.h>
int main(){
int arr1D[5] = {1,2,3,4,5};
int *p1 = arr1D;
printf("-------------------------------\n");
printf("1D array and pointer\n");
printf("-------------------------------\n");
printf("Number of element of arr1D = %d \n", sizeof(arr1D)/sizeof(int));
printf("Sizeof(arr1D) = %d Bytes\n\n", sizeof(arr1D));
printf("arr1D = %d\n", arr1D); //Points to element 0
printf("&arr1D[0] = %d\n", &arr1D[0]); //Points to element 0
printf("&arr1D = %d\n", &arr1D); //Points to the whole array
printf("p1 = %d\n\n", p1); //Points to the element 0
printf("&arr1D[1] = %d\n\n", &arr1D[1]); //Points to element 1
printf("Summing +1:\n");
printf("arr1D +1 = %d\n", arr1D+1);
printf("&arr1D[0]+1 = %d\n", &arr1D[0]+1);
printf("&arr1D +1 = %d\n\n", &arr1D+1);
printf("p1 +1 = %d\n\n", p1+1);
whose output is
-------------------------------
1D array and pointer
-------------------------------
Number of element of arr1D = 5
Sizeof(arr1D) = 20 Bytes
arr1D = 1832891120
&arr1D[0] = 1832891120
&arr1D = 1832891120
p1 = 1832891120
&arr1D[1] = 1832891124
Summing +1:
arr1D +1 = 1832891124
&arr1D[0]+1 = 1832891124
&arr1D +1 = 1832891140
p1 +1 = 1832891124
The array arr1D has 5 elements: since each of them is an integer (4 Bytes), the total size of arr1D is 5×4 Bytes = 20 Bytes.
Notice that arr1D and &arr1D[0] both point to the 0th element of the array arr1D. Thus, the name of an array is itself a pointer to the 0th element of the array. Here, both point to the first element, which has a size of 4 Bytes. When you add 1 to them, they point to the element with index 1 of the array (i.e., &arr1D[1]), thus resulting in an address increment of 4 Bytes.
On the other hand, &arr1D is a pointer to an array of 5 integers. It holds the base address of the array arr1D, which is the same as the address of its first element. Therefore, increasing it by 1 results in an address increment of 5 x 4 = 20 Bytes.
[!NOTE]
The name of an array is itself a pointer to the 0th element of the array, and if we increase it by one, we move to the address of the next element. On the other hand, &arr1D is a pointer to the whole array, and increasing it by one results in an address increment of sizeof(arr1D).
In short, arr and &arr[0] point to the 0th element, while &arr points to the entire array.
We can access the elements of the array using indexed variables like this:
int arr[] = {5,10,15,20,25};
for (int i = 0; i < 5; i++){
printf("index = %d, address = %d, value = %d\n", i, &arr[i], arr[i]);
}
or we can do the same thing using pointer arithmetic:
int arr[] = {5,10,15,20,25};
for (int i = 0; i < 5; i++){
printf("index = %d, address = %d, value = %d\n", i, arr+i, *(arr+i));
}
[!TIP]
Pointer access is often said to be faster than indexing; in practice, modern optimising compilers usually generate the same code for both. If you don't believe it... try and measure!
In this section, we consider 2D arrays (but the discussion can be easily extended to multidimensional arrays).
Suppose we want to allocate an array to represent the 3 components of the velocity for each of the N particles.
To allocate it dynamically, we must use pointers:
float **arr;
arr = (float **)malloc(N*sizeof(float *));
for (int i = 0; i < N; i++) {
arr[i] = (float *)malloc(3 * sizeof(float));
}
Here, arr is declared as a pointer to a pointer to a float. This type of pointer is used to represent a 2D array because, in C, a 2D array is essentially an array of arrays. Each pointer in the first dimension (arr[i]) will point to a one-dimensional array of floats (the columns).
To allocate the first dimension (rows), we have used:
arr = (float **)malloc(N * sizeof(float *));
This line allocates memory for N pointers to floats, where N represents the number of rows. The malloc function dynamically allocates memory, and sizeof(float *) ensures that the correct amount of memory is reserved for each pointer. Since arr is a pointer to a pointer, it needs memory for storing N pointers that will later point to the actual rows of floats.
[!NOTE]
The (float **) in front of the malloc() call is a type cast. Indeed, malloc() returns a pointer of type void *. A void * pointer is a generic pointer that can point to any type of data, but it doesn't carry type information itself. Therefore, when you assign the result of malloc() to a variable of a specific pointer type (like float ** in our case), you can cast the void * pointer to the appropriate type so that the compiler knows how to interpret it. So, the (float **) cast is used to explicitly convert the void * returned by malloc() into a float **, which is what the variable arr expects.
To allocate the second dimension (columns), we have used:
for (int i = 0; i < N; i++) {
arr[i] = (float *)malloc(3 * sizeof(float));
}
This loop allocates memory for each row. For each i from 0 to N-1, arr[i] is allocated memory to hold 3 float values. This creates a row with 3 columns for each row in the 2D array. The sizeof(float) ensures that the correct amount of memory is allocated for each float. The number 3 here specifies the number of columns in each row.
To summarise, we created a pointer (arr) that points to an array of pointers. Each of these pointers will, in turn, point to an array of floats. The first call to malloc allocates enough memory to hold N pointers, one for each row. This is why arr is of type float **. For each row, the second malloc allocates memory for 3 floats. These floats represent the columns in each row, creating the second dimension of the array.
[!WARNING]
Memory Management: Since you’re dynamically allocating memory, it’s essential to free the allocated memory later to avoid memory leaks. This would be done using free() for each row and finally for the array of pointers:
for (int i = 0; i < N; i++) {
free(arr[i]); // Free each row
}
free(arr); // Free the array of pointers
The following example also shows how to pass this 2D array to a function:
#include <stdio.h>
#include <stdlib.h>
void func(float **arr, int dim1, int dim2){
for(int i = 0; i<dim1; i++){
for(int j = 0; j<dim2; j++){
*(*(arr+i)+j) = i*dim2+j;
}
}
}
int main(){
int const N = 10;
float **arr;
arr = (float **)malloc(N*sizeof(float *));
for (int i = 0; i < N; i++) {
arr[i] = (float *)malloc(3 * sizeof(float));
}
func(arr, N, 3);
for (int i = 0; i < N; i++) {
for (int j = 0; j < 3; j++) {
printf("%d: %f\n",i, *(*(arr+i)+j));
}
}
for (int i = 0; i < N; i++) {
free(arr[i]); // Free each row
}
free(arr); // Free the array of pointers
return 0;
}
Note that in the function func we are using pointer arithmetic: the expression *(*(arr + i) + j) is used to access the elements of a 2D array:
arr is a pointer to a pointer. Essentially, arr is a pointer to the first element of an array of pointers, where each pointer points to the first element of a row in the 2D array.
arr + i advances the pointer arr by i positions. Since arr is a pointer to pointers (float **arr), arr + i moves the pointer i steps ahead, pointing to the i-th row of the array.
*(arr + i) dereferences the pointer to the i-th row, i.e., it accesses the pointer that points to the first element of the i-th row. In simpler terms, *(arr + i) gives you the address of the first element in the i-th row.
*(arr + i) + j: from the address of the i-th row (which is itself a pointer), you move j positions ahead to reach the j-th element of the i-th row. *(arr + i) + j gives you a pointer to the element at position arr[i][j].
*(*(arr + i) + j): finally, the outer * dereferences the pointer obtained in the previous step, giving you the actual value of the element at position arr[i][j].
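In other words, for this dynamically allocated arr, the bracket notation arr[i][j] is just shorthand for the same chain of dereferences. A quick check (assuming the line is placed inside the main of the example above, after func has filled the array):
printf("%f %f\n", arr[2][1], *(*(arr + 2) + 1)); // both expressions print the same element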
In the call by value method, the values of the actual parameters are copied into the function's formal parameters. This means two separate copies of the parameters are stored in memory: one for the original values and one for the function's use. Since changes are made only to the function’s local copy, any modifications inside the function do not affect the actual parameters in the calling code.
For example:
#include <stdio.h>
float add_by_value(float x, float y){ //Formal arguments
x = 100; // Changing the values of the formal arguments does not
y = 200; // change the values of the actual arguments
return x+y;
}
int main(){
float a = 1.3;
float b = 2.5;
float res = 0.;
//By value
printf("Pass by value:\n");
printf("Expected: a = %.1f, b = %.1f, a + b = %.1f\n", a,b,a+b);
res = add_by_value(a,b); // Actual arguments
printf("Result : a = %.1f, b = %.1f, a + b = %.1f\n\n", a,b,res);
return 0;
}
The output is
Pass by value:
Expected: a = 1.3, b = 2.5, a + b = 3.8
Result : a = 1.3, b = 2.5, a + b = 300.0
In the above example, we invoke the function add_by_value passing float a = 1.3 and float b = 2.5 (actual arguments). The function takes its formal arguments (x and y), changes their local values and returns their sum.
If we print the values of a and b after the call to the function, we can see that they have not changed. Indeed, the function add_by_value only changed the values of the (local) formal arguments.
In the call by reference method, the address of the actual parameters is passed to the function as formal parameters. In C, this is done using pointers.
Since both the actual and formal parameters point to the same memory location, any changes made inside the function directly affect the original values in the calling code.
This allows the function to modify the actual parameters.
Let's now look at this example:
#include <stdio.h>
float add_by_reference(float *x, float *y){ //Formal arguments
*x = 100;
*y = 200;
return *x+*y;
}
int main(){
float a = 1.3;
float b = 2.5;
float res = 0.;
//By reference
printf("Pass by reference:\n");
printf("Expected: a = %.1f, b = %.1f, a + b = %.1f\n", a,b,a+b);
res = add_by_reference(&a,&b);
printf("Result : a = %.1f, b = %.1f, a + b = %.1f\n", a,b,res);
return 0;
}
The output is
Pass by reference:
Expected: a = 1.3, b = 2.5, a + b = 3.8
Result : a = 100.0, b = 200.0, a + b = 300.0
In contrast to the previous example, the values of a and b are changed after invoking the function add_by_reference. Note also that this function takes as input the address of each variable, NOT its value. Indeed, the function is defined as float add_by_reference(float *x, float *y), which means x and y are pointers to float. Note also that if we want to change the values pointed to by x and y (which are pointers), we must dereference them.
In C, functions can handle parameters in two ways:
Call by Value: A copy of the actual parameter’s value is passed to the function. The function works with this copy, so any changes made inside the function do not affect the original variable.
Call by Reference: Instead of passing the value, the address of the actual parameter is passed, using pointers. This allows the function to modify the original variable.
[!TIP]
Use call by value when you want to protect the original values from being modified.
Use call by reference when you want to modify the caller’s variable or work with large data structures efficiently.
[!WARNING]
Call by value can use more memory for large data structures since the values are copied.
Call by reference: be careful when using pointers; incorrect handling may lead to memory corruption or unexpected behavior. Always ensure pointers are properly initialized and handle memory carefully to avoid issues like segmentation faults.
We now want to write a numerical code to compute
$\int_0^1 f(x)\,dx$
with $f(x) = x^3$. Obviously, we already know the solution:
$\int_0^1 x^3\,dx = \frac{1}{4}$
[!TIP]
Knowing exactly what a numerical code (and, in particular, a small part of a huge code) is expected to do is a key aspect of numerical science (sometimes, it can be regarded as a privilege!). In this way, we can be sure that (the portion of) the code we implemented works.
We will implement the numerical integration by using the trapezoidal rule.
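With $N$ trapezoids of width $p = (b-a)/N$, the rule approximates the integral as
$$\int_a^b f(x)\,dx \approx p\left[\frac{f(a)+f(b)}{2} + \sum_{i=1}^{N-1} f(a + i\,p)\right],$$
which is exactly what the pseudocode below computes.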
FUNCTION f(x)
RETURN x^3
FUNCTION trapezoidal_rule(f, a, b, N)
p = (b-a) / N
sum = 0.
FOR i = 0, N-1
x1 = a + i * p
x2 = a + (i+1) * p
sum = sum + (f(x1)+f(x2)) * p * 0.5
RETURN sum
MAIN PROGRAM
a = 0
b = 1
N = 1000
if(N>0) print(The number of trapezoids used is N)
if(N<0) print(The number of trapezoids used is negative) stop
if(N=0) print(The number of trapezoids used is zero) stop
result = trapezoidal_rule(f, a, b, N)
print(The integral of f(x) = x^3 from a to b is approximately: result)
END PROGRAM
Before implementing the code in Fortran, we notice that there is a clever way of coding the function trapezoidal_rule:
FUNCTION trapezoidal_rule(f, a, b, N)
p = (b-a) / N
sum = 0.5 * (f(a) + f(b)) # End points contribution
FOR i = 1, N-1
x = a + i * p
sum = sum + f(x)
RETURN sum * p
By doing so, we are computing only x instead of x1 and x2, and we are calling the function f only once per iteration.
[!TIP]
When coding, think about whether there is a cleverer way of implementing your algorithms.
In particular: minimise the number of computations and reduce memory reads/writes.
We are now ready to look at the FORTRAN implementation:
program trapezoidal_integration
implicit none
real(8) :: a, b, result
integer :: n
! Initialize variables
a = 0.0d0
b = 1.0d0
n = 1000
print*, "This program performs numerical integration of f(x) = x^3 from a = ", a, " to b = ", b, " using ", n, " trapezoids."
! Check if n is valid
if (n > 0) then
print*, "The number of trapezoids is positive."
elseif (n < 0) then
print*, "Error: The number of trapezoids is negative."
stop
else
print*, "Error: The number of trapezoids is zero."
stop
endif
! Perform numerical integration
result = trapezoidal_rule(a, b, n)
print*, "The integral of f(x) = x^3 from ", a, " to ", b, " is approximately: ", result
contains
! Function to evaluate f(x) = x^3
real(8) function f(x)
real(8), intent(in) :: x
f = x**3
end function f
! Trapezoidal rule function
real(8) function trapezoidal_rule(a, b, n)
real(8), intent(in) :: a, b
integer, intent(in) :: n
real(8) :: p, sum, x
integer :: i
p = (b - a) / n
sum = 0.5d0 * (f(a) + f(b))
do i = 1, n-1
x = a + i * p
sum = sum + f(x)
end do
trapezoidal_rule = sum * p
end function trapezoidal_rule
end program trapezoidal_integration
FORTRAN supports different data types, but real(8) specifies a double-precision floating-point type, similar to C’s double. This ensures that our calculations maintain high precision.
[!NOTE]
Some of the basic data types are:
integer (4 bytes): Represents whole numbers, both positive and negative (e.g., integer :: age = 25).
character (1 byte): Represents a single character (e.g., character :: letter = 'A').
Many compilers (e.g., gfortran) provide the sizeof() extension to determine the size of a variable in bytes; the standard alternative, storage_size(), returns the size in bits. For example:
integer :: a
print*, "Size of integer:", sizeof(a), " bytes" ! Typically prints 4
In FORTRAN, types can also be automatically or explicitly converted:
Implicit type conversion: The compiler automatically converts one data type to another when necessary. In expressions that involve different types (e.g., INTEGER and REAL), the smaller type is automatically promoted to the larger type.
Explicit type conversion (also called "casting") is when you manually convert a value from one type to another using intrinsic functions like INT() or REAL().
The following example shows both kinds of conversion:
program type_conversion_example
implicit none
integer :: a
real(8) :: b, pi, result
integer :: truncated_pi
! Variable initialization
a = 5
b = 2.5d0
pi = 3.14159d0
! Implicit conversion of 'a' (integer) to 'real'
result = a + b
print*, "Result (a + b): ", result ! Output: 7.500000000000000
! Explicit conversion of 'pi' (real) to 'integer'
truncated_pi = int(pi)
print*, "Pi (truncated): ", truncated_pi ! Output: 3
end program type_conversion_example
[!WARNING]
Implicit conversion can lead to precision loss: when converting from a larger or more precise type (like REAL or DOUBLE PRECISION) to a smaller or less precise type (like INTEGER), the conversion can result in a loss of precision or data.
real(8) :: pi
integer :: value
pi = 3.14159d0
value = pi ! Implicit conversion truncates the decimal part
In FORTRAN, the print* statement is used to display output to the console. It is a simple way to print values without needing to specify a format explicitly. The asterisk (*) indicates that the output format is chosen automatically by the compiler, making it convenient for quick printing of variables. In the specific case of our example, we have:
print*, "This program performs numerical integration of f(x) = x^3 from a = ", a, " to b = ", b, " using ", n, " trapezoids."
In this example, print* automatically formats and prints the values of a, b and n. However, if more control over the output format is required, using write with a specific format is a better option.
Here are some examples of format specifiers:
A: Used for character strings.
Example: A10 means a character field of width 10.
I: Used for integers.
Example: I5 means an integer field with a width of 5 characters.
F: Used for floating-point numbers in fixed-point notation.
Example: F8.2 means a floating-point number with a total width of 8 characters, including 2 digits after the decimal point.
E: Used for floating-point numbers in scientific notation.
Example: E10.3 means a floating-point number with a total width of 10 characters and 3 digits after the decimal.
X: Inserts spaces between fields.
Example: 5X inserts 5 spaces (note that the repeat count precedes the X).
/: Starts a new line.
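As a minimal sketch of how these specifiers can be combined (the variable names and field widths are just illustrative), consider:
program format_demo
    implicit none
    integer :: count
    real    :: ratio
    count = 42
    ratio = 3.14159
    ! A for the labels, I5 for the integer, 2X for spacing,
    ! F8.3 for fixed-point, / for a new line, E12.4 for scientific notation
    write(*, '(A, I5, 2X, A, F8.3, /, A, E12.4)') &
        'count = ', count, 'ratio = ', ratio, 'ratio again = ', ratio
end program format_demo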
[!WARNING]
Ensure the format specifier corresponds to the type of the variable. Using the wrong specifier can lead to unexpected output or program crashes.
In FORTRAN, the write statement is used for more flexible and controlled output compared to print. It allows you to specify the format of the output and target different I/O units (like files or the console). The write statement is often paired with a format specifier to control how the output is displayed.
The syntax is:
write(unit, format) var1, var2, ...
where:
unit specifies where the output goes (unit=6 or * typically refers to the console);
format controls the layout of the output: it can be a label pointing to a format statement or an asterisk (*) for automatic formatting.
Here is an example:
integer :: a = 10
real :: b = 3.14159
write(*, '(A, I2)') 'Value of a: ', a ! Output with format
write(*, '(A, F6.2)') 'Value of b: ', b ! Output with fixed decimal format
! Or using a labelled format statement:
write(*, 10) 'Value of a: ', a ! Output with a labelled format
10 format(A, I2)
! We can also specify a number for the unit:
! the following line writes to a file called fort.123 (created automatically if unit 123 was not opened explicitly)
write(123, '(A, F6.2)') 'Value of b: ', b
More details on the use of write related to the possibility of opening and closing files are provided in the next lecture.
The following code block shows an example of an if statement:
if (n > 0) then
print*, 'The number of trapezoids is positive.'
else if (n < 0) then
print*, 'Error: The number of trapezoids is negative.'
stop
else
print*, 'Error: The number of trapezoids is zero.'
stop
end if
The if statement in FORTRAN allows you to control the flow of your program based on conditions. It checks if a certain condition is true and executes a block of code accordingly. If the condition is false, it moves to the next condition or the else block.
if (n > 0):
This condition checks if the value of n is greater than 0.
If true, the program executes the code inside this block: it prints “The number of trapezoids is positive.”.
If false, it moves to the next condition.
else if (n < 0):
This checks if n is less than 0.
If true, the program prints an error message: “Error: The number of trapezoids is negative.”
If false, the program moves to the final else block.
else:
This block is executed if none of the previous conditions were true. Since the only other possibility is that n equals 0, the program prints: “Error: The number of trapezoids is zero.”
You can combine multiple conditions using logical .AND. (AND) and .OR. (OR) to make your checks more compact.
if (n > 0 .AND. n < 100) then
print *, 'n is positive and less than 100.'
end if
[!TIP]
In some cases, you can use a simpler if construct for one-liners without else or else if blocks.
if(n > 0) print*, "n is greater than 0."
[!WARNING]
Be cautious with comparisons of floating-point numbers! Due to precision issues, comparing floating-point numbers using == can be unreliable. It is better to use a small tolerance instead of direct equality checks.
if (abs(a - b) < 0.00001) then ! a and b are "close enough"
! Handle case where a is approximately equal to b
end if
real(8) function f(x)
real(8), intent(in) :: x
f = x**3
end function f
The first line function f(x) defines a function named f that takes a single argument x. In FORTRAN, the data types of both the function and the argument are declared separately. In this case, x and f are both declared as real(8) (i.e., a floating-point number with double precision).
The logic of the function is contained between the lines function f(x) and end function f. The line f = x**3 computes x raised to the power of 3 using the ** operator (in FORTRAN, ** is used for exponentiation). The result is assigned to f, which is the name of the function and also serves as the variable that holds the return value.
[!NOTE]
The general syntax used for defining functions is:
kind function name_function(arg1,arg2,...)
kind :: arg1
kind :: arg2
...
!do something
end function
Another way to define them is the following:
function name_function(arg1,arg2,...)
kind :: name_function
kind :: arg1
kind :: arg2
...
!do something
end function
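For instance, here is a minimal sketch of the second form (the function name square and its argument are purely illustrative):
program second_form_demo
    implicit none
    print *, square(3.0d0) ! prints 9.0
contains
    function square(x)
        real(8) :: square ! the result type is declared inside the function
        real(8), intent(in) :: x
        square = x * x
    end function square
end program second_form_demo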
In Fortran, the intent attribute is used in procedure arguments to specify how the arguments are intended to be used within the subroutine or function. This feature enhances code clarity, facilitates debugging, and enables better optimization by the compiler. The three main types of intent are:
intent(in): This indicates that the argument is used for input only. The procedure can read the value of the argument, but it should not modify it. This helps ensure that the original value remains unchanged.
intent(out): This specifies that the argument will be used for output only. The procedure is expected to assign a value to this argument before the procedure ends. The initial value of the argument is considered undefined.
intent(inout): This indicates that the argument will be used for both input and output. The procedure can read the value of the argument and also modify it. The original value may be used as input, and a new value is assigned before the procedure finishes.
[!TIP]
By specifying the intended use of variables, you reduce the risk of unintended side effects and make your code safer. Moreover, the compiler can better optimise the code when it knows how variables are being used, potentially improving performance. Finally, it serves as documentation within the code, making it clearer to users and maintainers what each variable's role is.
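As a minimal sketch (the subroutine accumulate and its arguments are purely illustrative), the three intents might be used together like this:
program intent_demo
    implicit none
    real(8) :: total, average
    integer :: count
    total = 0.0d0
    count = 0
    call accumulate(2.0d0, total, count, average)
    call accumulate(4.0d0, total, count, average)
    print *, 'average = ', average ! prints 3.0
contains
    subroutine accumulate(new_value, total, count, average)
        real(8), intent(in)    :: new_value ! input only: must not be modified
        real(8), intent(inout) :: total     ! read and updated: running sum
        integer, intent(inout) :: count     ! read and updated: number of samples
        real(8), intent(out)   :: average   ! output only: assigned before returning
        total   = total + new_value
        count   = count + 1
        average = total / count
    end subroutine accumulate
end program intent_demo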
In the second function defined in the example code, we can see an example of a do loop (FORTRAN's counterpart of a for loop):
do i = 1, n-1
x = a + i * p
sum = sum + f(x)
end do
The first line do i = 1, n-1 sets up a do loop that will iterate from i = 1 to i = n-1. The loop performs the following:
i = 1 initializes the loop counter i to 1.
n-1 specifies the final value of i. The loop will continue until i reaches n-1.
Inside the loop, x = a + i * p and sum = sum + f(x) are executed for each iteration.
The end do marks the end of the loop.
In FORTRAN, the do loop can be used both with and without a loop counter. When used without a counter, the do loop acts more like a general loop that can run until a specific condition is met, such as through an exit or cycle statement. The 'do' loop without a counter is useful when the number of iterations is not known in advance and depends on some runtime condition.
Here is an example of a do loop without a counter:
program doWithoutCounter
implicit none
integer :: i = 1
do
print *, 'Iteration: ', i
i = i + 1
if (i > 5) exit ! Exit loop when i exceeds 5
end do
end program doWithoutCounter
The loop runs indefinitely until the exit condition is met, which is used to break out of the loop when a condition is satisfied. Alternatively, cycle can be used to skip to the next iteration without completing the remaining statements in the current iteration.
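Here is a minimal sketch of cycle (the loop bounds are arbitrary): the remaining statements of the current iteration are skipped whenever the condition holds.
program cycle_demo
    implicit none
    integer :: i
    do i = 1, 10
        if (mod(i, 2) == 0) cycle ! skip the print for even values of i
        print *, 'Odd value: ', i
    end do
end program cycle_demo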
In FORTRAN, the program block is used to define the main part of a program. Every FORTRAN program must begin with the program keyword, followed by a name (optional but recommended), and ends with the end program statement.
program myProgram
implicit none
! Variable declarations
integer :: a, b, result
! Initialize variables
a = 5
b = 10
! Perform a simple calculation
result = a + b
print *, 'The sum is: ', result
end program myProgram
[!TIP]
implicit none is recommended to avoid automatic type assignment, forcing explicit declarations of all variables. You should write implicit none in each program as well as in functions and subroutines.
We slightly modify the example code written to numerically evaluate an integral with the trapezoid rule:
program trapezoidal_integration
implicit none
integer, parameter :: n = 1000 ! Number of trapezoids
real(8) :: a = 0.0d0 ! Lower limit of integration
real(8) :: b = 1.0d0 ! Upper limit of integration
real(8) :: result ! Result of the integration
real(8), dimension(n) :: x_values, f_values
integer :: i
print *, "This program performs numerical integration of f(x) = x^3 from a = ", a, " to b = ", b, " using ", n, " trapezoids."
! Check if n is a valid number of trapezoids
if (n > 0) then
print *, "The number of trapezoids is positive."
else if (n < 0) then
print *, "Error: The number of trapezoids is negative."
stop 1
else
print *, "Error: The number of trapezoids is zero."
stop 1
end if
! Perform numerical integration
result = trapezoidal_rule(a, b, n, x_values, f_values)
! Print the result of the integration
print *, "The integral of f(x) = x^3 from ", a, " to ", b, " is approximately: ", result
! Optionally, print out the x values and their corresponding f(x) values
print *, "x values and f(x) evaluations:"
do i = 1, n-1
print *, "x(", i, ") = ", x_values(i), ", f(x(", i, ")) = ", f_values(i)
end do
contains
! Define the function to integrate: f(x) = x^3
real(8) function f(x)
real(8), intent(in) :: x
f = x**3
end function f
! Trapezoidal rule for numerical integration
real(8) function trapezoidal_rule(a, b, n, x_values, f_values)
real(8), intent(in) :: a, b
integer, intent(in) :: n
real(8), dimension(n), intent(out) :: x_values, f_values
real(8) :: p, sum
integer :: i
p = (b - a) / real(n) ! Width of each trapezoid
sum = 0.5d0 * (f(a) + f(b)) ! End points contribution
! Store the x values and function evaluations in arrays
do i = 1, n-1
x_values(i) = a + i * p
f_values(i) = f(x_values(i)) ! Store the function evaluation
sum = sum + f_values(i)
end do
trapezoidal_rule = sum * p ! Return the computed integral
end function trapezoidal_rule
end program trapezoidal_integration
Two arrays are now introduced in the program:
x_values: This array stores the x values where the function f(x) is evaluated during the integration.
f_values: This array stores the computed values of the function f(x) at those x values.
In FORTRAN, an array is a collection of elements of the same type stored in contiguous memory locations. Arrays are useful when you need to store multiple values of the same type and access them using an index. The general declaration syntax is:
type, dimension(size) :: array_name
where type is the data type of the elements (e.g., integer, real(8), character, etc.), array_name is the name of the array, and size is the number of elements in the array.
[!WARNING]
Indexing starts at 1: the first element is accessed using arr(1), not arr(0).
Accessing an index outside the array bounds (e.g., arr(6) when the array has only 5 elements) is not checked by default and leads to undefined behaviour; compile with runtime bounds checking (e.g., -fcheck=bounds with gfortran) to turn it into an explicit error.
The function trapezoidal_rule() is modified to take two additional array parameters (x_values and f_values).
real(8) function trapezoidal_rule(a, b, n, x_values, f_values)
real(8), intent(in) :: a, b
integer, intent(in) :: n
real(8), dimension(n), intent(out) :: x_values, f_values
...
Inside the do loop, each x value is stored in x_values(i), and the corresponding function evaluation is stored in f_values(i).
do i = 1, n-1
x_values(i) = a + i * p
f_values(i) = f(x_values(i)) ! Store the function evaluation
sum = sum + f_values(i)
end do
After the integration, to print the stored x values and their corresponding function evaluations f(x):
do i = 1, n-1
print *, "x(", i, ") = ", x_values(i), ", f(x(", i, ")) = ", f_values(i)
end do
Initializing an array means assigning initial values to its elements at the time of declaration. Proper initialization is crucial to avoid unpredictable behavior caused by garbage values.
When you declare an array, you can initialize it in several ways:
Full Initialization: you provide explicit values for all elements. integer :: arr(3) = (/1, 2, 3/) ! Array of 3 integers, initialized to {1, 2, 3}
Zero Initialization: You can explicitly initialize all elements to zero (or to any other value): integer :: arr(4) = 0 ! Array of 4 integers, initialized to {0, 0, 0, 0}
[!WARNING]
In FORTRAN, arrays are not automatically initialized when declared. If you declare an array without explicitly initializing it, the array elements will contain garbage values (random values left in memory). This can lead to unpredictable behavior in your program.
Remember to always initialise them!
Here is an example of array declaration and initialisation:
program arrays
implicit none
real, dimension(3) :: a ! Declare a real array 'a' of size 3 (uninitialized)
real :: b(3) ! Declare a real array 'b' of size 3 (uninitialized)
integer, dimension(5) :: c, d ! Declare two integer arrays 'c' and 'd', both of size 5 (uninitialized)
integer :: e(5), f(6) ! Declare two integer arrays 'e' of size 5 and 'f' of size 6 (uninitialized)
real, dimension(3) :: g = (/1.2, 2., 3.7/) ! Declare and initialize a real array 'g' with values 1.2, 2.0, and 3.7
integer, dimension(7) :: h = 1 ! Declare an integer array 'h' of size 7. All elements are initialized with 1
end program
In Fortran, multidimensional arrays are easily declared by specifying the dimensions in parentheses. These are useful for representing data in matrices, grids, or higher-dimensional structures.
type :: array_name(size1, size2, ...)
An example of a 2D array:
integer :: matrix(3, 4) ! Array of 3 rows and 4 columns
It can be initialised, for example, with a full initialisation using reshape:
program multidimensional_array
implicit none
integer :: i, j
integer :: matrix(2, 3)
! Initialize the 2D array
matrix = reshape([1, 2, 3, 4, 5, 6], shape(matrix))
! Print the 2D array
print *, "Matrix:"
do i = 1, 2 ! Loop through rows
do j = 1, 3 ! Loop through columns
write(*, '(I4)', advance='no') matrix(i, j) ! Print the element without a newline
end do
print *, '' ! Newline after each row
end do
end program multidimensional_array
[!TIP]
In Fortran, the size() function is used to determine the number of elements in an array. It can be applied to arrays of any dimension.
The syntax is:
n = SIZE(array [, dim])
where array is the array whose size you want to determine and dim is optional and represents the dimension along which the size is returned (if omitted, size() returns the total number of elements in the array).
! 1D array
real :: arr1(5)
print *, SIZE(arr1) ! Outputs: 5
! 2D array
real :: arr2(3,4)
print *, SIZE(arr2) ! Outputs: 12 (total number of elements)
print *, SIZE(arr2, 1) ! Outputs: 3 (size along the first dimension)
print *, SIZE(arr2, 2) ! Outputs: 4 (size along the second dimension)
[!WARNING]
In Fortran, multidimensional arrays are stored in column-major order, meaning that elements of a column are stored contiguously in memory before moving to the next column. This contrasts with languages like C, which use row-major order, where elements of a row are stored contiguously.
When looping over multidimensional arrays in Fortran, it is more efficient to iterate over the leftmost index (first dimension) in the inner loop and the rightmost index (last dimension) in the outer loop. This is because Fortran stores data in column-major order, so accessing elements in this sequence ensures better memory locality and performance.
program example
implicit none
real, dimension(128,128,128) :: u ! Velocity field
integer :: ix, iy, iz ! Indices used in the do loop
do iz = 1, size(u,3)
do iy = 1, size(u,2)
do ix = 1, size(u,1)
...
end do
end do
end do
end program
This approach allows Fortran to access memory contiguously, improving cache performance. If you switch the order of loops (i.e., iterate over the rows in the outer loop and columns in the inner loop), the program will still work but may be less efficient due to non-contiguous memory access.
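The effect can be measured directly. Below is a minimal sketch (the array size and the reported timings are illustrative, and the resolution of cpu_time varies between systems) that touches every element of a 3D array in both loop orders:
program loop_order_demo
    implicit none
    integer, parameter :: n = 256
    real(8), allocatable :: u(:,:,:)
    real(8) :: t0, t1, t2
    integer :: ix, iy, iz

    allocate(u(n, n, n))
    u = 0.0d0

    call cpu_time(t0)
    ! Cache-friendly order: the leftmost index varies fastest (inner loop)
    do iz = 1, n
        do iy = 1, n
            do ix = 1, n
                u(ix, iy, iz) = u(ix, iy, iz) + 1.0d0
            end do
        end do
    end do
    call cpu_time(t1)

    ! Cache-unfriendly order: the leftmost index is in the outer loop
    do ix = 1, n
        do iy = 1, n
            do iz = 1, n
                u(ix, iy, iz) = u(ix, iy, iz) + 1.0d0
            end do
        end do
    end do
    call cpu_time(t2)

    print *, 'Column-major friendly order (s):', t1 - t0
    print *, 'Swapped loop order (s):         ', t2 - t1

    deallocate(u)
end program loop_order_demo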
We now propose a third variation of the example code; in particular, it is a variation of the version in which arrays were used.
This time we read the number of trapezoids from the user at run time (via standard input), and we dynamically allocate the memory to store x_values and f_values (if you didn't read the previous examples, don't worry: you will easily understand what these variables are):
program trapezoidal_integration
implicit none
real(8) :: a = 0.0d0 ! Lower limit of integration
real(8) :: b = 1.0d0 ! Upper limit of integration
real(8), allocatable :: x_values(:), f_values(:)
integer :: n, i
real(8) :: result
! Prompt the user for the number of trapezoids
print *, "Enter the number of trapezoids (n):"
read(*,*) n
! Check if n is valid
if (n <= 0) then
print *, "Error: The number of trapezoids must be a positive integer."
stop
end if
! Allocate memory for x_values and f_values arrays
allocate(x_values(n), f_values(n))
! Call the trapezoidal rule to perform the integration
result = trapezoidal_rule(a, b, n, x_values, f_values)
! Print the result of the integration
print *, "The integral of f(x) = x^3 from ", a, " to ", b, " is approximately: ", result
! Optionally, print out the x values and their corresponding f(x) values
print *, "x values and f(x) evaluations:"
do i = 1, n - 1
print '(I3, F10.5, F15.5)', i, x_values(i), f_values(i)
end do
! Free dynamically allocated memory
deallocate(x_values, f_values)
contains
! Function to define f(x) = x^3
real(8) function f(x)
real(8), intent(in) :: x
f = x**3
end function f
! Function to perform the trapezoidal rule
real(8) function trapezoidal_rule(a, b, n, x_values, f_values)
real(8), intent(in) :: a, b
integer, intent(in) :: n
real(8), intent(out) :: x_values(n), f_values(n)
real(8) :: p, sum
integer :: i
p = (b - a) / real(n) ! Width of each trapezoid
sum = 0.5d0 * (f(a) + f(b)) ! End points contribution
! Calculate x values and function evaluations
do i = 1, n - 1
x_values(i) = a + real(i) * p
f_values(i) = f(x_values(i)) ! Store function evaluation
sum = sum + f_values(i)
end do
trapezoidal_rule = sum * p ! Return the integral result
end function trapezoidal_rule
end program trapezoidal_integration
In Fortran, the read() statement is used to read input from a source, typically the keyboard (standard input) or a file. It can be used in several forms to capture different types of input.
In our example:
read(*,*) n
the read(*,*) statement takes input for the integer n from the user. The * format specifier means free-form input, so the user can enter the values without needing to worry about specific formatting.
The general syntax is:
read(unit, format) variables
where
unit: Specifies the input source. The most common value is *, which represents standard input (usually the keyboard).
format: Defines how the input is interpreted. * can be used for free-form input, which allows the compiler to automatically interpret the format.
variables: The variables where the input values are stored.
Allocatable arrays in Fortran are a powerful feature that allows for dynamic memory allocation, enabling the creation of arrays whose size can be determined at runtime. This flexibility is particularly useful when the dimensions of the array are not known at compile time or when working with large datasets that may vary in size.
To declare an allocatable array, the allocatable attribute is used in the array’s declaration. In our example:
real(8), allocatable :: x_values(:), f_values(:)
Memory for the array can then be allocated using the allocate statement, specifying the desired dimensions. In our example:
allocate(x_values(n), f_values(n))
Once the array is no longer needed, it can be deallocated using the deallocate statement, freeing up the memory and helping to prevent memory leaks.
deallocate(x_values, f_values)
[!Warning]
Deallocating memory in Fortran is a crucial practice in managing dynamic memory usage effectively. When an allocatable array is created using the allocate statement, memory is reserved on the heap to store its elements. If this memory is not released using the deallocate statement when the array is no longer needed, it can lead to memory leaks. Memory leaks occur when allocated memory is not properly freed, resulting in wasted resources and potentially causing a program to consume more memory than necessary. Over time, this can degrade system performance, lead to unexpected behavior, and, in extreme cases, exhaust available memory, leading to program crashes. Therefore, always ensure that any dynamically allocated memory is properly deallocated to maintain efficient memory management and optimal program performance.
[!TIP]
Automatic Deallocation on Exit: If the allocatable array goes out of scope when the subroutine exits, the memory will be automatically deallocated, but this behavior only applies if the array is declared as allocatable within that subroutine. If you use an allocatable array declared in a module or main program, it will remain allocated until explicitly deallocated.
[!TIP]
If you attempt to allocate an already allocated array in Fortran, it will result in a runtime error. Specifically, the program will throw an error indicating that the array is already allocated. To avoid this issue, you should check whether the array is allocated before allocating it again:
if (.not. allocated(array)) allocate(array(n))
Also multidimensional arrays can be allocated dynamically using the allocate statement. You can specify multiple dimensions when allocating, allowing you to create complex data structures such as matrices or tensors. For example, a two-dimensional array can be allocated with a syntax like: allocate(array(m, n)), where m and n are the sizes of the respective dimensions.
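As a minimal sketch (the array name grid and the sizes m and n are arbitrary), a two-dimensional allocatable array could be handled like this:
program alloc2d_demo
    implicit none
    real(8), allocatable :: grid(:,:)
    integer :: m, n
    m = 3
    n = 4
    if (.not. allocated(grid)) allocate(grid(m, n)) ! allocate only once
    grid = 0.0d0 ! initialise all elements
    print *, 'Total number of elements:', size(grid)    ! 12
    print *, 'Rows and columns:        ', size(grid, 1), size(grid, 2)
    deallocate(grid) ! free the memory when it is no longer needed
end program alloc2d_demo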
Here are some exercises with solutions in C and FORTRAN. Feel free to start from them and expand them as you wish. For example, for some of them it can be useful to print the output!
#include <stdio.h>
// Function to calculate the sum of squares from 1 to n
int sum_of_squares(int n) {
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += i * i; // Add the square of i to sum
}
return sum;
}
int main() {
int n;
// Prompt the user for input
printf("Enter a positive integer n: ");
scanf("%d", &n);
// Call the function and store the result
int result = sum_of_squares(n);
// Output the result
printf("The sum of the squares of the first %d integers is: %d\n", n, result);
return 0;
}
FORTRAN code
program sum_of_squares_program
implicit none
integer :: n, result
! Prompt the user for input
print *, "Enter a positive integer n: "
read(*, *) n
! Call the function and store the result
result = sum_of_squares(n)
! Output the result
print *, "The sum of the squares of the first", n, "integers is:", result
contains
! Function to calculate the sum of squares from 1 to n
integer function sum_of_squares(n)
implicit none
integer :: n, i
integer :: sum
sum = 0
do i = 1, n
sum = sum + i * i ! Add the square of i to sum
end do
sum_of_squares = sum ! Return the computed sum
end function sum_of_squares
end program sum_of_squares_program
#include <stdio.h>
// Function to compute factorial
int factorial(int n) {
int result = 1;
for (int i = 1; i <= n; i++) {
result *= i; // Multiply result by i
}
return result;
}
int main() {
// Print factorials of numbers from 1 to 10
printf("Factorials from 1 to 10:\n");
for (int i = 1; i <= 10; i++) {
printf("%d! = %llu\n", i, factorial(i)); // Call factorial function
}
return 0;
}
FORTRAN code
program factorial_program
implicit none
integer :: i, n
print *, "Factorials from 1 to 10:"
! Loop through numbers from 1 to 10
do i = 1, 10
print *, i, "! =", factorial(i)
end do
contains
! Function to compute factorial
integer function factorial(n)
implicit none
integer :: n, result, j
result = 1
do j = 1, n
result = result * j ! Multiply result by j
end do
factorial = result ! Return the computed factorial
end function factorial
end program factorial_program
Write a program that simulates the motion of a point mass (m=1 kg) under the influence of gravity. The program should calculate and print the height (h), velocity (v), kinetic energy (K), and potential energy (U) of the mass as functions of time. Assume that the object starts from an initial height h0=100 m and with an initial velocity v0=10 m/s. Stop the code when the point touches the ground.
#include <stdio.h>
#define g 9.81 // Gravitational acceleration (m/s^2)
// Function to compute kinetic energy
double kinetic_energy(double mass, double velocity) {
return 0.5 * mass * velocity * velocity;
}
// Function to compute potential energy
double potential_energy(double mass, double height) {
return mass * g * height;
}
// Function to compute velocity at time t
double velocity_at_time(double v0, double t) {
return v0 - g * t; // v(t) = v0 - g * t
}
// Function to compute height at time t
double height_at_time(double h0, double v0, double t) {
return h0 + v0 * t - 0.5 * g * t * t; // h(t) = h0 + v0 * t - 1/2 * g * t^2
}
int main() {
double t = 0.0; // Time starts at 0
double dt = 0.1; // Time step
double total_time = 10.0; // Total simulation time
double m = 1.0; // Mass of the point (kg)
double v0 = 10; // Initial velocity (m/s)
double h0 = 100; // Initial height (m)
// Header for the output
printf("\nTime(s)\t\tKinetic Energy(J)\tPotential Energy(J)\n");
printf("------------------------------------------------------------\n");
// Loop over time and compute KE and PE
for (t=0; t <= total_time; t+=dt) {
double h = height_at_time(h0, v0, t); // Height at time t
double v = velocity_at_time(v0, t); // Velocity at time t
if (h < 0) h = 0; // Stop the object from going below ground level
double KE = kinetic_energy(m, v); // Compute kinetic energy
double PE = potential_energy(m, h); // Compute potential energy
// Print time, kinetic energy, and potential energy
printf("%.2lf\t\t%.2lf\t\t\t%.2lf\n", t, KE, PE);
// Stop the loop if the object hits the ground (h = 0)
if (h == 0 && v <= 0) break;
}
return 0;
}
FORTRAN code
program energy_simulation
implicit none
! Constants
real, parameter :: g = 9.81 ! Gravitational acceleration (m/s^2)
! Variables
real :: t, dt, total_time, m, v0, h0, h, v, KE, PE
! Initialize values
t = 0.0 ! Time starts at 0
dt = 0.1 ! Time step
total_time = 10.0 ! Total simulation time
m = 1.0 ! Mass of the point (kg)
v0 = 10.0 ! Initial velocity (m/s)
h0 = 100.0 ! Initial height (m)
! Output header
print *, 'Time(s)', ' Kinetic Energy(J)', ' Potential Energy(J)'
print *, '---------------------------------------------------------'
! Loop over time and compute KE and PE
do while (t <= total_time)
! Compute height and velocity at time t
h = height_at_time(h0, v0, t)
v = velocity_at_time(v0, t)
! Prevent object from going below ground level
if (h < 0.0) h = 0.0
! Compute kinetic and potential energy
KE = kinetic_energy(m, v)
PE = potential_energy(m, h)
! Print the results
print "(F6.2, 2X, F10.2, 3X, F10.2)", t, KE, PE
! Stop the loop if the object hits the ground and velocity is downward
if (h == 0.0 .and. v <= 0.0) exit
! Increment time
t = t + dt
end do
contains
! Function to compute kinetic energy
real function kinetic_energy(mass, velocity)
real, intent(in) :: mass, velocity
kinetic_energy = 0.5 * mass * velocity * velocity
end function kinetic_energy
! Function to compute potential energy
real function potential_energy(mass, height)
real, intent(in) :: mass, height
potential_energy = mass * g * height
end function potential_energy
! Function to compute velocity at time t
real function velocity_at_time(v0, t)
real, intent(in) :: v0, t
velocity_at_time = v0 - g * t
end function velocity_at_time
! Function to compute height at time t
real function height_at_time(h0, v0, t)
real, intent(in) :: h0, v0, t
height_at_time = h0 + v0 * t - 0.5 * g * t * t
end function height_at_time
end program energy_simulation
A simple harmonic oscillator is described by the equation of motion:
$$m\frac{d^2x}{dt^2} + kx = 0$$
where:
m is the mass of the oscillator,
k is the spring constant,
x is the displacement from the equilibrium position.
Your task is to numerically solve this differential equation using the velocity Verlet integration method, which is particularly useful for simulating physical systems.
For a second-order differential equation of the type
$$\frac{d^2x(t)}{dt^2} = a(x(t))$$
with initial conditions
$$x(t_0) = x_0, \qquad \left.\frac{dx(t)}{dt}\right|_{t_0} = v_0,$$
an approximate numerical solution $x_n \approx x(t_n)$ at the time $t_n = t_0 + n\Delta t$ with time step $\Delta t > 0$ can be obtained by the following method:
Given the time $n$, compute the new position at time $n+1$: $x_{n+1} = x_n + v_n \Delta t + \frac{1}{2} a_n \Delta t^2$
Compute the new acceleration: $a_{n+1} = a(x_{n+1})$
Compute the new velocity: $v_{n+1} = v_n + \frac{1}{2}(a_n + a_{n+1}) \Delta t$
Here is a to-do list that can help:
Define Parameters: define the mass m, spring constant k, time step dt, and total simulation time.
Set Up Initial Conditions: set the initial position x_0 and initial velocity v_0.
Implement the Verlet Integration Algorithm: use the following equations to update the position and velocity at each time step:
$x_{n+1} = x_n + v_n\,dt + \frac{1}{2} a_n\,dt^2$
$v_{n+1} = v_n + \frac{1}{2}(a_n + a_{n+1})\,dt$
where $a = \frac{F}{m} = -\frac{k}{m}x$.
Output the Results: print the position and velocity of the oscillator at each time step.
#include <stdio.h>
void verlet_integration(double m, double k, double x0, double v0, double dt, double total_time) {
double x = x0; // current position
double v = v0; // current velocity
double a = -k/m * x; // initial acceleration
double time = 0.0;
printf("Time\tPosition\tVelocity\n");
while (time <= total_time) {
printf("%.2f\t%.4f\t%.4f\n", time, x, v);
double x_new = x + v * dt + 0.5 * a * dt * dt; // update position
double a_new = -k/m * x_new; // calculate new acceleration
v = v + 0.5 * (a + a_new) * dt; // update velocity
x = x_new; // update position for next iteration
a = a_new; // update acceleration
time += dt; // increment time
}
}
int main() {
double m = 1.0; // mass in kg
double k = 10.0; // spring constant in N/m
double dt = 0.01; // time step in seconds
double total_time = 2.0; // total simulation time in seconds
double x0 = 1.0; // initial position in meters
double v0 = 0.0; // initial velocity in m/s
verlet_integration(m, k, x0, v0, dt, total_time);
return 0;
}
FORTRAN code
program verlet_integration
implicit none
real(8) :: m, k, x, v, a, dt, total_time, time, x_new, a_new, x0, v0
! Define parameters
m = 1.0d0 ! Mass in kg
k = 10.0d0 ! Spring constant in N/m
dt = 0.01d0 ! Time step in seconds
total_time = 2.0d0 ! Total simulation time in seconds
x0 = 1.0d0 ! Initial position in meters
v0 = 0.0d0 ! Initial velocity in m/s
! Initial conditions
x = x0 ! Current position
v = v0 ! Current velocity
a = -k/m * x ! Initial acceleration
time = 0.0d0 ! Start time
! Print header
print *, "Time", "Position", "Velocity"
! Time loop using Verlet integration
do while (time <= total_time)
! Print current state
print '(F6.2, 2X, F8.4, 2X, F8.4)', time, x, v
! Compute new position
x_new = x + v * dt + 0.5d0 * a * dt * dt
! Compute new acceleration based on new position
a_new = -k/m * x_new
! Update velocity
v = v + 0.5d0 * (a + a_new) * dt
! Update position and acceleration for the next step
x = x_new
a = a_new
! Increment time
time = time + dt
end do
end program verlet_integration
Write a program that generates N random integers between 1 and 100, where N is specified by the user. The program should print each number as it is generated, and at the end, it should calculate and display the sum and average of the numbers. Use srand() to ensure different random numbers are generated each time the program runs.
To generate random numbers uniformly distributed in a given range, we can use srand() and rand():
srand(time(0)); // Seed the random number generator
int r = rand() % 100; // Generates a random number between 0 and 99
where:
srand(unsigned int seed): Initializes the random number generator. Using srand(time(0)) seeds it with the current time, ensuring different random numbers on each run.
rand(): Generates a pseudo-random integer between 0 and RAND_MAX (a macro defined in <stdlib.h>). To limit the range, use the modulo operator: rand() % range + min yields values in [min, min + range - 1].
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
int N, i, num, sum = 0;
float average;
// Ask the user for the number of random numbers (N)
printf("Enter the number of random numbers to generate: ");
scanf("%d", &N);
// Seed the random number generator with the current time
srand(time(0));
printf("\nRandom numbers between 1 and 100:\n");
// Generate N random numbers between 1 and 100
for (i = 0; i < N; i++) {
num = (rand() % 100) + 1;
printf("%d ", num);
sum += num; // Add the number to the sum
}
// Calculate the average
average = (float)sum / N;
printf("\n\nSum of the numbers: %d\n", sum);
printf("Average of the numbers: %.2f\n", average);
return 0;
}
FORTRAN code
program random_numbers
implicit none
integer :: N, i, num, sum
real :: average
real :: r
! Ask the user for the number of random numbers (N)
print *, "Enter the number of random numbers to generate: "
read *, N
! Seed the random number generator with the current time
call random_seed()
sum = 0
print *, "Random numbers between 1 and 100:"
! Generate N random numbers between 1 and 100
do i = 1, N
call random_number(r) ! Generates a random number between 0 and 1
num = int(r * 100) + 1 ! Scale the number to the range [1, 100]
print *, num
sum = sum + num ! Add the number to the sum
end do
! Calculate the average
average = real(sum) / real(N)
print *, " "
print *, "Sum of the numbers: ", sum
print *, "Average of the numbers: ", average
end program random_numbers
Consider N particles with mass $m = 1\times10^{-27}$ kg in a three-dimensional space. The Hamiltonian of a single particle is
$$H(p) = \frac{p^2}{2m}$$
where $\mathbf{p}$ is the momentum, $p = |\mathbf{p}|$, and it is distributed according to the Maxwell-Boltzmann distribution:
$$f(v) = \left[\frac{m}{2\pi k_B T}\right]^{3/2} 4\pi v^2 \exp\left(-\frac{p^2}{2 m k_B T}\right)$$
where $k_B = 1.380649\times10^{-23}$ J/K is the Boltzmann constant.
The task is to compute the energy of the system and to compare the result with the equipartition theorem: $U = \frac{3}{2} N k_B T$.
Tasks:
Generate the momenta: For a system of N particles at a temperature $T = 300$ K, generate the velocity components $v_x$, $v_y$, $v_z$ from a normal distribution with a standard deviation given by:
$$\sigma = \sqrt{\frac{k_B T}{m}}$$
Calculate the total energy: for each particle, calculate the total energy based on the generated velocities:
$$E = \frac{p^2}{2m}$$
where $p$ is the magnitude of the momentum, $p = m\sqrt{v_x^2 + v_y^2 + v_z^2}$.
Compare to the equipartition theorem: compare the calculated energy to the value predicted by the equipartition theorem, which states that the total energy of the system should be $U = \frac{3}{2} N k_B T$.
Vary the number of particles: Perform the simulation for different values of N (e.g., 10, 100, 1000, 10000, etc.) and observe how the results converge to the theoretical value as N increases.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
// Constants
const double k_B = 1.38e-23; // Boltzmann constant (J/K)
const double T = 300.0; // Temperature in Kelvin
const double mass = 1.e-27; // Mass of the particle (arbitrary units)
// Function to generate normally distributed random numbers
double random_normal(double mean, double stddev) {
double u1 = (double)rand() / RAND_MAX;
double u2 = (double)rand() / RAND_MAX;
return mean + stddev * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}
int main() {
int N; // Number of particles
double vx, vy, vz, p, energy, sigma;
int i, j;
// Seed for random number generation
srand(time(NULL));
// Standard deviation for the normal distribution
sigma = sqrt(k_B * T / mass);
N = 1;
for (i = 1; i <= 5; i++) {
N *= 10; // Increase the number of particles each loop
energy = 0.0;
// Generate momentum values according to Maxwell-Boltzmann distribution
for (j = 0; j < N; j++) {
vx = random_normal(0.0, sigma);
vy = random_normal(0.0, sigma);
vz = random_normal(0.0, sigma);
p = mass * sqrt(vx * vx + vy * vy + vz * vz);
energy += (p * p) / (2.0 * mass);
}
// Print the results
printf("----------------------------------------------------\n");
printf("Number of particles: %d\n", N);
printf("Energy of the system: %.5e J\n", energy);
printf("Equipartition theorem (U=3/2NKT): %.5e J\n", (3.0 / 2.0) * N * k_B * T);
printf("Relative difference: %.5f%%\n", (fabs(energy - (3.0 / 2.0) * N * k_B * T) * 2.0 / (energy + (3.0 / 2.0) * N * k_B * T)) * 100.0);
}
return 0;
}
FORTRAN code
program average_energy_mb
implicit none
integer :: N ! Number of particles
real(8), parameter :: mass = 1.e-27 ! Mass of the particle (arbitrary units)
real(8), parameter :: k = 1.38e-23 ! Boltzmann constant (arbitrary units)
real(8), parameter :: T = 300.0 ! Temperature in Kelvin
real(8) :: p, energy, sigma, vx,vy,vz
integer :: i,j
! Seed for random number generation
call random_seed()
! Standard deviation for the normal distribution
sigma = sqrt(k * T / mass)
N = 1
do i = 1, 5 !Change the number of particles
N = N * 10
! Generate momentum values according to Maxwell-Boltzmann distribution
energy = 0.
do j = 1, N
vx = random_normal(0.0d0, sigma)
vy = random_normal(0.0d0, sigma)
vz = random_normal(0.0d0, sigma)
p = mass*sqrt(vx*vx+vy*vy+vz*vz)
energy = energy + (p**2/ (2.0d0 * mass))
end do
! Print the results
print *, "----------------------------------------------------"
print '(A35,I12)', "Number of particles:", N
print '(A35,E12.5,A)', "Energy of the system:", energy, " J"
print '(A35,E12.5,A)', "Equipartition theorem (U=3/2NKT):", 3./2.*N*K*T, " J"
print '(A35,E12.5,A)', "Relative difference:",(abs(energy-(3./2.*N*K*T))*2./(energy+(3./2.*N*K*T)))*100., "%"
enddo
contains
! Subroutine to generate normally distributed random numbers
function random_normal(mean, stddev)
double precision :: random_normal, mean, stddev
double precision :: u1, u2
call random_number(u1)
call random_number(u2)
random_normal = mean + stddev * sqrt(-2.0d0 * log(u1)) * cos(2.0d0 * 3.14159265358979d0 * u2)
end function
end program
Here are some exercises with solutions in C and FORTRAN. Feel free to start from them and expand them as you wish. For example, for some of them it can be useful to print the output!
#include <stdio.h>
#include <stdlib.h>
void sum_arrays(int *arr1, int *arr2, int *arr3, int N){
for(int i = 0;i<N; i++){
arr3[i] = arr1[i] + arr2[i];
}
return;
}
int main(){
int arr1[10], arr2[10], arr3[10];
int N = sizeof(arr1)/sizeof(arr1[0]);
for(int i=0;i<N;i++){
arr1[i] = i;
arr2[i] = i*10;
}
sum_arrays(arr1, arr2, arr3, N);
for(int i=0;i<N;i++){
printf("%d + %d = %d\n", arr1[i],arr2[i],arr3[i]);
}
return 0;
}
FORTRAN code
program sum_arrays_example
implicit none
integer, parameter :: N = 10
integer :: arr1(N), arr2(N), arr3(N)
integer :: i
! Initialize the arrays
do i = 1, N
arr1(i) = i - 1 ! Fill arr1 with 0, 1, ..., 9
arr2(i) = (i - 1) * 10 ! Fill arr2 with 0, 10, ..., 90
end do
! Call the subroutine to sum the arrays
call sum_arrays(arr1, arr2, arr3, N)
! Print the results
do i = 1, N
print *, arr1(i), ' + ', arr2(i), ' = ', arr3(i)
end do
contains
! Subroutine to sum the arrays
subroutine sum_arrays(arr1, arr2, arr3, N)
integer, intent(in) :: N
integer, intent(in) :: arr1(N), arr2(N)
integer, intent(out) :: arr3(N)
integer :: i
do i = 1, N
arr3(i) = arr1(i) + arr2(i)
end do
end subroutine sum_arrays
end program sum_arrays_example
Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This process is repeated until no more swaps are needed, meaning the list is fully sorted. After each full pass through the list, the largest unsorted element “bubbles” up to its correct position, reducing the portion of the list that needs sorting. The algorithm continues until the entire list is sorted.
Write a code that generates N random numbers in the range 0, 999 (where N is passed from command line) and sorts them in increasing order using the bubble sorting algorithm.
To help the implementation of the bubble sorting, we also provide a pseudo code.
Pseudo code
function bubble_sort(arr, N):
pass = 0
flag_iterate = true
while flag_iterate:
flag_iterate = false
pass = pass + 1
for i = 0 to N - 2:
if arr[i+1] < arr[i]:
swap arr[i] and arr[i+1]
flag_iterate = true
return arr
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>
// Bubble sort implementation
void bubble_sorting(int N, int *arr){
int hold;
bool flag_iterate = true;
int pass = 0;
while (flag_iterate) {
pass += 1;
flag_iterate = false;
// Bubble sorting pass
for (int i = 0; i < N - 1; i++) {
if (arr[i + 1] < arr[i]) {
flag_iterate = true;
hold = arr[i];
arr[i] = arr[i + 1];
arr[i + 1] = hold;
}
}
// Output the array after each pass
printf("OUTPUT: ");
for (int i = 0; i < N; i++) {
printf("%d ", arr[i]);
}
printf("\n");
}
printf("Total number of steps: %d\n", pass);
}
int main() {
int N;
printf("Enter the number of elements: ");
scanf("%d", &N);
// Check if N is valid
if (N <= 0) {
printf("Error: Number of elements should be greater than 0.\n");
return 1;
}
// Allocate memory for the array
int *arr = malloc(N * sizeof(int));
if (arr == NULL) {
printf("Error: Memory allocation failed.\n");
return 1;
}
// Seed the random number generator
srand(time(0));
// Generate random numbers and fill the array
printf("INPUT: ");
for (int i = 0; i < N; i++) {
arr[i] = rand() % 1000; // Random numbers between 0 and 999
printf("%d ", arr[i]);
}
printf("\n");
// Perform bubble sort
bubble_sorting(N, arr);
// Free the allocated memory
free(arr);
return 0;
}
FORTRAN code
program bubble_sort
implicit none
integer :: N, i, temp, pass
logical :: flag_iterate
real :: tmp
integer, allocatable :: arr(:)
! Ask for number of elements
print *, "Enter the number of elements: "
read(*, *) N
! Allocate the array
allocate(arr(N))
! Generate random values for the array
call random_seed() ! Set random seed
write(*, '(A8)', advance='no') "INPUT: "
do i = 1, N
call random_number(tmp) ! Random number between 0 and 1
arr(i) = int(tmp * 1000) ! Scale to get values between 0 and 1000
write(*, '(I5)', advance='no') arr(i)
end do
print *, ""
! Bubble Sort algorithm
pass = 0
flag_iterate = .true.
do while (flag_iterate)
pass = pass + 1
flag_iterate = .false.
do i = 1, N - 1
if (arr(i+1) < arr(i)) then
temp = arr(i)
arr(i) = arr(i+1)
arr(i+1) = temp
flag_iterate = .true.
end if
end do
! Print intermediate output after each pass
write(*, '(A8)', advance='no') "OUTPUT: "
do i = 1, N
write(*, '(I5)', advance='no') arr(i)
end do
print *, ""
end do
! Output the total number of passes
print *, "Total number of steps: ", pass
! Deallocate the array
deallocate(arr)
end program bubble_sort
In this variation, the numbers to be ordered are read from input and the result is directed to output. The name of the input and output files are provided by the user: ./exe input_file output_file.
Note that your code must be able to count how many elements are given in the input file.
C code
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
// Bubble sort implementation
void bubble_sorting(int N, int *arr, FILE *output_file) {
int hold;
bool flag_iterate = true;
int pass = 0;
while (flag_iterate) {
pass += 1;
flag_iterate = false;
// Bubble sorting pass
for (int i = 0; i < N - 1; i++) {
if (arr[i + 1] < arr[i]) {
flag_iterate = true;
hold = arr[i];
arr[i] = arr[i + 1];
arr[i + 1] = hold;
}
}
// Output the array after each pass
fprintf(output_file, "OUTPUT after pass %d: ", pass);
for (int i = 0; i < N; i++) {
fprintf(output_file, "%d ", arr[i]);
}
fprintf(output_file, "\n");
}
fprintf(output_file, "Total number of steps: %d\n", pass);
}
int count_numbers_in_file(FILE *input_file) {
int count = 0;
int temp;
// Count the number of integers in the file
while (fscanf(input_file, "%d", &temp) != EOF) {
count++;
}
// Reset the file pointer to the beginning for the next read
rewind(input_file);
return count;
}
int main(int argc, char *argv[]) {
int N;
char input_filename[100], output_filename[100];
// Check if fewer (or more) than 3 arguments are given on the command line
if (argc != 3) {
fprintf(stderr, "Usage: %s <input_file> <output_file>\n", argv[0]);
return 1;
}
// Open input file for reading
FILE *input_file = fopen(argv[1], "r");
if (input_file == NULL) { // Check open
printf("Error opening input file");
return 1;
}
// Open output file for writing
FILE *output_file = fopen(argv[2], "w");
if (output_file == NULL) { // Check open
printf("Error opening output file");
fclose(input_file); // Close input_file before exit
return 1;
}
// Count the number of integers in the input file
N = count_numbers_in_file(input_file);
// Check if N is valid
if (N <= 0) {
fprintf(output_file, "Error: No valid elements found in the file.\n");
fclose(input_file);
fclose(output_file);
return 1;
}
// Allocate memory for the array
int *arr = malloc(N * sizeof(int));
if (arr == NULL) {
fprintf(output_file, "Error: Memory allocation failed.\n");
fclose(input_file);
fclose(output_file);
return 1;
}
// Read the elements from the input file
for (int i = 0; i < N; i++) {
fscanf(input_file, "%d", &arr[i]);
}
// Perform bubble sort and write to output file
bubble_sorting(N, arr, output_file);
// Free the allocated memory
free(arr);
// Close the files
fclose(input_file);
fclose(output_file);
printf("Sorting complete. Results written to %s\n", argv[2]);
return 0;
}
In this exercise, you are required to write a program that dynamically allocates memory for two square matrices, A and B, of size N×N , and computes their sum in a third matrix C.
Here is a to-do list that can help:
Ask the user to input the size of the matrix N (which must be greater than 0).
Dynamically allocate memory for three matrices A, B, and C, each of size N×N.
Populate the matrices A and B with some values (e.g., fill matrix A with values i×N+j and matrix B with values i×N+j+100. where i and j are the row and column indices, respectively).
Create a function sum_matrices that takes the two input matrices A and B and computes their sum into matrix C.
Print out the resulting matrix C after computing the sum.
Ensure that you free the allocated memory before exiting the program to avoid memory leaks.
C code
#include <stdio.h>
#include <stdlib.h>
// Function to sum two NxN matrices A and B, storing the result in C
void sum_matrices(float **A, float **B, float **C, int N){
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
*(*(C+i)+j) = *(*(A+i)+j) + *(*(B+i)+j);
}
}
}
int main(int argc, char* argv[]){
int N;
float **A, **B, **C;
// Prompt user for input
printf("A and B are NxN matrices. Insert the value of N:\n");
scanf("%d", &N);
if(N <= 0){
printf("N must be greater than 0.\n");
return 1;
}
// Allocate memory for matrices A, B, and C
A = (float **)malloc(N * sizeof(float *));
B = (float **)malloc(N * sizeof(float *));
C = (float **)malloc(N * sizeof(float *));
if (A == NULL || B == NULL || C == NULL) {
printf("Memory allocation failed.\n");
return 1;
}
for(int i=0; i<N; i++){
A[i] = (float *)malloc(N * sizeof(float));
B[i] = (float *)malloc(N * sizeof(float));
C[i] = (float *)malloc(N * sizeof(float));
if (A[i] == NULL || B[i] == NULL || C[i] == NULL) {
printf("Memory allocation failed.\n");
return 1;
}
}
// Populate matrices A and B
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
A[i][j] = i * N + j;
B[i][j] = i * N + j + 100;
}
}
// Compute sum: C = A + B
sum_matrices(A, B, C, N);
// Print matrix C
printf("Matrix C = A + B:\n");
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
printf("C[%d][%d] = %f\n", i, j, C[i][j]);
}
}
// Free allocated memory
for(int i=0; i<N; i++){
free(A[i]);
free(B[i]);
free(C[i]);
}
free(A);
free(B);
free(C);
return 0;
}
C code (variation n. 1)
In this variation, two functions are coded to allocate and deallocate the pointers. Moreover, the function `clock()` is invoked to measure the time the code takes to sum the matrices.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// Function prototypes [to declare the functions before their definitions, so they can be used in the main()]
void sum_matrices(float **, float **, float **, int);
float **my_malloc(int);
void my_free(float **, int);
int main(int argc, char* argv[]){
int N;
// Prompt user for input
printf("A and B are NxN matrices. Insert the value of N:\n");
scanf("%d", &N);
if(N <= 0){
printf("N must be greater than 0.\n");
return 1;
}
// Allocate memory for matrices A, B, and C
float **A = my_malloc(N);
float **B = my_malloc(N);
float **C = my_malloc(N);
// Populate matrices A and B
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
A[i][j] = i * N + j;
B[i][j] = i * N + j + 100;
}
}
// Compute sum: C = A + B
float ts = clock();
sum_matrices(A, B, C, N);
float te = clock();
printf("Time = %f ms\n", (te-ts)/CLOCKS_PER_SEC*1000);
// Print matrix C
printf("Matrix C = A + B:\n");
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
printf("C[%d][%d] = %f\n", i, j, C[i][j]);
}
}
// Free memory
my_free(A,N);
my_free(B,N);
my_free(C,N);
return 0;
}
// Function to sum two NxN matrices A and B, storing the result in C
void sum_matrices(float **A, float **B, float **C, int N){
for(int i=0; i<N; i++){
for(int j=0; j<N; j++){
*(*(C+i)+j) = *(*(A+i)+j) + *(*(B+i)+j);
}
}
}
// Implementation of the allocation function
float **my_malloc(int N){
float **array = (float **)malloc(N * sizeof(float *));
if (array == NULL) {
printf("Memory allocation failed.\n");
}
for(int i=0; i<N; i++){
array[i] = (float *)malloc(N * sizeof(float));
if (array[i] == NULL) {
printf("Memory allocation failed.\n");
}
}
return array;
}
// Implementation of the free function
void my_free(float **array, int N) {
for (int i = 0; i < N; i++) {
free(array[i]); // Free each row
}
free(array); // Free the array of row pointers
}
FORTRAN code
program main
implicit none
integer :: N, i, j
real, dimension(:,:), allocatable :: A, B, C
! Prompt user for input
print*, "A and B are NxN matrices. Insert the value of N:"
read(*,*) N
if(N<=0)then
print*, "N must be grater than 0."
stop
endif
! Allocate memory for matrices A, B, and C
allocate(A(N,N),B(N,N),C(N,N))
! Populate matrices A, B, and C
do j = 1, N
do i = 1, N
A(i,j) = (i-1)*N+(j-1)
B(i,j) = (i-1)*N+(j-1)+100
enddo
enddo
! Compute sum C = A + B
call sum_matrices(A,B,C)
! Print matrix C
do j = 1, N
do i = 1, N
print*,"C[",i,"][",j,"] = ", C(i,j)
enddo
enddo
deallocate(A,B,C)
contains
! Function to sum two NxN matrices A and B, storing the result in C
subroutine sum_matrices(A, B, C)
implicit none
real, dimension(:,:) :: A, B, C
C = A + B
end subroutine
end program
Implement a program to compute the numerical derivative of a function at a given point using function pointers.
This exercise will help you become familiar with passing functions as arguments and comparing numerical derivatives with their theoretical counterparts.
Problem Statement:
You are tasked with writing a program that:
Prompts the user to select a mathematical function from the following options:
sqrt(x)
sin(x)
cos(x)
Computes the numerical derivative of the chosen function at a given point (x0 = 1.0), using the formula:
$$f'(x) \approx \frac{f(x+\epsilon) - f(x)}{\epsilon}$$
where $\epsilon$ is a small number that starts from 1 and decreases by a factor of 10 in each iteration.
Compares the computed numerical derivative to the theoretical derivative of the function.
Outputs both the numerical and theoretical derivatives for comparison and displays the error percentage between them.
Stops the calculation when epsilon becomes too small (around $1 \times 10^{-16}$).
C code
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double func1(double x) {
return sqrt(x);
}
double deriv_func1(double x) {
return 0.5*pow(x,-0.5);
}
double func2(double x) {
return sin(x);
}
double func3(double x) {
return cos(x);
}
// Generic numerical derivative function
double derivative(double (*f)(double), double x, double epsilon){
return (f(x+epsilon) - f(x)) / epsilon;
}
int main(int argc, char *argv[]){
double epsilon = 1;
double x0 = 1.;
double (*func)(double);
double deriv_th, deriv_num;
if (argc < 2) {
printf("Usage: %s <function>:\n", argv[0]);
printf("<1>: sqrt(x)\n");
printf("<2>: sin(x)\n");
printf("<3>: cos(x)\n");
return 1;
}
// Read the function choice from the command line
int function = atoi(argv[1]);
// Select the appropriate function and theoretical derivative
switch (function) {
case 1:
func = func1;
deriv_th = deriv_func1(x0); // Exact derivative of sqrt(x)
break;
case 2:
func = func2;
deriv_th = cos(x0); // Exact derivative of sin(x)
break;
case 3:
func = func3;
deriv_th = -sin(x0); // Exact derivative of cos(x)
break;
default:
printf("Invalid function choice! Please choose 1, 2, or 3.\n");
return 1; // Exit if invalid function
}
printf("---------------------------------------\n");
printf("The derivative in %f is %f\n", x0, deriv_th);
printf("---------------------------------------\n");
printf("#epsilon #num #err\n");
while (epsilon > 1.e-18) {
deriv_num = derivative(func, x0, epsilon);
printf("%e %f %f %% \n", epsilon, deriv_num, fabs((deriv_num-deriv_th)/deriv_th)*100.);
epsilon /= 10.;
}
return 0;
}
FORTRAN code
program derivative_calculator
implicit none
real(8) :: epsilon, x0, deriv_th, deriv_num
integer :: function_choice
! Input handling
print *, 'Choose a function to differentiate:'
print *, '1: sqrt(x)'
print *, '2: sin(x)'
print *, '3: cos(x)'
read(*, *) function_choice
! Set the value of x0 and epsilon
x0 = 1.0d0
epsilon = 1.0d0
! Call the appropriate function based on the user's choice
select case (function_choice)
case (1)
deriv_th = deriv_func1(x0)
print *, '---------------------------------------'
print *, 'The derivative of sqrt(x) at ', x0, ' is ', deriv_th
print *, '---------------------------------------'
call compute_derivative(func1, x0, epsilon, deriv_th)
case (2)
deriv_th = cos(x0)
print *, '---------------------------------------'
print *, 'The derivative of sin(x) at ', x0, ' is ', deriv_th
print *, '---------------------------------------'
call compute_derivative(func2, x0, epsilon, deriv_th)
case (3)
deriv_th = -sin(x0)
print *, '---------------------------------------'
print *, 'The derivative of cos(x) at ', x0, ' is ', deriv_th
print *, '---------------------------------------'
call compute_derivative(func3, x0, epsilon, deriv_th)
case default
print *, 'Invalid function choice! Please choose 1, 2, or 3.'
stop
end select
contains
! Function for computing numerical derivative
subroutine compute_derivative(f, x, epsilon, deriv_th)
implicit none
real(8), external :: f
real(8), intent(in) :: x, epsilon, deriv_th
real(8) :: deriv_num, epsilon_tmp
epsilon_tmp = epsilon
print *, '#epsilon #num #err(%)'
do while (epsilon_tmp > 1.0d-18)
deriv_num = (f(x + epsilon_tmp) - f(x)) / epsilon_tmp
print *, epsilon_tmp, deriv_num, abs((deriv_num - deriv_th) / deriv_th) * 100.0d0
epsilon_tmp = epsilon_tmp / 10.0d0
end do
end subroutine
! Define the mathematical functions
real(8) function func1(x)
real(8), intent(in) :: x
func1 = sqrt(x)
end function
real(8) function deriv_func1(x)
real(8), intent(in) :: x
deriv_func1 = 0.5d0 * (x**(-0.5d0))
end function
real(8) function func2(x)
real(8), intent(in) :: x
func2 = sin(x)
end function
real(8) function func3(x)
real(8), intent(in) :: x
func3 = cos(x)
end function
end program
In C, a structure (or struct) is a user-defined data type that allows you to group different types of variables together. This is particularly useful when you want to represent more complex data types that involve multiple variables of different types. Structures help organize data logically and make the code more readable and maintainable.
Structures are ideal for bundling different data types together under one name, especially when representing real-world entities such as points, students, or complex numbers. For instance, instead of using separate variables for each property of a student (e.g., name, age, grade), you can group them into a single structure.
Consider the following task: we want to read some x, y and z coordinates from an external file. The file is organised as follows:
We have some particles made of discrete points. Each particle has Np points with x,y,z coordinates. Such coordinates are included in a txt file like this:
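(The values below are illustrative; the first line holds the number of points and the particle name, followed by one line per point with id, x, y, z.)
4 Particle_A
1 0.0 0.0 0.0
2 1.0 0.0 0.0
3 0.0 1.0 0.0
4 0.0 0.0 1.0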
This is a particle made of 4 points with (x,y,z)-coordinates. The name of the particle is Particle_A.
We want to write a program that can be executed like this
./test.exe input_file output_file
that is, we want to specify from command line the name of the input and output files.
The code will read the coordinates from the input file and compute the centroid of the particle.
Here is the pseudocode:
FUNCTION calculate_centroid(points_x, points_y, points_z, n)
centroid_x=0; centroid_y=0; centroid_z=0;
FOR i = 0, n-1
centroid_x += points_x[i];
centroid_y += points_y[i];
centroid_z += points_z[i];
centroid_x /= n; centroid_y /= n; centroid_z /= n;
RETURN centroid_x, centroid_y, centroid_z
MAIN PROGRAM
READ(input_file,points_x,points_y,points_z) //Read the input_file and store the n coordinates in points_x, points_y, points_z
centroid_x,centroid_y,centroid_z = calculate_centroid(points_x, points_y, points_z, n)
WRITE(output_file,"Set name", set_name)
WRITE(output_file,"Number of points", n)
WRITE(output_file,"The centroid of the points is at", centroid_x,centroid_y,centroid_z)
WRITE(output_file,"Coordinates of the points:")
FOR i = 0, n-1
WRITE("Point:", points_id[i], points_x[i], points_y[i], points_z[i])
END PROGRAM
Here is the C code:
#include <stdio.h>
#include <stdlib.h>
// Define a structure to store 3D points with additional fields (id and name)
struct Point3D {
int id; // Point identifier
double x; // x-coordinate
double y; // y-coordinate
double z; // z-coordinate
};
// Function to calculate the centroid of an array of 3D points
struct Point3D calculate_centroid(struct Point3D *points, int n) {
struct Point3D centroid;
centroid.x = 0.0;
centroid.y = 0.0;
centroid.z = 0.0;
for (int i = 0; i < n; i++) {
centroid.x += points[i].x;
centroid.y += points[i].y;
centroid.z += points[i].z;
}
centroid.x /= n;
centroid.y /= n;
centroid.z /= n;
return centroid;
}
int main(int argc, char *argv[]) {
// Check if fewer (or more) than 3 arguments are given on the command line
if (argc != 3) {
fprintf(stderr, "Usage: %s <input_file> <output_file>\n", argv[0]);
return 1;
}
// Open input file for reading
FILE *input_file = fopen(argv[1], "r");
if (input_file == NULL) { // Check open
printf("Error opening input file");
return 1;
}
// Open output file for writing
FILE *output_file = fopen(argv[2], "w");
if (output_file == NULL) { // Check open
printf("Error opening output file");
fclose(input_file); // Close input_file before exit
return 1;
}
// Read the number of points and name of the set from the input file
int n;
char set_name[100]; // Name of the dataset
fscanf(input_file, "%d %s", &n, set_name);
// Allocate memory for an array of 3D points
struct Point3D *points = (struct Point3D *)malloc(n * sizeof(struct Point3D));
// Read points data (ID, x, y, z) from the input file
for (int i = 0; i < n; i++) {
fscanf(input_file, "%d %lf %lf %lf", &points[i].id, &points[i].x, &points[i].y, &points[i].z);
}
// Calculate the centroid of the points
struct Point3D centroid = calculate_centroid(points, n);
// Write the results to the output file
fprintf(output_file, "Set Name: %s\n", set_name);
fprintf(output_file, "Number of points: %d\n", n);
fprintf(output_file, "The centroid of the points is at (%.2f, %.2f, %.2f)\n", centroid.x, centroid.y, centroid.z);
fprintf(output_file, "Data of the points:\n");
for (int i = 0; i < n; i++) {
fprintf(output_file, "Point %d: (%.2f, %.2f, %.2f)\n", points[i].id, points[i].x, points[i].y, points[i].z);
}
// Clean up
free(points);
fclose(input_file);
fclose(output_file);
printf("Centroid calculation complete. Results written to %s\n", argv[2]);
return 0;
}
The line FILE *input_file = fopen(argv[1], "r"); involves opening a file for reading. FILE is a special type in C (a so-called "structure", more on this later) that is used to handle files. The * indicates that input_file is a pointer to a FILE structure. This pointer will be used to interact with the file (e.g., reading or writing).
The fopen() function opens a file. It takes two arguments:
The first argument (argv[1]) is the name of the file to open. argv[1] is coming from the command-line arguments, meaning the program expects the user to provide the filename as the second argument when they run the program.
The second argument ("r") tells fopen to open the file in read mode ("r" stands for read). This means the program can read the content of the file, but cannot write or modify it.
[!NOTE]
The second argument of fopen can be:
r: Open for reading. The file must exist.
w: Open for writing. Creates an empty file or truncates an existing file.
a: Open for appending. Writes data at the end of the file. Creates the file if it does not exist.
r+: Open for reading and writing. The file must exist.
w+: Open for reading and writing. Creates an empty file or truncates an existing file.
a+: Open for reading and appending. The file is created if it does not exist.
If the file exists and opens successfully, fopen() returns a pointer to the FILE, which is assigned to input_file. If the file can’t be opened (for example, if it doesn’t exist), fopen() returns NULL, and you’d typically check for that to handle errors:
if (input_file == NULL) { // Check open
printf("Error opening input file");
return 1;
}
[!WARNING]
fopen() and open() are two different functions, and you should know when to use which:
Binary mode: With fopen() you can choose whether you want to read in text mode (by default) or in binary mode using the "b" flag. In contrast, open() treats all files as binary by default.
High-level vs. Low-level: fopen() is higher-level and provides more functionality than open() like formatted I/O (fprintf, fscanf). open(), instead, is a system call (i.e., it's part
of the kernel) and provides low-level, direct control over file access, but you will have to write your own routines to read/write specific formats.
Buffering: fopen() buffers I/O operations, meaning that it stores data in memory before sending it to or reading it from the actual file. This can improve performance for reading and writing files, particularly for larger files.
Example:
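The two snippets below read the first line of the same file, first with the buffered, high-level fopen()/fgets() interface and then with the low-level open()/read() system calls. This is only a sketch: the file name data.txt is made up and error handling is minimal.
#include <stdio.h>      // fopen, fgets, fclose
#include <fcntl.h>      // open, O_RDONLY
#include <unistd.h>     // read, close

int main(void) {
    char buf[128];
    // High-level, buffered I/O provided by the C library
    FILE *fp = fopen("data.txt", "r");   // "data.txt" is just an example name
    if (fp != NULL) {
        if (fgets(buf, sizeof(buf), fp) != NULL)
            printf("fopen/fgets read: %s", buf);
        fclose(fp);
    }
    // Low-level, unbuffered I/O through the open/read system calls
    int fd = open("data.txt", O_RDONLY);
    if (fd >= 0) {
        ssize_t nread = read(fd, buf, sizeof(buf) - 1);
        if (nread > 0) {
            buf[nread] = '\0';           // read() does not null-terminate
            printf("open/read read: %s", buf);
        }
        close(fd);
    }
    return 0;
}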
Similarly, this line
FILE *output_file = fopen(argv[2], "w");
is used to open a file for writing. The first argument of fopen() is again the name of the file to open (argv[2], the second argument passed on the command line). The second argument is now the "w" flag, which stands for write.
The line fclose(input_file); is used to close a file that was previously opened with fopen().
fclose() is a standard library function in C that closes a file stream. It takes one argument, which is a pointer to the FILE object that represents the open file (in our example, input_file is the pointer to the file that was opened for reading).
You should call fclose() when you are done working with the file. This includes after you have finished reading from or writing to the file and before the program exits or before opening another file.
[!TIP]
Why Close a File?:
Resource Management: Each open file uses system resources. Closing a file frees up these resources and allows the operating system to manage them efficiently.
Data Integrity: For files that are being written to, closing the file ensures that all data is properly written and saved. Even though input_file is opened in read mode in this case, it is still good practice to close files when done.
Preventing Memory Leaks: Failing to close files can lead to memory leaks and resource exhaustion, which can affect the stability of the program or the system.
[!WARNING]
Although it might sound counterintuitive, closing a file with fclose() does not necessarily flush the content to the physical file on disk: the operating system decides when to do that.
What is guaranteed is that, if you reopen the file for writing/appending after calling fclose(), the order of the write operations will be preserved. This is not guaranteed if the same file
is open and being written to through two different streams at the same time; the behaviour in that case is unpredictable.
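If you need the buffered data to reach the operating system at a specific point while the file is still open, you can flush the stream explicitly with fflush(). A minimal sketch (the file name log.txt is illustrative):
#include <stdio.h>

int main(void) {
    FILE *fp = fopen("log.txt", "w");
    if (fp == NULL) return 1;
    fprintf(fp, "first record\n");
    fflush(fp);                      // hand the buffered data over to the OS now
    fprintf(fp, "second record\n");
    fclose(fp);                      // flushes what is left and releases the stream
    return 0;
}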
fscanf() is a standard library function used to read data from a file. It works similarly to scanf(), but instead of reading from the standard input (keyboard), it reads from a file stream.
It takes these arguments:
File pointer (input_file, in our example): This is the pointer to the FILE object that represents the open file from which data will be read. It was previously obtained using fopen().
Format string ("%d %s", in our example): This is the format string that specifies how the input data should be interpreted. "%d" tells fscanf to read an integer from the file, while "%s" tells fscanf to read a string (a sequence of characters) from the file. It will read until it encounters a whitespace character.
Addresses of variables where the read data will be stored (&n, set_name): &n is the address of the integer variable n. fscanf will read an integer value from the file and store it at this address. set_name is a character array (or string) where the string read from the file will be stored. The fscanf function will copy the characters into this array until it encounters a whitespace.
[!WARNING]
In C, when using functions like fscanf() that require addresses of variables to store input data, the use of the address-of operator (&) is necessary for some types and not for others. &var is used to get the memory address of var.
In our example, n is an integer variable: when using fscanf() (or similar functions), you need to provide the address of n so that fscanf() can store the read value directly into n.
In our example, set_name is a character array (or string). In C, the name of an array (like set_name) automatically represents the address of the first element of the array.
When passing set_name to fscanf(), you are providing a pointer to the beginning of the array. fscanf() will use this pointer to write the string data directly into the array.
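Putting these pieces together, here is a small sketch of reading a count and a name while also checking the value returned by fscanf(), which reports how many items were successfully converted (the file name input.txt is illustrative):
#include <stdio.h>

int main(void) {
    FILE *input_file = fopen("input.txt", "r");
    if (input_file == NULL) {
        fprintf(stderr, "Error opening input file\n");
        return 1;
    }
    int n;
    char set_name[100];
    // &n passes the address of the integer; set_name already stands for the
    // address of the first element of the array, so no & is needed there
    if (fscanf(input_file, "%d %99s", &n, set_name) != 2) {
        fprintf(stderr, "Error reading the header line\n");
        fclose(input_file);
        return 1;
    }
    printf("Read %d points for set %s\n", n, set_name);
    fclose(input_file);
    return 0;
}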
In the previous lecture, we introduced some types in C (e.g., int, float, double, char, etc.). However, there is the possibility to extend such types, by using the so-called structures.
Look how the following code can be simplified by using structures:
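A version without structures might look like this (a sketch; the values mirror those used in the structure example just below):
#include <stdio.h>

int main(void) {
    // Separate variables for the student ...
    char   student_firstName[20] = "Ludwig";
    char   student_lastName[20]  = "Boltzmann";
    int    student_age           = 26;
    char   student_gender        = 'm';
    double student_height        = 1.72;
    // ... and the same set of variables again for the professor
    char   professor_firstName[20] = "Josef";
    char   professor_lastName[20]  = "Stefan";
    int    professor_age           = 35;
    char   professor_gender        = 'm';
    double professor_height        = 1.68;
    printf("%s %s (%d) and %s %s (%d)\n",
           student_firstName, student_lastName, student_age,
           professor_firstName, professor_lastName, professor_age);
    printf("Heights: %.2f m and %.2f m; genders: %c and %c\n",
           student_height, professor_height, student_gender, professor_gender);
    return 0;
}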
We immediately see that the information firstName, lastName, age, etc., are common to both student and professor. We could therefore define a struct called, let's say, info
struct info{
char firstName[20];
char lastName[20];
int age;
char gender;
double height;
}; // note the semicolon here!
and use it like this:
#include <stdio.h>
int main(void){
struct info student = {"Ludwik", "Boltzmann", 26, 'm', 1.72}; // yes, it's a typo ;)
struct info professor = {"Josef", "Stefan", 35, 'm', 1.68};
// Structure elements can be assigned using the . operator
student.age = 28;
// Since `firstName` and `lastName` are arrays, they cannot be reassigned directly; you need to use a function like `sprintf()` (or `strcpy()`)
sprintf(student.firstName, "%s", "Ludwig"); // let's fix the typo
return 0;
}
[!WARNING]
arrays of char and pointers to char are not the same thing. In fact, we could have done the following, assigning string literals directly to pointer members:
struct info{
char *firstName, *lastName;
int age;
char gender;
double height;
};
int main(void) {
struct info student;
student.firstName = "Ludwik"; // our usual typo, now stored in read-only memory
student.lastName = "Boltzmann";
student.firstName = "Ludwig"; // this will have another address
}
This works for assignment only: the string literals themselves live in read-only memory and cannot be modified through these pointers. Reassigning firstName simply makes it point to a different literal at another address,
occupying another bit of read-only memory. See the section on strings.
In our example code, we have the structure struct Point3D shown at the top of the program. The line
struct Point3D *points = (struct Point3D *)malloc(n * sizeof(struct Point3D));
performs dynamic memory allocation to create an array of Point3D structures.
Let’s break it down:
struct Point3D *points:
struct Point3D is a structure type that represents a 3D point, with fields id, x, y, and z.
*points is a pointer to a Point3D structure. This means points will hold the address of a dynamically allocated block of memory where Point3D structures will be stored.
malloc(n * sizeof(struct Point3D)):
malloc() is a standard library function used to allocate a block of memory. It returns a pointer to the beginning of the allocated memory. n * sizeof(struct Point3D) calculates the total amount of memory to allocate (n is the number of Point3D structures, sizeof(struct Point3D) gives the size (in bytes) of a single Point3D structure).
(struct Point3D *):
Since the result of malloc() is a void * (which is a generic pointer type), you need to cast it to the appropriate type (struct Point3D *). (struct Point3D *) is a type cast that converts the void * returned by malloc() into a struct Point3D *.
In C, both the dot . and arrow -> operators are used to access members of a structure. However, they are used in different contexts depending on whether you’re working with a structure variable or a pointer to a structure.
The dot operator is used when you are working with structure variables directly. If you have a structure variable (i.e., the actual instance of the structure), you can access its members using the dot operator: struct_name.member_name.
For example:
struct Point3D point;
point.x = 1.0;
point.y = 2.0;
point.z = 3.0;
In this example, point is a structure variable, so you use point.x, point.y, and point.z to access the members of the Point3D structure.
The arrow operator is used when you are working with a pointer to a structure. Since a pointer points to the memory address of the structure, you first need to dereference it to access the structure’s members. The arrow operator combines both dereferencing the pointer and accessing the member in one step: pointer_to_struct->member_name.
struct Point3D {
double x, y, z;
};
struct Point3D point;
struct Point3D *point_ptr = &point; // point_ptr is a pointer to the structure
point_ptr->x = 1.0;
point_ptr->y = 2.0;
point_ptr->z = 3.0;
printf("x: %f, y: %f, z: %f\n", point_ptr->x, point_ptr->y, point_ptr->z);
Here, point_ptr is a pointer to the structure point. You use point_ptr->x, point_ptr->y, and point_ptr->z to access the members of the structure via the pointer.
To summarise:
Use the dot (.) operator for direct access to the members of a structure variable (including an element of an array of structures, like points[i]).
Use the arrow (->) operator when accessing members through a pointer to a structure.
struct Point3D point;
struct Point3D *point_ptr = &point;
point.x = 1.0; // Access structure members directly
point_ptr->x = 1.0; // Access structure members through a pointer
[!WARNING]
In the function Point3D calculate_centroid(struct Point3D *points, int n), points is a pointer to an array of Point3D structures, and points[i] gives the actual structure at index i (it is not a pointer). Since points[i] is a structure, you access its members using the dot operator ., like points[i].x.
So, when you have an array of structures (like points), you access elements of the structure like points[i].x. When you have a pointer to a structure, then you use ptr->x to access the x member of the structure.
Consider the following task: we want to read some x, y and z coordinates from an external file. The file is organised as follows:
We have some particles made of discrete points. Each particle has Np points with x,y,z coordinates. Such coordinates are included in a txt file like this:
This is a particle made of 4 points with (x,y,z)-coordinates. The name of the particle is Particle_A.
We want to write a program that can be executed like this
./test.exe input_file output_file
that is, we want to specify from command line the name of the input and output files.
The code will read the coordinates from the input file and compute the centroid of the particle.
Here is the pseudocode:
FUNCTION calculate_centroid(points_x, points_y, points_z, n)
centroid_x=0; centroid_y=0; centroid_z=0;
FOR i = 0, n-1
centroid_x += points_x[i];
centroid_y += points_y[i];
centroid_z += points_z[i];
centroid_x /= n; centroid_y /= n; centroid_z /= n;
RETURN centroid_x, centroid_y, centroid_z
MAIN PROGRAM
READ(input_file,points_x,points_y,points_z) //Read the input_file and store the n coordinates in points_x, points_y, points_z
centroid_x,centroid_y,centroid_z = calculate_centroid(points_x, points_y, points_z, n)
WRITE(output_file,"Set name", set_name)
WRITE(output_file,"Number of points", n)
WRITE(output_file,"The centroid of the points is at", centroid_x,centroid_y,centroid_z)
WRITE(output_file,"Coordinates of the points:")
FOR i = 0, n-1
WRITE("Point:", points_id[i], points_x[i], points_y[i], points_z[i])
END PROGRAM
Here is the Fortran code:
program centroid_calculation
implicit none
integer :: n, i
real(8) :: centroid_x, centroid_y, centroid_z
character(len=100) :: set_name
type Point3D
integer :: id
real(8) :: x, y, z
end type
type(Point3D), allocatable :: points(:)
character(len=255) :: input_filename, output_filename
integer :: input_unit, output_unit, ios
! Check if the correct number of command-line arguments are given
call get_command_argument(1, input_filename)
call get_command_argument(2, output_filename)
if (len_trim(input_filename) == 0 .or. len_trim(output_filename) == 0) then
print *, "Usage: <program> <input_file> <output_file>"
stop 1
end if
! Open input and output files (any free unit numbers will do; 10 and 20 are used here)
input_unit = 10
output_unit = 20
open(unit=input_unit, file=trim(input_filename), status='old', action='read', iostat=ios)
if (ios /= 0) then
print *, "Error opening input file"
stop 1
end if
open(unit=output_unit, file=trim(output_filename), status='replace', action='write', iostat=ios)
if (ios /= 0) then
print *, "Error opening output file"
close(input_unit)
stop 1
end if
! Read number of points and set name from input file
read(input_unit, *) n, set_name
! Allocate memory for the array of points
allocate(points(n))
! Read points data (ID, x, y, z) from the input file
do i = 1, n
read(input_unit, *) points(i)%id, points(i)%x, points(i)%y, points(i)%z
end do
! Calculate the centroid of the points
call calculate_centroid(points, n, centroid_x, centroid_y, centroid_z)
! Write the results to the output file
write(output_unit, '(A)') "Set Name: " // trim(set_name)
write(output_unit, '(A, I0)') "Number of points: ", n
write(output_unit, '(A, F7.2, A, F7.2, A, F7.2, A)') "The centroid of the points is at (", centroid_x, ", ", centroid_y, ", ", centroid_z, ")"
write(output_unit, '(A)') "Data of the points:"
do i = 1, n
write(output_unit, '(A, I0, A, F7.2, A, F7.2, A, F7.2, A)') "Point ", points(i)%id, ": (", points(i)%x, ", ", points(i)%y, ", ", points(i)%z, ")"
end do
! Clean up
deallocate(points)
close(input_unit)
close(output_unit)
print *, "Centroid calculation complete. Results written to ", trim(output_filename)
contains
! Function to calculate the centroid of an array of 3D points
subroutine calculate_centroid(points, n, centroid_x, centroid_y, centroid_z)
type(Point3D), intent(in) :: points(:)
integer, intent(in) :: n
real(8), intent(out) :: centroid_x, centroid_y, centroid_z
integer :: i
centroid_x = 0.0
centroid_y = 0.0
centroid_z = 0.0
do i = 1, n
centroid_x = centroid_x + points(i)%x
centroid_y = centroid_y + points(i)%y
centroid_z = centroid_z + points(i)%z
end do
centroid_x = centroid_x / real(n)
centroid_y = centroid_y / real(n)
centroid_z = centroid_z / real(n)
end subroutine
end program
unit: This specifies the logical unit number (an integer) to be associated with the file. It is a reference for file operations like read and write.
file: This specifies the name of the file as a string. The file must exist (or be created, depending on the status attribute) at the specified path.
status (optional, default = unknown): This controls how the file should be handled:
old ensures the file must exist.
new creates a new file and raises an error if the file already exists.
replace creates a new file or overwrites an existing one.
unknown allows either reading from or creating the file, depending on its existence.
action (optional, default = readwrite): This specifies whether the file is to be opened for read, write, or readwrite operations.
form (optional, default = formatted): Specifies whether the file is opened for formatted or unformatted I/O. formatted is used for human-readable text files, while unformatted is for binary files.
iostat (optional): If provided, this integer variable will store the status of the operation. A value of 0 indicates success, while a nonzero value indicates an error (e.g., file not found).
The close statement is used to close an open file that has been previously accessed using the open statement. Properly closing files ensures that any pending I/O operations are completed, resources are released, and that no further operations can be performed on the file until it is reopened.
The general syntax for the CLOSE statement is:
close ([unit=]unit_number [, iostat=ios] [, status=sta] [, err=err_label])
The statement read(input_unit, *) n, set_name is used for reading n and set_name from the file whose unit is input_unit. The second argument of read specifies the format. If the * is used, Fortran will automatically match the input data to the variables n and set_name without requiring you to specify a format string.
Fortran provides a powerful and flexible formatting system for reading from and writing to files or standard I/O. The FORMAT statement (or its shorthand within READ, WRITE, and PRINT statements) allows you to control the appearance of the data input/output. This section explains various format specifiers available in Fortran and how to use them effectively.
The general syntax of a FORMAT statement is:
FORMAT(format-specifier-list)
Alternatively, in the READ, WRITE, or PRINT statement, the format specifier can be placed directly in parentheses. For example, WRITE(*, '(I5, F10.2)') i, x writes an integer in a field of width 5 and a floating-point number in a field of width 10 with 2 decimal places.
Fortran format specifiers define how different types of variables are formatted during input/output. Here are the most common format specifiers:
Integer Format: Iw
I stands for an integer.
w specifies the width of the output field.
For example: WRITE(*, '(I5)') 123 ! Output: " 123" where I5 prints an integer in a field of 5 characters. If the integer has fewer than 5 digits, it is right-aligned, and spaces are added on the left.
Floating-Point Format: Fw.d
F is for real (floating-point) numbers.
w specifies the total width of the field.
d specifies the number of digits after the decimal point.
For example: WRITE(*, '(F10.2)') 3.14159 ! Output: " 3.14" where F10.2 writes a floating-point number in a field of width 10, with 2 digits after the decimal point.
Exponential Format: Ew.d or ESw.d
E is used for real numbers in scientific notation.
w specifies the total width of the field.
d specifies the number of digits after the decimal point.
For example: WRITE(*, '(E12.4)') 12345.6789 ! Output: " 0.1235E+05" where E12.4 prints a number in scientific notation, with a total width of 12 and 4 digits after the decimal point.
Character Format: Aw
A is used for character strings.
w specifies the width of the output field.
For example: WRITE(*, '(A10)') 'Fortran' ! Output: "Fortran " where A10 prints a string in a field of width 10.
A is a shorthand for printing a string of any length without specifying the width. For example WRITE(*, '(A)') 'Fortran is great!' ! Output: "Fortran is great!"
A complete list of format specifiers in Fortran can be found at this link.
In the previous lecture, we introduced some types in Fortran (e.g., integer(kind), real(kind), character(kind), etc.). These are called intrinsic types. In addition, Fortran allows the creation of user-defined types (also known as derived types). This is useful for grouping related data together in a structured way.
In our example, we have:
type Point3D
integer :: id
real(8) :: x, y, z
end type
where type Point3D begins the definition of a new type called Point3D.
Inside this type, there are three components:
integer :: id specifies that id is an integer field, which can be used to identify each Point3D instance.
real(8) :: x, y, z specify that x, y, and z are double precision floating-point fields (using real(8)) representing the coordinates of a point in 3D space.
In our example, to use the Point3D type, we declared variables of this type as follows:
type(Point3D), allocatable :: points(:)
This declares points as an allocatable array of the Point3D type. Once allocated, each element points(i) is an instance of the type, and its components can be accessed and manipulated with the % operator (for example, points(1)%x = 0.0d0).
Derived types are ideal for bundling different data types together under one name, especially when representing real-world entities such as points, students, or complex numbers. For instance, instead of using separate variables for each property of a student (e.g., name, age, grade), you can group them into a single structure.
Look how the following code can be simplified by using derived types:
program main
implicit none
! Declare variables for student
character(len=20) :: student_firstName
character(len=20) :: student_lastName
integer :: student_age
character :: student_gender
real(8) :: student_height
! Declare variables for professor
character(len=20) :: professor_firstName
character(len=20) :: professor_lastName
integer :: professor_age
character :: professor_gender
real(8) :: professor_height
! Assign values to student variables
student_firstName = "Ludwig"
student_lastName = "Boltzmann"
student_age = 26
student_gender = 'M'
student_height = 1.75 ! Example height, you can assign as needed
! Assign values to professor variables
professor_firstName = "Josef"
professor_lastName = "Stefan"
professor_age = 50 ! Example age, you can assign as needed
professor_gender = 'M'
professor_height = 1.80 ! Example height, you can assign as needed
! You can add further code here
end program main
We immediately see that the information firstName, lastName, age, etc., are common to both student and professor. We could therefore define a type called, let's say, info
type :: info
character(len=20) :: firstName
character(len=20) :: lastName
integer :: age
character :: gender
real(8) :: height
end type info
and use it like this:
program main
implicit none
type :: info
character(len=20) :: firstName
character(len=20) :: lastName
integer :: age
character :: gender
real(8) :: height
end type info
type(info) :: student
! Assign values to the student's fields
student%firstName = "Ludwig"
student%lastName = "Boltzmann"
student%age = 26
student%gender = 'M'
student%height = 1.75
! Output the student's information
print *, "Student Info:"
print *, "Name: ", student%firstName, student%lastName
print *, "Age: ", student%age
print *, "Gender: ", student%gender
print *, "Height: ", student%height
end program main
Gaussian quadrature is a powerful numerical integration technique designed to achieve high accuracy using a carefully chosen set of sample points and weights. Unlike basic methods of numerical integration, such as the trapezoidal or Simpson’s rule, which use equally spaced points, Gaussian quadrature selects both the locations of the points and their associated weights in an optimized way. This flexibility allows Gaussian quadrature to approximate integrals with fewer evaluations, often with remarkably high precision.
The key idea behind Gaussian quadrature is to approximate the integral of a function $f(x)$ over an interval by summing the function's values at specific points $x_j$, each multiplied by a weight $w_j$. For an integral of the form $\int_a^b f(x)\,dx$, Gaussian quadrature approximates it as
$$\int_a^b f(x)\,dx \approx \sum_{j=1}^{N} w_j f(x_j),$$
where $N$ is the number of points used in the approximation.
Unlike basic methods, Gaussian quadrature does not restrict these points to be evenly spaced. This freedom in choosing both weights and points effectively doubles the degrees of freedom, allowing Gaussian quadrature to attain a higher-order accuracy than traditional methods like Newton-Cotes for the same number of evaluations.
Another key advantage of Gaussian quadrature is that it can be customized for specific classes of functions. By selecting appropriate weights and abscissas, we can design quadrature formulas that are tailored to integrands of the kind "polynomials times some known function W(x)" rather than for the usual class of integrands "polynomials".
Given such a weight function $W(x)$ and a fixed number of points $N$, we can determine weights $w_j$ and points $x_j$ that make the approximation
$$\int_a^b W(x) f(x)\,dx \approx \sum_{j=1}^{N} w_j f(x_j)$$
exact when $f(x)$ is a polynomial.
Here are some examples of weight functions:
Gauss-Legendre:
$$W(x) = 1, \qquad -1 < x < 1$$
Gauss-Chebyshev:
$$W(x) = (1 - x^2)^{-1/2}, \qquad -1 < x < 1$$
Gauss-Laguerre:
$$W(x) = x^{\alpha} e^{-x}, \qquad 0 < x < \infty$$
Gauss-Hermite:
$$W(x) = e^{-x^2}, \qquad -\infty < x < \infty$$
The computation of abscissas and weights can be easy or difficult depending on how much you already know about your weight function and its associated polynomials. In the case of classical, well-studied, orthogonal polynomials, practically everything is known, including good approximations for their zeros. These can be used as starting guesses, enabling Newton’s method to converge very rapidly.
We are going to write a code to perform integration using Chebyshev polynomials for Gaussian quadrature.
The idea is to approximate the integral of a function f(x) over the interval [−1,1] using Chebyshev nodes and weights.
The Chebyshev quadrature rule gives the nodes $x_j$ and weights $w_j$ for integrating with the weight function
$$W(x) = \frac{1}{\sqrt{1 - x^2}}.$$
For an $N$-point Chebyshev quadrature, the nodes $x_j$ and weights $w_j$ are given by
$$x_j = \cos\!\left(\frac{\pi\,(j - 0.5)}{N}\right), \qquad w_j = \frac{\pi}{N}.$$
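As a quick check of these formulas, take $N = 1$: the single node is $x_1 = \cos(\pi/2) = 0$ and its weight is $w_1 = \pi$, so the rule reduces to
$$\int_{-1}^{1} \frac{f(x)}{\sqrt{1-x^2}}\,dx \approx \pi f(0),$$
which is exact for $f(x) = 1$ (both sides equal $\pi$) and for $f(x) = x$ (both sides vanish), consistent with an $N$-point Gaussian rule being exact for polynomials up to degree $2N - 1$.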
Here is a to-do list that can help:
Prompt the user to choose a function to integrate from the following list: $e^{-\cos^2 x}$, $x^2$, $3x^6+2x^3-3x^2+2$, $3x^{10}+2x^3-3x^2+2$, $3x^{100}-3x^{50}+2x^3-3x^2+2$, $e^{-x^2}$.
Allow the user to specify the maximum number of Chebyshev nodes, Nmax, for the quadrature.
Generate Chebyshev nodes and weights for the integration.
Calculate and print the integral for different values of N∈[2,Nmax].
C code
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double f(double x, int lab) {
switch (lab) {
case 1: return exp(-cos(x) * cos(x));
case 2: return x * x;
case 3: return 3 * pow(x, 6) + 2 * pow(x, 3) - 3 * x * x + 2;
case 4: return 3 * pow(x, 10) + 2 * pow(x, 3) - 3 * x * x + 2;
case 5: return 3 * pow(x, 100) - 3 * pow(x, 50) + 2 * pow(x, 3) - 3 * x * x + 2;
case 6: return exp(-x * x);
default: return 0.0;
}
}
void generate_weights_abscissae(double *w, double *x, int N) {
const double pi = 3.141592653589793;
*w = pi / N; // Chebyshev weight
for (int j = 0; j < N; j++) {
x[j] = cos(pi * (j + 0.5) / N); // Chebyshev node
}
}
double gauss_chebyshev(double w, double *x, int N, int lab) {
double integral = 0.0;
for (int j = 0; j < N; j++) {
integral += w * f(x[j], lab)*sqrt(1.-x[j]*x[j]);
}
return integral;
}
int main() {
int Nmax, lab;
printf("Choose the function to be integrated from -1 to 1 (just type the number):\n");
printf("(1) exp(-cos(x)*cos(x))\n");
printf("(2) x*x\n");
printf("(3) 3*x^6 + 2*x^3 - 3*x^2 + 2\n");
printf("(4) 3*x^10 + 2*x^3 - 3*x^2 + 2\n");
printf("(5) 3*x^100 - 3*x^50 + 2*x^3 - 3*x^2 + 2\n");
printf("(6) exp(-x*x)\n");
scanf("%d", &lab);
if (lab < 1 || lab > 6) {
fprintf(stderr, "ERROR: you must insert an integer number between 1 and 6.\n");
return 1;
}
printf("Enter the maximum number of Chebyshev points (Nmax): ");
scanf("%d", &Nmax);
if (Nmax < 2) {
fprintf(stderr, "ERROR: Nmax must be at least 2.\n");
return 1;
}
// Open the output file
FILE *file = fopen("results.txt", "w");
if (file == NULL) {
fprintf(stderr, "ERROR: Could not open file for writing.\n");
return 1;
}
// Write header to the output file
fprintf(file, "#N Integral\n");
for (int N = 2; N <= Nmax; N++) {
// Allocate memory for the Chebyshev nodes
double *x = (double *)malloc(N * sizeof(double));
if (x == NULL) {
fprintf(stderr, "ERROR: Memory allocation failed.\n");
fclose(file);
return 1;
}
double w;
generate_weights_abscissae(&w, x, N);
double integral = gauss_chebyshev(w, x, N, lab);
// Write the result to the file
fprintf(file, "%-3d %.6f\n", N, integral);
// Free allocated memory
free(x);
}
// Close the output file
fclose(file);
printf("Results written to results.txt\n");
return 0;
}
FORTRAN code
program chebyshev_integration
implicit none
integer :: N,Nmax ! Number of Chebyshev points
real(8) :: a, b, integral, W
integer :: j, lab
real(8), allocatable, dimension(:) :: x ! Arrays to store nodes and weights
! Define the limits of integration
a = -1.0d0
b = 1.0d0
! Ask for input
print *, "Choose the function to be integrated from -1 to 1 (just type the number):"
print *, "(1) exp(-cos(x)*cos(x))"
print *, "(2) x*x"
print *, "(3) 3*x**6+2*x**3-3*x**2+2"
print *, "(4) 3*x**10+2*x**3-3*x**2+2"
print *, "(5) 3*x**100-3*x**50+2*x**3-3*x**2+2"
print *, "(6) exp(-x*x)"
!Read input
read(*, *) lab
!Check input
if(lab.lt.1.or.lab.gt.6)then
print*, "ERROR: you must insert an integer number between 1 and 6."
stop
endif
print *, "Choose the function to be integrated from -1 to 1 (just type the number):"
!Read input
read(*, *) Nmax
!Check input
if(Nmax.lt.2)then
print*, "ERROR: Nmax must be at least 2."
stop
endif
!Header for the output (unit 666 is never opened explicitly, so with gfortran and most compilers the output ends up in a file called fort.666)
write(666,'(A3, 7x, A3)') "#N", "Integral"
do N = 2, Nmax
!Allocate the array for the abscissae
if(allocated(x))deallocate(x)
allocate(x(N))
! Generate Chebyshev nodes and weights
call generate_weights_abscissae(w,x)
! Perform the integration
integral = gauss_chebyshev(w,x,lab)
! Print the result
write(666,'(I3,5x,F10.6)') N, integral
enddo
contains
! Define the function to integrate here
real(8) function f(x,lab)
real(8), intent(in) :: x
integer,intent(in) :: lab
select case(lab)
case(1)
f = exp(-cos(x)*cos(x))
case(2)
f = x*x
case(3)
f = 3*x**6+2*x**3-3*x**2+2
case(4)
f = 3*x**10+2*x**3-3*x**2+2
case(5)
f = 3*x**100-3*x**50+2*x**3-3*x**2+2
case(6)
f = exp(-x*x)
end select
end function
subroutine generate_weights_abscissae(w,x)
implicit none
real(8), intent(inout) :: w
real(8), dimension(:), intent(inout) :: x
real(8), parameter :: pi = 3.141592653589793d0
integer :: N
N = size(x)
w = pi / real(N) ! Chebyshev weight
do j = 1, N
x(j) = cos(pi * (real(j) - 0.5d0) / real(N)) ! Chebyshev node
end do
end subroutine
real(8) function gauss_chebyshev(w,x,lab)
real(8), intent(in) :: w
real(8), dimension(:), intent(in) :: x
integer, intent(in) :: lab
integer :: j, N
N = size(x)
gauss_chebyshev = 0.0d0
do j = 1, N
gauss_chebyshev = gauss_chebyshev + W * f(x(j),lab)*sqrt(1-x(j)*x(j))
end do
end function
end program chebyshev_integration
If you're using a modern personal computer in the 2020s, it's likely that you're using a
CPU with either x86-64 or ARM architecture. By architecture (more precisely an instruction set architecture, or ISA),
we mean a standard set of rules that define the way a CPU is supposed to be composed and behave. The ISA defines, for example, the set
of instructions (addition, subtraction, ...), data types (integers, floating points and their representation) and memory
(registers and addresses, memory management system, ...).
Of course, every CPU manufacturer has usually several implementations of a specific ISA, and they are called
microarchitecture.
In the following we restrict our discussion and historical overview to Intel-compatible architectures only.
Examples of architectures, microarchitectures and CPUs you might have heard of include:
x86-16 architecture
Intel 8086
Intel 80286
IA-32
Intel 80386
Intel 80486
Intel Pentium
AMD K6
IA-64
Intel Itanium
x86-64
Intel Core i5 (Haswell)
Intel Core i7 (Broadwell)
Intel Skylake
AMD Ryzen
AMD Epyc (Zen microarchitecture family)
We can say that a specific microarchitecture is the description of the specific components of a CPU,
and entails concepts like those of arithmetic logic unit (ALU), Floating Point Unit (FPU), instruction sets, pipelines, cache,
branch predictors, multithreading, and more.
We will focus on Intel architectures, but our discussion is rather general in terms of modern CPU architectures.
If you are interested in historical development of different architectural philosophies, you might want to explore,
for example
Reduced Instruction Set Computers (RISC) vs Complex Instruction Set Computers (CISC)
Scalar processors vs Vector processors
Most modern (2020s) CPUs are CISC/RISC hybrids, with the CPU receiving CISC instruction but executing them using
proprietary RISC microinstructions. They are classified as scalar processors because they process finite-size chunks
of data, even though this is usually done in a vectorised way with SIMD (single instruction multiple data) units.
Now, a bit of history of microprocessors:
The first commercially produced microprocessor was the Intel 4004, a 4-bit CPU based on
Metal-Oxide-Semiconductor (MOS) silicon gate technology, designed by Federico Faggin and released in 1971, and it opened
the doors to the future of mass-produced, general purpose CPUs. Only a few months later, Intel released the first 8-bit
microprocessor, the 8008 (which Intel started developing even before the 4004 hit the market, showing the
company's appetite for what became the modern microprocessor proliferation). A few years later, in 1978, Intel released the
8086 and its close cousin, the 8088, the foundation of what we now know as the x86 architecture.
The 8088 CPUs were a huge success as they were used in the IBM PC, debuting in 1981 and destined to become one of the most
influential lines of home and office computers, the x86-based architecture being one of the most used nowadays (excluding mobile
computing).
The 80386, released in 1985, was a game-changer. It was the first 32-bit processor in the
x86 family, which meant it could handle more data and address a lot more memory—up to 4 GB.
The 80386 could be supplemented by the 80387 floating point unit (FPU) coprocessor. In later CPUs, the
FPUs were typically integrated on-chip.
By 1989, the 80486 (then rebranded i486) brought some serious upgrades.
It integrated the Floating Point Unit (FPU) directly into the CPU, making scientific calculations much faster.
It also introduced an on-chip cache, which stored frequently used data close to the CPU, speeding up processing by
reducing the need to fetch data from slower main memory.
The early 1990s saw the arrival of the Pentium series, which introduced a superscalar architecture.
This meant the CPU could execute multiple instructions at the same time, thanks to having multiple ALUs and FPUs.
It also brought in branch prediction (guessing which way a program will go at a fork) and out-of-order execution
(rearranging instruction execution to avoid delays), both of which boosted performance.
The x86-64 architecture was introduced by AMD. Also known as AMD64, it was developed by AMD as an extension of
Intel's x86 architecture to support 64-bit computing while maintaining backward compatibility with 32-bit and
16-bit x86 code, which was lost in the competing 64-bit architecture of Intel (IA-64, Itanium processors).
Eventually, Intel adopted the AMD64 extensions (marketed as EM64T and later Intel 64), starting with Pentium 4 models in the mid-2000s.
Modern x86-64 CPUs feature deep pipelines, sophisticated speculative execution (executing instructions before they're needed,
based on predictions), and multiple levels of cache (L1, L2, and L3) to keep the processor fed with data.
Earlier, the Pentium line had also introduced MMX (the MultiMedia eXtensions), a type of SIMD (Single Instruction, Multiple Data)
instructions able to operate on multiple data points at once.
Building on the success of MMX, Intel introduced SSE (Streaming SIMD Extensions) with the Pentium III.
SSE expanded SIMD capabilities beyond the integer operations of MMX to include floating-point operations.
Each new iteration (SSE2, SSE4) added more instructions and improved the performance and versatility to
the SIMD units, making it increasingly valuable for a wider range of applications. By the time of SSE4,
SIMD units were not just a tool for multimedia but also a significant asset in tasks like encryption,
data compression, and many aspects of scientific computing.
AVX (Advanced Vector Extensions), first appearing in Intel's Sandy Bridge processors in 2011, brought
wider vector registers (256 bits compared to 128 bits in SSE), allowing the CPU to process even more data in
parallel. AVX is particularly impactful in areas requiring heavy numerical computation.
In summary:
Intel's MMX (MultiMedia eXtensions): provides arithmetic and logic operations on 64-bit integer numbers in
blocks of 2 32-bit, 4 16-bit or 8 8-bit operations in one instruction. The registers are called MM0, MM1, ...
AMD's 3DNow!: added single-precision (32-bit) floating point support to the MMX instruction set
Intel's SSE (Streaming SIMD Extensions): providing single precision floating point operations and 128-bit registers (XMM0, XMM1, ...)
Intel's SSE2/SSE4: providing double precision (64-bit) floating point operations on the same 128-bit XMM registers
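You rarely need to program these SIMD units by hand: with suitable optimisation flags, a compiler will usually vectorise simple loops for you. For instance, a loop like the sketch below is a typical candidate (the function name and flags are illustrative; the exact instructions emitted depend on your compiler and CPU):
#include <stddef.h>

// Compiling with something like `gcc -O3 -march=native -S vecadd.c` typically
// turns this loop into SSE/AVX code (look for addpd/vaddpd in the generated assembly).
void vecadd(double *restrict c, const double *restrict a,
            const double *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}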
Assembly language is a low-level programming language that is closely related to machine code. Each instruction in an assembly program corresponds to a specific operation that the CPU can perform. Writing programs in assembly allows for fine-grained control over the computer's hardware, but it requires a deep understanding of the CPU's instruction set and the system's architecture.
[!IMPORTANT]
You will rarely, if ever, need to write assembly code, but it might well happen, especially if you are going to
optimise code, that you will have to look at the assembly code produced by your compiler. Also, studying a bit of assembly
and simple cases as those presented here will help you understand how the CPU works "under the hood" (well, at least a bit).
Having some knowledge of assembly will be extremely helpful to sharpen your coding skills!
Below are three simple examples in x86-64 assembly: summing two numbers, a simple for loop, and making a function call. Each example includes the code, and instructions on how to assemble and link it using the GNU assembler (as) and GCC on a Linux system. Last, we will analyse in depth a "Hello, world!" code written in assembly, spending some time on the details on the mechanism behind invoking system calls, that is, requesting the linux kernel to do something for us.
[!TIP]
If you want to put your assembly skills to work after this chapter, you can have a look at the section on the effect of optimisation flags.
This example demonstrates how to add two numbers and store the result in a register.
.section .data
# No data needed
.section .text
.globl _start
_start:
movq $5, %rax # Load 5 into rax
movq $10, %rbx # Load 10 into rbx
addq %rbx, %rax # Add rbx to rax, result in rax
# Exit the program
movq $0, %rdi # Exit code 0
movq $60, %rax # sys_exit system call
syscall
Let's see what every line means:
.section .data: the read-and-write data section where strings, numbers, etc. are stored. In this case, we don't need any, so this is empty!
.section .text: in assembly and compiled programs, the .text section is a dedicated region of memory where the executable
code (machine instructions) is stored. This is the part of the program that contains the actual instructions
the CPU executes.
_start: the _start label is the default entry point for programs that are written in assembly, and is the first
code that gets executed when the operating system starts running the program. This is a "simpler" way of starting a program,
unlike higher-level ones written in C, which start executing from the main function after some initialisation done by the C runtime
(crt0 or crt1.o), such as setting up the stack and heap. (TODO: refer to an explanation of memory layout).
Using this "simpler" way also means that it's not possible to use wrappers to the system calls as those provided
in the libc (more on this in later, in the "Hello, world!" example).
Within the _start section we have, first, three instructions:
movq $5, %rax # Load 5 into rax
movq $10, %rbx # Load 10 into rbx
addq %rbx, %rax # Add rbx to rax, result in rax
These lines simply call two instructions:
movq to move quadword (that is, 64 bits). In this specific case the source is an immediate value
(the numerical value 5, denoted $5) and the destination is the RAX 64-bit register
addq to add the binary values stored in the two registers and store the result in RAX.
[!NOTE]
In assembly, an immediate value is a constant number embedded into the instruction itself by the assembler,
as opposed to one loaded from, e.g., a register
The final part:
# Exit the program
movq $0, %rdi # Exit code 0
movq $60, %rax # sys_exit system call
syscall
is invoking the kernel, asking to execute the sys_exit system call. We will dissect this part later,
in the "Hello, world!" example. For now, just consider that block as the way to exit the code.
How do you produce an executable from this assembly code? We first need to produce the object code with
the assembler, then link it to make it executable. Let's say we save the assembly code in a file called sum.s,
then this will assemble it, link it, and execute it.
> as -o sum.o sum.s
> ld -o sum sum.o
> ./sum
Alternatively, one can use gcc to perform all these steps. As this is not conventional code using the standard C runtime initialisation, we must provide the -nostartfiles flag:
> gcc -nostartfiles -o sum sum.s
Of course, nothing happens at the moment, because we're not producing any output! More on this later in our "Hello, world!" example!
[!NOTE]
the "dialect" of assembly used above is the so-called AT&T syntax. This is the one natively recognised
by the GNU assembler, although it can also read the other major dialect, the Intel syntax, which would look
like this:
mov rax, 5 ; Load 5 into rax
mov rbx, 10 ; Load 10 into rbx
add rax, rbx ; Add rbx to rax, result in rax
The following example demonstrates how to write a simple for loop that counts from 0 to 9.
.section .data
# No data needed
.section .text
.globl _start
_start:
xorq %rcx, %rcx # Set counter (rcx) to 0
loop_start:
cmpq $10, %rcx # Compare counter with 10
jge loop_end # If counter >= 10, jump to loop_end
# Body of the loop (No operation, just incrementing counter)
incq %rcx # Increment counter
jmp loop_start # Jump back to start of loop
loop_end:
# Exit the program
movq $0, %rdi # Exit code 0
movq $60, %rax # sys_exit system call
syscall
This case is a bit more complex than the one shown before. Let's go through it line-by-line:
xorq %rcx, %rcx: a common trick to zero-out a register with the exclusive or logical operation
cmpq $10, %rcx: compare the value stored in RCX with the immediate value 10. After the comparison, a conditional should follow (jge in this case)
jge loop_end: the jge instruction is a conditional jump, and evaluates the result of a preceding test (cmpq in our case).
If the result of the comparison is "greater or equal" (to ten), then jump to the label loop_end
[!IMPORTANT]
cmpq stands for "compare a quadword". It performs the internal subtraction %rcx - 10 and sets the flags (see box below) based on the result:
If %rcx is less than 10 (as an unsigned value), the Carry Flag (CF) is set
If %rcx is equal to 10, the Zero Flag (ZF) is set
If the result %rcx - 10 is negative (i.e., %rcx is less than 10 as a signed value), the Sign Flag (SF) is set
The jge instruction then jumps when the signed comparison gives "greater or equal", that is, when SF equals the Overflow Flag (OF).
[!NOTE]
Flags in a CPU refer to specific bits in the status register (also called the flags register or EFLAGS on x86 processors).
Flags are set or cleared by various instructions to indicate the results of operations, and are critical for decision-making
in assembly language because they affect how conditional branches and other logic are executed.
incq %rcx: if the jump is not executed, the next instruction just increments by one our counter, stored in register RCX.
jmp loop_start: after incrementing RCX, jump to the beginning of the loop.
This example demonstrates how to define and call a simple function that adds two numbers and returns the result.
The only new instructions introduced here are call and ret:
.section .data
# No data needed
.section .text
.globl _start
_start:
movq $5, %rdi # First argument (5) -> rdi
movq $10, %rsi # Second argument (10) -> rsi
call sum # Call sum function
# Result is in rax
# Exit the program
movq $0, %rdi # Exit code 0 -> rdi
movq $60, %rax # sys_exit system call -> rax
syscall
sum:
movq %rdi, %rax # Move first argument (rdi) to rax
addq %rsi, %rax # Add second argument (rsi) to rax
ret # return (result in rax)
[!NOTE]
Here we need to introduce the convention on calling functions. The System V ABI (Application Binary Interface)
specifies the following registers for function calls:
Argument Passing Registers: %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are used to pass arguments from the 1st to the 6th.
Return Value Register: %rax is used to store the return value of a function (for both integer and pointer return values). If the return value is larger than 64 bits, additional registers may be used, but typically %rax handles most cases.
The stack pointer %rsp: points to the top of the stack and is used to push and pop values (such as the return address, local variables, and extra arguments) during function calls.
The base pointer %rbp: is often used as a reference to access local variables but in modern compilers, it may be omitted.
When call is executed, the return address (the address of the instruction after the call) is automatically pushed onto the stack.
The ret instruction pops this address off the stack to return control to the calling function.
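For comparison, the assembly above is roughly what a compiler could generate for a trivial C function like the following (a sketch; the name sum simply mirrors the label used above):
// The two arguments arrive in %rdi and %rsi and the result is returned in %rax,
// following the System V calling convention described in the note above.
long sum(long a, long b) {
    return a + b;
}

int main(void) {
    long result = sum(5, 10);   // 15
    (void)result;               // the assembly version simply exits with status 0
    return 0;
}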
Let's make an example of a very simple assembly code to print Hello, world! to screen.
First, we will compile it and run it, and then we will describe each line in detail. The more we proceed
in the description of the code, the more we will add details, especially close to the end, where we will
dissect it down to the machine code level. However, the initial part should give you a light introduction
to the structure of an executable file.
.section .data
.msg: # msg is just a label, you can use anything else
.string "Hello, world!\n"
.section .text
.globl _start # this is the default entry point
_start:
movl $14, %edx # Length of the string ("Hello, world!\n")
leaq .msg(%rip), %rsi # Go to the label 'msg' and load the string address
movl $1, %edi # File descriptor 1 (stdout)
movl $1, %eax # 1 is the syscall number for sys_write
# defined in, e.g.,
# /usr/include/x86_64-linux-gnu/asm/unistd_64.h
syscall # Make the system call (x86-64 style)
# On older x86 (32-bit) systems, system calls
# were typically made with the 'int 0x80' interrupt.
mov $60, %rax # sys_exit (syscall number 60)
xor %rdi, %rdi # set the exit status to 0 (success)
syscall # Make the system call
Paste this code into the file hw.s. You can compile it in different ways, exactly as before: either assemble and link it by hand,
> as -o hw.o hw.s
> ld -o hw hw.o
> ./hw
or let gcc do everything (remember the -nostartfiles flag):
> gcc -nostartfiles -o hw hw.s
In 64-bit Linux systems, ELF executables are loaded starting at the address 0x400000.
The ELF headers, which describe the structure of the executable, are located at the beginning of
the address space (0x400000), and the actual program code starts after the headers, with an offset of
0x1000 (which is 4096 bytes or one memory page), which is why 0x401000 is a common starting address.
.section .data: switches to the read-and-write data where data such as strings, are stored.
.msg:: this label is a reference to the location of the string "Hello, world!\n" that comes after.
.string "Hello, world!\n": defines the string (including a newline) and stores it at the location of .msg. The string is null-terminated automatically.
.section .text: the .text section is where the executable code is stored, as described before.
This is typically read-only and the code cannot modify it at runtime
(this being enforced by the operating system). This protection ensures that executable code is not
accidentally or maliciously altered. The .text section is typically marked as executable upon linking,
meaning the CPU is allowed to execute instructions from this section.
In a linked executable, the .text section typically starts at a well-defined address, which is determined by the
linker during the linking phase. For example, in many Linux systems using ELF format, the .text section is placed at
an address like 0x401000, as explained earlier.
.globl: this directive is used to declare a symbol (label) as global, meaning that it can be
referenced from other files or outside the current assembly file.
_start: as described above, the _start label is the default entry point for programs that bypass
the standard C runtime initialisation.
movl $14, %edx: stores the value 14 in the register EDX. This is the lower 32-bit part of the full, general purpose register RDX.
The conventional naming of x86-64 general purpose registers are:
RDX: The full 64-bit register.
EDX: The lower 32 bits of RDX.
DX: The lower 16 bits of RDX.
DL: The lower 8 bits of RDX.
DH: The upper 8 bits of the lower 16 bits of RDX.
Here, we store a 32-bit integer (14). This is the length of the string "Hello, world!\n", including
the newline. In this particular case, however, EDX plays a particular role (see below at the syscall section)
as this represents the third argument passed to system calls.
leaq .msg(%rip), %rsi: the leaq instruction is the Load Effective Address instruction for 64-bit registers.
It is commonly used to compute memory addresses or perform arithmetic without actually accessing memory.
The instruction leaq computes the address or value of the memory operand and stores that computed address into the destination register.
The instruction leaq .msg(%rip), %rsi is used to compute the effective address of the label .msg
relative to the Instruction Pointer register (RIP), and then store that computed address into
the RSI register (which is a full 64-bit general purpose register: addresses are 64 bits long on
a 64 bit machine...). The RSI register is also a special one, because it is used to pass the second argument
to system calls (see the syscall section).
The %rip register contains the address of the next instruction to be executed.
RIP-relative addressing calculates the memory address as an offset from the current instruction pointer,
making it particularly useful in position-independent code (common in shared libraries and executables).
In this particular case, RSI plays a particular role
as it holds the second argument passed to system calls (see the syscall section).
movl $1, %edi: stores the value 1 in the register EDI. This will be used to indicate the
stdout file descriptor ( stdin being typically 0, stdout 1 and stderr 2). EDI (and RDI) is a special
register, because it is used to pass the first argument to system calls (see the syscall section).
movl $1, %eax: stores the value 1 in the register EAX. This will be used to indicate the numerical value of the system call
sys_write that performs binary I/O. The syscall numbers are defined in some system header files, for example
/usr/include/x86_64-linux-gnu/asm/unistd_64.h, where the first 7 syscalls are defined as:
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_stat 4
#define __NR_fstat 5
#define __NR_lstat 6
Here, since we know already that sys_write is associated to the numerical constant 1, we store it directly.
The EAX register (or the full RAX one) is a special one that is used by the syscall
instruction to determine which system call to invoke.
syscall: execute the system call. The syscall instruction in x86-64 is a special CPU instruction used to
transition from user mode to kernel mode, allowing user-space programs to request services from
the operating system kernel. This instruction is crucial for executing system calls, which are controlled
entry points into the kernel, enabling user programs to interact with hardware or perform privileged
operations like file I/O, memory management, or process control. Notice that syscall is the modern replacement for
the older int 0x80 interrupt instruction used in 32-bit systems.
In x86-64 based Linux systems, system calls use specific registers to pass the system call number
and its arguments. The registers are loaded with values before invoking the syscall instruction.
The RAX register contains the system call number, which tells the kernel which service is being requested.
Other registers contain the arguments to the system call, as one can see from the linux kernel source
linux/arch/x86_64/entry.S
/*
* Registers on entry:
* rax system call number
* rcx return address
* r11 saved rflags (note: r11 is callee-clobbered register in C ABI)
* rdi arg0
* rsi arg1
* rdx arg2
* r10 arg3 (needs to be moved to rcx to conform to C ABI)
* r8 arg4
*/
By calling syscall, the kernel takes over and executes the sys_write system call, passing to it
the value 1 (stdout) as first argument (stored in EDI), the address of the string Hello, world!
as second argument (stored in RSI), and the string length 14 as third argument (stored in EDX).
As we mentioned, this way we are not making use of the libc wrappers like write(),
but are invoking directly the syscall from the kernel. This is roughly equivalent to the following C code:
#include <unistd.h> // for syscall function
#include <sys/syscall.h> // for syscall numbers (SYS_write)
int main() {
const char *message = "Hello, world!\n";
long bytes_written;
// Calling the sys_write syscall using the syscall function
bytes_written = syscall(SYS_write, 1, message, 14); // 1 is the file descriptor for stdout
return 0; // Exit the program
}
The system call sys_write writes in binary format to the standard output file descriptor the content
of the string, starting from its initial address and ending 14 bytes afterwards.
Summarising, the registers used here (EAX, EDI, RSI, EDX) are used to:
EAX: select the system call (1 being sys_write)
EDI: pass the first argument to sys_write (the stdout file descriptor 1)
RSI: pass the second argument, the address where the first character (byte) of the string is stored
EDX: pass the third argument, the number of bytes that need to be written to stdout.
mov $60, %rax: here we're getting ready to call another syscall, namely sys_exit (whose identifying number is 60),
by storing the value 60 in RAX.
xor %rdi, %rdi: again, the RDI register is used to pass the first argument to the system call. The sys_exit syscall accepts
only one argument, and calling xor %rdi, %rdi is a quick way to set it identically to zero.
[!NOTE]
Using xor to set a register to zero is a typical pattern because it ensures minimal usage of resources.
Using mov could potentially be slower and/or require more resources. In particular:
xor %rdi, %rdi
Operation: Clears the RDI register by XOR-ing it with itself, effectively setting RDI to 0.
Latency: On modern CPUs, this is an extremely fast instruction. Since no memory access is required and it only operates
on the register itself, it is typically 1 clock cycle.
Micro-optimization: The xor %rdi, %rdi is recognized by modern processors as a common way to zero out a
register (a so-called zeroing idiom). Thanks to this internal optimization it typically has a
latency of 1 cycle and may even be handled at the register-rename stage without occupying an execution unit.
mov $0, %rdi
Operation: Moves the immediate value 0 into the RDI register.
Latency: The mov instruction with an immediate value usually takes 1 clock cycle as well on modern
CPUs. However, it involves loading an immediate value (0) into the RDI register, so it requires
slightly more resources compared to xor.
Instruction Size: The size of mov $0, %rdi is typically 7 bytes (since it has to encode the
immediate value). Although this doesn't affect the latency directly, it can impact instruction
fetching and decoding.
Also note that:
xor %rdi, %rdi is encoded in 2 bytes of machine code, (0x31, 0xFF), when the assembler emits the equivalent 32-bit form xor %edi, %edi
(writing a 32-bit register automatically clears the upper 32 bits of the full register; the explicit 64-bit form adds a one-byte REX prefix, 0x48).
0x31 is the opcode for a xor operation between two general purpose registers
0xFF (11111111 in binary) is the so-called ModR/M Byte and encodes the mode of operation:
11 for register-to-register, then 111 twice to identify the DI register twice.
mov $0, %rdi is 7 bytes long in machine code: (0x48, 0xC7, 0xC7, 0x00, 0x00, 0x00, 0x00).
0x48 is the REX prefix that tells the machine to operate on the full 64-bit register (RDI instead of EDI)
0xC7 is the opcode for mov
the second 0xC7 (11000111 in binary) is the ModR/M Byte: 11 for register-to-register, 000 to specify
an immediate value (0), and 111 to specify the RDI register
0x00 Lower byte of the immediate value (0), padded with zeros (the other three 0x00 bytes)
Both instructions typically take 1 cycle on modern processors, but xor %rdi, %rdi is generally more
efficient due to microarchitectural optimizations and smaller instruction size. This might not have a
particular performance effect in this context, but it is a good excuse to explain how assembly code is
translated to machine language!
syscall: finally, we call sys_exit passing zero as an argument, signaling to the outside world that everything went OK.
We are now at the end of our introductory tour of assembly language.
We have also introduced the structure of an ELF executable, and some details on how assembly code is
translated to machine code, ready to be executed by the CPU.
GDB, the GNU Debugger, is a powerful tool, which can be used for debugging C, C++, Fortran, or other compiled codes.
When writing complex applications, bugs such as segmentation faults, memory corruption, or logical errors are often
difficult to trace by simply reviewing the code, and debuggers like GDB provide you with a robust interface to control
program execution by running your program, for example, line by line, inspecting variables, following the flow of function calls,
and observing how your program interacts with memory and data.
Let's make a simple example of how one can use GDB to inspect the flow of a program (no bugs at the moment, so let's call it nobug.c):
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int arr[5];
for (int i = 0; i < 5; i++) {
arr[i] = i;
printf("%d\n",arr[i]);
}
return 0;
}
To compile a code with GCC and add the information necessary to run the GDB debugger, pass the -g flag for basic functionality, or the -ggdb flag for GDB-specific features.
> gcc -o nobug -ggdb nobug.c
Now we can run our executable within GDB,
> gdb ./nobug
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nobug...
Now, we issue the command l, or list:
(gdb) l
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int arr[5];
6 for (int i = 0; i < 5; i++) {
7 arr[i] = i;
8 printf("%d\n",arr[i]);
9 }
10 return 0;
By passing the flag -ggdb at compile time we included information about the source code in the executable file. Without this information it would still be possible to run the debugger,
but it would be much more difficult to understand its output.
[!TIP]
Including debugging info with the options -g or -ggdb does not slow down the execution of the program (despite occasional claims to the contrary).
Therefore, it is in general recommended to include the debugging flags every time you compile your code! If you don't want your source to be
at hand for everyone, you can remove the debugging flag, or call strip <file> on your executable, to remove the symbol and debugging information included in the binary file.
Let's make the first step by introducing a breakpoint, that is, a point in the code where the debugger will stop the execution and give you
the opportunity to inspect the variables. For now, let's break right after starting the main function:
(gdb) break nobug.c:main
Breakpoint 1 at 0x1175: file nobug.c, line 4.
[!TIP]
Modern versions of GDB provide autocompletion! So try typing b nob<tab> and see what happens!
With the breakpoint set (we could also have specified it directly by line number with break 4, or the short form b 4), we can now run the program, which will stop execution immediately at the breakpoint:
(gdb) run
Starting program: /home/m.sega/test_daje/nobug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at nobug.c:4
4 int main(void) {
At this point no variable has been defined. Let's make a step forward in the execution, with step, and print the value of i:
(gdb) step
6 for (int i = 0; i < 5; i++) {
(gdb) print i
$1 = 0
At this point, no element of arr has been assigned. We can see this by printing the whole array (we can also access its address):
Let's now go to the next instruction with next, and print the value of arr[i] (yes, you can also use variables in your expressions!)
[!NOTE]
step advances by one line, entering also the execution of the called functions, whereas next goes to the next
line while staying at the same level, that is, without entering functions.
The first element has been assigned, but the remaining four still have random values.
[!NOTE]
Some compilers can be instructed to zero out numerical variables when they are
declared. Check the manual page of your compiler!
At the next iteration, the second element is assigned:
(gdb) next
0
6 for (int i = 0; i < 5; i++) {
(gdb) next
7 arr[i] = i;
(gdb) next
8 printf("%d\n",arr[i]);
(gdb) next
1
6 for (int i = 0; i < 5; i++) {
(gdb) print arr
$7 = {0, 1, 100, 0, 4096}
That's it for this simple example, where we have seen how to take control over the flow of the code and to inspect the value and size of variables!
[!IMPORTANT]
As you can imagine, this is a powerful tool that allows you to avoid the typical debugging pattern of
manually adding debugging output in the form of printf statements scattered throughout your code!
Let's now look at a program that contains an actual bug (let's call it bug1.c):
#include <stdlib.h>
#include <stdio.h>
int function(int* arr) {
int i,sum=0;
for (i = 0; i < 2048 ; i++) {
arr[i] = i;
sum += i ;
}
return sum;
}
int main(void){
int a[2],s;
s = function(a);
printf("result=%d\n",s);
return s;
}
Compiling and running this code on an older system would immediately raise a Segmentation Fault (i.e., a violation of
memory boundaries). On modern systems the stack is protected, so when running it under GDB an error message like this is probably what you
would observe:
(gdb) run
Starting program: /home/m.sega/test_daje/bug1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x000055555555517c in function (arr=0x7fffffffe274) at bug1.c:7
7 arr[i] = i;
The execution stops where the segmentation fault happened. This is already an interesting bit of information.
Let's show the code around that line with the list (l) command:
(gdb) l
2 #include <stdio.h>
3
4 int function(int* arr) {
5 int i,sum=0;
6 for (i = 0; i < 2048 ; i++) {
7 arr[i] = i;
8 sum += i ;
9 }
10 return sum;
11 }
We know that the problem is most likely the array arr. However, arr is passed as a pointer, so
we need to understand how big the memory region that arr points to is, and in which part of the code
it has been allocated. In this simple example the answer is obvious, but in a more complex code this could
be difficult to spot, for example because function could be called in many different places, passing different
pointers.
The backtrace command (bt for short) is our friend here, because it shows the whole stack of function calls that led to this SEGFAULT:
(gdb) bt
#0 0x000055555555517c in function (arr=0x7fffffffe274) at bug1.c:7
#1 0x00005555555551ae in main () at bug1.c:15
which shows that the call stack is just made of two frames, #0 and #1. We know already that we need to look
in a frame above the one where the SEGFAULT happened, so we switch to frame #1 with the frame command (f in short):
(gdb) f 1
#1 0x00005555555551ae in main () at bug1.c:15
15 s = function(a);
This way, the debugger shows us the line in frame #1 where the function was called. In this frame we can inspect the type
of a, and realise that it's just two elements long, so the loop that goes up to 2048 is clearly going out of bounds.
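For example, the ptype command prints the declared type of a variable (output from a recent GDB; the exact formatting may vary between versions):
(gdb) ptype a
type = int [2]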
Have a look at this seemingly innocuous piece of code:
#include <stdio.h>
int main() {
int arr1[2] = {2,2};
int arr2[2] = {1,1};
for (int i=0 ; i<arr1[0] ; i++){
arr2[arr1[1]]++ ;
printf("%d\n",arr2[arr1[1]]);
}
return 0;
}
At first sight, if you're not looking attentively enough, this should perform a loop from 0 to arr1[0] (which is equal to 2), incrementing one of the elements of arr2 and printing its value. In practice, if you run it (with an old compiler), it will never stop...
Why is that so? First of all, the arrays are stored in inverse order on the stack. Let's see this with GDB:
gdb ./bug2
Reading symbols from ./bug2...
(gdb) b main
Breakpoint 1 at 0x1155: file bug2.c, line 4.
(gdb) r
Starting program: /home/m.sega/test_daje/bug2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at bug2.c:4
4 int arr1[2] = {2,2};
(gdb) next
5 int arr2[2] = {1,1};
(gdb) p &arr1
$1 = (int (*)[2]) 0x7fffffffe274
(gdb) p &arr2
$2 = (int (*)[2]) 0x7fffffffe26c
The two addresses are 0xfe274-0xfe26c = 8 bytes apart, as one would expect (an int is 32 bits on a 64-bit machine, so the two-element arr2 occupies 8 bytes). What is important to notice is that arr1 is located in memory after arr2, so that the physical representation would be something like this:
| Address | Content |
|---------|---------|
| 0xfe26c | arr2[0] |
| 0xfe270 | arr2[1] |
| 0xfe274 | arr1[0] |
| 0xfe278 | arr1[1] |
Now, when trying to access arr2[2], pointer algebra tells us we are actually reaching the address arr2+2,
or, in other words, &arr2[0] + 2, which in pointer arithmetic means two ints (8 bytes) further on. Since &arr2[0]=0xfe26c, it means that &arr2[0] + 2 = 0xfe274, which is
the address of arr1[0]. By writing arr2[arr1[1]]++ we're actually incrementing by one the content of arr2[2],
that is to say, the content of arr1[0]. In the loop for (int i=0 ; i<arr1[0] ; i++), the content of arr1[0]
is therefore not constant, but is increased by one at every iteration, with the result that the loop never ends.
Let's see how to spot this using the debugger.
Here, we define a breakpoint at the beginning of the main function, and start the execution
Reading symbols from ./bug2...
(gdb) b main
Breakpoint 1 at 0x1155: file bug2.c, line 4.
(gdb) r
Starting program: /home/m.sega/test_daje/bug2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at bug2.c:4
4 int arr1[2] = {2,2};
At this point, arr1 is defined, so we can set a watchpoint on it, passing
the -l (location) flag to specify that we want to be notified of any change taking place at that address.
Since our loop is never ending, the problem has to be in arr1[0]:
(gdb) watch -l arr1[0]
Hardware watchpoint 2: -location arr1[0]
Let's see what happens:
Now we keep iterating by issuing the next (n) command.
[!TIP]
Pressing <enter> is equivalent to repeating the last GDB command.
This comes in handy when you need to quickly step through a long loop.
(gdb) n
Hardware watchpoint 2: -location arr1[0]
Old value = 0
New value = 2
0x000055555555515c in main () at bug2.c:4
4 int arr1[2] = {2,2};
(gdb)
5 int arr2[2] = {1,1};
(gdb)
7 for (int i=0 ; i<arr1[0] ; i++){
(gdb)
8 arr2[arr1[1]]++ ;
(gdb)
Hardware watchpoint 2: -location arr1[0]
Old value = 2
New value = 3
main () at bug2.c:9
9 printf("%d\n",arr2[arr1[1]]);
From the information above it's clear that the watchpoint
is triggered by line 8:
8 arr2[arr1[1]]++ ;
And we can also check explicitly that the addresses are the same:
(gdb) p &arr2[arr1[1]]
$1 = (int *) 0x7fffffffe274
(gdb) p &arr1[0]
$2 = (int *) 0x7fffffffe274
When solving systems of linear equations, one powerful method is to decompose the matrix A into a product of two matrices: a lower triangular matrix L and an upper triangular matrix U:
L⋅U=A    (1)
where:
L is the lower triangular matrix, meaning it has non-zero elements only on the diagonal and below.
U is the upper triangular matrix, with non-zero elements only on the diagonal and above.
This decomposition allows us to break the problem into simpler parts.
For example, for a 4x4 matrix A, the decomposition in Eq. (1) would look like this:
The advantage of this approach is that triangular systems (like L and U) are easier to solve.
In a triangular system:
L (lower triangular) only has non-zero elements on the diagonal and below. This makes it possible to solve for the unknowns sequentially using a method called forward substitution, starting from the first row and moving downwards.
U (upper triangular) has non-zero elements on the diagonal and above. For U, you use a method called backsubstitution, which solves the system from the last row upwards.
These two methods —- forward substitution and back substitution -— are computationally efficient. Instead of having to deal with the full matrix at once, you solve the system in steps, reducing the overall complexity. This stepwise approach results in fewer calculations compared to methods like Gaussian elimination, making it faster and less prone to numerical errors for large systems.
By decomposing the matrix A into L and U, we can solve the system of equations step-by-step, first with forward substitution and then with backsubstitution, making the process more efficient.
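As a concrete illustration of these two steps, here is a minimal C sketch of forward and back substitution (the function names lower_solve/upper_solve and the row-major n x n layout are our own choices for illustration, not a library interface):
/* Solve L*y = b (forward substitution) and U*x = y (back substitution).
   L and U are stored as dense row-major n x n arrays. */
void lower_solve(int n, const double *L, const double *b, double *y) {
    // forward substitution: proceed from the first row downwards
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++) s -= L[i*n + j] * y[j];
        y[i] = s / L[i*n + i];
    }
}

void upper_solve(int n, const double *U, const double *y, double *x) {
    // back substitution: proceed from the last row upwards
    for (int i = n - 1; i >= 0; i--) {
        double s = y[i];
        for (int j = i + 1; j < n; j++) s -= U[i*n + j] * x[j];
        x[i] = s / U[i*n + i];
    }
}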
In the previous section, we saw that once we have the lower and upper triangular matrices (L and U, respectively) such that A=L⋅U, we can easily solve a linear system like A⋅x=b. Now, we see how to compute L and U, i.e., how to perform the LU decomposition.
As you can see from this example, we have $N^2$ equations and $N^2+N$ unknowns. Since the number of unknowns is greater than the number of equations, we are invited to specify N of the unknowns arbitrarily and then try to solve for the others. We therefore choose:
$\alpha_{ii} = 1, \quad i = 1, \ldots, N$
We now apply the following procedure:
Crout’s algorithm provides a very efficient way to solve the system of equations generated by the matrix decomposition L⋅U=A. The method tackles the set of $N^2+N$ equations, which include both the elements of L and U, by cleverly rearranging them so that fewer unknowns are solved at each step. This reordering allows us to sequentially solve for each $\alpha_{ij}$ and $\beta_{ij}$, as outlined below:
First, we set the diagonal elements of L to unity:
$\alpha_{ii} = 1 \quad \text{for } i = 1, 2, \ldots, N$
Then, for each j=1,2,…N, we follow these two steps:
Use the following equation to solve for each βij where i=1,2,…,j (in the below equation, the summation term is zero when i=1):
$\beta_{ij} = a_{ij} - \sum_{k=1}^{i-1} \alpha_{ik}\,\beta_{kj}$
Use the following equation to solve for αij where i=j+1,j+2,…N:
$\alpha_{ij} = \frac{1}{\beta_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} \alpha_{ik}\,\beta_{kj} \right)$
As you proceed through these iterations, you will notice that each αij and βij is determined by previously computed values, allowing for an “in-place” decomposition. This means that no extra storage is required for these values, as they overwrite the corresponding locations in the matrix.
The algorithm essentially fills in the matrix by columns from left to right and within each column, from top to bottom.
Thus, after completing the process, the resulting matrix stores the $\beta$ values on and above the diagonal and the $\alpha$ values below it (the unit diagonal of L is not stored explicitly).
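The procedure above translates almost line by line into code. Here is a minimal in-place sketch (no pivoting yet; the function name lu_decompose and the plain 2-D array interface are ours, chosen only for illustration):
/* In-place Crout-style LU decomposition, following the equations above:
   alpha_ii = 1 is implicit; beta_ij (i <= j) and alpha_ij (i > j)
   overwrite the corresponding entries of a. Assumes no zero pivots. */
void lu_decompose(int n, double a[n][n]) {
    for (int j = 0; j < n; j++) {
        // beta_ij for i = 0..j
        for (int i = 0; i <= j; i++) {
            double sum = a[i][j];
            for (int k = 0; k < i; k++) sum -= a[i][k] * a[k][j];
            a[i][j] = sum;
        }
        // alpha_ij for i = j+1..n-1 (division by the pivot beta_jj)
        for (int i = j + 1; i < n; i++) {
            double sum = a[i][j];
            for (int k = 0; k < j; k++) sum -= a[i][k] * a[k][j];
            a[i][j] = sum / a[j][j];
        }
    }
}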
Pivoting is a technique used to improve the numerical stability of the factorization process: without it, LU decomposition can suffer from numerical instability, especially when small or zero elements are encountered on the diagonal.
Pivoting refers to reordering the rows (or sometimes columns) of a matrix during the decomposition to avoid dividing by small or zero pivot elements, which can lead to large numerical errors.
In the context of LU decomposition, partial pivoting (the most common form) means swapping rows to ensure that the largest absolute value in each column becomes the pivot element (diagonal element of the matrix). This helps reduce round-off errors and increases the numerical stability of the factorization.
Without pivoting, if a small pivot element is encountered during the factorization process, division by that small value can amplify rounding errors, making the solution inaccurate. Pivoting minimizes this risk by ensuring that the pivot element is large enough to avoid such issues.
In Crout’s method with partial pivoting, the algorithm includes an additional step where the rows of the matrix are swapped to ensure that the pivot element (the diagonal element of U) is the largest element in its column. Here’s how the process works:
1. Initialize the LU Decomposition: Start with matrix A, and begin the process of decomposing it into L⋅U.
2. At each step (for each column):
Before computing the current column of L and U, search for the largest absolute value in the current column (from the diagonal element down) and swap rows so that the largest value becomes the pivot element.
3. Row Swapping:
When a row swap is performed, you must update both L and U to reflect the new row ordering. Typically, these row swaps are recorded in a permutation matrix P, and the decomposition becomes:
P⋅A=L⋅U

where P is the permutation matrix that tracks row swaps.
4. Compute the LU Factors:
- After pivoting, compute the entries of the L and U matrices as usual. Ensure that the pivot element (diagonal element of U) is now large enough to avoid division by small numbers.
LU decomposition is useful for solving a series of Ax=b problems with the same matrix A and different right-hand sides b. It is also advantageous for computing the inverse of A, as only one decomposition is required:
the first column of the inverse is obtained by solving with b=[1,0,0]T, the second column by changing b to [0,1,0]T, the third with [0,0,1]T, and so on.
This method is efficient because only back- and forward-substitution is required after the initial LU decomposition.
[!WARNING]
To compute the inverse of the matrix, you need to solve n Ax=b problems, one for each column of the inverse. If you're trying to compute the inverse just to solve an Ax=b problem, you introduce more numerical error and slow down your computation. In short, avoid computing the matrix inverse unless absolutely necessary, and instead solve the system of equations directly.
To provide an example of forward substitution for solving a lower triangular system like L⋅y=b, let’s apply the forward substitution method to a specific 4x4 system.
We start with the lower triangular matrix L, the vector b, and the unknown vector y.
In this case, L⋅y=b is a lower-triangular system, sketched in generic form below.
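The specific numbers of the original example are not reproduced here; generically, with placeholder entries $l_{ij}$ and $b_i$, the system and its forward-substitution solution read:
$$
\begin{pmatrix}
l_{11} & 0 & 0 & 0 \\
l_{21} & l_{22} & 0 & 0 \\
l_{31} & l_{32} & l_{33} & 0 \\
l_{41} & l_{42} & l_{43} & l_{44}
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix},
\qquad
y_i = \frac{1}{l_{ii}} \left( b_i - \sum_{j=1}^{i-1} l_{ij}\, y_j \right), \quad i = 1, \ldots, 4 .
$$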
To perform LU decomposition of a 3x3 matrix step by step, we decompose matrix A into a lower triangular matrix L and an upper triangular matrix U such that:
A=L⋅U
Where:
L is a lower triangular matrix, meaning it has non-zero values only on and below the diagonal, and all elements above the diagonal are zero.
U is an upper triangular matrix, meaning it has non-zero values only on and above the diagonal, and all elements below the diagonal are zero.
LAPACK (Linear Algebra PACKage) is a highly optimized library designed for solving linear algebra problems such as solving systems of linear equations, computing matrix factorizations, and performing matrix inversions. LAPACK is written in Fortran, but there are wrappers and bindings available for C and other languages.
On macOS, for example, LAPACK can be installed with Homebrew (brew install lapack). This will also install any necessary dependencies, including BLAS.
To work with LAPACK in C, you will want to install the LAPACKe package, which provides the necessary bindings for C:
brew install lapacke
To make sure LAPACK is installed correctly, you can check if lapacke.h and the shared libraries (liblapacke.dylib, liblapack.dylib, libblas.dylib) are available in the default Homebrew installation path (/usr/local/include for headers and /usr/local/lib for libraries).
You can also list installed libraries by running:
ls /usr/local/lib | grep lapack
ls /usr/local/lib | grep blas
To compile a program that uses LAPACKE you then need to link the relevant libraries, for example with gcc -o my_program my_program.c -llapacke -llapack -lblas -lm, where:
-llapacke, -llapack: Link the LAPACK C interface and the LAPACK library itself.
-lblas: Links the BLAS library, which LAPACK depends on.
-lm: Links the math library (for operations like square roots or other math functions).
[!WARNING]
If the libraries are installed in a non-standard path (e.g., /opt/homebrew/ on newer Macs with M1/M2 chips), you may need to specify the location of the libraries explicitly when compiling, like this:
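For instance, assuming a source file called my_program.c (a placeholder name; the /opt/homebrew paths below are the usual Apple Silicon defaults and may differ on your system):
> gcc -o my_program my_program.c -I/opt/homebrew/include -L/opt/homebrew/lib -llapacke -llapack -lblas -lm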
The name of each LAPACK routine is a coded specification of its function.
All driver and computational routines have names of the form XYYZZZ, where for some driver routines the 6th character is blank.
The first letter, X, indicates the data type as follows:
S REAL
D DOUBLE PRECISION
C COMPLEX
Z COMPLEX*16 or DOUBLE COMPLEX
When we wish to refer to an LAPACK routine generically, regardless of data type, we replace the first letter by x. Thus xGESV refers to any or all of the routines SGESV, CGESV, DGESV and ZGESV.
The next two letters, YY, indicate the type of matrix (or of the most significant matrix). Most of these two-letter codes apply to both real and complex matrices; a few apply specifically to one or the other, as indicated in the table below.
| Code | Matrix type |
|------|-------------|
| BD | bidiagonal |
| DI | diagonal |
| GB | general band |
| GE | general (i.e., unsymmetric, in some cases rectangular) |
| GG | general matrices, generalized problem (i.e., a pair of general matrices) |
| GT | general tridiagonal |
| HB | (complex) Hermitian band |
| HE | (complex) Hermitian |
| HG | upper Hessenberg matrix, generalized problem (i.e., a Hessenberg and a triangular matrix) |
| HP | (complex) Hermitian, packed storage |
| HS | upper Hessenberg |
| OP | (real) orthogonal, packed storage |
| OR | (real) orthogonal |
| PB | symmetric or Hermitian positive definite band |
| PO | symmetric or Hermitian positive definite |
| PP | symmetric or Hermitian positive definite, packed storage |
| PT | symmetric or Hermitian positive definite tridiagonal |
| SB | (real) symmetric band |
| SP | symmetric, packed storage |
| ST | (real) symmetric tridiagonal |
| SY | symmetric |
| TB | triangular band |
| TG | triangular matrices, generalized problem (i.e., a pair of triangular matrices) |
| TP | triangular, packed storage |
| TR | triangular (or in some cases quasi-triangular) |
| TZ | trapezoidal |
| UN | (complex) unitary |
| UP | (complex) unitary, packed storage |
The last three letters ZZZ indicate the computation performed. For example, SGEBRD is a single precision (S) routine that performs, on a general matrix (GE), a bidiagonal reduction (BRD).
Here’s a guide on how to use LAPACK in C to compute the inverse of a matrix.
To compute the inverse of a matrix A, you first perform an LU decomposition of the matrix and then use the result to compute the inverse. LAPACK provides the dgetrf function for LU decomposition and dgetri for matrix inversion.
#include <stdio.h>
#include <lapacke.h>
int main() {
// Define the matrix A (for example, a 3x3 matrix)
int n = 3;
double A[9] = {
4.0, 2.0, 1.0,
8.0, 7.0, 3.0,
2.0, 1.0, 5.0
};
// Define pivot array and workspace
int ipiv[3]; // Pivot indices array
int info; // Status info
// Perform LU decomposition
info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, A, n, ipiv);
if (info != 0) {
printf("LU decomposition failed with error code %d\n", info);
return -1;
}
// Perform matrix inversion using LU decomposition
info = LAPACKE_dgetri(LAPACK_ROW_MAJOR, n, A, n, ipiv);
if (info != 0) {
printf("Matrix inversion failed with error code %d\n", info);
return -1;
}
// Output the inverse matrix
printf("Inverse of A:\n");
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
printf("%6.2f ", A[i * n + j]);
}
printf("\n");
}
return 0;
}
LAPACK_ROW_MAJOR is a constant or macro used when calling LAPACK functions to specify the memory layout of the matrix data. It tells LAPACK that the matrix is stored in row-major order, meaning that the matrix elements are stored in a contiguous block of memory row by row.
Consider this matrix:
$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$
Row-major order: All elements of a row are stored consecutively in memory. In C, matrices are typically stored in row-major order by default. This is what LAPACK_ROW_MAJOR refers to.
In row-major order, the elements would be stored in memory like this:
[a11,a12,a21,a22]
Column-major order: All elements of a column are stored consecutively in memory. Fortran, and hence LAPACK’s native format, uses column-major order by default. So in this order, the above matrix would be stored as:
[a11,a21,a12,a22]
[!NOTE]
Why LAPACK_ROW_MAJOR?
Since LAPACK is traditionally written in Fortran (which uses column-major order), C and C++ users need to specify how their matrices are stored. LAPACK functions can handle both row-major and column-major layouts, but you need to tell LAPACK how your data is arranged in memory.
To inform LAPACK that you’re passing a matrix stored in row-major order, you’d use the LAPACK_ROW_MAJOR constant when calling LAPACK functions. Conversely, LAPACK_COL_MAJOR is used if the matrix is stored in column-major order (the Fortran default).
Here is a similar example in Fortran, calling the LAPACK routines dgetrf and dgetri directly (note that dgetri also requires a workspace array):
program matrix_inversion
implicit none
real(8), dimension(2,2) :: A
real(8), dimension(4) :: work
integer, dimension(2) :: ipiv
integer :: info
! Matrix A to be inverted
A = reshape((/ 4.0d0, 7.0d0, 2.0d0, 6.0d0 /), (/ 2, 2 /))
! Call LAPACK routines to perform LU decomposition and matrix inversion
call dgetrf(2, 2, A, 2, ipiv, info) ! LU decomposition
if (info /= 0) then
print *, 'Error in LU decomposition, info = ', info
stop
endif
call dgetri(2, A, 2, ipiv, work, 4, info) ! Matrix inversion
if (info /= 0) then
print *, 'Error in matrix inversion, info = ', info
stop
endif
print *, 'Inverse of matrix A:'
print *, A
end program matrix_inversion
Modern compilers do quite a large part of the heavy lifting that is necessary to
produce performant code. The number of strategies a modern compiler uses in order to optimise the code for speed
can be quite impressive, and choosing the correct optimisation flags at compile time can really make a difference. However,
this is true only if the data layout and basic structure of the code allow for these strategies to be implemented. Still,
by performing some optimisations by hand and looking at the resulting performance and disassembled code we can learn which
optimisation strategies work best and why. We will show that, with a bit of patience, we can even outperform the compiler and
reach a performance that is close to the theoretical maximum which, as we will see, is set by the cache bandwidth for this
specific problem.
Let's start with a simple code that adds up floating point numbers:
/* sum_std.c */
double sum(int N, double *a) {
double s = 0.0;
for (int i = 0; i < N; i++) {
s += a[i];
}
return s;
}
We can test this by calling the function repeatedly and collecting some statistics, using this code:
/* runner.c */
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <math.h>
double sum(int N, double *a);
#define NUM_RUNS 200
void time_function(int N, double *a, double (*func)(int, double*), double *mean, double *std) {
clock_t start, end;
double time, m=0.0, s=0.0;
for (int i = 0; i < NUM_RUNS; i++) {
start = clock();
func(N, a);
end = clock();
time = (double)(end - start) / CLOCKS_PER_SEC;
m += time;
s += time*time;
}
m/=NUM_RUNS;
*mean = m;
*std = sqrt(s/NUM_RUNS - m*m);
}
int main(void) {
int N = 1<<20;
double mean, std;
/* Let's enforce 32-byte memory alignment, so that aligned AVX loads on this buffer are safe */
double *a = (double*) aligned_alloc( (size_t)32, N * sizeof(double));
for (int i = 0; i < N; i++) a[i] = (double)(i + 1);
time_function(N, a, sum, &mean, &std);
return printf("Average execution time for sum: %g ± %g ms \n", 1e3*mean, 1e3*std);
}
We can compile this with different optimisation flags, for example:
> # no optimisation
> gcc -O0 -lm runner.c sum_std.c -o sum && ./sum
Average execution time for sum: 4.11662 ± 0.0645093 ms
We're doing 2**20 = 1,048,576 additions in about 4 ms; this means that we're crunching numbers at
the rate of 0.26 Gflops (1 flops = one floating point operation per second). This is far
from the 16 Gflops theoretical peak of the EPYC 7713 @ 2GHz on which this code was run.
Switching on -O3 improves the performance by a factor of 4,
> # -O3
> gcc -O3 -lm runner.c sum_std.c -o sum && ./sum
Average execution time for sum: 1.03888 ± 0.0608442 ms
bringing us to 1 Gflops, still more than 10 times slower than the theoretical peak.
First of all, where does this factor 4 come from?
.L3:
movl -12(%rbp), %eax # Load the loop index (stored at -12(%rbp)) into %eax
cltq # Sign-extend %eax (32-bit) into %rax (64-bit)
leaq 0(,%rax,8), %rdx # Compute the offset by multiplying the loop index by 8 and store it in %rdx
movq -32(%rbp), %rax # Load the base address of the array (or pointer) from -32(%rbp) into %rax
addq %rdx, %rax # Add the computed offset to the base address to get the current element address
movsd (%rax), %xmm0 # Load the double-precision float at the current element address into %xmm0
movsd -8(%rbp), %xmm1 # Load the running sum (or previous value) from -8(%rbp) into %xmm1
addsd %xmm1, %xmm0 # Add the current element value (in %xmm0) to the running sum (in %xmm1)
movsd %xmm0, -8(%rbp) # Store the updated running sum back into -8(%rbp)
addl $1, -12(%rbp) # Increment the loop index (stored at -12(%rbp)) by 1
The version compiled with -O3 is different, in that the main loop in .L4 (other labels like .L3 are used to handle special array lengths) makes use of vectorised load operations:
vmovupd (%rax), %ymm3 # loads 4 doubles at a time, note the ymm 256-bit register
It seems that the code above is not making full use of the vectorised operations.
Let's try to force its hand by coding the function directly using AVX intrinsics:
/* sum_avx.c */
// this include loads all the necessary definitions
// and constants to use the AVX extensions:
#include <immintrin.h>
double sum(int N, double *a) {
// vsum will use a 256-bit register and is set to zero
__m256d vsum = _mm256_set1_pd(0.0);
// we move 4 doubles at a time (4 x 64 = 256)
for (int i = 0; i < N; i += 4) {
// load 4 doubles at a time and store them in an AVX register
__m256d value = _mm256_load_pd(&a[i]);
// add the 4 doubles in column to vsum
vsum = _mm256_add_pd(vsum, value);
}
// now add up horizontally the 4 doubles in vsum:
// _mm256_hadd_pd adds adjacent pairs within each 128-bit lane,
// so we still need to add the two lanes together
vsum = _mm256_hadd_pd(vsum, vsum);
__m128d lanes = _mm_add_pd(_mm256_castpd256_pd128(vsum),
_mm256_extractf128_pd(vsum, 1));
// copy the lower double-precision element, which now holds the full sum
return _mm_cvtsd_f64(lanes);
}
This turns out to be way faster (a factor 16 faster than the -O0 version, and a factor 4 faster than the -O3 version) :
> gcc -O3 -lm -mavx2 runner.c sum_avx.c -o sum && ./sum
Average execution time for sum: 0.264505 ± 0.0396432 ms
That's much better, almost 4 Gflops, but still only 1/4 of the theoretical peak of 16 Gflops.
How did we manage to get to this performance? Let's have a look at the corresponding assembly code generated with
gcc -O3 -march=native -mtune=native -S sum_avx.c
or, equivalently on this machine (an AMD EPYC 7713, whose Zen 3 architecture GCC calls znver3), with
gcc -O3 -march=znver3 -mtune=znver3 -S sum_avx.c
where it is clear that the bulk of the work is done in .L3 by these lines:
.L3:
vaddpd (%rsi), %ymm0, %ymm0 # Add 4 packed double-precision floats from memory (pointed to by %rsi)
# to the values currently in the %ymm0 register. The result is stored back in %ymm0.
# This is a vectorized addition, processing 4 doubles in parallel.
addq $32, %rsi # Increment the memory pointer %rsi by 32 bytes (4 doubles x 8 bytes per double).
# This moves to the next set of 4 double-precision floating-point numbers.
cmpq %rsi, %rax # Compare the updated pointer %rsi with %rax (which holds the address
# marking the end of the array). This checks if we’ve processed all the elements.
jne .L3 # If %rsi is still less than %rax (i.e., there are more elements left to process),
# jump back to label .L3 and repeat the loop.
Now, this way we are fully exploiting AVX instructions by adding packed double-precision floats while
traversing the memory in jumps of 4 doubles.
Why is the -O3 flag not using the vaddpd instruction? This is because using packed operations would
change the order of the operations. Remember that in floating point arithmetic
(a+b)+c is not the same as a+(b+c). So, in the code that uses vaddpd we are adding up the values in the four columns first, and
then we are adding up the four final values. This is not the same as what our initial C code was supposed to do (sum one after another
all the doubles). Let's quick-fix this by relaxing the constraint that prevents the compiler from reordering, by passing -ffast-math to it:
> gcc -O3 -mtune=native -march=native -ffast-math -lm runner.c sum_std.c -o sum && ./sum
Average execution time for sum: 0.265075 ± 0.0677651 ms
In fact, we have now recovered the same performance, and looking at the assembly code (for example, the output of
gcc -O3 -mtune=native -march=native -ffast-math -S sum_std.c) it's clear that the code now makes full use of the AVX instructions.
Using -ffast-math, however, might not be what we want in some cases.
How can we convince the compiler to reach this level of optimisation without this "trick"?
The answer is to provide a code that is already arranged in four "columns", so that using
the packed sum vaddpd won't change the order of the operations; we "unroll" the loop manually:
/* sum_unrolled.c */
double sum(int N, double *a) {
double s[4];
for (int j=0;j<4;j++) s[j]=0.0;
for (int i = 0; i < N; i+=4) { /* assumes N is a multiple of 4 */
for (int j=0;j<4;j++) s[j] += a[i+j];
}
return s[0]+s[1]+s[2]+s[3];
}
This yields the desired performance without using -ffast-math:
> gcc -O3 -mtune=native -march=native -lm runner.c sum_unrolled.c -o sum && ./sum
Average execution time for sum: 0.26451 ± 0.00615288 ms
Considering that we're doing 2**20 = 1,048,576 additions in 0.27 ms, we're crunching numbers at
the rate of 3.9 Gflops. The EPYC 7713 is working at a clock speed of 2GHz. It's a Zen 3 architecture, which, for the
packed add (VADDPS/D) operations has a latency of three clock cycles and a reciprocal throughput of 0.5
(cycles per instruction), meaning that every cycle it can perform two operations on 4 double precision floats.
Under ideal conditions (an effortless, continuous stream of data from L1 cache to the SIMD units), this would
total 8 flops x 2 GHz = 16 Gflops. This is the way we came to the theoretical peak performances mentioned before.
The fact that the CPI is 0.5 suggests that we could pack more than one vaddpd instruction within the for loop, because at the moment,
as it is also apparent from the assembly code
only one vaddpd is performed before moving the pointer forward. Let's try to force pre-fetching of two 4-double-precision blocks of data in each iteration, simply like this:
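The modified kernel is not reproduced in the text; a minimal sketch of how sum_avx.c could be changed, with two independent accumulators each handling 4 doubles per iteration (and assuming N is a multiple of 8), is:
/* sum_avx.c, modified (illustrative sketch) */
#include <immintrin.h>
double sum(int N, double *a) {
    __m256d vsum0 = _mm256_set1_pd(0.0);
    __m256d vsum1 = _mm256_set1_pd(0.0);
    // two independent packed additions per iteration: 8 doubles per loop
    for (int i = 0; i < N; i += 8) {
        vsum0 = _mm256_add_pd(vsum0, _mm256_load_pd(&a[i]));
        vsum1 = _mm256_add_pd(vsum1, _mm256_load_pd(&a[i + 4]));
    }
    // combine the two accumulators and reduce horizontally as before
    __m256d vsum = _mm256_add_pd(vsum0, vsum1);
    vsum = _mm256_hadd_pd(vsum, vsum);
    __m128d lanes = _mm_add_pd(_mm256_castpd256_pd128(vsum),
                               _mm256_extractf128_pd(vsum, 1));
    return _mm_cvtsd_f64(lanes);
}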
> gcc -lm -O3 -mavx2 runner.c sum_avx.c -o sum && ./sum
Average execution time for sum: 0.132805 ± 0.0319479 ms
This is a noticeable performance boost, to 7.9 Gflops!
However, this is still just a bit more than half of the peak performance. Where is the remaining time being "wasted"?
Let's consider the data size: 2**20 double = 8 MB will only fit in L3 cache on the EPYC 7713
(L3: 32 MB/core; L2: 512 kB/core; L1: 32 kB/core). This means that there is a bottleneck to
move data from L3 down to the registers. The cache peak bandwidth is about 32 bytes/clock cycle,
meaning 32 bytes / (8 bytes/flop) x 2 GHz = 8 Gflops, showing that with the code above crunching numbers at 7.9 Gflops we managed to saturate the bandwidth, reaching the actual theoretical limit for this kind of operation on this CPU.
[!NOTE]
One cache line on the 7713 is precisely 64 bytes, or, the 8 double-precision floats
we are loading in the code above. In this sense, the code makes the most efficient use
of the cache bandwidth.
[!IMPORTANT]
It's always important to understand whether your calculation is compute-limited, with
many operations to be performed on data residing in the registers or close nearby, or
memory-limited, where the work is less computationally intensive, but it has to be
performed on a large set of data that needs to be streamed continuously from RAM/cache to
the registers. In the latter case, the bottleneck is the memory bandwidth, and this usually
prevents one from reaching the theoretical peak flops.
The Fourier transform is a powerful mathematical tool used to analyze the frequency content of signals and functions. In essence, it allows us to represent a function f(x), which might be a time-dependent signal or a spatial distribution, in terms of its constituent sinusoidal components.
For a continuous function f(x), the Fourier transform provides a way to decompose it into a spectrum of frequencies, which can reveal information that is not immediately apparent in the original function.
The continuous Fourier transform (CFT) of a function f(x) is defined by the integral:
$F(k) = \int_{-\infty}^{\infty} f(x)\, e^{2\pi i k x}\, dx,$
where:
f(x) is the original function defined in the spatial or time domain
F(k) is the Fourier transform of f(x), representing the amplitude and phase of the sinusoidal components at frequency k
$e^{2\pi i k x}$ is a complex exponential that oscillates at the frequency k
Clearly, the Fourier transformation is a linear operation:
The transform of the sum of two functions is equal to the sum of the transforms.
The transform of a constant times a function is that same constant times the transform of the function.
The inverse Fourier transform allows us to reconstruct the original function f(x) from its Fourier transform F(k):
$f(x) = \int_{-\infty}^{\infty} F(k)\, e^{-2\pi i k x}\, dk.$
This transform pair establishes a relationship between the time (or spatial) domain and the frequency (or wavelength) domain, providing a dual view of the information content in f(x).
The continuous Fourier transform is particularly useful in fields such as physics, engineering, and signal processing, where it helps analyze waveforms, signals, and spatial distributions.
However, working with continuous transforms is not always practical, as it assumes an infinite and continuous range of data points. In practical applications, we often need a discrete representation, which leads us to the discrete Fourier transform (DFT). The DFT operates on sampled data and lays the foundation for computational methods, like the fast Fourier transform (FFT), which efficiently computes the Fourier transform for large datasets.
The Discrete Fourier Transform (DFT) is a powerful tool in signal processing and numerical analysis, allowing us to analyze the frequency components of discrete signals or data sequences.
While the CFT applies to continuous functions, the DFT is defined for discrete sequences, making it particularly useful for applications in digital computing and signal processing.
Given a discrete sequence of N complex numbers x0,x1,…,xN−1, the DFT transforms this sequence into another sequence X0,X1,…,XN−1, which represents the frequency spectrum of the original data.
The transformation is defined as:
$X_k = \sum_{n=0}^{N-1} x_n\, e^{2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1$
Here:
xn are the elements of the original sequence (often representing sampled data in time),
Xk are the elements of the transformed sequence (frequency domain representation),
i is the imaginary unit,
N is the number of points in the sequence.
The DFT maps N complex numbers (the xn) into N complex numbers (the Xk).
The original sequence xn can be recovered from its DFT Xk using the Inverse Discrete Fourier Transform, defined as:
$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{-2\pi i k n / N}, \qquad n = 0, 1, \ldots, N-1$
This equation reconstructs the original sequence from its frequency components by reversing the transformation.
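To make the definitions concrete, here is a direct, $O(N^2)$ evaluation of the forward transform defined above (a naive illustration, not an FFT; the function name dft_direct is ours):
#include <complex.h>
#include <math.h>

/* Direct evaluation of X_k = sum_n x_n exp(+2*pi*i*k*n/N) */
void dft_direct(int N, const double complex *x, double complex *X) {
    const double two_pi = 2.0 * acos(-1.0);
    for (int k = 0; k < N; k++) {
        X[k] = 0.0;
        for (int n = 0; n < N; n++) {
            X[k] += x[n] * cexp(I * two_pi * k * n / N);
        }
    }
}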
For any sampling interval ∆, there is also a special frequency fc, called the Nyquist critical frequency, that is half of the sampling rate, 1/Δ, of a discrete signal:
$f_c = \frac{1}{2\Delta}$
Let's see why it is important:
Sampling Theorem: According to the Nyquist-Shannon sampling theorem, to accurately capture all the information in a continuous signal without aliasing (falsely translated), the signal must be sampled at least at twice the highest frequency component present in the signal. This means that if the signal contains a frequency component higher than fc, sampling at a rate lower than 1/Δ would lead to aliasing, where higher frequencies are misrepresented as lower ones. In other words: If a system uniformly samples an analog signal at a rate that exceeds the signal’s highest frequency by at least a factor of two, the original analog signal can be perfectly recovered from the discrete values produced by sampling.
Frequency Representation in Discrete Signals: When we sample a continuous signal at a rate of 1/Δ, the highest frequency that can be captured in the resulting discrete signal is fc, or half of the sampling rate. If any frequency component of the signal exceeds this Nyquist frequency, it cannot be distinguished from lower frequencies after sampling.
When performing a DFT on a discrete signal, the Nyquist frequency corresponds to the highest unique frequency that can be resolved without ambiguity. In a DFT of length N, this corresponds to the index n=N/2 (for even N). This frequency is the boundary between positive and negative frequencies in the DFT, as both positive and negative frequencies can have values from 0 to ±fc.
If a continuous function is sampled properly, we can assume that its Fourier transform should be zero outside [−fc,fc]. This is based on the idea that, after proper sampling, no significant frequency components should exist outside this range. So, when we estimate the Fourier transform of a signal from discrete samples, we typically expect the signal's frequency content to be confined to the range [−fc,fc]. This assumption helps us check whether the sampling was done correctly.
To verify that the continuous function has been competently sampled (with minimal aliasing), we examine how the Fourier transform behaves near the boundaries of the frequency range. Specifically:
As the frequency approaches fc from below (positive frequencies), or −fc from above (negative frequencies), the Fourier transform should approach zero.
If the Fourier transform doesn't approach zero and instead tends to a nonzero value as it reaches fc or −fc, this suggests aliasing.
To conclude, aliasing occurs when higher frequency components (above the Nyquist frequency) "fold back" into the sampled signal's frequency range. This happens when the sampling rate is too low to capture the signal's full frequency content, causing higher frequencies to appear incorrectly as lower frequencies in the sampled data.
If the Fourier transform doesn't approach zero as expected, it indicates that such folding has occurred, and the sampling was not sufficient to capture the original signal accurately.
By checking the behavior of the Fourier transform at the edges of this range, we can detect whether aliasing has occurred.
The DFT has several useful properties that are similar to those of the continuous Fourier transform:
Linearity: The DFT of a linear combination of two sequences is the same combination of their individual DFTs: if yn=axn+bzn then Yk=aXk+bZk.
Symmetry: If the original sequence xn is real, then the DFT satisfies certain symmetry properties. For instance, the DFT of a real sequence has Hermitian symmetry: $X_{N-k} = X_k^{*}$, where $X_k^{*}$ is the complex conjugate of Xk
Periodicity: Both the input and output sequences are periodic with period N, meaning xn+N=xn and Xk+N=Xk.
Parseval’s Theorem: This theorem relates the total energy of the sequence in the time domain to the energy in the frequency domain; with the conventions above, $\sum_{n=0}^{N-1} |x_n|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X_k|^2$.
Index Range in DFT: Usually, in the DFT, the index n runs from −N/2 to N/2 for N data points. However, because of periodicity in the DFT, the result repeats every N indices. This allows us to “wrap around” the values so that we only need indices from 0 to N−1, effectively covering a full cycle.
Symmetry in Frequency: The property $H_{-n} = H_{N-n}$ indicates that values are symmetric around n=0, reflecting the periodic structure. So, by considering only indices from 0 to N−1, we avoid redundancies and simplify the indexing.
Frequency Mapping and Ranges: When this convention is applied, the frequencies associated with each index n are interpreted as follows:
Zero Frequency: Corresponds to n=0.
Positive Frequencies: For frequencies in the range 0<f<fc, the corresponding n values are from 1 to N/2−1. These represent the positive part of the frequency spectrum.
Negative Frequencies: For frequencies −fc<f<0, we use the indices from N/2+1 to N−1. This range effectively represents negative frequencies in the DFT.
Nyquist Frequency:
The index n=N/2 is special because it corresponds to the Nyquist frequency f=fc, which is the highest frequency that can be resolved without aliasing. In this convention, f=fc is equivalent to f=−fc, representing the boundary between positive and negative frequencies.
By reinterpreting n to range from 0 to N−1, we make the mapping between the indices and frequencies clearer and allow for an easier, symmetric representation of the DFT spectrum.
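A small helper function makes this index-to-frequency mapping explicit (a sketch; delta stands for the sampling interval Δ and the function name is ours):
/* Physical frequency associated with DFT index n, for N samples
   taken at interval delta, using the wrap-around convention above. */
double dft_frequency(int n, int N, double delta) {
    int m = (n <= N / 2) ? n : n - N;   // indices above N/2 map to negative frequencies
    return m / (N * delta);
}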
Computing the DFT directly using its definition requires $O(N^2)$ operations, because each of the N outputs Xk requires N computations. For large N, this becomes computationally expensive. This is where the Fast Fourier Transform (FFT) comes into play, providing a highly efficient algorithm to compute the DFT with a complexity of $O(N \log N)$. The FFT revolutionized digital signal processing by making the DFT feasible for large data sets.
In the next section, we will explore the FFT in detail, discussing how it optimizes the DFT calculation and why it is a cornerstone of modern signal processing.
In other words, calculating the DFT can be represented as multiplying a vector of input values xn by a matrix, where each entry in the matrix has the form $W^{kn}$, with W being a complex constant. This matrix multiplication yields a new vector, Xk, containing the transformed values. However, performing this operation directly requires $N^2$ complex multiplications, making the DFT an $O(N^2)$ process.
At first glance, this might seem like the only way to compute the DFT, but there's a far more efficient approach. The Fast Fourier Transform (FFT) algorithm allows us to compute the DFT in only $O(N \log N)$ operations, a significant improvement. For a large input size, say $N = 10^6$, the difference in performance is drastic: rather than taking around two weeks (for performing $10^{12}$ operations), an FFT computation might only take seconds on the same computer (by performing $\approx 6 \times 10^6$ operations).
The FFT algorithm was popularized in the mid-1960s through the work of J.W. Cooley and J.W. Tukey, but this discovery was not entirely new. Efficient methods for computing the DFT date back much further; mathematicians, including Carl Friedrich Gauss, independently discovered similar techniques as early as 1805.
A key rediscovery of the FFT, made by Danielson and Lanczos in 1942, provides an elegant and straightforward derivation of the algorithm. They showed that a DFT of length N can be broken down into two smaller DFTs, each of length N/2. Specifically, this involves separating the original sequence into two parts: one DFT that processes only the even-numbered points and another that processes the odd-numbered points.
By expressing the full DFT in terms of these two smaller DFTs, they revealed a structure that significantly reduces the computational load: in the notation used below, $F_k = F_k^e + W^k F_k^o$, where $W \equiv e^{2\pi i/N}$.
The term $F_k^e$ represents the k-th component of the Fourier transform of length N/2, calculated from the even-indexed elements of the original sequence fj. Similarly, $F_k^o$ is the Fourier transform of length N/2 obtained from the odd-indexed elements. Note that in the last equality, k ranges from 0 to N−1, rather than stopping at N/2−1.
The beauty of the Danielson-Lanczos Lemma is that it can be applied repeatedly. By breaking down Fk into the smaller transforms $F_k^e$ and $F_k^o$, we can further split each of these into transforms of length N/4, composed of even and odd indices within each half. This approach continues recursively, dividing the data into smaller and smaller sections, labeled as $F_k^{ee}$ (even-even terms), $F_k^{eo}$ (even-odd terms), and so on, with each subdivision based on even and odd indices.
The most straightforward application of this approach is when N is a power of two. It is highly recommended to use the FFT only when N is a power of two, as this ensures the recursive breakdown works optimally. If your dataset length is not a power of two, you should pad it with zeros until it reaches the next power of two. This recursive splitting continues until each sub-transform is of length 1. At this point, the Fourier transform of a single element is trivial—it simply returns that element, effectively copying the input to the output directly.
In other words, for every pattern of $\log_2 N$ e’s and o’s, there is a one-point transform that is just one of the input numbers fn:
$F_k^{eooeooeeeeoo \ldots oee} = f_n$
The next step is to determine which value of n corresponds to each sequence of e's and o's. Here’s the trick: reverse the sequence of e's and o's, then replace each "e" with 0 and each "o" with 1. The result is the binary representation of n.
Why does this method work? It’s because each successive division into even and odd parts effectively tests the lower bits of n. Each step down in the recursive process separates the data based on whether an index is even or odd at that bit level, starting from the least significant bit. By following this process, the bit pattern builds up from the least significant bit to the most significant bit as we proceed through the recursion.
f0 f1 f2 f3 f4 f5 f6 f7
/ \
e o
f0 f2 f4 f6 f1 f3 f5 f7
/ \ / \
e o e o
f0 f4 f2 f6 f1 f5 f3 f7
/ \ / \ / \ / \
e o e o e o e o
f0 f4 f2 f6 f1 f5 f3 f7
eee eeo eoe eoo oee oeo ooe ooo
Reverse: eee oee eoe ooe eeo oeo eoo ooo
(e=0, o=1): 000 100 010 110 001 101 011 111
This bit-reversal technique is an essential part of making FFT algorithms efficient. The idea is to rearrange the original data array in a special way: instead of ordering data elements sequentially by their index j, we order them according to the "bit-reversed" version of j. This means that if we represent the index j in binary, we reverse its bits to get a new index, and reorder the data based on these reversed indices.
Using this bit-reversed order simplifies the recursive application of the Danielson-Lanczos Lemma. In this configuration, we start with single-point "transforms" (just the original data points), then combine adjacent pairs to get two-point transforms, then combine adjacent pairs of those to get four-point transforms, and continue doubling in this manner. This process progresses until we combine the first and second halves of the entire dataset into the final transform.
The FFT algorithm achieves an $O(N \log_2 N)$ runtime because each level of combination involves N operations and there are $\log_2 N$ levels (or "stages") of combination. Reordering the data into bit-reversed order itself also takes $O(N \log_2 N)$ time at most, so it doesn’t increase the algorithm's overall complexity.
The FFT algorithm, then, consists of two main parts:
Bit-reversal reordering: This rearranges the data in-place, without requiring additional storage, by swapping pairs of elements according to bit-reversed indices.
Transform computation: This part has an outer loop that runs $\log_2 N$ times, each time calculating sub-transforms of length 2, 4, 8, …, N. Each stage applies the Danielson-Lanczos Lemma through nested inner loops, combining previously computed sub-transforms. To optimize efficiency, sines and cosines of angles are calculated only $\log_2 N$ times in the outer loop and used with recurrence relations in the inner loops to avoid repeated calls to trigonometric functions.
This structure allows FFT to efficiently compute discrete Fourier transforms, dramatically reducing computation time compared to direct DFT methods.
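The in-place, bit-reversed implementation described above is the one used in practice; the recursive idea behind it, however, fits in a few lines. Here is a minimal illustrative sketch (our own code, not a library routine; it assumes N is a power of two and uses the same sign convention as the DFT defined earlier):
#include <complex.h>
#include <math.h>
#include <stdlib.h>

void fft_recursive(int N, double complex *x) {
    if (N <= 1) return;                        // a one-point transform is just the point itself
    double complex *even = malloc((N / 2) * sizeof *even);
    double complex *odd  = malloc((N / 2) * sizeof *odd);
    for (int j = 0; j < N / 2; j++) {          // split into even- and odd-indexed samples
        even[j] = x[2 * j];
        odd[j]  = x[2 * j + 1];
    }
    fft_recursive(N / 2, even);                // F^e
    fft_recursive(N / 2, odd);                 // F^o
    const double two_pi = 2.0 * acos(-1.0);
    for (int k = 0; k < N / 2; k++) {          // Danielson-Lanczos: F_k = F^e_k + W^k F^o_k
        double complex Wk = cexp(I * two_pi * k / N);
        x[k]         = even[k] + Wk * odd[k];
        x[k + N / 2] = even[k] - Wk * odd[k];  // since W^(k+N/2) = -W^k
    }
    free(even);
    free(odd);
}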
We continue by substituting k → 2k for the even terms and k → 2k+1 for the odd terms, thereby halving the range of the summation.
This results in the following expressions:
Each summation involves splitting and applying the Fourier components in a recursive pattern, halving the size of the summations at each level until we reach single-point transforms. This breakdown simplifies computation by leveraging symmetry and periodicity in the terms, a core idea in the Fast Fourier Transform (FFT) algorithm.
In Fourier analysis, windowing is a technique used to improve the accuracy of the DFT when applied to real-world signals, especially those that are not periodic within the sampled interval. When we analyze signals in practice, we often encounter finite-length data segments that may not reflect the full behavior of the signal over time. This truncation can introduce errors, commonly known as spectral leakage, which makes interpreting the frequency components more challenging. Windowing mitigates these effects and enhances the quality of the frequency analysis.
The DFT assumes that the data being transformed is periodic, meaning that the signal repeats itself seamlessly. However, if the data ends abruptly or does not naturally repeat, the DFT sees a discontinuity at the edges of the interval. This discontinuity introduces artificial high-frequency components that distort the frequency spectrum of the signal. This phenomenon, called spectral leakage, causes energy from one frequency to "leak" into neighboring frequencies, resulting in a smeared or spread-out spectrum.
For example, if we analyze a single tone signal (e.g., a sine wave) with a finite, non-periodic segment, the frequency spectrum ideally should have a single sharp peak at the signal's frequency. However, without windowing, spectral leakage will cause additional frequency components to appear around the main peak, complicating the analysis and obscuring other important frequencies in the data.
Windowing involves multiplying the signal by a window function before performing the DFT. A window function is typically a smooth, bell-shaped curve that tapers to zero at the edges. By applying a window, we force the signal to smoothly approach zero at the edges, reducing the abrupt transitions that cause spectral leakage.
Mathematically, for a signal x[n] of length N, windowing modifies the signal by multiplying each sample by a window function w[n]:
xwindowed[n]=x[n]⋅w[n]for n=0,1,…,N−1.
The windowed signal xwindowed[n] is then used as the input to the DFT, resulting in a more accurate frequency spectrum.
There are many window functions, each designed to balance between minimizing spectral leakage and preserving signal information. Here are some commonly used window functions:
Hann (or Hanning) Window: A cosine-based window that smoothly tapers to zero at the edges. It significantly reduces spectral leakage and is widely used for general-purpose frequency analysis.
Hamming Window: Similar to the Hann window, but with a slightly different shape. It reduces side lobes (frequency leakage) even further, making it useful when high spectral resolution is needed.
Blackman Window: A more aggressively tapered window, reducing side lobes even more than the Hann or Hamming windows. However, it also has a wider main lobe, meaning some frequency resolution is sacrificed.
The choice of window depends on the specific analysis goals—whether to prioritize frequency resolution (sharp peaks in the frequency domain) or suppression of leakage.
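As a concrete illustration, the sketch below applies a Hann window to a sampled signal before it would be handed to a DFT/FFT routine. The example signal, the number of samples and the array names are illustrative assumptions, not part of the course code:
#include <math.h>
#include <stdio.h>
#define N 1024 /* number of samples (illustrative) */
int main(void) {
    const double PI = 3.14159265358979323846;
    double x[N], xw[N];
    /* Example input: a sine wave that is not periodic over the N samples */
    for (int n = 0; n < N; n++) {
        x[n] = sin(2.0 * PI * 12.7 * n / N);
    }
    /* Hann window: w[n] = 0.5 * (1 - cos(2*pi*n/(N-1))), tapering to zero at both ends */
    for (int n = 0; n < N; n++) {
        double w = 0.5 * (1.0 - cos(2.0 * PI * n / (N - 1)));
        xw[n] = x[n] * w; /* windowed signal, to be passed to the DFT/FFT */
    }
    printf("First and last windowed samples: %g %g\n", xw[0], xw[N - 1]);
    return 0;
}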
A cluster is a network of interconnected computers, called nodes, that work together to perform parallel computations efficiently. Each node can have multiple processors or cores, allowing tasks to be distributed both across nodes and within them. This layout is ideal for utilizing MPI (Message Passing Interface) for inter-node communication and OpenMP (Open Multi-Processing) for multi-threading within nodes.
Nodes: Each node in a cluster functions as an individual computing unit. Nodes typically contain several CPU cores, memory, and storage. Clusters can contain anywhere from a few to thousands of nodes, depending on their purpose and scale.
Interconnect: The nodes are linked through a high-speed network interconnect, which allows for rapid data transfer across nodes. This interconnect is critical to a cluster’s performance, as it determines how quickly information can be shared between nodes. Common interconnects in high-performance clusters include InfiniBand, Ethernet, and more specialized networking technologies.
In scientific computing and high-performance applications, leveraging multiple CPUs can significantly reduce computation time. Two widely used frameworks for parallelization on multi-CPU systems are MPI (Message Passing Interface) and OpenMP (Open Multi-Processing).
Although both facilitate parallel computation, they rely on different paradigms and are suited for distinct types of applications. Understanding these paradigms allows developers to choose the most efficient approach for their specific computational tasks.
OpenMP (Open Multiprocessing) is designed for shared-memory parallelization, where multiple threads work on different parts of a task within a single, shared memory space. This approach allows threads to directly access and modify the same variables in memory, making OpenMP well-suited for systems with a single, large memory pool, like multi-core CPUs.
OpenMP uses a directive-based approach in languages like C and Fortran, allowing you to mark loops or regions of code to be executed in parallel. For instance, a simple loop that calculates values for an array can be parallelized with OpenMP by adding a single pragma directive:
#include <omp.h>
#include <stdio.h>
int main() {
int i;
int array[1000];
#pragma omp parallel for
for (i = 0; i < 1000; i++) {
array[i] = i * i; // each thread computes part of the array
}
return 0;
}
In this example, OpenMP automatically splits the loop among available threads, each of which independently performs its calculations on a portion of the array.
MPI is suited for distributed-memory systems, where each CPU has its own local memory. Rather than sharing memory, processes communicate by sending messages to each other. This model is particularly useful in large-scale computing clusters, where multiple nodes with independent memory need to work together on complex computations.
MPI is implemented through library functions that handle the sending and receiving of data between processes. Here’s a simple example in C illustrating how MPI can be used to distribute computations across multiple processes:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
int rank, size;
MPI_Init(&argc, &argv); // Initialize MPI
MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get process rank
MPI_Comm_size(MPI_COMM_WORLD, &size); // Get total number of processes
printf("Hello from process %d of %d\n", rank, size);
MPI_Finalize(); // Finalize MPI
return 0;
}
In this example, each process prints its rank (unique ID) among the total number of processes. This output shows how each process works independently, and coordination between processes requires explicit communication through MPI functions.
Memory Model: OpenMP relies on shared memory, making it more straightforward but limited to systems with shared memory architectures (typically single-node systems). On the other hand, MPI uses distributed memory, which is ideal for multi-node clusters.
Scalability: MPI is more scalable across nodes, as each process has its own memory. OpenMP’s shared memory model generally scales well only within a single node.
Communication Overhead: OpenMP has lower overhead for data access since memory is shared. MPI, however, has higher overhead due to message passing, which involves data transfer between separate memory spaces.
Ease of Use: OpenMP is typically easier to implement in code, as it requires fewer modifications to existing code (mostly through compiler directives). MPI requires explicit communication, which can make it more complex but offers greater control over data distribution and process management.
The choice between MPI and OpenMP depends on the problem size and the hardware architecture. For multi-core CPUs or systems with shared memory, OpenMP is often more convenient and efficient. However, for computations that need to span multiple nodes or require significant data movement, MPI is usually the better choice. For hybrid systems with both shared and distributed memory (e.g., a cluster of multi-core nodes), a combination of MPI and OpenMP can maximize performance, using MPI for inter-node communication and OpenMP for intra-node parallelism.
Consider, for example, a cluster of 4 nodes, each with 2 CPUs of 32 cores (64 cores per node). You can either run pure MPI across all 4×64 cores, or use hybrid MPI/OpenMP: in the latter case, you run one MPI process per CPU (4×2 processes in total), with each MPI process using 32 OpenMP threads.
Imagine you need to perform a large-scale numerical integration that would take too long on a single core:
Using MPI, you can divide the integration domain into smaller subdomains, assigning one to each node. Each node then focuses on a portion of the problem, making inter-node communication essential to manage the shared workload.
Within each node, OpenMP can parallelize calculations across the 64 cores, speeding up the integration of the assigned subdomain.
Finally, MPI gathers the partial results from each node, combining them to produce the final solution.
This combination of MPI and OpenMP can be very powerful: it allows clusters to solve problems at a scale and speed that would be impossible on a single machine.
OpenMP is not a separate library you install but a specification supported by many compilers, such as GCC, Clang, and Intel compilers. To use OpenMP, you need a compatible compiler and enable OpenMP support during compilation.
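For example, with GCC an OpenMP program is built by adding the -fopenmp flag (the source file name here is only a placeholder):
gcc -fopenmp my_openmp_program.c -o my_openmp_program
MPI, by contrast, is provided as a separate library by implementations such as Open MPI or MPICH. The commands below sketch one common way to build Open MPI from a source tarball downloaded from the Open MPI website; the version number and installation prefix are placeholders to replace with your own.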
tar -xzf openmpi-x.x.x.tar.gz # Replace x.x.x with the version number
cd openmpi-x.x.x
./configure --prefix=/path/to/install
make -j$(nproc)
sudo make install
We start from the serial code we used in the previous chapters, which calculates the integral of the function f(x) = x^3 over a range [a, b].
To parallelize the trapezoidal integration with MPI, we can divide the integration range among multiple processors. Each processor computes the integral over a smaller sub-range, and then the results from all processors are combined to obtain the final integral. This approach leverages the distributed processing power of the cluster to speed up the computation.
#include <stdio.h>
#include <math.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration
double trapezoidal_rule(double (*func)(double), double a, double b, int n) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
for (int i = 1; i < n; i++) {
double x = a + i * p;
sum += func(x);
}
return sum * p;
}
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Check if n is a valid number of trapezoids
if (n > 0) {
printf("The number of trapezoids is positive.\n");
} else if (n < 0) {
printf("Error: The number of trapezoids is negative.\n");
} else {
printf("Error: The number of trapezoids is zero.\n");
}
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n);
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
return 0;
}
The MPI-parallelised version of the code is:
#include <stdio.h>
#include <math.h>
#include <mpi.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration over a sub-interval
double trapezoidal_rule(double (*func)(double), double a, double b, int n) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
for (int i = 1; i < n; i++) {
double x = a + i * p;
sum += func(x);
}
return sum * p;
}
int main(int argc, char *argv[]) {
int rank, size;
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Total number of trapezoids (higher n for better accuracy)
double local_a, local_b; // Local limits for each process
int local_n; // Number of trapezoids for each process
double local_result, total_result; // Local and total integral results
// Initialize MPI
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// Check if n is a positive number
if (rank == 0) {
if (n <= 0) {
printf("Error: The number of trapezoids must be positive.\n");
MPI_Abort(MPI_COMM_WORLD, 1);
return 1;
}
}
// Divide the interval [a, b] among processes
double h = (b - a) / size; // Width of each sub-interval
local_a = a + rank * h;
local_b = local_a + h;
local_n = n / size; // Trapezoids per process (assumes n is divisible by size)
// Each process computes the integral over its sub-interval
local_result = trapezoidal_rule(f, local_a, local_b, local_n);
// Use MPI_Reduce to sum up the results from all processes
MPI_Reduce(&local_result, &total_result, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
// The root process (rank 0) prints the final result
if (rank == 0) {
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, total_result);
}
// Finalize MPI
MPI_Finalize();
return 0;
}
To compile and run an MPI program, you need an MPI compiler (like mpicc for C and mpif90 for Fortran) and an MPI runtime to execute the code across multiple processes. Here’s a step-by-step guide for compiling and running the code.
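For instance, assuming the parallel code above is saved as trapezoid_mpi.c (the file name is just an assumption), a typical sequence with Open MPI or MPICH would be:
mpicc trapezoid_mpi.c -o trapezoid_mpi -lm
mpirun -np 4 ./trapezoid_mpi
Here -np 4 launches four processes; depending on the MPI implementation, the launcher may be called mpiexec instead of mpirun.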
MPI_Init and MPI_Finalize are essential functions that manage the lifecycle of an MPI application. They are used to initialize and finalize the MPI environment, allowing multiple processes to communicate with each other through message-passing.
The function MPI_Init initializes the MPI environment and must be called before any other MPI function. It prepares the program to use the MPI library by setting up the required resources and creating a "communicator", which establishes a context for all participating processes to communicate.
The MPI_Finalize function shuts down the MPI environment. It releases resources, terminates MPI communication, and ensures a clean exit for the parallel application. All processes in the communicator must reach MPI_Finalize for the program to end correctly.
The signatures in C are:
int MPI_Init(int *argc, char ***argv);
int MPI_Finalize(void);
MPI_Init: The function takes pointers to argc and argv from main, allowing MPI to process command-line arguments.
MPI_Finalize: Takes no parameters and simply finalizes the MPI environment.
The signatures in Fortran are:
CALL MPI_INIT(ierr)
CALL MPI_FINALIZE(ierr)
MPI_INIT: Takes a single argument, ierr, which is an integer to hold the error status (0 for success).
MPI_FINALIZE: Similarly takes an ierr argument for error handling.
The MPI_Comm_rank function is a core MPI command that allows each process in a distributed system to determine its unique identifier, or “rank,” within a given communicator. This rank is an integer that typically starts at zero for the first process and increments by one for each additional process in the communicator. Knowing the rank of each process is essential for controlling how tasks are distributed and managed in parallel computation.
The signature in C is:
int MPI_Comm_rank(MPI_Comm comm, int *rank);
The signature in Fortran is:
CALL MPI_COMM_RANK(comm, rank, ierr)
comm: The communicator for which the rank is being queried, usually MPI_COMM_WORLD, which represents all processes initiated by MPI_Init.
rank: An integer variable where the rank of the calling process will be stored.
ierr (Fortran only): An integer error code. Zero indicates success.
[!NOTE]
The rank values range from 0 to size-1, where size is the total number of processes in the communicator.
MPI_COMM_WORLD is the default communicator encompassing all processes. Custom communicators can be defined to group subsets of processes.
[!TIP]
To ensure a section of code is executed by only one process, use a conditional check on the rank, commonly rank == 0. For example, if you want a specific action (such as printing output or managing I/O) to be done by only one process, add a check for rank == 0. In this way, only the process with rank 0, often called the “master” or “root” process, will execute that block, avoiding redundancy and reducing communication overhead.
In our example:
// Check if n is a positive number
if (rank == 0) {
if (n <= 0) {
printf("Error: The number of trapezoids must be positive.\n");
MPI_Abort(MPI_COMM_WORLD, 1);
return 1;
}
}
The MPI_Comm_size function allows you to determine the total number of processes participating in a given communicator, typically MPI_COMM_WORLD. Knowing the number of processes (size) is essential for distributing tasks evenly and dynamically adapting workloads in parallel applications. This is especially useful in dividing up iterations or data among processes.
The signature in C is:
int MPI_Comm_size(MPI_Comm comm, int *size);
The signature in Fortran is:
CALL MPI_COMM_SIZE(comm, size, ierr)
comm: The communicator whose size is being queried, usually MPI_COMM_WORLD, which represents all processes initiated by MPI_Init.
size: An integer (in C, a pointer to an integer) where the number of processes will be stored.
ierr (Fortran only): An integer error code. Zero indicates success.
In our example, we used size to divide the integration interval among the processes:
// Divide the interval [a, b] among processes
double h = (b - a) / size; // Width of each sub-interval
MPI_Abort provides a way to terminate all processes in a communicator if a critical error occurs, making it a helpful tool for immediate shutdown in MPI programs. When called, MPI_Abort stops all processes within the specified communicator (usually MPI_COMM_WORLD) and returns an error code to indicate the reason for termination. This is particularly useful in situations where continuing execution would lead to incorrect or unpredictable behavior.
The signature in C is:
int MPI_Abort(MPI_Comm comm, int errorcode);
The signature in Fortran is:
CALL MPI_ABORT(comm, errorcode, ierr)
comm: The communicator whose processes are to be aborted, usually MPI_COMM_WORLD.
errorcode: The integer error code to return, indicating the reason for aborting.
ierr (Fortran only): An integer error code. Zero indicates success.
[!NOTE]
The difference between MPI_Abort and MPI_Finalize lies in how they terminate an MPI program and the context in which each should be used. MPI_Finalize is used to gracefully shut down an MPI program after all processes have completed their tasks and the program is ready to terminate normally. MPI_Abort is used to immediately terminate all processes within a communicator if a critical error or unexpected condition is detected that prevents the program from continuing safely. In particular, MPI_Abort provides an error code for debugging purposes, while MPI_Finalize does not.
MPI_Reduce is a collective communication function that performs a reduction operation across all processes within a communicator. It combines data from each process and reduces it to a single result, which is stored in one specified process, typically the root process.
The signature in C is:
int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);
sendbuf: The starting address of the data buffer to be reduced from each process. Each process contributes its data stored here.
recvbuf: The starting address of the buffer where the root process stores the result. (Ignored by non-root processes).
count: The number of elements in the data buffer from each process.
datatype: The data type of elements in sendbuf and recvbuf (e.g., MPI_INT, MPI_FLOAT, MPI_DOUBLE).
op: The reduction operation to be applied, like MPI_SUM, MPI_MAX, MPI_MIN, or custom operations.
root: The rank of the process that will store the result.
comm: The communicator within which the operation is performed (usually MPI_COMM_WORLD).
ierr (Fortran only): An integer error code. Zero indicates success.
[!WARNING]
MPI_Reduce collects data from all processes, applies the reduction operation, and stores the result in the buffer of the designated root process. Other processes will not receive the result directly; only the root process gets the reduced value.
The MPI_Allreduce function in MPI is similar to MPI_Reduce but with one key difference: it performs a reduction operation across all processes and then shares the result with all processes, not just a designated root process. This makes MPI_Allreduce particularly useful when every process needs access to the result of the reduction.
The signature in C is:
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);
sendbuf: The starting address of the data buffer to be reduced from each process. Each process contributes its data stored here.
recvbuf: The starting address of the buffer where each process stores the reduced result. (With MPI_Allreduce there is no root: every process receives the result.)
count: The number of elements in the data buffer from each process.
datatype: The data type of elements in sendbuf and recvbuf (e.g., MPI_INT, MPI_FLOAT, MPI_DOUBLE).
op: The reduction operation to be applied, like MPI_SUM, MPI_MAX, MPI_MIN, or custom operations.
comm: The communicator within which the operation is performed (usually MPI_COMM_WORLD).
ierr (Fortran only): An integer error code. Zero indicates success.
MPI_Allreduce collects data from all processes, performs the specified reduction operation, and then distributes the result back to all participating processes. Each process will have the final reduced value in its recvbuf after the call.
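For instance, in the trapezoidal-rule example above, replacing the MPI_Reduce call with MPI_Allreduce (note that the root argument disappears) would leave every rank, not just rank 0, holding the final integral:
// Every process receives the summed result in total_result
MPI_Allreduce(&local_result, &total_result, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);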
For numerical simulations, the heat equation is often discretized using finite differences.
For example, using the explicit finite difference method, the time derivative can be approximated as:
∂T/∂t ≈ [ T(x, t+Δt) − T(x, t) ] / Δt
and the second spatial derivative as:
∂²T/∂x² ≈ [ T(x+Δx, t) − 2 T(x, t) + T(x−Δx, t) ] / (Δx)²
Substituting these approximations into the heat equation gives a scheme to calculate T(x,t+Δt) from T(x,t):
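A sketch of the resulting scheme, assuming the one-dimensional heat equation has the form ∂T/∂t = α ∂²T/∂x² with thermal diffusivity α:
T(x, t+Δt) ≈ T(x, t) + α Δt / (Δx)² · [ T(x+Δx, t) − 2 T(x, t) + T(x−Δx, t) ]
(For this explicit scheme to be stable, the time step must roughly satisfy α Δt / (Δx)² ≤ 1/2.)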
In parallelizing the heat diffusion equation, the computational domain is divided into subdomains, each handled by a separate process. This division allows each process to handle a smaller subset of the domain, reducing the computational load per process and enabling larger simulations to run more efficiently.
Let’s assume a one-dimensional domain with N points divided into np subdomains, where np is the number of processes. Each process is responsible for updating the temperature values for a certain range of x-values within the domain. For example:
Process 0 will handle points from x_0 to x_{N/np − 1}.
Process 1 will handle points from x_{N/np} to x_{2N/np − 1}.
And so on, until the last process handles x_{(np−1)N/np} to x_{N−1}.
For example, if we have N=64 points and np=4 processes, each process will handle n=N/np=16 points. We will divide the domain as follows:
Process 0 will handle points in range [0,n−1]=[0,N/np−1]=[0,15].
Process 1 will handle points in range [n,2n−1]=[n,2N/np−1]=[16,31].
Process 2 will handle points in range [2n,3n−1]=[2n,3N/np−1]=[32,47].
Process 3 will handle points in range [3n,4n−1]=[3n,4N/np−1]=[48,63].
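In code, each process can compute its own index range directly from its rank; a minimal sketch (variable names are illustrative, and N is assumed to be divisible by np):
int n_local = N / np;                 // points owned by each process (assumes N % np == 0)
int i_start = rank * n_local;         // first global index owned by this rank
int i_end   = i_start + n_local - 1;  // last global index owned by this rank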
In each time step, updating the temperature at a given point requires information from neighboring points. Specifically, points near the boundaries of each process’s subdomain need the latest temperature values from the adjacent processes. This is where boundary communication becomes necessary:
Exchange of Boundary Values: Each process needs to send its boundary data (the temperature at its boundary points) to its neighboring processes. For instance:
Process i must send the temperature at its rightmost point to Process i+1 and receive, from Process i−1, the value it needs for its left boundary (the rightmost point of Process i−1).
Similarly, it sends its leftmost point to Process i−1 and receives the value for its right boundary (the leftmost point of Process i+1) from Process i+1.
Using MPI_Send and MPI_Recv: this exchange can be implemented with the blocking MPI_Send and MPI_Recv calls, described in detail below.
Non-blocking communication (e.g., using MPI_Isend and MPI_Irecv) can allow computation to overlap with communication, potentially improving performance.
Boundary update ensures that each process has the necessary information before updating the temperature values at boundary points.
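A minimal sketch of such an exchange is shown below. It assumes each rank stores its n_local points in T[1..n_local] with ghost cells T[0] and T[n_local+1], and that left and right hold the neighbouring ranks (or MPI_PROC_NULL at the domain ends); these names are illustrative. MPI_Sendrecv is used here instead of separate MPI_Send/MPI_Recv calls because it pairs each send with a receive in one call and avoids the deadlocks that a naive ordering of blocking calls can cause.
// Send my rightmost interior point to the right neighbour,
// receive my left ghost cell from the left neighbour.
MPI_Sendrecv(&T[n_local], 1, MPI_DOUBLE, right, 0,
             &T[0],       1, MPI_DOUBLE, left,  0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// Send my leftmost interior point to the left neighbour,
// receive my right ghost cell from the right neighbour.
MPI_Sendrecv(&T[1],           1, MPI_DOUBLE, left,  1,
             &T[n_local + 1], 1, MPI_DOUBLE, right, 1,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);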
The MPI_Send function is a core part of MPI that sends a message from one process to another within an MPI communicator. It performs a standard-mode blocking send (below there is a section explaining blocking vs. non-blocking operations), meaning it only returns once the send buffer is safe to modify, which may occur when the message is delivered. Here’s a breakdown of its syntax in both C and Fortran.
The signature in C is:
int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);
buf: Pointer to the data being sent.
count: Number of elements in the send buffer.
datatype: Data type of each element (e.g., MPI_INT, MPI_DOUBLE).
dest: Rank of the destination process within the communicator.
tag: Message tag to identify the message type, used in pairing sends and receives.
comm: The communicator within which the message is sent (usually MPI_COMM_WORLD).
The MPI_Recv function is the counterpart to MPI_Send, and it allows a process to receive a message from another process in a blocking fashion. This means that the receiving process will wait (block) until the requested message is received, and it can then proceed with execution. The message is stored in a buffer that is provided by the receiving process.
The matching between the send and receive operations is done using the message's tag and destination rank. The tag allows the program to distinguish between different types of messages.
The signature in C is:
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status);
buf: Pointer to the buffer where the received data will be stored.
count: The maximum number of elements to receive (this should match the number of elements sent by the sender).
datatype: Data type of each element (e.g., MPI_INT, MPI_DOUBLE).
source: The rank of the sending process, or MPI_ANY_SOURCE if the message can come from any process.
tag: The message tag used to identify the message, or MPI_ANY_TAG to accept messages with any tag.
comm: Communicator (usually MPI_COMM_WORLD).
status: A pointer to an MPI_Status structure, which is used to store information about the received message, such as the actual source and tag of the message.
In MPI, a blocking call refers to a function that causes the calling process to wait until the operation completes before it continues execution. This means the process will block, or be paused, until it receives a response or the data it is waiting for has been transmitted or received.
For example, when a process calls MPI_Send to send a message, it will block until the message is sent (or at least successfully enqueued for transmission, depending on the specific call).
Similarly, MPI_Recv is blocking: the receiver will block until the message is received, meaning it waits for the message to arrive.
Blocking calls can simplify program flow because the program doesn't need to handle asynchronous communication. However, blocking calls can also lead to inefficiencies, especially in parallel applications where multiple processes might be waiting on each other.
In contrast, non-blocking calls like MPI_Isend and MPI_Irecv allow processes to continue execution while the communication is being handled in the background, and completion must be explicitly checked (using MPI_Wait, for example).
Blocking calls are generally easier to program but can lead to performance bottlenecks, especially when processes have to wait for each other to send or receive messages.
Here is an example in which the code gets blocked due to incorrect use of MPI_Send and MPI_Recv. This happens when the sender and receiver are not properly matched, or if there is a mismatch between the expected number of elements or tags.
Suppose we have two processes. One process is sending data, and the other process is supposed to receive it. The issue occurs when the receiver is not correctly set up to receive the message, such as by using the wrong tag or not providing enough space in the receive buffer.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
int rank, size;
int send_data = 123; // Data to send
int recv_data; // Buffer to receive data
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (size < 2) {
printf("This example requires at least 2 processes.\n");
MPI_Finalize();
return 1;
}
if (rank == 0) {
// Process 0 sends data with tag 100
printf("Process %d sending data: %d\n", rank, send_data);
MPI_Send(&send_data, 1, MPI_INT, 1, 100, MPI_COMM_WORLD);
}
else if (rank == 1) {
// Process 1 expects data with tag 200 (incorrect tag)
printf("Process %d waiting for data...\n", rank);
MPI_Recv(&recv_data, 1, MPI_INT, 0, 200, MPI_COMM_WORLD, MPI_STATUS_IGNORE); // Incorrect tag here
printf("Process %d received data: %d\n", rank, recv_data);
}
MPI_Finalize();
return 0;
}
To fix the deadlock, change the tag in the MPI_Recv call so that it matches the tag used by MPI_Send.
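For reference, the corrected receive call on process 1 would use tag 100, as in the send:
MPI_Recv(&recv_data, 1, MPI_INT, 0, 100, MPI_COMM_WORLD, MPI_STATUS_IGNORE);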
You might use the following Python script to create an animation showing the time evolution of the temperature field:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
# Parameters
NX = 100 # Number of spatial points
NT = 99900 # Total number of time steps
every = 1000 # Time step interval for displaying the frame
# Initialize x positions and dummy y positions for a "bar" look
x = np.linspace(0, NX - 1, NX)
y = np.ones(NX) # Fixed y-position for visualization
# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 2))
colorbar = plt.colorbar(plt.cm.ScalarMappable(cmap="coolwarm"), ax=ax, orientation='horizontal', label='Temperature')
plt.xlim([0, NX])
plt.ylim([0.995, 1.005])
ax.set_title("1D Heat Diffusion")
# Define update function to load data from file and update plot
def update(frame):
    # Construct the filename for the current time step
    file = f'heat_output_step_{frame}.dat'
    scat = None
    try:
        # Load temperature data from the file (the first row is assumed to be a header)
        T = np.loadtxt(file, skiprows=1)
        # Remove the previous scatter plot before drawing the new one
        for coll in list(ax.collections):
            coll.remove()
        scat = ax.scatter(x, y, c=T, cmap="coolwarm", marker='s', s=100)
        ax.set_title(f"1D Heat Diffusion - Step {frame}")
    except Exception as e:
        print(f"Error loading or processing file {file}: {e}")
    return (scat,) if scat is not None else ()
# Create the animation
frames = range(0, NT, every) # Use only every nth frame
ani = FuncAnimation(fig, update, frames=frames, blit=False)  # blit=False so the title is redrawn each frame
plt.close(fig)
# Display the animation in the notebook
HTML(ani.to_jshtml())
In addition to the fundamental MPI functions you already know, such as MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size, MPI_Send, MPI_Recv, and MPI_Reduce, there are several other important MPI functions that provide greater flexibility and efficiency when designing parallel programs. These functions are especially useful for handling complex communication patterns, collective operations, and custom data handling.
This section is only meant to provide an overview of other commonly used MPI functions.
Non-blocking communication functions allow processes to initiate a communication operation and then proceed with other work while the operation completes in the background. This can improve performance by overlapping communication and computation.
MPI_Isend: Initiates a non-blocking send operation.
int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request);
MPI_Irecv: Initiates a non-blocking receive operation.
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request);
MPI_Wait: Waits for a specific non-blocking operation to complete.
int MPI_Wait(MPI_Request *request, MPI_Status *status);
MPI_Waitall: Waits for all specified non-blocking operations to complete.
int MPI_Waitall(int count, MPI_Request array_of_requests[], MPI_Status array_of_statuses[]);
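A minimal sketch of the typical usage pattern is shown below; it is a fragment meant to sit inside an already initialized MPI program (with rank and size obtained as in the earlier examples), and the buffer names and ring-exchange pattern are illustrative assumptions:
MPI_Request reqs[2];
double send_buf[100], recv_buf[100];      // illustrative buffers
int right = (rank + 1) % size;            // rank I send to
int left  = (rank - 1 + size) % size;     // rank I receive from
MPI_Irecv(recv_buf, 100, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(send_buf, 100, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);
// ... computation that does not depend on recv_buf can be done here,
// overlapping with the communication in progress ...
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  // both operations are complete after this call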
These collective operations enable efficient communication of data among processes.
MPI_Bcast: Broadcasts data from one process to all others.
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);
Example: A root process sends an array to all processes in the communicator.
MPI_Scatter: Distributes parts of an array from the root process to all processes.
int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
MPI_Gather: Gathers parts of an array from all processes to the root process.
int MPI_Gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm);
MPI_Allgather: Gathers data from all processes and distributes the results to all processes.
int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm);
These functions allow you to handle more complex parallelism scenarios, optimize performance, and manage data effectively across processes. Mastering them will give you a broader toolbox for tackling real-world computational problems.
MPI provides a way to arrange processes in a Cartesian grid, which is especially useful for applications with structured grids, such as finite difference or finite volume solvers.
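As a brief illustration of the relevant calls (a sketch, assuming an initialized MPI program with size processes; a 2D decomposition with non-periodic boundaries and illustrative variable names):
int dims[2] = {0, 0};      // let MPI choose a balanced 2D process grid
int periods[2] = {0, 0};   // non-periodic boundaries in both dimensions
MPI_Comm cart_comm;
int left, right, down, up;
MPI_Dims_create(size, 2, dims);                                     // e.g. 16 ranks -> 4 x 4
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);   // reorder = 1
MPI_Cart_shift(cart_comm, 0, 1, &left, &right);                     // neighbours along dimension 0
MPI_Cart_shift(cart_comm, 1, 1, &down, &up);                        // neighbours along dimension 1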
We start from the serial code we used in the previous chapters, which calculates the integral of the function f(x) = x^3 over a range [a, b].
To parallelize the trapezoidal integration with OpenMP, we make use of compiler directives (pragmas) to divide the integration range among multiple threads. Each thread computes the integral over a smaller sub-range, and then the results from all threads are combined to obtain the final integral.
#include <stdio.h>
#include <math.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration
double trapezoidal_rule(double (*func)(double), double a, double b, int n) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
for (int i = 1; i < n; i++) {
double x = a + i * p;
sum += func(x);
}
return sum * p;
}
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Check if n is a valid number of trapezoids
if (n > 0) {
printf("The number of trapezoids is positive.\n");
} else if (n < 0) {
printf("Error: The number of trapezoids is negative.\n");
} else {
printf("Error: The number of trapezoids is zero.\n");
}
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n);
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
return 0;
}
The OpenMP-parallelised version of the code is:
#include <stdio.h>
#include <math.h>
#include <omp.h>
// Define the function to integrate: f(x) = x^3
double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// Trapezoidal rule for numerical integration with OpenMP parallelization
double trapezoidal_rule(double (*func)(double), double a, double b, int n) {
double p = (b - a) / n; // Width of each trapezoid
double sum = 0.5 * (func(a) + func(b)); // End points contribution
// Parallelize the summation over trapezoids
#pragma omp parallel for reduction(+:sum)
for (int i = 1; i < n; i++) {
double x = a + i * p;
sum += func(x);
}
return sum * p;
}
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Check if n is a valid number of trapezoids
if (n > 0) {
printf("The number of trapezoids is positive.\n", n);
} else if (n < 0) {
printf("Error: The number of trapezoids is negative.\n");
} else {
printf("Error: The number of trapezoids is zero.\n");
}
// Perform numerical integration
double result = trapezoidal_rule(f, a, b, n);
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
return 0;
}
By default, OpenMP uses as many threads as there are available CPU cores. You can control the number of threads by setting the OMP_NUM_THREADS environment variable before running the program.
For example, to use 4 threads:
export OMP_NUM_THREADS=4
./my_program
Alternatively, you can set the number of threads within the code using the omp_set_num_threads() function:
#include <omp.h>
omp_set_num_threads(4); // Sets the number of threads to 4
The directive #pragma omp parallel for reduction(+:sum) is an OpenMP construct that combines parallel processing with a reduction operation. Here’s how it works:
Parallelization: The #pragma omp parallel for part of the directive tells the compiler to parallelize the following for loop. This means that each iteration of the loop can be executed by a different thread. OpenMP automatically handles the distribution of loop iterations among available CPU threads, allowing them to work independently on their assigned iterations.
Reduction Operation: The reduction(+:sum) clause specifies a reduction operation for the variable sum. In this case, the + symbol indicates that each thread should add its individual contributions to sum. OpenMP will create a private copy of sum for each thread, allowing threads to accumulate their partial results without interference or race conditions. At the end of the loop, OpenMP combines each thread’s partial sum into a single total sum, storing the final result in the original sum variable.
Thread Safety: By managing individual partial sums for each thread and combining them at the end, OpenMP ensures thread safety. This approach is efficient and avoids conflicts that could arise if multiple threads attempted to modify sum simultaneously.
This directive is particularly useful in scenarios where each iteration independently contributes to an accumulation (e.g., summing values in numerical integration). The reduction clause simplifies parallelized summation by automatically handling both private copies and final aggregation.
A race condition is a situation in parallel programming where multiple threads or processes access and manipulate shared data concurrently, and the program's outcome depends on the timing or sequence of their execution. This often leads to unpredictable and incorrect results, as the order of operations is not controlled, allowing threads to "race" to complete their tasks.
Race conditions happen because of incomplete synchronization. When threads access shared data without a controlled sequence (e.g., without using locks or other synchronization mechanisms), they can interfere with each other’s operations, leading to inconsistent or unexpected results.
[!TIP]
To prevent race conditions, programmers use synchronization techniques to control access to shared data. In OpenMP, constructs like #pragma omp critical, #pragma omp atomic, or reduction (for certain operations) can help manage access to shared variables, ensuring correct results by controlling when threads can read or modify shared data.
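To make the problem concrete, here is a minimal sketch (not taken from the course code) of a loop with a race condition, together with a version made safe with #pragma omp atomic:
#include <stdio.h>
#include <omp.h>
int main() {
    long unsafe = 0, safe = 0;
    const long N_ITER = 1000000;
    // Race condition: all threads increment the same shared variable without synchronization,
    // so increments can be lost and the final value is typically smaller than N_ITER.
    #pragma omp parallel for
    for (long i = 0; i < N_ITER; i++) {
        unsafe++;
    }
    // Safe version: the atomic directive serializes each individual update.
    #pragma omp parallel for
    for (long i = 0; i < N_ITER; i++) {
        #pragma omp atomic
        safe++;
    }
    printf("unsafe = %ld, safe = %ld (expected %ld)\n", unsafe, safe, N_ITER);
    return 0;
}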
Here is an example of using #pragma omp parallel for to parallelise the operation a[i] = b[i]*c + d[i]:
#include <stdio.h>
#include <omp.h>
#define N 1000 // Size of the arrays
int main() {
double a[N], b[N], d[N];
double c = 2.5; // Scalar multiplier
// Initialize arrays b and d with some values
for (int i = 0; i < N; i++) {
b[i] = i * 1.0; // Example values for b
d[i] = i + 1.0; // Example values for d
}
// Parallelized computation
#pragma omp parallel for
for (int i = 0; i < N; i++) {
a[i] = b[i] * c + d[i];
}
return 0;
}
OpenMP supports a variety of built-in operators for the reduction clause; each operator combines the values from multiple threads into a single result. Commonly used operators include + , * , max and min, the bitwise operators & , | and ^, and the logical operators && and ||.
The fork-join concept is a parallel programming model used in frameworks like OpenMP to manage the flow of parallel tasks. This model allows a program to create multiple threads to work on different tasks simultaneously (the "fork" part) and then synchronize or join them back together once they complete their tasks.
Fork: When a program encounters a parallel region (like an OpenMP #pragma omp parallel directive), it "forks" into multiple threads. This means that the main (or "master") thread spawns a team of threads, each working independently on a portion of the task. Each thread executes the same code in parallel, and OpenMP handles the assignment of iterations or tasks to each thread.
Parallel Work: During the parallel region, each thread performs its assigned work independently. This is where the program achieves parallelism, as multiple threads execute concurrently on different CPU cores.
Join: After the parallel section is complete, all threads are synchronized, and they "join" back into a single thread. The main thread waits for all spawned threads to finish their work before continuing with the sequential parts of the program.
This ensures that all parallel computations are completed before the program moves on to the next steps, maintaining program correctness.
#include <omp.h>
#include <stdio.h>
int main() {
printf("Sequential code before parallel region.\n");
// Fork: Start parallel region
#pragma omp parallel
{
int thread_id = omp_get_thread_num();
printf("Parallel work by thread %d\n", thread_id);
} // Join: End of parallel region
printf("Sequential code after parallel region.\n");
return 0;
}
In the previous example, we used the function omp_get_thread_num(), which is one of several utility functions provided by OpenMP to manage and query information about threads. These functions are helpful in understanding and controlling parallel execution within an OpenMP program.
Here’s a breakdown of omp_get_thread_num() and other common OpenMP functions:
omp_get_thread_num(): This function returns the ID of the current thread within a parallel region. The thread ID is an integer that ranges from 0 up to omp_get_num_threads() - 1. Often used within a #pragma omp parallel block to identify which thread is executing a particular portion of the code.
omp_get_num_threads(): It returns the total number of threads currently in a parallel region. It is useful for dividing work dynamically based on the total number of threads or for setting up data structures.
omp_get_max_threads(): It provides the maximum number of threads that could be used in parallel regions, based on the current settings. Often used to predict or set the maximum threads without necessarily entering a parallel region.
omp_set_num_threads(int num_threads): Sets the number of threads that will be used for subsequent parallel regions. Used to control the number of threads dynamically, usually at the beginning of the program or before a parallel section.
omp_get_num_procs(): Returns the number of processors (or CPU cores) available on the system. Useful for configuring the number of threads based on system capabilities, often combined with omp_set_num_threads().
omp_in_parallel(): Returns a non-zero value if currently inside a parallel region; otherwise, returns zero. This function helps determine if a particular section of code is running in a parallel context, which can be useful for managing nested parallelism or debugging.
omp_get_wtime(): Returns the elapsed wall-clock time (in seconds) since an arbitrary point in the past, used for timing and performance analysis. Commonly used to measure the execution time of code blocks in parallel programs.
Here is an example in which these functions are used:
#include <stdio.h>
#include <omp.h>
int main() {
// Set number of threads to 4 for demonstration
omp_set_num_threads(4);
printf("Number of threads set to: %d\n", 4);
// Get the maximum number of threads available
int max_threads = omp_get_max_threads();
printf("Maximum threads available: %d\n", max_threads);
// Get the number of processors available
int num_procs = omp_get_num_procs();
printf("Number of processors available: %d\n", num_procs);
// Start timing the parallel section
double start_time = omp_get_wtime();
// Parallel region begins
#pragma omp parallel
{
// Check if we are inside a parallel region
if (omp_in_parallel()) {
printf("Thread %d: Currently in a parallel region.\n", omp_get_thread_num());
} else {
printf("Not in a parallel region.\n");
}
// Get the total number of threads in this parallel region
int total_threads = omp_get_num_threads();
printf("Thread %d: Total threads in parallel region: %d\n", omp_get_thread_num(), total_threads);
// Each thread prints its own ID
int thread_id = omp_get_thread_num();
printf("Hello from thread %d\n", thread_id);
}
// Parallel region ends
double end_time = omp_get_wtime();
// Print total time taken for the parallel section
printf("Time taken for parallel region: %f seconds\n", end_time - start_time);
return 0;
}
and this is the output
Number of threads set to: 4
Maximum threads available: 4
Number of processors available: 32
Thread 1: Currently in a parallel region.
Thread 1: Total threads in parallel region: 4
Hello from thread 1
Thread 2: Currently in a parallel region.
Thread 2: Total threads in parallel region: 4
Hello from thread 2
Thread 3: Currently in a parallel region.
Thread 3: Total threads in parallel region: 4
Hello from thread 3
Thread 0: Currently in a parallel region.
Thread 0: Total threads in parallel region: 4
Hello from thread 0
Time taken for parallel region: 0.001045 seconds
In OpenMP, critical and single are two different constructs used to control how certain code blocks are executed in a parallel region. While both are designed to prevent issues arising from concurrent access to code, they serve distinct purposes and are used in different situations. Here’s a breakdown of the differences:
#pragma omp critical
Purpose: Ensures that only one thread at a time executes the code within the critical section.
Behavior: All threads can encounter a critical section, but only one thread is allowed to execute it at any given time. If multiple threads reach the critical section simultaneously, they will wait their turn to execute, effectively serializing access to the code block.
Usage: Used to protect shared resources (e.g., updating a shared variable or data structure) where only one thread should modify or access the resource at a time.
#pragma omp single
Purpose: Ensures that only one thread executes the enclosed code block, but only once in total.
Behavior: The first thread to reach the single section executes the code, while all other threads skip it entirely (rather than waiting to execute). This means that only one thread performs the work inside the single section, and it is done only once across all threads.
Usage: Often used for initializing shared resources, performing I/O operations, or any task that only needs to be done once, rather than by every thread.
Here is an example:
...
int shared_var = 0;
#pragma omp parallel
{
#pragma omp single
{
printf("Single: This only happens once.\n");
}
#pragma omp critical
{
shared_var += 1; // Critical: Ensures safe modification of shared_var
}
}
...
In OpenMP, data sharing attributes are used to control how variables are shared or private among threads in a parallel region. These attributes determine how variables are treated during parallel execution, ensuring that data is correctly handled to avoid race conditions and other issues. The key data sharing attributes are:
Purpose: A variable declared shared is shared among all threads; all threads in the parallel region see the same variable.
Behavior: The variable is shared across all threads, meaning any updates made by one thread are visible to all other threads.
Usage: Typically used for global variables, accumulators, or any data that needs to be shared among threads.
Example:
int sum = 0;
#pragma omp parallel for shared(sum)
for (int i = 0; i < 10; i++) {
    sum += i;   // unsynchronized update of a shared variable (see the note below)
}
printf("Total sum: %d\n", sum);
In this case, sum is shared across all threads: every thread updates the same variable. Note, however, that these concurrent, unsynchronized updates are themselves a race condition; in real code the update should be protected, for example with reduction(+:sum) or #pragma omp atomic.
Purpose: A variable declared lastprivate behaves like a private variable, but the value it takes in the sequentially last iteration of the loop (or the last section) is copied back to the original variable after the parallel region ends.
Behavior: Each thread has its own private copy of the variable, and the copy belonging to the thread that executes the last iteration is written back to the original variable when the construct completes.
Usage: Typically used when the final value of a variable after the parallel region should reflect the result of the last iteration.
Example:
int last_value = 0;
#pragma omp parallel for lastprivate(last_value)
for (int i = 0; i < 10; i++) {
last_value = i; // Will be set to 9 in the end
}
printf("Last value: %d\n", last_value); // Prints 9
In this example, last_value is set to the value of i from the sequentially last iteration of the loop (i = 9), regardless of which thread happens to execute that iteration.
Purpose: Specifies the default sharing behavior for variables in a parallel region. OpenMP supports three default options: shared, none, and private.
Options:
default(shared): All variables are shared by default unless explicitly stated otherwise.
default(private): All variables are private by default unless explicitly stated otherwise.
default(none): No variables are implicitly shared or private. This forces the programmer to explicitly declare the data-sharing attribute for each variable used in the parallel region.
Example:
int i, sum = 0;
#pragma omp parallel default(none) private(i) reduction(+:sum)
{
    // With default(none), every variable used in the region must have an explicit data-sharing attribute.
    #pragma omp for
    for (i = 0; i < 10; i++) {
        sum += i;
    }
}
Purpose: uniform is often listed alongside the data-sharing attributes, but it is not a clause of parallel regions: it belongs to the #pragma omp declare simd directive, where it marks a function argument as having the same value for every SIMD lane, allowing the compiler to optimize accordingly.
Usage: Useful for SIMD-vectorized functions in which one argument (for example, a scale factor) is invariant across the vectorized calls.
Example (a sketch; the function name is illustrative):
#pragma omp declare simd uniform(scale)
double scale_value(double x, double scale) {
    return x * scale; // scale has the same value for all SIMD lanes
}
| Attribute | Behavior | Typical use |
| --- | --- | --- |
| private | Each thread gets its own private copy of the variable. | Loop counters, thread-specific data. |
| shared | All threads share the same variable; any update is visible to all threads. | Accumulators, global data. |
| firstprivate | The variable is private to each thread, initialized with the value it had outside the region. | Initializing thread-local copies with external values. |
| lastprivate | Like private, but the value from the sequentially last iteration is copied back to the original variable. | Keeping the result of the last loop iteration after the parallel region. |
| default | Specifies the default data-sharing behavior (e.g., shared, none). | Setting the default sharing behavior for all variables in a parallel region. |
| uniform | Clause of #pragma omp declare simd: marks a function argument as having the same value across SIMD lanes. | Optimizing SIMD functions whose arguments are invariant across calls. |
By carefully choosing the appropriate data-sharing attributes, you can control how variables are handled in parallel regions, ensuring the correct behavior and avoiding data races.
When working with parallel computing, it is important to evaluate the performance of algorithms as the number of processors or nodes is increased. Two common metrics used to assess performance in parallel computing are strong scaling and weak scaling. These metrics help understand how well a parallel system can handle increasing computational demands as resources are added.
Strong scaling refers to the ability of a parallel system to solve a fixed-size problem faster as more computational resources (such as CPUs or nodes) are added. The key idea here is that the problem size remains constant while the number of processors increases. Strong scaling measures how efficiently the parallel system can reduce the execution time by dividing the work across more processors.
Goal: Minimize the time to complete a fixed-size problem as the number of processors increases.
Ideal Scaling: In an ideal scenario, adding more processors would reduce the execution time proportionally, meaning that doubling the number of processors would halve the execution time.
Example of Strong Scaling:
Suppose we have a computational problem that takes 1000 seconds to solve on 1 processor. If we add more processors, strong scaling aims to reduce the solution time. For example, if the problem takes 500 seconds with 2 processors, 250 seconds with 4 processors, and 125 seconds with 8 processors, this demonstrates ideal strong scaling.
However, strong scaling is limited by factors such as:
Overhead of Parallelization: Increased communication between processors can introduce overhead that reduces the expected performance gain.
Load Balancing: Not all problems can be evenly divided, leading to idle processors or unbalanced workloads, which can hinder scaling efficiency.
Amdahl's Law: This law shows that the maximum performance improvement from parallelization is limited by the serial portion of the code. Even with an infinite number of processors, the speedup is constrained by the time spent in the non-parallelizable part of the code.
Weak scaling, on the other hand, evaluates how the performance of a parallel system changes as both the problem size and the number of processors are increased proportionally. In weak scaling, the goal is to maintain constant computational workload per processor as you increase the number of processors. The problem size grows with the number of processors, so each processor is assigned an equal portion of the problem.
Goal: Maintain the same workload per processor as more processors are added, while keeping the total computation proportional to the number of processors.
Ideal Scaling: In an ideal weak scaling scenario, as you increase the number of processors, the time to solve the problem should remain constant because each processor handles an equal share of the increasing workload.
Example of Weak Scaling:
Consider a scenario where you are solving a problem that takes 1000 seconds on 1 processor. If you double the number of processors to 2, the problem size also doubles, and the workload per processor remains the same. In an ideal weak scaling scenario, if the problem size increases proportionally, the execution time should also remain close to 1000 seconds, as long as the problem is balanced and the communication overhead does not grow too large.
Weak scaling is often more relevant in situations where large problems need to be solved and where the number of processors must be increased to handle the growing size of the problem.
Weak scaling helps determine how well a parallel system can handle increasing problem sizes.
Strong Scaling: You would expect a decreasing curve in the time-to-solution as the number of processors increases. The curve flattens out once the system reaches a point where further increases in the number of processors do not significantly reduce the execution time due to overheads.
Weak Scaling: The execution time should remain roughly constant as the number of processors increases, provided that the problem size increases in proportion. Any significant deviation from this ideal curve indicates poor scaling.
Strong scaling is most useful for problems where the total amount of work is fixed, such as simulations or numerical calculations with a fixed dataset or model.
Weak scaling is more applicable to problems where the size of the problem grows with the number of resources, such as large-scale simulations of physical systems (e.g., climate modeling or molecular dynamics simulations).
Both strong and weak scaling provide valuable insights into how well a parallel system can handle increased computational demands and can guide optimization efforts in algorithm design and system configuration.
In parallel computing, speedup is a measure of how much faster a parallel algorithm performs compared to a serial (non-parallel) algorithm. It is one of the key metrics for evaluating the effectiveness of parallelization and helps determine whether the effort to parallelize a program is worthwhile.
The speedup S is defined as the ratio of the time taken to execute the serial version of a program to the time taken to execute the parallel version. Mathematically, it is expressed as:
S = T_serial / T_parallel
Where:
T_serial is the execution time of the program when run on a single processor (the serial time).
T_parallel is the execution time of the program when run on multiple processors (the parallel time).
The ideal speedup is obtained when the parallel algorithm achieves perfect scalability, meaning that the execution time decreases in direct proportion to the number of processors used. For example, if a program takes 100 seconds to run on 1 processor, the ideal speedup would be:
With 2 processors, the execution time should be T_parallel = 50 seconds, yielding a speedup of S = 100/50 = 2.
With 4 processors, the execution time should be T_parallel = 25 seconds, yielding a speedup of S = 100/25 = 4.
In this case, the speedup is proportional to the number of processors, which is the ideal scenario.
Amdahl’s Law provides an upper bound on the speedup that can be achieved by parallelizing a program. It accounts for the fact that not all parts of a program can be parallelized, and some portion must always remain serial.
Amdahl’s Law states that the maximum speedup S_max that can be achieved with P processors is given by:
S_max = 1 / ((1 − f) + f/P)
Where:
f is the fraction of the program that can be parallelized (i.e., the parallelizable portion).
P is the number of processors.
If f=1 (i.e., the entire program can be parallelized), then the speedup increases linearly with the number of processors.
If f is less than 1 (which is the usual case), the speedup starts to plateau as the number of processors increases.
For example, if 90% of a program can be parallelized (f = 0.9), even with an infinite number of processors, the maximum speedup will be:
S_max = 1 / (1 − 0.9) = 10
This demonstrates that the serial portion of the program limits the overall speedup, and beyond a certain point, adding more processors will result in diminishing returns.
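To make these diminishing returns concrete, here is a minimal C sketch that tabulates the bound S_max = 1 / ((1 − f) + f/P) for f = 0.9 and increasing processor counts; the printed values climb quickly at first and then flatten towards 10, the limit computed above.

#include <stdio.h>

int main(void) {
    double f = 0.9; /* parallelizable fraction */
    /* Tabulate Amdahl's bound for increasing processor counts */
    for (int P = 1; P <= 1024; P *= 2)
        printf("P = %4d  ->  S_max = %6.2f\n", P, 1.0 / ((1.0 - f) + f / P));
    return 0;
}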
In practice, the speedup achieved in parallel computing often deviates from the ideal, due to factors such as:
Communication Overhead: In parallel systems, processors need to exchange data. The time spent on communication (especially in distributed-memory systems like clusters) can reduce the effectiveness of parallelization.
Load Imbalance: If the workload is not evenly distributed among the processors, some processors may be idle while others are overburdened, reducing overall efficiency.
Synchronization Costs: In parallel systems, synchronizing the execution of different processors (e.g., waiting for all processors to reach a certain point) can add overhead that reduces speedup.
Amdahl’s Law: The serial portion of the program limits how much speedup can be achieved. Even if most of the program is parallelized, the small serial portion can significantly reduce the total speedup.
In some cases, superlinear speedup can occur, where the parallel version of a program performs faster than the serial version, even though more processors are used. This can happen due to:
Cache effects: In some cases, parallelization can lead to better cache utilization, reducing memory access times and improving performance.
Problem decomposition: If the problem is divided in a way that allows for more efficient data access patterns or algorithmic improvements, the parallel version can outperform the serial version.
Hardware acceleration: If parallelization leverages specific hardware features (such as GPUs or specialized processors), the speedup may exceed linear scaling.
However, superlinear speedup is usually a result of specific optimizations and should not be expected in all parallel programs.
A typical speedup plot shows the number of processors on the x-axis and the speedup S on the y-axis. In the ideal case, the curve is a straight line where speedup increases linearly with the number of processors. However, in most practical scenarios, the curve starts to level off due to diminishing returns caused by overheads (communication, synchronization, etc.).
Consider the following code parallelised with MPI.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
#define N 1024 // Matrix size (can adjust for strong/weak scaling)
void matrix_multiply(double *A, double *B, double *C, int size, int rank, int num_procs) {
int i, j, k;
int rows_per_proc = size / num_procs;
for (i = 0; i < rows_per_proc; i++) {
for (j = 0; j < size; j++) {
double sum = 0.0;
for (k = 0; k < size; k++) {
sum += A[i * size + k] * B[k * size + j];
}
C[i * size + j] = sum;
}
}
}
int main(int argc, char *argv[]) {
int rank, num_procs;
int i;
double *A, *B, *C, *local_A, *local_C;
clock_t start, end;
double cpu_time_used;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
int rows_per_proc = N / num_procs;
int matrix_size = N * N;
// Memory allocation
A = (double*) malloc(matrix_size * sizeof(double));
B = (double*) malloc(matrix_size * sizeof(double));
C = (double*) malloc(matrix_size * sizeof(double));
// Initialize matrices
for (i = 0; i < matrix_size; i++) {
A[i] = 1.0;
B[i] = 1.0;
}
local_A = (double*) malloc(rows_per_proc * N * sizeof(double));
local_C = (double*) malloc(rows_per_proc * N * sizeof(double));
// Distribute the rows of A among the processes (without this, local_A would be used uninitialized)
MPI_Scatter(A, rows_per_proc * N, MPI_DOUBLE, local_A, rows_per_proc * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
start = clock();
// Perform matrix multiplication on each process
matrix_multiply(local_A, B, local_C, N, rank, num_procs);
// Gather the results from all processes
MPI_Gather(local_C, rows_per_proc * N, MPI_DOUBLE, C, rows_per_proc * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
if(rank==0) printf("%d %f\n", num_procs, cpu_time_used*1000);
// Finalize and free memory
free(A);
free(B);
free(C);
free(local_A);
free(local_C);
MPI_Finalize();
return 0;
}
In this code, the size is fixed (N=1024) and we are going to vary the number of processors.
Note that the times are taken right before the call to matrix_multiply and right after MPI_Gather. In this way, we measure only the time needed to perform the calculations (in parallel) and to communicate the results. Note also that before calling end = clock(); we invoke MPI_Barrier: this is useful because it ensures that every process has finished its work and its communication before the timing is stopped.
Here is a bash script that can be used to run the above code multiple times, varying the number of processors:
echo "#np #t[ms]"
for np in 1 2 4 8 16 32;
do
mpirun -np $np ./exe
done
The following plot shows the strong scaling measured with up to 16 processors.
As computational problems grow in complexity and scale, the demand for high-performance computing solutions becomes increasingly critical. While traditional CPU-based parallelization techniques like MPI and OpenMP offer substantial gains by distributing tasks across multiple cores and processors, certain applications benefit significantly from a different architectural approach: Graphics Processing Units (GPUs).
Originally designed to handle the massive parallelism required for rendering images and videos, GPUs have evolved into powerful tools for general-purpose computing. Unlike CPUs, which excel at executing a few threads at high speed, GPUs are built to handle thousands of threads simultaneously. This architectural distinction makes GPUs particularly well-suited for tasks with a high degree of parallelism, such as matrix operations, particle simulations, and machine learning.
In this chapter, we will delve into the fundamentals of GPU computing, focusing on the CUDA (Compute Unified Device Architecture) programming model. Developed by NVIDIA, CUDA enables developers to harness the computational power of GPUs for general-purpose applications. Before exploring the specifics of CUDA, we will examine the key differences between CPUs and GPUs, the types of problems GPUs excel at solving, and the typical workflow of a GPU-accelerated application.
GPU computing does not replace CPU-based parallelization strategies; rather, it complements them. Hybrid solutions that integrate CPUs, GPUs, and multi-node communication (e.g., using MPI) are increasingly common in modern scientific computing, providing the tools to tackle even the most demanding computational challenges.
Understanding the differences between Central Processing Units (CPUs) and Graphics Processing Units (GPUs) is crucial to appreciate how these components complement each other in high-performance computing. While both are essential for modern computing systems, they are architected to serve different purposes.
CPU (Central Processing Unit):
CPUs are designed as general-purpose processors capable of executing a wide variety of tasks. They typically consist of a few powerful cores optimized for sequential performance. Each core is equipped with substantial cache memory and complex control logic, allowing CPUs to efficiently handle tasks requiring rapid decision-making and minimal parallelism.
GPU (Graphics Processing Unit):
GPUs, on the other hand, are specialized for parallel tasks. A GPU consists of thousands of smaller, simpler cores that can execute the same or similar operations simultaneously on multiple data elements. This makes GPUs ideal for workloads with high data parallelism, such as image processing, matrix calculations, and simulations.
CPU (Latency-Oriented):
CPUs are designed to minimize latency for individual tasks, that is, to minimize the time it takes for a single operation or task to complete. CPUs are optimized for low-latency operations like branch-heavy control flows, system management, and single-threaded computations.
GPU (Throughput-Oriented):
GPUs focus on maximizing throughput by processing many operations concurrently. They are less concerned with the speed of individual operations and more with the overall volume of operations executed over time.
CPU Memory System:
CPUs rely on a deep cache hierarchy (L1, L2, and often L3) to minimize memory access latency. The memory bandwidth is lower than that of GPUs but optimized for frequent, small, and random memory accesses typical in general-purpose tasks.
GPU Memory System:
GPUs have access to high-bandwidth memory designed for streaming large amounts of data in parallel. While the latency for accessing GPU memory (global memory) is higher than CPU cache, the architecture compensates with a large number of threads, ensuring many operations can proceed while others wait for memory access.
CPU:
CPUs dedicate a significant portion of their silicon area to control logic and sophisticated branch prediction. This enables efficient handling of diverse and dynamic workloads but limits the number of cores available.
GPU:
GPUs prioritize compute density, allocating most of their silicon to arithmetic and logic units (ALUs). This allows them to maximize the number of cores available for parallel computations, at the expense of complex control mechanisms.
CPU Programming:
Programming for CPUs often involves traditional languages like C, C++, or Fortran, combined with parallelization libraries such as OpenMP and MPI. These frameworks are designed for task parallelism or domain decomposition.
GPU Programming:
GPU programming requires specific frameworks like CUDA or OpenCL, which expose the parallel nature of the GPU to developers. These frameworks require explicitly managing data transfer between CPU and GPU and structuring algorithms to leverage the massive parallelism of the GPU.
CPU:
CPUs are better suited for tasks involving decision-making, low-latency requirements, and sequential processes, such as running operating systems, handling interrupts, and processing small-scale simulations.
GPU:
GPUs excel at tasks that can be broken into many parallel subtasks, such as scientific simulations, image rendering, deep learning, and large-scale numerical computations.
These differences make CPUs and GPUs complementary rather than competitive. While CPUs remain central to the orchestration and execution of diverse tasks, GPUs offer unparalleled acceleration for data-parallel computations, enabling hybrid systems that harness the strengths of both.
Figure 1: Architecture comparison of CPU and GPU cores. Source: https://cvw.cac.cornell.edu/gpu-architecture/gpu-characteristics/design
At the core of the GPU are Streaming Multiprocessors (SMs), which are the building blocks of computational power. Each SM contains multiple CUDA cores (NVIDIA’s terminology for GPU cores) and additional specialized units. These elements collectively perform the computations required for parallel processing.
CUDA Cores: Lightweight processing units within each SM. They execute arithmetic and logical operations in parallel across thousands of threads.
Specialized Units:
Tensor Cores: Accelerate matrix operations, crucial for deep learning and scientific computations.
Registers: High-speed memory local to the SM for thread-level data storage.
Texture and Load/Store Units: Optimize memory access for specific tasks.
Efficient memory access is critical for GPU performance. The memory hierarchy in GPUs is designed to support high bandwidth and minimize latency:
Global Memory: The largest and slowest memory accessible by all threads. It is used to store data shared across the GPU. Due to its high latency, programmers strive to minimize direct access to global memory.
Shared Memory: A small, fast memory shared among threads within the same block on an SM. It allows threads to collaborate efficiently and reduces the need to access global memory.
Registers: Each thread has private registers for storing frequently used variables. These are the fastest memory available on the GPU.
High-Bandwidth Memory (HBM): Many modern GPUs use HBM for ultra-fast data transfers required by parallel workloads.
Texture and Constant Memory: Specialized memory spaces for specific read-only access patterns, such as spatially coherent data.
GPUs organize computations into a hierarchical model, allowing developers to write scalable parallel programs:
Threads: The smallest unit of execution on a GPU. Each thread executes a single instance of the kernel function (a GPU program) and operates independently.
Thread Block: Threads are grouped into blocks. A block contains up to a maximum number of threads (typically 512 or 1024, depending on the GPU). Threads within a block can:
Share data via shared memory.
Synchronize using barriers (__syncthreads() in CUDA).
Each block runs on a single SM, and its threads are scheduled for execution in warps (groups of 32 threads).
Grid: Blocks are organized into a grid. The grid represents the entire set of computations that the GPU will perform. A grid can be 1D, 2D, or 3D, allowing for flexible mapping of data to computation.
Threads within a block are executed in groups of 32, called warps. All threads in a warp execute the same instruction at the same time (SIMD: Single Instruction, Multiple Data). If threads in a warp take different execution paths (e.g., due to if conditions), the warp must serialize those paths, leading to warp divergence and reduced performance. Minimizing warp divergence is crucial when optimizing GPU programs.
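As a small illustration (a hedged sketch, not taken from any specific application), the kernel below makes even and odd threads in the same warp take different branches, so each warp must execute the two paths one after the other:

__global__ void divergent_kernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Even and odd lanes of the same warp take different branches:
        // the warp serializes the two paths (warp divergence).
        if (i % 2 == 0)
            x[i] = 2.0f * x[i];
        else
            x[i] = x[i] + 1.0f;
    }
}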
Modern GPUs use high-speed interconnects to communicate with the CPU and other GPUs. Examples include:
PCIe (Peripheral Component Interconnect Express): Transfers data between the CPU and GPU. It has significantly lower bandwidth compared to GPU memory, making efficient data transfer strategies essential.
NVLink: A high-speed GPU-to-GPU interconnect for faster communication in multi-GPU systems.
Consider a real-world example: processing a 2D image.
Grid: Represents the entire image. Each block corresponds to a subregion of the image.
Block: Handles a smaller section of the image (e.g., a 32x32 pixel tile).
Thread: Processes a single pixel in the tile. Thousands of threads work simultaneously across the grid.
This hierarchical structure ensures scalability: small problems can fit into a single block, while larger problems scale by increasing the number of blocks in the grid.
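A sketch of this mapping in CUDA C (the image is assumed to be a simple width × height array of bytes, and the operation is just an illustrative per-pixel inversion):

__global__ void process_image(unsigned char *img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // pixel row
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];  // one pixel per thread
}

// Possible launch configuration: dim3 block(32, 32); dim3 grid((width + 31) / 32, (height + 31) / 32);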
Installing CUDA involves downloading and configuring the necessary software to develop GPU-accelerated applications on NVIDIA GPUs. Below is a general guide to installing CUDA on Linux and macOS.
NVIDIA provides a wide range of GPUs tailored for different use cases, including gaming, scientific computing, AI, and deep learning. CUDA developers should understand the types of GPUs available and the concept of compute capability, which defines the features and hardware capabilities of a GPU for CUDA programming.
Compute Capability (CC) is a version number assigned to each NVIDIA GPU that specifies its hardware features and CUDA capabilities. It informs developers about:
The architectural generation of the GPU.
Supported CUDA features and functions.
Hardware specifications, such as the number of registers, shared memory, and support for advanced instructions.
Key Components of Compute Capability:
Major version: Indicates the architectural generation (e.g., 7 for Volta, 8 for Ampere).
Minor version: Indicates incremental improvements within the same architecture.
GPUs with higher compute capability support more advanced features, such as tensor cores for deep learning, warp matrix operations, and larger shared memory.
Kernel Optimization:
The number of threads, registers, and shared memory available per block can vary depending on the compute capability.
Backward Compatibility:
CUDA applications compiled for a lower compute capability GPU can run on higher compute capability GPUs, but the reverse is not true.
Checking Your GPU’s Compute Capability:
Use the CUDA deviceQuery sample program:
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
This will display the compute capability of your GPU.
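The following CUDA C program puts these concepts into practice: it approximates the integral of f(x) = x^3 on [a, b] with the trapezoidal rule, letting each GPU thread evaluate one sample point and accumulate it into a shared sum.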
#include <stdio.h>
#include <math.h>
#include <cuda.h>
// Define the function to integrate: f(x) = x^3
__device__ double f(double x) {
return pow(x, 3); // pow(a,b) computes a^b
}
// CUDA kernel to compute the sum of function values at given points
__global__ void trapezoidal_kernel(double a, double b, double p, int n, double *sum) {
// Compute the global thread ID
int idx = blockIdx.x * blockDim.x + threadIdx.x;
// Accumulate contributions from the interior points (idx = 1 .. n-1)
if (idx > 0 && idx < n) {
double x = a + idx * p;
atomicAdd(sum, f(x)); // Atomic add to safely update shared sum (double-precision atomicAdd requires compute capability >= 6.0)
}
// Add the contributions from the end points
if (idx==0) atomicAdd(sum, 0.5 * (f(a) + f(b)));
}
// Host function to compute the integral using the trapezoidal rule with CUDA
double trapezoidal_rule_cuda(double a, double b, int n) {
double *d_sum, h_sum = 0.0;
double p = (b - a) / n; // Width of each trapezoid
// Allocate memory on the device for the sum
cudaMalloc((void **)&d_sum, sizeof(double));
cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);
// Define the number of threads and blocks
int blockSize = 32; // Number of threads per block
int numBlocks = (n + blockSize - 1) / blockSize; // Round up to cover all trapezoids
// Launch the kernel
trapezoidal_kernel<<<numBlocks, blockSize>>>(a, b, p, n, d_sum);
// Copy the result back to the host and finalize the sum
cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
// Clean up
cudaFree(d_sum);
return h_sum * p;
}
int main() {
double a = 0.0; // Lower limit of integration
double b = 1.0; // Upper limit of integration
int n = 1000; // Number of trapezoids (higher n for better accuracy)
printf("This program performs numerical integration of f(x) = x^3 from a = %.2f to b = %.2f using %d trapezoids.\n", a, b, n);
// Check if n is a valid number of trapezoids
if (n <= 0) {
printf("Error: The number of trapezoids must be positive.\n");
return -1;
}
// Perform numerical integration
double result = trapezoidal_rule_cuda(a, b, n);
printf("The integral of f(x) = x^3 from %.2f to %.2f is approximately: %.5f\n", a, b, result);
return 0;
}
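The same integration can be written in CUDA Fortran: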
module trapezoidal_module
use cudafor
contains
! Device function to define f(x) = x^3
attributes(device) function f(x) result(res)
real(8), value :: x
real(8) :: res
res = x ** 3
end function
! CUDA kernel for the trapezoidal rule
attributes(global) subroutine trapezoidal_kernel(a, b, p, n, sum)
real(8), value :: a, b, p
integer, value :: n
real(8) :: sum
integer :: idx, iostat
real(8) :: x
! Compute global thread ID
idx = threadIdx%x + (blockIdx%x - 1) * blockDim%x
! Add contributions from each thread
if (idx < n) then
x = a + idx * p
iostat = atomicAdd(sum, f(x))
end if
! Add the contributions from the end points (only by thread 0)
if (idx == 0) iostat = atomicAdd(sum, 0.5 * (f(a) + f(b)))
end subroutine
! Host function for the trapezoidal rule
function trapezoidal_rule_cuda(a, b, n) result(integral)
real(8), value :: a, b
integer, value :: n
real(8) :: integral
real(8), device :: d_sum
real(8) :: h_sum
real(8) :: p
integer :: blockSize, numBlocks, iostat
! Initialize parameters
h_sum = 0.0
p = (b - a) / n
! Allocate device memory
d_sum = 0.0
! Define grid and block dimensions
blockSize = 32
numBlocks = (n + blockSize - 1) / blockSize
! Launch the kernel
call trapezoidal_kernel<<<numBlocks, blockSize>>>(a, b, p, n, d_sum)
iostat = cudaDeviceSynchronize()
! Copy the result back to the host
h_sum = d_sum
! Finalize the sum
integral = h_sum * p
end function
end module
program trapezoidal_integration
use trapezoidal_module
implicit none
real(8) :: a, b, result
integer :: n
! Define integration limits and number of trapezoids
a = 0.0
b = 1.0
n = 1000
print *, "This program performs numerical integration of f(x) = x^3"
print *, "from a = ", a, " to b = ", b, " using ", n, " trapezoids."
! Check for valid input
if (n <= 0) then
print *, "Error: The number of trapezoids must be positive."
stop
end if
! Perform numerical integration
result = trapezoidal_rule_cuda(a, b, n)
print *, "The integral of f(x) = x^3 from ", a, " to ", b, " is approximately: ", result
end program
cudaMalloc is a CUDA runtime API function used to allocate memory on the GPU device. It works similarly to malloc in standard C, but the allocated memory resides in the GPU’s global memory rather than in the host’s memory. Allocating device memory in this way is the first step towards transferring data from the host (CPU) to the device (GPU) and running computations on the GPU.
In CUDA Fortran, memory allocation on the device is typically managed through Fortran’s native syntax rather than using cudaMalloc explicitly. However, it’s essential to understand how memory is allocated in CUDA Fortran and how it compares to the use of cudaMalloc in CUDA C.
In CUDA Fortran, arrays or variables intended to reside on the GPU are declared with the device attribute. The memory allocation and deallocation are handled via Fortran’s standard allocate and deallocate statements.
real(8), device, allocatable :: d_array(:)
integer :: n
n = 1000
! Allocate memory on the device
allocate(d_array(n))
! Deallocate memory when done
deallocate(d_array)
Here, the device attribute ensures that d_array resides in the GPU memory, and the allocate statement allocates memory on the device. This approach is more straightforward than using cudaMalloc.
In CUDA Fortran, attributes are used to define the memory location or scope of variables, specifying whether they reside on the host, device, or in special GPU memory spaces. These attributes allow developers to manage memory effectively and optimize performance when working with GPUs.
Here are the most commonly used attributes in CUDA Fortran:
device
Marks variables that reside in the global memory of the GPU.
These variables are accessible from device (GPU) code but not directly from the host (CPU) unless transferred explicitly.
Example:
real(8), device, allocatable :: d_array(:)
allocate(d_array(1000)) ! Allocates memory in GPU global memory
constant
Marks variables stored in the constant memory of the GPU.
Constant memory is optimized for read-only data that is frequently accessed by many threads.
cudaMemcpy is used to transfer data between the host (CPU) and the device (GPU). It can also transfer data between two devices if needed. Proper memory management using cudaMemcpy ensures that computations on the GPU use the correct data.
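A minimal CUDA C sketch of the typical pattern (h_x is a hypothetical host array filled elsewhere):

int n = 1000;
double h_x[1000];   // host array, assumed to be filled elsewhere
double *d_x;
cudaMalloc((void **)&d_x, n * sizeof(double));                     // allocate device memory
cudaMemcpy(d_x, h_x, n * sizeof(double), cudaMemcpyHostToDevice); // host -> device
// ... launch kernels that operate on d_x ...
cudaMemcpy(h_x, d_x, n * sizeof(double), cudaMemcpyDeviceToHost); // device -> host
cudaFree(d_x);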
However, in CUDA Fortran, it is possible to transfer data between the host and device without explicitly using cudaMemcpy. This is achieved by using the Fortran array assignment or automatic data movement features provided by CUDA Fortran. These features simplify memory management and data transfers, making the code more concise and easier to write.
When an array is declared with the device attribute in CUDA Fortran, you can directly assign values between host and device arrays. The CUDA Fortran runtime handles the memory transfers behind the scenes.
program implicit_host_to_device
use cudafor
implicit none
real(8), device :: d_array(100) ! Device array
real(8) :: h_array(100) ! Host array
integer :: i
! Initialize the host array
h_array = [(real(i, 8), i = 1, 100)]
! Assign host array to device array (host to device transfer)
d_array = h_array
print *, "Data transferred to device."
end program implicit_host_to_device
Device to Host Transfer
program implicit_device_to_host
use cudafor
implicit none
real(8), device :: d_array(100) ! Device array
real(8) :: h_array(100) ! Host array
integer :: i
! Initialize the device array with values (on the host)
d_array = [(real(i, 8), i = 1, 100)]
! Assign device array to host array (device to host transfer)
h_array = d_array
print *, "Data transferred back to host:", h_array(1:10)
end program implicit_device_to_host
[!NOTE]
The runtime automatically determines when to perform host-to-device or device-to-host transfers based on array assignments. This eliminates the need for explicit cudaMemcpy calls.
The Fortran-style assignment syntax is more intuitive and concise than explicitly managing memory transfers.
While convenient, implicit data transfers may introduce hidden performance costs, especially if used frequently in a loop. Explicit cudaMemcpy might be better for performance-critical code as it provides finer control.
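When launching a kernel, you also need to choose a launch configuration: the number of threads per block and enough blocks to cover the whole problem. A common pattern is to fix the block size and round the number of blocks up, as in the snippet below: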
int n = 1000; // Total number of elements
int blockSize = 256; // Threads per block
int numBlocks = (n + blockSize - 1) / blockSize; // Number of blocks
[!TIP]
A good choice of blockSize depends on the GPU hardware and the problem size. Common choices are 128, 256, or 512.
Kernels appear like normal functions but are defined with the __global__ qualifier. They are invoked from the host code. A kernel launch specifies the grid and block dimensions.
Syntax:
kernelName<<<numBlocks, blockSize>>>(arguments);
numBlocks: Number of blocks in the grid.
blockSize: Number of threads per block.
Example Kernel:
__global__ void addArrays(double *a, double *b, double *c, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x; // Compute global thread index
if (idx < n) {
c[idx] = a[idx] + b[idx]; // Perform addition for this thread
}
}
Kernel Invocation:
int n = 1000; // Number of elements
int blockSize = 256; // Threads per block
int numBlocks = (n + blockSize - 1) / blockSize;
// Allocate memory and copy data as needed
addArrays<<<numBlocks, blockSize>>>(d_a, d_b, d_c, n);
[!TIP]
Ensure the number of threads covers the entire workload (n in the above example).
Synchronize if necessary to ensure all threads finish before proceeding: cudaDeviceSynchronize();
Kernels are written with the attributes(global) specifier to indicate that they are callable from the host but execute on the device.
Example: A Simple Kernel in CUDA Fortran
Here’s an example of a kernel that adds two arrays element-wise:
module kernel_module
use cudafor
contains
attributes(global) subroutine add_arrays_kernel(a, b, c, n)
real, device :: a(:), b(:), c(:)
integer, value :: n
integer :: idx
! Compute the unique thread index
idx = threadIdx%x + (blockIdx%x - 1) * blockDim%x
! Ensure we stay within array bounds
if (idx <= n) then
c(idx) = a(idx) + b(idx)
end if
end subroutine add_arrays_kernel
end module kernel_module
In CUDA programming, __device__ in C or attributes(device) in Fortran is used to declare functions that are executed on the GPU and can only be called by other GPU functions. These are referred to as device functions. Unlike kernels (which are marked with __global__ or attributes(global)), device functions cannot be launched directly from the host but are called within kernels or other device functions.
Key features:
GPU-Only Execution:
Device functions are compiled for GPU execution only.
They are not accessible from the host.
Calling Rules:
Can be called from other device functions or kernels.
Cannot be called directly from host functions.
Inlined by Default:
Device functions are often inlined by the compiler to reduce the overhead of function calls.
Purpose:
They help modularize GPU code by allowing reusable computations within kernels.
In CUDA C, __device__ is the keyword to define a device function.
Example:
#include <stdio.h>
#include <math.h>
// Device function to compute the cube of a number
__device__ double cube(double x) {
return x * x * x;
}
// Kernel to use the device function
__global__ void compute_cubes(const double *input, double *output, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < n) {
output[idx] = cube(input[idx]); // Calling the device function
}
}
int main() {
// Example host code would allocate arrays, transfer them to the device,
// launch compute_cubes, and copy results back to the host.
}
In CUDA Fortran, attributes(device) is used to define device functions.
Example:
module device_functions
use cudafor
contains
! Device function to compute the cube of a number
attributes(device) function cube(x) result(res)
real, value :: x
real :: res
res = x ** 3
end function cube
! Kernel to use the device function
attributes(global) subroutine compute_cubes(input, output, n)
real, device :: input(:), output(:)
integer, value :: n
integer :: idx
! Compute unique thread index
idx = threadIdx%x + (blockIdx%x - 1) * blockDim%x
! Ensure within bounds
if (idx <= n) then
output(idx) = cube(input(idx)) ! Calling the device function
end if
end subroutine compute_cubes
end module device_functions
Atomic functions are special operations that allow safe updates to shared memory by ensuring that only one thread modifies a memory location at a time. They are essential in parallel programming to prevent race conditions when multiple threads access and modify shared data simultaneously.
General Considerations
Atomicity: An operation is atomic if it is executed as a single, indivisible step. This guarantees that no other thread can interfere during the operation.
Common Use Cases: Incrementing counters. Accumulating sums. Updating shared data structures like histograms.
Performance Impact: Atomic operations serialize access to shared data, which can reduce performance due to contention. However, they are often faster and easier to implement than more complex synchronization methods.
Supported Operations: CUDA supports atomic addition, subtraction, exchange, min/max, and logical operations like AND/OR. These operations are hardware-dependent and optimized for GPU execution.
In CUDA C, atomic functions are prefixed with atomic. For example, atomicAdd performs an atomic addition. These functions operate on integers and floats in global or shared memory.
Example: Atomic Addition (C)
#include <stdio.h>
__global__ void atomic_example(int *counter) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < 10) {
atomicAdd(counter, 1); // Safely increment the counter
}
}
int main() {
int *d_counter, h_counter = 0;
// Allocate memory and initialize
cudaMalloc((void**)&d_counter, sizeof(int));
cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
// Launch kernel
atomic_example<<<1, 32>>>(d_counter);
// Copy back and print result
cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
printf("Counter = %d\n", h_counter); // Should print 10
cudaFree(d_counter);
return 0;
}
In CUDA Fortran, atomic operations are similarly supported with the atomic subroutines provided by the language. These subroutines operate on variables in device memory.
Example: Atomic Addition (Fortran)
module atomic_example
use cudafor
contains
attributes(global) subroutine increment_counter(counter)
integer, device :: counter
integer :: idx, istat
idx = threadIdx%x + (blockIdx%x - 1) * blockDim%x
if (idx <= 10) istat = atomicadd(counter, 1) ! atomicadd is a function in CUDA Fortran; safely increment the counter
end subroutine increment_counter
end module atomic_example
program main
use atomic_example
use cudafor
integer, device :: d_counter
integer :: h_counter
! Initialize the counter (a scalar declared with the device attribute is already allocated in GPU memory)
h_counter = 0
d_counter = h_counter
! Launch kernel
call increment_counter<<<1, 32>>>(d_counter)
! Copy back and print result
h_counter = d_counter
print *, "Counter =", h_counter ! Should print 10
end program main
To compile CUDA programs, you use either nvcc for CUDA C/C++ programs or nvfortran for CUDA Fortran programs. Below, we explain how to use these compilers, including basic commands and options.
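For CUDA C/C++ sources, nvcc (installed with the CUDA Toolkit) is the compiler driver. A typical invocation looks like the following; options may vary slightly between toolkit versions.
Basic Command:
nvcc -o program_name source_file.cu
source_file.cu: The CUDA C/C++ source file.
program_name: The name of the output binary.
Common Options:
-arch=sm_XX: Target a specific compute capability (e.g., -arch=sm_80 for Ampere GPUs).
-O2, -O3: Enable optimization levels.
-g, -G: Include host-side and device-side debugging information, respectively.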
nvfortran is part of the NVIDIA HPC SDK and is used to compile Fortran programs, including those with CUDA Fortran extensions.
Basic Command:
nvfortran -o program_name source_file.f90
source_file.f90: The CUDA Fortran source file.
program_name: The name of the output binary.
Common Options:
-cuda: Enable CUDA Fortran features (this is implicit if the code includes CUDA Fortran constructs).
-Mcuda=ccX: Specify the compute capability of the target GPU (e.g., -Mcuda=cc80 for Ampere GPUs).
-O2, -O3: Enable optimization levels.
-g: Include debugging information.
[!TIP]
To query device properties, like the compute capability, memory, maximum number of threads per block, etc., we can use the function cudaGetDeviceProperties (see the example code below).
Here is a CUDA C program to query the properties of the available GPUs on your system using the cudaGetDeviceCount and cudaGetDeviceProperties functions:
#include <stdio.h>
#include <cuda_runtime.h>
int main() {
int deviceCount = 0;
// Get the number of available GPUs
cudaError_t err = cudaGetDeviceCount(&deviceCount);
if (err != cudaSuccess) {
printf("Error querying the number of GPUs: %s\n", cudaGetErrorString(err));
return -1;
}
printf("Number of CUDA-capable GPUs: %d\n\n", deviceCount);
for (int i = 0; i < deviceCount; i++) {
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, i);
printf("GPU #%d: %s\n", i, deviceProp.name);
printf(" Compute Capability: %d.%d\n", deviceProp.major, deviceProp.minor);
printf(" Total Global Memory: %.2f GB\n", deviceProp.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
printf(" Shared Memory Per Block: %.2f KB\n", deviceProp.sharedMemPerBlock / 1024.0);
printf(" Registers Per Block: %d\n", deviceProp.regsPerBlock);
printf(" Warp Size: %d\n", deviceProp.warpSize);
printf(" Maximum Threads Per Block: %d\n", deviceProp.maxThreadsPerBlock);
printf(" Maximum Block Dimensions: (%d, %d, %d)\n",
deviceProp.maxThreadsDim[0],
deviceProp.maxThreadsDim[1],
deviceProp.maxThreadsDim[2]);
printf(" Maximum Grid Dimensions: (%d, %d, %d)\n",
deviceProp.maxGridSize[0],
deviceProp.maxGridSize[1],
deviceProp.maxGridSize[2]);
printf(" Clock Rate: %.2f MHz\n", deviceProp.clockRate / 1000.0);
printf(" Memory Bus Width: %d bits\n", deviceProp.memoryBusWidth);
printf(" Memory Bandwidth: %.2f GB/s\n\n",
2.0 * deviceProp.memoryClockRate * (deviceProp.memoryBusWidth / 8) / 1.0e6);
}
return 0;
}
The output looks like:
GPU #0: Tesla P100-PCIE-16GB
Compute Capability: 6.0
Total Global Memory: 15.89 GB
Shared Memory Per Block: 48.00 KB
Registers Per Block: 65536
Warp Size: 32
Maximum Threads Per Block: 1024
Maximum Block Dimensions: (1024, 1024, 64)
Maximum Grid Dimensions: (2147483647, 65535, 65535)
Clock Rate: 1328.50 MHz
Memory Bus Width: 4096 bits
Memory Bandwidth: 732.16 GB/s
Consider the following code parallelised with CUDA C.
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <time.h>
//#define N 1024 // Matrix size
// Kernel for matrix multiplication
__global__ void matrix_multiply_kernel(double *A, double *B, double *C, int size) {
int row = blockIdx.y * blockDim.y + threadIdx.y; // Row index
int col = blockIdx.x * blockDim.x + threadIdx.x; // Column index
if (row < size && col < size) {
double sum = 0.0;
for (int k = 0; k < size; k++) {
sum += A[row * size + k] * B[k * size + col];
}
C[row * size + col] = sum;
}
}
int main(int argc, char* argv[]) {
double *A, *B, *C; // Host pointers
double *d_A, *d_B, *d_C; // Device pointers
clock_t start, end;
double cpu_time_used;
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
if (argc < 4) {
printf("Usage: %s <N> <blockDim.x> <blockDim.y>:\n", argv[0]);
printf("<N> in the number of rows and columns (NxN matrix)\n");
printf("<blockDim.x> is the number of threads in the block (x direction)\n");
printf("<blockDim.y> is the number of threads in the block (y direction)\n");
return 1;
}
int N = atoi(argv[1]);
int block_dim_x = atoi(argv[2]);
int block_dim_y = atoi(argv[3]);
if(block_dim_x*block_dim_y>prop.maxThreadsPerBlock){
printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
printf("You chose %d x %d = %d threads\n", block_dim_x, block_dim_y, block_dim_x*block_dim_y);
return 1;
}
size_t matrix_size = N * N * sizeof(double);
// Allocate memory on the host
A = (double*) malloc(matrix_size);
B = (double*) malloc(matrix_size);
C = (double*) malloc(matrix_size);
// Initialize matrices (example initialization)
for (int i = 0; i < N * N; i++) {
A[i] = 0.01*i;
B[i] = 0.025*i;
}
// Allocate memory on the device
cudaMalloc((void**)&d_A, matrix_size);
cudaMalloc((void**)&d_B, matrix_size);
cudaMalloc((void**)&d_C, matrix_size);
// Copy matrices A and B from host to device
cudaMemcpy(d_A, A, matrix_size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, matrix_size, cudaMemcpyHostToDevice);
// Define grid and block dimensions
dim3 blockDim(block_dim_x, block_dim_y);
dim3 gridDim((N + blockDim.x - 1) / blockDim.x, (N + blockDim.y - 1) / blockDim.y);
// Start timing
start = clock();
// Launch kernel
matrix_multiply_kernel<<<gridDim, blockDim>>>(d_A, d_B, d_C, N);
// Wait for the GPU to finish
cudaError_t err = cudaDeviceSynchronize();
if (err != cudaSuccess) {
printf("Kernel execution failed: %s\n", cudaGetErrorString(err));
}
// Stop timing
end = clock();
cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
// Copy result matrix C from device to host
cudaMemcpy(C, d_C, matrix_size, cudaMemcpyDeviceToHost);
//printf("Matrix multiplication completed in %.2f ms\n", cpu_time_used * 1000);
printf("%d %d %d %.6f\n", N, block_dim_x, block_dim_y, cpu_time_used);
// Free memory on device
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);
// Free memory on host
free(A);
free(B);
free(C);
return 0;
}
In this code, we vary the size N and the number of threads per block, both along x and y (block_dim_x and block_dim_y). Note that the times are taken right before the call to the kernel matrix_multiply_kernel and after the call to cudaDeviceSynchronize().
This last call is crucial, because kernel launches are asynchronous, meaning that the host (CPU) does not wait for the kernel execution on the device (GPU) to finish before continuing to the next line of code. If you measure time immediately after launching a kernel without waiting for its completion, the measured time will not accurately reflect the kernel’s execution duration.
When you invoke a CUDA kernel (like matrix_multiply_kernel<<<gridDim, blockDim>>>()), the kernel is queued for execution but the host does not block or wait for it to complete. Without cudaDeviceSynchronize, the clock() call to stop timing may occur before the kernel has finished executing, leading to incorrect timing. cudaDeviceSynchronize() forces the host to wait until all preceding GPU work (including the kernel) is completed. cudaDeviceSynchronize() also provides an opportunity to check for errors during kernel execution: if there is an issue (e.g., out-of-bounds memory access), the error can be detected and reported immediately after synchronization.
Here is a bash script that can be used to run the above code multiple times, varying the input parameters:
#!/bin/bash
# Define the values of N to iterate over
Ns=(1024 2048 4096)
# Define the thread counts for nx and ny
thread_values=(1 2 4 8 16 32 64 128 256 512 1024)
# Loop over each N
for N in "${Ns[@]}"; do
# Vary nx with ny fixed to 1
for nx in "${thread_values[@]}"; do
ny=1
if (( nx * ny < 1024 )); then
./scaling.exe $N $nx $ny
fi
done
# Vary ny with nx fixed to 1
for ny in "${thread_values[@]}"; do
nx=1
if (( nx * ny < 1024 )); then
./scaling.exe $N $nx $ny
fi
done
# Vary nx and ny equally
for nx in "${thread_values[@]}"; do
ny=$nx
if (( nx * ny < 1024 )); then
./scaling.exe $N $nx $ny
fi
done
done
This is the scaling plot:
The plateau observed in execution time as we increase the number of threads along the x-direction in the matrix-matrix multiplication kernel is due to the saturation of computational resources and the characteristics of GPU hardware. Here’s why this happens:
GPUs have a limited number of streaming multiprocessors (SMs), each capable of running a specific number of threads simultaneously.
Each SM has a finite pool of resources, such as registers, shared memory, and execution units.
When the number of threads reaches a point where the GPU is fully utilizing its SMs, adding more threads does not improve performance. The GPU has reached its maximum occupancy.
For instance, if 256 threads per block fully occupy all the available SMs, increasing to 512 or 1024 threads per block will not yield any additional performance gains because the GPU cannot process more threads simultaneously.
Matrix-matrix multiplication is memory-intensive, involving frequent memory reads and writes.
As you increase the number of threads, the memory system becomes a bottleneck because the global memory bandwidth is finite.
Once memory bandwidth is saturated, increasing the number of threads does not reduce execution time because threads must wait for memory transactions to complete.
The results suggest that the optimal number of threads per block for this kernel is between 64 and 256 threads along the x-direction. This configuration:
Maximizes the use of computational resources.
Balances memory bandwidth and arithmetic throughput.
Beyond this range, the additional threads either remain idle or compete for the same resources without reducing execution time.
Scientific computing is an essential tool for modern researchers, but it often suffers from inefficiencies and errors due to a lack of formal training in software development. By embracing a set of best practices, scientists can significantly improve the reliability, maintainability, and productivity of their code. This chapter outlines key principles to guide researchers in writing better scientific software:
The names you choose for variables, functions, and other elements in your code play a critical role in its readability and maintainability. Descriptive and consistent naming conventions help others (and your future self!!!) understand the code’s purpose at a glance. Poor naming, on the other hand, leads to confusion and errors.
Names should clearly indicate the role or purpose of a variable or function. Avoid generic names like temp or data unless they truly convey the meaning.
Example:
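✅ num_iterations, particle_mass (descriptive; hypothetical names)
❌ temp, data, x2 (generic)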
Stick to a single naming style throughout your project, such as snake_case or camelCase. Inconsistent styles can be distracting and make your code harder to follow.
Example:
✅ compute_average (snake_case)
✅ computeAverage (camelCase)
❌ computeAverage and compute_Variance (mixing styles)
Whenever you find yourself copying and pasting code, stop and create a function instead. This reduces duplication and makes the logic easier to debug and reuse.
Example:
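A minimal C sketch (the arrays temperature and pressure, and the averaging task itself, are hypothetical): the loop is written once, inside a function, and reused instead of being copied and pasted.

double mean(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];   // accumulate the values
    return s / n;    // return the average
}

/* usage: avg_T = mean(temperature, n); avg_p = mean(pressure, n); */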
By consolidating repeated tasks into functions, you not only reduce redundancy but also make your code more modular and easier to debug or enhance. This approach aligns with the “Don’t Repeat Yourself” (DRY) principle, a cornerstone of efficient programming practices.
Performance is important, but premature optimization can lead to complexity and wasted effort. Start by making the code correct:
Profile before optimizing: Use profiling tools to identify bottlenecks rather than guessing.
Prototype in high-level languages: Develop early versions in user-friendly languages, then translate critical sections to low-level languages like C or Fortran only if needed.
This approach balances productivity with performance, ensuring that effort is focused where it matters most.
Clear documentation bridges the gap between the code and its users, ensuring that others (and your future self!!!) can understand its purpose and usage:
Focus on intent: Describe what the code does and why, rather than how it works.
Embed documentation: Include comments directly in the code or use tools to generate user-friendly references.
[!WARNING]
Documentation is not an afterthought! It is a key part of making software usable and reproducible.
Roundoff errors are an inherent limitation in numerical computations due to the finite precision of computer arithmetic. These small discrepancies arise because most real numbers cannot be represented exactly in a binary format. While often negligible, roundoff errors can accumulate in iterative processes or sensitive algorithms, leading to significant inaccuracies if not properly managed.
Finite Precision Representation
Computers represent numbers using a fixed number of bits, typically following the IEEE 754 standard for floating-point arithmetic. For example:
A 64-bit double has about 15–17 decimal digits of precision.
A 32-bit float has about 6–9 decimal digits of precision.
Numbers that cannot be expressed exactly as sums of powers of 2 (e.g., 0.1 in decimal) are approximated, leading to small representation errors.
Arithmetic Operations
Mathematical operations often introduce additional roundoff errors. For instance:
Adding or subtracting numbers of vastly different magnitudes can lead to loss of significance.
Multiplication and division propagate small errors in inputs into the results.
Algorithmic Sensitivity
Some algorithms are more susceptible to roundoff errors than others. Problems involving matrix inversion, polynomial evaluation, or iterative methods can amplify these errors.
Consider summing a series of small numbers using single-precision floating-point arithmetic. Due to the limited precision, roundoff errors can accumulate, significantly affecting the result.
#include <stdio.h>
int main() {
float sum = 0.0; // Single-precision float
double exact_sum = 0.0; // Double-precision for reference
// Summing 1 million small values
float small_value = 1e-7;
for (int i = 0; i < 1000000; i++) {
sum += small_value;
exact_sum += small_value;
}
// Print results
printf("Single-precision sum: %.7f\n", sum);
printf("Double-precision sum (reference): %.7f\n", exact_sum);
printf("Error: %.7f\n", exact_sum - sum);
return 0;
}
Explanation
Input: Summing the value 1×10^-7 one million (10^6) times.
Expected Result: The correct sum is 1×10^-7 × 10^6 = 0.1.
Observed Behavior:
In single precision (float), the accumulated roundoff error results in a sum slightly less than 0.1.
In double precision (double), the error is negligible due to higher precision.
Sample Output
Single-precision sum: 0.0999999
Double-precision sum (reference): 0.1000000
Error: 0.0000001
[!NOTE]
Key Takeaways
Precision Matters: The error in the single-precision calculation illustrates how limited precision accumulates over many iterations.
Double Precision as Reference: Using double mitigates the error, making it a better choice for high-accuracy computations.
Real-World Impact: Similar issues can arise in large-scale simulations or when summing very small differences, potentially leading to significant inaccuracies in scientific results.
While roundoff errors cannot be completely eliminated, their effects can be minimized through careful design and implementation of numerical algorithms.
Choose the Right Precision
Use double precision for most scientific computations.
Switch to higher-precision libraries (e.g., mpfr in C or mpmath in Python) for cases requiring extreme accuracy.
Avoid Subtracting Nearly Equal Numbers
Subtracting nearly equal numbers magnifies relative errors. Instead, reformulate equations to avoid such operations.
Bad Practice:
double result = (a + b) - (a - b); // Large cancellation error
Improved Code:
double result = 2 * b; // Reformulated for better accuracy
Rescale Problems
Normalize input data to similar magnitudes before computation to prevent loss of significance. For instance, scale large matrices to avoid large/small value interactions.
Use Numerically Stable Algorithms
Prefer algorithms specifically designed to minimize roundoff error. For example:
Use Kahan Summation for summing large arrays to reduce error accumulation.
Prefer LU decomposition over naive matrix inversion.
Example of Kahan Summation in C:
double kahan_sum(double* array, int n) {
double sum = 0.0, c = 0.0;
for (int i = 0; i < n; i++) {
double y = array[i] - c;
double t = sum + y;
c = (t - sum) - y;
sum = t;
}
return sum;
}
Test for Tolerances, Not Exact Equality
Floating-point comparisons should account for small inaccuracies.
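A minimal C sketch of a tolerance-based comparison (the tolerance tol is chosen by the caller, e.g. a small multiple of machine precision):

#include <math.h>

/* Return nonzero if a and b agree to within a relative tolerance tol */
int nearly_equal(double a, double b, double tol) {
    return fabs(a - b) <= tol * fmax(fabs(a), fabs(b));
}

/* usage: if (nearly_equal(x, y, 1e-12)) { ... } instead of if (x == y) { ... } */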
Machine precision refers to the smallest difference between two distinct floating-point numbers that a computer can represent. It defines the limit of accuracy for numerical computations in floating-point arithmetic. This concept is crucial in scientific computing because it governs the extent of roundoff errors and the reliability of numerical results.
In most systems, floating-point numbers are stored in the IEEE 754 standard, which uses a finite number of bits to represent numbers. These numbers are stored in the form:

x = (−1)^s · m · 2^e
where:
s: Sign bit (0 for positive, 1 for negative)
m: Mantissa (or significand), representing the precision of the number
e: Exponent, representing the scale
Since only a finite number of bits are allocated to m, many real numbers cannot be represented exactly, leading to rounding to the nearest representable value.
Machine precision, often denoted as ϵ (epsilon), is the maximum relative error due to rounding in floating-point arithmetic. For a system using p-bit precision in the mantissa, ϵ is given by:
ϵ = 2^(−(p−1))
This value represents the smallest fraction δ such that 1+δ>1 in the floating-point system.
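A quick way to see ϵ in practice is to halve a trial value until adding it to 1 no longer changes the result; on most systems this reproduces DBL_EPSILON from <float.h>:

#include <stdio.h>
#include <float.h>

int main(void) {
    double eps = 1.0;
    /* Halve eps until 1 + eps/2 is rounded back to 1 */
    while (1.0 + eps / 2.0 > 1.0)
        eps /= 2.0;
    printf("estimated epsilon = %g\n", eps);
    printf("DBL_EPSILON       = %g\n", DBL_EPSILON);
    return 0;
}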
Roundoff Errors: Operations like addition, subtraction, or multiplication may result in numbers that cannot be exactly represented. The difference between the exact value and the stored value is bounded by ϵ.
Loss of Significance: Subtracting two nearly equal numbers magnifies relative errors due to limited precision.
Algorithm Sensitivity: Certain numerical algorithms, such as those for solving linear systems or evaluating polynomials, are more prone to errors because of their sensitivity to ϵ.
Summary
Machine precision defines the smallest difference a floating-point system can discern. It determines the accuracy of numerical computations and plays a critical role in the design and evaluation of algorithms in scientific computing. Understanding and accounting for ϵ helps mitigate errors and ensures reliable results.
You don’t know where your actual latest code is. Your teammate Ubaldo (or even worse, YOU!) made changes in their own folder called my_code_Ubaldo_final_v2/ and now your simulations mysteriously produce different results. Welcome to folder-based version control! It’s like making carbonara with bacon and cream.
What is “version control”, and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. The type of file is not restricted: source code, text files, images, etc. It allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Git is the most popular and widely used version control system today. We'll cover some basic Git functionality in this lecture.
Track changes: Know what changed, when, and by whom. Even if it was past-you (who can’t be trusted).
Undo mistakes: Broke your code? Git is a time machine. Go back to when things worked.
Collaborate without chaos: No more emailing .zip files named project_final_fixed_really.zip.
Parallel development: Work on new features in isolation using branches, and then merge without fear.
One source of truth: Everyone sees the same history. No more "which folder did you run that from?"
This lecture expects you to know how to open Terminal in macOS or Linux or Command Prompt or PowerShell in Windows. There are a lot of different ways to use Git. There are the original command-line tools, and there are many GUIs. The command line is the only place you can run all Git commands -- most of the GUIs implement only a partial subset of Git functionality for simplicity. If you know how to run the command-line version, you can probably also figure out how to run the GUI version, while the opposite is not necessarily true.
[!Installation and configuration]
The installation of Git will not be covered here. However, Git is so easy to install (e.g. with apt install git) and so widespread that it is probably already installed on your machine. Just open a terminal and type git --version to check if it is available.
Additionally, before using Git, it is good practice to configure a few things: your name, email, default text editor and default branch name. These steps only need to be done once.
> git config --global user.name "John Doe" # username
> git config --global user.email johndoe@example.com # useremail
> git config --global core.editor vim # default text editor
> git config --global init.defaultBranch main # default branch name
If you have a project directory that is currently not under version control and you want to start controlling it with Git, you first need to go to that project’s directory. For example, let's create one
> mkdir project_tmp
Then, you can initialize git by going to that directory and using git init:
> cd project_tmp
> git init
This creates a new subdirectory named .git that contains all of your necessary repository files (you can see it if you do ls -a). If you want to start version-controlling any files, you should begin tracking those files and do an initial commit:
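For example, assuming the project contains a small Python script called prova.py (the file used throughout this lecture):
> git add prova.py
> git commit -m "initial project version"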
At this point, you should have a working Git repository on your local machine, and a checkout or working copy of all of its files in front of you. Typically, you’ll want to start making changes and committing snapshots of those changes into your repository each time the project reaches a state you want to record. To see all files tracked by Git you can use git ls-files. Let's start by making some changes to a file:
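For example, we can append a second greeting to prova.py (this is the edit that will be staged and committed below):
> echo 'print("Hello again")' >> prova.py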
If the file that you changed was tracked by git, then you should be able to see that the file is modified using:
> git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: prova.py
no changes added to commit (use "git add" and/or "git commit -a")
If this edit is important, you can mark it in order to be part of the next commit using:
> git add prova.py
> git status
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: prova.py
By using git status again, you should see that the modified file is staged for the next commit.
Once you are satisfied with your edits, you can make these changes part of the repository by committing them:
> git commit -m "added a second greeting for politeness"
> git status
On branch main
nothing to commit, working tree clean
By running git status again, you will now see that there are no changes to the repository. The edited files are now "part" of the current snapshot and they are not considered changed anymore. It is important to understand that, in order to be useful and interoperable, commits should be focused on a specific topic and contain as few edits as possible. This will make it easy to find, inspect, move, delete or edit any commit from the repository history.
Let's try to understand what just happened here. Git has three main states that your files can reside in: modified, staged, and committed:
Modified means that you have changed the file but have not committed it to your database yet.
Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
Committed means that the data is safely stored in your local database.
This leads us to the three main sections of a Git project:
the working tree: a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
the staging area: a file, generally contained in your Git directory, that stores information about what will go into your next commit.
the Git directory: where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
Now, what if you want to inspect some of the changes done in one of your commits? After you have created several commits, or if you have cloned a repository with an existing commit history, you’ll probably want to look back to see what has happened. The most basic and powerful tool to do this is the git log command.
> git log
commit 08a9aa3df553b1c3e05174831115e96ac4892c5e (HEAD -> main)
Author: scarpma <scarpma@gmail.com>
Date: Thu Jun 5 15:29:54 2025 +0200
added exit message
commit 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
Author: scarpma <scarpma@gmail.com>
Date: Thu Jun 5 15:29:07 2025 +0200
added some multiplications and prints
commit 3506b463daff54b2e0bcd76ba1023bf567e1129e
Author: scarpma <scarpma@gmail.com>
Date: Thu Jun 5 15:22:22 2025 +0200
added a second greeting for politeness
commit 1d8cbc066272c9e91014357e345b0a93c56a6615
Author: scarpma <scarpma@gmail.com>
Date: Thu Jun 5 15:16:48 2025 +0200
initial project version
It lists all commits on the current branch in reverse chronological order (most recent first) and prints some information:
commit hash,
commit author,
commit date,
commit message.
[!NOTE]
Additionally, you can see that the first listed commit has a reference decorator containing "(HEAD -> main)". It means that HEAD is pointing at that commit, which is also where the main branch points. More specifically,
HEAD refers to the current commit checked out in your working directory
HEAD->main means that your current commit coincides with the main branch of your repository
these two references (HEAD and main) point to the same commit (snapshot of the repository)
At any stage, you may want to undo something. Here, we’ll review a few basic tools for undoing changes that you’ve made. We'll cover more powerful tools in the next course with Git branching. Be careful, because you can’t always undo some of these undos. This is one of the few areas in Git where you may lose some work if you do it wrong.
The first thing you can do with commits is inspect them. git show is the right tool for this. It takes as argument a reference to the commit you want to inspect, for example a commit id. In Git, commits are referenced mainly by the commit hash, i.e. the 40-character hexadecimal string displayed by the git log command. The hash is derived directly from the content it represents, making every hash unique to its commit.
> git show 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
commit 1a8b02c05789ea602bf63129ae56cfd4649b0ac0
Author: scarpma <scarpma@gmail.com>
Date: Thu Jun 5 15:29:07 2025 +0200
added some multiplications and prints
diff --git a/prova.py b/prova.py
index d22ee64..a75df96 100644
--- a/prova.py
+++ b/prova.py
@@ -1,2 +1,7 @@
print("Hello World")
print("Hello again")
+
+a = 2.3
+b = 7.01
+print(f"{a=}, {b=}")
+print(f"a^b = {a**b:.4f}")
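Note that you rarely need to type the full 40-character hash: any unambiguous prefix works, e.g. git show 1a8b02c. Short prefixes like this are used later in this document.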
[!CURIOSITY]
Why are Git hashes important?
At its core, the Git version control system is a content-addressable filesystem. It uses the SHA-1 hash function to name content: files, directories, and revisions are referred to by hash values, unlike in traditional version control systems where files or versions are referred to by sequential numbers. Addressing content by hash delivers a few advantages:
Integrity checking is easy. If even a single byte in your repository changes, the resulting hash will change
Uniqueness: every commit in Git can be accessed via its unique hash. If two snapshots are identical, their hash will be the same
Lookup of objects is fast
Using a cryptographically secure hash function brings additional advantages:
Object names can be signed and third parties can trust the hash to address the signed object
One of the most common undos takes place when you commit too early and forget to add some files, or when you mess up your commit message. If you want to redo that commit, make the additional changes you forgot, stage them, and commit again using the --amend option: git commit --amend.
This command takes your staging area and adds it to the last commit. If you’ve made no changes since your last commit (for instance, you run this command immediately after your previous commit), then your snapshot will look exactly the same, and all you’ll change is your commit message. The same commit-message editor fires up, but it already contains the message of your previous commit. You can edit the message the same as always, but it overwrites your previous commit.
As an example, if you commit and then realize you forgot to stage the changes in a file you wanted to add to this commit, you can do something like this:
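A minimal sketch of this workflow (the file name and commit message below are just placeholders):
> git commit -m "some commit message"
> git add forgotten_file
> git commit --amend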
It’s important to understand that when you’re amending your last commit, you’re not so much fixing it as replacing it entirely with a new, improved commit. Effectively, it’s as if the previous commit never happened, and it won’t show up in your repository history.
The obvious value to amending commits is to make minor improvements to your last commit, without cluttering your repository history with commit messages of the form, “Oops, forgot to add a file” or “Darn, fixing a typo in last commit”.
Only amend commits that are still local and have not been pushed somewhere. Amending previously pushed commits and force pushing the branch will cause problems for your collaborators.
If you ran git add <file> unintentionally, you may want to remove a file change from the staging area. Fortunately, the command you use to inspect the state of the working tree and staging area (git status) also reminds you how to undo changes to them:
> touch tmp
> git add tmp
> git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: tmp
Right below the “Changes to be committed” text, it says use git restore --staged <file>... to unstage. By using git restore --staged tmp, the tmp file remains changed, but returns unstaged.
[!NOTE]
If you are using an older version of Git, it will probably suggest a different command to unstage the file:
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: tmp
This is because Git version 2.23.0 introduced a new command: git restore. It’s basically an alternative to git reset which will be covered in the next lecture. From Git version 2.23.0 onwards, Git will recommend restore instead of reset for many undo operations.
We finish this lecture with a taste of remote repositories. You can download the repository we just made using git clone command:
> git clone https://github.com/scarpma/git_course.git
> cd git_course
That creates a directory named git_course, initializes a .git directory inside it, pulls down all the data for that repository (and its history), and checks out a working copy of the latest version. We'll see in the following lectures how to work with remote repositories (git remote add, git pull and git push).
[!NOTE]
Git has a number of different transfer protocols you can use. The previous example uses the https:// protocol, but you may also see git:// or user@server:path/to/repo.git, which uses the SSH transfer protocol. For private repositories, the ssh protocol is recommended.
Branching means you diverge from the main line of development and continue to do work without messing with that main line. With folder-based version control this is a somewhat expensive process as it probably requires you to create a whole new copy of your source code directory.
With Git, instead, branching is incredibly lightweight: creating a branch is nearly instantaneous, and so is switching back and forth between branches. Git encourages workflows that branch and merge often, even multiple times a day. Mastering this feature may entirely change the way you develop.
To start this lecture, let's create a repository with a very basic code for solving the Laplace equation in two dimensions. We have a square domain Ω=[0,1]×[0,1], with Dirichlet boundary conditions on the square's borders. Additionally, we put two plates (lines in 2D) inside the square where we enforce Dirichlet boundary conditions u=±1; here L is the length of the two plates and D is the distance between them (see Figure 1).
We can discretize the problem using second-order central differences for the Laplacian. Let $u_{i,j}=u(x_i,y_j)$, with $x_i=i\,dx$, $y_j=j\,dy$ and $dx=dy$. This leads to
$$u_{i,j}=\frac{u_{i+1,j}+u_{i-1,j}+u_{i,j+1}+u_{i,j-1}}{4}$$
which we can solve with, for example, the iterative Jacobi method:
$$u_{i,j}^{k+1}=\frac{u_{i+1,j}^{k}+u_{i-1,j}^{k}+u_{i,j+1}^{k}+u_{i,j-1}^{k}}{4}.$$
The idea is that, by performing several iterations, the method converges to the discretized solution of the original Laplace equation. For sparse algebraic systems, this approach is generally faster and requires far less memory than direct methods.
[!NOTE]
To have some physical intuition, you are basically solving the electrostatic potential problem in a 2D domain by numerically integrating the Laplace equation. Two parallel plates with fixed potentials simulate a capacitor, generating an electric field in the surrounding region.
Figure 1. Domain sketch.
[!SOURCES]
poisson.f90
program poisson
use precision
implicit none
real(dp), dimension(:,:), allocatable :: U, Uold
real(dp), dimension(:,:), allocatable :: rho
real(dp) :: tolerance, a, err, maxerr, w
integer :: i,j,k, N, error
character(1) :: M, BC
character(20) :: arg
real(dp), parameter :: Pi=3.141593_dp
real(dp), parameter :: e2=14.4_dp ! eV*Ang
real(dp) :: L, dx, D
integer :: istart, iend, j1, j2
! command line arguments help
if (iargc()<5) then
write(*,*) 'poisson Method N tolerance w BC'
write(*,*) ' - Method: J|G (Jacobi or Gauss-Siedel)'
write(*,*) ' - N: <int> (number of points in x and y)'
write(*,*) ' - tolerance: <real> (for convergence)'
write(*,*) ' - w: <real> (relaxation only for G)'
write(*,*) ' - BC: D|N (Dirichlet or Neumann)'
stop
endif
! parsing of command line arguments
call getarg(1,arg)
read(arg,*) M
call getarg(2,arg)
read(arg,*) N
call getarg(3,arg)
read(arg,*) tolerance
call getarg(4,arg)
read(arg,*) w
call getarg(5,arg)
read(arg,*) BC
allocate(U(N,N), stat=error)
allocate(Uold(N,N), stat=error)
if (error /= 0) then
write(*,*) 'allocation error'
stop
endif
! grid spacing
dx = 1.0_dp / N
! initial condition for first iteration
U=0.0_dp
! dirichlet boundary conditions (box)
U(1:N,1) = 0.0_dp ! bottom edge
U(1:N,N) = 0.0_dp ! top edge
U(1,1:N) = 0.0_dp ! left edge
U(N,1:N) = 0.0_dp ! right edge
! dirichlet boundary condition on plates
L = 0.3_dp ! plate length
D = 0.5_dp ! plate separation
istart = int(N/2 - L/(2*dx)) ! Plate x-start
iend = int(N/2 + L/(2*dx)) ! Plate x-end
j1 = int(N/2 - D/(2*dx)) ! Bottom plate y-index
j2 = int(N/2 + D/(2*dx)) ! Top plate y-index
U(istart:iend, j1) = -1.0_dp
U(istart:iend, j2) = +1.0_dp
select case (M)
case("J")
write(*,*) "Jacobi iteration"
case("G")
write(*,*) "Gauss-Siedel iteration"
end select
! -----------------------------
! Iterative solver main loop
! -----------------------------
err = 2.0_dp * tolerance
k = 1
do while (err > tolerance)
Uold = U ! Store previous solution
maxerr = 0.0_dp ! Reset max error
! ---------------
! loop in space
! ---------------
do j = 2, N-1
do i = 2, N-1
! Skip capacitor plates
if (i >= istart .and. i<=iend .and. (j==j1 .or. j==j2)) cycle
! solution domain: perform iteration update
select case (M)
case("J")
U(i,j) = (Uold(i-1,j) + Uold(i+1,j) + Uold(i,j-1) + Uold(i,j+1))/4.0_dp
case("G")
write(*,*)'Gauss-Siedel not implemented. Stopping'
call exit(-1)
end select
! check convergence
if (abs(Uold(i,j)-U(i,j)) > maxerr ) then
maxerr = abs(Uold(i,j) - U(i,j))
end if
! relaxation factor w
U(i,j) = (1-w)*Uold(i,j) + w*U(i,j)
! optional Neumann o(a^2) along x==1 and x==n
if (BC.eq."N") then
write(*,*)'Neumann B.C. not implemented. Stopping'
call exit(-1)
endif
end do
! optional Neumann o(a^2) along y==1 and y==n
if (BC.eq."N") then
write(*,*)'Neumann B.C. not implemented. Stopping'
call exit(-1)
endif
end do
! Output iteration progress
write(*,*) 'iter: ',k, maxerr
err = maxerr
k = k + 1
end do
! write to file in fortran order
open(101, file='sol.dat')
do j = 1, N
do i = 1, N
write(101, *) U(i,j)
end do
write(101,*)
end do
close(101)
end program poisson
precision.f90
module precision
integer, parameter, public :: dp = 8
end module precision
plot.py
import numpy as np
import matplotlib.pyplot as plt
import sys
N = int(sys.argv[1])
nx, ny = N, N
D = 0.5
L = 0.3
boxC = 0.5
# load flattened data with fortran order (y1x1 y1x2 y1x3 ... y2x1 y2x2 y2x3 ...)
# (x is contiguous in memory)
data = np.loadtxt('sol.dat')
# reshape and transpose to convert to "standard" c order
# (y is contiguous in memory)
data = data.reshape((ny,nx)).T
x = np.linspace(0, 1, nx)
y = np.linspace(0, 1, ny)
X, Y = np.meshgrid(x, y, indexing='ij')
Ey, Ex = np.gradient(-data, y, x)
# create figure and axis
fig, ax = plt.subplots()
ax.set_xlabel('y')
ax.set_ylabel('x')
# plot 2d solution with contour lines
im = ax.imshow(data, extent=[x[0],x[-1],y[0],y[-1]], origin='lower', interpolation='nearest')
fig.colorbar(im, ax=ax)
# plot streamlines of the gradient field (electric field)
Em = np.sqrt(Ey**2. + Ex**2.)
lw = 8. * Em / Em.max() # linewidth depending on magnitude
ax.streamplot(x, y, Ex, Ey, color='white', linewidth=lw, arrowsize=0.7, density=1.2)
cntr = ax.contour(data, [-0.7, -0.25, -0.05, 0.05, 0.25, 0.7], colors='red', extent=[0,1,0,1])
ax.clabel(cntr, cntr.levels, fontsize=10, colors='red') # plot contour values
# plot bars inside domain
ax.vlines(x=boxC-D/2., ymin=boxC-L/2., ymax=boxC+L/2., linewidth=2, color='b')
ax.vlines(x=boxC+D/2., ymin=boxC-L/2., ymax=boxC+L/2., linewidth=2, color='b')
plt.show()
Makefile
# Simple Makefile for a Fortran program
# Program: poisson.f90
# Module: precision.f90 (used by poisson.f90)
# Compiler and flags
# Fortran compiler
FC = gfortran
# Optimization level 3
FFLAGS = -O3
# Define source files and the final executable name
MODULE = precision.f90
MAIN = poisson.f90
EXEC = poisson
# Define object files: these are compiled versions of the source files
OBJS = precision.o poisson.o
# Default target: builds the executable
# This rule says: to build `poisson`, first make sure all object files are compiled
$(EXEC): $(OBJS)
$(FC) $(FFLAGS) -o $(EXEC) $(OBJS)
# Rule to compile the module
# (modules must be compiled before the files that use them)
precision.o: precision.f90
$(FC) $(FFLAGS) -c precision.f90
# Rule to compile the main program
# Depends on both poisson.f90 and precision.o
poisson.o: poisson.f90 precision.o
$(FC) $(FFLAGS) -c poisson.f90
# Utility target: clean up compilation artifacts
# Run `make clean` to remove object files, module files, and the executable
clean:
rm -f *.o *.mod $(EXEC)
[!WARNING]
HTML pages do not preserve the hard tabs required by the makefile language, so the recipe lines above are likely copied with soft tabs (spaces) instead. To fix this, you have to replace the soft tabs in front of $(FC) ... with hard tabs. You can use:
# linux
sed -i 's/^\(.*\$(FC)\)/\t\$(FC)/' Makefile
#mac os x
sed -i .bak 's/^\(.*\$(FC)\)/\t\$(FC)/' Makefile
Enough with equations. Create the directory, create the files (copying the content from the sources given in the previous note) and use the git init, git add and git commit commands to set up the repository (a minimal sketch of these commands follows the run example below). You can run the program by doing:
> make
> # run the solver with 100x100 grid, 1e-5 tolerance
> ./poisson J 100 1e-5 1.0 D
> # plot results (100x100 grid)
> python3 plot.py 100
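For reference, setting up the repository could look roughly like this (the directory name and commit message are just examples):
> mkdir laplace2d && cd laplace2d
> # create poisson.f90, precision.f90, plot.py and Makefile with the contents above
> git init
> git add poisson.f90 precision.f90 plot.py Makefile
> git commit -m "First commit"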
Now, how do we proceed if we want to add a feature to the code? Let's implement the Gauss-Seidel method as an alternative to Jacobi. To do that, you should change the update rule to the following:
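Consistent with the updated source below, the Gauss-Seidel update reuses, within the same sweep, the values already computed at the neighbouring points that precede (i,j):
$$u_{i,j}^{k+1}=\frac{u_{i-1,j}^{k+1}+u_{i+1,j}^{k}+u_{i,j-1}^{k+1}+u_{i,j+1}^{k}}{4}$$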
If you want, you can delete the whole poisson.f90 file and create it from scratch (git will have a backup in any case) with the following updated source:
[!SOURCES]
poisson.f90
program poisson
use precision
implicit none
real(dp), dimension(:,:), allocatable :: U, Uold
real(dp), dimension(:,:), allocatable :: rho
real(dp) :: tolerance, a, err, maxerr, w
integer :: i,j,k, N, error
character(1) :: M, BC
character(20) :: arg
real(dp), parameter :: Pi=3.141593_dp
real(dp), parameter :: e2=14.4_dp ! eV*Ang
real(dp) :: L, dx, D
integer :: istart, iend, j1, j2
! command line arguments help
if (iargc()<5) then
write(*,*) 'poisson Method N tolerance w BC'
write(*,*) ' - Method: J|G (Jacobi or Gauss-Siedel)'
write(*,*) ' - N: <int> (number of points in x and y)'
write(*,*) ' - tolerance: <real> (for convergence)'
write(*,*) ' - w: <real> (relaxation only for G)'
write(*,*) ' - BC: D|N (Dirichlet or Neumann)'
stop
endif
! parsing of command line arguments
call getarg(1,arg)
read(arg,*) M
call getarg(2,arg)
read(arg,*) N
call getarg(3,arg)
read(arg,*) tolerance
call getarg(4,arg)
read(arg,*) w
call getarg(5,arg)
read(arg,*) BC
allocate(U(N,N), stat=error)
allocate(Uold(N,N), stat=error)
if (error /= 0) then
write(*,*) 'allocation error'
stop
endif
! grid spacing
dx = 1.0_dp / N
! initial condition for first iteration
U=0.0_dp
! dirichlet boundary conditions (box)
U(1:N,1) = 0.0_dp ! bottom edge
U(1:N,N) = 0.0_dp ! top edge
U(1,1:N) = 0.0_dp ! left edge
U(N,1:N) = 0.0_dp ! right edge
! dirichlet boundary condition on plates
L = 0.3_dp ! plate length
D = 0.5_dp ! plate separation
istart = int(N/2 - L/(2*dx)) ! Plate x-start
iend = int(N/2 + L/(2*dx)) ! Plate x-end
j1 = int(N/2 - D/(2*dx)) ! Bottom plate y-index
j2 = int(N/2 + D/(2*dx)) ! Top plate y-index
U(istart:iend, j1) = -1.0_dp
U(istart:iend, j2) = +1.0_dp
select case (M)
case("J")
write(*,*) "Jacobi iteration"
case("G")
write(*,*) "Gauss-Siedel iteration"
end select
! -----------------------------
! Iterative solver main loop
! -----------------------------
err = 2.0_dp * tolerance
k = 1
do while (err > tolerance)
Uold = U ! Store previous solution
maxerr = 0.0_dp ! Reset max error
! ---------------
! loop in space
! ---------------
do j = 2, N-1
do i = 2, N-1
! Skip capacitor plates
if (i >= istart .and. i<=iend .and. (j==j1 .or. j==j2)) cycle
! solution domain: perform iteration update
select case (M)
case("J")
U(i,j) = (Uold(i-1,j) + Uold(i+1,j) + Uold(i,j-1) + Uold(i,j+1))/4.0_dp
case("G")
U(i,j) = (U(i-1,j) + Uold(i+1,j) + U(i,j-1) + Uold(i,j+1))/4.0_dp
end select
! check convergence
if (abs(Uold(i,j)-U(i,j)) > maxerr ) then
maxerr = abs(Uold(i,j) - U(i,j))
end if
! relaxation factor w
U(i,j) = (1-w)*Uold(i,j) + w*U(i,j)
! optional Neumann o(a^2) along x==1 and x==n
if (BC.eq."N") then
write(*,*)'Neumann B.C. not implemented. Stopping'
call exit(-1)
endif
end do
! optional Neumann o(a^2) along y==1 and y==n
if (BC.eq."N") then
write(*,*)'Neumann B.C. not implemented. Stopping'
call exit(-1)
endif
end do
! Output iteration progress
write(*,*) 'iter: ',k, maxerr
err = maxerr
k = k + 1
end do
! write to file in fortran order
open(101, file='sol.dat')
do j = 1, N
do i = 1, N
write(101, *) U(i,j)
end do
write(101,*)
end do
close(101)
end program poisson
The nice thing is that we can let Git check the differences to be sure that nothing else changed:
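For example, git diff should show that the only change is the Gauss-Seidel update line. The output should look roughly like this (trimmed to the relevant hunk; header lines and line numbers depend on your setup):
> git diff poisson.f90
...
 case("G")
-write(*,*)'Gauss-Siedel not implemented. Stopping'
-call exit(-1)
+U(i,j) = (U(i-1,j) + Uold(i+1,j) + U(i,j-1) + Uold(i,j+1))/4.0_dp
 end select
...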
By committing this "new file", Git will record a new snapshot that differs from the previous one only by the new feature. This is very important, as it allows us to understand clearly what changed from the previous directory snapshot (commit).
You can run the code again with the Gauss-Seidel method by passing the G option instead of J. As often happens, it converges faster:
> make
> ./poisson J 100 1e-5 1.0 D # converges in 2005 iterations
> ./poisson G 100 1e-5 1.0 D # converges in 1131 iterations
[!EXERCISE]
Did you notice there is a small problem in the code? The convergence check happens before the relaxation step. Fix it and make a dedicated commit.
Now, what if you messed up something, a problem came up, and you want to revert to the last "working" commit? First, let's look at the history:
> git log
commit c528bea83fdaaf6115dbd41c72a90ff191f744a0 (main)
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 21:55:06 2025 +0200
inverted convergence check and relaxation step
commit 75f7a90f0628375cdb3bc1ddcefa23e747f8fe76
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 18:01:35 2025 +0200
implemented Gauss-Siedel method
commit f5d62843f8bff218434a93c72319b27d05000128
Author: Alessandro Pecchia <alessandro.pecchia@ismn.cnr.it>
Date: Tue Dec 1 11:29:52 2020 +0100
First commit
If we want to go back to a previous commit we can use git reset. In Git terminology, "resetting" means changing the snapshot that the current branch and HEAD point to. You can do this while also resetting the working tree (a hard reset), or while leaving the working tree untouched (the default behaviour used below, so your edits survive as unstaged changes). Let's go back to the previous commit (75f7a):
> git reset 75f7a
> git log
commit 75f7a90f0628375cdb3bc1ddcefa23e747f8fe76 (HEAD -> main)
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 18:01:35 2025 +0200
implemented Gauss-Siedel method
commit f5d62843f8bff218434a93c72319b27d05000128
Author: Alessandro Pecchia <alessandro.pecchia@ismn.cnr.it>
Date: Tue Dec 1 11:29:52 2020 +0100
First commit
> git status
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: poisson.f90
no changes added to commit (use "git add" and/or "git commit -a")
As you can see from git log, our "last commit" is now 75f7a, which is the one we just reset onto. One commit disappeared. However, its changes are still in the working tree (as can be seen with git status).
If we want to "hard reset" we can do:
> git reset --hard 75f7a
> git status
On branch main
Your branch is behind 'origin/main' by 1 commit, and can be fast-forwarded.
(use "git pull" to update your local branch)
nothing to commit, working tree clean
Now it's really as if the deleted commit never happened, at least as far as the visible history is concerned. What if we want to recover that commit? We went back in the past by selecting a specific commit from git log, but git log cannot see the future. So is the deleted commit lost? In Git almost nothing is ever lost, especially if it was committed at some point. We can, for example, scroll back in the terminal and find the hash of the deleted commit (c528b). Once we have it, we can do a hard reset again:
> git reset --hard c528b
> git log
commit c528bea83fdaaf6115dbd41c72a90ff191f744a0 (HEAD -> main)
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 21:55:06 2025 +0200
inverted convergence check and relaxation step
commit 75f7a90f0628375cdb3bc1ddcefa23e747f8fe76
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 18:01:35 2025 +0200
implemented Gauss-Siedel method
commit f5d62843f8bff218434a93c72319b27d05000128
Author: Alessandro Pecchia <alessandro.pecchia@ismn.cnr.it>
Date: Tue Dec 1 11:29:52 2020 +0100
First commit
and here we are, as if nothing ever happened. This is not the only way the commit could have been recovered: git reflog, for example, keeps a local log of every position HEAD has pointed to, so the "lost" hash can be found there as well (a sketch is shown below). A better way to do this kind of commit tagging, however, is branching.
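A rough sketch of what git reflog might show at this point (hashes and entries will differ on your machine):
> git reflog
c528bea HEAD@{0}: reset: moving to c528b
75f7a90 HEAD@{1}: reset: moving to 75f7a
75f7a90 HEAD@{2}: reset: moving to 75f7a
c528bea HEAD@{3}: commit: inverted convergence check and relaxation step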
Creating a branch is equivalent to putting a reference (a label) on a certain commit. As we saw in the first lecture, the output of git log shows, for each commit listed, whether some branch points to it. In the git log output just above, (HEAD -> main) appears next to commit c528b, meaning that the main branch points to it and that the current working directory (HEAD) points to main as well.
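For example, we can mark the last commit we are sure about (75f7a) with a branch named debug, matching the listings that follow, with something like:
> git branch debug 75f7a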
You can check this with git log. However, we have not checked it out yet (i.e. "switched to it", in Git terminology); HEAD still points directly to main. To switch to an existing branch, you run the git checkout command:
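In our case (the second line is the confirmation message Git prints):
> git checkout debug
Switched to branch 'debug'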
In Git, this new situation is reflected in the output of the git log command. However, doing git log now will show only the history of the debug branch, so c528b will be hidden. Instead, we can pass the main branch as an argument:
> git log main
commit c528bea83fdaaf6115dbd41c72a90ff191f744a0 (origin/main, main)
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 21:55:06 2025 +0200
inverted convergence check and relaxation step
commit 75f7a90f0628375cdb3bc1ddcefa23e747f8fe76 (HEAD -> debug)
Author: scarpma <scarpma@gmail.com>
Date: Sun Jun 8 18:01:35 2025 +0200
implemented Gauss-Siedel method
commit f5d62843f8bff218434a93c72319b27d05000128
Author: Alessandro Pecchia <alessandro.pecchia@ismn.cnr.it>
Date: Tue Dec 1 11:29:52 2020 +0100
First commit
and we can see that the reference decorators describe the situation perfectly. Now, if you would like to go back to the most up-to-date version of the repository, you can simply do git checkout main, without having to remember the hash of a specific commit.
Imagine that we are now asked to implement Neumann boundary conditions, but we are not sure whether the last commit added to main is correct or not. It might be better to continue working on the debug branch and figure out later how to fix the problem:
> git checkout debug
So let's implement Neumann boundary conditions $$\partial u / \partial n = 0$$ on the debug branch.
You can insert
! optional Neumann o(a^2) along x [y==1 and y==N] dUdy=0
if (BC.eq."N") then
U(i,1) = 4.0_dp/3.0_dp * U(i,2) - 1.0_dp/3.0_dp * U(i,3)
U(i,N) = 4.0_dp/3.0_dp * U(i,N-1) - 1.0_dp/3.0_dp * U(i,N-2)
endif
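Once the change compiles and runs, stage and commit it on the debug branch (the commit message here is just an example):
> git add poisson.f90
> git commit -m "implemented Neumann boundary conditions"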
Visually, it's clear why these are called branches: we branched off from the main history of our project and created a different path. This is called a divergent history (the debug branch has diverged from the main history).
Now, if, after some testing, we realize that commit c528b did not introduce any problem, it would have been better to implement the new feature directly on the main branch. However, this is not a problem, as Git allows us to move and rebase commits at will. There are two ways to solve this situation. Remember that commits can be seen as local edits to the repository. We might want to copy commit c528b on top of the debug branch (git cherry-pick), or we might move, i.e. rebase, the whole debug branch on top of the updated main (git rebase).
Let's do a rebase. We want to achieve something like this
Fortunately, this is very easy: the git rebase command takes the branch (or commit) you want to "base" your current branch onto, so
> git rebase main
will do the work. Since c528b is very similar to the previous "base" 75f7a, everything should go smoothly and complete automatically. Note that the hash of the rebased commit has changed (to d0710) after rebasing. This is normal because, as we already covered, the hash reflects the content of the snapshot and its parent: if the base of a commit changes, its hash changes as well.
We can check with git log that our debug branch is now based on main. Typically, this is the best situation because, if everything is OK and we decide to keep this version of the repository, we can merge it directly into main without any problems:
> git checkout main
> git merge debug
and now the situation is clean again, with all updated and working features on the main branch:
Whatever kind of code you are developing, whether it is for data analysis, numerical simulation, or just complex in general, it will eventually fail. The question is not if, but when, and how long it will take you to find out. Ideally, each commit you add to the main branch should introduce features without breaking what is already there. How can you be sure that that's the case? For programs that solve numerical problems, even small errors are important to catch, as they might reveal an instability or something that accumulates over iterations.
Checking manually that every part of the code works for each commit you push to main is practically impossible. That's where automated testing comes in. Tests are not optional add-ons; they are core infrastructure. Without them you are completely blind, because you cannot monitor how your code behaves.
A good test should be:
Deterministic: it gives the same result every time.
Fast: Tests should run in seconds, not minutes.
Avoid vague or overly general tests. Don't just check that a function "runs"; assert specific numerical results. For numerical methods, this might mean checking convergence or expected errors.
Let's see a simple example with the bisection algorithm, written in C. Imagine we wrote the following two source files, to be used by some other program to find the roots of an equation.
bisection.c
#include <math.h>
double bisection(
double (*f)(double),
double a,
double b,
double tol,
int max_iter)
{
for (int i = 0; i < max_iter; i++) {
double c = (a + b) / 2;
if (fabs(f(c)) < tol) {
return c;
}
if (f(a) * f(c) < 0) {
b = c;
} else {
a = c;
}
}
// Return the best guess if
// max iterations are reached
return (a + b) / 2;
}
bisection.h
double bisection(
double (*f)(double),
double a,
double b,
double tol,
int max_iter);
Before going ahead to use this function, let's create a unit test for it. A unit test is a small, isolated test that verifies the functionality of a specific part of the code, usually a single function. It checks if the code behaves as expected under defined conditions. Let's create three unit tests:
test_bisection.c
#include <stdio.h>
#include <assert.h>
#include <math.h>
#include "bisection.h"
// Example function: x^2 - 2
double f(double x) {
return x * x - 2;
}
// Test function for bisection method
void test_bisection_root() {
double root = bisection(f, 0, 2, 1e-10, 1000);
printf("Test 1: Checking if root is close to sqrt(2)...\n");
assert(fabs(root - sqrt(2)) < 1e-10); // Root should be close to sqrt(2)
printf("Test 1 passed 👌.\n");
}
// Test case where the function has no root in the interval
void test_bisection_no_solution() {
double root = bisection(f, 1, 1.5, 1e-5, 100);
printf("Test 2: Checking if root is within the interval [1, 1.5]...\n");
assert(root >= 1 && root <= 1.5);
printf("Test 2 passed 👌.\n");
}
// Test case with root at the interval's boundary
void test_bisection_edge_case() {
double root = bisection(f, sqrt(2.)-1.e-10, 2, 1e-8, 100);
printf("Test 3: Checking if root is close to sqrt(2) in small interval...\n");
assert(fabs(root - sqrt(2)) < 1e-8);
printf("Test 3 passed 👌.\n");
}
int main() {
test_bisection_root();
test_bisection_no_solution();
test_bisection_edge_case();
printf("All tests passed 👌🏞️🏖️🚀.\n");
return 0;
}
These tests check that the function behaves as expected in three different situations. Let's compile everything and check if it works
> cc -c bisection.c
> cc -o test_bisection test_bisection.c bisection.o -lm
> ./test_bisection
Test 1: Checking if root is close to sqrt(2)...
Test 1 passed 👌.
Test 2: Checking if root is within the interval [1, 1.5]...
Test 2 passed 👌.
Test 3: Checking if root is close to sqrt(2) in small interval...
Test 3 passed 👌.
All tests passed 👌🏞️🏖️🚀.
Every test passed, so now you can commit to main. In the future, before pushing new commits to main, you have to make sure that all tests still pass, including those covering past features. Otherwise you are undoing the work of your past self, or of someone else.
We will now apply what we have learned in this course to a simple but structured repository.
To contribute to a public repository, the standard approach is to create a fork, i.e. a personal copy of the repository, clone it, and work on that copy. When you want to merge your work into the original public repository, you open a "pull request", wait for the maintainers to review it, and eventually have it merged. This is the typical workflow in very large open-source projects, where the development process must be supervised by someone; otherwise it's chaos.
In a small community, however, where people interact regularly every day, these additional layers of complication might be unnecessary. Everyone might be allowed to push directly to the main branch, at their own risk and responsibility. This, of course, is possible only in small, local communities. Let's dive into this situation by working together on a very simple molecular dynamics solver.
To clone the repository with write permissions (being able to create branches, push to main, etc.) you have to be added to the repository by the maintainer and use the SSH protocol.
[!NOTE]
To use the ssh (Secure Shell) protocol you need an ssh key. SSH key pairs use public key infrastructure technology, the gold standard for digital authentication and encryption. An SSH key relies on the use of two related but asymmetric keys, a public key and a private key, that together create a key pair that is used as the secure access credential. The private key is secret, known only to the user, and should be stored safely. The public key can be shared freely with any SSH server to which the user wishes to connect.
If you don't already have one, you can create an SSH key pair with the ssh-keygen command, which will generate a public key <keyname>.pub and its private counterpart <keyname> in the ~/.ssh/ directory.
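For example (the key type and comment below are just illustrative):
> ssh-keygen -t ed25519 -C "you@example.com"
> cat ~/.ssh/id_ed25519.pub   # this is the public key to share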
Once your GitHub account has been granted write access to the repository, you will need to add the public key to your GitHub account in order to associate your local machine with it.
> cd
> git clone git@github.com:scarpma/md6.git
> cd md6
You can check the repository's README.md file to understand what the program does.
In this repository, unfortunately, there are no unit tests. However, another useful kind of test is what can be called an "end-to-end test". These tests verify the entire program's behavior by running the full application in a realistic environment and checking whether the results match expectations.
In the numerical scientific world, these tests can ensure that the output matches expected results either:
From a previous version of the code
From an externally provided reference:
existing analytical solutions
existing results in the literature
Check run_tests.sh to understand what kind of tests the repository runs.
We can compile the checked-out version of the repository and verify that it works and passes all tests:
> make
> ./run_tests.sh
make: Nothing to be done for `all'.
===========================
Doing test test_fluid/
===========================
r_max=3.54835 BOXL=5.25 red. dens=0.746356
Initialize FCC lattice and random velocities.
Integration started:
Done!
Testing ...
col 1 max diff: 0
col 2 max diff: 0
col 3 max diff: 0
col 4 max diff: 0
col 5 max diff: 0
col 6 max diff: 0
===========================
ALL TEST PASSED
YOU'RE GOOD TO GO
===========================
===========================
Doing test test_solid/
===========================
r_max=3.54835 BOXL=5.25 red. dens=0.746356
Initialize FCC lattice and random velocities.
Integration started:
Done!
Testing ...
col 1 max diff: 0
col 2 max diff: 0
col 3 max diff: 0
col 4 max diff: 0
col 5 max diff: 0
col 6 max diff: 0
===========================
ALL TEST PASSED
YOU'RE GOOD TO GO
===========================