GDB

GDB, the GNU Debugger, is a powerful tool for debugging programs written in C, C++, Fortran, and other compiled languages.
In complex applications, bugs such as segmentation faults, memory corruption, or logic errors are often difficult to trace by reviewing the code alone. Debuggers like GDB give you control over program execution: you can run your program line by line, inspect variables, follow the flow of function calls, and observe how your program interacts with memory and data.

First steps with GDB

Let's make a simple example of how one can use GDB to inspect the flow of a program (no bugs at the moment, so let's call it nobug.c):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int arr[5];
    for (int i = 0; i < 5;  i++) {
        arr[i] = i;
        printf("%d\n",arr[i]);
    }
    return 0;
}

To compile code with GCC and include the information GDB needs, pass the -g flag for standard debugging information, or the -ggdb flag for debugging information tailored to GDB.

> gcc -o nobug -ggdb nobug.c

Now we can run our executable within GDB:

> gdb ./nobug
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from nobug...

Now, we issue the command l, or list:

(gdb) l
1       #include <stdio.h>
2       #include <stdlib.h>
3
4       int main(void) {
5           int arr[5];
6           for (int i = 0; i < 5;  i++) {
7               arr[i] = i;
8               printf("%d\n",arr[i]);
9           }
10          return 0;

By passing the flag -ggdb at compile time, we included information about the source code in the executable file. Without this information it would still be possible to run the debugger, but its output would be much more difficult to understand.

[!TIP] Including debugging information with -g or -ggdb does not by itself slow down the execution of the program, contrary to a common belief. It is therefore generally a good idea to include the debugging flags every time you compile your code! If you don't want this information to be available to everyone, you can omit the debugging flag, or run strip <file> on your executable to remove the symbol and debugging information from the binary file.

Let's make the first step by introducing a breakpoint, that is, a point in the code where the debugger will stop the execution and give you the opportunity to inspect the variables. For now, let's break right after starting the main function:

(gdb) break nobug.c:main
Breakpoint 1 at 0x1175: file nobug.c, line 4.

[!TIP]

  1. We could also have set the breakpoint by passing the line number directly, with break 4, or with the short version b 4.
  2. Modern versions of GDB provide autocompletion! So try typing b nob<tab> and see what happens!

We can now run the program, which will stop at the breakpoint immediately:

(gdb) run
Starting program: /home/m.sega/test_daje/nobug 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at nobug.c:4
4       int main(void) {

At this point no variable has been defined. Let's make a step forward in the execution, with step, and print the value of i:

(gdb) step
6           for (int i = 0; i < 5;  i++) {
(gdb) print i
$1 = 0

At this point, no element of arr has been assigned. We can see this by printing the whole array (we can also access its address):

(gdb) print arr
$2 = {-6471, 32767, 100, 0, 4096}
(gdb) print &arr
$3 = (int (*)[5]) 0x7fffffffe250

Let's now go to the next line with next, and print the value of arr[i] (yes, you can also use variables in your expressions!):

[!NOTE] step advances to the next source line and enters any function called along the way, whereas next also advances to the next source line but stays at the same level, executing called functions without entering them.

(gdb) next
7               arr[i] = i;
(gdb) print arr[i]
$4 = -6471

From the lines above you can see that execution is interrupted before the line shown is executed.
In fact, by issuing next another time we get:

(gdb) next
8               printf("%d\n",arr[i]);
(gdb) print arr[i]
$5 = 0
(gdb) print arr
$6 = {0, 32767, 100, 0, 4096}

The first element has been assigned, but the remaining four still hold indeterminate values (whatever happened to be on the stack).

[!NOTE] Some compilers can be instructed to zero out variables when they are declared. Check the manual page of your compiler!
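
If you prefer not to rely on compiler options, C itself lets you zero an array at its declaration. The following is a minimal sketch and not part of nobug.c; the function name sum_after_partial_fill is made up for illustration. It assigns only the first n elements, much like the loop above does one step at a time, and the untouched elements are guaranteed to be 0 rather than leftover stack contents.

```c
#include <stddef.h>

/* Hypothetical helper (not part of nobug.c): assigns the first n of five
   elements, then sums all five. */
int sum_after_partial_fill(size_t n) {
    int arr[5] = {0};            /* every element starts at 0 */
    int sum = 0;

    for (size_t i = 0; i < n && i < 5; i++) {
        arr[i] = (int)i + 1;     /* assign 1, 2, 3, ... to the first n elements */
    }
    for (size_t i = 0; i < 5; i++) {
        sum += arr[i];           /* unassigned elements contribute 0 */
    }
    return sum;
}
```

With this kind of declaration, printing arr in GDB right after the declaration line has executed should show only zeros.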

At the next iteration, the second element is assigned:

(gdb) next
0
6           for (int i = 0; i < 5;  i++) {
(gdb) next
7               arr[i] = i;
(gdb) next
8               printf("%d\n",arr[i]);
(gdb) next
1
6           for (int i = 0; i < 5;  i++) {
(gdb) print arr
$7 = {0, 1, 100, 0, 4096}

That's it for this simple example, where we have seen how to take control over the flow of the code and to inspect the values and addresses of variables!

[!IMPORTANT] As you can imagine, this is a powerful tool that lets you avoid the typical debugging pattern of manually adding printf statements throughout your code!
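
In nobug.c the difference between step and next is hard to observe, because the only call is to printf, and GDB steps into a function only when line-number information is available for it, which is typically not the case for the C library. Below is a minimal sketch of a variant where the two commands should differ; the names square and fill_squares are hypothetical and do not come from the example above.

```c
#include <stddef.h>

/* Hypothetical helper: a user-defined function, compiled with debugging
   information, that step can enter. */
int square(int x) {
    return x * x;
}

/* Fills arr[0..n-1] with the squares of the indices. */
void fill_squares(int *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Stopped on this line, step enters square(), while next runs the
           call to completion and stops on the following line. */
        arr[i] = square((int)i);
    }
}
```

Once inside square, the finish command runs until the function returns to its caller.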

Debugging with backtrace

Let's introduce now a program with a simple bug:

#include <stdlib.h>
#include <stdio.h>

int function(int* arr) {
    int i,sum=0;
    for (i = 0; i < 2048 ;  i++) {
        arr[i] = i;
        sum += i ;
    }
    return sum;
}

int main(void){
    int a[2],s;
    s = function(a);
    printf("result=%d\n",s);
    return s;
}

Compiling and running this code on older systems would directly raise a segmentation fault (i.e., an access to memory the program is not allowed to use). On modern systems the stack is usually protected, so you would probably observe an error message like this instead:

*** stack smashing detected ***: terminated
Aborted

[!NOTE] To reproduce the behaviour of older compilers, you can compile and run in this way:

> gcc -o bug1   -fno-stack-protector  -ggdb bug1.c
> ./bug1
Segmentation fault
>

This produces the Segmentation fault message.

Let's run this code within the debugger:

> gdb ./bug1
(gdb) run
Starting program: /home/m.sega/test_daje/bug1 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000055555555517c in function (arr=0x7fffffffe274) at bug1.c:7
7               arr[i] = i;

The execution stops where the segmentation fault happened. This is already an interesting bit of information. Let's show the code around that line with the list (l) command:

(gdb) l
2       #include <stdio.h>
3
4       int function(int* arr) {
5           int i,sum=0;
6           for (i = 0; i < 2048 ;  i++) {
7               arr[i] = i;
8               sum += i ;
9           }
10          return sum;
11      }

We know that the problem most likely involves the array arr. However, arr is passed as a pointer, so we need to find out how big the memory region it points to is, and where in the code it was allocated. In this simple example the answer is obvious, but in more complex code it could be difficult to spot, for example because function could be called from many different places with different pointers.

The backtrace command (bt for short) is our friend here, because it shows the whole stack of function calls that led to this segmentation fault:

(gdb) bt
#0  0x000055555555517c in function (arr=0x7fffffffe274) at bug1.c:7
#1  0x00005555555551ae in main () at bug1.c:15

which shows that the call stack consists of just two frames, #0 and #1. We already know that we need to look in a frame above the one where the segmentation fault happened, so we switch to frame #1 with the frame command (f for short):

(gdb) f 1
#1  0x00005555555551ae in main () at bug1.c:15
15          s = function(a);

This way, the debugger shows us the line in frame #1 where the function was called. In this frame we can inspect the type of a, and realise that it's just two elements long, so that the loop that goes up to 2048 is clearly going out of bounds:

(gdb) ptype(a)
type = int [2]
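
Once the mismatch between the loop bound and the size of a is clear, one possible fix is to pass the length along with the pointer. This is only a sketch under that assumption: the name fill_and_sum and the parameter n are introduced here for illustration and are not part of bug1.c.

```c
#include <stddef.h>

/* Sketch of a length-aware variant of function() from bug1.c.
   The name fill_and_sum and the parameter n are assumptions. */
int fill_and_sum(int *arr, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        arr[i] = (int)i;   /* writes stay inside the n elements the caller owns */
        sum += (int)i;
    }
    return sum;
}
```

In main, the call would then become s = fill_and_sum(a, sizeof a / sizeof a[0]);, so that the bound follows the declaration of a.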

Debugging with watchpoints

Have a look at this seemingly innocuous piece of code:

#include <stdio.h>

int main() {
    int arr1[2] = {2,2};
    int arr2[2] = {1,1};

    for (int i=0 ; i<arr1[0] ; i++){
            arr2[arr1[1]]++ ;
            printf("%d\n",arr2[arr1[1]]);
    }

    return 0;
}

At first sight, this appears to loop from 0 to arr1[0] (which is equal to 2), incrementing one of the elements of arr2 and printing its value. In practice, if you run it (with an old compiler, or without stack protection as below), it will never stop...

> gcc -o bug2   -fno-stack-protector  -ggdb bug2.c  && ./bug2
3
4
5
6
7
8
9
...

Why is that so? First of all, in this build the arrays are stored on the stack in the reverse order of their declaration. Let's see this with GDB:

> gdb ./bug2
Reading symbols from ./bug2...
(gdb) b main
Breakpoint 1 at 0x1155: file bug2.c, line 4.
(gdb) r
Starting program: /home/m.sega/test_daje/bug2 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at bug2.c:4
4           int arr1[2] = {2,2};
(gdb) next
5           int arr2[2] = {1,1};
(gdb) p &arr1
$1 = (int (*)[2]) 0x7fffffffe274
(gdb) p &arr2
$2 = (int (*)[2]) 0x7fffffffe26c

The two addresses are 0xfe274-0xfe26c = 8 bytes apart (showing only the last digits of each address), as one would expect: each array holds two ints, and an int typically takes 4 bytes (32 bits) on a 64-bit machine. What is important to notice is that arr1 is located in memory after arr2, so that the memory layout looks something like this:

| 0xfe26c | 0xfe270 | 0xfe274 | 0xfe278 |
|---------|---------|---------|---------|
| arr2[0] | arr2[1] | arr1[0] | arr1[1] |

Now, when we access arr2[2], pointer arithmetic tells us we are actually reaching the address arr2+2, or, in other words, &arr2[0] + 2. Pointer arithmetic counts in units of the element size, here 4 bytes, so with &arr2[0]=0xfe26c we get &arr2[0] + 2 = 0xfe26c + 8 = 0xfe274, which is the address of arr1[0]. Since arr1[1] is equal to 2, writing arr2[arr1[1]]++ actually increments the content of arr2[2], that is to say, the content of arr1[0]. In the loop for (int i=0 ; i<arr1[0] ; i++), the bound arr1[0] is therefore not constant, but is increased by one at every iteration together with i, with the result that the loop never ends.
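
The scaling of pointer arithmetic by the element size can be checked directly in C. The following is a minimal sketch; byte_offset is a hypothetical helper, not part of bug2.c. It only forms the pointer base + index and never dereferences it, which is allowed up to one element past the end of the array.

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes between base and base + index.
   The pointer is only formed, never dereferenced, so index may be equal
   to the array length (one past the end). */
ptrdiff_t byte_offset(const int *base, size_t index) {
    const char *start = (const char *)base;
    const char *target = (const char *)(base + index);
    return target - start;   /* equals index * sizeof(int) */
}
```

On a platform where an int takes 4 bytes, byte_offset(arr2, 2) is 8, matching the difference 0xfe274-0xfe26c seen in the session above.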

Let's see how to spot this using the debugger.

Here, we define a breakpoint at the beginning of the main function, and start the execution:

Reading symbols from ./bug2...
(gdb) b main
Breakpoint 1 at 0x1155: file bug2.c, line 4.
(gdb) r
Starting program: /home/m.sega/test_daje/bug2 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at bug2.c:4
4           int arr1[2] = {2,2};

At this point, arr1 is in scope, so we can set a watchpoint on it. The flag -l (short for -location) tells GDB to watch the memory address that the expression refers to, and to report any change to it. Since our loop never ends, its bound arr1[0] is the natural suspect. Let's see what happens:

(gdb) watch -l arr1[0]
Hardware watchpoint 2: -location arr1[0]

Now we keep iterating by issuing the next (n) command.

[!TIP] Pressing <enter> repeats the last GDB command. This comes in handy when you need to go quickly through a long loop.

(gdb) n

Hardware watchpoint 2: -location arr1[0]

Old value = 0
New value = 2
0x000055555555515c in main () at bug2.c:4
4           int arr1[2] = {2,2};
(gdb) 
5           int arr2[2] = {1,1};
(gdb) 
7           for (int i=0 ; i<arr1[0] ; i++){
(gdb) 
8                   arr2[arr1[1]]++ ;
(gdb) 

Hardware watchpoint 2: -location arr1[0]

Old value = 2
New value = 3
main () at bug2.c:9
9                   printf("%d\n",arr2[arr1[1]]);

From the information above it's clear that the watchpoint is triggered by line 8:

8                   arr2[arr1[1]]++ ;

And we can also check explicitly that the addresses are the same:

(gdb) p &arr2[arr1[1]]
$1 = (int *) 0x7fffffffe274
(gdb) p &arr1[0]
$2 = (int *) 0x7fffffffe274

as well as the offending index value:

(gdb) p arr1[1]
$3 = 2
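
With the cause identified, one way to guard against this kind of out-of-bounds write is to check the index before using it. This is a minimal sketch under that assumption, not the only possible fix; increment_checked and its parameters are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: increments arr[idx] only if idx is a valid index
   for an array of len elements.
   Returns 0 on success, -1 if the index is out of range. */
int increment_checked(int *arr, size_t len, int idx) {
    if (idx < 0 || (size_t)idx >= len) {
        return -1;   /* refuse to touch memory outside the array */
    }
    arr[idx]++;
    return 0;
}
```

In bug2.c, calling increment_checked(arr2, 2, arr1[1]) would return -1 and leave arr1[0] untouched, so the loop bound would stay at 2.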