The language of the CPU

Machine language

At the heart of every computer system lies machine language, the most basic form of code that the CPU can understand. Machine language consists entirely of binary digits (0s and 1s) and directly controls the hardware. However, writing programs in machine language is extremely difficult for humans due to its complexity and lack of readability.

An example of machine language might look like this:

10110000 01100001

On an x86 processor, these two bytes encode a single instruction:

  • The first byte, 10110000 (B0 in hexadecimal), is the opcode. Its upper five bits (10110) mean "move an 8-bit immediate value into a register", and its lower three bits (000) select the destination register, AL.
  • The second byte, 01100001 (61 in hexadecimal), is the immediate value itself: 97 in decimal, which is also the ASCII code for the character a.

In other words, the CPU interprets this machine code as: “Move the value 97 into the AL register.”

Because machine code is executed directly by the CPU, it is architecture-specific: it depends on the processor's instruction set. For example, the bytes that perform an operation on an x86 processor (such as those made by Intel or AMD) differ from the bytes that perform the same operation on an ARM processor.

While this example is very basic, a complete program in machine language would consist of hundreds or thousands of such binary instructions.

Assembly

To make programming more manageable, assembly language was developed. Assembly language uses mnemonic codes and symbols to represent machine language instructions, making it a more human-readable form of the underlying binary code. Each assembly instruction typically corresponds to exactly one machine language instruction, so writing assembly still requires intimate knowledge of the computer’s architecture. An assembler is used to convert assembly language into machine code that the CPU can execute.

Here’s a simple example of assembly language that moves the value 5 into a register and then adds 10 to it:

MOV AX, 5    ; Move the value 5 into the AX register
ADD AX, 10   ; Add the value 10 to the AX register

  • MOV AX, 5: This instruction moves the value 5 into the AX register (a general-purpose register in x86 architecture).
  • ADD AX, 10: This instruction adds the value 10 to whatever is already in the AX register. After this, AX will hold the value 15.

Each line corresponds directly to a CPU operation and is much easier to read than raw machine code, but it still requires an understanding of the processor’s architecture and available registers. An assembler would convert this assembly code into machine language for the CPU to execute.

High-level languages

As computing evolved, the need for more user-friendly and efficient programming methods led to the development of high-level languages such as Fortran and C. These languages are designed to be easier for humans to read, write, and understand, allowing programmers to focus on solving problems rather than managing hardware details.

A compiler plays a critical role here, as it translates high-level language code into assembly or directly into machine language, bridging the gap between human-readable instructions and the binary code that a computer’s processor can execute. This progression from machine language to high-level languages through the use of compilers and assemblers is what allows modern programming to be both powerful and accessible.

Compilers

A compiler is a specialized software tool that translates source code written in high-level programming languages (like C, C++, or Fortran) into machine code or a lower-level language that a computer’s processor can execute.

The compiler performs several stages of processing, including lexical analysis, syntax analysis, semantic analysis, optimization, and code generation.

Compilers also check for errors in the source code, such as syntax errors or type mismatches, and produce error messages to help developers fix problems. Modern compilers can also optimize the generated machine code to improve the performance and efficiency of the resulting program.

One of the most widely used compilers is GCC (GNU Compiler Collection), which supports several languages, including C, C++, and Fortran. GCC is free, open-source software that targets many processor architectures, and it is the standard system compiler on most Linux distributions.

Key features of compilers:

  1. Translation: The compiler’s primary role is to translate the source code into machine language. This is done in stages:
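    • Lexical Analysis: Breaking the source code into tokens such as keywords, identifiers, numbers, and operators.
    • Syntax Analysis: Ensuring the code follows the grammar of the language.
    • Semantic Analysis: Checking that the code is meaningful, for example that variables are declared before use and that operand types are compatible.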
    • Optimization: Improving the performance and efficiency of the code.
    • Code Generation: Producing machine-level code or assembly code.
  2. Error Detection: Compilers help detect syntax errors and other issues in the source code before execution. They provide detailed error messages that help developers find and fix problems.
  3. Optimization: Modern compilers like GCC include optimization techniques to improve the performance of the generated machine code. For instance, they may reduce the number of instructions, remove unnecessary calculations, or improve memory access patterns.

How to compile C code

Consider the following C code: