Engineering a compiler through examples: building a mathematical expression engine - Part 2

In this post, we will explore what a compiler is precisely and examine the various phases that make up the compilation process.

Information

We previously touched on compilation in a series dedicated to writing a lexical analyzer (with a focus on SQL queries). We will revisit all the concepts here, but feel free to refer back to that series for additional details.

Writing a simple lexical analyzer for SQL

What is compilation ?

An example is more illustrative than a lengthy explanation. Therefore, let's consider the following three sentences.

  1. Tha ader _guipoi je ?dftre !!klm.
  2. Question be not address book this.
  3. The cow is writing balloons.
  • The first sentence (Tha ader _guipoi je ?dftre !!klm) is incomprehensible, at least in English, but also in French, in Russian and likely in all languages worldwide. Additionally, certain words commence with illegal characters, such as exclamation marks, for instance.

  • The second sentence (Question be not address book this) uses valid English words, but the arrangement in which they are employed renders it completely nonsensical.

  • The third sentence (The cow is writing balloons) employs valid English words, and their arrangement is grammatically correct. However, the combination of words results in gibberish, making it nonsensical for anyone.

In a more formal lexicon, the first sentence is deemed lexically incorrect, the second one is lexically correct but syntactically incorrect, and the last one is lexically correct, syntactically correct but semantically incorrect.

The same principles that apply to a human language can be extended to programming languages, and the process is identical: a program needs to utilize permitted words (lexical analysis), organize them in a proper sequence according to a predefined grammar (syntax analysis), and avoid illogical assignments (semantic analysis).

Important

Compilation is essentially the process of translating human-readable source code into machine-executable code, allowing a program to run on a computer. To achieve this, the source code must be lexically, syntactically, and semantically correct.

Information

The three passes outlined above constitute the core components of a compiler and comprise a significant portion of a compilation course. To be perfectly rigorous, however, we must acknowledge that the compilation process also includes the crucial phase of code generation, which involves producing the actual code that will be executed by the machine. This is why a compiler is often divided into two sections: the front end and the back end.

  • The front end includes the three phases described above (lexical analysis, syntax analysis, and semantic analysis) with the goal of understanding the meaning of the program.
  • The back end, on the other hand, is responsible for generating the executable code in binary form.

In this series, we will focus exclusively on the front end of the compiler and will not delve into the back end. However, this will be more than sufficient to build our mathematical engine.

Pfff, that sounds rather abstract and complex. Is there a more concrete explanation ?

Compilation is one of the most challenging tasks in computer science, as it involves sophisticated mathematical concepts. This is why it can often seem obscure and difficult to grasp. To make it more accessible, we will illustrate these concepts by building a concrete compiler step by step. Our approach will be based on a language we have been familiar with since childhood: mathematics. Specifically, mathematical expressions.

What is a mathematical expression ?

Roughly speaking, a mathematical expression is a combination of numbers, variables, operators, and sometimes functions, arranged in a specific way to represent a mathematical concept or computation. It is a formal way of expressing a relationship or performing an operation, such as addition, subtraction, multiplication, or more complex operations like exponentiation or differentiation.

More formally, a mathematical expression is composed of the following concepts:

  • Numbers are the constants in an expression and represent either integers or real numbers, such as 5, 2.3, or -7.

  • Variables represent unknown or changeable values, often denoted by letters like $x$, $y$, or $z$.

  • Operators define the operations performed on the numbers or variables. Operators can be arithmetic (+, -, *, /, ^) or relational (=, <, >).

  • Functions represent operations like square root, trigonometric functions (e.g., sin, cos), logarithms, and more.

  • Parentheses are used to group parts of an expression and control the order of operations. For example, in $(3 + 5) * 2$, the parentheses dictate that $3 + 5$ should be calculated first.

Give me an example, please !

Our goal, therefore, is to be able to define functions such as $f(x,y)=2cos(x)+3ln(y)$, allowing for multiple variables. We will also enable operations on these functions, such as computing partial derivatives. We could for instance differentiate with respect to $x$ and get $\dfrac{\partial f}{\partial x}=-2sin(x)$.

Our plan is therefore quite straightforward: we will need to implement an internal representation of functions and enable operations such as differentiation on them. However, that alone won't suffice. It must also be intuitive and user-friendly, allowing any user to define functions effortlessly without requiring knowledge of the underlying complexities. This is where compilation and its associated processes come into play, providing us with a remarkably elegant solution.

And what can we do with this ?

At first glance, this example may seem like a simple toy exercise, leading one to question its practical applications. However, in later series, we will see its significant relevance in optimization methods, where derivatives (particularly second-order derivatives) play a crucial role. Since these functions can become highly complex, automatic differentiation provides a far more efficient and reliable approach.

Engineering a compiler through examples: building a mathematical expression engine - Part 3