Engineering a compiler through examples: building a mathematical expression engine - Part 3
In this post, we will begin developing our mathematical engine by exploring how to model mathematical expressions and functions in C#. This process will also allow us to introduce the concept of grammars.
In the previous post, we provided an informal definition of a mathematical expression. Now, it's time to dive deeper and formalize this concept.
How can mathematical expressions be modeled in a programming language ?
Imagine that we need to represent the mathematical expression $2x+cos(y)$. A straightforward and classical approach is to use trees, as illustrated in the image below.
Define expressions
Define a base class for expressions
We previously observed that expressions can take various forms, such as numbers, variables, and functions. This insight naturally leads us to define a base class from which all concrete implementations will inherit.
1public abstract class Expression
2{
3}
There isn't much to say about this class; it serves as a simple foundation for our concrete implementations.
Define a class for numbers
Expressions should be able to represent numbers, whether integers or real numbers. Therefore, we will create a dedicated class for this purpose.
1public class ConstantExpression : Expression
2{
3 public double Value { get; }
4
5 public ConstantExpression(double value)
6 {
7 Value = value;
8 }
9
10 public override string ToString()
11 {
12 return string.Format(CultureInfo.InvariantCulture, "{0}", Value);
13 }
14}
Our first concrete class leverages the built-in double type to represent these numbers.
Define a class for variables
We will also need a class to represent variables, which will simply be denoted using a string type.
1public class VariableExpression : Expression
2{
3 public string Variable { get; }
4
5 public VariableExpression(string variable)
6 {
7 Variable = variable;
8 }
9
10 public override string ToString()
11 {
12 return Variable;
13 }
14}
Define a class for binary operators (+ and *)
At this point, we will enhance our model by implementing operators. Operators can take any number of expressions as parameters, allowing us to recursively define more complex expressions.
1public class SumExpression : Expression
2{
3 public Expression[] List { get; }
4
5 public SumExpression(params Expression[] list)
6 {
7 var l = list.Length;
8 if (l <= 1) throw new ArgumentException("The array must contain at least two items.");
9
10 List = list;
11 }
12
13 public override string ToString()
14 {
15 var sb = new StringBuilder("("); var i = 0; var l = List.Length;
16 foreach (var item in List)
17 {
18 i++;
19 sb.Append(item.ToString());
20 if (i < l) sb.Append(" + ");
21 }
22 sb.Append(")");
23
24 return sb.ToString();
25 }
26}
1public class ProductExpression : Expression
2{
3 public Expression[] List { get; }
4
5 public ProductExpression(params Expression[] list)
6 {
7 var l = list.Length;
8 if (l <= 1) throw new ArgumentException("The array must contain at least two items.");
9
10 List = list;
11 }
12
13 public override string ToString()
14 {
15 var sb = new StringBuilder(); var i = 0; var l = List.Length;
16 foreach (var item in List)
17 {
18 i++;
19 sb.Append(item.ToString());
20 if (i < l) sb.Append(" * ");
21 }
22
23 return sb.ToString();
24 }
25}
Define a class for built-in functions (like cos for example)
We can also model any built-in mathematical functions, which take an expression as a parameter.
1public class CosinusExpression : Expression
2{
3 public Expression Operand { get; }
4
5 public CosinusExpression(Expression operand)
6 {
7 Operand = operand;
8 }
9
10 public override string ToString()
11 {
12 return $"cos({Operand})";
13 }
14}
And so forth for square root, sine, or any other function...
Where do we stand ?
This is the first step, and with this framework, we can now define any expression. For instance, the function introduced at the beginning of this article ($f(x,y)=2x+cos(y)$) can be written as follows.
1var f = new SumExpression(new ProductExpression(new ConstantExpression(2), new VariableExpression("x")), new CosinusExpression(new VariableExpression("y")));
We can see that complex functions with many variables can quickly become cumbersome to write (just as it becomes tedious even for a very simple one). The goal of the compiler will be to simplify this entire process.
What can we do with this framework ? That is precisely what we will explore in the next section.
Define operations on expressions
We will need to apply several operations on our expressions. For instance, we could differentiate them, evaluate them at a specific point, or simplify complex expressions. How can we assign meaning to various operations, how can we define semantics ?
One approach would be to add all the necessary methods to the base abstract class Expression and then override them in each derived class. The following code demonstrates this approach.
1public abstract class Expression
2{
3 public abstract Expression Differentiate(string variable);
4
5 public abstract Expression Simplify();
6
7 // others methods here...
8}
1public class ConstantExpression : Expression
2{
3 public double Value { get; }
4
5 public ConstantExpression(double value)
6 {
7 Value = value;
8 }
9
10 public override Expression Differentiate(string variable)
11 {
12 return new ConstantExpression(0);
13 }
14
15 public override Expression Simplify()
16 {
17 return this;
18 }
19
20 // other methods here...
21
22 public override string ToString()
23 {
24 return string.Format(CultureInfo.InvariantCulture, "{0}", Value);
25 }
26}
And so forth for the other concrete classes.
There is nothing wrong with this approach. If the number of operations is small, it could be the best solution. However, if the number of operations increases, we will end up with a large Expression class filled with code that we may only need occasionally, leading to unnecessary complexity. That is why we will adopt a more elegant solution by using the Visitor design pattern.
The Visitor design pattern represents an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.
Design Patterns: Elements of Reusable Object-Oriented Software
Our goal here is not necessarily to write code using the best design patterns. However, we will take advantage of this series to apply our software development knowledge and create elegant code. Feel free to refer to the classic Gang of Four (GoF) book cited in the references for more in-depth information.
Define a base class for expressions
In fact, this design pattern involves delegating each operation to its own dedicated class, helping to keep the Expression class clean and focused. As before, we will begin by defining an abstract base class.
1public abstract class ExpressionVisitor
2{
3 public abstract Expression Visit(VariableExpression expression);
4
5 public abstract Expression Visit(ConstantExpression expression);
6
7 public abstract Expression Visit(SumExpression expression);
8
9 public abstract Expression Visit(ProductExpression expression);
10
11 public abstract Expression Visit(CosinusExpression expression);
12
13 // ...
14}
The Visitor design pattern is particularly well-suited to our model, as it is based on a tree structure. We simply need to define specific behavior for each concrete expression type.
We will now explore how to put all this machinery into practice through a concrete example: implementing differentiation.
Define the DifferentiatorExpressionVisitor class
1public class DifferentiationExpressionVisitor : ExpressionVisitor
2{
3 public string Variable { get; }
4
5 public DifferentiationExpressionVisitor(string variable)
6 {
7 Variable = variable;
8 }
9
10 public override Expression Visit(VariableExpression expression)
11 {
12 return expression.Variable == Variable ? new ConstantExpression(1.0) : new ConstantExpression(0.0);
13 }
14
15 public override Expression Visit(ConstantExpression expression)
16 {
17 return new ConstantExpression(0.0);
18 }
19
20 public override Expression Visit(SumExpression expression)
21 {
22 var derived = expression.List.Select(t => t.Accept(this)).ToArray();
23 return new SumExpression(derived);
24 }
25
26 public override Expression Visit(ProductExpression expression)
27 {
28 var first = expression.List[0]; var remaining = expression.List.Skip(1).ToArray();
29 var r = (remaining.Length == 1) ? remaining[0] : new ProductExpression(remaining);
30
31 var r1 = new ProductExpression(first.Accept(this), r);
32 var r2 = new ProductExpression(first, r.Accept(this));
33
34 return new SumExpression(r1, r2);
35 }
36
37 public override Expression Visit(CosinusExpression expression)
38 {
39 var f = expression.Operand.Accept(this);
40 return new ProductExpression(new MinusUnaryExpression(f), new SinusExpression(expression.Operand));
41 }
42
43 // ...
44}
This class needs to know the variable with respect to which we are differentiating, and this variable is provided through the constructor. After that, we simply apply the basic rules of differentiation we learned in school, which lend themselves naturally to a recursive approach.
How to use this class ?
That's a good question. It turns out the implementation is not yet complete, as we still need to add a specific method to the Expression class.
1public abstract class Expression
2{
3 public abstract Expression Accept(ExpressionVisitor visitor);
4}
This method can be given any name, but by convention, it is typically called Accept. We'll follow that convention here. It must be overridden in all derived classes.
1public class ConstantExpression : Expression
2{
3 // ...same code as before
4
5 public override Expression Accept(ExpressionVisitor visitor)
6 {
7 return visitor.Visit(this);
8 }
9
10 // ...same code as before
11}
1public class VariableExpression : Expression
2{
3 // ...same code as before
4
5 public override Expression Accept(ExpressionVisitor visitor)
6 {
7 return visitor.Visit(this);
8 }
9
10 // ...same code as before
11}
And now ?
From this point on, we can define a mathematical function (through an expression) and differentiate it.
1var f = new SumExpression(new ProductExpression(new ConstantExpression(2), new VariableExpression("x")), new CosinusExpression(new VariableExpression("y")));
2var derivedfwrtx = f.Accept(new DifferentiatorExpressionVisitor("x"));
We can observe that, while the result is indeed correct, it can be significantly simplified. This would be the role of another operation dedicated to expression simplification.
Define the SimplificationExpressionVisitor class
Simplifying an expression is another operation that can be performed on expressions. To handle this, we will define a SimplificationExpressionVisitor class.
1public class SimplificationExpressionVisitor : ExpressionVisitor
2{
3 public override Expression Visit(VariableExpression expression)
4 {
5 return expression;
6 }
7
8 public override Expression Visit(ConstantExpression expression)
9 {
10 return expression;
11 }
12
13 public override Expression Visit(SumExpression expression)
14 {
15 var simples = expression.List.Select(x => x.Accept(this)).ToList();
16
17 var results = new List<Expression>(); var cs = new List<ConstantExpression>();
18 foreach (var simple in simples)
19 {
20 if (simple is ConstantExpression simpleConstant)
21 {
22 if (simpleConstant.Value == 0.0) continue;
23 else cs.Add(simpleConstant);
24 }
25 else results.Add(simple);
26 }
27
28 var aggregcs = new ConstantExpression(cs.Sum(x => x.Value));
29 if (aggregcs.Value != 0.0) { results.Add(aggregcs); }
30
31 if (!results.Any()) return new ConstantExpression(0.0);
32
33 return (results.Count == 1) ? results[0] : new SumExpression(results.ToArray());
34 }
35
36 public override Expression Visit(ProductExpression expression)
37 {
38 var simples = expression.List.Select(x => x.Accept(this)).ToList();
39
40 var results = new List<Expression>(); var cs = new List<ConstantExpression>();
41 foreach (var simple in simples)
42 {
43 if (simple is ConstantExpression simpleConstant)
44 {
45 if (simpleConstant.Value == 0.0)
46 {
47 return new ConstantExpression(0.0);
48 }
49
50 cs.Add(simpleConstant);
51 }
52 else { results.Add(simple); }
53 }
54
55 if (cs.Any())
56 {
57 var aggregcs = new ConstantExpression(cs.Aggregate(1.0, (x, y) => x * y.Value));
58 if (aggregcs.Value != 1.0) { results.Add(aggregcs); }
59 }
60
61 if (!results.Any()) return new ConstantExpression(0.0);
62
63 return (results.Count == 1) ? results[0] : new ProductExpression(results.ToArray());
64
65
66 return new ProductExpression(simples.ToArray());
67 }
68
69 public override Expression Visit(CosinusExpression expression)
70 {
71 var f = expression.Operand.Accept(this);
72 return new CosinusExpression(f);
73 }
74
75 // ...
76}
This class can likely be improved to handle a wider range of use cases, but it already provides a solid starting point. We can now use it to simplify our expressions.
1var f = new SumExpression(new ProductExpression(new ConstantExpression(2), new VariableExpression("x")), new CosinusExpression(new VariableExpression("y")));
2var derivedfwrtx = f.Accept(new DifferentiatorExpressionVisitor("x")).Accept(new SimplificationExpressionVisitor());
We still obtain the correct result, but this time it is simplified in a natural way.
What comes next ?
We are now able to define any mathematical expression, no matter how complex, and apply various operations to it. However, defining these expressions can be cumbersome, time-consuming, and error-prone. It would be much more efficient if we could define them in a natural way (for instance, simply as $2x+3y$ instead of the difficult C# code) and have a processor that could translate this string into our internal representation. In the next post, we will explore how to achieve this and see a compiler in action.
Engineering a compiler through examples: building a mathematical expression engine - Part 4