Implementing a simple garbage collector in C# - Part 3

In this article, we will establish and implement in C# the concepts we previously mentioned, with a specific emphasis on how the garbage collector will be coded.

One should envision a scenario where a program is initially written in source code and subsequently compiled into an intermediate representation, manifested as a sequential list of instructions. We will assume that the IL compiler has generated this list of instructions. An example of such a list is provided below.

1thread1;CREATE_THREAD;
2thread1;PUSH_ON_STACK;Jubilant
3thread2;CREATE_THREAD;
4thread1;PUSH_ON_STACK;Radiant
5thread1;POP_FROM_STACK

Each instruction consists of three components. The first indicates which thread is affected by the instruction, the second describes the operation to be performed, and the third (optional), provides a value to complement the operation. In the provided example, the compiler is initially instructed to create a thread named "thread1" and then push a variable named "Jubilant" onto the stack of this thread. The third instruction calls for the creation of another thread named "thread2" and so forth. The fifth instruction attempts to pop the stack of thread1.

This list of instructions can be envisioned as an intermediate representation (IR), as illustrated in Compilers: Principles, Techniques and Tools.

Our current task is to implement an execution engine (runtime) capable of handling these instructions, with a particular focus on memory allocation and deallocation.

Implementing the runtime

Without further delay, here is how the runtime is implemented in our code.

 1public class Runtime
 2{
 3    private List<Instruction> _instructions;
 4    private Collectors.Interfaces.GarbageCollector _collector;
 5
 6    public Runtime(List<Instruction> instructions, string collectorStrategy)
 7    {
 8        _instructions = instructions;
 9        _collector = GetCollector(collectorStrategy);
10    }
11
12    #region Public Methods
13
14    public void Run()
15    {
16        foreach (var instruction in _instructions)
17        {
18            _collector.ExecuteInstruction(instruction);
19            _collector.PrintMemory();
20        }
21    }
22
23    #endregion
24}

The execution engine possesses the list of instructions to execute and a garbage collector (described below) to be utilized during memory deallocation. It also includes a Run method called by the main thread.

 1public class Program
 2{
 3    static void Main(string[] args)
 4    {
 5        var instructions = new List<Instruction>();            
 6        var contents = File.ReadAllLines(AppContext.BaseDirectory + "/instructions.txt");
 7        foreach (var line in contents)
 8        {
 9            var c = line.Split(';');
10            var intruction = new Instruction(c[0], c[1], c[2]);
11            instructions.Add(intruction);
12        }
13        
14        var runtime = new Runtime(instructions, "MARK_AND_COMPACT");
15
16        runtime.Run();
17    }
18}

The Runtime class also requires a memory collection strategy to apply. As we will explore, there are several possibilities for memory collection, and designing an effective garbage collector is indeed a subtle art.

Defining the GarbageCollector interface

The Runtime class requires a GarbageCollection instance to operate. This instance implements the GarbageCollection abtract class, whose methods are detailed below.

 1public abstract class GarbageCollector
 2{ 
 3    public abstract void ExecuteInstruction(Instruction instruction);
 4
 5    public abstract int Allocate(string field);
 6
 7    public abstract void Collect();
 8
 9    public abstract void PrintMemory();        
10}

In essence, the GarbageCollector class presented here serves as an orchestrator to execute the list of instructions and manage memory allocation and collection. In real-world implementations, these abstractions would typically be kept separate, but for simplicity's sake, the goal here is to streamline the representation.

The PrintMemory method is merely a helper function designed to visualize the content of the heap during debugging.

Defining a thread

A thread refers to the smallest unit of execution within a process. A process is an independent program that runs in its own memory space, and within a process, multiple threads can exist. Each thread shares the same resources, such as memory, with other threads in the same process.

  • Threads enable concurrent execution of tasks, allowing multiple operations to be performed simultaneously. They operate independently, with their own program counter, registers, and stack, but they share the same memory space. Threads within the same process can communicate and synchronize with each other.

  • Threads are widely used in modern software development to improve performance, responsiveness, and resource utilization in applications ranging from desktop software to server applications and distributed systems.

 1 public class RuntimeThread
 2 {
 3     private Collectors.Interfaces.GarbageCollector _collector;
 4
 5     private Stack<RuntimeStackItem> _stack;
 6     private Dictionary<int, RuntimeStackItem> _roots;
 7     private int _stackSize;
 8     private int _index;
 9
10     public string Name { get; set; }
11
12     public RuntimeThread(Collectors.Interfaces.GarbageCollector collector, int stackSize, string name)
13     {
14         _collector = collector;
15
16         _stack = new Stack<RuntimeStackItem>(stackSize);
17         _roots = new Dictionary<int, RuntimeStackItem>();
18         _stackSize = stackSize;
19         _index = 0;
20
21         Name = name;
22     }
23
24     public Dictionary<int, RuntimeStackItem> Roots => _roots;        
25
26     public void PushInstruction(string value)
27     {
28         _index++;
29         if (_index > _stackSize) throw new StackOverflowException();
30
31         var address = _collector.Allocate(value); 
32         var stackItem = new RuntimeStackItem(address);
33
34         _stack.Push(stackItem);
35         _roots.Add(address, stackItem);
36     }
37
38     public void PopInstruction()
39     {
40         _index--;
41
42         var item = _stack.Pop();
43         _roots.Remove(item.StartIndexInTheHeap);
44     }
45 }

A thread is characterized by a name and executes its own list of instructions. The crucial point is that threads share the same memory. This translates into the fact that in a program, there can be multiple threads, and consequently, multiple stacks, but there is generally one heap shared among all threads.

In the provided example, when a thread executes an instruction, it is placed on its stack, and any potential reference to the heap is retained in an internal dictionary. This mechanism ensures that each thread maintains its own stack while references to the shared heap are tracked to manage memory allocation and deallocation effectively.

Defining the stacks and the heap

As mentioned earlier, stacks and heap are implementation details, so their code is deferred to the concrete classes of GarbageCollector. It could still be informative, nonetheless, to provide a brief description of them because, as we observed, these concepts are universally adopted.

Implementing the stack

In C#, there already exists a data structure for stacks. For simplicity, we will use it, especially since it is well-suited to our requirements.

We will push instances of a RuntimeStackItem class onto this stack. RuntimeStackItem simply contains a reference to an address in the heap.

1public class RuntimeStackItem
2{
3    public int StartIndexInTheHeap { get; set; }
4
5    public RuntimeStackItem(int startIndexInTheHeap)
6    {
7        StartIndexInTheHeap = startIndexInTheHeap;
8    }
9}

Implementing the heap

A heap must represent a memory area, and we will simply model each memory cell with a char class. With this abstraction, a heap is nothing more than a list of cells that can be allocated. It also includes additional plumbing code, such as pointers, to manage which cells are occupied or can be reclaimed.

 1public class RuntimeHeap
 2{
 3    public RuntimeHeap(int size)
 4    {
 5        Size = size;
 6        Cells = new RuntimeHeapCell[size];
 7        Pointers = new List<RuntimeHeapPointer>();
 8
 9        for (var i = 0; i < size; i++)
10        {
11            Cells[i] = new RuntimeHeapCell();
12        }
13    }
14
15    #region Properties
16
17    public int Size { get; set; }
18
19    public RuntimeHeapCell[] Cells { get; set; }
20
21    public List<RuntimeHeapPointer> Pointers { get; set; }
22
23    #endregion
24}

Our task now is to define concrete classes of GarbageCollection to observe a garbage collector in action. We will start this in the next article with the simplest possible implementation, where there is no garbage collection policy.

Let's continue here.