Implementing a simple garbage collector in C# - Part 7

In this article, we delve into more advanced concepts in garbage collection by introducing heap partitions.

What is heap partition ?

A heap partition refers to a segmentation or division of the heap into distinct sections, each subject to separate memory management strategies or garbage collection algorithms. This approach is commonly employed to optimize memory management by tailoring the collection strategy to the characteristics and usage patterns of objects within each partition. Generational garbage collection, a notable example, often involves dividing the heap into two main generations: the young generation and the old generation. The young generation is typically collected more frequently due to the higher likelihood of short-lived objects, while the old generation undergoes collection less frequently. This partitioning allows for more targeted and efficient garbage collection, improving overall system performance.

This technique is based on the consideration that it's faster to compact the memory for a portion of the managed heap than for the entire managed heap.

To illustrate our point, we will implement a generational garbage collection where objects are partitioned by age. Objects that have just been allocated will be placed in the region of the heap named "generation0", while long-lived objects will be promoted to a region named "generation1". Other partitions are possible; for example, we could partition the heap by size: objects larger than a certain threshold are allocated into a separate Large Object Heap (LOH), whereas smaller objects are placed in a Small Object Heap.

Focus on the generational garbage collector

As stated in The Garbage Collection Handbook, the hypothesis that most objects die young appears to be widely valid, regardless of programming paradigm or implmentation language. This provides the opportunity to partition the heap by age, as illustrated in the figure below.

If a collection becomes necessary, the entire heap will not be scanned indiscriminately. Instead, the focus will be on the portion of the heap most likely to contain a low rate of survivors (generation0 or objects in red in the picture above). This strategic approach enables us to avoid collecting the entire heap, thereby enhancing performance and minimizing pause times attributable to garbage collection.

Implementing the ExecuteInstruction method

This method remains identical to what was seen in the last article.

 1public override void ExecuteInstruction(Instruction instruction)
 2{
 3    if (instruction.Operation == "CREATE_THREAD")
 4    {
 5        AddThread(instruction.Name);
 6        return;
 7    }
 8
 9    var thread = _threads.FirstOrDefault(x => x.Name == instruction.Name);
10    if (thread == null) return;
11
12    if (instruction.Operation == "PUSH_ON_STACK")
13    {
14        thread.PushInstruction(instruction.Value);
15    }
16    if (instruction.Operation == "POP_FROM_STACK")
17    {
18        thread.PopInstruction();
19    }
20}

Implementing the Allocate method

If there is insufficient memory, the garbage collector attempts to reclaim memory only for the generation0 and then retries the allocation process. If there is still not enough space, a collection for generation1 is achieved. If there is still not enough space, an exception is raised.

 1public override int Allocate(string field)
 2{
 3    var r = AllocateHeap(field);
 4    if (r == -1)
 5    {
 6        Collect("gen0");
 7        r = AllocateHeap(field);
 8
 9        if (r == -1)
10        {
11            Collect("gen1");
12            r = AllocateHeap(field);
13        }
14    }
15
16    if (r == -1) throw new InsufficientMemoryException();
17
18    return r;
19}
 1private int AllocateHeap(string field)
 2{
 3    var size = _heap.Size;
 4    var cells = field.ToArray();
 5    var requiredSize = cells.Length;
 6
 7    var begin = _currentPointerInTheHeap;
 8
 9    // Do not continue if there is not enough space.
10    if ((size - begin) < requiredSize) return -1;
11
12    for (var i = begin; i < begin + requiredSize; i++)
13    {
14        _heap.Cells[i].Cell = cells[i - begin];
15    }
16
17    var pointer = new RuntimeHeapPointer(begin, requiredSize);
18    _heap.Pointers.Add(pointer);
19
20    _currentPointerInTheHeap = _currentPointerInTheHeap + requiredSize;
21
22    return begin;
23}

Implementing the Collect method

The Collect method serves as the core of our algorithm, gathering memory from a specific portion of the heap based on the provided argument.

1public override void Collect(string partition)
2{
3    if (partition == "gen0") CollectGen0();
4    if (partition == "gen1") CollectGen1();
5}
 1private void CollectGen0()
 2{
 3    MarkFromRoots("gen0");
 4    CompactGen0();
 5}
 6
 7private void CollectGen1()
 8{
 9    MarkFromRoots("gen1");
10    CompactGen1();
11}

Here, we employ the mark-and-compact algorithm for both generations, but it is entirely feasible to implement a distinct one for each (for instance, using mark-and-sweep for generation1 and mark-and-compact for generation0).

The MarkFromRoots method closely resembles the one previously presented, making it unnecessary to reproduce it here. Our focus will be on highlighting the CompactGen1 method.

Merely marking and compacting objects in generation1 does not suffice for compacting the entire heap. It is imperative to additionally adjust pointers within generation0. This particularity of the method necessitates careful consideration.

 1private void CompactGen1()
 2{
 3    var cells = _heap.Cells;
 4    var pointers = _heap.Pointers.Where(x => x.Metadata == "gen1").ToList();
 5    foreach (var pointer in pointers.Where(x => !x.IsMarked))
 6    {
 7        var address = pointer.StartCellIndex;
 8        for (var i = address; i < address + pointer.AllocationSize; i++)
 9        {
10            cells[i].Cell = '\0';
11        }
12    }
13
14    var list = new List<RuntimeHeapPointer>();
15    foreach (var pointer in pointers.Where(x => x.IsMarked))
16    {
17        pointer.SetMarked(false);
18        list.Add(pointer);
19    }
20
21    var offset = 0;
22    foreach (var pointer in list)
23    {
24        var begin = pointer.StartCellIndex;
25        if (begin == 0)
26        {
27            offset = offset + pointer.AllocationSize;
28            continue;
29        }
30
31        // Update roots in the stack
32        // This code is perfectible.
33        var thread = _threads.FirstOrDefault(x => x.Roots.Any(t => t.Key == begin));
34        var root = thread.Roots.FirstOrDefault(x => x.Key == begin);
35        root.Value.StartIndexInTheHeap = offset;
36        thread.Roots[offset] = root.Value; thread.Roots.Remove(begin);
37
38        // Update pointers in the heap
39        pointer.StartCellIndex = offset;
40        for (var i = begin; i < begin + pointer.AllocationSize; i++)
41        {
42            cells[offset + i - begin].Cell = cells[i].Cell;
43            cells[i].Cell = '\0';
44
45        }
46
47        offset = offset + pointer.AllocationSize;
48    }
49
50    var listGen0 = _heap.Pointers.Where(x => x.Metadata == "gen0").ToList();
51    var offsetGen0 = offset;
52    foreach (var pointer in listGen0)
53    {
54        var begin = pointer.StartCellIndex;
55        if (begin == 0)
56        {
57            offsetGen0 = offsetGen0 + pointer.AllocationSize;
58            continue;
59        }
60
61        // Update roots in the stack
62        // This code is perfectible.
63        var thread = _threads.FirstOrDefault(x => x.Roots.Any(t => t.Key == begin));
64        var root = thread.Roots.FirstOrDefault(x => x.Key == begin);
65        root.Value.StartIndexInTheHeap = offsetGen0;
66        thread.Roots[offsetGen0] = root.Value; thread.Roots.Remove(begin);
67
68        // Update pointers in the heap
69        pointer.StartCellIndex = offsetGen0;
70        for (var i = begin; i < begin + pointer.AllocationSize; i++)
71        {
72            cells[offsetGen0 + i - begin].Cell = cells[i].Cell;
73            cells[i].Cell = '\0';
74
75        }
76
77        offsetGen0 = offsetGen0 + pointer.AllocationSize;
78    }
79
80    _heap.Pointers = list.Concat(listGen0).ToList();
81    _startAddressForGen0 = offsetGen0;
82    _currentPointerInTheHeap = offsetGen0;
83}

We can observe that the implementation of the Collect method is significantly more complex than its counterparts in other algorithms. This is the distinctive feature of heap partitioning: it offers improved performance but comes at the cost of increased coding complexity.

Running the programming

We employ a specific set of instructions to observe the garbage collector in action. Specifically, we initiate the collection of generation1.

 1thread1;CREATE_THREAD;
 2thread1;PUSH_ON_STACK;Jubilant
 3thread2;CREATE_THREAD;
 4thread1;PUSH_ON_STACK;Radiant
 5thread1;POP_FROM_STACK;
 6thread2;PUSH_ON_STACK;Harmony
 7thread1;PUSH_ON_STACK;Frenzy
 8thread1;PUSH_ON_STACK;Luminous
 9thread1;PUSH_ON_STACK;So
10thread1;POP_FROM_STACK;
11thread2;PUSH_ON_STACK;Serendipity
12thread1;PUSH_ON_STACK;Enigmatic
13thread2;POP_FROM_STACK;
14thread1;PUSH_ON_STACK;Cascade
15thread2;POP_FROM_STACK;
16thread2;PUSH_ON_STACK;GarbageCollector
17thread1;POP_FROM_STACK;
18thread1;POP_FROM_STACK;
19thread2;PUSH_ON_STACK;Three
20thread1;POP_FROM_STACK;
21thread1;POP_FROM_STACK;
22thread2;PUSH_ON_STACK;GenerationalGarbageCollector

In fact, almost all the objects in thread1 are deallocated. As there were long-lived objects among them, they will be promoted to generation1 and will subsequently need to be reclaimed.

In this particular case, Harmony existed in the heap almost from the beginning of the program. It is reclaimed at the end of the process when we attempt to allocate a very long word (GenerationalGarbageCollection). Once again, the key point here is that during the collections initiated by the program, only a portion of the heap was scanned.

What about the .NET framework ?

  • The .NET garbage collector uses a generational approach, dividing objects into three generations (0, 1, and 2) based on their lifetime. Younger objects (in generation 0) are collected more frequently, while older objects (in generations 1 and 2) are collected less frequently. It employs a mark-and-compact algorithm to identify and collect unreachable objects.

  • The heap is further divided into two main areas: the Small Object Heap (SOH) and the Large Object Heap (LOH). These divisions are made to optimize memory management for different types of objects.

Small Object Heap (SOH)

  • The Small Object Heap is designed to manage small and short-lived objects efficiently. Objects with a size less than or equal to 85,000 bytes are allocated in the SOH. The SOH is further divided into three generations (0, 1, and 2) based on the age of the objects.
  • The garbage collector collects Generation 0 more frequently than older generations, as many short-lived objects do not survive for long.

Large Object Heap (LOH)

  • The Large Object Heap is specifically designated for larger objects, typically those with a size greater than 85,000 bytes. Examples include large arrays or data structures.
  • Unlike the Small Object Heap, the Large Object Heap does not have a generational division. All objects in the LOH are treated as a single generation. So, because large objects tend to have longer lifetimes, garbage collections in the LOH are less frequent. When a collection occurs, it may involve pausing the application for a longer duration compared to collections in the SOH.

Objects in the SOH benefit from the generational approach, which optimizes the collection process for short-lived objects. The LOH is suitable for objects that are expected to have a longer lifespan or are inherently large. The division between the SOH and LOH helps to balance the efficiency of garbage collection for different types of objects and their usage patterns.

Final thoughts

In this article, our aim was to provide a comprehensible overview of what a garbage collector is and how it can be implemented progressively. In reality, designing a garbage collector is a nuanced art that requires a profound understanding of various domains within computer science. This post has presented only the fundamental guidelines employed in modern collectors, overlooking many subtleties. The intent was to be didactic and offer a broad understanding of the topic.

If you wish to delve deeper into this topic, acquire the following book, which encompasses all the concepts emphasized in this series and delves into more advanced ones. Topics such as concurrency are covered in several chapters, providing a comprehensive guide to garbage collection in the present day.

The Garbage Collection Handbook

Do not hesitate to contact me shoud you require further information.