Implementing neural networks in C# - Part 6
In this post, we translate theory into practice by implementing a basic neural network in C# and observing how it behaves on the dataset introduced at the start of this series.
To stay consistent with the series and keep the post concise, the code focuses exclusively on this scenario and does not attempt to cover every possible neural network variation.
Creating utility classes
Let's first provide a brief definition for the DataRecord and DataSet classes.
A DataRecord represents a single entry with its corresponding features and a target value.
public class DataRecord
{
    public Dictionary<string, double> Data { get; set; }

    public double Target { get; set; }
}
A DataSet is essentially a collection of DataRecord objects.
public class DataSet
{
    public List<string> Features { get; set; }

    public List<DataRecord> Records { get; set; }

    // ...
}
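To make the shape of these classes concrete, a tiny dataset could be assembled by hand as follows. The feature names X and Y and the values here are purely illustrative; the actual dataset comes from the first post of this series.

var set = new DataSet
{
    Features = new List<string> { "X", "Y" },
    Records = new List<DataRecord>
    {
        // Illustrative records only; the real data is built elsewhere in the series
        new DataRecord
        {
            Data = new Dictionary<string, double> { ["X"] = -0.25, ["Y"] = 0.24 },
            Target = 1.0
        },
        new DataRecord
        {
            Data = new Dictionary<string, double> { ["X"] = 0.92, ["Y"] = 0.86 },
            Target = 0.0
        }
    }
};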
Modeling a neural network
A neural network is primarily characterized by its number of input features, its number of hidden units, its number of output units and the choice of activation functions; the model below fixes a single output unit and exposes the remaining choices as properties.
public class ANN
{
    public double[,] HiddenWeights { get; set; }

    public double[] OutputWeights { get; set; }

    public int NumberOfFeatures { get; set; }

    public int NumberOfHiddenUnits { get; set; }

    public IActivationFunction HiddenActivationFunction { get; set; }

    public IActivationFunction OutputActivationFunction { get; set; }

    public ANN(int numberOfFeatures, int numberOfHiddenUnits, IActivationFunction hiddenActivationFunction, IActivationFunction outputActivationFunction)
    {
        NumberOfFeatures = numberOfFeatures;
        NumberOfHiddenUnits = numberOfHiddenUnits;
        HiddenActivationFunction = hiddenActivationFunction;
        OutputActivationFunction = outputActivationFunction;

        HiddenWeights = new double[NumberOfHiddenUnits, NumberOfFeatures];
        OutputWeights = new double[NumberOfHiddenUnits + 1]; // +1 for the bias unit
    }
}
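The ANN class relies on an IActivationFunction interface that is not reproduced in this post. A minimal definition consistent with how it is used here would look like the sketch below; the exact version from the earlier parts of the series may differ slightly.

public interface IActivationFunction
{
    // Value of the activation function at the given input
    double Evaluate(double input);

    // Derivative of the activation function, needed during backpropagation
    double EvaluateDerivative(double input);
}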
Presently, the sole activation function in use is the sigmoid.
public class SigmoidActivationFunction : IActivationFunction
{
    public double Evaluate(double input)
    {
        return 1 / (1 + Math.Exp(-input));
    }

    public double EvaluateDerivative(double input)
    {
        var temp = Evaluate(input);
        return temp * (1 - temp);
    }
}
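As a quick sanity check, the sigmoid evaluates to 0.5 at the origin, where its derivative reaches its maximum value of 0.25:

var sigmoid = new SigmoidActivationFunction();
Console.WriteLine(sigmoid.Evaluate(0.0));           // 0.5
Console.WriteLine(sigmoid.EvaluateDerivative(0.0)); // 0.25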
Training the model
Having established this framework, we can proceed to implement a straightforward gradient descent method by employing the previously defined backpropagation algorithm.
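Concretely, once backpropagation has produced the partial derivative of the error E with respect to a weight w, the update applied at each step is simply w ← w - ν·∂E/∂w, where ν is a small constant learning rate (nu in the code below).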
We first define an interface.
public interface IANNTrainer
{
    void Train(DataSet set);

    double Predict(DataToPredict record);
}
This interface follows the conventional structure of a machine learning algorithm: one method to train the model and another to make predictions with the trained model.
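The DataToPredict type used by Predict is not defined in this post either; presumably it carries the feature values of a single point to classify, mirroring DataRecord without the target. A minimal sketch under that assumption:

public class DataToPredict
{
    // Feature values keyed by feature name, as in DataRecord.Data
    public Dictionary<string, double> Data { get; set; }
}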
public class GradientDescentANNTrainer : IANNTrainer
{
    private ANN _ann;

    public void Train(DataSet set)
    {
        var numberOfFeatures = set.Features.Count;
        var numberOfHiddenUnits = 20;
        var activationFunction = new SigmoidActivationFunction();

        _ann = new ANN(numberOfFeatures, numberOfHiddenUnits, activationFunction, activationFunction);

        Fit(set);
    }

    public double Predict(DataToPredict record)
    {
        // ...
    }

    #region Private Methods

    private void Fit(DataSet set)
    {
        var numberOfHiddenUnitsWithBiases = _ann.NumberOfHiddenUnits + 1;

        var a = new double[numberOfHiddenUnitsWithBiases];
        var z = new double[numberOfHiddenUnitsWithBiases];
        var delta = new double[numberOfHiddenUnitsWithBiases];

        var nu = 0.005; // Learning rate (constant step size)

        // Initialize the weights randomly
        var rnd = new Random();
        for (var i = 0; i < _ann.NumberOfFeatures; i++)
        {
            for (var j = 0; j < _ann.NumberOfHiddenUnits; j++)
            {
                _ann.HiddenWeights[j, i] = rnd.NextDouble();
            }
        }

        for (var j = 0; j < numberOfHiddenUnitsWithBiases; j++)
            _ann.OutputWeights[j] = rnd.NextDouble();

        for (var n = 0; n < 1000; n++)
        {
            foreach (var record in set.Records)
            {
                // Forward propagate (z[0] is the bias unit feeding the output)
                z[0] = 1.0;
                for (var j = 1; j <= _ann.NumberOfHiddenUnits; j++)
                {
                    a[j] = 0.0;
                    for (var i = 0; i < _ann.NumberOfFeatures; i++)
                    {
                        var feature = set.Features[i];
                        a[j] = a[j] + _ann.HiddenWeights[j - 1, i] * record.Data[feature];
                    }
                    z[j] = _ann.HiddenActivationFunction.Evaluate(a[j]);
                }

                var b = 0.0;
                for (var j = 0; j < numberOfHiddenUnitsWithBiases; j++)
                    b = b + _ann.OutputWeights[j] * z[j];

                var y = _ann.OutputActivationFunction.Evaluate(b);

                // Evaluate the error for the output
                var d = y - record.Target;

                // Backpropagate this error
                for (var j = 0; j < numberOfHiddenUnitsWithBiases; j++)
                    delta[j] = d * _ann.OutputWeights[j] * _ann.HiddenActivationFunction.EvaluateDerivative(a[j]);

                // Evaluate and utilize the required derivatives
                for (var j = 0; j < numberOfHiddenUnitsWithBiases; j++)
                {
                    _ann.OutputWeights[j] = _ann.OutputWeights[j] - nu * d * z[j];
                }

                for (var j = 1; j <= _ann.NumberOfHiddenUnits; j++)
                {
                    for (var i = 0; i < _ann.NumberOfFeatures; i++)
                    {
                        var feature = set.Features[i];
                        _ann.HiddenWeights[j - 1, i] = _ann.HiddenWeights[j - 1, i] - nu * delta[j] * record.Data[feature];
                    }
                }
            }
        }
    }

    #endregion
}
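The Predict method is left as a placeholder above. A straightforward implementation would simply repeat the forward pass used in Fit; the sketch below assumes the trainer also keeps the feature names seen during training in a hypothetical _features field (set in Train via _features = set.Features).

private List<string> _features; // Hypothetical field, filled in Train

public double Predict(DataToPredict record)
{
    var z = new double[_ann.NumberOfHiddenUnits + 1];
    z[0] = 1.0; // Bias unit feeding the output

    // Forward pass through the hidden layer
    for (var j = 1; j <= _ann.NumberOfHiddenUnits; j++)
    {
        var a = 0.0;
        for (var i = 0; i < _ann.NumberOfFeatures; i++)
            a += _ann.HiddenWeights[j - 1, i] * record.Data[_features[i]];
        z[j] = _ann.HiddenActivationFunction.Evaluate(a);
    }

    // Output unit
    var b = 0.0;
    for (var j = 0; j <= _ann.NumberOfHiddenUnits; j++)
        b += _ann.OutputWeights[j] * z[j];

    return _ann.OutputActivationFunction.Evaluate(b);
}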
The Fit method uses backpropagation to compute the required derivatives, following the steps outlined in the previous post.
The gradient descent method implemented here is deliberately simplistic: we descend along the gradient with a constant step size at each iteration. The stopping criterion is equally elementary, the algorithm simply terminating after 1000 passes over the data. In a real-world setting, we would prefer a more sophisticated optimization algorithm, such as L-BFGS, together with a proper stopping criterion.
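As an illustration of the latter, a simple error-based stopping criterion (not part of the original implementation) could look like the sketch below, where RunEpoch is a hypothetical helper that performs one full pass over the data exactly as in the body of Fit and returns the summed squared error of that pass.

// Hypothetical helper: one full pass of forward/backward propagation
// and weight updates over the whole dataset, as in Fit.
private double RunEpoch(DataSet set) { /* ... */ return 0.0; }

private void FitUntilConverged(DataSet set, int maxEpochs = 10000, double tolerance = 1e-6)
{
    var previousError = double.MaxValue;
    for (var n = 0; n < maxEpochs; n++)
    {
        var error = RunEpoch(set);

        // Stop as soon as a full pass no longer improves the training error noticeably
        if (Math.Abs(previousError - error) < tolerance)
            break;

        previousError = error;
    }
}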
Running the program
We now evaluate the algorithm by running it, with the hidden layer size configured in the trainer above, on the dataset introduced at the start of this series (a recap is given below).
In contrast to logistic regression, where we had to adjust the basis functions by hand to obtain accurate predictions, we can now observe whether the neural network still requires such manual adaptation.
X | Y | Predicted class
---|---|---
-0.25 | 0.24 | 1
0.45 | -0.72 | 1
0.92 | 0.86 | 0
-0.5 | -0.55 | 0
Our neural network handles this non-linearly separable data without any need to embed it in another space beforehand. This highlights a significant advantage of such methods: their capacity to adapt to intricate configurations without manual intervention.
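For completeness, a minimal driver wiring these pieces together might look like the following; LoadDataSet is a hypothetical placeholder for however the dataset from the first post is actually built, and the point being predicted is taken from the recap table above.

// LoadDataSet() is a hypothetical stand-in for building the dataset of part 1
DataSet set = LoadDataSet();

var trainer = new GradientDescentANNTrainer();
trainer.Train(set);

var output = trainer.Predict(new DataToPredict
{
    Data = new Dictionary<string, double> { ["X"] = -0.25, ["Y"] = 0.24 }
});

// The sigmoid output lies in (0, 1); thresholding at 0.5 yields the predicted class
Console.WriteLine(output >= 0.5 ? 1 : 0);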
Final thoughts
The implemented code leaves room for improvement: exploring different activation functions to assess their impact on accuracy, or adjusting the number of hidden units, are natural refinements. These aspects, however, are separate topics that would merit their own series dedicated to tuning neural networks.
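As an example of the first point, a hyperbolic tangent activation could be tried for the hidden layer (keeping the sigmoid for the output, whose targets are 0 and 1). The class below is a sketch and was not part of the original series.

public class TanhActivationFunction : IActivationFunction
{
    public double Evaluate(double input)
    {
        return Math.Tanh(input);
    }

    public double EvaluateDerivative(double input)
    {
        // d/dx tanh(x) = 1 - tanh(x)^2
        var t = Math.Tanh(input);
        return 1 - t * t;
    }
}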
If you wish to delve deeper into this topic, consider the following books, which cover all the concepts emphasized in this series and explore more advanced ones.
Deep Learning (Goodfellow, Bengio, Courville)
Deep Learning: Foundations and Concepts (Bishop, Bishop)
Do not hesitate to contact me should you require further information.