Implementing neural networks in C# - Part 3

In this post, we will explore how logistic regression can handle non-linearly separable data and take a step towards automating this process.

In the previous post, we observed that the logistic regression algorithm struggled with non-linearly separable data.

How can we overcome this issue?

The idea is not revolutionary: instead of working directly with the original inputs, we first apply a fixed nonlinear transformation $f$ to them.

The logistic regression algorithm is then applied to the new dataset.
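In code, this two-step idea is straightforward. Below is a minimal C# sketch of the transformation step; the `Map` helper and the dataset layout are illustrative choices for this post, not code from the series:

```csharp
using System;
using System.Linq;

public static class FeatureMapping
{
    // Projects every input vector through a fixed nonlinear transformation f.
    // Logistic regression is then trained on the result, not on the raw inputs.
    public static double[][] Map(double[][] inputs, Func<double[], double[]> f)
        => inputs.Select(f).ToArray();
}
```

The learning algorithm itself is unchanged; only the data it sees is different.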

Information 1

The approach of mapping the input space into another space, where the data becomes linearly separable or at least more tractable, is the foundation of kernel methods (from which support vector machines and similar techniques are derived). Correctly applied, this process can be highly fruitful.

Information 2

In the figure above, the two spaces are drawn with the same dimension for convenience of representation, but in practice there is no reason for them to be the same.

How should we proceed in our case?

This process might sound a bit theoretical and abstract at first, but we will see how to put it into practice on the example from the previous post. To do so, we will apply the following function to the inputs:

$$f(X,Y)=XY$$

Here are the results of this mapping for some values:

| $X$ | $Y$ | $f(X,Y)$ |
|---|---|---|
| -0.9 | 0.5664324069 | -0.509789165 |
| 0.89 | -0.4208036774 | -0.374515273 |
| 0.18 | 0.9626599405 | 0.173267893 |
| ... | ... | ... |
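In C#, this particular mapping is a one-line function. The sketch below applies it to the sample points above (note that the coordinates in the table are displayed rounded, so the last digits of the computed products may differ slightly):

```csharp
using System;

public static class MappingDemo
{
    // The fixed nonlinear transformation: f(X, Y) = X * Y,
    // mapping the 2D input space to a 1D feature space.
    public static double F(double x, double y) => x * y;

    public static void Main()
    {
        var samples = new[]
        {
            (X: -0.9,  Y: 0.5664324069),
            (X: 0.89,  Y: -0.4208036774),
            (X: 0.18,  Y: 0.9626599405),
        };

        foreach (var (x, y) in samples)
            Console.WriteLine($"{x,6}  {y,14}  {F(x, y):F9}");
    }
}
```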
Information

In our example, the final space is 1-dimensional, whereas the input space was 2-dimensional. This is by no means a general rule: the final space can have many more dimensions than the input space.

The data is now linearly separable in the new space.

Applying the logistic regression algorithm to the new dataset

Once the mapping has been applied to the inputs, it suffices to run logistic regression on the new dataset, and we can hope to obtain accurate predictions. Naturally, the points we want to classify must themselves first be mapped to the new space before being evaluated by the trained model.

| $X$ | $Y$ | $f(X,Y)$ | prediction |
|---|---|---|---|
| -0.25 | 0.24 | -0.06 | 1 |
| 0.05 | -0.02 | -0.001 | 1 |
| 0.92 | 0.86 | 0.7912 | 0 |
| -0.5 | -0.55 | 0.275 | 0 |

And now the predictions are perfectly accurate.
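To make the whole pipeline concrete, here is a self-contained C# sketch. The `LogisticRegression` class below is only a minimal stand-in for the implementation built in the previous posts (its `Train`/`Predict` API, the hyperparameters, and the four training points are assumptions made for this example):

```csharp
using System;
using System.Linq;

// Minimal stand-in for the logistic regression of the previous posts:
// one weight per feature plus a bias, trained by plain gradient descent.
public class LogisticRegression
{
    private double[] _weights = Array.Empty<double>();
    private double _bias;

    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    private double Probability(double[] input) =>
        Sigmoid(_bias + input.Select((v, j) => v * _weights[j]).Sum());

    public void Train(double[][] inputs, int[] labels,
                      double learningRate = 0.5, int epochs = 5000)
    {
        _weights = new double[inputs[0].Length];
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            for (int i = 0; i < inputs.Length; i++)
            {
                double error = Probability(inputs[i]) - labels[i];
                for (int j = 0; j < _weights.Length; j++)
                    _weights[j] -= learningRate * error * inputs[i][j];
                _bias -= learningRate * error;
            }
        }
    }

    public int Predict(double[] input) => Probability(input) >= 0.5 ? 1 : 0;
}

public static class Program
{
    public static void Main()
    {
        // Illustrative XOR-like data: class 1 when X and Y have opposite signs.
        double[][] inputs =
        {
            new[] { -0.9, 0.57 }, new[] { 0.89, -0.42 },
            new[] { 0.18, 0.96 }, new[] { -0.5, -0.55 },
        };
        int[] labels = { 1, 1, 0, 0 };

        // Step 1: map the 2D inputs into the 1D feature space f(X, Y) = X * Y.
        double[][] mapped = inputs.Select(p => new[] { p[0] * p[1] }).ToArray();

        // Step 2: train logistic regression on the mapped dataset.
        var model = new LogisticRegression();
        model.Train(mapped, labels);

        // Step 3: new points must also be mapped before being evaluated.
        double[][] newPoints = { new[] { -0.5, 0.55 }, new[] { 0.92, 0.86 } };
        foreach (var p in newPoints)
            Console.WriteLine($"({p[0]}, {p[1]}) -> {model.Predict(new[] { p[0] * p[1] })}");
    }
}
```

Note that the model never sees the original 2D coordinates, only their product: the mapping is the sole difference from the previous post.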

What are the drawbacks of this approach?

The method we developed above appears well-suited to data with irregular shapes: it essentially consists of performing a mapping between spaces, as if by magic, so that the data becomes linearly separable in the final space.

However, the challenge lies in identifying this mapping. In our toy example, the transformation was relatively easy to deduce because we could visualize the data graphically. But real-world scenarios involve many more dimensions and far more complex data that may be difficult or impossible to visualize. This is the major issue with this process: we can never be certain that our data is linearly separable in the final space.

Does this mean that the current approach is not productive? Absolutely not! We can keep the conceptual idea of mapping data to another space; the key lies in refining how this mapping is found. Imagine automatically discovering this mapping instead of relying on guesswork: wouldn't that be a significant advancement? This automation exists: enter neural networks and deep learning.

Implementing neural networks in C# - Part 4