Neural Network -- an introduction to function approximation
What is a neural network?
Background
As computer science students, we often hear fancy terms such as image classification, pattern recognition, convolutional neural networks, and machine learning. Sometimes we are so overwhelmed by the jargon in the field that we do not want to explore it ourselves.
However, we are all living in the 21st century, the era of big data, which makes techniques such as machine learning useful and meaningful for data analysis.
Machine learning is a decades-old field that recently became tremendously popular due to the demand for convenient tools that facilitate data analysis. It has two intersecting subfields based on technique, namely statistical machine learning and neural networks. Whereas statistical machine learning involves a heavy load of statistics, neural networks place more emphasis on architecture design and parameter tuning.
Based on the data we have, we can also divide machine learning into supervised and unsupervised learning. Supervised learning figures out patterns with the guidance of a given standard (labeled examples), while unsupervised learning recognizes patterns without any additional information.
Neural networks are one group of algorithms used for machine learning that model data using graphs of artificial neurons; they try to mimic how neurons in the human brain work.
In this post, we will focus mainly on applying neural networks to function approximation. Since we will be told what our outputs should look like, we will discuss supervised neural network techniques only. Moreover, for the sake of illustration, we focus only on the plain artificial neural network architecture.
For those who are interested in Long Short-Term Memory networks, Convolutional Neural Networks, and Generative Adversarial Networks, please check out the links below:
- Long Short-Term Memory: LSTM Blog Stanford CS231n
- Convolutional: CNN Blog Stanford CS231n
- Generative Adversarial: GAN Blog Original Paper
Technical detail
1. Components
A neural network is a layer-by-layer structure. Each layer consists of processing elements (referred to as PEs hereafter) and transfer functions.
Usually, the first layer of a network is called the input layer, the last layer is called the output layer, and the layers in between are hidden layers.
The architecture of a neural network is described by how these layers are combined. For example, a 3-3-1 network is one where the first and second layers each consist of 3 PEs and the output layer has 1 PE. The simplest topology, N-1, has only two layers: one input layer and one output layer.
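As a quick sketch (not from the original post, and assuming NumPy), the 3-3-1 topology above can be captured by two weight matrices whose shapes follow directly from the PE counts:

```python
import numpy as np

# A 3-3-1 network: 3 input PEs, 3 hidden PEs, 1 output PE.
layer_sizes = [3, 3, 1]

# One weight matrix per pair of adjacent layers; entry W[i, j] connects
# the j-th PE of one layer to the i-th PE of the next layer.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((layer_sizes[k + 1], layer_sizes[k]))
           for k in range(len(layer_sizes) - 1)]

print([W.shape for W in weights])  # [(3, 3), (1, 3)]
```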
A useful video explaining neural networks in more detail:
2. Forward propagation
A weight is a mathematical representation of how important a factor is in the neural network. Assuming the identity transfer function, the higher the value of a weight, the more the corresponding input contributes to the output of the current layer.
A weight written as w_ij^(1), for example, is a weight of the first layer, connecting the jth PE to the ith PE in the next layer.
A transfer function of a node defines the output of that node given an input or set of inputs.
The simplest transfer function is the identity function, which maps the output of the previous layer unchanged to the input of the next layer. Researchers and scientists normally choose the sigmoid or tanh function as the transfer function. However, the choice of transfer function is open to discussion, and their relative advantages are discussed here.
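For illustration only (this is not the original post's code), the three transfer functions mentioned above take just a few lines of NumPy:

```python
import numpy as np

def identity(x):
    # Passes the aggregated input through unchanged.
    return x

def sigmoid(x):
    # Squashes the input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes the input into the range (-1, 1).
    return np.tanh(x)
```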
In forward propagation, we first aggregate the results calculated by the previous layer and then apply the transfer function as indicated above. We do this layer by layer toward the output layer.
The formula for forward propagation of a single PE is y_i^(l+1) = f( Σ_j w_ij^(l) · y_j^(l) ), where y_j^(l) is the output of the jth PE in layer l and f is the transfer function. You can apply it iteratively for all layers.
We can also rewrite the summation as a matrix multiplication. We will leave this as an exercise for our readers (the sketch below uses this matrix form).
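Here is a minimal NumPy sketch of a forward pass in matrix form, assuming a sigmoid transfer function and randomly initialised 3-3-1 weights with no bias terms (names and values are illustrative, not the post's original code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, transfer=sigmoid):
    # y^(l+1) = f(W^(l) y^(l)): the matrix form of y_i = f(sum_j w_ij * y_j).
    y = x
    for W in weights:
        y = transfer(W @ y)
    return y

# A 3-3-1 network with randomly initialised weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 3)), rng.standard_normal((1, 3))]

print(forward(np.array([0.5, -1.0, 2.0]), weights))
```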
Useful video for further illustration:
3. Backward propagation
When we reach the output layer in our neural network, we want to see how good our predicted result is compared to the desired output. So we equip the network with an error function (usually mean squared error or edit distance) to evaluate how well we are doing with our current weight values and structure.
We start from the output layer, comparing the desired results with the predicted results, then trace back one layer at a time.
The adjustments to our weights are made through various methods, of which gradient descent is the most popular.
Once we are done adjusting the weights and reach the input layer, we run forward propagation again.
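The sketch below shows one such adjustment for a 3-3-1 network with a sigmoid hidden layer, a linear output, and the mean squared error; it is an illustrative implementation under those assumptions, not the post's original code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, d, W1, W2, lr=0.1):
    # Forward propagation.
    h = sigmoid(W1 @ x)   # hidden-layer outputs
    y = W2 @ h            # predicted output

    # Backward propagation: push the error back one layer at a time.
    delta_out = y - d                              # error at the output layer
    delta_hid = (W2.T @ delta_out) * h * (1 - h)   # error at the hidden layer

    # Gradient-descent weight adjustments.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2, 0.5 * np.sum((y - d) ** 2)      # updated weights and error

# One update for a single training pair on a 3-3-1 network.
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 3)), rng.standard_normal((1, 3))
W1, W2, err = backprop_step(np.array([0.5, -1.0, 2.0]), np.array([1.0]), W1, W2)
print(err)
```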
Useful video for further illustration:
4. Stopping criteria
Before we start training our neural network, we pre-define a stopping criterion for it. For example, we can say that if the error falls below 10^-5 we stop training, or that if we have gone through 10^5 iterations we stop training further.
A stopping criterion is crucial since we need to set a goal for our network to reach. There are various ways to determine when to stop; a minimal sketch combining the two criteria above is shown below.
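In this sketch, `error` is just a placeholder value that decays so the loop runs on its own; a real network would compute it from forward and backward propagation instead:

```python
max_iterations = 10 ** 5   # stop after 10^5 iterations at the latest ...
error_threshold = 1e-5     # ... or as soon as the error drops below 10^-5

error = 1.0
for iteration in range(1, max_iterations + 1):
    # Placeholder for one forward + backward pass over the training data;
    # a real network would return its mean squared error here.
    error *= 0.99

    if error < error_threshold:
        print(f"Stopping: error {error:.1e} after {iteration} iterations.")
        break
else:
    print("Stopping: maximum number of iterations reached.")
```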
5. Implementation
I have a ready-to-go 3-layer neural network implemented in MATLAB for you here. It can easily be translated to Python using TensorFlow or PyTorch. You can also build your own neural networks.
- A tutorial for TensorFlow users: Stanford CS 20SI
- Implement an ANN using PyTorch: PyTorch Implementation
What is function approximation?
In general, a function approximation problem asks us to select a function from a well-defined class that closely matches ("approximates") a target function in a task-specific way.
By the Stone–Weierstrass theorem, every continuous function defined on a closed interval can be uniformly approximated as closely as desired by a polynomial function. We know from above that if we choose a linear function as our transfer function, then each layer of the neural network performs a matrix multiplication.
If we expand the matrix multiplications, it is easy to see that the whole forward propagation computes a polynomial (in fact a linear one) in the inputs; with nonlinear transfer functions such as the sigmoid, the network can represent much richer functions. This suggests that a neural network has the ability to approximate functions satisfying specific requirements (continuity, a bounded domain, etc.).
There is a formal theorem supporting this! The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of n-dimensional Euclidean space, under mild assumptions on the transfer function. If you are interested in this theorem, you may want to read the papers listed in the references at the end of this post.
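In symbols, one common formulation (Cybenko's 1989 version, for a sigmoidal transfer function σ) reads:

```latex
% Universal approximation theorem (Cybenko, 1989), informally:
% for any continuous f on a compact K \subset \mathbb{R}^n and any \varepsilon > 0,
% there exist N, v_i, b_i \in \mathbb{R} and w_i \in \mathbb{R}^n such that
F(x) \;=\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\mathsf T} x + b_i\right)
\qquad\text{satisfies}\qquad
\sup_{x \in K} \bigl|F(x) - f(x)\bigr| \;<\; \varepsilon .
```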
Can you show me an example?
We will illustrate the application of a simple 3-layer artificial neural network to approximating the inverse function f(x) = 1/x.
The source code is uploaded here.
We train and test the network with the parameter settings suggested in the source code.
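Since the MATLAB script and its parameter settings are not reproduced here, the following is a rough NumPy re-creation under assumed settings (10 tanh hidden PEs, linear output, learning rate 0.01, error threshold 10^-3); it is a sketch of the idea, not the original code, and the fit it produces is only approximate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: approximate f(x) = 1/x on [0.1, 1].
x = rng.uniform(0.1, 1.0, size=(200, 1))
y = 1.0 / x

# 3-layer network: 1 input PE, 10 hidden PEs (tanh), 1 linear output PE.
n_hidden = 10
W1 = rng.standard_normal((n_hidden, 1)) * 0.5
b1 = np.zeros((n_hidden, 1))
W2 = rng.standard_normal((1, n_hidden)) * 0.5
b2 = np.zeros((1, 1))

lr = 0.01
for iteration in range(20000):
    # Forward propagation (columns are samples).
    H = np.tanh(W1 @ x.T + b1)        # hidden layer, shape (10, 200)
    Y_hat = W2 @ H + b2               # output layer, shape (1, 200)

    # Mean squared error and the stopping criterion on it.
    err = Y_hat - y.T
    mse = np.mean(err ** 2)
    if mse < 1e-3:
        break

    # Backward propagation.
    delta_out = 2 * err / x.shape[0]               # dE/dY_hat
    delta_hid = (W2.T @ delta_out) * (1 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2

    # Gradient-descent updates.
    W2 -= lr * delta_out @ H.T
    b2 -= lr * delta_out.sum(axis=1, keepdims=True)
    W1 -= lr * delta_hid @ x
    b1 -= lr * delta_hid.sum(axis=1, keepdims=True)

print(f"final MSE: {mse:.4f}")
pred = (W2 @ np.tanh(W1 * 0.5 + b1) + b2).item()
print(f"prediction at x = 0.5: {pred:.3f} (target 2.0)")
```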
What are other applications?
We also use a 3-layer neural network to teach a robotic arm how to draw certain pictures. The result is quite fun!
Check this out: ANN application in Robotics
References
- https://www.tandfonline.com/doi/full/10.1080/20964471.2017.1397411
- http://math.uchicago.edu/~may/REU2016/REUPapers/Gaddy.pdf