# Learning Feedforward Neural Networks Through XOR
To get a clearer picture of feedforward neural networks, we will work through a simple XOR example. Our goal is to find a function $f(x; \theta)$ that matches the true model $y = f^*(x)$.
In this simple example, we will not be concerned with statistical generalization; we only want the network to fit the four training points $X = \{[0, 0]^\top, [0, 1]^\top, [1, 0]^\top, [1, 1]^\top\}$.
First, we treat this problem as a linear regression problem with a mean squared error (MSE) loss function,

$$J(\theta) = \frac{1}{4} \sum_{x \in X} \left( f^*(x) - f(x; \theta) \right)^2.$$

Assume our model is $f(x; w, b) = x^\top w + b$. We can use the normal equations to minimize the loss in closed form, which yields $w = 0$ and $b = \frac{1}{2}$, so the model outputs $0.5$ for every input. We know this output is incorrect: we want to get 1 when the two inputs are different and 0 when they are the same. A linear model simply cannot represent XOR. When $x_1 = 0$, the output must increase with $x_2$, but when $x_1 = 1$, it must decrease with $x_2$; a linear model applies one fixed coefficient to $x_2$ and cannot do both.
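To make this concrete, here is a minimal NumPy sketch of the normal-equation solution. The variable names, and the trick of absorbing $b$ into an augmented weight vector, are my own choices, not from the text:

```python
import numpy as np

# The four XOR input points, one per row, and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Augment X with a column of ones so the bias b is learned jointly with w.
X_aug = np.hstack([X, np.ones((4, 1))])

# Normal equations: theta = (X^T X)^{-1} X^T y, with theta = [w1, w2, b].
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:2], theta[2]

print(w)              # -> [0. 0.]
print(b)              # -> 0.5
print(X_aug @ theta)  # -> [0.5 0.5 0.5 0.5], the model outputs 0.5 everywhere
```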
If we introduce a simple FNN with one hidden layer, the problem can be solved. The new model is $f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x))$, where $h = f^{(1)}(x; W, c)$ computes the hidden layer and $y = f^{(2)}(h; w, b)$ computes the output. Because $f^{(2)}$ is still a linear function of $h$, $f^{(1)}$ cannot be linear: the composition of linear functions is itself linear, so the output would again be linear, which we know is incorrect. Clearly, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. We use that strategy here, defining $h = g(W^\top x + c)$, where $W$ provides the weights of a linear transformation and $c$ the biases.
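To see why a linear $f^{(1)}$ cannot help, here is a small sketch (my own illustration, with made-up parameter values) showing that stacking two affine layers collapses into a single affine map:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, c1 = rng.normal(size=(2, 2)), rng.normal(size=2)  # "hidden" affine layer
w2, b2 = rng.normal(size=2), rng.normal()             # output affine layer

def two_affine_layers(x):
    return w2 @ (W1.T @ x + c1) + b2

# The composition is itself affine: y = (W1 w2)^T x + (w2^T c1 + b2),
# so it can do no more than the single linear model above.
w_eff, b_eff = W1 @ w2, w2 @ c1 + b2
x = rng.normal(size=2)
print(np.isclose(two_affine_layers(x), w_eff @ x + b_eff))  # True
```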
Previously, to describe a linear regression model, we used a vector of weights $w$ and a scalar bias parameter $b$ to describe an affine transformation from an input vector to an output scalar. Now, we describe an affine transformation from a vector $x$ to a vector $h$, so an entire vector of bias parameters is needed. The activation function $g$ is typically chosen to be a function that is applied element-wise, with $h_i = g(x^\top W_{:, i} + c_i)$. In modern neural networks, the default recommendation is to use the rectified linear unit, or ReLU, defined by the activation function $g(z) = \max\{0, z\}$.
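As a quick sketch of the element-wise convention (the variable names and layer sizes are my own choices):

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied element-wise: g(z) = max{0, z}."""
    return np.maximum(0, z)

rng = np.random.default_rng(0)
W, c = rng.normal(size=(2, 3)), rng.normal(size=3)  # 2 inputs -> 3 hidden units
x = rng.normal(size=2)

h = relu(W.T @ x + c)                                   # vectorized: g(W^T x + c)
h_i = [max(0, x @ W[:, i] + c[i]) for i in range(3)]    # unit by unit
print(np.allclose(h, h_i))  # True: g acts independently on each hidden unit
```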
We can now specify our complete network as

$$f(x; W, c, w, b) = w^\top \max\{0, W^\top x + c\} + b.$$

Here the $\max$ function provides the hidden layer's nonlinearity.
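In code, the complete network is essentially a one-liner. Here is a minimal NumPy sketch (the function name `xor_net` is my own):

```python
import numpy as np

def xor_net(x, W, c, w, b):
    """Complete network: f(x; W, c, w, b) = w^T max{0, W^T x + c} + b."""
    h = np.maximum(0, W.T @ x + c)  # hidden layer: affine map followed by ReLU
    return w @ h + b                # output layer: linear regression on h
```

The parameter values that make this network compute XOR are given next.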
We can then specify a solution to the XOR problem. Let

$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \qquad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix},$$

and $b = 0$.
# How This Model Handles the Inputs
Let the input be the design matrix

$$X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix},$$

where each row represents one input point. The first step of the network is to multiply by the weight matrix:

$$XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}.$$
Then, adding the bias $c$ to every row, the output is

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}.$$
After applying the rectified linear transformation, we get

$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}.$$
Finally, we multiply by the weight vector $w$ (the bias $b$ is zero), and the output is

$$\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}.$$
The network has obtained the correct answer for every example in the batch.
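The entire hand computation above can be reproduced in a few lines of NumPy, with every intermediate matching the matrices shown (a sketch; the variable names are mine):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # design matrix
W = np.array([[1., 1.], [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])
b = 0.0

XW = X @ W              # [[0,0],[1,1],[1,1],[2,2]]
pre = XW + c            # [[0,-1],[1,0],[1,0],[2,1]]  (c broadcast across rows)
H = np.maximum(0, pre)  # [[0,0],[1,0],[1,0],[2,1]]   (ReLU)
y = H @ w + b           # [0, 1, 1, 0]
print(y)
```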