Shawn Jin
2020-10-28


# Learning Feedforward Neural Network Through XOR

To get a clearer picture of feedforward neural networks, we work through a simple XOR example. In this case, we want to find a function $f(x;\theta)$ that matches the true function $f^*$.

In this simple example, we will not be concerned with statistical generalization.

First, we treat this problem as a linear regression problem and use the mean squared error (MSE) loss function. Assume our model is $f(x;w,b)=x^\top w+b$. We can use the normal equation to minimize the loss. In this case, the loss is minimized at $w=0$ and $b=\frac{1}{2}$, so the model outputs $\frac{1}{2}$ for every input. This output is clearly wrong: we want 1 when the two inputs are different and 0 when they are the same. The figure below shows why a linear model cannot represent the XOR problem.
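As a quick sanity check, here is a minimal NumPy sketch of this least-squares fit (the variable names are my own, not from the post); it recovers $w=0$, $b=\frac{1}{2}$ and the constant output $\frac{1}{2}$:

```python
import numpy as np

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Design matrix with a trailing column of ones for the bias b
A = np.hstack([X, np.ones((4, 1))])

# Least-squares solution of A @ [w1, w2, b] = y (the normal-equation minimizer)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = theta[:2], theta[2]

print(w)          # -> approximately [0. 0.]
print(b)          # -> approximately 0.5
print(A @ theta)  # -> [0.5 0.5 0.5 0.5], the same output for every input
```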

If we introduce a simple FNN with one hidden layer, the problem can be solved. The new model is $f(x;W,c,w,b)=f^{(2)}(f^{(1)}(x))$, where $h=f^{(1)}(x;W,c)$ and $y=f^{(2)}(h;w,b)$. Since $f^{(2)}$ is a linear function, $f^{(1)}$ cannot also be linear, otherwise the whole composition would be linear, and we know a linear output is incorrect. Clearly, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. We use that strategy here, defining $\mathbf{h}=g(\mathbf{W}^\top \mathbf{x}+\mathbf{c})$, where $\mathbf{W}$ provides the weights of a linear transformation and $\mathbf{c}$ is a vector of biases.

Previously, to describe a linear regression model, we used a vector of weights and a scalar bias parameter to describe an affine transformation from an input vector to an output scalar. Now, $\mathbf{h}$ is given by an affine transformation from a vector $\mathbf{x}$ to a vector $\mathbf{h}$, so an entire vector of bias parameters is needed. The activation function $g$ is typically chosen to be a function that is applied element-wise, with $h_i=g(x^\top\mathbf{W}_{:,i}+c_i)$. In modern neural networks, the default recommendation is to use the rectified linear unit, or ReLU, defined by the activation function $g(z)=\max\{0,z\}$.
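In code, this hidden layer is just an affine map followed by an element-wise ReLU. A minimal sketch (the helper names are my own):

```python
import numpy as np

def relu(z):
    """Element-wise rectified linear unit: g(z) = max(0, z)."""
    return np.maximum(0.0, z)

def hidden_layer(x, W, c):
    """h = g(W^T x + c): affine transformation followed by the activation."""
    return relu(W.T @ x + c)
```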

[Figure: the XOR problem]

We can now specify our complete network as

$$f(x;W,c,w,b)=w^\top \max\{0,\, W^\top x+c\}+b,$$

where the $\max$ operation implements the hidden layer.
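The complete network can be written directly from this formula. A minimal sketch, assuming a single input given as a NumPy vector (the function name is my own):

```python
import numpy as np

def xor_network(x, W, c, w, b):
    """f(x; W, c, w, b) = w^T max{0, W^T x + c} + b."""
    h = np.maximum(0.0, W.T @ x + c)  # hidden layer (affine map + ReLU)
    return w @ h + b                  # linear output layer
```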

And we can then specify a solution to the XOR problem: let

$$W=\left[\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}\right], \qquad c=\left[\begin{matrix} 0 \\ -1 \end{matrix}\right], \qquad w=\left[\begin{matrix} 1 \\ -2 \end{matrix}\right],$$

and $b=0$.

## How this model handles inputs

Let the input be $X=\left[\begin{matrix}0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1\end{matrix}\right]$, where each row represents one input point. The first step of the network is to multiply by the weight matrix, giving

$$XW=\left[\begin{matrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{matrix}\right]$$

Then, adding the bias vector $c$ to each row, the output is

$$\left[\begin{matrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{matrix}\right]$$

After applying the rectified linear transformation, we get

$$\left[\begin{matrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{matrix}\right]$$

Finally, we multiply by the weight vector $w$; the output is

$$\left[\begin{matrix} 0 \\ 1 \\ 1 \\ 0 \end{matrix}\right]$$

The network produces the correct answer for every input.
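Putting the pieces together, the following NumPy sketch reproduces this computation for all four inputs at once; the intermediate values match the matrices above (variable names are my own):

```python
import numpy as np

# All four XOR inputs, one per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# The solution parameters given above
W = np.array([[1, 1],
              [1, 1]], dtype=float)
c = np.array([0, -1], dtype=float)
w = np.array([1, -2], dtype=float)
b = 0.0

XW = X @ W              # [[0,0],[1,1],[1,1],[2,2]]
Z = XW + c              # add the bias c to each row
H = np.maximum(0.0, Z)  # ReLU: [[0,0],[1,0],[1,0],[2,1]]
y = H @ w + b           # multiply by the output weights

print(y)                # -> [0. 1. 1. 0.], the XOR truth table
```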

Tags: Data Science, Machine Learning, Neural Network
Updated: 2021/09/15, 20:43:56