Backprop Algorithm in Machine Learning
# Backprop Algorithm in ML
Back propagation(Backprop) allows information from the cost to then flow backward through the network in order to compute the gradient.
The term back-propagation is often misunderstood as meaning the whole learning algorithm for multi layer neural networks. Actually, back-propagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient.
# Computational Graph
Many ways of formalizing computation as graphs are possible. Here, we use each node in the graph to indicate a variable. The variable may be a scalar, vector, matrix, tensor, or even a variable of another type.
To formalize our graphs, we also need to introduce the idea of an opera on . An operation is a simple function of one or more variables. Our graph language is accompanied by a set of allowable operations. Functions more complicated than the operations in this set may be described by composing many operations together.
# Chain Rule of Calculus
The chain rule of calculus (not to be confused with the chain rule of probability) is used to compute the derivatives of functions formed by composing other functions whose derivatives are known. Back-propagation is an algorithm that computes the chain rule, with a specific order of operations that is highly efficient.
Jacobian Matrix
A Jacobian matrix, sometimes simply called a Jacobian, is a matrix (opens new window) of first order partial derivatives (opens new window) (in some cases, the term "Jacobian" also refers to the determinant (opens new window) of the Jacobian matrix).