Figure: Computational graph in TensorBoard showing the components involved in a TF backprop update.

Neuron

Figure: backprop-neuron

Simple DNN 1

Figure: backprop-simple-dnn

Simple DNN 2

A network consists of a concatenation of the following layers (a framework-level sketch follows the list):
  1. Fully connected layer with input $x^{(1)}$, weights $W^{(1)}$ and output $z^{(1)}$.
  2. ReLU producing $a^{(1)}$.
  3. Fully connected layer with weights $W^{(2)}$ producing $z^{(2)}$.
  4. Softmax producing $\hat{y}$.
  5. Cross-entropy (CE) loss producing $L$.
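For orientation only, here is a hedged `tf.keras` sketch of the same stack (matching the TensorBoard/TF framing of the figure above). The layer widths `d`, `h`, `k` are illustrative assumptions, not given in the text, and the softmax and cross-entropy steps are folded into the loss via `from_logits=True`, which is how frameworks usually combine them for numerical stability.

```python
import tensorflow as tf

# Illustrative sizes (assumptions): d inputs, h hidden units, k classes.
d, h, k = 4, 5, 3

model = tf.keras.Sequential([
    tf.keras.layers.Dense(h, use_bias=False, activation="relu",
                          input_shape=(d,)),        # steps 1-2: W^(1), ReLU
    tf.keras.layers.Dense(k, use_bias=False),       # step 3: W^(2), logits z^(2)
])

# Steps 4-5 (softmax + cross-entropy) handled jointly by the loss.
model.compile(optimizer="sgd",
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))
```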
The task of backprop consists of the following steps:
  1. Sketch the network and write down the equations for the forward path.
  2. Propagate the backward path, i.e. write down the expression for the gradient of the loss with respect to each of the network parameters.
NOTE: We have omitted the bias terms for simplicity.
Forward Pass

| Step | Symbolic Equation |
|------|-------------------|
| (1) | $z^{(1)} = W^{(1)} x^{(1)}$ |
| (2) | $a^{(1)} = \max(0, z^{(1)})$ |
| (3) | $z^{(2)} = W^{(2)} a^{(1)}$ |
| (4) | $\hat{y} = \mathtt{softmax}(z^{(2)})$ |
| (5) | $L = CE(y, \hat{y})$ |
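As a concrete illustration, here is a minimal NumPy sketch of the forward pass. The vector/matrix shapes are assumptions (the text does not fix layer sizes), and the max-subtraction and small epsilon are standard numerical-stability tricks rather than part of the symbolic equations.

```python
import numpy as np

def forward(W1, W2, x1, y):
    """Forward pass for the network above (bias terms omitted).

    Assumed shapes: x1 is (d,), W1 is (h, d), W2 is (k, h),
    y is a one-hot label vector of length k.
    """
    z1 = W1 @ x1                              # (1) fully connected layer
    a1 = np.maximum(0.0, z1)                  # (2) ReLU
    z2 = W2 @ a1                              # (3) fully connected layer
    z2 = z2 - z2.max()                        # stabilize the softmax numerically
    y_hat = np.exp(z2) / np.exp(z2).sum()     # (4) softmax
    L = -np.sum(y * np.log(y_hat + 1e-12))    # (5) cross-entropy loss
    return z1, a1, z2, y_hat, L
```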
Backward Pass

| Step | Symbolic Equation |
|------|-------------------|
| (5) | $\frac{\partial L}{\partial L} = 1.0$ |
| (4) | $\frac{\partial L}{\partial z^{(2)}} = \hat{y} - y$ |
| (3a) | $\frac{\partial L}{\partial W^{(2)}} = (\hat{y} - y)\,(a^{(1)})^\top$ |
| (3b) | $\frac{\partial L}{\partial a^{(1)}} = (W^{(2)})^\top (\hat{y} - y)$ |
| (2) | $\frac{\partial L}{\partial z^{(1)}} = \frac{\partial L}{\partial a^{(1)}} \odot \mathbb{1}[z^{(1)} > 0]$ |
| (1) | $\frac{\partial L}{\partial W^{(1)}} = \frac{\partial L}{\partial z^{(1)}}\,(x^{(1)})^\top$ |
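These expressions can be checked numerically. Below is a minimal NumPy sketch of the backward pass, reusing the quantities returned by the forward sketch above, followed by a finite-difference check on a single weight; the sizes and the random data are illustrative assumptions.

```python
def backward(W1, W2, x1, y, z1, a1, y_hat):
    """Gradients of the cross-entropy loss w.r.t. W1 and W2."""
    dz2 = y_hat - y                 # (4) softmax + CE shortcut
    dW2 = np.outer(dz2, a1)         # (3a) dL/dW2
    da1 = W2.T @ dz2                # (3b) dL/da1
    dz1 = da1 * (z1 > 0)            # (2) ReLU gate
    dW1 = np.outer(dz1, x1)         # (1) dL/dW1
    return dW1, dW2

# Finite-difference check on one entry of W1 (illustrative sizes).
rng = np.random.default_rng(0)
d, h, k = 4, 5, 3
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(k, h))
x1 = rng.normal(size=d)
y = np.eye(k)[1]                    # one-hot label

z1, a1, z2, y_hat, L = forward(W1, W2, x1, y)
dW1, dW2 = backward(W1, W2, x1, y, z1, a1, y_hat)

eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
numeric = (forward(W1p, W2, x1, y)[-1] - L) / eps
print(numeric, dW1[0, 0])           # the two values should be close
```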
Key references: (Choromanska et al., 2014; Romero et al., 2014; Bengio, 2012; Jaderberg et al., 2016; Bengio et al., 2015)

References

  • Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures.
  • Bengio, E., Bacon, P., Pineau, J., Precup, D. (2015). Conditional Computation in Neural Networks for faster models.
  • Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y. (2014). The Loss Surfaces of Multilayer Networks.
  • Jaderberg, M., Czarnecki, W., Osindero, S., Vinyals, O., Graves, A., et al. (2016). Decoupled Neural Interfaces using Synthetic Gradients.
  • Romero, A., Ballas, N., Kahou, S., Chassang, A., Gatta, C., et al. (2014). FitNets: Hints for Thin Deep Nets.