Figure: Computational graph in TensorBoard showing the components involved in a TF backprop update.

Neuron


Simple DNN 1


Simple DNN 2

The network consists of a concatenation of the following layers:
  1. Fully connected layer with input $x^{(1)}$, weights $W^{(1)}$, and output $z^{(1)}$.
  2. ReLU producing $a^{(1)}$.
  3. Fully connected layer with parameters $W^{(2)}$ producing $z^{(2)}$.
  4. Softmax producing $\hat{y}$.
  5. Cross-entropy (CE) loss producing $L$.
The task of backprop consists of the following steps:
  1. Sketch the network and write down the equations for the forward pass.
  2. Propagate the backward pass, i.e., write down the expressions for the gradient of the loss with respect to all the network parameters.
NOTE: Bias terms are omitted for simplicity.
| Forward Pass Step | Symbolic Equation |
| --- | --- |
| (1) | $z^{(1)} = W^{(1)} x^{(1)}$ |
| (2) | $a^{(1)} = \max(0, z^{(1)})$ |
| (3) | $z^{(2)} = W^{(2)} a^{(1)}$ |
| (4) | $\hat{y} = \mathtt{softmax}(z^{(2)})$ |
| (5) | $L = CE(y, \hat{y})$ |
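The forward equations above can be sketched in NumPy. This is a minimal illustration, not the course's reference implementation; the function names and the small example shapes are assumptions.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, W1, W2):
    z1 = W1 @ x                   # (1) fully connected: z1 = W1 x
    a1 = np.maximum(0.0, z1)      # (2) ReLU: a1 = max(0, z1)
    z2 = W2 @ a1                  # (3) fully connected: z2 = W2 a1
    y_hat = softmax(z2)           # (4) softmax over the logits
    return z1, a1, z2, y_hat

def cross_entropy(y, y_hat):
    # (5) CE loss for a one-hot target y.
    return -np.sum(y * np.log(y_hat))
```

The intermediate activations `z1` and `a1` are returned because the backward pass reuses them.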
| Backward Pass Step | Symbolic Equation |
| --- | --- |
| (5) | $\frac{\partial L}{\partial L} = 1.0$ |
| (4) | $\frac{\partial L}{\partial z^{(2)}} = \hat{y} - y$ |
| (3a) | $\frac{\partial L}{\partial W^{(2)}} = (\hat{y} - y)\, (a^{(1)})^\top$ |
| (3b) | $\frac{\partial L}{\partial a^{(1)}} = (W^{(2)})^\top (\hat{y} - y)$ |
| (2) | $\frac{\partial L}{\partial z^{(1)}} = \frac{\partial L}{\partial a^{(1)}} \odot \mathbb{1}[z^{(1)} > 0]$ |
| (1) | $\frac{\partial L}{\partial W^{(1)}} = \frac{\partial L}{\partial z^{(1)}}\, (x^{(1)})^\top$ |
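The backward pass can be sketched the same way and verified against finite differences. Again a self-contained illustration under assumed names and shapes, not a reference implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, y, W1, W2):
    z1 = W1 @ x
    a1 = np.maximum(0.0, z1)
    z2 = W2 @ a1
    y_hat = softmax(z2)
    L = -np.sum(y * np.log(y_hat))    # CE loss, one-hot y
    return z1, a1, y_hat, L

def backward(x, y, W1, W2):
    z1, a1, y_hat, _ = forward(x, y, W1, W2)
    dz2 = y_hat - y            # (5)+(4): combined softmax + CE gradient
    dW2 = np.outer(dz2, a1)    # (3a): (y_hat - y) a1^T
    da1 = W2.T @ dz2           # (3b): W2^T (y_hat - y)
    dz1 = da1 * (z1 > 0)       # (2): ReLU gates the gradient where z1 <= 0
    dW1 = np.outer(dz1, x)     # (1): dz1 x^T
    return dW1, dW2
```

A quick sanity check is to perturb one weight by a small epsilon and compare `(L(w + eps) - L(w - eps)) / (2 * eps)` with the analytic entry of `dW1`; the two should agree to several decimal places.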
