# Detailed Walkthrough of the Chapter 4 Code in "Deep Learning from Scratch" (《深度学习入门:基于Python的理论与实现》)

① Derivative of the sigmoid function:

$$\frac{\partial y_j}{\partial x_j}=y_j(1-y_j)$$

② Derivative of the softmax function (combined with cross-entropy loss, see [1]; here $a_i$ is the softmax output and $y_i$ the one-hot label):

$$\frac{\partial E}{\partial z_i}=a_i-y_i$$
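Both derivatives can be checked numerically with a central difference. The sketch below is illustrative only: the inputs are made up for the check, and the one-hot label written $y_i$ in ② is named `t` here, matching the book's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max()              # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

h = 1e-5  # step for the central-difference check

# ① sigmoid: dy/dx should equal y * (1 - y)
x = 0.7
y = sigmoid(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.isclose(y * (1 - y), numeric))        # True

# ② softmax + cross-entropy: dE/dz should equal a - t
z = np.array([0.3, 2.0, -1.0])
t = np.array([0.0, 1.0, 0.0])                  # one-hot label
a = softmax(z)

def loss(z):
    return -np.sum(t * np.log(softmax(z)))     # cross-entropy loss E

numeric_grad = np.array([
    (loss(z + h * np.eye(3)[i]) - loss(z - h * np.eye(3)[i])) / (2 * h)
    for i in range(3)
])
print(np.allclose(a - t, numeric_grad, atol=1e-6))  # True
```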

--------------------------------------------
According to [2]:

$$\Delta w = -\varepsilon\frac{\partial E}{\partial w} \tag{8}$$

Applying the chain rule to a single weight $w_{ji}$ (equation (6) in [2]):

$$\Delta w_{ji} = -\varepsilon\frac{\partial E}{\partial w_{ji}} = -\varepsilon\frac{\partial E}{\partial x_j}\cdot y_i$$

In terms of the variables in the code:

$$\Delta W2 \approx -\varepsilon\cdot z1.T\cdot\frac{y-t}{batch\_num} = -\varepsilon\cdot grads['W2']$$
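The update rule $\Delta w=-\varepsilon\,\partial E/\partial w$ from equation (8) corresponds directly to the parameter update in the training loop. A minimal sketch, with hypothetical stand-in arrays shaped like the book's `TwoLayerNet` (in the real code `grads['W2']` would be `z1.T @ dy`):

```python
import numpy as np

# hypothetical parameter and gradient dicts, shaped like the book's TwoLayerNet
params = {'W2': np.random.randn(50, 10)}
grads  = {'W2': np.random.randn(50, 10)}

before  = params['W2'].copy()
epsilon = 0.1                            # learning rate ε of equation (8)
params['W2'] += -epsilon * grads['W2']   # Δw = -ε · ∂E/∂w
```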

| Code variable | Mathematical meaning | Shape |
|---|---|---|
| `a1` | $x_1$ (hidden-layer input) | (100, 50) |
| `z1` | $y_1$ (hidden-layer output) | (100, 50) |
| `a2` | $x_2$ (output-layer input) | (100, 10) |
| `y` | $y_2$ (output-layer output) | (100, 10) |
| `da1` | $\frac{\partial E}{\partial x_2}\cdot w_{21}$ | (100, 50) |
| `dz1` | $\frac{\partial E}{\partial x_2}\cdot\frac{\partial x_2}{\partial y_1}\cdot\frac{\partial y_1}{\partial x_1}=\frac{\partial E}{\partial x_2}\cdot w_{21}\cdot[y_1\cdot(1-y_1)]$ | (100, 50) |
| `dy` | $\frac{y-t}{batch\_num}$, i.e. $\frac{\partial E}{\partial x_2}$ (here $x_2$ is a vector; as a whole this is the error contribution of the 100 samples at each output node) | (100, 10) |
| `z1.T` | $y_i$ | (50, 100) |
| `grads['b2']` | $\frac{\partial E}{\partial x_2}$ | (10,) |
| `grads['W2']` | $\frac{\partial E}{\partial w_{21}}$ | (50, 10) |
| `grads['W1']` | $\frac{\partial E}{\partial x_2}\cdot\frac{\partial x_2}{\partial y_1}\cdot\frac{\partial y_1}{\partial x_1}\cdot\frac{\partial x_1}{\partial w_{10}}=\frac{\partial E}{\partial x_2}\cdot w_{21}\cdot[y_1\cdot(1-y_1)]\cdot x$ | (784, 50) |
| `grads['b1']` | $\frac{\partial E}{\partial x_1}$ | (50,) |
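The whole mapping can be traced end to end with a sketch of a gradient routine in the spirit of the book's code. The data and weights below are random; only the shapes and the chain of operations are meant to match the table:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)   # row-wise shift for stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

batch_num = 100
x = np.random.randn(batch_num, 784)                   # input batch
t = np.eye(10)[np.random.randint(0, 10, batch_num)]   # one-hot labels
W1, b1 = 0.01 * np.random.randn(784, 50), np.zeros(50)
W2, b2 = 0.01 * np.random.randn(50, 10), np.zeros(10)

# ---- forward pass ----
a1 = x @ W1 + b1     # (100, 50)  hidden-layer input  x_1
z1 = sigmoid(a1)     # (100, 50)  hidden-layer output y_1
a2 = z1 @ W2 + b2    # (100, 10)  output-layer input  x_2
y  = softmax(a2)     # (100, 10)  output-layer output y_2

# ---- backward pass ----
grads = {}
dy = (y - t) / batch_num       # (100, 10)  ∂E/∂x_2
grads['W2'] = z1.T @ dy        # (50, 10)
grads['b2'] = dy.sum(axis=0)   # (10,)
da1 = dy @ W2.T                # (100, 50)  ∂E/∂x_2 · w_21
dz1 = da1 * z1 * (1 - z1)      # (100, 50)  multiplied by y_1(1 - y_1)
grads['W1'] = x.T @ dz1        # (784, 50)
grads['b1'] = dz1.sum(axis=0)  # (50,)
```

Every shape annotated in the comments is exactly the one listed in the table above.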

$$\frac{\partial E}{\partial b_2}=\frac{\partial E}{\partial x_2}\cdot\frac{\partial x_2}{\partial b_2}=\frac{\partial E}{\partial x_2}\cdot\frac{\partial(w_{21}\cdot y_1+b_2)}{\partial b_2}=\frac{\partial E}{\partial x_2}$$

The derivation for `grads['b1']` is analogous.
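One point the shape (10,) in the table makes implicitly: because $b_2$ is broadcast onto every row of the batch, the batch loss accumulates its gradient over all 100 samples, which is why the code sums `dy` along axis 0. A small sketch with a random stand-in for `dy`:

```python
import numpy as np

dy = np.random.randn(100, 10)   # stand-in for (y - t) / batch_num
grad_b2 = dy.sum(axis=0)        # (10,), matches grads['b2'] in the table

# equivalently, each sample contributes its ∂E/∂x_2 row once
loop = np.zeros(10)
for row in dy:
    loop += row
print(np.allclose(grad_b2, loop))   # True
```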
--------------------------------------------

Note that the derivatives of softmax and sigmoid take different forms.

[1] Softmax with cross-entropy loss derivation (repost with additional details)
[2] D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors", Nature, 1986
