9. The case of a vector function
$F : \mathbb{R} \to \mathbb{R}^d$, i.e.
$$F(x) = (f_1(x), \dots, f_d(x))$$
$$\nabla F(x) = (f_1'(x), \dots, f_d'(x))$$
Note: $\nabla F$ is a $(1, d)$ row vector.
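10. VIP for vector-valued function
Linearity
$$\nabla(F + G) = \nabla F + \nabla G$$
Product rule: $F^T G \in \mathbb{R}$
$$\nabla(F^T G) = (\nabla F)G + (\nabla G)F$$
Chain rule ($f : \mathbb{R}^d \to \mathbb{R}$)
$$\nabla(f(G)(x)) = (\nabla G(x))(\nabla f(G(x))) = \sum_{i=1}^{d} g_i'(x)\,\frac{\partial f}{\partial y_i}(G(x))$$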
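11. With $y = G(x)$ and $z = f(y)$:
$$\nabla(f(G)) = \frac{dz}{dx} = \sum_{i=1}^{d} \frac{dy_i}{dx}\,\frac{\partial z}{\partial y_i} = \sum_{i=1}^{d} g_i'\,\frac{\partial f}{\partial y_i}(G)$$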
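12. The case of a vector-valued multivariable function
$F : \mathbb{R}^n \to \mathbb{R}^m$, i.e.
$$F(x) = (f_1(x), \dots, f_m(x))$$
$$\nabla F = (\nabla f_1, \dots, \nabla f_m) =
\begin{bmatrix}
\partial_{x_1} f_1 & \partial_{x_1} f_2 & \cdots & \partial_{x_1} f_m \\
\vdots & \vdots & \vdots & \vdots \\
\partial_{x_n} f_1 & \partial_{x_n} f_2 & \cdots & \partial_{x_n} f_m
\end{bmatrix}$$
Note: $\nabla F$ is an $(n, m)$ matrix!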
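13. Linearity
$$\nabla(F + G) = \nabla F + \nabla G$$
Product rule: $F^T G \in \mathbb{R}$
$$\nabla(F^T G) = (\nabla F)G + (\nabla G)F$$
Chain rule: $G : \mathbb{R}^n \to \mathbb{R}^k$, $F : \mathbb{R}^k \to \mathbb{R}^m$
$$\nabla(F(G)(x)) = (\nabla G(x))(\nabla F(G(x)))$$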
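The chain rule above can be sanity-checked numerically. The sketch below is only an illustration: the maps `G` and `F`, the point `x0`, and the helper `grad` are hypothetical choices, not part of the slides. `grad` uses central differences and returns the gradient in the slides' layout, i.e. an (input dimension, output dimension) matrix.

```python
import numpy as np

def grad(F, x, eps=1e-6):
    """Numerical gradient in the slides' layout: shape (len(x), len(F(x)))."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        # Central difference: partial derivatives of every output w.r.t. x_i
        rows.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.stack(rows, axis=0)  # row i holds the partials w.r.t. x_i

# Hypothetical example maps: G : R^2 -> R^3, F : R^3 -> R^2
G = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
F = lambda y: np.array([y[0] + y[1] * y[2], np.exp(y[0])])

x0 = np.array([0.3, -1.2])
lhs = grad(lambda x: F(G(x)), x0)     # (n, m) = (2, 2)
rhs = grad(G, x0) @ grad(F, G(x0))    # (n, k) @ (k, m)
chain_rule_holds = np.allclose(lhs, rhs, atol=1e-5)
```

Because the gradient is stored as (input, output), the factor for the inner map `G` comes first in the product, which is the reverse of the usual Jacobian ordering.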
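23. To find the minimum, we have to differentiate again!
$$\begin{aligned}
\nabla_\beta \|X\beta - Y\|_2^2
&= \nabla_\beta \left( (X\beta)^T X\beta - (X^T Y)^T \beta - \beta^T X^T Y + Y^T Y \right) \\
&= [\nabla_\beta X\beta]\, X\beta + [\nabla_\beta X\beta]\, X\beta - (X^T Y) - X^T Y \\
&= 2X^T X\beta - 2X^T Y = 0
\end{aligned}$$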
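24. $\therefore\ X^T X\beta = X^T Y$
If $k \le n$ and there is no multicollinearity, then $\operatorname{rank}(X^T X) = k$, so $X^T X$ is invertible:
$$\therefore\ \beta^* = (X^T X)^{-1} X^T Y$$
For reference, earlier we had...
$$\beta^* = (x^T x)^{-1} x^T y$$
... Linear algebra is just a matter of matching dimensions!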
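As a small illustration of the normal equations, the sketch below solves $X^T X\beta = X^T Y$ on synthetic data. The data, the sizes `n` and `k`, and `beta_true` are assumptions made up for this example. It uses `np.linalg.solve` instead of forming the inverse explicitly, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # hypothetical sizes with k <= n
X = rng.normal(size=(n, k))                    # random design matrix (full column rank here)
beta_true = np.array([1.0, -2.0, 0.5])         # made-up coefficients
Y = X @ beta_true + 0.1 * rng.normal(size=n)   # noisy observations

# Solve the normal equations X^T X beta = X^T Y
beta_star = np.linalg.solve(X.T @ X, X.T @ Y)

# The gradient 2 X^T X beta - 2 X^T Y should vanish at beta_star
gradient = 2 * X.T @ X @ beta_star - 2 * X.T @ Y

# Reference solution from NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

For ill-conditioned `X`, solving the least-squares problem directly (as `lstsq` does) is generally numerically safer than going through $X^T X$.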
27. Variables
Input: $x = z^{(0)}$
Output in hidden layer $\ell$: $z^{(\ell)} = f^{(\ell)}(W^{(\ell)} z^{(\ell-1)})$
$$z_i^{(\ell)} = f^{(\ell)}\Big(\sum_{j=1}^{d_{\ell-1}} W_{ij}^{(\ell)} z_j^{(\ell-1)}\Big), \quad i \in \{1, \dots, d_\ell\}$$
Output: $\hat{y} = f^{(L)}(W^{(L)} z^{(L-1)})$
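The variables above describe a forward pass, which might be sketched as follows. The layer sizes, the `tanh` activations, and the identity output activation are hypothetical choices for illustration; the slides do not fix them.

```python
import numpy as np

def forward(x, weights, activations):
    """Return [z^(0), z^(1), ..., z^(L)], with z^(0) = x and z^(L) = y_hat."""
    zs = [np.asarray(x, dtype=float)]
    for W, f in zip(weights, activations):
        zs.append(f(W @ zs[-1]))   # z^(l) = f^(l)(W^(l) z^(l-1))
    return zs

rng = np.random.default_rng(0)
dims = [4, 5, 3, 2]   # d_0, d_1, d_2, d_3 (assumed sizes), so L = 3
# W^(l) has shape (d_l, d_{l-1})
weights = [rng.normal(size=(dims[l + 1], dims[l])) for l in range(3)]
activations = [np.tanh, np.tanh, lambda a: a]   # assumed activations
zs = forward(rng.normal(size=4), weights, activations)
y_hat = zs[-1]
```

Keeping every intermediate $z^{(\ell)}$ is deliberate: the backward pass needs them.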
31-36.
$$\begin{aligned}
\frac{\partial L}{\partial W_\ell}
&= L'(z_L)\,\frac{\partial z_L}{\partial W_\ell} \\
&= L'(z_L)\, f'(W_L z_{L-1})\, W_L\, \frac{\partial z_{L-1}}{\partial W_\ell} \\
&= \delta_L W_L\, \frac{\partial z_{L-1}}{\partial W_\ell} \\
&= \delta_L W_L\, f'(W_{L-1} z_{L-2})\, W_{L-1}\, \frac{\partial z_{L-2}}{\partial W_\ell} \\
&= \delta_{L-1} W_{L-1}\, \frac{\partial z_{L-2}}{\partial W_\ell} \\
&= \cdots \\
&= \delta_{\ell+1} W_{\ell+1}\, \frac{\partial z_\ell}{\partial W_\ell} \\
&= \delta_{\ell+1} W_{\ell+1}\, f'(W_\ell z_{\ell-1})\, z_{\ell-1} = \delta_\ell\, z_{\ell-1}
\end{aligned}$$
Here $\delta_L := L'(z_L)\, f'(W_L z_{L-1})$ and $\delta_j := \delta_{j+1} W_{j+1}\, f'(W_j z_{j-1})$ for $j < L$.
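The chain of equalities above can be checked numerically in the scalar case, where every $W_j$ and $z_j$ is a real number. That restriction is an assumption made here so that the factors can be multiplied in any order. The activation `tanh`, the squared-error loss with target 1, and the concrete numbers are likewise hypothetical.

```python
import numpy as np

f = np.tanh
df = lambda a: 1.0 - np.tanh(a) ** 2        # f'
loss = lambda z: 0.5 * (z - 1.0) ** 2       # assumed loss L(z_L)
dloss = lambda z: z - 1.0                   # L'(z_L)

def forward(x, W):
    z = [x]                                 # z[0] = z_0 = x
    for w in W:
        z.append(f(w * z[-1]))              # z_j = f(W_j z_{j-1})
    return z

def backward(x, W):
    z = forward(x, W)
    L = len(W)                              # W[j-1] stores W_j
    grads = [0.0] * L
    delta = dloss(z[L]) * df(W[L - 1] * z[L - 1])        # delta_L
    grads[L - 1] = delta * z[L - 1]                      # dL/dW_L = delta_L z_{L-1}
    for l in range(L - 1, 0, -1):                        # l = L-1, ..., 1
        delta = delta * W[l] * df(W[l - 1] * z[l - 1])   # delta_l = delta_{l+1} W_{l+1} f'(W_l z_{l-1})
        grads[l - 1] = delta * z[l - 1]                  # dL/dW_l = delta_l z_{l-1}
    return grads

x, W = 0.7, [0.5, -1.3, 0.8]
analytic = backward(x, W)

# Central finite differences as an independent check
eps = 1e-6
numeric = []
for i in range(len(W)):
    Wp, Wm = list(W), list(W)
    Wp[i] += eps
    Wm[i] -= eps
    numeric.append((loss(forward(x, Wp)[-1]) - loss(forward(x, Wm)[-1])) / (2 * eps))
```

Each backward step reuses the previous `delta`, so the cost of all $L$ gradients is linear in the depth.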
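48. Kronecker product
$A$: $(n, m)$ matrix, $B$: $(p, q)$ matrix
$$A \otimes B =
\begin{bmatrix}
a_{11}B & a_{12}B & \cdots & a_{1m}B \\
a_{21}B & a_{22}B & \cdots & a_{2m}B \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1}B & a_{n2}B & \cdots & a_{nm}B
\end{bmatrix}$$
$A \otimes B$ is an $(np, mq)$ matrix.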
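The block structure in the definition can be verified directly with `np.kron`. The matrices `A` and `B` below are arbitrary examples chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # (n, m) = (2, 3)
B = np.array([[0, 1],
              [1, 0]])          # (p, q) = (2, 2)
K = np.kron(A, B)               # (np, mq) = (4, 6)

n, m = A.shape
p, q = B.shape
# Block (i, j) of A ⊗ B should equal a_ij * B
blocks_match = all(
    np.array_equal(K[i * p:(i + 1) * p, j * q:(j + 1) * q], A[i, j] * B)
    for i in range(n)
    for j in range(m)
)
```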
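49. $\dfrac{\partial (Xb)}{\partial X} = b \otimes I_n$
import numpy as np
n, m = 3, 2  # example sizes (assumed for illustration)
I = np.eye(n)  # (n, n) identity matrix
b = np.arange(1.0, m + 1).reshape(m, 1)  # (m, 1) column vector [[b1], ..., [bm]]
np.kron(b, I)  # (mn, n) matrix: Kronecker product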
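The identity on this slide can be checked by finite differences. The sketch assumes that $\partial/\partial X$ means differentiation with respect to $\operatorname{vec}(X)$ with columns stacked (column-major order), and that the result is laid out as (input dimension, output dimension). Under that assumption the result is the $(mn, n)$ matrix $b \otimes I_n$. The sizes and random values are made up.

```python
import numpy as np

n, m = 3, 2                       # assumed sizes
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))
b = rng.normal(size=(m, 1))

eps = 1e-6
J = np.zeros((n * m, n))          # row r: derivative of Xb w.r.t. vec(X)[r]
for r in range(n * m):
    E = np.zeros(n * m)
    E[r] = eps
    dX = E.reshape(n, m, order="F")   # undo the column-major vec
    J[r] = ((X + dX) @ b - (X - dX) @ b).ravel() / (2 * eps)

matches = np.allclose(J, np.kron(b, np.eye(n)), atol=1e-6)
```

With row-major flattening (NumPy's default `order="C"`) the same derivative would come out as `np.kron(np.eye(n), b)` instead, so the vec convention matters.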
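50. (Second attempt) Derivation of Back-propagation Algorithm
$$\frac{\partial L(\hat{y})}{\partial W_\ell} := \frac{\partial L(\hat{y})}{\partial(\operatorname{vec}(W_\ell))} = \frac{\partial \hat{y}}{\partial(\operatorname{vec}(W_\ell))}\, \nabla_{\hat{y}} L$$
$$(d_\ell d_{\ell-1}, 1) = (d_\ell d_{\ell-1}, d_L) \times (d_L, 1)$$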
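53.
$$\frac{\partial \hat{y}}{\partial W_\ell} = \frac{\partial f_L(W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))}\, \nabla f_L$$
Here $\nabla f_L$ is the following $(d_L, d_L)$ diagonal matrix:
$$\nabla f_L =
\begin{bmatrix}
f_L'\Big(\sum_{k=1}^{d_{L-1}} W_{1k}^{(L)} z_k^{(L-1)}\Big) & & \\
& \ddots & \\
& & f_L'\Big(\sum_{k=1}^{d_{L-1}} W_{d_L k}^{(L)} z_k^{(L-1)}\Big)
\end{bmatrix}$$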
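54. Multiplying $\nabla f_L$ and $\nabla_{\hat{y}} L$ gives the following $(d_L, 1)$ matrix:
$$\begin{bmatrix}
f_L'\Big(\sum_{k=1}^{d_{L-1}} W_{1k}^{(L)} z_k^{(L-1)}\Big)\, \dfrac{\partial L}{\partial \hat{y}_1} \\
\vdots \\
f_L'\Big(\sum_{k=1}^{d_{L-1}} W_{d_L k}^{(L)} z_k^{(L-1)}\Big)\, \dfrac{\partial L}{\partial \hat{y}_{d_L}}
\end{bmatrix}
= \operatorname{diag}(\nabla f_L) \odot \nabla_{\hat{y}} L$$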
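The step from a diagonal-matrix product to an elementwise (Hadamard) product is easy to confirm numerically. In the sketch below the pre-activation `a`, the gradient `grad_y`, and the choice of `tanh` as activation are all hypothetical values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_L = 4                               # assumed output dimension
a = rng.normal(size=d_L)              # stands in for W_L z^(L-1)
grad_y = rng.normal(size=d_L)         # stands in for the gradient of L w.r.t. y_hat

fprime = 1.0 - np.tanh(a) ** 2        # f'(a) for f = tanh (assumed activation)
grad_f = np.diag(fprime)              # (d_L, d_L) diagonal matrix

lhs = grad_f @ grad_y                 # matrix-vector product
rhs = np.diag(grad_f) * grad_y        # np.diag on a 2-D array extracts the diagonal
```

In practice only the vector `fprime` needs to be stored; building the full diagonal matrix costs $O(d_L^2)$ memory for no benefit.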
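56. Therefore... (in the case $\ell = L$)
$$\frac{\partial L(\hat{y})}{\partial W_L} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_L))}\, \big[\operatorname{diag}(\nabla f_L) \odot \nabla_{\hat{y}} L\big] = \big(z^{(L-1)} \otimes I_{d_L}\big)\, \delta_L$$
Dimension check:
$$(d_L d_{L-1}, 1) = (d_L d_{L-1}, d_L) \times (d_L, 1)$$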
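One way to read the Kronecker expression is that it is the vectorized form of the outer product $\delta_L\, z^{(L-1)T}$. The sketch below checks this for random vectors, assuming column-major vec. The dimensions and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_prev, d_L = 3, 2                        # assumed d_{L-1} and d_L
z = rng.normal(size=(d_prev, 1))          # z^(L-1)
delta = rng.normal(size=(d_L, 1))         # delta_L

vec_grad = np.kron(z, np.eye(d_L)) @ delta    # (d_L * d_{L-1}, 1)
mat_grad = delta @ z.T                        # (d_L, d_{L-1}), same shape as W_L

# Stacking the columns of delta z^T reproduces the Kronecker form
same = np.allclose(vec_grad.ravel(), mat_grad.reshape(-1, order="F"))
```

The outer-product form avoids building the $(d_L d_{L-1}, d_L)$ Kronecker matrix altogether.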
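57. Therefore... (in the case $\ell \ne L$)
$$\frac{\partial L(\hat{y})}{\partial W_\ell} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))}\, \big[\operatorname{diag}(\nabla f_L) \odot \nabla_{\hat{y}} L\big] = \frac{\partial z^{(L-1)}}{\partial(\operatorname{vec}(W_\ell))}\, W_L^T\, \delta_L$$
Dimension check:
$$(d_\ell d_{\ell-1}, d_{L-1}) \times (d_{L-1}, d_L) \times (d_L, 1)$$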
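59. Well, let's give it a try...
$$\begin{aligned}
\frac{\partial L(\hat{y})}{\partial W_\ell}
&= \frac{\partial z^{(L-1)}}{\partial(\operatorname{vec}(W_\ell))}\, W_L^T \delta_L \\
&= \frac{\partial f_{L-1}(W_{L-1} z^{(L-2)})}{\partial(\operatorname{vec}(W_\ell))}\, W_L^T \delta_L \\
&= \frac{\partial (W_{L-1} z^{(L-2)})}{\partial(\operatorname{vec}(W_\ell))}\, \big[\operatorname{diag}(\nabla f_{L-1}) \odot W_L^T \delta_L\big] \\
&= \frac{\partial z^{(L-2)}}{\partial(\operatorname{vec}(W_\ell))}\, W_{L-1}^T \delta_{L-1}
\end{aligned}$$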
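60. Keep going
$$\begin{aligned}
\frac{\partial L(\hat{y})}{\partial W_\ell}
&= \frac{\partial z^{(L-2)}}{\partial(\operatorname{vec}(W_\ell))}\, W_{L-1}^T \delta_{L-1} \\
&= \frac{\partial f_{L-2}(W_{L-2} z^{(L-3)})}{\partial(\operatorname{vec}(W_\ell))}\, W_{L-1}^T \delta_{L-1} \\
&= \frac{\partial (W_{L-2} z^{(L-3)})}{\partial(\operatorname{vec}(W_\ell))}\, \big[\operatorname{diag}(\nabla f_{L-2}) \odot W_{L-1}^T \delta_{L-1}\big] \\
&= \frac{\partial z^{(L-3)}}{\partial(\operatorname{vec}(W_\ell))}\, W_{L-2}^T \delta_{L-2}
\end{aligned}$$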
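66. Back-propagation via Matrix Calculus
$$\frac{\partial L(\hat{y})}{\partial W_\ell} = \nabla f_\ell \left( \prod_{j=\ell+1}^{L} W_j^T\, \nabla f_j \right) \nabla_{\hat{y}} L(\hat{y})\; z^{(\ell-1)T}$$
Dimension check:
$$(d_\ell, d_{\ell-1}) = (d_\ell, d_\ell) \times \left( \prod_{j=\ell+1}^{L} (d_{j-1}, d_j) \times (d_j, d_j) \right) \times (d_L, 1) \times (1, d_{\ell-1})$$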
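The product formula can be compared against the layer-by-layer recursion $\delta_j = \operatorname{diag}(\nabla f_j) \odot W_{j+1}^T \delta_{j+1}$ from the step-by-step derivation. The sketch below is only a consistency check under assumed choices: `tanh` activations in every layer, a squared-error loss, and made-up layer sizes. The factors of the product are taken in increasing order of $j$ from left to right.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [3, 4, 5, 2]                        # d_0..d_3 (assumed), so L = 3
Ws = [rng.normal(size=(dims[j + 1], dims[j])) for j in range(3)]   # Ws[j-1] is W_j
df = lambda a: 1.0 - np.tanh(a) ** 2       # derivative of the assumed activation

zs = [rng.normal(size=dims[0])]            # zs[j] is z^(j)
for W in Ws:
    zs.append(np.tanh(W @ zs[-1]))
grad_y = zs[-1] - rng.normal(size=dims[-1])   # gradient of 0.5*||y_hat - y||^2 (assumed loss)

# D[j-1] is the diagonal matrix grad f_j evaluated at W_j z^(j-1)
D = [np.diag(df(W @ z)) for W, z in zip(Ws, zs[:-1])]

ell, L = 1, 3
M = D[ell - 1]
for j in range(ell + 1, L + 1):
    M = M @ Ws[j - 1].T @ D[j - 1]         # append the factor W_j^T grad f_j
product_form = np.outer(M @ grad_y, zs[ell - 1])     # (d_ell, d_{ell-1})

# Same quantity via the delta recursion
delta = np.diag(D[-1]) * grad_y                      # delta_L
for j in range(L - 1, ell - 1, -1):                  # j = L-1, ..., ell
    delta = np.diag(D[j - 1]) * (Ws[j].T @ delta)    # delta_j
recursive_form = np.outer(delta, zs[ell - 1])
```

The recursion only ever multiplies matrices by vectors, whereas the explicit product builds matrix-matrix products, which is why the recursive form is the one normally implemented.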
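67. Back-propagation Algorithm
Using the above, $W_1, \dots, W_L$ can be updated:
$$W_\ell \leftarrow W_\ell - \alpha\, \frac{\partial L}{\partial W_\ell} = W_\ell - \alpha\, \delta_\ell\, z^{(\ell-1)T}$$
$$\begin{aligned}
\delta_L &= \operatorname{diag}\big(\nabla f_L(W_L z^{(L-1)})\big) \odot \nabla_{\hat{y}} L(\hat{y}) \\
&\ \ \vdots \\
\delta_\ell &= \operatorname{diag}\big(\nabla f_\ell(W_\ell z^{(\ell-1)})\big) \odot \big[W_{\ell+1}^T\, \delta_{\ell+1}\big] \\
&\ \ \vdots \\
\delta_1 &= \operatorname{diag}\big(\nabla f_1(W_1 x)\big) \odot \big[W_2^T\, \delta_2\big]
\end{aligned}$$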
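A minimal NumPy sketch of the algorithm follows, with a finite-difference check on the first layer. It uses the recursion in the form obtained in the step-by-step derivation, $\delta_\ell = \operatorname{diag}(\nabla f_\ell) \odot (W_{\ell+1}^T \delta_{\ell+1})$. The `tanh` activation, the squared-error loss, the layer sizes, and the step size `alpha` are assumptions chosen for illustration, not taken from the slides.

```python
import numpy as np

f = np.tanh
df = lambda a: 1.0 - np.tanh(a) ** 2                    # f'

def loss(y_hat, y):
    return 0.5 * np.sum((y_hat - y) ** 2)               # assumed loss

def forward(x, Ws):
    zs = [x]                                            # zs[l] is z^(l)
    for W in Ws:
        zs.append(f(W @ zs[-1]))
    return zs

def backprop(x, y, Ws):
    zs = forward(x, Ws)
    L = len(Ws)                                         # Ws[l-1] is W_l
    grads = [None] * L
    delta = df(Ws[-1] @ zs[-2]) * (zs[-1] - y)          # delta_L
    grads[-1] = np.outer(delta, zs[-2])                 # delta_L z^(L-1)T
    for l in range(L - 1, 0, -1):                       # l = L-1, ..., 1
        delta = df(Ws[l - 1] @ zs[l - 1]) * (Ws[l].T @ delta)   # delta_l
        grads[l - 1] = np.outer(delta, zs[l - 1])       # delta_l z^(l-1)T
    return grads

rng = np.random.default_rng(0)
dims = [3, 4, 2]                                        # assumed layer sizes
Ws = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(2)]
x, y = rng.normal(size=3), rng.normal(size=2)
grads = backprop(x, y, Ws)

# Finite-difference check of the gradient w.r.t. W_1
eps = 1e-6
num = np.zeros_like(Ws[0])
for i in range(Ws[0].shape[0]):
    for j in range(Ws[0].shape[1]):
        Wp = [W.copy() for W in Ws]
        Wm = [W.copy() for W in Ws]
        Wp[0][i, j] += eps
        Wm[0][i, j] -= eps
        num[i, j] = (loss(forward(x, Wp)[-1], y) - loss(forward(x, Wm)[-1], y)) / (2 * eps)
max_err = np.max(np.abs(num - grads[0]))

alpha = 0.1                                             # assumed step size
Ws_new = [W - alpha * g for W, g in zip(Ws, grads)]     # W_l <- W_l - alpha * delta_l z^(l-1)T
```

Each gradient has the same shape as its weight matrix, so the update is a plain elementwise subtraction.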