
[cs231n] 4. Introduction to Neural Networks

Backpropagation

A technique for computing how each variable affects the overall function, $\frac{\partial f}{\partial x}$: decompose the function into its constituent operations to form a Computational Graph, compute a Local Gradient at each operation, and multiply them together via the Chain Rule to obtain the full Gradient.
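A minimal sketch of this procedure on a tiny graph; the toy function $f(x, y, z) = (x + y) \cdot z$ is an illustrative assumption, not taken from the notes above:

```python
# Forward pass: evaluate each operation in the graph.
x, y, z = -2.0, 5.0, -4.0
q = x + y          # ADD node
f = q * z          # MUL node

# Backward pass: local gradients chained from the output back to the inputs.
df_dq = z            # local gradient of the MUL node w.r.t. q
df_dz = q            # local gradient of the MUL node w.r.t. z
df_dx = df_dq * 1.0  # chain rule through the ADD node (local gradient is 1)
df_dy = df_dq * 1.0

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```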

Sigmoid Gate

Each node of the Computational Graph can be defined at whatever granularity is convenient. For example, if the Sigmoid function itself is defined as a single node, its backward pass can be computed directly from the derivative of the Sigmoid: $\frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)$.
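A minimal sketch of such a sigmoid node with forward and backward methods; the class and method names are illustrative assumptions:

```python
import numpy as np

class SigmoidGate:
    """A single graph node that wraps the entire sigmoid function."""

    def forward(self, x):
        # sigma(x) = 1 / (1 + e^{-x}); cache the output for the backward pass
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, dout):
        # local gradient (1 - sigma) * sigma, chained with the upstream gradient
        return (1.0 - self.out) * self.out * dout
```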

Patterns in Backward Flow

• ADD gate: passes the upstream gradient through unchanged to all of its inputs (Gradient Distributor)
• MAX gate: routes the upstream gradient to the larger input and passes 0 to the smaller one (Gradient Router)
• MUL gate: swaps the inputs; each input receives the upstream gradient scaled by the other input's value (Gradient Switcher; see the sketch below)
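A minimal sketch of the three backward patterns; the function names are illustrative assumptions:

```python
def add_backward(dout):
    # Distributor: both inputs receive the upstream gradient unchanged.
    return dout, dout

def max_backward(x, y, dout):
    # Router: the larger input gets the full gradient, the other gets 0.
    return (dout, 0.0) if x > y else (0.0, dout)

def mul_backward(x, y, dout):
    # Switcher: each input's gradient is the upstream gradient
    # scaled by the *other* input's value.
    return y * dout, x * dout
```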

Backpropagation in Vectors

When the variables are vectors, each gate's Gradient is expressed as a Jacobian Matrix.
If the input is an $N$-dimensional vector, the Jacobian Matrix is $N \times N$ ($N^2$ entries), so computing it in full would take far too many operations.
In practice, however, for an elementwise gate (e.g. $\max(0, x)$) each input element affects only the corresponding output element, so the Jacobian Matrix is a Diagonal Matrix and never needs to be formed explicitly.
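A minimal sketch of exploiting that diagonal structure for an elementwise $\max(0, x)$ gate, assuming NumPy:

```python
import numpy as np

def relu_backward(x, dout):
    # The Jacobian of max(0, x) is diagonal: entry i is 1 if x[i] > 0, else 0.
    # Instead of building the N x N matrix, multiply elementwise by the mask.
    return dout * (x > 0)

x = np.array([1.0, -2.0, 3.0])
dout = np.array([0.1, 0.2, 0.3])
print(relu_backward(x, dout))  # [0.1 0.  0.3]
```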

Jacobian Matrix

The matrix of first-order partial derivatives of a multivariable vector-valued function
$f(t_1, t_2, \ldots, t_m) = (f_1(t_1, t_2, \ldots, t_m),\, f_2(t_1, t_2, \ldots, t_m),\, \ldots,\, f_n(t_1, t_2, \ldots, t_m))$
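Collecting every first-order partial derivative $\frac{\partial f_i}{\partial t_j}$ gives the $n \times m$ matrix:

$$
J = \begin{pmatrix}
\frac{\partial f_1}{\partial t_1} & \cdots & \frac{\partial f_1}{\partial t_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial t_1} & \cdots & \frac{\partial f_n}{\partial t_m}
\end{pmatrix}
$$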

Neural Networks

A class of functions that stack simple functions on top of one another to build complex non-linear functions, e.g. a 2-layer network $f = W_2 \max(0, W_1 x)$ (see the sketch below).
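A minimal sketch of such a stacked function: two linear layers with a ReLU non-linearity between them. The layer sizes and names are illustrative assumptions:

```python
import numpy as np

# Illustrative sizes: 3072-dim input, 100 hidden units, 10 class scores.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((100, 3072)) * 0.01
W2 = rng.standard_normal((10, 100)) * 0.01

def two_layer_net(x):
    # f = W2 @ max(0, W1 @ x): a linear layer, a non-linearity,
    # then another linear layer stacked on top.
    h = np.maximum(0, W1 @ x)   # first layer + ReLU
    return W2 @ h               # second layer (scores)

scores = two_layer_net(rng.standard_normal(3072))
print(scores.shape)  # (10,)
```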