trsing’s diary

Notes on what I study, books I have read, and things I looked up for work.

PRML 4.1.3~4.1.6

4.1.3 Least squares for classification

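Differentiating (4.15)

Using the identities

$$ \frac{\partial}{\partial X}\mathrm{Tr}[(AXB+C)(AXB+C)^{T}]=2A^{T}(AXB+C)B^{T}, \qquad \mathrm{Tr}[(AXB+C)(AXB+C)^{T}]=\mathrm{Tr}[(AXB+C)^{T}(AXB+C)] $$

with $A=X$, $B=I$, $C=-T$, and taking the factor $\frac{1}{2}$ in (4.15) into account, we get

$$ \frac{\partial}{\partial W}E_{D}(W)=X^{T}(XW-T) $$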
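
The derivative can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and random synthetic matrices (the sizes and the function names are my own, not from PRML). It takes $E_{D}(W)=\frac{1}{2}\mathrm{Tr}[(XW-T)^{T}(XW-T)]$ as in (4.15), so the factor 2 from the trace identity cancels the $\frac{1}{2}$ and the gradient is $X^{T}(XW-T)$. It compares that expression with central finite differences.

```python
import numpy as np

def sum_of_squares_error(W, X, T):
    """E_D(W) = 1/2 Tr{(XW - T)^T (XW - T)}, following (4.15)."""
    R = X @ W - T
    return 0.5 * np.trace(R.T @ R)

def analytic_gradient(W, X, T):
    """Closed-form gradient X^T (XW - T)."""
    return X.T @ (X @ W - T)

def numerical_gradient(W, X, T, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            grad[i, j] = (sum_of_squares_error(W + step, X, T)
                          - sum_of_squares_error(W - step, X, T)) / (2 * eps)
    return grad

# Arbitrary synthetic data (assumption): N=20 rows, 4 input columns, 3 target columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
T = rng.normal(size=(20, 3))
W = rng.normal(size=(4, 3))

max_abs_diff = np.max(np.abs(analytic_gradient(W, X, T) - numerical_gradient(W, X, T)))

# At the pseudo-inverse solution the gradient should vanish.
W_star = np.linalg.pinv(X) @ T
grad_at_solution = analytic_gradient(W_star, X, T)
```

Here `max_abs_diff` should be at the level of floating-point noise, and `grad_at_solution` should be numerically zero, which matches the solution obtained by setting the derivative to zero.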

4.1.4 Fisher's linear discriminant

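(Exercise 4.4) Differentiating $w^{T}(m_{2}-m_{1})+\lambda(w^{T}w-1)$ with respect to $w$ gives $m_{2}-m_{1}+2\lambda w$. Setting this to zero yields $w=-\frac{1}{2\lambda}(m_{2}-m_{1})$, so $w\propto(m_{2}-m_{1})$.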

From (4.25) to (4.26)

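With the projected mean on the left written as the scalar $m_{k}=w^{T}m_{k}$ (the $m_{k}$ on the right are the mean vectors),

$$ (y_{n}-m_{k})^{2}=(w^{T}x_{n}-w^{T}m_{k})(w^{T}x_{n}-w^{T}m_{k})^{T}=w^{T}(x_{n}-m_{k})(x_{n}-m_{k})^{T}w $$

The same applies to $(m_{2}-m_{1})^{2}$, which becomes $w^{T}(m_{2}-m_{1})(m_{2}-m_{1})^{T}w$.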
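
As a quick check of this rewriting, the sketch below evaluates the Fisher criterion both in its scalar form (projected means and within-class variances) and in its matrix form $J(w)=\frac{w^{T}S_{B}w}{w^{T}S_{W}w}$. It assumes NumPy and two synthetic Gaussian classes; the data and variable names are illustrative only.

```python
import numpy as np

# Two synthetic classes in 3 dimensions (assumption, purely for illustration).
rng = np.random.default_rng(1)
X1 = rng.normal(loc=0.0, size=(30, 3))   # samples of class C1
X2 = rng.normal(loc=2.0, size=(40, 3))   # samples of class C2
w = rng.normal(size=3)                   # an arbitrary projection direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Scalar form: project first, then compute means and within-class variances.
y1, y2 = X1 @ w, X2 @ w
s1_sq = np.sum((y1 - y1.mean()) ** 2)
s2_sq = np.sum((y2 - y2.mean()) ** 2)
J_scalar = (y2.mean() - y1.mean()) ** 2 / (s1_sq + s2_sq)

# Matrix form: between-class and within-class covariance matrices.
S_B = np.outer(m2 - m1, m2 - m1)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
J_matrix = (w @ S_B @ w) / (w @ S_W @ w)
```

Both `J_scalar` and `J_matrix` should agree up to rounding for any choice of `w`.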

Differentiating (4.26)

$$ \frac{\partial}{\partial x}\frac{(Ax)^{T}Ax}{(Bx)^{T}Bx}=2\frac{A^{T}Ax}{(Bx)^{T}Bx}-2\frac{(x^{T}A^{T}Ax)B^{T}Bx}{(x^{T}B^{T}Bx)^{2}} $$
so, with $A^{T}A=S_{B}$, $B^{T}B=S_{W}$ and $x=w$,

$$ \frac{\partial}{\partial w}J(w)=2\frac{S_{B}w}{w^{T}S_{W}w}-2\frac{(w^{T}S_{B}w)S_{W}w}{(w^{T}S_{W}w)^{2}} $$

$S_{B}w=(m_{2}-m_{1})(m_{2}-m_{1})^{T}w$, and $(m_{2}-m_{1})^{T}w$ is a scalar, so $S_{B}w$ points in the same direction as $(m_{2}-m_{1})$.
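
Setting the gradient of $J(w)$ to zero and dropping scalar factors leads to $w\propto S_{W}^{-1}(m_{2}-m_{1})$, which is (4.30) in PRML. The sketch below, assuming NumPy and synthetic 2-D data (names and data are hypothetical), checks that the gradient expression above vanishes at that $w$ and that $S_{B}w$ is a scalar multiple of $(m_{2}-m_{1})$.

```python
import numpy as np

def fisher_gradient(w, S_B, S_W):
    """Gradient of J(w) = (w^T S_B w) / (w^T S_W w)."""
    num = w @ S_B @ w
    den = w @ S_W @ w
    return 2 * (S_B @ w) / den - 2 * num * (S_W @ w) / den ** 2

# Synthetic two-class data (assumption, for illustration only).
rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], size=(60, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_B = np.outer(m2 - m1, m2 - m1)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction: solve S_W w = m2 - m1 instead of forming the inverse.
w_fisher = np.linalg.solve(S_W, m2 - m1)
grad_at_fisher = fisher_gradient(w_fisher, S_B, S_W)

# For any w, S_B w equals (m2 - m1) times the scalar (m2 - m1)^T w.
w_any = rng.normal(size=2)
S_B_w = S_B @ w_any
scaled_mean_diff = (m2 - m1) * ((m2 - m1) @ w_any)
```

`grad_at_fisher` should be numerically zero, and `S_B_w` should equal `scaled_mean_diff`. Only the direction of `w_fisher` matters, since $J(w)$ is invariant to rescaling $w$.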

4.1.5 Relation to least squares

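Noting that
$$ \sum_{n=1}^{N}x_{n} = \sum_{k=1}^{K} \sum_{n \in C_{k}} x_{n}=\sum_{k=1}^{K} N_{k}m_{k} $$
and that $\sum_{n}t_{n}=N_{1}\frac{N}{N_{1}}-N_{2}\frac{N}{N_{2}}=0$, the left-hand side of (4.32) becomes
$$ w^{T}\sum_{n}x_{n}+Nw_{0}-\sum_{n}t_{n}=w^{T}(N_{1}m_{1}+N_{2}m_{2})+Nw_{0} $$
Setting this to zero gives $w_{0}=-w^{T}m$ with $m=\frac{1}{N}(N_{1}m_{1}+N_{2}m_{2})$. Substituting into the left-hand side of (4.33):
$$ \begin{aligned}
&\sum_{n}(w^{T}x_{n})x_{n}-(w^{T}m)\sum_{n}x_{n}-\sum_{n}t_{n}x_{n}\\
&=\sum_{n}(x_{n}^{T}w)x_{n}-(m^{T}w)(N_{1}m_{1}+N_{2}m_{2})-\frac{N}{N_{1}}\sum_{n \in C_{1}}x_{n}+\frac{N}{N_{2}}\sum_{n \in C_{2}}x_{n}\\
&=\sum_{n}x_{n}(x_{n}^{T}w)-(N_{1}m_{1}+N_{2}m_{2})(m^{T}w)-N\Big(\frac{1}{N_{1}}\sum_{n \in C_{1}}x_{n}-\frac{1}{N_{2}}\sum_{n \in C_{2}}x_{n}\Big)\\
&=\Big(\sum_{n}x_{n}x_{n}^{T}\Big)w-\frac{1}{N}(N_{1}m_{1}+N_{2}m_{2})(N_{1}m_{1}+N_{2}m_{2})^{T}w-N(m_{1}-m_{2})\\
&=0
\end{aligned} $$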
Expanding the coefficient matrix on the left-hand side of (4.37):
$$ \begin{aligned}
S_{W}+\frac{N_{1}N_{2}}{N}S_{B}
&=\sum_{n \in C_{1}}(x_{n}x_{n}^{T}-m_{1}x_{n}^{T}-x_{n}m_{1}^{T}+m_{1}m_{1}^{T})\\
&\quad+\sum_{n \in C_{2}}(x_{n}x_{n}^{T}-m_{2}x_{n}^{T}-x_{n}m_{2}^{T}+m_{2}m_{2}^{T})\\
&\quad+\frac{N_{1}N_{2}}{N}(m_{2}m_{2}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}+m_{1}m_{1}^{T})\\
&=\Big(\sum_{n \in C_{1}}x_{n}x_{n}^{T}\Big)-m_{1}N_{1}m_{1}^{T}-N_{1}m_{1}m_{1}^{T}+N_{1}m_{1}m_{1}^{T}\\
&\quad+\Big(\sum_{n \in C_{2}}x_{n}x_{n}^{T}\Big)-m_{2}N_{2}m_{2}^{T}-N_{2}m_{2}m_{2}^{T}+N_{2}m_{2}m_{2}^{T}\\
&\quad+\frac{N_{1}N_{2}}{N}(m_{2}m_{2}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}+m_{1}m_{1}^{T})\\
&=\Big(\sum_{n}x_{n}x_{n}^{T}\Big)-N_{1}m_{1}m_{1}^{T}-N_{2}m_{2}m_{2}^{T}\\
&\quad+\frac{N_{1}N_{2}}{N}(m_{2}m_{2}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}+m_{1}m_{1}^{T})\\
&=\Big(\sum_{n}x_{n}x_{n}^{T}\Big)-\frac{1}{N}\Big(N_{1}(N_{1}+N_{2})m_{1}m_{1}^{T}+N_{2}(N_{1}+N_{2})m_{2}m_{2}^{T}\\
&\qquad-N_{1}N_{2}(m_{2}m_{2}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}+m_{1}m_{1}^{T})\Big)\\
&=\Big(\sum_{n}x_{n}x_{n}^{T}\Big)-\frac{1}{N}\Big(N_{1}^{2}m_{1}m_{1}^{T}+N_{1}N_{2}(m_{1}m_{2}^{T}+m_{2}m_{1}^{T})+N_{2}^{2}m_{2}m_{2}^{T}\Big)\\
&=\Big(\sum_{n}x_{n}x_{n}^{T}\Big)-\frac{1}{N}(N_{1}m_{1}+N_{2}m_{2})(N_{1}m_{1}+N_{2}m_{2})^{T}
\end{aligned} $$
It follows that (4.33) and (4.37) are equivalent.
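
This equivalence can also be checked numerically. The sketch below assumes NumPy and synthetic 3-D data (all names and data are illustrative). It fits least squares with the target coding $t_{n}=N/N_{1}$ for $C_{1}$ and $t_{n}=-N/N_{2}$ for $C_{2}$, then checks three things: the bias $w_{0}=-w^{T}m$, the relation $(S_{W}+\frac{N_{1}N_{2}}{N}S_{B})w=N(m_{1}-m_{2})$, and that $w$ is collinear with $S_{W}^{-1}(m_{2}-m_{1})$.

```python
import numpy as np

# Synthetic two-class data (assumption, for illustration only).
rng = np.random.default_rng(3)
X1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(30, 3))
X2 = rng.normal(loc=[2.0, 1.0, -1.0], size=(50, 3))
N1, N2 = len(X1), len(X2)
N = N1 + N2

X = np.vstack([X1, X2])
# Target coding from PRML 4.1.5: N/N1 for class C1, -N/N2 for class C2.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Least squares on the augmented inputs [1, x] gives (w0, w).
X_aug = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(X_aug, t, rcond=None)
w0, w = coef[0], coef[1:]

m1, m2, m = X1.mean(axis=0), X2.mean(axis=0), X.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
S_B = np.outer(m2 - m1, m2 - m1)

# Matrix identity derived above.
total = N1 * m1 + N2 * m2
identity_lhs = S_W + N1 * N2 / N * S_B
identity_rhs = X.T @ X - np.outer(total, total) / N

# Both sides of (4.37).
lhs = identity_lhs @ w
rhs = N * (m1 - m2)

# Collinearity with the Fisher direction (the sign may differ).
w_fisher = np.linalg.solve(S_W, m2 - m1)
cosine = w @ w_fisher / (np.linalg.norm(w) * np.linalg.norm(w_fisher))
```

`abs(cosine)` should be 1 up to rounding. The sign of `cosine` depends on which class is given the positive target, so only the direction is compared.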

4.1.6 Fisher's discriminant for multiple classes

$$ \begin{aligned}
S_{T}&=\sum_{n=1}^{N}(x_{n}-m)(x_{n}-m)^{T}=\sum_{k=1}^{K}\sum_{n \in C_{k}}(x_{n}-m)(x_{n}-m)^{T}\\
&=\sum_{k=1}^{K}\sum_{n \in C_{k}}\big((x_{n}-m_{k})+(m_{k}-m)\big)\big((x_{n}-m_{k})+(m_{k}-m)\big)^{T}\\
&=\sum_{k=1}^{K}\sum_{n \in C_{k}}\big[(x_{n}-m_{k})(x_{n}-m_{k})^{T}+(x_{n}-m_{k})(m_{k}-m)^{T}+(m_{k}-m)(x_{n}-m_{k})^{T}+(m_{k}-m)(m_{k}-m)^{T}\big]\\
&=\sum_{k=1}^{K}\Big[S_{k}+N_{k}(m_{k}-m)(m_{k}-m)^{T}+\Big(\sum_{n \in C_{k}}(x_{n}-m_{k})\Big)(m_{k}-m)^{T}+(m_{k}-m)\Big(\sum_{n \in C_{k}}(x_{n}-m_{k})\Big)^{T}\Big]\\
&=S_{W}+S_{B}+\sum_{k=1}^{K}\big[(N_{k}m_{k}-N_{k}m_{k})(m_{k}-m)^{T}+(m_{k}-m)(N_{k}m_{k}-N_{k}m_{k})^{T}\big]\\
&=S_{W}+S_{B}
\end{aligned} $$
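
The decomposition $S_{T}=S_{W}+S_{B}$ is easy to confirm on data. Here is a minimal sketch, assuming NumPy and three synthetic classes with made-up means and sizes, that builds the three matrices directly from their definitions.

```python
import numpy as np

# Three synthetic classes in 2 dimensions (means and sizes are arbitrary assumptions).
rng = np.random.default_rng(4)
class_means = [[0.0, 0.0], [3.0, 1.0], [-2.0, 4.0]]
class_sizes = [20, 35, 15]
classes = [rng.normal(loc=mu, size=(n, 2)) for mu, n in zip(class_means, class_sizes)]

X = np.vstack(classes)
m = X.mean(axis=0)   # overall mean

# Total covariance: deviations of every sample from the overall mean.
S_T = (X - m).T @ (X - m)

# Within-class covariance: deviations from each class mean, summed over classes.
S_W = sum((Xk - Xk.mean(axis=0)).T @ (Xk - Xk.mean(axis=0)) for Xk in classes)

# Between-class covariance: class-mean deviations weighted by class size N_k.
S_B = sum(len(Xk) * np.outer(Xk.mean(axis=0) - m, Xk.mean(axis=0) - m) for Xk in classes)

max_abs_residual = np.max(np.abs(S_T - (S_W + S_B)))
```

`max_abs_residual` should be at rounding level. The cross terms vanish because the deviations from each class mean sum to zero within that class.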