<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://solita-net.github.io</id>
    <title>SoliTa</title>
    <updated>2024-07-18T16:05:31.437Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://solita-net.github.io"/>
    <link rel="self" href="https://solita-net.github.io/atom.xml"/>
    <subtitle>Notes on my learning process</subtitle>
    <logo>https://solita-net.github.io/images/avatar.png</logo>
    <icon>https://solita-net.github.io/favicon.ico</icon>
    <rights>All rights reserved 2024, SoliTa</rights>
    <entry>
        <title type="html"><![CDATA[Numerical Properties and Initialization of Multilayer Perceptrons]]></title>
        <id>https://solita-net.github.io/post/duo-ceng-gan-zhi-ji-de-shu-zhi-te-xing-yu-chu-shi-hua/</id>
        <link href="https://solita-net.github.io/post/duo-ceng-gan-zhi-ji-de-shu-zhi-te-xing-yu-chu-shi-hua/">
        </link>
        <updated>2024-07-18T15:24:16.000Z</updated>
        <summary type="html"><![CDATA[<p>Shows what causes vanishing and exploding gradients, explains how they relate to initialization, and presents one initialization method.</p>
]]></summary>
        <content type="html"><![CDATA[<p>Shows what causes vanishing and exploding gradients, explains how they relate to initialization, and presents one initialization method.</p>
<!-- more -->
<h2 id="21-梯度消失与梯度爆炸">2.1 Vanishing and Exploding Gradients</h2>
<ul>
<li>
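<p><strong>The root cause of vanishing gradients</strong></p>
<p>Passing through many layers is not, by itself, what makes gradients vanish. The root cause is that the derivative of the activation function is too small, so the chain rule multiplies the gradient by a small factor at every layer it passes through.</p>
<p>Take the once-popular <code>sigmoid</code> function: its derivative has a single bump in the middle (peaking at 0.25) and is flat and close to zero on both sides. During training, if the input to the activation at any one layer has a large magnitude, the derivative there is almost zero and the gradient flowing back through that layer is effectively cut off.</p>
<p>The layers before it then cannot learn.</p>
<p><code>ReLU</code> may look less "elegant" than <code>sigmoid</code> (from a statistical point of view), but it is stable: its derivative is always exactly 0 or 1, which largely mitigates the vanishing-gradient problem.</p>
<p>It follows that parameter initialization is critical. In backpropagation, the derivative through a fully connected layer contains the parameters themselves (the layer is a weighted linear sum), so the parameters must not all be initialized to 0, and the scale of the initial values also needs care.</p>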
</li>
<li>
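<p><strong>The root cause of exploding gradients</strong></p>
<p>Exploding gradients have more than one cause; <strong>an overly large gradient of the fitted function</strong> is only one of them.</p>
<p>If the initial parameter matrix has too large a scale (scale corresponds to variance), backpropagation multiplies the gradient by large values again and again, and the model ends up unable to converge.</p>
<p>Exploding gradients carry a further danger: when a value in the network blows up or collapses to 0, it can exceed what a floating-point variable can represent and produce invalid values such as <code>inf</code> or <code>NaN</code>, at which point the program can no longer run properly.</p>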
</li>
</ul>
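<p>The effect of the activation derivative can be checked with a few lines of Python. This is a minimal sketch that multiplies only the activation derivatives along a 10-layer chain and ignores the weight factors; the helper names (<code>sigmoid_grad</code>, <code>relu_grad</code>, <code>chained_grad</code>) and the depth of 10 are illustrative assumptions, not part of the original notes.</p>
<pre><code class="language-python">import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of max(0, z); the value at z == 0 is taken as 0 by convention.
    return 1.0 if z > 0 else 0.0

def chained_grad(grad_fn, pre_activations):
    """Multiply the per-layer activation derivatives, as the chain rule does."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product

depth = 10  # hypothetical network depth
sigmoid_best = chained_grad(sigmoid_grad, [0.0] * depth)       # best case for sigmoid
sigmoid_saturated = chained_grad(sigmoid_grad, [5.0] * depth)  # saturated units
relu_active = chained_grad(relu_grad, [1.0] * depth)           # active ReLU units

print(sigmoid_best, sigmoid_saturated, relu_active)
</code></pre>
<p>Even in the best case the sigmoid chain shrinks the gradient by a factor of 0.25 per layer, while a chain of active ReLU units passes it through unchanged.</p>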
<h2 id="22-对称性">2.2 Symmetry</h2>
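<p>Symmetry is a property that concerns two (or more) hidden units in the same layer.</p>
<p>The problem arises when every parameter is initialized to the same constant:</p>
<ul>
<li>In the forward pass, the hidden units receive the same inputs and have the same parameters, so they produce the same activations.</li>
<li>In the backward pass, all of these values are equal, so the units receive the same updates.</li>
<li>After the update, the parameters of the two units are still exactly identical.</li>
</ul>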
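<p>As noted in the earlier discussion of dropout and in the ResNet paper, adding units or layers can make a network harder to train and raises the risk of network degradation.</p>
<p>Symmetry therefore means that extra units have been added without contributing anything: the layer behaves as if it had a single unit.</p>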
<blockquote>
<p><a href="https://solita-net.github.io/post/leng-fan-resnet-wan-cheng-tu-xiang-fen-lei/">Overview of network degradation in the ResNet post (in Part II)</a></p>
</blockquote>
<p>As this shows, breaking symmetry is easy: initialize the parameters to different values, or use dropout, whose random masking also breaks the symmetry.</p>
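<p>The symmetry argument can be reproduced on a toy network. The sketch below assumes a hypothetical setup that is not in the original notes: two inputs, two tanh hidden units, one linear output, squared loss, and plain gradient descent with hand-written gradients. It compares a constant initialization with a random one.</p>
<pre><code class="language-python">import math
import random

def train_step(w_hidden, w_out, x, y, lr=0.1):
    """One gradient step for a 2-unit tanh hidden layer with a linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    pred = sum(wo * hj for wo, hj in zip(w_out, h))
    err = pred - y  # derivative of 0.5 * (pred - y) ** 2 with respect to pred
    new_out = [wo - lr * err * hj for wo, hj in zip(w_out, h)]
    new_hidden = [
        [w - lr * err * wo * (1.0 - hj * hj) * xi for w, xi in zip(row, x)]
        for row, wo, hj in zip(w_hidden, w_out, h)
    ]
    return new_hidden, new_out

x, y = [1.0, -2.0], 0.5  # a single made-up training example

# Constant initialization: both hidden units start identical.
c_hidden, c_out = [[0.3, 0.3], [0.3, 0.3]], [0.3, 0.3]
for _ in range(20):
    c_hidden, c_out = train_step(c_hidden, c_out, x, y)
constant_rows_equal = c_hidden[0] == c_hidden[1]

# Random initialization: the two units start different.
rng = random.Random(0)
r_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
r_out = [rng.uniform(-0.5, 0.5) for _ in range(2)]
for _ in range(20):
    r_hidden, r_out = train_step(r_hidden, r_out, x, y)
random_rows_differ = r_hidden[0] != r_hidden[1]

print(constant_rows_equal, random_rows_differ)
</code></pre>
<p>With the constant start, both hidden units go through exactly the same arithmetic at every step, so training never separates them; the randomly initialized units remain distinct.</p>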
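<h2 id="23-xavier初始化">2.3 Xavier Initialization</h2>
<p>The discussion above makes it clear that <strong>initialization matters a great deal</strong>.</p>
<p>At the very beginning we initialized the weights from a normal distribution; that method is simple, so it is not discussed here.</p>
<p>Initialization is an active area of fundamental deep-learning research, with many algorithms designed to avoid problems that can arise during computation. These notes only scratch the surface. As a starting point, we look at Xavier initialization.</p>
<p>The following article demonstrates the effect of this initialization well.</p>
<p><a href="https://blog.csdn.net/xian0710830114/article/details/125540678">Deep learning parameter initialization (1): Xavier initialization, with code</a></p>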
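<h3 id="231-xavier初始化的条件">2.3.1 Conditions for Xavier Initialization</h3>
<p>Exploding gradients are closely tied to scale, that is, to variance.</p>
<p>Glorot therefore argued that a good initialization should keep the variance of <strong>each layer's activations and of the gradients with respect to each layer's state</strong> constant as they propagate. In other words, the variance of the activations should stay the same from layer to layer in the forward pass, and the <strong>variance</strong> of the gradients should stay the same from layer to layer in the backward pass.</p>
<p>The variance depends on the activation function, the input features, and the initial values. To focus on which initialization comes closest to the constant-variance goal, we make the following assumptions about the first two factors:</p>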
<ul>
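<li>Every input feature has the same variance, Var(x);</li>
<li>The activation function is symmetric, so the input to every layer can be assumed to have zero mean;</li>
<li>The activation function satisfies f′(0)=1;</li>
<li>At initialization, the state values lie in the linear region of the activation function: f′(Si(k))≈1.</li>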
</ul>
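<p>As a quick check of the activation-function assumptions, the sketch below compares tanh with sigmoid using a central finite difference. The helper <code>numeric_derivative</code> and the step size are illustrative choices. tanh is odd and has slope 1 at the origin, so it meets the conditions; sigmoid has slope 0.25 at the origin and is not symmetric about it, so it does not.</p>
<pre><code class="language-python">import math

def numeric_derivative(f, z, eps=1e-6):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

tanh_slope_at_zero = numeric_derivative(math.tanh, 0.0)
sigmoid_slope_at_zero = numeric_derivative(sigmoid, 0.0)

# Odd symmetry: f(-z) == -f(z) on a few sample points.
tanh_is_odd = all(
    math.isclose(math.tanh(-z), -math.tanh(z)) for z in (0.1, 0.5, 2.0)
)
sigmoid_is_odd = all(
    math.isclose(sigmoid(-z), -sigmoid(z)) for z in (0.1, 0.5, 2.0)
)

print(tanh_slope_at_zero, sigmoid_slope_at_zero, tanh_is_odd, sigmoid_is_odd)
</code></pre>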
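<h3 id="232-xavier初始化公式">2.3.2 The Xavier Initialization Formula</h3>
<p>The full derivation is omitted; only the line of reasoning is shown here.</p>
<p>First, note a property of the uniform distribution:</p>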
<ul>
<li><strong>The variance of a uniform distribution on [a, b] is fixed by its endpoints: it equals <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mfrac><mrow><mo>(</mo><mi>b</mi><mo>−</mo><mi>a</mi><msup><mo>)</mo><mn>2</mn></msup></mrow><mn>12</mn></mfrac></mrow><annotation encoding="application/x-tex">\frac{(b-a)^2}{12}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.4539199999999999em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.10892em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span><span class="mord mtight">2</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.485em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathdefault mtight">b</span><span class="mbin mtight">−</span><span class="mord mathdefault mtight">a</span><span class="mclose mtight"><span class="mclose mtight">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913142857142857em;"><span style="top:-2.931em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose 
nulldelimiter"></span></span></span></span></span></strong>. This is the key fact we apply to the initialization at the end.</li>
</ul>
<p>The variance must stay constant in the forward pass, and it must also stay constant in the backward pass. How can that be achieved?</p>
<p>We have assumed that the inputs and the weight distribution all have zero mean and a common variance. So we first compute the mean and variance of the output that each layer produces from its inputs. Let the variance of the weight distribution be <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>σ</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">\sigma^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span> and the variance of the inputs be <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>γ</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">\gamma^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.008548em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.05556em;">γ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span>.</p>
<p class='katex-block'>\begin{aligned}
    E[o_i] &amp; = \sum_{j=1}^{n_\mathrm{in}} E[w_{ij} x_j] \\&amp;= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}] E[x_j] \\&amp;= 0, \\
    \mathrm{Var}[o_i] &amp; = E[o_i^2] - (E[o_i])^2 \\
        &amp; = \sum_{j=1}^{n_\mathrm{in}} E[w^2_{ij} x^2_j] - 0 \\
        &amp; = \sum_{j=1}^{n_\mathrm{in}} E[w^2_{ij}] E[x^2_j] \\
        &amp; = n_\mathrm{in} \sigma^2 \gamma^2.
\end{aligned}
</p>
<ul>
<li>Backpropagation follows the same argument, with n_out (the number of outputs of the layer) in place of n_in.</li>
</ul>
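<p>The forward-pass result Var[o_i] = n_in σ² γ² can be checked numerically. The sketch below is a Monte Carlo estimate under assumptions chosen purely for illustration: Gaussian weights and inputs with zero mean, 20 inputs, σ = 0.5 and γ = 2. The function name <code>output_variance</code> is hypothetical.</p>
<pre><code class="language-python">import random

def output_variance(n_in, sigma, gamma, trials=20000, seed=0):
    """Estimate Var[o] for o = sum_j w_j * x_j with independent zero-mean w and x."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        o = sum(rng.gauss(0.0, sigma) * rng.gauss(0.0, gamma) for _ in range(n_in))
        outputs.append(o)
    mean = sum(outputs) / trials
    return sum((o - mean) ** 2 for o in outputs) / trials

n_in, sigma, gamma = 20, 0.5, 2.0
empirical = output_variance(n_in, sigma, gamma)
predicted = n_in * sigma ** 2 * gamma ** 2  # n_in * sigma^2 * gamma^2

print(empirical, predicted)
</code></pre>
<p>The empirical value should land close to the prediction; the gap shrinks as <code>trials</code> grows.</p>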
<p>How do we keep the variance constant in the forward pass and in the backward pass at the same time?</p>
<p>We can require <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>n</mi><mrow><mi mathvariant="normal">i</mi><mi mathvariant="normal">n</mi></mrow></msub><msup><mi>σ</mi><mn>2</mn></msup><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">n_\mathrm{in} \sigma^2 = 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.964108em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31750199999999995em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">i</span><span class="mord mathrm mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span> for the forward pass and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>n</mi><mrow><mi 
mathvariant="normal">o</mi><mi mathvariant="normal">u</mi><mi mathvariant="normal">t</mi></mrow></msub><msup><mi>σ</mi><mn>2</mn></msup><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">n_\mathrm{out} \sigma^2 = 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.964108em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">o</span><span class="mord mathrm mtight">u</span><span class="mord mathrm mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span> for the backward pass. The problem is that both conditions cannot hold at once unless n_in = n_out.</p>
<p>To satisfy them as closely as possible, we relax the conditions and only require</p>
<p class='katex-block'><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mtable><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mfrac><mn>1</mn><mn>2</mn></mfrac><mo>(</mo><msub><mi>n</mi><mrow><mi mathvariant="normal">i</mi><mi mathvariant="normal">n</mi></mrow></msub><mo>+</mo><msub><mi>n</mi><mrow><mi mathvariant="normal">o</mi><mi mathvariant="normal">u</mi><mi mathvariant="normal">t</mi></mrow></msub><mo>)</mo><msup><mi>σ</mi><mn>2</mn></msup><mo>=</mo><mn>1</mn><mtext> 或等价于 </mtext><mi>σ</mi><mo>=</mo><msqrt><mfrac><mn>2</mn><mrow><msub><mi>n</mi><mrow><mi mathvariant="normal">i</mi><mi mathvariant="normal">n</mi></mrow></msub><mo>+</mo><msub><mi>n</mi><mrow><mi mathvariant="normal">o</mi><mi mathvariant="normal">u</mi><mi mathvariant="normal">t</mi></mrow></msub></mrow></mfrac></msqrt><mi mathvariant="normal">.</mi></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{aligned}
\frac{1}{2} (n_\mathrm{in} + n_\mathrm{out}) \sigma^2 = 1 \text{ 或等价于 }
\sigma = \sqrt{\frac{2}{n_\mathrm{in} + n_\mathrm{out}}}.
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.74em;vertical-align:-1.1199999999999999em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6200000000000003em;"><span style="top:-3.62em;"><span class="pstrut" style="height:3.576595em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31750199999999995em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">i</span><span class="mord mathrm mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">o</span><span class="mord mathrm mtight">u</span><span class="mord mathrm mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord"><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord">1</span><span class="mord text"><span class="mord"> </span><span class="mord cjk_fallback">或等价于</span><span class="mord"> </span></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:1.576595em;"><span class="svg-align" style="top:-4.4em;"><span class="pstrut" style="height:4.4em;"></span><span class="mord" style="padding-left:1em;"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31750199999999995em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">i</span><span class="mord mathrm mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">o</span><span class="mord mathrm mtight">u</span><span class="mord mathrm mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span 
style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.536595em;"><span class="pstrut" style="height:4.4em;"></span><span class="hide-tail" style="min-width:1.02em;height:2.48em;"><svg width='400em' height='2.48em' viewBox='0 0 400000 2592' preserveAspectRatio='xMinYMin slice'><path d='M424,2478c-1.3,-0.7,-38.5,-172,-111.5,-514c-73,
-342,-109.8,-513.3,-110.5,-514c0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,
25c-5.7,9.3,-9.8,16,-12.5,20s-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,
-13s76,-122,76,-122s77,-121,77,-121s209,968,209,968c0,-2,84.7,-361.7,254,-1079
c169.3,-717.3,254.7,-1077.7,256,-1081c4,-6.7,10,-10,18,-10H400000v40H1014.6
s-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185c-2,6,-10,9,-24,9
c-8,0,-12,-0.7,-12,-2z M1001 80H400000v40H1014z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.863405em;"><span></span></span></span></span></span><span class="mord">.</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1199999999999999em;"><span></span></span></span></span></span></span></span></span></span></span></span></p>
<p>We then relate the resulting <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>σ</mi></mrow><annotation encoding="application/x-tex">\sigma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.03588em;">σ</span></span></span></span> to the variance of the uniform distribution, which gives the distribution used for initialization.</p>
<p class='katex-block'><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>U</mi><mrow><mo fence="true">(</mo><mo>−</mo><msqrt><mfrac><mn>6</mn><mrow><msub><mi>n</mi><mrow><mi mathvariant="normal">i</mi><mi mathvariant="normal">n</mi></mrow></msub><mo>+</mo><msub><mi>n</mi><mrow><mi mathvariant="normal">o</mi><mi mathvariant="normal">u</mi><mi mathvariant="normal">t</mi></mrow></msub></mrow></mfrac></msqrt><mo separator="true">,</mo><msqrt><mfrac><mn>6</mn><mrow><msub><mi>n</mi><mrow><mi mathvariant="normal">i</mi><mi mathvariant="normal">n</mi></mrow></msub><mo>+</mo><msub><mi>n</mi><mrow><mi mathvariant="normal">o</mi><mi mathvariant="normal">u</mi><mi mathvariant="normal">t</mi></mrow></msub></mrow></mfrac></msqrt><mo fence="true">)</mo></mrow><mi mathvariant="normal">.</mi></mrow><annotation encoding="application/x-tex">U\left(-\sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}, \sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}\right).
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.526625em;vertical-align:-0.95003em;"></span><span class="mord mathdefault" style="margin-right:0.10903em;">U</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord">−</span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.576595em;"><span class="svg-align" style="top:-4.4em;"><span class="pstrut" style="height:4.4em;"></span><span class="mord" style="padding-left:1em;"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31750199999999995em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">i</span><span class="mord mathrm mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">o</span><span class="mord mathrm mtight">u</span><span class="mord mathrm mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">6</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.536595em;"><span class="pstrut" style="height:4.4em;"></span><span class="hide-tail" style="min-width:1.02em;height:2.48em;"><svg width='400em' height='2.48em' viewBox='0 0 400000 2592' preserveAspectRatio='xMinYMin slice'><path d='M424,2478c-1.3,-0.7,-38.5,-172,-111.5,-514c-73,
-342,-109.8,-513.3,-110.5,-514c0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,
25c-5.7,9.3,-9.8,16,-12.5,20s-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,
-13s76,-122,76,-122s77,-121,77,-121s209,968,209,968c0,-2,84.7,-361.7,254,-1079
c169.3,-717.3,254.7,-1077.7,256,-1081c4,-6.7,10,-10,18,-10H400000v40H1014.6
s-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185c-2,6,-10,9,-24,9
c-8,0,-12,-0.7,-12,-2z M1001 80H400000v40H1014z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.863405em;"><span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.576595em;"><span class="svg-align" style="top:-4.4em;"><span class="pstrut" style="height:4.4em;"></span><span class="mord" style="padding-left:1em;"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31750199999999995em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">i</span><span class="mord mathrm mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord"><span class="mord mathdefault">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathrm mtight">o</span><span class="mord mathrm mtight">u</span><span class="mord mathrm mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">6</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-3.536595em;"><span class="pstrut" style="height:4.4em;"></span><span class="hide-tail" style="min-width:1.02em;height:2.48em;"><svg width='400em' height='2.48em' viewBox='0 0 400000 2592' preserveAspectRatio='xMinYMin slice'><path d='M424,2478c-1.3,-0.7,-38.5,-172,-111.5,-514c-73,
-342,-109.8,-513.3,-110.5,-514c0,-2,-10.7,14.3,-32,49c-4.7,7.3,-9.8,15.7,-15.5,
25c-5.7,9.3,-9.8,16,-12.5,20s-5,7,-5,7c-4,-3.3,-8.3,-7.7,-13,-13s-13,-13,-13,
-13s76,-122,76,-122s77,-121,77,-121s209,968,209,968c0,-2,84.7,-361.7,254,-1079
c169.3,-717.3,254.7,-1077.7,256,-1081c4,-6.7,10,-10,18,-10H400000v40H1014.6
s-87.3,378.7,-272.6,1166c-185.3,787.3,-279.3,1182.3,-282,1185c-2,6,-10,9,-24,9
c-8,0,-12,-0.7,-12,-2z M1001 80H400000v40H1014z'/></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.863405em;"><span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">.</span></span></span></span></span></p>
<p>The derivation relies on many assumptions and relaxations, but in practice this initialization has proven very effective.</p>
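<p>The last step can be spelled out. A uniform distribution on (-a, a) has variance (2a)²/12 = a²/3; setting a²/3 = 2/(n_in + n_out) gives a = sqrt(6/(n_in + n_out)), which is the bound in the formula above. The sketch below samples such a matrix in plain Python and compares its empirical variance with the target. The function name <code>xavier_uniform</code> and the layer sizes are assumptions made for illustration; a real project would normally use the framework's built-in initializer instead.</p>
<pre><code class="language-python">import math
import random

def xavier_uniform(n_in, n_out, seed=0):
    """Return an n_out x n_in weight matrix drawn from U(-bound, bound)."""
    bound = math.sqrt(6.0 / (n_in + n_out))
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]

n_in, n_out = 300, 100  # hypothetical layer sizes
weights = xavier_uniform(n_in, n_out)

flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
empirical_var = sum((w - mean) ** 2 for w in flat) / len(flat)
target_var = 2.0 / (n_in + n_out)  # sigma^2 from the relaxed condition

print(empirical_var, target_var)
</code></pre>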
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[[Rehash] Image Classification with ResNet]]></title>
        <id>https://solita-net.github.io/post/leng-fan-resnet-wan-cheng-tu-xiang-fen-lei/</id>
        <link href="https://solita-net.github.io/post/leng-fan-resnet-wan-cheng-tu-xiang-fen-lei/">
        </link>
        <updated>2024-07-18T13:45:33.000Z</updated>
        <summary type="html"><![CDATA[<p>This is the lab report for an artificial intelligence assignment; it includes an account of the principles behind the ResNet paper.</p>
]]></summary>
        <content type="html"><![CDATA[<p>This is the lab report for an artificial intelligence assignment; it includes an account of the principles behind the ResNet paper.</p>
<!-- more -->
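<h2 id="一-实验内容">I. Experiment Tasks</h2>
<h3 id="a-使用resnet完成图像分类">A. Image Classification with ResNet</h3>
<p>Read the paper <strong>Deep residual learning for image recognition</strong> (provided in the archive), build a ResNet by hand in PyTorch (any ResNet variant may be used, such as ResNet-18 or ResNet-34), and use it for image classification on the MNIST or CIFAR-10 dataset, depending on the compute available to you. Submit a lab report and the code.</p>
<p><em>Requirements:</em></p>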
<ul>
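<li>The report must include your understanding of the paper: why is ResNet effective?</li>
<li>The report must explain the core code (at least the dataset preprocessing and the ResNet definition).</li>
<li>The report must provide convergence curves for the loss and the accuracy.</li>
<li>(Optional) The report may explore how different model parameters affect the results.</li>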
</ul>
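<h2 id="二-实验原理">II. Principles</h2>
<h3 id="21-论文概述">2.1 Overview of the Paper</h3>
<p>ResNet is short for Residual Network, also called a residual neural network.</p>
<p>Reading the paper gives a rough picture of how ResNet works. The following passage is quoted from it.</p>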
<blockquote>
<p>We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.</p>
</blockquote>
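<p>In other words, the layers are reformulated to learn <strong>residual functions</strong> relative to the layer inputs, rather than functions with no such reference.</p>
<p>The core idea of ResNet is to introduce <strong>“residual connections”</strong>, also called <strong>“skip connections”</strong>, so that each group of layers learns a residual.</p>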
<ul>
<li>
<p>In a conventional network layer, we want to learn some mapping H(x) (that is, a function that is hard to interpret intuitively).</p>
</li>
<li>
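<p>In ResNet, we instead want to learn the residual mapping F(x)=H(x)−x, so that H(x)=F(x)+x.</p>
<p>The layers therefore learn the residual between the output and the input. This makes it easier for information to propagate through the network, which alleviates the vanishing-gradient problem.</p>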
</li>
</ul>
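<p>The relation H(x) = F(x) + x can be illustrated with a toy block written in plain Python. This is only a sketch: vectors are lists of floats, and the two residual branches (<code>zero_residual</code>, <code>small_residual</code>) are hypothetical stand-ins for the learned layers, not the convolutional blocks used in the paper.</p>
<pre><code class="language-python">def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_residual(x):
    # A branch that outputs F(x) = 0, e.g. when all of its weights are zero.
    return [0.0 for _ in x]

def small_residual(x):
    # Hypothetical learned branch: a small correction on top of the input.
    return [0.1 * xi for xi in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)    # block acts as the identity
adjusted_out = residual_block(x, small_residual)   # input plus a small correction

print(identity_out, adjusted_out)
</code></pre>
<p>When the branch outputs zero, the block reduces to the identity mapping, so representing "do nothing" only requires driving F toward zero rather than fitting an identity function through several nonlinear layers.</p>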
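<p>Why do we need a residual mapping? The key problem the paper sets out to address is <strong>network degradation</strong> (a strange problem, but one that really exists).</p>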
<blockquote>
<p>When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in and thoroughly verified by our experiments.</p>
</blockquote>
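<p>When more layers are stacked onto a plain network, its error grows and its accuracy drops. Surprisingly, this degradation is not caused by overfitting, since the training error itself gets higher (the passage below explains why a deeper model should, in principle, do no worse than a shallower one).</p>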
<blockquote>
<p>Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction to the deeper model: the added layers are identity mapping, and the other layers are copied from the learned shallower model. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart. But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time).</p>
</blockquote>
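<p>Intuitively, a deeper network could copy the shallower one and let every added layer be an identity mapping (output equal to input), so its accuracy should be no worse. In practice, however, the added layers are trained like all the others, and the deeper plain network ends up with a higher error.</p>
<p>The paper offers no complete explanation, but the observation stands: current solvers fail to find a solution as good as the constructed one, so simply adding layers can degrade a plain network.</p>
<p>A natural remedy is to let the input skip ahead and connect directly to a later layer.</p>
<p>In practice, this means adding the block's input to the output of a few stacked layers.</p>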
<figure data-type="image" tabindex="1"><img src="https://solita-net.github.io/post-images/1721310555116.png" alt="" loading="lazy"></figure>
<p>Think carefully: what do the two layers in the middle actually learn after training?</p>
<p>The target is still the mapping <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">H(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span></span>, but the input <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span></span></span></span> is now added to the output of the stacked layers. For the block as a whole to still produce <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">H(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span></span></span></span> after training, the stacked layers must learn exactly <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>−</mo><mi>x</mi></mrow><annotation encoding="application/x-tex">H(x)-x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathdefault" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathdefault">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord mathdefault">x</span></span></span></span>.</p>
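<p>The identity shortcut can be sketched in plain Python, assuming vectors are ordinary lists. The names <code>residual_block</code>, <code>zero_branch</code> and <code>scale_branch</code> are hypothetical and used only for illustration. The sketch shows that when the learned branch F outputs zeros, the whole block reduces to the identity mapping.</p>

```python
def residual_block(branch, x):
    """Return H(x) = F(x) + x, where `branch` plays the role of F."""
    fx = branch(x)
    return [f_i + x_i for f_i, x_i in zip(fx, x)]


def zero_branch(x):
    """A branch whose output is all zeros, i.e. F(x) = 0."""
    return [0.0 for _ in x]


def scale_branch(x):
    """A non-trivial branch, F(x) = 0.5 * x, for comparison."""
    return [0.5 * x_i for x_i in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(zero_branch, x)  # the block passes x through unchanged
scaled_out = residual_block(scale_branch, x)   # the block outputs 1.5 * x
print(identity_out, scaled_out)
```

<p>With a shortcut, "doing nothing" only requires pushing the branch towards zero, which is presumably easier for an optimizer than making stacked nonlinear layers reproduce their input.</p>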
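<h3 id="22-网络架构">2.2 Network Architecture</h3>
<p>The structure above can be packaged as a module, which we call a <strong>residual block</strong>.</p>
<p>Following the figure from the paper, a block should do the following:</p>
<ul>
<li>Apply convolution, batch normalization, and ReLU activation.</li>
<li>Add the input to the result of the convolutions through a shortcut connection.</li>
</ul>
<p>Compared with a plain CNN, ResNet simply adds several of these residual blocks.</p>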
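<h3 id="23-流程图">2.3 Flowchart</h3>
<figure data-type="image" tabindex="2"><img src="https://solita-net.github.io/post-images/1721310568974.png" alt="" loading="lazy"></figure>
<p>This is the standard ResNet-18 structure.</p>
<p>Why is it called ResNet-18 when the diagram shows only 6 layers?</p>
<p>Because each residual layer in the diagram has its own internal structure. I expect a residual block to be laid out as follows:</p>
<ul>
<li>Convolution 1 (first weighted layer)</li>
<li>Batch normalization</li>
<li>Activation function</li>
<li>Convolution 2 (second weighted layer)</li>
<li>Batch normalization</li>
</ul>
<p>Each residual layer contains two residual blocks, so one residual layer contains four convolutional layers.</p>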
<p>Together with the first convolution and the final fully connected layer, this gives <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mn>2</mn><mo>+</mo><mn>4</mn><mo>∗</mo><mn>4</mn><mo>=</mo><mn>18</mn></mrow><annotation encoding="application/x-tex">2+4*4=18</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">∗</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">4</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord">8</span></span></span></span> weighted layers in total.</p>
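<p>The counting rule can be written as a small helper. This is a sketch under the assumption that only the first convolution, the convolutions inside the residual blocks, and the final fully connected layer are counted (the 1×1 shortcut convolutions are left out). The function name <code>count_weighted_layers</code> is hypothetical.</p>

```python
def count_weighted_layers(num_stages, blocks_per_stage=2, convs_per_block=2):
    """Count the first convolution, every block convolution, and the final linear layer."""
    stem_and_head = 2  # first convolution + fully connected classifier
    return stem_and_head + num_stages * blocks_per_stage * convs_per_block


print(count_weighted_layers(4))  # four residual layers
print(count_weighted_layers(1))  # a single residual layer
```

<p>The same rule can be applied to any reduced variant to check how many weighted layers it really has.</p>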
<h2 id="三-代码展示">三、代码展示</h2>
<blockquote>
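<p>Bad news: my computer has only a CPU and no GPU acceleration. After writing ResNET18 I tried to run it; twenty minutes in, it had not finished a single epoch, and then the machine blue-screened. I had no choice but to switch to ResNET8, that is, to cut the four residual layers down to one, before the program could run at all. (By the counting rule above, this reduced network has 2+4=6 weighted layers; the name ResNET8 is the identifier used in the code.)</p>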
</blockquote>
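<p>First, <code>pytorch</code>'s <code>torch.nn</code> does not provide a ready-made residual layer. However, a residual layer is essentially an ordinary stack of layers with a shortcut added at the end, so we can define our own class that bundles the behaviour we want.</p>
<h3 id="31-残差块">3.1 Residual Block</h3>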
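<pre><code class="language-py"># Imports used by all code in this section.
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample  # optional projection applied to the shortcut

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Project the shortcut when its shape differs from the main path.
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual  # the skip connection: F(x) + x
        out = F.relu(out)
        return out
</code></pre>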
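<p>These operations are almost the same as in a plain CNN. The final <code>out += residual</code> step simply adds the shortcut to the output of the two convolutions.</p>
<p>Here <code>downsample</code> is an optional module applied to the shortcut. When the block changes the stride or the number of channels, it reshapes the input <code>x</code> so that it can be added to <code>out</code>, which saves working out the shapes by hand.</p>
<h3 id="32-resnet8模块">3.2 ResNet8 Module</h3>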
<pre><code class="language-py">class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self.make_layer(block, 64, layers[0])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(64, num_classes)

    def make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
</code></pre>
<p>The other methods are familiar by now, but there is one new method, <code>make_layer()</code>. Let's go through it piece by piece.</p>
<pre><code class="language-py">downsample = None
if stride != 1 or self.in_channels != out_channels:
    downsample = nn.Sequential(
        nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
</code></pre>
<ul>
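<li>This step builds the downsampling path. If the block input does not match the block output in stride or channel count, a 1×1 convolution followed by batch normalization reshapes the shortcut so that the two can be added.</li>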
</ul>
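<p>To see when the projection is needed, the shapes can be traced by hand. The sketch below assumes the usual output-size formula for convolution and pooling layers without dilation, and the 64 to 128 channel stage with stride 2 is a hypothetical example that the ResNET8 in this post does not contain. The helper names <code>conv_out_size</code> and <code>needs_projection</code> are also hypothetical.</p>

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def needs_projection(in_channels, out_channels, stride):
    """Same condition as in make_layer: the shortcut must be reshaped."""
    return stride != 1 or in_channels != out_channels


# First convolution and max pooling on a 32x32 CIFAR-10 image.
after_conv = conv_out_size(32, kernel=7, stride=2, padding=3)
after_pool = conv_out_size(after_conv, kernel=3, stride=2, padding=1)

# Hypothetical stage with stride 2: main path (3x3 conv) versus shortcut (1x1 conv).
main_path = conv_out_size(after_pool, kernel=3, stride=2, padding=1)
shortcut = conv_out_size(after_pool, kernel=1, stride=2, padding=0)

print(after_conv, after_pool, main_path, shortcut)
print(needs_projection(64, 64, 1), needs_projection(64, 128, 2))
```

<p>In the single-layer network of this post the first case applies (64 to 64 channels, stride 1), so <code>downsample</code> stays <code>None</code> and the input is added unchanged.</p>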
<pre><code class="language-py">layers = []
layers.append(block(self.in_channels, out_channels, stride, downsample))
self.in_channels = out_channels
for _ in range(1, blocks):
    layers.append(block(self.in_channels, out_channels))
</code></pre>
<ul>
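<li><code>layers</code> is a list that stores the <strong>blocks</strong> of one residual layer, one element per block. The <code>blocks</code> argument decides how many residual blocks the layer contains; only the first block receives the stride and the <code>downsample</code> module.</li>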
</ul>
<pre><code class="language-py">return nn.Sequential(*layers)
</code></pre>
<ul>
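<li><code>nn.Sequential(*layers)</code> unpacks the list and wraps the blocks in a single module that runs them in order, which is the form <code>pytorch</code> expects. A very convenient function.</li>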
</ul>
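<p>The star-unpacking pattern can be illustrated with a plain-Python analogue. This is only a sketch of the idea of chaining stages in order, not how <code>nn.Sequential</code> is implemented, and the function <code>sequential</code> is a hypothetical name.</p>

```python
def sequential(*stages):
    """Chain callables so the output of one becomes the input of the next."""
    def forward(x):
        for stage in stages:
            x = stage(x)
        return x
    return forward


stages = [lambda v: v + 1, lambda v: v * 2]
model = sequential(*stages)  # the star unpacks the list into positional arguments
print(model(3))              # computes (3 + 1) * 2
```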
<h3 id="33-resnet函数">3.3 ResNet Function</h3>
<pre><code class="language-py">def ResNET8(num_classes=10):
    return ResNet(ResidualBlock, [2], num_classes)
</code></pre>
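<p>This wrapper makes the model more convenient to construct. It also shows that <code>ResidualBlock</code> can be swapped for another block class, which keeps the program extensible.</p>
<h3 id="34-训练函数">3.4 Training Function</h3>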
<pre><code class="language-py">def train(model, train_loader, criterion, optimizer, num_epochs, epoch_losses_list, epoch_accuracy_list):
    print(&quot;-------START TRAINING-------&quot;)
    print(&quot;num_epochs =&quot;, num_epochs)
    print(&quot;model = ResNET8&quot;)
    for epoch in range(num_epochs):
        print(&quot;--------------&quot;)
        print(&quot;epoch&quot;, epoch+1)
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
        epoch_loss = running_loss / len(train_loader.dataset)
        print(f'Finish. Loss: {epoch_loss:.4f}')
        epoch_losses_list.append(epoch_loss)
        # Note: test_loader is not a parameter here; this relies on the global defined in the main block.
        evaluate(model, test_loader, criterion, epoch_accuracy_list)
</code></pre>
<p>Training is almost identical to the plain CNN case.</p>
<p>The next few functions are also nearly the same as for the CNN, so they are not explained further.</p>
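<p>One detail in the training loop deserves a short illustration: <code>loss.item() * inputs.size(0)</code>. With the default settings, <code>nn.CrossEntropyLoss</code> returns the mean over a batch, so multiplying by the batch size and dividing by the dataset size at the end gives a true per-sample mean even when the last batch is smaller (50,000 training images with a batch size of 128 leave a last batch of 80). The numbers below are made up for illustration, and <code>epoch_mean_loss</code> is a hypothetical helper.</p>

```python
def epoch_mean_loss(batch_mean_losses, batch_sizes):
    """Per-sample mean loss over an epoch, computed from per-batch mean losses."""
    running_loss = 0.0
    for mean_loss, size in zip(batch_mean_losses, batch_sizes):
        running_loss += mean_loss * size  # undo the per-batch averaging
    return running_loss / sum(batch_sizes)


# Hypothetical epoch: two full batches of 128 samples and a last batch of 64.
losses = [1.0, 1.0, 4.0]
sizes = [128, 128, 64]
weighted = epoch_mean_loss(losses, sizes)  # weights each batch by its size
naive = sum(losses) / len(losses)          # treats every batch as equally large
print(weighted, naive)
```

<p>The naive average overweights the small last batch, which is why the training function accumulates a size-weighted sum instead.</p>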
<h3 id="35-检验函数">3.5 Evaluation Function</h3>
<pre><code class="language-py">def evaluate(model, test_loader, criterion, epoch_accuracy_list):
    model.eval()
    test_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    test_loss = test_loss / len(test_loader.dataset)
    print(f'Test Loss= {test_loss:.4f}, Accuracy= {accuracy:.2f}%')
    epoch_accuracy_list.append(accuracy)
</code></pre>
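<p>The accuracy computation in <code>evaluate</code> takes, for each sample, the class with the largest output and compares it with the label. A plain-Python sketch of the same idea follows; the logits are made-up values and <code>accuracy_percent</code> is a hypothetical helper.</p>

```python
def accuracy_percent(logits, labels):
    """Percentage of rows whose largest logit sits at the labelled index."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        correct += int(predicted == label)
    return 100 * correct / len(labels)


# Hypothetical logits for 4 samples and 3 classes.
logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 0.9],
    [0.0, 5.0, 1.0],
    [1.0, 0.5, 0.2],
]
labels = [0, 2, 1, 2]
print(accuracy_percent(logits, labels))
```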
<h3 id="36-画图函数">3.6 Plotting Functions</h3>
<pre><code class="language-py">def plot_loss(losses):
    &quot;&quot;&quot;
    Plot the loss against the epoch number.

    Args:
    losses (list of float): mean loss of each epoch.
    &quot;&quot;&quot;
    epochs = range(1, len(losses) + 1)  # epoch sequence [1, 2, 3, ..., len(losses)]
    plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
    plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

    plt.figure(figsize=(10, 5))  # figure size
    plt.plot(epochs, losses, 'b', label='训练loss曲线')  # loss curve; 'b' means a blue line
    plt.title('loss随epoch变化曲线图')  # figure title
    plt.xlabel('Epochs次数')  # x-axis label
    plt.ylabel('Loss大小')  # y-axis label
    plt.legend()  # show the legend
    plt.grid(True)  # show the grid
    plt.show()  # display the figure

def plot_accuracy(accuracy_list):
    &quot;&quot;&quot;
    Plot the accuracy against the epoch number.

    Args:
    accuracy_list (list of float): test accuracy (in percent) after each epoch.
    &quot;&quot;&quot;
    epochs = range(1, len(accuracy_list) + 1)
    plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
    plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

    plt.figure(figsize=(10, 5))  # figure size
    plt.plot(epochs, accuracy_list, 'b', label='准确率曲线')  # accuracy curve; 'b' means a blue line
    plt.title('准确率随epoch变化曲线图')  # figure title
    plt.xlabel('Epochs次数')  # x-axis label
    plt.ylabel('准确率')  # y-axis label
    plt.legend()  # show the legend
    plt.grid(True)  # show the grid
    plt.show()  # display the figure
</code></pre>
<h3 id="37-主函数">3.7 Main Function</h3>
<pre><code class="language-python">if __name__=='__main__':
    print(&quot;----------Program START----------&quot;)

    epoch_losses_list = []
    epoch_accuracy_list = []

    model = ResNET8(num_classes=10)
    print(&quot;ResNET8 ready.&quot;)

    transform = transforms.Compose(
        [transforms.RandomHorizontalFlip(),
        transforms.RandomCrop(32, padding=4),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

    print(&quot;start to download CIFAR-10...&quot;)
    train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
    print(&quot;CIFAR-10 ready.&quot;)

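    # Note: the test set reuses `transform`, which includes the random flip and crop, so test
    # metrics vary slightly between evaluations; a deterministic transform is the usual choice.
    test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)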
    test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False, num_workers=2)
    print(&quot;loader ready.&quot;)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    num_epochs = 20
    train(model, train_loader, criterion, optimizer, num_epochs, epoch_losses_list, epoch_accuracy_list)
    print(&quot;--------------&quot;)
    print(&quot;Training Finish.&quot;)
    # Final evaluation of the trained model (this appends one more point to epoch_accuracy_list)
    evaluate(model, test_loader, criterion, epoch_accuracy_list)
    plot_loss(epoch_losses_list)
    plot_accuracy(epoch_accuracy_list)
</code></pre>
<h2 id="四-结果展示">4. Results</h2>
<p>The output of the final run is shown below.</p>
<pre><code>----------Program START----------
ResNET8 ready.
start to download CIFAR-10...
Files already downloaded and verified
CIFAR-10 ready.
Files already downloaded and verified
loader ready.
-------START TRAINING-------
num_epochs = 20
model = ResNET8
--------------
epoch 1
Finish. Loss: 1.4996
Test Loss= 1.4736, Accuracy= 46.91%
--------------
epoch 2
Finish. Loss: 1.2029
Test Loss= 1.3536, Accuracy= 51.32%
--------------
epoch 3
Finish. Loss: 1.0499
Test Loss= 1.0975, Accuracy= 60.77%
--------------
epoch 4
Finish. Loss: 0.9424
Test Loss= 1.0323, Accuracy= 62.77%
--------------
epoch 5
Finish. Loss: 0.8776
Test Loss= 1.1024, Accuracy= 61.99%
--------------
epoch 6
Finish. Loss: 0.8151
Test Loss= 0.8779, Accuracy= 69.67%
--------------
epoch 7
Finish. Loss: 0.7757
Test Loss= 0.8543, Accuracy= 70.06%
--------------
epoch 8
Finish. Loss: 0.7394
Test Loss= 0.8283, Accuracy= 71.33%
--------------
epoch 9
Finish. Loss: 0.7117
Test Loss= 0.8113, Accuracy= 71.74%
--------------
epoch 10
Finish. Loss: 0.6840
Test Loss= 0.8477, Accuracy= 70.78%
--------------
epoch 11
Finish. Loss: 0.6610
Test Loss= 0.7801, Accuracy= 72.98%
--------------
epoch 12
Finish. Loss: 0.6395
Test Loss= 0.7281, Accuracy= 75.32%
--------------
epoch 13
Finish. Loss: 0.6235
Test Loss= 0.6794, Accuracy= 75.95%
--------------
epoch 14
Finish. Loss: 0.6077
Test Loss= 0.6938, Accuracy= 75.80%
--------------
epoch 15
Finish. Loss: 0.5937
Test Loss= 0.7213, Accuracy= 75.36%
--------------
epoch 16
Finish. Loss: 0.5757
Test Loss= 0.6768, Accuracy= 76.34%
--------------
epoch 17
Finish. Loss: 0.5646
Test Loss= 0.6648, Accuracy= 77.30%
--------------
epoch 18
Finish. Loss: 0.5490
Test Loss= 0.6453, Accuracy= 77.94%
--------------
epoch 19
Finish. Loss: 0.5425
Test Loss= 0.6946, Accuracy= 76.63%
--------------
epoch 20
Finish. Loss: 0.5281
Test Loss= 0.6681, Accuracy= 77.64%
--------------
Training Finish.
Test Loss= 0.6695, Accuracy= 77.27%
</code></pre>
<p>The resulting curves are shown below.</p>
<figure data-type="image" tabindex="3"><img src="https://solita-net.github.io/post-images/1721310580443.png" alt="" loading="lazy"></figure>
<figure data-type="image" tabindex="4"><img src="https://solita-net.github.io/post-images/1721310586299.png" alt="" loading="lazy"></figure>
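<p>Even with a single residual layer, the model converges quickly and the test accuracy climbs above 75%.</p>
<p>Unfortunately, my computer is not powerful enough to train a standard <code>ResNET18</code>, which would probably perform better.</p>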
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Underfitting and Overfitting in Deep Learning]]></title>
        <id>https://solita-net.github.io/post/shen-du-xue-xi-zhong-qian-ni-he-yu-guo-ni-he-wen-ti/</id>
        <link href="https://solita-net.github.io/post/shen-du-xue-xi-zhong-qian-ni-he-yu-guo-ni-he-wen-ti/">
        </link>
        <updated>2024-07-18T12:40:21.000Z</updated>
        <content type="html"><![CDATA[<p>Study date: 2024.07.18<br>
Source: the "flower book" (花书)</p>
<hr>
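<h2 id="11-定义">1.1 Definitions</h2>
<p>We train a model hoping that <strong>the training error stays close to the generalization error</strong>.</p>
<p>With a more complex model and fewer samples, however, the generalization error is likely to grow. During training we keep driving the training error down, yet the model may still perform poorly on unseen data.</p>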
<ul>
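<li>When both the training error and the generalization error are large, the model is <strong>underfitting</strong>.</li>
<li>When the training error is small but the generalization error is large, the model is <strong>overfitting</strong>.</li>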
</ul>
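<p>The two definitions can be turned into a rough rule of thumb. The sketch below is illustrative only: the function <code>diagnose</code> and both thresholds (<code>high</code> and <code>gap</code>) are hypothetical choices, and the held-out error stands in for the generalization error, which cannot be measured directly.</p>

```python
def diagnose(train_error, test_error, high=0.3, gap=0.1):
    """Label a fit from its training and held-out errors (thresholds are illustrative)."""
    if train_error >= high:
        return "underfitting"      # the model cannot even fit the training data
    if test_error - train_error >= gap:
        return "overfitting"       # fits the training data but not unseen data
    return "reasonable fit"


print(diagnose(0.45, 0.48))
print(diagnose(0.02, 0.35))
print(diagnose(0.08, 0.10))
```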
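<h2 id="12-正则化技术">1.2 Regularization</h2>
<p>When to use regularization: once we already have as much high-quality data as we can get, regularization techniques can further reduce the risk of overfitting.</p>
<h2 id="13-权重衰减">1.3 Weight Decay</h2>
<p>This is a widely used regularization technique.</p>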
<p>It takes the squared <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>L</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">L_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathdefault">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> norm of the weights, multiplies it by a hyperparameter, and adds the result to the loss as a penalty; this sum is the objective minimized during training.</p>
<ul>
<li>
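<p>Q: Why use the L2 norm rather than the L1 norm?</p>
<p>A: Linear regression with L1 regularization is called <strong>lasso regression</strong>. The L1 penalty tends to concentrate the weight on a small subset of features and drive the other weights to zero, which is known as <strong>feature selection</strong>. Linear regression with L2 regularization is called <strong>ridge regression</strong>. Its penalty suppresses large weights in particular, which shrinks the overall magnitude of the fitted function's parameters.</p>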
</li>
</ul>
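<p>A minimal plain-Python sketch of weight decay, assuming the common convention of a penalty equal to (λ/2) times the squared L2 norm, so that its gradient is simply λ times the weight. The helper names <code>l2_penalized_loss</code> and <code>sgd_step</code> and all numbers are hypothetical.</p>

```python
def l2_penalized_loss(data_loss, weights, lam):
    """Data loss plus (lam / 2) * squared L2 norm of the weights."""
    return data_loss + 0.5 * lam * sum(w * w for w in weights)


def sgd_step(weights, grads, lr, lam):
    """One gradient step on the penalized loss; the penalty adds lam * w to each gradient."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]


weights = [3.0, -4.0]
penalized = l2_penalized_loss(1.0, weights, lam=0.1)  # 1.0 + 0.05 * (9 + 16)

# With a zero data gradient, a step only shrinks each weight by the factor (1 - lr * lam).
decayed = sgd_step(weights, [0.0, 0.0], lr=0.5, lam=0.1)
print(penalized, decayed)
```

<p>The shrink factor in the update is where the name "weight decay" comes from: every step pulls the weights towards zero in proportion to their size.</p>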
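<h2 id="14-暂退法dropout">1.4 Dropout</h2>
<p>The tradeoff between a model's generalization ability and its flexibility is called the <strong>bias-variance tradeoff</strong>.</p>
<p>A linear model has little flexibility (high bias) but generalizes consistently: it gives similar results across different samples (low variance). Neural networks sit at the other extreme: they are extremely flexible and can overfit severely, especially when the training data is of poor quality.</p>
<p>We expect a "good" predictive model to perform well on unseen data.</p>
<p>In classical generalization theory, <strong>closing the gap between training and test performance means aiming for a simple model.</strong></p>
<p>There is a proven result here: <strong>training with input noise is equivalent to Tikhonov regularization</strong>.</p>
<p>The same idea can be applied inside a multilayer perceptron: injecting noise into its layers during training encourages a smoother function.</p>
<p>This is called <strong>dropout</strong>.</p>
<p>One intuition for why dropout works: overfitting is associated with each layer relying on specific patterns of activations in the previous layer. Injecting noise with dropout breaks up this dependence.</p>
<p>The noise should be injected so that the expected value of each layer equals its value without noise. In other words, the injected noise must be unbiased.</p>
<p>There are two ways to implement this:</p>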
<ul>
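<li>One is to <strong>inject Gaussian noise with zero mean</strong>.</li>
<li>The other is to <strong>randomly zero out the outputs of some units during training</strong> (this is where the name dropout comes from).</li>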
</ul>
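<p>To implement a <code>dropout</code> layer in code, a good approach is to draw a tensor of uniform random numbers with the same shape as the input and compare each one with the dropout probability passed in: wherever the random number is not greater than that probability, the corresponding element is zeroed.</p>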
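<pre><code class="language-py">import torch


def dropout_layer(X, dropout):
    assert 0 &lt;= dropout &lt;= 1
    # In this case every element is dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # In this case every element is kept
    if dropout == 0:
        return X
    # Keep an element when its random draw exceeds the dropout probability
    mask = (torch.rand(X.shape) &gt; dropout).float()
    # Rescale the survivors so that the expected value is unchanged
    return mask * X / (1.0 - dropout)
</code></pre>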
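<p>The last two lines of code work as follows.</p>
<ul>
<li>
<p><code>torch.rand(X.shape)</code>: creates a tensor with the same shape as the input tensor <code>X</code>, where each element is drawn uniformly from the interval between 0 and 1.</p>
</li>
<li>
<p><code>&gt; dropout</code>: produces a boolean tensor that is True where the element is greater than <code>dropout</code> and False otherwise.</p>
</li>
<li>
<p><code>.float()</code>: converts the boolean tensor to a floating-point tensor, with True becoming 1.0 and False becoming 0.0.</p>
</li>
<li>
<p><code>mask * X</code>: multiplies the mask <code>mask</code> with the input tensor <code>X</code> element by element, which randomly sets some elements of <code>X</code> to 0 (that is, "drops" them).</p>
</li>
<li>
<p><code>/ (1.0 - dropout)</code>: rescales the tensor after dropping so that its expected value stays the same. Since <code>(1.0 - dropout)</code> is the expected fraction of elements that survive, dividing by it makes the expected output equal to the value before dropout was applied.</p>
</li>
</ul>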
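<p>The claim that the rescaling keeps the expected value unchanged can be checked empirically. The sketch below is a plain-Python analogue of <code>dropout_layer</code> operating on lists; the name <code>dropout_list</code>, the fixed seed, and the sample size are assumptions made for illustration.</p>

```python
import random


def dropout_list(values, dropout, rng):
    """Pure-Python inverted dropout: zero with probability `dropout`, rescale the rest."""
    if dropout == 1:
        return [0.0 for _ in values]
    keep_scale = 1.0 / (1.0 - dropout)
    return [v * keep_scale if rng.random() > dropout else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
values = [1.0] * 100_000
dropped = dropout_list(values, dropout=0.5, rng=rng)

mean_after = sum(dropped) / len(dropped)           # should stay close to 1.0
zero_fraction = dropped.count(0.0) / len(dropped)  # should be close to 0.5
print(mean_after, zero_fraction)
```

<p>About half of the entries become zero and the survivors are doubled, so the average stays near the original value of 1.0.</p>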
<p>(Of course, you can also call the high-level API directly: <code>nn.Dropout</code>.)</p>
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[Matrix Derivatives]]></title>
        <id>https://solita-net.github.io/post/ju-zhen-qiu-dao/</id>
        <link href="https://solita-net.github.io/post/ju-zhen-qiu-dao/">
        </link>
        <updated>2024-07-18T12:38:07.000Z</updated>
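        <content type="html"><![CDATA[<p>Study date: 2024.07.18</p>
<hr>
<p>References</p>
<p><a href="https://zhuanlan.zhihu.com/p/263777564">Matrix Derivatives: The Essence (Zhihu)</a></p>
<p><a href="https://zhuanlan.zhihu.com/p/273729929">Matrix Derivatives: Basics (Zhihu)</a></p>
<p><a href="https://zhuanlan.zhihu.com/p/288541909">Matrix Derivatives: Advanced (Zhihu)</a></p>
<p>Matrix derivatives can be written in numerator layout or in denominator layout. <strong>Denominator layout</strong> is generally used.</p>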
<ul>
<li>When taking partial derivatives of a scalar function with respect to a vector, the form obtained in denominator layout is called the <strong>gradient vector form</strong> (the vector is a column vector).<p class='katex-block'><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi mathvariant="normal">∇</mi><mi>x</mi></msub><mi>f</mi><mrow><mo fence="true">(</mo><mi mathvariant="bold-italic">x</mi><mo fence="true">)</mo></mrow><mtext> </mtext><mtext> </mtext><mo>=</mo><mtext> </mtext><mtext> </mtext><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi><mrow><mo fence="true">(</mo><mi mathvariant="bold-italic">x</mi><mo fence="true">)</mo></mrow></mrow><mrow><mi mathvariant="normal">∂</mi><mi mathvariant="bold-italic">x</mi></mrow></mfrac><mo>=</mo><msup><mrow><mo fence="true">[</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mn>1</mn></msub></mrow></mfrac><mo separator="true">,</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mn>2</mn></msub></mrow></mfrac><mo separator="true">,</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo separator="true">,</mo><mfrac><mrow><mi mathvariant="normal">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mi>n</mi></msub></mrow></mfrac><mo fence="true">]</mo></mrow><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">\nabla _xf\left( \boldsymbol{x} \right) \,\,=\,\,\frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2},..., \frac{\partial f}{\partial x_n} \right] ^T
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord">∇</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">x</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;">(</span><span class="mord"><span class="mord"><span class="mord boldsymbol">x</span></span></span><span class="mclose delimcenter" style="top:0em;">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span></span><span class="base"><span class="strut" style="height:2.113em;vertical-align:-0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" 
style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord"><span class="mord boldsymbol">x</span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;">(</span><span class="mord"><span class="mord"><span class="mord boldsymbol">x</span></span></span><span class="mclose delimcenter" style="top:0em;">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.631261em;vertical-align:-0.95003em;"></span><span class="minner"><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">[</span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714399999999998em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714399999999998em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">.</span><span class="mord">.</span><span class="mord">.</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714399999999998em;"><span style="top:-2.3139999999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathdefault">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span 
class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord" style="margin-right:0.05556em;">∂</span><span class="mord mathdefault" style="margin-right:0.10764em;">f</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8360000000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">]</span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:1.681231em;"><span style="top:-3.9029000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span></span></p>
</li>
</ul>
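<p>The gradient vector form can be checked numerically. The sketch below uses the quadratic form f(x) = xᵀAx as an example, whose gradient is the standard result (A + Aᵀ)x, and compares it with central finite differences. The function names and the example matrix are hypothetical choices for illustration.</p>

```python
def quadratic_form(A, x):
    """f(x) = x^T A x for a square matrix A given as nested lists."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """Gradient of x^T A x, namely (A + A^T) x, one entry per component of x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def numeric_gradient(f, x, eps=1e-6):
    """Central finite differences, one partial derivative per component of x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


A = [[1.0, 2.0], [0.0, 3.0]]
x = [1.0, -1.0]
exact = analytic_gradient(A, x)
approx = numeric_gradient(lambda v: quadratic_form(A, v), x)
print(exact, approx)
```

<p>Each entry of the result is the partial derivative with respect to one component of x, stacked in the same order as x itself, which is exactly the column-vector arrangement of denominator layout.</p>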
<p>The results in numerator layout and denominator layout are transposes of each other.</p>
<ul>
<li>Differentiation of composite functions (chain rule)<p class='katex-block'><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo fence="true">{</mo><mtable><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mfrac><mrow><mi>d</mi><mtext> </mtext><mtext> </mtext><mi>f</mi></mrow><mrow><mi>d</mi><mi mathvariant="bold-italic">X</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi>d</mi><msup><mi mathvariant="bold-italic">Y</mi><mi>T</mi></msup></mrow><mrow><mi>d</mi><mi mathvariant="bold-italic">X</mi></mrow></mfrac><mfrac><mrow><mi>d</mi><mtext> </mtext><mtext> </mtext><mi>f</mi></mrow><mrow><mi>d</mi><mi mathvariant="bold-italic">Y</mi></mrow></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mfrac><mrow><mi>d</mi><mtext> </mtext><mtext> </mtext><mi>f</mi></mrow><mrow><mi>d</mi><mi mathvariant="bold-italic">X</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi>d</mi><mtext> </mtext><mtext> </mtext><mi>f</mi></mrow><mrow><mi>d</mi><msup><mi mathvariant="bold-italic">Y</mi><mi>T</mi></msup></mrow></mfrac><mfrac><mrow><mi>d</mi><mi mathvariant="bold-italic">Y</mi></mrow><mrow><mi>d</mi><msup><mi mathvariant="bold-italic">X</mi><mi>T</mi></msup></mrow></mfrac></mrow></mstyle></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">\begin{cases}
	\frac{d\,\,f}{d\boldsymbol{X}}=\frac{d\boldsymbol{Y}^T}{d\boldsymbol{X}}\frac{d\,\,f}{d\boldsymbol{Y}}\\
	\frac{d\,\,f}{d\boldsymbol{X}}=\frac{d\,\,f}{d\boldsymbol{Y}^T}\frac{d\boldsymbol{Y}}{d\boldsymbol{X}^T}\\
\end{cases}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:3.0000299999999998em;vertical-align:-1.25003em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size4">{</span></span><span class="mord"><span class="mtable"><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7046824999999997em;"><span style="top:-3.7046824999999997em;"><span class="pstrut" style="height:3.037365em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.07778em;">X</span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" 
style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0373649999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.07778em;">X</span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.25555em;">Y</span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9190928571428572em;"><span style="top:-2.931em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.25555em;">Y</span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span><span style="top:-2.2646824999999997em;"><span class="pstrut" style="height:3.037365em;"></span><span class="mord"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.07778em;">X</span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span 
class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9322159999999999em;"><span style="top:-2.615058em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.25555em;">Y</span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8213457142857143em;"><span style="top:-2.833252857142857em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" 
style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mspace mtight" style="margin-right:0.19516666666666668em;"></span><span class="mord mathdefault mtight" style="margin-right:0.10764em;">f</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.38494199999999995em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8801079999999999em;"><span style="top:-2.615058em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.07778em;">X</span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8213457142857143em;"><span style="top:-2.833252857142857em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathdefault mtight" style="margin-right:0.13889em;">T</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.394em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span 
class="mord mtight"><span class="mord mathdefault mtight">d</span><span class="mord mtight"><span class="mord mtight"><span class="mord boldsymbol mtight" style="margin-right:0.25555em;">Y</span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.38494199999999995em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2046825em;"><span></span></span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></p>
</li>
</ul>
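<p>As a quick numerical sanity check of the two chain-rule forms above, here is a minimal NumPy sketch. The linear map <code>A</code>, the test function <code>f = Y^T Y</code> with <code>Y = A X</code>, and all variable names are assumptions chosen for illustration; they are not part of the original notes.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # hypothetical linear map, Y = A X
X = rng.standard_normal(4)

def f(x):
    """Scalar function f = Y^T Y with Y = A x."""
    y = A @ x
    return float(y @ y)

Y = A @ X
df_dY = 2.0 * Y                   # gradient of f with respect to Y

# Denominator layout: df/dX = (dY^T/dX) (df/dY), where dY^T/dX = A^T.
grad_denominator = A.T @ df_dY    # shape (4,), read as a column vector

# Numerator layout: df/dX = (df/dY^T) (dY/dX^T), where dY/dX^T = A.
grad_numerator = df_dY[None, :] @ A   # shape (1, 4), a row vector

# Central finite differences as an independent check of the gradient.
eps = 1e-6
grad_fd = np.array([(f(X + eps * e) - f(X - eps * e)) / (2 * eps) for e in np.eye(4)])
</code></pre>
<p>Under these assumptions, <code>grad_numerator</code> holds the same numbers as <code>grad_denominator</code> arranged as a row instead of a column, and both should agree with the finite-difference estimate <code>grad_fd</code> up to rounding error.</p>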
]]></content>
    </entry>
    <entry>
        <title type="html"><![CDATA[README]]></title>
        <id>https://solita-net.github.io/post/ce-shi/</id>
        <link href="https://solita-net.github.io/post/ce-shi/">
        </link>
        <updated>2024-07-17T14:06:55.000Z</updated>
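        <summary type="html"><![CDATA[<p>This site exists to keep a record of my studies and to make it easy for me and others to look things up.</p>
<p>Every Wednesday I will publish a study report, titled in the format <strong>学习记录 #【序号】</strong> (Study Log #[number]).</p>
<p>In addition, whenever I learn something new I will write it up and upload it as a short note for my own reference. These notes form part of the weekly study report.</p>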
]]></summary>
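        <content type="html"><![CDATA[<p>This site exists to keep a record of my studies and to make it easy for me and others to look things up.</p>
<p>Every Wednesday I will publish a study report, titled in the format <strong>学习记录 #【序号】</strong> (Study Log #[number]).</p>
<p>In addition, whenever I learn something new I will write it up and upload it as a short note for my own reference. These notes form part of the weekly study report.</p>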
<!-- more -->
<blockquote>
<p>In other words, the study report is a document meant to be easy to review, while the short notes are for my own reference.</p>
</blockquote>
]]></content>
    </entry>
</feed>