> Information Digest for May 20, 2021

### Daily Machine Learning

#### Machine Learning, Week 5

##### Cost Function

Let's first define a few variables that we will need to use:

- $L$ = total number of layers in the network
- $s_l$ = number of units (not counting the bias unit) in layer $l$
- $K$ = number of output units/classes

Recall that in neural networks, we may have many output nodes. We denote $h_\Theta(x)_k$ as the hypothesis that results in the $k^{th}$ output. Our cost function for neural networks is going to be a generalization of the one we used for logistic regression. Recall that the cost function for regularized logistic regression was:

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
$$

For neural networks, it is going to be slightly more complicated:

$$
J(\Theta) = -\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K\left[y_k^{(i)}\log((h_\Theta(x^{(i)}))_k)+(1-y_k^{(i)})\log(1-(h_\Theta(x^{(i)}))_k)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2
$$

We have added a few nested summations to account for our multiple output nodes. In the first part of the equation, before the square brackets, we have an additional nested summation that loops through the number of output nodes.

In the regularization part, after the square brackets, we must account for multiple Theta matrices. The number of columns in our current Theta matrix is equal to the number of nodes in our current layer (including the bias unit). The number of rows in our current Theta matrix is equal to the number of nodes in the next layer (excluding the bias unit). As before with logistic regression, we square every term.

Note:

- the double sum simply adds up the logistic regression costs calculated for each cell in the output layer
- the triple sum simply adds up the squares of all the individual Θs in the entire network
- the $i$ in the triple sum does not refer to training example $i$

##### Quiz

##### Backpropagation Algorithm

"Backpropagation" is neural-network terminology for minimizing our cost function, just like what we were doing with gradient descent in logistic and linear regression. Our goal is to compute:

$$\min_\Theta J(\Theta)$$

That is, we want to minimize our cost function $J$ using an optimal set of parameters in Theta. In this section we'll look at the equations we use to compute the partial derivatives of $J(\Theta)$.

Given the training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$:

- Set $\Delta^{(l)}_{i,j} := 0$ for all $(l, i, j)$ (hence you end up having a matrix full of zeros)

For training example $t = 1$ to $m$:

1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$

   Where $L$ is our total number of layers and $a^{(L)}$ is the vector of outputs of the activation units for the last layer. So our "error values" for the last layer are simply the differences between our actual results in the last layer and the correct outputs in $y$. To get the delta values of the layers before the last layer, we can use an equation that steps us back from right to left:

4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)})\ .*\ a^{(l)}\ .*\ (1 - a^{(l)})$

   The delta values of layer $l$ are calculated by multiplying the delta values in the next layer with the Theta matrix of layer $l$. We then element-wise multiply that with a function called $g'$, or g-prime, which is the derivative of the activation function $g$ evaluated with the input values given by $z^{(l)}$. The g-prime derivative terms can also be written out as:

   $$g'(z^{(l)}) = a^{(l)}\ .*\ (1 - a^{(l)})$$
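To make step 4 concrete, below is a minimal Go sketch of that delta computation. It is my own illustration rather than course code: the function name `prevDelta` and the numbers in `main` are hypothetical, and bias units are ignored to keep it short.

```go
package main

import "fmt"

// prevDelta computes δ^(l) = (Θ^(l))ᵀ δ^(l+1) .* a^(l) .* (1 - a^(l)).
// theta has one row per unit in layer l+1 and one column per unit in
// layer l (bias units are omitted for brevity).
func prevDelta(theta [][]float64, deltaNext, a []float64) []float64 {
	delta := make([]float64, len(a))
	for j := range a {
		// Column j of theta dotted with δ^(l+1): one entry of (Θ^(l))ᵀ δ^(l+1).
		var sum float64
		for i := range deltaNext {
			sum += theta[i][j] * deltaNext[i]
		}
		// Element-wise multiply with g'(z^(l)) = a^(l) .* (1 - a^(l)).
		delta[j] = sum * a[j] * (1 - a[j])
	}
	return delta
}

func main() {
	// Hypothetical numbers: a hidden layer with 3 units feeding 2 output units.
	theta := [][]float64{
		{0.1, 0.4, -0.2},
		{-0.3, 0.2, 0.5},
	}
	deltaNext := []float64{0.12, -0.08} // δ^(l+1), e.g. a^(L) - y at the output layer
	a := []float64{0.7, 0.3, 0.9}       // activations a^(l)

	fmt.Println(prevDelta(theta, deltaNext, a))
}
```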
Additional reference: [バックプロパゲーション(誤差逆伝搬法)をイチから理解する](https://imagingsolution.net/deep-learning/backpropagation/)

### Daily Golang

#### Go Builder Pattern — The Functional Way

Source: [Go Builder Pattern — The Functional Way](https://devcharmander.medium.com/go-builder-pattern-the-functional-way-e40f347017ce)

```go
// RunFunctionalBuilder example
func RunFunctionalBuilder() {
	e := NewEmployeeBuilder()
	employee := e.Called("Surya").WorksFor("IBM").At("Bangalore").Build()
	fmt.Println(employee)
}
```

##### Explanation

1. We add a property to the builder struct that takes an array of actions. (line 16)
2. Every builder member function adds an action to the above property. (lines 20–41)
3. The `Build()` function iterates through all the actions and executes each one on an `Employee` object. (lines 44–50)

The line numbers refer to the full listing in the original article; a sketch of what that builder looks like follows below.
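Since the article's full listing is not reproduced in these notes, here is a minimal sketch of a functional builder consistent with the explanation above and with the `RunFunctionalBuilder` usage. The `actions` field name and the exact method bodies are assumptions; the original code may differ in detail.

```go
package main

import "fmt"

// Employee is the object being built.
type Employee struct {
	Name, Company, City string
}

// EmployeeBuilder collects deferred actions instead of mutating an
// Employee directly; Build replays them on a fresh Employee.
type EmployeeBuilder struct {
	actions []func(*Employee)
}

func NewEmployeeBuilder() *EmployeeBuilder {
	return &EmployeeBuilder{}
}

// Called records the employee's name.
func (b *EmployeeBuilder) Called(name string) *EmployeeBuilder {
	b.actions = append(b.actions, func(e *Employee) { e.Name = name })
	return b
}

// WorksFor records the employer.
func (b *EmployeeBuilder) WorksFor(company string) *EmployeeBuilder {
	b.actions = append(b.actions, func(e *Employee) { e.Company = company })
	return b
}

// At records the location.
func (b *EmployeeBuilder) At(city string) *EmployeeBuilder {
	b.actions = append(b.actions, func(e *Employee) { e.City = city })
	return b
}

// Build applies every recorded action to a new Employee and returns it.
func (b *EmployeeBuilder) Build() Employee {
	e := Employee{}
	for _, action := range b.actions {
		action(&e)
	}
	return e
}

func main() {
	employee := NewEmployeeBuilder().Called("Surya").WorksFor("IBM").At("Bangalore").Build()
	fmt.Println(employee)
}
```

The appeal of the functional variant is that each setter only appends a closure, so supporting a new field means adding one more method while `Build()` stays unchanged.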
### 9 Companies That Use Rust in Production

Source: [9 Companies That Use Rust in Production](https://blog.devgenius.io/9-companies-that-use-rust-in-production-9b8f6634b7b4)

- Dropbox: Dropbox uses Rust for parts of its file synchronization engine.
- Coursera: Coursera uses Rust for their programming assignments feature, where students need to write and run a computer program to solve a problem.
- Figma: Figma is a collaborative web-based design tool for vector graphics and interface prototyping.
- npm: Its engineering team chose to rewrite their main service in Rust because they saw that the service's performance would soon be a bottleneck if user growth kept up.
- Microsoft: Microsoft has recently been experimenting with integrating Rust into its large C/C++ codebases.
- Cloudflare: Cloudflare uses Rust in their core edge logic and as a replacement for C, which is memory-unsafe.
- Facebook: Facebook used Rust to rewrite its source control backend.
- Amazon: AWS has used Rust for performance-sensitive components of services like Lambda, EC2, and S3. In addition, the company openly supports and sponsors the development of the language and its ecosystem.
- Discord: Discord uses Rust in multiple places of their codebase, both on the client side and the server side.

In most of these companies, Rust functions as a strictly better alternative to C: you can see a visible pattern of rewrites done in Rust to escape performance degradation. Teams reach for it when they need extra performance but want to avoid the memory issues associated with C.

### Takeaways

- No is a decision. Yes is a responsibility.

> **"The best thing for being sad… is to learn something.** That is the only thing that never fails. You may grow old and trembling in your anatomies, you may lie awake at night listening to the disorder of your veins, you may miss your only love, you may see the world about you devastated by evil lunatics, or know your honor trampled in the sewers of baser minds. There is only one thing for it then — to learn. Learn why the world wags and what wags it. That is the only thing which the mind can never exhaust, never alienate, never be tortured by, never fear or distrust, and never dream of regretting."
>
> Source: The Once and Future King