Digest | Machine Learning: Revisiting Gradient Descent, SaaS Tech Stacks

> Information digest for April 27, 2021

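### A Little Machine Learning Every Day

#### Gradient Descent and Its Implementation, Explained Simply

Source: Jianshu | https://www.jianshu.com/p/c7e642877b0e

While taking the machine learning course on Coursera, I never fully understood the differentiation (calculus) part of gradient descent. This article explains the computation very clearly.

A few differentiation examples: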

![yFP0mf](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/uPic/yFP0mf.jpg)

##### Gradient

The gradient is the generalization of the derivative to functions of several variables.

![GUT0pE](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/uPic/GUT0pE.jpg)

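As the example shows, the gradient is obtained by differentiating with respect to each variable separately and listing the results separated by commas. It is written inside angle brackets <> to indicate that the gradient is a vector.

The gradient is an important concept in calculus. Its meaning is:

- For a single-variable function, the gradient is simply the derivative, which gives the slope of the tangent line at a given point.
- For a multivariable function, the gradient is a vector, and its direction is the direction in which the function increases fastest at the given point. Gradient descent therefore steps in the opposite direction.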

![6DJLMe](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/uPic/6DJLMe.jpg)
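
To make the "one partial derivative per variable" idea concrete, here is a minimal sketch that approximates a gradient numerically with central differences. The helper `numerical_gradient`, the step size `h`, and the example function `J` are assumptions chosen for illustration; they do not come from the original article.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th variable up
        backward[i] -= h  # and down, keeping the others fixed
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def J(theta):
    # Example function (assumed for illustration): J = theta_1^2 + theta_2^2
    return theta[0] ** 2 + theta[1] ** 2


g = numerical_gradient(J, [1.0, 3.0])
print(g)  # close to the analytic gradient <2*1, 2*3> = <2, 6>
```

Each component is computed by varying a single variable, which is exactly what a partial derivative measures.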

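##### Gradient Descent Examples

###### Gradient descent on a single-variable function

Suppose we have a single-variable function $J(\theta)=\theta^2$ with derivative $J'(\theta)=2\theta$, a starting point of $\theta^0=1$, and a learning rate of $\alpha=0.4$.

Plugging these into the gradient descent update rule $\theta^{k+1}=\theta^k-\alpha J'(\theta^k)$ gives: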

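$\theta^0=1$

$\theta^1=\theta^0-\alpha J'(\theta^0)=1-0.4\cdot 2=0.2$

$\theta^2=\theta^1-\alpha J'(\theta^1)=0.2-0.4\cdot 0.4=0.04$

$\theta^3=0.008$

$\theta^4=0.0016$

As the figure shows, after four iterations (four steps) we have essentially reached the function's minimum, the bottom of the valley.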

![H3pAZW](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/uPic/H3pAZW.jpg)
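
The four steps above can be reproduced with a few lines of Python. This is a minimal sketch of the update $\theta^{k+1}=\theta^k-\alpha J'(\theta^k)$; the function name `gradient_descent_1d` and its parameters are illustrative assumptions, not code from the original article.

```python
def gradient_descent_1d(grad, theta0, alpha, steps):
    """Return the list [theta^0, theta^1, ..., theta^steps]."""
    thetas = [theta0]
    for _ in range(steps):
        theta = thetas[-1]
        # Step against the derivative, scaled by the learning rate.
        thetas.append(theta - alpha * grad(theta))
    return thetas


# J(theta) = theta^2, so J'(theta) = 2 * theta.
path = gradient_descent_1d(lambda t: 2 * t, theta0=1.0, alpha=0.4, steps=4)
for k, theta in enumerate(path):
    print(f"theta^{k} = {theta:.4f}")
# theta^0 = 1.0000
# theta^1 = 0.2000
# theta^2 = 0.0400
# theta^3 = 0.0080
# theta^4 = 0.0016
```

With $\alpha=0.4$ each step multiplies $\theta$ by $1-2\alpha=0.2$, which is why the values shrink by a factor of five every iteration.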

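###### Gradient descent on a multivariable function

Suppose the objective function is $J(\theta)=\theta_1^2+\theta_2^2$, the starting point is $\theta^0=(1,3)$, and the learning rate is $\alpha=0.1$.

The gradient of the function is $\nabla J(\theta)=\langle 2\theta_1,2\theta_2\rangle$.

Iterating several times: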

$\theta^0=(1,3)$

$\theta^1=\theta^0-\alpha\nabla J(\theta^0)=(1,3)-0.1(2,6)=(0.8,2.4)$

$\theta^2=(0.8,2.4)-0.1(1.6,4.8)=(0.64,1.92)$

$\theta^3=(0.512,1.536)$

...

$\theta^{10}=(0.1073741824,0.3221225472)$

...

$\theta^{100}\approx(2.037\times10^{-10},6.111\times10^{-10})$

We can see that the iterates are already very close to the function's minimum:

![ODx2cc](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/uPic/ODx2cc.jpg)
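
The same iteration can be checked in code. The sketch below uses plain tuples rather than a numerical library; `grad_J` and `gradient_descent` are illustrative names assumed for this example.

```python
def grad_J(theta):
    """Gradient of J(theta) = theta_1^2 + theta_2^2."""
    return (2 * theta[0], 2 * theta[1])


def gradient_descent(grad, theta0, alpha, steps):
    """Apply theta <- theta - alpha * grad(theta) `steps` times."""
    theta = tuple(theta0)
    for _ in range(steps):
        g = grad(theta)
        theta = tuple(t - alpha * gi for t, gi in zip(theta, g))
    return theta


for steps in (1, 2, 3, 10, 100):
    print(steps, gradient_descent(grad_J, (1.0, 3.0), alpha=0.1, steps=steps))
```

Because every step scales both coordinates by $1-2\alpha=0.8$, the iterate after $k$ steps is $0.8^k\cdot(1,3)$, which gives a quick way to verify any entry in the sequence.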

※ The appendix contains an example implemented in Python.

### Other Worthwhile Reads

#### The Product Strategy Stack

Source: [The Product Strategy Stack](https://www.reforge.com/blog/the-product-strategy-stack)

> Difficulty prioritizing is often a strategy issue, not an execution issue. It is impossible to make rigorous prioritization decisions when the guidance on how to do so is missing, unclear or disconnected from what you are trying to do.

> Gaps in strategy make it harder for teams to execute. This results in loss of opportunities. Execution becomes much easier when the strategy is clearly defined, communicated and connected to the company's mission and its day-to-day work.

##### What Is the Product Strategy Stack?

> In order to diagnose and fix these issues, we need to be able to track the issues back to the source. To do this, we can't think about "strategy" as some amorphous, all-encompassing concept. Instead, companies should think about the relationship between mission, strategy, roadmap, and goals as a stack of distinct concepts:

- **Company Mission** - The world your company sees and the change it wants to bring to that world.
- **Company Strategy** - The logical plan you have to bring your company’s mission into being.
- **Product Strategy** - The logical plan for how the product will drive its part of the company strategy.
- **Product Roadmap** - The [sequence of features](https://www.reforge.com/blog/product-work-beyond-product-market-fit) that implement the Product Strategy.
- **Product Goals** - The quarterly and day-to-day outcomes of the Product Roadmap that [measure progress](https://www.reforge.com/blog/north-star-metric-growth) against the Product Strategy.

![The Product Strategy Stack.png](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/The%2BProduct%2BStrategy%2BStack.png)

> Importantly, each layer of the stack builds on the previous layer. Put another way, each layer is a prerequisite for the successive layer. We cannot have a company strategy without knowing our company's mission. We cannot have product goals without knowing our product strategy. Given this relationship between the layers, Product Strategy serves a critical role—it is the connective tissue between the objectives of the company and the product delivery work of the product team.

#### The Tech Stack of a One-Man SaaS

Source: [The Tech Stack of a One-Man SaaS](https://panelbear.com/blog/tech-stack/)

> As a self-funded solo founder, I believe that focusing on automation is how I've been able to provide a reliable service to customers from more than 80 countries, and continue to ship new features on a weekly basis.

##### Frameworks and libraries

- [Django](https://www.djangoproject.com/), [React](https://reactjs.org/), [NextJS](https://nextjs.org/)
- [Celery](https://docs.celeryproject.org/): I use it for any kind of **background/scheduled tasks**. It does have a learning curve for more advanced use-cases, but it's quite reliable once you understand how it works, and more importantly when it fails.

##### Database

- [Clickhouse](https://clickhouse.tech/): I believe this is one of those technologies that over time will become ubiquitous. It's honestly a fantastic piece of software that enabled me to build features that initially seemed impossible on low-cost hardware. I do intend to write a future blog post on some lessons learned from running Clickhouse on Kubernetes. So stay tuned!
- [PostgreSQL](https://www.postgresql.org/): My go-to relational database. Sane defaults, battle-tested, and deeply integrated with Django. For Panelbear, I use it for all application data that is not analytics related. For the analytics data, I instead wrote a simple interface for querying Clickhouse within Django.
- [Redis](https://redis.io/): I use it for many things: caching, rate-limiting, as a task queue, and as a key/value store with TTL for various features. Rock-solid, and great documentation.

##### Deployment

> I treat my infrastructure as cattle instead of pets, things like servers and clusters are meant to come and go. So if one server gets "sick", I just replace it with another one. That means everything is described as code in a git repo, and I do not change things by SSH'ing into the servers. You can think of it like a template to clone my entire infrastructure with one command into any AWS region/environment.

> This also helps me in case of disaster recovery. I just run a few commands, and some minutes later my stack has been re-created. This was particularly useful when I moved from DigitalOcean, to Linode, and recently to AWS. Everything is described in code, so it's easy to keep track of what components I own, even years later (all companies have some AWS IAM policy or VPC subnet lurking around which was created via clicky-clicky on the UI, and now everyone depends on it).

- [Terraform](https://www.terraform.io/): I manage most of my cloud infrastructure with Terraform. Things like EKS clusters, S3 buckets, roles, and RDS instances are declared in my Terraform manifests. The state is synced to an encrypted S3 bucket to avoid getting in trouble in case something happens to my development laptop.
- [Docker](https://www.docker.com/): I build everything as Docker images. Even stateful components like Clickhouse or Redis are packaged and shipped as Docker containers to my cluster. It also makes my stack very portable, as I can run it anywhere I can run Docker.
- [Kubernetes](https://kubernetes.io/): Allowed me to simplify the operational aspects tremendously. However, I wouldn’t blindly recommend it to everyone, as I already felt comfortable working with it after having the pleasure of putting down multiple production fires for my employer over the years. I also rely on managed offerings, which helps reduce the burden too.
- [GitHub Actions](https://github.com/features/actions): In the past I’d normally use [CircleCI](https://circleci.com/) (which is also great), but for this project I prefer GitHub Actions, as it removes yet another service that needs access to my repositories and deployment secrets. However, CircleCI has plenty of good features, and I still recommend it.

##### Infrastructure

- [AWS](https://aws.amazon.com/): Predictable, and lots of managed services. However, I use it at my full-time job, so I didn't have to spend too much time figuring things out. The main services I use are EKS, ELB, S3, RDS, IAM and private VPCs. I might also add Cloudfront and Kinesis in the future.
- [Cloudflare](https://www.cloudflare.com/): I mainly use it for DDoS protection, serving DNS, and offloading edge caching of various static assets (currently shaves off 80% of the egress charges from AWS - their bandwidth pricing is insane!).
- [Let’s Encrypt](https://letsencrypt.org/): Free SSL certificate authority. I use cert-manager in my Kubernetes cluster to automatically issue and renew certificates based on my ingress rules.
- [Namecheap](https://www.namecheap.com/): My domain name registrar of choice. Allows MFA for login which is an important security feature. Unlike other registrars, they haven't surprised me with an expensive renewal every few years. I like them.

##### Kubernetes components

- [ingress-nginx](https://github.com/kubernetes/ingress-nginx/): Rock-solid ingress controller for Kubernetes using NGINX as a reverse proxy, and load balancer. Sits behind the NLB which controls ingress to the cluster nodes.
- [cert-manager](https://github.com/jetstack/cert-manager): Automatically issue/renew TLS certs as defined in my ingress rules.
- [external-dns](https://github.com/kubernetes-sigs/external-dns): Synchronizes exposed Kubernetes Services and Ingresses with DNS providers (such as Cloudflare).
- [prometheus-operator](https://github.com/prometheus-operator/prometheus-operator): Automatically monitors most of my services, and exposes dashboards via Grafana.
- [flux](https://fluxcd.io/): GitOps way to do continuous delivery in Kubernetes. Basically pulls and deploys new Docker images when I release them.

##### CLI tools

- [kubectl](https://kubernetes.io/): To interact with the Kubernetes cluster to watch logs, pods and services, SSH into a running container, and so on.
- [stern](https://github.com/wercker/stern): Multi pod log tailing for Kubernetes. Really handy.
- [htop](https://htop.dev/): Interactive system process viewer. Better than “top” if you ask me.
- [cURL](https://curl.se/): Issue HTTP requests locally, inspect headers.
- [HTTPie](https://httpie.io/): Like cURL, but simpler for JSON APIs.
- [hey](https://github.com/rakyll/hey): Load testing HTTP endpoints. Gives a nice latency distribution summary.

##### Monitoring

Update: If there's one thing that should never be down, it's the monitoring/alerting system. For my peace of mind and to reduce operational complexity, I migrated my monitoring system outside my AWS region, and decided to use a hosted service:

- [New Relic](https://newrelic.com/): I now use New Relic to monitor application metrics instead of a self-hosted Prometheus/Grafana. Think things like HTTP requests, latencies, event buffer sizes, and so on. I use New Relic's Prometheus adapter so I only need to expose a `/metrics` endpoint on my services and metrics get forwarded automatically.
- [Sentry](https://sentry.io/): Application exception monitoring and aggregation. Notifies when there are unhandled errors with additional metadata.
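
As a rough illustration of what "exposing a `/metrics` endpoint" involves, the sketch below serves two counters in the Prometheus text exposition format using only the Python standard library. The metric names, values, and port are hypothetical, and this is not the author's setup; a real service would more likely rely on a Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters: metric name -> (help text, current value).
COUNTERS = {
    "http_requests_total": ("Total HTTP requests handled.", 1027),
    "events_dropped_total": ("Events dropped from the buffer.", 3),
}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(counters.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the /metrics path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (not called automatically)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(COUNTERS), end="")
```

Calling `serve()` starts the endpoint; a scraper or forwarding adapter would then poll `http://localhost:8000/metrics` on a schedule.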

I previously hosted a monitoring stack inside my cluster. For reference, this was:

- [Prometheus](https://prometheus.io/): Efficient storage of time series data for monitoring. Tracks all the cluster and app metrics. It was a lot cheaper than using Cloudwatch for app metrics.
- [Grafana](https://grafana.com/): Nice dashboards for the Prometheus monitoring data. All dashboards are described in JSON files and versioned in the git repo.
- [Loki](https://grafana.com/oss/loki/): Log aggregation system inspired by Prometheus. It’s bundled with the prometheus-operator, and helps me search logs across the cluster.

##### Email

- [Fastmail](https://www.fastmail.com/): My choice of business email. Good, and reliable.
- [Postmark](https://postmarkapp.com/): I use it for transactional emails (email verification, weekly reports, login security alerts, password reset, and so on). Their email delivery rates are great, and the tooling/mobile app is top-notch.

##### Other

- [Panelbear](https://panelbear.com/?ref=blog-tech-stack): Of course what better tool to track Panelbear's website analytics than Panelbear itself :) The benefits of dogfooding are real, as I am my own customer.
- [Healthchecks.io](https://healthchecks.io/): Notifies me via email/whatsapp when a scheduled job doesn't run. It's also a bootstrapped SaaS, and I'm very happy to recommend it as I've used it for several years.
- [Trello](https://trello.com/): I use it to keep track of issues/requests/ideas and what-not.
- [Figma](https://www.figma.com/): Replaced [Sketch](https://www.sketch.com/) as my go-to tool for making quick mockups, banners, and illustrations for the landing pages.

#### Tools and Services I Use to Run My SaaS

Source: [Tools and Services I Use to Run My SaaS](https://jake.nyc/words/tools-and-services-i-use-to-run-my-saas/)

##### Languages

- [TypeScript](https://www.typescriptlang.org/) — A superset of JavaScript with static types. I fell in love with this language when I first tried it a few years ago, and it’s only gotten better since then. I use it for both the frontend and server-side code.
- [SCSS](https://sass-lang.com/) — A superset of CSS with cool features like mixins and nesting. I use it for both the marketing site and — with [CSS modules](https://github.com/css-modules/css-modules) — the web app.

##### Build Systems and Frameworks

- [React](https://reactjs.org/) — The frontend is a single-page app built with React, which has been a great choice — it gets out of the way and lets me just work on the app. There might be other frameworks out there that are quote unquote better, but the sheer size of the React community means I basically never run into any uncharted territory.
- [Create React App](https://create-react-app.dev/) — A batteries-included build system for React. The first time I made a React app, I cobbled together the configuration for Webpack/Babel/etc myself. Create React App hides all that — with the option to “eject” and get the full configuration if I ever need to do something it doesn’t support.
- [Express](https://expressjs.com/) — A Node.js server-side framework. Just like with React, I chose it because it’s the most popular by far. As for why I chose Node.js rather than another language, it’s because I need to run the same video rendering code in the browser and on the server.
- [Hugo](https://gohugo.io/) — A fast static site generator written in Go, used to build the marketing site.

##### Libraries

This list is not exhaustive: there are too many libraries to name them all here, so I’ve tried to keep it to the most notable or interesting ones.

- [immer](https://immerjs.github.io/immer/docs/introduction) — An intuitive, performant way to do immutability. It can also serialize patches you make to an object, so it’s fairly easy to implement undo functionality as well.
- [downshift](https://www.downshift-js.com/) — A React library for building accessible dropdowns and multi-selects. Easy to style, and you get accessibility for free.
- [popper](https://popper.js.org/) — A nifty little tooltip positioning library.
- [node-canvas](https://github.com/Automattic/node-canvas) — Node library for using the [canvas API](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API) on the server.
- [ffmpeg](https://www.ffmpeg.org/) — Swiss army knife audio and video library. I use this on the server to convert audio files to WAV and to combine individual frames into users' videos.

##### Deploying

I’ve previously written about how I deploy SongRender, so you can [read about it in more depth](https://jake.nyc/words/bluegreen-deploys-and-immutable-infrastructure-with-terraform/) if you’re interested. If you just want the high level, these are the tools involved.

- [Terraform](https://www.terraform.io/) — An “infrastructure as code” tool, where you describe the infrastructure you want and it diffs with the infrastructure that actually exists. Kind of like React, but for configuration management. I love Terraform, and I use it to manage any infrastructure it supports.
- [Packer](https://www.packer.io/) — A tool for building machine images. This makes it easy to deploy servers with Terraform.
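
The "describe what you want, diff against what exists" idea can be sketched in a few lines. This toy `plan` function only illustrates the concept under simplified assumptions; it is not how Terraform is implemented, and the resource names and attributes are hypothetical.

```python
def plan(desired, actual):
    """Compare desired and actual resource maps and list the needed changes."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(("create", name))   # declared but missing
        elif name not in desired:
            changes.append(("delete", name))   # exists but no longer declared
        elif desired[name] != actual[name]:
            changes.append(("update", name))   # exists with different settings
    return changes


# Hypothetical declared configuration versus what is currently deployed.
desired = {"web": {"size": "small", "count": 2}, "bucket": {"encrypted": True}}
actual = {"web": {"size": "small", "count": 1}, "old-db": {"size": "large"}}

print(plan(desired, actual))
# [('create', 'bucket'), ('delete', 'old-db'), ('update', 'web')]
```

Running the same plan again after the changes are applied would yield an empty list, which is what makes this declarative style safe to repeat.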

##### Development

- [VS Code](https://code.visualstudio.com/) — Not quite a text editor, not quite an IDE. Probably the best Electron app I’ve ever used. I keep checking out [Nova](https://nova.app/) out of a love for all things Panic, but it’s not quite there yet.
- [Postgres.app](https://postgresapp.com/) — Simple free local PostgreSQL server for macOS.
- [dbmate](https://github.com/amacneil/dbmate) — Language- and database-agnostic migration tool.
- [GitLab](https://gitlab.com/) — Source code hosting and versioning.
- [Prettier](https://prettier.io/) — Code formatter for JavaScript, HTML and CSS. If you’ve never used a code formatter before, do it now. It’ll change your life.
- [Jest](https://jestjs.io/) — Test runner for JavaScript. Bundled with Create React App.
- [Yarn](https://yarnpkg.com/) — Alternative package manager for Node.
- [Make](https://en.wikipedia.org/wiki/Make_(software)) — SongRender doesn’t need to be compiled, so this is just a task runner. [Self-documenting Makefile snippets](https://www.thapaliya.com/en/writings/well-documented-makefiles/) are super helpful.

##### Debugging

- [Postico](https://eggerapps.at/postico/) — Great indie Mac app for querying Postgres databases.
- [Insomnia](https://insomnia.rest/) — HTTP client. I use this sometimes when I’m working on the API and don’t want to worry about the browser. I also use it as a very crude admin dashboard: the API has a few admin endpoints for things like retrying a failed render, which I hit directly from Insomnia. [Paw](https://paw.cloud/) is a non-Electron alternative that I’ll spend the $50 on at some point.
- [Transmit](https://www.panic.com/transmit/) — File transfer app. I use this whenever I need to poke around object storage, since it’s much easier than using DigitalOcean’s web-based file browser.

##### Retired

Not every relationship was meant to live forever. These are all the tools that I’ve stopped using for one reason or another.

- [Ansible](https://www.ansible.com/) — Provisioning and deployment tool. Replaced by **Packer** and **Terraform**.
- [Let’s Encrypt](https://letsencrypt.org/) — Free SSL certificate authority. Replaced by **Cloudflare**, which does this automatically.
- [Healthchecks](https://healthchecks.io/) — Cron job monitor. Replaced by nothing; I refactored away all my cron jobs. This was pretty useful and I’d sign up again if I needed it.
- [SendGrid](https://sendgrid.com/) — A transactional and marketing email service. Replaced by Postmark, which has a faster web UI and better deliverability.

### Takeaways

### Appendix: Snippets

##### A Simple Gradient Descent Implementation in Python

```python
import numpy as np

# Define the dataset and the learning rate.
# Size of the points dataset.
m = 20

# Points x-coordinate and dummy value (x0, x1).
X0 = np.ones((m, 1))
X1 = np.arange(1, m+1).reshape(m, 1)
X = np.hstack((X0, X1))

# Points y-coordinate
y = np.array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)

# The Learning Rate alpha.
alpha = 0.01

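# Cost function: J(theta) = 1/(2m) * ||X @ theta - y||^2
def error_function(theta, X, y):
    '''Error function J definition.'''
    diff = np.dot(X, theta) - y
    # Note the parentheses: 1./2*m would evaluate to m/2, not 1/(2m).
    return (1. / (2 * m)) * np.dot(np.transpose(diff), diff)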

# Gradient of the cost function: (1/m) * X^T (X @ theta - y)
def gradient_function(theta, X, y):
    '''Gradient of the function J definition.'''
    diff = np.dot(X, theta) - y
    return (1./m) * np.dot(np.transpose(X), diff)

# Gradient descent iteration
def gradient_descent(X, y, alpha):
    '''Perform gradient descent.'''
    # Start from theta = (1, 1) as a float column vector.
    theta = np.ones((2, 1))
    gradient = gradient_function(theta, X, y)
    # Stop once every gradient component is at most 1e-5 in magnitude: the
    # surface is nearly flat there (the valley floor), so further iterations
    # would change theta very little.
    while not np.all(np.absolute(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, y)
    return theta

optimal = gradient_descent(X, y, alpha)
print('optimal:', optimal)
print('error function:', error_function(optimal, X, y)[0,0])
```
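
As a sanity check on the appendix code, the same least-squares fit has a closed-form solution for a single feature, so the result of gradient descent can be compared against it. The sketch below recomputes the fit in plain Python from the same 20 data points; the variable names are illustrative assumptions.

```python
# Same dataset as above: x = 1..20 with the listed y values.
xs = list(range(1, 21))
ys = [3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12, 11, 13, 13, 16, 17, 18, 17, 19, 21]
m = len(xs)

x_mean = sum(xs) / m
y_mean = sum(ys) / m

# Closed-form simple linear regression: slope = S_xy / S_xx.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
s_xx = sum((x - x_mean) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Gradient of the cost at the closed-form solution; both entries should be ~0.
residuals = [intercept + slope * x - y for x, y in zip(xs, ys)]
grad0 = sum(residuals) / m
grad1 = sum(r * x for r, x in zip(residuals, xs)) / m

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
# intercept = 0.5158, slope = 0.9699
```

The `optimal` vector returned by `gradient_descent` should land close to these two values, since its loop stops only when the gradient is within `1e-5` of zero.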