> 2021-08-24 Information Digest

### A Simple Way to Build Collaborative Web Apps

origin: [A Simple Way to Build Collaborative Web Apps](https://zjy.cloud/posts/collaborative-web-apps)

##### Client

We use React's *state* to store data, which is okay for the input value because it is temporary by nature, but not quite right for the todos. The todos need to be:

1. updated in a local browser cache for maximal speed (this pattern is sometimes called Optimistic UI)
2. synced to the server for persistence
3. delivered to other clients in the correct order and state

Nowadays, there is a plethora of frontend state-management libraries to choose from: Redux, MobX, Recoil, GraphQL clients like Apollo and Relay, etc. Sadly, none of them works for our use case. What we need is a distributed system with realtime syncing and conflict resolution baked in. Although there are [good write-ups](https://www.figma.com/blog/how-figmas-multiplayer-technology-works) on this subject, distributed systems are still too hard to implement correctly for a one-person team. I'd like to bring in some help.

After some searching, a promising option shows up: [Replicache](https://replicache.dev/), whose homepage says:

> Replicache makes it easy to add realtime collaboration, lag-free UI, and offline support to web apps. It works with any backend stack.

Replicache implements a persistent store in the browser, using IndexedDB. You can mutate the store locally and subscribe to part of the store in your UI. When data changes, subscriptions re-fire, and the UI refreshes.

You need to provide two backend endpoints for Replicache to talk to: *replicache-pull* and *replicache-push*. *replicache-pull* sends back a subset of your database for the current client. *replicache-push* updates the database from local mutations. After applying a mutation on the server, you send a `WebSocket` message hinting to affected clients to pull again.

> As the Replicache doc says, managing your own WebSocket backend has a very high operational cost. We use [Ably](https://ably.com/) here.

That's all you need to do. Replicache orchestrates the whole process to make sure the state is consistent while being synced in realtime. We will dive into the backend integration in the next section of this article. For now, let's rewrite the state-related code utilizing Replicache:

```js
// Only relevant parts are shown
import { Replicache } from 'replicache'
import { useSubscribe } from 'replicache-react'
import { nanoid } from 'nanoid'

const rep = new Replicache({
  // other Replicache options
  mutators: {
    async createTodo(tx, { id, completed, content, order }) {
      await tx.put(`todo/${id}`, { completed, content, order, id })
    },
    async updateTodoCompleted(tx, { id, completed }) {
      const key = `todo/${id}`
      const todo = await tx.get(key)
      todo.completed = completed
      await tx.put(key, todo)
    },
    async deleteTodo(tx, { id }) {
      await tx.del(`todo/${id}`)
    },
  },
})

export default function TodoApp() {
  const todos =
    useSubscribe(rep, async (tx) => {
      return await tx.scan({ prefix: 'todo/' }).entries().toArray()
    }) ?? []

  const onSubmit = (e) => {
    e.preventDefault()
    if (content.length > 0) {
      rep.mutate.createTodo({ id: nanoid(), content, completed: false })
      setContent('')
    }
  }

  const onChangeCompleted = (e) => {
    rep.mutate.updateTodoCompleted({ id: todo.id, completed: e.target.checked })
  }

  const onDelete = (_e) => {
    rep.mutate.deleteTodo({ id: todo.id })
  }

  // render
}
```
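Before moving on to the server, here is a minimal sketch of what the *replicache-push* endpoint described above might look like. This is not Todo Light's actual code: the Express routing, the `db.applyMutation` helper, and the `todos` Ably channel name are all assumptions for illustration.

```js
// Hypothetical replicache-push endpoint (Express); `db.applyMutation`
// and the 'todos' Ably channel are placeholders, not the article's code.
import express from 'express'
import Ably from 'ably'
import db from './db.js' // assumed database helper module

const app = express()
app.use(express.json())
const ably = new Ably.Rest(process.env.ABLY_API_KEY)

app.post('/replicache-push', async (req, res) => {
  // A push request carries the client's ID and its queued mutations.
  const { clientID, mutations } = req.body
  for (const mutation of mutations) {
    // Re-run each mutation against the real database; a real handler
    // also tracks lastMutationID per client to skip duplicates.
    await db.applyMutation(clientID, mutation)
  }
  // "Poke" other clients over Ably so they pull again (fire-and-forget).
  ably.channels.get('todos').publish('poke', {})
  res.json({})
})
```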
##### Server

Since we have implemented Optimistic UI on the client, most operations are already speedy (zero latency). For changes to sync from one client to the others quickly, we still need to achieve low latency for requests to the server. Ideally, the latency should be under 100ms for the collaboration to feel *realtime*.

We can only achieve that by globally deploying the server and the database. If we deploy to only one region, the latency for a user on another continent will be several hundred milliseconds no matter what we do. It's the speed of light, period.

Globally deploying a stateless server should be easy. At least that's what I initially thought. Turns out I was wrong. In 2021, most cloud providers still only allow you to deploy your server to a single region. You need to take many extra steps to get a global setup.

> Mostly I'm referring to PaaS like Heroku and Google App Engine. FaaS (function as a service) is much easier to deploy globally but comes with its own gotchas.

Luckily I found [Fly.io](https://fly.io/), a cloud service that helps you "deploy app servers close to your users", which is exactly what we need. It comes with an excellent command-line tool and a smooth "push to deploy" workflow. Scaling out to multiple regions (in our case, Hong Kong and Los Angeles) takes only a few keystrokes. Even better, they offer a pretty generous free tier.

For the database, many open-source solutions inspired by Google's [Spanner](https://research.google/pubs/pub39966/) have come out. One of the most polished competitors is [CockroachDB](https://www.cockroachlabs.com/). Luckily, they offer a managed service with a 30-day trial. Although I managed to build a version of Todo Light using CockroachDB, the end product in this article is based on a much simpler [Postgres setup](https://fly.io/blog/globally-distributed-postgres/) with distributed read replicas. Dealing with a global database brings in much complexity that is not essential to the subject of this article; it will have to wait for another piece.

##### Bonus - Implement Reordering with Fractional Indexing

You may notice that we use the type *text* for the *ord* column in the database schema, where a number type might seem more natural. The reason is that we are using a technique called [Fractional Indexing](https://github.com/rocicorp/fractional-indexing) to implement reordering. Check the [source code](https://github.com/rocicorp/fractional-indexing) of Todo Light or try to implement it yourself. It should be an interesting exercise.
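Roughly, the idea is that each todo's *ord* is a string key, and inserting or moving a todo between two neighbors means generating a new key that sorts between their keys, leaving every other row untouched. A minimal sketch using the `generateKeyBetween` helper from the fractional-indexing package linked above (the example keys in the comments are illustrative, and `updateTodoOrder` is a hypothetical mutator not shown earlier):

```js
// Sketch of fractional indexing; assumes the rocicorp/fractional-indexing
// package. Keys are plain strings, which is why `ord` is a text column.
import { generateKeyBetween } from 'fractional-indexing'

const first = generateKeyBetween(null, null)      // first key, e.g. "a0"
const second = generateKeyBetween(first, null)    // key after `first`, e.g. "a1"
const between = generateKeyBetween(first, second) // sorts between the two

// Ordering todos is then just a string sort on `ord`:
console.log([second, between, first].sort()) // => [first, between, second]

// Moving a todo between neighbors x and y rewrites only that todo's order:
// rep.mutate.updateTodoOrder({ id, order: generateKeyBetween(x.order, y.order) })
```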
### A very brief history of Unix

origin: [A very brief history of Unix](https://changelog.com/posts/a-brief-history-of-unix)

##### AT&T Unix

So the original Unix is AT&T Unix, and that was started in the late '60s, early '70s at Bell Labs. So that's the O.G. It wasn't even open source. It was proprietary. AT&T licensed Unix to various parties in the '70s, and that led to the different Unix variants like UC Berkeley's [BSD](https://en.wikipedia.org/wiki/Berkeley_Software_Distribution), Sun's [Solaris](https://en.wikipedia.org/wiki/Oracle_Solaris), IBM's [AIX](https://en.wikipedia.org/wiki/IBM_AIX), and there's more than just those.

##### UNIX®

Now, all-caps UNIX… that's the trademark, which AT&T owned until the '90s. Then it sold it to [Novell](https://en.wikipedia.org/wiki/Novell), which then sold its Unix business group to somebody else but kept the copyright, which eventually ended up at the [Open Group](https://en.wikipedia.org/wiki/The_Open_Group), which is like a consortium of different entities. No idea if they still hold it or what. So all-caps UNIX: that's UNIX the trademark. Of course, there were legal disputes along the way, but those are not interesting.

##### GNU

Back in the '80s the [GNU Project](https://en.wikipedia.org/wiki/GNU_Project) began, which was an effort to create a free-software Unix-like system. You've probably heard of GNU. It stands for "GNU's Not Unix". It's not Unix, but it's *Unix-like*, and it's famous for many things. *(Not just the invention of the recursive acronym, which is pretty rad and has been copied over and over again.)* What else does GNU do? The [GPL](https://en.wikipedia.org/wiki/GNU_General_Public_License) (GNU General Public License), [GCC](https://en.wikipedia.org/wiki/GNU_Compiler_Collection) (GNU Compiler Collection)… and of course the coreutils like `ls`, `rm`, etc., and more.

So GNU had a lot of things going, but they didn't really have a working kernel. There was [GNU Hurd](https://en.wikipedia.org/wiki/GNU_Hurd), which was being worked on back in the early '90s but didn't totally work yet when [Linux](https://en.wikipedia.org/wiki/Linux_kernel) came around.

##### Linux

Linus Torvalds released Linux back in 1991. That's a kernel. So the Linux kernel is an operating system kernel, which means it's not an entire operating system. He released it as GPL, so it got integrated with a bunch of other GPL stuff. And then there's also the BSD Unix effort, which was released in 1992. That led to NetBSD, FreeBSD, later on OpenBSD, and I think DragonFly… a few others.

So Linux and BSD: they have more in common than they have in difference. They're very similar, but the differences are what we focus on (of course) because those are the interesting bits. That's what makes each one unique. That's why we should even have more than one in the first place. But what they have in common is the Unix philosophy and the Unix architecture.
##### The Unix philosophy

The Unix philosophy is something we talk about on the show all the time. In fact, we just talked about it with Mat Ryer and his tool xbar, and how it accidentally followed some of the Unix philosophy and got some awesome results from that. The Unix philosophy includes ideas like:

- "Make each program do one thing well"
- "Write programs that work together"
- "Write programs that handle text streams"

So everything is text. If you can assume it's text, then you can write simpler programs that work with more things.

##### The Unix architecture

Then there's the Unix architecture, which has the unified file system and uses inter-process communication through pipes. We've already talked a little bit about pipes, which serve as the primary means of communication. It also includes a shell scripting and command syntax called the Unix shell, which kind of brings us full circle, right?

So when we talk about Unix tools or "modern Unix" we are mostly referring to programs that:

- Follow the Unix philosophy
- Run inside the Unix architecture
- Are executed from a Unix shell

### An Introduction to JQ

origin: [An Introduction to JQ](https://earthly.dev/blog/jq-select/)

#### Array-Index

`jq` lets you select the whole array `[]`, a specific element `[3]`, or ranges `[2:5]`, and combine these with the object index if needed. It ends up looking something like this:

```bash
jq '.key[].subkey[2]'
```

#### Removing Quotes From JQ Output

The `-r` option in `jq` gives you raw strings if you need that.

```bash
$ echo '["1","2","3"]' | jq -r '.[]'
1
2
3
```

The `-j` option (for join) can combine your output together.

```bash
$ echo '["1","2","3"]' | jq -j '.[]'
123
```

#### Putting Elements in an Array using jq

In fact, whenever you ask jq to return an unwrapped collection of elements, it prints them each on a new line. You can see this by explicitly asking jq to ignore its input and instead return two numbers:

```bash
$ echo '""' | jq '1,2'
1
2
$ echo '""' | jq '[1,2]'
[
  1,
  2
]
```

Similarly, to **put a generated collection of results into a JSON array**, you wrap it in an array constructor `[ ... ]`, e.g. `.[].title` → `[ .[].title ]`:

```bash
curl https://api.github.com/repos/stedolan/jq/issues?per_page=5 | \
  jq '[ .[].title ]'
```

#### Using jq to Select Multiple Fields

The easiest way to do this is using `,` to specify multiple filters:

```bash
curl https://api.github.com/repos/stedolan/jq/issues?per_page=2 | \
  jq '.[].title, .[].number'
```

But this returns the results of one selection after the other. To change the ordering, I can factor out the array selector:

```bash
curl https://api.github.com/repos/stedolan/jq/issues?per_page=2 | \
  jq '.[] | .title, .number'
```

#### Putting Elements Into an Object Using jq

If you were building up a JSON object out of several selectors, it would end up looking something like this:

```bash
jq '{ "key1": <>, "key2": <> }'
```

```bash
echo '["Adam","Gordon","Bell"]' | jq -r '{ "first_name": .[0], "last_name": .[2] }'
```

#### Sorting and Counting With JQ

##### jq Built-in Functions

If I want those labels in alphabetical order I can use the built-in sort function.
It works like this:

```bash
$ echo '["3","2","1"]' | jq 'sort'
[
  "1",
  "2",
  "3"
]
```

Other built-ins that mirror JavaScript functionality are available, like length, reverse, and tostring, and they can all be used in a similar way:

```bash
$ echo '["3","2","1"]' | jq 'reverse'
[
  "1",
  "2",
  "3"
]
$ echo '["3","2","1"]' | jq 'length'
3
```

#### Pipes and Filters

```bash
echo '{"title":"JQ Select"}' | jq '.title' | jq 'length'
# ↑ same as ↓
echo '{"title":"JQ Select"}' | jq '.title | length'
```

Here are some more examples:

- `.title | length` will return the length of the title
- `.number | tostring` will return the issue number as a string
- `.[] | .key` will return the values of the key `key` in the array (this is equivalent to `.[].key`)

This means that sorting my labels array is simple. I can just change `.labels` to `.labels | sort`:

```bash
curl https://api.github.com/repos/stedolan/jq/issues/2289 | \
  jq '{ title: .title, number: .number, labels: .labels | sort }'
```

And if you want just a label count, that is easy as well:

```bash
$ curl https://api.github.com/repos/stedolan/jq/issues/2289 | \
  jq '{ title: .title, number: .number, labels: .labels | length }'
{
  "title": "Bump jinja2 from 2.10 to 2.11.3 in /docs",
  "number": 2289,
  "labels": 2
}
```

##### Maps and Selects Using JQ

`map(...)` lets you unwrap an array, apply a filter, and then rewrap the results back into an array. You can think of it as a shorthand for `[ .[] | ... ]`, and it comes up quite a bit in my experience, so it's worth committing to memory.

```bash
jq '[ .[] | { title: .title, number: .number, labels: .labels | length } ]'
# ↑ same as ↓
jq 'map({ title: .title, number: .number, labels: .labels | length })'
```

`select` is a built-in function that takes a boolean expression and only returns elements that match. It's similar to the `WHERE` clause in a SQL statement or an array filter in JavaScript.

```bash
curl https://api.github.com/repos/stedolan/jq/issues?per_page=100 | \
  jq 'map({ title: .title, number: .number, labels: .labels | length }) | map(select(.labels > 0))'
```

#### Convert JSON to CSV via the Command Line using JQ

[Convert JSON to CSV via the Command Line using JQ](https://earthly.dev/blog/convert-to-from-json/#convert-json-to-csv-via-the-command-line-using-jq)

```bash
$ cat simple.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
"color","id","value"
"red",1,"#f00"
"green",2,"#0f0"
"blue",3,"#00f"
```

### How Discord Stores Billions of Messages

origin: [How Discord Stores Billions of Messages](https://blog.discord.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7)

##### HN Comment

Well... we have a 3-node MongoDB cluster and are processing up to a million trades... per second. And a trade is way more complex than a chat message. It has tens to hundreds of fields, may require enriching with data from multiple external services, then needs to be stored, be searchable with unknown, arbitrary bitemporal queries, and may need multiple downstream systems to be notified, depending on a lot of factors, when it is modified.

All this happens on the aforementioned MongoDB cluster and just two server nodes. And the two server nodes are really only for redundancy; a single node easily fits the load.
What I want to say is:

- processing a hundred million simple transactions per day is nothing difficult on modern hardware,
- modern servers have stupendous potential to process transactions, which is 99.99% wasted by "modern" application stacks,
- if you are willing to spend a little bit of learning effort, it is easily possible to run millions of non-trivial transactions per second on a single server,
- most databases (even one as bad as MongoDB) can handle much more load than people think they can. You just need to understand how it works and what its strengths are, and play into rather than against them.

And if you think we are running Rust on bare metal and some super-large servers, you would be wrong. It is a normal Java reactive application running on OpenJDK on an 8-core server with a couple hundred GB of memory. And the last time I needed to look at the profiler was about a year ago.

### Misc

- OAuth (Open Authorization) is a secure, industry-standard protocol that allows you to approve one application interacting with another on your behalf **without giving away your password**. Instead of passing user credentials from app to app, OAuth lets you pass authorization between apps over HTTPS with **access tokens** (see the sketch after this list).
- You will find that Chen Hao's (陈皓) choices weighed only whether an experience was valuable, never the salary or the title. "By the time I left the bank at 24, I knew what I wanted." He lives with a very strong sense of purpose: he wants only those valuable experiences, and he doesn't regret them even when they end in failure. [陈皓(左耳朵耗子)](https://www.infoq.cn/article/WFAL8AFLdhZdj5WebvpG)
- Learn to wait for the right timing, place, and people (天时地利人和). Three points matter most:
  - choosing a direction you are good at is a founder's greatest core competitive advantage;
  - that direction happens to align with the direction the times are moving in;
  - the founding moment falls exactly on the eve of rapid, explosive growth.
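As a tiny illustration of the access-token idea in the OAuth item above (the URL and token are made-up placeholders, not from any specific provider):

```js
// Hypothetical API call: the access token obtained through the OAuth
// flow stands in for the user's password and is scoped and revocable.
const res = await fetch('https://api.example.com/v1/me', {
  headers: { Authorization: 'Bearer ACCESS_TOKEN' },
})
console.log(await res.json())
```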