> Information digest for 2021-08-20

### An Opinionated Guide to xargs

origin: [An Opinionated Guide to xargs](https://www.oilshell.org/blog/2021/08/xargs.html)

### What Is `xargs`?

It's an **adapter** between text streams and `argv` arrays, two essential concepts in shell. You pass it flags that specify how to split `stdin`. Then it generates **arguments** and invokes processes.

Example:

```
$ echo 'alice bob' | xargs -n 1 -- echo hi
hi alice
hi bob
```

What's happening here?

1. `xargs` splits the input stream on whitespace, producing 2 arguments, `alice` and `bob`.
2. We passed `-n 1`, so `xargs` then passes each argument to a separate `echo hi $ARG` command. By default, it passes as many args to a command as possible, like `echo hi alice bob`.

It may help to mentally replace `xargs` with the word **each**. As in, *for each word, line, or token, invoke this process with these args*. In fact, I propose an `each` builtin for the [Oil language](<https://www.oilshell.org/cross-ref.html?tag=Oil language#Oil language>) below.

#### Which Flags Should I Know About?

You should know how to control:

1. The algorithm for **splitting** text into arguments (`-d`, `-0`). Discussed below.
2. How many arguments are passed to each **process** (`-n`). This determines the total number of processes started.
3. Whether processes are run in sequence or in parallel (`-P`).

For example, if you want the power of regexes to filter names, you can pipe to `egrep`, then explicitly split its output by newlines:

```bash
# Remove Python and C++ unit tests
ls | egrep '.*_test\.(py|cc)' | xargs -d $'\n' -- rm
```

### Choose One of 3 Ways of Splitting `stdin`

In [the comment](https://lobste.rs/s/wlqveb/xargs_considered_harmful#c_fp9vyj), I suggest using **only** these three styles of splitting:

1. `xargs` (the default): When you want **"words"** without spaces. For example, you can produce two args from the string `'alice bob'`.
2. 
   `xargs -d $'\n'`: When you want the args to be **lines**, as in the `egrep` example above. (Note that `$'\n'` is [bash](https://www.oilshell.org/cross-ref.html?tag=bash#bash) syntax for a newline character, and Oil uses this syntax too.)
3. `xargs -0`: When you want to handle **untrusted data**. Someone could put a newline in a filename, but this is safe with NUL-delimited tokens.

Most of my scripts use the second style, and occasionally the third. Unix tools generally work better on streams of lines than streams of "words" or NUL-delimited tokens.

##### `xargs` Can Invoke Shell Functions With the `$0` Dispatch Pattern

The original post discusses `xargs -I {}`, which allows you to control where each argument is substituted in the `argv` array.

I occasionally use `-I`, but more often I use [xargs](https://www.oilshell.org/cross-ref.html?tag=xargs#xargs) with what I call the **[`$0` Dispatch Pattern](https://www.oilshell.org/blog/2021/07/blog-backlog-1.html#shell-programming-patterns)**. I outlined this shell programming pattern last month, but I still need to elaborate on it.

The basic idea is to avoid the mini language of `-I {}` and just use shell, by **recursively invoking shell functions**. I use this all over Oil's own shell scripts, and elsewhere.

```bash
do_one() {
  # Rather than xargs -I {}, it's more flexible to
  # use a function with $1
  echo "Do something with $1"
  cp --verbose "$1" /tmp
}

do_all() {
  # Call the do_one function for each item.
  # Also add -P to make it parallel
  cat tasks.txt | xargs -n 1 -d $'\n' -- $0 do_one
}

"$@"  # dispatch on $0; or use 'runproc' in Oil
```

Now run this script with either:

- `demo.sh do_one $ARG` to test the work that's done on **each** item. You want to make this correct first.
- `demo.sh do_all` to do work on **all** items.
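To try the `$0` dispatch pattern end to end, here is a minimal self-contained sketch. It is not the original author's script: the temporary `demo.sh`, the two sample task names, and the `task:` output format are made up for illustration. It also makes two substitutions, both assumptions on my part: it calls `bash "$0"` so the script does not need the executable bit, and it splits lines with `tr '\n' '\0' | xargs -0` because `-d` is a GNU extension and may not exist in other `xargs` implementations.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory holding a throwaway demo script
workdir=$(mktemp -d)

# Write the demo script; the quoted 'EOF' keeps $0 and $1 unexpanded here
cat > "$workdir/demo.sh" <<'EOF'
#!/usr/bin/env bash

do_one() {
  # The work done on each item; $1 is one task (one input line)
  echo "task: $1"
}

do_all() {
  # One line per task; convert newlines to NUL so that "alice smith"
  # stays a single argument, then re-invoke this same script per task
  printf '%s\n' 'alice smith' 'bob' |
    tr '\n' '\0' |
    xargs -0 -n 1 -- bash "$0" do_one
}

"$@"  # dispatch: the first argument names the function to run
EOF

# Run all tasks and capture what they printed
output=$(bash "$workdir/demo.sh" do_all)
echo "$output"
```

Running `bash "$workdir/demo.sh" do_one 'some item'` exercises a single task in isolation, which is the step the pattern asks you to get right first.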
##### Preview Tasks With an `echo` Prefix

Before running a command like:

```bash
$ cat tasks.txt | xargs -n 1 -- $0 do_one

# It's often useful to preview it with echo:
$ cat tasks.txt | xargs -n 1 -- echo $0 do_one
```

##### `xargs -P` Automatically Parallelizes Tasks

In the `do_all` example above, you can add `-P 8` to the [xargs](https://www.oilshell.org/cross-ref.html?tag=xargs#xargs) invocation to automatically parallelize it!

For example, if you have 1000 independent tasks, `xargs` will use 8 CPUs to run them as quickly as possible. I've used `-P 32` to make day-long jobs take an hour! You can't do that with a `for` loop.

##### `xargs` Composes With Other Tools

```bash
# Filter tasks by name
find ... | grep ... | xargs ...

# Limit the number of tasks. I use this all the time
# for faster testing
find ... | head | xargs ...

# Believe it or not, I use this to randomize music
# and videos :)
find ... | shuf | xargs mplayer
```

#### Recap

To repeat, here are the benefits of the style I advocate:

1. **Incremental Development**: Figure out what to do on each item (what's a task?), then figure out what items to do it on (what tasks should I run?)
2. **Easy Testing** by using `echo` to preview tasks. This avoids running long batch jobs on the wrong input!
3. **Better Performance**
   - `xargs` lets you start as **few processes** as possible.
   - It also lets you start those processes **in parallel**. You can't do this with a `for` loop.
4. **Fewer Languages to Remember**. We use plain shell and a few flags to [xargs](https://www.oilshell.org/cross-ref.html?tag=xargs#xargs).
5. **Composition via Pipelines**. The task list becomes a "noun" that other shell tools can operate on.

### Patterns in confusing explanations

origin: [Patterns in confusing explanations](https://news.ycombinator.com/item?id=28254630)

#### now for the patterns!

Now that I’ve explained my motivation, let’s explain the patterns! Here’s a quick index of all of them. They’re not in any particular order.

1. [pattern 1: making outdated assumptions about the audience’s knowledge](https://jvns.ca/blog/confusing-explanations/#pattern-1-making-outdated-assumptions-about-the-audience-s-knowledge)
2. [pattern 2: having inconsistent expectations of the reader’s knowledge](https://jvns.ca/blog/confusing-explanations/#pattern-2-having-inconsistent-expectations-of-the-reader-s-knowledge)
3. [pattern 3: strained analogies](https://jvns.ca/blog/confusing-explanations/#pattern-3-strained-analogies)
4. [pattern 4: fun illustrations on dry explanations](https://jvns.ca/blog/confusing-explanations/#pattern-4-fun-illustrations-on-dry-explanations)
5. [pattern 5: unrealistic examples](https://jvns.ca/blog/confusing-explanations/#pattern-5-unrealistic-examples)
6. [pattern 6: jargon that doesn’t mean anything](https://jvns.ca/blog/confusing-explanations/#pattern-6-jargon-that-doesn-t-mean-anything)
7. [pattern 7: missing key information](https://jvns.ca/blog/confusing-explanations/#pattern-7-missing-key-information)
8. [pattern 8: introducing too many concepts at a time](https://jvns.ca/blog/confusing-explanations/#pattern-8-introducing-too-many-concepts-at-a-time)
9. [pattern 9: starting out abstract](https://jvns.ca/blog/confusing-explanations/#pattern-9-starting-out-abstract)
10. [pattern 10: unsupported statements](https://jvns.ca/blog/confusing-explanations/#pattern-10-unsupported-statements)
11. [pattern 11: no examples](https://jvns.ca/blog/confusing-explanations/#pattern-11-no-examples)
12. [pattern 12: explaining the “wrong” way to do something without saying it’s wrong](https://jvns.ca/blog/confusing-explanations/#pattern-12-explaining-the-wrong-way-to-do-something-without-saying-it-s-wrong)
13. 
    [pattern 13: “what” without “why”](https://jvns.ca/blog/confusing-explanations/#pattern-13-what-without-why)

#### pattern 1: making outdated assumptions about the audience’s knowledge

I also sometimes see this “outdated assumptions about the audience’s knowledge” problem with newer writing. It generally happens when the writer learned the concept many years ago, but doesn’t have a lot of experience explaining it in the present. So they give the type of explanation that assumes that the reader knows approximately the same things they and their friends knew in 2005 and don’t realize that most people learning it today have a different set of knowledge.

#### pattern 2: having inconsistent expectations of the reader’s knowledge

The problem with this is that there are probably zero people who understand malloc but don’t understand how a for loop works! And even though it sounds silly, it’s easy to accidentally write like this if you don’t have a clear idea of who you’re writing for.

##### instead: pick 1 specific person and write for them!

You can pick a friend, a coworker, or just a past version of yourself. Writing for just 1 person might feel insufficiently general (“what about all the other people??”) but writing that’s easy to understand for 1 person (other than you!) has a good chance of being easy to understand for many other people as well.

#### pattern 3: strained analogies

##### instead: keep analogies to a single idea

Instead of using “big” analogies where I explain in depth exactly how an event processing system is like a river, I prefer to explain the analogy in one or two sentences to make a specific point and then leave the analogy behind.

Here are 2 ways to do that.

###### **option 1: use “implicit” metaphors**

For example, if we’re talking about streams, I might write:

> Every event in a stream flows from a producer to a consumer.

Here I’m using the word “flow”, which is definitely a water metaphor.
I think this is great – it’s an efficient way to evoke an idea of directionality and the idea that there are potentially a large number of events.

I put together a bunch more metaphors in this style in [Metaphors in man pages](https://jvns.ca/blog/2020/05/08/metaphors-in-man-pages/).

###### **option 2: use a very limited analogy**

For example, here’s a nice explanation from [When costs are nonlinear, keep it small](https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/) by Jessica Kerr that explains batching using an analogy to doing your laundry in a batch.

> We like batching. Batching is more efficient: doing ten at once is faster than doing one, one, two, one, one, etc. I don't wash my socks as soon as I take them off, because lumping them in with the next load is free.

This analogy is very clear! I think it works well because batching in laundry works for the same reasons as batching in computing – batching your laundry works because there’s a low incremental cost to adding another pair of socks to the load. And it’s only used to illustrate one idea – that batching is a good choice when there’s a low incremental cost for adding a new item.

#### pattern 4: fun illustrations on dry explanations

##### instead: make the design reflect the style of the explanation

There are lots of great examples of illustrated explanations where the writing is in a clear and friendly style:

- [how dns works](https://howdns.works/ep1/)
- [why’s (poignant) guide to ruby](https://poignant.guide/)
- [how do calculators even](https://shop.bubblesort.io/products/how-do-calculators-even-zine)

On the other hand, dry explanations are useful too! Nobody expects the [Intel instruction-set reference](https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html) to be light reading! The writing is dry and technical, and the design is very utilitarian, which matches the style of the writing.

#### pattern 5: unrealistic examples

Here’s an unrealistic example of how to use `lambda` in Python:

> ```python
> numbers = [1, 2, 3, 4]
> squares = map(lambda x: x * x, numbers)
> ```

##### instead: write realistic examples!

Here’s a more realistic example of Python lambdas, which sorts a list of children by their age. (from my post [Write good examples by starting with real code](https://jvns.ca/blog/2021/07/08/writing-great-examples/)) This is how I use Python lambdas the most in practice.

> ```python
> children = [
>     {"name": "ashwin", "age": 12},
>     {"name": "radhika", "age": 3},
> ]
> sorted_children = sorted(children, key=lambda x: x['age'])
> ```

#### pattern 6: jargon that doesn’t mean anything

Let’s talk about this sentence from this [chapter on commit signing](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work):

> Git is cryptographically secure, but it’s not foolproof.

“Cryptographically secure” is unclear here because it *sounds* like it should have a specific technical meaning, but it’s not explained anywhere what’s actually meant. Is it saying that Git uses SHA-1 to hash commits and it’s difficult to generate SHA-1 hash collisions? I don’t know!

Using jargon in a meaningless way like this is confusing because it can trick the reader into thinking something specific is being said, when the information they need is not actually there. (the chapter doesn’t explain anywhere what’s meant by “cryptographically secure” in this context)

##### instead: Avoid jargon where it’s not needed

A lot of the time I find I can communicate what I need to without using any jargon at all! For example, I’d explain why commit signing is important like this:

> When making a Git commit, you can set any name and email you want!
> For example, I can make a commit right now saying I'm Linus Torvalds like this:
>
> ```
> git commit -m"Very Serious Kernel Update" \
>   --author='Linus Torvalds '
> ```

#### pattern 7: missing key information

Sometimes explanations of a concept are missing the most important idea to understand the concept.

For example, take this explanation from [this chapter on the Git object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) (which by the way has a nice concrete example of how to explore Git’s object model):

> Git is a **content-addressable filesystem**. Great. What does that mean? It means that at the core of Git is a simple key-value data store. What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.

This paragraph is missing what to me is the main idea of content-addressable storage – that the key for a piece of content is a deterministic function of the content, usually a hash (though the page does later say that Git uses a SHA-1 hash). It’s important that the key is a function of the content and not just any random unique key because the idea is that the content is addressed by *itself* – if the content changes, then its key also has to change.

This pattern is hard to recognize as a reader because – how are you supposed to recognize that there’s a key idea missing when you don’t know what the key ideas *are*? So this is a case where a reviewer who understands the subject well can be really helpful.
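
The main idea of content-addressable storage described above (the key is a deterministic function of the content) can be made concrete with a toy store in shell. This is only a sketch of the concept, not how Git itself works: Git hashes a header plus the content and stores objects compressed. The names `store`, `put`, and `get` are hypothetical, and the sketch assumes `sha1sum` from GNU coreutils is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical object store: a directory with one file per piece of content
store=$(mktemp -d)

# put: read content from stdin, save it under its own SHA-1, print the key
put() {
  local tmp key
  tmp=$(mktemp)
  cat > "$tmp"
  key=$(sha1sum < "$tmp" | cut -d ' ' -f 1)  # the key is derived from the content
  mv "$tmp" "$store/$key"
  echo "$key"
}

# get: print the content stored under the given key
get() {
  cat "$store/$1"
}

key1=$(printf 'hello\n' | put)   # first insert
key2=$(printf 'hello\n' | put)   # same content again
key3=$(printf 'hello!\n' | put)  # slightly different content

echo "same content, same key:      $key1 / $key2"
echo "changed content, changed key: $key3"
get "$key1"
```

Inserting the same bytes twice hands back the same key, while changing a single character produces a different one, which is what "the content is addressed by itself" means in practice.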
#### pattern 8: introducing too many concepts at a time

Here’s an explanation of linkers from [this page](https://riptutorial.com/c/example/4360/the-linker) that I find confusing:

> During the link process, the linker will pick up all the object modules specified on the command line, add some system-specific startup code in front and try to resolve all external references in the object module with external definitions in other object files (object files can be specified directly on the command line or may implicitly be added through libraries). It will then assign load addresses for the object files, that is, it specifies where the code and data will end up in the address space of the finished program. Once it’s got the load addresses, it can replace all the symbolic addresses in the object code with “real”, numerical addresses in the target’s address space. The program is ready to be executed now.

Here are the concepts in this paragraph:

- object modules (`.o` files)
- external references
- symbolic addresses
- load addresses
- system-specific startup code

It’s too much!

##### instead: give each concept some space to breathe

For example, I might explain “external references” like this:

> if you run `objdump -d myfile.o` on an object file you can see that the `call` function calls are missing a target address, so that's why the linker needs to fill that in.
>
> ```
> 33: e8 00 00 00 00          call   38
>        ^^^^^^^^^^^
>        this address is all 0s -- it needs to be filled in by the linker!
>        with the actual function that's going to be called!
> 38: 84 c0                   test   %al,%al
> 3a: 74 3b                   je     77
> 3c: 48 83 7d f8 00          cmpq   $0x0,-0x8(%rbp)
> ```

There’s still a lot of missing information here (how does the linker know what address to fill in?), but it’s a clear starting point and gives you questions to ask.

#### pattern 9: starting out abstract

Imagine I try to explain to you what a Unix signal is using the [definition from Wikipedia](https://en.wikipedia.org/wiki/Signal_(IPC)).

> Signals are a limited form of inter-process communication (IPC), typically used in Unix, Unix-like, and other POSIX-compliant operating systems. A signal is an asynchronous notification sent to a process or to a specific thread within the same process to notify it of an event. Signals originated in 1970s Bell Labs Unix and were later specified in the POSIX standard.

By itself, this probably isn’t going to help you understand signals if you’ve never heard of them before! It’s very abstract and jargon-heavy (“asynchronous notification”, “inter-process communication”) and doesn’t have any information about what Unix signals are used for in practice.

Of course, the Wikipedia explanation isn’t “bad” exactly – it’s probably written like that because teaching people about signals for the first time isn’t really the goal of the Wikipedia article on signals.

![img](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/signals.png)

##### instead: start out concrete

For example, I wrote this page explaining signals a few years ago.

### How to Write a Front-End Design Document

origin: [How to Write a Front-End Design Document (如何编写前端设计文档?)](https://juejin.cn/post/6998519744072515621)

In the front-end development process the author works in, [technical research and solution design] is the intermediate step that connects the [requirements phase] and the [development phase]. After the detailed requirements review (the third review), the features and interactions of a requirement are basically settled, but before development actually starts there are still some **undetermined technical points that need to be filled in**. These include:

- **Feasibility of the requirement** (whether it can be done in theory, whether a given feature can be supported, whether a given interaction can be implemented, whether the cost of implementing it is too high). Suppose you assure the PM that you can implement any feature at all, and then they come up with [a requirement like this](https://link.juejin.cn?target=https%3A%2F%2Fzhuanlan.zhihu.com%2Fp%2F41305243)...
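- **Overall architecture of the requirement** (the flow and method of front-end/back-end interaction; API paths, request and response parameters)
- **Detailed design of the requirement** (design of the front-end pages/components/services)

#### The front-end design document template used by the author's team ~ requirements document

##### 1. Requirement background and resources

- Requirement background
- Related documents & resources
  - Requirements document:
  - Visual design mockups:
  - Server-side IDL:
  - Third-party service/SDK documentation
  - Test cases:
  - Event-tracking document:
  - Operations resource list (optional):
  - Walkthrough and acceptance document:

##### 2. Schedule

- Schedule timeline
- Schedule breakdown

##### 3. Design

- Overall solution
  - Project setup
  - Deployment plan
  - Monitoring plan
- Page design
  - Page description
  - URL
  - UI & interaction logic (UI breakdown)
  - State
  - Request logic
  - Business logic
  - Event-tracking logic
- Component design
  - Module description
  - UI & interaction logic
  - State / Props
  - Business logic
  - Event-tracking logic
- Shared modules
  - Module description
  - Business logic

### Misc

- [Neural-hash-collider](https://news.ycombinator.com/item?id=28229291)
  - The hash collisions in Apple's NeuralHash that have caused such a stir over the past few days.
  - The [Chinese-language write-up](https://baijiahao.baidu.com/s?id=1708502978796177996&wfr=spider&for=pc) is easier for me to follow.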