> Information digest, October 29, 2021

### Open secrets about Hacker News

origin: [Open secrets about Hacker News](https://bengtan.com/blog/open-secrets-hacker-news/)

“Open” because they’re not private. “Secret” because they’re not well known.

##### Sandbox, Live List, and the Top30

There are three parts of Hacker News which I’m calling the Sandbox, the Live List and the Top30.

The Top30 are the top 30 stories which appear on the [front page](https://news.ycombinator.com/news) of [Hacker News](https://news.ycombinator.com/). Stories with more points/upvotes are ranked higher; older stories are ranked lower. An algorithm balances these two factors.

The Live List is the rest of the top stories after the Top30. These appear on [page two](https://news.ycombinator.com/news?p=2), [page three](https://news.ycombinator.com/news?p=3) and so on.

The Sandbox is the [‘new’](https://news.ycombinator.com/newest) section and lists all stories ranked by time, with newer stories higher.

When a new story is posted, it is listed only in the Sandbox. Once (or if) it reaches 5 points, it also appears in the Live List.

#### Making the front page

A story needs to accumulate 5 points to appear in the Live List. Where it initially appears depends on how quickly it accumulated those points. **If points are acquired quickly enough, it appears high enough to be in the Top30.** Otherwise, it appears at a lower position.

##### Upvote conversion rate

Whether a story acquires 5 points and escapes the Sandbox depends on its upvote conversion rate and the number of page views:

```plain text
Chance of escaping sandbox = upvote conversion rate ✕ views
```

A newly submitted story gets around 30 page views from being listed in the Sandbox (the actual number varies, so let’s talk about averages). All stories start with one point, so a story needs 4 additional upvotes to escape the Sandbox. In other words, it must persuade **13.3%** or more of its readers to **upvote** (4 / 30 ≈ 13.3%). That’s a pretty high conversion rate.
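The arithmetic behind that 13.3% figure can be sketched in a few lines of Go (a minimal illustration; the function name is mine, while the ~30 views and the 5-point threshold come from the article):

```go
package main

import "fmt"

// requiredConversionRate sketches the article's arithmetic: a story starts
// with 1 point and escapes the Sandbox at 5 points, so it needs 4 upvotes
// out of however many Sandbox page views it receives (~30 on average).
func requiredConversionRate(upvotesNeeded, views int) float64 {
	return float64(upvotesNeeded) / float64(views)
}

func main() {
	rate := requiredConversionRate(5-1, 30) // 4 upvotes from ~30 views
	fmt.Printf("%.1f%%\n", rate*100)        // prints "13.3%"
}
```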
##### Timing

Timing affects how many times a sandboxed story is viewed:

```
Views = number of users online / rate of new submissions
```

But timing is not reliably predictable. Posting a story during a **busy period** means more viewers and more chances for upvotes, but it also means more new stories pushing it down the Sandbox ranking. The best time to post is when the ratio of viewers to new stories is highest, but no one knows with certainty when that happens.

##### Most stories ‘fail’

Most stories ‘fail’ with only one or two points. For example, [tptacek](https://news.ycombinator.com/user?id=tptacek) is one of the best-known users on Hacker News. He has a [pretty good track record of getting blog posts on the front page](https://news.ycombinator.com/item?id=23458896), yet his ‘success’ rate is around 50%.

##### Re-posting

Hacker News allows re-posts. This is not widely known.

> If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.
>
> – https://news.ycombinator.com/newsfaq.html

**Re-posting can triple (or more) a story’s chances of reaching the front page.**

##### Don’t delete and re-post

The flip side of re-posting is that delete-and-repost is not allowed.

##### Text posts are penalised

Submissions which are text and **lack a URL** (sometimes called a ‘self post’) have a ranking **penalty applied**. The factor has been estimated to be a **0.4** or **0.7** multiplier. (Source: [Why you should submit your stuff as a blog posting](https://news.ycombinator.com/item?id=1076633))

> Posts without URLs get penalized, so you’d be better off posting this with a link, then adding your text as a first comment in the new thread.
>
> – https://news.ycombinator.com/item?id=21874086

##### Karma

[Do posts by authors with more karma rank higher?](https://news.ycombinator.com/item?id=23458656) The answer is: No.
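The self-post penalty described above can be illustrated with a toy multiplier. This is a sketch only: the real ranking algorithm is not public, and `penalisedScore` with its sample numbers is my own, using the article's estimated 0.4 factor.

```go
package main

import "fmt"

// penalisedScore applies the article's estimated text-post penalty (0.4 or
// 0.7) as a multiplier on whatever score the ranking algorithm produces.
// The function and the sample score of 100 are illustrative, not HN's code.
func penalisedScore(score float64, isSelfPost bool, penalty float64) float64 {
	if isSelfPost {
		return score * penalty
	}
	return score
}

func main() {
	fmt.Println(penalisedScore(100, false, 0.4)) // link post keeps its score: 100
	fmt.Println(penalisedScore(100, true, 0.4))  // self post is knocked down to 40
}
```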
Taking into account all of the above:

```
Chance of escaping sandbox = re-posts ✕ upvote conversion rate ✕ number of users online / rate of new submissions
```

##### Second chance pool

A small proportion of interesting stories which didn’t attract enough upvotes are curated by the **moderators**. Such a story is taken out of the Sandbox and its timestamp is artificially manipulated so that it appears in the second half of the Top30. From there, it decays normally.

> Moderators and a small number of reviewer users comb the depths of /newest looking for stories that got overlooked but which the community might find interesting. Those go into a second-chance pool from which stories are randomly selected and lobbed onto the bottom part of the front page. This guarantees them a few minutes of attention. If they don’t interest the community they soon fall off, but if they do, they get upvoted and stay on the front page.
>
> – https://news.ycombinator.com/item?id=11662380

The implication? The Top30 is a combination of **organically-upvoted** stories and **moderator-boosted** stories.

##### Vote manipulation

> HN’s anti-voting-ring software is now so strict that the main thing we have to do is turn it off when a submission is good enough
>
> – https://news.ycombinator.com/item?id=15511238

Hacker News readers are very protective of the site. Stories which reach the Top30 via dubious means attract intense scrutiny. Sometimes even organically-upvoted stories are suspected.

##### dang

Hacker News is moderated mainly by [dang](https://news.ycombinator.com/user?id=dang) aka Dan Gackle (pronounced ‘Gackley’). He’s not of Asian descent.
But [he](https://news.ycombinator.com/item?id=20643264) [is](https://news.ycombinator.com/item?id=25054045) [the](https://news.ycombinator.com/item?id=25049415) [best](https://news.ycombinator.com/item?id=25049119) [moderator](https://news.ycombinator.com/item?id=25056408) [in](https://news.ycombinator.com/item?id=25048958) [the](https://news.ycombinator.com/item?id=25051566) [world](https://news.ycombinator.com/item?id=25052249).

##### Random

The story ‘Learn Git Branching’ has been posted almost 20 times across [two](http://pcottle.github.io/learnGitBranching/) [urls](https://learngitbranching.js.org/). The distribution of points is [pretty](https://news.ycombinator.com/from?site=pcottle.github.io) [random](https://news.ycombinator.com/from?site=learngitbranching.js.org).

##### Further reading

- [A List of Hacker News’s Undocumented Features and Behaviors](https://github.com/minimaxir/hacker-news-undocumented)
- [The Lonely Work of Moderating Hacker News](https://www.newyorker.com/news/letter-from-silicon-valley/the-lonely-work-of-moderating-hacker-news) ([comments](https://news.ycombinator.com/item?id=20643052) and more [comments](https://news.ycombinator.com/item?id=25048415))
- [Awesome Hacker News](https://github.com/cheeaun/awesome-hacker-news) - A collection of awesome Hacker News apps, libraries, resources and shiny things

### Practical DDD in Golang: Repository

origin: [Practical DDD in Golang: Repository](https://levelup.gitconnected.com/practical-ddd-in-golang-repository-d308c9d79ba7)

> **Summary**
>
> DDD pattern - Repository
>
> - Domain layer | Repository: interface
>   - Entity (business logic)
> - Detail/contract layer
>   - Data Transfer Object (maps data to the Entity)

Today it is hard to imagine writing an application without accessing some storage at runtime. Even deployment scripts need to read configuration files, which are still, in a way, a type of storage.
Whenever you write an application that solves a problem in the real business world, you need to connect to a database, an external API, a cache system, something. It is unavoidable. From that perspective, it is no surprise that DDD has a pattern for this need. Of course, DDD did not invent the Repository, which appears in plenty of other literature, but DDD added more clarity to it.

##### The Anti-Corruption Layer

Domain-Driven Design is a principle we can apply to many aspects of software development, in many places. Still, the main focus lies inside the domain layer, where our business logic should be. As a **Repository** always represents a structure holding technical details about a connection to some external world, it does not belong to our business logic. But, from time to time, we need to **access the Repository from within the domain layer**. As the domain layer sits at the bottom and does not communicate with the other layers, we define the Repository inside it, but as an interface.

- `repository.go`

```go
import (
	"context"

	"github.com/google/uuid"
)

type Customer struct {
	ID uuid.UUID
	//
	// some fields
	//
}

type Customers []Customer

type CustomerRepository interface {
	GetCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error)
	SearchCustomers(ctx context.Context, specification CustomerSpecification) (Customers, int, error)
	SaveCustomer(ctx context.Context, customer Customer) (*Customer, error)
	UpdateCustomer(ctx context.Context, customer Customer) (*Customer, error)
	DeleteCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error)
}
```

The Entity `Customer` does not hold any information about the type of storage below: there is no Go tag defining a JSON structure, [Gorm](https://gorm.io/index.html) columns, or anything similar. For that, we must use the infrastructure layer.
- `contract.go`

```go
// domain layer
type CustomerRepository interface {
	GetCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error)
	SearchCustomers(ctx context.Context, specification CustomerSpecification) (Customers, int, error)
	SaveCustomer(ctx context.Context, customer Customer) (*Customer, error)
	UpdateCustomer(ctx context.Context, customer Customer) (*Customer, error)
	DeleteCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error)
}

// infrastructure layer
import (
	"context"

	"github.com/google/uuid"
	"gorm.io/gorm"
)

type CustomerGorm struct {
	ID   uint   `gorm:"primaryKey;column:id"`
	UUID string `gorm:"uniqueIndex;column:uuid"`
	//
	// some fields
	//
}

func (c CustomerGorm) ToEntity() (model.Customer, error) {
	parsed, err := uuid.Parse(c.UUID)
	if err != nil {
		return model.Customer{}, err
	}

	return model.Customer{
		ID: parsed,
		//
		// some fields
		//
	}, nil
}

type CustomerRepository struct {
	connection *gorm.DB
}

func (r *CustomerRepository) GetCustomer(ctx context.Context, ID uuid.UUID) (*model.Customer, error) {
	var row CustomerGorm
	err := r.connection.WithContext(ctx).Where("uuid = ?", ID).First(&row).Error
	if err != nil {
		return nil, err
	}

	customer, err := row.ToEntity()
	if err != nil {
		return nil, err
	}

	return &customer, nil
}

//
// other methods
//
```

In the example above you can see a fragment of the `CustomerRepository` implementation. Internally it uses Gorm for easier integration, but you could use raw SQL queries as well. Lately, I have been using the [Ent](https://entgo.io/) library a lot.

In the example you see two different structures, `Customer` and `CustomerGorm`. The first one is the **Entity**, where we keep our business logic, domain invariants, and rules. It knows nothing about the underlying database. The second structure is a [**Data Transfer Object**](https://martinfowler.com/eaaCatalog/dataTransferObject.html), which defines how our data is transferred to and from storage.
This structure has no responsibility other than mapping the database’s data to our Entity.

> The division of these two structures is the fundamental point of using the Repository as an Anti-Corruption layer in our application. It makes sure that **technical details of the table structure do not pollute our business logic**.

What are the consequences here? First, it is true that we need to maintain two types of structures, one for business logic and one for storage. In addition, I usually add a third structure as well, the one I use as a Data Transfer Object for my API. This approach brings complexity to our application and many mapping functions.

Still, despite all this maintenance, it brings new value to our code. We can shape our Entities inside the domain layer in the way that describes our business logic best; we do not limit them to the storage we use. We can use one type of identifier inside our business (like a UUID) and another for the database (an unsigned integer). The same goes for any data we want to use in the database and in the business logic.

Whenever we make changes in any of those layers, we will probably adapt the mapping functions, but we will not touch (or at least not break) the rest of the layers. We can decide to switch to MongoDB, Cassandra, or any other type of storage. We can even switch to an external API, and still none of it will affect our domain layer.

##### Persistence

We use the Repository primarily for querying. It works perfectly with another DDD pattern, the Specification, which you may notice in the examples. We can use the Repository without a Specification, but sometimes a Specification makes our life easier.

The second feature of the Repository is persistence. We define the logic for sending our data to the storage below, to keep it permanently, update it, or even delete it.
- `generator.go`

```go
func NewRow(customer Customer) CustomerGorm {
	return CustomerGorm{
		UUID: uuid.NewString(),
		//
		// some fields
		//
	}
}

type CustomerRepository struct {
	connection *gorm.DB
}

func (r *CustomerRepository) SaveCustomer(ctx context.Context, customer Customer) (*Customer, error) {
	row := NewRow(customer)
	err := r.connection.WithContext(ctx).Save(&row).Error
	if err != nil {
		return nil, err
	}

	customer, err = row.ToEntity()
	if err != nil {
		return nil, err
	}

	return &customer, nil
}

//
// other methods
//
```

##### Types of Repositories

It is a mistake to think that we should use the Repository just for databases. Yes, we use it the most with databases, as they are our first choice for storage, but today other types of storage are popular too. As mentioned, we can use MongoDB or Cassandra. We can use a Repository for our cache, which would be Redis, for example. It can even wrap a REST API or a configuration file.

- `storages.go`

```go
// Redis repository
type CustomerRepository struct {
	client *redis.Client
}

func (r *CustomerRepository) GetCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error) {
	data, err := r.client.Get(ctx, fmt.Sprintf("user-%s", ID.String())).Result()
	if err != nil {
		return nil, err
	}

	var row CustomerJSON
	err = json.Unmarshal([]byte(data), &row)
	if err != nil {
		return nil, err
	}

	customer := row.ToEntity()
	return &customer, nil
}

// API repository
type CustomerRepository struct {
	client  *http.Client
	baseUrl string
}

func (r *CustomerRepository) GetCustomer(ctx context.Context, ID uuid.UUID) (*Customer, error) {
	// note: path.Join would mangle the "https://" prefix, so build the URL directly
	resp, err := r.client.Get(fmt.Sprintf("%s/users/%s", r.baseUrl, ID.String()))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	var row CustomerJSON
	err = json.Unmarshal(data, &row)
	if err != nil {
		return nil, err
	}

	customer := row.ToEntity()
	return &customer, nil
}
```

Now we can see the real benefit of having **a split between our
business logic and technical details**. We keep the same interface for our Repository, so our domain layer can always use it.

> So, your **Repository Contract** should always deal with your **business logic**, but your Repository implementation must use internal structures that you can map later to Entities.

##### Conclusion

The Repository is a well-known pattern, responsible for querying and persisting data in the underlying storage. It is the main point for Anti-Corruption inside our application. We define it as a Contract inside the domain layer and keep the actual implementation inside the infrastructure layer. It is also a natural place for generating application-made identifiers and for running transactions.

### Misc

- XFS file system vs Ext4 file system
  - XFS: scalable, with fast repair utilities, and supports very large file systems (> 500 TB). Compared to Ext4, XFS has relatively poor performance for single-threaded, metadata-intensive workloads.
  - Ext4: supports smaller maximum file system sizes (< 16 TB in RHEL).