2018年11月

原文:https://research.swtch.com/interfaces

Go's interfaces—static, checked at compile time, dynamic when asked for—are, for me, the most exciting part of Go from a language design point of view. If I could export one feature of Go into other languages, it would be interfaces.

This post is my take on the implementation of interface values in the “gc” compilers: 6g, 8g, and 5g. Over at Airs, Ian Lance Taylor has written two posts about the implementation of interface values in gccgo. The implementations are more alike than different: the biggest difference is that this post has pictures.

Before looking at the implementation, let's get a sense of what it must support.

Usage

Go's interfaces let you use duck typing like you would in a purely dynamic language like Python but still have the compiler catch obvious mistakes like passing an int where an object with a Read method was expected, or like calling the Read method with the wrong number of arguments. To use interfaces, first define the interface type (say, ReadCloser):

type ReadCloser interface {
    Read(b []byte) (n int, err os.Error)
    Close()
}

and then define your new function as taking a ReadCloser. For example, this function calls Read repeatedly to get all the data that was requested and then calls Close:

func ReadAndClose(r ReadCloser, buf []byte) (n int, err os.Error) {
    for len(buf) > 0 && err == nil {
        var nr int
        nr, err = r.Read(buf)
        n += nr
        buf = buf[nr:]
    }
    r.Close()
    return
}

The code that calls ReadAndClose can pass a value of any type as long as it has Read and Close methods with the right signatures. And, unlike in languages like Python, if you pass a value with the wrong type, you get an error at compile time, not run time.

Interfaces aren't restricted to static checking, though. You can check dynamically whether a particular interface value has an additional method. For example:

type Stringer interface {
    String() string
}

func ToString(any interface{}) string {
    if v, ok := any.(Stringer); ok {
        return v.String()
    }
    switch v := any.(type) {
    case int:
        return strconv.Itoa(v)
    case float:
        return strconv.Ftoa(v, 'g', -1)
    }
    return "???"
}

The value any has static type interface{}, meaning no guarantee of any methods at all: it could contain any type. The “comma ok” assignment inside the if statement asks whether it is possible to convert any to an interface value of type Stringer, which has the method String. If so, the body of that statement calls the method to obtain a string to return. Otherwise, the switch picks off a few basic types before giving up. This is basically a stripped down version of what the fmt package does. (The if could be replaced by adding case Stringer: at the top of the switch, but I used a separate statement to draw attention to the check.)

As a simple example, let's consider a 64-bit integer type with a String method that prints the value in binary and a trivial Get method:

type Binary uint64

func (i Binary) String() string {
    return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
    return uint64(i)
}

A value of type Binary can be passed to ToString, which will format it using the String method, even though the program never says that Binary intends to implement Stringer. There's no need: the runtime can see that Binary has a String method, so it implements Stringer, even if the author of Binary has never heard of Stringer.

These examples show that even though all the implicit conversions are checked at compile time, explicit interface-to-interface conversions can inquire about method sets at run time. “Effective Go” has more details about and examples of how interface values can be used.

Interface Values

Languages with methods typically fall into one of two camps: prepare tables for all the method calls statically (as in C++ and Java), or do a method lookup at each call (as in Smalltalk and its many imitators, JavaScript and Python included) and add fancy caching to make that call efficient. Go sits halfway between the two: it has method tables but computes them at run time. I don't know whether Go is the first language to use this technique, but it's certainly not a common one. (I'd be interested to hear about earlier examples; leave a comment below.)

As a warmup, a value of type Binary is just a 64-bit integer made up of two 32-bit words (like in the last post, we'll assume a 32-bit machine; this time memory grows down instead of to the right):

请输入图片描述

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. Assigning b to an interface value of type Stringer sets both words of the interface value.
请输入图片描述

(The pointers contained in the interface value are gray to emphasize that they are implicit, not directly exposed to Go programs.)

The first word in the interface value points at what I call an interface table or itable (pronounced i-table; in the runtime sources, the C implementation name is Itab). The itable begins with some metadata about the types involved and then becomes a list of function pointers. Note that the itable corresponds to the interface type, not the dynamic type. In terms of our example, the itable for Stringer holding type Binary lists the methods used to satisfy Stringer, which is just String: Binary's other methods (Get) make no appearance in the itable.

The second word in the interface value points at the actual data, in this case a copy of b. The assignment var s Stringer = b makes a copy of b rather than point at b for the same reason that var c uint64 = b makes a copy: if b later changes, s and c are supposed to have the original value, not the new one. Values stored in interfaces might be arbitrarily large, but only one word is dedicated to holding the value in the interface structure, so the assignment allocates a chunk of memory on the heap and records the pointer in the one-word slot. (There's an obvious optimization when the value does fit in the slot; we'll get to that later.)

To check whether an interface value holds a particular type, as in the type switch above, the Go compiler generates code equivalent to the C expression s.tab->type to obtain the type pointer and check it against the desired type. If the types match, the value can be copied by by dereferencing s.data.

To call s.String(), the Go compiler generates code that does the equivalent of the C expression s.tab->fun[0](s.data): it calls the appropriate function pointer from the itable, passing the interface value's data word as the function's first (in this example, only) argument. You can see this code if you run 8g -S x.go (details at the bottom of this post). Note that the function in the itable is being passed the 32-bit pointer from the second word of the interface value, not the 64-bit value it points at. In general, the interface call site doesn't know the meaning of this word nor how much data it points at. Instead, the interface code arranges that the function pointers in the itable expect the 32-bit representation stored in the interface values. Thus the function pointer in this example is (*Binary).String not Binary.String.

The example we're considering is an interface with just one method. An interface with more methods would have more entries in the fun list at the bottom of the itable.

Computing the Itable

Now we know what the itables look like, but where do they come from? Go's dynamic type conversions mean that it isn't reasonable for the compiler or linker to precompute all possible itables: there are too many (interface type, concrete type) pairs, and most won't be needed. Instead, the compiler generates a type description structure for each concrete type like Binary or int or func(map[int]string). Among other metadata, the type description structure contains a list of the methods implemented by that type. Similarly, the compiler generates a (different) type description structure for each interface type like Stringer; it too contains a method list. The interface runtime computes the itable by looking for each method listed in the interface type's method table in the concrete type's method table. The runtime caches the itable after generating it, so that this correspondence need only be computed once.

In our simple example, the method table for Stringer has one method, while the table for Binary has two methods. In general there might be ni methods for the interface type and nt methods for the concrete type. The obvious search to find the mapping from interface methods to concrete methods would take O(ni × nt) time, but we can do better. By sorting the two method tables and walking them simultaneously, we can build the mapping in O(ni + nt) time instead.

Memory Optimizations

The space used by the implementation described above can be optimized in two complementary ways.

First, if the interface type involved is empty—it has no methods—then the itable serves no purpose except to hold the pointer to the original type. In this case, the itable can be dropped and the value can point at the type directly:

请输入图片描述

Whether an interface type has methods is a static property—either the type in the source code says interface{} or it says interace{ methods... }—so the compiler knows which representation is in use at each point in the program.

Second, if the value associated with the interface value can fit in a single machine word, there's no need to introduce the indirection or the heap allocation. If we define Binary32 to be like Binary but implemented as a uint32, it could be stored in an interface value by keeping the actual value in the second word:
请输入图片描述

Whether the actual value is being pointed at or inlined depends on the size of the type. The compiler arranges for the functions listed in the type's method table (which get copied into the itables) to do the right thing with the word that gets passed in. If the receiver type fits in a word, it is used directly; if not, it is dereferenced. The diagrams show this: in the Binary version far above, the method in the itable is (*Binary).String, while in the Binary32 example, the method in the itable is Binary32.String not (*Binary32).String.

Of course, empty interfaces holding word-sized (or smaller) values can take advantage of both optimizations:
请输入图片描述

Method Lookup Performance

Smalltalk and the many dynamic systems that have followed it perform a method lookup every time a method gets called. For speed, many implementations use a simple one-entry cache at each call site, often in the instruction stream itself. In a multithreaded program, these caches must be managed carefully, since multiple threads could be at the same call site simultaneously. Even once the races have been avoided, the caches would end up being a source of memory contention.

Because Go has the hint of static typing to go along with the dynamic method lookups, it can move the lookups back from the call sites to the point when the value is stored in the interface. For example, consider this code snippet:

1   var any interface{}  // initialized elsewhere
2   s := any.(Stringer)  // dynamic conversion
3   for i := 0; i < 100; i++ {
4       fmt.Println(s.String())
5   }

In Go, the itable gets computed (or found in a cache) during the assignment on line 2; the dispatch for the s.String() call executed on line 4 is a couple of memory fetches and a single indirect call instruction.

In contrast, the implementation of this program in a dynamic language like Smalltalk (or JavaScript, or Python, or ...) would do the method lookup at line 4, which in a loop repeats needless work. The cache mentioned earlier makes this less expensive than it might be, but it's still more expensive than a single indirect call instruction.

Of course, this being a blog post, I don't have any numbers to back up this discussion, but it certainly seems like the lack of memory contention would be a big win in a heavily parallel program, as is being able to move the method lookup out of tight loops. Also, I'm talking about the general architecture, not the specifics o the implementation: the latter probably has a few constant factor optimizations still available.

More Information

The interface runtime support is in $GOROOT/src/pkg/runtime/iface.c. There's much more to say about interfaces (we haven't even seen an example of a pointer receiver yet) and the type descriptors (they power reflection in addition to the interface runtime) but those will have to wait for future posts.

Code

Supporting code (x.go):

package main

import (
 "fmt"
 "strconv"
)

type Stringer interface {
 String() string
}

type Binary uint64

func (i Binary) String() string {
 return strconv.Uitob64(i.Get(), 2)
}

func (i Binary) Get() uint64 {
 return uint64(i)
}

func main() {
 b := Binary(200)
 s := Stringer(b)
 fmt.Println(s.String())
}

Selected output of 8g -S x.go:

0045 (x.go:25) LEAL    s+-24(SP),BX
0046 (x.go:25) MOVL    4(BX),BP
0047 (x.go:25) MOVL    BP,(SP)
0048 (x.go:25) MOVL    (BX),BX
0049 (x.go:25) MOVL    20(BX),BX
0050 (x.go:25) CALL    ,BX

The LEAL loads the address of s into the register BX. (The notation n(SP) describes the word in memory at SP+n. 0(SP) can be shortened to (SP).) The next two MOVL instructions fetch the value from the second word in the interface and store it as the first function call argument, 0(SP). The final two MOVL instructions fetch the itable and then the function pointer from the itable, in preparation for calling that function.

原文:https://blog.golang.org/laws-of-reflection
译文:https://juejin.im/post/5bfb950751882511630d53bd?utm_source=gold_browser_extension

Introduction

Reflection in computing is the ability of a program to examine its own structure, particularly through types; it's a form of metaprogramming. It's also a great source of confusion.

In this article we attempt to clarify things by explaining how reflection works in Go. Each language's reflection model is different (and many languages don't support it at all), but this article is about Go, so for the rest of this article the word "reflection" should be taken to mean "reflection in Go".

Types and interfaces

Because reflection builds on the type system, let's start with a refresher about types in Go.

Go is statically typed. Every variable has a static type, that is, exactly one type known and fixed at compile time: int, float32, *MyType, []byte, and so on. If we declare

type MyInt int

var i int
var j MyInt

then i has type int and j has type MyInt. The variables i and j have distinct static types and, although they have the same underlying type, they cannot be assigned to one another without a conversion.

One important category of type is interface types, which represent fixed sets of methods. An interface variable can store any concrete (non-interface) value as long as that value implements the interface's methods. A well-known pair of examples is io.Reader and io.Writer, the types Reader and Writer from the io package:

// Reader is the interface that wraps the basic Read method.

type Reader interface {
    Read(p []byte) (n int, err error)
}

// Writer is the interface that wraps the basic Write method.

type Writer interface {
    Write(p []byte) (n int, err error)
}

Any type that implements a Read (or Write) method with this signature is said to implement io.Reader (or io.Writer). For the purposes of this discussion, that means that a variable of type io.Reader can hold any value whose type has a Read method:

var r io.Reader
r = os.Stdin
r = bufio.NewReader(r)
r = new(bytes.Buffer)
// and so on

It's important to be clear that whatever concrete value r may hold, r's type is always io.Reader: Go is statically typed and the static type of r is io.Reader.

An extremely important example of an interface type is the empty interface:

interface{}

It represents the empty set of methods and is satisfied by any value at all, since any value has zero or more methods.

Some people say that Go's interfaces are dynamically typed, but that is misleading. They are statically typed: a variable of interface type always has the same static type, and even though at run time the value stored in the interface variable may change type, that value will always satisfy the interface.

We need to be precise about all this because reflection and interfaces are closely related.

The representation of an interface

Russ Cox has written a detailed blog post about the representation of interface values in Go. It's not necessary to repeat the full story here, but a simplified summary is in order.

A variable of interface type stores a pair: the concrete value assigned to the variable, and that value's type descriptor. To be more precise, the value is the underlying concrete data item that implements the interface and the type describes the full type of that item. For instance, after

var r io.Reader
tty, err := os.OpenFile("/dev/tty", os.O_RDWR, 0)
if err != nil {
    return nil, err
}
r = tty

r contains, schematically, the (value, type) pair, (tty, *os.File). Notice that the type *os.File implements methods other than Read; even though the interface value provides access only to the Read method, the value inside carries all the type information about that value. That's why we can do things like this:

var w io.Writer
w = r.(io.Writer)

The expression in this assignment is a type assertion; what it asserts is that the item inside r also implements io.Writer, and so we can assign it to w. After the assignment, w will contain the pair (tty, *os.File). That's the same pair as was held in r. The static type of the interface determines what methods may be invoked with an interface variable, even though the concrete value inside may have a larger set of methods.

Continuing, we can do this:

var empty interface{}
empty = w

and our empty interface value empty will again contain that same pair, (tty, *os.File). That's handy: an empty interface can hold any value and contains all the information we could ever need about that value.

(We don't need a type assertion here because it's known statically that w satisfies the empty interface. In the example where we moved a value from a Reader to a Writer, we needed to be explicit and use a type assertion because Writer's methods are not a subset of Reader's.)

One important detail is that the pair inside an interface always has the form (value, concrete type) and cannot have the form (value, interface type). Interfaces do not hold interface values.

Now we're ready to reflect.

The first law of reflection

Reflection goes from interface value to reflection object.

At the basic level, reflection is just a mechanism to examine the type and value pair stored inside an interface variable. To get started, there are two types we need to know about in package reflect: Type and Value. Those two types give access to the contents of an interface variable, and two simple functions, called reflect.TypeOf and reflect.ValueOf, retrieve reflect.Type and reflect.Value pieces out of an interface value. (Also, from the reflect.Value it's easy to get to the reflect.Type, but let's keep the Value and Type concepts separate for now.)

Let's start with TypeOf:

package main

import (
    "fmt"
    "reflect"
)

func main() {
    var x float64 = 3.4
    fmt.Println("type:", reflect.TypeOf(x))
}

This program prints

type: float64

You might be wondering where the interface is here, since the program looks like it's passing the float64 variable x, not an interface value, to reflect.TypeOf. But it's there; as godoc reports, the signature of reflect.TypeOf includes an empty interface:

// TypeOf returns the reflection Type of the value in the interface{}.
func TypeOf(i interface{}) Type

When we call reflect.TypeOf(x), x is first stored in an empty interface, which is then passed as the argument; reflect.TypeOf unpacks that empty interface to recover the type information.

The reflect.ValueOf function, of course, recovers the value (from here on we'll elide the boilerplate and focus just on the executable code):

var x float64 = 3.4
fmt.Println("value:", reflect.ValueOf(x).String())
prints

value: <float64 Value>

(We call the String method explicitly because by default the fmt package digs into a reflect.Value to show the concrete value inside. The String method does not.)

Both reflect.Type and reflect.Value have lots of methods to let us examine and manipulate them. One important example is that Value has a Type method that returns the Type of a reflect.Value. Another is that both Type and Value have a Kind method that returns a constant indicating what sort of item is stored: Uint, Float64, Slice, and so on. Also methods on Value with names like Int and Float let us grab values (as int64 and float64) stored inside:

var x float64 = 3.4
v := reflect.ValueOf(x)
fmt.Println("type:", v.Type())
fmt.Println("kind is float64:", v.Kind() == reflect.Float64)
fmt.Println("value:", v.Float())
prints

type: float64
kind is float64: true
value: 3.4

There are also methods like SetInt and SetFloat but to use them we need to understand settability, the subject of the third law of reflection, discussed below.

The reflection library has a couple of properties worth singling out. First, to keep the API simple, the "getter" and "setter" methods of Value operate on the largest type that can hold the value: int64 for all the signed integers, for instance. That is, the Int method of Value returns an int64 and the SetInt value takes an int64; it may be necessary to convert to the actual type involved:

var x uint8 = 'x'
v := reflect.ValueOf(x)
fmt.Println("type:", v.Type())                            // uint8.
fmt.Println("kind is uint8: ", v.Kind() == reflect.Uint8) // true.
x = uint8(v.Uint())                                       // v.Uint returns a uint64.

The second property is that the Kind of a reflection object describes the underlying type, not the static type. If a reflection object contains a value of a user-defined integer type, as in

type MyInt int
var x MyInt = 7
v := reflect.ValueOf(x)

the Kind of v is still reflect.Int, even though the static type of x is MyInt, not int. In other words, the Kind cannot discriminate an int from a MyInt even though the Type can.

The second law of reflection

Reflection goes from reflection object to interface value.

Like physical reflection, reflection in Go generates its own inverse.

Given a reflect.Value we can recover an interface value using the Interface method; in effect the method packs the type and value information back into an interface representation and returns the result:

// Interface returns v's value as an interface{}.
func (v Value) Interface() interface{}
As a consequence we can say

y := v.Interface().(float64) // y will have type float64.
fmt.Println(y)
to print the float64 value represented by the reflection object v.

We can do even better, though. The arguments to fmt.Println, fmt.Printf and so on are all passed as empty interface values, which are then unpacked by the fmt package internally just as we have been doing in the previous examples. Therefore all it takes to print the contents of a reflect.Value correctly is to pass the result of the Interface method to the formatted print routine:

fmt.Println(v.Interface())
(Why not fmt.Println(v)? Because v is a reflect.Value; we want the concrete value it holds.) Since our value is a float64, we can even use a floating-point format if we want:

fmt.Printf("value is %7.1e\n", v.Interface())
and get in this case

3.4e+00

Again, there's no need to type-assert the result of v.Interface() to float64; the empty interface value has the concrete value's type information inside and Printf will recover it.

In short, the Interface method is the inverse of the ValueOf function, except that its result is always of static type interface{}.

Reiterating: Reflection goes from interface values to reflection objects and back again.

The third law of reflection

To modify a reflection object, the value must be settable.

The third law is the most subtle and confusing, but it's easy enough to understand if we start from first principles.

Here is some code that does not work, but is worth studying.

var x float64 = 3.4
v := reflect.ValueOf(x)
v.SetFloat(7.1) // Error: will panic.

If you run this code, it will panic with the cryptic message

panic: reflect.Value.SetFloat using unaddressable value

The problem is not that the value 7.1 is not addressable; it's that v is not settable. Settability is a property of a reflection Value, and not all reflection Values have it.

The CanSet method of Value reports the settability of a Value; in our case,

var x float64 = 3.4
v := reflect.ValueOf(x)
fmt.Println("settability of v:", v.CanSet())
prints

settability of v: false
It is an error to call a Set method on an non-settable Value. But what is settability?

Settability is a bit like addressability, but stricter. It's the property that a reflection object can modify the actual storage that was used to create the reflection object. Settability is determined by whether the reflection object holds the original item. When we say

var x float64 = 3.4
v := reflect.ValueOf(x)

we pass a copy of x to reflect.ValueOf, so the interface value created as the argument to reflect.ValueOf is a copy of x, not x itself. Thus, if the statement

v.SetFloat(7.1)

were allowed to succeed, it would not update x, even though v looks like it was created from x. Instead, it would update the copy of x stored inside the reflection value and x itself would be unaffected. That would be confusing and useless, so it is illegal, and settability is the property used to avoid this issue.

If this seems bizarre, it's not. It's actually a familiar situation in unusual garb. Think of passing x to a function:

f(x)

We would not expect f to be able to modify x because we passed a copy of x's value, not x itself. If we want f to modify x directly we must pass our function the address of x (that is, a pointer to x):

f(&x)

This is straightforward and familiar, and reflection works the same way. If we want to modify x by reflection, we must give the reflection library a pointer to the value we want to modify.

Let's do that. First we initialize x as usual and then create a reflection value that points to it, called p.

var x float64 = 3.4
p := reflect.ValueOf(&x) // Note: take the address of x.
fmt.Println("type of p:", p.Type())
fmt.Println("settability of p:", p.CanSet())

The output so far is

type of p: *float64
settability of p: false

The reflection object p isn't settable, but it's not p we want to set, it's (in effect) *p. To get to what p points to, we call the Elem method of Value, which indirects through the pointer, and save the result in a reflection Value called v:

v := p.Elem()
fmt.Println("settability of v:", v.CanSet())

Now v is a settable reflection object, as the output demonstrates,

settability of v: true

and since it represents x, we are finally able to use v.SetFloat to modify the value of x:

v.SetFloat(7.1)
fmt.Println(v.Interface())
fmt.Println(x)

The output, as expected, is

7.1
7.1

Reflection can be hard to understand but it's doing exactly what the language does, albeit through reflection Types and Values that can disguise what's going on. Just keep in mind that reflection Values need the address of something in order to modify what they represent.

Structs

In our previous example v wasn't a pointer itself, it was just derived from one. A common way for this situation to arise is when using reflection to modify the fields of a structure. As long as we have the address of the structure, we can modify its fields.

Here's a simple example that analyzes a struct value, t. We create the reflection object with the address of the struct because we'll want to modify it later. Then we set typeOfT to its type and iterate over the fields using straightforward method calls (see package reflect for details). Note that we extract the names of the fields from the struct type, but the fields themselves are regular reflect.Value objects.

type T struct {
    A int
    B string
}
t := T{23, "skidoo"}
s := reflect.ValueOf(&t).Elem()
typeOfT := s.Type()
for i := 0; i < s.NumField(); i++ {
    f := s.Field(i)
    fmt.Printf("%d: %s %s = %v\n", i,
        typeOfT.Field(i).Name, f.Type(), f.Interface())
}

The output of this program is

0: A int = 23
1: B string = skidoo

There's one more point about settability introduced in passing here: the field names of T are upper case (exported) because only exported fields of a struct are settable.

Because s contains a settable reflection object, we can modify the fields of the structure.

s.Field(0).SetInt(77)
s.Field(1).SetString("Sunset Strip")
fmt.Println("t is now", t)

And here's the result:

t is now {77 Sunset Strip}

If we modified the program so that s was created from t, not &t, the calls to SetInt and SetString would fail as the fields of t would not be settable.

Conclusion

Here again are the laws of reflection:

  • Reflection goes from interface value to reflection object.
  • Reflection goes from reflection object to interface value.
  • To modify a reflection object, the value must be settable.

Once you understand these laws reflection in Go becomes much easier to use, although it remains subtle. It's a powerful tool that should be used with care and avoided unless strictly necessary.

There's plenty more to reflection that we haven't covered — sending and receiving on channels, allocating memory, using slices and maps, calling methods and functions — but this post is long enough. We'll cover some of those topics in a later article.

By Rob Pike

原文:如何判断一个元素在亿级数据中是否存在?

Bloom Filter 原理

官方的说法是:它是一个保存了很长的二级制向量,同时结合 Hash 函数实现的。

听起来比较绕,但是通过一个图就比较容易理解了。
bloom filter原理

如图所示:

  • 首先需要初始化一个二进制的数组,长度设为 L(图中为 8),同时初始值全为 0 。
  • 当写入一个 A1=1000 的数据时,需要进行 H 次 hash 函数的运算(这里为 2 次);与 HashMap 有点类似,通过算出的 HashCode 与 L 取模后定位到 0、2 处,将该处的值设为 1。
  • A2=2000 也是同理计算后将 4、7 位置设为 1。
  • 当有一个 B1=1000 需要判断是否存在时,也是做两次 Hash 运算,定位到 0、2 处,此时他们的值都为 1 ,所以认为 B1=1000 存在于集合中。
  • 当有一个 B2=3000 时,也是同理。第一次 Hash 定位到 index=4 时,数组中的值为 1,所以再进行第二次 Hash 运算,结果定位到 index=5 的值为 0,所以认为 B2=3000 不存在于集合中。

整个的写入、查询的流程就是这样,汇总起来就是:

对写入的数据做 H 次 hash 运算定位到数组中的位置,同时将数据改为 1 。当有数据查询时也是同样的方式定位到数组中。
一旦其中的有一位为 0 则认为数据肯定不存在于集合,否则数据可能存在于集合中。

所以布隆过滤有以下几个特点:

  1. 只要返回数据不存在,则肯定不存在。
  2. 返回数据存在,但只能是大概率存在。
  3. 同时不能清除其中的数据。

第一点应该都能理解,重点解释下 2、3 点。
为什么返回存在的数据却是可能存在呢,这其实也和 HashMap 类似。
在有限的数组长度中存放大量的数据,即便是再完美的 Hash 算法也会有冲突,所以有可能两个完全不同的 A、B 两个数据最后定位到的位置是一模一样的。
这时拿 B 进行查询时那自然就是误报了。
删除数据也是同理,当我把 B 的数据删除时,其实也相当于是把 A 的数据删掉了,这样也会造成后续的误报。
基于以上的 Hash 冲突的前提,所以 Bloom Filter 有一定的误报率,这个误报率和 Hash 算法的次数 H,以及数组长度 L 都是有关的。

背景:最近用 nginx 作为 7 层代理,进行转发,但是上线时候发现晚上高峰期会有很多 502 time out ,从而导致 no live upstreams while connecting to upstream ,造成短时间服务不可用

问题排查:


1. 一开始以为是网络抖动,因为代理机器跟后端机器跨机房了
    但是连续几天都会产生告警,而且网络监控也没有观察到网络出了问题。所以问题不应该出现在网络上
2. 初步认为是 后端机器的 性能问题
    通过 elk 查看 后端机器的 error log 和 access log,发现并没有任何 502 错误,cpu 和 内存使用率也很正常(说明并不是后端机器性能问题)
3. 观察代理机器的错误日志, 发现12 点的时候尤其多,
    查看了下系统的 crontab ,发现12 点的时候有个日志压缩的 进程会启动,初步怀疑这个进程会造成影响,将时间换到了低峰期,第二天观察,发现还是会有很多 502 的告警(比之前少 10% 左右)
4. 将代理机器的错误日志根据 request url 进行统计,发现有大量的 静态资源请求,而且后面带了一串 hash 值,类似:"img?a=123dsfa24f"
    使用vmstat 1查看代理机器的io 性能,发现有大量的读写,procs 栏处于b 状态的很多。ps aux 查看进程状态,发现 nginx worker 的状态 有很多是D,再翻开 nginx upstream 的配置,发现开启了 proxy cache ,会不会是因为大量的小文件读写以及删除引起的呢?
    先尝试把 其中一台 nginx proxy 的 proxy cache 关闭掉,然后进行观察,第二天发现,关掉 proxy cache 的机器 502 的超时的请求不见了,然后把剩下的机器的 proxy cache 也关掉,然后问题就解决了。。

小插曲:

由于告警进程有问题(告警堆积的问题),导致告警的时间和出问题的时间有偏差,导致一开始排查问题的方向被带偏了,用了很多的时间

结论:

系统监控还是很重要的,
max_fails 参数可以调大点 fail_timeout 可以调小点(看业务量而定)
nginx proxy cache 对于这种大量小文件的作用不明显,后续可以考虑上 cdn

原文:https://gocn.vip/article/441
之前在 golang 群里有人问过为什么程序会莫名其妙的 hang 死然后不再响应任何请求。单核 cpu 打满。

这个特征和我们公司的某个系统曾经遇到的情况很相似,内部经过了很长时间的定位分析总结,期间还各种阅读 golangruntimegc 代码,最终才定位到是业务里出现了类型下面这样的代码:

package main

import "runtime"

func main() {
    var ch = make(chan int, 100)
    go func() {
        for i := 0; i < 100; i++ {
            ch <- 1
            if i == 88 {
                runtime.GC()
            }
        }
    }()

    for {
        // the wrong part
        if len(ch) == 100 {
            sum := 0
            itemNum := len(ch)
            for i := 0; i < itemNum; i++ {
                sum += <-ch
            }
            if sum == itemNum {
                return
            }
        }
    }

}

main goroutine 里循环判断 ch 里是否数据被填满,在另一个 goroutine 里把 100 条数据塞到 ch 里。看起来程序应该是可以正常结束的对不对?

感兴趣的话你可以自己尝试运行一下。实际上这个程序在稍微老一些版本的 golang 上是会卡死在后面这个 for 循环上的。原因呢?

使用:

GODEBUG="schedtrace=300,scheddetail=1" ./test1

应该可以看到这时候 gcwaiting 标记为 1。所以当初都怀疑是 golang gcbug。。但最终折腾了半天才发现还是自己的代码的问题。

因为在 for 循环中没有函数调用的话,编译器不会插入调度代码,所以这个执行 for 循环的 goroutine 没有办法被调出,而在循环期间碰到 gc,那么就会卡在 gcwaiting 阶段,并且整个进程永远 hang 死在这个循环上。并不再对外响应。

当然,上面这段程序在最新版本的 golang 1.8/1.9 中已经不会 hang 住了(实验结果,没有深究原因)。某次更新说明中官方声称在密集循环中理论上也会让其它的 goroutine 有被调度的机会,那么我们选择相信官方,试一下下面这个程序:

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())
    go server()
    go printNum()
    var i = 1
    for {
        // will block here, and never go out
        i++
    }
    fmt.Println("for loop end")
    time.Sleep(time.Second * 3600)
}

func printNum() {
    i := 0
    for {
        fmt.Println(i)
        i++
    }
}

func HelloServer(w http.ResponseWriter, req *http.Request) {
    io.WriteString(w, "hello, world!\n")
}

func server() {
    http.HandleFunc("/", HelloServer)
    err := http.ListenAndServe(":12345", nil)

    if err != nil {
        log.Fatal("ListenAndServe: ", err)
    }
}

运行几秒之后 curl 一发:

curl localhost:12345

感觉还是不要再相信官方了。研究研究之后不小心写出了这样的 bug 怎么定位比较好。首先分析一下这种类型 bug 发生时的程序特征:

  1. 卡死在 for 循环上
  2. gcwaiting=1
  3. 没有系统调用

由于没有系统调用,不是系统调用导致的锅,所以我们没有办法借助 strace 之类的工具看程序是不是 hang 在系统调用上。而 gcwaiting=1 实际上并不能帮我们定位到问题到底出现在哪里。

然后就剩卡死在 for 循环上了,密集的 for 循环一般会导致一个 cpu 核心被打满。如果之前做过系统编程的同学应该对 perf 这个工具很了解,可以使用:

perf top

对 cpu 的使用情况进行采样,这样我们就可以对 cpu 使用排名前列的程序函数进行定位。实际上 perf top 的执行结果也非常直观:

  99.52%  ffff                     [.] main.main
   0.06%  [kernel]                 [k] __do_softirq
   0.05%  [kernel]                 [k] 0x00007fff81843a35
   0.03%  [kernel]                 [k] mpt_put_msg_frame
   0.03%  [kernel]                 [k] finish_task_switch
   0.03%  [kernel]                 [k] tick_nohz_idle_enter
   0.02%  perf                     [.] 0x00000000000824d7
   0.02%  [kernel]                 [k] e1000_xmit_frame
   0.02%  [kernel]                 [k] VbglGRPerform

你看,我们的程序实际上是卡在了 main.main 函数上。一发命令秒级定位。

妈妈再也不用担心我的程序不小心写出死循环了。实际上有时候我的一个普通循环为什么变成了死循环并不是像上面这样简单的 demo 那样好查,这时候你还可以用上 delve,最近就帮 jsoniter 定位了一个类似上面这样的 bug

https://github.com/gin-gonic/gin/issues/1086

perf 定位到函数,再用 pid attach 到进程,找到正在执行循环的 goroutine,然后结合 locals 的打印一路 next

问题定位 over。