Goroutine Stack Buffers

I got curious about placing arrays and slices on the stack, like std::array in C++ and arrays in C. The stack is well described by Vincent Blanchon in the article "Go: How Does the Goroutine Stack Size Evolve?". Vincent covers how the stack changes in goroutines. In short:


  • the minimum stack size is 2 KB;
  • the maximum size depends on the architecture: 1 GB on a 64-bit architecture;
  • the stack size doubles each time it grows;
  • the stack can both grow and shrink.
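
The doubling rule can be made concrete with a few lines of Go. This is only a sketch: minStack and maxStack are names chosen here, and the upper bound is the 1000000000-byte figure the runtime prints in the overflow message later in this article, not a value read from the runtime itself.

```go
package main

import "fmt"

const (
	minStack = 2 << 10    // 2 KB starting size, from the list above
	maxStack = 1000000000 // assumed limit, taken from the runtime's overflow message
)

// stackSizes returns every size a stack passes through when it
// starts at minStack and doubles while staying within maxStack.
func stackSizes() []int {
	var sizes []int
	for s := minStack; s <= maxStack; s *= 2 {
		sizes = append(sizes, s)
	}
	return sizes
}

func main() {
	sizes := stackSizes()
	fmt.Println("steps:", len(sizes))                // steps: 19
	fmt.Println("largest:", sizes[len(sizes)-1])     // largest: 536870912
}
```

Under these assumptions the largest stack that still fits is 512 MB, because the next doubling would exceed the limit.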

I will find out how much can be placed on the stack and what overhead that brings. I use go 1.13.8 built from source with debug output enabled, together with the -gcflags=-m compiler option.


To get debug output while the program runs, set the stackDebug constant in runtime/stack.go to the desired value:


const (
    // stackDebug == 0: no logging
    //            == 1: logging of per-stack operations
    //            == 2: logging of per-frame operations
    //            == 3: logging of per-word updates
    //            == 4: logging of per-word reads
    stackDebug       = 0
    //...
)

A very simple program


I'll start with a program that does nothing, to see how the stack is allocated for the main goroutine:


package main

func main() {
}

We can see the global pool being filled with 2, 8 and 32 KB segments:


$ GOMAXPROCS=1 ./app 
stackalloc 32768
  allocated 0xc000002000
stackalloc 2048
stackcacherefill order=0
  allocated 0xc000036000
stackalloc 32768
  allocated 0xc00003a000
stackalloc 8192
stackcacherefill order=2
  allocated 0xc000046000
stackalloc 2048
  allocated 0xc000036800
stackalloc 2048
  allocated 0xc000037000
stackalloc 2048
  allocated 0xc000037800
stackalloc 32768
  allocated 0xc00004e000
stackalloc 8192
  allocated 0xc000048000

This is an important point. We can expect allocated memory to be reused.


A not so simple program


Where would we be without Hello world!


package main

import "fmt"

func helloWorld() {
    fmt.Println("Hello world!")
}

func main() {
    helloWorld()
}

runtime: newstack sp=0xc000036348 stack=[0xc000036000, 0xc000036800]
    morebuf={pc:0x41463b sp:0xc000036358 lr:0x0}
    sched={pc:0x4225b1 sp:0xc000036350 lr:0x0 ctxt:0x0}
stackalloc 4096
stackcacherefill order=1
  allocated 0xc000060000
copystack gp=0xc000000180 [0xc000036000 0xc000036350 0xc000036800] -> [0xc000060000 0xc000060b50 0xc000061000]/4096
stackfree 0xc000036000 2048
stack grow done
stackalloc 2048
  allocated 0xc000036000
Hello world!

2 KB was not enough, so a 4 KB block was allocated. Now for the numbers in square brackets. They are addresses; the outer two are the fields of this structure:


// Stack describes a Go execution stack.
// The bounds of the stack are exactly [lo, hi),
// with no implicit data structures on either side.
type stack struct {
    lo uintptr
    hi uintptr
}

The middle value is an address that shows how much of the stack is in use; it is computed as follows:


//...
used := old.hi - gp.sched.sp
//...
print("copystack gp=", gp, " [", hex(old.lo), " ", hex(old.hi-used), " ", hex(old.hi), "]", " -> [", hex(new.lo), " ", hex(new.hi-used), " ", hex(new.hi), "]/", newsize, "\n")

For the output above this gives:


addresses [lo sp hi]                        size (bytes)   sp - lo (bytes)
[0xc000036000 0xc000036350 0xc000036800]            2048               848
[0xc000060000 0xc000060b50 0xc000061000]            4096              2896
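
The arithmetic behind these numbers can be replayed directly from the addresses in the copystack line. The sketch below is not runtime code: stackInfo and its methods are names made up for illustration, and only the used formula (hi minus sp) comes from the snippet shown earlier.

```go
package main

import "fmt"

// stackInfo holds the three addresses printed by copystack:
// the lower bound, the stack pointer, and the upper bound.
type stackInfo struct {
	lo, sp, hi uint64
}

func (s stackInfo) size() uint64 { return s.hi - s.lo }
func (s stackInfo) used() uint64 { return s.hi - s.sp } // same formula as `used` in the runtime
func (s stackInfo) free() uint64 { return s.sp - s.lo }

func main() {
	old := stackInfo{0xc000036000, 0xc000036350, 0xc000036800}
	grown := stackInfo{0xc000060000, 0xc000060b50, 0xc000061000}
	for _, s := range []stackInfo{old, grown} {
		fmt.Printf("size=%d used=%d free=%d\n", s.size(), s.used(), s.free())
	}
	// size=2048 used=1200 free=848
	// size=4096 used=1200 free=2896
}
```

The used part is 1200 bytes before and after the copy; only the free space below the stack pointer changes.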

Next, the same function started in several goroutines:


package main

import (
    "fmt"
    "time"
)

func helloWorld() {
    fmt.Println("Hello world!")
}

func main() {
    for i := 0; i < 12; i++ {
        go helloWorld()
    }

    time.Sleep(5 * time.Second)
}

$ ./app 2>&1 | grep "alloc 4"
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096

Twelve goroutines were started, but only five 4 KB stacks were allocated. The freed stacks are reused.



The idea for the next experiment comes from stack_test.go. If bigsize is made larger than in the program below, go build -gcflags=-m reports main.go:8:6: moved to heap: x.


package main

const bigsize = 1024*1024*10
const depth = 50

//go:noinline
func step(i int) byte {
    var x [bigsize]byte
    if i != depth {
        return x[i] * step(i+1)
    } else {
        return x[i]
    }
}

func main() {
    step(0)
}

With a 10 MB array, depth > 50 ends in a stack overflow:


runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
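
Why the boundary falls at a depth of 50 can be estimated with a simplified model. The sketch below assumes that each call of step needs only its 10 MB array (everything else in the frame is ignored), that the stack doubles from 2 KB, and that the limit is the 1000000000 bytes from the message above. The helper stackFor is hypothetical and not part of the runtime.

```go
package main

import "fmt"

const (
	frameSize = 10 << 20   // 10 MB array per call of step; other frame contents ignored
	minStack  = 2 << 10    // 2 KB initial stack
	maxStack  = 1000000000 // limit from the runtime message
)

// stackFor returns the smallest size, reached by doubling from minStack,
// that holds the given number of frames, and whether it is within maxStack.
func stackFor(frames int) (size int, ok bool) {
	need := frames * frameSize
	size = minStack
	for size < need {
		size *= 2
	}
	return size, size <= maxStack
}

func main() {
	for _, depth := range []int{50, 51} {
		frames := depth + 1 // step(0) through step(depth)
		size, ok := stackFor(frames)
		fmt.Printf("depth=%d frames=%d stack=%d fits=%v\n", depth, frames, size, ok)
	}
	// depth=50 frames=51 stack=536870912 fits=true
	// depth=51 frames=52 stack=1073741824 fits=false
}
```

In this model 51 frames still fit into a 512 MB stack, while the 52nd frame forces a doubling to 1 GiB, which is above the limit.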

The next program reads and prints its own source. The comments are taken from the output of -gcflags="-m":


package main

import (
    "fmt"
    "os"
)

const bigsize = 1024 * 1024 * 10

func readSelf(buff []byte) int { // readSelf buff does not escape
    f, err := os.Open("main.go") // inlining call to os.Open
    if err != nil {
        panic(err)
    }

    count, _ := f.Read(buff)
    if err := f.Close(); err != nil { // inlining call to os.(*File).Close
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) { // printSelf buff does not escape
    tmp := string(buff[:count]) // string(buff[:count]) escapes to heap
    // inlining call to fmt.Println
    // tmp escapes to heap
    // printSelf []interface {} literal does not escape
    // io.Writer(os.Stdout) escapes to heap
    fmt.Println(tmp)
}

func foo() {
    var data [bigsize]byte
    cnt := readSelf(data[:])
    printSelf(data[:], cnt)
}

func main() {
    foo() // can inline main
}

Running it produces the following stack log:


stackalloc 2048
stackcacherefill order=0
  allocated 0xc000042000
runtime: newstack sp=0xc00008ef40 stack=[0xc00008e000, 0xc00008f000]
    morebuf={pc:0x48e4e0 sp:0xc00008ef50 lr:0x0}
    sched={pc:0x48e4b8 sp:0xc00008ef48 lr:0x0 ctxt:0x0}
stackalloc 8192
stackcacherefill order=2
  allocated 0xc000078000
copystack gp=0xc000000180 [0xc00008e000 0xc00008ef48 0xc00008f000] -> [0xc000078000 0xc000079f48 0xc00007a000]/8192
stackfree 0xc00008e000 4096
stack grow done
...
runtime: newstack sp=0xc0010aff40 stack=[0xc0008b0000, 0xc0010b0000]
    morebuf={pc:0x48e4e0 sp:0xc0010aff50 lr:0x0}
    sched={pc:0x48e4b8 sp:0xc0010aff48 lr:0x0 ctxt:0x0}
stackalloc 16777216
  allocated 0xc0010b0000
copystack gp=0xc000000180 [0xc0008b0000 0xc0010aff48 0xc0010b0000] -> [0xc0010b0000 0xc0020aff48 0xc0020b0000]/16777216
stackfree 0xc0008b0000 8388608
stack grow done

The stack doubles step by step from 4 KB up to 16 MB, and each step copies it.



A slice created with make stays on the stack when it is smaller than 64 KB:


package main

import (
    "fmt"
    "os"
)

const bigsize = 1024*64 - 1

func readSelf(buff []byte) int { // readSelf buff does not escape
    f, err := os.Open("main.go") // inlining call to os.Open
    if err != nil {
        panic(err)
    }

    count, _ := f.Read(buff)
    if err := f.Close(); err != nil { // inlining call to os.(*File).Close
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) { // printSelf buff does not escape
    tmp := string(buff[:count]) // string(buff[:count]) escapes to heap
    // inlining call to fmt.Println
    // tmp escapes to heap
    // printSelf []interface {} literal does not escape
    // io.Writer(os.Stdout) escapes to heap
    fmt.Println(tmp)
}

func foo() {
    data := make([]byte, bigsize) // make([]byte, bigsize) does not escape
    cnt := readSelf(data)
    printSelf(data, cnt)
}

func main() { // can inline main
    foo()
}

stackalloc 131072
  allocated 0xc0000d0000
copystack gp=0xc000000180 [0xc0000c0000 0xc0000cff48 0xc0000d0000] -> [0xc0000d0000 0xc0000eff48 0xc0000f0000]/131072
stackfree 0xc0000c0000 65536


Go initializes variables with zero values. That takes time. For comparison, the same buffers are taken from a sync.Pool.


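The benchmark does the following:


  • reads a file into a buffer (its own source);
  • copies it into a second buffer;
  • writes the second buffer out (to /dev/null).

Only the 4 KB variant is listed; the 8, 16 and 32 KB variants differ only in block size.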


func readSelf(buff []byte) int {
    f, err := os.Open("bench_test.go")
    if err != nil {
        panic(err)
    }

    count, err := f.Read(buff)
    if err != nil {
        panic(err)
    }

    if err := f.Close(); err != nil {
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) {
    f, err := os.OpenFile(os.DevNull, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        panic(err)
    }
    if _, err := f.Write(buff[:count]); err != nil {
        panic(err)
    }

    if err := f.Close(); err != nil {
        panic(err)
    }
}

//go:noinline
func usePoolBlock4k() {
    inputp := pool4blocks.Get().(*[]byte)
    outputp := pool4blocks.Get().(*[]byte)

    cnt := readSelf(*inputp)
    copy(*outputp, *inputp)
    printSelf(*outputp, cnt)

    pool4blocks.Put(inputp)
    pool4blocks.Put(outputp)
}

//go:noinline
func useStackBlock4k() {
    var input [block4k]byte
    var output [block4k]byte

    cnt := readSelf(input[:])
    copy(output[:], input[:])
    printSelf(output[:], cnt)
}

func BenchmarkPoolBlock4k(b *testing.B) {
    runtime.GC()
    for i := 0; i < 8; i++ {
        data := pool4blocks.Get()
        pool4blocks.Put(data)
    }
    for i := 0; i < 8; i++ {
        usePoolBlock4k()
    }

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        usePoolBlock4k()
    }
}

func BenchmarkStackBlock4k(b *testing.B) {
    runtime.GC()
    for i := 0; i < 8; i++ {
        useStackBlock4k()
    }
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        useStackBlock4k()
    }
}
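
The listing uses block4k and pool4blocks without showing their definitions. A plausible sketch is given below; it is an assumption, not the author's code. The value of block4k is inferred from its name, and the pool stores *[]byte because that is the type the listing asserts after Get.

```go
package main

import (
	"fmt"
	"sync"
)

const block4k = 4 * 1024 // assumed value, inferred from the name

// pool4blocks hands out pointers to 4 KB slices. Storing a pointer
// avoids an extra allocation when the value is put into interface{}.
var pool4blocks = sync.Pool{
	New: func() interface{} {
		b := make([]byte, block4k)
		return &b
	},
}

func main() {
	p := pool4blocks.Get().(*[]byte)
	fmt.Println(len(*p)) // 4096
	pool4blocks.Put(p)
}
```

A buffer that comes back from the pool keeps whatever was written to it earlier, which is exactly the zeroing work the stack variant cannot skip.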

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: gostack/04_benchmarks
BenchmarkPoolBlock4k-12           312918          3804 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock4k-12          313255          3833 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock8k-12           300796          3980 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock8k-12          294266          4110 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock16k-12          288734          4138 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock16k-12         269382          4408 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock32k-12          272139          4407 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock32k-12         240339          4957 ns/op         256 B/op          8 allocs/op
PASS

At 4 KB the two variants are on par. As the block grows, the stack variant falls behind. The profile:


(pprof) top 15
Showing nodes accounting for 7.99s, 79.58% of 10.04s total
Dropped 104 nodes (cum <= 0.05s)
Showing top 15 nodes out of 85
      flat  flat%   sum%        cum   cum%
     3.11s 30.98% 30.98%      3.38s 33.67%  syscall.Syscall6
     2.01s 20.02% 51.00%      2.31s 23.01%  syscall.Syscall
     0.83s  8.27% 59.26%      0.83s  8.27%  runtime.epollctl
     0.60s  5.98% 65.24%      0.60s  5.98%  runtime.memmove
     0.21s  2.09% 67.33%      0.21s  2.09%  runtime.unlock
     0.18s  1.79% 69.12%      0.18s  1.79%  runtime.nextFreeFast
     0.14s  1.39% 70.52%      0.33s  3.29%  runtime.exitsyscall
     0.14s  1.39% 71.91%      0.21s  2.09%  runtime.reentersyscall
     0.13s  1.29% 73.21%      0.46s  4.58%  runtime.mallocgc
     0.12s  1.20% 74.40%      1.25s 12.45%  gostack/04_benchmarks.useStackBlock32k
     0.12s  1.20% 75.60%      0.13s  1.29%  runtime.exitsyscallfast
     0.11s  1.10% 76.69%      0.45s  4.48%  runtime.SetFinalizer
     0.11s  1.10% 77.79%      0.11s  1.10%  runtime.casgstatus
     0.10s     1% 78.78%      0.13s  1.29%  runtime.deferreturn
     0.08s   0.8% 79.58%      1.24s 12.35%  gostack/04_benchmarks.useStackBlock16k

Most of the time goes to syscalls. Compare useStackBlock32k and getFromStack32k. The stack variant spends 120ms in the function itself:


(pprof) list useStackBlock32k
Total: 10.04s
ROUTINE ======================== useStackBlock32k in bench_test.go
     120ms      1.25s (flat, cum) 12.45% of Total
         .          .    158:   printSelf(output[:], cnt)
         .          .    159:}
         .          .    160:
         .          .    161://go:noinline
         .          .    162:func useStackBlock32k() {
      90ms       90ms    163:   var input [block32k]byte
      30ms       30ms    164:   var output [block32k]byte
         .          .    165:
         .      530ms    166:   cnt := readSelf(input[:])
         .      160ms    167:   copy(output[:], input[:])
         .      440ms    168:   printSelf(output[:], cnt)
         .          .    169:}

ROUTINE ======================== usePoolBlock32k in bench_test.go
      10ms      1.18s (flat, cum) 11.75% of Total
         .          .    115:   pool16blocks.Put(outputp)
         .          .    116:}
         .          .    117:
         .          .    118://go:noinline
         .          .    119:func usePoolBlock32k() {
         .       10ms    120:   inputp := pool32blocks.Get().(*[]byte)
         .       10ms    121:   outputp := pool32blocks.Get().(*[]byte)
         .          .    122:
         .      520ms    123:   cnt := readSelf(*inputp)
      10ms      200ms    124:   copy(*outputp, *inputp)
         .      440ms    125:   printSelf(*outputp, cnt)
         .          .    126:
         .          .    127:   pool32blocks.Put(inputp)
         .          .    128:   pool32blocks.Put(outputp)
         .          .    129:}
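
The time attributed to the two var lines in useStackBlock32k is consistent with the arrays being cleared on every call. That a local array starts out zeroed each time can be observed with a small experiment. The sketch below is illustrative only; fillAndCheck and block are hypothetical names, and 32 KB is picked to match the listing.

```go
package main

import "fmt"

const block = 32 * 1024

var sink byte // keeps the writes from being optimized away

// fillAndCheck reports whether its local array was all zeros on entry,
// then overwrites it so that stale data would show up on the next call
// if the array were not cleared again.
//
//go:noinline
func fillAndCheck() bool {
	var buf [block]byte
	clean := true
	for _, b := range buf[:] {
		if b != 0 {
			clean = false
		}
	}
	for i := range buf {
		buf[i] = 0xFF
	}
	sink = buf[block-1]
	return clean
}

func main() {
	fmt.Println(fillAndCheck(), fillAndCheck()) // true true
}
```

A buffer taken from a sync.Pool carries no such guarantee, which is why the pool variant has nothing to clear.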

Conclusions:


  • an array of up to 10 MB can be placed on the stack;
  • a slice created with make stays on the stack below 64 KB;
  • growing the stack has an overhead;
  • zero-value initialization has an overhead.
