Buffers de pilha da Goroutine

Gostaria de saber colocando matrizes e fatias na pilha, como std :: array em C ++ e matrizes em C. A pilha foi bem escrita por Vincent Blanchon no artigo Go: Como o tamanho da pilha da Goroutine evolui? . Vincent fala sobre mudanças de pilha nas goroutines. Em suma:


  • tamanho mínimo da pilha 2 KB;
  • o tamanho máximo depende da arquitetura, em uma arquitetura de 64 bits de 1 GB;
  • cada vez que o tamanho da pilha dobra;
  • a pilha aumenta e diminui.

Descobrirei o quanto pode ser colocado na pilha e o que a sobrecarga pode trazer.Usarei o go 1.13.8 compilado da fonte com informações de depuração e opções de compilação -gcflags=-m.


Para obter a saída de depuração enquanto o programa está sendo executado, você deve runtime/stack.godefinir a constante stackDebugpara o valor desejado:


const (
    // stackDebug == 0: no logging
    //            == 1: logging of per-stack operations
    //            == 2: logging of per-frame operations
    //            == 3: logging of per-word updates
    //            == 4: logging of per-word reads
    stackDebug       = 0
    //...
)

Programa muito simples


Vou começar com um programa que não faz nada para ver como a pilha se destaca na goroutine principal:


package main

func main() {
}

Vemos como o pool global é preenchido com os segmentos 2, 8, 32 KB:


$ GOMAXPROCS=1 ./app 
stackalloc 32768
  allocated 0xc000002000
stackalloc 2048
stackcacherefill order=0
  allocated 0xc000036000
stackalloc 32768
  allocated 0xc00003a000
stackalloc 8192
stackcacherefill order=2
  allocated 0xc000046000
stackalloc 2048
  allocated 0xc000036800
stackalloc 2048
  allocated 0xc000037000
stackalloc 2048
  allocated 0xc000037800
stackalloc 32768
  allocated 0xc00004e000
stackalloc 8192
  allocated 0xc000048000

esse é um ponto importante. Podemos esperar que a memória alocada seja reutilizada.


programa não muito simples


Onde sem Olá mundo !


package main

import "fmt"

func helloWorld()  {
    fmt.Println("Hello world!")
}

func main() {
    helloWorld()
}

runtime: newstack sp=0xc000036348 stack=[0xc000036000, 0xc000036800]
    morebuf={pc:0x41463b sp:0xc000036358 lr:0x0}
    sched={pc:0x4225b1 sp:0xc000036350 lr:0x0 ctxt:0x0}
stackalloc 4096
stackcacherefill order=1
  allocated 0xc000060000
copystack gp=0xc000000180 [0xc000036000 0xc000036350 0xc000036800] -> [0xc000060000 0xc000060b50 0xc000061000]/4096
stackfree 0xc000036000 2048
stack grow done
stackalloc 2048
  allocated 0xc000036000
Hello world!

O tamanho de 2 KB não foi suficiente e a alocação de 4 KB do bloco foi encerrada. Lidaremos com os números entre colchetes. Estes são os endereços; nas bordas, este é o valor da estrutura:


// Stack describes a Go execution stack.
// The bounds of the stack are exactly [lo, hi),
// with no implicit data structures on either side.
type stack struct {
    lo uintptr
    hi uintptr
}

No meio, este é um endereço indicando quanto a pilha é usada, calculada da seguinte forma:


//...
used := old.hi - gp.sched.sp
//...
print("copystack gp=", gp, " [", hex(old.lo), " ", hex(old.hi-used), " ", hex(old.hi), "]", " -> [", hex(new.lo), " ", hex(new.hi-used), " ", hex(new.hi), "]/", newsize, "\n")

:


[0xc000036000 0xc000036350 0xc000036800]2048848
[0xc000060000 0xc000060b50 0xc000061000]40962896

- :


package main

import (
    "fmt"
    "time"
)

func helloWorld()  {
    fmt.Println("Hello world!")
}

func main() {
    for i := 0 ; i< 12 ; i++ {
        go helloWorld()
    }

    time.Sleep(5*time.Second)
}

$ ./app 2>&1 | grep "alloc 4"
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096

, . .



stack_test.go , . bigsize, go build -gcflags=-m main.go:8:6: moved to heap: x.


package main

const bigsize = 1024*1024*10
const depth = 50

//go:noinline
func step(i int) byte {
    var x [bigsize]byte
    if i != depth {
        return x[i] * step(i+1)
    } else {
        return x[i]
    }
}

func main() {
    step(0)
}

10 MB, depth > 50 stack overflow:


runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

, -gcflags="-m"


package main

import (
    "fmt"
    "os"
)

const bigsize = 1024 * 1024 * 10

func readSelf(buff []byte) int { // readSelf buff does not escape
    f, err := os.Open("main.go") // inlining call to os.Open
    if err != nil {
        panic(err)
    }

    count, _ := f.Read(buff)
    if err := f.Close(); err != nil { // inlining call to os.(*File).Close
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) { // printSelf buff does not escape
    tmp := string(buff[:count]) // string(buff[:count]) escapes to heap
    // inlining call to fmt.Println
    // tmp escapes to heap
    // printSelf []interface {} literal does not escape
    // io.Writer(os.Stdout) escapes to heap
    fmt.Println(tmp)
}

func foo() {
    var data [bigsize]byte
    cnt := readSelf(data[:])
    printSelf(data[:], cnt)
}

func main() {
    foo() // can inline main
}

, , :


stackalloc 2048
stackcacherefill order=0
  allocated 0xc000042000
runtime: newstack sp=0xc00008ef40 stack=[0xc00008e000, 0xc00008f000]
    morebuf={pc:0x48e4e0 sp:0xc00008ef50 lr:0x0}
    sched={pc:0x48e4b8 sp:0xc00008ef48 lr:0x0 ctxt:0x0}
stackalloc 8192
stackcacherefill order=2
  allocated 0xc000078000
copystack gp=0xc000000180 [0xc00008e000 0xc00008ef48 0xc00008f000] -> [0xc000078000 0xc000079f48 0xc00007a000]/8192
stackfree 0xc00008e000 4096
stack grow done
...
runtime: newstack sp=0xc0010aff40 stack=[0xc0008b0000, 0xc0010b0000]
    morebuf={pc:0x48e4e0 sp:0xc0010aff50 lr:0x0}
    sched={pc:0x48e4b8 sp:0xc0010aff48 lr:0x0 ctxt:0x0}
stackalloc 16777216
  allocated 0xc0010b0000
copystack gp=0xc000000180 [0xc0008b0000 0xc0010aff48 0xc0010b0000] -> [0xc0010b0000 0xc0020aff48 0xc0020b0000]/16777216
stackfree 0xc0008b0000 8388608
stack grow done

, .



64 KB :


package main

import (
    "fmt"
    "os"
)

const bigsize = 1024*64 - 1

func readSelf(buff []byte) int { // readSelf buff does not escape
    f, err := os.Open("main.go") // inlining call to os.Open
    if err != nil {
        panic(err)
    }

    count, _ := f.Read(buff)
    if err := f.Close(); err != nil { // inlining call to os.(*File).Close
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) { // printSelf buff does not escape
    tmp := string(buff[:count]) // string(buff[:count]) escapes to heap
    // inlining call to fmt.Println
    // tmp escapes to heap
    // printSelf []interface {} literal does not escape
    // io.Writer(os.Stdout) escapes to heap
    fmt.Println(tmp)
}

func foo() {
    data := make([]byte, bigsize) // make([]byte, bigsize) does not escape
    cnt := readSelf(data)
    printSelf(data, cnt)
}

func main() { // can inline main
    foo()
}

stackalloc 131072
  allocated 0xc0000d0000
copystack gp=0xc000000180 [0xc0000c0000 0xc0000cff48 0xc0000d0000] -> [0xc0000d0000 0xc0000eff48 0xc0000f0000]/131072
stackfree 0xc0000c0000 65536


Go (zero values). . , sync.Pool.


:


  • ( );
  • ;
  • ( /dev/null).

, , .


func readSelf(buff []byte) int {
    f, err := os.Open("bench_test.go")
    if err != nil {
        panic(err)
    }

    count, err := f.Read(buff)
    if err != nil {
        panic(err)
    }

    if err := f.Close(); err != nil {
        panic(err)
    }

    return count
}

func printSelf(buff []byte, count int) {
    f, err := os.OpenFile(os.DevNull, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        panic(err)
    }
    if _, err := f.Write(buff[:count]); err != nil {
        panic(err)
    }

    if err := f.Close(); err != nil {
        panic(err)
    }
}

//go:noinline
func usePoolBlock4k() {
    inputp := pool4blocks.Get().(*[]byte)
    outputp := pool4blocks.Get().(*[]byte)

    cnt := readSelf(*inputp)
    copy(*outputp, *inputp)
    printSelf(*outputp, cnt)

    pool4blocks.Put(inputp)
    pool4blocks.Put(outputp)
}

//go:noinline
func useStackBlock4k() {
    var input [block4k]byte
    var output [block4k]byte

    cnt := readSelf(input[:])
    copy(output[:], input[:])
    printSelf(output[:], cnt)
}

func BenchmarkPoolBlock4k(b *testing.B) {
    runtime.GC()
    for i := 0; i < 8; i++ {
        data := pool4blocks.Get()
        pool4blocks.Put(data)
    }
    for i := 0; i < 8; i++ {
        usePoolBlock4k()
    }

    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        usePoolBlock4k()
    }
}

func BenchmarkStackBlock4k(b *testing.B) {
    runtime.GC()
    for i := 0; i < 8; i++ {
        useStackBlock4k()
    }
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        useStackBlock4k()
    }
}

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: gostack/04_benchmarks
BenchmarkPoolBlock4k-12           312918          3804 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock4k-12          313255          3833 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock8k-12           300796          3980 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock8k-12          294266          4110 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock16k-12          288734          4138 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock16k-12         269382          4408 ns/op         256 B/op          8 allocs/op
BenchmarkPoolBlock32k-12          272139          4407 ns/op         256 B/op          8 allocs/op
BenchmarkStackBlock32k-12         240339          4957 ns/op         256 B/op          8 allocs/op
PASS

4 KB, . , . :


(pprof) top 15
Showing nodes accounting for 7.99s, 79.58% of 10.04s total
Dropped 104 nodes (cum <= 0.05s)
Showing top 15 nodes out of 85
      flat  flat%   sum%        cum   cum%
     3.11s 30.98% 30.98%      3.38s 33.67%  syscall.Syscall6
     2.01s 20.02% 51.00%      2.31s 23.01%  syscall.Syscall
     0.83s  8.27% 59.26%      0.83s  8.27%  runtime.epollctl
     0.60s  5.98% 65.24%      0.60s  5.98%  runtime.memmove
     0.21s  2.09% 67.33%      0.21s  2.09%  runtime.unlock
     0.18s  1.79% 69.12%      0.18s  1.79%  runtime.nextFreeFast
     0.14s  1.39% 70.52%      0.33s  3.29%  runtime.exitsyscall
     0.14s  1.39% 71.91%      0.21s  2.09%  runtime.reentersyscall
     0.13s  1.29% 73.21%      0.46s  4.58%  runtime.mallocgc
     0.12s  1.20% 74.40%      1.25s 12.45%  gostack/04_benchmarks.useStackBlock32k
     0.12s  1.20% 75.60%      0.13s  1.29%  runtime.exitsyscallfast
     0.11s  1.10% 76.69%      0.45s  4.48%  runtime.SetFinalizer
     0.11s  1.10% 77.79%      0.11s  1.10%  runtime.casgstatus
     0.10s     1% 78.78%      0.13s  1.29%  runtime.deferreturn
     0.08s   0.8% 79.58%      1.24s 12.35%  gostack/04_benchmarks.useStackBlock16k

syscalls. useStackBlock32k getFromStack32k. 120ms:


(pprof) list useStackBlock32k
Total: 10.04s
ROUTINE ======================== useStackBlock32k in bench_test.go
     120ms      1.25s (flat, cum) 12.45% of Total
         .          .    158:   printSelf(output[:], cnt)
         .          .    159:}
         .          .    160:
         .          .    161://go:noinline
         .          .    162:func useStackBlock32k() {
      90ms       90ms    163:   var input [block32k]byte
      30ms       30ms    164:   var output [block32k]byte
         .          .    165:
         .      530ms    166:   cnt := readSelf(input[:])
         .      160ms    167:   copy(output[:], input[:])
         .      440ms    168:   printSelf(output[:], cnt)
         .          .    169:}

ROUTINE ======================== usePoolBlock32k in bench_test.go
      10ms      1.18s (flat, cum) 11.75% of Total
         .          .    115:   pool16blocks.Put(outputp)
         .          .    116:}
         .          .    117:
         .          .    118://go:noinline
         .          .    119:func usePoolBlock32k() {
         .       10ms    120:   inputp := pool32blocks.Get().(*[]byte)
         .       10ms    121:   outputp := pool32blocks.Get().(*[]byte)
         .          .    122:
         .      520ms    123:   cnt := readSelf(*inputp)
      10ms      200ms    124:   copy(*outputp, *inputp)
         .      440ms    125:   printSelf(*outputp, cnt)
         .          .    126:
         .          .    127:   pool32blocks.Put(inputp)
         .          .    128:   pool32blocks.Put(outputp)
         .          .    129:}

:


  • 10 MB;
  • 64 KB;
  • há sobrecarga para dividir a pilha;
  • Há sobrecarga no valor zero .

All Articles