I wanted to understand how arrays and slices are placed on the stack, the way std::array lives on the stack in C++ and arrays do in C. The goroutine stack is described well by Vincent Blanchon in his article "Go: How Does the Goroutine Stack Size Evolve?". Vincent explains how the stack changes over the life of a goroutine. In short:
- the minimum stack size is 2 KB;
- the maximum size depends on the architecture; on a 64-bit architecture it is 1 GB;
- each time the stack grows, its size doubles;
- the stack both grows and shrinks.
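The four rules above can be modelled in a few lines. This is a simplified sketch, not the runtime code: it assumes the 64-bit limit is the runtime's maxstacksize of 1,000,000,000 bytes, that shrinking happens (during garbage collection) only when less than a quarter of the stack is in use, and it ignores the guard area the real runtime adds to the used size. `grow` and `shrink` are hypothetical names.

```go
package main

import "fmt"

const (
	minStack = 2048       // initial and minimum goroutine stack, 2 KB
	maxStack = 1000000000 // 64-bit limit checked when the stack grows
)

// grow doubles the stack. ok is false when the new size would exceed
// maxStack, which is where the runtime reports a stack overflow.
func grow(size int) (newSize int, ok bool) {
	newSize = size * 2
	return newSize, newSize <= maxStack
}

// shrink halves the stack when less than a quarter of it is in use,
// but never goes below the 2 KB minimum.
func shrink(size, used int) int {
	if size/2 < minStack || used >= size/4 {
		return size
	}
	return size / 2
}

func main() {
	size := minStack
	for i := 0; i < 3; i++ {
		size, _ = grow(size)
		fmt.Println("grow to", size)
	}
	fmt.Println("shrink to", shrink(size, 1000))
}
```

Running it prints growth to 4096, 8192 and 16384 bytes, then a shrink back to 8192, because 1000 bytes is less than a quarter of 16 KB.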
I want to find out how much can be placed on the stack and what overhead that brings. I'll use Go 1.13.8 built from source with debug information, and compile with the -gcflags=-m option.
To get debug output while the program runs, set the stackDebug constant in runtime/stack.go to the desired value:
const (
	// 0: no logging; 1: per-stack operations; 2: per-frame; 3-4: per-word
	stackDebug = 0
)
A very simple program
I'll start with a program that does nothing, to see how the stack is allocated for the main goroutine:
package main
func main() {
}
We can see the global pool being filled with 2, 8 and 32 KB segments:
$ GOMAXPROCS=1 ./app
stackalloc 32768
allocated 0xc000002000
stackalloc 2048
stackcacherefill order=0
allocated 0xc000036000
stackalloc 32768
allocated 0xc00003a000
stackalloc 8192
stackcacherefill order=2
allocated 0xc000046000
stackalloc 2048
allocated 0xc000036800
stackalloc 2048
allocated 0xc000037000
stackalloc 2048
allocated 0xc000037800
stackalloc 32768
allocated 0xc00004e000
stackalloc 8192
allocated 0xc000048000
This is an important point: we can expect the allocated memory to be reused.
A not-so-simple program
Where would we be without Hello world!
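Why do the 2 KB and 8 KB requests print stackcacherefill while the 32 KB ones do not? As far as I can tell from the Go 1.13 runtime on linux, stackalloc serves sizes below 32 KB (2 KB shifted left by four stack orders) from per-order free lists: order 0 is 2 KB, order 1 is 4 KB, order 2 is 8 KB, order 3 is 16 KB. Bigger stacks are allocated separately. A sketch of that size-to-order mapping, assuming the linux constants; `stackOrder` is a hypothetical helper that mirrors the loop in stackalloc:

```go
package main

import "fmt"

const (
	fixedStack     = 2048 // _FixedStack on linux (assumed)
	numStackOrders = 4    // _NumStackOrders on linux (assumed)
)

// stackOrder returns the free-list order for an n-byte stack (n is a power
// of two, at least 2 KB) and whether the order-based pools are used at all.
func stackOrder(n int) (order int, cached bool) {
	if n >= fixedStack<<numStackOrders {
		return 0, false // 32 KB and larger are allocated outside these pools
	}
	for n2 := n; n2 > fixedStack; n2 >>= 1 {
		order++
	}
	return order, true
}

func main() {
	for _, n := range []int{2048, 4096, 8192, 32768} {
		order, cached := stackOrder(n)
		fmt.Printf("%6d bytes: order=%d cached=%v\n", n, order, cached)
	}
}
```

The orders it reports match the log: "stackalloc 8192" is followed by "stackcacherefill order=2", and a 4096-byte request later refills order=1.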
package main

import "fmt"

func helloWorld() {
	fmt.Println("Hello world!")
}

func main() {
	helloWorld()
}
runtime: newstack sp=0xc000036348 stack=[0xc000036000, 0xc000036800]
morebuf={pc:0x41463b sp:0xc000036358 lr:0x0}
sched={pc:0x4225b1 sp:0xc000036350 lr:0x0 ctxt:0x0}
stackalloc 4096
stackcacherefill order=1
allocated 0xc000060000
copystack gp=0xc000000180 [0xc000036000 0xc000036350 0xc000036800] -> [0xc000060000 0xc000060b50 0xc000061000]/4096
stackfree 0xc000036000 2048
stack grow done
stackalloc 2048
allocated 0xc000036000
Hello world!
2 KB was not enough, so a 4 KB block was allocated. Let's work through the numbers in square brackets. They are addresses; the outer two are the fields of this structure:
type stack struct {
	lo uintptr
	hi uintptr
}
The middle one is an address that shows how much of the stack is in use; it is computed like this:
used := old.hi - gp.sched.sp
print("copystack gp=", gp, " [", hex(old.lo), " ", hex(old.hi-used), " ", hex(old.hi), "]", " -> [", hex(new.lo), " ", hex(new.hi-used), " ", hex(new.hi), "]/", newsize, "\n")
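Plugging the numbers from the Hello world log into that formula gives a quick sanity check. In this small sketch the addresses are copied from the copystack line, uint64 is used instead of uintptr so the constants also compile on 32-bit platforms, and `usedStack` is a hypothetical helper:

```go
package main

import "fmt"

// usedStack mirrors `used := old.hi - gp.sched.sp` from copystack.
func usedStack(hi, sp uint64) uint64 {
	return hi - sp
}

func main() {
	// copystack ... [0xc000036000 0xc000036350 0xc000036800] -> [0xc000060000 0xc000060b50 0xc000061000]/4096
	oldLo, oldHi, sp := uint64(0xc000036000), uint64(0xc000036800), uint64(0xc000036350)
	newHi := uint64(0xc000061000)

	used := usedStack(oldHi, sp)
	fmt.Printf("used %d of %d bytes\n", used, oldHi-oldLo)
	// The same number of used bytes sits at the top of the new stack.
	fmt.Printf("new middle address %#x\n", newHi-used)
}
```

It prints "used 1200 of 2048 bytes" and "new middle address 0xc000060b50", the same middle value the runtime logged for the new 4 KB stack.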
Now let's run the same function in twelve goroutines and look at the 4 KB allocations:
package main

import (
	"fmt"
	"time"
)

func helloWorld() {
	fmt.Println("Hello world!")
}

func main() {
	for i := 0; i < 12; i++ {
		go helloWorld()
	}
	time.Sleep(5 * time.Second)
}
$ ./app 2>&1 | grep "alloc 4"
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096
stackalloc 4096
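How big can an array on the stack be? Experimenting with bigsize (stack_test.go), go build -gcflags=-m reports the following once the array is larger than 10 MB: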
main.go:8:6: moved to heap: x
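So an explicitly declared variable can take at most 10 MB of the stack. Now let's see how many such frames fit, with a 10 MB array in every recursive call: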
package main

const bigsize = 1024 * 1024 * 10

const depth = 50

func step(i int) byte {
	var x [bigsize]byte
	if i != depth {
		return x[i] * step(i+1)
	}
	return x[i]
}

func main() {
	step(0)
}
With 10 MB per frame, a depth greater than 50 ends in a stack overflow:
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
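Why exactly 50? A back-of-the-envelope sketch, assuming each frame of step is dominated by its 10 MB array and ignoring the few bytes of other frame data: the stack only doubles and may not exceed 1,000,000,000 bytes, so the largest stack a goroutine can reach is 512 MB (2^29 bytes); the next doubling, 2^30 bytes, is over the limit. `largestStack` is a hypothetical helper.

```go
package main

import "fmt"

const (
	frameSize = 10 * 1024 * 1024 // var x [bigsize]byte in every call of step
	maxStack  = 1000000000       // limit from the "stack exceeds" message
)

// largestStack returns the biggest stack reachable by doubling from 2 KB
// without exceeding maxStack.
func largestStack() int {
	size := 2048
	for size*2 <= maxStack {
		size *= 2
	}
	return size
}

func main() {
	stack := largestStack()
	frames := stack / frameSize
	// step(0) ... step(frames-1) fit, so the deepest safe depth is frames-1.
	fmt.Println("largest stack:", stack)
	fmt.Println("frames that fit:", frames, "max depth:", frames-1)
}
```

It reports a 536870912-byte stack, 51 frames and a maximum depth of 50, which agrees with the overflow observed above.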
Next, let's put a 10 MB array on the stack and pass it to other functions as a slice, again building with -gcflags="-m":
package main

import (
	"fmt"
	"os"
)

const bigsize = 1024 * 1024 * 10

func readSelf(buff []byte) int {
	f, err := os.Open("main.go")
	if err != nil {
		panic(err)
	}
	count, _ := f.Read(buff)
	if err := f.Close(); err != nil {
		panic(err)
	}
	return count
}

func printSelf(buff []byte, count int) {
	tmp := string(buff[:count])
	fmt.Println(tmp)
}

func foo() {
	var data [bigsize]byte
	cnt := readSelf(data[:])
	printSelf(data[:], cnt)
}

func main() {
	foo()
}
The log shows the stack being grown step by step:
stackalloc 2048
stackcacherefill order=0
allocated 0xc000042000
runtime: newstack sp=0xc00008ef40 stack=[0xc00008e000, 0xc00008f000]
morebuf={pc:0x48e4e0 sp:0xc00008ef50 lr:0x0}
sched={pc:0x48e4b8 sp:0xc00008ef48 lr:0x0 ctxt:0x0}
stackalloc 8192
stackcacherefill order=2
allocated 0xc000078000
copystack gp=0xc000000180 [0xc00008e000 0xc00008ef48 0xc00008f000] -> [0xc000078000 0xc000079f48 0xc00007a000]/8192
stackfree 0xc00008e000 4096
stack grow done
...
runtime: newstack sp=0xc0010aff40 stack=[0xc0008b0000, 0xc0010b0000]
morebuf={pc:0x48e4e0 sp:0xc0010aff50 lr:0x0}
sched={pc:0x48e4b8 sp:0xc0010aff48 lr:0x0 ctxt:0x0}
stackalloc 16777216
allocated 0xc0010b0000
copystack gp=0xc000000180 [0xc0008b0000 0xc0010aff48 0xc0010b0000] -> [0xc0010b0000 0xc0020aff48 0xc0020b0000]/16777216
stackfree 0xc0008b0000 8388608
stack grow done
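The 10 MB array stays on the stack: the stack keeps doubling until it reaches 16 MB.
For a slice created with make the limit is much lower: its backing array stays on the stack only while it is smaller than 64 KB, hence bigsize = 1024*64 - 1: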
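The "..." in the log hides the intermediate rounds. Both logged newstack entries come from the same pc (0x48e4b8), which suggests that in Go 1.13 each round only doubles the stack and copies it, so a frame needing more than 10 MB triggers a whole chain of them. A sketch that lists those rounds, assuming the chain starts from the 4 KB stack seen in the log and ignoring the stack guard; `growthSteps` is a hypothetical helper:

```go
package main

import "fmt"

// growthSteps lists the stack sizes allocated while doubling from start
// until a frame of need bytes fits (stack guard ignored).
func growthSteps(start, need int) []int {
	var steps []int
	for size := start; size < need; {
		size *= 2
		steps = append(steps, size)
	}
	return steps
}

func main() {
	steps := growthSteps(4096, 10*1024*1024)
	fmt.Println(len(steps), "grow-and-copy rounds, first", steps[0], "last", steps[len(steps)-1])
}
```

That gives 12 rounds from 8192 up to 16777216 bytes, the first and last allocations visible in the log; each round allocates a new stack, copies the used part and frees the old one.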
package main

import (
	"fmt"
	"os"
)

const bigsize = 1024*64 - 1

func readSelf(buff []byte) int {
	f, err := os.Open("main.go")
	if err != nil {
		panic(err)
	}
	count, _ := f.Read(buff)
	if err := f.Close(); err != nil {
		panic(err)
	}
	return count
}

func printSelf(buff []byte, count int) {
	tmp := string(buff[:count])
	fmt.Println(tmp)
}

func foo() {
	data := make([]byte, bigsize)
	cnt := readSelf(data)
	printSelf(data, cnt)
}

func main() {
	foo()
}
stackalloc 131072
allocated 0xc0000d0000
copystack gp=0xc000000180 [0xc0000c0000 0xc0000cff48 0xc0000d0000] -> [0xc0000d0000 0xc0000eff48 0xc0000f0000]/131072
stackfree 0xc0000c0000 65536
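Go initializes every variable with its zero value, so arrays on the stack are cleared on every call, and that has a cost. An alternative is to reuse buffers from a sync.Pool.
Let's compare the two approaches with benchmarks. Each function reads this source file into one buffer, copies it into a second buffer and writes the copy to os.DevNull: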
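package benchmarks // assumed package name; the import path in the output is gostack/04_benchmarks

import (
	"os"
	"runtime"
	"sync"
	"testing"
)

const block4k = 4 * 1024

// pool4blocks hands out *[]byte buffers of block4k bytes (assumed definition;
// storing a pointer keeps Put from allocating).
var pool4blocks = sync.Pool{
	New: func() interface{} {
		b := make([]byte, block4k)
		return &b
	},
}

// readSelf reads up to len(buff) bytes of this source file into buff.
func readSelf(buff []byte) int {
	f, err := os.Open("bench_test.go")
	if err != nil {
		panic(err)
	}
	count, err := f.Read(buff)
	if err != nil {
		panic(err)
	}
	if err := f.Close(); err != nil {
		panic(err)
	}
	return count
}

// printSelf writes the first count bytes of buff to os.DevNull.
func printSelf(buff []byte, count int) {
	f, err := os.OpenFile(os.DevNull, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	if _, err := f.Write(buff[:count]); err != nil {
		panic(err)
	}
	if err := f.Close(); err != nil {
		panic(err)
	}
}

//go:noinline
func usePoolBlock4k() {
	// Two buffers from the pool; their previous contents are not cleared.
	inputp := pool4blocks.Get().(*[]byte)
	outputp := pool4blocks.Get().(*[]byte)
	cnt := readSelf(*inputp)
	copy(*outputp, *inputp)
	printSelf(*outputp, cnt)
	pool4blocks.Put(inputp)
	pool4blocks.Put(outputp)
}

//go:noinline
func useStackBlock4k() {
	// Two arrays on the stack, zeroed on every call.
	var input [block4k]byte
	var output [block4k]byte
	cnt := readSelf(input[:])
	copy(output[:], input[:])
	printSelf(output[:], cnt)
}

func BenchmarkPoolBlock4k(b *testing.B) {
	runtime.GC()
	// Warm up the pool and the code path before timing.
	for i := 0; i < 8; i++ {
		data := pool4blocks.Get()
		pool4blocks.Put(data)
	}
	for i := 0; i < 8; i++ {
		usePoolBlock4k()
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		usePoolBlock4k()
	}
}

func BenchmarkStackBlock4k(b *testing.B) {
	runtime.GC()
	for i := 0; i < 8; i++ {
		useStackBlock4k()
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		useStackBlock4k()
	}
}

// The 8k, 16k and 32k variants differ only in the block size.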
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: gostack/04_benchmarks
BenchmarkPoolBlock4k-12 312918 3804 ns/op 256 B/op 8 allocs/op
BenchmarkStackBlock4k-12 313255 3833 ns/op 256 B/op 8 allocs/op
BenchmarkPoolBlock8k-12 300796 3980 ns/op 256 B/op 8 allocs/op
BenchmarkStackBlock8k-12 294266 4110 ns/op 256 B/op 8 allocs/op
BenchmarkPoolBlock16k-12 288734 4138 ns/op 256 B/op 8 allocs/op
BenchmarkStackBlock16k-12 269382 4408 ns/op 256 B/op 8 allocs/op
BenchmarkPoolBlock32k-12 272139 4407 ns/op 256 B/op 8 allocs/op
BenchmarkStackBlock32k-12 240339 4957 ns/op 256 B/op 8 allocs/op
PASS
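At 4 KB the two variants are practically equal; as the block grows, the stack variant falls behind (4957 vs 4407 ns/op at 32 KB). Let's look at the profile: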
(pprof) top 15
Showing nodes accounting for 7.99s, 79.58% of 10.04s total
Dropped 104 nodes (cum <= 0.05s)
Showing top 15 nodes out of 85
flat flat% sum% cum cum%
3.11s 30.98% 30.98% 3.38s 33.67% syscall.Syscall6
2.01s 20.02% 51.00% 2.31s 23.01% syscall.Syscall
0.83s 8.27% 59.26% 0.83s 8.27% runtime.epollctl
0.60s 5.98% 65.24% 0.60s 5.98% runtime.memmove
0.21s 2.09% 67.33% 0.21s 2.09% runtime.unlock
0.18s 1.79% 69.12% 0.18s 1.79% runtime.nextFreeFast
0.14s 1.39% 70.52% 0.33s 3.29% runtime.exitsyscall
0.14s 1.39% 71.91% 0.21s 2.09% runtime.reentersyscall
0.13s 1.29% 73.21% 0.46s 4.58% runtime.mallocgc
0.12s 1.20% 74.40% 1.25s 12.45% gostack/04_benchmarks.useStackBlock32k
0.12s 1.20% 75.60% 0.13s 1.29% runtime.exitsyscallfast
0.11s 1.10% 76.69% 0.45s 4.48% runtime.SetFinalizer
0.11s 1.10% 77.79% 0.11s 1.10% runtime.casgstatus
0.10s 1% 78.78% 0.13s 1.29% runtime.deferreturn
0.08s 0.8% 79.58% 1.24s 12.35% gostack/04_benchmarks.useStackBlock16k
Most of the time is spent in system calls. useStackBlock32k also has 120ms of flat time of its own; let's see where it goes:
(pprof) list useStackBlock32k
Total: 10.04s
ROUTINE ======================== useStackBlock32k in bench_test.go
120ms 1.25s (flat, cum) 12.45% of Total
. . 158: printSelf(output[:], cnt)
. . 159:}
. . 160:
. . 161://go:noinline
. . 162:func useStackBlock32k() {
90ms 90ms 163: var input [block32k]byte
30ms 30ms 164: var output [block32k]byte
. . 165:
. 530ms 166: cnt := readSelf(input[:])
. 160ms 167: copy(output[:], input[:])
. 440ms 168: printSelf(output[:], cnt)
. . 169:}
ROUTINE ======================== usePoolBlock32k in bench_test.go
10ms 1.18s (flat, cum) 11.75% of Total
. . 115: pool16blocks.Put(outputp)
. . 116:}
. . 117:
. . 118://go:noinline
. . 119:func usePoolBlock32k() {
. 10ms 120: inputp := pool32blocks.Get().(*[]byte)
. 10ms 121: outputp := pool32blocks.Get().(*[]byte)
. . 122:
. 520ms 123: cnt := readSelf(*inputp)
10ms 200ms 124: copy(*outputp, *inputp)
. 440ms 125: printSelf(*outputp, cnt)
. . 126:
. . 127: pool32blocks.Put(inputp)
. . 128: pool32blocks.Put(outputp)
. . 129:}
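The 90ms and 30ms attributed to the two var lines are most likely the cost of zeroing: Go guarantees that `var input [block32k]byte` starts out as all zeros on every call, so two 32 KB arrays are cleared each time, whereas a buffer taken from a sync.Pool keeps whatever was written into it last. A minimal sketch of that guarantee; `firstByteThenDirty` is a hypothetical name:

```go
package main

import "fmt"

const block32k = 32 * 1024

// firstByteThenDirty returns the first byte of a fresh stack array and then
// overwrites the whole array. The next call still sees zeros, because the
// language guarantees buf starts out zeroed on every call.
func firstByteThenDirty() byte {
	var buf [block32k]byte
	first := buf[0]
	for i := range buf {
		buf[i] = 0xFF
	}
	return first
}

func main() {
	fmt.Println(firstByteThenDirty(), firstByteThenDirty()) // 0 0
}
```

A pooled buffer gives no such guarantee, which is exactly why reusing it can skip the clearing work.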
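To sum up:
- an explicitly declared variable can take up to 10 MB of the stack;
- a slice created with make stays on the stack only while it is smaller than 64 KB;
- growing (copying) the stack has overhead;
- zero-value initialization has overhead.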