Illustration created for "A Journey With Go" from the original Go gopher created by Renee French.

For many Go developers, systematically using pointers to share structs, instead of copying the struct itself, seems the best option in terms of performance. To understand the impact of using a pointer rather than a copy of the struct, we will review two use cases.

Intensive data allocation
Let's take a simple example where you want to share a struct in order to access its values:

type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}
Here is our basic struct, which can be shared by copy or by pointer:

// byCopy returns the struct by value.
func byCopy() S {
	return S{
		a: 1, b: 1, c: 1,
		e: "foo", f: "foo",
		g: 1.0, h: 1.0, i: 1.0,
	}
}

// byPointer returns the address of a newly created struct.
func byPointer() *S {
	return &S{
		a: 1, b: 1, c: 1,
		e: "foo", f: "foo",
		g: 1.0, h: 1.0, i: 1.0,
	}
}
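Before running full benchmarks, the difference between the two functions can be checked with a quick allocation count. The following is a minimal sketch, not part of the original article: it assumes the same struct S, and the `sinkValue`, `sinkPointer`, `allocsByCopy`, and `allocsByPointer` names are hypothetical helpers introduced for illustration. It uses `testing.AllocsPerRun` from the standard library to report the average number of heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// S mirrors the struct used in the article.
type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

// Package-level sinks (hypothetical names) keep the results alive so the
// compiler cannot optimize the calls away.
var (
	sinkValue   S
	sinkPointer *S
)

// byCopy returns the struct by value. Inlining is disabled so the call
// behaves like a regular function call.
//
//go:noinline
func byCopy() S {
	return S{a: 1, b: 1, c: 1, e: "foo", f: "foo", g: 1.0, h: 1.0, i: 1.0}
}

// byPointer returns the address of a new struct, which forces the value
// to outlive the function's stack frame.
//
//go:noinline
func byPointer() *S {
	return &S{a: 1, b: 1, c: 1, e: "foo", f: "foo", g: 1.0, h: 1.0, i: 1.0}
}

// allocsByCopy reports the average heap allocations per byCopy call.
func allocsByCopy() float64 {
	return testing.AllocsPerRun(1000, func() { sinkValue = byCopy() })
}

// allocsByPointer reports the average heap allocations per byPointer call.
func allocsByPointer() float64 {
	return testing.AllocsPerRun(1000, func() { sinkPointer = byPointer() })
}

func main() {
	fmt.Println("byCopy allocs/call:   ", allocsByCopy())
	fmt.Println("byPointer allocs/call:", allocsByPointer())
}
```

Under these assumptions, the by-value version should report no heap allocation, while the pointer version should report one allocation per call. The compiler's own escape analysis decisions can also be inspected by building with `-gcflags=-m`.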
Based on these two functions, we can write two benchmarks (they rely on the fmt, os, runtime/trace, and testing packages). The first one is where the struct is returned by copy:

func BenchmarkMemoryStack(b *testing.B) {
	var s S

	// Record an execution trace so it can be inspected afterwards.
	f, err := os.Create("stack.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byCopy()
	}

	trace.Stop()
	b.StopTimer()

	// Use the result so the compiler cannot discard the loop.
	_ = fmt.Sprintf("%v", s.a)
}
The second one, very similar to the first, is where the struct is returned by pointer:

func BenchmarkMemoryHeap(b *testing.B) {
	var s *S

	// Record an execution trace so it can be inspected afterwards.
	f, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	err = trace.Start(f)
	if err != nil {
		panic(err)
	}

	for i := 0; i < b.N; i++ {
		s = byPointer()
	}

	trace.Stop()
	b.StopTimer()

	// Use the result so the compiler cannot discard the loop.
	_ = fmt.Sprintf("%v", s.a)
}
Let's run the benchmarks:

go test ./... -bench=BenchmarkMemoryHeap -benchmem -run=^$ -count=10 > head.txt && benchstat head.txt
go test ./... -bench=BenchmarkMemoryStack -benchmem -run=^$ -count=10 > stack.txt && benchstat stack.txt
We get the following statistics:

name           time/op
MemoryHeap-4   75.0ns ± 5%

name           alloc/op
MemoryHeap-4   96.0B ± 0%

name           allocs/op
MemoryHeap-4   1.00 ± 0%

------------------

name           time/op
MemoryStack-4  8.93ns ± 4%

name           alloc/op
MemoryStack-4  0.00B

name           allocs/op
MemoryStack-4  0.00
Using a copy of the struct was about 8 times faster than using a pointer to it!

To understand why, let's look at the graphs generated by the trace:

Figure: trace graph for the struct passed by copy

Figure: trace graph for the struct passed by pointer

The first graph is pretty simple. Since the heap is not used, there is no garbage collector activity and no extra goroutines.

In the second case, using pointers forces the Go compiler to move the variable to the heap, which puts the garbage collector to work. If we zoom in on the graph, we can see that the garbage collector takes up a significant part of the process.

This graph shows that the garbage collector starts every 4 ms.

If we zoom in again, we can get detailed information about what exactly is happening. The blue, pink, and red stripes are the phases of the garbage collector, while the brown ones are related to allocation on the heap (marked "runtime.bgsweep" on the graph):

Sweeping is when the memory associated with values in heap memory that were not marked as in use is reclaimed. This activity occurs when application goroutines attempt to allocate new values in heap memory. The latency of sweeping is added to the cost of performing an allocation in heap memory and is not tied to any latencies associated with garbage collection.
www.ardanlabs.com/blog/2018/12/garbage-collection-in-go-part1-semantics.html
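The garbage collector activity visible in the trace can also be observed from inside a program. The sketch below is an illustration under stated assumptions, not the article's benchmark: it reuses the struct S, and the `sink`, `gcCyclesDuring`, and `allocateOnHeap` names are hypothetical. It reads `runtime.MemStats` before and after an allocation loop and reports how many garbage collection cycles completed in between.

```go
package main

import (
	"fmt"
	"runtime"
)

// S mirrors the struct used in the article.
type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

// sink (hypothetical name) keeps the last pointer reachable so each
// allocation really happens on the heap.
var sink *S

//go:noinline
func byPointer() *S {
	return &S{a: 1, b: 1, c: 1, e: "foo", f: "foo", g: 1.0, h: 1.0, i: 1.0}
}

// gcCyclesDuring runs fn and returns how many garbage collection cycles
// completed and how many heap objects were allocated while it ran.
func gcCyclesDuring(fn func()) (cycles uint32, mallocs uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC, after.Mallocs - before.Mallocs
}

// allocateOnHeap creates n structs through byPointer, discarding all but
// the last one so the collector has garbage to reclaim.
func allocateOnHeap(n int) {
	for i := 0; i < n; i++ {
		sink = byPointer()
	}
}

func main() {
	cycles, mallocs := gcCyclesDuring(func() { allocateOnHeap(1_000_000) })
	fmt.Printf("GC cycles: %d, heap objects allocated: %d\n", cycles, mallocs)
}
```

With the default collector settings, a loop like this allocates far more than the initial heap target, so several collection cycles should be reported. The exact count depends on the Go version and on the GOGC setting.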
Even if this example is a bit extreme, it shows how expensive it can be to allocate a variable on the heap rather than on the stack. In our example, it is much faster to allocate the struct on the stack and copy it than to create it on the heap and share its address.

If you are not familiar with the stack and the heap, and want to know more about their internal details, you can find plenty of resources online, such as this article by Paul Gribble.
Things can be even worse if we limit the program to a single processor with GOMAXPROCS=1:

name         time/op
MemoryHeap   114ns ± 4%

name         alloc/op
MemoryHeap   96.0B ± 0%

name         allocs/op
MemoryHeap   1.00 ± 0%

------------------

name         time/op
MemoryStack  8.77ns ± 5%

name         alloc/op
MemoryStack  0.00B

name         allocs/op
MemoryStack  0.00
While the stack benchmark did not change, the heap benchmark slowed down from 75 ns/op to 114 ns/op.

Intensive function calls

We will add two empty methods to our struct and adapt our benchmarks a little:

// stack takes its receiver and its argument by value.
func (s S) stack(s1 S) {}

// heap takes its receiver and its argument by pointer.
func (s *S) heap(s1 *S) {}
The stack benchmark creates the structs and passes them by copy:

func BenchmarkMemoryStack(b *testing.B) {
	var s S
	var s1 S

	s = byCopy()
	s1 = byCopy()
	for i := 0; i < b.N; i++ {
		// The inner loop uses j to avoid shadowing the outer counter.
		for j := 0; j < 1000000; j++ {
			s.stack(s1)
		}
	}
}
And the heap benchmark passes the structs by pointer:

func BenchmarkMemoryHeap(b *testing.B) {
	var s *S
	var s1 *S

	s = byPointer()
	s1 = byPointer()
	for i := 0; i < b.N; i++ {
		// The inner loop uses j to avoid shadowing the outer counter.
		for j := 0; j < 1000000; j++ {
			s.heap(s1)
		}
	}
}
As expected, the results are completely different now. Neither version allocates, and this time the pointer version is about twice as fast:

name           time/op
MemoryHeap-4   301µs ± 4%

name           alloc/op
MemoryHeap-4   0.00B

name           allocs/op
MemoryHeap-4   0.00

------------------

name           time/op
MemoryStack-4  595µs ± 2%

name           alloc/op
MemoryStack-4  0.00B

name           allocs/op
MemoryStack-4  0.00
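One plausible contributor to this gap is the amount of data moved on each call: a by-value receiver and argument each require copying the whole struct, while the pointer version only passes two addresses. The sketch below is an illustration, not a measurement from the article; `structSize` and `pointerSize` are hypothetical helper names. It uses `unsafe.Sizeof` to compare the two sizes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// S mirrors the struct used in the article.
type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

// structSize is the number of bytes copied when an S is passed by value.
func structSize() uintptr {
	return unsafe.Sizeof(S{})
}

// pointerSize is the number of bytes copied when a *S is passed instead.
func pointerSize() uintptr {
	return unsafe.Sizeof((*S)(nil))
}

func main() {
	fmt.Println("bytes per by-value argument:  ", structSize())
	fmt.Println("bytes per by-pointer argument:", pointerSize())
}
```

On a typical 64-bit platform, each string header takes two machine words, so the struct comes to 96 bytes (matching the 96.0B alloc/op reported in the first use case) against 8 bytes for a pointer.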
Conclusion
Using a pointer instead of a copy of a struct in Go is not always a good thing. To choose good semantics for your data, I strongly suggest reading the post about value/pointer semantics written by Bill Kennedy. It will give you a better view of the strategies you can use with your structs and built-in types. In addition, profiling memory usage will definitely help you understand what is happening with your allocations and your heap.
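As a starting point for the memory profiling mentioned above, here is a minimal sketch that captures a heap profile with the standard `runtime/pprof` package. The `heapProfile` helper, the `sink` slice, and the allocation count are hypothetical choices made for illustration. A real program would usually write the profile to a file and inspect it with `go tool pprof`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// S mirrors the struct used in the article.
type S struct {
	a, b, c int64
	d, e, f string
	g, h, i float64
}

// sink (hypothetical name) keeps the allocated structs reachable so they
// appear as live objects in the profile.
var sink []*S

// heapProfile allocates n structs on the heap and returns a heap profile
// in the compressed format understood by `go tool pprof`.
func heapProfile(n int) ([]byte, error) {
	for i := 0; i < n; i++ {
		sink = append(sink, &S{a: int64(i)})
	}

	// Run a collection first so the profile reflects up-to-date statistics.
	runtime.GC()

	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	profile, err := heapProfile(100_000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured a heap profile of %d bytes\n", len(profile))
}
```

For benchmarks like the ones in this article, `go test` also accepts a `-memprofile` flag that writes a comparable profile without any extra code.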