A Fun Way to Imagine Memory Management
AbdulHafeez AbdulRaheem
Introduction
At my workplace I'm currently dealing with memory constraints I hadn't given much thought to before. We're streaming gigabytes of data in various formats (Proto, JSON, XML) from multiple data sources, with a strict requirement to keep the server's RAM usage below 500MB during processing. This limitation forces us to think about memory management with every code change and every new feature.
In this post I want to share how I visualize memory, or at least how I personally conceptualize it. I'll cover some basic concepts like stack and heap memory, along with what happens behind the scenes when you allocate on the stack versus the heap. I'll also look at how game engineers, known for their efficiency, squeeze impressive performance out of minimal memory, often by relying on low-level languages but also by using clever memory optimization techniques.
The RAM
If you have a computer science background, you know that RAM acts like a miniature brain: it stores the temporary information we need to access quickly. But here's how I like to think about it:
Imagine RAM as a massive warehouse with numbered shelves.
Each shelf has an address (like 0x7ffd5e8c3a40), and each shelf can hold exactly one byte. When your program needs to store something, it's essentially asking the warehouse manager: "Hey, I need 8 shelves next to each other to store this number." The manager finds a spot, reserves those shelves, and hands you the address of the first one.
The beautiful thing about RAM is that it doesn't matter which shelf you access; shelf #1 or shelf #1,000,000 takes the same amount of time. That's the "Random Access" part. Unlike a hard drive (which is more like a vinyl record that needs to spin to the right track), RAM gives you instant access to any location.
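To make the "shelf address" idea concrete, here's a tiny sketch; the exact address it prints will differ on every run:
package main

import "fmt"

func main() {
    x := 42
    // &x is the "shelf address" the warehouse manager handed back for x.
    fmt.Printf("x lives at address %p\n", &x) // e.g. 0xc000012028 (varies per run)
}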
But here's the catch: this warehouse has two very different sections with completely different management styles.
The Stack: Your Organized Desk
The stack is like a perfectly organized desk where you pile papers on top of each other. Simple rules:
- You can only add to the top (push)
- You can only remove from the top (pop)
- You always know exactly how tall the pile is
func calculateSum() int {
    a := 10      // Push 'a' onto the stack
    b := 20      // Push 'b' onto the stack
    sum := a + b // Push 'sum' onto the stack
    return sum   // Pop everything when function returns
}
When calculateSum() is called, the program reserves a chunk of stack space called a stack frame. It looks something like this:
┌─────────────────────┐ ← Stack Pointer (top)
│ sum = 30            │
├─────────────────────┤
│ b = 20              │
├─────────────────────┤
│ a = 10              │
├─────────────────────┤
│ return address      │ ← Where to go after function ends
├─────────────────────┤
│ previous frame...   │
└─────────────────────┘
The moment the function returns, the stack pointer simply moves back. We don't even need to "clean up" the data; we just pretend it doesn't exist anymore. The next function call will overwrite it.
Why is this fast?
- No searching: We always know where the top is
- No fragmentation: Everything is contiguous
- No bookkeeping: Just move a pointer up or down
- CPU cache friendly: Sequential memory access patterns
The stack allocation is literally just:
stack_pointer -= size_needed
return stack_pointer
That's it. One subtraction. Compare that to heap allocation, which we'll get to shortly.
The limitation? The stack has a fixed size (usually 1-8MB per thread), and everything on it must have a known size at compile time. You can't say "give me an array of unknown length" on the stack.
The Heap: The Chaotic Storage Unit
If the stack is your organized desk, the heap is a massive storage unit where you rent space of various sizes, and nobody's cleaning up after you.
func createUser() *User {
    user := &User{ // Allocated on the HEAP
        Name: "John",
        Age:  30,
    }
    return user // Pointer escapes the function
}
Why does user go on the heap here? Because we're returning a pointer to it. If it lived on the stack, the function's stack frame would be "popped" when it returns, and we'd be pointing to garbage memory. The Go compiler detects this through escape analysis and automatically moves it to the heap.
Here's what the heap looks like after several allocations and deallocations:
┌────────┬────────┬─────┬────────┬─────┬────────────┐
│ Used   │ Free   │Used │ Free   │Used │ Free       │
│ 64B    │ 128B   │ 32B │ 64B    │256B │ 512B       │
└────────┴────────┴─────┴────────┴─────┴────────────┘
See those gaps? That's fragmentation. You might have 704 bytes free in total, but if you need a contiguous 200-byte block, you can't use the smaller gaps. The heap allocator has to:
- Search for a suitable free block
- Potentially split a larger block
- Update bookkeeping metadata
- Handle thread synchronization (locks!)
- Later: merge adjacent free blocks
This is why heap allocation can be 10-100x slower than stack allocation.
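To get a feel for that work, here's a deliberately tiny first-fit allocator sketch. It is illustrative only (it keeps free blocks in a slice and skips merging, metadata, and locking), but it shows the searching and splitting a real allocator has to do:
// Toy first-fit allocator: illustrative only, not how Go's runtime works.
type block struct {
    offset, size int
}

type ToyHeap struct {
    free []block // free blocks, kept in address order
}

// Alloc scans the free list for the first block big enough,
// splits it if necessary, and returns the offset of the allocation.
func (h *ToyHeap) Alloc(size int) (int, bool) {
    for i, b := range h.free {
        if b.size >= size {
            offset := b.offset
            if b.size == size {
                h.free = append(h.free[:i], h.free[i+1:]...) // exact fit: drop the block
            } else {
                h.free[i] = block{offset: b.offset + size, size: b.size - size} // split it
            }
            return offset, true
        }
    }
    return 0, false // no contiguous block is large enough: fragmentation in action
}
Even this toy version has to scan and split; a production allocator also merges neighboring free blocks when memory is freed, keeps metadata, and synchronizes across threads.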
The Allocator's Nightmare
Let's peek behind the curtain at what happens when you write make([]byte, 1000) in Go:
Your code:
    make([]byte, 1000)
            │
            ▼
Go runtime:
    mallocgc()
            │
            ▼
    Check size class
    (1000 bytes rounds up to the 1024-byte class)
            │
            ▼
    Check thread cache (mcache)
            │
            ├── cache hit (fast path) ──▶ return pointer
            │
            └── cache miss (slow path)
                    │
                    ▼
            Check central cache (mcentral)
                    │
                    ▼
            Still empty? Request from mheap
                    │
                    ▼
Operating system:
            mmap() syscall (ask the OS for pages)
                    │
                    ▼
            Return pointer
Go's allocator uses size classes to reduce fragmentation. Instead of allocating exactly what you ask for, it rounds up to the nearest size class (8, 16, 24, 32, 48, 64, 80... bytes). This means a 1000-byte request actually gets 1024 bytes, but it massively simplifies the allocator's job.
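You can observe the rounding yourself. A rough sketch, with the caveat that the exact number depends on the Go version and on anything else that happens to allocate in between:
package main

import (
    "fmt"
    "runtime"
)

var sink []byte // package-level sink so the slice escapes to the heap

func main() {
    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)
    sink = make([]byte, 1000) // ask for 1000 bytes...
    runtime.ReadMemStats(&after)
    // ...and the allocator typically hands back the 1024-byte size class.
    fmt.Printf("requested 1000, heap grew by %d bytes\n", after.TotalAlloc-before.TotalAlloc)
}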
Escape Analysis: Where Does This Live?
Go's compiler decides at compile time whether something goes on the stack or heap. You can actually see these decisions:
go build -gcflags="-m" main.go
func stackAlloc() int {
    x := 42 // stays on stack
    return x
}

func heapAlloc() *int {
    x := 42   // escapes to heap!
    return &x // because we return a pointer
}

func sliceGrowth() {
    s := make([]int, 0, 10) // might stay on stack (small, known size)
    s = append(s, 1, 2, 3)  // still stack if it fits

    s2 := make([]int, 0) // escapes! (unknown final size)
    for i := 0; i < 1000; i++ {
        s2 = append(s2, i) // definitely heap
    }
}
Common escape triggers (the closure case is sketched after this list):
- Returning a pointer to a local variable
- Storing a pointer in a global variable
- Sending a pointer through a channel
- Closures capturing local variables
- Interface conversions (sometimes)
- Slices that grow beyond initial capacity
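For instance, a closure that captures a local variable and outlives its function forces that variable onto the heap. A minimal sketch (the exact diagnostic text can vary between Go versions):
// Build with: go build -gcflags="-m" main.go
func counter() func() int {
    n := 0 // captured by the returned closure, so it escapes to the heap
    return func() int {
        n++
        return n
    }
}
For this function, the compiler reports something like moved to heap: n.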
Game Engineers: The Memory Wizards
Game developers operate under brutal constraints: 16ms per frame (60 FPS), fixed memory budgets, and zero tolerance for garbage collection pauses. Here's how they think differently:
1. Object Pools
Instead of allocating and freeing bullets every time someone shoots:
// The naive way (allocation every shot)
func shoot() *Bullet {
    return &Bullet{x: playerX, y: playerY, active: true}
}

// The game dev way (pre-allocated pool)
type BulletPool struct {
    bullets [1000]Bullet // Pre-allocated array
    active  [1000]bool   // Track which are in use
}

func (p *BulletPool) acquire() *Bullet {
    for i := range p.active {
        if !p.active[i] {
            p.active[i] = true
            return &p.bullets[i]
        }
    }
    return nil // Pool exhausted
}

func (p *BulletPool) release(b *Bullet) {
    idx := (uintptr(unsafe.Pointer(b)) - uintptr(unsafe.Pointer(&p.bullets[0]))) / unsafe.Sizeof(Bullet{})
    p.active[idx] = false
}
Zero allocations during gameplay. The pool is allocated once at startup.
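Usage during gameplay might look roughly like this; onShoot and onBulletExpired are hypothetical hooks, and the Bullet fields mirror the naive example above:
var pool BulletPool // allocated once, e.g. at game startup

func onShoot(x, y float32) {
    if b := pool.acquire(); b != nil {
        b.x, b.y = x, y
        b.active = true
    }
    // if the pool is exhausted, the shot is simply dropped this frame
}

func onBulletExpired(b *Bullet) {
    pool.release(b) // hand the slot back: no allocation, no GC
}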
2. Data-Oriented Design
Traditional OOP scatters data across memory:
// Object-Oriented: Each entity is its own heap allocation
type Entity struct {
    Position Vector3  // 24 bytes
    Velocity Vector3  // 24 bytes
    Health   int      // 8 bytes
    Sprite   *Texture // 8 bytes
    // ... 64 bytes per entity, scattered across heap
}

entities := make([]*Entity, 10000) // 10,000 pointers to random heap locations
When you iterate over entities to update positions, the CPU cache misses constantly because each entity lives at a different memory location.
Data-oriented design groups similar data together:
// Data-Oriented: Group by access pattern
type Positions struct {
    X [10000]float32 // 40KB contiguous
    Y [10000]float32 // 40KB contiguous
    Z [10000]float32 // 40KB contiguous
}

type Velocities struct {
    X [10000]float32
    Y [10000]float32
    Z [10000]float32
}

func updatePositions(pos *Positions, vel *Velocities, dt float32) {
    for i := 0; i < 10000; i++ {
        pos.X[i] += vel.X[i] * dt // Sequential memory access = cache heaven
        pos.Y[i] += vel.Y[i] * dt
        pos.Z[i] += vel.Z[i] * dt
    }
}
This can be 10-50x faster due to cache efficiency. The CPU prefetcher loves sequential access patterns.
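If you want to measure the effect on your own hardware, a rough benchmark sketch could look like the one below. Vec3, entity, and the sizes are stand-ins, and the actual speedup depends heavily on the CPU, the working-set size, and how scattered the entities really are:
// layout_bench_test.go: a self-contained micro-benchmark of the two layouts.
package layout

import (
    "math/rand"
    "testing"
)

const n = 10000

type Vec3 struct{ X, Y, Z float32 }

type entity struct {
    Pos, Vel Vec3
    Health   int
}

func BenchmarkPointerChasing(b *testing.B) {
    ents := make([]*entity, n)
    for i := range ents {
        ents[i] = &entity{Vel: Vec3{X: rand.Float32()}}
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        for _, e := range ents {
            e.Pos.X += e.Vel.X * 0.016 // a pointer dereference per entity
        }
    }
}

func BenchmarkContiguous(b *testing.B) {
    var posX, velX [n]float32
    for i := range velX {
        velX[i] = rand.Float32()
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        for j := 0; j < n; j++ {
            posX[j] += velX[j] * 0.016 // sequential access over one array
        }
    }
}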
3. Arena Allocators
For temporary per-frame allocations, game engines use arena (or bump) allocators:
type Arena struct {
    buffer []byte
    offset int
}

func NewArena(size int) *Arena {
    return &Arena{buffer: make([]byte, size)}
}

func (a *Arena) Alloc(size int) unsafe.Pointer {
    // Align to 8 bytes
    aligned := (a.offset + 7) &^ 7
    if aligned+size > len(a.buffer) {
        panic("arena exhausted")
    }
    ptr := unsafe.Pointer(&a.buffer[aligned])
    a.offset = aligned + size
    return ptr
}

func (a *Arena) Reset() {
    a.offset = 0 // "Free" everything instantly
}
At the start of each frame, reset the arena. All per-frame allocations are instant (just bump a pointer), and cleanup is instant (just reset the offset). No GC involved.
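Per-frame usage might look roughly like this. Particle, frameArena, and runFrame are illustrative names; note that the struct holds no Go pointers, which matters because the arena's backing []byte is not scanned for pointers by the GC:
type Particle struct {
    X, Y, Life float32
}

var frameArena = NewArena(1 << 20) // 1MB of scratch space, allocated once

func runFrame() {
    frameArena.Reset() // all of last frame's scratch data is "freed" in one step

    // Grab scratch space for a particle without touching the heap.
    p := (*Particle)(frameArena.Alloc(int(unsafe.Sizeof(Particle{}))))
    p.X, p.Y, p.Life = 0, 0, 1.0

    // ... simulate, render, and forget about p; the next Reset reclaims it.
}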
4. Memory Layout Awareness
Consider cache line size (typically 64 bytes):
// Bad: False sharing in concurrent code
type Counters struct {
    a int64 // These might be on the same cache line
    b int64 // Different goroutines writing = cache thrashing
}

// Good: Padded to separate cache lines
type Counters struct {
    a int64
    _ [56]byte // Padding to fill cache line
    b int64
    _ [56]byte
}
Streaming SAP Data: The Real World Problem
Now let's talk about what actually prompted this post. At work, we're pulling data from SAP interfaces, specifically OData services that expose entities and their properties. A single entity like SalesOrder might have nested LineItems, each with Materials, PricingConditions, and so on. Multiply that by hundreds of thousands of records, and you're looking at gigabytes of hierarchical data.
The naive approach kills you:
// This will eat your RAM alive
type SalesOrder struct {
    OrderID     string
    Customer    Customer
    TotalAmount float64
    LineItems   []LineItem     // Could be hundreds per order
    Properties  map[string]any // SAP loves dynamic properties
}

func loadAllOrders() []SalesOrder {
    resp, _ := http.Get("https://sap-gateway/odata/SalesOrders?$expand=LineItems,Customer")
    body, _ := io.ReadAll(resp.Body) // ENTIRE response in memory
    var result struct {
        Value []SalesOrder `json:"value"`
    }
    json.Unmarshal(body, &result) // Now we have TWO copies in memory
    return result.Value
}
With 100,000 orders averaging 50KB each, you're looking at 5GB just for the raw data, then another 5GB+ when deserialized into Go structs. Your 500MB limit is laughing at you.
Stream Processing Pattern
The key insight is: you rarely need all the data at once. Process each entity as it arrives, then discard it.
// Stream and process one entity at a time
func streamOrders(process func(SalesOrder) error) error {
    resp, err := http.Get("https://sap-gateway/odata/SalesOrders?$expand=LineItems")
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    decoder := json.NewDecoder(resp.Body)

    // Navigate to the "value" array
    // OData responses look like: {"@odata.context": "...", "value": [...]}
    for {
        token, err := decoder.Token()
        if err != nil {
            return err
        }
        if key, ok := token.(string); ok && key == "value" {
            break
        }
    }

    // Consume the opening bracket of the array
    decoder.Token() // '['

    // Stream each entity
    for decoder.More() {
        var order SalesOrder
        if err := decoder.Decode(&order); err != nil {
            return err
        }
        if err := process(order); err != nil {
            return err
        }
        // order goes out of scope here, eligible for GC
    }
    return nil
}

// Usage
func main() {
    var totalValue float64
    var orderCount int

    streamOrders(func(order SalesOrder) error {
        totalValue += order.TotalAmount
        orderCount++
        // Maybe write to database, send to queue, etc.
        return nil
    })

    fmt.Printf("Processed %d orders, total: %.2f\n", orderCount, totalValue)
}
Memory usage is now constant regardless of dataset size. You're only ever holding one SalesOrder in memory at a time.
Handling SAP's Nested Entities
SAP OData gets tricky with deep nesting. An expanded query might return:
{
    "OrderID": "12345",
    "LineItems": [
        {
            "ItemID": "001",
            "Material": {
                "MaterialID": "MAT-001",
                "Description": "Widget",
                "PricingConditions": [
                    {"ConditionType": "PR00", "Amount": 99.99},
                    {"ConditionType": "DISC", "Amount": -10.00}
                ]
            }
        }
    ]
}
If LineItems can have thousands of entries, even streaming one order at a time isn't enough. You need to stream the nested arrays too:
// For truly massive nested structures, use a SAX-style approach
type StreamingOrderProcessor struct {
    currentOrder    *SalesOrder
    currentLineItem *LineItem
    onLineItem      func(*LineItem) error
    onOrderComplete func(*SalesOrder) error
}

func (p *StreamingOrderProcessor) ProcessToken(token json.Token, decoder *json.Decoder) error {
    // State machine that processes tokens one at a time
    // Only materializes one LineItem at a time
    // Calls onLineItem callback as each completes
    // ...
}
This is more complex, but it means you can process a single order with 10,000 line items without ever holding all 10,000 in memory.
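Before reaching for a full token state machine, you can often reuse the same json.Decoder trick one level down: walk to the "LineItems" key and decode one LineItem at a time. A sketch, assuming the decoder is already positioned inside a single order object and that a LineItem type exists:
// streamLineItems invokes onItem for each element of the order's "LineItems"
// array without ever materializing the whole slice.
// Note: this sketch skips the order's scalar fields; a real version would capture them too.
func streamLineItems(dec *json.Decoder, onItem func(LineItem) error) error {
    // Walk tokens until we hit the "LineItems" key.
    for {
        tok, err := dec.Token()
        if err != nil {
            return err
        }
        if key, ok := tok.(string); ok && key == "LineItems" {
            break
        }
    }
    if _, err := dec.Token(); err != nil { // consume the opening '['
        return err
    }
    for dec.More() {
        var item LineItem
        if err := dec.Decode(&item); err != nil {
            return err
        }
        if err := onItem(item); err != nil {
            return err
        }
    }
    _, err := dec.Token() // consume the closing ']'
    return err
}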
Pagination: Let SAP Do the Work
SAP OData supports server-side pagination. Use it:
func streamWithPagination(baseURL string, process func(SalesOrder) error) error {
    nextURL := baseURL + "?$top=1000" // Fetch 1000 at a time

    for nextURL != "" {
        resp, err := http.Get(nextURL)
        if err != nil {
            return err
        }

        var page struct {
            Value    []SalesOrder `json:"value"`
            NextLink string       `json:"@odata.nextLink"`
        }

        // Even here, stream the response body
        decoder := json.NewDecoder(resp.Body)
        decoder.Decode(&page)
        resp.Body.Close()

        for _, order := range page.Value {
            if err := process(order); err != nil {
                return err
            }
        }

        // Clear the slice to help GC
        page.Value = nil
        nextURL = page.NextLink // Empty string when done
    }
    return nil
}
Now you're holding at most 1000 orders in memory at once. Combined with the streaming decoder, you can process millions of records within your 500MB budget.
Reusing Buffers for SAP Properties
SAP entities often have dynamic properties that come as map[string]interface{}. These maps allocate heavily. If you're processing similar entities repeatedly, reuse the map:
type EntityProcessor struct {
    // Reusable buffer for properties
    propBuffer map[string]interface{}
}

func (p *EntityProcessor) Process(decoder *json.Decoder) error {
    // Clear and reuse instead of allocating new
    for k := range p.propBuffer {
        delete(p.propBuffer, k)
    }

    // Decode into existing map
    if err := decoder.Decode(&p.propBuffer); err != nil {
        return err
    }

    // Process properties...
    return nil
}
The Delta Pattern
For ongoing synchronization, don't re-fetch everything. SAP OData supports delta queries:
// Initial load
resp, _ := http.Get("/odata/SalesOrders?$deltatoken=start")
// Response includes a delta link
// {"value": [...], "@odata.deltaLink": "/odata/SalesOrders?$deltatoken=abc123"}
// Subsequent calls only return changes
resp, _ = http.Get("/odata/SalesOrders?$deltatoken=abc123")
// Returns only created, modified, or deleted entities since last call
Instead of streaming 100,000 entities every sync, you're streaming maybe 500 changes. Memory pressure drops dramatically.
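Wrapped into a helper, a sync pass might look roughly like this. syncChanges is a hypothetical wrapper, and the field names follow the OData response shape shown above, so verify them against your gateway:
// syncChanges fetches everything behind deltaLink, processes each changed
// order, and returns the next delta link to store for the following run.
func syncChanges(deltaLink string, process func(SalesOrder) error) (string, error) {
    resp, err := http.Get(deltaLink)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    var page struct {
        Value     []SalesOrder `json:"value"`
        DeltaLink string       `json:"@odata.deltaLink"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&page); err != nil {
        return "", err
    }

    for _, order := range page.Value {
        if err := process(order); err != nil {
            return "", err
        }
    }
    return page.DeltaLink, nil // persist this for the next sync run
}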
Practical Tips for Your Go Code
After dealing with the 500MB constraint at work, here's what I've learned:
1. Reuse Slices and Buffers
// Bad: New allocation every call
func process(data []byte) []byte {
    result := make([]byte, len(data))
    // ... process
    return result
}

// Good: Reuse buffer
func process(data []byte, buf []byte) []byte {
    buf = buf[:0] // Reset length, keep capacity
    // ... process into buf
    return buf
}
2. Use sync.Pool for Frequent Allocations
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processRequest() {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    // Use buf...
}
3. Preallocate with Known Capacity
// Bad: Multiple reallocations as slice grows
result := []string{}
for _, item := range items {
    result = append(result, process(item))
}

// Good: Single allocation
result := make([]string, 0, len(items))
for _, item := range items {
    result = append(result, process(item))
}
4. Stream Instead of Buffer
// Bad: Load entire file into memory
data, _ := os.ReadFile("huge.json")
var result MyStruct
json.Unmarshal(data, &result)
// Good: Stream the data
file, _ := os.Open("huge.json")
decoder := json.NewDecoder(file)
var result MyStruct
decoder.Decode(&result)
5. Profile Before Optimizing
go test -bench=. -benchmem
go tool pprof -alloc_space profile.out
The -benchmem flag shows allocations per operation. Often the biggest wins come from eliminating allocations you didn't even know were happening.
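For example, a benchmark along these lines (reusing the SalesOrder type from earlier) makes the B/op and allocs/op columns visible, and those two numbers are usually where the biggest wins hide:
func BenchmarkDecodeOrder(b *testing.B) {
    data := []byte(`{"OrderID": "12345"}`)
    b.ReportAllocs() // same idea as -benchmem, but always on for this benchmark
    for i := 0; i < b.N; i++ {
        var order SalesOrder
        if err := json.Unmarshal(data, &order); err != nil {
            b.Fatal(err)
        }
    }
}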
Conclusion
Memory isn't just "space where data lives"; it's a complex landscape of trade-offs between speed, flexibility, and resource usage. The stack gives you blazing speed but limited flexibility. The heap gives you flexibility but at a performance cost. Game engineers have spent decades learning to work within these constraints, and their techniques (object pools, data-oriented design, arena allocators) are increasingly relevant as we build more performance-sensitive applications.
Next time you write make() or new(), take a moment to think about where that memory is coming from, how it's being managed, and whether there's a more efficient way to accomplish your goal. Your future self (and your server's RAM) will thank you.