A Fun Way to Imagine Memory Management

AbdulHafeez AbdulRaheem
Introduction

At my workplace I'm currently dealing with memory constraints I hadn't given much thought to before. We're streaming gigabytes of data in various formats (Proto, JSON, XML) from multiple data sources, with a strict requirement to keep the server's RAM usage below 500MB during processing. This limitation forces us to think about memory management with every code change and every new feature.

In this post I want to show how I visualize memory, or at least how I personally conceptualize it. I plan to cover some basic concepts like stack and heap memory, along with what happens behind the scenes when you allocate to one versus the other. I'll also discuss how game engineers, known for their efficiency, achieve impressive performance with minimal memory, often relying on low-level languages but also on clever memory optimization techniques.

The RAM

If you have a computer science background, you know that RAM acts like a miniature brain: it stores temporary information we need to access quickly. But here's how I like to think about it:

Imagine RAM as a massive warehouse with numbered shelves.

Each shelf has an address (like 0x7ffd5e8c3a40), and each shelf can hold exactly one byte. When your program needs to store something, it's essentially asking the warehouse manager: "Hey, I need 8 shelves next to each other to store this number." The manager finds a spot, reserves those shelves, and hands you the address of the first one.
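
To make the shelf analogy concrete, here's a minimal Go sketch (Go is what I'll use for all the examples) that prints a variable's address, i.e. the number of the first shelf its eight bytes occupy. The exact address will differ on every run:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var n int64 = 42
    // &n is the address of the first "shelf"; an int64 spans 8 consecutive shelves.
    fmt.Printf("address: %p, size: %d bytes\n", &n, unsafe.Sizeof(n))
}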

The beautiful thing about RAM is that it doesn't matter which shelf you access: shelf #1 or shelf #1,000,000 takes the same amount of time. That's the "Random Access" part. Unlike a hard drive (which is more like a vinyl record that needs to spin to the right track), RAM gives you instant access to any location.

But here's the catch: this warehouse has two very different sections with completely different management styles.

The Stack: Your Organized Desk

The stack is like a perfectly organized desk where you pile papers on top of each other. Simple rules:

  1. You can only add to the top (push)
  2. You can only remove from the top (pop)
  3. You always know exactly how tall the pile is
func calculateSum() int {
    a := 10       // Push 'a' onto the stack
    b := 20       // Push 'b' onto the stack
    sum := a + b  // Push 'sum' onto the stack
    return sum    // Pop everything when function returns
}

When calculateSum() is called, the program reserves a chunk of stack space called a stack frame. It looks something like this:

┌─────────────────────┐ ← Stack Pointer (top)
│     sum = 30        │
├─────────────────────┤
│     b = 20          │
├─────────────────────┤
│     a = 10          │
├─────────────────────┤
│   return address    │ ← Where to go after function ends
├─────────────────────┤
│  previous frame...  │
└─────────────────────┘

The moment the function returns, the stack pointer simply moves back down. We don't even need to "clean up" the data, we just pretend it doesn't exist anymore. The next function call will overwrite it.

Why is this fast?

  • No searching: We always know where the top is
  • No fragmentation: Everything is contiguous
  • No bookkeeping: Just move a pointer up or down
  • CPU cache friendly: Sequential memory access patterns

The stack allocation is literally just:

stack_pointer -= size_needed
return stack_pointer

That's it. One subtraction. Compare that to heap allocation, which we'll get to shortly.

The limitation? The stack has a fixed size (usually 1-8MB per thread), and everything on it must have a known size at compile time. You can't say "give me an array of unknown length" on the stack.
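
A quick illustration of that rule, using two hypothetical functions (you can check the compiler's actual decisions with go build -gcflags="-m", which we'll get to later):

// Size known at compile time: the array can live in this function's stack frame.
func fixedBuffer() byte {
    var buf [64]byte
    buf[0] = 1
    return buf[0]
}

// Size only known at runtime: the compiler typically moves the backing
// array to the heap because it can't reserve stack space for it up front.
func dynamicBuffer(n int) int {
    buf := make([]byte, n)
    return len(buf)
}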

The Heap: The Chaotic Storage Unit

If the stack is your organized desk, the heap is a massive storage unit where you rent space of various sizes, and nobody's cleaning up after you.

func createUser() *User {
    user := &User{     // Allocated on the HEAP
        Name: "John",
        Age:  30,
    }
    return user        // Pointer escapes the function
}

Why does user go on the heap here? Because we're returning a pointer to it. If it lived on the stack, the function's stack frame would be "popped" when it returns, and we'd be pointing to garbage memory. The Go compiler detects this through escape analysis and automatically moves it to the heap.

Here's what the heap looks like after several allocations and deallocations:

┌────────┬────────┬─────┬────────┬─────┬────────────┐
│  Used  │  Free  │Used │  Free  │Used │    Free    │
│  64B   │  128B  │ 32B │  64B   │256B │   512B     │
└────────┴────────┴─────┴────────┴─────┴────────────┘

See those gaps? That's fragmentation. You might have 704 bytes free in total, but if you need a contiguous 200-byte block, you can't use the smaller gaps. The heap allocator has to:

  1. Search for a suitable free block
  2. Potentially split a larger block
  3. Update bookkeeping metadata
  4. Handle thread synchronization (locks!)
  5. Later: merge adjacent free blocks

This is why heap allocation can be 10-100x slower than stack allocation.
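
To make those five steps tangible, here's a deliberately tiny first-fit free-list allocator. It's nothing like a production allocator (no locking, no coalescing, linear search), but it shows the bookkeeping every heap allocation pays for, and how fragmentation falls out of it:

// block is one region of our toy heap: where it starts, how big it is,
// and whether it's currently free.
type block struct {
    offset, size int
    free         bool
}

// toyHeap manages a single fixed-size region with a first-fit strategy.
type toyHeap struct {
    blocks []block
}

func newToyHeap(size int) *toyHeap {
    return &toyHeap{blocks: []block{{offset: 0, size: size, free: true}}}
}

// alloc returns the offset of a reserved region, or -1 if no single
// free block is big enough — even when plenty of total memory is free.
func (h *toyHeap) alloc(size int) int {
    for i := range h.blocks {
        b := &h.blocks[i]
        if !b.free || b.size < size { // 1. search for a suitable free block
            continue
        }
        if b.size > size { // 2. split the larger block
            rest := block{offset: b.offset + size, size: b.size - size, free: true}
            h.blocks = append(h.blocks, block{})
            copy(h.blocks[i+2:], h.blocks[i+1:])
            h.blocks[i+1] = rest
            b = &h.blocks[i]
        }
        b.size = size
        b.free = false // 3. update bookkeeping metadata
        return b.offset
    }
    return -1
}

// free marks a region as available again. A real allocator would also
// merge adjacent free blocks (step 5) and guard all of this with locks (step 4).
func (h *toyHeap) free(offset int) {
    for i := range h.blocks {
        if h.blocks[i].offset == offset {
            h.blocks[i].free = true
            return
        }
    }
}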

The Allocator's Nightmare

Let's peek behind the curtain at what happens when you write make([]byte, 1000) in Go:

Your Code                 Runtime                    Operating System
    │                        │                              │
    ▼                        │                              │
make([]byte, 1000)           │                              │
    │                        │                              │
    └───────────────────────▶│                              │
                         mallocgc()                         │
                             ▼                              │
                      Check size class                      │
                      (1000 bytes → 1024-byte class)        │
                             ▼                              │
                      Check thread cache                    │
                      (mcache)                              │
                   ┌─────────┴─────────┐                    │
                   ▼                   ▼                    │
               Cache hit          Cache miss                │
              (fast path)        (slow path)                │
                   │                   ▼                    │
                   │            Check central               │
                   │            cache (mcentral)            │
                   │                   ▼                    │
                   │             Still empty?               │
                   │                   ▼                    │
                   │           Request from mheap           │
                   │                   │                    │
                   │                   └───────────────────▶│
                   │                                  mmap() syscall
                   │                                  (ask OS for pages)
                   │                                        │
                   ▼                                        ▼
               Return pointer ◀─────────────────────────────┘
Go's allocator uses size classes to reduce fragmentation. Instead of allocating exactly what you ask for, it rounds up to the nearest size class (8, 16, 24, 32, 48, 64, 80... bytes). This means a 1000-byte request actually gets 1024 bytes, but it massively simplifies the allocator's job.
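
The rounding itself is trivial. Here's a sketch with a made-up, shortened class table; Go's real one lives in runtime/sizeclasses.go and has several dozen entries:

// An illustrative (not Go's actual) size-class table.
var sizeClasses = []int{8, 16, 24, 32, 48, 64, 80, 96, 128, 256, 512, 1024}

// roundToClass returns the smallest class that fits the request,
// so a 1000-byte request comes back as 1024.
func roundToClass(n int) int {
    for _, c := range sizeClasses {
        if n <= c {
            return c
        }
    }
    return n // very large requests are handled separately as "large objects"
}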

Escape Analysis: Where Does This Live?

Go's compiler decides at compile time whether something goes on the stack or heap. You can actually see these decisions:

go build -gcflags="-m" main.go

func stackAlloc() int {
    x := 42      // stays on stack
    return x
}

func heapAlloc() *int {
    x := 42      // escapes to heap!
    return &x    // because we return a pointer
}

func sliceGrowth() {
    s := make([]int, 0, 10)   // might stay on stack (small, known size)
    s = append(s, 1, 2, 3)    // still stack if it fits

    s2 := make([]int, 0)      // escapes! (unknown final size)
    for i := 0; i < 1000; i++ {
        s2 = append(s2, i)    // definitely heap
    }
}

Common escape triggers:

  • Returning a pointer to a local variable
  • Storing a pointer in a global variable
  • Sending a pointer through a channel
  • Closures capturing local variables
  • Interface conversions (sometimes)
  • Slices that grow beyond initial capacity
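
Here are two of these triggers in miniature; running go build -gcflags="-m" on them should report both locals escaping to the heap:

var global *int

// Storing a pointer in a global: x has to outlive the call, so it escapes.
func storeGlobal() {
    x := 1
    global = &x
}

// A closure capturing a local: counter must survive after makeCounter
// returns, because the returned function still uses it.
func makeCounter() func() int {
    counter := 0
    return func() int {
        counter++
        return counter
    }
}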

Game Engineers: The Memory Wizards

Game developers operate under brutal constraints: 16ms per frame (60 FPS), fixed memory budgets, and zero tolerance for garbage collection pauses. Here's how they think differently:

1. Object Pools

Instead of allocating and freeing bullets every time someone shoots:

// The naive way (allocation every shot)
func shoot() *Bullet {
    return &Bullet{x: playerX, y: playerY, active: true}
}

// The game dev way (pre-allocated pool)
type BulletPool struct {
    bullets [1000]Bullet  // Pre-allocated array
    active  [1000]bool    // Track which are in use
}

func (p *BulletPool) acquire() *Bullet {
    for i := range p.active {
        if !p.active[i] {
            p.active[i] = true
            return &p.bullets[i]
        }
    }
    return nil  // Pool exhausted
}

func (p *BulletPool) release(b *Bullet) {
    idx := (uintptr(unsafe.Pointer(b)) - uintptr(unsafe.Pointer(&p.bullets[0]))) / unsafe.Sizeof(Bullet{})
    p.active[idx] = false
}

Zero allocations during gameplay. The pool is allocated once at startup.
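
Usage is then just a matter of grabbing and handing back slots. In this sketch, playerX and playerY are placeholders and the Bullet coordinates are assumed to be float64, as in the naive version above:

var pool BulletPool // allocated once at startup, reused for the whole session

func shoot(playerX, playerY float64) {
    b := pool.acquire()
    if b == nil {
        return // pool exhausted; a game might simply drop the shot
    }
    b.x, b.y, b.active = playerX, playerY, true
}

func onBulletHit(b *Bullet) {
    pool.release(b) // the slot is immediately reusable; nothing is freed
}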

2. Data-Oriented Design

Traditional OOP scatters data across memory:

// Object-Oriented: Each entity is its own heap allocation
type Entity struct {
    Position  Vector3   // 24 bytes
    Velocity  Vector3   // 24 bytes
    Health    int       // 8 bytes
    Sprite    *Texture  // 8 bytes
    // ... 64 bytes per entity, scattered across heap
}
entities := make([]*Entity, 10000)  // 10,000 pointers to random heap locations

When you iterate through entities to update positions, the CPU cache is constantly missing because each entity lives in a different memory location.

Data-oriented design groups similar data together:

// Data-Oriented: Group by access pattern
type Positions struct {
    X [10000]float32  // 40KB contiguous
    Y [10000]float32  // 40KB contiguous
    Z [10000]float32  // 40KB contiguous
}

type Velocities struct {
    X [10000]float32
    Y [10000]float32
    Z [10000]float32
}

func updatePositions(pos *Positions, vel *Velocities, dt float32) {
    for i := 0; i < 10000; i++ {
        pos.X[i] += vel.X[i] * dt  // Sequential memory access = cache heaven
        pos.Y[i] += vel.Y[i] * dt
        pos.Z[i] += vel.Z[i] * dt
    }
}

This can be 10-50x faster due to cache efficiency. The CPU prefetcher loves sequential access patterns.

3. Arena Allocators

For temporary per-frame allocations, game engines use arena (or bump) allocators:

type Arena struct {
    buffer []byte
    offset int
}

func NewArena(size int) *Arena {
    return &Arena{buffer: make([]byte, size)}
}

func (a *Arena) Alloc(size int) unsafe.Pointer {
    // Align to 8 bytes
    aligned := (a.offset + 7) &^ 7
    if aligned+size > len(a.buffer) {
        panic("arena exhausted")
    }
    ptr := unsafe.Pointer(&a.buffer[aligned])
    a.offset = aligned + size
    return ptr
}

func (a *Arena) Reset() {
    a.offset = 0  // "Free" everything instantly
}

At the start of each frame, reset the arena. All per-frame allocations are instant (just bump a pointer), and cleanup is instant (just reset the offset). No GC involved.
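
A typical frame loop then looks roughly like this, assuming a Vector3 struct with three float64 fields (X, Y, Z) and treating running and the simulate/render steps as placeholders:

frameArena := NewArena(1 << 20) // 1MB of scratch space, allocated once

for running {
    frameArena.Reset() // last frame's temporaries vanish instantly

    // Per-frame temporaries come out of the arena instead of the GC heap.
    v := (*Vector3)(frameArena.Alloc(int(unsafe.Sizeof(Vector3{}))))
    v.X, v.Y, v.Z = 1, 2, 3

    // ... simulate, render, repeat ...
}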

4. Memory Layout Awareness

Consider cache line size (typically 64 bytes):

// Bad: False sharing in concurrent code
type Counters struct {
    a int64  // These might be on the same cache line
    b int64  // Different goroutines writing = cache thrashing
}

// Good: Padded to separate cache lines
type Counters struct {
    a int64
    _ [56]byte  // Padding to fill cache line
    b int64
    _ [56]byte
}

Streaming SAP Data: The Real-World Problem

Now let's talk about what actually prompted this post. At work, we're pulling data from SAP interfaces, specifically OData services that expose entities and their properties. A single entity like SalesOrder might have nested LineItems, each with Materials, PricingConditions, and so on. Multiply that by hundreds of thousands of records, and you're looking at gigabytes of hierarchical data.

The naive approach kills you:

// This will eat your RAM alive
type SalesOrder struct {
    OrderID    string
    Customer   Customer
    LineItems  []LineItem       // Could be hundreds per order
    Properties map[string]any   // SAP loves dynamic properties
}

func loadAllOrders() []SalesOrder {
    resp, _ := http.Get("https://sap-gateway/odata/SalesOrders?$expand=LineItems,Customer")
    body, _ := ioutil.ReadAll(resp.Body)   // ENTIRE response in memory

    var result struct {
        Value []SalesOrder `json:"value"`
    }
    json.Unmarshal(body, &result)   // Now we have TWO copies in memory
    return result.Value
}

With 100,000 orders averaging 50KB each, you're looking at 5GB just for the raw data, then another 5GB+ when deserialized into Go structs. Your 500MB limit is laughing at you.

Stream Processing Pattern

The key insight is that you rarely need all the data at once. Process each entity as it arrives, then discard it.

// Stream and process one entity at a time
func streamOrders(process func(SalesOrder) error) error {
    resp, err := http.Get("https://sap-gateway/odata/SalesOrders?$expand=LineItems")
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    decoder := json.NewDecoder(resp.Body)

    // Navigate to the "value" array
    // OData responses look like: {"@odata.context": "...", "value": [...]}
    for {
        token, err := decoder.Token()
        if err != nil {
            return err
        }
        if key, ok := token.(string); ok && key == "value" {
            break
        }
    }

    // Consume the opening bracket of the array
    decoder.Token()  // '['

    // Stream each entity
    for decoder.More() {
        var order SalesOrder
        if err := decoder.Decode(&order); err != nil {
            return err
        }

        if err := process(order); err != nil {
            return err
        }
        // order goes out of scope here, eligible for GC
    }

    return nil
}

// Usage
func main() {
    var totalValue float64
    var orderCount int

    streamOrders(func(order SalesOrder) error {
        totalValue += order.TotalAmount
        orderCount++

        // Maybe write to database, send to queue, etc.
        return nil
    })

    fmt.Printf("Processed %d orders, total: %.2f\n", orderCount, totalValue)
}

Memory usage is now constant regardless of dataset size. You're only ever holding one SalesOrder in memory at a time.

Handling SAP's Nested Entities

SAP OData gets tricky with deep nesting. An expanded query might return:

{
    "OrderID": "12345",
    "LineItems": [
        {
            "ItemID": "001",
            "Material": {
                "MaterialID": "MAT-001",
                "Description": "Widget",
                "PricingConditions": [
                    {"ConditionType": "PR00", "Amount": 99.99},
                    {"ConditionType": "DISC", "Amount": -10.00}
                ]
            }
        }
    ]
}

If LineItems can have thousands of entries, even streaming one order at a time isn't enough. You need to stream the nested arrays too:

// For truly massive nested structures, use a SAX-style approach
type StreamingOrderProcessor struct {
    currentOrder    *SalesOrder
    currentLineItem *LineItem
    onLineItem      func(*LineItem) error
    onOrderComplete func(*SalesOrder) error
}

func (p *StreamingOrderProcessor) ProcessToken(token json.Token, decoder *json.Decoder) error {
    // State machine that processes tokens one at a time
    // Only materializes one LineItem at a time
    // Calls onLineItem callback as each completes
    // ...
}

This is more complex, but it means you can process a single order with 10,000 line items without ever holding all 10,000 in memory.
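
A rough sketch of that idea for a single order: walk the order object's keys with the decoder, and when you reach LineItems, decode its elements one by one instead of as a whole slice. Property names match the JSON above, error handling is kept minimal, and note that the RawMessage used to skip other properties still buffers their raw bytes:

// decodeOrderStreaming consumes one order object from dec, invoking
// onLineItem for each line item without ever holding the full slice.
func decodeOrderStreaming(dec *json.Decoder, onLineItem func(LineItem) error) error {
    if _, err := dec.Token(); err != nil { // consume the order's '{'
        return err
    }
    for dec.More() {
        keyTok, err := dec.Token() // property name
        if err != nil {
            return err
        }
        if key, _ := keyTok.(string); key == "LineItems" {
            if _, err := dec.Token(); err != nil { // consume '['
                return err
            }
            for dec.More() {
                var item LineItem
                if err := dec.Decode(&item); err != nil {
                    return err
                }
                if err := onLineItem(item); err != nil {
                    return err
                }
            }
            if _, err := dec.Token(); err != nil { // consume ']'
                return err
            }
            continue
        }
        // Skip any other property's value (still buffered as raw bytes).
        var skip json.RawMessage
        if err := dec.Decode(&skip); err != nil {
            return err
        }
    }
    _, err := dec.Token() // consume the order's '}'
    return err
}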

Pagination: Let SAP Do the Work

SAP OData supports server-side pagination. Use it:

func streamWithPagination(baseURL string, process func(SalesOrder) error) error {
    nextURL := baseURL + "?$top=1000"   // Fetch 1000 at a time

    for nextURL != "" {
        resp, err := http.Get(nextURL)
        if err != nil {
            return err
        }

        var page struct {
            Value    []SalesOrder `json:"value"`
            NextLink string       `json:"@odata.nextLink"`
        }

        // Even here, stream the response body
        decoder := json.NewDecoder(resp.Body)
        decoder.Decode(&page)
        resp.Body.Close()

        for _, order := range page.Value {
            if err := process(order); err != nil {
                return err
            }
        }

        // Clear the slice to help GC
        page.Value = nil

        nextURL = page.NextLink   // Empty string when done
    }

    return nil
}

Now you're holding at most 1000 orders in memory at once. Combined with the streaming decoder, you can process millions of records within your 500MB budget.

Reusing Buffers for SAP Properties

SAP entities often have dynamic properties that come as map[string]interface{}. These maps allocate heavily. If you're processing similar entities repeatedly, reuse the map:

type EntityProcessor struct {
    // Reusable buffer for properties
    propBuffer map[string]interface{}
}

func (p *EntityProcessor) Process(decoder *json.Decoder) error {
    // Clear and reuse instead of allocating new
    for k := range p.propBuffer {
        delete(p.propBuffer, k)
    }

    // Decode into existing map
    if err := decoder.Decode(&p.propBuffer); err != nil {
        return err
    }

    // Process properties...
    return nil
}

The Delta Pattern

For ongoing synchronization, don't re-fetch everything. SAP OData supports delta queries:

// Initial load
resp, _ := http.Get("/odata/SalesOrders?$deltatoken=start")

// Response includes a delta link
// {"value": [...], "@odata.deltaLink": "/odata/SalesOrders?$deltatoken=abc123"}

// Subsequent calls only return changes
resp, _ = http.Get("/odata/SalesOrders?$deltatoken=abc123")
// Returns only created, modified, or deleted entities since last call

Instead of streaming 100,000 entities every sync, you're streaming maybe 500 changes. Memory pressure drops dramatically.

Practical Tips for Your Go Code

After dealing with the 500MB constraint at work, here's what I've learned:

1. Reuse Slices and Buffers

// Bad: New allocation every call
func process(data []byte) []byte {
    result := make([]byte, len(data))
    // ... process
    return result
}

// Good: Reuse buffer
func process(data []byte, buf []byte) []byte {
    buf = buf[:0]  // Reset length, keep capacity
    // ... process into buf
    return buf
}

2. Use sync.Pool for Frequent Allocations

var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processRequest() {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)

    // Use buf...
}

3. Preallocate with Known Capacity

// Bad: Multiple reallocations as slice grows
result := []string{}
for _, item := range items {
    result = append(result, process(item))
}

// Good: Single allocation
result := make([]string, 0, len(items))
for _, item := range items {
    result = append(result, process(item))
}

4. Stream Instead of Buffer

// Bad: Load entire file into memory
data, _ := ioutil.ReadFile("huge.json")
var result MyStruct
json.Unmarshal(data, &result)

// Good: Stream the data
file, _ := os.Open("huge.json")
decoder := json.NewDecoder(file)
var result MyStruct
decoder.Decode(&result)

5. Profile Before Optimizing

go test -bench=. -benchmem
go tool pprof -alloc_space profile.out

The -benchmem flag shows allocations per operation. Often the biggest wins come from eliminating allocations you didn't even know were happening.
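
As a sketch, a benchmark for a hypothetical processOrder function looks like this (it lives in a _test.go file; b.ReportAllocs makes the allocation columns show up even without -benchmem):

func BenchmarkProcessOrder(b *testing.B) {
    order := makeTestOrder() // hypothetical fixture
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        processOrder(order) // hypothetical function under test
    }
}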

Conclusion

Memory isn't just "space where data lives"; it's a complex landscape of trade-offs between speed, flexibility, and resource usage. The stack gives you blazing speed but limited flexibility. The heap gives you flexibility but at a performance cost. Game engineers have spent decades learning to work within these constraints, and their techniques (object pools, data-oriented design, arena allocators) are increasingly relevant as we build more performance-sensitive applications.

Next time you write make() or new(), take a moment to think about where that memory is coming from, how it's being managed, and whether there's a more efficient way to accomplish your goal. Your future self (and your server's RAM) will thank you.