May 9, 2026 · 9 min read

Go Worker Pools: Handling 10,000 Concurrent Requests on a $20 Server

The naive approach, one goroutine per request, works fine at low concurrency. As you approach 10k concurrent connections on a small instance, you're paying for scheduler overhead across thousands of runnable goroutines, the GC is burning CPU scanning goroutine stacks, and RSS climbs past what your $20 server can handle.

Worker pool architecture

The Problem With Unbounded Goroutines

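Each goroutine starts with a small stack (2 KB since Go 1.4) that grows on demand. At 50k concurrent connections that is roughly 100 MB of stack space at minimum, before any heap allocations, and handlers that do real work often grow their stacks past the initial size. Every goroutine stack is also a GC root that has to be scanned, and the scheduler has to multiplex all the runnable goroutines onto a handful of OS threads, so GC pause times start creeping into p99.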

The real killer is a thundering herd: every goroutine wakes up simultaneously, hammers the same downstream service, and your latency distribution turns into a long tail.
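The per-goroutine stack cost is easy to measure rather than assume. The sketch below parks a batch of goroutines and reads the change in `runtime.MemStats.StackInuse`. The helper name `stackPerGoroutine` and the batch size are illustrative choices, and the figure it prints depends on the Go version and on what each goroutine actually does.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackPerGoroutine (hypothetical helper) parks n goroutines and returns the
// average growth in stack memory per goroutine, as reported by
// runtime.MemStats.StackInuse. It is a rough estimate, not an exact figure.
func stackPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var ready sync.WaitGroup
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			ready.Done()
			<-stop // stay parked until released
		}()
	}
	ready.Wait() // every goroutine has started and owns a stack
	runtime.ReadMemStats(&after)
	close(stop)

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("~%d bytes of stack per parked goroutine\n", stackPerGoroutine(10000))
}
```

These goroutines do nothing, so the result is a floor. A goroutine running a real handler will usually hold more.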

A Bounded Worker Pool

A worker pool is a fixed number of goroutines reading jobs off a buffered channel. Instead of spinning up a goroutine per request you enqueue the job and let a worker pick it up.

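import "context"

// Job pairs a unit of work with the request context it runs under.
type Job struct {
    ctx context.Context
    fn  func(context.Context)
}

// Pool is a fixed set of workers draining a bounded job queue.
type Pool struct {
    jobs chan Job
}

// NewPool starts `workers` goroutines reading from a queue of capacity `queue`.
func NewPool(workers, queue int) *Pool {
    p := &Pool{jobs: make(chan Job, queue)}
    for i := 0; i < workers; i++ {
        go p.worker()
    }
    return p
}

// worker runs jobs until the jobs channel is closed.
func (p *Pool) worker() {
    for job := range p.jobs {
        // Skip jobs whose caller gave up while they sat in the queue.
        if job.ctx.Err() != nil {
            continue
        }
        job.fn(job.ctx)
    }
}

// Submit enqueues fn. If the queue is full it blocks until a slot frees up
// or ctx is cancelled, whichever comes first.
func (p *Pool) Submit(ctx context.Context, fn func(context.Context)) error {
    select {
    case p.jobs <- Job{ctx: ctx, fn: fn}:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}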

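The select with ctx.Done() is critical: if the queue is full and the upstream client disconnects, Submit returns ctx.Err() instead of blocking for minutes waiting for a free slot. The select does not remove a job that is already queued, so the function you submit should still honor its context.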

Sizing the Pool

Throughput: goroutine-per-request vs worker pool
Fig 2. Worker pool hits 22.1k req/s vs 8.4k unbounded, at 10k concurrent.
Memory usage comparison
Fig 3. Peak RSS drops 1.4 GB → 280 MB. An 80% reduction on the same instance.

CPU-bound work: GOMAXPROCS workers. More than that adds context switches without adding throughput.

I/O-bound work: Start at GOMAXPROCS × 8. Most workers will be blocked on I/O at any given moment, so you can run more of them than you have CPUs. Profile and tune from there.
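The two starting points above can be written down as a small helper. This is only a sketch of the rule of thumb: the `Workload` type, the `initialWorkers` function, and the `ioMultiplier` constant are names invented here, and the result is a starting value to tune, not a recommendation.

```go
package main

import (
	"fmt"
	"runtime"
)

// Workload is a hypothetical label for what dominates a job's running time.
type Workload int

const (
	CPUBound Workload = iota
	IOBound
)

// ioMultiplier is the starting factor for I/O-bound pools taken from the text.
const ioMultiplier = 8

// initialWorkers returns a starting worker count for procs logical CPUs.
func initialWorkers(w Workload, procs int) int {
	if procs < 1 {
		procs = 1
	}
	if w == IOBound {
		return procs * ioMultiplier
	}
	return procs
}

func main() {
	// Passing 0 reads the current GOMAXPROCS without changing it.
	procs := runtime.GOMAXPROCS(0)
	fmt.Println("cpu-bound workers:", initialWorkers(CPUBound, procs))
	fmt.Println("io-bound workers: ", initialWorkers(IOBound, procs))
}
```

On a 2-vCPU instance this gives 2 and 16 as starting points.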

Real Numbers

After switching from goroutine-per-request to a 200-worker pool on a 2-vCPU instance:

  • Peak RSS: 1.4 GB → 280 MB
  • Throughput at 10k concurrent: 8.4k → 22.1k req/s
  • p99 latency at 10k concurrent: 4.2s → 340ms
  • GC pause time: 180ms → 12ms

Always size through profiling, not intuition. The numbers above are for a specific workload — yours will differ.


Tanmay Bohra
Full Stack Engineer at Grant Thornton Bharat. Building high-concurrency systems in Go and TypeScript.