May 13, 2026 · 12 min read

Inngest for Agentic Systems: Durable Execution Without the Infrastructure Tax

When you build an agent that makes seven sequential LLM calls — each step with a 3% failure rate — your end-to-end success rate is 0.97^7 = 80.8%. One in five runs fails. In production, across hundreds of daily executions, that's a constant background noise of failed jobs, lost context, and users who get nothing.

The instinct is to retry the whole function. That fixes the success rate: function-level retry with 3 retries (4 attempts in total) gets you to ~99.9%. But it introduces a different problem: every retry re-runs every step that already succeeded. Your agent re-calls the LLM for context it already has, re-queries the database for data it already fetched, re-embeds documents it already processed. You're burning tokens and compute to re-derive work that never failed.
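The arithmetic can be checked in a few lines of TypeScript. This is a back-of-the-envelope sketch that assumes step failures are independent and that every attempt has the same 3% per-step failure rate; the function names are illustrative. Under those assumptions, three whole-function attempts land near 99.3%, four attempts (one run plus three retries) near 99.9%, and three attempts per step near 99.98%.

```typescript
// Probability that every step succeeds in one uninterrupted pass.
function singlePassSuccess(stepSuccess: number, steps: number): number {
  return Math.pow(stepSuccess, steps);
}

// Function-level retry: the whole run is repeated up to `attempts` times.
function functionLevelSuccess(stepSuccess: number, steps: number, attempts: number): number {
  const runFailure = 1 - singlePassSuccess(stepSuccess, steps);
  return 1 - Math.pow(runFailure, attempts);
}

// Step-level retry: each step gets its own budget of `attempts` tries.
function stepLevelSuccess(stepSuccess: number, steps: number, attempts: number): number {
  const stepFailure = Math.pow(1 - stepSuccess, attempts);
  return Math.pow(1 - stepFailure, steps);
}

const naive = singlePassSuccess(0.97, 7);         // about 0.808
const fnRetry = functionLevelSuccess(0.97, 7, 4); // about 0.9986
const stepRetry = stepLevelSuccess(0.97, 7, 3);   // about 0.9998

console.log({ naive, fnRetry, stepRetry });
```

The success rates alone understate the difference: the function-level number is bought by re-running all seven steps on every failed attempt, while the step-level number only ever repeats the step that failed.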

[Figure: End-to-end success rate for a 7-step agent with a 3% per-step failure rate (axis 70% to 100%). Naive (no retry): 80.8%. Function-level retry: 99.9%. Step-level retry (Inngest): 99.98%. Function retry achieves 99.9% but re-executes all 7 steps on each failure; Inngest retries only the failed step.]

The real fix is step-level durability: retry only the step that failed, resume from the exact point of failure, don't touch anything that already completed. That's what Inngest gives you — and it's the premise the rest of this post builds on.

What Inngest actually is

Inngest is not a queue. The distinction matters.

A queue gives you: retry the job if it fails. Inngest gives you: retry the step that failed, resume from the exact checkpoint, sleep for days without holding server resources, pause indefinitely for human input, observe everything with full replay.

The mental model: every Inngest function is a checkpoint-aware workflow. Each step.run() call creates a durable checkpoint. If your server restarts mid-execution, Inngest replays the function from the beginning — but steps that already completed return their cached result instantly without re-executing. Only the step that failed (or never started) actually runs.
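The replay behaviour can be sketched with a toy in-memory model. This is not Inngest's implementation, only an illustration of the idea that a function is re-run from the top while completed steps return their stored result; ToyRun and workflow are hypothetical names.

```typescript
// Toy model: one run, with completed step results kept in a map.
class ToyRun {
  private checkpoints = new Map<string, unknown>();
  executed: string[] = []; // every time a step body actually runs

  run<T>(id: string, fn: () => T): T {
    if (this.checkpoints.has(id)) {
      return this.checkpoints.get(id) as T; // cached: body is skipped
    }
    this.executed.push(id);
    const value = fn(); // if this throws, nothing is checkpointed
    this.checkpoints.set(id, value);
    return value;
  }
}

let flaky = true; // makes 'summarize' fail exactly once

function workflow(step: ToyRun): string {
  const data = step.run('fetch', () => 'data');
  const summary = step.run('summarize', () => {
    if (flaky) {
      flaky = false;
      throw new Error('transient failure');
    }
    return data.toUpperCase();
  });
  return step.run('publish', () => `published:${summary}`);
}

const toyRun = new ToyRun();
try {
  workflow(toyRun); // first pass stops at 'summarize'
} catch {
  // a real platform would schedule the retry here
}
const result = workflow(toyRun); // replay from the top

console.log(result);         // published:DATA
console.log(toyRun.executed); // [ 'fetch', 'summarize', 'summarize', 'publish' ]
```

The 'fetch' body runs once even though the workflow function itself is entered twice, which is also why anything outside a step should be cheap and deterministic.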

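[Figure: Execution timeline, failure at step 5 of 7. Traditional: steps 1-5 run, step 5 fails, the job restarts and runs steps 1-7 again (+5 wasted steps). Inngest: steps 1-4 complete and are checkpointed, step 5 fails and is retried, then steps 6-7 run. Figure annotation: 60% fewer steps executed.]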

This changes the unit of failure from "the entire job" to "the specific step that failed." For agents where each LLM call is expensive, slow, and independently fallible, this is a meaningful architectural upgrade.

The four core primitives

step.run(id, fn) — Execute a unit of work with automatic retry and checkpoint. The step ID must be stable and unique within the function. Changing it invalidates the checkpoint.

step.sleep(id, duration) — Pause execution without holding a server connection. The function state is serialized, the process releases, and the function re-invokes when the duration expires. Zero compute during sleep.

step.waitForEvent(id, config) — Pause until a matching event arrives, up to a configurable timeout. This is how you implement human-in-the-loop without a custom webhook router and state machine.

step.sendEvent(id, event) — Emit events from within a running function. This is how agents trigger other agents, and how multi-agent systems hand work off without tight coupling.
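To see the four primitives side by side, here is a sketch of a handler body that touches each one once. The StepTools interface is a hand-written subset based only on the descriptions above, not the SDK's real type definitions, and the event names, step IDs, and followUpAgent are hypothetical. A fake implementation stands in for the platform so the control flow can be dry-run locally.

```typescript
// Hand-written subset of the step tools (an assumption for illustration).
interface AgentEvent { name: string; data: Record<string, unknown> }
interface WaitConfig { event: string; timeout: string; match?: string }
interface StepTools {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
  sleep(id: string, duration: string): Promise<void>;
  waitForEvent(id: string, config: WaitConfig): Promise<AgentEvent | null>;
  sendEvent(id: string, event: AgentEvent): Promise<void>;
}

// Hypothetical agent: summarize, pause, wait for a reply, hand off on timeout.
async function followUpAgent(event: AgentEvent, step: StepTools): Promise<string> {
  const summary = await step.run('summarize-ticket', async () =>
    `summary of ${String(event.data.ticketId)}`
  );
  await step.sleep('cool-down', '1h');
  const reply = await step.waitForEvent('wait-for-reply', {
    event: 'ticket.reply.received',
    match: 'data.ticketId',
    timeout: '24h',
  });
  if (!reply) {
    // No reply before the timeout: hand the work to another function.
    await step.sendEvent('hand-off', {
      name: 'ticket.escalated',
      data: { ticketId: event.data.ticketId, summary },
    });
    return 'escalated';
  }
  return 'answered';
}

// Fake step tools for a local dry run: records calls, never really waits.
const calls: string[] = [];
const fakeStep: StepTools = {
  async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
    calls.push(`run:${id}`);
    return fn();
  },
  async sleep(id: string): Promise<void> { calls.push(`sleep:${id}`); },
  async waitForEvent(id: string): Promise<AgentEvent | null> {
    calls.push(`wait:${id}`);
    return null; // simulate a timeout
  },
  async sendEvent(id: string): Promise<void> { calls.push(`send:${id}`); },
};

followUpAgent({ name: 'ticket.opened', data: { ticketId: 't-1' } }, fakeStep)
  .then((outcome) => console.log(outcome, calls));
```

Writing the handler against a narrow interface like this is also a cheap way to unit-test branching logic without a running dev server.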

Pattern 1: Reliable LLM calls

The key insight: retry at the step level, not the function level. In Inngest the retries setting is applied to each step individually, so a step that succeeds doesn't re-run when a later step fails.

import { inngest } from './inngest';
import OpenAI from 'openai';

const openai = new OpenAI();

export const researchAgent = inngest.createFunction(
  {
    id: 'research-agent',
    retries: 3, // applied per step: each step.run() gets its own retry budget
    throttle: { limit: 10, period: '1m' },
  },
  { event: 'research.requested' },
  async ({ event, step }) => {
    // Step 1: generate outline — checkpointed on success
    const outline = await step.run('generate-outline', async () => {
      const res = await openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Outline: ' + event.data.topic }],
      });
      return res.choices[0].message.content;
    });

    // Step 2: expand — if this fails, step 1 does NOT re-run
    const sections = await step.run('expand-sections', async () => {
      const res = await openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Expand this outline:\n\n' + outline }],
      });
      return res.choices[0].message.content;
    });

    const citations = await step.run('generate-citations', async () => {
      return fetchCitations(event.data.topic, sections);
    });

    return { outline, sections, citations };
  }
);

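If expand-sections fails, generate-outline is already checkpointed. Each retry replays the function, gets the outline back from its checkpoint instantly, and re-attempts only the expansion, exactly where it failed. No tokens re-burned. The run is marked failed only once expand-sections has exhausted its retries; a freshly sent research.requested event starts a new run with its own checkpoints rather than reusing these.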

Pattern 2: Parallel tool execution

Agent frameworks often serialize tool calls. With Inngest, Promise.all across steps makes them concurrent — each with independent retry. If one tool fails, the others' results are already checkpointed.

const [webResult, codeResult, dbResult] = await Promise.all([
  step.run('tool-web-search', () => searchWeb(query)),
  step.run('tool-code-interpreter', () => runCode(snippet)),
  step.run('tool-database', () => queryDB(params)),
]);

const synthesis = await step.run('synthesize', async () => {
  return openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: formatToolResults(webResult, codeResult, dbResult) }],
  });
});

[Figure: Fan-out, parallel tool execution with per-step retry. The orchestrator fans out to three steps: Web Search (step.run('tool-web'), cached), Code Interpreter (step.run('tool-code'), retried), Database Query (step.run('tool-db'), cached). They join at Promise.all, then a single LLM call receives all results. The code interpreter is retried independently; web and db results are already checkpointed and not re-executed.]

The checkpointing means parallel tool execution is safe by default. A transient code interpreter timeout doesn't re-run your database query. Each tool is independently atomic.
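The "independently atomic" property can be illustrated without Inngest at all. The sketch below uses a plain in-process retry wrapper (withRetry, a hypothetical helper) around three stubbed tools; unlike Inngest steps it is not durable across restarts, but it shows the same shape: only the flaky tool is invoked more than once.

```typescript
// Retry one task up to `attempts` times; sibling tasks are unaffected.
async function withRetry<T>(attempts: number, task: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Stub tools that count how often their body actually runs.
const callCounts = { web: 0, code: 0, db: 0 };

const searchWebStub = async (): Promise<string> => {
  callCounts.web++;
  return 'web hits';
};
const runCodeStub = async (): Promise<string> => {
  callCounts.code++;
  if (callCounts.code === 1) throw new Error('interpreter timeout'); // fails once
  return 'code output';
};
const queryDbStub = async (): Promise<string> => {
  callCounts.db++;
  return 'db rows';
};

const fanOut = Promise.all([
  withRetry(3, searchWebStub),
  withRetry(3, runCodeStub),
  withRetry(3, queryDbStub),
]);

fanOut.then((results) => console.log(results, callCounts));
// results: [ 'web hits', 'code output', 'db rows' ]
// callCounts: { web: 1, code: 2, db: 1 }
```

The difference with durable steps is what happens when the process dies mid-flight: here all three results are lost, whereas checkpointed step results survive the restart.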

Pattern 3: Human-in-the-loop

Most implementations bolt this on after the fact: a webhook endpoint, a polling loop, edge cases when the reviewer never responds. With waitForEvent, the pattern collapses to three lines.

export const contentReviewAgent = inngest.createFunction(
  { id: 'content-review-agent' },
  { event: 'content.draft.created' },
  async ({ event, step }) => {
    const draft = await step.run('generate-draft', () =>
      generateContent(event.data.brief)
    );

    await step.run('notify-reviewer', () =>
      sendSlackMessage({ text: 'Review needed: ' + draft.title })
    );

    // Pauses here. Server releases. Zero compute for up to 72h.
    const decision = await step.waitForEvent('wait-for-approval', {
      event: 'content.review.decision',
      match: 'data.draftId',
      timeout: '72h',
    });

    if (!decision) {
      await step.run('escalate', () => sendEscalationAlert(event.data.brief));
      return { status: 'timed_out' };
    }

    if (decision.data.verdict === 'approved') {
      await step.run('publish', () => publishContent(draft));
      return { status: 'published' };
    }

    await step.run('revise', () => queueRevision(draft, decision.data.feedback));
    return { status: 'revision_queued' };
  }
);

When a reviewer submits their decision, call inngest.send({ name: 'content.review.decision', data: { draftId, verdict, feedback } }). The paused function resumes within seconds.
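The match: 'data.draftId' option pairs a decision event with the run whose triggering event carries the same value at that path. A toy sketch of that comparison follows; it is not Inngest's matching code, and readPath and resumesWait are hypothetical names. It also shows why the content.draft.created event needs a draftId of its own for the match to work.

```typescript
interface ToyEvent { name: string; data: Record<string, unknown> }

// Read a dotted path such as 'data.draftId' from an event.
function readPath(event: ToyEvent, path: string): unknown {
  return path.split('.').reduce<unknown>((value, key) => {
    if (value !== null && typeof value === 'object') {
      return (value as Record<string, unknown>)[key];
    }
    return undefined;
  }, event);
}

// An incoming event resumes the wait only if it has the awaited name and
// carries the same value at `matchPath` as the event that started the run.
function resumesWait(
  trigger: ToyEvent,
  incoming: ToyEvent,
  awaitedName: string,
  matchPath: string
): boolean {
  const expected = readPath(trigger, matchPath);
  return (
    incoming.name === awaitedName &&
    expected !== undefined &&
    readPath(incoming, matchPath) === expected
  );
}

const trigger = { name: 'content.draft.created', data: { draftId: 'd-42', brief: 'Q3 update' } };
const decisionForOther = { name: 'content.review.decision', data: { draftId: 'd-7', verdict: 'approved' } };
const decisionForThis = { name: 'content.review.decision', data: { draftId: 'd-42', verdict: 'approved' } };

console.log(resumesWait(trigger, decisionForOther, 'content.review.decision', 'data.draftId')); // false
console.log(resumesWait(trigger, decisionForThis, 'content.review.decision', 'data.draftId'));  // true
```

If the triggering event had no draftId, nothing could ever match and every run would fall through to the timeout branch, so it is worth validating that field when the draft event is sent.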

[Figure: Human-in-the-loop, pause up to 72h without holding a server process. Flow: Generate draft → Notify reviewer → PAUSE (waitForEvent, up to 72h; server idle, zero compute cost) → branches for timeout, reject (Revise, step.run()), and approval (Publish, step.run() → Published, done). The reviewer POSTs to your API → inngest.send({ name: 'content.review.decision' }) → the function resumes in seconds.]

The server releases during the pause. Not a long-poll. Not a cron job checking a database. Not a Redis key with a TTL. The function suspends cleanly, and the Inngest platform handles the wake-up. This pattern works for approval workflows, async code review pipelines, multi-step onboarding — anywhere humans are in the critical path.

Pattern 4: Multi-agent orchestration

step.invoke() calls another Inngest function and waits for its result — synchronously from the orchestrator's perspective, but with full durability at the infrastructure level. Sub-agents run independently, with their own retry policies and observability.

export const orchestrator = inngest.createFunction(
  { id: 'research-orchestrator' },
  { event: 'task.submitted' },
  async ({ event, step }) => {
    const research = await step.invoke('run-researcher', {
      function: researcherAgent,
      data: { topic: event.data.topic },
    });

    const [critique, formatted] = await Promise.all([
      step.invoke('run-critic', { function: criticAgent, data: { research } }),
      step.invoke('run-formatter', { function: formatterAgent, data: { research } }),
    ]);

    return step.invoke('run-synthesizer', {
      function: synthesizerAgent,
      data: { research, critique, formatted },
    });
  }
);

Each sub-agent is independently retryable, independently observable in the Inngest dashboard, and independently deployable — potentially to different services. The orchestrator doesn't know or care where sub-agents run. If the critic fails, Inngest retries it without re-running the researcher.

This is where Inngest separates from both queues (which have no concept of sub-agent coordination) and LangGraph (which handles agent logic but not infrastructure durability). You can combine them: LangGraph manages the reasoning graph inside each sub-agent, Inngest manages the durable execution of the entire multi-agent pipeline.

Observability without instrumentation

Every Inngest function produces a step-level trace by default: which steps ran, how long each took, how many retries, what data was passed in and returned. The dashboard gives you a timeline view with full input/output at each step.

For agents, this is unusually valuable. LLM calls are opaque — you can't tell from application logs whether a bad output came from a bad prompt, a rate limit retry that corrupted context, or a timeout that returned a partial response. Step-level traces tell you exactly what happened at each decision point. Inngest also exposes a replay API: re-run any past execution with the original input, which makes debugging production agent failures tractable.

When to reach for Inngest

Feature comparison: agentic workflow infrastructure (cells shown as "?" are not recoverable from the original graphic)

Feature | BullMQ / SQS | LangGraph | Inngest
Step-level checkpointing | ? | partial | ?
Human-in-the-loop wait | DIY build | interrupt() | ✓ built-in
Parallel steps + per-step retry | ? | partial | ✓ Promise.all
Max pause duration | queue TTL | unlimited (DB) | months
Local dev experience | Redis required | ? | dev server
Observability (built-in) | add-on | basic | step traces + replay
Language support | JS/Python/Go | Python | TS / Python / Go

LangGraph and Inngest are complementary: LangGraph for agent reasoning logic, Inngest for durable execution.

Inngest is the right choice when: you need step-level durability and recovery (not just job-level retry), you have human-in-the-loop workflows with indefinite wait times, you're orchestrating multiple agents with complex dependencies, or you want production-grade observability without building a tracing system.

LangGraph is the right choice when: you need fine-grained graph-based agent reasoning in Python, and you're comfortable managing your own checkpointing infrastructure (PostgresSaver). LangGraph and Inngest compose well — LangGraph runs the reasoning loop inside a node, Inngest provides durable execution around it.

Raw queues (BullMQ, SQS, Kafka) are the right choice when you have simple, well-understood job processing with predictable failure modes and existing queue infrastructure. For complex multi-step agentic workflows, they require significant custom work to approximate what Inngest gives you out of the box.

The sleep() insight

From Inngest's architecture: step.sleep() calls do not hold a server connection during the sleep duration. Function state is serialized, the connection is released, and the function re-invokes when the duration expires.

This changes the economics of long-running agents. A workflow that pauses 48 hours for human review costs nothing in server compute during the pause. Traditional approaches either hold a worker process (expensive), poll a database (wasteful), or build a custom state machine (complex). None are the right default.
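One way to picture "serialize, release, re-invoke" is a store of suspended runs keyed by wake-up time. The sketch below is a toy model with a fake clock, not a description of Inngest's internals; ToySleepStore and its methods are hypothetical. Nothing holds a timer or a worker while a run sleeps: the only thing that exists during the pause is a stored record.

```typescript
interface SleepingRun {
  runId: string;
  wakeAt: number;      // epoch milliseconds
  checkpoints: string; // serialized step results
}

class ToySleepStore {
  private sleeping: SleepingRun[] = [];

  // Called when a run reaches a sleep: persist its state, then the worker returns.
  suspend(
    runId: string,
    checkpoints: Record<string, unknown>,
    now: number,
    durationMs: number
  ): void {
    this.sleeping.push({
      runId,
      wakeAt: now + durationMs,
      checkpoints: JSON.stringify(checkpoints),
    });
  }

  // Called periodically by a scheduler: remove and return the runs that are due.
  due(now: number): SleepingRun[] {
    const ready = this.sleeping.filter((run) => run.wakeAt <= now);
    this.sleeping = this.sleeping.filter((run) => run.wakeAt > now);
    return ready;
  }
}

const HOUR = 60 * 60 * 1000;
const store = new ToySleepStore();
store.suspend('run-1', { 'generate-draft': { title: 'Q3 update' } }, 0, 48 * HOUR);

console.log(store.due(47 * HOUR).length); // 0: still asleep, nothing running
const woken = store.due(48 * HOUR);
console.log(woken.length, JSON.parse(woken[0].checkpoints)); // 1 { 'generate-draft': { title: 'Q3 update' } }
```

When a run comes back from due(), the function is re-invoked with its deserialized checkpoints, so completed steps are skipped and execution continues after the sleep.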

The deeper point: Inngest treats the workflow as the unit of durability, not the individual function call. Step-level checkpointing, event-driven coordination, and resource-efficient pausing all follow from that single design decision. For agentic systems — where work is long, multi-step, LLM-mediated, and expensive to restart — that's the right abstraction level to build on.

Tanmay Bohra
Full Stack Engineer at Grant Thornton Bharat. Building high-concurrency systems in Go and TypeScript.