Writing

Blog.

Notes on systems, Go, and things I'm building — or thinking about building.

Jun 24, 2026 · 10 min read

Sakana Means Fish. Their New Model Is a School of Them.

Sakana AI built a 7B conductor that orchestrates GPT-5.5, Claude, and Gemini. On GPQA-Diamond it beats Mythos 5. Here is exactly how it works and why the architecture matters.

Sakana AI · Fugu +7

Jun 20, 2026 · 8 min read

Most Consultants Are Eloquent. Few Are Clear.

EPIC is the communication framework I use in every client session. Four moves: Express WHY, Point first, Illustrate with stories, Close on commitment. No fluff.

communication · consulting +6

Jun 18, 2026 · 14 min read

CrewAI Thinks in People. LangGraph Thinks in Process.

CrewAI vs LangGraph isn't a capability debate — it's a mental model debate. CrewAI thinks in people. LangGraph in process. Here's which one fits your problem.

CrewAI · LangGraph +7

Jun 17, 2026 · 18 min read

I Built a WebSocket Server Just to Show Three Progress Messages

Three wrong turns building a LangGraph agent — chained functions, while loops, and WebSockets for progress — and the production fixes that actually ship.

LangGraph · LangChain +10

Jun 17, 2026 · 9 min read

'Respond Only With Valid JSON' Fails 1 in 10 Times

Naive JSON prompting fails 5-15% of the time in production. Every major LLM provider ships constrained decoding now — the model cannot violate your schema. Here's how.

constrained decoding · structured output +8

Jun 17, 2026 · 10 min read

A Jailbroken Claude Was Used to Attack Claude

Claude Fable 5 survived 1,000 hours of bug bounty testing. A multi-agent pack hunt bypassed it in 48 hours. Here's why classifiers can't see the whole attack.

LLM security · jailbreak +8

Jun 17, 2026 · 11 min read

The Most Interesting Open Model Right Now Has No Official Benchmarks

Z.ai shipped GLM-5.2: 744B MoE, 1M context, MIT license, zero benchmarks at launch. Here's what the architecture means and why the silence is the real story.

GLM-5.2 · Zhipu AI +10

May 29, 2026 · 10 min read

AuditForge: Why the Best Configuration Audit Has Zero AI In It

I built a configuration audit tool for Oracle Fusion ERP and deliberately left out the LLMs. Here's why deterministic pipelines win in regulated environments.

AuditForge · Deterministic AI +4

May 29, 2026 · 14 min read

The Agentic Audit: Consulting Firms Are One Compliance Cycle Behind

IT audit has 40 years of maturity. Then agentic AI arrived — LLMs making tool calls with real-world side effects — and the frameworks ran out of answers.

AI Audit · Agentic AI +5

May 29, 2026 · 14 min read

Building a Multi-Agent Debate System for Transfer Pricing Defense

India has ₹12+ lakh crore in pending TP disputes at ITAT. A traditional TP study costs ₹20–50 lakh. Aura TP uses adversarial agent ensembles — Inclusion Counsel, Exclusion Counsel, and a Partner Arbitrator — to automate comparable selection review and TPO defense drafting. Here’s the architecture, what breaks, and the honest cost math.

AI Agents · Transfer Pricing +4

May 29, 2026 · 13 min read

AI Governance in the Era of Autonomous Agents

EU AI Act high-risk obligations are live. 40% of enterprise systems can't be classified. Here's what boards, engineers, and deployers actually need to do about it.

AI Governance · EU AI Act +4

May 29, 2026 · 11 min read

Speech AI in 2026: The Infrastructure Layer for Voice Agents

Voice is the default UI for agentic systems in 2026. Here's the full infrastructure stack: STT engines, TTS providers, WebRTC vs WebSockets, and where your 570ms latency budget actually goes.

Speech AI · Voice Agents +5

May 23, 2026 · 10 min read

Picking Your LLM Without the Hype

MMLU is saturated noise. SWE-bench Pro is the real signal. Here's how to actually evaluate models — benchmarks that matter, the fine-tune vs RAG vs few-shot question, and what each model tier is actually for.

LLM evaluation · model selection +11

May 23, 2026 · 8 min read

Semantic Caching Is the Cheat Code Nobody Talks About

Most LLM teams discover semantic caching too late. Here's how the three-layer architecture works, why the threshold matters more than you think, and what real production hit rates look like.

semantic caching · LLM infrastructure +8

May 23, 2026 · 9 min read

The LLM Deployment Provider Breakdown

Groq vs Bedrock vs Azure OpenAI vs self-hosted vLLM — when each provider actually makes sense, what the compliance tradeoffs look like, and where the $80K/month break-even comes from.

LLM deployment · AWS Bedrock +10

May 23, 2026 · 12 min read

Multi-Agent Systems Are Messier Than You Think

At 95% per-agent reliability, a 10-step sequential chain succeeds 60% of the time. What actually breaks in production multi-agent systems — and what LangGraph, AutoGen, and CrewAI don't tell you.

AI Agents · LangGraph +3
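
The 60% figure in the entry above is plain compounding: if each step succeeds independently with probability p, an n-step sequential chain succeeds with probability p^n. A minimal Python sketch of that arithmetic, assuming independent steps with equal reliability (the name `chain_success` is illustrative, not taken from the post):

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds,
    assuming each step succeeds independently with the same probability."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# 95% per-agent reliability over a 10-step chain
print(f"{chain_success(0.95, 10):.1%}")  # 59.9%
```

Independence is the optimistic case; agents that share a model or an upstream dependency tend to fail together, which changes the math.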

May 23, 2026 · 14 min read

The AI Chatbot Scaling Playbook

What breaks when your AI chatbot hits 1,000 concurrent users — and how to fix it before it does. Stateless design, semantic caching, gateways, and streaming at scale.

AI Infrastructure · Scaling +3

May 23, 2026 · 13 min read

Why Your RAG Falls Apart at Scale

The real failure rate of RAG in production is 26.4% — not 2.3%. Here's what actually breaks: chunking, retrieval drift, context stuffing, and silent hallucinations. And how to fix it.

RAG · Vector Database +3

May 20, 2026 · 12 min read

I Made My Website Bookable by Any AI Agent

After Google I/O 2026 announced AP2 and Universal Cart, I spent a weekend making tanmaybohra.com callable by Gemini, Claude, or any MCP-compatible AI. Here's exactly what I built — and where the ecosystem actually stands.

May 19, 2026 · 14 min read

The Golden Database: How Production AI Agents Are Actually Built

Replit tried fine-tuning and abandoned it. The real architecture behind the best AI coding agents is a curated store of high-quality traces, retrieved at inference time. Here's how it works — and when fine-tuning, Agent Skills, or inference-time compute is actually the right answer.

May 19, 2026 · 18 min read

The Cheapest LLM Call Is the One You Never Make

Sourcegraph got a $1M cloud bill three months after launch. Fixie's agent looped 847 times before anyone noticed. These aren't edge cases — they're the default outcome of shipping agents without treating observability and cost as first-class concerns.

AI Agents · Observability +3

May 13, 2026 · 12 min read

Inngest for Agentic Systems: Durable Execution Without the Infrastructure Tax

A naive 7-step agent with 3% per-step failure has an 80% end-to-end success rate. Inngest's step-level durability raises that to 99.98% — and eliminates the need to hand-build retry logic, human-in-the-loop state machines, or observability tooling from scratch.

AI · TypeScript +1
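
Both numbers in the entry above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes independent failures and, as an assumption made here rather than anything the teaser states, up to three attempts per step. `end_to_end_success` is a hypothetical helper, not an Inngest API.

```python
def end_to_end_success(step_failure: float, steps: int, attempts: int = 1) -> float:
    """Success probability of a sequential workflow where each step is tried
    up to `attempts` times and every attempt fails independently."""
    if not 0.0 <= step_failure <= 1.0:
        raise ValueError("step_failure must be a probability in [0, 1]")
    if steps < 0 or attempts < 1:
        raise ValueError("steps must be >= 0 and attempts must be >= 1")
    # A step is lost only if every one of its attempts fails.
    step_lost = step_failure ** attempts
    return (1.0 - step_lost) ** steps


# No retries: 7 steps at 3% failure each
print(f"{end_to_end_success(0.03, 7):.1%}")              # 80.8%
# Assumed retry budget of three attempts per step
print(f"{end_to_end_success(0.03, 7, attempts=3):.2%}")  # 99.98%
```

Real retries are rarely independent (a provider outage fails every attempt), so the retried figure is best read as an upper bound.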

May 9, 2026 · 5 min read

A Book Club Changed How I Think About My Work

Developer life stays close to your own team - until an optional hobby group quietly puts you in the same room as people who have been losing sleep over problems you never knew existed.

Reflection · People +1

May 9, 2026 · 10 min read

LangGraph in Production: What Nobody Tells You

State machines, retry logic, and the edge cases that make agentic workflows fall apart at scale. Three months of production LangGraph — the parts the docs skip over.

May 9, 2026 · 8 min read

PostgreSQL Indexing Patterns I Actually Use

Partial indexes, covering indexes, and when BRIN beats everything — built from real EXPLAIN ANALYZE output, not toy examples. The strategies that cut query time from 4.8s to 12ms.

PostgreSQL · Systems +1

May 9, 2026 · 7 min read

Zod in Production: Runtime Type Safety Without the Tax

TypeScript's type system vanishes at runtime. Zod bridges that gap — but the naive integration adds overhead you don't need. Here's how we use it at scale without paying the parse tax.

TypeScript · Backend

May 9, 2026 · 9 min read

Go Worker Pools: Handling 10,000 Concurrent Requests on a $20 Server

The naive goroutine-per-request model looks fine until 50k connections hit simultaneously. Here's the worker pool pattern I actually use in production, and why it cut our memory usage by 80%.
