Writing

Blog.

Notes on systems, Go, and things I'm building — or thinking about building.

Jun 24, 2026 · 10 min read
Sakana Means Fish. Their New Model Is a School of Them.
Sakana AI built a 7B conductor that orchestrates GPT-5.5, Claude, and Gemini. On GPQA-Diamond it beats Mythos 5. Here is exactly how it works and why the architecture matters.
Sakana AI · Fugu +7
Jun 20, 2026 · 8 min read
Most Consultants Are Eloquent. Few Are Clear.
EPIC is the communication framework I use in every client session. Four moves: Express WHY, Point first, Illustrate with stories, Close on commitment. No fluff.
communication · consulting +6
Jun 18, 2026 · 14 min read
CrewAI Thinks in People. LangGraph Thinks in Process.
CrewAI vs LangGraph isn't a capability debate — it's a mental model debate. CrewAI thinks in people. LangGraph in process. Here's which one fits your problem.
CrewAI · LangGraph +7
Jun 17, 2026 · 18 min read
I Built a WebSocket Server Just to Show Three Progress Messages
Three wrong turns building a LangGraph agent — chained functions, while loops, and WebSockets for progress — and the production fixes that actually ship.
LangGraph · LangChain +10
Jun 17, 2026 · 9 min read
'Respond Only With Valid JSON' Fails 1 in 10 Times
Naive JSON prompting fails 5-15% in production. Every major LLM provider ships constrained decoding now — the model cannot violate your schema. Here's how.
constrained decoding · structured output +8
Jun 17, 2026 · 10 min read
A Jailbroken Claude Was Used to Attack Claude
Claude Fable 5 survived 1,000 hours of bug bounty testing. A multi-agent pack hunt bypassed it in 48 hours. Here's why classifiers can't see the whole attack.
LLM security · jailbreak +8
Jun 17, 2026 · 11 min read
The Most Interesting Open Model Right Now Has No Official Benchmarks
Z.ai shipped GLM-5.2: 744B MoE, 1M context, MIT license, zero benchmarks at launch. Here's what the architecture means and why the silence is the real story.
GLM-5.2 · Zhipu AI +10
May 29, 2026 · 10 min read
AuditForge: Why the Best Configuration Audit Has Zero AI In It
I built a configuration audit tool for Oracle Fusion ERP and deliberately left out the LLMs. Here's why deterministic pipelines win in regulated environments.
AuditForge · Deterministic AI +4
May 29, 2026 · 14 min read
The Agentic Audit: Consulting Firms Are One Compliance Cycle Behind
IT audit has 40 years of maturity. Then agentic AI arrived — LLMs making tool calls with real-world side effects — and the frameworks ran out of answers.
AI Audit · Agentic AI +5
May 29, 2026 · 14 min read
Building a Multi-Agent Debate System for Transfer Pricing Defense
India has ₹12+ lakh crore in pending TP disputes at ITAT. A traditional TP study costs ₹20–50 lakh. Aura TP uses adversarial agent ensembles — Inclusion Counsel, Exclusion Counsel, and a Partner Arbitrator — to automate comparable selection review and TPO defense drafting. Here’s the architecture, what breaks, and the honest cost math.
AI Agents · Transfer Pricing +4
May 29, 2026 · 13 min read
AI Governance in the Era of Autonomous Agents
EU AI Act high-risk obligations are live. 40% of enterprise systems can't be classified. Here's what boards, engineers, and deployers actually need to do about it.
AI Governance · EU AI Act +4
May 29, 2026 · 11 min read
Speech AI in 2026: The Infrastructure Layer for Voice Agents
Voice is the default UI for agentic systems in 2026. Here's the full infrastructure stack: STT engines, TTS providers, WebRTC vs WebSockets, and where your 570ms latency budget actually goes.
Speech AI · Voice Agents +5
May 23, 2026 · 10 min read
Picking Your LLM Without the Hype
MMLU is saturated noise. SWE-bench Pro is the real signal. Here's how to actually evaluate models — benchmarks that matter, the fine-tune vs RAG vs few-shot question, and what each model tier is actually for.
LLM evaluation · model selection +11
May 23, 2026 · 8 min read
Semantic Caching Is the Cheat Code Nobody Talks About
Most LLM teams discover semantic caching too late. Here's how the three-layer architecture works, why the threshold matters more than you think, and what real production hit rates look like.
semantic caching · LLM infrastructure +8
May 23, 2026 · 9 min read
The LLM Deployment Provider Breakdown
Groq vs Bedrock vs Azure OpenAI vs self-hosted vLLM — when each provider actually makes sense, what the compliance tradeoffs look like, and where the $80K/month break-even comes from.
LLM deployment · AWS Bedrock +10
May 23, 2026 · 12 min read
Multi-Agent Systems Are Messier Than You Think
At 95% per-agent reliability, a 10-step sequential chain succeeds 60% of the time. What actually breaks in production multi-agent systems — and what LangGraph, AutoGen, and CrewAI don't tell you.
AI Agents · LangGraph +3
May 23, 2026 · 14 min read
The AI Chatbot Scaling Playbook
What breaks when your AI chatbot hits 1,000 concurrent users — and how to fix it before it does. Stateless design, semantic caching, gateways, and streaming at scale.
AI Infrastructure · Scaling +3
May 23, 2026 · 13 min read
Why Your RAG Falls Apart at Scale
The real failure rate of RAG in production is 26.4% — not 2.3%. Here's what actually breaks: chunking, retrieval drift, context stuffing, and silent hallucinations. And how to fix it.
RAG · Vector Database +3
May 20, 2026 · 12 min read
I Made My Website Bookable by Any AI Agent
After Google I/O 2026 announced AP2 and Universal Cart, I spent a weekend making tanmaybohra.com callable by Gemini, Claude, or any MCP-compatible AI. Here's exactly what I built — and where the ecosystem actually stands.
AP2 · MCP +3
May 19, 2026 · 14 min read
The Golden Database: How Production AI Agents Are Actually Built
Replit tried fine-tuning and abandoned it. The real architecture behind the best AI coding agents is a curated store of high-quality traces, retrieved at inference time. Here's how it works — and when fine-tuning, Agent Skills, or inference-time compute is actually the right answer.
AI · Systems +1
May 19, 2026 · 18 min read
The Cheapest LLM Call Is the One You Never Make
Sourcegraph got a $1M cloud bill three months after launch. Fixie's agent looped 847 times before anyone noticed. These aren't edge cases — they're the default outcome of shipping agents without treating observability and cost as first-class concerns.
AI Agents · Observability +3
May 13, 2026 · 12 min read
Inngest for Agentic Systems: Durable Execution Without the Infrastructure Tax
A naive 7-step agent with 3% per-step failure has an 80% end-to-end success rate. Inngest's step-level durability raises that to 99.98% — and eliminates the need to hand-build retry logic, human-in-the-loop state machines, or observability tooling from scratch.
AI · TypeScript +1
May 9, 2026 · 5 min read
A Book Club Changed How I Think About My Work
Developer life stays close to your own team - until an optional hobby group quietly puts you in the same room as people who have been losing sleep over problems you never knew existed.
Reflection · People +1
May 9, 2026 · 10 min read
LangGraph in Production: What Nobody Tells You
State machines, retry logic, and the edge cases that make agentic workflows fall apart at scale. Three months of production LangGraph — the parts the docs skip over.
AI · Python +1
May 9, 2026 · 8 min read
PostgreSQL Indexing Patterns I Actually Use
Partial indexes, covering indexes, and when BRIN beats everything — built from real EXPLAIN ANALYZE output, not toy examples. The strategies that cut query time from 4.8s to 12ms.
PostgreSQL · Systems +1
May 9, 2026 · 7 min read
Zod in Production: Runtime Type Safety Without the Tax
TypeScript's type system vanishes at runtime. Zod bridges that gap — but the naive integration adds overhead you don't need. Here's how we use it at scale without paying the parse tax.
TypeScript · Backend
May 9, 2026 · 9 min read
Go Worker Pools: Handling 10,000 Concurrent Requests on a $20 Server
The naive goroutine-per-request model looks fine until 50k connections hit simultaneously. Here's the worker pool pattern I actually use in production, and why it cut our memory by 80%.
Go · Systems +1