<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Tanmay Bohra's Blog</title>
    <link>https://blogs.tanmaybohra.com</link>
    <description>Notes on systems, Go, TypeScript, and agentic AI — by Tanmay Bohra.</description>
    <language>en-IN</language>
    <managingEditor>hello@tanmaybohra.com (Tanmay Bohra)</managingEditor>
    <webMaster>hello@tanmaybohra.com (Tanmay Bohra)</webMaster>
    <atom:link href="https://blogs.tanmaybohra.com/rss.xml" rel="self" type="application/rss+xml" />
    <lastBuildDate>Sat, 23 May 2026 15:08:35 GMT</lastBuildDate>
    <ttl>1440</ttl>
  <item>
    <title>Picking Your LLM Without the Hype</title>
    <link>https://blogs.tanmaybohra.com/posts/picking-your-llm-without-the-hype/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/picking-your-llm-without-the-hype/</guid>
    <description>MMLU is saturated noise. SWE-bench Pro is the real signal. Here's how to actually evaluate models — benchmarks that matter, the fine-tune vs RAG vs few-shot question, and what each model tier is actually for.</description>
    <pubDate>Sat, 23 May 2026 14:45:37 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>LLM evaluation</category><category>model selection</category><category>SWE-bench</category><category>MMLU</category><category>RAG vs fine-tuning</category><category>few-shot prompting</category><category>Claude</category><category>GPT-4o</category><category>Gemini</category><category>Llama</category><category>TTFT</category><category>context window</category><category>AI engineering</category>
  </item>
  <item>
    <title>Semantic Caching Is the Cheat Code Nobody Talks About</title>
    <link>https://blogs.tanmaybohra.com/posts/semantic-caching-llm-cheat-code/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/semantic-caching-llm-cheat-code/</guid>
    <description>Most LLM teams discover semantic caching too late. Here's how the three-layer architecture works, why the threshold matters more than you think, and what real production hit rates look like.</description>
    <pubDate>Sat, 23 May 2026 14:45:21 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>semantic caching</category><category>LLM infrastructure</category><category>RAG optimization</category><category>GPTCache</category><category>Redis LangCache</category><category>vector search</category><category>cosine similarity</category><category>LLM cost reduction</category><category>prompt caching</category><category>chatbot performance</category>
  </item>
  <item>
    <title>The LLM Deployment Provider Breakdown</title>
    <link>https://blogs.tanmaybohra.com/posts/llm-deployment-provider-breakdown/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/llm-deployment-provider-breakdown/</guid>
    <description>Groq vs Bedrock vs Azure OpenAI vs self-hosted vLLM — when each provider actually makes sense, what the compliance tradeoffs look like, and where the $80K/month break-even comes from.</description>
    <pubDate>Sat, 23 May 2026 14:37:43 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>LLM deployment</category><category>AWS Bedrock</category><category>Azure OpenAI</category><category>Groq</category><category>vLLM</category><category>Together.ai</category><category>Fireworks.ai</category><category>Vertex AI</category><category>self-hosting</category><category>HIPAA compliance</category><category>inference infrastructure</category><category>ML ops</category>
  </item>
  <item>
    <title>Multi-Agent Systems Are Messier Than You Think</title>
    <link>https://blogs.tanmaybohra.com/posts/multi-agent-systems-production-reality/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/multi-agent-systems-production-reality/</guid>
    <description>At 95% per-agent reliability, a 10-step sequential chain succeeds 60% of the time. What actually breaks in production multi-agent systems — and what LangGraph, AutoGen, and CrewAI don't tell you.</description>
    <pubDate>Sat, 23 May 2026 14:33:17 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI Agents</category><category>LangGraph</category><category>Multi-Agent</category><category>Production</category><category>LLM Infrastructure</category>
  </item>
  <item>
    <title>The AI Chatbot Scaling Playbook</title>
    <link>https://blogs.tanmaybohra.com/posts/ai-chatbot-scaling-playbook/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/ai-chatbot-scaling-playbook/</guid>
    <description>What breaks when your AI chatbot hits 1,000 concurrent users — and how to fix it before it does. Stateless design, semantic caching, gateways, and streaming at scale.</description>
    <pubDate>Sat, 23 May 2026 10:01:02 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI Infrastructure</category><category>Scaling</category><category>LLM Gateway</category><category>ECS</category><category>Redis</category>
  </item>
  <item>
    <title>Why Your RAG Falls Apart at Scale</title>
    <link>https://blogs.tanmaybohra.com/posts/rag-falls-apart-at-scale/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/rag-falls-apart-at-scale/</guid>
    <description>The real failure rate of RAG in production is 26.4% — not 2.3%. Here's what actually breaks: chunking, retrieval drift, context stuffing, and silent hallucinations. And how to fix it.</description>
    <pubDate>Sat, 23 May 2026 10:00:30 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>RAG</category><category>Vector Database</category><category>AI Infrastructure</category><category>Hybrid Search</category><category>Production</category>
  </item>
  <item>
    <title>I Made My Website Bookable by Any AI Agent</title>
    <link>https://blogs.tanmaybohra.com/posts/mcp-ap2-bookable-by-ai/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/mcp-ap2-bookable-by-ai/</guid>
    <description>After Google I/O 2026 announced AP2 and Universal Cart, I spent a weekend making tanmaybohra.com callable by Gemini, Claude, or any MCP-compatible AI. Here's exactly what I built — and where the ecosystem actually stands.</description>
    <pubDate>Wed, 20 May 2026 18:49:40 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AP2</category><category>MCP</category><category>AI Agents</category><category>Razorpay</category><category>FastAPI</category>
  </item>
  <item>
    <title>The Golden Database: How Production AI Agents Are Actually Built</title>
    <link>https://blogs.tanmaybohra.com/posts/the-golden-database/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/the-golden-database/</guid>
    <description>Replit tried fine-tuning and abandoned it. The real architecture behind the best AI coding agents is a curated store of high-quality traces, retrieved at inference time. Here's how it works - and when fine-tuning, Agent Skills, or inference-time compute is actually the right answer.</description>
    <pubDate>Tue, 19 May 2026 15:01:16 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI</category><category>Systems</category><category>LLM</category>
  </item>
  <item>
    <title>The Cheapest LLM Call Is the One You Never Make</title>
    <link>https://blogs.tanmaybohra.com/posts/llm-agent-observability-cost/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/llm-agent-observability-cost/</guid>
    <description>Sourcegraph got a $1M cloud bill three months after launch. Fixie's agent looped 847 times before anyone noticed. These aren't edge cases — they're the default outcome of shipping agents without treating observability and cost as first-class concerns.</description>
    <pubDate>Tue, 19 May 2026 04:41:54 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI Agents</category><category>Observability</category><category>Cost Optimization</category><category>LLM</category><category>Engineering</category>
  </item>
  <item>
    <title>Inngest for Agentic Systems: Durable Execution Without the Infrastructure Tax</title>
    <link>https://blogs.tanmaybohra.com/posts/inngest-agentic-architecture/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/inngest-agentic-architecture/</guid>
    <description>A naive 7-step agent with 3% per-step failure has an 80% end-to-end success rate. Inngest's step-level durability raises that to 99.98% — and eliminates the need to hand-build retry logic, human-in-the-loop state machines, or observability tooling from scratch.</description>
    <pubDate>Wed, 13 May 2026 17:58:46 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI</category><category>TypeScript</category><category>Systems</category>
  </item>
  <item>
    <title>A Book Club Changed How I Think About My Work</title>
    <link>https://blogs.tanmaybohra.com/posts/ssn-book-club/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/ssn-book-club/</guid>
    <description>Developer life stays close to your own team - until an optional hobby group quietly puts you in the same room as people who have been losing sleep over problems you never knew existed.</description>
    <pubDate>Sat, 09 May 2026 09:23:57 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>Reflection</category><category>People</category><category>Work</category>
  </item>
  <item>
    <title>LangGraph in Production: What Nobody Tells You</title>
    <link>https://blogs.tanmaybohra.com/posts/langgraph-production/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/langgraph-production/</guid>
    <description>State machines, retry logic, and the edge cases that make agentic workflows fall apart at scale. Three months of production LangGraph — the parts the docs skip over.</description>
    <pubDate>Sat, 09 May 2026 09:22:43 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>AI</category><category>Python</category><category>Systems</category>
  </item>
  <item>
    <title>PostgreSQL Indexing Patterns I Actually Use</title>
    <link>https://blogs.tanmaybohra.com/posts/postgresql-indexing-patterns/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/postgresql-indexing-patterns/</guid>
    <description>Partial indexes, covering indexes, and when BRIN beats everything — built from real EXPLAIN ANALYZE output, not toy examples. The strategies that cut query time from 4.8s to 12ms.</description>
    <pubDate>Sat, 09 May 2026 09:22:42 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>PostgreSQL</category><category>Systems</category><category>SQL</category>
  </item>
  <item>
    <title>Zod in Production: Runtime Type Safety Without the Tax</title>
    <link>https://blogs.tanmaybohra.com/posts/zod-runtime-validation/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/zod-runtime-validation/</guid>
    <description>TypeScript's type system vanishes at runtime. Zod bridges that gap — but the naive integration adds overhead you don't need. Here's how we use it at scale without paying the parse tax.</description>
    <pubDate>Sat, 09 May 2026 09:22:42 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>TypeScript</category><category>Backend</category>
  </item>
  <item>
    <title>Go Worker Pools: Handling 10,000 Concurrent Requests on a $20 Server</title>
    <link>https://blogs.tanmaybohra.com/posts/go-worker-pools/</link>
    <guid isPermaLink="true">https://blogs.tanmaybohra.com/posts/go-worker-pools/</guid>
    <description>The naive goroutine-per-request model looks fine until 50k connections hit simultaneously. Here's the worker pool pattern I actually use in production, and why it cut our memory by 80%.</description>
    <pubDate>Sat, 09 May 2026 09:22:41 GMT</pubDate>
    <author>hello@tanmaybohra.com (Tanmay Bohra)</author>
    <category>Go</category><category>Systems</category><category>Concurrency</category>
  </item>
  </channel>
</rss>