Blog
Notes on backend, AI agents, and shipping things that hold up in production.
GPT-5.5 vs Claude Opus 4.7: The SWE-bench War That Defines AI Coding in 2026
GPT-5.5 just edged Claude Opus 4.7 on SWE-bench Verified — 88.7% to 87.6% — while Claude Mythos Preview quietly topped the harder SWE-bench Pro at 77.8%. And xAI's Grok Build entered the race at 70.8% with a model priced 25x cheaper. The real story is not who is winning this week. It is what this race reveals about what AI can and cannot do in a real codebase — and how to choose the right model for your team.
The Multi-Agent Era Is Here: What Anthropic's 2026 Coding Report Means for Every Developer
Anthropic's 2026 Agentic Coding Trends Report reveals a seismic shift: the average Claude Code session now runs 23 minutes (up from 4) and makes 47 tool calls, and 78% of sessions involve multi-file edits. The verdict is clear: 2025 was the year of AI assistants, and 2026 is the year of AI teams. Here's what every developer needs to know.
Claude 'Dreaming' Is Not a Gimmick: How Anthropic's New Agent Memory Fixes Production AI
At Code with Claude 2026 (May 6), Anthropic shipped three features for Claude Managed Agents: Dreaming, Outcomes, and Multiagent Orchestration. Together they tackle the hardest problems in production AI: agents that forget, context windows that overflow, and termination conditions that are never clearly defined. Harvey saw a 6x improvement in task completion. Wisedocs cut document review time by 50%. Here is what each feature does and why it matters to developers building real systems.
xAI Grok Build: What Arena Mode and 8 Parallel Agents Mean for Your Coding Workflow
xAI launched Grok Build on May 14, 2026 — a CLI coding agent powered by Grok 4.3 that runs up to 8 parallel sub-agents and ranks their outputs automatically with Arena Mode. Here is what the multi-agent competition model means for developer workflows, why the 2M-token context window matters, and where local-first architecture changes the enterprise calculus.
AI Coding Agents Need Managed Workspaces, Not Laptop Sprawl
The hot developer trend on May 17, 2026: coding agents are becoming governed execution systems. The key question is no longer only which model writes code, but where agents run, what they can touch, and how teams review their work.
LangGraph in Production: What the Docs Don't Tell You (AWS Bedrock + Guardrails Edition)
I led the Sales Agent platform at Parcel Perform using LangGraph + AWS Bedrock + Claude Sonnet 4.6. Here's what I actually learned — the failures, the gotchas, and the production decisions that matter in 2026.
Building Production MCP Servers: Lessons from Wiring AI to Real B2B Lead-Gen Pipelines
MCP is the new API layer for AI agents. I built pp-mcp-leadgen — Parcel Perform's B2B lead-gen MCP server. Here's what designing tools for LLM consumption actually requires.
From 20M Records/Day to AI Agents: What Backend Engineers Get Wrong About LLMs
I spent years building 20M+ records/day pipelines, then moved to leading GenAI products. Almost every intuition I'd built needed recalibrating. Here's the mental model mismatch map.