Blog

Notes on backend, AI agents, and shipping things that hold up in production.

May 20, 2026

GPT-5.5 vs Claude Opus 4.7: The SWE-bench War That Defines AI Coding in 2026

GPT-5.5 just edged Claude Opus 4.7 on SWE-bench Verified — 88.7% to 87.6% — while Claude Mythos Preview quietly topped the harder SWE-bench Pro at 77.8%. And xAI's Grok Build entered the race at 70.8% with a model priced 25x cheaper. The real story is not who is winning this week. It is what this race reveals about what AI can and cannot do in a real codebase — and how to choose the right model for your team.

AI Agents · Developer Tools · GenAI
May 19, 2026

The Multi-Agent Era Is Here: What Anthropic's 2026 Coding Report Means for Every Developer

Anthropic's 2026 Agentic Coding Trends Report reveals a seismic shift: the average Claude Code session now runs 23 minutes (up from 4) and makes 47 tool calls, and 78% of sessions involve multi-file edits. The verdict is clear: 2025 was the year of AI assistants, and 2026 is the year of AI teams. Here's what every developer needs to know.

AI Agents · Developer Tools · GenAI
May 18, 2026

Claude 'Dreaming' Is Not a Gimmick: How Anthropic's New Agent Memory Fixes Production AI

At Code with Claude 2026 (May 6), Anthropic shipped three features for Claude Managed Agents: Dreaming, Outcomes, and Multiagent Orchestration. Together they tackle the hardest problems in production AI: agents that forget, context windows that overflow, and termination conditions that are never clearly defined. Harvey saw a 6x improvement in task completion. Wisedocs cut document review time by 50%. Here is what each feature does and why it matters to developers building real systems.

AI Agents · Developer Tools · GenAI
May 17, 2026

xAI Grok Build: What Arena Mode and 8 Parallel Agents Mean for Your Coding Workflow

xAI launched Grok Build on May 14, 2026 — a CLI coding agent powered by Grok 4.3 that runs up to 8 parallel sub-agents and ranks their outputs automatically with Arena Mode. Here is what the multi-agent competition model means for developer workflows, why the 2M-token context window matters, and where local-first architecture changes the enterprise calculus.

AI Agents · Developer Tools · GenAI
May 17, 2026

AI Coding Agents Need Managed Workspaces, Not Laptop Sprawl

The hot developer trend on May 17, 2026: coding agents are becoming governed execution systems. The key question is no longer just which model writes the code, but where agents run, what they can touch, and how teams review their work.

AI Agents · GenAI · Backend · Developer Tools
May 10, 2026

LangGraph in Production: What the Docs Don't Tell You (AWS Bedrock + Guardrails Edition)

I led the Sales Agent platform at Parcel Perform using LangGraph + AWS Bedrock + Claude Sonnet 4.6. Here's what I actually learned — the failures, the gotchas, and the production decisions that matter in 2026.

LangGraph · AWS Bedrock · AI Agents · GenAI
May 7, 2026

Building Production MCP Servers: Lessons from Wiring AI to Real B2B Lead-Gen Pipelines

MCP is the new API layer for AI agents. I built pp-mcp-leadgen — Parcel Perform's B2B lead-gen MCP server. Here's what designing tools for LLM consumption actually requires.

MCP · AI Agents · GenAI · Backend
May 3, 2026

From 20M Records/Day to AI Agents: What Backend Engineers Get Wrong About LLMs

I spent years building 20M+ records/day pipelines, then moved to leading GenAI products. Almost every intuition I'd built needed recalibrating. Here's a map of where the backend mental model and the LLM mental model diverge.

AI Agents · Backend · GenAI