LangGraph in Production: What the Docs Don't Tell You (AWS Bedrock + Guardrails Edition)

I took on the Tech Lead role for Parcel Perform's Sales Agent — an AI sales automation platform built with LangChain + LangGraph + LangSmith on AWS Bedrock, running Claude Sonnet 4.6. With agentic AI no longer a demo but actual business infrastructure, I want to share what I actually learned: the failures, the gotchas, and the architecture decisions that separated a stable production system from a well-written prototype.

Why LangGraph Over Plain LangChain (or Custom Orchestration)

The first question I had to answer was: do we even need LangGraph? We could have wired raw LangChain chains together, or built a custom state machine. We chose LangGraph for three concrete reasons.

First, conditional routing. A sales agent needs branching logic — if the prospect is already in CRM, skip enrichment; if email confidence is low, fall back to manual review. LangGraph's graph edges let you express this cleanly without if-else spaghetti in application code.

Second, checkpointing. LangGraph's built-in checkpointer (we used the Postgres backend) gave us free-of-charge persistence — agents can be interrupted, inspected, and resumed. Critical when a workflow spans multiple API calls and you don't want to rerun expensive steps.

Third, LangSmith integration. You get full trace visualization out of the box. For a production agent that touches Claude Sonnet, Apollo, and our internal APIs, being able to replay a specific run and see exactly which node failed is non-negotiable.

Graph State Design: The Decisions That Actually Matter

The most important architectural decision in LangGraph is your state schema. Get it wrong and you'll be refactoring the whole graph.

Our state held: prospect data, enrichment results, a list of tool calls made, an action queue, and a conversation history. The mistake we made early on was putting too much raw data into state. Each checkpoint serializes the full state — bloated state means slow checkpoints and expensive storage.

The fix: store references (prospect ID, thread ID) in state, not the full objects. Fetch data at node entry time. State should be a control structure, not a data cache.

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    prospect_id: str
    thread_id: str
    messages: Annotated[list, add_messages]
    action_queue: list[str]
    last_tool_result: dict | None
    human_review_required: bool

graph = StateGraph(AgentState)
graph.add_node("enrich", enrich_prospect)
graph.add_node("draft_email", draft_email_node)
graph.add_node("review_gate", human_review_node)
graph.add_conditional_edges("enrich", route_after_enrich, {
    "draft": "draft_email",
    "review": "review_gate",
    "end": END,
})

AWS Bedrock + Guardrails Integration

We run Claude Sonnet 4.6 via AWS Bedrock, authenticated with IRSA (IAM Roles for Service Accounts) in Kubernetes — no static credentials, role assumption per pod. The Terraform module for this took about a day to get right but pays dividends in security.

Guardrails were a requirement from day one. We configured: content filters for harmful output, denied topics (we don't want the agent discussing competitor pricing), and PII redaction on the way out. The tricky part is Guardrails add ~200-400ms latency per call. Design your timeout budgets accordingly.

from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-sonnet-4-6",
    region_name="us-east-1",
    model_kwargs={"temperature": 0.3, "max_tokens": 2048},
    guardrail_config={
        "guardrailIdentifier": "your-guardrail-id",
        "guardrailVersion": "DRAFT",
        "trace": "enabled",
    },
)

LangSmith: Observability Is Not Optional

We track every agent run in LangSmith with structured metadata: prospect ID, campaign ID, model version, cost per run. After three months of production traffic, this data is how we tune the system — not intuition.

Concrete example: we noticed a specific tool (Apollo enrichment) had a 15% timeout rate during peak hours. Without LangSmith traces, we would have seen "agent failed" in logs. With it, we saw exactly which node, which API call, and the exact payload. Fixed in an afternoon.

We also log to ELK for cost aggregation across campaigns. LangSmith gives per-run granularity; ELK gives weekly cost dashboards. Both are necessary — they answer different questions.

Real Failure Modes I Hit in Production

1. Tool Call Loops

The agent would enter a loop: call tool A → get ambiguous result → call tool A again → same result → repeat. LangGraph doesn't stop this automatically. Fix: add a tool_call_count field to state and add a max-iterations guard at the router node.

2. State Blowup on Long Threads

Conversation history in state grows unboundedly by default with add_messages. For a sales agent with 20+ back-and-forth turns, this blows up context windows and costs money. Fix: implement a trim_messages step that keeps the last N messages plus a compressed summary of earlier context.

3. Bedrock Rate Limits Under Campaign Load

Running email generation for 5,000 prospects in a campaign window hit Bedrock's TPM limits hard. We added exponential backoff with jitter at the node level and batched workflows with a semaphore. Also: enable Bedrock Provisioned Throughput for predictable capacity if you run campaigns at scale.

When NOT to Use LangGraph

LangGraph is overkill for simple, linear chains. If your workflow is: call LLM → parse output → return result, use plain LangChain or just call the Bedrock API directly. LangGraph's value shows up when you have branching, loops, human-in-the-loop gates, or long-running workflows with state persistence needs.

Also: if your team isn't already familiar with graph-based thinking, the learning curve is real. We spent the first two weeks mostly arguing about state schema design. Budget for that.

Takeaways

LangGraph in production is a different beast from LangGraph in tutorials. The wins are real — clean conditional routing, checkpointing, LangSmith observability — but the gotchas (state bloat, tool loops, rate limits) will find you quickly if you don't design for them upfront. Build observability in from day one, keep state lean, and set explicit failure boundaries. Everything else you can iterate on.