Sunday, 19 April 2026

Deploy Production-Ready AI Agent with AWS Bedrock AgentCore & LangGraph


Here is a LangGraph-powered RAG agent with persistent short- and long-term memory, deployed as a containerized runtime on AWS Bedrock AgentCore, that answers user queries by retrieving context from a FAISS vector store and personalizing responses using conversation history across sessions.

Follow the code and README in the repository below for the implementation:

https://github.com/LeelaPrasadG/Bedrock_lang_RAG_agentcore

Important Points:

  1. From local script to cloud agent in minutes: With agentcore configure followed by agentcore launch, a LangGraph agent running on your laptop becomes a fully managed, auto-scaled runtime on AWS, with no infrastructure code to write.
  2. Two deployment paths, one codebase: The same 01_agentcore_runtime.py can be deployed as a direct code deploy (fast prototyping) or as a Docker container (production). One flag switches between them: --deployment-type container.
  3. Memory that actually persists: AgentCoreMemorySaver keeps short-term, per-thread history; AgentCoreMemoryStore keeps long-term semantic memory across sessions. Together they let the agent recall what a user said three conversations ago, which a stateless LLM call cannot do on its own.
  4. MicroVM isolation per session: Every unique runtimeSessionId spins up a fresh MicroVM on AWS. This gives you tenant isolation and clean state without any extra work from the developer.
  5. @app.entrypoint is the only contract: Bedrock AgentCore only cares about one decorator. Everything else (HTTP server, routing, container lifecycle) is handled by the runtime, so your business logic stays clean.
  6. Invoke from anywhere via boto3: Once deployed, any Python app can call your agent using client.invoke_agent_runtime(). You need nothing beyond boto3 and standard AWS credentials: no separate agent SDK and no API Gateway setup.
  7. Built-in observability: The auto-generated Dockerfile installs aws-opentelemetry-distro and wraps the entrypoint with opentelemetry-instrument. Traces flow to AWS X-Ray automatically.
  8. FAISS + OpenAI Embeddings for RAG: The FAQ knowledge base uses FAISS for local vector search, so the similarity search itself is fast and adds no per-query service cost. API cost comes from the final LLM call, plus a small OpenAI embedding call for each query.
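
To make points 4 and 6 concrete, here is a minimal sketch of calling a deployed runtime from Python. It is not taken from the repository. The helper name invoke_agent, the {"prompt": ...} payload shape, and the assumption that the runtime returns a single JSON body (rather than a stream) are all illustrative; check the repository's entrypoint for the payload it actually expects. The client is passed in as an argument so the helper can be exercised with a stub before any AWS resources exist.

```python
import json
import uuid


def invoke_agent(client, agent_runtime_arn, prompt, session_id=None):
    """Send one prompt to a deployed AgentCore runtime.

    Returns (session_id, decoded_reply). `client` is expected to be a
    boto3 "bedrock-agentcore" client, or any object with the same method.
    """
    # Reusing a session_id keeps the call in the same runtime session;
    # omitting it starts a new session with a fresh id.
    # A UUID4 string is 36 characters long, which stays above the
    # minimum session-id length the API enforces (33, as far as I know).
    session_id = session_id or str(uuid.uuid4())

    response = client.invoke_agent_runtime(
        agentRuntimeArn=agent_runtime_arn,
        runtimeSessionId=session_id,
        # Assumption: the entrypoint reads a "prompt" key from the payload.
        payload=json.dumps({"prompt": prompt}).encode("utf-8"),
    )

    # Assumption: the runtime answers with one JSON document.
    # A streaming (text/event-stream) reply would need line-by-line handling.
    body = response["response"].read()
    return session_id, json.loads(body)
```

With real credentials, pass boto3.client("bedrock-agentcore", region_name=...) as client along with the runtime ARN of your deployed agent. Keep the returned session_id and pass it back on the next call to continue the same conversation; leave it out to start a new isolated session.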

Reference: https://www.youtube.com/watch?v=cTBGIKAckKE&t=2193s

