Here is a LangGraph-powered RAG agent with persistent short- and long-term memory, deployed as a containerized runtime on Amazon Bedrock AgentCore. It answers user queries by retrieving context from a FAISS vector store and personalizes responses using conversation history across sessions.
See the code and README for the implementation:
https://github.com/LeelaPrasadG/Bedrock_lang_RAG_agentcore
Important Points:
- From local script to cloud agent in minutes: With agentcore configure followed by agentcore launch, a LangGraph agent running on your laptop becomes a fully managed, auto-scaled runtime on AWS, with no infrastructure code needed.
- Two deployment paths, one codebase: The same 01_agentcore_runtime.py can be deployed as a direct code deploy (fast prototyping) or as a Docker container (production). One flag switches between them: --deployment-type container.
- Memory that actually persists: AgentCoreMemorySaver gives short-term per-thread history; AgentCoreMemoryStore gives long-term semantic memory across sessions. Together they let the agent remember what a user said three conversations ago, something a stateless LLM call cannot do on its own.
- MicroVM isolation per session: Every unique runtimeSessionId runs in its own MicroVM on AWS, so sessions get tenant isolation and clean state without any extra work from the developer.
- @app.entrypoint is the only contract: Bedrock AgentCore only cares about one decorator. Everything else (HTTP server, routing, container lifecycle) is handled by the runtime, so your business logic stays clean.
- Invoke from anywhere via boto3: Once deployed, any Python app can call your agent using client.invoke_agent_runtime(). There is no separate SDK beyond boto3 and no API Gateway setup, just standard AWS credentials.
- Built-in observability out of the box: The auto-generated Dockerfile installs aws-opentelemetry-distro and wraps the entrypoint with opentelemetry-instrument. Traces flow to AWS X-Ray automatically.
- FAISS + OpenAI Embeddings for RAG: The FAQ knowledge base uses FAISS for local vector search, so the similarity search itself is fast and adds no API cost at runtime. Only the OpenAI calls (embedding the incoming query and the final LLM call) incur API cost.
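As a rough illustration of the boto3 invocation point above, here is a minimal sketch of calling a deployed agent from Python. The helper names build_invoke_request and ask_agent, the "prompt" payload key, and the default region are assumptions made for this example; the payload shape has to match whatever your @app.entrypoint function actually reads, and the response handling assumes a non-streaming JSON reply.

```python
import json
import uuid


def build_invoke_request(agent_runtime_arn, prompt, session_id=None):
    """Build the keyword arguments for invoke_agent_runtime (hypothetical helper)."""
    # Reusing a session id continues the same session; a new id starts a fresh one.
    # AWS examples note that session ids should be 33+ characters; a UUID string is 36.
    session_id = session_id or str(uuid.uuid4())
    return {
        "agentRuntimeArn": agent_runtime_arn,
        "runtimeSessionId": session_id,
        # Assumption: the entrypoint expects a JSON object with a "prompt" key.
        "payload": json.dumps({"prompt": prompt}).encode("utf-8"),
    }


def ask_agent(agent_runtime_arn, prompt, session_id=None, region="us-east-1"):
    """Send one prompt to the deployed agent and return the decoded JSON reply."""
    import boto3  # imported here so build_invoke_request works without AWS set up

    client = boto3.client("bedrock-agentcore", region_name=region)
    request = build_invoke_request(agent_runtime_arn, prompt, session_id)
    response = client.invoke_agent_runtime(**request)
    # Assumption: the agent returns a single JSON body rather than an event stream.
    return json.loads(response["response"].read())
```

To try it, call ask_agent() with the runtime ARN printed by agentcore launch, and pass the same session_id on later calls to continue the same session.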
Reference: https://www.youtube.com/watch?v=cTBGIKAckKE&t=2193s