n8n 2.0: Build AI agents with persistent memory
n8n 2.0 shipped persistent agent memory, sandboxed code execution, and 70+ AI nodes. Here is how to use them in production.
We've spent the last 11 months shipping voice agent deployments for coaches, consultants, fintech, real estate, and a handful of edge cases. Ninety-six in production. Here's what we've learned about what actually works in 2026.
1. The model isn't the bottleneck anymore
GPT-4o-realtime, Claude 3.5 Sonnet voice, and the open-source equivalents are good enough for 92% of production scenarios. Telephony latency, audio processing pipelines, and prompt routing are now the failure modes, not LLM quality.
If your agent feels janky, audit your audio path before you audit your prompts. Eight times out of ten, that's where the friction lives.
"The agents that work feel like infrastructure. The agents that fail feel like party tricks."
2. Voice ≠ chatbot with audio
Every team that tries to port their chatbot prompt to voice fails the same way: too verbose, too formal, too explainer-y. Voice is improv. You need shorter turns, callback handles, and graceful interruption.
3. The handoff is the product
The best voice agent in the world is useless if the post-call sync is broken. Notes go to CRM. CRM triggers sequence. Sequence books follow-up. Calendar invites human. That is the system. The voice piece is one component.
If you want to see a live example, our AI calling system is running in production for loan servicing and collections. You can see the real numbers on the case studies page.
n8n 2.0 shipped in January 2026 with three features that change how you build AI agents: persistent memory, sandboxed code execution, and native LangChain integration with 70+ AI nodes. If you have been treating n8n as a simple webhook router, it is time to revisit that assumption.
This is a practical walkthrough of what those features mean, how to wire them up, and where they break. We have been running n8n 2.0 in production workflows for six months and these are the patterns that actually hold up.
What changed in n8n 2.0
The 1.x line treated AI as an add-on: you called an HTTP node to hit an LLM endpoint and parsed the response yourself. Fragile, verbose, and impossible to debug at scale. n8n 2.0 redesigned the execution model around agent-native concepts.
- Persistent agent memory: Agents now have a memory layer that survives across executions. A conversation that started on Monday still has full context on Wednesday, without you manually passing history through workflow variables.
- Sandboxed code execution: The Code node runs in an isolated V8 context. No more worrying about a runaway script leaking state into adjacent executions. You can also install npm packages per-sandbox now.
- 70+ AI nodes via LangChain: Vector stores, embeddings, chat models, document loaders, and output parsers are all first-class nodes. You drag them in and connect them like any other node; no Python bridge is required.
Setting up a persistent memory agent
The memory layer in n8n 2.0 uses a pluggable store. Out of the box you get in-memory (wiped on restart), Postgres, Redis, and Supabase. For any production use, pick Postgres or Supabase. Here is the minimal setup:
- Add an AI Agent node to your workflow.
- In the Agent node settings, open the Memory panel and select Postgres Chat Memory.
- Set a sessionId expression. This is the key n8n uses to scope memory. Typically you want {{ $json.userId }} or {{ $json.threadId }} so each user or conversation has isolated history.
- Set a windowSize: how many past messages to include in each prompt. 20 is a safe starting point. Higher means more context but higher token cost per call.
That is it. The agent will now read prior turns from Postgres before generating each response and write the new turn back after. The memory is queryable, auditable, and survives restarts.
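To make that read-then-write cycle concrete, here is a minimal in-process sketch of session-scoped, windowed memory. It is not n8n's implementation: the class name WindowedChatMemory and its methods are hypothetical, and a Map stands in for the Postgres table.

```javascript
// Illustrative only: a Map stands in for the Postgres-backed store.
// WindowedChatMemory and its method names are hypothetical, not n8n APIs.
class WindowedChatMemory {
  constructor(windowSize = 20) {
    if (!Number.isInteger(windowSize) || windowSize < 1) {
      throw new RangeError("windowSize must be a positive integer");
    }
    this.windowSize = windowSize;
    this.store = new Map(); // sessionId -> array of { role, content }
  }

  // Return at most the last `windowSize` messages for one session.
  read(sessionId) {
    const history = this.store.get(sessionId) ?? [];
    return history.slice(-this.windowSize);
  }

  // Append one completed turn (user message plus agent reply).
  write(sessionId, userText, agentText) {
    const history = this.store.get(sessionId) ?? [];
    history.push({ role: "user", content: userText });
    history.push({ role: "assistant", content: agentText });
    this.store.set(sessionId, history);
  }
}

const memory = new WindowedChatMemory(4);
memory.write("user-1", "My order is #881.", "Thanks, I see order #881.");
memory.write("user-1", "When does it ship?", "It ships Friday.");
memory.write("user-1", "Can I change the address?", "Yes, send the new one.");

const context = memory.read("user-1"); // only the last 4 of 6 messages
const otherContext = memory.read("user-2"); // a different session sees nothing
```

Note that the first turn (the order number) has already fallen out of the window here, which is the count-based truncation behavior to keep in mind when choosing windowSize.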
The sessionId is everything
The single most common mistake we see: setting a static sessionId like "my-agent" instead of a dynamic per-user value. Every user ends up sharing one memory pool, which means the agent hallucinates context from completely unrelated conversations. Always scope sessionId to the entity whose context you want to preserve.
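One way to guard against this is to derive the key in code and fail loudly when no identifier is present. The sketch below is an assumption about how you might do that, for example in a Code node placed before the agent; deriveSessionId and the "chat:" prefix are hypothetical, and preferring threadId over userId is a design choice, not an n8n rule.

```javascript
// Hypothetical helper: build a memory key from an incoming payload.
// The field names (userId, threadId) follow the examples in the text above.
function deriveSessionId(payload) {
  // Prefer the narrower scope (one conversation) when both are present.
  const id = payload.threadId ?? payload.userId;
  if (id === undefined || id === null || String(id).trim() === "") {
    // Failing loudly is safer than falling back to a shared static key.
    throw new Error("No userId or threadId on payload; refusing to share memory");
  }
  return `chat:${String(id).trim()}`;
}

const aliceKey = deriveSessionId({ userId: "u_42" });
const bobKey = deriveSessionId({ userId: "u_77" });
const threadKey = deriveSessionId({ userId: "u_42", threadId: "t_9" });
```

Two different users get two different keys, and a payload with no identifier stops the workflow instead of writing into a shared pool.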
Sandboxed code execution in practice
The new Code node lets you run arbitrary JavaScript in an isolated sandbox. The practical upside: data transformations, API calls with custom auth headers, and business logic that would otherwise take five n8n nodes can live in a single readable block of code.
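As a rough illustration of that consolidation, the sketch below does filtering, cleanup, and aggregation in one function. The function name normalizeOrders and the field names (email, amount, status) are made up for the example; the { json: ... } item shape mirrors how n8n passes data between nodes.

```javascript
// Sketch of logic that might otherwise be spread over several nodes.
// Field names (email, amount, status) are invented for illustration.
function normalizeOrders(items) {
  const totals = new Map();
  for (const { json } of items) {
    if (json.status !== "paid") continue; // filter step
    const email = String(json.email).trim().toLowerCase(); // cleanup step
    const amount = Number(json.amount); // type coercion step
    totals.set(email, (totals.get(email) ?? 0) + amount); // aggregate step
  }
  // Return one output item per customer, in n8n's item shape.
  return [...totals].map(([email, total]) => ({ json: { email, total } }));
}

// Inside a Code node that runs once for all items, the body would end with:
//   return normalizeOrders($input.all());
const sampleItems = [
  { json: { email: "Ana@Example.com ", amount: "40", status: "paid" } },
  { json: { email: "ana@example.com", amount: 15, status: "paid" } },
  { json: { email: "bo@example.com", amount: 99, status: "refunded" } },
];
const orderTotals = normalizeOrders(sampleItems);
```

Keeping the logic in a plain function like this also means it can be tested outside n8n before being pasted into the node.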
Two things to know before you lean on it heavily:
- Execution timeout is 10 seconds by default. Long-running computations will hard-fail. If you need async work, split it into separate executions connected by a queue node.
- npm packages are per-execution, not cached. If you install a package inside the sandbox, it is installed fresh on every run. Keep your dependency footprint small or the cold-start cost will show up in your execution times.
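If the work is a loop over many items, one way to stay under the timeout is to plan batch sizes up front and hand each batch to its own execution. This is a planning sketch only: planBatches is a hypothetical helper, the 10-second figure comes from the text above, and the per-item cost and safety margin are assumptions you would measure for your own workload.

```javascript
// Hypothetical planning helper: split work so each execution stays well
// under the sandbox time limit. perItemMs and safety are assumed inputs.
function planBatches(items, perItemMs, timeoutMs = 10000, safety = 0.5) {
  if (perItemMs <= 0) throw new RangeError("perItemMs must be positive");
  // Use only a fraction of the budget to leave room for variance.
  const batchSize = Math.max(1, Math.floor((timeoutMs * safety) / perItemMs));
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const recordIds = Array.from({ length: 25 }, (_, i) => i + 1);
// Assuming roughly 500 ms per record, each batch holds 10 records.
const plannedBatches = planBatches(recordIds, 500);
```

Each batch can then be emitted as a separate item so the downstream executions run independently.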
Building a multi-step agent with 70+ AI nodes
The LangChain node library is where n8n 2.0 pulls away from Make and Zapier for serious AI work. You can build a full RAG pipeline purely with drag-and-drop nodes:
- Document Loader node pulls from Google Drive, Notion, S3, or a URL.
- Text Splitter chunks it with configurable overlap.
- Embeddings node generates vectors using OpenAI, Cohere, or a self-hosted model.
- Vector Store node upserts to Pinecone, Qdrant, or Postgres pgvector.
- AI Agent node with a Vector Store Retriever tool attached handles the query side.
The whole pipeline is visual, versioned in n8n, and testable node-by-node. No Python, no infrastructure code. We have used this exact pattern for a client knowledge base agent that now handles support queries across 14,000 documents, with the agent citing sources in its responses.
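For readers unfamiliar with what "configurable overlap" does in the Text Splitter step, here is a minimal character-based sketch. It is not the node's actual algorithm; splitWithOverlap is a hypothetical function, and real splitters usually respect sentence or token boundaries.

```javascript
// Minimal character-based splitter. It only demonstrates the overlap
// mechanic: each chunk repeats the last `overlap` characters of the previous one.
function splitWithOverlap(text, chunkSize, overlap) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

const docChunks = splitWithOverlap("abcdefghijkl", 5, 2);
// docChunks is ["abcde", "defgh", "ghijk", "jkl"]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.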
Where n8n 2.0 still has rough edges
Being honest about the limitations matters if you are choosing a platform for a production system:
- Long-running workflows above 15 minutes will time out on the cloud tier. Self-hosted gives you control over the execution timeout, which is one strong reason to run your own instance for heavy workloads.
- Memory window truncation is not smart. n8n slices the last N messages by count, not by semantic relevance. For long-context use cases you will want to add a summarization step to compress old turns before they fall out of the window.
- Error handling in AI nodes is still shallow. If the LLM returns a malformed response or hits a rate limit, the workflow errors in a way that requires manual inspection. Building robust retry logic around AI nodes takes more wiring than it should.
None of these are blockers. They are just things to design around rather than discover in production at 2am.
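Two of these workarounds are small enough to sketch. The helpers below are hypothetical and not n8n features: compactHistory folds old turns into a summary before they fall out of the window (the summarize callback would be an LLM call in practice), and withRetry wraps an async call in exponential backoff. As written it retries on any error; in a real workflow you would likely retry only transient failures such as rate limits.

```javascript
// Fold everything older than the last `keep` messages into one summary
// message, so old context is compressed instead of silently dropped.
function compactHistory(messages, keep, summarize) {
  if (messages.length <= keep) return messages.slice();
  const cut = messages.length - keep;
  const summary = summarize(messages.slice(0, cut));
  return [
    { role: "system", content: `Summary of earlier turns: ${summary}` },
    ...messages.slice(cut),
  ];
}

// Exponential backoff schedule: baseMs, then 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async call (for example, an LLM request) up to maxAttempts times.
async function withRetry(fn, maxAttempts = 4) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
      }
    }
  }
  throw lastError; // all attempts failed
}

const turns = ["a", "b", "c", "d", "e"].map((content) => ({ role: "user", content }));
// A stand-in summarizer that just joins the old messages.
const compacted = compactHistory(turns, 2, (old) => old.map((m) => m.content).join(" "));
```

Here compacted holds three messages: one summary covering "a b c", followed by the two most recent turns unchanged.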
When to use n8n 2.0 for AI agents vs. a coded agent framework
n8n 2.0 is the right choice when the agent is one component in a larger business automation: it triggers on a webhook, does some AI reasoning, writes to a CRM, sends a Slack message, and closes. The visual graph is genuinely valuable for debugging and handing off to non-engineers.
If your agent needs complex multi-agent orchestration, dynamic tool generation, or sub-second latency, you are better served by a coded framework like LangGraph or CrewAI. The visual layer that makes n8n easy to use also adds overhead you cannot opt out of.
For the typical operations or sales workflow automation use case, n8n 2.0 is the fastest path from idea to production-grade agent. We have shipped over 100 systems on it and the new memory and code primitives have removed the last category of workarounds we used to need.
If you want this built for your business, book a 20-minute call with Nexica AI. We build production-grade AI systems in 14 days.