
Scaling AI Memory: How Zep’s Knowledge Graph Enhances Llama 3 Chat History

By Ken Collins

AI · 4 min read

The Challenge: AI’s Lack of Native Memory

AI models like Meta’s Llama 3 are increasingly being used in enterprise applications, but they have a critical limitation—no built-in memory. Managing chat history is complex, often requiring engineers to build custom solutions or rely on third-party tools. OpenAI attempts to address this with its Thread object, but it lacks flexibility for teams that want more control over data retrieval and context persistence.

Why AI Memory Matters for Engineering Leaders

For engineering teams building AI-driven applications, effective memory management impacts:

  • User Experience – AI agents that remember context improve engagement and reduce redundancy.
  • Scalability – Poor memory handling leads to higher compute costs and inefficient processing.
  • Data Control – Enterprise applications require transparent and structured knowledge retention beyond API constraints.

Zep’s Solution: Open-Source AI Memory with Graph-Based RAG

I’ve been pretty bullish on the future of AI models like Meta’s Llama 3 series and their inference stack. By making its platform widely accessible, Meta enables competitive innovation while pursuing platform and developer standardization—a natural fit for an industry moving toward small, customizable models. Yet a significant gap remains in the adoption of these models, leaving OpenAI to dominate with industry-defining API standards on its own inference stack.

Zep’s open-source memory layer integrates with AI models to manage chat history and structure context dynamically. Unlike simpler retrieval-based memory solutions, it uses a knowledge-graph approach, powered by Graphiti and Neo4j, to model entity relationships and improve recall accuracy.

Key Differentiators of Zep’s Memory Layer:

  • Graph-Based RAG (Retrieval-Augmented Generation): Enables precise and structured memory retrieval, improving AI responses.
  • Session-Based Memory Persistence: Supports multi-user and multi-session environments, ideal for enterprise applications.
  • Integration with Any AI Stack: Works independently of specific frameworks or inference engines.
  • Structured Data Handling: Converts unstructured chat data into structured JSON for better downstream use (a rough sketch of this shape follows the list).
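
As a rough illustration of that last point, graph-backed memory can be consumed downstream as plain structured data. The shape below is hypothetical—invented for illustration, not Zep’s actual schema:

```typescript
// Hypothetical shape for illustration only -- not Zep's actual schema.
interface MemoryFact {
  subject: string;    // entity the fact is about
  predicate: string;  // relationship type
  object: string;     // related entity or value
  validAt?: string;   // when the fact became true
  invalidAt?: string; // when it was superseded, if ever
}

const example: MemoryFact = {
  subject: "Jane",
  predicate: "WORKS_ON",
  object: "GraphQL migration",
  validAt: "2025-01-15T00:00:00Z",
};
```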

Personal Experience with Zep

Personal AI is something I’ve been exploring for a long time. I have several AI experiments that require long-term memory, with the ability to continuously learn from Notion, synthesize knowledge, and maybe even one day execute tasks on my behalf.

Last month, I came across Zep’s foundational memory layer and agreed to write this sponsored article. It turned out to be exactly what I needed for my projects. Beyond offering memory, it’s built on a temporal reasoning layer powered by knowledge graphs. Best of all, it’s entirely open-source under a project called Graphiti, which leverages Neo4j.

Implementation: How Zep Works

For companies deploying AI models in production, integrating skilled AI engineers into teams can be just as important as choosing the right tools. At Torc, we’ve seen how access to specialized AI talent accelerates adoption of solutions like Zep and helps teams build more efficient memory architectures.

The platform can be integrated into AI workflows with minimal setup. Here’s how:

  1. Create a User and Session: Establish unique identifiers for users and chat sessions.
  2. Enable Memory Retrieval: Store and retrieve past interactions dynamically.
  3. Leverage Graph-Based Context: Improve AI responses by structuring chat history as interconnected entities (steps 1–3 are sketched in code after this list).
  4. Deploy in Production: Use Zep’s cloud offering or self-hosted version for maximum control.
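
Here’s a minimal sketch of steps 1–3 using Zep’s TypeScript SDK (@getzep/zep-cloud). The method names follow the cloud SDK’s documented surface, but treat the exact signatures as assumptions that may vary by version; the identifiers are placeholders:

```typescript
import { ZepClient } from "@getzep/zep-cloud";

const client = new ZepClient({ apiKey: process.env.ZEP_API_KEY });

// 1. Create a user, then a session tied to that user.
await client.user.add({ userId: "jane-doe", firstName: "Jane" });
await client.memory.addSession({ sessionId: "session-1", userId: "jane-doe" });

// 2. Persist a chat turn; Zep extracts facts and entities in the background.
await client.memory.add("session-1", {
  messages: [
    { roleType: "user", content: "I'm migrating our API from REST to GraphQL." },
  ],
});

// 3. Retrieve graph-derived context to ground the next model call.
const memory = await client.memory.get("session-1");
console.log(memory.context); // facts and entities relevant to this session
```

Step 4 is a deployment decision rather than a code change: the same calls work against Zep’s cloud offering or a self-hosted instance.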

A simple TypeScript integration via the Vercel AI SDK allows for quick adoption, making it easy for teams to add long-term memory capabilities without overhauling existing infrastructure.
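One way that integration might look, assuming an OpenAI-compatible endpoint serving Llama 3 (the base URL and model id below are placeholders, and wiring Zep in by hand rather than through an official adapter is a simplification):

```typescript
import { generateText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { ZepClient } from "@getzep/zep-cloud";

const zep = new ZepClient({ apiKey: process.env.ZEP_API_KEY });

// Placeholder: any OpenAI-compatible endpoint serving Llama 3 works here.
const llama = createOpenAI({ baseURL: process.env.LLAMA_BASE_URL });

export async function chat(sessionId: string, userMessage: string) {
  // Pull graph-derived facts and entities for this session.
  const memory = await zep.memory.get(sessionId);

  const { text } = await generateText({
    model: llama("llama-3-70b-instruct"), // placeholder model id
    system: `Use this context about the user:\n${memory.context ?? ""}`,
    prompt: userMessage,
  });

  // Write both sides of the turn back so the knowledge graph keeps learning.
  await zep.memory.add(sessionId, {
    messages: [
      { roleType: "user", content: userMessage },
      { roleType: "assistant", content: text },
    ],
  });

  return text;
}
```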

Why CTOs and Engineering Leaders Should Care

Scaling AI capabilities isn’t just about picking the right tools—it’s about having the right engineering talent to implement them effectively. With platforms like Zep offering flexible, open-source memory solutions, companies that leverage specialized AI engineers can optimize performance while staying agile. For companies deploying AI at scale, Zep provides:

  • A more flexible alternative to OpenAI’s Thread object for context management.
  • An open-source, vendor-neutral approach to AI memory, reducing platform lock-in.
  • Better control over AI context and structured outputs for enterprise-grade applications.

Exploring Zep’s Knowledge Graph

Zep builds a knowledge graph to create a comprehensive view of the user’s world, capturing entities and their relationships. Let’s start by focusing on adding "Relevant Memory" and "Chat Messages" to our example project.

The system allows users to have multiple sessions contributing to the same shared memory. You can also connect multiple users as a group under one or even several sessions. Group-based sessions enable agents to access shared knowledge, such as documentation or group chats. Additionally, users with multiple sessions can maintain continuity in chats, even when logging out and back into an AI platform. It feels like Zep has all use cases covered.
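
A sketch of the group pattern, using the group and graph methods described in Zep’s cloud SDK docs (treat the exact names and parameters as assumptions):

```typescript
import { ZepClient } from "@getzep/zep-cloud";

const client = new ZepClient({ apiKey: process.env.ZEP_API_KEY });

// Shared memory for a team: any session attached to this group can draw on it.
await client.group.add({ groupId: "platform-team", name: "Platform Team" });

// Add shared knowledge (e.g., documentation) directly to the group's graph.
await client.graph.add({
  groupId: "platform-team",
  type: "text",
  data: "Deploy runbook: services roll out via blue/green on Fridays.",
});
```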

Context management is how the platform handles memory as messages are sent. Zep dynamically creates relevant facts and entities using Graphiti’s knowledge graph: each message from the user or response from the assistant updates the graph, enriching the memory’s context.
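
To watch that enrichment happen, you can inspect the facts Zep surfaces after each turn. This sketch assumes the cloud SDK exposes relevantFacts with temporal fields, per Zep’s fact model as I understand it, so treat the field names as assumptions:

```typescript
import { ZepClient } from "@getzep/zep-cloud";

const zep = new ZepClient({ apiKey: process.env.ZEP_API_KEY });

// After each turn, the graph is enriched; see what Zep now considers relevant.
const memory = await zep.memory.get("session-1");
for (const fact of memory.relevantFacts ?? []) {
  // Facts are graph edges with temporal validity, so superseded information
  // is invalidated rather than silently overwritten.
  console.log(fact.fact, fact.validAt, fact.invalidAt);
}
```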

Final Thoughts

For engineering leaders exploring AI memory solutions, the right talent makes all the difference. If your team is scaling AI capabilities and needs engineers who understand retrieval-based memory and knowledge graphs, platforms like Torc can help accelerate deployment. And if you want an efficient, scalable way to manage AI memory without depending on rigid third-party APIs, explore Zep, test its capabilities, and consider how the right team can drive even more value.
