# 🧠 A2A Protocol Demo: Smart Context Memory
> **High-Fidelity Context Management** for LLM Agents.
>
> A Python-based **Smart Context Memory** engineering example that demonstrates a priority-based eviction algorithm and RAG (Retrieval-Augmented Generation) tracing mechanism.
---
## 📖 Introduction
When building complex LLM Agents (especially RAG applications), we face a core pain point: **the Context Window is limited**.
The traditional approach is **FIFO (First In, First Out)**: when the conversation gets too long, the earliest messages are simply truncated. This can lead to severe consequences: the Agent may forget the initially set System Prompt or lose track of a key document retrieved three turns ago, resulting in a dramatic drop in answer quality.
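The failure mode is easy to reproduce. A naive FIFO buffer drops the oldest entries first, regardless of what they contain, so the System Prompt is always the first casualty (a minimal sketch; the names and token counts are illustrative, not from this repo):

```python
from collections import deque

# Naive FIFO context: each entry is (role, text, tokens)
history = deque([
    ("system", "You are a helpful corporate assistant.", 40),
    ("rag_doc", "VPN connection guide ...", 60),
    ("user", "The weather is nice today", 10),
    ("agent", "It sure is!", 10),
])

MAX_TOKENS = 100

def fifo_truncate(history, budget):
    # Drop the oldest entries until the total fits the budget
    while sum(t for _, _, t in history) > budget:
        history.popleft()
    return history

fifo_truncate(history, MAX_TOKENS)
print([role for role, _, _ in history])
# → ['rag_doc', 'user', 'agent'] — the System Prompt was evicted first
```

Priority-based eviction inverts this: what gets dropped depends on value, not age.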
The problem that **a2a-protocol-demo10** addresses is: **how to retain the most valuable information within a limited Token space?**
It implements a **Smart Memory Core** with the following capabilities:
* ✅ **Priority Eviction**: System Prompts and RAG documents have the highest weight, while casual chat has the lowest. When the window is full, casual chat is prioritized for eviction.
* ✅ **Source Tracking**: Each knowledge Slot carries a `source_id`, so retrieved content can be traced back to its originating document.
* ✅ **Deduplication**: Prevents the same content from occupying space multiple times.
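Source tracking and deduplication can be combined by hashing slot content and recording which `source_id` it came from. A minimal sketch (the `MemoryIndex` class and its methods are illustrative, not the repo's actual API):

```python
import hashlib

class MemoryIndex:
    """Tracks which content hashes are already stored, keyed to a source_id."""

    def __init__(self):
        self._seen = {}  # content hash -> source_id that first stored it

    def add(self, source_id: str, content: str) -> bool:
        """Return True if the content was new and stored, False if a duplicate."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # identical content already occupies a slot
        self._seen[digest] = source_id
        return True

index = MemoryIndex()
print(index.add("doc-001", "VPN connection guide"))  # → True (first time)
print(index.add("doc-002", "VPN connection guide"))  # → False (duplicate content)
```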
## 📂 File Structure
| File Name | Description |
| :--- | :--- |
| `protocol.py` | **[Foundation]** Defines an enhanced Slot structure (adding `priority`, `source_id`, `tokens`). |
| `smart_memory.py` | **[Core Algorithm]** Implements the smart sliding window and eviction strategy. |
| `rag_engine.py` | **[Simulation]** A simple vector retrieval library with ID tracing functionality. |
| `server.py` | **[Server]** A FastAPI service integrating Memory and RAG. |
| `client.py` | **[Demo]** Simulates a long conversation script that triggers memory overflow. |
## 🚀 Quick Start
This project requires no OpenAI API key: everything is simulated in pure Python, so it works out of the box.
### 1. Install Dependencies
```bash
pip install fastapi uvicorn pydantic requests
```
### 2. Start the Server
```bash
python server.py
```
*After starting, the server will listen on port 8000, waiting to process `chat` requests.*
### 3. Run the Demo Client
Open a new terminal window:
```bash
python client.py
```
## 📊 What to Watch
Please pay close attention to the console output while running `client.py`, especially when sending the message **"What are the reimbursement regulations?"**:
1. **Window Overflow**: The total number of Tokens exceeds the set threshold (150).
2. **Smart Decisions**:
* ❌ The system **deleted** the previous casual chat record ("The weather is nice today").
* ✅ The system **retained** the earlier RAG document ("VPN connection methods").
* ✅ The system **retained** the earliest System Prompt.
3. **Result**: Even after multiple rounds of conversation, the Agent can still answer questions about VPNs, proving that key information was not lost.
## 🧩 Core Code Snippets
### Slot Priority Definition (`protocol.py`)
```python
from enum import Enum

class SlotType(str, Enum):
    SYSTEM = "system"      # Weight: 10.0 (never deleted)
    RAG_DOC = "rag_doc"    # Weight: 7.0  (important knowledge)
    USER = "user"          # Weight: 9.0  (current intent)
    AGENT = "agent"        # Weight: 5.0  (historical responses)
    TOOL_LOG = "tool_log"  # Weight: 2.0  (prioritized for eviction)
```
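For reference, the enhanced Slot that `protocol.py` describes might look like the following. This is a sketch using a stdlib dataclass for illustration (the repo installs Pydantic and likely uses a `BaseModel` instead); the `DEFAULT_PRIORITY` table simply mirrors the weights commented above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class SlotType(str, Enum):
    SYSTEM = "system"
    RAG_DOC = "rag_doc"
    USER = "user"
    AGENT = "agent"
    TOOL_LOG = "tool_log"

# Default weight per slot type, mirroring the comments above
DEFAULT_PRIORITY = {
    SlotType.SYSTEM: 10.0,
    SlotType.USER: 9.0,
    SlotType.RAG_DOC: 7.0,
    SlotType.AGENT: 5.0,
    SlotType.TOOL_LOG: 2.0,
}

@dataclass
class Slot:
    type: SlotType
    content: str
    tokens: int
    source_id: Optional[str] = None  # traces a RAG document back to its source
    priority: float = 0.0

    def __post_init__(self):
        # Derive the priority from the slot type unless explicitly set
        if self.priority == 0.0:
            self.priority = DEFAULT_PRIORITY[self.type]

doc = Slot(SlotType.RAG_DOC, "VPN connection guide", tokens=60, source_id="doc-001")
print(doc.priority)  # → 7.0
```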
### Eviction Algorithm Logic (`smart_memory.py`)
```python
def _optimize_context(self):
    # Evict the lowest-priority Slots until the window fits the token budget
    while sum(s.tokens for s in self.slots) > MAX_TOKEN_WINDOW:
        # Never consider the System Prompt for eviction
        candidates = [s for s in self.slots if s.type != SlotType.SYSTEM]
        if not candidates:
            break  # nothing left to evict
        victim = min(candidates, key=lambda s: s.priority)
        self.slots.remove(victim)
```
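Run in isolation, the same decision logic reproduces the demo's outcome. A standalone sketch with made-up token counts (each slot here is just a `(type, priority, tokens, label)` tuple, not the repo's Slot class):

```python
MAX_TOKEN_WINDOW = 150

slots = [
    ("system",  10.0, 40, "system prompt"),
    ("rag_doc",  7.0, 60, "VPN connection methods"),
    ("agent",    5.0, 30, "casual chat"),
    ("user",     9.0, 30, "reimbursement question"),
]

def optimize(slots, budget):
    # Evict the lowest-priority non-system slot until the budget is met
    while sum(s[2] for s in slots) > budget:
        candidates = [s for s in slots if s[0] != "system"]
        if not candidates:
            break
        slots.remove(min(candidates, key=lambda s: s[1]))
    return slots

optimize(slots, MAX_TOKEN_WINDOW)
print([s[3] for s in slots])
# → ['system prompt', 'VPN connection methods', 'reimbursement question']
```

With 160 tokens against a budget of 150, only the casual chat (priority 5.0) is evicted; the System Prompt and the RAG document survive, which is exactly the behavior described in "What to Watch".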
## 📝 License
MIT