# 🧠 A2A Protocol Demo: Smart Context Memory
> **High-Fidelity Context Management**
> Solving the core architectural problem behind LLMs' proverbial "7-second memory."
## 📖 Project Background
One of the biggest challenges in building enterprise-level Agents is the limitation of the **Context Window**.
The traditional First-In-First-Out (FIFO) strategy is naive: it often evicts the most important content, the **System Prompt** or **RAG-retrieved documents**, just to make room for the most recent "thank you."
This project demonstrates a **Smart Context Core**, which no longer simply stores conversations but manages Context like an operating system manages memory.
### ✨ Core Features
1. **📊 Priority-based Eviction**:
* The system **never** deletes the System Prompt when Token overflow occurs.
* It prioritizes deleting "chitchat responses" or "tool logs."
    * Algorithm: `victim = min(candidates, key=lambda s: s.priority)`
2. **🔗 Source Tracking**:
* Each knowledge Slot carries a `source_id`, allowing the LLM to know the origin of the information.
3. **🗑️ Semantic Deduplication**:
* Prevents the same RAG document from occupying memory space multiple times.
4. **📈 Real-time Snapshot**:
* Clients can monitor the server's memory status and Token usage in real-time.
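The features above rest on a richer Slot structure than a plain message string. A minimal sketch (field names here are illustrative, not copied from `protocol.py`):

```python
from dataclasses import dataclass

@dataclass
class Slot:
    """One unit of context, managed like a memory page."""
    content: str     # text that will be concatenated into the prompt
    priority: float  # survival weight: higher values survive eviction longer
    source_id: str   # origin of the information (doc id, tool name, ...)
    tokens: int      # pre-computed token cost of this slot

slot = Slot(content="VPN setup guide ...", priority=7.0,
            source_id="kb-doc-42", tokens=120)
print(slot.priority, slot.source_id)  # → 7.0 kb-doc-42
```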
## 📂 Project Structure
```text
a2a-protocol-demo/
├── protocol.py # [Protocol Layer] Defines Slot structure (adds priority, source_id)
├── smart_memory.py # [Core] Memory eviction and window maintenance algorithms
├── rag_engine.py # [Simulation] Knowledge base retriever
├── server.py # [Server] A2A Agent interface
├── client.py # [Client] Simulates conversation flow and test scripts
└── requirements.txt # Dependency list
```
## 🛠️ Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Server
```bash
python server.py
```
*The server will start at `http://localhost:8000`, initially containing a high-priority System Slot.*
### 3. Run the Test Client
```bash
python client.py
```
## 🔬 Experimental Observation Guide
After running `client.py`, watch the console output to see the smart memory in action:
| Step | Action | Expected Memory Behavior |
| :--- | :--- | :--- |
| **1. Ask about VPN** | Trigger RAG | **Stored**. Contains System(P:10), User(P:9), RAG(P:7), Agent(P:5). |
| **2. Chitchat** | Normal conversation | **Stored**. Memory gradually fills up. |
| **3. Ask about reimbursement** | **RAG returns long text** | **💥 Trigger overflow!**<br>The system will automatically calculate and prioritize **deleting** the Agent response from step 2 (P:5), **keeping** the RAG document from step 1 (P:7). |
| **4. Ask about VPN again** | Repeat question | **🚫 Rejected**. The deduplication check finds the content already present and skips insertion, saving window space. |
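Step 4's rejection can be implemented with a fingerprint check before insertion. A hedged sketch: this uses an exact, normalized hash, whereas the actual `smart_memory.py` may use embedding-based similarity for true *semantic* deduplication.

```python
import hashlib

class DedupIndex:
    """Tracks content fingerprints so repeated documents are stored once."""
    def __init__(self):
        self._seen = set()

    def admit(self, content: str) -> bool:
        # Normalize lightly, then fingerprint; duplicates are refused
        digest = hashlib.sha256(content.strip().lower().encode()).hexdigest()
        if digest in self._seen:
            return False  # duplicate: refuse entry, window space is saved
        self._seen.add(digest)
        return True

idx = DedupIndex()
print(idx.admit("How do I set up the VPN?"))  # → True  (first time: stored)
print(idx.admit("How do I set up the VPN?"))  # → False (duplicate: rejected)
```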
## 🧩 Core Code Analysis
### Slot Priority Definition (`protocol.py`)
We assign different "survival weights" to different types of information:
```python
from enum import Enum

class SlotType(str, Enum):
    SYSTEM = "system"      # 👑 P:10.0 (never deleted)
    USER = "user"          # 🥇 P:9.0 (user intent is important)
    RAG_DOC = "rag_doc"    # 🥈 P:7.0 (knowledge base grounds the answers)
    AGENT = "agent"        # 🥉 P:5.0 (older historical responses matter least)
    TOOL_LOG = "tool_log"  # 💀 P:2.0 (debug logs, deletable anytime)
```
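One convenient way to wire these weights in is a lookup table keyed by slot type, so new slots get a default priority automatically. A sketch: the `priority_for` helper and its fallback value are assumptions, not taken from `protocol.py`.

```python
# Default survival weights per slot type (values mirror the enum comments;
# the actual mapping in protocol.py may differ)
DEFAULT_PRIORITY = {
    "system": 10.0,
    "user": 9.0,
    "rag_doc": 7.0,
    "agent": 5.0,
    "tool_log": 2.0,
}

def priority_for(slot_type: str) -> float:
    # Unknown types get the lowest weight, making them easiest to evict
    return DEFAULT_PRIORITY.get(slot_type, 1.0)

print(priority_for("rag_doc"))  # → 7.0
```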
### Eviction Logic (`smart_memory.py`)
```python
def _optimize_context(self):
    """Evict the lowest-priority slots until the window fits the budget."""
    while sum(s.tokens for s in self.slots) > MAX_WINDOW:
        # 1. Protect the System Slot: it is never an eviction candidate
        candidates = [s for s in self.slots if s.type != SlotType.SYSTEM]
        if not candidates:
            break  # only the protected System Slot remains
        # 2. Find the least valuable Slot
        victim = min(candidates, key=lambda s: s.priority)
        # 3. Remove it, then re-check the token count
        self.slots.remove(victim)
```
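The eviction loop can be exercised end-to-end with a toy memory. A self-contained sketch: token counting is simplified to word count and `MAX_WINDOW` is a toy budget, both assumptions rather than the project's real accounting.

```python
from dataclasses import dataclass

MAX_WINDOW = 12  # toy token budget (assumption)

@dataclass
class Slot:
    type: str
    priority: float
    content: str

    @property
    def tokens(self) -> int:
        # Crude stand-in for a real tokenizer: one token per word
        return len(self.content.split())

class SmartMemory:
    def __init__(self):
        self.slots = []

    def add(self, slot):
        self.slots.append(slot)
        self._optimize_context()

    def _optimize_context(self):
        while sum(s.tokens for s in self.slots) > MAX_WINDOW:
            candidates = [s for s in self.slots if s.type != "system"]
            if not candidates:
                break  # only the protected System Slot is left
            victim = min(candidates, key=lambda s: s.priority)
            self.slots.remove(victim)

mem = SmartMemory()
mem.add(Slot("system", 10.0, "You are a helpful IT agent"))
mem.add(Slot("rag_doc", 7.0, "VPN setup requires the corporate client"))
mem.add(Slot("agent", 5.0, "Sure happy to help with that"))
# Overflow on the third add: the P:5 agent reply is evicted first
print([s.type for s in mem.slots])  # → ['system', 'rag_doc']
```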
## 📝 Learning Notes
* **The Nature of AI Memory**: LLMs are stateless; what we call "memory" is just text (the prompt) that we re-assemble and send with every request.
* **Engineering Value**: If you are working on enterprise-level RAG, **Memory Management** is a more fundamental core competency than Prompt Engineering.
* **A2A Protocol**: By standardizing the `Slot` object, we can transfer this advanced memory with "weights" and "sources" between different Agents, rather than just passing strings.
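Transferring a weighted slot between agents then just means serializing the extra fields alongside the text. A minimal JSON sketch (field names illustrative, not the protocol's actual wire format):

```python
import json

slot = {
    "type": "rag_doc",
    "priority": 7.0,
    "source_id": "kb-doc-42",
    "content": "VPN setup requires the corporate client.",
}

wire = json.dumps(slot)      # what travels over the A2A channel
received = json.loads(wire)  # the peer agent reconstructs the slot
print(received["source_id"])  # → kb-doc-42
```

Because weights and sources survive the round trip, the receiving agent can slot the knowledge straight into its own priority-managed window instead of treating it as an opaque string.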
## License
MIT