# 🧠 a2a-protocol-demo4: Smart Context Compression

A reference implementation demonstrating an **LLM Agent context-compression algorithm**.
Built on **MCP (Model Context Protocol)** and the **A2A communication protocol**, the project constructs an intelligent **Context Funnel**, showing how algorithmic strategies can sustain an effectively unbounded dialogue within a limited token window.
> **Core Value**: The code illustrates the complete data flow process from "full memory" to "streamlined Prompt," making it an excellent example for learning Agent memory management.
## 🌟 Key Features
This project implements a three-level compression pipeline:
1. **🧩 Semantic Deduplication**
* Utilizes (simulated) vector Embedding to calculate cosine similarity.
* **Effect**: Automatically merges repeated semantic expressions from users (e.g., sending "Hello" multiple times), reducing redundancy.
2. **⚖️ Priority Filtering**
* Based on a `SlotType` weight system.
* **Effect**: When Tokens are insufficient, low-value information (e.g., `TOOL_LOG`) is prioritized for removal, retaining core dialogue.
3. **📝 Abstractive Summarization**
* When physical space is still insufficient, calls (simulated) LLM to generate summaries.
* **Effect**: Compresses old historical records into a single `System Summary`, achieving long-term memory compression.
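As a concrete sketch of stage 1, near-duplicate slots can be detected by comparing embedding vectors. The snippet below is a minimal illustration, not the project's actual code: `mock_embedding` hashes text into a deterministic centered vector (identical texts produce identical vectors), and `dedupe` keeps a slot only when its cosine similarity to every already-kept slot stays at or below the 0.99 threshold from the pipeline diagram:

```python
import hashlib
import math

def mock_embedding(text, dim=32):
    # Deterministic stand-in for a real embedding model: hash the text into
    # a fixed-size centered vector. Identical texts -> identical vectors.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [(b / 255.0) - 0.5 for b in digest[:dim]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dedupe(slots, threshold=0.99):
    """Keep a slot only if it is not near-identical to an already-kept one."""
    kept, vectors = [], []
    for text in slots:
        v = mock_embedding(text)
        if any(cosine(v, u) > threshold for u in vectors):
            continue  # near-duplicate: merged into the earlier slot
        kept.append(text)
        vectors.append(v)
    return kept
```

With a real embedding model, the same loop would also merge paraphrases, not just byte-identical repeats.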
## 🏗️ Architecture
### The Pipeline
```mermaid
graph TD
Raw[Massive Raw Historical Slots] --> |1. Input| Pipe(Smart Context Funnel)
subgraph "Compression Pipeline"
Pipe --> Step1{Semantic Similarity Detection}
Step1 --> |Similarity > 0.99| Merge[Merge Duplicate Slot]
Step1 --> |Not Similar| Next1
Merge --> Next1
Next1{Are Tokens Over Limit?} --> |Yes| Step2[Priority Filtering]
Step2 --> |Discard Tool Logs| Next2
Next1 --> |No| Output
Next2{Still Over Limit?} --> |Yes| Step3[LLM Summary Generation]
Step3 --> |Compress Old History| Summary[Generate Summary Slot]
Summary --> Output
Next2 --> |No| Output
end
Output[Final Prompt] --> LLM
```
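The branching in the diagram collapses into a short orchestration function. Below is a minimal sketch under simplifying assumptions (tokens counted as whitespace-separated words, exact-match dedup instead of semantic dedup, tool logs tagged with a `[TOOL_LOG]` prefix); the real `engine.py` will differ:

```python
def count_tokens(slots):
    # Crude token estimate: whitespace-separated words.
    return sum(len(s.split()) for s in slots)

def funnel(slots, limit=100):
    # Stage 1: drop exact duplicates (stand-in for semantic dedup).
    seen = set()
    slots = [s for s in slots if not (s in seen or seen.add(s))]
    if count_tokens(slots) <= limit:
        return slots
    # Stage 2: priority filtering, discard tool logs first.
    slots = [s for s in slots if not s.startswith("[TOOL_LOG]")]
    if count_tokens(slots) <= limit:
        return slots
    # Stage 3: compress all but the newest slot into one summary slot.
    summary = "[SYSTEM] Summary of %d earlier slots" % (len(slots) - 1)
    return [summary, slots[-1]]
```

Each stage only runs if the previous one failed to bring the prompt under budget, mirroring the "Over Limit?" decision nodes above.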
## 📂 Project Structure
```text
a2a-protocol-demo4/
├── algorithms.py # Core algorithm library (Embedding simulation, deduplication, summarization logic)
├── engine.py # Pipeline manager (responsible for assembling algorithm processes)
├── protocol.py # Data model (Slot definitions, weight enumeration)
├── server.py # Agent server (A2A protocol implementation)
├── client.py # Test client (simulates specific scenarios to trigger algorithms)
└── requirements.txt # Dependency list
```
## 🚀 Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Start the Server
The server will run at `http://localhost:8000` and activate the smart compression funnel (limited to 100 tokens for demonstration).
```bash
python server.py
```
### 3. Run the Test Client
The client will automatically execute three test cases to validate the **deduplication**, **filtering**, and **summarization** functionalities.
```bash
python client.py
```
## 🧪 Experimental Observation Guide
After running the client, please pay close attention to the **Debug Info** output in the console:
* **Test Scenario 1 (Semantic Deduplication)**:
* Send two "Hello Server!" messages.
* Expected: Only one appears in the Prompt, triggering `✂️ [Algorithm] Semantic Deduplication`.
* **Test Scenario 2 (Priority Filtering)**:
* Send a large number of `TOOL_LOG` messages.
* Expected: Although written to the database, no Logs appear in the Prompt, triggering `⚠️ Discard Tool Logs`.
* **Test Scenario 3 (Summary Compression)**:
* Send an excessively long text.
* Expected: Old conversations disappear, and `[SYSTEM]: Summary of...` appears, triggering `🤖 [Algorithm] LLM Summary Generation`.
## 📚 Knowledge Notes
### 1. Why Use Embedding?
In `algorithms.py`, we simulate Embedding. In a real production environment, you would use OpenAI's `text-embedding-3-small` or HuggingFace's `SentenceTransformer`. Vectorization allows computers to understand that "Hello" and "您好" are similar, which is the basis for **semantic deduplication**.
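A sketch of that swap: keep a deterministic hash mock for offline runs, and replace it with a real model in production. The `SentenceTransformer` lines are illustrative and commented out so the snippet stays dependency-free:

```python
import hashlib

def get_mock_embedding(text, dim=32):
    """Deterministic stand-in: identical texts map to identical vectors,
    but unlike a real model it knows nothing about meaning."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [(b / 255.0) - 0.5 for b in digest[:dim]]

# Production swap (illustrative, not part of this repo):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#
#   def get_embedding(text):
#       return model.encode(text).tolist()
#
# A trained model places "Hello" and "您好" near each other in vector space;
# the hash mock cannot, so deduplication in this demo only fires on
# (near-)identical strings.
```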
### 2. Weight Design of SlotType
In `protocol.py`, we define the enumeration:
```python
from enum import Enum

class SlotType(str, Enum):
    SYSTEM = "system"          # Must be retained
    USER_INTENT = "user"       # Retained whenever possible
    TOOL_LOG = "tool_log"      # Can be discarded at any time
```
This **hierarchical discard strategy** is smarter than a simple "first in, first out (FIFO)" approach, preventing important information from being pushed out of context by irrelevant logs.
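One way to sketch that strategy (a self-contained toy that redefines `SlotType` locally; the weight table and word-based token count are assumptions, not the project's actual values):

```python
from enum import Enum

class SlotType(str, Enum):  # redefined locally so the sketch is self-contained
    SYSTEM = "system"
    USER_INTENT = "user"
    TOOL_LOG = "tool_log"

# Lower weight means discarded earlier when the budget is exceeded.
WEIGHTS = {SlotType.TOOL_LOG: 0, SlotType.USER_INTENT: 1, SlotType.SYSTEM: 2}

def filter_by_priority(slots, limit):
    """Drop the lowest-weight (and oldest) slots until within the budget.

    `slots` is a list of (SlotType, text) pairs; tokens counted as words.
    """
    total = sum(len(text.split()) for _, text in slots)
    # Removal candidates: least valuable first, then oldest first.
    order = sorted(range(len(slots)), key=lambda i: (WEIGHTS[slots[i][0]], i))
    doomed = set()
    for i in order:
        if total <= limit:
            break
        if slots[i][0] is SlotType.SYSTEM:
            continue  # system slots are never discarded
        doomed.add(i)
        total -= len(slots[i][1].split())
    return [slot for j, slot in enumerate(slots) if j not in doomed]
```

Under plain FIFO, the oldest slot (often the system prompt or the user's original intent) would be evicted first; here tool logs go first regardless of age.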
### 3. Simulation vs Real
To keep this project usable out of the box (no API key required), both `get_mock_embedding` and `llm_compress_to_summary` are simulated implementations. For production use, simply replace these two functions with calls to a real embedding/LLM API.
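One clean way to structure that swap is dependency injection: build the pipeline around pluggable embedding and summarization callables, so mocks and real API clients are interchangeable. A minimal sketch with hypothetical signatures, not the repo's actual ones:

```python
def make_funnel(embed_fn, summarize_fn):
    """Build the pipeline around pluggable backends.

    embed_fn(text) -> vector, summarize_fn(list_of_texts) -> str.
    Pass the repo's mocks for offline runs, or thin wrappers around a
    real embedding/LLM API in production.
    """
    def funnel(slots, limit):
        # Dedupe slots whose embeddings exactly match an earlier one.
        seen, kept = [], []
        for text in slots:
            vector = embed_fn(text)
            if vector in seen:
                continue
            seen.append(vector)
            kept.append(text)
        if sum(len(t.split()) for t in kept) <= limit:
            return kept
        # Over budget: compress everything but the newest slot.
        return [summarize_fn(kept[:-1]), kept[-1]]
    return funnel

# Offline wiring with inline stand-ins for the repo's mocks:
offline = make_funnel(
    embed_fn=lambda t: [float(b) for b in t.encode("utf-8")],
    summarize_fn=lambda ts: "[SYSTEM] Summary of %d slots" % len(ts),
)
```

Swapping in production backends then touches only the two arguments, never the pipeline logic itself.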
## License
MIT