# mock-llm
Simple OpenAI compatible Mock API server. Useful for deterministic testing of AI applications.
## Introduction
Creating integration tests for AI applications that rely on LLMs can be challenging due to costs, the complexity of response structures and the non-deterministic nature of LLMs. Mock LLM runs as a simple 'echo' server that responds to a user message.
The server can be configured to provide different responses based on the input, which can be useful for testing error scenarios, different payloads, etc. It is currently designed to mock the [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions) but could be extended in the future to mock the list models API, the Responses API, A2A APIs, and so on.
<!-- vim-markdown-toc GFM -->
- [Quickstart](#quickstart)
- [Configuration](#configuration)
    - [Customising Responses](#customising-responses)
    - [Loading Configuration Files](#loading-configuration-files)
    - [Updating Configuration](#updating-configuration)
    - [Health & Readiness Checks](#health--readiness-checks)
    - [Template Variables](#template-variables)
    - [Sequential Responses](#sequential-responses)
    - [Streaming Configuration](#streaming-configuration)
- [MCP (Model Context Protocol) Mocking](#mcp-model-context-protocol-mocking)
- [A2A (Agent to Agent Protocol) Mocking](#a2a-agent-to-agent-protocol-mocking)
- [Deploying to Kubernetes with Helm](#deploying-to-kubernetes-with-helm)
- [Examples](#examples)
- [Developer Guide](#developer-guide)
- [Samples](#samples)
- [Contributors](#contributors)
<!-- vim-markdown-toc -->
## Quickstart
Install and run:
```bash
npm install -g @dwmkerr/mock-llm
mock-llm
```
Mock-LLM runs on port 6556 (the dial-pad code for `MLLM`, chosen to avoid conflicts with
common ports).
Or use Docker:
```bash
docker run -p 6556:6556 ghcr.io/dwmkerr/mock-llm
```
Or [use Helm](#deploying-to-kubernetes-with-helm) for Kubernetes deployments.
Test with curl. The default rule for incoming requests is to reply with the user's exact message:
```bash
curl -X POST http://localhost:6556/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Response:
```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello"
    },
    "finish_reason": "stop"
  }]
}
```
Mock LLM also has basic support for the [A2A (Agent-to-Agent) protocol](docs/a2a.md) for testing agent messages, tasks, and asynchronous operations.
## Configuration
Responses are configured using a `yaml` file loaded from `mock-llm.yaml` in the current working directory. Rules are evaluated in order - last match wins.
The default configuration echoes the last user message:
```yaml
rules:
  # Default echo rule
  - path: "/v1/chat/completions"
    # The JMESPath expression '@' always matches.
    match: "@"
    response:
      status: 200
      content: |
        {
          "id": "chatcmpl-{{timestamp}}",
          "object": "chat.completion",
          "model": "{{jmes request body.model}}",
          "choices": [{
            "message": {
              "role": "assistant",
              "content": "{{jmes request body.messages[-1].content}}"
            },
            "finish_reason": "stop"
          }]
        }
```
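To make "last match wins" concrete, here is a minimal Python sketch of that evaluation order. It is not Mock LLM's implementation: the `select_rule` helper and the `name` field are hypothetical, and plain Python predicates stand in for JMESPath `match` expressions.

```python
from typing import Optional


def select_rule(rules: list, path: str, request: dict) -> Optional[dict]:
    """Return the LAST rule whose path and predicate both match."""
    selected = None
    for rule in rules:  # rules are visited in file order
        if rule["path"] == path and rule["match"](request):
            selected = rule  # a later match overrides an earlier one
    return selected


rules = [
    # Catch-all echo rule (stands in for match: "@").
    {"path": "/v1/chat/completions", "match": lambda req: True, "name": "echo"},
    # More specific rule, listed later so that it wins when it matches.
    {
        "path": "/v1/chat/completions",
        "match": lambda req: "hello" in req["body"]["messages"][-1]["content"],
        "name": "greeting",
    },
]

request = {"body": {"messages": [{"role": "user", "content": "hello there"}]}}
print(select_rule(rules, "/v1/chat/completions", request)["name"])  # greeting
```

Because later rules override earlier ones, a catch-all rule belongs at the top of the file, with more specific rules below it.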
### Customising Responses
[JMESPath](https://jmespath.org/) is a query language for JSON used to match incoming requests and extract values for responses.
The configuration below returns a fixed message when the input contains `hello`, simulates a `401` error when it contains `error-401`, and mocks the `/v1/models` endpoint:
```yaml
rules:
  # Fixed message when input contains 'hello':
  - path: "/v1/chat/completions"
    match: "contains(body.messages[-1].content, 'hello')"
    response:
      status: 200
      content: |
        {
          "choices": [{
            "message": {
              "role": "assistant",
              "content": "Hi there! How can I help you today?"
            },
            "finish_reason": "stop"
          }]
        }
  # Realistic OpenAI 401 if the input contains `error-401`:
  - path: "/v1/chat/completions"
    match: "contains(body.messages[-1].content, 'error-401')"
    response:
      status: 401
      content: |
        {
          "error": {
            "message": "Incorrect API key provided.",
            "type": "invalid_request_error",
            "param": null,
            "code": "invalid_api_key"
          }
        }
  # List models endpoint
  - path: "/v1/models"
    # The JMESPath expression '@' always matches.
    match: "@"
    response:
      status: 200
      # Return a set of models.
      content: |
        {
          "data": [
            {"id": "gpt-4", "object": "model"},
            {"id": "gpt-3.5-turbo", "object": "model"}
          ]
        }
```
### Loading Configuration Files
The `--config` parameter can be used for a non-default location:
```bash
# Use the '--config' parameter directly...
mock-llm --config /tmp/myconfig.yaml
# ...mount a config file from the working directory for mock-llm in docker.
docker run -v $(pwd)/mock-llm.yaml:/app/mock-llm.yaml -p 6556:6556 ghcr.io/dwmkerr/mock-llm
```
### Updating Configuration
Configuration can be updated at runtime via the `/config` endpoint: `GET` returns current config (JSON by default, YAML with `Accept: application/x-yaml`), `POST` replaces it, `PATCH` merges updates, `DELETE` resets to default. Both `POST` and `PATCH` accept JSON (`Content-Type: application/json`) or YAML (`Content-Type: application/x-yaml`).
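As a rough illustration, the Python sketch below builds requests for the `/config` endpoint using only the standard library. The `config_request` helper is hypothetical, and the `PATCH` payload assumes the endpoint accepts the same keys as `mock-llm.yaml`; check this against your server before relying on it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:6556"  # default port from the Quickstart


def config_request(method: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a JSON request for the /config endpoint."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/config", data=data, headers=headers, method=method
    )


# Assumed payload shape: the same keys as mock-llm.yaml.
patch = config_request("PATCH", {"streaming": {"chunkSize": 10}})
print(patch.get_method(), patch.full_url)  # PATCH http://localhost:6556/config

# With a server running, a request can be sent with, for example:
#   urllib.request.urlopen(config_request("DELETE")).read()
```

Resetting with `DELETE` between tests is one way to keep configuration changes from leaking from one test case into the next.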
### Health & Readiness Checks
```bash
curl http://localhost:6556/health
# {"status":"healthy"}
curl http://localhost:6556/ready
# {"status":"ready"}
```
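In a test harness, the readiness endpoint can be polled before the first request is sent. The following is a sketch only: `wait_until_ready` and `http_status` are hypothetical helpers, and it assumes `/ready` answers with HTTP 200 once the server is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def wait_until_ready(
    url: str = "http://localhost:6556/ready",
    attempts: int = 30,
    delay_seconds: float = 0.5,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        time.sleep(delay_seconds)
    return False
```

The `fetch` parameter exists so the polling loop can be exercised with a stub, without a running server.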
### Template Variables
Available in response content templates:
- `{{jmes request <query>}}` - Query the request object using [JMESPath](https://jmespath.org/):
- `request.body` - Request body (e.g., `body.model`, `body.messages[-1].content`)
- `request.headers` - HTTP headers, lowercase (e.g., `headers.authorization`)
- `request.method` - HTTP method (e.g., `POST`)
- `request.path` - Request path (e.g., `/v1/chat/completions`)
- `request.query` - Query parameters (e.g., `query.apikey`)
- `{{timestamp}}` - Current time in milliseconds
Objects and arrays are automatically JSON-stringified. Primitives are returned as-is.
```yaml
"model": "{{jmes request body.model}}" // "gpt-4"
"message": {{jmes request body.messages[0]}} // {"role":"system","content":"..."}
"auth": "{{jmes request headers.authorization}}" // "Bearer sk-..."
"apikey": "{{jmes request query.apikey}}" // "test-123"
```
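The substitution rules above can be approximated in a few lines of Python. This is an illustrative sketch, not Mock LLM's template engine: `render` and `lookup` are hypothetical, and `lookup` handles only dotted keys and `[index]` lookups rather than full JMESPath.

```python
import json
import re
import time


def lookup(data, query: str):
    """Resolve a tiny subset of JMESPath: dotted keys and [index] lookups."""
    for key, index in re.findall(r"([^.\[\]]+)|\[(-?\d+)\]", query):
        data = data[key] if key else data[int(index)]
    return data


def render(template: str, request: dict) -> str:
    """Substitute {{timestamp}} and {{jmes request <query>}} placeholders."""

    def replace(match: re.Match) -> str:
        expression = match.group(1).strip()
        if expression == "timestamp":
            return str(int(time.time() * 1000))  # current time in milliseconds
        query = expression.removeprefix("jmes request").strip()
        value = lookup(request, query)
        # Strings pass through as-is; objects, arrays, and other values
        # are JSON-encoded.
        return value if isinstance(value, str) else json.dumps(value)

    return re.sub(r"\{\{(.*?)\}\}", replace, template)


request = {
    "body": {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    "headers": {"authorization": "Bearer sk-test"},
}
print(render('"model": "{{jmes request body.model}}"', request))
# "model": "gpt-4"
print(render("{{jmes request body.messages[0]}}", request))
# {"role": "user", "content": "Hello"}
```

Note that string values are inserted without surrounding quotes, which is why the templates above wrap string placeholders in quotes and leave object placeholders bare.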
### Sequential Responses
For testing multi-turn interactions like tool calling, use `sequence` to return different responses based on request order:
```yaml
rules:
  # First request: trigger tool call
  - path: "/v1/chat/completions"
    sequence: 0
    response:
      status: 200
      content: '{"choices":[{"message":{"tool_calls":[{"function":{"name":"get_weather"}}]},"finish_reason":"tool_calls"}]}'
  # Second request: return final answer
  - path: "/v1/chat/completions"
    sequence: 1
    response:
      status: 200
      content: '{"choices":[{"message":{"content":"The weather is 72°F"},"finish_reason":"stop"}]}'
```
Rules can use `match`, `sequence`, both, or neither:
| `match` | `sequence` | Behavior |
|---------|------------|----------|
| No | No | Matches all requests (catch-all) |
| Yes | No | Content-based matching only |
| No | Yes | Order-based matching only |
| Yes | Yes | Must satisfy both conditions |
Sequence counters are tracked per path and reset via `DELETE /config`.
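The table can be read as "an absent condition always passes". The Python sketch below models that reading; it is not the server's code. The helpers are hypothetical, predicates stand in for JMESPath expressions, and it assumes the per-path counter is zero-based and advances once per request.

```python
from collections import defaultdict

counters = defaultdict(int)  # assumed: one zero-based request counter per path


def rule_applies(rule: dict, request: dict, position: int) -> bool:
    """Apply the match/sequence table: an absent condition always passes."""
    match = rule.get("match")
    sequence = rule.get("sequence")
    content_ok = match is None or bool(match(request))
    order_ok = sequence is None or sequence == position
    return content_ok and order_ok


def pick_response(rules: list, path: str, request: dict):
    """Return the response of the last applicable rule, then advance the counter."""
    position = counters[path]
    counters[path] += 1
    chosen = None
    for rule in rules:
        if rule["path"] == path and rule_applies(rule, request, position):
            chosen = rule["response"]
    return chosen


rules = [
    {"path": "/v1/chat/completions", "sequence": 0, "response": "tool_call"},
    {"path": "/v1/chat/completions", "sequence": 1, "response": "final_answer"},
]
print(pick_response(rules, "/v1/chat/completions", {}))  # tool_call
print(pick_response(rules, "/v1/chat/completions", {}))  # final_answer
counters.clear()  # stands in for the reset performed by DELETE /config
```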
### Streaming Configuration
Mock-LLM supports streaming responses when clients send `stream: true` in their requests. Streaming behavior is configured globally:
```yaml
streaming:
  chunkSize: 50         # characters per chunk (default: 50)
  chunkIntervalMs: 50   # milliseconds between chunks (default: 50)
rules:
  - path: "/v1/chat/completions"
    match: "@"
    # etc...
```
When clients request streaming, Mock-LLM returns Server-Sent Events (SSE) with `Content-Type: text/event-stream`:
```javascript
const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true // Enables streaming
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
This enables deterministic testing of streaming protocol responses. Error conditions can also be tested - error responses are sent as per the [OpenAI Streaming Specification](https://platform.openai.com/docs/api-reference/chat-streaming/streaming).
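For tests that read the raw event stream rather than going through an SDK, the chunks can be reassembled by hand. The sketch below assumes the stream follows the OpenAI chunk format (`data:` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`); the `collect_stream_content` helper is hypothetical, and the sample lines are hand-written rather than captured from the server.

```python
import json
from typing import Iterable


def collect_stream_content(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]
        parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


# Hand-written sample stream, not real server output.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_content(sample))  # Hello
```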
## MCP (Model Context Protocol) Mocking
Mock-LLM exposes MCP servers and tools that support testing the MCP protocol; details are in the [MCP Documentation](docs/mcp.md).
## A2A (Agent to Agent Protocol) Mocking
Mock-LLM exposes A2A servers and tools that support testing the A2A protocol; details are in the [A2A Documentation](docs/a2a.md).
## Deploying to Kubernetes with Helm
```bash
# Install from OCI registry
helm install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm --version 0.1.8
# Install with Ark resources enabled
# Requires Ark to be installed: https://github.com/mckinsey/agents-at-scale-ark
helm install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm --version 0.1.8 \
--set ark.model.enabled=true \
--set ark.a2a.enabled=true \
--set ark.mcp.enabled=true
# Verify deployment
kubectl get deployment mock-llm
kubectl get service mock-llm
# Port forward and test
kubectl port-forward svc/mock-llm 6556:6556 &
curl -X POST http://localhost:6556/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```
Custom configuration via values.yaml:
```yaml
# Optional additional mock-llm configuration.
config:
  rules:
    - path: "/v1/chat/completions"
      match: "contains(body.messages[-1].content, 'hello')"
      response:
        status: 200
        content: |
          {
            "choices": [{
              "message": {
                "role": "assistant",
                "content": "Hi there!"
              },
              "finish_reason": "stop"
            }]
          }
# Or use existing ConfigMap (must contain key 'mock-llm.yaml')
# existingConfigMap: "my-custom-config"
```
See the [full Helm documentation](docs/helm.md) for advanced configuration, Ark integration, and more.
## Examples
Any OpenAI-compatible SDK can be used with Mock LLM. For Node.js:
```javascript
const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: 'mock-key',
  baseURL: 'http://localhost:6556/v1'
});

// CommonJS modules do not support top-level await, so wrap the call.
async function main() {
  const response = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello' }]
  });
  console.log(response.choices[0].message.content);
  // "Hello"
}

main();
```
And for Python:
```python
from openai import OpenAI

client = OpenAI(
    api_key='mock-key',
    base_url='http://localhost:6556/v1'
)
response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
print(response.choices[0].message.content)
# "Hello"
```
## Developer Guide
Install dependencies and start with live-reload:
```bash
npm install
npm run dev
```
Lint or run tests:
```bash
npm run lint
npm run test
```
Test and inspect the MCP Server running locally:
```bash
npm run local:inspect
```
## Samples
Each sample below is in the form of an extremely minimal script that shows:
1. How to configure mock-llm for a specific scenario
2. How to run the scenario
3. How to validate the results
These can be a reference for your own tests. Each sample is also run as part of the project's build pipeline.
| Sample | Description |
|--------|-------------|
| [01-echo-message.sh](samples/01-echo-message.sh) | Assert a response from an LLM. |
| [02-error-401.sh](samples/02-error-401.sh) | Verify error handling scenario. |
| [03-system-message-in-conversation.sh](samples/03-system-message-in-conversation.sh) | Test system message handling in conversations. |
| [04-headers-validation.sh](samples/04-headers-validation.sh) | Test custom HTTP header validation. |
| [05-a2a-countdown-agent.sh](samples/05-a2a-countdown-agent.sh) | Test A2A blocking task operations. |
| [06-a2a-echo-agent.sh](samples/06-a2a-echo-agent.sh) | Test A2A message handling. |
| [07-a2a-message-context.sh](samples/07-a2a-message-context.sh) | Test A2A message context and history. |
| [08-mcp-echo-tool.sh](samples/08-mcp-echo-tool.sh) | Test MCP tool invocation. |
| [09-token-usage.sh](samples/09-token-usage.sh) | Test token usage tracking. |
| [10-mcp-inspect-headers.sh](samples/10-mcp-inspect-headers.sh) | Test MCP header inspection. |
| [11-sequential-tool-calling.sh](samples/11-sequential-tool-calling.sh) | Test sequential responses for tool-calling flows. |
Each sample below is a link to a real-world deterministic integration test in [Ark](https://github.com/mckinsey/agents-at-scale-ark) that uses `mock-llm` features. These tests can be used as a reference for your own tests.
| Test | Description |
|------|-------------|
| [agent-default-model](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/agent-default-model) | Basic LLM query and response. |
| [model-custom-headers](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/model-custom-headers) | Passing custom headers to models. |
| [query-parameter-ref](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/query-parameter-ref) | Dynamic prompt resolution from ConfigMaps and Secrets. |
| [query-token-usage](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/query-token-usage) | Token usage tracking and reporting. |
| [a2a-agent-discovery](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/a2a-agent-discovery) | A2A agent discovery and server readiness. |
| [a2a-message-query](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/a2a-message-query) | A2A message handling. |
| [a2a-blocking-task-completed](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/a2a-blocking-task-completed) | A2A blocking task successful completion. |
| [a2a-blocking-task-failed](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/a2a-blocking-task-failed) | A2A blocking task error handling. |
| [mcp-discovery](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/mcp-discovery) | MCP server and tool discovery. |
| [mcp-header-propagation (PR #311)](https://github.com/mckinsey/agents-at-scale-ark/pull/311) | MCP header propagation from Agents and Queries. |
| [agent-tools](https://github.com/mckinsey/agents-at-scale-ark/tree/main/tests/agent-tools) | Sequential responses for tool-calling agents. |
## Contributors
Thanks to ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
<tbody>
<tr>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/dwmkerr"><img src="https://avatars.githubusercontent.com/u/1926984?v=4?s=100" width="100px;" alt="Dave Kerr"/><br /><sub><b>Dave Kerr</b></sub></a><br /><a href="https://github.com/dwmkerr/mock-llm/commits?author=dwmkerr" title="Code">💻</a> <a href="https://github.com/dwmkerr/mock-llm/commits?author=dwmkerr" title="Documentation">📖</a> <a href="#infra-dwmkerr" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a> <a href="#maintenance-dwmkerr" title="Maintenance">🚧</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/lucaromagnoli"><img src="https://avatars.githubusercontent.com/u/38782977?v=4?s=100" width="100px;" alt="Luca Romagnoli"/><br /><sub><b>Luca Romagnoli</b></sub></a><br /><a href="https://github.com/dwmkerr/mock-llm/commits?author=lucaromagnoli" title="Code">💻</a></td>
<td align="center" valign="top" width="14.28%"><a href="https://github.com/daniele-marostica"><img src="https://avatars.githubusercontent.com/u/238710818?v=4?s=100" width="100px;" alt="Daniele"/><br /><sub><b>Daniele</b></sub></a><br /><a href="https://github.com/dwmkerr/mock-llm/commits?author=daniele-marostica" title="Code">💻</a></td>
</tr>
</tbody>
</table>
<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- ALL-CONTRIBUTORS-LIST:END -->
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome.