Providers
OpenKoi is model-agnostic. It supports multiple LLM providers out of the box and can assign different models to different roles within the iteration engine. Providers are auto-discovered from environment variables and local services -- no manual configuration required for the common case.
Provider Trait
Every provider implements the same Rust trait, making them interchangeable:
```rust
#[async_trait]
pub trait ModelProvider: Send + Sync {
    fn id(&self) -> &str;
    fn name(&self) -> &str;
    fn models(&self) -> &[ModelInfo];

    async fn chat(
        &self,
        request: ChatRequest,
    ) -> Result<ChatResponse, ProviderError>;

    async fn chat_stream(
        &self,
        request: ChatRequest,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<ChatChunk>>>>, ProviderError>;

    async fn embed(
        &self,
        texts: &[&str],
    ) -> Result<Vec<Vec<f32>>, ProviderError>;
}
```

The chat method is used for non-streaming calls (evaluation, planning). The chat_stream method is used for execution output that streams to the terminal. The embed method generates vector embeddings for semantic memory search.
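To make the split concrete, here is a rough sketch of how the engine might drive both paths for one task. It is illustrative only: it assumes ChatRequest implements Clone and uses a hypothetical print_token helper in place of OpenKoi's actual terminal renderer.

```rust
use futures::StreamExt;

// Sketch only: ChatRequest construction is elided and print_token is a
// stand-in for the terminal renderer, not a real OpenKoi function.
async fn run_one_step(
    provider: &dyn ModelProvider,
    request: ChatRequest,
) -> Result<(), ProviderError> {
    // Execution output streams to the terminal chunk by chunk...
    let mut stream = provider.chat_stream(request.clone()).await?;
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(token) => print_token(&token),
            Err(_) => break, // real code would surface stream errors to the engine
        }
    }
    // ...while evaluation and planning use the simpler non-streaming call.
    let _evaluation = provider.chat(request).await?;
    Ok(())
}
```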
Built-in Providers
Anthropic
| Property | Value |
|---|---|
| Provider ID | anthropic |
| API key env var | ANTHROPIC_API_KEY |
| Default model | claude-sonnet-4-5 |
| API | Anthropic Messages API |
| Streaming | Yes (SSE) |
| Embeddings | No (use OpenAI embedder) |
Anthropic is the highest-priority provider when auto-detecting. OpenKoi uses Anthropic's prompt caching feature to reduce costs on repeated system prompts across iterations within a session.
Prompt Caching
The system prompt (soul + task context + skill descriptions) is marked with cache_control: Ephemeral. This tells Anthropic to cache the system prompt across consecutive API calls, cutting the input-token cost of that cached portion by roughly 90% whenever a later call reads it from cache.
```rust
// Simplified from src/provider/anthropic.rs
SystemBlock {
    text: system_prompt,
    cache_control: Some(CacheControl::Ephemeral),
}
```

Prompt caching is automatic and requires no configuration. It only applies to the Anthropic provider.
Credential Sources
Anthropic keys can come from multiple sources (checked in order):
- ANTHROPIC_API_KEY environment variable
- Claude CLI credentials at ~/.claude/.credentials.json
- macOS Keychain entry Claude Code-credentials
- Saved credentials at ~/.openkoi/credentials/anthropic.key
OpenAI
| Property | Value |
|---|---|
| Provider ID | openai |
| API key env var | OPENAI_API_KEY |
| Default model | gpt-5.2 |
| API | OpenAI Chat Completions API |
| Streaming | Yes (SSE) |
| Embeddings | Yes (text-embedding-3-small) |
OpenAI is the second-priority provider. It also serves as the default embedder -- openai/text-embedding-3-small is used for vector embeddings unless overridden.
Credential Sources
- OPENAI_API_KEY environment variable
- OpenAI Codex CLI credentials
- Saved credentials at ~/.openkoi/credentials/openai.key
Google
| Property | Value |
|---|---|
| Provider ID | google |
| API key env var | GOOGLE_API_KEY |
| Default model | gemini-2.5-pro |
| API | Google Generative AI API |
| Streaming | Yes |
| Embeddings | Yes |
Google is the third-priority provider in auto-detection.
Ollama
| Property | Value |
|---|---|
| Provider ID | ollama |
| Connection | Local probe at localhost:11434 |
| Default model | Best available (see priority below) |
| API | Ollama REST API (OpenAI-compatible) |
| Streaming | Yes |
| Embeddings | Yes (model-dependent) |
Ollama is the zero-cost, zero-key path. OpenKoi probes localhost:11434 on startup. If Ollama is running, it queries the available models and selects the best one.
Model Priority
When multiple Ollama models are installed, OpenKoi selects one using the following priority order:
| Priority | Model | Notes |
|---|---|---|
| 1 | qwen2.5-coder | Best coding model available locally |
| 2 | codestral | Strong coding model from Mistral |
| 3 | deepseek-coder-v2 | Good code + general capability |
| 4 | llama3.3 | Strong general-purpose model |
| 5 | llama3.1 | Older but capable |
| 6 | mistral | Lightweight general-purpose |
| 7 | gemma2 | Google's open model |
| fallback | First available model | If none of the above are found |
If no models are installed, OpenKoi suggests running ollama pull llama3.3.
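A minimal sketch of that selection logic, given the model names reported by the local Ollama instance (the function name and prefix matching are assumptions for illustration, not OpenKoi's actual code):

```rust
/// Pick the preferred Ollama model from the locally installed ones.
/// Prefix matching lets tags like "qwen2.5-coder:14b" satisfy "qwen2.5-coder".
fn pick_ollama_model(installed: &[String]) -> Option<String> {
    const PRIORITY: &[&str] = &[
        "qwen2.5-coder",
        "codestral",
        "deepseek-coder-v2",
        "llama3.3",
        "llama3.1",
        "mistral",
        "gemma2",
    ];
    for &preferred in PRIORITY {
        if let Some(found) = installed.iter().find(|m| m.starts_with(preferred)) {
            return Some(found.clone());
        }
    }
    // None of the preferred models are installed: fall back to the first one.
    installed.first().cloned()
}
```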
AWS Bedrock
| Property | Value |
|---|---|
| Provider ID | bedrock |
| Auth | AWS credentials (IAM / SigV4) |
| API | AWS Bedrock Runtime API |
| Streaming | Yes |
| Embeddings | Yes (model-dependent) |
Required Environment Variables
| Variable | Required | Description |
|---|---|---|
| AWS_ACCESS_KEY_ID | Yes | AWS IAM access key |
| AWS_SECRET_ACCESS_KEY | Yes | AWS IAM secret key |
| AWS_SESSION_TOKEN | No | Temporary session token (for assumed roles) |
| AWS_REGION | No | AWS region (defaults to us-east-1) |
Bedrock uses SigV4 signing for all API requests. OpenKoi handles the signing internally using the standard AWS credential chain.
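Only the region default and a static-credential check are simple enough to show without the AWS SDK; the sketch below covers just that environment-variable fallback and is not the actual signing or credential-chain code.

```rust
use std::env;

/// Resolve the Bedrock region, defaulting to us-east-1 when AWS_REGION is unset.
fn bedrock_region() -> String {
    env::var("AWS_REGION").unwrap_or_else(|_| "us-east-1".to_string())
}

/// Cheap preflight check that static IAM credentials are present; the full
/// credential chain (profiles, assumed roles, etc.) is handled by SigV4 signing.
fn has_static_aws_credentials() -> bool {
    env::var("AWS_ACCESS_KEY_ID").is_ok() && env::var("AWS_SECRET_ACCESS_KEY").is_ok()
}
```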
Available Models
| Model | Bedrock Model ID |
|---|---|
| Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514 |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022 |
| Amazon Nova Pro | amazon.nova-pro-v1:0 |
| Llama 3.3 70B | meta.llama3-3-70b-instruct-v1:0 |
OpenAI-Compatible Providers
Any provider that implements the OpenAI Chat Completions API can be used. These are configured via environment variables or the config file.
| Provider | API Key Env Var | Default Model | Base URL |
|---|---|---|---|
| Groq | GROQ_API_KEY | llama-3.3-70b-versatile | https://api.groq.com/openai/v1 |
| OpenRouter | OPENROUTER_API_KEY | auto | https://openrouter.ai/api/v1 |
| Together | TOGETHER_API_KEY | meta-llama/Llama-3.3-70B-Instruct-Turbo | https://api.together.xyz/v1 |
| DeepSeek | DEEPSEEK_API_KEY | deepseek-chat | https://api.deepseek.com/v1 |
| xAI | XAI_API_KEY | grok-4-0709 | https://api.x.ai/v1 |
| Custom | User-defined | User-defined | User-defined |
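Because these providers share one wire format, each can be described by the same small record of ID, key variable, base URL, and default model. The struct below is an illustrative sketch, not OpenKoi's actual configuration type.

```rust
/// Illustrative description of an OpenAI-compatible endpoint.
struct OpenAiCompatible {
    id: &'static str,            // provider ID used in model references
    key_env: &'static str,       // environment variable holding the API key
    base_url: &'static str,      // Chat Completions base URL
    default_model: &'static str, // model used when none is specified
}

// Example entry matching the Groq row in the table above.
const GROQ: OpenAiCompatible = OpenAiCompatible {
    id: "groq",
    key_env: "GROQ_API_KEY",
    base_url: "https://api.groq.com/openai/v1",
    default_model: "llama-3.3-70b-versatile",
};
```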
Custom Endpoint Configuration
For a self-hosted or unlisted OpenAI-compatible endpoint, use the provider picker during openkoi init and select "Other (OpenAI-compatible URL)", or set it up directly in the config file.
Credential Discovery
On startup, OpenKoi scans for credentials in the following order. The first match wins.
| Priority | Source | Example |
|---|---|---|
| 1 | Environment variables | ANTHROPIC_API_KEY, OPENAI_API_KEY, etc. |
| 2 | Claude CLI credentials | ~/.claude/.credentials.json (OAuth token) |
| 3 | Claude CLI Keychain (macOS) | macOS Keychain entry Claude Code-credentials |
| 4 | OpenAI Codex CLI | Codex CLI auth credentials |
| 5 | Qwen CLI | ~/.qwen/oauth_creds.json |
| 6 | Saved OpenKoi credentials | ~/.openkoi/credentials/<provider>.key |
| 7 | Ollama probe | TCP connection to localhost:11434 |
This means if you already have Claude Code or Codex CLI installed and authenticated, OpenKoi will automatically use those credentials with zero setup.
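In code, the scan is a first-match-wins loop along these lines. The sketch is simplified: it only shows the environment variables and OpenKoi's own saved keys, and the Credential type and provider list are assumptions for illustration.

```rust
/// One discovered credential: the provider it belongs to plus the key or token.
struct Credential {
    provider: &'static str,
    secret: String,
}

fn discover_credentials() -> Option<Credential> {
    // 1. Environment variables win outright.
    for (var, provider) in [
        ("ANTHROPIC_API_KEY", "anthropic"),
        ("OPENAI_API_KEY", "openai"),
        ("GOOGLE_API_KEY", "google"),
    ] {
        if let Ok(secret) = std::env::var(var) {
            return Some(Credential { provider, secret });
        }
    }
    // 2. Keys previously saved by openkoi itself. The real scan also checks the
    //    Claude CLI, macOS Keychain, Codex CLI, and Qwen CLI sources before this.
    let home = dirs::home_dir()?;
    for provider in ["anthropic", "openai", "google"] {
        let path = home.join(".openkoi/credentials").join(format!("{provider}.key"));
        if let Ok(secret) = std::fs::read_to_string(&path) {
            return Some(Credential { provider, secret: secret.trim().to_string() });
        }
    }
    None // the caller then falls through to the Ollama probe
}
```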
Default Model Priority
When no model is specified (no --model flag, no OPENKOI_MODEL, no config file), OpenKoi picks the best available model:
| Priority | Provider | Model | Why |
|---|---|---|---|
| 1 | anthropic | claude-sonnet-4-5 | Best overall quality for coding and reasoning |
| 2 | openai | gpt-5.2 | Strong general-purpose model |
| 3 | google | gemini-2.5-pro | Competitive with large context window |
| 4 | ollama | Best local model | Free, no API key needed |
If no provider is found at all, OpenKoi launches an interactive provider picker that helps you set up Ollama (free) or paste an API key.
Role-Based Model Assignment
OpenKoi assigns models to four distinct roles in the iteration engine:
| Role | What it does | Recommended characteristics |
|---|---|---|
| Executor | Performs the task (writes code, generates text, analyzes data) | Fast, high-quality generation |
| Evaluator | Judges the executor's output against rubrics | Precise, critical reasoning |
| Planner | Creates the initial plan and refines it between iterations | Good at decomposition and strategy |
| Embedder | Generates vector embeddings for semantic memory search | Fast, inexpensive |
Default Behavior
If you specify a single model (via --model or OPENKOI_MODEL), it is used for executor, evaluator, and planner. The embedder always defaults to openai/text-embedding-3-small unless explicitly overridden.
```bash
# All three roles use claude-sonnet-4-5
openkoi --model anthropic/claude-sonnet-4-5 "Fix the bug"
```

Per-Role Assignment
For more control, assign different models to different roles:
```bash
# CLI flags
openkoi --executor anthropic/claude-sonnet-4-5 --evaluator anthropic/claude-opus-4-6 "Fix the bug"
```

```toml
# config.toml
[models]
executor = "anthropic/claude-sonnet-4-5"
evaluator = "anthropic/claude-opus-4-6"
planner = "anthropic/claude-sonnet-4-5"
embedder = "openai/text-embedding-3-small"
```

A common pattern is to use a fast, cheaper model for execution and a more capable model for evaluation:

```toml
[models]
executor = "openai/gpt-5.2"
evaluator = "anthropic/claude-opus-4-6"
```

Fallback Chain
When a provider returns a transient error (rate limit, server error, timeout), OpenKoi automatically falls back to the next model in the chain.
How It Works
- The primary model is tried first.
- On transient failure, the model enters a cooldown period and is temporarily skipped.
- The next model in the fallback chain is tried.
- If all models in the chain fail, the task returns an AllCandidatesExhausted error (see the sketch below).
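The loop below sketches that control flow. The cooldown bookkeeping, the is_transient helper, and the ChatRequest fields are simplified assumptions, not the engine's actual internals.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Try each candidate model in order, skipping any that is cooling down.
async fn chat_with_fallback(
    chain: &[(&dyn ModelProvider, String)],   // (provider, model ID) pairs
    cooldowns: &mut HashMap<String, Instant>, // model ID -> earliest retry time
    request: &ChatRequest,
) -> Result<ChatResponse, ProviderError> {
    for (provider, model) in chain {
        if cooldowns.get(model).is_some_and(|until| Instant::now() < *until) {
            continue; // still cooling down from an earlier transient failure
        }
        let mut req = request.clone();
        req.model = model.clone();
        match provider.chat(req).await {
            Ok(response) => return Ok(response),
            Err(err) if err.is_transient() => {
                // Rate limit, 5xx, or timeout: park this model and move on.
                cooldowns.insert(model.clone(), Instant::now() + Duration::from_secs(60));
            }
            Err(err) => return Err(err), // auth or invalid request: fail immediately
        }
    }
    Err(ProviderError::AllCandidatesExhausted)
}
```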
Configuration
```toml
[models.fallback]
executor = [
    "anthropic/claude-sonnet-4-5",
    "openai/gpt-5.2",
    "ollama/llama3.3",
]
```

What Triggers a Fallback
| Error Type | Fallback? | Notes |
|---|---|---|
| Rate limit (429) | Yes | Model enters cooldown |
| Server error (5xx) | Yes | Model enters cooldown |
| Timeout | Yes | Model enters cooldown |
| Authentication error (401/403) | No | Permanent error, not retriable |
| Invalid request (400) | No | Permanent error, not retriable |
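Expressed over HTTP status codes, the classification is roughly the following (a sketch; the real error type also distinguishes network-level timeouts, which are likewise retriable):

```rust
/// Whether an HTTP status from a provider should trigger a fallback.
fn is_transient_status(status: u16) -> bool {
    match status {
        429 => true,              // rate limited: cool down, try the next model
        500..=599 => true,        // server error: cool down, try the next model
        400 | 401 | 403 => false, // invalid request or auth failure: permanent
        _ => false,
    }
}
```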
Model Reference Format
Throughout OpenKoi -- CLI flags, config files, REPL commands -- models are referenced using the provider/model-name format:
```
anthropic/claude-sonnet-4-5
openai/gpt-5.2
google/gemini-2.5-pro
ollama/llama3.3
ollama/codestral
bedrock/anthropic.claude-sonnet-4-20250514
groq/llama-3.3-70b-versatile
openrouter/auto
together/meta-llama/Llama-3.3-70B-Instruct-Turbo
deepseek/deepseek-chat
xai/grok-4-0709
```

The provider prefix is required to disambiguate models that may share names across providers.
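Parsing a reference only needs to split on the first slash, so model names that themselves contain slashes (like the Together example above) survive intact. A small sketch, not necessarily OpenKoi's actual parser:

```rust
/// Split "provider/model" at the first slash only, preserving any further
/// slashes inside the model name.
fn parse_model_ref(reference: &str) -> Option<(&str, &str)> {
    let (provider, model) = reference.split_once('/')?;
    if provider.is_empty() || model.is_empty() {
        return None;
    }
    Some((provider, model))
}

#[test]
fn keeps_slashes_in_model_names() {
    assert_eq!(
        parse_model_ref("together/meta-llama/Llama-3.3-70B-Instruct-Turbo"),
        Some(("together", "meta-llama/Llama-3.3-70B-Instruct-Turbo"))
    );
}
```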
Adding a New Provider
OpenKoi's provider layer is designed for extensibility. There are three paths:
1. OpenAI-Compatible (Easiest)
If the provider implements the OpenAI Chat Completions API, no code changes are needed. Set the API key and base URL:
```bash
export MY_PROVIDER_API_KEY=sk-...
```

Then reference it via config or CLI with an OpenAI-compatible prefix.
2. WASM Plugin
For providers with non-standard APIs, implement the provider interface as a WASM plugin:
```toml
[plugins]
wasm = ["~/.openkoi/plugins/wasm/my-provider.wasm"]
```

WASM plugins run sandboxed and must declare network capabilities in their manifest.
3. Native (Rust)
For first-class support, implement the ModelProvider trait in Rust and submit a contribution. See src/provider/ for existing implementations.
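A native provider is an implementation of the ModelProvider trait shown earlier. The skeleton below stubs every method with todo!(), so it is a starting point for a contribution rather than a working provider.

```rust
use async_trait::async_trait;
use futures::Stream;
use std::pin::Pin;

struct MyProvider {
    api_key: String,
    models: Vec<ModelInfo>,
}

#[async_trait]
impl ModelProvider for MyProvider {
    fn id(&self) -> &str { "my-provider" }
    fn name(&self) -> &str { "My Provider" }
    fn models(&self) -> &[ModelInfo] { &self.models }

    async fn chat(&self, _request: ChatRequest) -> Result<ChatResponse, ProviderError> {
        // Translate the request into the provider's wire format and map the reply.
        todo!("call the provider's HTTP API")
    }

    async fn chat_stream(
        &self,
        _request: ChatRequest,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<ChatChunk>>>>, ProviderError> {
        // Open the provider's streaming endpoint and yield ChatChunk items.
        todo!("stream tokens as they arrive")
    }

    async fn embed(&self, _texts: &[&str]) -> Result<Vec<Vec<f32>>, ProviderError> {
        // Return one embedding vector per input text.
        todo!("call the provider's embeddings endpoint")
    }
}
```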
Token Usage Tracking
Every API call returns a TokenUsage struct that feeds into cost tracking:
```rust
pub struct TokenUsage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_read_tokens: u32,  // Anthropic prompt caching
    pub cache_write_tokens: u32, // Anthropic prompt caching
}
```

The cache_read_tokens and cache_write_tokens fields are specific to Anthropic's prompt caching. For other providers, these are always zero. Cost is calculated per-model using the provider's published pricing.
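Cost is then a dot product of this struct with a per-model price sheet, along these lines (the struct and rate names are placeholders, not OpenKoi's actual pricing table):

```rust
/// Per-million-token prices for one model. Values come from the provider's
/// published pricing; the field names here are placeholders.
struct ModelPricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
    cache_read_per_mtok: f64,
    cache_write_per_mtok: f64,
}

/// Cost of a single call, in the same currency as the price sheet.
fn call_cost(usage: &TokenUsage, price: &ModelPricing) -> f64 {
    let per_token = 1.0 / 1_000_000.0;
    usage.input_tokens as f64 * price.input_per_mtok * per_token
        + usage.output_tokens as f64 * price.output_per_mtok * per_token
        + usage.cache_read_tokens as f64 * price.cache_read_per_mtok * per_token
        + usage.cache_write_tokens as f64 * price.cache_write_per_mtok * per_token
}
```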
View your token usage and costs with:
```bash
openkoi status --costs
```