
Providers

OpenKoi is model-agnostic. It supports multiple LLM providers out of the box and can assign different models to different roles within the iteration engine. Providers are auto-discovered from environment variables and local services -- no manual configuration required for the common case.

Provider Trait

Every provider implements the same Rust trait, making them interchangeable:

rust
#[async_trait]
pub trait ModelProvider: Send + Sync {
    fn id(&self) -> &str;
    fn name(&self) -> &str;
    fn models(&self) -> &[ModelInfo];

    async fn chat(
        &self,
        request: ChatRequest,
    ) -> Result<ChatResponse, ProviderError>;

    async fn chat_stream(
        &self,
        request: ChatRequest,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<ChatChunk, ProviderError>>>>, ProviderError>;

    async fn embed(
        &self,
        texts: &[&str],
    ) -> Result<Vec<Vec<f32>>, ProviderError>;
}

The chat method is used for non-streaming calls (evaluation, planning). The chat_stream method is used for execution output that streams to the terminal. The embed method generates vector embeddings for semantic memory search.
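
To make the trait concrete, the sketch below drives a provider for a single non-streaming call. The exact fields of ChatRequest and the shape of ChatResponse are assumptions for illustration, not taken from the OpenKoi source.

rust
// Illustrative only: ChatRequest/ChatMessage/ChatResponse fields are assumed.
async fn evaluate_once(provider: &dyn ModelProvider) -> Result<String, ProviderError> {
    let request = ChatRequest {
        model: "claude-sonnet-4-5".into(),
        messages: vec![ChatMessage::user("Score this diff against the rubric")],
        ..Default::default()
    };

    // chat() is the non-streaming path used for evaluation and planning.
    let response = provider.chat(request).await?;
    Ok(response.text)
}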


Built-in Providers

Anthropic

| Property | Value |
| --- | --- |
| Provider ID | anthropic |
| API key env var | ANTHROPIC_API_KEY |
| Default model | claude-sonnet-4-5 |
| API | Anthropic Messages API |
| Streaming | Yes (SSE) |
| Embeddings | No (use OpenAI embedder) |

Anthropic is the highest-priority provider in auto-detection. OpenKoi uses Anthropic's prompt caching feature to reduce costs on repeated system prompts across iterations within a session.

Prompt Caching

The system prompt (soul + task context + skill descriptions) is marked with cache_control: Ephemeral. This tells Anthropic to cache the system prompt across consecutive API calls, saving approximately 90% of input tokens on the system prompt portion.

rust
// Simplified from src/provider/anthropic.rs
SystemBlock {
    text: system_prompt,
    cache_control: Some(CacheControl::Ephemeral),
}

Prompt caching is automatic and requires no configuration. It only applies to the Anthropic provider.

Credential Sources

Anthropic keys can come from multiple sources (checked in order):

  1. ANTHROPIC_API_KEY environment variable
  2. Claude CLI credentials at ~/.claude/.credentials.json
  3. macOS Keychain entry Claude Code-credentials
  4. Saved credentials at ~/.openkoi/credentials/anthropic.key
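
For example, exporting the key before launching is enough for auto-detection to pick Anthropic (the key value below is a placeholder):

bash
export ANTHROPIC_API_KEY=sk-ant-...
openkoi "Fix the bug"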

OpenAI

| Property | Value |
| --- | --- |
| Provider ID | openai |
| API key env var | OPENAI_API_KEY |
| Default model | gpt-5.2 |
| API | OpenAI Chat Completions API |
| Streaming | Yes (SSE) |
| Embeddings | Yes (text-embedding-3-small) |

OpenAI is the second-priority provider. It also serves as the default embedder -- openai/text-embedding-3-small is used for vector embeddings unless overridden.

Credential Sources

  1. OPENAI_API_KEY environment variable
  2. OpenAI Codex CLI credentials
  3. Saved credentials at ~/.openkoi/credentials/openai.key

Google

| Property | Value |
| --- | --- |
| Provider ID | google |
| API key env var | GOOGLE_API_KEY |
| Default model | gemini-2.5-pro |
| API | Google Generative AI API |
| Streaming | Yes |
| Embeddings | Yes |

Google is the third-priority provider in auto-detection.

Ollama

| Property | Value |
| --- | --- |
| Provider ID | ollama |
| Connection | Local probe at localhost:11434 |
| Default model | Best available (see priority below) |
| API | Ollama REST API (OpenAI-compatible) |
| Streaming | Yes |
| Embeddings | Yes (model-dependent) |

Ollama is the zero-cost, zero-key path. OpenKoi probes localhost:11434 on startup. If Ollama is running, it queries the available models and selects the best one.

Model Priority

When multiple Ollama models are installed, OpenKoi selects one in the following priority order:

| Priority | Model | Notes |
| --- | --- | --- |
| 1 | qwen2.5-coder | Best coding model available locally |
| 2 | codestral | Strong coding model from Mistral |
| 3 | deepseek-coder-v2 | Good code + general capability |
| 4 | llama3.3 | Strong general-purpose model |
| 5 | llama3.1 | Older but capable |
| 6 | mistral | Lightweight general-purpose |
| 7 | gemma2 | Google's open model |
| fallback | First available model | If none of the above are found |

If no models are installed, OpenKoi suggests running ollama pull llama3.3.
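
As a concrete example, you can pull one of the preferred models yourself and point OpenKoi at it explicitly, using the provider/model format described later on this page:

bash
# Install a preferred local coding model, then select it explicitly
ollama pull qwen2.5-coder
openkoi --model ollama/qwen2.5-coder "Fix the bug"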

AWS Bedrock

| Property | Value |
| --- | --- |
| Provider ID | bedrock |
| Auth | AWS credentials (IAM / SigV4) |
| API | AWS Bedrock Runtime API |
| Streaming | Yes |
| Embeddings | Yes (model-dependent) |

Required Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| AWS_ACCESS_KEY_ID | Yes | AWS IAM access key |
| AWS_SECRET_ACCESS_KEY | Yes | AWS IAM secret key |
| AWS_SESSION_TOKEN | No | Temporary session token (for assumed roles) |
| AWS_REGION | No | AWS region (defaults to us-east-1) |

Bedrock uses SigV4 signing for all API requests. OpenKoi handles the signing internally using the standard AWS credential chain.
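
For example, with static IAM credentials exported (values below are placeholders), a Bedrock-hosted model can be selected directly:

bash
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1   # optional; this is the default
openkoi --model bedrock/anthropic.claude-sonnet-4-20250514 "Fix the bug"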

Available Models

| Model | Bedrock Model ID |
| --- | --- |
| Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514 |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022 |
| Amazon Nova Pro | amazon.nova-pro-v1:0 |
| Llama 3.3 70B | meta.llama3-3-70b-instruct-v1:0 |

OpenAI-Compatible Providers

Any provider that implements the OpenAI Chat Completions API can be used. These are configured via environment variables or the config file.

| Provider | API Key Env Var | Default Model | Base URL |
| --- | --- | --- | --- |
| Groq | GROQ_API_KEY | llama-3.3-70b-versatile | https://api.groq.com/openai/v1 |
| OpenRouter | OPENROUTER_API_KEY | auto | https://openrouter.ai/api/v1 |
| Together | TOGETHER_API_KEY | meta-llama/Llama-3.3-70B-Instruct-Turbo | https://api.together.xyz/v1 |
| DeepSeek | DEEPSEEK_API_KEY | deepseek-chat | https://api.deepseek.com/v1 |
| xAI | XAI_API_KEY | grok-4-0709 | https://api.x.ai/v1 |
| Custom | User-defined | User-defined | User-defined |

Custom Endpoint Configuration

For a self-hosted or unlisted OpenAI-compatible endpoint, use the provider picker during openkoi init and select "Other (OpenAI-compatible URL)", or set it up directly in the config file.
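
As a rough sketch only -- the key names below are assumptions, not the documented schema -- a custom entry in the config file might look like this. Run openkoi init or consult the config reference for the real field names.

toml
# Hypothetical keys for a self-hosted OpenAI-compatible endpoint.
# Verify the actual schema via `openkoi init` or the config reference.
[providers.custom]
base_url      = "https://llm.internal.example.com/v1"
api_key_env   = "MY_LLM_API_KEY"
default_model = "my-model"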


Credential Discovery

On startup, OpenKoi scans for credentials in the following order. The first match wins.

| Priority | Source | Example |
| --- | --- | --- |
| 1 | Environment variables | ANTHROPIC_API_KEY, OPENAI_API_KEY, etc. |
| 2 | Claude CLI credentials | ~/.claude/.credentials.json (OAuth token) |
| 3 | Claude CLI Keychain (macOS) | macOS Keychain entry Claude Code-credentials |
| 4 | OpenAI Codex CLI | Codex CLI auth credentials |
| 5 | Qwen CLI | ~/.qwen/oauth_creds.json |
| 6 | Saved OpenKoi credentials | ~/.openkoi/credentials/<provider>.key |
| 7 | Ollama probe | TCP connection to localhost:11434 |

This means if you already have Claude Code or Codex CLI installed and authenticated, OpenKoi will automatically use those credentials with zero setup.


Default Model Priority

When no model is specified (no --model flag, no OPENKOI_MODEL, no config file), OpenKoi picks the best available model:

| Priority | Provider | Model | Why |
| --- | --- | --- | --- |
| 1 | anthropic | claude-sonnet-4-5 | Best overall quality for coding and reasoning |
| 2 | openai | gpt-5.2 | Strong general-purpose model |
| 3 | google | gemini-2.5-pro | Competitive with large context window |
| 4 | ollama | Best local model | Free, no API key needed |

If no provider is found at all, OpenKoi launches an interactive provider picker that helps you set up Ollama (free) or paste an API key.


Role-Based Model Assignment

OpenKoi assigns models to four distinct roles in the iteration engine:

| Role | What it does | Recommended characteristics |
| --- | --- | --- |
| Executor | Performs the task (writes code, generates text, analyzes data) | Fast, high-quality generation |
| Evaluator | Judges the executor's output against rubrics | Precise, critical reasoning |
| Planner | Creates the initial plan and refines it between iterations | Good at decomposition and strategy |
| Embedder | Generates vector embeddings for semantic memory search | Fast, inexpensive |

Default Behavior

If you specify a single model (via --model or OPENKOI_MODEL), it is used for executor, evaluator, and planner. The embedder always defaults to openai/text-embedding-3-small unless explicitly overridden.

bash
# All three roles use claude-sonnet-4-5
openkoi --model anthropic/claude-sonnet-4-5 "Fix the bug"

Per-Role Assignment

For more control, assign different models to different roles:

bash
# CLI flags
openkoi --executor anthropic/claude-sonnet-4-5 --evaluator anthropic/claude-opus-4-6 "Fix the bug"
toml
# config.toml
[models]
executor  = "anthropic/claude-sonnet-4-5"
evaluator = "anthropic/claude-opus-4-6"
planner   = "anthropic/claude-sonnet-4-5"
embedder  = "openai/text-embedding-3-small"

A common pattern is to use a faster, cheaper model for execution and a more capable model for evaluation:

toml
[models]
executor  = "openai/gpt-5.2"
evaluator = "anthropic/claude-opus-4-6"

Fallback Chain

When a provider returns a transient error (rate limit, server error, timeout), OpenKoi automatically falls back to the next model in the chain.

How It Works

  1. The primary model is tried first.
  2. On transient failure, the model enters a cooldown period and is temporarily skipped.
  3. The next model in the fallback chain is tried.
  4. If all models in the chain fail, the task returns an AllCandidatesExhausted error.
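
In rough pseudocode, the selection loop looks like the sketch below. The type and method names (ModelHandle, in_cooldown, is_transient, start_cooldown) are illustrative, not the actual OpenKoi internals; only AllCandidatesExhausted comes from the behavior described above.

rust
// Illustrative sketch of the fallback loop; names are not from the OpenKoi source.
async fn chat_with_fallback(
    chain: &[ModelHandle],
    request: ChatRequest,
) -> Result<ChatResponse, ProviderError> {
    for model in chain {
        if model.in_cooldown() {
            continue; // recently failed transiently; temporarily skipped
        }
        match model.chat(request.clone()).await {
            Ok(response) => return Ok(response),
            // Rate limits, 5xx responses, and timeouts start a cooldown
            // and hand control to the next candidate in the chain.
            Err(e) if e.is_transient() => model.start_cooldown(),
            // Auth and invalid-request errors are permanent: fail immediately.
            Err(e) => return Err(e),
        }
    }
    Err(ProviderError::AllCandidatesExhausted)
}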

Configuration

toml
[models.fallback]
executor = [
  "anthropic/claude-sonnet-4-5",
  "openai/gpt-5.2",
  "ollama/llama3.3",
]

What Triggers a Fallback

| Error Type | Fallback? | Notes |
| --- | --- | --- |
| Rate limit (429) | Yes | Model enters cooldown |
| Server error (5xx) | Yes | Model enters cooldown |
| Timeout | Yes | Model enters cooldown |
| Authentication error (401/403) | No | Permanent error, not retriable |
| Invalid request (400) | No | Permanent error, not retriable |

Model Reference Format

Throughout OpenKoi -- CLI flags, config files, REPL commands -- models are referenced using the provider/model-name format:

anthropic/claude-sonnet-4-5
openai/gpt-5.2
google/gemini-2.5-pro
ollama/llama3.3
ollama/codestral
bedrock/anthropic.claude-sonnet-4-20250514
groq/llama-3.3-70b-versatile
openrouter/auto
together/meta-llama/Llama-3.3-70B-Instruct-Turbo
deepseek/deepseek-chat
xai/grok-4-0709

The provider prefix is required to disambiguate models that may share names across providers.
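
Note that only the first slash separates the provider from the model name -- the Together reference above keeps further slashes inside the model name. A minimal parse, assuming a split on the first / only:

rust
// Split a model reference into (provider, model) on the first '/' only,
// so model names may themselves contain slashes.
fn parse_model_ref(s: &str) -> Option<(&str, &str)> {
    s.split_once('/')
}

fn main() {
    assert_eq!(
        parse_model_ref("together/meta-llama/Llama-3.3-70B-Instruct-Turbo"),
        Some(("together", "meta-llama/Llama-3.3-70B-Instruct-Turbo")),
    );
}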


Adding a New Provider

OpenKoi's provider layer is designed for extensibility. There are three paths:

1. OpenAI-Compatible (Easiest)

If the provider implements the OpenAI Chat Completions API, no code changes are needed. Export the provider's API key:

bash
export MY_PROVIDER_API_KEY=sk-...

Then set the base URL and reference the provider's models via config or the CLI using the provider/model-name format (see Custom Endpoint Configuration above).

2. WASM Plugin

For providers with non-standard APIs, implement the provider interface as a WASM plugin:

toml
[plugins]
wasm = ["~/.openkoi/plugins/wasm/my-provider.wasm"]

WASM plugins run sandboxed and must declare network capabilities in their manifest.

3. Native (Rust)

For first-class support, implement the ModelProvider trait in Rust and submit a contribution. See src/provider/ for existing implementations.
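
A new native provider boils down to a struct that implements the trait shown at the top of this page. The skeleton below is a minimal sketch (HTTP plumbing, auth, and streaming omitted); MyProvider and its field layout are illustrative.

rust
// Minimal skeleton; see src/provider/ for complete implementations.
pub struct MyProvider {
    api_key: String,
    models: Vec<ModelInfo>,
}

#[async_trait]
impl ModelProvider for MyProvider {
    fn id(&self) -> &str { "myprovider" }
    fn name(&self) -> &str { "My Provider" }
    fn models(&self) -> &[ModelInfo] { &self.models }

    async fn chat(&self, request: ChatRequest) -> Result<ChatResponse, ProviderError> {
        // Translate `request` into the provider's wire format and call its API.
        todo!()
    }

    async fn chat_stream(
        &self,
        request: ChatRequest,
    ) -> Result<Pin<Box<dyn Stream<Item = Result<ChatChunk, ProviderError>>>>, ProviderError> {
        // Return a stream of ChatChunk values (typically backed by SSE).
        todo!()
    }

    async fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, ProviderError> {
        // Return one embedding vector per input text.
        todo!()
    }
}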


Token Usage Tracking

Every API call returns a TokenUsage struct that feeds into cost tracking:

rust
pub struct TokenUsage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_read_tokens: u32,   // Anthropic prompt caching
    pub cache_write_tokens: u32,  // Anthropic prompt caching
}

The cache_read_tokens and cache_write_tokens fields are specific to Anthropic's prompt caching. For other providers, these are always zero. Cost is calculated per-model using the provider's published pricing.
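
For example, the cost of a single call can be derived from the usage struct and a set of per-million-token rates. The ModelPricing struct below is an assumption for illustration; OpenKoi's actual pricing tables may be organized differently.

rust
// Illustrative cost calculation; rates are expressed in USD per 1M tokens.
struct ModelPricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
    cache_read_per_mtok: f64,  // discounted rate for cached prompt reads
    cache_write_per_mtok: f64, // surcharge for writing the prompt cache
}

fn call_cost(usage: &TokenUsage, price: &ModelPricing) -> f64 {
    (usage.input_tokens as f64 * price.input_per_mtok
        + usage.output_tokens as f64 * price.output_per_mtok
        + usage.cache_read_tokens as f64 * price.cache_read_per_mtok
        + usage.cache_write_tokens as f64 * price.cache_write_per_mtok)
        / 1_000_000.0
}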

View your token usage and costs with:

bash
openkoi status --costs
