# CostWise Managed Cost Policy SDK

Automatically reduce AI costs without changing your workflow.

The CostWise Managed Cost Policy (MCP) SDK analyzes your LLM requests, classifies task complexity, and recommends cheaper models and token limits — saving up to 90% on AI costs.
## Install

```bash
pip install costwise-mcp
```
## Quick Start

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(
    api_key="cw_your_api_key",
    backend_url="https://app.cost-wise.dev",  # your CostWise instance
)

decision = policy.optimize(
    prompt="Translate 'hello' to French",
    model="gpt-5.4",
)

print(decision.recommended_model)  # "gpt-5.4-mini" (70% cheaper)
print(decision.max_tokens)         # 256
print(decision.estimated_cost)     # $0.001154

# Call your LLM with the optimized params
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": "Translate 'hello' to French"}],
)

# Report actual usage (async, non-blocking, optional)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```
## Team Policies (AI Governance)

Enforce model access and budgets per team:

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

# Interns: restricted to budget models (tier 3), max $50/month
decision = policy.optimize(
    prompt="Write a summary",
    model="gpt-5.4",   # they asked for premium
    team="interns",    # policy enforces tier 3
)
# decision.recommended_model = "gpt-5.4-nano"
# decision.message = "Team 'interns' restricted to tier 3 models"

# Engineers: standard tier, $500/month budget
decision = policy.optimize(
    prompt="Debug this code",
    model="gpt-5.4",
    team="engineers",  # policy enforces tier 2
)
# decision.recommended_model = "gpt-5.4-mini"

# AI Research: full access, no budget limit
decision = policy.optimize(
    prompt="Analyze this architecture...",
    model="gpt-5.4",
    team="ai-team",    # tier 1 = all models allowed
)
# decision.recommended_model = "gpt-5.4" (kept for complex tasks)
```
### Team tier levels
| Tier | Access | Example teams |
|---|---|---|
| 1 — Premium | All models (gpt-5.4, claude-opus-4-6, etc.) | AI Research, CTO |
| 2 — Standard | Mid-tier models (gpt-5.4-mini, claude-sonnet-4-6) | Engineers, Data Science |
| 3 — Budget | Cheapest models only (gpt-5.4-nano, claude-haiku-4-5) | Interns, Support, QA |
Configure team policies in CostWise: Settings > MCP Teams.
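Tier enforcement happens server-side, but the downgrade rule implied by the table is easy to state. The sketch below is illustrative only: the tier numbers and model names come from the table above, while the mapping and function are not part of the SDK.

```python
# Illustrative only: real enforcement happens in the CostWise backend.
# Tier numbers and example models are taken from the table above.
TIER_MODELS = {
    1: "gpt-5.4",       # premium
    2: "gpt-5.4-mini",  # standard
    3: "gpt-5.4-nano",  # budget
}
MODEL_TIER = {model: tier for tier, model in TIER_MODELS.items()}

def enforce_tier(requested_model: str, team_max_tier: int) -> str:
    """Downgrade a request to the cheapest tier the team is allowed to use."""
    requested_tier = MODEL_TIER.get(requested_model, 1)
    # Higher tier number = cheaper; a team capped at tier N only gets tier >= N.
    effective_tier = max(requested_tier, team_max_tier)
    return TIER_MODELS[effective_tier]
```

Under this rule, an intern (tier 3) requesting `gpt-5.4` is routed to `gpt-5.4-nano`, matching the first example above.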
## Error Handling

```python
from costwise_mcp import (
    CostPolicy,
    BudgetExceededError,
    ModelBlockedError,
    TierRestrictionError,
    PolicyViolationError,
)

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

decision = policy.optimize(
    prompt="Generate a report",
    model="gpt-5.4",
    team="interns",
)

if not decision.allowed:
    print(f"Blocked: {decision.message}")
    # "Team 'interns' monthly budget exceeded ($50.42 / $50.00)"
else:
    # Proceed with the LLM call
    print(f"Use {decision.recommended_model}, max {decision.max_tokens} tokens")
```
### Error types

| Error | When | Fields |
|---|---|---|
| `BudgetExceededError` | Team monthly/daily budget exceeded | `current_spend`, `budget_limit` |
| `ModelBlockedError` | Model is on the team's blocklist | `blocked_model`, `alternative` |
| `TierRestrictionError` | Model tier above team's limit | `model_tier`, `max_tier` |
| `PolicyViolationError` | Any team policy violation (base class) | `team`, `reason` |
| `InvalidAPIKeyError` | API key invalid or expired | — |
| `RateLimitError` | Backend rate limit exceeded | `retry_after` |
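`RateLimitError` carries a `retry_after` hint, which a caller can honor with a simple retry loop. The sketch below is illustrative: the exception class here is a local stand-in (the real one ships with the SDK), and `optimize_with_retry` is a hypothetical helper, not SDK API.

```python
import time

class RateLimitError(Exception):
    """Local stand-in for costwise_mcp.RateLimitError (carries retry_after)."""
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after

def optimize_with_retry(optimize, attempts: int = 3):
    """Call `optimize` (any zero-arg callable), sleeping on rate limits."""
    for attempt in range(attempts):
        try:
            return optimize()
        except RateLimitError as err:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(err.retry_after)  # honor the backend's hint
```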
## Framework Integration

The SDK works with any AI framework — it runs before the LLM call, not instead of it.

### OpenAI SDK

```python
from costwise_mcp import CostPolicy
from openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = OpenAI()

prompt = "Summarize this article"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```

### Anthropic SDK

```python
from costwise_mcp import CostPolicy
import anthropic

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = anthropic.Anthropic()

prompt = "Review this design document"
decision = policy.optimize(prompt, model="claude-opus-4-6", team="ai-team")
message = client.messages.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, output_tokens=message.usage.output_tokens)
```

### LangChain

```python
from costwise_mcp import CostPolicy
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Explain this stack trace"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = ChatOpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
result = llm.invoke([HumanMessage(content=prompt)])
```

### LlamaIndex

```python
from costwise_mcp import CostPolicy
from llama_index.llms.openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Summarize these release notes"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = OpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
response = llm.complete(prompt)
```

### Google Gemini

```python
from costwise_mcp import CostPolicy
import google.generativeai as genai

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Draft a project update"
decision = policy.optimize(prompt, model="gemini-2.5-pro", team="engineers")
model = genai.GenerativeModel(decision.recommended_model)
response = model.generate_content(prompt)
```
## Custom Configuration

```python
from costwise_mcp import CostPolicy, PolicyConfig

policy = CostPolicy(
    api_key="cw_...",
    backend_url="https://app.cost-wise.dev",
    config=PolicyConfig(
        max_tokens_simple=128,        # override default 256
        max_tokens_medium=512,        # override default 1024
        max_tokens_complex=2048,      # override default 4096
        auto_downgrade=True,          # auto-select cheaper models for simple tasks
        blocked_models=["o1-pro"],    # local blocklist (in addition to team policy)
        telemetry_batch_size=100,     # send every 100 events (default: 50)
        telemetry_flush_interval=60,  # send every 60 seconds (default: 30)
    ),
    project_id="my-chatbot",
)
```
## Supported Models (75 models, 10 providers)

| Provider | Models (selection) |
|---|---|
| OpenAI (18) | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-4.1, gpt-4o, gpt-4o-mini, o1, o1-mini, o3, o3-mini, o3-pro, o4-mini |
| Anthropic (23) | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-5, claude-sonnet-4-5, claude-3.5-sonnet, claude-3.5-haiku |
| Google (7) | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Mistral (7) | mistral-large, mistral-small, open-mistral-nemo, codestral, pixtral-large |
| Meta/Llama (6) | llama-3.3-70b, llama-3.1-405b, llama-4-scout, llama-4-maverick |
| Cohere (4) | command-r-plus, command-r, command-light, command-a |
| xAI/Grok (3) | grok-3, grok-3-mini, grok-2 |
| AWS Nova (3) | nova-pro, nova-lite, nova-micro |
| DeepSeek (2) | deepseek-chat, deepseek-reasoner |
| AI21 (2) | jamba-1.5-large, jamba-1.5-mini |
## Privacy

- Prompts are never sent to the CostWise backend
- Only metadata leaves your environment: token counts, model name, cost, team ID
- Telemetry is optional: `PolicyConfig(telemetry_enabled=False)`
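To make the metadata-only guarantee concrete, a telemetry event would carry only fields like the ones below. This is an illustrative payload whose field names mirror the metadata list above, not the SDK's actual wire format.

```python
# Illustrative telemetry payload: field names mirror the metadata list above,
# not the SDK's actual wire format.
def build_telemetry_event(model, input_tokens, output_tokens, cost_usd, team):
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "team": team,
        # Note: no "prompt" field -- prompt text never leaves your environment.
    }

event = build_telemetry_event("gpt-5.4-mini", 12, 48, 0.001154, "engineers")
```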
## API Reference

### `CostPolicy(api_key, backend_url, config, project_id)`

| Parameter | Type | Required | Description |
|---|---|---|---|
| `api_key` | `str` | Yes | CostWise API key (`cw_...`). Get one at Settings > MCP SDK. |
| `backend_url` | `str` | No | Your CostWise instance URL. Default: `https://app.cost-wise.dev` |
| `config` | `PolicyConfig` | No | Custom token limits, budgets, blocked models |
| `project_id` | `str` | No | Group analytics by project |
### `policy.optimize(prompt, model, task_type, max_budget, team) → Decision`

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | `str` | Yes | Prompt text (for token estimation + complexity classification) |
| `model` | `str` | Yes | Model you intend to use (e.g. `"gpt-5.4"`) |
| `task_type` | `TaskComplexity` | No | Override auto-classification: `SIMPLE`, `MEDIUM`, `COMPLEX` |
| `max_budget` | `float` | No | Max cost in USD for this request |
| `team` | `str` | No | Team ID for policy enforcement (e.g. `"interns"`) |
### `policy.report(decision, actual_tokens, output_tokens, latency_ms)`

Reports actual usage after the LLM call. Async and non-blocking.
### `Decision` object

| Field | Type | Description |
|---|---|---|
| `recommended_model` | `str` | Model to use |
| `max_tokens` | `int` | Token limit |
| `estimated_cost` | `float` | Cost in USD |
| `original_model` | `str` | Originally requested model |
| `input_tokens` | `int` | Estimated input tokens |
| `complexity` | `TaskComplexity` | `simple`, `medium`, `complex` |
| `allowed` | `bool` | Whether request is within budget/policy |
| `savings_pct` | `float` | Percentage saved vs. the original model |
| `estimated_savings` | `float` | USD saved |
| `message` | `str` | Human-readable recommendation |
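The two savings fields are related by simple arithmetic. The sketch below assumes savings are measured against the cost of the originally requested model; the SDK's exact rounding may differ, and the `savings` helper itself is not part of the API.

```python
def savings(original_cost: float, estimated_cost: float) -> tuple[float, float]:
    """Return (estimated_savings in USD, savings_pct) vs. the original model."""
    saved = original_cost - estimated_cost
    pct = 100.0 * saved / original_cost if original_cost else 0.0
    return saved, pct

saved, pct = savings(original_cost=0.010, estimated_cost=0.003)
# roughly $0.007 saved, i.e. about 70% cheaper -- consistent with the
# "70% cheaper" figure in the Quick Start example
```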
## Get Your API Key

1. Sign in to your CostWise instance
2. Go to Settings > MCP SDK
3. Click Generate API Key
4. Copy the `cw_...` key
5. Set up team policies at Settings > MCP Teams
## File details

### Source distribution: costwise_mcp-0.2.0.tar.gz

- Size: 20.0 kB
- Uploaded via: twine/6.2.0 CPython/3.12.3 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6db10830b5408a96e1337fb5b9a0077fa50a7b8f73c582862cf96904958cad17` |
| MD5 | `2aa33dbc76ef7b00edb71f11e86d63a0` |
| BLAKE2b-256 | `06f5b6ca8d9ddd7de186b4f5ebe7902c9793bcdc4a19f2146ca047390c2c65c8` |

### Built distribution: costwise_mcp-0.2.0-py3-none-any.whl

- Size: 19.5 kB
- Tags: Python 3
- Uploaded via: twine/6.2.0 CPython/3.12.3 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b0a6afb622bd485896427093545c4100744d280f723098d7e0824c2a970287c2` |
| MD5 | `ba894b3297708fac63f48b122b549f4d` |
| BLAKE2b-256 | `63d373950d9425878576154928cdd136cc65d366d348612bdf525fccbe40e380` |