# CostWise Managed Cost Policy SDK

Automatically reduce AI costs without changing your workflow.

The CostWise Managed Cost Policy (MCP) SDK analyzes your LLM requests, classifies task complexity, and recommends cheaper models and token limits — saving up to 90% on AI costs.
## Install

```bash
pip install costwise-mcp
```
## Quick Start

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(
    api_key="cw_your_api_key",
    backend_url="https://app.cost-wise.dev",  # your CostWise instance
)

decision = policy.optimize(
    prompt="Translate 'hello' to French",
    model="gpt-5.4",
)

print(decision.recommended_model)  # "gpt-5.4-mini" (70% cheaper)
print(decision.max_tokens)         # 256
print(decision.estimated_cost)     # $0.001154

# Call your LLM with the optimized params
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": "Translate 'hello' to French"}],
)

# Report actual usage (async, non-blocking, optional)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```
## Team Policies (AI Governance)

Enforce model access and budgets per team:

```python
from costwise_mcp import CostPolicy

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

# Interns: restricted to budget models (tier 3), max $50/month
decision = policy.optimize(
    prompt="Write a summary",
    model="gpt-5.4",   # they asked for premium
    team="interns",    # policy enforces tier 3
)
# decision.recommended_model = "gpt-5.4-nano"
# decision.message = "Team 'interns' restricted to tier 3 models"

# Engineers: standard tier, $500/month budget
decision = policy.optimize(
    prompt="Debug this code",
    model="gpt-5.4",
    team="engineers",  # policy enforces tier 2
)
# decision.recommended_model = "gpt-5.4-mini"

# AI Research: full access, no budget limit
decision = policy.optimize(
    prompt="Analyze this architecture...",
    model="gpt-5.4",
    team="ai-team",    # tier 1 = all models allowed
)
# decision.recommended_model = "gpt-5.4" (kept for complex tasks)
```
### Team tier levels
| Tier | Access | Example teams |
|---|---|---|
| 1 — Premium | All models (gpt-5.4, claude-opus-4-6, etc.) | AI Research, CTO |
| 2 — Standard | Mid-tier models (gpt-5.4-mini, claude-sonnet-4-6) | Engineers, Data Science |
| 3 — Budget | Cheapest models only (gpt-5.4-nano, claude-haiku-4-5) | Interns, Support, QA |
Configure team policies in CostWise: Settings > MCP Teams.
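Tier enforcement happens server-side, but the downgrade rule implied by the table is easy to state. The sketch below is illustrative only: the tier numbers and model names come from the table above, while the mapping and function are not part of the SDK.

```python
# Illustrative only: real enforcement happens in the CostWise backend.
# Tier numbers and example models are taken from the table above.
TIER_MODELS = {
    1: "gpt-5.4",       # premium
    2: "gpt-5.4-mini",  # standard
    3: "gpt-5.4-nano",  # budget
}
MODEL_TIER = {model: tier for tier, model in TIER_MODELS.items()}

def enforce_tier(requested_model: str, team_max_tier: int) -> str:
    """Downgrade a request to the cheapest tier the team is allowed to use."""
    requested_tier = MODEL_TIER.get(requested_model, 1)
    # Higher tier number = cheaper; a team capped at tier N only gets tier >= N.
    effective_tier = max(requested_tier, team_max_tier)
    return TIER_MODELS[effective_tier]
```

Under this rule, an intern (tier 3) requesting `gpt-5.4` is routed to `gpt-5.4-nano`, matching the first example above.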
## Error Handling

```python
from costwise_mcp import (
    CostPolicy,
    BudgetExceededError,
    ModelBlockedError,
    TierRestrictionError,
    PolicyViolationError,
)

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

decision = policy.optimize(
    prompt="Generate a report",
    model="gpt-5.4",
    team="interns",
)

if not decision.allowed:
    print(f"Blocked: {decision.message}")
    # "Team 'interns' monthly budget exceeded ($50.42 / $50.00)"
else:
    # Proceed with the LLM call
    print(f"Use {decision.recommended_model}, max {decision.max_tokens} tokens")
```
### Error types

| Error | When | Fields |
|---|---|---|
| `BudgetExceededError` | Team monthly/daily budget exceeded | `current_spend`, `budget_limit` |
| `ModelBlockedError` | Model is on the team's blocklist | `blocked_model`, `alternative` |
| `TierRestrictionError` | Model tier above team's limit | `model_tier`, `max_tier` |
| `PolicyViolationError` | Any team policy violation (base class) | `team`, `reason` |
| `InvalidAPIKeyError` | API key invalid or expired | — |
| `RateLimitError` | Backend rate limit exceeded | `retry_after` |
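`RateLimitError` carries a `retry_after` hint, which a caller can honor with a simple retry loop. The sketch below is illustrative: the exception class here is a local stand-in (the real one ships with the SDK), and `optimize_with_retry` is a hypothetical helper, not SDK API.

```python
import time

class RateLimitError(Exception):
    """Local stand-in for costwise_mcp.RateLimitError (carries retry_after)."""
    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after

def optimize_with_retry(optimize, attempts: int = 3):
    """Call `optimize` (any zero-arg callable), sleeping on rate limits."""
    for attempt in range(attempts):
        try:
            return optimize()
        except RateLimitError as err:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(err.retry_after)  # honor the backend's hint
```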
## Framework Integration

The SDK works with any AI framework — it runs before the LLM call, not instead of it.

### OpenAI SDK

```python
from costwise_mcp import CostPolicy
from openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = OpenAI()

prompt = "Summarize this article"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
response = client.chat.completions.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, actual_tokens=response.usage.total_tokens)
```

### Anthropic SDK

```python
from costwise_mcp import CostPolicy
import anthropic

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")
client = anthropic.Anthropic()

prompt = "Review this design document"
decision = policy.optimize(prompt, model="claude-opus-4-6", team="ai-team")
message = client.messages.create(
    model=decision.recommended_model,
    max_tokens=decision.max_tokens,
    messages=[{"role": "user", "content": prompt}],
)
policy.report(decision, output_tokens=message.usage.output_tokens)
```

### LangChain

```python
from costwise_mcp import CostPolicy
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Explain this stack trace"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = ChatOpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
result = llm.invoke([HumanMessage(content=prompt)])
```

### LlamaIndex

```python
from costwise_mcp import CostPolicy
from llama_index.llms.openai import OpenAI

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Summarize these release notes"
decision = policy.optimize(prompt, model="gpt-5.4", team="engineers")
llm = OpenAI(model=decision.recommended_model, max_tokens=decision.max_tokens)
response = llm.complete(prompt)
```

### Google Gemini

```python
from costwise_mcp import CostPolicy
import google.generativeai as genai

policy = CostPolicy(api_key="cw_...", backend_url="https://app.cost-wise.dev")

prompt = "Draft a project update"
decision = policy.optimize(prompt, model="gemini-2.5-pro", team="engineers")
model = genai.GenerativeModel(decision.recommended_model)
response = model.generate_content(prompt)
```
## Custom Configuration

```python
from costwise_mcp import CostPolicy, PolicyConfig

policy = CostPolicy(
    api_key="cw_...",
    backend_url="https://app.cost-wise.dev",
    config=PolicyConfig(
        max_tokens_simple=128,        # override default 256
        max_tokens_medium=512,        # override default 1024
        max_tokens_complex=2048,      # override default 4096
        auto_downgrade=True,          # auto-select cheaper models for simple tasks
        blocked_models=["o1-pro"],    # local blocklist (in addition to team policy)
        telemetry_batch_size=100,     # send every 100 events (default: 50)
        telemetry_flush_interval=60,  # send every 60 seconds (default: 30)
    ),
    project_id="my-chatbot",
)
```
## Supported Models (75 models, 10 providers)

| Provider | Models (selection) |
|---|---|
| OpenAI (18) | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-4.1, gpt-4o, gpt-4o-mini, o1, o1-mini, o3, o3-mini, o3-pro, o4-mini |
| Anthropic (23) | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-5, claude-sonnet-4-5, claude-3.5-sonnet, claude-3.5-haiku |
| Google (7) | gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Mistral (7) | mistral-large, mistral-small, open-mistral-nemo, codestral, pixtral-large |
| Meta/Llama (6) | llama-3.3-70b, llama-3.1-405b, llama-4-scout, llama-4-maverick |
| Cohere (4) | command-r-plus, command-r, command-light, command-a |
| xAI/Grok (3) | grok-3, grok-3-mini, grok-2 |
| AWS Nova (3) | nova-pro, nova-lite, nova-micro |
| DeepSeek (2) | deepseek-chat, deepseek-reasoner |
| AI21 (2) | jamba-1.5-large, jamba-1.5-mini |
## Privacy

- Prompts are never sent to the CostWise backend
- Only metadata leaves your environment: token counts, model name, cost, team ID
- Telemetry is optional: `PolicyConfig(telemetry_enabled=False)`
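To make the metadata-only guarantee concrete, a telemetry event would carry only fields like the ones below. This is an illustrative payload whose field names mirror the metadata list above, not the SDK's actual wire format.

```python
# Illustrative telemetry payload: field names mirror the metadata list above,
# not the SDK's actual wire format.
def build_telemetry_event(model, input_tokens, output_tokens, cost_usd, team):
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        "team": team,
        # Note: no "prompt" field -- prompt text never leaves your environment.
    }

event = build_telemetry_event("gpt-5.4-mini", 12, 48, 0.001154, "engineers")
```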
## API Reference

### `CostPolicy(api_key, backend_url, config, project_id)`

| Parameter | Type | Required | Description |
|---|---|---|---|
| `api_key` | `str` | Yes | CostWise API key (`cw_...`). Get one at Settings > MCP SDK. |
| `backend_url` | `str` | No | Your CostWise instance URL. Default: `https://app.cost-wise.dev` |
| `config` | `PolicyConfig` | No | Custom token limits, budgets, blocked models |
| `project_id` | `str` | No | Group analytics by project |
### `policy.optimize(prompt, model, task_type, max_budget, team) → Decision`

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | `str` | Yes | Prompt text (for token estimation + complexity classification) |
| `model` | `str` | Yes | Model you intend to use (e.g. `"gpt-5.4"`) |
| `task_type` | `TaskComplexity` | No | Override auto-classification: `SIMPLE`, `MEDIUM`, `COMPLEX` |
| `max_budget` | `float` | No | Max cost in USD for this request |
| `team` | `str` | No | Team ID for policy enforcement (e.g. `"interns"`) |
### `policy.report(decision, actual_tokens, output_tokens, latency_ms)`

Reports actual usage after the LLM call. Async and non-blocking.
### `Decision` object

| Field | Type | Description |
|---|---|---|
| `recommended_model` | `str` | Model to use |
| `max_tokens` | `int` | Token limit |
| `estimated_cost` | `float` | Cost in USD |
| `original_model` | `str` | Originally requested model |
| `input_tokens` | `int` | Estimated input tokens |
| `complexity` | `TaskComplexity` | `simple`, `medium`, `complex` |
| `allowed` | `bool` | Whether request is within budget/policy |
| `savings_pct` | `float` | Percentage saved vs. the original model |
| `estimated_savings` | `float` | USD saved |
| `message` | `str` | Human-readable recommendation |
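The two savings fields are related by simple arithmetic. The sketch below assumes savings are measured against the cost of the originally requested model; the SDK's exact rounding may differ, and the `savings` helper itself is not part of the API.

```python
def savings(original_cost: float, estimated_cost: float) -> tuple[float, float]:
    """Return (estimated_savings in USD, savings_pct) vs. the original model."""
    saved = original_cost - estimated_cost
    pct = 100.0 * saved / original_cost if original_cost else 0.0
    return saved, pct

saved, pct = savings(original_cost=0.010, estimated_cost=0.003)
# roughly $0.007 saved, i.e. about 70% cheaper -- consistent with the
# "70% cheaper" figure in the Quick Start example
```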
## Get Your API Key

1. Sign in to your CostWise instance
2. Go to Settings > MCP SDK
3. Click Generate API Key
4. Copy the `cw_...` key
5. Set up team policies at Settings > MCP Teams
## File details

### Source distribution: costwise_mcp-0.2.0.tar.gz

- Size: 20.0 kB
- Uploaded via: twine/6.2.0 CPython/3.12.3 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6db10830b5408a96e1337fb5b9a0077fa50a7b8f73c582862cf96904958cad17` |
| MD5 | `2aa33dbc76ef7b00edb71f11e86d63a0` |
| BLAKE2b-256 | `06f5b6ca8d9ddd7de186b4f5ebe7902c9793bcdc4a19f2146ca047390c2c65c8` |

### Built distribution: costwise_mcp-0.2.0-py3-none-any.whl

- Size: 19.5 kB
- Tags: Python 3
- Uploaded via: twine/6.2.0 CPython/3.12.3 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b0a6afb622bd485896427093545c4100744d280f723098d7e0824c2a970287c2` |
| MD5 | `ba894b3297708fac63f48b122b549f4d` |
| BLAKE2b-256 | `63d373950d9425878576154928cdd136cc65d366d348612bdf525fccbe40e380` |