Version: 0.4

`cubepi.providers`

AssistantMessage

class

BaseProvider

class

BaseProvider(self)

Concrete base class for built-in cubepi providers.

Built-in providers (Anthropic, OpenAI, OpenAI Responses, Faux) inherit from this class to gain the persistent listener registry used by cubepi.tracing and other observers. User-defined providers may also inherit from BaseProvider to opt in, or remain duck-typed against the Provider Protocol (which only requires stream()).

Concrete subclasses must implement stream() and call _fire_listeners at three points: after the request payload is finalized, for each StreamEvent pushed onto the stream, and exactly once in a finally block after the stream terminates.
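The three firing points described above can be sketched without cubepi installed. Everything in this block is a hypothetical stand-in: `SketchProvider`, `add_listener`, the listener kind names, and the `_fire_listeners(kind, *args)` signature are assumptions made for illustration, and the real `stream()` likely returns a `MessageStream` rather than a plain async generator.

```python
import asyncio
from typing import Any, AsyncIterator, Callable


class SketchProvider:
    """Hypothetical stand-in; NOT cubepi's BaseProvider."""

    def __init__(self) -> None:
        # Assumed registry layout: one list of callbacks per listener kind.
        self._listeners: dict[str, list[Callable[..., Any]]] = {
            "request": [],
            "chunk": [],
            "response_body": [],
        }

    def add_listener(self, kind: str, fn: Callable[..., Any]) -> None:
        self._listeners[kind].append(fn)

    def _fire_listeners(self, kind: str, *args: Any) -> None:
        # Assumed signature; return values of listeners are ignored.
        for fn in self._listeners[kind]:
            fn(*args)

    async def stream(self, payload: dict[str, Any]) -> AsyncIterator[dict[str, Any]]:
        # Point 1: the request payload is finalized.
        self._fire_listeners("request", payload)
        body: dict[str, Any] | None = None
        exc: BaseException | None = None
        try:
            for event in (
                {"type": "start"},
                {"type": "text_delta", "delta": "hi"},
                {"type": "done"},
            ):
                # Point 2: once per event pushed onto the stream.
                self._fire_listeners("chunk", event)
                yield event
            body = {"content": "hi"}
        except BaseException as err:
            exc = err
            raise
        finally:
            # Point 3: exactly once, after the stream terminates.
            self._fire_listeners("response_body", body, exc)


async def _demo() -> list[Any]:
    seen: list[Any] = []
    provider = SketchProvider()
    provider.add_listener("request", lambda payload: seen.append("request"))
    provider.add_listener("chunk", lambda event: seen.append(event["type"]))
    provider.add_listener("response_body", lambda body, exc: seen.append(("body", exc)))
    async for _ in provider.stream({"model": "demo"}):
        pass
    return seen


seen = asyncio.run(_demo())
```

Placing the third call in `finally` is what lets it run on normal completion, on an error, and on cancellation alike.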

Per-call mutators (StreamOptions.on_payload, StreamOptions.on_response) retain their existing single-slot semantics and fire independently of the persistent listener registry below.

Content

attribute

ImageContent

class

Message

attribute

MessageStream

class

MessageStream(self)

Model

class

ModelCost

class

OnChunkCallback

attribute

Persistent observer. Fires for every StreamEvent pushed onto the stream (start, text_delta, thinking_delta, toolcall_delta, done, error, ...). This hook runs on every event, so heavy listeners should return early on event types they do not handle. Return value is ignored.
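A minimal sketch of an early-returning chunk listener follows. It assumes each event exposes a `type` attribute and that text deltas carry a `delta` attribute. Both field names are assumptions, and `TextCollector` is a hypothetical helper, not part of cubepi.

```python
from types import SimpleNamespace
from typing import Any


class TextCollector:
    """Hypothetical listener that only cares about text deltas."""

    def __init__(self) -> None:
        self.parts: list[str] = []

    def __call__(self, event: Any) -> None:
        # Cheap check first: bail out on every event type we do not handle.
        if getattr(event, "type", None) != "text_delta":
            return
        self.parts.append(event.delta)  # `delta` is an assumed field name


collector = TextCollector()
# Stand-in events; real StreamEvent objects may be shaped differently.
for ev in (
    SimpleNamespace(type="start"),
    SimpleNamespace(type="text_delta", delta="Hel"),
    SimpleNamespace(type="thinking_delta", delta="(ignored)"),
    SimpleNamespace(type="text_delta", delta="lo"),
    SimpleNamespace(type="done"),
):
    collector(ev)

text = "".join(collector.parts)
```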

OnPayloadCallback

attribute

Optional callback for inspecting/replacing provider payloads before sending. Return a dict to replace the payload, or None to keep unchanged.
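The replace-or-keep contract can be illustrated with a small stand-alone sketch. `apply_on_payload` is a hypothetical helper showing how a caller might honor the return value. The single-argument callback signature is an assumption, and the real callback may receive additional arguments.

```python
from typing import Any, Callable, Optional

Payload = dict[str, Any]
# Assumed callback shape: takes the payload, returns a replacement or None.
PayloadHook = Callable[[Payload], Optional[Payload]]


def apply_on_payload(payload: Payload, on_payload: Optional[PayloadHook]) -> Payload:
    """Hypothetical helper: a returned dict replaces the payload, None keeps it."""
    if on_payload is None:
        return payload
    replacement = on_payload(payload)
    return payload if replacement is None else replacement


def inspect_only(payload: Payload) -> None:
    # Inspecting without returning anything leaves the payload unchanged.
    _ = payload.get("model")
    return None


def force_low_temperature(payload: Payload) -> Payload:
    # Returning a new dict replaces the payload that will be sent.
    return {**payload, "temperature": 0.0}


original = {"model": "demo", "temperature": 0.7}
kept = apply_on_payload(original, inspect_only)
replaced = apply_on_payload(original, force_low_temperature)
```

Building a new dict instead of mutating the argument keeps the original payload available for comparison or logging.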

OnRequestCallback

attribute

Persistent observer. Fires just before HTTP send, after any per-call StreamOptions.on_payload mutation has been applied. Receives the final wire payload dict and the Model. Return value is ignored.

OnResponseBodyCallback

attribute

Persistent observer. Fires exactly once per stream() call, in a finally block, after the stream terminates.

body: assembled provider response as a dict (same shape a non-streaming call to the provider would have returned), or None if the stream failed before a response could be assembled.
exc: the exception that ended the stream (including asyncio.CancelledError), or None on normal completion.

Return value is ignored.
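A response-body listener typically branches on the two arguments. The sketch below assumes the `(body, exc)` argument order described above. `summarize_outcome` is a hypothetical name, and the sample body dicts are invented for the demonstration.

```python
import asyncio
from typing import Any, Optional


def summarize_outcome(body: Optional[dict[str, Any]], exc: Optional[BaseException]) -> str:
    """Hypothetical listener body: classify how the stream ended."""
    if exc is None:
        return "ok" if body is not None else "ok (no body)"
    # CancelledError derives from BaseException, so check it explicitly.
    if isinstance(exc, asyncio.CancelledError):
        return "cancelled"
    suffix = " (partial body)" if body is not None else ""
    return f"failed: {type(exc).__name__}{suffix}"


normal = summarize_outcome({"content": "hi"}, None)
cancelled = summarize_outcome(None, asyncio.CancelledError())
failed = summarize_outcome(None, TimeoutError("upstream timed out"))
```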

OnResponseCallback

attribute

Optional callback invoked after an HTTP response is received.

Provider

class

ProviderResponse

class

ProviderResponse(self, status: int, headers: dict[str, str] = dict())

HTTP response metadata exposed to on_response callbacks.

StreamEvent

class

StreamOptions

class

Options bag for Provider.stream(), transparent to the agent loop.

TextContent

class

ThinkingBudgets

class

Token budgets for each thinking level.

ThinkingContent

class

ThinkingLevel

attribute

ToolCall

class

ToolDefinition

class

ToolResultMessage

class

Usage

class

UserMessage

class

adjust_max_tokens_for_thinking

function

adjust_max_tokens_for_thinking(base_max_tokens: int, model_max_tokens: int, reasoning_level: ThinkingLevel, custom_budgets: ThinkingBudgets | None = None) -> tuple[int, int]

Adjust max_tokens to reserve space for a thinking budget.

Given a base max_tokens (the desired output capacity), increases it to accommodate the thinking budget while respecting the model's hard cap. If the model cap is too small to fit both, the thinking budget is reduced to leave at least min_output_tokens (1024) for output.

Returns

A (max_tokens, thinking_budget) tuple.
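The arithmetic can be sketched as follows. This is one plausible reading of the description, not the library's implementation. `sketch_adjust` is a hypothetical function that takes the thinking budget directly instead of a `ThinkingLevel`, because the per-level budget values are not stated here, and the exact condition that triggers the reduction is an assumption.

```python
MIN_OUTPUT_TOKENS = 1024  # the min_output_tokens value quoted above


def sketch_adjust(
    base_max_tokens: int, model_max_tokens: int, thinking_budget: int
) -> tuple[int, int]:
    """Hypothetical re-derivation; returns (max_tokens, thinking_budget)."""
    # Grow the limit to fit output plus thinking, but never past the model cap.
    max_tokens = min(base_max_tokens + thinking_budget, model_max_tokens)
    # Assumed trigger: shrink the budget when less than MIN_OUTPUT_TOKENS
    # would remain for output after thinking.
    if max_tokens - thinking_budget < MIN_OUTPUT_TOKENS:
        thinking_budget = max(0, max_tokens - MIN_OUTPUT_TOKENS)
    return max_tokens, thinking_budget


roomy = sketch_adjust(base_max_tokens=4096, model_max_tokens=64000, thinking_budget=8192)
tight = sketch_adjust(base_max_tokens=4096, model_max_tokens=8192, thinking_budget=8192)
```

In the roomy case both requests fit, so the limit becomes 4096 + 8192 = 12288 and the budget is untouched. In the tight case the cap of 8192 wins, and the budget drops to 8192 - 1024 = 7168 so that 1024 tokens remain for output.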

FauxProvider

class

FauxProvider(self, *, tokens_per_second: float | None = None, token_size_min: int = 3, token_size_max: int = 5)

faux_assistant_message

function

faux_assistant_message(content: str | FauxContentBlock | list[FauxContentBlock], *, stop_reason: str = 'stop', error_message: str | None = None) -> AssistantMessage

faux_text

function

faux_text(text: str) -> TextContent

faux_thinking

function

faux_thinking(thinking: str) -> ThinkingContent

faux_tool_call

function

faux_tool_call(name: str, arguments: dict[str, Any], *, id: str | None = None) -> ToolCall

THINKING_LEVELS

attribute

clamp_thinking_level

function

clamp_thinking_level(model: Model, level: ThinkingLevel) -> ThinkingLevel

Clamp level to the nearest supported level for model.

If level is already supported, return it unchanged. Otherwise search upward first (higher intensity), then downward, through the ordered level list to find the closest available level.
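The search order can be sketched as below. The ordered level list is an assumption (the contents of `THINKING_LEVELS` are not shown here), `sketch_clamp` is a hypothetical name, and the sketch takes the supported set directly rather than a `Model`. It reads "upward first, then downward" as scanning all higher levels before any lower one. The fallback to "off" when nothing matches is also an assumption.

```python
# Assumed ordering, lowest to highest intensity.
ASSUMED_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"]


def sketch_clamp(level: str, supported: set[str], ordered: list[str] = ASSUMED_LEVELS) -> str:
    """Hypothetical clamp: exact match, else nearest higher, else nearest lower."""
    if level in supported:
        return level
    index = ordered.index(level)
    # Search upward first (higher intensity), nearest candidate first.
    for candidate in ordered[index + 1:]:
        if candidate in supported:
            return candidate
    # Then search downward, again nearest candidate first.
    for candidate in reversed(ordered[:index]):
        if candidate in supported:
            return candidate
    return "off"  # assumed fallback when nothing is supported


supported_levels = {"off", "low", "high"}
up = sketch_clamp("minimal", supported_levels)
down = sketch_clamp("xhigh", supported_levels)
same = sketch_clamp("low", supported_levels)
```

Here "minimal" clamps upward to "low", while "xhigh" has nothing above it and falls back downward to "high".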

get_supported_thinking_levels

function

get_supported_thinking_levels(model: Model) -> list[ThinkingLevel]

Return the thinking levels supported by model.

Non-reasoning models only support ["off"].
For reasoning models, levels are filtered through the model's thinking_level_map. A level mapped to None is unsupported. "xhigh" is only included when it has an explicit (non-None) mapping. All other levels are included by default when the map omits them.
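The filtering rules above can be sketched with plain values in place of a `Model`. The `reasoning` flag, the ordered level list, and the function name `sketch_supported` are assumptions for illustration. Only the `thinking_level_map` rules come from the description.

```python
from typing import Optional

# Assumed ordering; the real THINKING_LEVELS contents are not shown here.
ASSUMED_ORDER = ["off", "minimal", "low", "medium", "high", "xhigh"]


def sketch_supported(
    reasoning: bool, thinking_level_map: Optional[dict[str, Optional[str]]] = None
) -> list[str]:
    """Hypothetical filter following the rules in the description."""
    if not reasoning:
        return ["off"]
    level_map = thinking_level_map or {}
    supported = []
    for level in ASSUMED_ORDER:
        if level in level_map:
            # An explicit None mapping marks the level as unsupported.
            if level_map[level] is not None:
                supported.append(level)
        elif level != "xhigh":
            # Omitted levels are included by default, except "xhigh",
            # which needs an explicit non-None mapping.
            supported.append(level)
    return supported


plain = sketch_supported(reasoning=False)
default_map = sketch_supported(reasoning=True)
custom = sketch_supported(reasoning=True, thinking_level_map={"minimal": None, "xhigh": "max"})
```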

models_are_equal

function

models_are_equal(a: Model | None, b: Model | None) -> bool

Return True if a and b refer to the same model.

Comparison is by id and provider. Returns False when either argument is None.
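The comparison rule is small enough to sketch directly. `SketchModel` is a hypothetical stand-in with only the two fields named above plus one extra field to show that other attributes do not affect the result. The real `Model` class has its own definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchModel:
    """Hypothetical stand-in for Model."""

    id: str
    provider: str
    name: str = ""  # extra field, ignored by the comparison


def sketch_models_are_equal(a: Optional[SketchModel], b: Optional[SketchModel]) -> bool:
    # None never equals anything, including another None.
    if a is None or b is None:
        return False
    return a.id == b.id and a.provider == b.provider


first = SketchModel(id="m-1", provider="faux", name="First label")
second = SketchModel(id="m-1", provider="faux", name="Second label")
other = SketchModel(id="m-1", provider="elsewhere")

same_model = sketch_models_are_equal(first, second)
different_provider = sketch_models_are_equal(first, other)
both_none = sketch_models_are_equal(None, None)
```

Note that two `None` arguments compare as not equal, which differs from Python's usual `None == None`.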