OpenAI Provider
CubePi ships two OpenAI providers covering the two API surfaces:
OpenAIProviderâ Chat Completions API (/v1/chat/completions). Use this for the GPT-4/5 family and most OpenAI-compatible servers (vLLM, LiteLLM, DeepSeek, Qwen, MiniMax, DouBao, âĻ).OpenAIResponsesProviderâ Responses API (/v1/responses). Use this when you want server-side state and reasoning summaries.
Both implement the same Provider protocol; pick one per agent.
Chat Completions: OpenAIProviderâ
from cubepi import Model
from cubepi.providers.openai import OpenAIProvider
provider = OpenAIProvider(
api_key="sk-âĻ", # or reads OPENAI_API_KEY
base_url=None, # set for OpenAI-compatible servers
extra_body=None, # merged into every request
extra_headers=None,
payload_quirks=None, # ["max_completion_tokens_alias", âĻ]
)
model = Model(
id="gpt-5",
provider="openai",
reasoning=True, # enables thinking level mapping
max_tokens=8192,
context_window=128_000,
)
Thinking on Chat Completionsâ
OpenAI exposes reasoning content through delta.reasoning_content on
o-series and gpt-5 models. CubePi captures it as ThinkingContent and
emits thinking_* events identically to Anthropic. The same
ThinkingLevel enum ("off" â "high") works.
Many OpenAI-compatible OSS backends emit reasoning under different fields. CubePi understands three in priority order:
delta.reasoning_content(DeepSeek, Qwen, DouBao)delta.reasoning(vLLM)delta.reasoning_details(MiniMax)
No configuration needed â the provider picks whichever field is present.
extra_body for OSS quirksâ
Most OpenAI-compatible servers accept extensions through the request body. Set them once at construction:
provider = OpenAIProvider(
api_key="âĻ",
base_url="https://api.deepseek.com/v1",
extra_body={"enable_thinking": True, "stream_options": {"include_usage": True}},
)
If you need per-request mutation, use on_payload (see below).
payload_quirksâ
Some servers require max_tokens instead of max_completion_tokens:
provider = OpenAIProvider(
api_key="âĻ",
payload_quirks=["max_completion_tokens_alias"],
)
CubePi renames the key on the way out.
Pointing at vLLM / LiteLLM / DeepSeekâ
provider = OpenAIProvider(
api_key="dummy", # vLLM ignores it
base_url="http://localhost:8000/v1",
extra_headers={"Authorization": "Bearer dummy"},
)
For LiteLLM:
provider = OpenAIProvider(
api_key=os.environ["LITELLM_KEY"],
base_url="https://litellm.internal/v1",
)
Responses API: OpenAIResponsesProviderâ
from cubepi.providers.openai_responses import OpenAIResponsesProvider
provider = OpenAIResponsesProvider(api_key="sk-âĻ")
model = Model(id="gpt-5", provider="openai_responses", reasoning=True)
The Responses API keeps state server-side (referenced by
previous_response_id). CubePi tracks AssistantMessage.response_id
and feeds it back automatically â your code looks identical to the
Chat Completions path.
Use the Responses provider when:
- You want reasoning summaries (not just text) surfaced as thinking blocks.
- You're using the
o-series and want the server to hold the reasoning chain across turns (smaller payloads, faster reuse).
Stay on OpenAIProvider when you want full control over the message
list and prompt caching strategy.
on_payload / on_responseâ
Same shape as the Anthropic provider. The payload dict
differs (messages instead of messages + system separately,
OpenAI-style tools schema), so inspect it once before mutating.
async def add_user_metadata(payload, model):
payload["user"] = "u-42" # billable user attribution
return payload
agent = Agent(provider=provider, model=model, on_payload=add_user_metadata)
Tool callingâ
Tool definitions are auto-converted to OpenAI's
{"type": "function", "function": {...}} shape. The streaming format
emits incremental JSON arguments under toolcall_delta; CubePi
buffers and parses them through
cubepi.utils.json_parse.parse_streaming_json
so partials always validate to the closest well-formed object.
Multiple parallel tool calls in one assistant message just work â they're routed through the same parallel executor as the Anthropic provider.
Common pitfallsâ
stream_options.include_usagerejected â Some compatibles reject the wholestream_optionsfield.on_payloadcannot fix this: cubepi 0.3 callskwargs.setdefault("stream_options", {})after your callback runs, so deleting the key inon_payloadis silently undone. Workarounds:- Subclass
OpenAIProviderand overridestream()to skip thesetdefaultfor your backend. - Set
include_usage=Falseinon_payload(the field still goes out, but is usually accepted as a no-op even by strict backends). - Open an issue against cubepi to add a
payload_quirksentry such as"no_stream_options"for native opt-out.
- Subclass
- Thinking events but no
thinking_*events â Your backend surfaces reasoning under a non-standard field. Either add a fourth branch via PR or transcode it withon_payload. - Mixed providers in one process â Each provider holds its own
HTTP client. Reuse a single instance per
(base_url, api_key)pair instead of creating one per agent. - Usage shows 0 input tokens â Most compatibles omit usage
entirely or only emit it on the final chunk. Inspect the trailing
chunk in
on_payloadfor a hint, or treat token counts as best-effort on those backends.
See alsoâ
- Anthropic Provider â the other built-in.
- Custom Provider â write your own from scratch.
- Recipes â Multi-Provider Failover â combine both providers for resilience.