Multi-turn Conversations
A "turn" in CubePi is one round of: user input â model response (and
maybe tools) â optional more model responses to tool results. The
agent's _messages list grows across turns; this guide covers how to
drive multi-turn flows correctly and how to inject input while the
agent is mid-thought.
The basic patternâ
Just call prompt again after the previous call returns:
await agent.prompt("Hi, my name is Sam.")
await agent.prompt("What's my name?")
# â "Your name is Sam."
History lives on agent.state.messages. CubePi appends each user
message, each assistant message, and each tool result. The provider
gets the full list every time, so context windows matter (see
Context Management below).
In-flight steering: agent.steer()â
Sometimes the user has a correction or extra context while the model
is still mid-turn. Use steer():
import asyncio
async def main():
task = asyncio.create_task(agent.prompt("Plan a 5-day trip to Kyoto."))
await asyncio.sleep(2)
# User changed their mind:
agent.steer(UserMessage(content=[TextContent(text="Make it 3 days, not 5.")]))
await task
A steering message is enqueued; the loop picks it up between turns (in practice, between a tool batch and the next model call). The agent sees it before its next response â it's not lost.
steering_mode on Agent controls drain behaviour:
"one-at-a-time"(default) â one queued message per pickup point."all"â every queued message is drained at once.
Queued follow-ups: agent.follow_up()â
follow_up is for "after the current run, start a new turn with
this." It's the typical pattern for a chat UI: the user types while
the assistant is still responding.
agent.follow_up(UserMessage(content=[TextContent(text="And what about Osaka?")]))
# When the current prompt() finishes, the loop picks this up
# automatically and starts a new turn.
If the agent is idle when you call follow_up, you still need to
trigger a run â most apps call await agent.resume() once prompt()
returns, to drain the queue.
resume() â continue from the last messageâ
resume() is the "pick up where we left off" entry point. Two uses:
- After loading from a checkpointer. The state has messages but
no in-flight prompt.
resume()looks at the last message and acts:- assistant â expects a queued steer/follow_up to convert into a new user turn; otherwise raises.
- tool_result â re-invokes the model with the tool output.
- user â re-invokes the model on the user message.
- After abort. Once
agent.abort()has cleanly torn down a run, you can resume.
async with SQLiteCheckpointer("conv.db") as cp:
agent = Agent(provider=âĻ, model=âĻ, checkpointer=cp, thread_id="conv-1")
await agent.prompt("hello") # loads existing history first
await agent.prompt("how are you?")
The first prompt() after instantiation loads any existing thread.
Subsequent calls just append.
Context managementâ
CubePi does not truncate or summarise context on your behalf. The full message list is sent to the model on every turn. Strategies:
- Manual truncation â Implement a
transform_contextmiddleware that returns a sliding window. - Summarisation pass â Periodically inject a summary message and
drop older ones via
transform_context. - Different
convert_to_llmâ Reshape the history just before serialisation (the last opportunity), without mutatingagent.state.messages. The user-visible history stays full; the model sees less.
See Middleware â Examples for a working example.
Cancellation and idle waitsâ
agent.abort() # signals the current run to stop
await agent.wait_for_idle() # awaits the run-cleanup
wait_for_idle() is a no-op if the agent is already idle. It's safe
to call anywhere.
Restoring state from diskâ
from cubepi.checkpointer import SQLiteCheckpointer
async with SQLiteCheckpointer("conv.db") as cp:
agent = Agent(
provider=provider,
model=model,
checkpointer=cp,
thread_id="user-42",
)
# First prompt() restores the saved history if any.
await agent.prompt("continue our chat")
The _extra slot (an arbitrary dict[str, Any]) is also restored.
Middleware that wants to persist per-thread state should write into
context.extra; the checkpointer's save_extra is called at
agent_end.
Common pitfallsâ
prompt()while anotherprompt()is in flight raisesRuntimeError. Usesteer()orfollow_up(), orwait_for_idle()first.resume()with last message = assistant and no queue raises"Cannot continue from message role: assistant". Either queue a follow-up first or callprompt()instead.- History grows unbounded â without a
transform_contextmiddleware you'll eventually hit context limits. Plan for truncation/summarisation early. - Concurrent agents on the same
thread_idâ Append-only is safe for ordering but two agents writing the same thread will interleave messages. Use one agent instance per thread or coordinate at the application layer.
See alsoâ
- Streaming Events â exact event order around steering/follow_up.
- Checkpointing â SQLite â persisting history.
- Recipes â Persistent Chat â full multi-turn app with history reload.