版本：0.4

Composition Rules

When you pass multiple middlewares — Agent(middleware=[m1, m2, m3]) — CubePi composes them according to per-hook rules that differ on purpose. The right way to think about it is: each hook has the composition rule that makes sense for its job, and you don't have to remember "before" or "after" precedence guesses.

The rules at a glance

Hook	Rule	Order matters?
`transform_context`	Chain — each sees previous output	Yes
`convert_to_llm`	Last wins	Only the last one runs
`transform_system_prompt`	Chain	Yes
`before_tool_call`	First block stops	First in list wins blocks
`after_tool_call`	Later overrides earlier	Last write wins
`should_stop_after_turn`	Any `True` stops (OR)	No
`after_model_response`	Chain with merge semantics	See below

`transform_context` and `transform_system_prompt`

Chain: m1's output becomes m2's input becomes m3's input. Useful for layered transforms:

agent = Agent(
    middleware=[
        SlidingWindow(max_messages=20),    # m1: drop oldest
        InjectSummary(),                    # m2: prepend a summary block
    ],
)

m2 sees the truncated list. The user-visible agent.state.messages is untouched — middleware only changes what the model receives.

`convert_to_llm`

Last-wins on purpose: this is the final transform before wire serialisation. Multiple owners would fight; pick one. CubePi enforces that the last middleware in the list that implements convert_to_llm is the one that runs.

If you find yourself needing two convert_to_llm middlewares, collapse them into one (call site composition: write one that calls both).

`before_tool_call`

First block=True short-circuits the rest. Use to chain policy layers from most-restrictive to least:

agent = Agent(
    middleware=[
        RateLimiter(),       # blocks on rate quota
        SafetyFilter(),      # blocks on dangerous args
        AuditLogger(),       # never blocks; just records
    ],
)

If RateLimiter returns block=True, SafetyFilter and AuditLogger's before_tool_call don't run. AuditLogger.after_tool_call still fires because that's a different hook.

`after_tool_call`

Each middleware can return an AfterToolCallResult with some fields set; CubePi merges them, with later results overriding earlier ones for any field that's not None. The full result:

class AfterToolCallResult(BaseModel):
    content: list[Content] | None = None
    details: Any = None
    is_error: bool | None = None
    terminate: bool | None = None

Pattern: an early middleware adds rich details, a later one sanitises content for the model. Both run; the merged result combines details from one with the redacted content from the other.

`should_stop_after_turn`

Any middleware returning True ends the run. The rest of the chain isn't evaluated.

agent = Agent(
    middleware=[
        MaxTurns(10),
        BudgetCap(usd=0.5),
        FinalAnswerSentinel(),   # stops when assistant says "FINAL ANSWER"
    ],
)

`after_model_response`

Chain with structured merge. Each middleware sees the current response (which may have been replaced by an earlier middleware) and returns a TurnAction:

response: AssistantMessage | None — if non-None, replaces the current response for downstream middlewares and for what the loop ultimately persists.
inject_messages: list[Message] — appended into a single list across the whole chain, then added to context before the next turn.
decision: "natural" | "stop" | "loop_to_model" — the last middleware's value wins.

agent = Agent(
    middleware=[
        ProfanityRedactor(),    # rewrites response
        StructuredOutputValidator(),  # may decide="loop_to_model"
        EventLogger(),          # decision unchanged
    ],
)

If StructuredOutputValidator returns decision="loop_to_model" and EventLogger returns decision="natural", the loop sees "natural" — because last wins. Reorder if that's not what you wanted.

Mixing middleware with constructor callables

Agent(...) also accepts explicit hook callables (convert_to_llm=…, before_tool_call=…, etc.). When both are present, the explicit callable wins:

agent = Agent(
    middleware=[LoggingMiddleware()],
    before_tool_call=my_explicit_hook,   # overrides the middleware version
)

Use the explicit form for one-off hooks; use middleware classes when behaviour is a coherent bundle.

A note on `Middleware` base class

The base Middleware class's unimplemented methods raise NotImplementedError. compose_middleware detects this by comparing to the base method and only wires hooks the middleware actually overrides. You don't need to pass-implement every method.

class JustTransform(Middleware):
    async def transform_context(self, messages, *, signal=None):
        return messages[-10:]
    # No other hooks. CubePi won't call them.

The rules at a glance​

transform_context and transform_system_prompt​

convert_to_llm​

before_tool_call​

after_tool_call​

should_stop_after_turn​

after_model_response​

Mixing middleware with constructor callables​

A note on Middleware base class​

See also​