Inspecting traces with cubepi trace
The JsonlSpanExporter writes one file per trace under
./cubepi-traces/<date>/<trace_id>.jsonl. The cubepi trace CLI (provided by
the trace-cli extra) reads those files so you can see exactly what a run did
— which LLM and tool calls fired, in what order, what each returned, where it
errored, and the token counts — without re-running it.
pip install 'cubepi[trace-cli]' # or: uv sync --extra trace-cli
cubepi trace --help
--dir defaults to ./cubepi-traces; pass --dir <path> if your traces live
elsewhere. Each file is one trace: the run plus any nested subagent runs,
which inherit the parent's trace_id and so land in the same file.
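Since the layout is just `<dir>/<date>/<trace_id>.jsonl`, the files can also be enumerated without the CLI. The following is a minimal sketch under that assumption; `list_trace_files` is a hypothetical helper, not part of cubepi, and it orders by file modification time, which may not match the sort key `cubepi trace ls` uses.

```python
from pathlib import Path


def list_trace_files(root="./cubepi-traces", limit=None):
    """Return trace files under <root>/<date>/<trace_id>.jsonl, newest first.

    Hypothetical helper: ordering uses file mtime, which is an assumption and
    may differ from the CLI's own ordering.
    """
    files = sorted(
        Path(root).glob("*/*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return files[:limit] if limit is not None else files


if __name__ == "__main__":
    for path in list_trace_files(limit=10):
        # The file stem is the trace_id; the parent directory is the date.
        print(path.parent.name, path.stem)
```

The file stem doubles as the `trace_id`, so the output can be fed straight back into the CLI commands described below.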
ls — list recent traces
cubepi trace ls # newest first; -n N to limit
| column | meaning |
|---|---|
| started | trace start time (UTC) |
| trace_id | the id you pass to view / follow / stats |
| spans | span count for the whole trace (incl. subagents) |
| status | ok or error |
| duration | wall-clock span of the trace |
| input | the user's prompt, to identify the run |
Filter by run metadata (--meta)
If the host stamped run-scoped metadata onto the trace (via
tracing_context(metadata=…) — e.g. cubebox records conversation_id,
user_id, org_id, workspace_id on the root invoke_agent span), filter to
just those traces:
cubepi trace ls --meta conversation_id=conv_123
cubepi trace ls --meta user_id=usr_9 --meta org_id=org_1 # repeatable = AND, exact match
Each --meta KEY=VALUE is matched exactly against the trace's root metadata;
repeating the flag ANDs the conditions.
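The matching rule above (exact match per key, AND across repeated flags) can be expressed as a small predicate. This is only an illustration of the documented semantics, not the CLI's actual code; `parse_meta_args` and `matches_meta` are hypothetical names, and comparing values as strings is an assumption.

```python
def parse_meta_args(pairs):
    """Turn ["user_id=usr_9", "org_id=org_1"] into a dict of filters."""
    filters = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        filters[key] = value
    return filters


def matches_meta(root_metadata, filters):
    """True if every filter equals the trace's root metadata value.

    Exact match only (no prefix or substring matching); all filters must
    hold. String comparison is an assumption, since CLI values arrive as text.
    """
    return all(
        key in root_metadata and str(root_metadata[key]) == value
        for key, value in filters.items()
    )
```

With this rule, `user_id=usr` does not match a trace stamped `user_id=usr_9`, and a trace missing any requested key is excluded.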
To display metadata values as columns (rather than only filter by them),
add --show-meta KEY[,KEY…]:
cubepi trace ls --show-meta conversation_id,user_id
cubepi trace ls --meta org_id=org_1 --show-meta conversation_id # filter + show
(Or see all of a single trace's metadata with cubepi trace view <id> -v.)
view — render a trace as a span tree
A trace-id prefix is enough (the table truncates ids); an ambiguous prefix lists candidates.
cubepi trace view 1cd97cdb
trace
└── invoke_agent  14425.8ms  [0x1cd97cdb]
    ├── cubepi.turn  1283.1ms  [0x5cfda93e]
    │   ├── chat deepseek-v4-flash  1208.7ms  tok 6845/68  [0x0d130229]
    │   └── execute_tool subagent  9610.2ms  subagent  [0x38bdd10a]
    │       └── invoke_agent  9601.0ms  [0x8094f99b]   ← subagent run, nested
    │           └── cubepi.turn  9598.4ms  [0x57c5cfc7]
    │               ├── chat deepseek-v4-flash  1190.3ms  [0x8205ca6b]
    │               └── execute_tool web_search  6500.2ms  web_search  [0xca4e59fc]
    └── cubepi.turn  491.9ms  ERROR  [0xce25f242]
        └── chat deepseek-v4-flash  427.2ms  ERROR  [0x0bff68ec]
            └── error: Error code: 400 - ... `tool_use` ids were found without
                `tool_result` blocks immediately after: call_01_...
Read it top-down: invoke_agent (a run) → cubepi.turn (one agent-loop turn)
→ chat <model> (an LLM call, with tok <input>/<output>) and
execute_tool <name> (a tool call). A subagent shows up as
execute_tool subagent with its own invoke_agent → cubepi.turn → … nested
directly beneath it. The [0x…] suffix on each node is the span's span_id —
grep it in the raw JSONL to inspect that exact span. Errors print inline under
the failing span.
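If grep is not at hand, the same span lookup can be done in a few lines of Python. This sketch makes no assumption about the field name that holds the id; it only assumes the id appears as hex text somewhere in the span's line (with or without the `0x` prefix). `find_spans` is a hypothetical helper.

```python
import json


def find_spans(jsonl_path, span_id):
    """Return parsed spans whose raw JSONL line mentions the given span id.

    Assumption: the id is stored as hex text in the line. This is a plain
    substring match, like grep, so it can also match a parent-id reference.
    """
    needle = span_id.lower().removeprefix("0x")
    if not needle:
        raise ValueError("empty span id")
    hits = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and needle in line.lower():
                hits.append(json.loads(line))
    return hits
```

Because the match is textual, a hit may be the span itself or a child that refers to it as its parent; inspect the returned objects to tell them apart.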
Flags:
cubepi trace view <id> --content # expand gen_ai prompts / tool args / results
cubepi trace view <id> -v # expand ALL span attributes (verbose, large)
--content requires the run to have been recorded with
record_content=True (see Content & Redaction).
follow — watch a trace live
cubepi trace follow <id> # polls as spans complete; good for a run in progress
stats — aggregate across traces
cubepi trace stats --by model # latency p50/p95, error rate, tokens
cubepi trace stats --by tool --since 2026-05-20
stats also accepts --meta KEY=VALUE (same semantics as ls) to aggregate
only the traces that match — e.g. latency / error-rate / tokens for one user or
conversation:
cubepi trace stats --by model --meta user_id=usr_9
cubepi trace stats --by tool --meta conversation_id=conv_123
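To make the reported figures concrete, here is a rough sketch of how p50/p95 latency and error rate could be derived for one group (one model or one tool). The nearest-rank percentile method is an assumption; the document does not say which method `stats` uses, and `percentile` and `summarize` are hypothetical names.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(calls):
    """Aggregate (duration_ms, is_error) pairs for one group."""
    durations = [duration for duration, _ in calls]
    errors = sum(1 for _, is_error in calls if is_error)
    return {
        "count": len(calls),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "error_rate": errors / len(calls),
    }
```

For example, `summarize([(1208.7, False), (1190.3, False), (427.2, True)])` treats the three calls as one group and reports one error out of three.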
convert — reconstruct an API request body
When you need to replay a specific LLM call — reproduce a failure, test a prompt
change, or run a raw curl against the same context — convert reads a recorded
chat span and outputs the complete request body.
This requires the run to have been recorded with record_content=True (see Content & Redaction).
# Default: last chat span in the trace, OpenAI JSON format
cubepi trace convert <trace_id>
# Select which LLM call to reconstruct
cubepi trace convert <trace_id> --turn 2 # 2nd chat span (1-indexed)
cubepi trace convert <trace_id> --span 0xbb7eb1 # by span_id prefix (from `view`)
# Output formats
cubepi trace convert <trace_id> --format openai # default — JSON request body
cubepi trace convert <trace_id> --format anthropic # Anthropic Messages API body
cubepi trace convert <trace_id> --format curl # shell-executable curl command
The [0x…] span id from view output goes directly into --span:
├── chat kimi-k2.6 31704.5ms [0xbb7eb192] ← paste as: --span 0xbb7eb1
├── chat kimi-k2.6 32420.2ms [0x7c76f48d] ← or: --span 0x7c76f4
The reconstructed body includes the full conversation history, the system prompt,
all tool definitions, and request parameters (model, max_tokens,
temperature). Pipe to python -m json.tool, jq, or directly to a replay script.
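A replay script often only needs to change one or two request parameters before resending. The sketch below reads a reconstructed body from stdin and overrides top-level fields; it assumes the parameters named above (`model`, `max_tokens`, `temperature`) are top-level keys of the JSON body, and the script name `replay_tweak.py` is hypothetical. Sending the request is left out on purpose, since the endpoint and credentials depend on your provider.

```python
import json
import sys


def override_params(body, **overrides):
    """Return a shallow copy of a request body with some top-level parameters replaced."""
    patched = dict(body)
    patched.update(overrides)
    return patched


if __name__ == "__main__":
    # Usage (hypothetical file name):
    #   cubepi trace convert <trace_id> | python replay_tweak.py
    body = json.load(sys.stdin)
    patched = override_params(body, temperature=0)
    json.dump(patched, sys.stdout, indent=2)
```

The original body is left untouched, so the recorded and modified requests can be diffed before replaying.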
Beyond the CLI
The files are plain JSONL — one span per line — so you can parse them directly
(jq, python -c) to pull a specific attribute (gen_ai.usage.*,
gen_ai.tool.call.result, gen_ai.input.messages, …). Error detail lives in a
span event named gen_ai.client.operation.exception.
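As an example of direct parsing, the sketch below totals every numeric `gen_ai.usage.*` attribute in one trace file. It assumes each line is a JSON object with the span's attributes under an `attributes` key; that key name is an assumption about the exporter's schema, so adjust it if your files differ. `usage_totals` is a hypothetical helper.

```python
import json
from collections import Counter


def usage_totals(jsonl_path):
    """Sum every numeric gen_ai.usage.* attribute across the spans in one trace file.

    Assumption: span attributes sit under an "attributes" key on each line.
    """
    totals = Counter()
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            attrs = json.loads(line).get("attributes") or {}
            for key, value in attrs.items():
                if key.startswith("gen_ai.usage.") and isinstance(value, (int, float)):
                    totals[key] += value
    return dict(totals)
```

Spans without usage attributes (turns, tool calls) simply contribute nothing to the totals.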
A bundled cubepi-trace skill drives this CLI for debugging ("why did the run
end with no reply?", "the tool result is wrong"). It encodes the fast path
(ls → view <prefix>) and the token/cache-rate conventions.
npx skills add cubeplexai/cubepi@cubepi-trace -a claude-code