Tracing Overview
CubePi emits OpenTelemetry spans that follow the GenAI Semantic Conventions v1.41, so any OTel-compatible backend (Jaeger, Tempo, Honeycomb, Datadog, AWS X-Ray, Azure Monitor, …) can ingest agent runs without custom instrumentation.
Attach a Tracer to an Agent and every prompt produces a tree of spans you
can pivot, query, and join with the rest of your service traces:
invoke_agent <agent_name>            [INTERNAL]  one per agent.prompt()
└── cubepi.turn                      [INTERNAL]  one per LLM round-trip
    ├── chat <model>                 [CLIENT]    the LLM call itself
    └── execute_tool <tool_name>     [INTERNAL]  each tool invocation
        └── tools/call <tool_name>   [CLIENT]    (MCP tools only)
Each layer carries standard gen_ai.* attributes: gen_ai.operation.name, gen_ai.request.model, gen_ai.provider.name, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.finish_reasons, …
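To make the nesting concrete, here is a minimal, self-contained sketch that models one agent run as a plain Python tree and rolls up token usage from the chat spans. It does not use cubepi or the OTel SDK: the Span dataclass, the walk/total_tokens/chat_span helpers, the agent, model, and tool names, and the token counts are all hypothetical, chosen only to mirror the span names and gen_ai.* attribute keys listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for one exported span (not a cubepi or OTel class)."""
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(span, depth=0):
    """Yield (depth, span) pairs in depth-first order, parents before children."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


def total_tokens(root):
    """Sum gen_ai.usage.* over every span that reports it (here, only the chat spans)."""
    inp = out = 0
    for _, span in walk(root):
        inp += span.attributes.get("gen_ai.usage.input_tokens", 0)
        out += span.attributes.get("gen_ai.usage.output_tokens", 0)
    return inp, out


def chat_span(input_tokens, output_tokens, finish_reason):
    """Build a CLIENT span for one LLM call; model and provider values are made up."""
    return Span("chat gpt-4o-mini", "CLIENT", {
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
        "gen_ai.request.model": "gpt-4o-mini",
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": [finish_reason],
    })


# One prompt, two turns: turn 1 asks for an MCP tool, turn 2 produces the answer.
run = Span("invoke_agent support_bot", "INTERNAL", {"gen_ai.operation.name": "invoke_agent"}, [
    Span("cubepi.turn", "INTERNAL", children=[
        chat_span(120, 30, "tool_calls"),
        Span("execute_tool search_docs", "INTERNAL", {"gen_ai.operation.name": "execute_tool"}, [
            Span("tools/call search_docs", "CLIENT"),
        ]),
    ]),
    Span("cubepi.turn", "INTERNAL", children=[chat_span(400, 85, "stop")]),
])

for depth, span in walk(run):
    print("    " * depth + f"{span.name} [{span.kind}]")
print(total_tokens(run))  # (520, 115)
```

Because usage lives on the chat spans rather than on invoke_agent, a backend query for "tokens per prompt" has to aggregate over descendants in the same way total_tokens does here.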
What ships out of the box
- Tracer: builds an SDK TracerProvider, attaches one BatchSpanProcessor per exporter, and wires the cubepi event stream into spans.
- Meter: sibling of Tracer that records OTel histograms (gen_ai.client.operation.duration, gen_ai.client.operation.time_to_first_chunk, gen_ai.client.token.usage).
- JsonlSpanExporter: writes one JSON line per span to ./cubepi-traces/<date>/<run_id>.jsonl. Useful for local development and offline debugging; works with any OTel viewer that reads JSONL.
- OTLP: bring your own exporter via opentelemetry-exporter-otlp-proto-http (HTTP) or …-grpc and hand it to Tracer(exporters=[…]).
- W3C trace context propagation: outgoing MCP calls inject the active traceparent as an HTTP header, so an instrumented MCP server can continue the trace.
- tracer.attached(agent) / meter.attached(agent): async context managers that RAII-wrap attach/detach, so cleanup is a single async with block instead of an explicit try/finally.
- atexit flush hook: Tracer(atexit_flush=True) (the default) registers a process-exit handler that synchronously flushes any buffered spans, so callers who forget await tracer.shutdown() still get their spans exported on normal exit, Ctrl-C, or an unhandled exception.
- tracing_context(): sets per-run tags and metadata (cubepi.tags = ("beta-arm",), cubepi.metadata.user_id = "u-42") via a contextvar-scoped block. Concurrent agents each see their own values.
What it costs
- One pure-Python recorder per agent run, subscribing to the agent's event stream and the provider's listener registry; no monkey-patching, no extra threads.
- One OTel SDK span per node of the tree above (per agent run, turn, LLM call, and tool call).
- BatchSpanProcessor batches spans and exports them off the hot path.
- No payloads are recorded by default. gen_ai.input.messages, gen_ai.output.messages, raw request/response, and tool args/results all require an explicit opt-in via record_content=True, so you don't accidentally ship PII to your backend. See Content & Redaction.
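If you do turn on record_content=True, a redaction pass can scrub obvious PII before anything reaches an exporter. The sketch below is a plain str -> str function built on regular expressions. Whether Tracer(redact=…) accepts a callable of exactly this shape is an assumption here, so check Content & Redaction for the real hook signature; redact_text and its two patterns are hypothetical and deliberately simple, and they will miss many kinds of PII.

```python
import re

# Illustrative patterns only; real PII detection needs far more than two regexes.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # 13-16 digits, optionally space/dash separated


def redact_text(text: str) -> str:
    """Mask e-mail addresses and card-like digit runs before the text is exported.

    Hypothetical helper: assumes the redaction hook receives and returns plain strings.
    """
    text = _EMAIL.sub("[EMAIL]", text)
    text = _CARD.sub("[CARD]", text)
    return text


print(redact_text("Contact jane.doe@example.com, card 4111 1111 1111 1111."))
# Contact [EMAIL], card [CARD].
```

Running redaction in-process, before the BatchSpanProcessor hands spans to an exporter, means the unredacted strings never cross the network, which is the point of doing it in the tracer rather than in the backend.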
When to use each piece
| You want | Use |
|---|---|
| Trace one local agent run and inspect a JSONL file | Tracer + JsonlSpanExporter |
| Ship to Jaeger / Tempo / Honeycomb / Datadog | Tracer + OTLP exporter |
| Latency + token histograms next to the spans | Meter alongside Tracer |
| Record prompts / model outputs for evaluation | Tracer(record_content=True) |
| Redact PII before it leaves the process | Tracer(redact=…) |
| Tag runs with user_id / session_id / A-B arm | tracing_context(tags=…, metadata=…) |
| One-liner cleanup, no try/finally | async with tracer.attached(agent): … |
| Keep spans even if you forget to call shutdown() | Tracer(atexit_flush=True) (default) |
| Continue a trace from an upstream service | Tracer(resource=…) + W3C traceparent (auto for MCP, manual for HTTP) |
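For plain HTTP hops the table notes that propagation is manual. The traceparent header itself is defined by the W3C Trace Context spec as version-traceid-parentid-flags in lowercase hex (00-<32 hex>-<16 hex>-<2 hex>). The stdlib-only sketch below builds and parses a version-00 header so you can see what travels on the wire; make_traceparent, parse_traceparent, and TraceParent are hypothetical helpers, not cubepi APIs. In a real service you would normally let the OpenTelemetry propagators (opentelemetry.propagate.inject / extract) do this against the active span instead of hand-rolling it.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 only: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str   # 32 hex chars, shared by every span in the trace
    parent_id: str  # 16 hex chars, the caller's span id
    flags: str      # "01" means the caller sampled this trace


def make_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a version-00 header; a fresh trace id is generated when none is given."""
    trace_id = trace_id or secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the header is malformed or uses all-zero ids."""
    m = _TRACEPARENT.match(header.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # the spec treats all-zero ids as invalid
    return TraceParent(trace_id, parent_id, flags)


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parent = parse_traceparent(incoming)
# A downstream call keeps the trace id and substitutes its own span id.
outgoing = make_traceparent(trace_id=parent.trace_id)
print(parent)
print(outgoing)
```

Keeping the trace id while minting a new span id is what lets a backend stitch the upstream service's spans and the agent's invoke_agent tree into one trace.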
Where to go next
- Getting Started: install the extra and emit your first spans
- OTLP & Backends: point cubepi at Jaeger, Tempo, Honeycomb, …
- Content Recording & Redaction: record prompts and responses safely
- Metrics: histograms via Meter