# Tracing
OpenTelemetry tracing for Resonate applications.
Distributed tracing shows you the complete execution flow of your Resonate functions across workers and the server. This is invaluable for debugging performance issues and understanding complex workflows.
## What is distributed tracing?

Traditional logs show what happened on individual services. Tracing shows the entire path a request takes through your system:

```text
Request enters → Server creates promise → Worker A starts task →
Worker A calls Worker B → Worker B completes → Worker A completes →
Promise resolves
```

Each step is a span. Spans are linked together into a trace that shows the complete request lifecycle.
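The span/trace relationship can be sketched as a small data model. This is a hypothetical illustration of the concept, not the SDK's actual types:

```typescript
// Illustration only (not the SDK's types): every span carries the trace ID
// it belongs to, and all spans except the root point at a parent span,
// so a flat list of spans forms a tree.
interface Span {
  traceId: string;   // shared by every span in the trace
  spanId: string;    // unique within the trace
  parentId?: string; // absent for the root span
  name: string;
}

// Reconstruct one level of the call tree from a flat span list.
function children(spans: Span[], parentId: string): Span[] {
  return spans.filter((s) => s.parentId === parentId);
}

const spans: Span[] = [
  { traceId: "t1", spanId: "a", name: "orderWorkflow" },
  { traceId: "t1", spanId: "b", parentId: "a", name: "checkInventory" },
  { traceId: "t1", spanId: "c", parentId: "a", name: "chargeCard" },
];

console.log(children(spans, "a").map((s) => s.name));
// → ["checkInventory", "chargeCard"]
```

A trace backend does exactly this reconstruction, just at scale and across processes.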
## OpenTelemetry support
The Resonate TypeScript SDK supports OpenTelemetry for automatic tracing of:
- Function execution (start, completion, duration)
- Context operations (`ctx.run()`, `ctx.sleep()`)
- RPC calls between workers
- Promise creation and resolution
- Retries and failures
Use the `@resonatehq/opentelemetry` package to enable tracing.
## Setup

### Install the package

```shell
npm install @resonatehq/opentelemetry
```

### Configure OpenTelemetry
```typescript
import { ResonateOpenTelemetry } from "@resonatehq/opentelemetry";
import { Resonate } from "@resonatehq/sdk";

// Initialize OpenTelemetry
const otel = new ResonateOpenTelemetry({
  serviceName: "my-resonate-app",
  exporterEndpoint: "http://localhost:4318/v1/traces", // OTLP endpoint
});

// Create Resonate instance with tracing
const resonate = new Resonate({
  url: "http://localhost:8001",
  // OpenTelemetry context automatically propagated
});
```

### OTLP exporter endpoint
OpenTelemetry Protocol (OTLP) is the standard way to send traces. Point the exporter at your collector or backend:
- Jaeger: `http://localhost:4318/v1/traces`
- Zipkin: `http://localhost:9411/api/v2/spans`
- Tempo: `http://localhost:4318/v1/traces`
- Datadog: `http://localhost:8126/v0.4/traces` (via Datadog Agent)
- Honeycomb: `https://api.honeycomb.io/v1/traces` (with API key)
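Many OpenTelemetry SDKs also honor the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable, with the `/v1/traces` signal path appended for OTLP/HTTP. A small sketch of that resolution logic (the helper name is ours; the package's own environment handling may differ):

```typescript
// Sketch: resolve the OTLP traces endpoint from the standard
// OTEL_EXPORTER_OTLP_ENDPOINT variable, defaulting to a local collector.
// (Helper name is illustrative, not part of @resonatehq/opentelemetry.)
function resolveTracesEndpoint(env: Record<string, string | undefined>): string {
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://localhost:4318";
  return `${base.replace(/\/+$/, "")}/v1/traces`;
}

console.log(resolveTracesEndpoint({}));
// → http://localhost:4318/v1/traces
console.log(resolveTracesEndpoint({ OTEL_EXPORTER_OTLP_ENDPOINT: "https://api.honeycomb.io/" }));
// → https://api.honeycomb.io/v1/traces
```

Using the environment variable keeps the endpoint out of code, so the same build can point at Jaeger locally and a managed backend in production.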
## Trace backends
Choose a backend to visualize and query traces:
### Jaeger (open source)

Quick start with Docker:

```shell
docker run -d --name jaeger \
  -p 4318:4318 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
```

Access the UI at http://localhost:16686.
### Grafana Tempo (open source)
Lightweight, cost-effective tracing backend:
```yaml
version: '3.8'
services:
  tempo:
    image: grafana/tempo:latest
    ports:
      - "4318:4318" # OTLP receiver
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    command: ["-config.file=/etc/tempo.yaml"]
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_INSTALL_PLUGINS=grafana-tempo-datasource
```

### Datadog APM
Commercial solution with powerful features:
```typescript
import { ResonateOpenTelemetry } from "@resonatehq/opentelemetry";

const otel = new ResonateOpenTelemetry({
  serviceName: "my-resonate-app",
  exporterEndpoint: "http://localhost:8126/v0.4/traces", // Datadog Agent
});
```

Traces appear in the Datadog APM UI automatically.
### Honeycomb
Cloud-native observability platform:
```typescript
import { ResonateOpenTelemetry } from "@resonatehq/opentelemetry";

const otel = new ResonateOpenTelemetry({
  serviceName: "my-resonate-app",
  exporterEndpoint: "https://api.honeycomb.io/v1/traces",
  headers: {
    "x-honeycomb-team": process.env.HONEYCOMB_API_KEY,
  },
});
```

### AWS X-Ray
AWS-native tracing:
```typescript
import { ResonateOpenTelemetry } from "@resonatehq/opentelemetry";
import { AWSXRayPropagator } from "@opentelemetry/propagator-aws-xray";
import { AWSXRayIdGenerator } from "@opentelemetry/id-generator-aws-xray";

const otel = new ResonateOpenTelemetry({
  serviceName: "my-resonate-app",
  exporterEndpoint: "http://localhost:2000", // X-Ray daemon
  idGenerator: new AWSXRayIdGenerator(),
  propagator: new AWSXRayPropagator(),
});
```

## What gets traced
### Function execution
Every Resonate function creates a span:
```typescript
resonate.register("processOrder", function* (ctx, order) {
  // Span: "processOrder" (duration = function execution time)
  const result = yield* ctx.run(validateOrder, order);
  return result;
});
```

Span attributes:

- `resonate.function.name` - Function name
- `resonate.promise.id` - Promise ID
- `resonate.worker.group` - Worker group
### Context operations

Each `ctx.run()`, `ctx.sleep()`, etc. creates a child span:

```typescript
resonate.register("checkout", function* (ctx, cart) {
  // Parent span: "checkout"
  const validated = yield* ctx.run(validateCart, cart);
  // Child span: "validateCart"

  yield* ctx.sleep(1000);
  // Child span: "sleep(1000ms)"

  const charged = yield* ctx.run(chargeCard, cart.total);
  // Child span: "chargeCard"

  return charged;
});
```

### RPC calls
When one worker calls another, spans are linked across workers:
```typescript
// Worker A
resonate.register("orderWorkflow", async (ctx, order) => {
  // Span: "orderWorkflow" on Worker A
  const result = await resonate.rpc(
    `inventory-${order.id}`,
    "checkInventory",
    order.items,
    resonate.options({ target: "poll://any@inventory-workers" })
  );
  // Creates a linked span on Worker B
  return result;
});

// Worker B (inventory-workers group)
resonate.register("checkInventory", async (ctx, items) => {
  // Span: "checkInventory" on Worker B
  // Parent: "orderWorkflow" on Worker A
});
```

Parent-child relationships are maintained via OpenTelemetry context propagation.
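Under the hood, OpenTelemetry propagates context between processes using the W3C Trace Context `traceparent` header. The SDK handles this for you; the sketch below only illustrates the wire format (the trace/span IDs are the examples from the W3C spec):

```typescript
// W3C Trace Context `traceparent` format:
//   version "00" - 32 hex chars of trace ID - 16 hex chars of parent span ID - 2 hex flag chars
function buildTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): { traceId: string; spanId: string; sampled: boolean } | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 };
}

const header = buildTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
console.log(header);
// → 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
console.log(parseTraceparent(header)?.sampled); // → true
```

When Worker B receives this header with a task, it starts its span with the incoming trace ID and parent span ID, which is what links the spans in the examples above.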
### Retries and failures
Failed attempts create error spans:
```typescript
resonate.register("unreliableTask", async (ctx) => {
  // Each retry attempt gets its own span
  // Failed attempts are marked with error: true
  // A successful retry shows the full attempt history
});
```

## Analyzing traces
### Find slow functions
Look for spans with high duration:
- Jaeger: Filter by min duration (e.g., >5s)
- Datadog: Sort by latency, look at p95/p99
- Honeycomb: Use `HEATMAP(duration_ms)` to visualize the distribution
### Identify bottlenecks
Trace view shows where time is spent:
```text
orderWorkflow (10s total)
├─ validateOrder (0.1s)
├─ checkInventory (8s) ← BOTTLENECK
└─ chargeCard (1.9s)
```

Focus optimization on `checkInventory`.
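The same analysis can be done programmatically over exported span data. A toy sketch (the types and durations are illustrative, not a backend API):

```typescript
// Toy sketch: given the child spans of one parent, find the one
// that accounts for the most wall-clock time.
interface TimedSpan {
  name: string;
  durationMs: number;
}

function bottleneck(children: TimedSpan[]): TimedSpan {
  return children.reduce((max, s) => (s.durationMs > max.durationMs ? s : max));
}

const spans: TimedSpan[] = [
  { name: "validateOrder", durationMs: 100 },
  { name: "checkInventory", durationMs: 8000 },
  { name: "chargeCard", durationMs: 1900 },
];

console.log(bottleneck(spans).name); // → checkInventory
```

In practice your backend's trace view does this visually, but the same logic is useful in scripts that scan exported traces for regressions.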
### Debug failures
Failed spans show error details:
```text
orderWorkflow (FAILED)
├─ validateOrder (SUCCESS)
└─ checkInventory (FAILED) ← error: "out of stock"
```

Click into the failed span to see the error message, stack trace, and context.
### Track distributed workflows
See how work flows across workers:
```text
Worker A: orderWorkflow
├─ Worker B: checkInventory
├─ Worker B: reserveInventory
└─ Worker C: sendConfirmation
```

Understand the complete execution path.
## Sampling
High-volume systems generate too many traces. Use sampling to reduce overhead:
```typescript
import { ResonateOpenTelemetry } from "@resonatehq/opentelemetry";
import { ParentBasedSampler, TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

const otel = new ResonateOpenTelemetry({
  serviceName: "my-resonate-app",
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
  }),
});
```

Strategies:
- Head-based sampling: Sample at trace creation (10% of all traces)
- Tail-based sampling: Keep interesting traces (all errors, slow requests)
- Adaptive sampling: Adjust rate based on traffic
Most backends support tail-based sampling (sample after seeing full trace).
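Tail-based sampling is typically configured in the pipeline rather than the SDK — for example, in the OpenTelemetry Collector's `tail_sampling` processor (contrib distribution). The following is a sketch; check the collector documentation for the exact schema of your version:

```yaml
# OpenTelemetry Collector (contrib) - tail-based sampling sketch.
# Spans are buffered until the trace is complete, then policies decide.
processors:
  tail_sampling:
    decision_wait: 10s            # how long to buffer before deciding
    policies:
      - name: keep-errors         # keep every trace containing an error
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: keep-slow           # keep traces slower than 5s
        type: latency
        latency: { threshold_ms: 5000 }
      - name: baseline            # plus a 10% sample of everything else
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }
```

This combination is a common default: all failures and slow requests are retained, while routine traffic is sampled down.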
## Correlation with logs and metrics
Trace ID in logs:
```typescript
import { trace } from "@opentelemetry/api";

const span = trace.getActiveSpan();
const traceId = span?.spanContext().traceId;
console.log(`Processing order [traceId=${traceId}]`);
```

Search logs by trace ID to see detailed context.
Metrics from traces:
Backends can generate metrics from span data:
- Request rate by function name
- Latency percentiles (p50, p95, p99)
- Error rates
- Span counts
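One way to derive these metrics yourself is the OpenTelemetry Collector's `spanmetrics` connector (contrib distribution), which turns spans into request counts and duration histograms. A sketch, under the assumption you run a collector in front of your backend — verify the schema against your collector version:

```yaml
# OpenTelemetry Collector (contrib) - spans-to-metrics sketch.
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100ms, 500ms, 1s, 5s]

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]    # feed spans into the connector
    metrics:
      receivers: [spanmetrics]    # call counts + latency histograms out
      exporters: [prometheus]
```

The resulting metrics can drive dashboards and alerts without any extra instrumentation in your workers.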
## Best practices
- Enable tracing from day one - Hard to add later
- Use meaningful span names - "processOrder" not "function_123"
- Add custom attributes - Enrich spans with business context:
  ```typescript
  span.setAttribute("order.id", orderId);
  span.setAttribute("user.id", userId);
  ```
- Trace sparingly in hot paths - Use sampling for high-throughput functions
- Correlate traces with logs - Include trace ID in log messages
- Set up alerts on trace metrics - Monitor error rate, latency from spans
- Review traces regularly - Don't wait for incidents to look at traces
## Limitations
Python SDK: OpenTelemetry support is not yet implemented. Use logs and metrics for observability until tracing is added.
Server tracing: The Resonate server itself doesn't emit traces yet. You can trace SDK/worker activity, but server coordination isn't visible in traces.
## Troubleshooting

### No traces appearing
Check exporter endpoint:
```shell
curl http://localhost:4318/v1/traces
# Should return 404 (endpoint exists but needs POST)
```

Check OpenTelemetry initialization:

```typescript
console.log("OpenTelemetry initialized:", otel);
```

Enable debug logging:
```typescript
const otel = new ResonateOpenTelemetry({
  serviceName: "my-app",
  logLevel: "debug", // See what's being exported
});
```

### Spans not linked across workers
Cause: Context propagation not working.
Solution: Ensure @resonatehq/opentelemetry is initialized on all workers. Context is automatically propagated via Resonate's RPC mechanism.
### High cardinality warnings
Cause: Too many unique span attributes (e.g., user IDs, promise IDs).
Solution: Use sampling or limit high-cardinality attributes:
```typescript
// Don't add unique IDs as span names
span.setAttribute("order.id", orderId); // Attribute, not name
```

## Summary
For development:
- Use Jaeger locally (easy Docker setup)
- Enable tracing from day one
- Trace a few example workflows to understand behavior
For production:
- Use managed backend (Datadog, Honeycomb, Tempo)
- Enable sampling (10-30% of traces)
- Correlate traces with logs and metrics
- Set up alerts on trace-derived metrics
Key insights from tracing:
- Where time is spent (find bottlenecks)
- How work flows across workers (understand distributed execution)
- Why failures happen (error context and stack traces)
Tracing complements logs and metrics. Together, they give you complete observability:
- Logs: What happened (events)
- Metrics: How much/how fast (aggregates)
- Traces: Why and where (causality)