Module 3: Distributed tracing and OpenTelemetry
With metrics from Module 1 and centralized logging from Module 2, you can detect issues and understand what went wrong. However, a challenge remains: when your application experiences slow response times, metrics show overall slowness and logs show individual service behavior, but neither reveals where in the call chain the delay originates.
Your organization’s application—the frontend, backend, database, and notifier services—processes every user request as a chain of HTTP calls across multiple services. When performance degrades, you need to see the complete path of each request, including the time spent in each hop.
In this module, you will learn distributed tracing concepts, verify the Tempo tracing backend, understand how the Go application is instrumented with the OpenTelemetry SDK, and then activate the full telemetry pipeline end-to-end. By the end, you will observe live traces, metrics, and logs flowing from all four services through the OpenTelemetry Collector to Tempo, Prometheus, and Loki—and correlate them across all three signals in real time.
- Learning objectives
- Understanding distributed tracing
- Understanding the OpenTelemetry collector architecture
- Exercise 1: Verify Tempo deployment
- Exercise 2: Explore the application’s OpenTelemetry instrumentation
- Exercise 3: Verify the OpenTelemetry operator and pre-deployed components
- Exercise 4: Create the sidecar collector in your namespace
- Exercise 5: Create the Instrumentation CR in your namespace
- Exercise 6: Enable OpenTelemetry on the applications
- Exercise 7: View and explore live traces
- Exercise 8: Explore the central collector pipeline
- Exercise 9: Zero-code Python auto-instrumentation
- Learning outcomes
- Module summary
Learning objectives
By the end of this module, you’ll be able to:
- Understand distributed tracing concepts (spans, traces, context propagation)
- Understand how Tempo stores and serves trace data
- Understand the two-tier sidecar-to-central OpenTelemetry Collector architecture
- Create a sidecar `OpenTelemetryCollector` CR and configure its pipeline
- Create an `Instrumentation` CR for zero-code agent injection
- Enable OpenTelemetry on the workshop application
- Observe live traces in the Observe → Traces console UI and query them with TraceQL
- Understand how the central collector fans out telemetry to Tempo, Prometheus, and Loki
- Enable zero-code Python auto-instrumentation on the notifier service
- Observe a four-hop trace spanning Go and Python services in a single waterfall
Understanding distributed tracing
Before enabling tracing, you need to understand how distributed tracing works in microservice architectures.
Core tracing concepts
Trace: The complete journey of a single request through your system.
- Example: A user submits a note → frontend service → backend service → database service
- A trace captures the entire chain, start to finish, with timing and status at each step.
Span: One unit of work within a trace.
- Example: `backend: POST /api/notes` — one hop in the note-creation trace
- Contains: operation name, start time, duration, HTTP status code, and key-value attributes
Context propagation: The mechanism that ties spans together across service boundaries.
- Each service receives an incoming trace context via the `traceparent` HTTP header
- It creates a child span linked to the caller’s span, then propagates the context further downstream
- Uses the W3C Trace Context standard (`traceparent` and `tracestate` headers)
Parent-child relationships: Spans form a tree.
- Root span: `frontend: GET /new-note`
- Child span: `backend: POST /api/notes` (called by frontend)
- Grandchild span: `database: POST /api/events` (called by backend)
Why distributed tracing matters
Without tracing: You see symptoms, not causes.
- Metrics show: 95th-percentile API response time increased from 80ms to 950ms
- Logs show: all three services logged warnings at the same time
- Question: Which service is actually slow?
With tracing: You see the complete picture.
- Trace shows: frontend→backend took 20ms; backend→database took 880ms (bottleneck found)
- Other services operated normally
- Answer: The database service query needs optimization
Tempo architecture
Tempo is a distributed tracing backend optimized for storing and querying traces on cost-effective object storage.
Key components:
- Distributor: Receives trace data from instrumented applications over OTLP (gRPC port 4317, HTTP port 4318)
- Ingester: Buffers spans in memory and writes them to object storage
- Querier: Serves trace queries by reading from both the ingester cache and object storage in parallel
- Query Frontend: Load-balances query traffic across Querier pods
- Compactor: Merges and optimizes stored trace blocks over time
Storage: Tempo requires S3-compatible object storage. In this workshop the TempoStack uses in-cluster object storage provisioned by the OpenShift Data Foundation (ODF) NooBaa Multi-Cloud Gateway.
Integration: Tempo is queried by the Distributed Tracing UI plugin built into the OpenShift console via the Cluster Observability Operator. There is no separate Jaeger pod—the query interface is embedded directly in the console.
Understanding the OpenTelemetry collector architecture
Before enabling telemetry, understand how data flows from the application to each observability backend.
Two-tier collector pattern
This workshop uses a two-tier OpenTelemetry Collector topology. Every signal—traces, metrics, and logs—flows through the same two hops before reaching a purpose-built backend:
Application pods (%OPENSHIFT_USERNAME%-observability-demo)
+--------------------------------------------------------+
| frontend / backend / database / notifier |
| +----------+ OTLP HTTP (localhost:4318) |
| | app |---> otc-container (sidecar collector) |
| +----------+ |
+--------------------------------------------------------+
| All signals: traces + metrics + logs
| OTLP gRPC (cluster DNS, port 4317)
v
central-collector-collector.observability-demo.svc:4317
+------------------------------------------------------------------+
| Central collector – deployment mode, 2 replicas |
| (namespace: observability-demo) |
| |
| Signal routing: |
| traces --> otlp/tempo --> TempoStack distributor :4317 |
| traces --> spanmetrics --> traces/spanmetrics pipeline |
| metrics --> prometheusremotewrite |
| --> COO Prometheus /api/v1/write :9090 |
| logs --> otlphttp/logs |
| --> LokiStack gateway :8080 (application tenant) |
+------------------------------------------------------------------+
Why two tiers?
- Sidecar collector: Runs within the application pod as a second container (`otc-container`). The app sends to `localhost:4318` (no network hop). The sidecar enriches every span, metric data point, and log record with Kubernetes metadata (`k8s.pod.name`, `k8s.deployment.name`, `k8s.namespace.name`) via the `k8sattributes` processor, then forwards all three signals over a single gRPC connection to the central collector.
- Central collector: Runs as a shared `Deployment` (2 replicas) in `observability-demo`. It receives all signals from every user’s sidecar and routes them to different backends using different protocols and auth mechanisms. It also runs the `spanmetrics` connector, which generates RED metrics directly from incoming trace spans.
This pattern keeps the sidecar simple (no secrets, no TLS config, no auth tokens) while centralizing the complex backend integrations in one place.
Processors in the sidecar collector
The sidecar uses four processors in order:
| Processor | Function |
|---|---|
| `memory_limiter` | Prevents the collector from consuming more than 75% of available pod memory |
| `resourcedetection` | Detects OpenShift infrastructure attributes (such as `k8s.cluster.name` and `cloud.platform`) |
| `k8sattributes` | Calls the Kubernetes API to attach pod name, namespace, deployment name, node name, and pod UID to every span, metric, and log record |
| `batch` | Accumulates records before sending to reduce network overhead |
Span metrics connector in the central collector
The central collector uses a spanmetrics connector to generate RED metrics automatically from incoming traces:
- Rate: `traces_spanmetrics_calls_total` — request count per service and operation
- Errors: `traces_spanmetrics_calls_total{status.code="STATUS_CODE_ERROR"}` — error count
- Duration: `traces_spanmetrics_latency_bucket` — latency histogram with configurable buckets
These metrics are published to the COO-managed Prometheus instance via prometheusremotewrite. The COO MonitoringStack has enableRemoteWriteReceiver: true set, which activates the /api/v1/write endpoint that Prometheus exposes for ingest.
Exercise 1: Verify Tempo deployment
The Tempo distributed tracing stack was deployed via GitOps as part of the workshop infrastructure. Verify that all components are running and ready.
Steps
1. Verify the Tempo Operator is running:

   ```
   oc get pods -n openshift-tempo-operator
   ```

   Expected output:

   ```
   NAME                                      READY   STATUS    RESTARTS   AGE
   tempo-operator-controller-manager-xxxxx   2/2     Running   0          1h
   ```

2. Check the TempoStack instance:

   ```
   oc get tempostack -n openshift-tempo-operator
   ```

   Expected output:

   ```
   NAME    AGE   CONDITION
   tempo   1h    Ready
   ```

   The `CONDITION` must be Ready before traces can be ingested.

3. Verify each Tempo component is running:

   ```
   oc get pods -n openshift-tempo-operator -l app.kubernetes.io/instance=tempo
   ```

   Expected output:

   ```
   NAME                               READY   STATUS    RESTARTS   AGE
   tempo-tempo-compactor-xxxxx        1/1     Running   0          1h
   tempo-tempo-distributor-xxxxx      1/1     Running   0          1h
   tempo-tempo-ingester-0             1/1     Running   0          1h
   tempo-tempo-querier-xxxxx          1/1     Running   0          1h
   tempo-tempo-query-frontend-xxxxx   1/1     Running   0          1h
   ```

4. Confirm the distributor service endpoint (used by the OpenTelemetry Collector to write traces):

   ```
   oc get svc tempo-tempo-distributor -n openshift-tempo-operator
   ```

   Expected output:

   ```
   NAME                      TYPE        CLUSTER-IP   PORT(S)
   tempo-tempo-distributor   ClusterIP   172.30.x.x   4317/TCP, 4318/TCP
   ```

   Port 4317 is OTLP gRPC and port 4318 is OTLP HTTP. The central OpenTelemetry Collector forwards traces to this endpoint.
Verify
Check that your tracing infrastructure is operational:
- ✓ Tempo Operator pod is Running (2/2 containers)
- ✓ TempoStack instance condition is Ready
- ✓ All Tempo component pods (distributor, ingester, querier, query-frontend, compactor) are Running
- ✓ Tempo distributor service exposes ports 4317 and 4318
What you learned: The TempoStack operator deploys and manages all Tempo components. The distributor is the write endpoint; the query-frontend is the read endpoint used by the OpenShift console UI plugin.
Exercise 2: Explore the application’s OpenTelemetry instrumentation
The three Go services—frontend, backend, and database—are already instrumented with the OpenTelemetry SDK. Telemetry generation is gated by an environment variable so it can be enabled without rebuilding the container image.
Steps
1. Inspect the shared telemetry package:

   The repository contains a shared `telemetry` package used by all three services at `src/telemetry/telemetry.go`.

   Open the Source Code tab in the workshop application (the running frontend) and navigate to `telemetry/telemetry.go` to browse the file with syntax highlighting.

   Key points in this package:

   - `Enabled()` function: Returns `true` only when `OTEL_ENABLED=true` is set in the environment. All SDK initialization is skipped when false, so the application behaves identically to an uninstrumented binary.
   - `Setup()` function: Initializes three OTLP HTTP exporters when enabled—trace, metric, and log—all targeting `OTEL_EXPORTER_OTLP_ENDPOINT`. Once telemetry is enabled in Exercise 6, this will point to `http://localhost:4318` (the injected sidecar collector).

2. Inspect the backend service instrumentation:

   Open the Source Code tab in the workshop application and select `backend/main.go` to browse the file directly.

   You will see three instrumentation layers activated when `OTEL_ENABLED=true`:

   | Layer | Purpose |
   |---|---|
   | `telemetry.Setup()` | Creates the global TraceProvider, MeterProvider, and LoggerProvider from the OTLP exporters |
   | `otelslog.NewHandler()` | Bridges the Go standard `slog` logger to the OTel log exporter—every structured log line is emitted as an OTel log record carrying the active trace ID |
   | `otelhttp.NewTransport()` | Wraps the outbound HTTP client so the W3C `traceparent` header is injected into every downstream call |

3. Understand the server-side instrumentation:

   The inbound HTTP handler is wrapped with `otelhttp.NewHandler()`, which:

   - Creates a server span for every inbound request
   - Extracts the incoming `traceparent` header and registers this span as a child of the calling service’s span
   - Automatically records HTTP attributes (`http.method`, `http.route`, `http.status_code`) on the span

4. Understand context propagation across services:

   The trace context flows automatically through your microservices:

   ```
   Browser
     +-- frontend (otelhttp server span: GET /new-note)
         |  injects traceparent into outbound request
         +-- backend (otelhttp server span: POST /api/notes)
             |  injects traceparent into outbound request
             +-- database (otelhttp server span: POST /api/events)
   ```

   Because every service uses `otelhttp` for both inbound (handler) and outbound (transport) HTTP calls, the trace ID and parent span ID are automatically threaded through the entire call chain with no manual span creation required in business logic code.

5. Review the `enable-otel.yaml` patch file:

   Open the Source Code tab in the workshop application and select `enable-otel.yaml` to browse the full patch file, or view it in the terminal.

   This file contains three `Deployment` patches—one per service. When applied, each patch:

   - Sets `OTEL_ENABLED=true` to activate the SDK
   - Sets `OTEL_SERVICE_NAME` to the service name (becomes the `service.name` resource attribute)
   - Sets `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318` to send telemetry to the injected sidecar
   - Adds the pod annotation `sidecar.opentelemetry.io/inject: "sidecar"` to trigger sidecar injection
   - Sets `serviceAccountName: otel-collector-sidecar` for RBAC access to the Kubernetes API

6. Verify the current state of the deployments:

   ```
   oc get deployment frontend backend \
     -n %OPENSHIFT_USERNAME%-observability-demo \
     -o custom-columns='NAME:.metadata.name,CONTAINERS:.spec.template.spec.containers[*].name'

   oc get statefulset database \
     -n %OPENSHIFT_USERNAME%-observability-demo \
     -o custom-columns='NAME:.metadata.name,CONTAINERS:.spec.template.spec.containers[*].name'
   ```

   Expected output:

   ```
   NAME       CONTAINERS
   frontend   frontend
   backend    backend

   NAME       CONTAINERS
   database   database
   ```

   Each service currently has a single application container. After enabling OpenTelemetry later in this module, each pod will gain a second container—the injected sidecar collector.
Verify
- ✓ `src/telemetry/telemetry.go` gates all SDK initialization on `OTEL_ENABLED=true`
- ✓ `otelhttp.NewHandler()` creates server spans and extracts incoming trace context
- ✓ `otelhttp.NewTransport()` injects the `traceparent` header into all outbound HTTP calls
- ✓ `otelslog.NewHandler()` bridges Go structured logs to OTel log records
- ✓ Current deployments have one container each (no OTel yet)
What you learned: Effective OpenTelemetry Go instrumentation uses three layers—trace provider, otelhttp middleware, and log bridge—to emit traces, metrics, and correlated logs from a single SDK setup call. The otelhttp transport ensures W3C trace context propagation across all service boundaries automatically, without any manual span creation in application code.
Exercise 3: Verify the OpenTelemetry operator and pre-deployed components
Before creating resources in your namespace, confirm the OpenTelemetry Operator and the shared observability-demo infrastructure are healthy.
Steps
1. Verify the OpenTelemetry Operator pod is running:

   ```
   oc get pods -n openshift-operators \
     -l app.kubernetes.io/name=opentelemetry-operator
   ```

   Expected output:

   ```
   NAME                                      READY   STATUS    RESTARTS   AGE
   opentelemetry-operator-controller-xxxxx   2/2     Running   0          1h
   ```

2. Confirm the operator registered the CRDs:

   ```
   oc api-resources | grep opentelemetry
   ```

   Expected output:

   ```
   instrumentations          opentelemetry.io   true   Instrumentation
   opentelemetrycollectors   opentelemetry.io   true   OpenTelemetryCollector
   ```

3. Inspect the pre-deployed central collector:

   ```
   oc get opentelemetrycollector central-collector -n observability-demo
   ```

   Expected output:

   ```
   NAME                MODE         VERSION
   central-collector   deployment   0.140.0-2
   ```

4. Verify the central collector pods are running:

   ```
   oc get pods -n observability-demo \
     -l app.kubernetes.io/name=central-collector-collector
   ```

   Expected output:

   ```
   NAME                                READY   STATUS    RESTARTS   AGE
   central-collector-collector-xxxxx   1/1     Running   0          1h
   central-collector-collector-xxxxx   1/1     Running   0          1h
   ```

   Two replicas provide resilience for the shared collection endpoint.

5. Inspect the central collector service (this is where sidecars forward telemetry):

   ```
   oc get svc central-collector-collector -n observability-demo
   ```

   Expected output:

   ```
   NAME                          TYPE        CLUSTER-IP   PORT(S)
   central-collector-collector   ClusterIP   172.30.x.x   4317/TCP, 4318/TCP
   ```
Verify
- ✓ OpenTelemetry Operator pod is Running (2/2)
- ✓ `OpenTelemetryCollector` and `Instrumentation` CRDs are registered
- ✓ `central-collector` exists in `observability-demo` in deployment mode with 2 replicas
- ✓ `central-collector-collector` service exposes ports 4317 and 4318
What you learned: The central collector is pre-deployed in the shared observability-demo namespace by GitOps. Your task is to create the per-namespace sidecar collector and Instrumentation CR in your own namespace, then wire the application into that pipeline.
Exercise 4: Create the sidecar collector in your namespace
You will deploy a sidecar-mode OpenTelemetryCollector CR in your %OPENSHIFT_USERNAME%-observability-demo namespace. When this CR exists, the OpenTelemetry Operator automatically injects a sidecar container into any pod in the namespace that carries the annotation sidecar.opentelemetry.io/inject: "sidecar".
Steps
1. Verify the required ServiceAccount is present in your namespace:

   ```
   oc get serviceaccount otel-collector-sidecar \
     -n %OPENSHIFT_USERNAME%-observability-demo
   ```

   This ServiceAccount was pre-created for your namespace with the RBAC permissions needed by the `k8sattributes` and `resourcedetection` processors (read access to pods, namespaces, and nodes).

   Expected output:

   ```
   NAME                     SECRETS   AGE
   otel-collector-sidecar   0         1h
   ```

2. Create the sidecar OpenTelemetryCollector CR:

   ```
   cat <<EOF | oc apply -f -
   apiVersion: opentelemetry.io/v1beta1
   kind: OpenTelemetryCollector
   metadata:
     name: sidecar
     namespace: %OPENSHIFT_USERNAME%-observability-demo
   spec:
     mode: sidecar
     serviceAccount: otel-collector-sidecar
     config:
       receivers:
         otlp:
           protocols:
             grpc:
               endpoint: 0.0.0.0:4317
             http:
               endpoint: 0.0.0.0:4318
       processors:
         memory_limiter:
           check_interval: 1s
           limit_percentage: 75
           spike_limit_percentage: 15
         resourcedetection:
           detectors: [openshift]
           timeout: 2s
         k8sattributes:
           auth_type: serviceAccount
           passthrough: false
           extract:
             metadata:
               - k8s.namespace.name
               - k8s.deployment.name
               - k8s.node.name
               - k8s.pod.name
               - k8s.pod.uid
         batch:
           timeout: 10s
           send_batch_size: 1024
       exporters:
         otlp/grpc:
           endpoint: central-collector-collector.observability-demo.svc:4317
           tls:
             insecure: true
           sending_queue:
             enabled: true
             queue_size: 5000
           retry_on_failure:
             enabled: true
             initial_interval: 5s
             max_interval: 30s
             max_elapsed_time: 10m
       service:
         pipelines:
           traces:
             receivers: [otlp]
             processors: [memory_limiter, resourcedetection, k8sattributes, batch]
             exporters: [otlp/grpc]
           metrics:
             receivers: [otlp]
             processors: [memory_limiter, resourcedetection, k8sattributes, batch]
             exporters: [otlp/grpc]
           logs:
             receivers: [otlp]
             processors: [memory_limiter, resourcedetection, k8sattributes, batch]
             exporters: [otlp/grpc]
   EOF
   ```

   Note that collector component IDs use the `type/name` form: `otlp/grpc` is an instance of the `otlp` exporter type (a bare `otlp_grpc` would not match any registered exporter type).

3. Verify the CR was accepted:

   ```
   oc get opentelemetrycollector sidecar -n %OPENSHIFT_USERNAME%-observability-demo
   ```

   Expected output:

   ```
   NAME      MODE      VERSION
   sidecar   sidecar   0.140.0-2
   ```

   In `sidecar` mode, the operator does not create a standalone `Deployment`. Instead it stores the container spec and injects it into pods on demand when the annotation is detected.
Understand the pipeline
The sidecar carries all three signal types over the same processor chain:
otlp receiver (localhost:4317 gRPC / localhost:4318 HTTP)
|
memory_limiter -- drop if pod memory > 75%
|
resourcedetection -- add k8s.cluster.name, cloud.platform
|
k8sattributes -- add k8s.pod.name, k8s.deployment.name,
k8s.namespace.name, k8s.pod.uid
|
batch -- buffer and flush (max 10s / 1024 records)
|
otlp/grpc exporter -> central-collector-collector.observability-demo.svc:4317
(gRPC, insecure + retry queue for resiliency)
The sidecar does not route signals to different destinations—that responsibility belongs to the central collector. All signals arrive at the central collector on a single gRPC connection.
Verify
- ✓ `otel-collector-sidecar` ServiceAccount exists in `%OPENSHIFT_USERNAME%-observability-demo`
- ✓ `sidecar` OpenTelemetryCollector CR exists in `sidecar` mode
- ✓ CR configuration includes all three pipelines (traces, metrics, logs)
- ✓ Exporter is `otlp/grpc` with queue/retry enabled and endpoint `central-collector-collector.observability-demo.svc:4317`
Exercise 5: Create the Instrumentation CR in your namespace
The Instrumentation CR is a template that tells the OpenTelemetry Operator how to configure the auto-instrumentation agent init-container when injecting into pods. You need one per namespace. In this exercise you create it for %OPENSHIFT_USERNAME%-observability-demo—it will be used in Exercise 9 when you enable Python auto-instrumentation on the notifier service.
Steps
1. Create the Instrumentation CR:

   ```
   cat <<EOF | oc apply -f -
   apiVersion: opentelemetry.io/v1alpha1
   kind: Instrumentation
   metadata:
     name: my-instrumentation
     namespace: %OPENSHIFT_USERNAME%-observability-demo
   spec:
     exporter:
       endpoint: http://localhost:4318
     sampler:
       type: parentbased_traceidratio
       argument: "1.0"
     propagators:
       - tracecontext
       - baggage
     python:
       env:
         - name: OTEL_EXPORTER_OTLP_PROTOCOL
           value: http/protobuf
   EOF
   ```

   Key fields explained:

   - `spec.exporter.endpoint: http://localhost:4318` — the auto-instrumented process sends telemetry to the sidecar on localhost (both containers live in the same pod)
   - `spec.sampler.type: parentbased_traceidratio` with `argument: "1.0"` — 100% sampling, suitable for a workshop
   - `spec.propagators: [tracecontext, baggage]` — W3C Trace Context headers so the incoming `traceparent` from the Go backend is read and spans are linked into the existing trace
   - `spec.python.env` with `OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf` — forces the Python SDK to use HTTP/protobuf rather than gRPC, matching the sidecar receiver on port 4318

2. Verify the CR was accepted:

   ```
   oc get instrumentation my-instrumentation -n %OPENSHIFT_USERNAME%-observability-demo
   ```

   Expected output:

   ```
   NAME                 AGE
   my-instrumentation   10s
   ```
Verify
- ✓ `my-instrumentation` Instrumentation CR exists in `%OPENSHIFT_USERNAME%-observability-demo`
- ✓ `spec.exporter.endpoint` is `http://localhost:4318`
What you learned: The Instrumentation CR is a per-namespace configuration template. Unlike the sidecar CR (which provides a container spec merged into application pods), the Instrumentation CR is referenced by pods via the instrumentation.opentelemetry.io/inject-<language> annotation — the Operator reads it and injects a language-specific agent init-container plus the necessary environment variables with no application changes required.
Exercise 6: Enable OpenTelemetry on the applications
Now you’ll activate the OpenTelemetry SDK in all three Go services. This adds the sidecar injection annotation, the SDK environment variables, and the correct ServiceAccount to each deployment without rebuilding container images.
Steps
1. Set your namespace as a shell variable:

   ```
   NAMESPACE="%OPENSHIFT_USERNAME%-observability-demo"
   ```

2. Add the OTEL environment variables to all three services:

   ```
   for app in frontend backend; do
     oc set env deployment/${app} -n ${NAMESPACE} \
       OTEL_ENABLED=true \
       OTEL_SERVICE_NAME=${app} \
       OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
       OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
   done

   oc set env statefulset/database -n ${NAMESPACE} \
     OTEL_ENABLED=true \
     OTEL_SERVICE_NAME=database \
     OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
     OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
   ```

   Expected output:

   ```
   deployment.apps/frontend updated
   deployment.apps/backend updated
   statefulset.apps/database updated
   ```

3. Add the sidecar injection annotation to each pod template:

   ```
   for app in frontend backend; do
     oc patch deployment/${app} -n ${NAMESPACE} \
       --type=strategic \
       -p='{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"sidecar"}}}}}'
   done

   oc patch statefulset/database -n ${NAMESPACE} \
     --type=strategic \
     -p='{"spec":{"template":{"metadata":{"annotations":{"sidecar.opentelemetry.io/inject":"sidecar"}}}}}'
   ```

   The annotation value `sidecar` must match the name of the `OpenTelemetryCollector` CR you created in Exercise 4. When the operator sees this annotation on a pod being created, it injects the collector container spec from that CR.

   Expected output:

   ```
   deployment.apps/frontend patched
   deployment.apps/backend patched
   statefulset.apps/database patched
   ```

4. Set the ServiceAccount on all three services so the injected sidecar can call the Kubernetes API:

   ```
   for app in frontend backend; do
     oc set serviceaccount deployment/${app} \
       otel-collector-sidecar \
       -n ${NAMESPACE}
   done

   oc set serviceaccount statefulset/database \
     otel-collector-sidecar \
     -n ${NAMESPACE}
   ```

   Expected output:

   ```
   deployment.apps/frontend serviceaccount updated
   deployment.apps/backend serviceaccount updated
   statefulset.apps/database serviceaccount updated
   ```

5. Monitor the rolling restart:

   `database` is a StatefulSet with a `ReadWriteOnce` PVC. Its default `RollingUpdate` strategy terminates the existing pod before starting the replacement, so the volume is cleanly released. Expect a few seconds of database unavailability during this step.

   ```
   oc rollout status deployment/frontend deployment/backend \
     -n ${NAMESPACE}

   oc rollout status statefulset/database \
     -n ${NAMESPACE}
   ```

   Expected output:

   ```
   deployment "frontend" successfully rolled out
   deployment "backend" successfully rolled out
   statefulset rolling to 1 pods at revision database-xxxxx
   waiting for statefulset rolling update to complete 0 pods at revision database-xxxxx...
   statefulset rolling update complete 1 pods at revision database-xxxxx...
   ```

6. Verify sidecar injection occurred:

   ```
   oc get pods -n ${NAMESPACE} \
     -o custom-columns='NAME:.metadata.name,CONTAINERS:.spec.containers[*].name'
   ```

   Expected output:

   ```
   NAME             CONTAINERS
   frontend-xxxxx   frontend, otc-container
   backend-xxxxx    backend, otc-container
   database-0       database, otc-container
   ```

   The `otc-container` is the injected OpenTelemetry Collector sidecar. Each pod now has two containers: the application and its dedicated collector.

7. Confirm the sidecar container is running and pipelines have started:

   ```
   oc logs -n ${NAMESPACE} \
     -l app=frontend -c otc-container | tail -10
   ```

   Expected output (excerpt):

   ```
   Everything is ready. Begin running and processing data.
   Pipeline started (traces).
   Pipeline started (metrics).
   Pipeline started (logs).
   ```

8. Generate application traffic to produce telemetry:

   ```
   FRONTEND_URL=$(oc get route frontend \
     -n ${NAMESPACE} \
     -o jsonpath='{.spec.host}')

   for i in $(seq 1 30); do
     curl -sk -o /dev/null "https://$FRONTEND_URL/"
     curl -sk -o /dev/null -X POST "https://$FRONTEND_URL/api/notes" \
       -H "Content-Type: application/json" \
       -d "{\"title\":\"Test $i\",\"content\":\"OTel test\"}"
     sleep 1
   done
   echo "Traffic generation complete"
   ```

   This sends 30 iterations of mixed GET and POST requests, generating spans at each hop through frontend → backend → database.
Verify
- ✓ OTEL environment variables set on all three workloads
- ✓ Sidecar injection annotation present on all three pod templates
- ✓ All three workloads rolled out successfully
- ✓ Each pod has two containers: the application and `otc-container` (sidecar)
- ✓ Sidecar logs show all three pipelines started
- ✓ Traffic generated against the frontend route
What you learned: The OpenTelemetry Operator’s sidecar injection mechanism transforms a running pod by adding a pre-configured collector container, without rebuilding the image. The annotation sidecar.opentelemetry.io/inject: "sidecar" tells the operator to use the sidecar CR you created in Exercise 4 as the container specification.
Exercise 7: View and explore live traces
With traffic flowing and telemetry active, you can now view real traces from your application in the OpenShift console and use TraceQL to query them.
Steps
1. Navigate to Observe → Traces in the OpenShift console.

2. Set the search parameters:

   - Namespace: `%OPENSHIFT_USERNAME%-observability-demo`
   - Time range: Last 5 minutes

   Click Search or the refresh icon.

   The scatter plot should now show individual points, one per trace, with the y-axis showing duration in milliseconds.
3. Review the scatter plot and trace list:

   Each point in the scatter plot represents a single trace:

   - X-axis: trace start time
   - Y-axis: total trace duration (ms)
   - Bubble size: number of spans in the trace

   Clusters of points at the top of the chart indicate slow traces. The trace list below shows:

   - Trace name: root span operation (for example, `frontend: GET /`)
   - Spans: total number of spans in the trace
   - Duration: end-to-end time
   - Start time: when the root span began

   Click one of the higher points to open the trace detail view.

4. Explore the trace waterfall:

   The waterfall shows a horizontal bar for each span, indented to reflect parent-child relationships. The `otelhttp` library creates two spans per service boundary: a server span named after the matched route (`POST /api/notes`) and a client span named after the HTTP method (`HTTP POST`) for each outbound call:

   ```
   frontend: POST /api/notes  [=====================================] 134ms
     frontend: HTTP POST       [===================================] 130ms
       backend: POST /api/notes   [========================] 100ms
         backend: HTTP POST           [================] 56ms
           database: POST /notes         [===============] 55ms
         backend: HTTP POST                              [========] 25ms
   ```

   Bar length represents duration. The server span for each service is the parent of that service’s outgoing client spans.
-
Click a span to expand its attributes:
The attributes on any span come from three distinct layers—the application itself, the Go OTel SDK semantic conventions, and the sidecar collector processors.
Frontend span (
frontend: POST /api/notes)Attribute Value / description http.methodGEThttp.status_code200http.routeMatched route pattern (for example,
/)baggage.client.platformweb— set by the frontend baggage middleware and propagated to every downstream span via the W3Cbaggageheaderbaggage.request.sourceworkshop-demo— identifies the request origin; visible on all three service spans in this traceBackend span (
backend: POST /api/notes)Attribute Value / description http.methodPOSThttp.status_code200db.systemchainsql— identifies the downstream data storedb.operationINSERT— derived from the HTTP method (POST → INSERT,GET → SELECT)db.sql.tablenotes— extracted from the request pathpeer.servicedatabase— the logical name of the service callednet.peer.namedatabase— DNS hostname of the database servicedb.response.status_codeHTTP status returned by the database service
baggage.client.platformweb— forwarded from the W3CbaggageHTTP headerbaggage.request.sourceworkshop-demo— forwarded from the W3CbaggageHTTP headerDatabase span (
database: POST /notes)Attribute Value / description db.systemchainsqldb.operationINSERTdb.sql.tablenotesnote.idUnique ID assigned to the created note record
note.titleTitle string from the request body
note.content.lengthCharacter length of the note content
event.idID of the audit event recorded alongside the note
event.sourceService name that triggered the event
event.http_statusHTTP status of the original request that created the event
Sidecar-added attributes (present on every span regardless of service):
-
Kubernetes attributes (
k8sattributesprocessor):k8s.pod.name,k8s.deployment.name,k8s.namespace.name -
Resource attributes (
resourcedetectionprocessor):cloud.platform,k8s.cluster.name
-
-
Observe W3C Baggage propagation:

Baggage is a key-value store carried inside the `baggage` HTTP header alongside `traceparent`. The frontend sets two members before every outbound request:

```
baggage: client.platform=web,request.source=workshop-demo
```

Each downstream service reads this header and records every baggage member as a span attribute prefixed with `baggage.`. This means the same logical context—where the request came from—is searchable as a span attribute on every service in the chain.

All TraceQL queries below include `{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" }` to scope results to your namespace. The Tempo backend is shared across all workshop users, so this filter is required to see only your traces.

Filter all traces that originated from the web frontend:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["baggage.client.platform"] = "web" }
```

Or find every span produced by the workshop demo requests, regardless of service:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["baggage.request.source"] = "workshop-demo" }
```

-
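The baggage mechanism above can be sketched in plain Go. This stdlib-only, illustrative parser turns a `baggage` header value into `baggage.`-prefixed span-attribute pairs; the workshop services actually use the OpenTelemetry propagator API rather than hand parsing, and `baggageToAttributes` is an invented name for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// baggageToAttributes parses a W3C baggage header value such as
// "client.platform=web,request.source=workshop-demo" into span
// attributes, prefixing each key with "baggage." the way the
// workshop services record them.
func baggageToAttributes(header string) map[string]string {
	attrs := map[string]string{}
	for _, member := range strings.Split(header, ",") {
		member = strings.TrimSpace(member)
		// A member may carry properties after ';' — keep only key=value.
		if i := strings.Index(member, ";"); i >= 0 {
			member = member[:i]
		}
		key, value, ok := strings.Cut(member, "=")
		if !ok || key == "" {
			continue // skip malformed members
		}
		attrs["baggage."+strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return attrs
}

func main() {
	attrs := baggageToAttributes("client.platform=web,request.source=workshop-demo")
	fmt.Println(attrs["baggage.client.platform"]) // web
	fmt.Println(attrs["baggage.request.source"])  // workshop-demo
}
```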
Use TraceQL to find slow database spans:

Click Show query beneath the filter bar to reveal the TraceQL editor. TraceQL is the Tempo query language, similar to PromQL for Prometheus or LogQL for Loki.

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && resource.service.name = "database" && duration > 50ms }
```

This filters to only traces in your namespace where the database service had at least one span exceeding 50ms. Because random delays of up to 60ms are injected in the backend and database services, you should see several results.

Query by the specific table the slow operation touched:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["db.sql.table"] = "notes" && duration > 30ms }
```

Or filter by database operation type:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["db.operation"] = "INSERT" && resource.service.name = "database" }
```

-
Use TraceQL to surface business-level data:

The custom `note.*` and `event.*` attributes open the trace store as a queryable record of application events. Find all traces that created a note:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["note.id"] != "" }
```

Or find traces by peer service to understand the backend-to-database communication pattern:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["peer.service"] = "database" && span["db.sql.table"] = "notes" }
```

-
Find error traces:

```
{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && status = error }
```

If any requests returned HTTP errors, this will surface the traces where something went wrong, with all spans visible for root cause analysis.
-
Correlate a trace with logs:

Note the Trace ID shown at the top of a trace detail view (a 32-character hex string).

Navigate to Observe → Logs.

Query for log lines containing that trace ID:

```
{kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |= "<your-trace-id>"
```

Because the application uses `otelslog.NewHandler()`, every structured log line emitted during a traced request carries the active trace ID as a log field. This lets you move directly from a slow span to the exact log lines emitted during that span.
Verify
-
✓ Traces appear in Observe → Traces for namespace `%OPENSHIFT_USERNAME%-observability-demo`

-
✓ Trace waterfall shows spans from all three services (frontend, backend, database)
-
✓ Frontend spans carry `baggage.client.platform=web` and `baggage.request.source=workshop-demo`

-
✓ Backend spans carry `db.system`, `db.operation`, `db.sql.table`, `peer.service`, `db.response.status_code`

-
✓ Database spans carry `note.id`, `note.title`, `note.content.length` on note-creation requests

-
✓ All spans include Kubernetes metadata (`k8s.pod.name`, `k8s.deployment.name`)

-
✓ TraceQL query `{ .k8s.namespace.name = "%OPENSHIFT_USERNAME%-observability-demo" && span["baggage.request.source"] = "workshop-demo" }` returns results

-
✓ A trace ID from the trace view can be found in Observe → Logs
What you learned: The workshop application uses two complementary instrumentation strategies. The sidecar collector processors automatically enrich spans with Kubernetes metadata—no application code required. The Go OTel SDK adds semantic-convention attributes (`db.*`, `peer.*`, `net.*`) for production-grade observability, business-level attributes (`note.*`, `event.*`) for application-specific queries, and W3C Baggage to propagate contextual metadata across all service boundaries. TraceQL lets you query across infrastructure, application, and business dimensions using a single query language.
Exercise 8: Explore the central collector pipeline
The central collector in observability-demo receives telemetry from all sidecar collectors and routes it to three backends. Inspect the configuration and verify each export path.
Steps
-
View the central collector configuration:
```shell
oc get opentelemetrycollector central-collector \
  -n observability-demo \
  -o jsonpath='{.spec.config}' | yq .
```

Note the four export destinations:
-

`otlp/tempo` → `tempo-tempo-distributor.openshift-tempo-operator.svc:4317` — traces, TLS + bearer token auth + `X-Scope-OrgID: dev` header

-

`prometheusremotewrite` → COO Prometheus `/api/v1/write` — application metrics pushed from pods

-

`prometheusremotewrite` (via `traces/spanmetrics` pipeline) → same endpoint — span-derived RED metrics

-

`otlphttp/logs` → LokiStack gateway at `openshift-logging` — logs, OTLP/HTTP JSON + bearer token + service CA TLS
-
-
Inspect the spanmetrics connector:

```shell
oc get opentelemetrycollector central-collector \
  -n observability-demo \
  -o jsonpath='{.spec.config}' | yq '.connectors.spanmetrics'
```

The `spanmetrics` connector acts as both an exporter (receiving spans from the traces pipeline) and a receiver (producing metric records for the `traces/spanmetrics` pipeline). It generates latency histograms and call-count counters for every `service.name` + `span.name` combination automatically.
Query the generated span metrics in Prometheus:

Navigate to Observe → Metrics and enter:

```
sum(rate(traces_span_metrics_calls_total{service_name="frontend"}[5m])) by (span_name)
```

This shows the request rate per operation for the frontend service—derived solely from traces, with no Prometheus client library code in the application.

To see the p95 latency for backend operations:

```
histogram_quantile(0.95,
  sum(rate(traces_span_metrics_duration_milliseconds_bucket{k8s_namespace_name="%OPENSHIFT_USERNAME%-observability-demo", k8s_deployment_name="backend"}[5m])) by (span_name, le)
)
```
Understand the logs pipeline:

The central collector runs a dedicated logs pipeline that maps OTEL resource attributes to the label keys expected by the LokiStack `openshift-logging` multi-tenancy gateway:

```yaml
processors:
  resource/logs:  # (1)
    attributes:
      - {key: kubernetes.namespace_name, from_attribute: k8s.namespace.name, action: upsert}
      - {key: kubernetes.pod_name, from_attribute: k8s.pod.name, action: upsert}
      - {key: kubernetes.container_name, from_attribute: k8s.container.name, action: upsert}
      - {key: log_type, value: application, action: upsert}
  transform/logs:  # (2)
    log_statements:
      - context: log
        statements:
          - set(attributes["level"], ConvertCase(severity_text, "lower"))
exporters:
  otlphttp/logs:  # (3)
    endpoint: https://logging-loki-gateway-http.openshift-logging.svc.cluster.local:8080/api/logs/v1/application/otlp
    encoding: json
    tls:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    auth:
      authenticator: bearertokenauth  # (4)
```

1. LokiStack in `openshift-logging` tenancy mode requires `kubernetes.namespace_name` (not `k8s.namespace.name`) for tenant routing.
2. Derives a `level` field from OTEL `severity_text` (lowercased) for log filtering in the Loki UI.
3. Sends log records as OTLP/HTTP JSON to the LokiStack gateway's application tenant endpoint.
4. Uses the pod's service account token for bearer auth—the central collector SA has `loki.grafana.com/application` create rights via ClusterRoleBinding.

Verify a log record arrived in Loki by navigating to Observe → Logs in the OpenShift console and querying:

```
{kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} | json
```

You should see structured log records from `frontend`, `backend`, and `database`, each containing `traceID` and `spanID` fields that link them to the traces you viewed in Exercise 7.
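To see what `histogram_quantile()` computes in the p95 query above, here is a stdlib-only Go sketch: locate the cumulative bucket containing the target rank, then interpolate linearly inside it. The `quantile` function is an invented, simplified model for illustration; the real PromQL implementation additionally handles rate-adjusted counts, NaNs, and edge cases.

```go
package main

import "fmt"

// quantile mimics the core of Prometheus histogram_quantile(): given
// cumulative bucket counts and their upper bounds ("le"), find the
// bucket the target rank falls into and interpolate linearly within it.
func quantile(q float64, les []float64, cumulative []float64) float64 {
	total := cumulative[len(cumulative)-1] // count in the +Inf bucket
	rank := q * total
	lower, prevCount := 0.0, 0.0
	for i, le := range les {
		if cumulative[i] >= rank {
			// Linear interpolation between the bucket's bounds.
			width := le - lower
			inBucket := cumulative[i] - prevCount
			return lower + width*(rank-prevCount)/inBucket
		}
		lower = le
		prevCount = cumulative[i]
	}
	return les[len(les)-1] // rank fell into +Inf: clamp to last finite bound
}

func main() {
	// Upper bounds in ms and cumulative counts; the last count is +Inf.
	les := []float64{10, 50, 100}
	counts := []float64{20, 80, 95, 100} // 100 observations total
	fmt.Printf("%.1f\n", quantile(0.95, les, counts))
}
```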
Verify
-
✓ Central collector configuration shows `otlp/tempo`, `prometheusremotewrite`, and `otlphttp/logs` exporters

-

✓ `spanmetrics` connector configuration is visible

-

✓ `traces_span_metrics_calls_total` metric exists in Prometheus for your services

-

✓ `traces_span_metrics_duration_milliseconds_bucket` histogram is queryable for p95 latency

-

✓ Observe → Logs shows structured log records from `%OPENSHIFT_USERNAME%-observability-demo` with `traceID` fields
What you learned: The central collector is a fanout hub—one OTLP receiver, four exporters. Each signal type is independently processed and routed: traces go to Tempo (with TLS and multi-tenancy headers), metrics are remote-written to COO Prometheus, span-derived RED metrics follow the same path via the spanmetrics connector, and logs are attribute-remapped and forwarded to the LokiStack application tenant using service-account bearer auth.
Exercise 9: Zero-code Python auto-instrumentation
The workshop application includes a fourth service: notifier, a Python/FastAPI microservice. The backend calls notifier after every note create, update, or delete operation. The notifier records the event in the database service.
Open src/notifier/app.py and notice what is absent: there are no OpenTelemetry imports of any kind. The file contains only FastAPI route handlers and an httpx call. Yet by the end of this exercise, full traces—including spans for every notifier HTTP call—will appear in the trace waterfall.
Open the Source Code tab in the workshop application and select notifier/app.py to compare it with the Go services.
This demonstrates the difference between the manual SDK approach used by the Go services and the zero-code auto-instrumentation provided by the OpenTelemetry Operator.
| Service | Language | Instrumentation method |
|---|---|---|
| `frontend` | Go | Manual SDK (`otelhttp`) |
| `backend` | Go | Manual SDK (`otelhttp`) |
| `database` | Go | Manual SDK (`otelhttp`) |
| `notifier` | Python | Zero-code: OTel Operator init-container injection |
Step 1: Observe the missing notifier span
Before enabling auto-instrumentation, generate traffic and look at a trace waterfall.
-
Generate several note-creation requests:
```shell
FRONTEND_URL=$(oc get route frontend \
  -n %OPENSHIFT_USERNAME%-observability-demo \
  -o jsonpath='{.spec.host}')

for i in $(seq 1 10); do
  curl -sk -o /dev/null -X POST "https://$FRONTEND_URL/api/notes" \
    -H "Content-Type: application/json" \
    -d "{\"title\":\"Auto-instrumentation test $i\",\"content\":\"Exercise 9\"}"
  sleep 1
done
```
Open Observe → Traces in the OpenShift console, set namespace to
`%OPENSHIFT_USERNAME%-observability-demo`, and click a recent trace.

The waterfall will show three hops: `frontend → backend → database`. The backend called notifier, but no notifier span appears because the Python process emits no telemetry without an agent.
Step 2: Annotate the notifier deployment
A single `oc patch` command adds the two annotations that trigger both sidecar injection (same as for the Go services) and Python agent injection, and sets the sidecar's service account.
-
Patch the notifier deployment:
```shell
oc patch deployment notifier \
  -n %OPENSHIFT_USERNAME%-observability-demo \
  --type=json \
  -p='[
    {
      "op": "add",
      "path": "/spec/template/metadata/annotations",
      "value": {
        "sidecar.opentelemetry.io/inject": "sidecar",
        "instrumentation.opentelemetry.io/inject-python": "my-instrumentation"
      }
    },
    {
      "op": "add",
      "path": "/spec/template/spec/serviceAccountName",
      "value": "otel-collector-sidecar"
    }
  ]'
```

The annotation value `my-instrumentation` references the Instrumentation CR you created in Exercise 5 in the same namespace.

When this annotated pod is scheduled, the OpenTelemetry Operator's mutating admission webhook injects an init container that downloads `opentelemetry-distro` and `opentelemetry-instrumentation-fastapi` into a shared volume. The Python process picks them up via `PYTHONPATH` and a configurator hook—no code changes or image rebuilds required.
Watch the rollout:
```shell
oc rollout status deployment/notifier -n %OPENSHIFT_USERNAME%-observability-demo
```
Confirm the pod now has two containers (application + sidecar):
```shell
oc get pods -n %OPENSHIFT_USERNAME%-observability-demo \
  -l app=notifier \
  -o custom-columns='NAME:.metadata.name,CONTAINERS:.spec.containers[*].name'
```

Expected output:

```
NAME             CONTAINERS
notifier-xxxxx   notifier, otc-container
```
Verify the Python SDK is active by checking notifier logs:
```shell
oc logs -n %OPENSHIFT_USERNAME%-observability-demo \
  -l app=notifier -c notifier | head -20
```

Look for OpenTelemetry bootstrap messages such as `Instrumenting FastAPI` or `OpenTelemetry SDK configured`.
Step 3: Generate traffic and observe the four-hop trace
-
Send another batch of note-creation requests:
```shell
FRONTEND_URL=$(oc get route frontend \
  -n %OPENSHIFT_USERNAME%-observability-demo \
  -o jsonpath='{.spec.host}')

for i in $(seq 1 15); do
  curl -sk -o /dev/null -X POST "https://$FRONTEND_URL/api/notes" \
    -H "Content-Type: application/json" \
    -d "{\"title\":\"Traced note $i\",\"content\":\"With notifier\"}"
  sleep 1
done
echo "Done"
```
Return to Observe → Traces in the console. A note-creation trace will now show a fourth hop. The `backend: HTTP POST` client span that previously had no child now has a notifier server span beneath it:

```
frontend: POST /api/notes          [=====================================] 134ms
  frontend: HTTP POST              [===================================] 130ms
    backend: POST /api/notes       [========================] 100ms
      backend: HTTP POST           [================] 56ms
        database: POST /notes      [===============] 55ms
      backend: HTTP POST           [========] 25ms
        notifier: POST /notify     [======] 20ms
          notifier: HTTP POST      [===] 10ms
```

The `notifier` spans are produced entirely by `opentelemetry-instrumentation-fastapi` and `opentelemetry-instrumentation-httpx`—no code was changed in `app.py`.
Click on the `notifier: POST /notify` span and inspect its attributes:

- `http.method`, `http.status_code`, `http.route` — added by FastAPI auto-instrumentation

- `k8s.pod.name`, `k8s.deployment.name` — added by the sidecar `k8sattributes` processor

- `service.name: notifier` — set via the `OTEL_SERVICE_NAME` env var injected by the operator
-
-
Notice that the `traceparent` context passed from backend to notifier is preserved correctly: the notifier's server span appears as a child of the backend's client span, maintaining a single unified trace tree.
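The propagation that keeps these spans in one tree rests on the W3C `traceparent` header, whose format is `version-traceid-spanid-flags`. As a stdlib-only sketch (the real work is done by OTel propagators in both Go and Python), this invented `parseTraceparent` helper splits the header so a downstream service can keep the trace ID and record the caller's span ID as its parent:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C traceparent header such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
// into its trace ID and parent span ID. A downstream service keeps
// the trace ID (same trace) and records the span ID as its parent,
// which is what slots the notifier's spans under the backend's.
func parseTraceparent(header string) (traceID, parentSpanID string, ok bool) {
	parts := strings.Split(header, "-")
	// Field widths: version(2)-trace-id(32)-parent-id(16)-flags(2).
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	tid, psid, ok := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(ok)   // true
	fmt.Println(tid)  // 4bf92f3577b34da6a3ce929d0e0e4736
	fmt.Println(psid) // 00f067aa0ba902b7
}
```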
Compare: manual SDK vs auto-instrumentation
| Characteristic | Go (manual SDK) | Python (auto-instrumentation) |
|---|---|---|
| Code change required | Yes (`otelhttp` + SDK setup) | No |
| Image rebuild required | Yes | No |
| Activation mechanism | SDK initialization in application code | Pod annotation |
| Span granularity | Full control (custom spans) | Framework-level (HTTP in/out) |
| Custom attributes | Full control — `note.*`, `event.*` attributes | Limited without code changes |
| W3C Baggage | Yes — frontend injects, all services propagate | Yes — read from the incoming `baggage` header |
| Best for | Apps with source access | Apps without source access or rapid onboarding |
Verify
-
✓ Notifier pod has two running containers: `notifier` and `otc-container`

-

✓ Notifier logs show OpenTelemetry SDK bootstrap messages

-

✓ Traces for note creation show four service hops: `frontend → backend → notifier → database`

-

✓ Notifier spans are children of the backend span (W3C Trace Context propagation works)

-

✓ `k8s.pod.name` and `k8s.deployment.name` attributes are present on notifier spans

-

✓ `app.py` was not modified at any point in this exercise
What you learned: The Instrumentation CR is a namespace-level template. A single annotation—`instrumentation.opentelemetry.io/inject-python`—causes the OpenTelemetry Operator's admission webhook to inject an init container that downloads and configures the Python agent at pod start. No source changes, no image rebuilds, no SDK imports. The W3C Trace Context standard ensures the notifier's spans slot directly into the existing trace tree built by the Go services.
Learning outcomes
By completing this module, you should now understand:
-
✓ A trace is the complete journey of a request; a span is one operation within that trace
-
✓ Context propagation via W3C `traceparent` headers links spans across service boundaries automatically
✓ Tempo stores traces on object storage with distinct distributor (write) and query-frontend (read) components
-
✓ The sidecar-to-central collector pattern decouples application-facing collection from backend-facing export
-
✓ An `OpenTelemetryCollector` in `sidecar` mode injects a collector container into annotated pods without any image changes
✓ The `k8sattributes` and `resourcedetection` processors automatically enrich telemetry with Kubernetes and infrastructure metadata
✓ TraceQL filters traces by service name, operation, duration, status, and any span attribute
-
✓ The `spanmetrics` connector in the central collector generates RED metrics from traces, eliminating the need for a Prometheus client library
✓ Auto-instrumentation via the `Instrumentation` CR enables zero-code telemetry for Python applications
✓ A Python/FastAPI service can produce full traces—including W3C context propagation—with zero source-code changes
Business impact: You now have a complete three-signal observability pipeline for all four microservices. A single request to your application automatically produces:
-
A distributed trace showing the exact call path and per-service timing
-
RED metrics (rate, errors, duration) queryable in Prometheus—without any metrics code in the application
-
Structured log records correlated to the trace via trace ID—without any manual log attribute configuration
Module summary
You activated the full distributed tracing and OpenTelemetry pipeline for the workshop application—from infrastructure verification through to live trace visualization, TraceQL queries, and zero-code Python auto-instrumentation.
What you accomplished:
-
Verified the Tempo distributed tracing backend and understood its component architecture
-
Learned the two-tier sidecar-to-central OpenTelemetry Collector topology
-
Explored the OpenTelemetry SDK instrumentation already built into the Go services
-
Verified the OpenTelemetry Operator and the pre-deployed central collector in `observability-demo`

-
Created a sidecar `OpenTelemetryCollector` CR—carries all three signals (traces, metrics, logs) to the central collector over a single gRPC connection
Created an `Instrumentation` CR—used as the agent injection template for Python auto-instrumentation
Enabled the Go SDK and triggered sidecar injection on the three Go deployments
-
Analyzed live traces in Observe → Traces using TraceQL, identified span-level bottlenecks, and correlated spans with log records via shared trace IDs
-
Explored the central collector’s full routing: traces → Tempo, metrics + RED metrics → COO Prometheus remote write, logs → LokiStack application tenant
-
Enabled zero-code Python auto-instrumentation on the notifier service using a single pod annotation
-
Observed a four-hop trace (`frontend → backend → notifier → database`) with full context propagation across Go and Python services
Key concepts mastered:
-
Trace and span: A trace is a tree of spans; each span represents one service operation with timing and attributes
-
Context propagation: `traceparent` HTTP headers link spans across service boundaries using the W3C standard
TempoStack: Operator-managed, with distributor, ingester, querier, query-frontend, and compactor components
-
Sidecar mode: The operator injects the collector as a second container into annotated pods
-
k8sattributes and resourcedetection: Processors that attach Kubernetes context metadata to all telemetry
-
Two-tier architecture: Sidecar handles application-facing collection; central collector handles backend-facing export and routing
-
spanmetrics connector: Generates Prometheus-queryable RED metrics automatically from trace spans
-
Auto-instrumentation: `Instrumentation` CR + pod annotation enables agent injection for Python without code changes
Cross-language tracing: W3C Trace Context headers propagate correctly between Go (`otelhttp`) and Python (`opentelemetry-instrumentation-httpx`), forming a single unified trace tree
Continue to the conclusion to review key takeaways and next steps.