Module 2: Logging with LokiStack

After implementing user workload monitoring in Module 1, you now need to address log management challenges. While metrics show when issues occur, investigating root causes requires efficient log analysis across distributed services.

Your organization runs a distributed application with 12+ microservices. When issues occur, developers currently access individual pod logs manually. This process is time-consuming and error-prone, especially when correlating events across multiple services.

In this module, you’ll implement centralized logging with LokiStack and learn to write LogQL queries that quickly surface relevant log entries across your entire application.

Learning objectives

By the end of this module, you’ll be able to:

  • Understand LokiStack architecture and how it differs from traditional logging solutions

  • Configure ClusterLogForwarder to send application logs to Loki

  • Write LogQL queries to search and filter logs across multiple services

  • Correlate logs with metrics to accelerate troubleshooting

  • Create log-based alerts for critical application events

Understanding LokiStack architecture

Before configuring logging, you need to understand how LokiStack works and why it’s efficient for Kubernetes environments.

What is Loki?

Loki is a horizontally scalable, highly available log aggregation system inspired by Prometheus. Unlike traditional logging systems that index the full text of log messages, Loki only indexes metadata (labels) and stores log content separately.

Key differences from traditional logging:

  • Traditional (Elasticsearch): Indexes every word in every log message → high storage cost, slower writes

  • Loki: Indexes only labels (namespace, pod, container) → lower storage cost, faster writes, efficient for Kubernetes

LokiStack components:

  • Distributor: Receives log streams from clients

  • Ingester: Writes logs to storage and serves recent queries

  • Querier: Handles LogQL queries against stored logs

  • Compactor: Maintains index and deletes expired logs

Log forwarding flow

In OpenShift, logs flow through this pipeline:

  1. Application writes logs to stdout/stderr

  2. Container runtime captures logs to node filesystem

  3. OpenShift Logging Operator’s collector pods read logs

  4. ClusterLogForwarder routes logs to LokiStack

  5. Loki stores logs with labels for querying

You’ll configure the ClusterLogForwarder to ensure application logs reach Loki.

Exercise 1: Verify logging infrastructure

You need to verify that the logging stack was deployed via GitOps and is ready to receive logs.

The LokiStack instance and logging operator were pre-configured, so you’ll confirm the components are running.

Steps

  1. Log into the OpenShift console at OpenShift Console (if not already logged in)

  2. Verify the logging operator is running:

    oc get pods -n openshift-logging
    Expected output
    NAME                                           READY   STATUS    RESTARTS   AGE
    cluster-logging-operator-xxxxx                 1/1     Running   0          1h
    collector-xxxxx                                2/2     Running   0          1h
    collector-xxxxx                                2/2     Running   0          1h
    logging-loki-compactor-0                       1/1     Running   0          1h
    logging-loki-distributor-xxxxx                 1/1     Running   0          1h
    logging-loki-gateway-xxxxx                     2/2     Running   0          1h
    logging-loki-index-gateway-0                   1/1     Running   0          1h
    logging-loki-ingester-0                        1/1     Running   0          1h
    logging-loki-querier-xxxxx                     1/1     Running   0          1h
    logging-loki-query-frontend-xxxxx              1/1     Running   0          1h
  3. Check the LokiStack instance:

    oc get lokistack -n openshift-logging logging-loki -o custom-columns=Name:metadata.name,Size:.spec.size
    Expected output
    Name           Size
    logging-loki   1x.extra-small

    The status should be Ready.

  4. Verify the ClusterLogForwarder configuration:

    oc get clusterlogforwarder -n openshift-logging instance -o custom-columns=Name:metadata.name,Outputs:.spec.outputs[0].name
    Expected output
    Name       Outputs
    instance   lokistack
  5. View the ClusterLogForwarder configuration:

    oc get clusterlogforwarder -n openshift-logging instance -o json| jq .spec

    Look for pipelines that forward application and infrastructure logs to the default output (LokiStack).

Verify

Check that your logging infrastructure is operational:

  • ✓ Logging operator pod is Running

  • ✓ Collector pods are Running on each node

  • ✓ LokiStack components (distributor, ingester, querier) are Running

  • ✓ LokiStack status is Ready

  • ✓ ClusterLogForwarder exists with pipelines configured

What you learned: OpenShift Logging uses a collector (based on Vector) to gather logs from all pods and forward them to LokiStack according to ClusterLogForwarder rules.

Troubleshooting

Issue: LokiStack status shows "Degraded" or "Pending"

Solution: . Check LokiStack pods for errors: oc get pods -n openshift-logging | grep loki . Describe the LokiStack: oc describe lokistack logging-loki -n openshift-logging . If reason is MissingObjectStorageSecret, check sync Job status:

+

oc get job logging-loki-s3-secret-sync -n openshift-logging
oc logs job/logging-loki-s3-secret-sync -n openshift-logging
  1. Check for storage issues: oc get pvc -n openshift-logging

  2. Verify storage class exists: oc get storageclass

Issue: No collector pods running

Solution: . Check DaemonSet status: oc get daemonset -n openshift-logging . View collector logs: oc logs -n openshift-logging -l app.kubernetes.io/component=collector --tail=100 . Verify ClusterLogForwarder resource exists: oc get clusterlogforwarder -n openshift-logging

Issue: ClusterLogForwarder not found

Solution: The logging stack may not be fully configured. Check that the GitOps deployment completed successfully.

Exercise 2: Generate application logs

To practice querying logs, you need an application that generates diverse log entries. In this exercise, you’ll deploy a simple three-service Go application:

  • frontend receives HTTP requests and forwards them to backend

  • backend writes request events to database

  • database exposes a REST API backed by persistent ChaiSQL storage on disk

Each service writes structured request logs to stdout, which Loki collects.

The manifest also deploys a k6 load generator that starts automatically once the frontend is healthy. It runs continuously in the background, sending ~10–60 req/s of realistic traffic (pings, note reads/writes, and intentional 404 errors) so your dashboards and queries always have data to work with — no manual traffic generation required.

Steps

  1. Deploy the logging demo application manifest from GitHub:

    NAMESPACE="%OPENSHIFT_USERNAME%-observability-demo"
    curl -sL https://raw.githubusercontent.com/cldmnky/observability-workshop/main/src/deploy.yaml \
      | sed "s/__NAMESPACE__/${NAMESPACE}/g" \
      | oc apply -f -
  2. Wait for the deployments to roll out:

    oc rollout status deployment/frontend -n %OPENSHIFT_USERNAME%-observability-demo
    oc rollout status deployment/backend -n %OPENSHIFT_USERNAME%-observability-demo
    oc rollout status statefulset/database -n %OPENSHIFT_USERNAME%-observability-demo
    oc rollout status deployment/loadgenerator -n %OPENSHIFT_USERNAME%-observability-demo

    The application images are pre-built and hosted on quay.io/cldmnky/observability/. Pods should start within a minute or two. The load generator uses an init container to wait for the frontend to be healthy before it starts sending traffic.

  3. Verify all deployments are running, including the load generator:

    oc get pods -n %OPENSHIFT_USERNAME%-observability-demo -l app=frontend
    oc get pods -n %OPENSHIFT_USERNAME%-observability-demo -l app=backend
    oc get pods -n %OPENSHIFT_USERNAME%-observability-demo -l app=database
    oc get pods -n %OPENSHIFT_USERNAME%-observability-demo -l app=loadgenerator
  4. Verify the frontend Route and open the UI:

    oc get route frontend -n %OPENSHIFT_USERNAME%-observability-demo
    FRONTEND_HOST=$(oc get route frontend -n %OPENSHIFT_USERNAME%-observability-demo -o jsonpath='{.spec.host}')
    echo "https://${FRONTEND_HOST}"
  5. Open the frontend URL in your browser, create at least two workshop notes, and click Export Markdown.

    Demo application UI

    The exported file (workshop-notes.md) confirms notes are persisted and downloadable from the frontend.

  6. Confirm the load generator is producing traffic by checking its logs:

    oc logs -n %OPENSHIFT_USERNAME%-observability-demo deployment/loadgenerator --tail=20

    You should see k6 output showing iteration counts and request results. The load generator runs ~10–60 req/s continuously, mixing successful requests with intentional 404 errors — so logs, metrics, and traces are populated immediately.

  7. View logs from each tier to see request traces:

    oc logs -n %OPENSHIFT_USERNAME%-observability-demo deployment/frontend --tail=20
    oc logs -n %OPENSHIFT_USERNAME%-observability-demo deployment/backend --tail=20
    oc logs -n %OPENSHIFT_USERNAME%-observability-demo statefulset/database --tail=20
    Sample output
    2026/02/11 10:23:14 service=frontend method=POST path=/api/notes status=201 duration_ms=5
    2026/02/11 10:23:15 service=frontend method=GET path=/ping status=200 duration_ms=3
    2026/02/11 10:23:16 service=frontend method=GET path=/error status=404 duration_ms=4
    2026/02/11 10:23:14 service=backend method=POST path=/api/notes status=201 duration_ms=3
    2026/02/11 10:23:16 service=backend method=GET path=/api/error status=404 duration_ms=2
    2026/02/11 10:23:14 service=database method=POST path=/notes status=201 duration_ms=1
    2026/02/11 10:23:16 service=database method=POST path=/events status=201 duration_ms=1

    Open the console and navigate to Workloads → Pods → Aggregated Logs to see logs from all pods in the namespace.

    Aggregated logs view in OpenShift Console

Verify

Check that logs are being generated:

  • frontend logs include /ping (200) and /not-found (404) from the load generator

  • backend logs show forwarded note create and read requests

  • database logs show persisted /events API calls

  • loadgenerator pod is running and its logs show k6 iteration output

  • ✓ Frontend Route is reachable and serves the notes UI

  • ✓ Notes can be exported from the frontend as workshop-notes.md

  • ✓ Logs visible via oc logs command

  • ✓ All five pods are running in the %OPENSHIFT_USERNAME%-observability-demo namespace

What you learned: Applications write logs to stdout/stderr, which OpenShift captures and forwards to your logging backend. Multiple services can run in the same namespace.

Exercise 3: Query logs with LogQL

Now you’ll access the console’s Observe/Logs web interface and write LogQL queries to search logs across all your applications.

LogQL is Loki’s query language, similar to PromQL but for logs instead of metrics.

Steps

  1. Access the OpenShift console and navigate to ObserveLogs

    This opens the log query interface powered by Loki.

The Logs interface will display a warning saying "Forbidden: Missing permissions to get logs". This is expected and can be ignored, as the query interface will still work for Loki queries.

Click on the "show query" button to open the LogQL query editor

LogQL query editor in OpenShift Console
  1. Write your first LogQL query to see all logs from your namespace:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |json

    Click Run query

    You should see log entries from all pods in the %OPENSHIFT_USERNAME%-observability-demo namespace.

    LogQL query results showing logs from all pods
  2. Filter logs to show only error responses (404):

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |= "404" |json

    The |= operator filters for log lines containing "404".

  3. Query logs from a specific pod:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo", kubernetes_pod_name=~"frontend.*"} |json

    The =~ operator matches a regular expression.

  4. Query frontend note activity generated from the UI:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo", kubernetes_pod_name=~"frontend.*"} |= "/api/notes" |json

This shows note creation and note update calls from the frontend route.

  1. Combine multiple filters to find successful frontend requests:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo", kubernetes_pod_name=~"frontend.*"} |= "path=/ping" |= "status=200" |json

    This shows successful /ping requests from the frontend pod.

  2. Use LogQL to count error rates:

    sum(rate({kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |= "404" [5m]))

    This calculates the rate of 404 errors per second over the last 5 minutes.

  3. Query logs from multiple applications:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |json | line_format "{{.kubernetes_pod_name}}: {{.message}}"

    The |json stage parses the collector’s JSON envelope, extracting labeled fields such as message, kubernetes_pod_name, and kubernetes_namespace_name. line_format then templates a new display line from those fields.

  4. Use | unwrap to compute average response latency directly from log fields:

    LogQL unwrapping enables metric-style aggregations over numeric values embedded in log lines. The pipeline follows three stages:

    1. Parse|json, | logfmt, or | regexp extracts named labels from the log line

    2. Unwrap| unwrap <field> promotes a label to a numeric sample value; optionally use duration() or bytes() for unit conversion

    3. Aggregate — range-vector functions avg_over_time, sum_over_time, max_over_time, min_over_time, quantile_over_time operate on the unwrapped samples

    This is valuable when a key signal (latency, error codes, payload size) lives only in log lines and has not yet been instrumented as a Prometheus metric.

    avg_over_time(
      {
        kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo",
        kubernetes_pod_name=~"frontend.*"
      }
      |json
      | regexp `duration_ms=(?P<duration_ms>\d+)`
      | unwrap duration_ms [5m]
    )

    |json parses the JSON envelope; | regexp extracts duration_ms from the message field as a new label; | unwrap duration_ms promotes it to a numeric sample value; avg_over_time computes the average frontend response time in milliseconds over the last 5 minutes — without requiring a separate Prometheus metric.

Verify

Check that you can query logs effectively:

  • ✓ Logs from %OPENSHIFT_USERNAME%-observability-demo namespace are visible

  • ✓ Filter queries return only matching log lines

  • ✓ Pod-specific queries work with regex patterns

  • ✓ Rate calculations show error frequency

  • ✓ Logs from multiple applications can be queried together

  • | unwrap computes numeric aggregations over values embedded in log lines

What you learned: LogQL uses labels (like namespace, pod name) to select log streams, then filters content with operators like |= (contains) and != (does not contain). Parsing stages (|json, | regexp) extract structured fields, and | unwrap unlocks metric-style aggregations over those fields.

Troubleshooting

Issue: No logs appear in query results

Solution: . Wait 1-2 minutes for logs to be ingested by Loki . Verify pods are running: oc get pods -n %OPENSHIFT_USERNAME%-observability-demo . Check collector is forwarding logs: oc logs -n openshift-logging -l app.kubernetes.io/component=collector --tail=100 . Verify time range in query interface (expand to last 1 hour)

Issue: Query returns "Error: parse error"

Solution: . Check LogQL syntax (labels in braces, filters with pipes) . Ensure label names are correct: kubernetes_namespace_name not namespace . Use quotes around label values: {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"}

Issue: Rate queries return no data

Solution: . Ensure enough matching logs exist (generate more traffic) . Check time range: [5m] requires data in last 5 minutes . Verify filter matches log content: |= "404" requires "404" in log lines

Exercise 4: Correlate logs with metrics

The real power of observability comes from correlating different signals. You’ll use metrics from Module 1 alongside logs to diagnose an issue faster.

Steps

  1. Generate a burst of errors to simulate an incident:

    oc run -n %OPENSHIFT_USERNAME%-observability-demo error-generator --rm -i --restart=Never --image=quay.io/curl/curl:8.11.1 -- \
      sh -c 'for i in $(seq 1 100); do curl -s -o /dev/null http://frontend:8080/error; sleep 0.1; done'
  2. Check metrics to detect the traffic spike:

    Navigate to ObserveMetrics

    Query:

    sum by (service) (rate(http_requests_total{code="404", namespace="%OPENSHIFT_USERNAME%-observability-demo}[5m]))

    You should see a traffic spike while the error generator is running.

  3. Switch to ObserveLogs and correlate with log data:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |= "404" |= "error" | json

    This shows the actual error log messages during the spike.

  4. Use time range alignment to see the correlation:

    In the logs interface, set the time range to match when you saw the metric spike (last 15 minutes).

    Compare the timestamp of log entries with the metric spike time.

  5. Extract structured data from logs:

    {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"}
      |= "404"
      | regexp `(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*method=(?P<method>\w+).*status=(?P<status>\d{3})`
      | line_format "{{.timestamp}} - {{.method}} {{.status}}"

    This extracts timestamp, HTTP method, and status code from log lines.

OpenShift also provides out-of-the-box correlation between metrics and logs in the console.

Navigate to Workloads → Deployments → In the top right, click the "dotted kube" icon → "Signal Correlation"

Signal correlation view in OpenShift Console

Verify

Check that you can correlate metrics and logs:

  • ✓ Metric query shows frontend traffic spike

  • ✓ Log query reveals error messages during the same time period

  • ✓ Time ranges align between metrics and logs views

  • ✓ Structured data can be extracted from log lines

  • ✓ Correlation reduces investigation time from minutes to seconds

What you learned: Metrics tell you when and where problems occur. Logs provide the detailed context explaining why. Together, they reduce mean time to resolution through unified analysis.

Troubleshooting

Issue: Metric and log timestamps don’t align

Solution: . Check timezone settings in console (both views should use same timezone) . Verify system clocks are synchronized across cluster nodes . Allow 30-60 seconds for metric scraping and log ingestion delays

Issue: Cannot extract structured data with regexp

Solution: . Verify log format matches regex pattern . View raw logs first: {kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} . Test regex pattern against actual log samples . Use simpler filters first, then add complexity

Exercise 5: Log-based alerting

Just as Prometheus evaluates PromQL expressions to generate metric-based alerts, LokiStack includes a built-in ruler component that evaluates LogQL expressions on a schedule and fires alerts when thresholds are exceeded. Those alerts are then forwarded to Alertmanager, which handles routing, grouping, silencing, and notification — exactly the same pipeline used by metric alerts.

How it works

The Loki ruler evaluates AlertingRule resources (a CRD provided by the Loki Operator). When an expression matches, the ruler sends the alert to Alertmanager, which routes it according to its normal configuration (email, PagerDuty, Slack receivers, etc.).

For the application tenant, AlertingRule resources must be created in the application namespace and that namespace must carry the label openshift.io/loki-rules: "true". The workshop pre-provisions this label on your demo namespace.

A complete AlertingRule that fires when the 404 error rate exceeds 0.5 per second for 2 minutes looks like this:

apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: high-error-rate
  namespace: %OPENSHIFT_USERNAME%-observability-demo
  labels:
    openshift.io/loki-rules: "true"
spec:
  tenantID: application
  groups:
  - name: application-errors
    interval: 30s
    rules:
    - alert: HighApplicationErrorRate
      expr: |
        sum(rate({kubernetes_namespace_name="%OPENSHIFT_USERNAME%-observability-demo"} |= "404" [5m])) > 0.5
      for: 2m
      labels:
        severity: warning
        team: platform
      annotations:
        summary: "High error rate in %OPENSHIFT_USERNAME%-observability-demo"
        description: "More than 0.5 errors/second for 2 minutes in namespace %OPENSHIFT_USERNAME%-observability-demo."

The key fields map directly to Prometheus alerting semantics:

Field Meaning

expr

LogQL expression evaluated by the Loki ruler. A non-zero result means the alert is active.

for

The expression must stay true for this duration before the alert transitions from Pending to Firing.

labels

Key/value pairs attached to the alert — used by Alertmanager for routing and silencing.

annotations

Human-readable summary and description shown in the console and notifications.

Alert lifecycle

Once created, the alert follows the same lifecycle as any Prometheus alert:

  1. Inactive — LogQL expression evaluates to zero (no 404s above threshold)

  2. Pending — Expression is true but for duration has not elapsed yet

  3. Firing — Expression has been true for longer than for; Alertmanager receives the alert and routes it

Navigate to ObserveAlertingAlerting rules in the OpenShift console to see log-based alerts alongside metric-based ones.

What you learned: LokiStack’s ruler component evaluates LogQL expressions and fires alerts through the same Alertmanager pipeline used by Prometheus. This means log patterns — error bursts, missing heartbeats, security events — can trigger the same notifications and on-call workflows as metric alerts, with no additional tooling required.

Learning outcomes

By completing this module, you should now understand:

  • ✓ LokiStack’s label-based indexing approach reduces storage costs compared to full-text indexing

  • ✓ ClusterLogForwarder routes application logs from pods to Loki for centralized storage

  • ✓ LogQL queries use labels to select streams and filters to search log content

  • ✓ Correlating logs with metrics reduces troubleshooting time through unified analysis

  • ✓ LokiStack’s ruler evaluates LogQL expressions and forwards alerts to Alertmanager, using the same routing and notification pipeline as metric-based alerts

Business impact:

You’ve eliminated the manual log searching process. Instead of accessing individual pods and searching files manually, you now have:

  • Centralized logs from all 12+ microservices in 1 queryable system

  • Ability to search across all services simultaneously with LogQL

  • Correlation between metric spikes and log messages

  • Automated alerts on critical log patterns

Estimated time savings: Reduced log investigation time from 30-60 minutes to 2-5 minutes per incident.

Next steps: Module 3 will add distributed tracing, enabling you to follow individual requests across all microservices to identify performance bottlenecks.

Module summary

You successfully implemented centralized logging with LokiStack.

What you accomplished:

  • Verified the LokiStack logging infrastructure deployed via GitOps

  • Generated diverse application logs across multiple services

  • Wrote LogQL queries to filter and search logs across distributed applications

  • Correlated metric spikes with detailed log messages for faster root cause analysis

  • Learned how LokiStack’s ruler and Alertmanager work together for log-based alerting

Key concepts mastered:

  • LokiStack: Label-based log aggregation system optimized for Kubernetes

  • ClusterLogForwarder: Routes logs from pods to configured outputs (Loki)

  • LogQL: Query language for searching and filtering log streams

  • Log correlation: Combining logs with metrics for comprehensive troubleshooting

  • Log-based alerting: AlertingRule CRs evaluated by the Loki ruler feed alerts into Alertmanager for routing, silencing, and notification

LogQL operators learned:

  • {label="value"}: Select log streams by label

  • |=: Filter for lines containing text

  • !=: Exclude lines containing text

  • |~: Regex match filter

  • rate(): Calculate rate of log entries

  • regexp: Extract structured data from logs

  • | unwrap <field>: Promote a parsed label to a numeric sample for metric aggregations

  • avg_over_time / sum_over_time / max_over_time: Aggregate unwrapped numeric values over a time window

Continue to Module 3 to add distributed tracing capabilities.