This page covers how to verify service health, inspect logs, collect metrics with Prometheus, and set up alerting for the atl.chat stack.

Health checks

Every Compose service defines a Docker health check. You can view overall stack health at a glance:
# Show status of all containers including health
docker compose ps
Check a specific container’s health status:
docker inspect --format='{{.State.Health.Status}}' <container-name>
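To check every container in one pass instead of one at a time, the status strings can be filtered with a small helper. This is a minimal sketch, not part of the stack: the function name `filter_unhealthy` is hypothetical, and the wiring in the trailing comment assumes all containers are named `atl-*`. Containers without a health check are reported with the placeholder status `none`, so they show up in the output too.

```shell
#!/usr/bin/env bash
# Reads "name status" pairs on stdin and prints the names whose status is
# anything other than "healthy" (for example "unhealthy", "starting", "none").
# A leading "/" on the name, as printed by docker inspect, is stripped.
filter_unhealthy() {
  awk '$2 != "healthy" { sub(/^\//, "", $1); print $1 }'
}

# Example wiring (assumes Docker is available and containers are named atl-*):
# docker inspect \
#   --format '{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
#   $(docker ps -aq --filter name=atl-) | filter_unhealthy
```

An empty result means every listed container reports healthy.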

Per-service health check commands

The table below lists the exact health check command defined in each service’s Compose file. These run automatically at 30-second intervals.
| Container | Health check command | What it verifies |
| --- | --- | --- |
| atl-irc-server | JSON-RPC query via Unix socket (nc -U /home/unrealircd/unrealircd/data/rpc.socket) | UnrealIRCd is running and the RPC interface responds |
| atl-irc-services | pgrep -f atheme-services | Atheme process is alive |
| atl-xmpp-server | curl -sf http://localhost:5280/status | Prosody HTTP server responds on port 5280 |
| atl-xmpp-nginx | wget -q -O - --no-check-certificate https://localhost:5281/health | Nginx HTTPS proxy for Prosody responds |
| atl-bridge | pgrep -f bridge.__main__ | Bridge Python process is alive |
| atl-thelounge | (none defined) | No built-in health check; verify manually (see below) |
| atl-irc-webpanel | (none defined) | No built-in health check; verify manually (see below) |

Manual health verification

For services without a Compose health check, or for deeper verification:
# UnrealIRCd — test TLS connection
openssl s_client -connect localhost:6697 -servername irc.localhost </dev/null 2>/dev/null | head -5

# UnrealIRCd — JSON-RPC info endpoint (requires auth; use WEBPANEL_RPC_USER/PASSWORD from .env)
curl -s -u "$WEBPANEL_RPC_USER:$WEBPANEL_RPC_PASSWORD" \
  -X POST http://localhost:8600/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"rpc.info","params":{},"id":1}' | jq .

# Atheme — check the HTTP API responds
curl -sf http://localhost:8081/ && echo "Atheme HTTP OK"

# Prosody — HTTP status page
curl -sf http://localhost:5280/status

# Prosody — XMPP C2S port connectivity
nc -zv localhost 5222

# Bridge — check process is running
docker exec atl-bridge pgrep -f "bridge.__main__" && echo "Bridge running"

# The Lounge — HTTP connectivity
curl -sf http://localhost:9000/ -o /dev/null && echo "The Lounge OK"

# WebPanel — HTTP connectivity
curl -sf http://localhost:8080/ -o /dev/null && echo "WebPanel OK"
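When running several of these checks together, a small wrapper keeps the output uniform. The sketch below is illustrative only: `check` is a hypothetical helper name, and the example calls in the comments simply reuse commands from the list above.

```shell
#!/usr/bin/env bash
# check LABEL COMMAND [ARGS...]
# Runs the command with its output discarded and prints "OK   LABEL" or
# "FAIL LABEL". Returns non-zero on failure so callers can count failures.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}

# Example calls, reusing the commands above:
# check "Prosody status" curl -sf http://localhost:5280/status
# check "XMPP C2S port"  nc -z localhost 5222
# check "The Lounge"     curl -sf http://localhost:9000/ -o /dev/null
```

Because the helper discards command output, rerun a failing command directly to see its error message.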

Log inspection

Docker Compose logs

# Follow all service logs
docker compose logs --follow

# Follow a specific service
docker compose logs --follow atl-irc-server

# Show last 100 lines for a service
docker compose logs --tail=100 atl-bridge

# Search logs for errors
docker compose logs atl-bridge 2>&1 | grep -i error

# Filter by time range (last hour)
docker compose logs --since=1h atl-xmpp-server

Dozzle (dev profile)

In the dev profile, Dozzle runs on port 8082 and provides a browser-based log viewer for all atl-* containers. Open http://localhost:8082 to view real-time logs with filtering and search. Dozzle is configured with DOZZLE_FILTER=name=atl-* so it only shows atl.chat containers. It is not included in the production profile.

Per-service log destinations

Most services log to stdout/stderr, captured by Docker’s json-file logging driver. Use docker compose logs to view them. The Lounge is an exception — it writes logs to files under /var/opt/thelounge/ (bind-mounted to data/thelounge/ on the host):
| Service | Log destination | Notes |
| --- | --- | --- |
| UnrealIRCd | stdout (JSON format via /dev/stdout) | docker compose logs atl-irc-server |
| Atheme | stdout (via /dev/stdout logfile) | docker compose logs atl-irc-services |
| Prosody | stdout (console sink) | docker compose logs atl-xmpp-server |
| Bridge | stderr (loguru) | docker compose logs atl-bridge |
| The Lounge | /var/opt/thelounge/ | data/thelounge/ |
| WebPanel | nginx access/error logs | docker compose logs atl-irc-webpanel |

Common log patterns to watch for

# UnrealIRCd — connection errors and netsplits
docker compose logs atl-irc-server 2>&1 | grep -iE "error|split|refused"

# Atheme — failed authentication attempts
docker compose logs atl-irc-services 2>&1 | grep -i "SASL"

# Prosody — authentication failures and certificate issues
docker compose logs atl-xmpp-server 2>&1 | grep -iE "auth.*fail|certificate|tls"

# Bridge — Discord or IRC connection issues
docker compose logs atl-bridge 2>&1 | grep -iE "disconnect|error|reconnect"
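For trend watching, a count of matching lines is often more useful than the lines themselves. The following is a rough sketch under the assumption that a simple line count per time window is enough; `count_matches` is a hypothetical helper, not something the stack ships.

```shell
#!/usr/bin/env bash
# count_matches PATTERN
# Counts stdin lines matching the extended regex PATTERN, case-insensitively.
# grep exits non-zero when nothing matches, so "|| true" keeps the function
# successful while still printing 0.
count_matches() {
  grep -ciE -- "$1" || true
}

# Example: bridge connection problems logged in the last hour
# docker compose logs --since=1h atl-bridge 2>&1 | count_matches 'disconnect|error|reconnect'
```

Running it on a schedule and comparing successive counts gives a crude error rate for services that expose no metrics endpoint.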

Key metrics and thresholds

Monitor these metrics to catch issues before they affect users:
| Metric | Source | Warning threshold | Critical threshold |
| --- | --- | --- | --- |
| Container health status | docker inspect | Any service unhealthy | Any service unhealthy for > 5 min |
| IRC client connections | UnrealIRCd RPC (rpc.info) | Drops to 0 unexpectedly | |
| IRC server-to-server links | UnrealIRCd logs | Link count changes | Netsplit detected |
| XMPP C2S connections | Prosody OpenMetrics | Drops to 0 unexpectedly | |
| XMPP auth failure rate | Prosody OpenMetrics | > 10/min sustained | > 50/min sustained |
| Bridge process alive | pgrep health check | Process not found | |
| TLS certificate expiry | Certificate files in data/certs/ | < 14 days | < 3 days |
| Disk usage (data/ volume) | Host filesystem | > 80% | > 95% |
| Container restart count | docker inspect | Any restart | > 3 restarts in 10 min |
| Memory usage per container | docker stats | > 500 MB (Prosody) | > 1 GB |

Checking certificate expiry

# Check the IRC TLS certificate expiry date
openssl x509 -enddate -noout -in data/certs/live/irc.localhost/fullchain.pem

# Check all certificates in data/certs/
find data/certs/live -name "fullchain.pem" -exec \
  sh -c 'echo "{}:"; openssl x509 -enddate -noout -in "{}"' \;
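The expiry dates can also be mapped onto the 14-day warning and 3-day critical thresholds from the metrics table. This sketch relies on `openssl x509 -checkend`, which exits non-zero when the certificate expires within the given number of seconds. The function name `cert_status` is hypothetical, and an unreadable or missing file is deliberately reported as critical.

```shell
#!/usr/bin/env bash
# cert_status FILE
# Prints "critical" if the certificate expires within 3 days (or cannot be
# read), "warning" if it expires within 14 days, and "ok" otherwise.
cert_status() {
  local cert=$1
  local day=86400   # seconds per day
  if ! openssl x509 -checkend $((3 * day)) -noout -in "$cert" >/dev/null 2>&1; then
    echo critical
  elif ! openssl x509 -checkend $((14 * day)) -noout -in "$cert" >/dev/null 2>&1; then
    echo warning
  else
    echo ok
  fi
}

# Example:
# cert_status data/certs/live/irc.localhost/fullchain.pem
```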

Checking disk and memory usage

# Disk usage of data directories
du -sh data/*/

# Container memory and CPU usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
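To compare disk usage against the 80% and 95% thresholds listed earlier, the filesystem percentage can be read from `df` and classified. This is a sketch only; `disk_percent` and `classify_usage` are hypothetical names, and the parsing assumes the POSIX `df -P` layout with the capacity in the fifth column.

```shell
#!/usr/bin/env bash
# disk_percent PATH
# Prints the usage percentage (digits only) of the filesystem holding PATH.
disk_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# classify_usage PERCENT
# Maps an integer percentage to "critical" (> 95), "warning" (> 80), or "ok".
classify_usage() {
  local pct=$1
  if [ "$pct" -gt 95 ]; then
    echo critical
  elif [ "$pct" -gt 80 ]; then
    echo warning
  else
    echo ok
  fi
}

# Example:
# classify_usage "$(disk_percent data)"
```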

Metrics collection with Prometheus

Prosody exposes an OpenMetrics-compatible endpoint via the http_openmetrics module. This is the primary metrics source for the stack.

Prosody OpenMetrics endpoint

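The endpoint is available over Prosody's HTTP listener (port 5280, path /metrics in the configuration below). To scrape it, add the following to your prometheus.yml under scrape_configs: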
scrape_configs:
  # Prosody XMPP server metrics (OpenMetrics)
  - job_name: xmpp-prosody
    metrics_path: /metrics
    scheme: http
    scrape_interval: 30s
    scrape_timeout: 10s
    static_configs:
      - targets: ["atl-xmpp-server:5280"] # Use container name within Docker network
        labels:
          service: prosody
          environment: production
Note: If Prometheus runs outside the Docker network, replace atl-xmpp-server with the host IP and use the mapped port (default 5280). Ensure PROSODY_OPENMETRICS_CIDR includes the Prometheus server’s IP.
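Before pointing Prometheus at the endpoint, it can help to fetch it by hand and read a single metric. The sketch below assumes the endpoint is reachable at http://localhost:5280/metrics (the port and path from the scrape config above) and that label values contain no spaces. `metric_sum` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# metric_sum NAME
# Reads Prometheus/OpenMetrics text on stdin and prints the sum of all samples
# named NAME, with or without labels. Comment lines (# HELP, # TYPE) are
# skipped. Exits non-zero if the metric is absent.
# Assumption: label values contain no spaces, so the value is field 2.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }
    {
      split($1, parts, "{")            # drop any {label="..."} suffix
      if (parts[1] == name) { sum += $2; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}

# Example:
# curl -sf http://localhost:5280/metrics | metric_sum prosody_c2s_auth_failure_total
```

If the curl call fails from the Prometheus host, the PROSODY_OPENMETRICS_CIDR setting mentioned in the note above is a likely place to look.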

Available Prosody metrics

The http_openmetrics and measure_modules modules expose these metric families:
| Category | Metrics | Description |
| --- | --- | --- |
| Connections | prosody_c2s_connections_total, prosody_s2s_connections_total | Client and server-to-server connection counts |
| Authentication | prosody_c2s_auth_success_total, prosody_c2s_auth_failure_total | Auth success/failure counters |
| Messages | prosody_messages_sent_total, prosody_messages_received_total | Message throughput |
| Presence | prosody_presence_sent_total, prosody_presence_received_total | Presence update counters |
| Storage | prosody_storage_operations_total, prosody_storage_errors_total | Database operation counters |
| HTTP | prosody_http_requests_total, prosody_http_request_duration_seconds | BOSH/WebSocket request metrics |
| System | prosody_memory_usage_bytes, prosody_cpu_usage_seconds_total | Resource usage |
| Modules | prosody_module_* | Per-module status gauges (0=ok, 1=info, 2=warn, 3=error) |

Other services

UnrealIRCd, Atheme, The Lounge, and the Bridge do not expose native Prometheus metrics endpoints. Monitor these services using Docker health checks, log analysis, and the JSON-RPC interfaces where available (UnrealIRCd on port 8600, Atheme on port 8081).

Alerting strategy

Choose an alerting approach based on your infrastructure.

Option 1: Prometheus alerting rules

If you already run Prometheus, add alerting rules for the atl.chat stack:
# Example Prometheus alerting rules — add to your rules file
groups:
  - name: atl-chat
    rules:
      # Prosody down
      - alert: ProsodyDown
        expr: up{job="xmpp-prosody"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Prosody XMPP server is down"

      # High XMPP auth failure rate
      # rate() is per second; multiply by 60 to compare against the 10/min threshold
      - alert: ProsodyHighAuthFailures
        expr: rate(prosody_c2s_auth_failure_total[5m]) * 60 > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High XMPP authentication failure rate"

      # High Prosody memory usage
      - alert: ProsodyHighMemory
        expr: prosody_memory_usage_bytes > 500000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prosody memory usage exceeds 500 MB"

Option 2: Uptime Kuma (lightweight, self-hosted)

Uptime Kuma provides a simple web UI for monitoring endpoints. Configure HTTP/TCP checks for each service:
| Check type | Target | Interval |
| --- | --- | --- |
| TCP | localhost:6697 (IRC TLS) | 60s |
| HTTP | http://localhost:5280/status (Prosody) | 60s |
| HTTP | http://localhost:9000 (The Lounge) | 60s |
| HTTP | http://localhost:8080 (WebPanel) | 60s |
| TCP | localhost:5222 (XMPP C2S) | 60s |

Option 3: Cron-based health pings (minimal)

For simple setups, a cron job can run health checks and send notifications on failure:
#!/usr/bin/env bash
# atl.chat health check script. Schedule it every 5 minutes from a crontab
# entry, for example one placed in /etc/cron.d/atl-health-check.
# Checks the core atl.chat services and sends a notification on failure.

SERVICES=("atl-irc-server" "atl-irc-services" "atl-xmpp-server" "atl-bridge")

for svc in "${SERVICES[@]}"; do
  # If the container does not exist, docker inspect fails; report "not found"
  status=$(docker inspect --format='{{.State.Health.Status}}' "$svc" 2>/dev/null) || status="not found"
  if [ "$status" != "healthy" ]; then
    echo "ALERT: $svc is $status" | mail -s "atl.chat health alert" ops@example.com
  fi
done
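As written, the script above sends a mail on every run while a service stays unhealthy. One way to reduce noise is to alert only when the status changes. The helper below is a sketch under that assumption: `state_changed` and the default state directory `/var/tmp/atl-health` are hypothetical, and any directory writable by the cron user would work.

```shell
#!/usr/bin/env bash
# state_changed SERVICE STATUS [STATE_DIR]
# Succeeds (exit 0) only when STATUS differs from the status recorded on the
# previous call for SERVICE, then records the new status for next time.
state_changed() {
  local svc=$1 status=$2 dir=${3:-/var/tmp/atl-health}
  local file="$dir/$svc.status"
  local previous=""
  mkdir -p "$dir"
  [ -f "$file" ] && previous=$(cat "$file")
  printf '%s\n' "$status" > "$file"
  [ "$status" != "$previous" ]
}

# Example use inside the loop of the cron script:
# if state_changed "$svc" "$status" && [ "$status" != "healthy" ]; then
#   echo "ALERT: $svc is $status" | mail -s "atl.chat health alert" ops@example.com
# fi
```

Dropping the second condition in the example would also send a notification when a service recovers.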

Grafana dashboard recommendations

If you use Grafana with Prometheus, create dashboards covering:
  1. Connection overview — active C2S/S2S connections over time, auth success/failure rates
  2. Message flow — messages per second, peak usage times
  3. Performance — Prosody memory and CPU usage, HTTP request duration
  4. Errors and health — auth failures, storage errors, container restart counts
  5. Infrastructure — disk usage for data/ volumes, container resource consumption