Monitoring - All Things Linux

This page covers how to verify service health, inspect logs, collect metrics with Prometheus, and set up alerting for the atl.chat stack.

Health checks

Most Compose services define a Docker health check; The Lounge and the WebPanel are the exceptions (see the table below). You can view overall stack health at a glance:

# Show status of all containers including health
docker compose ps

Check a specific container’s health status:

docker inspect --format='{{.State.Health.Status}}' <container-name>

Per-service health check commands

The table below lists the exact health check command defined in each service’s Compose file. These run automatically at 30-second intervals.

Container	Health check command	What it verifies
`atl-irc-server`	JSON-RPC query via Unix socket (`nc -U /home/unrealircd/unrealircd/data/rpc.socket`)	UnrealIRCd is running and the RPC interface responds
`atl-irc-services`	`pgrep -f atheme-services`	Atheme process is alive
`atl-xmpp-server`	`curl -sf http://localhost:5280/status`	Prosody HTTP server responds on port 5280
`atl-xmpp-nginx`	`wget -q -O - --no-check-certificate https://localhost:5281/health`	Nginx HTTPS proxy for Prosody responds
`atl-bridge`	`pgrep -f bridge.__main__`	Bridge Python process is alive
`atl-thelounge`	(none defined)	No built-in health check — verify manually (see below)
`atl-irc-webpanel`	(none defined)	No built-in health check — verify manually (see below)
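
Because two containers in the table define no health check, a plain `{{.State.Health.Status}}` template can fail or print nothing for them. The following is a minimal sketch, not part of the stack itself, that falls back to `none` in that case. The function names `container_health` and `report_health` are hypothetical helpers introduced for illustration.

```shell
# Print the health status of a container, "none" when it defines no
# health check, or "missing" when docker inspect fails for that name.
container_health() {
  local name="$1" status
  if status=$(docker inspect \
      --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
      "$name" 2>/dev/null); then
    echo "$status"
  else
    echo "missing"
  fi
}

# Report every container passed as an argument, one per line.
report_health() {
  local name
  for name in "$@"; do
    printf '%-20s %s\n' "$name" "$(container_health "$name")"
  done
}
```

For example, `report_health atl-irc-server atl-thelounge` would print one status per container, with `none` expected for The Lounge if it indeed has no health check.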

Manual health verification

For services without a Compose health check, or for deeper verification:

# UnrealIRCd — test TLS connection
openssl s_client -connect localhost:6697 -servername irc.localhost </dev/null 2>/dev/null | head -5

# UnrealIRCd — JSON-RPC info endpoint (requires auth; use WEBPANEL_RPC_USER/PASSWORD from .env)
curl -s -u "$WEBPANEL_RPC_USER:$WEBPANEL_RPC_PASSWORD" \
  -X POST http://localhost:8600/ \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"rpc.info","params":{},"id":1}' | jq .

# Atheme — check the HTTP API responds
curl -sf http://localhost:8081/ && echo "Atheme HTTP OK"

# Prosody — HTTP status page
curl -sf http://localhost:5280/status

# Prosody — XMPP C2S port connectivity
nc -zv localhost 5222

# Bridge — check process is running
docker exec atl-bridge pgrep -f "bridge.__main__" && echo "Bridge running"

# The Lounge — HTTP connectivity
curl -sf http://localhost:9000/ -o /dev/null && echo "The Lounge OK"

# WebPanel — HTTP connectivity
curl -sf http://localhost:8080/ -o /dev/null && echo "WebPanel OK"
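
If you run these checks often, it may help to wrap them so each one prints a uniform verdict. The sketch below assumes the same ports as the commands above; `run_check` and `check_unmonitored` are hypothetical helper names, and the selection of checks is only an example.

```shell
# Run a command quietly and print a one-line verdict for it.
# Usage: run_check LABEL COMMAND [ARGS...]
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Check the services that have no Compose health check, plus XMPP C2S.
# Returns non-zero if any check failed.
check_unmonitored() {
  local failed=0
  run_check "The Lounge" curl -sf http://localhost:9000/ || failed=1
  run_check "WebPanel" curl -sf http://localhost:8080/ || failed=1
  run_check "XMPP C2S" nc -z localhost 5222 || failed=1
  return "$failed"
}
```

Calling `check_unmonitored` prints one `OK` or `FAIL` line per check, and its exit status can be used in scripts.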

Log inspection

Docker Compose logs

# Follow all service logs
docker compose logs --follow

# Follow a specific service
docker compose logs --follow atl-irc-server

# Show last 100 lines for a service
docker compose logs --tail=100 atl-bridge

# Search logs for errors
docker compose logs atl-bridge 2>&1 | grep -i error

# Filter by time range (last hour)
docker compose logs --since=1h atl-xmpp-server

Dozzle (dev profile)

In the dev profile, Dozzle runs on port 8082 and provides a browser-based log viewer for all atl-* containers. Open http://localhost:8082 to view real-time logs with filtering and search. Dozzle is configured with DOZZLE_FILTER=name=atl-* so it only shows atl.chat containers. It is not included in the production profile.

Per-service log destinations

Most services log to stdout/stderr, captured by Docker’s json-file logging driver. Use docker compose logs to view them. The Lounge is an exception — it writes logs to files under /var/opt/thelounge/ (bind-mounted to data/thelounge/ on the host):

Service	Log destination	Notes
UnrealIRCd	stdout (JSON format via `/dev/stdout`)	`docker compose logs atl-irc-server`
Atheme	stdout (via `/dev/stdout` logfile)	`docker compose logs atl-irc-services`
Prosody	stdout (console sink)	`docker compose logs atl-xmpp-server`
Bridge	stderr (loguru)	`docker compose logs atl-bridge`
The Lounge	`/var/opt/thelounge/`	`data/thelounge/`
WebPanel	nginx access/error logs	`docker compose logs atl-irc-webpanel`

Common log patterns to watch for

# UnrealIRCd — connection errors and netsplits
docker compose logs atl-irc-server 2>&1 | grep -iE "error|split|refused"

# Atheme — failed authentication attempts
docker compose logs atl-irc-server atl-irc-services 2>&1 | grep -i "SASL"

# Prosody — authentication failures and certificate issues
docker compose logs atl-xmpp-server 2>&1 | grep -iE "auth.*fail|certificate|tls"

# Bridge — Discord or IRC connection issues
docker compose logs atl-bridge 2>&1 | grep -iE "disconnect|error|reconnect"
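
To turn a pattern like the ones above into a number you can compare against a threshold, you could count matching lines instead of printing them. This is a rough sketch; `count_matches` and `service_log_matches` are hypothetical names, and the one-hour default window is an assumption.

```shell
# Count lines on stdin that match a case-insensitive extended regex.
# grep -c exits non-zero on zero matches, so normalise the exit status.
count_matches() {
  grep -ciE -- "$1" || true
}

# Count matching log lines for one service over a recent window (default 1h).
service_log_matches() {
  local service="$1" pattern="$2" since="${3:-1h}"
  docker compose logs --since="$since" "$service" 2>&1 | count_matches "$pattern"
}
```

For instance, `service_log_matches atl-bridge "disconnect|error|reconnect"` would print how many bridge log lines from the last hour matched.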

Key metrics and thresholds

Monitor these metrics to catch issues before they affect users:

Metric	Source	Warning threshold	Critical threshold
Container health status	`docker inspect`	Any service `unhealthy`	Any service `unhealthy` for > 5 min
IRC client connections	UnrealIRCd RPC (`rpc.info`)	—	Drops to 0 unexpectedly
IRC server-to-server links	UnrealIRCd logs	Link count changes	Netsplit detected
XMPP C2S connections	Prosody OpenMetrics	—	Drops to 0 unexpectedly
XMPP auth failure rate	Prosody OpenMetrics	> 10/min sustained	> 50/min sustained
Bridge process alive	`pgrep` health check	—	Process not found
TLS certificate expiry	Certificate files in `data/certs/`	< 14 days	< 3 days
Disk usage (`data/` volume)	Host filesystem	> 80%	> 95%
Container restart count	`docker inspect`	Any restart	> 3 restarts in 10 min
Memory usage per container	`docker stats`	> 500 MB (Prosody)	> 1 GB

Checking certificate expiry

# Check the IRC TLS certificate expiry date
openssl x509 -enddate -noout -in data/certs/live/irc.localhost/fullchain.pem

# Check all certificates in data/certs/
find data/certs/live -name "fullchain.pem" -exec \
  sh -c 'echo "{}:"; openssl x509 -enddate -noout -in "{}"' \;
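
Rather than reading expiry dates by eye, you can let `openssl x509 -checkend` compare a certificate against the thresholds from the key metrics table (14 days warning, 3 days critical). The sketch below is one possible way to do that; `cert_status` is a hypothetical helper name.

```shell
# Classify a PEM certificate against the expiry thresholds:
# "critical" under 3 days, "warning" under 14 days, otherwise "ok".
# -checkend N exits non-zero if the certificate expires within N seconds.
# An unreadable or invalid file is also reported as "critical".
cert_status() {
  local pem="$1"
  if ! openssl x509 -checkend $((3 * 86400)) -noout -in "$pem" >/dev/null 2>&1; then
    echo "critical"
  elif ! openssl x509 -checkend $((14 * 86400)) -noout -in "$pem" >/dev/null 2>&1; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

For example, `cert_status data/certs/live/irc.localhost/fullchain.pem` prints a single word that a cron job or monitoring script can act on.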

Checking disk and memory usage

# Disk usage of data directories
du -sh data/*/

# Container memory and CPU usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
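
The disk thresholds from the key metrics table (warning above 80%, critical above 95%) can be checked with a few lines of shell. This is a minimal sketch; `classify_disk_pct` and `disk_pct` are hypothetical helper names, and `disk_pct` assumes the POSIX `df -P` output layout with the capacity in the fifth column.

```shell
# Map a usage percentage to a severity using the disk thresholds
# (warning above 80%, critical above 95%).
classify_disk_pct() {
  local pct="$1"
  if [ "$pct" -gt 95 ]; then
    echo "critical"
  elif [ "$pct" -gt 80 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Print the used percentage (digits only) of the filesystem holding a path.
disk_pct() {
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}
```

A typical call would be `classify_disk_pct "$(disk_pct data)"`, run from the directory that contains `data/`.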

Metrics collection with Prometheus

Prosody exposes an OpenMetrics-compatible endpoint via the http_openmetrics module. This is the primary metrics source for the stack.

Prosody OpenMetrics endpoint

The endpoint is available at /metrics on Prosody's HTTP port (5280). To scrape it, add the following job to your prometheus.yml under scrape_configs:

scrape_configs:
  # Prosody XMPP server metrics (OpenMetrics)
  - job_name: xmpp-prosody
    metrics_path: /metrics
    scheme: http
    scrape_interval: 30s
    scrape_timeout: 10s
    static_configs:
      - targets: ["atl-xmpp-server:5280"] # Use container name within Docker network
        labels:
          service: prosody
          environment: production

Note: If Prometheus runs outside the Docker network, replace atl-xmpp-server with the host IP and use the mapped port (default 5280). Ensure PROSODY_OPENMETRICS_CIDR includes the Prometheus server’s IP.

Available Prosody metrics

The http_openmetrics and measure_modules modules expose these metric families:

Category	Metrics	Description
Connections	`prosody_c2s_connections_total`, `prosody_s2s_connections_total`	Client and server-to-server connection counts
Authentication	`prosody_c2s_auth_success_total`, `prosody_c2s_auth_failure_total`	Auth success/failure counters
Messages	`prosody_messages_sent_total`, `prosody_messages_received_total`	Message throughput
Presence	`prosody_presence_sent_total`, `prosody_presence_received_total`	Presence update counters
Storage	`prosody_storage_operations_total`, `prosody_storage_errors_total`	Database operation counters
HTTP	`prosody_http_requests_total`, `prosody_http_request_duration_seconds`	BOSH/WebSocket request metrics
System	`prosody_memory_usage_bytes`, `prosody_cpu_usage_seconds_total`	Resource usage
Modules	`prosody_module_*`	Per-module status gauges (0=ok, 1=info, 2=warn, 3=error)
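
To spot-check a metric without a Prometheus server, you can fetch the endpoint with curl and total the samples of one metric family. The sketch below assumes the text exposition format (`name{labels} value`) and uses a metric name from the table; the names your Prosody version actually exports may differ, and `metric_sum` is a hypothetical helper name.

```shell
# Sum every sample of one metric family from OpenMetrics text on stdin.
# Lines look like: name{label="x"} 12   or   name 12
# Exits non-zero when the metric does not appear at all.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP/TYPE/EOF comment lines
    {
      line = $0
      sub(/\{[^}]*\}/, "", line)        # drop the label set, if any
      n = split(line, f, " ")
      if (n >= 2 && f[1] == name) { sum += f[2]; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}
```

For example: `curl -sf http://localhost:5280/metrics | metric_sum prosody_c2s_auth_failure_total`. The request must come from an address allowed by PROSODY_OPENMETRICS_CIDR.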

Other services

UnrealIRCd, Atheme, The Lounge, and the Bridge do not expose native Prometheus metrics endpoints. Monitor these services using Docker health checks, log analysis, and the JSON-RPC interfaces where available (UnrealIRCd on port 8600, Atheme on port 8081).

Alerting strategy

Choose an alerting approach based on your infrastructure:

Option 1: Prometheus + Alertmanager (recommended for production)

If you already run Prometheus, add alerting rules for the atl.chat stack:

# Example Prometheus alerting rules — add to your rules file
groups:
  - name: atl-chat
    rules:
      # Prosody down
      - alert: ProsodyDown
        expr: up{job="xmpp-prosody"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Prosody XMPP server is down"

      # High XMPP auth failure rate
      - alert: ProsodyHighAuthFailures
        expr: rate(prosody_c2s_auth_failure_total[5m]) * 60 > 10 # rate() is per second; scale to failures per minute
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High XMPP authentication failure rate"

      # High Prosody memory usage
      - alert: ProsodyHighMemory
        expr: prosody_memory_usage_bytes > 500000000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prosody memory usage exceeds 500 MB"

Option 2: Uptime Kuma (lightweight, self-hosted)

Uptime Kuma provides a simple web UI for monitoring endpoints. Configure HTTP/TCP checks for each service:

Check type	Target	Interval
TCP	`localhost:6697` (IRC TLS)	60s
HTTP	`http://localhost:5280/status` (Prosody)	60s
HTTP	`http://localhost:9000` (The Lounge)	60s
HTTP	`http://localhost:8080` (WebPanel)	60s
TCP	`localhost:5222` (XMPP C2S)	60s

Option 3: Cron-based health pings (minimal)

For simple setups, a cron job can run health checks and send notifications on failure:

#!/usr/bin/env bash
# atl-health-check: run from cron every 5 minutes, for example via an entry in
# /etc/cron.d/atl-health-check that calls this script.
# Checks the atl.chat services that define a health check and mails a notification on failure.

SERVICES=("atl-irc-server" "atl-irc-services" "atl-xmpp-server" "atl-bridge")

for svc in "${SERVICES[@]}"; do
  # An empty result means the container does not exist or has no health check
  status=$(docker inspect --format='{{.State.Health.Status}}' "$svc" 2>/dev/null)
  if [ "$status" != "healthy" ]; then
    echo "ALERT: $svc is ${status:-not found}" | mail -s "atl.chat health alert" ops@example.com
  fi
done
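
As written, the script above sends a mail on every run while a service stays unhealthy. One way to reduce the noise is to remember the last status per service and alert only when it changes. This is a sketch under assumptions: the state directory `/var/tmp/atl-health` and the helper name `status_changed` are hypothetical.

```shell
# Directory holding the last seen status per service (hypothetical path).
STATE_DIR="${STATE_DIR:-/var/tmp/atl-health}"

# Return 0 (and record the new value) only when the status differs from the
# one stored by the previous run, so a persistent failure alerts once.
status_changed() {
  local svc="$1" status="$2" file previous=""
  file="$STATE_DIR/$svc"
  mkdir -p "$STATE_DIR"
  if [ -f "$file" ]; then
    previous=$(cat "$file")
  fi
  if [ "$status" = "$previous" ]; then
    return 1
  fi
  printf '%s\n' "$status" > "$file"
  return 0
}

# Example wiring inside the loop of the health check script:
#   if status_changed "$svc" "$status" && [ "$status" != "healthy" ]; then
#     echo "ALERT: $svc is $status" | mail -s "atl.chat health alert" ops@example.com
#   fi
```

With this wiring, a service that stays unhealthy triggers one mail, and a later recovery resets the stored state so the next failure alerts again.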

Grafana dashboard recommendations

If you use Grafana with Prometheus, create dashboards covering:

Connection overview — active C2S/S2S connections over time, auth success/failure rates
Message flow — messages per second, peak usage times
Performance — Prosody memory and CPU usage, HTTP request duration
Errors and health — auth failures, storage errors, container restart counts
Infrastructure — disk usage for data/ volumes, container resource consumption

Related pages

Deployment — production deployment runbook
Troubleshooting — cross-service diagnostic commands and common issues
Backups — backup and restore procedures
Security — secret management and network isolation
SSL/TLS — certificate management and expiry monitoring
Environment Variables — complete variable reference
Ports Reference — complete port registry
