DevOps - Prometheus: Proactive SSL Monitoring with Blackbox Exporter
The Invisible Failure: Expired Certificates
Every seasoned admin has a horror story about a production outage caused by a forgotten SSL certificate. Modern infrastructure can hold hundreds of certificates from different issuers (internal PKI, Let’s Encrypt, Cloudflare), and tracking them in a spreadsheet is a path to failure. You need an automated system that alerts you weeks before a certificate expires.
The Tool: Blackbox Exporter
The Prometheus Blackbox Exporter allows you to probe endpoints from the “outside.” It doesn’t just check if the server is up; it performs the full TLS handshake and extracts the certificate metadata.
1. Configuration
In your blackbox.yml:
modules:
  http_2xx_tls:
    prober: http
    http:
      fail_if_ssl: false      # do not fail just because TLS is present
      fail_if_not_ssl: true   # fail the probe if the endpoint is not served over TLS
2. Prometheus Scrape Job
Tell Prometheus which endpoints to check:
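scrape_configs:
  - job_name: "ssl_expiry"
    metrics_path: /probe
    params:
      module: [http_2xx_tls]
    static_configs:
      - targets:
          - https://example.com
          - https://api.mysite.net:443
    relabel_configs:
      # Pass the original target to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label so each series stays distinct
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the Blackbox exporter
      - target_label: __address__
        replacement: 127.0.0.1:9115 # Blackbox exporter address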
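With hundreds of endpoints, a static target list inside prometheus.yml becomes hard to maintain. One option is Prometheus's file-based service discovery, sketched below. The file path /etc/prometheus/ssl_targets.yml, the refresh interval, and the team label are hypothetical. The relabeling follows the usual Blackbox pattern.

```yaml
# prometheus.yml (fragment): read probe targets from a separate file
scrape_configs:
  - job_name: "ssl_expiry"
    metrics_path: /probe
    params:
      module: [http_2xx_tls]
    file_sd_configs:
      - files:
          - /etc/prometheus/ssl_targets.yml  # hypothetical path
        refresh_interval: 5m                 # assumed value
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115 # Blackbox exporter address
```

```yaml
# /etc/prometheus/ssl_targets.yml (hypothetical): one entry per group of endpoints
- targets:
    - https://example.com
    - https://api.mysite.net:443
  labels:
    team: platform  # hypothetical label, handy for routing alerts
```

Prometheus picks up changes to the targets file without a restart, so adding a certificate to the watch list becomes a one-line change.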
The Alert: The ‘Golden Signal’ for SSL
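The metric you care about is probe_ssl_earliest_cert_expiry. It returns the earliest expiry time among the certificates the server presented, as a Unix timestamp, so subtracting time() gives the number of seconds remaining.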
groups:
  - name: SSL_Alerts
    rules:
      - alert: SSLCertExpiringSoon
        # Fire when fewer than 30 days (in seconds) remain
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
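A single 30-day warning is easy to ignore, and it says nothing when the probe itself stops working. As a sketch, a second rule group could add a tighter threshold and a probe-failure alert. The group name, alert names, 7-day threshold, and `for` durations are illustrative assumptions, not recommended values. The job label matches the scrape job defined earlier.

```yaml
groups:
  - name: SSL_Alerts_Extended   # hypothetical group name
    rules:
      # Escalate when fewer than 7 days remain (assumed threshold)
      - alert: SSLCertExpiryCritical
        expr: probe_ssl_earliest_cert_expiry{job="ssl_expiry"} - time() < 86400 * 7
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
      # A failing probe hides expiry data, so alert on the probe itself too
      - alert: SSLProbeFailed
        expr: probe_success{job="ssl_expiry"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Blackbox probe for {{ $labels.instance }} is failing"
```

Both the warning and the critical alert fire during the last week before expiry. An Alertmanager inhibit rule can silence the warning while the critical alert is active.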
Tips & Tricks
- SNI Issues: If you host multiple sites on one IP, ensure the Blackbox prober is sending the correct server_name (SNI). In newer versions, this is derived automatically from the hostname in the target parameter.
- Internal PKI: To monitor internal services using a private CA, the exporter has to trust that CA, for example by mounting your CA certificate into the Blackbox Exporter’s container at /etc/ssl/certs.
- The ‘Chain of Trust’ Check: probe_ssl_last_chain_info is a niche metric that describes the certificate chain the probe verified (for example, its fingerprint). It helps you spot servers that do not send the full chain, including intermediates. A browser might work with a missing intermediate, but many API clients (like Python’s requests) will fail.
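For the SNI and internal PKI tips, an alternative to changing the container's system trust store is a dedicated module that points at the CA file through tls_config. This is a minimal sketch: the module name, the file path, and the host name are hypothetical.

```yaml
modules:
  http_2xx_internal_tls:          # hypothetical module name
    prober: http
    http:
      fail_if_not_ssl: true
      tls_config:
        ca_file: /etc/blackbox/internal-ca.pem   # hypothetical path to your private CA
        # Only needed when the certificate name differs from the host in the target URL
        server_name: internal.example.com        # hypothetical host name
```

A scrape job selects this module with `module: [http_2xx_internal_tls]`. A fixed server_name applies to every target probed through the module, so it is usually better left out unless a target is addressed by IP.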
Summary
Proactive monitoring is the hallmark of an experienced administrator. By integrating SSL expiry into your Prometheus/Alertmanager stack, you move from “Firefighting” to “Fire Prevention.”