Secrets Management Beyond a Vault
The Vault Is Not a Secrets Program
Most teams ship Vault or AWS Secrets Manager, point their services at it, and check the box. Audit asks if secrets are centralized — yes. Security review asks if secrets are encrypted at rest — yes. The conversation ends there, and nobody asks the harder question: what happens when a secret leaks anyway?
The answer is almost always "we rotate it manually, file an incident ticket, and spend two days tracing which services need updated environment variables." That is not a secrets program. That is a filing cabinet with a better lock.
A real secrets program has three layers that most teams skip: automated rotation so leaked credentials expire before they can be weaponized, just-in-time (JIT) access so credentials only exist when a workload needs them, and blast-radius analysis so you know — before an incident — exactly what an attacker gains by compromising any given secret.
This post is about building those three layers on top of whatever central store you already have.
The Rotation Gap
Static secrets are the original sin of secrets management. A database password that hasn't rotated in 18 months is a ticking clock. If that credential surfaces in a git history, a debug log, a curl command in Slack, or an overly verbose error message, an attacker has unlimited time to act.
The standard advice is "rotate secrets every 90 days." The practical reality is that nobody does it unless it's automated, because manual rotation breaks things. Services need restarts, configs need redeployment, and the engineer who owns the rotation is usually the one who gets paged at 2 AM when the old credential expires before the new one propagates.
Automated rotation fixes the process problem by making rotation invisible. The sequence: generate a new credential, roll it out to every consumer, keep the old credential valid through a grace period, then revoke it.
The grace period is critical. If you revoke the old credential the instant the new one exists, any service that cached the old one will break before it gets a chance to renew. A 15-minute overlap handles nearly every practical cache TTL.
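That sequence, including the grace-period overlap, can be sketched as a small driver. The `create`, `publish`, and `revoke` hooks here are hypothetical stand-ins for your secret store and deploy tooling, not a real API:

```python
import time

def rotate(create, publish, revoke, grace_seconds=900):
    """Rotate one credential with a grace-period overlap.

    create/publish/revoke are hypothetical hooks into your secret store
    and deploy tooling; grace_seconds=900 is the 15-minute overlap.
    """
    new_cred = create()            # 1. generate the replacement
    old_cred = publish(new_cred)   # 2. roll it out; returns the credential it replaced
    time.sleep(grace_seconds)      # 3. let every cached copy of the old one expire
    revoke(old_cred)               # 4. only now revoke the old credential
    return new_cred
```

The ordering is the whole point: revocation happens last, after the overlap, so nothing that cached the old credential breaks mid-renewal.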
Implementing Rotation with Vault Dynamic Secrets
Vault's dynamic secrets engine generates credentials on demand rather than storing static ones. For Postgres this looks like:
```hcl
# vault-database-config.hcl
path "database/config/prod-postgres" {
  capabilities = ["create", "update"]
}

path "database/roles/app-readonly" {
  capabilities = ["read"]
}
```

```shell
# Configure the database engine
vault secrets enable database

vault write database/config/prod-postgres \
    plugin_name=postgresql-database-plugin \
    connection_url="postgresql://{{username}}:{{password}}@postgres:5432/proddb" \
    allowed_roles="app-readonly,app-readwrite" \
    username="vault-root" \
    password="$VAULT_ROOT_PASSWORD"

# Create a role with a 1-hour TTL
vault write database/roles/app-readonly \
    db_name=prod-postgres \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl="1h" \
    max_ttl="24h"
```

Now each service requests a credential at startup:
```python
import os

import hvac

def get_db_credential():
    client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
    secret = client.secrets.database.generate_credentials(name="app-readonly")
    return {
        "username": secret["data"]["username"],
        "password": secret["data"]["password"],
        "lease_id": secret["lease_id"],
        "lease_duration": secret["lease_duration"],
    }
```

The credential expires after an hour. If it leaks in a log file, the attacker has at most 60 minutes — and only the narrow permissions of that role.
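One practical wrinkle: a 1-hour TTL means the service must renew its lease, or fetch a fresh credential, before the hour is up. A sketch of a renewal loop, where `client` is the `hvac.Client` from above; the renew-at-three-quarters policy and the helper names are my assumptions, not Vault requirements:

```python
import threading

def renewal_interval(lease_duration, fraction=0.75, floor=10.0):
    """Seconds to wait before renewing: a fraction of the TTL, never below a floor."""
    return max(lease_duration * fraction, floor)

def keep_renewed(client, lease_id, lease_duration, stop: threading.Event):
    """Renew a Vault lease in a loop until `stop` is set.

    `client` is an hvac.Client. Sketch only: production code should
    re-fetch a fresh credential when renewal fails or max_ttl is hit.
    """
    interval = renewal_interval(lease_duration)
    while not stop.wait(interval):
        resp = client.sys.renew_lease(lease_id=lease_id, increment=lease_duration)
        interval = renewal_interval(resp["lease_duration"])
```

Run it in a daemon thread next to the connection pool; when renewal finally hits `max_ttl`, fall back to calling `get_db_credential()` again.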
Just-in-Time Access
Dynamic secrets handle data plane credentials. JIT handles control plane access — the credentials your engineers use to log into production systems, run database queries, or access cloud consoles.
The pattern: no persistent credentials for humans. When an engineer needs production access, they request it, the system grants a short-lived credential, and the credential disappears when the session ends.
This eliminates a huge attack surface. There are no long-lived admin passwords to phish. There is no "shared prod credentials" document in Confluence. Every access is tied to a specific person, a specific reason, and a specific time window — all audit-logged.
For AWS, this maps to temporary STS credentials via IAM Identity Center. For databases, Vault's database engine covers it. For SSH, Vault's SSH secrets engine issues signed certificates that expire:
```shell
# Request a signed SSH certificate good for 30 minutes
vault write ssh-client-signer/sign/prod-engineer \
    public_key=@~/.ssh/id_ed25519.pub \
    valid_principals="ubuntu" \
    ttl="30m"
```

Blast-Radius Analysis
The unglamorous part of secrets management is the question nobody wants to model: "If this secret leaks, what exactly can an attacker do?"
Blast-radius analysis means building and maintaining a map of what each secret grants access to. It lives somewhere queryable — a spreadsheet is fine to start, a proper asset inventory is better.
The columns that matter:
| Secret | Grants access to | Scope | Rotation TTL | Leak impact |
|---|---|---|---|---|
| db/prod-readwrite | Production Postgres | All tables, write | 1h dynamic | High — PII + financial data |
| aws/deploy-role | S3 deployment bucket | Write to one prefix | 1h STS | Medium — can corrupt deploys |
| stripe/webhook-signing | Stripe webhook validation | Verify inbound webhooks | Manual, 90d | Low — can spoof events |
| github/actions-token | GitHub Actions | Repo push, no admin | Per-workflow | Medium — can tamper artifacts |
The "Leak impact" column forces a conversation your team probably hasn't had. Not every secret is equal. A leaked read-only analytics credential is annoying. A leaked production database write credential is a breach.
This map also tells you where to invest in tighter controls. High-impact secrets should have shorter TTLs, stricter access policies, and more aggressive alerting on anomalous usage.
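Once the map is queryable, the "where to invest" question becomes a filter. A sketch over the example rows above; the inventory format is illustrative, not a standard schema:

```python
# The table above as data; extend with your real inventory.
INVENTORY = [
    {"secret": "db/prod-readwrite", "impact": "high", "rotation": "1h dynamic"},
    {"secret": "aws/deploy-role", "impact": "medium", "rotation": "1h STS"},
    {"secret": "stripe/webhook-signing", "impact": "low", "rotation": "manual 90d"},
    {"secret": "github/actions-token", "impact": "medium", "rotation": "per-workflow"},
]

def rotation_gaps(inventory):
    """Secrets whose rotation posture doesn't match their impact:
    anything rotated manually, plus any high-impact secret that isn't dynamic."""
    return [
        entry["secret"]
        for entry in inventory
        if entry["rotation"].startswith("manual")
        or (entry["impact"] == "high" and "dynamic" not in entry["rotation"])
    ]
```

A spreadsheet works at this size; the payoff of structured data is that "which manual-rotation secrets should be dynamic" becomes a one-line query instead of a meeting.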
Secret Detection in CI/CD
Secrets leak through code. Everyone knows this. Fewer teams do anything systematic about it beyond a one-time git history scan.
The right posture is detection at the point of commit, before the secret ever reaches a remote:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        entry: gitleaks protect --staged --redact --config=.gitleaks.toml
        language: golang
        pass_filenames: false
```

```toml
# .gitleaks.toml
[extend]
useDefault = true

[[rules]]
id = "internal-service-token"
description = "Internal service token"
regex = '''svc_[a-zA-Z0-9]{32,}'''
tags = ["internal", "service-token"]

[allowlist]
description = "Allowlisted patterns"
regexes = [
    '''EXAMPLE_KEY''',
    '''PLACEHOLDER_SECRET''',
]
```

Add the same check in CI so commits that bypassed the pre-commit hook get caught before merge:
```yaml
# .github/workflows/secret-scan.yml
name: Secret Scan
on: [pull_request, push]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

When a secret is detected, the response matters as much as the detection. Have a runbook ready: revoke the secret immediately (assume it's compromised the moment it touches a remote), rotate to a new one, and run a log analysis to determine if the old one was used externally.
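During the log-analysis step of that runbook it helps to reuse the same pattern outside git. The custom gitleaks rule from earlier reduces to a plain regex; a minimal sketch for grepping logs or pasted text, with the helper name my own:

```python
import re

# Same pattern and placeholders as the gitleaks config above.
SERVICE_TOKEN = re.compile(r"svc_[a-zA-Z0-9]{32,}")
ALLOWLIST = ("EXAMPLE_KEY", "PLACEHOLDER_SECRET")

def find_leaked_tokens(text: str) -> list[str]:
    """Service tokens found in `text`; lines containing an allowlisted
    placeholder are skipped, mirroring the gitleaks allowlist."""
    hits = []
    for line in text.splitlines():
        if any(marker in line for marker in ALLOWLIST):
            continue
        hits.extend(SERVICE_TOKEN.findall(line))
    return hits
```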
Connecting the Pieces
None of these layers work in isolation. Rotation without blast-radius analysis means you're rotating secrets you don't fully understand. JIT without detection means engineers still paste secrets into Slack to share them. Detection without rotation means you find the leak after it's already been sitting in git history for six months.
The integrated flow: detection catches a leaked secret at the commit, the blast-radius map tells you immediately what that secret exposes, and automated rotation replaces it before an attacker has time to act.
Key Takeaways
- Centralizing secrets in Vault or Secrets Manager is necessary but not sufficient — rotation, JIT, and detection are the layers that make it a real program.
- Dynamic secrets with short TTLs are the most effective single control: a leaked credential that expires in an hour limits attacker opportunity regardless of how it leaked.
- Just-in-time human access eliminates the persistent admin credentials that are the most dangerous targets in any environment.
- Blast-radius analysis should be done before an incident, not during one — you need to know what each secret grants before you're scrambling to contain a breach.
- Secret detection in pre-commit hooks catches the most common leak vector (accidental commits) at the cheapest possible point in the pipeline.
- Build a runbook for "secret confirmed leaked" that the team has practiced — response speed matters as much as detection.