
Secrets Management Beyond a Vault

Ravinder · 8 min read
Security · Secrets Management · Vault · Reliability

The Vault Is Not a Secrets Program

Most teams ship Vault or AWS Secrets Manager, point their services at it, and check the box. Audit asks if secrets are centralized — yes. Security review asks if secrets are encrypted at rest — yes. The conversation ends there, and nobody asks the harder question: what happens when a secret leaks anyway?

The answer is almost always "we rotate it manually, file an incident ticket, and spend two days tracing which services need updated environment variables." That is not a secrets program. That is a filing cabinet with a better lock.

A real secrets program has three layers that most teams skip: automated rotation so leaked credentials expire before they can be weaponized, just-in-time (JIT) access so credentials only exist when a workload needs them, and blast-radius analysis so you know — before an incident — exactly what an attacker gains by compromising any given secret.

This post is about building those three layers on top of whatever central store you already have.

The Rotation Gap

Static secrets are the original sin of secrets management. A database password that hasn't rotated in 18 months is a ticking clock. If that credential surfaces in a git history, a debug log, a curl command in Slack, or an overly verbose error message, an attacker has unlimited time to act.

The standard advice is "rotate secrets every 90 days." The practical reality is that nobody does it unless it's automated, because manual rotation breaks things. Services need restarts, configs need redeployment, and the engineer who owns the rotation is usually the one who gets paged at 2 AM when the old credential expires before the new one propagates.

Automated rotation fixes the process problem by making rotation invisible. The sequence looks like this:

sequenceDiagram
    participant Scheduler
    participant Vault
    participant Database
    participant ServiceA
    participant ServiceB
    Scheduler->>Vault: trigger rotation for db/prod-password
    Vault->>Database: generate new credential (new user or password update)
    Database-->>Vault: ack new credential
    Vault->>Vault: store new version, keep old version active (grace period)
    Vault-->>ServiceA: next lease renewal returns new credential
    Vault-->>ServiceB: next lease renewal returns new credential
    Note over ServiceA,ServiceB: grace period: both old and new valid
    Scheduler->>Vault: expire old credential after grace period
    Vault->>Database: revoke old credential

The grace period is critical. If you revoke the old credential the instant the new one exists, any service that cached the old one will break before it gets a chance to renew. A 15-minute overlap handles nearly every practical cache TTL.
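The overlap can be modeled directly. A minimal sketch of the versioned-storage behavior (the `CredentialStore` class is a hypothetical in-memory stand-in for Vault's versioned secret storage, not a real Vault API):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(minutes=15)

@dataclass
class CredentialStore:
    """Hypothetical stand-in for Vault's versioned secret storage."""
    versions: list = field(default_factory=list)  # (password, revoke_at or None)

    def rotate(self, new_password: str, now: datetime) -> None:
        # Mark every currently-live version for revocation after the grace
        # period instead of revoking it the instant the new one exists.
        self.versions = [
            (pw, revoke_at or now + GRACE_PERIOD)
            for pw, revoke_at in self.versions
        ]
        self.versions.append((new_password, None))  # new version, no expiry yet

    def valid_passwords(self, now: datetime):
        """Both old and new credentials validate during the overlap window."""
        return [pw for pw, revoke_at in self.versions
                if revoke_at is None or now < revoke_at]

store = CredentialStore()
t0 = datetime(2024, 1, 1, 12, 0)
store.rotate("old-secret", t0)
store.rotate("new-secret", t0 + timedelta(hours=1))

# Inside the 15-minute grace window, both credentials are accepted...
both = store.valid_passwords(t0 + timedelta(hours=1, minutes=5))
# ...and after it, only the new one survives.
only_new = store.valid_passwords(t0 + timedelta(hours=1, minutes=20))
```

The same shape works regardless of backend: rotation never deletes, it schedules a revocation far enough out to cover every consumer's cache TTL.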

Implementing Rotation with Vault Dynamic Secrets

Vault's dynamic secrets engine generates credentials on demand rather than storing static ones. For Postgres this looks like:

# vault-database-config.hcl
path "database/config/prod-postgres" {
  capabilities = ["create", "update"]
}
 
path "database/roles/app-readonly" {
  capabilities = ["read"]
}

# Configure the database engine
vault secrets enable database
 
vault write database/config/prod-postgres \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@postgres:5432/proddb" \
  allowed_roles="app-readonly,app-readwrite" \
  username="vault-root" \
  password="$VAULT_ROOT_PASSWORD"
 
# Create a role with a 1-hour TTL
vault write database/roles/app-readonly \
  db_name=prod-postgres \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

Now each service requests a credential at startup:

import hvac
import os
 
def get_db_credential():
    """Request a short-lived Postgres credential from Vault's database engine."""
    client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
    # Each call mints a brand-new credential scoped to the app-readonly role.
    secret = client.secrets.database.generate_credentials(name="app-readonly")
    return {
        "username": secret["data"]["username"],
        "password": secret["data"]["password"],
        "lease_id": secret["lease_id"],              # needed to renew or revoke
        "lease_duration": secret["lease_duration"],  # seconds until expiry
    }

The credential expires after an hour. If it leaks in a log file, the attacker has at most 60 minutes — and only the narrow permissions of that role.
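A service that runs longer than the TTL should renew the lease before it expires rather than minting a fresh credential each hour. A sketch of the renewal timing (`client.sys.renew_lease` is hvac's wrapper around Vault's `sys/leases/renew` endpoint; renewing at half the TTL is a convention chosen here for retry headroom, not a Vault setting):

```python
import time

def seconds_until_renewal(lease_duration: int, renewal_fraction: float = 0.5) -> int:
    """Renew at a fraction of the TTL so failures leave room for retries."""
    return max(1, int(lease_duration * renewal_fraction))

def renewal_loop(client, lease_id: str, lease_duration: int) -> None:
    """Keep a dynamic credential alive by renewing its lease on a timer.

    Runs until the credential hits max_ttl, at which point renewal fails
    and the service should fetch a fresh credential instead.
    """
    while True:
        time.sleep(seconds_until_renewal(lease_duration))
        renewed = client.sys.renew_lease(lease_id=lease_id)
        lease_duration = renewed["lease_duration"]
```

When renewal finally fails at `max_ttl` (24h in the role above), the fallback is simply calling `get_db_credential()` again.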

Just-in-Time Access

Dynamic secrets handle data plane credentials. JIT handles control plane access — the credentials your engineers use to log into production systems, run database queries, or access cloud consoles.

The pattern: no persistent credentials for humans. When an engineer needs production access, they request it, the system grants a short-lived credential, and the credential disappears when the session ends.

flowchart LR
    A[Engineer] -->|access request| B[Access Gateway]
    B -->|check policy| C[PagerDuty / Jira\nOn-call verification]
    C -->|approved| B
    B -->|generate credential| D[Vault / AWS STS]
    D -->|short-lived token\nmax 4 hours| A
    A -->|use credential| E[Production DB / Console]
    D -->|audit log| F[SIEM]
    A -->|session end| D
    D -->|revoke credential| E

This eliminates a huge attack surface. There are no long-lived admin passwords to phish. There is no "shared prod credentials" document in Confluence. Every access is tied to a specific person, a specific reason, and a specific time window — all audit-logged.
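The gateway's policy check is simple to sketch. Everything below is hypothetical glue, not a real product API: on-call membership stands in for the PagerDuty check, and the returned expiry is what you would pass to Vault or STS as the credential TTL:

```python
from datetime import datetime, timedelta

MAX_SESSION = timedelta(hours=4)  # hard cap from the diagram above

def approve_access(engineer: str, reason: str, on_call: set,
                   requested: timedelta, now: datetime):
    """Gate a production-access request: auto-approve only current
    on-call engineers, and cap every grant at the session maximum."""
    if engineer not in on_call:
        return None  # route to a human approver instead of auto-granting
    ttl = min(requested, MAX_SESSION)
    return {"engineer": engineer, "reason": reason, "expires_at": now + ttl}

now = datetime(2024, 1, 1, 3, 0)
# On-call engineer asking for 8 hours still gets capped at 4.
grant = approve_access("dana", "debug prod incident", {"dana"}, timedelta(hours=8), now)
# Anyone else falls through to manual approval.
denied = approve_access("eve", "curiosity", {"dana"}, timedelta(hours=1), now)
```

The important property is that the cap lives in the gateway, not in the engineer's request: nobody can ask their way into a longer-lived credential.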

For AWS, this maps to temporary STS credentials via IAM Identity Center. For databases, Vault's database engine covers it. For SSH, Vault's SSH secrets engine issues signed certificates that expire:

# Request a signed SSH certificate good for 30 minutes
vault write ssh-client-signer/sign/prod-engineer \
  public_key=@~/.ssh/id_ed25519.pub \
  valid_principals="ubuntu" \
  ttl="30m"

Blast-Radius Analysis

The unglamorous part of secrets management is the question nobody wants to model: "If this secret leaks, what exactly can an attacker do?"

Blast-radius analysis means building and maintaining a map of what each secret grants access to. It lives somewhere queryable — a spreadsheet is fine to start, a proper asset inventory is better.

The columns that matter:

| Secret | Grants access to | Scope | Rotation TTL | Leak impact |
|---|---|---|---|---|
| db/prod-readwrite | Production Postgres | All tables, write | 1h dynamic | High — PII + financial data |
| aws/deploy-role | S3 deployment bucket | Write to one prefix | 1h STS | Medium — can corrupt deploys |
| stripe/webhook-signing | Stripe webhook validation | Verify inbound webhooks | Manual, 90d | Low — can spoof events |
| github/actions-token | GitHub Actions | Repo push, no admin | Per-workflow | Medium — can tamper artifacts |

The "Leak impact" column forces a conversation your team probably hasn't had. Not every secret is equal. A leaked read-only analytics credential is annoying. A leaked production database write credential is a breach.

This map also tells you where to invest in tighter controls. High-impact secrets should have shorter TTLs, stricter access policies, and more aggressive alerting on anomalous usage.
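"Queryable" can start as small as a list of records. A minimal sketch that encodes the table above and flags policy gaps (the `legacy/ftp-admin` row is a hypothetical example of exactly the pattern the query is meant to catch):

```python
# Mirrors the blast-radius table, plus one hypothetical legacy entry.
INVENTORY = [
    {"secret": "db/prod-readwrite",      "impact": "high",   "rotation": "1h dynamic"},
    {"secret": "aws/deploy-role",        "impact": "medium", "rotation": "1h STS"},
    {"secret": "stripe/webhook-signing", "impact": "low",    "rotation": "manual, 90d"},
    {"secret": "github/actions-token",   "impact": "medium", "rotation": "per-workflow"},
    {"secret": "legacy/ftp-admin",       "impact": "high",   "rotation": "manual, 365d"},
]

def flag_policy_gaps(inventory):
    """High-impact secrets without dynamic rotation are the first
    candidates for shorter TTLs or migration to dynamic credentials."""
    return [row["secret"] for row in inventory
            if row["impact"] == "high" and "dynamic" not in row["rotation"]]

gaps = flag_policy_gaps(INVENTORY)
```

Run that query in CI against the real inventory and a newly added high-impact static secret becomes a failing check instead of a post-incident discovery.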

Secret Detection in CI/CD

Secrets leak through code. Everyone knows this. Fewer teams do anything systematic about it beyond a one-time git history scan.

The right posture is detection at the point of commit, before the secret ever reaches a remote:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        entry: gitleaks protect --staged --redact --config=.gitleaks.toml
        language: golang
        pass_filenames: false

# .gitleaks.toml
[extend]
useDefault = true
 
[[rules]]
id = "internal-service-token"
description = "Internal service token"
regex = '''svc_[a-zA-Z0-9]{32,}'''
tags = ["internal", "service-token"]
 
[allowlist]
description = "Allowlisted patterns"
regexes = [
  '''EXAMPLE_KEY''',
  '''PLACEHOLDER_SECRET''',
]
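Before committing a custom rule, it's worth sanity-checking the regex against known-good and known-bad strings. A quick sketch using the pattern from the config above:

```python
import re

# The custom rule's pattern, copied from .gitleaks.toml.
SERVICE_TOKEN = re.compile(r"svc_[a-zA-Z0-9]{32,}")

caught  = SERVICE_TOKEN.search("token=svc_" + "a" * 32)  # real-shaped token
too_short = SERVICE_TOKEN.search("svc_" + "b" * 31)      # below the 32-char floor
placeholder = SERVICE_TOKEN.search("EXAMPLE_KEY")        # allowlisted shape
```

A pattern that's too loose floods reviewers with false positives until they start ignoring the hook, which is worse than no hook at all.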

Add the same check in CI so commits that bypassed the pre-commit hook get caught before merge:

# .github/workflows/secret-scan.yml
name: Secret Scan
on: [pull_request, push]
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

When a secret is detected, the response matters as much as the detection. Have a runbook ready: revoke the secret immediately (assume it's compromised the moment it touches a remote), rotate to a new one, and run a log analysis to determine if the old one was used externally.
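The runbook's ordering is the part worth encoding, since under incident pressure people reach for rotation first and forget revocation. A sketch of the sequence (the three steps are placeholders for Vault revocation, the rotation job, and a SIEM query respectively):

```python
def respond_to_leak(secret_path: str, actions: list) -> None:
    """Ordered leak response: revoke first (assume compromise the moment
    the secret touched a remote), then rotate, then audit past usage."""
    actions.append(f"revoke:{secret_path}")  # e.g. vault lease revoke / drop user
    actions.append(f"rotate:{secret_path}")  # mint replacement, redeploy consumers
    actions.append(f"audit:{secret_path}")   # search logs for use of the old value

log = []
respond_to_leak("db/prod-readwrite", log)
```

Revoking before rotating matters: rotation alone leaves the leaked credential valid during however long the rollout takes.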

Connecting the Pieces

None of these layers work in isolation. Rotation without blast-radius analysis means you're rotating secrets you don't fully understand. JIT without detection means engineers still paste secrets into Slack to share them. Detection without rotation means you find the leak after it's already been sitting in git history for six months.

The integrated flow looks like this:

flowchart TD
    A[Secret Created] --> B[Blast-Radius Tag\nimpact: high/medium/low]
    B --> C{Dynamic capable?}
    C -->|Yes| D[Vault dynamic secret\nshort TTL]
    C -->|No| E[Static secret\nmanual rotation\nshorter max TTL]
    D --> F[Services consume\nvia Vault agent]
    E --> F
    F --> G[Git pre-commit hook\ndetects if secret escapes]
    G --> H[CI secret scan\nfallback detection]
    H --> I{Leak detected?}
    I -->|Yes| J[Revoke immediately\nRotate\nIncident response]
    I -->|No| K[Periodic rotation\ntriggered by scheduler]
    K --> D

Key Takeaways

  • Centralizing secrets in Vault or Secrets Manager is necessary but not sufficient — rotation, JIT, and detection are the layers that make it a real program.
  • Dynamic secrets with short TTLs are the most effective single control: a leaked credential that expires in an hour limits attacker opportunity regardless of how it leaked.
  • Just-in-time human access eliminates the persistent admin credentials that are the most dangerous targets in any environment.
  • Blast-radius analysis should be done before an incident, not during one — you need to know what each secret grants before you're scrambling to contain a breach.
  • Secret detection in pre-commit hooks catches the most common leak vector (accidental commits) at the cheapest possible point in the pipeline.
  • Build a runbook for "secret confirmed leaked" that the team has practiced — response speed matters as much as detection.