Troubleshooting
Audience: Application developer
Symptoms diagnosable from the pod side. For Vault policy issues, cluster configuration, or metric queries, see operators/monitoring.
Pod starts but cannot connect to the database
| Symptom | Likely cause | What to check |
|---|---|---|
| `authentication failed for user "DBUSER"` | Annotation mode defaults used: the env var name is `DBUSER`, not what the app reads | Set `<dbname>.env-key-dbuser` and `<dbname>.env-key-dbpassword` explicitly |
| `role "DBUSER" does not exist` | Same as above; the app is reading the default env var name | Confirm the env var names match what the app expects |
| `connection refused` | Wrong host or port in URI template | Check `<dbname>.template`; verify the DB hostname resolves from inside the pod |
| `SSL connection required` | App connects without TLS, DB requires it | Add `?sslmode=require` (or `verify-full`) to your URI template |
| Credentials work but expire after a few hours | `token_period` not set on the Vault role | Ask your operator to check `vault read auth/kubernetes/role/<role>`; `token_period` must be non-zero |
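The first two rows come down to a mismatch between the env var names the injector sets and the names the app reads. A minimal sketch for checking this from the pod side follows; the helper name `has_env_var`, the pod name `my-pod`, and the variable `PGUSER` are illustrative assumptions, not part of the injector.

```shell
# has_env_var NAME: read `env`-style output (NAME=value lines) on stdin and
# succeed only if a variable with exactly that name is defined.
# It uses `grep -q`, so no values (and therefore no credentials) are printed.
has_env_var() {
    grep -q "^$1="
}

# Usage against a running pod (pod and variable names are placeholders):
#   kubectl exec my-pod -- env | has_env_var PGUSER && echo "PGUSER is set"
```

If the name your app reads is absent while `DBUSER` is present, the defaults are in effect and the `env-key-*` annotations from the table are the likely fix.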
Pod env contains __VDBI_PH_...
This placeholder is set by the webhook in NRI mode and should be replaced by the NRI plugin before the container process starts.
If kubectl exec -- env still shows the placeholder string rather than
real credentials, the NRI plugin did not perform the substitution.
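To see which variables still hold a placeholder without printing any real credential values, a filter along these lines may help. This is a sketch: `placeholder_vars` is a hypothetical helper, and it matches only the `__VDBI_PH_` prefix shown above because the rest of the placeholder format is not described here.

```shell
# placeholder_vars: read `env`-style output on stdin and print only the NAMES
# of variables whose line still contains the webhook placeholder prefix.
placeholder_vars() {
    grep '__VDBI_PH_' | cut -d= -f1
}

# Usage (the pod name is a placeholder):
#   kubectl exec my-pod -- env | placeholder_vars
```

Empty output means no placeholder is left in the container environment.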
Common causes:
- NRI plugin DaemonSet pod not running on the node. Check that the DaemonSet has a ready pod on the same node as your pod.
- Pod was scheduled on a node where containerd NRI is not enabled. The operator must enable NRI in containerd's configuration on every node that runs injected workloads.
- Pod does not carry the correct label. The NRI plugin filters pods by the `vault-db-injector: "true"` label (or the operator-configured equivalent). A pod admitted without that label receives placeholders that are never substituted.
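The first and third causes above can be checked with plain `kubectl`. The sketch below assumes the default `vault-db-injector: "true"` label. The DaemonSet's name and namespace are not given on this page, so it lists every pod on the node and leaves spotting the NRI plugin pod to the reader. The helper names are hypothetical.

```shell
# node_of POD [NAMESPACE]: print the node the pod is scheduled on.
node_of() {
    kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}'
}

# pods_on_node NODE: list all pods on that node, across namespaces, so you can
# check whether the NRI plugin DaemonSet has a ready pod there.
pods_on_node() {
    kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# has_injector_label POD [NAMESPACE]: succeed if the pod is matched by the
# default label selector (assumption: the operator has not changed the label).
has_injector_label() {
    kubectl get pods -n "${2:-default}" -l 'vault-db-injector=true' -o name \
        | grep -qxF "pod/$1"
}

# Usage (pod and namespace names are placeholders):
#   pods_on_node "$(node_of my-pod my-namespace)"
#   has_injector_label my-pod my-namespace || echo "label missing"
```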
The operator can check vdbi_nri_unwrap_failures_total{reason} to see
why substitution failed. See operators/monitoring
for the metric reference.
Pod stuck in ContainerCreating after admission
The NRI plugin runs synchronously at container creation. If the plugin fails to resolve credentials, containerd may stall the container start.
Likely causes:
- Vault login failed at NRI substitution. The plugin authenticates to Vault using the pod's ServiceAccount. If the SA is not bound to any Vault role, the login is rejected.
- Pod ServiceAccount not present in the Vault role's `bound_service_account_names`. Ask your operator to verify the Vault role (for example with `vault read auth/kubernetes/role/<role>`, the command referenced in the table above). The SA name and namespace must appear in the output.
- Pod admitted outside the webhook's namespace selector. If the pod was created in a namespace not covered by the webhook, the webhook was not called, no placeholders were set, and the NRI plugin skips the pod (no placeholders to substitute). The pod should start normally in this case; if it stalls, the cause is unrelated to the injector.
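Before contacting the operator, it helps to know exactly which ServiceAccount and namespace the pod runs under. A minimal sketch, with `sa_of` as a hypothetical helper name:

```shell
# sa_of POD [NAMESPACE]: print "<namespace>/<serviceaccount>" for a pod.
sa_of() {
    ns="${2:-default}"
    sa=$(kubectl get pod "$1" -n "$ns" -o jsonpath='{.spec.serviceAccountName}')
    printf '%s/%s\n' "$ns" "$sa"
}

# Usage (pod and namespace names are placeholders):
#   sa_of my-pod my-namespace
# The operator can then compare the result with the role definition:
#   vault read auth/kubernetes/role/<role>
```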
The operator metric vdbi_nri_unwrap_failures_total{reason} is the
primary signal for NRI substitution failures. See
operators/monitoring for the full metric
catalog and suggested alert rules.
Checking the annotations the webhook set
After pod admission, inspect the annotations the webhook wrote on the pod.
Verify that:
- Each `<dbname>.role` is present and points to the expected Vault role.
- Each `<dbname>.uuid` is set (written by the webhook at admission). If it is missing, the webhook did not process the pod.
- No stale or misspelled annotation keys are present.
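The checks above can be sketched as two small helpers. Both names are hypothetical. Any annotation key prefix is not shown on this page, so the match is deliberately loose: it looks for `<dbname>.role=` and `<dbname>.uuid=` anywhere in the line, which can over-match when one database name is a suffix of another.

```shell
# list_annotations POD [NAMESPACE]: print pod annotations as key=value lines.
list_annotations() {
    kubectl get pod "$1" -n "${2:-default}" \
        -o go-template='{{range $k, $v := .metadata.annotations}}{{$k}}={{$v}}{{"\n"}}{{end}}'
}

# missing_db_annotations DBNAME: read key=value lines on stdin and print the
# expected keys (<dbname>.role, <dbname>.uuid) that were not found.
missing_db_annotations() {
    input=$(cat)
    for suffix in role uuid; do
        printf '%s\n' "$input" | grep -q "$1\.$suffix=" \
            || printf '%s.%s\n' "$1" "$suffix"
    done
}

# Usage (pod, namespace, and database names are placeholders):
#   list_annotations my-pod my-namespace | missing_db_annotations mydb
```

Empty output means both keys were found. A missing `uuid` key suggests the webhook did not process the pod.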
Webhook rejection messages
Pod events often contain the rejection reason from the webhook.
A FailedCreate or webhook-related event with a Vault error message
indicates the injector returned a non-200 response at admission time.
Share the full event text with your operator.
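One way to collect that event text is sketched below, with hypothetical helper names. When a pod is rejected at admission it is never created, so the FailedCreate event is usually recorded on the owning controller (for example the ReplicaSet). For that reason the sketch lists events for the whole namespace and not for a single pod.

```shell
# recent_events [NAMESPACE]: list events in a namespace, oldest first.
recent_events() {
    kubectl get events -n "${1:-default}" --sort-by=.lastTimestamp
}

# webhook_rejections: keep only lines mentioning FailedCreate or a webhook.
webhook_rejections() {
    grep -Ei 'FailedCreate|webhook'
}

# Usage (the namespace name is a placeholder):
#   recent_events my-namespace | webhook_rejections
```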