Master Helm hooks, test suites, and dependency management for production-grade Kubernetes deployments. Build reliable charts with atomic migrations and validation.
Advanced Helm Charts: Mastering Hooks, Tests and Dependencies for Reliable Kubernetes Deployments
You’ve deployed your first Helm charts successfully. Your applications run in Kubernetes, and basic templating feels comfortable. Then reality hits: a database migration fails mid-deployment, leaving your production environment in an inconsistent state. Or worse, you discover a misconfigured service only after it’s handling live traffic.
These scenarios expose the gap between basic Helm usage and production-grade deployment orchestration. Standard charts handle simple workloads well, but complex applications demand more: automated pre-deployment validations, coordinated database migrations, graceful dependency management, and comprehensive testing before traffic reaches new pods.
This guide shows you how to build Helm charts that handle these challenges elegantly. You’ll implement hooks that execute migrations atomically with proper rollback handling, create test suites that validate your entire deployment before marking it successful, and architect dependency relationships that work across complex microservice topologies.
Prerequisites
Before diving in, ensure you have:
- Kubernetes cluster (1.24+) with kubectl configured
- Helm 3.12+ installed locally
- Familiarity with basic Helm concepts: templates, values, releases
- Working knowledge of Kubernetes resources: Deployments, Services, Jobs, ConfigMaps
- A container registry accessible from your cluster (Docker Hub, ECR, GCR)
You should be comfortable creating a basic Helm chart with helm create and understand the template directory structure.
Architecture and Key Concepts
Helm’s advanced features operate on a lifecycle model that extends beyond simple resource creation. Understanding this lifecycle is crucial for building reliable deployment workflows.
flowchart TD
    subgraph "Helm Release Lifecycle"
        A[helm install/upgrade] --> B{Pre-install/upgrade Hooks}
        B -->|Success| C[Deploy Kubernetes Resources]
        B -->|Failure| R1["Release Marked Failed (rolled back only with --atomic)"]
        C --> D{Post-install/upgrade Hooks}
        D -->|Success| F[Release Deployed]
        D -->|Failure| R2["Release Marked Failed (rolled back only with --atomic)"]
        F -.->|"helm test (on demand)"| E{Run Helm Tests}
        E -->|Success| T1[Tests Passed]
        E -->|Failure| R3[helm test Reports Failure]
    end
    subgraph "Hook Execution Order"
        H1[pre-install weight=-5] --> H2[pre-install weight=0]
        H2 --> H3[pre-install weight=5]
    end
    subgraph "Dependency Resolution"
        P1[Parent Chart] --> P2{Condition Check}
        P2 -->|enabled| P3[Load Subchart]
        P2 -->|disabled| P4[Skip Subchart]
        P3 --> P5[Merge Values]
        P5 --> P6[Render Templates]
    end
Hooks are Kubernetes resources with special annotations that tell Helm to execute them at specific lifecycle points. Unlike regular chart resources, Job and Pod hooks must run to completion before the lifecycle continues, and Helm does not track hook resources as part of the release.
Tests are post-deployment validation Pods or Jobs that verify your release works correctly. They run on demand via helm test, and a test passes when its container exits with code 0.
Dependencies define relationships between charts, allowing you to compose complex applications from smaller, reusable components with conditional inclusion and value overrides.
Step-by-Step Implementation
Implementing Pre-Install and Pre-Upgrade Hooks for Database Migrations
Database migrations represent the canonical use case for Helm hooks. You need migrations to complete before your application starts, and you need them to roll back cleanly if they fail.
Let’s build a migration hook that handles PostgreSQL schema updates:
# templates/db-migration-job.yaml
{{- if .Values.migrations.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-migrate
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
    app.kubernetes.io/component: migration
  annotations:
    # Hook type: runs before install and before upgrade
    "helm.sh/hook": pre-install,pre-upgrade
    # Weight determines execution order (lower runs first)
    "helm.sh/hook-weight": "-5"
    # Delete previous hook job before creating new one
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  # Prevent infinite retry loops
  backoffLimit: {{ .Values.migrations.backoffLimit | default 3 }}
  # Auto-cleanup after completion
  ttlSecondsAfterFinished: {{ .Values.migrations.ttlSeconds | default 600 }}
  template:
    metadata:
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
        app.kubernetes.io/component: migration
    spec:
      restartPolicy: Never
      {{- with .Values.migrations.securityContext }}
      securityContext:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      initContainers:
        # Wait for database to be ready before running migrations
        - name: wait-for-db
          image: busybox:1.36
          command:
            - sh
            - -c
            - |
              echo "Waiting for database at {{ .Values.database.host }}:{{ .Values.database.port }}"
              until nc -z {{ .Values.database.host }} {{ .Values.database.port }}; do
                echo "Database not ready, sleeping..."
                sleep 2
              done
              echo "Database is ready"
      containers:
        - name: migrate
          image: "{{ .Values.migrations.image.repository }}:{{ .Values.migrations.image.tag }}"
          imagePullPolicy: {{ .Values.migrations.image.pullPolicy | default "IfNotPresent" }}
          command:
            - /bin/sh
            - -c
            - |
              set -e
              echo "Starting database migration..."
              echo "Current schema version:"
              # Run migration tool (example using golang-migrate)
              migrate -path /migrations -database "$DATABASE_URL" version || echo "No migrations applied yet"
              echo "Applying pending migrations..."
              migrate -path /migrations -database "$DATABASE_URL" up
              echo "Migration completed. New schema version:"
              migrate -path /migrations -database "$DATABASE_URL" version
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.existingSecret | default (printf "%s-db-credentials" (include "myapp.fullname" .)) }}
                  key: url
          resources:
            {{- toYaml .Values.migrations.resources | nindent 12 }}
          {{- with .Values.migrations.volumeMounts }}
          volumeMounts:
            {{- toYaml . | nindent 12 }}
          {{- end }}
      {{- with .Values.migrations.volumes }}
      volumes:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
{{- end }}
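💡 The hook-delete-policy: before-hook-creation,hook-succeeded annotation is critical. It deletes the Job as soon as it succeeds and removes any leftover Job from a previous run before creating a new one, so a failed Job stays around for debugging. Keep in mind that ttlSecondsAfterFinished still applies to failed Jobs: with the default of 600 seconds, Kubernetes removes a failed Job about ten minutes after it finishes, so raise that value if you need more time to inspect logs.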
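The migration template reads several values that the article has not defined yet. The fragment below is a minimal sketch of the matching values.yaml entries. The key names come from the template above, but every value (hostname, image repository, tag, resource sizes, user ID) is an illustrative assumption rather than a recommendation.

```yaml
# values.yaml (sketch; all values are illustrative assumptions)
database:
  host: "postgres.example.internal"   # hypothetical hostname
  port: 5432
  # Name of a pre-created Secret holding a `url` key.
  # Empty falls back to <fullname>-db-credentials in the template.
  existingSecret: ""

migrations:
  enabled: true
  rollbackEnabled: false
  backoffLimit: 3
  ttlSeconds: 600
  image:
    # Hypothetical image that bundles the migrate CLI and a /migrations directory
    repository: "registry.example.com/myapp-migrations"
    tag: "v1.2.0"
    pullPolicy: IfNotPresent
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
  volumes: []
  volumeMounts: []
```

Two details are worth checking against your own setup. First, pre-install hooks run before Helm creates the chart's regular resources, so the Secret and the database must already exist at that point; a Secret or database created by the same release will not be there on first install. Second, the template's `default 3` treats `0` as empty, so setting `backoffLimit: 0` here would still render as 3.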
Now let’s add a seed data hook that runs after initial installation only:
# templates/db-seed-job.yaml
{{- if and .Values.seed.enabled (not .Release.IsUpgrade) }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-seed
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
    app.kubernetes.io/component: seed
  annotations:
    "helm.sh/hook": post-install
    # post-install hooks always run after pre-install hooks such as the
    # migration Job; the weight only orders hooks of the same type
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: "{{ .Values.seed.image.repository }}:{{ .Values.seed.image.tag }}"
          command:
            - /bin/sh
            - -c
            - |
              set -e
              echo "Seeding initial data..."
              # Check if data already exists to make seed idempotent
              EXISTING_COUNT=$(psql "$DATABASE_URL" -t -c "SELECT COUNT(*) FROM users WHERE email = 'admin@example.com'" | tr -d ' ')
              if [ "$EXISTING_COUNT" -eq "0" ]; then
                echo "Creating admin user..."
                psql "$DATABASE_URL" -c "INSERT INTO users (email, role, created_at) VALUES ('admin@example.com', 'admin', NOW())"
              else
                echo "Admin user already exists, skipping..."
              fi
              echo "Seed completed"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.existingSecret | default (printf "%s-db-credentials" (include "myapp.fullname" .)) }}
                  key: url
{{- end }}
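⚠️ Always make seed operations idempotent. Helm does not retry a failed hook itself, but the Job controller restarts failed pods up to backoffLimit times, and rerunning helm install after a failure runs the hook again. Either way, you don’t want duplicate data.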
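An alternative to the separate COUNT check is to let PostgreSQL resolve the conflict in a single statement. The fragment below is a sketch of the seed container's command under one loud assumption: the users table has a UNIQUE constraint (or unique index) on email, which the schema shown in this guide does not confirm.

```yaml
# Fragment of the seed container in templates/db-seed-job.yaml (sketch).
# Assumption: users.email has a UNIQUE constraint or unique index;
# without one, ON CONFLICT (email) fails with an error.
command:
  - /bin/sh
  - -c
  - |
    set -e
    echo "Seeding initial data..."
    # A second run inserts nothing instead of creating a duplicate row
    psql "$DATABASE_URL" -c "
      INSERT INTO users (email, role, created_at)
      VALUES ('admin@example.com', 'admin', NOW())
      ON CONFLICT (email) DO NOTHING"
    echo "Seed completed"
```

Because the check and the insert happen in one statement, two seed pods running at the same time cannot both insert the row, which the check-then-insert version does not guarantee.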
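For sophisticated rollback handling, implement a pre-rollback hook that reverts the most recent migration when you run helm rollback, or when --atomic triggers a rollback after a failed upgrade. Two caveats apply: down 1 reverts exactly one migration, however many the upgrade applied, and Helm runs rollback hooks from the revision being rolled back to, so confirm that the migration image in that revision contains the down files you need: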
# templates/db-rollback-job.yaml
{{- if .Values.migrations.rollbackEnabled }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "myapp.fullname" . }}-db-rollback
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-rollback
    "helm.sh/hook-weight": "-10"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: rollback
          image: "{{ .Values.migrations.image.repository }}:{{ .Values.migrations.image.tag }}"
          command:
            - /bin/sh
            - -c
            - |
              set -e
              echo "Rolling back last migration..."
              # Store current version for logging
              CURRENT=$(migrate -path /migrations -database "$DATABASE_URL" version 2>&1 | tail -1)
              echo "Current version: $CURRENT"
              # Roll back one migration
              migrate -path /migrations -database "$DATABASE_URL" down 1
              NEW=$(migrate -path /migrations -database "$DATABASE_URL" version 2>&1 | tail -1)
              echo "Rolled back to version: $NEW"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.existingSecret | default (printf "%s-db-credentials" (include "myapp.fullname" .)) }}
                  key: url
{{- end }}
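The migration, seed, and rollback Jobs all repeat the same DATABASE_URL environment entry. One way to keep them in sync is a named template in templates/_helpers.tpl. This is a sketch only: the template name myapp.databaseUrlEnv is hypothetical and not part of the chart shown so far.

```yaml
{{/* templates/_helpers.tpl (sketch): shared DATABASE_URL env entry.
     The name "myapp.databaseUrlEnv" is an assumption for illustration. */}}
{{- define "myapp.databaseUrlEnv" -}}
- name: DATABASE_URL
  valueFrom:
    secretKeyRef:
      name: {{ .Values.database.existingSecret | default (printf "%s-db-credentials" (include "myapp.fullname" .)) }}
      key: url
{{- end }}
```

Each Job could then replace its inline block with an include, indented to match the container's env list (12 spaces in the templates above):

```yaml
          env:
            {{- include "myapp.databaseUrlEnv" . | nindent 12 }}
```

Changing the Secret key or naming rule then touches one definition instead of every hook and test.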
Creating Comprehensive Helm Test Suites
Helm tests validate that your deployment actually works after all resources are created. They’re Pods with the helm.sh/hook: test annotation, executed via helm test <release>.
Build a multi-stage test suite that validates connectivity, health, and configuration:
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-connection
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: test-api-connectivity
      image: curlimages/curl:8.4.0
      command:
        - /bin/sh
        - -c
        - |
          set -e
          echo "=== Testing API Connectivity ==="
          # Test internal service DNS resolution
          echo "Testing DNS resolution for {{ include "myapp.fullname" . }}..."
          nslookup {{ include "myapp.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local
          # Test HTTP connectivity to health endpoint
          # (--retry-connrefused also retries while the pod is not yet listening)
          echo "Testing HTTP connectivity..."
          RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" \
            --connect-timeout 10 \
            --max-time 30 \
            --retry 5 \
            --retry-delay 3 \
            --retry-connrefused \
            "http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/health")
          if [ "$RESPONSE" = "200" ]; then
            echo "✓ Health endpoint returned 200 OK"
          else
            echo "✗ Health endpoint returned $RESPONSE"
            exit 1
          fi
          # Test readiness endpoint
          echo "Testing readiness..."
          READY=$(curl -s -o /dev/null -w "%{http_code}" \
            "http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/ready")
          if [ "$READY" = "200" ]; then
            echo "✓ Readiness endpoint returned 200 OK"
          else
            echo "✗ Readiness endpoint returned $READY"
            exit 1
          fi
          echo "=== Connectivity Tests Passed ==="
---
# templates/tests/test-database.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-database
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: test-db-connection
      image: postgres:15-alpine
      command:
        - /bin/sh
        - -c
        - |
          set -e
          echo "=== Testing Database Connectivity ==="
          # Test basic connection
          echo "Connecting to database..."
          if psql "$DATABASE_URL" -c "SELECT 1" > /dev/null 2>&1; then
            echo "✓ Database connection successful"
          else
            echo "✗ Database connection failed"
            exit 1
          fi
          # Verify schema version matches expected
          echo "Checking schema version..."
          SCHEMA_VERSION=$(psql "$DATABASE_URL" -t -c "SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1" 2>/dev/null | tr -d ' ')
          EXPECTED_VERSION="{{ .Values.migrations.expectedVersion | default "" }}"
          if [ -n "$EXPECTED_VERSION" ] && [ "$SCHEMA_VERSION" != "$EXPECTED_VERSION" ]; then
            echo "✗ Schema version mismatch. Expected: $EXPECTED_VERSION, Got: $SCHEMA_VERSION"
            exit 1
          fi
          echo "✓ Schema version: $SCHEMA_VERSION"
          # Verify critical tables exist
          echo "Verifying critical tables..."
          {{- range .Values.tests.requiredTables }}
          if psql "$DATABASE_URL" -c "SELECT 1 FROM {{ . }} LIMIT 1" > /dev/null 2>&1; then
            echo "✓ Table '{{ . }}' exists and is accessible"
          else
            echo "✗ Table '{{ . }}' missing or inaccessible"
            exit 1
          fi
          {{- end }}
          echo "=== Database Tests Passed ==="
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: {{ .Values.database.existingSecret | default (printf "%s-db-credentials" (include "myapp.fullname" .)) }}
              key: url
---
# templates/tests/test-config.yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "myapp.fullname" . }}-test-config
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-weight": "5"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  restartPolicy: Never
  containers:
    - name: test-configuration
      image: curlimages/curl:8.4.0
      command:
        - /bin/sh
        - -c
        - |
          set -e
          echo "=== Testing Application Configuration ==="
          # Fetch and validate configuration via API
          CONFIG=$(curl -s "http://{{ include "myapp.fullname" . }}:{{ .Values.service.port }}/api/config/public")
          # Verify environment is correct
          ENV=$(echo "$CONFIG" | grep -o '"environment":"[^"]*"' | cut -d'"' -f4)
          EXPECTED_ENV="{{ .Values.environment }}"
          if [ "$ENV" = "$EXPECTED_ENV" ]; then
            echo "✓ Environment correctly set to: $ENV"
          else
            echo "✗ Environment mismatch. Expected: $EXPECTED_ENV, Got: $ENV"
            exit 1
          fi
          # Verify feature flags if applicable
          {{- if .Values.featureFlags }}
          echo "Verifying feature flags..."
          {{- range $flag, $enabled := .Values.featureFlags }}
          FLAG_VALUE=$(echo "$CONFIG" | grep -o '"{{ $flag }}":[^,}]*' | cut -d':' -f2)
          if [ "$FLAG_VALUE" = "{{ $enabled }}" ]; then
            echo "✓ Feature flag '{{ $flag }}' = {{ $enabled }}"
          else
            echo "✗ Feature flag '{{ $flag }}' expected {{ $enabled }}, got $FLAG_VALUE"
            exit 1
          fi
          {{- end }}
          {{- end }}
          echo "=== Configuration Tests Passed ==="
📝 Structure tests with hook-weight to control execution order. Run connectivity tests first (weight -5), then database tests (weight 0), then application-specific validations (weight 5+).
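The test Pods read a handful of values that drive their assertions. The fragment below sketches what those entries might look like. The key names match the test templates, while the port, version string, table list, and flag names are hypothetical placeholders.

```yaml
# values.yaml (sketch; keys match the test templates, values are illustrative)
environment: "staging"

service:
  port: 8080

migrations:
  # Hypothetical version; leave empty ("") to skip the schema version check.
  # Quoted so the long number is compared as a plain string.
  expectedVersion: "20240115093000"

tests:
  requiredTables:
    - users    # referenced by the seed Job
    - orders   # hypothetical table name

featureFlags:
  newCheckout: true   # hypothetical flag names
  betaSearch: false
```

The configuration test parses JSON with grep, so it assumes the /api/config/public response is compact JSON without spaces after colons. If the application pretty-prints its output, the patterns will not match and the test will fail even when the configuration is correct.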
Managing Multi-Service Dependencies with Conditions and Aliases
Complex microservice architectures require sophisticated dependency management. Let’s build a parent chart that orchestrates multiple services with conditional inclusion and value overrides.
First, define dependencies in Chart.yaml:
# Chart.yaml
apiVersion: v2
name: ecommerce-platform
description: Complete e-commerce platform with microservices
type: application
version: 1.0.0
appVersion: "2024.1"
dependencies:
  # Core services - always required
  - name: api-gateway
    version: "2.x.x"
    repository: "https://charts.example.com"
  # Database - can use external or deploy PostgreSQL
  - name: postgresql
    version: "13.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
    alias: db
  # Cache - optional Redis deployment
  - name: redis
    version: "18.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
    alias: cache
  # Message queue - choose between RabbitMQ or Kafka
  - name: rabbitmq
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: messageQueue.rabbitmq.enabled
    alias: mq-rabbit
  - name: kafka
    version: "26.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: messageQueue.kafka.enabled
    alias: mq-kafka
  # Microservices
  - name: user-service
    version: "1.x.x"
    repository: "file://../user-service"
    condition: services.user.enabled
  - name: order-service
    version: "1.x.x"
    repository: "file://../order-service"
    condition: services.order.enabled
  - name: payment-service
    version: "1.x.x"
    repository: "file://../payment-service"
    condition: services.payment.enabled
  # Observability stack
  - name: prometheus
    version: "25.x.x"
    repository: "https://prometheus-community.github.io/helm-charts"
    condition: observability.prometheus.enabled
    alias: metrics
  - name: grafana
    version: "7.x.x"
    repository: "https://grafana.github.io/helm-charts"
    condition: observability.grafana.enabled
    alias: dashboards
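The Chart.yaml comment says to choose between RabbitMQ and Kafka, but the two conditions are independent, so nothing stops both from being enabled. A small guard template in the parent chart could enforce the choice at render time. This is a sketch: the file name templates/validate.yaml is hypothetical, and it assumes the parent values.yaml defines both messageQueue.rabbitmq.enabled and messageQueue.kafka.enabled.

```yaml
{{/* templates/validate.yaml (hypothetical file name).
     Renders no resources; it only aborts the render on a bad combination.
     Assumes both keys are defined in the parent chart's values.yaml. */}}
{{- if and .Values.messageQueue.rabbitmq.enabled .Values.messageQueue.kafka.enabled }}
{{- fail "messageQueue.rabbitmq.enabled and messageQueue.kafka.enabled are mutually exclusive; enable only one" }}
{{- end }}
```

With this in place, helm install, helm upgrade, and helm template should stop with the message above instead of deploying two brokers side by side.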
Now create a comprehensive values file that configures all dependencies:
# values.yaml
global:
  # Shared values available to all subcharts
  imageRegistry: "registry.example.com"
  imagePullSecrets:
    - name: registry-credentials
  storageClass: "fast-ssd"
  # Service mesh configuration
  serviceMesh:
    enabled: true
    istio:
      mtls: STRICT
  # Shared database credentials (subcharts reference these)
  database:
    # Must match the Service name the aliased subchart renders for your
    # release name; verify with `helm template` before relying on it
    host: "db-postgresql"
    port: 5432
    name: "ecommerce"
  # Shared cache configuration
  cache:
    # Same caveat as database.host: check the rendered Service name
    host: "cache-redis-master"
    port: 6379
# PostgreSQL subchart: the condition lives under 'postgresql',
# the subchart's own values live under its alias 'db'
postgresql:
  enabled: true  # Set false to use external database
db:
  auth:
    postgresPassword: ""  # Set via --set or external secret
    username: "ecommerce"
    password: ""
    database: "ecommerce"
  primary:
    persistence:
      enabled: true
      size: 50Gi
      storageClass: "fast-ssd"
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 2Gi
        cpu: 1000m
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true  # Enable when observability.prometheus.enabled is true
# Redis subchart: condition under 'redis', values under its alias 'cache'
redis:
  enabled: true
cache:
  architecture: replication
  auth:
    enabled: true
    password: ""
  master:
    persistence:
      enabled: true
      size: 10Gi
  replica:
    replicaCount: 2
    persistence:
      enabled: true
      size: 10Gi
  metrics:
    enabled: true
# Message queue selection (mutually exclusive)
messageQueue:
  rabbitmq:
    enabled: true
  kafka:
    enabled: false
mq-rabbit:
  auth:
    username: "ecommerce"
    password: ""
  replicaCount: 3
  persistence:
    enabled: true
    size: 20Gi
  metrics:
    enabled: true
# Microservices configuration
services:
  user:
    enabled: true
  order:
    enabled: true
  payment:
    enabled: true
user-service:
  replicaCount: 3
  image:
    tag: "v1.2.0"
  database:
    host: "db-postgresql"
    port: 5432
    name: