Learn to deploy a private EKS cluster with zero public exposure. Step-by-step Terraform setup, OpenVPN configuration, and observability tools for production.
Securing Your EKS Cluster: A Hands-On Guide to Private Networking with OpenVPN Access
Every EKS tutorial starts the same way: create a cluster, expose the API endpoint publicly, configure kubectl, and you’re done. It works for demos. It’s terrible for production.
The moment you deploy a public EKS endpoint, you’ve opened a door that every scanner on the internet will find within hours. AWS’s control plane is hardened, but why accept that attack surface when you don’t have to? The real challenge isn’t creating a private cluster—it’s maintaining developer productivity after you do.
This guide walks through building an EKS cluster with zero public exposure, establishing VPN access that developers won’t hate, and deploying observability tools that function entirely within your private network. We’ll cover the decisions that trip up experienced engineers and the configurations that prevent 3 AM pages.
Prerequisites
Before starting, ensure you have:
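- AWS CLI v2 configured with credentials that have EKS, VPC, and EC2 full access, plus IAM permissions to create roles and attach policies (the Terraform below creates the cluster and node roles)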
- Terraform >= 1.5 (we’ll use it for infrastructure)
- kubectl >= 1.28
- Helm >= 3.12
- An AWS account with service quotas sufficient for 3+ nodes, a NAT Gateway, and two Elastic IPs (one for the NAT Gateway, one for the VPN server)
- A domain name you control (for VPN certificate management)
- Basic familiarity with Kubernetes networking concepts
📝 This guide uses us-east-1, but the architecture works in any region. Adjust the availability zones, and the region embedded in the VPC endpoint service names, accordingly.
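Before going further, it can save a failed `terraform apply` to confirm the toolchain meets the minimums above. The sketch below is a hypothetical helper, not part of the guide: `check_tool`, `version_ge`, and `extract_version` are names made up for illustration, and it assumes a `sort` that supports `-V` (GNU coreutils does; some BSD builds may not).

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers: compare installed tool versions with the
# minimums listed in Prerequisites. Assumes `sort -V` (GNU coreutils).

# version_ge A B: exit 0 when version A >= version B
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# extract_version TEXT: print the first x.y or x.y.z number found in TEXT
extract_version() {
  printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# check_tool NAME MINIMUM VERSION_OUTPUT: print a verdict, exit 1 on failure
check_tool() {
  local name="$1" minimum="$2" found
  found="$(extract_version "$3")"
  if [ -z "$found" ]; then
    echo "MISSING $name"
    return 1
  elif version_ge "$found" "$minimum"; then
    echo "OK $name $found (>= $minimum)"
  else
    echo "TOO OLD $name $found (< $minimum)"
    return 1
  fi
}

# Example usage (each tool must be on PATH):
#   check_tool terraform 1.5  "$(terraform version 2>/dev/null)"
#   check_tool kubectl   1.28 "$(kubectl version --client 2>/dev/null)"
#   check_tool helm      3.12 "$(helm version --short 2>/dev/null)"
#   check_tool aws       2.0  "$(aws --version 2>&1)"
```

Passing the version text in as an argument keeps the comparison logic testable without the tools installed.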
Architecture and Key Concepts
The architecture follows a defense-in-depth approach. Your EKS control plane API lives entirely within AWS’s managed infrastructure with only private endpoint access. Worker nodes run in private subnets with no direct internet access. All outbound traffic routes through NAT Gateways, and all inbound developer access routes through an OpenVPN Access Server in a dedicated public subnet.
flowchart TB
subgraph Internet
DEV[Developer Workstation]
end
subgraph VPC ["VPC 10.0.0.0/16"]
subgraph Public Subnets
NAT[NAT Gateway]
OVPN[OpenVPN Access Server<br/>10.0.1.x]
end
subgraph Private Subnets
subgraph EKS Cluster
CP[EKS Control Plane<br/>Private Endpoint Only]
NG1[Node Group AZ-a<br/>10.0.10.x]
NG2[Node Group AZ-b<br/>10.0.11.x]
end
subgraph Observability
PROM[Prometheus<br/>10.0.20.x]
GRAF[Grafana<br/>10.0.20.x]
end
end
end
DEV -->|OpenVPN UDP 1194| OVPN
OVPN -->|Private Routes| CP
OVPN -->|Private Routes| NG1
OVPN -->|Private Routes| NG2
NG1 & NG2 -->|Outbound Only| NAT
NAT -->|ECR, CloudWatch| Internet
CP --> NG1 & NG2
NG1 & NG2 --> PROM
PROM --> GRAF
Key Design Decisions
Private API endpoint only: The EKS control plane has no public endpoint. This means kubectl commands only work from within the VPC or through VPN.
Split tunneling disabled by default: All traffic from connected developers routes through the VPN. This prevents DNS leaks and ensures consistent access patterns.
NAT Gateway for outbound: Nodes need to pull images from ECR and send logs to CloudWatch. NAT Gateway provides this without exposing nodes directly.
Observability stays internal: Prometheus and Grafana run as workloads in the cluster, accessible only through VPN. No public dashboards, no exposed metrics endpoints.
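One way to see the private-endpoint decision in practice: while connected to the VPN, the cluster endpoint’s hostname should resolve only to private (RFC 1918) addresses. The helpers below are a hypothetical sketch that assumes `dig` is installed and the AWS CLI has working credentials; the function names are invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical check: does the EKS endpoint hostname resolve only to private IPs?

# is_private_ipv4 IP: exit 0 when IP is a valid IPv4 address in an RFC 1918 range
is_private_ipv4() {
  local a b c d octet
  IFS=. read -r a b c d <<< "$1"
  for octet in "$a" "$b" "$c" "$d"; do
    case "$octet" in ''|*[!0-9]*) return 1 ;; esac   # must be purely numeric
    [ "$octet" -le 255 ] || return 1
  done
  [ "$a" -eq 10 ] && return 0                                         # 10.0.0.0/8
  [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0  # 172.16.0.0/12
  [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0                    # 192.168.0.0/16
  return 1
}

# check_endpoint_resolution HOST: every A record for HOST must be private
check_endpoint_resolution() {
  local host="$1" ips ip
  # dig +short can also print CNAME targets; keep only dotted-quad lines
  ips="$(dig +short "$host" A | grep -E '^[0-9]+(\.[0-9]+){3}$')"
  if [ -z "$ips" ]; then
    echo "no A records for $host (VPN down or DNS not routed?)"
    return 1
  fi
  for ip in $ips; do
    if ! is_private_ipv4 "$ip"; then
      echo "public address $ip for $host"
      return 1
    fi
  done
  echo "all addresses for $host are private"
}

# Example usage (while connected to the VPN):
#   endpoint="$(aws eks describe-cluster --name private-eks-cluster \
#     --query 'cluster.endpoint' --output text)"
#   check_endpoint_resolution "${endpoint#https://}"
```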
Step-by-Step Implementation
Building the Private VPC Foundation
The VPC design guards against a common security misconfiguration: accidentally placing resources in public subnets. We use explicit subnet naming and separate CIDR ranges for different purposes.
Create a file named vpc.tf:
# vpc.tf - Private-first VPC for EKS
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "us-east-1"
}
locals {
cluster_name = "private-eks-cluster"
vpc_cidr = "10.0.0.0/16"
# Explicit subnet purposes prevent misplacement
public_subnets = {
"us-east-1a" = "10.0.1.0/24" # VPN and NAT only
"us-east-1b" = "10.0.2.0/24" # VPN and NAT only
}
private_subnets = {
"us-east-1a" = "10.0.10.0/24" # EKS nodes
"us-east-1b" = "10.0.11.0/24" # EKS nodes
}
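# Separate range reserved for observability workloads. No aws_subnet
# resources in this file use it yet; with the default VPC CNI, pods get
# IPs from the node subnets unless custom networking is enabled.
observability_subnets = {
"us-east-1a" = "10.0.20.0/24"
"us-east-1b" = "10.0.21.0/24"
}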
}
resource "aws_vpc" "main" {
cidr_block = local.vpc_cidr
enable_dns_hostnames = true # Required for EKS private endpoint
enable_dns_support = true # Required for EKS private endpoint
tags = {
Name = "${local.cluster_name}-vpc"
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
}
}
# Public subnets - only for NAT and VPN
resource "aws_subnet" "public" {
for_each = local.public_subnets
vpc_id = aws_vpc.main.id
cidr_block = each.value
availability_zone = each.key
map_public_ip_on_launch = false # Explicit, even in public subnet
tags = {
Name = "${local.cluster_name}-public-${each.key}"
Type = "public"
# Do NOT add kubernetes.io/role/elb tag - we don't want public LBs
}
}
# Private subnets for EKS nodes
resource "aws_subnet" "private" {
for_each = local.private_subnets
vpc_id = aws_vpc.main.id
cidr_block = each.value
availability_zone = each.key
tags = {
Name = "${local.cluster_name}-private-${each.key}"
Type = "private"
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
# Internet Gateway - only for VPN and NAT
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${local.cluster_name}-igw"
}
}
# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
domain = "vpc"
tags = {
Name = "${local.cluster_name}-nat-eip"
}
depends_on = [aws_internet_gateway.main]
}
# Single NAT Gateway (use one per AZ for production HA)
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public["us-east-1a"].id
tags = {
Name = "${local.cluster_name}-nat"
}
}
# Route table for public subnets
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${local.cluster_name}-public-rt"
}
}
# Route table for private subnets
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
tags = {
Name = "${local.cluster_name}-private-rt"
}
}
# Associate public subnets
resource "aws_route_table_association" "public" {
for_each = aws_subnet.public
subnet_id = each.value.id
route_table_id = aws_route_table.public.id
}
# Associate private subnets
resource "aws_route_table_association" "private" {
for_each = aws_subnet.private
subnet_id = each.value.id
route_table_id = aws_route_table.private.id
}
# VPC Endpoints for EKS private access
# These allow EKS nodes to access AWS APIs without internet routing
resource "aws_vpc_endpoint" "ecr_api" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.ecr.api"
vpc_endpoint_type = "Interface"
subnet_ids = [for s in aws_subnet.private : s.id]
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "${local.cluster_name}-ecr-api-endpoint"
}
}
resource "aws_vpc_endpoint" "ecr_dkr" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.ecr.dkr"
vpc_endpoint_type = "Interface"
subnet_ids = [for s in aws_subnet.private : s.id]
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "${local.cluster_name}-ecr-dkr-endpoint"
}
}
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.s3"
vpc_endpoint_type = "Gateway"
route_table_ids = [aws_route_table.private.id]
tags = {
Name = "${local.cluster_name}-s3-endpoint"
}
}
resource "aws_vpc_endpoint" "sts" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.sts"
vpc_endpoint_type = "Interface"
subnet_ids = [for s in aws_subnet.private : s.id]
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "${local.cluster_name}-sts-endpoint"
}
}
resource "aws_security_group" "vpc_endpoints" {
name = "${local.cluster_name}-vpc-endpoints-sg"
description = "Security group for VPC endpoints"
vpc_id = aws_vpc.main.id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [local.vpc_cidr]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${local.cluster_name}-vpc-endpoints-sg"
}
}
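The locals block hard-codes six subnet CIDRs. AWS rejects a subnet that falls outside the VPC range or overlaps another subnet, but only at apply time; a small offline check catches a typo earlier. This is a hypothetical helper (`check_plan`, `cidr_range`, and the other names are invented here) that relies on bash’s 64-bit integer arithmetic:

```bash
#!/usr/bin/env bash
# Hypothetical offline check for the CIDR plan in vpc.tf's locals block.

# ip_to_int A.B.C.D: print the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_range CIDR: print "first last" addresses of the block as integers
cidr_range() {
  local ip="${1%/*}" bits="${1#*/}" size base
  size=$(( 1 << (32 - bits) ))
  base=$(( $(ip_to_int "$ip") & ~(size - 1) & 0xFFFFFFFF ))
  echo "$base $(( base + size - 1 ))"
}

# cidr_within INNER OUTER: exit 0 when INNER lies entirely inside OUTER
cidr_within() {
  local i_lo i_hi o_lo o_hi
  read -r i_lo i_hi <<< "$(cidr_range "$1")"
  read -r o_lo o_hi <<< "$(cidr_range "$2")"
  [ "$i_lo" -ge "$o_lo" ] && [ "$i_hi" -le "$o_hi" ]
}

# cidrs_overlap A B: exit 0 when the two blocks share at least one address
cidrs_overlap() {
  local a_lo a_hi b_lo b_hi
  read -r a_lo a_hi <<< "$(cidr_range "$1")"
  read -r b_lo b_hi <<< "$(cidr_range "$2")"
  [ "$a_lo" -le "$b_hi" ] && [ "$b_lo" -le "$a_hi" ]
}

# check_plan VPC_CIDR SUBNET...: report subnets outside the VPC or overlapping each other
check_plan() {
  local vpc="$1" status=0 i j
  shift
  local subnets=("$@")
  for ((i = 0; i < ${#subnets[@]}; i++)); do
    if ! cidr_within "${subnets[i]}" "$vpc"; then
      echo "outside $vpc: ${subnets[i]}"
      status=1
    fi
    for ((j = i + 1; j < ${#subnets[@]}; j++)); do
      if cidrs_overlap "${subnets[i]}" "${subnets[j]}"; then
        echo "overlap: ${subnets[i]} and ${subnets[j]}"
        status=1
      fi
    done
  done
  [ "$status" -eq 0 ] && echo "plan OK"
  return "$status"
}

# Example: the public, private, and observability ranges from vpc.tf
#   check_plan 10.0.0.0/16 10.0.1.0/24 10.0.2.0/24 10.0.10.0/24 10.0.11.0/24 10.0.20.0/24 10.0.21.0/24
```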
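⚠️ The VPC endpoints matter more than they look. In this design the NAT Gateway still gives nodes an outbound path to ECR and STS, so the endpoints mainly keep that traffic on private AWS paths; remove the NAT Gateway for a fully isolated cluster, and nodes cannot pull images from ECR or authenticate with AWS services without them. Many “private EKS doesn’t work” issues trace back to missing endpoints.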
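To confirm that the four endpoints from vpc.tf (ECR API, ECR Docker, S3, STS) actually exist after `terraform apply`, you can list the VPC’s endpoints with the AWS CLI and compare them with what you expect. A minimal sketch: `missing_endpoints` is a hypothetical helper, and the VPC lookup assumes the `Name` tag that vpc.tf sets.

```bash
#!/usr/bin/env bash
# Hypothetical check: which required VPC endpoint services are absent?

# missing_endpoints FOUND REQUIRED...: print each REQUIRED service name that is
# not in FOUND (a whitespace- or tab-separated list); exit 1 if any are missing
missing_endpoints() {
  local found svc missing=0
  # $1 is intentionally unquoted so tabs/newlines collapse into single spaces;
  # the leading space pads the list so every name is surrounded by spaces
  found=" $(printf '%s ' $1)"
  shift
  for svc in "$@"; do
    case "$found" in
      *" $svc "*) ;;                          # present
      *) echo "missing: $svc"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Example usage against the VPC created by vpc.tf (region us-east-1):
#   vpc_id="$(aws ec2 describe-vpcs --filters Name=tag:Name,Values=private-eks-cluster-vpc \
#     --query 'Vpcs[0].VpcId' --output text)"
#   found="$(aws ec2 describe-vpc-endpoints --filters "Name=vpc-id,Values=$vpc_id" \
#     --query 'VpcEndpoints[].ServiceName' --output text)"
#   missing_endpoints "$found" com.amazonaws.us-east-1.ecr.api com.amazonaws.us-east-1.ecr.dkr \
#     com.amazonaws.us-east-1.s3 com.amazonaws.us-east-1.sts
```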
Deploying the Private EKS Cluster
Create eks.tf for the cluster configuration:
# eks.tf - Private EKS cluster with no public endpoint
resource "aws_iam_role" "eks_cluster" {
name = "${local.cluster_name}-cluster-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster.name
}
resource "aws_security_group" "eks_cluster" {
name = "${local.cluster_name}-cluster-sg"
description = "Security group for EKS cluster control plane"
vpc_id = aws_vpc.main.id
# Allow inbound from VPN subnet
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
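# Access Server NATs VPN client traffic by default, so kubectl requests
# arrive from the server's private IP in the public subnet
cidr_blocks = [for s in local.public_subnets : s] # VPN lives here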
description = "Allow kubectl from VPN"
}
# Allow inbound from worker nodes
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [for s in local.private_subnets : s]
description = "Allow from worker nodes"
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${local.cluster_name}-cluster-sg"
}
}
resource "aws_eks_cluster" "main" {
name = local.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
version = "1.29"
vpc_config {
subnet_ids = [for s in aws_subnet.private : s.id]
endpoint_private_access = true # Enable private endpoint
endpoint_public_access = false # Disable public endpoint completely
security_group_ids = [aws_security_group.eks_cluster.id]
}
# Enable control plane logging
enabled_cluster_log_types = [
"api",
"audit",
"authenticator",
"controllerManager",
"scheduler"
]
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_vpc_endpoint.ecr_api,
aws_vpc_endpoint.ecr_dkr,
aws_vpc_endpoint.s3,
aws_vpc_endpoint.sts,
]
tags = {
Name = local.cluster_name
}
}
# Node IAM role
resource "aws_iam_role" "eks_nodes" {
name = "${local.cluster_name}-node-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
}
resource "aws_iam_role_policy_attachment" "eks_node_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.eks_nodes.name
}
resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.eks_nodes.name
}
resource "aws_iam_role_policy_attachment" "eks_ecr_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.eks_nodes.name
}
resource "aws_eks_node_group" "main" {
cluster_name = aws_eks_cluster.main.name
node_group_name = "${local.cluster_name}-nodes"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = [for s in aws_subnet.private : s.id]
instance_types = ["t3.medium"]
scaling_config {
desired_size = 3
max_size = 5
min_size = 2
}
update_config {
max_unavailable = 1
}
depends_on = [
aws_iam_role_policy_attachment.eks_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.eks_ecr_policy,
]
tags = {
Name = "${local.cluster_name}-nodes"
}
}
# Output the cluster endpoint for VPN configuration
output "cluster_endpoint" {
value = aws_eks_cluster.main.endpoint
}
output "cluster_ca_certificate" {
value = aws_eks_cluster.main.certificate_authority[0].data
sensitive = true
}
Apply the Terraform:
terraform init
terraform plan
terraform apply
💡 The cluster takes 10-15 minutes to create. Because the endpoint is private, kubectl cannot reach it yet. That’s intentional: we need the VPN first.
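Even before the VPN exists, the regional EKS API (as opposed to the cluster endpoint itself) can confirm that the endpoint settings landed as intended. The sketch below assumes AWS CLI v2 with working credentials; `endpoint_flags_ok` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical post-apply check: public endpoint off, private endpoint on.

# endpoint_flags_ok "PUBLIC PRIVATE": exit 0 only for public=false, private=true.
# Matching is case-insensitive, so "False<TAB>True" from --output text is accepted.
endpoint_flags_ok() {
  local public private
  read -r public private <<< "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  [ "$public" = "false" ] && [ "$private" = "true" ]
}

# Example usage:
#   aws eks wait cluster-active --name private-eks-cluster --region us-east-1
#   flags="$(aws eks describe-cluster --name private-eks-cluster --region us-east-1 \
#     --query 'cluster.resourcesVpcConfig.[endpointPublicAccess,endpointPrivateAccess]' \
#     --output text)"
#   if endpoint_flags_ok "$flags"; then echo "private-only endpoint"; else echo "unexpected: $flags"; fi
```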
Deploying OpenVPN Access Server
OpenVPN Access Server provides a web UI for user management and client configuration distribution. We’ll deploy it on an EC2 instance in the public subnet.
Create openvpn.tf:
# openvpn.tf - OpenVPN Access Server deployment
variable "ssh_key_name" {
description = "Name of the SSH key pair for OpenVPN instance access"
type = string
}
variable "admin_cidr_blocks" {
description = "CIDR blocks allowed to access admin UI and SSH (restrict to your IP)"
type = list(string)
default = ["0.0.0.0/0"] # Override this in production!
}
# Find the latest OpenVPN Access Server AMI
data "aws_ami" "openvpn" {
most_recent = true
owners = ["679593333241"] # AWS Marketplace account (Access Server AMIs are published there)
filter {
name = "name"
values = ["OpenVPN Access Server Community Image*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
resource "aws_security_group" "openvpn" {
name = "${local.cluster_name}-openvpn-sg"
description = "Security group for OpenVPN Access Server"
vpc_id = aws_vpc.main.id
# Admin web UI - restrict to your IP in production
ingress {
from_port = 943
to_port = 943
protocol = "tcp"
cidr_blocks = var.admin_cidr_blocks
description = "Admin web UI"
}
# Client web UI
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "Client web UI"
}
# OpenVPN UDP
ingress {
from_port = 1194
to_port = 1194
protocol = "udp"
cidr_blocks = ["0.0.0.0/0"]
description = "OpenVPN UDP"
}
# SSH for initial setup - remove after configuration
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = var.admin_cidr_blocks
description = "SSH access"
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${local.cluster_name}-openvpn-sg"
}
}
resource "aws_instance" "openvpn" {
ami = data.aws_ami.openvpn.id
instance_type = "t3.small"
subnet_id = aws_subnet.public["us-east-1a"].id
vpc_security_group_ids = [aws_security_group.openvpn.id]
associate_public_ip_address = true
key_name = var.ssh_key_name
root_block_device {
volume_size = 20
encrypted = true
}
tags = {
Name = "${local.cluster_name}-openvpn"
}
}
resource "aws_eip" "openvpn" {
instance = aws_instance.openvpn.id
domain = "vpc"
tags = {
Name = "${local.cluster_name}-openvpn-eip"
}
}
output "openvpn_public_ip" {
value = aws_eip.openvpn.public_ip
}
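The `admin_cidr_blocks` default of `0.0.0.0/0` leaves the admin UI and SSH open to the internet until you override it. One way to pin it to your current address is to look it up and write a `terraform.tfvars` file, which Terraform loads automatically. A hedged sketch: the helper names and the `my-key` key pair name are placeholders, and it assumes https://checkip.amazonaws.com is reachable and returns your public IPv4 address.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to pin openvpn.tf's admin access to a single address.

# to_host_cidr IP: print IP/32, or exit 1 if IP is not a valid dotted-quad IPv4
to_host_cidr() {
  local ip="$1" octet
  [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    [ "$octet" -le 255 ] || return 1
  done
  echo "$ip/32"
}

# render_tfvars KEY_NAME CIDR: print terraform.tfvars content for openvpn.tf's variables
render_tfvars() {
  printf 'ssh_key_name      = "%s"\n' "$1"
  printf 'admin_cidr_blocks = ["%s"]\n' "$2"
}

# Example usage ("my-key" is a placeholder for your EC2 key pair name):
#   my_ip="$(curl -s https://checkip.amazonaws.com | tr -d '[:space:]')"
#   cidr="$(to_host_cidr "$my_ip")" || { echo "could not determine IPv4 address"; exit 1; }
#   render_tfvars my-key "$cidr" > terraform.tfvars
#   terraform apply
```

If your address changes, rerun the lookup and apply again; the security group rules update in place.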
After deploying, SSH into the OpenVPN server and complete initial setup:
ssh -i your-key.pem openvpnas@<openvpn_public_ip>
# The first login triggers the configuration wizard
# Key settings:
# - Agreement: yes
# - Primary Access Server node: yes
# - Network interface: (accept default, usually eth0)
# - Admin UI port: 943
# - TCP port: 443
# - UDP port: 1194
# - Route client traffic through VPN: yes (full tunnel, per the split-tunneling decision above; kubectl itself only needs the VPC route)
# - Route DNS through VPN: yes
# - Use local authentication: yes
# - Private subnets to route: 10.0.0.0/16 (your VPC CIDR)
# Set the admin password
sudo passwd openvpn
Configure the VPN routing for EKS access:
# Access the admin UI at https://<openvpn_public_ip>:943/admin
# Navigate to VPN Settings > Routing
# Add these routes:
# - 10.0.0.0/16 (VPC CIDR - routes all VPC traffic through VPN)
#
# Under DNS Settings:
# - Set DNS servers to VPC DNS: 10.0
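For the DNS settings, AWS places the Amazon-provided resolver at the base of the VPC’s IPv4 network range plus two. Deriving that address from the CIDR avoids hard-coding it; the sketch below is a hypothetical helper, followed by a `dig` query you could run from a connected VPN client to confirm the private endpoint name resolves through that resolver.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the Amazon-provided DNS resolver (VPC base + 2).

# vpc_dns_resolver CIDR: print the network base address of CIDR plus two
vpc_dns_resolver() {
  local ip="${1%/*}" bits="${1#*/}" a b c d n
  IFS=. read -r a b c d <<< "$ip"
  n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # clear the host bits to get the network base, then add 2
  n=$(( (n & ~((1 << (32 - bits)) - 1) & 0xFFFFFFFF) + 2 ))
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Example usage from a connected VPN client:
#   resolver="$(vpc_dns_resolver 10.0.0.0/16)"   # -> 10.0.0.2
#   endpoint="$(aws eks describe-cluster --name private-eks-cluster \
#     --query 'cluster.endpoint' --output text)"
#   dig +short @"$resolver" "${endpoint#https://}"
```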