Highlights
- Kubernetes allows unrestricted pod to pod communication by default, and most clusters never change this.
- Network policies are the first layer of defense but only work if your CNI plugin supports them.
- Mutual TLS (mTLS) encrypts traffic between pods and verifies both endpoints.
- Service meshes like Istio, Linkerd, and Cilium automate mTLS provisioning and rotation.
- Workload identity eliminates the need for static cloud credentials inside pods.
- DNS based policies and FQDN egress filtering prevent pods from reaching unauthorized external endpoints.
- Zero trust networking treats every connection as untrusted regardless of network location.
The Default Kubernetes Network Model and Why It Is Dangerous
Kubernetes networking follows a simple design principle: every pod gets its own IP address, and every pod can reach every other pod without NAT. This flat networking model, defined in the Kubernetes networking specification, eliminates the complexity of port mapping and makes service discovery straightforward.
The problem is that this model treats the cluster network as a trusted zone. There is no built in segmentation between namespaces, no encryption of pod to pod traffic, and no authentication of service identity at the network level. A compromised pod in the frontend namespace can freely connect to database pods in the backend namespace, scan internal services, or exfiltrate data through unrestricted egress.
This matters because Kubernetes clusters are inherently multi tenant. A typical production cluster runs dozens to hundreds of services, some facing the internet, some processing sensitive data, some running third party code. Without network controls, the security posture of your entire cluster is only as strong as your weakest workload.
Securing East/West Traffic: Pod to Pod Communication
Network Policies: The Foundation
Kubernetes NetworkPolicy resources let you define ingress and egress rules at the pod level using label selectors, namespace selectors, and CIDR blocks. They function like firewall rules for pod traffic.
Critical prerequisite: Your Container Network Interface (CNI) plugin must support NetworkPolicy enforcement. The following table summarizes support across common CNIs:
| CNI Plugin | NetworkPolicy Support | Additional Policy Features |
|---|---|---|
| Calico | Yes | Global network policies, DNS policies, host endpoint protection |
| Cilium | Yes | L7 policies, DNS aware filtering, identity based policies |
| Antrea | Yes | Tiered policies, ClusterNetworkPolicy |
| Weave Net | Yes | Basic NetworkPolicy only |
| Flannel | No | Requires Calico addon for policy support |
| kubenet | No | No policy engine |
| AWS VPC CNI | Partial | Native NetworkPolicy support in recent versions (v1.14+); older versions require Calico or Cilium addon |
Default deny policy: The single most impactful security control you can apply is a default deny ingress and egress policy in every namespace. This flips the model from "allow all" to "deny all," requiring explicit rules for every permitted connection.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: backend
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```
After applying this policy, no pod in the backend namespace can receive incoming connections or make outgoing connections until you create allow rules.
Allow specific traffic: Create targeted policies that permit only the connections your application requires.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tier: frontend
      podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080
```
This policy allows only pods labeled app: web in namespaces labeled tier: frontend to connect to the API server on port 8080. Everything else is denied.
Egress controls for DNS: When you apply a default deny egress policy, you must explicitly allow DNS resolution or your pods will not be able to resolve service names.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: backend
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```
mTLS: Encrypting and Authenticating Pod to Pod Traffic
Network policies control which pods can communicate, but they do not encrypt the traffic or verify the identity of the communicating parties. On a shared cluster network, any pod with access to the underlying network can potentially sniff traffic between other pods using tools like tcpdump.
Mutual TLS (mTLS) solves both problems. Each pod presents a TLS certificate to prove its identity, and both sides of the connection validate the other's certificate before exchanging data. All traffic is encrypted in transit.
Implementing mTLS without a service mesh: You can configure mTLS directly in your applications by generating certificates with cert manager, distributing them as Kubernetes secrets, and configuring your application's TLS settings. This approach works but creates significant operational overhead: you need to manage certificate lifecycle, rotation, and revocation for every service.
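The Certificate below references a CA-backed ClusterIssuer named cluster-ca. For reference, a minimal sketch of such an issuer, assuming you have already stored a CA certificate and key in a secret (the secret name ca-key-pair is an assumption for this example):
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cluster-ca
spec:
  ca:
    # Secret in the cert-manager namespace holding the CA certificate
    # and private key (name is a placeholder for this example)
    secretName: ca-key-pair
```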
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-server-cert
  namespace: backend
spec:
  secretName: api-server-tls
  issuerRef:
    name: cluster-ca
    kind: ClusterIssuer
  commonName: api-server.backend.svc.cluster.local
  dnsNames:
  - api-server
  - api-server.backend
  - api-server.backend.svc.cluster.local
  duration: 720h
  renewBefore: 48h
```
Implementing mTLS with a service mesh: A service mesh automates the entire mTLS lifecycle. It injects a sidecar proxy (or uses eBPF in Cilium's case) that handles certificate provisioning, rotation, and enforcement transparently.
Istio configures mesh wide mTLS with a single PeerAuthentication resource:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```
Setting mode: STRICT at the mesh level ensures every pod to pod connection within the mesh uses mTLS. Pods that cannot present a valid certificate are rejected.
Linkerd enables mTLS by default for all meshed workloads without any additional configuration. When you inject the Linkerd proxy into a deployment, it automatically provisions a TLS identity from the mesh's trust anchor and encrypts all communication with other meshed pods.
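For example, annotating a namespace enables automatic proxy injection for every pod scheduled into it. This uses the standard Linkerd injection annotation, shown here on the backend namespace from the earlier examples:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: backend
  annotations:
    # Tells the Linkerd control plane to inject the proxy sidecar
    # into all pods created in this namespace
    linkerd.io/inject: enabled
```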
Cilium offers mTLS without sidecars using its eBPF dataplane and SPIFFE based identity system. This approach avoids the resource overhead of sidecar proxies while providing mutual authentication and encryption.
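As a sketch of what this looks like in recent Cilium releases, mutual authentication is requested per policy rule; verify the field names against your Cilium version, since this feature has been evolving:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mutual-auth
  namespace: backend
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web
    # Require SPIFFE based mutual authentication for this connection
    authentication:
      mode: "required"
```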
| Service Mesh | mTLS Model | Sidecar Required | L7 Policy Support | Resource Overhead |
|---|---|---|---|---|
| Istio | Envoy sidecar | Yes | Full (HTTP, gRPC, TCP) | Moderate to high |
| Linkerd | Linkerd2 proxy | Yes | HTTP and gRPC | Low |
| Cilium | eBPF with WireGuard or IPsec | No | L3/L4 plus HTTP | Very low |
Authorization Policies: L7 Traffic Control
Network policies operate at L3/L4, controlling which pods can connect on which ports. Service meshes extend this to L7, letting you define policies based on HTTP methods, paths, headers, and gRPC service names.
Istio AuthorizationPolicy example:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-server-authz
  namespace: backend
spec:
  selector:
    matchLabels:
      app: api-server
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/frontend/sa/web-service"
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/orders", "/api/v1/orders/*"]
  - from:
    - source:
        principals:
        - "cluster.local/ns/monitoring/sa/prometheus"
    to:
    - operation:
        methods: ["GET"]
        paths: ["/metrics"]
```
This policy allows the web-service service account in the frontend namespace to access the orders API using GET and POST, while Prometheus can only scrape the metrics endpoint. All other requests are denied.
Securing North/South Traffic: Pod to Cloud Communication
The Problem with Static Credentials
Many teams provision cloud access for their pods using static credentials stored as Kubernetes secrets: AWS access keys, GCP service account key files, or Azure service principal passwords. This approach introduces several risks.
Static credentials do not expire unless manually rotated. They can be exfiltrated from the cluster through compromised pods, exposed via environment variables in crash dumps, or leaked through misconfigured logging. If a single credential is compromised, the attacker has persistent access to the associated cloud resources until someone detects and revokes it.
Workload Identity: The Modern Approach
Every major cloud provider now supports workload identity federation, which lets Kubernetes pods authenticate to cloud APIs using short lived tokens tied to their Kubernetes service account identity. No static credentials are needed.
AWS IAM Roles for Service Accounts (IRSA):
IRSA creates a trust relationship between a Kubernetes service account and an IAM role. When a pod assumes that role, it receives temporary credentials (valid for 1 to 12 hours) from AWS STS via a projected service account token.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: data-pipeline
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/s3-read-only"
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-processor
  namespace: data-pipeline
spec:
  selector:
    matchLabels:
      app: data-processor
  template:
    metadata:
      labels:
        app: data-processor
    spec:
      serviceAccountName: s3-reader
      containers:
      - name: processor
        image: myregistry.example.com/data-processor:v2.1
        # No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed
        # The AWS SDK automatically uses the projected token
```
GCP Workload Identity:
GCP Workload Identity binds a Kubernetes service account to a Google Cloud service account. Pods authenticated as the Kubernetes service account automatically receive federated tokens scoped to the Google Cloud service account's IAM permissions.
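Completing the binding requires granting the Kubernetes service account permission to impersonate the Google Cloud service account. A sketch using the names from the manifest below (the project ID myproject is a placeholder):
```bash
# Allow the KSA analytics/bigquery-writer to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding \
  bq-writer@myproject.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:myproject.svc.id.goog[analytics/bigquery-writer]"
```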
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bigquery-writer
  namespace: analytics
  annotations:
    iam.gke.io/gcp-service-account: "bq-writer@myproject.iam.gserviceaccount.com"
```
Azure Workload Identity Federation:
Azure uses federated identity credentials to map a Kubernetes service account to an Azure managed identity or app registration.
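The federated credential itself is created on the Azure side. A sketch with placeholder names (the identity name, resource group, and OIDC issuer variable are assumptions for this example):
```bash
# Map the Kubernetes service account to the managed identity
az identity federated-credential create \
  --name keyvault-reader-federation \
  --identity-name keyvault-reader-identity \
  --resource-group my-rg \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject "system:serviceaccount:secrets-management:keyvault-reader" \
  --audience api://AzureADTokenExchange
```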
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keyvault-reader
  namespace: secrets-management
  annotations:
    azure.workload.identity/client-id: "12345678-abcd-efgh-ijkl-123456789012"
  labels:
    # Note: recent releases of the workload identity webhook look for
    # this label on the pod template rather than the service account
    azure.workload.identity/use: "true"
```
Least Privilege for Cloud Access
Workload identity eliminates static credentials, but you still need to scope permissions correctly. Apply the principle of least privilege to every cloud IAM binding.
AWS: Scope the role's permission policy to specific actions, resources, and conditions. The restriction to a particular Kubernetes namespace and service account lives in the role's trust policy, shown after the example below.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::data-pipeline-bucket",
        "arn:aws:s3:::data-pipeline-bucket/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
```
Avoid wildcard permissions. An IAM role with s3:* on * gives a compromised pod access to every S3 bucket in the account. Scope actions to the minimum required and resources to specific ARNs.
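The namespace and service account scoping happens in the role's trust policy, which binds the role to one Kubernetes service account through the cluster's OIDC provider. A sketch (the account ID, OIDC provider ID, and region are placeholders):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:data-pipeline:s3-reader"
        }
      }
    }
  ]
}
```
With this condition in place, only pods running as the s3-reader service account in the data-pipeline namespace can assume the role.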
Restricting Instance Metadata Access
In cloud environments, nodes expose an instance metadata service (IMDS) at 169.254.169.254. Any pod on the node can query this endpoint to obtain the node's IAM credentials, which typically have broader permissions than individual pod roles.
Block IMDS access for pods that do not need it:
```yaml
# Cilium network policy to block metadata service access
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: block-imds
  namespace: default
spec:
  endpointSelector: {}
  egressDeny:
  - toCIDR:
    - "169.254.169.254/32"
```
On AWS EKS, enable IMDSv2 with a hop limit of 1 to prevent containers from accessing node metadata:
```bash
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1
```
Egress Security: Controlling Outbound Traffic
DNS Based Egress Policies
Standard Kubernetes network policies support egress filtering by IP address and CIDR block, but cloud service endpoints use dynamic IP ranges that change frequently. DNS based egress policies let you filter by domain name instead.
Cilium DNS aware egress policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-s3-egress
  namespace: data-pipeline
spec:
  endpointSelector:
    matchLabels:
      app: data-processor
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        # Route DNS through Cilium's proxy so the toFQDNs rules below
        # can learn which IPs the allowed names resolve to
        - matchPattern: "*"
  - toFQDNs:
    - matchPattern: "*.s3.amazonaws.com"
    - matchPattern: "*.s3.us-east-1.amazonaws.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
```
This policy allows the data processor to resolve DNS and connect to S3 endpoints on port 443, but blocks all other egress traffic.
Egress Gateways
For workloads that need to communicate with external APIs or on premises systems that use IP allowlisting, egress gateways provide a stable source IP for outbound traffic. Istio, Cilium, and dedicated egress gateway controllers route outbound traffic through specific nodes with static IPs.
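As one example of what this looks like, Cilium's egress gateway is configured with a CiliumEgressGatewayPolicy. A sketch with placeholder labels and addresses (verify the field names against your Cilium version):
```yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: external-api-egress
spec:
  selectors:
  - podSelector:
      matchLabels:
        app: data-processor
  # Traffic to this external range is routed through the gateway node
  destinationCIDRs:
  - "203.0.113.0/24"
  egressGateway:
    nodeSelector:
      matchLabels:
        egress-gateway: "true"
    # Static source IP presented to the external system (placeholder)
    egressIP: "198.51.100.10"
```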
Putting It All Together: A Layered Security Architecture
The following layers build on each other to create a zero trust networking posture for your Kubernetes cluster:
Layer 1: Network segmentation. Apply default deny network policies in every namespace. Create allow rules for each required communication path. This prevents unauthorized lateral movement.
Layer 2: Mutual authentication and encryption. Deploy a service mesh or configure mTLS with cert manager. Ensure every pod to pod connection is authenticated and encrypted. This prevents eavesdropping and impersonation.
Layer 3: Application layer authorization. Use service mesh authorization policies to control which services can call which endpoints with which HTTP methods. This enforces least privilege at the API level.
Layer 4: Cloud identity federation. Replace all static cloud credentials with workload identity. Scope IAM permissions to the minimum required. Block IMDS access from pods. This limits the blast radius of a compromised pod to only the cloud resources it legitimately needs.
Layer 5: Egress filtering. Restrict outbound traffic to known, required destinations using DNS based policies. Route traffic through egress gateways where IP allowlisting is required. This prevents data exfiltration and command and control communication.
Common Pitfalls and How to Avoid Them
Pitfall 1: Applying network policies without testing. A default deny policy applied to a namespace without corresponding allow rules will break every application in that namespace. Always test in a staging environment first, or deploy in audit mode (supported by Calico and Cilium) before enforcing.
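A quick way to verify behavior after applying a policy is to probe from a throwaway pod. A sketch against the API server from the earlier examples (the /healthz path is an assumption about the target application):
```bash
# Should succeed from an allowed source and time out from a denied one
kubectl run np-test -n backend --rm -it --restart=Never \
  --image=busybox:1.36 -- \
  wget -qO- -T 3 http://api-server:8080/healthz
```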
Pitfall 2: Setting mTLS to PERMISSIVE mode permanently. PERMISSIVE mode accepts both plaintext and mTLS connections, making it useful for migration. However, leaving it in PERMISSIVE mode indefinitely defeats the purpose. Plan a migration timeline and switch to STRICT mode.
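A common migration pattern is to scope PERMISSIVE to a namespace while its workloads are being onboarded, then flip it. A minimal sketch using the same PeerAuthentication API shown earlier:
```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: backend
spec:
  mtls:
    # Accept both plaintext and mTLS during migration;
    # change to STRICT once all clients are meshed
    mode: PERMISSIVE
```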
Pitfall 3: Overly broad IAM roles for workload identity. Replacing static credentials with workload identity is a significant improvement, but attaching an AdministratorAccess policy to a pod's IAM role recreates the same risk. Audit IAM permissions quarterly.
Pitfall 4: Forgetting to secure the service mesh control plane. The Istio control plane (istiod) issues certificates and distributes configuration to every sidecar. If compromised, an attacker can disable mTLS, modify routing, or inject malicious sidecars. Harden the mesh control plane with RBAC, network policies, and resource limits.
Conclusion
Securing Kubernetes communication requires addressing both internal (pod to pod) and external (pod to cloud) traffic with distinct but complementary controls. Network policies provide segmentation, mTLS provides authentication and encryption, L7 authorization policies enforce least privilege at the application level, and workload identity eliminates static credentials for cloud access.
Start with default deny network policies and workload identity. These two controls alone eliminate the majority of common attack vectors: lateral movement inside the cluster and credential theft for cloud services. Layer in mTLS and L7 policies as your security posture matures. Treat this as a progressive journey, not a single project, and measure your progress by the percentage of namespaces with enforced policies and the number of static credentials remaining in your cluster.
FAQs
Q1: What are the prerequisites for implementing Kubernetes network policies?
The primary prerequisite is a CNI plugin that supports NetworkPolicy enforcement. Calico, Cilium, and Antrea are the most widely used options. If your cluster uses Flannel or the default kubenet CNI, you will need to install a policy capable CNI alongside it or replace it entirely. On managed Kubernetes services, check your provider's documentation. EKS uses the AWS VPC CNI by default; versions before v1.14 do not enforce network policies natively, so you need to install Calico or Cilium as an addon, while recent versions include a native network policy agent you can enable instead. GKE supports network policies with its built in Dataplane V2 (Cilium based) if you enable it during cluster creation. AKS supports network policies through Azure NPM or Calico. Beyond the CNI requirement, you need a solid understanding of your application's communication patterns so you can write accurate allow rules after applying a default deny policy.
Q2: How does mTLS differ from standard TLS in Kubernetes pod communication?
Standard TLS encrypts traffic and lets the client verify the server's identity, but the server does not verify the client. This is the model used by HTTPS in web browsers. Mutual TLS adds a second verification step: the server also demands a certificate from the client and validates it against a trusted certificate authority. In a Kubernetes context, this means both the calling service and the receiving service prove their identity before exchanging data. This prevents a compromised pod from impersonating a legitimate service, which standard TLS cannot do. Service meshes make mTLS operationally feasible by automating certificate issuance, rotation, and revocation. Without a service mesh, you would need to manage certificates for every service manually using tools like cert manager, which adds significant complexity. The Istio Service Mesh course provides labs that walk through mTLS configuration and verification in a live cluster.
Q3: Should I use Istio, Linkerd, or Cilium for mTLS in my cluster?
The choice depends on your requirements and operational constraints. Istio provides the most feature rich policy engine with granular L7 authorization, traffic management, and observability, but it consumes more resources and has a steeper learning curve. Linkerd is lighter weight, simpler to operate, and enables mTLS by default with minimal configuration, making it ideal for teams that want encrypted communication without complex traffic management features. Cilium uses eBPF instead of sidecar proxies, resulting in lower latency and resource overhead, and it integrates network policy enforcement with mTLS in a single tool. If you already run Cilium as your CNI, adding mTLS is straightforward. For teams that need advanced traffic management and L7 policies, Istio is the strongest option. For teams that want simplicity and minimal overhead, Linkerd or Cilium are better choices.
Q4: How does workload identity work, and does it eliminate all credential risks?
Workload identity creates a trust relationship between a Kubernetes service account and a cloud IAM identity (an IAM role on AWS, a Google Cloud service account on GCP, or a managed identity on Azure). When a pod runs with that service account, the Kubernetes API server issues a short lived, signed token. The pod presents this token to the cloud provider's security token service, which validates it against the trust relationship and returns temporary cloud credentials. These credentials expire automatically, typically within 1 to 12 hours. Workload identity eliminates static credential risks like long lived access keys stored in secrets, but it does not eliminate all credential risks. You still need to scope IAM permissions carefully. A pod with workload identity bound to an overly permissive IAM role still has more access than it should. Additionally, the node's own IAM role may be accessible through the instance metadata service if you do not restrict IMDS access.
Q5: Can I implement pod to pod security without a service mesh?
Yes, but with more manual effort. Network policies handle traffic segmentation without any additional components beyond a policy capable CNI. For mTLS without a service mesh, you can use cert manager to issue TLS certificates to each service and configure your applications to present and verify client certificates. This approach works for smaller deployments with a handful of services. For encryption without authentication, you can enable WireGuard or IPsec at the CNI level. Both Calico and Cilium support transparent encryption of all pod to pod traffic without requiring any application changes or sidecar proxies. The tradeoff is that CNI level encryption encrypts all traffic between nodes but does not provide per pod identity verification the way mTLS does. For most teams with more than ten services, a lightweight service mesh like Linkerd pays for itself in operational simplicity compared to managing certificates and TLS configuration manually across every service.
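For example, transparent encryption in Cilium can be enabled through its Helm chart. A sketch using the documented encryption values (the release name and namespace are assumptions about your install):
```bash
# Enable WireGuard encryption for all pod to pod traffic between nodes
helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard
```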
Q6: How do I test network policies before enforcing them in production?
Start by deploying policies in audit or logging mode. Calico supports a Log action in its global network policy resources, which logs connections that would be denied without actually blocking them. Cilium supports a similar audit mode through its policy enforcement configuration. Review the logs for a week to identify legitimate traffic patterns that your policies would block. Alternatively, deploy your policies in a staging cluster that mirrors production and run integration tests to verify that all service to service communication paths function correctly. Tools like Illuminatio and Cyclonus can generate test traffic to validate network policy behavior against expected outcomes. Once you are confident in your policies, apply them to production namespaces one at a time, starting with the least critical workloads. Monitor application error rates and latency closely for 48 hours after each enforcement change. The Kubernetes for Beginners course and the CKA certification path cover network policy fundamentals with interactive labs for practice.