Implementing Zero-Trust with Cilium

Traditional Kubernetes security often stops at Ingress + TLS + application authentication. But attackers rarely respect architectural diagrams. Once inside a cluster, lateral movement becomes trivial unless the network itself enforces identity.

Cilium's eBPF-based, identity-aware networking changes the model. Instead of trusting IP ranges or perimeter firewalls, the kernel enforces rules based on workload identity and protocol semantics before packets ever reach the application. I will work through a typical application stack running in Kubernetes and save some of the more complex configurations, such as database HA, for another day.

Architecture:

Internet → Load Balancer → Frontend → Backend API → Postgres + Kafka
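
The policies that follow assume each tier runs in its own namespace with a stable identity label: role: frontend in the frontend namespace, role: backend-api in the backend namespace, app: postgres in the database namespace, and app: kafka in the kafka namespace. As a minimal sketch of that labeling (the Deployment name, image, and replica count are placeholders; only the labels matter to Cilium):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      role: frontend            # the identity label every policy below matches on
  template:
    metadata:
      labels:
        role: frontend
    spec:
      containers:
      - name: frontend
        image: example/frontend:latest   # placeholder image
        ports:
        - containerPort: 443

The backend, database, and Kafka workloads follow the same pattern with their respective labels; Cilium derives each pod's security identity from these labels plus its namespace.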

Ingress & Edge Security

The first phase focuses on establishing a trustworthy boundary at the moment traffic enters the cluster. A load balancer may terminate TLS and redirect HTTP to HTTPS, but infrastructure configuration alone is not a security control — it is a convenience feature. The objective of this phase is to ensure the cluster itself independently validates that only encrypted, expected traffic reaches the frontend workloads. Even if the load balancer is misconfigured, compromised, or bypassed through internal routing, the application pods should behave as though they only exist behind a hardened HTTPS gateway.

To accomplish this, we treat the load balancer as an external actor whose behavior must be verified rather than assumed. Cilium policies will define what the frontend considers legitimate traffic, not merely what the load balancer intends to send. The goal is to anchor trust inside the cluster: the frontend accepts connections only from the correct source and only on the correct protocol, and observability tooling confirms this continuously. By the end of this phase, the edge of the system is no longer defined by infrastructure but by enforceable kernel policy.

Enforce HTTPS Only

This snippet defines a Cilium network policy that restricts which traffic can reach a specific group of pods. It targets all pods in the frontend namespace labeled role: frontend and permits only inbound TCP connections on port 443, so the frontend workloads accept traffic only on the HTTPS port while every other ingress path is blocked. Note that the policy enforces the port, not TLS itself; terminating TLS correctly remains the application's job.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-https-only
  namespace: frontend
spec:
  endpointSelector:
    matchLabels:
      role: frontend
  ingress:
  - toPorts:
    - ports:
      - port: "443"
        protocol: TCP

Once this policy exists, all other ports are implicitly denied. Even if a container exposes port 80, packets are dropped at the kernel before reaching the application stack.
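
A quick way to see the implicit deny is to probe the frontend from a throwaway pod. The Service name and ports below are illustrative and assume a ClusterIP Service named frontend exposing 80 and 443 in the frontend namespace:

# Port 80 is not in the allow list, so the connection hangs and hits the 5s timeout
kubectl run policy-test --rm -i --restart=Never --image=curlimages/curl \
  --command -- curl -m 5 http://frontend.frontend.svc.cluster.local

# Port 443 is allowed from any source by this policy, so the request reaches the pod
kubectl run policy-test --rm -i --restart=Never --image=curlimages/curl \
  --command -- curl -m 5 -k https://frontend.frontend.svc.cluster.local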

Restrict to Load Balancer

Deploy a Cilium network policy that limits access to your frontend pods so they only accept HTTPS traffic originating from your load balancer’s subnet. The policy selects all pods labeled role: frontend in the frontend namespace and permits ingress exclusively from the 10.0.0.0/24 CIDR block, restricting traffic to TCP port 443. This ensures that only trusted upstream infrastructure can reach the frontend service, reinforcing a controlled and secure traffic path.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-lb-only
  namespace: frontend
spec:
  endpointSelector:
    matchLabels:
      role: frontend
  ingress:
  - fromCIDR:
    - 10.0.0.0/24
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP

Only traffic originating from the trusted ingress range can reach frontend pods, eliminating direct pod-to-pod exposure. Keep in mind that Cilium allow rules are additive: if the earlier frontend-https-only policy is left in place, it still admits port 443 from any source, so either remove that policy or treat this stricter one as its replacement.

Verify Redirect

Observing dropped HTTP and allowed HTTPS confirms the redirect behavior is enforced by infrastructure and verified by kernel telemetry.

hubble observe --namespace frontend --port 80 --verdict DROPPED
hubble observe --namespace frontend --port 443 --verdict FORWARDED

API Layer Identity Protection

With the public entry secured, the next step is to prevent lateral movement inside the cluster. The backend API represents the core business logic, so the question shifts from “is this traffic encrypted?” to “is this caller allowed to exist?” In a traditional network, services rely on network location or internal routing to imply trust. Here, we deliberately remove that assumption and require each request to prove its origin and intent before the application even evaluates it.

The objective of this phase is to turn the network into a contract validator. Only the frontend service should be capable of reaching the API, and only in ways the API expects to be used. Requests that lack authentication indicators or attempt unsupported operations should never reach application code. By enforcing identity, paths, and methods at the network layer, the API becomes insulated from malformed or malicious requests, reducing load, attack surface, and the likelihood that security depends on developer discipline.

Only Frontend May Call API

Apply a Cilium network policy that restricts access to backend API pods so they only accept traffic from authorized frontend workloads. The policy selects all pods in the backend namespace labeled role: backend-api and permits ingress exclusively from pods carrying the role: frontend label. It further limits allowed connections to TCP port 8080, ensuring that only the intended frontend services can communicate with the backend API over the designated application port. Because the frontend pods live in a different namespace, the selector must also pin the source namespace with the k8s:io.kubernetes.pod.namespace label; a namespaced policy otherwise matches only pods in its own namespace.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-frontend-only
  namespace: backend
spec:
  endpointSelector:
    matchLabels:
      role: backend-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: frontend
        role: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP

Traffic is allowed only when the calling workload identity matches the frontend role, preventing spoofing or lateral movement.

Restrict Methods and Paths

Apply a Cilium network policy that tightly controls how frontend workloads may interact with backend API pods. The policy selects all pods labeled role: backend-api in the backend namespace and allows ingress only from pods carrying the role: frontend label. It further restricts permitted traffic to TCP port 8080 and enforces HTTP‑level rules that allow only GET and POST requests targeting paths under /api/v1/. Any other HTTP methods—such as PUT, PATCH, or DELETE—as well as requests to paths outside the /api/v1/ prefix, are explicitly disallowed and will be blocked.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-http-restrictions
  namespace: backend
spec:
  endpointSelector:
    matchLabels:
      role: backend-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: frontend
        role: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/.*"

The network layer now enforces the API contract so unexpected verbs or paths never reach application code.
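
To watch the L7 filter work, try a disallowed verb from a frontend pod. The deployment, Service, and path names are illustrative, and the frontend image is assumed to ship curl. When an HTTP rule denies a request, Cilium's proxy answers with 403 Access denied instead of silently dropping the connection:

# Allowed: GET under /api/v1/ passes the policy and reaches the API
kubectl exec -n frontend deploy/frontend -- \
  curl -s -o /dev/null -w "%{http_code}\n" http://backend-api.backend.svc:8080/api/v1/orders

# Denied: DELETE is not in the rule set, so the proxy returns 403 Access denied
kubectl exec -n frontend deploy/frontend -- \
  curl -s -X DELETE -o /dev/null -w "%{http_code}\n" http://backend-api.backend.svc:8080/api/v1/orders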

Require Authorization Header

This step adds a requirement that matching requests carry an Authorization header. Listing only the header name checks for presence; Cilium matches header values literally rather than as regular expressions, so a pattern like Authorization: .* would only match the literal value ".*". The rule does not validate the header's contents, structure, token type, signature, or authenticity; it simply enforces that the header exists. Note also that HTTP rules are OR'd together, both within a policy and across policies, so this requirement must be added to the existing GET and POST rules in api-http-restrictions rather than appended as an extra rule, or the more permissive rules will still match. Authentication and authorization proper still belong to the application.

rules:
  http:
  - path: "/api/v1/.*"
    method: "GET"
    headers:
    - "Authorization"

Requests that arrive without any credentials at all are rejected by the proxy before they consume application CPU.

Protecting The Data

Databases should have the smallest possible worldview. Unlike application services, they are not meant to participate in general network communication — they serve a single purpose for a small set of callers. This phase focuses on shrinking the database’s reachable universe to exactly one identity and one port. If any other component attempts access, the cluster should behave as though the database does not exist.

The objective is containment. Even if another workload is compromised, the attacker gains no network path to the persistence layer. Additionally, the database should not be capable of initiating outbound communication, eliminating common exfiltration patterns such as reverse shells or command callbacks. By the end of this phase, the database operates as a sealed dependency: reachable only by the API and incapable of independently interacting with the rest of the environment.

Only API Can Reach Postgres

Next apply a Cilium network policy that limits which workloads may connect to your PostgreSQL database. The policy selects all pods labeled app: postgres in the database namespace and allows ingress only from pods carrying the role: backend-api label. It further restricts permitted connections to TCP port 5432, ensuring that only the backend API service can initiate database traffic. Any other source pods, namespaces, or ports are not allowed to reach the PostgreSQL instance under this policy.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: postgres-api-only
  namespace: database
spec:
  endpointSelector:
    matchLabels:
      app: postgres
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: backend
        role: backend-api
    toPorts:
    - ports:
      - port: "5432"
        protocol: TCP

The database accepts connections exclusively from the backend API identity, preventing compromised frontend access.
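
A reachability probe makes the containment visible. The deployment and Service names are illustrative, and the test assumes nc is available in the calling images:

# From the backend API, the TCP handshake on 5432 succeeds
kubectl exec -n backend deploy/backend-api -- nc -zvw3 postgres.database.svc 5432

# From the frontend, the same probe is dropped and times out
kubectl exec -n frontend deploy/frontend -- nc -zvw3 postgres.database.svc 5432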

Deny All Outbound Connections

Apply a Cilium network policy that prevents PostgreSQL pods from initiating any outbound connections. The policy selects all pods labeled app: postgres in the database namespace and declares an egress section that allows nothing, which places those pods in default-deny for all outgoing traffic. Because no destinations, ports, or protocols are permitted, the database pods cannot reach any external services, APIs, or other workloads. The policy does not inspect specific protocols or destinations; it simply enforces a complete egress-deny posture.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: postgres-egress-deny
  namespace: database
spec:
  endpointSelector:
    matchLabels:
      app: postgres
  egress:
  - {}

The database can no longer initiate outbound communication, blocking reverse shells and data exfiltration attempts. Be aware that this also blocks DNS lookups, replication, and backup traffic; anything the database legitimately needs to reach must be allowed with an explicit egress rule later.
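
Hubble makes the lockdown observable. The pod name below assumes a StatefulSet named postgres and is purely illustrative:

# Any outbound attempt from the database pod now shows up as a drop
hubble observe --from-pod database/postgres-0 --verdict DROPPED

# Legitimate API-to-database traffic is still forwarded
hubble observe --to-pod database/postgres-0 --port 5432 --verdict FORWARDED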

Kafka Message Bus Security

A message bus introduces a different kind of risk because it intentionally connects multiple producers and consumers. Unlike a database, the goal is not to isolate access entirely but to precisely control how data flows between participants. This phase focuses on ensuring each service interacts with Kafka only in the way its role requires — publishing specific events or consuming specific streams, but never both indiscriminately.

The objective is to convert Kafka from a shared communication medium into a governed exchange. Internal services should be identified by workload identity, while external partners must be restricted by explicit trust boundaries. Beyond connectivity, the cluster will enforce which topics and operations are permitted. The result is a messaging layer where access reflects data ownership, preventing accidental cross-service visibility and limiting the blast radius of compromised credentials.

Internal vs External Clients

Allow access only for known clients: internal workloads are authenticated by identity, while external consumers are restricted to known network ranges.

ingress:
  - fromEndpoints:
    - matchLabels:
        kafka-client: internal
  - fromCIDR:
    - 10.20.30.0/24

Both ways of defining traffic sources carry assumptions. When a policy uses fromEndpoints with a selector such as kafka-client: internal, it assumes that the label is applied consistently and accurately to the intended pods and that no untrusted workload can obtain it; the policy does not check whether such pods actually exist, it simply trusts the label, and cross-namespace clients additionally need the k8s:io.kubernetes.pod.namespace label in the selector. Likewise, when a policy allows traffic from a CIDR block like 10.20.30.0/24, it assumes the subnet is a trusted and well-controlled network segment. Cilium does not verify what systems reside in that range; it relies on the underlying network to enforce those guarantees. A fuller sketch of how these sources fit into a complete broker policy follows.
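
For context, here is how the fragment above might sit in a complete broker policy. It assumes the brokers are labeled app: kafka in the kafka namespace and listen on 9092, matching the policy in the next step:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kafka-client-sources
  namespace: kafka
spec:
  endpointSelector:
    matchLabels:
      app: kafka
  ingress:
  # Internal clients are identified by workload labels
  # (clients in other namespaces would also need the namespace label here)
  - fromEndpoints:
    - matchLabels:
        kafka-client: internal
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
  # External partners are admitted only from a known network range
  - fromCIDR:
    - 10.20.30.0/24
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP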

Topic Authorization

Apply this policy to move from simple port filtering to protocol-aware enforcement: eBPF redirects Kafka traffic on port 9092 to Cilium's proxy, which parses the protocol itself. It restricts backend-api access to two specific actions: producing to the orders topic and consuming from events. This granular approach ensures that even if a service is breached, it remains locked out of unauthorized topics and data streams.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kafka-topic-policy
  namespace: kafka
spec:
  endpointSelector:
    matchLabels:
      app: kafka
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: backend
        role: backend-api
    toPorts:
    - ports:
      - port: "9092"
        protocol: TCP
      rules:
        kafka:
        - role: "produce"
          topic: "orders"
        - role: "consume"
          topic: "events"

Cilium now enforces which services may publish to or consume from specific topics, preventing cross-tenant data access. Rather than granting the backend free rein over the bus, this configuration allowlists two distinct actions, so even if the backend service is compromised, it cannot probe other topics or disrupt sensitive data streams elsewhere in the pipeline.

ClientID Filtering

Add a layer of client-aware authorization to your Kafka traffic. Beyond restricting the topic, Cilium parses the Kafka protocol and checks the clientID field of each request. Requiring the ID payment-service for the produce action on the orders topic narrows which callers may write data. Keep in mind that clientID is metadata the client sets on itself, so treat this as defense in depth layered on top of the label-based workload identity above rather than a standalone guard against impersonation.

toPorts:
- ports:
  - port: "9092"
    protocol: TCP
  rules:
     kafka:
     - role: "produce"
       topic: "orders"
       clientID: "payment-service"

Authorization is tied to the logical service identity embedded in Kafka protocol metadata.

Visibility and Compliance

Security controls only matter if they can be demonstrated and observed in real time. After enforcing boundaries across entry, services, persistence, and messaging, the final phase ensures the system can continuously prove those boundaries exist. Rather than trusting configuration files, we rely on runtime telemetry to confirm that the architecture behaves exactly as designed.

The objective here is evidence. Every allowed and denied connection should be visible, unexpected communication should stand out immediately, and historical records should exist for auditing purposes. By exporting flow data and observing live traffic patterns, the cluster produces an operational map of its own security posture. At the end of this phase, the environment is not only secured but verifiably secure, turning architecture diagrams into measurable behavior.

Observe Denied Traffic

hubble observe --verdict DROPPED

Denied flows immediately reveal attempted lateral movement or misconfiguration.

Export Flow Logs

hubble:
  metrics:
    enabled:
    - dns
    - drop
    - tcp

These Helm values enable Hubble metrics for DNS, drop, and TCP events; the underlying flow records can also be exported to SIEM platforms for auditing and forensic retention.
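
As a minimal sketch of getting flows out of the cluster, the Hubble CLI can emit flows as JSON for a log shipper or SIEM collector to pick up; the namespace, flow count, and file name are arbitrary:

# Dump the most recent flows for the database namespace as JSON
hubble observe --namespace database --last 1000 -o jsonpb > hubble-flows.json

# Or stream continuously and pipe into a shipper of your choice
hubble observe --follow -o jsonpb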

Where This Leaves Us

The cluster now enforces trust boundaries at multiple layers simultaneously rather than relying on application correctness or infrastructure assumptions. The load balancer provides the public interface, but kernel enforcement ensures only encrypted traffic is accepted and only from the expected source. The frontend can communicate only with the API and only through approved operations, transforming the network into a contract validator instead of a passive transport layer.

The backend API becomes an identity-verified service rather than an addressable endpoint. Requests lacking authentication indicators never consume compute resources, and unexpected operations are filtered before they enter application logic. The database exists as a sealed dependency reachable only by a single caller and incapable of initiating outbound connections, eliminating common lateral movement and exfiltration patterns.

Kafka shifts from being a shared pipe to a governed data exchange where workloads are restricted to specific topics and behaviors. Visibility tooling provides a continuously provable security posture, making every allowed and denied connection observable and auditable. Instead of trusting configuration, the cluster now produces runtime evidence of enforcement, enabling both operational confidence and regulatory compliance.
