Deep Dive Into Karpenter

kubernetes aws karpenter
Karpenter logo over blurred blue sparkled background

Here I follow up on my previous article about autoscaling in Kubernetes with a deep dive into Karpenter, illustrated with examples on AWS EKS.

Karpenter is a node provisioning system for Kubernetes that reacts directly to pending pods rather than abstract capacity signals. It observes unschedulable pods, evaluates their scheduling constraints, and creates nodes that are shaped to the workload rather than forcing workloads to adapt to pre-defined node groups.

Karpenter does not replace Kubernetes scheduling. It does not manage pods, priorities, or disruption by itself. It also does not eliminate the need for capacity planning or cost awareness. What it changes is where decisions are made. Instead of encoding capacity assumptions into node pools ahead of time, those decisions move closer to the actual workload.

This shift is powerful, but it also changes the operational model of a cluster. Many of the issues teams encounter with Karpenter come from treating it as a faster Cluster Autoscaler rather than as a different approach to node lifecycle management.

Architecture and Core Concepts

Karpenter runs as a controller in the cluster. Its core loop is simple: watch for unschedulable pods, evaluate constraints, and provision nodes that satisfy those constraints. The details matter. The main concepts are:

  • NodePools (formerly Provisioners): Define high-level constraints such as instance families, zones, capacity type, labels, and taints.
  • NodeClasses: Describe provider-specific configuration, such as AMI selection, subnets, security groups, and block device mappings.
  • Requirements: Hard scheduling constraints derived from pod specs, including CPU, memory, architecture, operating system, and topology rules.
  • Disruption and consolidation: Rules that allow Karpenter to replace, downsize, or remove nodes when capacity is no longer needed.

Karpenter evaluates all of these together. It does not create nodes randomly and hope the scheduler makes things fit. The node shape is an output of the scheduling problem.

Installation

The installation process is straightforward, but it touches multiple systems. Most problems appear later because something was glossed over here.

To install Karpenter into an AWS EKS cluster, you first need to set up the necessary infrastructure prerequisites, including an IAM Role for Nodes (with an Instance Profile) and an IAM Role for the Controller using IAM Roles for Service Accounts (IRSA). You must also ensure your cluster's subnets and security groups are tagged with karpenter.sh/discovery: <cluster-name> so the controller can identify where to provision resources. Once the IAM permissions and tags are in place, install Karpenter using Helm by pointing to the official OCI repository (typically oci://public.ecr.aws/karpenter/karpenter), passing in your cluster name, the Controller's IAM Role ARN, and, if your chart version requires it, the cluster endpoint as chart values. Finally, you must deploy a NodePool and an EC2NodeClass custom resource to define your scaling constraints and AWS-specific configurations, effectively replacing the legacy Provisioner model.

Quick Setup Checklist

  • Permissions: Create the KarpenterNodeRole and KarpenterControllerRole.
  • Discovery: Tag your subnets and security groups so Karpenter knows where to work.
  • Installation: Use Helm to deploy the controller:
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=${KARPENTER_IAM_ROLE_ARN}" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=${CLUSTER_NAME}

Cloud Provider Prerequisites

In an EKS environment, Karpenter requires IAM permissions to interact with EC2 and supporting AWS services. These permissions are commonly granted using IAM Roles for Service Accounts (IRSA), which binds a Kubernetes service account to an IAM role.

At a minimum, Karpenter needs to be able to:

  • Launch and terminate EC2 instances.
  • Describe instance types, pricing, and availability.
  • Attach network interfaces and volumes.
  • Apply and read tags for discovery and cost tracking.

Note on Node Authorization: A critical step often overlooked is mapping the Node IAM Role (the role used by the instances Karpenter creates) to the aws-auth ConfigMap or configuring it via EKS Access Entries. Without this mapping, instances will launch in EC2 but never join the cluster, because the kubelet cannot authenticate with the API server.
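
A minimal sketch of the corresponding aws-auth entry, using the node role name from the examples below and a placeholder account ID:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Allows instances launched with the Karpenter node role to register with the cluster.
    - rolearn: arn:aws:iam::123456789012:role/KarpenterNodeRole-my-cluster
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Merge this entry into the existing aws-auth ConfigMap rather than replacing it, or use EKS Access Entries instead.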

A simplified example IAM policy illustrates the scope involved:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:CreateLaunchTemplate",
        "ec2:DeleteLaunchTemplate",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeAvailabilityZones",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}

In practice, this policy is usually split and constrained using conditions, such as restricting actions to resources with specific tags or limiting instance creation to known subnets, and it is complemented by permissions outside EC2, notably iam:PassRole for the node role and ssm:GetParameter for resolving AMI aliases. When using IRSA, this policy is attached to an IAM role referenced by the Karpenter service account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
  namespace: karpenter
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/KarpenterControllerRole

This model avoids static credentials and aligns with standard EKS security practices.

Controller Installation

On EKS, the controller is commonly deployed into the karpenter namespace and associated with an IAM role via IRSA. The Helm values usually include the cluster name, the AWS region, and the interruption queue name used for spot capacity handling.

Spot Interruption Handling: For Karpenter to manage Spot capacity gracefully, you must configure an SQS Interruption Queue. Karpenter listens to this queue for Spot Interruption Notices and EC2 Rebalance Recommendations. This allows the controller to proactively cordon the node and drain its workloads before AWS reclaims the capacity.

Karpenter is installed using Helm. A typical installation includes:

  • The Karpenter controller deployment.
  • CRDs for NodePools and NodeClasses.
  • Configuration pointing to the cluster name and provider.

At this stage, Karpenter can run but cannot create nodes until NodeClasses and NodePools exist.

Initial Validation

After installation, confirm:

  • The controller is running and stable.
  • Webhooks are registered.
  • No reconciliation errors appear in logs.

Skipping this validation often leads to confusion later when provisioning fails silently due to missing permissions or invalid CRDs.

Initial Configuration

Initial configuration determines whether Karpenter behaves as a predictable capacity mechanism or an opaque one. Decisions made here establish the policy surface area that later governance relies on.

EC2NodeClass (Infrastructure)

On EKS, the EC2NodeClass maps directly to EC2 configuration. This is where you define the "How" of node provisioning: AMI selection, disk sizing, networking, and tagging. These should be treated as durable defaults rather than tuning knobs to ensure infrastructure consistency.

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: general-purpose
spec:
  role: KarpenterNodeRole-my-cluster # Must be mapped in aws-auth or Access Entries
  amiFamily: AL2
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  tags:
    cost-center: platform

NodePools (Policy)

NodePools encode policy rather than preference. They establish the framework for your provisioning strategy, setting the guardrails for instance families, capacity types (Spot vs. On-Demand), and topology constraints. Well-defined NodePools keep cluster behavior understandable as the environment scales.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: app-default
spec:
  template:
    spec:
      nodeClassRef:
        name: general-purpose
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h

Pod-Level Assumptions

Karpenter's effectiveness depends on accurate pod specifications. Resource requests, topology constraints, and tolerations are not optional hints; they are the primary inputs used to calculate node sizing. If pod requests are omitted or inaccurate, Karpenter cannot "right-size" the infrastructure, leading to either resource waste or scheduling failures.
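
As a minimal sketch with hypothetical names and sizes, a Deployment that gives Karpenter usable inputs looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api
          image: example/api:1.0
          resources:
            # Explicit requests drive Karpenter's instance sizing.
            requests:
              cpu: "500m"
              memory: "512Mi"
      # Topology constraints become provisioning requirements, not afterthoughts.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-service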

Things That Are Commonly Overlooked

Interaction With Existing Node Groups

Many clusters run Karpenter alongside managed node groups. This hybrid model can work, but it needs intent. Questions to answer early:

  • Which workloads should never land on Karpenter nodes?
  • Which node groups are considered baseline capacity?
  • How does disruption differ between the two?

Without clear answers, capacity behavior becomes difficult to reason about.
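
One way to make that intent explicit, sketched here on the assumption that Karpenter-provisioned nodes carry the karpenter.sh/nodepool label, is a node affinity that keeps a workload on baseline (non-Karpenter) capacity:

apiVersion: v1
kind: Pod
metadata:
  name: baseline-only
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # Schedule only on nodes that were not provisioned by Karpenter.
          - matchExpressions:
              - key: karpenter.sh/nodepool
                operator: DoesNotExist
  containers:
    - name: app
      image: example/app:1.0
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"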

Pod Disruption Budgets

Karpenter consolidation respects Pod Disruption Budgets, but only if they exist. Workloads without budgets may experience more churn than expected. Audit critical workloads and ensure disruption budgets reflect operational reality.

A simple Pod Disruption Budget for a stateless service:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-service

This ensures that consolidation or node expiration does not reduce capacity below an acceptable threshold.

DaemonSets and System Overhead

DaemonSets establish the baseline cost of every node. Karpenter accounts for their resource requests during sizing, but only when those requests are defined.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-observability
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-observability
  template:
    metadata:
      labels:
        app: node-observability
    spec:
      containers:
        - name: agent
          image: example/agent:1.2
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

On each node, these requests are subtracted from allocatable capacity before pending pods are evaluated. When DaemonSets omit requests, Karpenter assumes zero overhead and may select instance sizes that saturate immediately after initialization.

Explicit requests turn DaemonSets from implicit assumptions into predictable inputs for node sizing. Underspecified DaemonSets produce nodes that look sufficient on paper but fill up immediately.

Node Expiration

Node expiration is a powerful tool for hygiene and patching. It is also easy to misuse. Short expiration times increase churn. Long expiration times reduce the benefit. Choose values deliberately and monitor the effects.
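
A hedged sketch of a deliberate middle ground, assuming a Karpenter version that supports disruption budgets alongside expiration:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: app-default
spec:
  template:
    spec:
      nodeClassRef:
        name: general-purpose
  disruption:
    consolidationPolicy: WhenUnderutilized
    # Recycle nodes roughly monthly for AMI and patch hygiene.
    expireAfter: 720h
    # Bound how much of the pool can be disrupted at once to keep churn predictable.
    budgets:
      - nodes: "10%"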

Operational Concerns

Cost Visibility

In AWS, cost visibility often depends on consistent tagging. Ensure that Karpenter-applied tags propagate to EC2 instances so that Cost Explorer and CUR reports can break down spend by NodePool or workload class. Karpenter can reduce waste, but it can also surface uncomfortable truths about workload behavior. Track the following:

  • Cost by NodePool
  • Cost by capacity type
  • Frequency of node churn

Unexpected cost increases often point to missing constraints rather than bugs.
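
NodePool resource limits are one such constraint; a trimmed-down sketch (the values are illustrative) that caps the total capacity a pool may provision:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: app-default
spec:
  template:
    spec:
      nodeClassRef:
        name: general-purpose
  # Once provisioned capacity in this pool reaches these totals,
  # Karpenter stops launching additional nodes for it.
  limits:
    cpu: "500"
    memory: 2000Gi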

Failure Modes

Karpenter introduces new failure modes:

  • API throttling from the cloud provider
  • Quota exhaustion
  • Image or bootstrap failures

Alert on provisioning failures and reconcile errors. Silent failures are the most damaging.

Debugging Provisioning Decisions

When Karpenter does not create nodes, the reason is usually visible in events or logs. Develop the habit of inspecting:

  • Pending pod events
  • Karpenter controller logs
  • Evaluated scheduling constraints

This shortens incident response significantly.

Ongoing Governance as Workloads Change

Evolving NodePools

NodePools should evolve with workloads. New instance types, new architectures, and new pricing models appear regularly. Review NodePool definitions on a schedule, not only when something breaks.

Guardrails Over Time

As teams gain confidence, there is a tendency to relax constraints. This often leads to unpredictable clusters. Strong guardrails include:

  • Explicit instance family allow lists
  • Clear separation between spot and on-demand workloads
  • Labeling that reflects ownership

Observability and Feedback Loops

Karpenter works best when its decisions are visible. Useful signals include:

  • Time to schedule after scale events
  • Node lifetime distributions
  • Consolidation frequency

Use these signals to adjust configuration rather than relying on intuition.

Where This Leaves Us

Karpenter changes how capacity decisions are made in a Kubernetes cluster. It rewards accurate workload specifications, clear policy boundaries, and active governance. Teams that treat it as a drop-in replacement for older autoscaling approaches often struggle. Teams that embrace the shift toward workload-driven provisioning tend to see more predictable behavior over time.

The real work begins after installation. Configuration, observation, and iteration determine whether Karpenter becomes a quiet workhorse or a constant source of surprises.
