Post

DevOps - Kubernetes: Mastering Resource Management with Quotas and LimitRanges

DevOps - Kubernetes: Mastering Resource Management with Quotas and LimitRanges

The Chaos of the Ungoverned Cluster

In the early stages of Kubernetes adoption, the focus is often on simply getting applications to run. However, as more teams move to a shared production cluster, a “Tragedy of the Commons” occurs. A single microservice with a memory leak or an infinite loop in a Java application can consume all available resources on a node, triggering the OOM (Out of Memory) killer and starving critical system components like the Kubelet or API server.

A smart DevOps engineer knows that stability is not accidental. It requires strict resource governance using ResourceQuotas and LimitRanges. This guide explains how to enforce fiscal and technical responsibility within your namespaces in a 2020 production environment.

1. ResourceQuotas: The Hard Ceiling

ResourceQuotas are applied at the namespace level. They limit the total amount of resources that all pods in that namespace can consume collectively.

Implementation: The ‘Production’ Guardrail

1
2
3
4
5
6
7
8
9
10
11
12
13
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: production-api
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
    services: "10"

With this quota, the API team can deploy as many pods as they like, but they cannot exceed 8 physical CPU cores in total. This prevents one team from accidentally (or intentionally) monopolising the entire cluster budget.

2. LimitRanges: The Standardised Pod

While quotas limit the namespace, LimitRanges limit the individual pod or container. They provide a “Safety Net” by ensuring that every container has a default resource request and limit, even if the developer forgets to include them in their deployment manifest.

Implementation: Preventing ‘Runaway’ Containers

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-limit-range
spec:
  limits:
    - default: # The limit used if none is specified
        cpu: 500m
        memory: 512Mi
      defaultRequest: # The request used if none is specified
        cpu: 200m
        memory: 256Mi
      max: # The maximum any single container can request
        cpu: "2"
        memory: 4Gi
      type: Container

Why this is critical: If a container starts without limits, it is classified as “Best Effort” by the Kubernetes scheduler. In a resource crunch, “Best Effort” pods are the first to be killed. By enforcing a defaultRequest, you ensure your pods are properly placed on nodes with enough head-room.

3. The ‘Requests’ vs. ‘Limits’ Paradox

Understanding the difference between these two is the key to cluster density:

  • Requests: What the scheduler uses to find a node. If you request 2GB, the scheduler ensures the node has 2GB free.
  • Limits: What the kernel (cgroups) uses to throttle the container. If you exceed your CPU limit, you are throttled. If you exceed your memory limit, you are killed (OOMKilled).

Tip: For critical production services, set Requests equal to Limits. This places the pod in the “Guaranteed” Quality of Service (QoS) class, making it the most stable and least likely to be evicted during a node failure.

4. Monitoring Resource Pressure

Applying quotas is only half the battle. You must monitor the “Pressure” within your namespaces using Prometheus and Grafana.

  • CLI: kubectl describe quota -n production-api
  • Prometheus: Use the kube_resourcequota metric to trigger an alert when a team is at 90% of their limit. “Hey Team API, you are about to hit your quota. Do you need a budget increase or are your apps leaking?”

5. Horizontal Pod Autoscaler (HPA) Integration

Once limits are in place, you can safely use the HPA. The HPA relies on metrics like CPU percentage. If you haven’t defined a CPU request, the HPA cannot calculate the percentage and will fail to scale your application during a traffic spike.

Summary: Governance as an Enabler

Resource governance is often viewed as a restriction, but in a professional environment, it is an enabler. It provides the mathematical certainty that one team’s failure won’t cause a global outage. By mastering ResourceQuotas and LimitRanges, you transform a fragile, unpredictable cluster into a stable, multi-tenant platform. This is the hallmark of a mature Kubernetes operations strategy. It ensures that the “noisy neighbour” problem is a thing of the past and that your infrastructure remains cost-effective and resilient in 2020.

This post is licensed under CC BY 4.0 by the author.