-
@ Andy
2025-03-25 10:00:52
Kubernetes and Linux Swap: A Practical Perspective
After reviewing kernel documentation on swap management (e.g., Linux Swap Management), KEP-2400 (Kubernetes Node Memory Swap Support), and community discussions like this post on ServerFault, it's clear that the topic of swap usage in modern systems—especially Kubernetes environments—is nuanced and often contentious. Here's a practical synthesis of the discussion.
The Rationale for Disabling Swap
We disable swap on our Linux servers to keep performance stable and predictable: workloads run from RAM only, without the latency and extra disk I/O that swapping causes. If an application runs out of memory, the cause is usually an undersized RAM allocation or a memory leak, and enabling swap only degrades performance for the other applications on the host. It's more efficient to let a leaking app be OOM-killed and restarted than to rely on swap to postpone the crash.
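With modern platforms like Kubernetes, memory limits are enforced per container through cgroups, so apps use only the RAM allocated to them, and the scheduler places pods by their declared memory requests, which keeps requested memory within each node's capacity and guards against resource exhaustion.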
Additionally, disabling swap reduces exposure to data remanence attacks, in which sensitive information is recovered from swap space on disk even after the owning process has terminated.
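To check whether a node actually follows this policy, one option is to look at the SwapTotal field of /proc/meminfo. The snippet below is a minimal sketch, not a hardened tool: the function names are made up for illustration, and it parses text passed in as a string so it can be tried on any machine. On a real node you would pass it the contents of /proc/meminfo.

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return the SwapTotal value in KiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal field not found")


def swap_enabled(meminfo_text: str) -> bool:
    """True if any swap space is configured (SwapTotal > 0)."""
    return swap_total_kib(meminfo_text) > 0


# Hypothetical sample input; on a node, read /proc/meminfo instead.
SAMPLE = """MemTotal:       16303428 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""
print(swap_enabled(SAMPLE))  # True
```

A SwapTotal of 0 means no swap device or file is active, which is the state this section argues for.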
Theoretical Capability vs. Practical Deployment
Linux provides a powerful and flexible memory subsystem. With proper tuning (e.g., swappiness, memory pinning, cgroups), it's technically possible to make swap usage efficient and targeted. Seasoned sysadmins often argue that disabling swap entirely is a lazy shortcut—an avoidance of learning how to use the tools properly.
But Kubernetes is not a traditional system. It's an orchestrated environment that favors predictability, fail-fast behavior, and clear isolation between workloads. Within this model:
- Memory requests and limits are declared explicitly.
- The scheduler makes decisions based on RAM availability, not total virtual memory (RAM + swap).
- Swap introduces non-deterministic performance characteristics that conflict with Kubernetes' goals.
So while the kernel supports intelligent swap usage, Kubernetes intentionally sidesteps that complexity.
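The scheduling point above can be illustrated with a toy model. This is not the kube-scheduler's code; the Node class and fits function are hypothetical names used only to show that the placement decision compares memory requests against allocatable RAM and never looks at swap.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Node:
    allocatable_memory: int    # bytes of RAM the node offers to pods
    swap: int                  # bytes of swap; deliberately ignored below
    requested_memory: int = 0  # sum of memory requests already placed


def fits(node: Node, pod_memory_request: int) -> bool:
    """A pod fits only if its request is covered by remaining RAM."""
    return node.requested_memory + pod_memory_request <= node.allocatable_memory


node = Node(allocatable_memory=8 * GIB, swap=16 * GIB, requested_memory=6 * GIB)
print(fits(node, 2 * GIB))  # True: 6 GiB + 2 GiB fits in 8 GiB of RAM
print(fits(node, 3 * GIB))  # False: 16 GiB of swap does not change the answer
```

Because swap never enters the calculation, any memory a pod pushes into swap is capacity the scheduler cannot see or reason about.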
Why Disable Swap in Kubernetes?
- Deterministic Failure > Degraded Performance: If a pod exceeds its memory allocation, it should fail fast, not get throttled into slow oblivion due to swap. This behavior surfaces bugs (like memory leaks or poor sizing) early.
- Transparency & Observability: With swap disabled, memory issues are clearer to diagnose. Swap obfuscates root causes and can make a healthy-looking node behave erratically.
- Performance Consistency: Swap causes I/O overhead. One noisy pod using swap can impact unrelated workloads on the same node, even if they're within their resource limits.
- Kubernetes Doesn't Manage Swap Well: Kubelet has historically lacked intelligence around swap. As of today, Kubernetes still doesn't support swap-aware scheduling or user-configurable per-container swap limits.
- Statelessness is the Norm: Most containerized workloads are designed to be ephemeral. Restarting a pod is usually preferable to letting it hang in a degraded state.
"But Swap Can Be Useful..."
Yes — for certain workloads (e.g., in-memory databases, caching layers, legacy systems), there may be valid reasons to keep swap enabled. In such cases, you'd need:
- Fine-tuned vm.swappiness
- Memory pinning and cgroup-based control
- Swap-aware monitoring and alerting
- Custom kubelet/systemd integration
That's possible, but not standard practice — and for good reason.
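As a rough idea of what the swap-aware monitoring item in the list above involves, the sketch below reads the VmSwap field that Linux exposes in /proc/<pid>/status and flags processes above a threshold. The function names and the threshold are assumptions for illustration, and the input is passed as plain text so the logic can be tested without touching /proc.

```python
from typing import Dict


def vm_swap_kib(status_text: str) -> int:
    """Return VmSwap in KiB from /proc/<pid>/status-style text."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # Line looks like: "VmSwap:      5120 kB"
            return int(line.split()[1])
    return 0  # no VmSwap line (e.g. kernel threads): treat as not swapped


def swapped_processes(statuses: Dict[str, str], threshold_kib: int = 0) -> Dict[str, int]:
    """Map each label to its swapped KiB, keeping only values above the threshold."""
    usage = {label: vm_swap_kib(text) for label, text in statuses.items()}
    return {label: kib for label, kib in usage.items() if kib > threshold_kib}


# Hypothetical sample data; a real collector would read /proc/<pid>/status.
samples = {
    "cache": "Name:\tcache\nVmRSS:\t  204800 kB\nVmSwap:\t    5120 kB\n",
    "api": "Name:\tapi\nVmRSS:\t   51200 kB\nVmSwap:\t       0 kB\n",
}
print(swapped_processes(samples, threshold_kib=1024))  # {'cache': 5120}
```

An alerting rule would then fire on any workload that shows up here, which is exactly the extra operational surface the list above is warning about.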
Future Considerations
Recent Kubernetes releases have introduced experimental swap support via KEP-2400. While this provides more flexibility for advanced use cases, particularly Burstable QoS pods on cgroup v2, swap remains disabled by default and is not generally recommended for production workloads unless carefully planned. The rationale outlined in this article remains applicable to most Kubernetes operators, especially in multi-tenant and performance-sensitive environments.
Even the Kubernetes maintainers acknowledge the inherent trade-offs of enabling swap. As noted in KEP-2400's Risks and Mitigations section, swap introduces unpredictability, can severely degrade performance compared to RAM, and complicates Kubernetes' resource accounting — increasing the risk of noisy neighbors and unexpected scheduling behavior.
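For a sense of how limited the Burstable-pod swap support is, here is a sketch of the proportional swap limit as I understand it from the Kubernetes documentation for the LimitedSwap behavior: a container's share of swap follows its memory request relative to the node's total memory. Treat the formula and the parameter names as assumptions to verify against the docs for your Kubernetes version; this is not kubelet code.

```python
GIB = 1024 ** 3


def limited_swap_limit(container_memory_request: int,
                       node_total_memory: int,
                       total_pods_swap_available: int) -> int:
    """Swap limit in bytes, proportional to the container's memory request.

    Assumed formula:
        (container_memory_request / node_total_memory) * total_pods_swap_available
    """
    if node_total_memory <= 0:
        raise ValueError("node_total_memory must be positive")
    # Integer arithmetic avoids float rounding on large byte counts.
    return container_memory_request * total_pods_swap_available // node_total_memory


# A container requesting 4 GiB on a 64 GiB node with 16 GiB of swap for pods.
limit = limited_swap_limit(4 * GIB, 64 * GIB, 16 * GIB)
print(limit // GIB)  # 1
```

The operator does not set this value per container; it falls out of the request size, which is one reason swap on Kubernetes still needs careful capacity planning.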
Some argue that with emerging technologies like non-volatile memory (e.g., Intel Optane, built on 3D XPoint), swap may become viable again. These systems promise near-RAM speed with large capacity, offering hybrid memory models. But they are not widely deployed or supported in mainstream Kubernetes environments yet.
Conclusion
Disabling swap in Kubernetes is not a lazy hack — it’s a strategic tradeoff. It improves transparency, predictability, and system integrity in multi-tenant, containerized environments. While the kernel allows for more advanced configurations, Kubernetes intentionally simplifies memory handling for the sake of reliability.
If we want to revisit swap usage, it should come with serious planning: proper instrumentation, swap-aware observability, and potentially upstream K8s improvements. Until then, disabling swap remains the sane default.