Before RKE2 — What Problems Does It Solve?
Imagine you built an amazing web application. It's running on one server. Then thousands of users show up. One server can't handle it. You need many servers working together — but how do you coordinate them?
Think of a restaurant kitchen. One chef can cook a few meals. But for a busy restaurant you need many chefs, a head chef (manager), someone to take orders, and a system to make sure every dish gets cooked and served. Kubernetes is that entire kitchen management system for your software. RKE2 is a specific, security-first way to set up that kitchen.
The Core Problems
Container Orchestration
You have dozens or hundreds of containers. Someone needs to decide which server runs which container, restart them if they crash, and scale them up/down.
Security & Compliance
Government agencies and enterprises need Kubernetes that is hardened out of the box — CIS benchmarks, FIPS encryption, and audit logging by default.
Operational Simplicity
Setting up vanilla Kubernetes is hard (dozens of certificates, configs, binaries). RKE2 bundles everything into a single binary with sane defaults.
What Exactly Is RKE2?
RKE2 (Rancher Kubernetes Engine 2), also known as RKE Government, is a fully conformant Kubernetes distribution built by SUSE / Rancher. It's designed for environments where security is non-negotiable.
RKE2 combines the ease of use of K3s (Rancher's lightweight distro) with the close upstream alignment of RKE1 — and adds hardened security on top.
The Name Explained
| Part | Meaning |
|---|---|
| RKE | Rancher Kubernetes Engine — Rancher's family of K8s distributions |
| 2 | Second generation, replacing RKE1 (which reached end-of-life in July 2025) |
| "Government" | Originally built to meet U.S. federal security standards (DISA STIG, CIS, FIPS) |
What You Get Out of the Box
- CNCF-certified Kubernetes (passes all conformance tests)
- Embedded etcd — no external database needed
- containerd runtime (Docker-free)
- CIS Kubernetes Benchmark hardening by default
- FIPS 140-2 compliant cryptography
- Built-in Canal CNI (Calico + Flannel)
- Built-in NGINX Ingress Controller
- Built-in CoreDNS and Metrics Server
- Helm Controller for managing add-ons
- SELinux support
RKE2 Architecture — The Big Picture
An RKE2 cluster has two types of machines (called nodes):
Server Node (Control Plane)
The "brain" of the cluster. It decides what runs where, stores the cluster state, and exposes the Kubernetes API. In production you run 3 server nodes for high availability.
Agent Node (Worker)
The "muscles" of the cluster. These actually run your applications (as containers inside Pods). You can have as many agent nodes as you need.
How the Pieces Talk to Each Other
Port Reference
| Port | Protocol | Purpose | Used By |
|---|---|---|---|
| 9345 | TCP | RKE2 supervisor API (node registration) | Server & Agent nodes |
| 6443 | TCP | Kubernetes API server | kubectl, agents, external |
| 2379-2380 | TCP | etcd client & peer communication | Server nodes only |
| 10250 | TCP | kubelet metrics / exec | All nodes |
| 8472 | UDP | VXLAN (Canal/Flannel overlay) | All nodes |
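If a host firewall is running, these ports must be opened before nodes can talk to each other. As a sketch, assuming firewalld on a RHEL-family host (adjust for ufw, security groups, or your own zones):

```shell
# Open the RKE2 ports listed above (run on every node; adjust per role)
sudo firewall-cmd --permanent --add-port=9345/tcp       # supervisor API
sudo firewall-cmd --permanent --add-port=6443/tcp       # Kubernetes API
sudo firewall-cmd --permanent --add-port=2379-2380/tcp  # etcd (server nodes only)
sudo firewall-cmd --permanent --add-port=10250/tcp      # kubelet
sudo firewall-cmd --permanent --add-port=8472/udp       # VXLAN overlay
sudo firewall-cmd --reload
```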
Core Components Explained
Let's break down every piece running inside an RKE2 cluster, sticking with the restaurant kitchen analogy: each Kubernetes component maps to a role in the kitchen.
kube-apiserver
The Front Desk. Every request goes through here — "create a pod", "list services", "delete a deployment". Port 6443.
etcd
The Recipe Book. A distributed key-value database storing the entire state of the cluster. RKE2 embeds it — no separate setup.
kube-scheduler
The Assignment Manager. Picks the best node for new pods based on CPU, memory, affinity rules.
kube-controller-manager
The Quality Inspector. Watches state and makes corrections. If 3 replicas desired but 2 running — creates the 3rd.
kubelet
The Line Cook. Runs on every node. Receives instructions and makes sure containers are running.
kube-proxy
The Waiter. Manages network rules so requests reach the right pod — even across nodes.
containerd
The Oven. The container runtime that pulls images, starts containers, manages their lifecycle. Not Docker.
Canal CNI
The Intercom. Calico handles network policies. Flannel creates the overlay so all pods can communicate.
NGINX Ingress
The Entrance. Routes external HTTP/HTTPS traffic to correct services based on hostnames and paths.
CoreDNS
Name Tags. Internal DNS so pods find services by name instead of IP addresses.
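You can see CoreDNS at work with a quick lookup from a throwaway pod (assumes a running cluster; busybox's nslookup output format varies by image version):

```shell
# Resolve a service name through the cluster DNS
kubectl run dns-test --image=busybox --rm -it --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local
```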
Metrics Server
The Thermometer. Collects CPU/memory usage. Required for kubectl top and autoscaling.
Helm Controller
Recipe Installer. Auto-deploys Helm charts shipped with RKE2 via HelmChart CRDs.
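The same mechanism works for your own add-ons: any HelmChart manifest dropped into the server's manifests directory is deployed automatically. A minimal sketch — the repo URL and chart name here are illustrative placeholders, not a real chart:

```shell
# Hypothetical example: auto-deploy a chart via the Helm Controller
sudo tee /var/lib/rancher/rke2/server/manifests/my-app.yaml <<EOF
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: my-app
  namespace: kube-system
spec:
  repo: https://charts.example.com   # assumption: your chart repository
  chart: my-app
  targetNamespace: default
  valuesContent: |-
    replicaCount: 2
EOF
```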
How RKE2 Starts — The Bootstrap Sequence
The single binary contains everything: apiserver, scheduler, controller-manager, etcd, kubelet, and containerd.
Self-signed CA + all component certificates are created automatically in /var/lib/rancher/rke2/server/tls/.
The embedded etcd launches and initializes the cluster state database.
API server comes online, listening on port 6443.
Unlike RKE1 (which used Docker), RKE2 runs control plane components as static pods managed by the kubelet.
The Helm Controller deploys bundled add-ons: Canal CNI, CoreDNS, NGINX Ingress, and Metrics Server.
A token is written to /var/lib/rancher/rke2/server/node-token — this is what agent nodes use to join.
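After a successful bootstrap you can inspect each of these artifacts on the server node:

```shell
# Auto-generated CA and component certificates
sudo ls /var/lib/rancher/rke2/server/tls/
# Control plane static pod manifests (run by the kubelet, not Docker)
sudo ls /var/lib/rancher/rke2/agent/pod-manifests/
# Join token for agent nodes
sudo cat /var/lib/rancher/rke2/server/node-token
```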
How RKE2 Differs from K3s & RKE1
Rancher/SUSE offers three Kubernetes distributions. Here's how they compare:
| Feature | RKE1 (EOL) | K3s | RKE2 |
|---|---|---|---|
| Status | End-of-life (Jul 2025) | Active | Active (recommended) |
| Target | General purpose | Edge / IoT / Dev | Enterprise / Government |
| Runtime | Docker | containerd | containerd |
| Data Store | External etcd | SQLite / etcd / MySQL / Postgres | Embedded etcd only |
| CIS Hardened | Manual effort | Manual effort | By default |
| FIPS 140-2 | No | No | Yes |
| SELinux | Limited | Supported | Full support |
| Secrets Encryption | Manual | Optional flag | Enabled by default |
| Min Resources | 4 GB RAM | 512 MB RAM | 4 GB RAM |
| Control Plane | Docker containers | Single process | Static pods via kubelet |
| Windows Nodes | Yes | No | Yes |
Choose RKE2 if: you need production-grade security, compliance certifications, enterprise support, or you're replacing RKE1.
Choose K3s if: you're running on edge devices, Raspberry Pis, CI/CD environments, or need minimal resource usage.
Avoid RKE1: it's end-of-life. Migrate to RKE2 or K3s.
Hands-On: Installing a Single-Node Cluster
Let's set up a working RKE2 cluster from scratch. We'll start with a single server node — perfect for learning.
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 20.04+ / RHEL 8+ / SLES 15+ | Ubuntu 22.04 LTS |
| CPU | 2 cores | 4 cores |
| RAM | 4 GB | 8 GB |
| Disk | 20 GB | 50+ GB SSD |
| User | root or sudo access | |
RKE2 provides an installer script that sets up the binary and systemd service:
# Download and run the official installer
curl -sfL https://get.rke2.io | sudo sh -
# What just happened?
# 1. Downloaded the rke2 binary to /usr/local/bin/rke2
# 2. Created a systemd service: rke2-server.service
# 3. Created config directory: /etc/rancher/rke2/
# Create the config directory
sudo mkdir -p /etc/rancher/rke2
# Create config file
sudo tee /etc/rancher/rke2/config.yaml <<EOF
# Specify a custom node name
node-name: my-first-server
# Enable CIS hardening profile
profile: cis
# Use Canal CNI (default, but being explicit)
cni: canal
# Write kubeconfig with this permission
write-kubeconfig-mode: "0644"
# TLS Subject Alternative Names
tls-san:
- my-server.example.com
- 10.0.0.100
EOF
# Enable the service so it starts on boot
sudo systemctl enable rke2-server.service
# Start RKE2 (this takes 2-5 minutes on first run)
sudo systemctl start rke2-server.service
# Watch the startup logs in real-time
sudo journalctl -u rke2-server -f
The first start downloads container images and bootstraps etcd + all control plane components. It can take 2–10 minutes depending on your internet speed.
# Add RKE2 binaries to PATH
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc
# Point kubectl to the RKE2 kubeconfig
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
# Apply changes immediately
source ~/.bashrc
# Verify kubectl works (the --short flag was removed in recent kubectl releases)
kubectl version
# Check node status (should show "Ready")
kubectl get nodes
NAME STATUS ROLES AGE VERSION
my-first-server Ready control-plane,etcd,master 3m v1.30.x+rke2r1
# Check all system pods are running
kubectl get pods -A
NAMESPACE NAME READY STATUS
kube-system etcd-my-first-server 1/1 Running
kube-system kube-apiserver-my-first-server 1/1 Running
kube-system kube-controller-manager-my-first-server 1/1 Running
kube-system kube-scheduler-my-first-server 1/1 Running
kube-system rke2-canal-xxxxx 2/2 Running
kube-system rke2-coredns-rke2-coredns-xxxxx 1/1 Running
kube-system rke2-ingress-nginx-controller-xxxxx 1/1 Running
kube-system rke2-metrics-server-xxxxx 1/1 Running
# Save the node token (needed for adding more nodes later)
sudo cat /var/lib/rancher/rke2/server/node-token
K10abc123def456...::server:xyz789...
You now have a working single-node RKE2 Kubernetes cluster with all core components running.
Hands-On: Adding Worker Nodes
A single node is great for learning, but in production you need dedicated worker (agent) nodes. Here's how to add them.
# Install RKE2 in AGENT mode (not server)
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sudo sh -
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/config.yaml <<EOF
# Point to the server's supervisor port
server: https://10.0.0.100:9345
# Paste the token from the server node
token: K10abc123def456...::server:xyz789...
# Give this worker a friendly name
node-name: worker-01
EOF
sudo systemctl enable --now rke2-agent.service
# Back on the server node:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
my-first-server Ready control-plane,etcd,master 30m v1.30.x+rke2r1
worker-01 Ready <none> 2m v1.30.x+rke2r1
# Label the worker for organization
kubectl label node worker-01 node-role.kubernetes.io/worker=worker
Repeat the install, config, and enable steps above on each additional machine, changing the node-name each time (worker-02, worker-03, and so on).
Hands-On: High-Availability (HA) Setup
For production, you need 3 server nodes so that if one dies, the cluster keeps running.
etcd needs a majority of nodes alive to accept writes. With 3 nodes, you can lose 1 (2/3 majority). With 2 nodes, losing 1 means no majority — cluster is stuck!
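The quorum arithmetic generalizes: an n-node etcd cluster needs floor(n/2)+1 members alive and tolerates floor((n-1)/2) failures — which is why odd sizes (3, 5) are used:

```shell
# Quorum and fault tolerance for etcd cluster sizes 1-5
for n in 1 2 3 4 5; do
  echo "$n nodes -> quorum $(( n/2 + 1 )), can lose $(( (n-1)/2 ))"
done
```

Note that 4 nodes tolerate no more failures than 3, so the extra node buys nothing.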
curl -sfL https://get.rke2.io | sudo sh -
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/config.yaml <<EOF
token: my-shared-secret-token-12345
tls-san:
- rke2.example.com
- 10.0.0.100
- 10.0.0.101
- 10.0.0.102
EOF
sudo systemctl enable --now rke2-server.service
# Wait until fully ready before starting server 2!
curl -sfL https://get.rke2.io | sudo sh -
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/config.yaml <<EOF
server: https://10.0.0.100:9345
token: my-shared-secret-token-12345
tls-san:
- rke2.example.com
- 10.0.0.100
- 10.0.0.101
- 10.0.0.102
EOF
sudo systemctl enable --now rke2-server.service
# /etc/nginx/nginx.conf (on a separate LB machine)
stream {
    upstream rke2_api {
        least_conn;
        server 10.0.0.100:6443 max_fails=3 fail_timeout=5s;
        server 10.0.0.101:6443 max_fails=3 fail_timeout=5s;
        server 10.0.0.102:6443 max_fails=3 fail_timeout=5s;
    }
    upstream rke2_supervisor {
        least_conn;
        server 10.0.0.100:9345 max_fails=3 fail_timeout=5s;
        server 10.0.0.101:9345 max_fails=3 fail_timeout=5s;
        server 10.0.0.102:9345 max_fails=3 fail_timeout=5s;
    }
    server { listen 6443; proxy_pass rke2_api; }
    server { listen 9345; proxy_pass rke2_supervisor; }
}
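With the load balancer in place, additional servers and agents can join through it instead of a single server IP, so the cluster survives the loss of any one server. A sketch, assuming the LB answers at rke2.example.com:

```shell
# On joining nodes, point at the load balancer rather than one server
sudo tee /etc/rancher/rke2/config.yaml <<EOF
server: https://rke2.example.com:9345
token: my-shared-secret-token-12345
EOF
```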
Networking Deep Dive (CNI)
If pods are apartments in different buildings (nodes), the CNI is the road system and phone lines that connect them. Without CNI, pods on different nodes can't talk to each other.
Supported CNI Plugins
| CNI | What It Does | Best For |
|---|---|---|
| Canal (default) | Flannel (VXLAN overlay) + Calico (network policies). Simple, reliable. | Most use cases |
| Calico | Full Calico: BGP routing + advanced network policies. No overlay = better perf. | Large clusters, bare metal |
| Cilium | eBPF-based networking. Advanced observability, L7 policies, service mesh. | Modern stacks, security |
| Multus | Meta-plugin: attaches multiple network interfaces to pods. | Telco / NFV workloads |
How to Switch CNI
# In /etc/rancher/rke2/config.yaml BEFORE first start — set exactly ONE:
cni: cilium        # Option A: Cilium
cni: calico        # Option B: Calico
cni: multus,canal  # Option C: Multus + Canal
You must choose the CNI before the first server start. Changing CNI on a running cluster requires a full reinstall.
Security & CIS Hardening
Security is RKE2's biggest selling point. Let's understand what it does automatically and what you need to do manually.
What RKE2 Does Automatically
Secrets Encryption at Rest
K8s Secrets stored in etcd are encrypted. Attackers with access to etcd data can't read your passwords.
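You can confirm encryption at rest with RKE2's built-in subcommand (run on a server node):

```shell
# Show the current secrets-encryption status and active key
sudo rke2 secrets-encrypt status
```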
Pod Security Standards
With profile: cis, RKE2 enforces the restricted Pod Security Standard, blocking privileged containers by default.
Network Policies
Canal/Calico enforces network policy rules defining which pods can communicate.
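Network policies are ordinary Kubernetes objects that Canal/Calico enforces. A minimal sketch that denies all inbound traffic to a namespace (the namespace name here is illustrative):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo          # assumption: an existing namespace
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
  - Ingress                # Ingress listed with no rules = deny all inbound
EOF
```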
Audit Logging
API server audit logs record who did what and when. Found at /var/lib/rancher/rke2/server/logs/.
FIPS 140-2 Cryptography
All TLS uses FIPS-validated BoringCrypto — required by government agencies.
Minimal Attack Surface
No Docker daemon. Container images scanned with Trivy. Only essential components run.
What You Must Do Manually
# 1. Enable the CIS profile in config
profile: cis
# 2. Set kernel parameters (on EVERY node)
sudo tee /etc/sysctl.d/90-rke2-cis.conf <<EOF
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
EOF
sudo sysctl --system
# 3. Create the etcd user (required for CIS profile)
sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
# 4. Protect the etcd data directory
sudo chmod 700 /var/lib/rancher/rke2/server/db
Configuration Reference
# /etc/rancher/rke2/config.yaml — Complete Reference
# ── Cluster Identity ──
node-name: prod-server-01 # Friendly node name
token: my-super-secret-token # Shared secret for joining
server: https://lb.example.com:9345 # Only for joining nodes
# ── TLS ──
tls-san:
- lb.example.com
- 10.0.0.100
# ── Security ──
profile: cis
secrets-encryption: true
write-kubeconfig-mode: "0600"
# ── Networking ──
cni: canal
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
cluster-dns: 10.43.0.10
# ── etcd ──
etcd-snapshot-schedule-cron: "0 */6 * * *"
etcd-snapshot-retention: 10
# ── Container Runtime ──
system-default-registry: registry.example.com
# ── Node Labels & Taints ──
node-label:
- environment=production
node-taint:
- dedicated=control-plane:NoSchedule
# ── Disable Components ──
disable:
- rke2-ingress-nginx
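After the service (re)starts with this config, you can verify that the node-level settings actually landed:

```shell
# Confirm the labels and taints from config.yaml were applied
kubectl get node prod-server-01 --show-labels
kubectl describe node prod-server-01 | grep -A1 Taints
```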
File System Layout
| Path | Purpose |
|---|---|
| /etc/rancher/rke2/config.yaml | Main configuration file |
| /etc/rancher/rke2/rke2.yaml | Kubeconfig for kubectl |
| /var/lib/rancher/rke2/bin/ | Binaries: kubectl, crictl, ctr |
| /var/lib/rancher/rke2/server/ | Server data: etcd, manifests, TLS certs |
| /var/lib/rancher/rke2/server/node-token | Token for joining new nodes |
| /var/lib/rancher/rke2/server/db/ | etcd database files |
| /var/lib/rancher/rke2/server/manifests/ | Auto-deploy manifests |
Day-2 Operations: Upgrades, Backups & Monitoring
Upgrading RKE2
Always upgrade server nodes first, one at a time, then agent nodes. Never skip more than one minor version.
# Method 1: Automated with the install script
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.31.2+rke2r1" sudo sh -
sudo systemctl restart rke2-server
# Method 2: System Upgrade Controller (recommended for HA)
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
etcd Backups
# Take an on-demand snapshot (requires root on a server node)
sudo rke2 etcd-snapshot save --name my-backup
# List existing snapshots
sudo rke2 etcd-snapshot ls
Name Size Created
my-backup 3.2 MB 2026-03-17T10:30:00Z
# Restore from a snapshot (disaster recovery)
sudo systemctl stop rke2-server
sudo rke2 server --cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/my-backup
sudo systemctl start rke2-server
Monitoring
# Quick health checks
kubectl get nodes # All nodes Ready?
kubectl get pods -A # All pods Running?
kubectl top nodes # CPU/Memory usage
kubectl top pods -A # Pod resource usage
# Check etcd health — etcdctl is not shipped on the host, so run it inside the etcd pod
kubectl -n kube-system exec etcd-my-first-server -- etcdctl \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint health
Troubleshooting Cheat Sheet
| Problem | Diagnostic Command | Common Fix |
|---|---|---|
| Server won't start | journalctl -u rke2-server -f | Check port conflicts, firewall, disk space |
| Agent can't join | journalctl -u rke2-agent -f | Verify token, server URL, port 9345 |
| Node "NotReady" | kubectl describe node <name> | Check CNI pods, disk/memory pressure |
| Pods stuck "Pending" | kubectl describe pod <name> | Not enough resources, taints |
| Pods can't connect | kubectl get pods -n kube-system | CNI pods running? UDP 8472 open? |
| kubectl not working | echo $KUBECONFIG | Set KUBECONFIG path |
| etcd unhealthy | etcdctl endpoint health | Check disk I/O, restore snapshot |
Essential Commands
# ── Service Logs ──
journalctl -u rke2-server -f
journalctl -u rke2-agent -f
# ── Container Runtime ──
sudo /var/lib/rancher/rke2/bin/crictl ps
sudo /var/lib/rancher/rke2/bin/crictl logs <id>
# ── Network Debug ──
kubectl run debug --image=busybox --rm -it -- sh
# ── Full Uninstall ──
sudo /usr/local/bin/rke2-uninstall.sh # Server
sudo /usr/local/bin/rke2-agent-uninstall.sh # Agent
Real-World Example: Deploy an App End-to-End
A simple Nginx web app with 3 replicas, exposed via a Service and Ingress so the outside world can reach it at myapp.example.com.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
        resources:
          requests: { cpu: 100m, memory: 64Mi }
          limits: { cpu: 200m, memory: 128Mi }
EOF
kubectl get deploy my-web-app
NAME READY UP-TO-DATE AVAILABLE
my-web-app 3/3 3 3
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-web-app-svc
spec:
  selector:
    app: my-web-app
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
EOF
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-web-app-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-app-svc
            port:
              number: 80
EOF
# Internal test (throwaway pod, removed on exit)
kubectl run curl-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl http://my-web-app-svc.default.svc.cluster.local
# External test
curl http://myapp.example.com
# Scale up!
kubectl scale deployment my-web-app --replicas=5
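To watch the scale-out complete and see how the scheduler spread the new pods:

```shell
# Wait for all 5 replicas to become available
kubectl rollout status deployment/my-web-app
# See which nodes the pods landed on
kubectl get pods -l app=my-web-app -o wide
```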
The Full Request Flow
Putting it all together, a request to http://myapp.example.com travels: client → DNS resolves to a node IP → NGINX Ingress Controller matches the host and path → Service my-web-app-svc (ClusterIP) → kube-proxy rules pick a backend → one of the nginx pods serves the response.
Summary & Where to Go Next
What You've Learned
- What RKE2 is and why it exists (security-first Kubernetes)
- The full architecture: server nodes, agent nodes, and all core components
- How RKE2 differs from K3s and RKE1
- How to install a single-node cluster from scratch
- How to add worker nodes and set up HA with 3 servers
- How networking works (CNI: Canal, Calico, Cilium)
- Security features and CIS benchmark hardening
- Day-2 ops: upgrades, etcd backups, and monitoring
- How to deploy a real application end-to-end
Recommended Next Steps
Official Docs
Read the full docs at docs.rke2.io — the Configuration and Advanced sections.
Rancher Manager
Install Rancher on top of RKE2 for a beautiful web UI to manage clusters and workloads.
Longhorn Storage
Add distributed storage with Longhorn — Rancher's cloud-native storage solution.
cert-manager
Automate TLS certificate management with Let's Encrypt for your Ingress resources.
You've gone from zero knowledge to understanding RKE2's architecture, security model, installation process, and day-to-day operations. Go build something amazing.