TL;DR. k3s on Hetzner is a strong cost-control move when you are willing to operate the cluster. Mind the Flannel MTU on Hetzner private networks, separate stateless and stateful workloads at the storage layer, keep observability minimal but real, and treat backups as a tested practice rather than a config setting.
A managed Kubernetes service is the right answer for most teams. When it is not the right answer (cost, control, locality of data), self-hosted k3s on a low-cost provider like Hetzner is one of the better options. We have run several clusters of this shape in production for over a year. This post is the set of decisions that have held up.
Why k3s ¶
- One binary, one Go process per node. No separate `kubelet`, `kube-proxy`, and container-runtime fan-out. When something is wrong, there is one place to look.
- SQLite or external datastore. Single-server clusters run on the bundled SQLite. For an HA control plane (three or more server nodes), switch to embedded etcd or an external PostgreSQL. Both paths are well documented.
- Sane defaults. Traefik, ServiceLB, local-path provisioner enabled by default. We turn most of those off, but they are useful to know about.
- Lightweight enough for small nodes. A two vCPU, 4 GB control plane node is realistic. Workers can be CX22-class machines without burning all their budget on the kubelet.
The trade-off is that you operate it. Upgrades, certificate rotation, etcd or Postgres backup, node replacement: all on you. None of those are hard, but they are not zero.
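As a concrete starting point, here is a minimal install sketch. The `--disable` and `--datastore-endpoint` flags are real k3s options; the connection string is a placeholder.

```sh
# Single-server cluster on the bundled SQLite (the default datastore).
# We disable the bundled Traefik and ServiceLB since we replace both.
curl -sfL https://get.k3s.io | sh -s - server \
  --disable traefik \
  --disable servicelb

# HA control plane backed by an external PostgreSQL datastore
# (connection string is a placeholder).
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint "postgres://k3s:CHANGEME@10.0.0.5:5432/k3s"
```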
Picking the Hetzner product mix ¶
Hetzner has two products that matter: Cloud (CX, CPX, CCX VMs) and Robot (dedicated bare metal). Different trade-offs:
| Concern | Cloud (CPX) | Robot (AX, EX) |
|---|---|---|
| Cost per CPU and memory | Higher | Much lower for the same resources |
| Provisioning time | Seconds | Hours, manual |
| Private network | Built-in vSwitch | Need vSwitch, ZeroTier, or WireGuard |
| Disk | NVMe, fixed sizes | NVMe, larger disks, RAID options |
| Replacement | API call | Support ticket |
| Right for | Workers that scale up and down | Database, persistent storage, big-memory work |
We end up with mixed clusters: control plane on Cloud (small, easy to replace), heavy workers on Robot (cheaper at scale, persistent disk).
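To make the "replacement is an API call" point concrete: spinning up a new Cloud worker is one `hcloud` CLI invocation. The server name, image, location, and network name below are illustrative.

```sh
# Create a replacement worker and attach it to the private network
# the cluster uses. All names here are ours, not defaults.
hcloud server create \
  --name k3s-worker-6 \
  --type cpx31 \
  --image ubuntu-24.04 \
  --location fsn1 \
  --network k3s-private
```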
Networking: the MTU detail that bites ¶
k3s defaults to Flannel with VXLAN. This works on Hetzner, but only if pod-to-pod traffic stays inside the private network. Two specific gotchas:
- MTU. Hetzner private networks use an MTU of 1450 in some configurations. Flannel VXLAN encapsulation needs another 50 bytes, so set the Flannel MTU to 1400 (or 1370 with the WireGuard backend, which has more overhead) at install time; see the sketch after this list. Forgetting this gives you intermittently dropped packets that look like application bugs for weeks.
- Cloud-to-Robot connectivity. If your control plane is on Cloud and your workers are on Robot, you need a private network or a VPN between them. The vSwitch product covers Cloud to Cloud. For Cloud to Robot you typically run WireGuard or ZeroTier as a DaemonSet on every node. We have used both: ZeroTier is simpler, WireGuard is slightly faster.
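One way to get the MTU right at install time: flannel derives its MTU from the interface it binds to, so pointing k3s at the private interface (MTU 1450) yields a pod-network MTU of 1400 automatically. `--node-ip` and `--flannel-iface` are real k3s flags; the interface name and address are illustrative.

```sh
# Bind k3s networking to the Hetzner private interface. flannel computes
# its MTU from this interface: 1450 minus 50 bytes VXLAN overhead = 1400.
# Check your actual interface name with `ip addr` before copying this.
curl -sfL https://get.k3s.io | sh -s - server \
  --node-ip 10.0.0.2 \
  --flannel-iface enp7s0
```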
If you can tolerate a bit more setup, Cilium replaces Flannel and removes a class of overlay issues entirely. We default to Flannel and switch to Cilium when we want native eBPF observability or NetworkPolicy beyond the basics.
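If you go the Cilium route, the swap happens at install time: bring k3s up without flannel or its network policy controller, then install Cilium on top. The k3s flags are real; the Helm values are a minimal sketch, not a tuned config.

```sh
# Install k3s with no bundled CNI, leaving networking to Cilium.
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend none \
  --disable-network-policy

# Install Cilium via Helm (values are a minimal example).
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set operator.replicas=1
```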
Ingress: pick one pattern, stick to it ¶
We almost never use the bundled Traefik. Two patterns we run depending on the cluster:
- External nginx outside the cluster acting as a TLS terminator and routing to NodePorts. Simpler to debug, and the TLS termination layer is reusable for non-Kubernetes traffic. Downside: two systems to update.
- In-cluster ingress controller (`ingress-nginx` or Traefik configured properly) with a Hetzner Load Balancer in front. Cleaner from a single-pane-of-glass perspective. Requires `cert-manager` and a tested certificate rotation path.
For small clusters, external nginx wins on operational simplicity. For larger clusters with many TLS-bound hostnames and frequent rotation, in-cluster wins.
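A minimal sketch of the external-nginx pattern, assuming the in-cluster service is exposed on NodePort 30080 on each worker. IPs, hostnames, and certificate paths are placeholders.

```sh
# TLS terminates on a box outside the cluster and proxies to the
# NodePort on each worker. Reload nginx after writing the config.
cat > /etc/nginx/conf.d/k3s-ingress.conf <<'EOF'
upstream k3s_ingress {
    server 10.0.0.3:30080;   # worker 1
    server 10.0.0.4:30080;   # worker 2
}
server {
    listen 443 ssl;
    server_name app.example.com;
    ssl_certificate     /etc/letsencrypt/live/app.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.example.com/privkey.pem;
    location / {
        proxy_pass http://k3s_ingress;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
EOF
nginx -s reload
```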
Storage: stateless local, stateful outside ¶
Hetzner Cloud has Volumes (block storage, attachable to VMs). They work fine, integrated via the hcloud-csi driver. The catch: volumes are tied to a location and cannot move across data centres without a backup and restore.
For Robot servers, you have local NVMe and that is essentially it. Two patterns:
- Local-path provisioner for everything that does not need replication. Caches, scratch volumes, anything where node loss equals acceptable data loss (see the sketch after this list).
- Replicated storage for stateful workloads. Longhorn works on Hetzner if you accept the operational cost. Rook-Ceph is more powerful and significantly more work. Our default is “host the database outside Kubernetes on a dedicated Robot box, snapshot to S3-compatible storage”. Less elegant, far easier to operate.
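For the first pattern, the bundled provisioner is already there as the `local-path` storage class. A scratch claim looks like this; the name and size are illustrative.

```sh
# Data lands on the node's local disk: losing the node loses the data,
# which is the deal you accept for caches and scratch space.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-cache
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 10Gi
EOF
```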
Observability: the minimum useful set ¶
For small clusters we run one Prometheus, one Grafana, Loki for logs. All inside the cluster, all backed up daily to S3-compatible storage. The setup that has saved us repeatedly:
- Node-level metrics first. Disk, CPU, memory, network. If you only had one dashboard, this is it.
- Kubelet and apiserver metrics second. When pods fail to start or schedule, the answer lives here.
- Application metrics third. Per-app dashboards are nice but rarely the first place you look at 3am.
- A “node was NotReady for more than ten minutes” alert (sketched below). This single rule catches more outages than the next ten alerts combined.
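That alert as a Prometheus rule — a minimal sketch assuming kube-state-metrics is running. The metric name is standard; the rule name, labels, and file name are ours.

```sh
# Load the file via the rule_files section of prometheus.yml.
cat > node-notready.rules.yml <<'EOF'
groups:
  - name: node-health
    rules:
      - alert: NodeNotReady
        # kube-state-metrics exposes 1 while the Ready condition holds
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.node }} NotReady for more than 10 minutes"
EOF
```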
We do not run Jaeger or any tracing system on small clusters. The signal-to-effort ratio is rarely worth it below a certain scale.
Backups: the thing people skip ¶
Backups are the part most often skipped on self-hosted Kubernetes. Three things to cover:
- Cluster state. If using etcd, snapshot daily and copy it off the cluster; k3s has `etcd-snapshot save` built in (a sketch follows this list). If using Postgres, the Postgres backup is the cluster backup.
- Persistent volumes. Either use Hetzner volume snapshots (cheap, fast, regional) or run an in-cluster backup tool like Velero. Test restores, do not assume them.
- Workload definitions. Argo CD or Flux turns this into a non-issue: cluster is a derived artifact, recoverable from git.
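For cluster state on the etcd path, k3s can push snapshots straight to S3-compatible storage. The `--s3*` flags belong to the `etcd-snapshot` subcommand; the endpoint, bucket, and credentials below are placeholders.

```sh
# Nightly snapshot shipped off the cluster. Run from a server node,
# typically via cron or a systemd timer.
k3s etcd-snapshot save \
  --s3 \
  --s3-endpoint s3.example.com \
  --s3-bucket k3s-snapshots \
  --s3-access-key "$S3_ACCESS_KEY" \
  --s3-secret-key "$S3_SECRET_KEY"
```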
Upgrades: one minor at a time ¶
k3s is a single binary. Upgrading is replacing the binary on each node and restarting the service. In practice:
- Read the release notes first. k3s tracks Kubernetes minor versions; deprecations matter.
- Upgrade control plane first, one node at a time. Wait for `kubectl get nodes` to show the new version stable for at least a few minutes before moving to the next.
- Upgrade workers in waves. Cordon, drain, upgrade, uncordon, as sketched after this list. Pause for application metrics to settle before the next wave.
- Do not skip minor versions. Version-skew policies leave some room for components to lag, but the supported, easy path is one minor version at a time.
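One worker in an upgrade wave looks like this. `INSTALL_K3S_VERSION` is the install script's version pin; the node name, version, and credentials are examples.

```sh
# Move workloads off the node, replace the binary, bring the node back.
kubectl cordon k3s-worker-3
kubectl drain k3s-worker-3 --ignore-daemonsets --delete-emptydir-data

# Re-running the install script with a pinned version swaps the binary
# and restarts the service. K3S_URL and K3S_TOKEN must match the
# original install; the values here are placeholders.
ssh k3s-worker-3 'curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.30.4+k3s1" K3S_URL="https://10.0.0.2:6443" \
  K3S_TOKEN="CHANGEME" sh -s - agent'

kubectl uncordon k3s-worker-3
```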
We have not been bitten by a k3s upgrade in production. We have been bitten by application bugs that only triggered after the upgrade. The pattern is to upgrade in a pre-production cluster a week ahead, run real load against it, then promote.
Operating cost: a reference setup ¶
A reference cluster we run for a small SaaS:
- Three control plane nodes (CPX21): about 15 EUR per month each.
- Five worker nodes (CPX31): about 25 EUR per month each.
- One Robot server for the database (AX42): about 40 EUR per month.
- Hetzner Load Balancer (LB11): about 5 EUR per month.
Total: roughly 215 EUR per month, taxes and traffic excluded. The same workload on a managed control plane at a hyperscaler with similar resources is several times that. The trade is operator time. For teams that already do this kind of work, the cost difference is real money.
Trade-offs ¶
- Operator time. Even a boring k3s cluster needs maintenance. Budget for it.
- No managed addons. No managed CSI driver upgrades, no managed control plane patches. You own them.
- Vendor lock-in is small. Workloads stay portable as long as the manifests are clean. Migrating to a managed service later is a config change and a DNS swap, not a rewrite.
Bottom line ¶
k3s on Hetzner is not the right call for every workload, but it is a good choice when you want low cost, full control, and you are willing to operate the result. Get the MTU right at install time, separate stateful and stateless concerns at the storage layer, keep observability simple, and treat backups as a tested practice rather than a config setting. Done that way, the cluster is genuinely boring, which is the only state worth aiming for.