One of the most important and also least understood parts of kubernetes is cluster networking. I guess you've already read through the documentation and now you're a bit confused about how to implement the required model, considering you were presented with about 30 different options. A bit more googling would reduce that number to a few more popular choices. Nevertheless, container networking is an involved subject and it's easy to get lost in the details of any specific solution.
In one of our setups, what we needed was described in the docs like this:
L2 networks and linux bridging
If you have a “dumb” L2 network, such as a simple switch in a “bare-metal” environment, you should be able to do something similar to the above GCE setup. Note that these instructions have only been tried very casually - it seems to work, but has not been thoroughly tested. If you use this technique and perfect the process, please let us know.
Yes, we had a “dumb” L2 network (I mean, who hasn’t), but that’s not the only reason we needed this implementation. The applications running on the cluster needed to be accessed directly from outside the cluster. Most of the networking solutions mentioned in the documentation, on the other hand, focus on within-cluster communication. That’s natural, since kubernetes is the go-to solution for microservices-oriented architectures, where small applications keep communicating with each other on the same cluster.
We shouldn’t interpret this as “internal and external networks must be isolated” though. While it’s possible, and even suggested by a lot of tools, to do so, it’s not strictly required, at least not in bare-metal environments. It’s perfectly OK to use externally routable IPs for the pods running in the cluster; it’s just not practical if the outside network happens to be the internet.
But how? The next line in the documentation reads:
Follow the “With Linux Bridge devices” section of this very nice tutorial from Lars Kellogg-Stedman.
Yes, it’s a very nice one, though it leans a bit towards Docker-only environments. And the 2018 update about the macvlan driver certainly doesn’t help clear the confusion.
With CNI
So here is a simpler recipe:

- Note the subnet of your dumb L2 network (e.g. 10.10.0.0/16).
- The IP addresses for your nodes will look like this:

  ```
  node1    10.10.1.0/16
  node2    10.10.2.0/16
  node3    10.10.3.0/16
  node4    10.10.4.0/16
  ...
  node100  10.10.100.0/16
  ```

- Enable IP forwarding on your nodes (a sketch for making this persistent over reboots follows this list):

  ```
  sysctl -w net.ipv4.ip_forward=1
  ```

- Configure your kubelets to use CNI (you probably already have this, as everyone uses CNI nowadays):

  ```
  --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
  ```

- Configure the bridge CNI plugin (you probably already have the plugin installed too, as it’s one of the default ones; if not, extract a release to /opt/cni/bin on all nodes):

  ```
  cat <<EOF > /etc/cni/net.d/10-bridge.conf
  {
    "cniVersion": "0.3.1",
    "name": "data",
    "type": "bridge",
    "bridge": "cni0",
    "promiscMode": true,
    "ipam": {
      "type": "host-local",
      "ranges": [
        [
          {
            "subnet": "10.10.0.0/16",
            "rangeStart": "10.10.1.1",
            "rangeEnd": "10.10.1.101",
            "gateway": "10.10.1.0"
          }
        ]
      ],
      "routes": [
        { "dst": "0.0.0.0/0" }
      ]
    }
  }
  EOF
  ```
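As promised above, here is a minimal sketch of one common way to keep IP forwarding enabled across reboots, assuming your distribution reads drop-in files from /etc/sysctl.d (the file name 99-ip-forward.conf is just an illustration):

```
# Assumption: the distribution loads drop-in files from /etc/sysctl.d
cat <<EOF | sudo tee /etc/sysctl.d/99-ip-forward.conf
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system   # re-apply settings from all sysctl configuration files
```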
The kubelet flags above instruct kubelet to use CNI for pod networking, which in turn causes our CNI-compatible container runtime to use the configured plugin(s). This specific bridge plugin creates a veth pair for each of our pods, connecting all of them to the same bridge. It also configures the bridge we specify here, for example setting the necessary promisc attribute.

IPAM (IP address management) is configured as host-local, meaning the IP addresses for the pods running on this host are managed by the host itself, without any centralized component like DHCP. This example assigns pod IPs in the 10.10.1.1 - 10.10.1.101 range, which gives you room for about 100 pods per node. Note that this range and the gateway need to be changed for each node: 10.10.2.1 - 10.10.2.101 and 10.10.2.0 for node2, and so on.
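For instance, a sketch of node2’s /etc/cni/net.d/10-bridge.conf would be identical to the one above except for the per-node IPAM fields:

```
# Sketch for node2: only rangeStart, rangeEnd and gateway change per node
cat <<EOF > /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.3.1",
  "name": "data",
  "type": "bridge",
  "bridge": "cni0",
  "promiscMode": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [
        {
          "subnet": "10.10.0.0/16",
          "rangeStart": "10.10.2.1",
          "rangeEnd": "10.10.2.101",
          "gateway": "10.10.2.0"
        }
      ]
    ],
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
EOF
```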
Final touches
The last step is connecting to the bare-metal network, simply by adding the actual network interface to the bridge. If it’s a single Ethernet interface like eth1:

```
sudo ip link set dev eth1 master cni0
```

If it’s a LACP bond, you should add the bond interface instead. You also need to assign your node’s IP address to the bridge itself, not to the underlying network interface:

```
sudo ip a add 10.10.1.0/16 dev cni0
```
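A quick sanity check at this point (just a sketch; eth1 and the 10.10.1.0 address come from the examples above):

```
# eth1 (or your bond) should now be listed as a port of cni0
bridge link show
# The node IP should live on the bridge, not on eth1
ip addr show dev cni0
```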
If you do have any external networks (like 10.11.0.0, 10.12.0.0, …), there should be a gateway on your subnet for reaching them (like 10.10.0.1). You should configure routes to it on your nodes as well, as sketched below. And of course, use whatever your Linux distribution provides to keep your network configuration persistent across reboots (NetworkManager, ifupdown, systemd-networkd, netplan, etc.).
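As a sketch, adding such routes by hand could look like this (the /16 prefix lengths are an assumption; 10.10.0.1 is the example gateway from above):

```
# Reach the external networks via the L2 network's gateway, through the bridge
sudo ip route add 10.11.0.0/16 via 10.10.0.1 dev cni0
sudo ip route add 10.12.0.0/16 via 10.10.0.1 dev cni0
```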
Happy hunting!