Securing k8s cluster
Hi,
I have created a k8s cluster using the following command.
linode-cli k8s-alpha create example-cluster --nodes=20
Now, before making it live in production, I want to make it fully secure. I've noticed that if I deploy any app on it, it is open to the world by default. If I try to add any iptables rules, the k8s cluster crashes.
How can I make it secure?
14 Replies
Currently, it does make sense for you to add iptables rules to your cluster Nodes if you feel the need to. For example, you can disable SSH this way. However, all listening Kubernetes (control plane) services running on the Nodes are authenticated with mutual TLS, and your workloads can only be accessed via authenticated Calico IP-IP tunnels, unless you have exposed the workload to the Internet with a "Service" (see below).
These ports must be left open:
- TCP port 10250 inbound from 192.168.128.0/17, Kubelet health checks
- UDP port 51820 inbound from 192.168.128.0/17, Wireguard tunneling for kubectl proxy
- TCP port 179 inbound from 192.168.128.0/17, Calico BGP traffic
- TCP/UDP ports 30000 - 32767 inbound from All, NodePorts for workload Services
If you find that your cluster is non-functional with only these ports left open, please let me know.
If you don't want a workload (Deployment, StatefulSet, or DaemonSet) exposed to the Internet, you can delete the corresponding Service object or change the Service type to ClusterIP. The Pods will continue to run on the cluster and have Pod IPs within the cluster in the range 10.2.0.0/16. If you retain a ClusterIP Service for the workload, they will also be fronted by a distributed proxy IP in the range 10.128.0.0/16 within the cluster. You can reach these IPs on the cluster Nodes, or by using the kubectl proxy or kubectl port-forward commands from your local machine. This is entirely appropriate for in-cluster-only services such as databases or backend services. These services can be secured within the cluster using NetworkPolicy, which on LKE is currently implemented by Calico.
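As a rough illustration of that pattern, here's a minimal sketch: a ClusterIP Service fronting an in-cluster database, plus a NetworkPolicy that only admits traffic from the backend Pods. The names, labels, and port below are hypothetical placeholders:

# Sketch only: "example-db" and the app labels are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: example-db
spec:
  type: ClusterIP              # reachable only inside the cluster (10.128.0.0/16)
  selector:
    app: example-db
  ports:
    - port: 5432
      targetPort: 5432
---
# Only Pods labeled app=example-backend may reach the database Pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-db-allow-backend
spec:
  podSelector:
    matchLabels:
      app: example-db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: example-backend
      ports:
        - protocol: TCP
          port: 5432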
You can optionally expose a ClusterIP service with an Ingress resource, which will be automatically proxied by an Ingress controller that you deploy. The Ingress controller will itself be a NodePort or LoadBalancer Service. If you choose ingress-nginx for example, then you can have it terminate TLS for your services by associating them with TLS Secrets.
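For example, a sketch of what that could look like with ingress-nginx terminating TLS for a ClusterIP Service (the hostname, Secret name, and Service name are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  annotations:
    kubernetes.io/ingress.class: "nginx"   # or spec.ingressClassName on newer clusters
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: example-app-tls          # TLS Secret you create (e.g. via cert-manager)
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app          # the workload's ClusterIP Service
                port:
                  number: 80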
The following is the most robust way to expose workloads to the Internet with Kubernetes:
Internet traffic
-> Ingress controller LoadBalancer Service (automatically configured NodeBalancer)
-> Workload ClusterIP Service
-> Workload Pods
This way, your workloads can only be reached from the Internet via Ingress, and they expose only a Service network IP in the range 10.128.0.0/16, which can only be reached from within the cluster. All Services also have DNS names within the cluster, which is the preferred way to reach them.
If you do expose your workload with a NodePort or LoadBalancer Service (without using an Ingress), then it's up to you to ensure that the workload has appropriate authentication (via TLS or other means). You can do this by configuring the workload resource (Deployment, StatefulSet, or DaemonSet) with TLS material and authentication configuration that will be specific to the workload that you're running (for example a database or web application).
As a recap, Kubernetes workloads can be reached in one or more ways.
- By Pod IP, 10.2.0.0/16, in-cluster only.
- By Service IP, 10.128.0.0/16, in-cluster only and if the workload has an associated Service resource of type ClusterIP, NodePort, or LoadBalancer.
- By NodePort, a port on the Nodes in the range 30000 - 32767, from the Internet if the workload has an associated Service resource of type NodePort or LoadBalancer.
- By LoadBalancer, an automatically configured NodeBalancer, from the Internet if the workload has an associated Service resource of type LoadBalancer.
- By Ingress (an HTTP hostname or path), fronted by an Ingress controller of your choice, if the workload has an associated Service and Ingress resource. The Ingress controller should be deployed as a LoadBalancer Service and your workload should be deployed as a ClusterIP Service.
For more detail, please refer to the Kubernetes documentation on Services.
Thanks dude,
These ports must be left open:
TCP port 10250 inbound from 192.168.128.0/17, Kubelet health checks
UDP port 51820 inbound from 192.168.128.0/17, Wireguard tunneling for kubectl proxy
TCP 179 inbound from 192.168.128.0/17, Calico BGP traffic
TCP/UDP port 30000 - 32767 inbound from All, NodePorts for workload Services
But what about DDoS kind of attacks?
Here's an additional note on firewalling LKE:
In an LKE cluster, neither of the following types of workload endpoints can be reached from the Internet:
- Pod IPs, which use a per-cluster virtual network in the range 10.2.0.0/16
- ClusterIP Services, which use a per-cluster virtual network in the range 10.128.0.0/16
All of the following types of workloads can be reached from the Internet:
- NodePort Services, which listen on all Nodes with ports in the range 30000-32767.
- LoadBalancer Services, which automatically deploy and configure a NodeBalancer.
- Any manifest which uses hostNetwork: true and specifies a port.
- Most manifests which use hostPort and specify a port.
Exposing workloads to the public Internet through the above methods can be convenient, but it also carries a security risk. You may wish to manually install firewall rules on your cluster nodes; to do so, please see this community post. Linode is developing services which will allow for greater flexibility for the network endpoints of these types of workloads in the future.
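To make the hostNetwork / hostPort cases above concrete, a manifest along these lines (the name, image, and ports are hypothetical) is reachable directly on the Node's IPs regardless of any Service:

apiVersion: v1
kind: Pod
metadata:
  name: edge-proxy               # placeholder name
spec:
  # hostNetwork: true            # would place the Pod directly in the Node's network namespace
  containers:
    - name: proxy
      image: nginx:1.25
      ports:
        - containerPort: 80
          hostPort: 8080         # binds port 8080 on the Node itself, reachable from the Internet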
I'd like to chime in with some additional info from some experiments I've run on LKE.
As mentioned above, LKE nodes are quite open by default (e.g. the SSH port is open). On the other hand, I wasn't able to find information in the documentation on how to set a StackScript for those nodes, because they seem to be provisioned from some of Linode's internal templates (I'm happy to be corrected on this :) ). For me this means an open SSH port with an unknown SSH configuration.
I wanted to automate/simplify node provisioning as much as possible, so I tried to figure out how I can change the node configuration after it's provisioned, via automation. Luckily, kubectl node-shell (https://github.com/kvaps/kubectl-node-shell) works just fine, so I was able to use nsenter to change things on the node directly. It's not ideal because it's not tied to the node's provisioning, but it's good enough for now, considering there is a firewall on Linode's roadmap.
What I did was create a DaemonSet which applies my custom firewall on every node (periodically), with the firewall script added as a ConfigMap (both in the kube-system namespace). I could have decided to do other things (like shutting down SSH on the node, since I don't plan on using it), but a firewall seemed like the most universal thing here.
After some trial and error I've determined that nsenter --target 1 --net -- sh /path/to/my-firewall-script.sh was sufficient (--net is needed to modify the node's iptables), after making sure the container has iptables (I used a basic Alpine image). I didn't use the --mount option because it changes your paths and accessing ConfigMap mounts is a bit more problematic (they are hidden in the pod's dir on the host). I've used rules based on the ports listed above.
Despite being a somewhat hacky solution, this way every new node will eventually get the proper firewall (it's a matter of the DaemonSet pods starting on it).
I wrote this here both to bounce the idea off more people and to see if maybe someone has a better, less hacky idea. If it inspires someone to come up with a better solution, I'll be happy to check it out too.
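For reference, here's a stripped-down sketch of the DaemonSet I described. The image, names, and re-apply interval are my own placeholders (nothing LKE-specific), and it assumes the firewall script below is stored in a ConfigMap named node-firewall in kube-system:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-firewall
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-firewall
  template:
    metadata:
      labels:
        app: node-firewall
    spec:
      hostPID: true                        # lets nsenter target PID 1 on the node
      containers:
        - name: apply-firewall
          image: alpine:3.18               # any small image works; tools are installed below
          securityContext:
            privileged: true               # needed to enter the host's network namespace
          command: ["/bin/sh", "-c"]
          args:
            - |
              apk add --no-cache iptables util-linux
              while true; do
                nsenter --target 1 --net -- sh /scripts/my-firewall-script.sh
                sleep 300                  # re-apply periodically
              done
          volumeMounts:
            - name: firewall-script
              mountPath: /scripts
      volumes:
        - name: firewall-script
          configMap:
            name: node-firewall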
These are the iptables rules I used, and they work for me, based on the info listed by asauber above:
# Detach and rebuild the custom chain so the script can be re-run safely
# (the first -D fails harmlessly on the very first run).
iptables -vD INPUT -j node-firewall
iptables -vF node-firewall
iptables -vX node-firewall
iptables -vN node-firewall
iptables -vA INPUT -j node-firewall
# Keep established connections and the required LKE/Calico traffic
iptables -vA node-firewall -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -vA node-firewall -p tcp --dport 10250 -s 192.168.128.0/17 -j ACCEPT
iptables -vA node-firewall -p udp --dport 51820 -s 192.168.128.0/17 -j ACCEPT
iptables -vA node-firewall -p tcp --dport 179 -s 192.168.128.0/17 -j ACCEPT
# NodePorts stay reachable from anywhere
iptables -vA node-firewall -p tcp --dport 30000:32767 -j ACCEPT
iptables -vA node-firewall -p udp --dport 30000:32767 -j ACCEPT
# Reject everything else arriving on eth0
iptables -vA node-firewall -i eth0 -j REJECT
I have experienced, three times now, that when there is a problem with the Linode network, my LKE cluster with firewalld enabled has problems. The problem is that DNS stops working, even after I restart CoreDNS.
Is there any advice on how to properly secure an LKE cluster? I need some NodePorts to be accessible only by VMs in Linode but not accessible from the Internet.
Hello, I have an issue very much related to this! I'm trying to secure an LKE cluster using Calico's GlobalNetworkPolicy, with rules based on this guide:
- Allow all traffic from 192.168.128.0/17 so any host in my private network can talk to the k8s cluster: I believe this covers the first 3 recommended rules from @asauber. I also allow the pod/service CIDRs for good measure.
- Deny all other incoming traffic (I don't want any ports open to the internet).
- Allow all outgoing traffic from k8s.
Here are the effects of that:
- A NodePort service is available from within the private network, but blocked when using the host's public IP -- awesome.
- Traffic from a NodeBalancer can still come in as it's from 192.168.whatever.
- AFAICT the cluster & overlay network magic runs normally.
- I can still interact via kubectl, and SSH works: because of Calico's nice failsafe rules, I assume.
- Here's the problem: kubectl exec no longer works, in any case where it did just before applying the rule. I was using this for calicoctl, and when the "Deny" rule is active it just hangs. If I kubectl edit globalnetworkpolicy default.drop-other-ingress to effectively disable the rule, that works to open it up again.
So I'm puzzled, and don't understand the workings of kubectl exec well enough to know why that would be affected where other kubectl commands aren't. In either case it connects to the K8s API server, which then talks to the individual nodes, right?
- The API server is allowed by a failsafe rule for port 6443. I'm speculating here, but if exec uses a different port and doesn't come from a 192.168.128.0/17 address, then it might be blocked.
- I tried allowing the API server's public IP with no luck, but can't find a private IP to know one way or the other.
Any advice appreciated - thanks in advance!
NB. I updated Calico to 3.16 to get automatic hostendpoints & it seems to be working well after tweaking the env/config to its original state.
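In case it helps anyone else: as far as I can tell, automatic HostEndpoints are toggled through Calico's KubeControllersConfiguration resource, normally by patching the existing default resource with calicoctl. The relevant setting looks roughly like this:

apiVersion: projectcalico.org/v3
kind: KubeControllersConfiguration
metadata:
  name: default
spec:
  controllers:
    node:
      hostEndpoint:
        autoCreate: Enabled    # Calico then creates and maintains a HostEndpoint per node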
I managed to secure my nodes with the current version of Calico 3.9 with the following GlobalNetworkPolicy:
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-worker-policy
spec:
  order: 0
  selector: "all()"
  ingress:
    # Failsafe
    - action: Allow
      protocol: "TCP"
      destination:
        ports: [22, 179, 2379, 2380, 6443, 6666, 6667, 10250]
    - action: Allow
      protocol: "UDP"
      destination:
        ports: [68, 51820]
    # Add custom rules here
    - action: Log
    - action: Deny
  egress:
    - action: Allow
The allowed inbound ports should already be part of Calico's default failsafe configuration, but better safe than sorry. I have also added 6443/tcp, which is part of newer Calico releases' failsafe configuration.
This manifest blocks almost everything, including traffic from the LAN and to the NodePorts, and is a good starting point to which you can add more rules to customise it further. I also have no issues running kubectl exec with it.
The only issue I have with this configuration and Calico 3.9 is that there's no automatic HostEndpoint option, so you'll have to manually create a HostEndpoint for each of your nodes and remember to keep them up to date if you ever recycle your nodes. My hope is to see Linode deploy a newer version of Calico, which I'm told is on the roadmap.
EDIT 1: edited failsafe rules to apply to any source address.
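EDIT 2: for reference, a manually created HostEndpoint looks roughly like this; the node name, interface, and IP are placeholders you'd replace with your own node's values:

apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: lke-example-node-eth0      # placeholder
  labels:
    role: lke-worker
spec:
  node: lke-example-node           # must match the Kubernetes node name
  interfaceName: eth0
  expectedIPs:
    - 192.168.200.10               # the node's private IP (placeholder)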
Following up on my earlier post, I figured out how to get that essential traffic coming through. I had to poke a hole for the API server / k8s master, which (for my cluster at least) I discovered is in the 172.31 range below. My allowed source nets are now:
[192.168.128.0/17, 10.2.0.0/16, 10.128.0.0/16, 172.31.0.0/16]
These ports must be left open:
TCP port 10250 inbound from 192.168.128.0/17, Kubelet health checks
UDP port 51820 inbound from 192.168.128.0/17, Wireguard tunneling for kubectl proxy
TCP 179 inbound from 192.168.128.0/17, Calico BGP traffic
TCP/UDP port 30000 - 32767 inbound from All, NodePorts for workload Services
Is it not a problem for you to have these ports open to all private IPs in 192.168.128.0/17?
Same observation for Calico.
I tried it this way with no luck, as we don't know the private address of the control plane, and the selector doesn't seem to work for the managed master:
- action: Allow
  protocol: TCP
  source:
    selector: has(node-role.kubernetes.io/master)
  destination:
    ports:
      - 10250
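An untested idea would be to use source CIDRs instead of the selector, including the 172.31.0.0/16 range reported above for the managed control plane; something along these lines:

# Untested sketch: allow kubelet traffic from the private network and the
# control plane range instead of relying on a node selector.
- action: Allow
  protocol: TCP
  source:
    nets:
      - 192.168.128.0/17
      - 172.31.0.0/16
  destination:
    ports:
      - 10250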
Even though this post is in regard to iptables, I want to call out some current limitations in using the Cloud Firewall product with LKE clusters.
Using the Drop default inbound policy on Cloud Firewall will break the Calico IPIP overlay network in an LKE cluster. This is because the only firewall rules currently supported operate on layer 4 protocols (TCP, UDP, and ICMP).
Native Cloud Firewall integration with LKE is something we'd like to add in the future, but for those looking for an immediate solution for securing an LKE cluster from the public internet, the following configuration will allow only NodePorts and the services required for the cluster itself over the Linode private network, while dropping all other layer 4 traffic, both public and private:
Default Inbound Policy: Accept
Cloud Firewall Inbound Rules:
- TCP port 10250 from 192.168.128.0/17, Accept, Kubelet health checks
- UDP port 51820 from 192.168.128.0/17, Accept, Wireguard tunneling for kubectl proxy
- TCP port 179 from 192.168.128.0/17, Accept, Calico BGP traffic
- TCP/UDP ports 30000 - 32767 from 192.168.128.0/17, Accept, NodePorts for workload Services
- TCP, All Ports, All IPv4/All IPv6, Drop, Block all other TCP traffic
- UDP, All Ports, All IPv4/All IPv6, Drop, Block all other UDP traffic
- ICMP, All Ports, All IPv4/All IPv6, Drop, Block all ICMP traffic
With this configuration, it will be necessary to use a LoadBalancer Service to expose applications to the internet. Swapping the private subnet (192.168.128.0/17) for All IPv4/All IPv6 in the NodePort rule will allow NodePort Services to be reached from the internet.
Additionally, keep in mind that recycling nodes in an LKE cluster will cause the nodes to be deleted and replaced. Node recycle is required during K8s version upgrades, which will be necessary at least once per year. If you recycle the nodes in your LKE cluster, the list of Linodes in the Firewall will need to be updated.
@thorner is there any plan for when LKE will overcome the current limitations in using the Cloud Firewall product with LKE clusters?
There is no definitive timeline for when the compatibility between LKE and Cloud Firewalls will be more cohesive. However, I have added your feedback to our internal tracking for product improvements. Once these limitations have been addressed, we'll be sure to let you know.
This is a script you can use in a DaemonSet to automatically add nodes to a firewall. LINODE_LABEL and FIREWALL_LABEL have to be exposed as environment variables:
#!/bin/bash
set -o pipefail
set -o nounset
set -o errexit
# Look up this node's Linode ID and the target firewall's ID by label
LINODE_ID=$(linode-cli linodes ls --label "${LINODE_LABEL}" --json | jq '.[].id')
FIREWALL_ID=$(linode-cli firewalls ls --label "${FIREWALL_LABEL}" --json | jq '.[].id')
# IDs of the Linodes already attached to the firewall
FIREWALL_DEVICES=$(linode-cli firewalls devices-list "${FIREWALL_ID}" --json | jq '.[].entity.id')

if grep "${LINODE_ID}" <<< "${FIREWALL_DEVICES}" > /dev/null 2>&1; then
  echo "INFO: Instance already present in firewall"
else
  # Attach this Linode to the firewall
  linode-cli firewalls device-create --id "${LINODE_ID}" --type linode "${FIREWALL_ID}"
fi
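A possible way to run it (an untested sketch with a few assumptions of my own): a DaemonSet that takes LINODE_LABEL from the node name via the downward API (on LKE the Linode label appears to match the Kubernetes node name, but verify this for your cluster), and supplies FIREWALL_LABEL plus a Linode API token from a Secret. The image, labels, and Secret names are placeholders:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: firewall-sync
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: firewall-sync
  template:
    metadata:
      labels:
        app: firewall-sync
    spec:
      containers:
        - name: sync
          image: my-registry/linode-cli-jq:latest    # placeholder: needs bash, linode-cli, jq
          command: ["/bin/bash", "-c"]
          args:
            - |
              while true; do
                bash /scripts/firewall-sync.sh
                sleep 300
              done
          env:
            - name: LINODE_LABEL
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName           # assumes node name == Linode label
            - name: FIREWALL_LABEL
              value: "lke-node-firewall"             # placeholder firewall label
            - name: LINODE_CLI_TOKEN                 # token read by linode-cli
              valueFrom:
                secretKeyRef:
                  name: linode-api
                  key: token
          volumeMounts:
            - name: script
              mountPath: /scripts
      volumes:
        - name: script
          configMap:
            name: firewall-sync                      # ConfigMap holding the script above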