Details on LKE
I've signed up for LKE and have been migrating several workloads to it, mostly lower-traffic and less critical ones. I have several questions I can't find solid documentation for, and things I would like to know and/or see solved before migrating things that matter more.
A lot of these boil down to what I would consider "production concerns", which aren't necessarily things I would expect out of the beta, but I would like to make sure these things are on your radar and planned to be addressed for when it goes GA or at some point thereafter.
- Are masters highly available?
  - If so, can you describe a little about how that is achieved?
  - Are masters and etcd both clustered?
  - What sort of environment do masters/etcd run in?
  - How does incoming traffic to the master work/route?
  - If not, can you share if and/or when HA managed masters are planned?
- What sort of backups are taken (etcd snapshots, etc.) and on what schedule, in case the unimaginable happens?
- How is cluster internal traffic handled?
  - Does it overlay using internal or external IP addresses?
  - If it routes traffic externally, how secure is cluster traffic, and are there plans to change this?
- On what schedule are masters patched?
  - What plans are in place for quickly addressing critical security vulnerabilities (like CVE-2018-1002105 back in the day)?
- It appears that you do not currently support IPv6 dualstack in LKE.
  - Do you plan to add dualstack support to `Pod`s/CNI in LKE?
    - Will I be allocated space for a cluster dynamically, or will I be able to provision it using a subnet (Additional IPv6) I get manually assigned?
    - Could you share a rough estimate of timeline?
  - Do you plan to add dualstack support to `LoadBalancer`s in LKE, and if so could you share a rough estimate of when? (Upstream's dual-stack `Service` fields are sketched after this list for reference.)
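For reference, upstream Kubernetes expresses dual-stack through Service-level fields; here is a minimal sketch of what a dual-stack `LoadBalancer` would look like there. Nothing in it is LKE-specific, and the name, selector, and ports are placeholders:

```yaml
# Hypothetical dual-stack LoadBalancer Service using upstream
# Kubernetes fields; nothing here is LKE-specific.
apiVersion: v1
kind: Service
metadata:
  name: example-dualstack            # placeholder name
spec:
  type: LoadBalancer
  ipFamilyPolicy: PreferDualStack    # fall back to single-stack if the cluster can't do both
  ipFamilies:
    - IPv4
    - IPv6
  selector:
    app: example                     # placeholder selector
  ports:
    - port: 80
      targetPort: 8080
```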
- Is any information available on which datacenters LKE is planned to roll out to?
- Not having LKE and Object Storage available in the same datacenter is a shame:
  - Many database operators (like KubeDB) can back up and restore via an S3-compatible endpoint, which is important to remove sole reliance on the underlying storage and cluster availability.
  - Collecting metrics through Prometheus is incredibly common, and getting them into Thanos is important for long-term storage (see the sketch after this list).
  - Deploying cloud-native applications that deal with any sort of file uploads or large files. It is less than ideal to run a `ReadWriteMany` `PersistentVolume` provisioner in-cluster or to store the files in a database.
  - Storing container images as the backend of a private registry is well suited to object storage (also sketched below).
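To make the Thanos case concrete: pointing it at an S3-compatible service (the same pattern KubeDB-style backups rely on) takes only a small objstore config. A minimal sketch, assuming a Linode Object Storage endpoint; the bucket name and keys are placeholders:

```yaml
# thanos-objstore.yaml - a minimal sketch of a Thanos object storage
# config aimed at an S3-compatible endpoint. Bucket and keys are
# placeholders, not real values.
type: S3
config:
  bucket: thanos-metrics                 # placeholder bucket name
  endpoint: us-east-1.linodeobjects.com  # example Object Storage endpoint
  access_key: <ACCESS_KEY>
  secret_key: <SECRET_KEY>
```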
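Similarly, the open-source Docker registry supports an S3-compatible backend directly; a sketch of the relevant `storage` section of its `config.yml`, again with placeholder values:

```yaml
# Excerpt from a registry config.yml showing the S3 storage driver
# pointed at an S3-compatible service. All values are placeholders.
storage:
  s3:
    regionendpoint: https://us-east-1.linodeobjects.com
    region: us-east-1
    bucket: registry-images      # placeholder bucket name
    accesskey: <ACCESS_KEY>
    secretkey: <SECRET_KEY>
```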
2 Replies
This is great feedback, and your questions are definitely valid, especially as we head into general availability. I've worked with our Linode Kubernetes Engine team to review your feedback and questions; I've broken down each question category and its answers below.
Are masters highly available?
Masters are highly available; we aren't able to share specific information as to how our setup works. Etcd backups are taken on a frequent and regular basis.
How is cluster internal traffic handled? What are our plans to address critical security vulnerabilities?
We use Calico with layer 3 IP-IP tunnels; tunnel traffic is authenticated by source address, which we verify at the host level. Regarding security vulnerabilities: we patch user control planes on a regular basis. Nodes are patched via deletion and recreation, which will only be done infrequently, for stability and security enhancements. In the future, we'll have the ability to mark Nodes as "do not delete except for critical updates," which will help prevent unwanted recreation.
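For readers unfamiliar with Calico, IP-IP encapsulation is configured per IP pool. A sketch of what such a pool resource looks like in Calico's v3 API; the pool name and CIDR here are illustrative, not LKE's actual values:

```yaml
# Illustrative Calico IPPool with IP-IP encapsulation enabled.
# The name and CIDR are placeholders, not LKE's actual values.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.2.0.0/16        # hypothetical pod CIDR
  ipipMode: Always         # encapsulate pod-to-pod traffic in IP-IP tunnels
  natOutgoing: true        # SNAT pod traffic leaving the pool
```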
Do we plan to add IPv6 dualstack to LKE?
We are interested in adding this capability, but we don't yet have a timeline to share.
Do we have information about which data centers we'll deploy LKE in, and whether Object Storage will be available alongside it?
We would love to have LKE and Object Storage in every data center, and the use cases you've mentioned are some reasons to be excited about this possibility. We're working to roll these features out to each of our data centers over time.
@jyoo These answers are very helpful, thank you. Regarding the masters being highly available, does that mean there is more than 1? On https://www.linode.com/products/kubernetes/ it says:
> All of your master and node components are monitored and will automatically recover if they fail.
Does this mean that masters are automatically recovered if there is a problem, but that we can expect downtime (for masters, meaning we can't deploy more pods, etc.) during, say, an upgrade, because there is only one master?
This isn't very clear, so I would just like to understand.
Thank you.