Experiencing frequent network issues with Linode
I'm creating a "sandbox" for my CI/CD experiments, using:
- a managed MySQL database
- a k8s cluster
I'm provisioning the infrastructure using Terraform and the Linode Terraform provider (version 1.29.4).
I also work manually with kubectl and other tools. My region of choice is Frankfurt.
All of this runs under WSL on my Windows 10 workstation.
For the past few days I have been experiencing a lot of issues, mainly network-related, with both Terraform and other tools. Until about a week ago everything worked smoothly; now I cannot rerun the same provisioning that worked before.
Some of the issues I have had so far:
- database provisioning never completes because the provider "disconnects" instead of waiting for the resource, as it did before. As a result, the MySQL databases are not created.
- the k8s cluster is often unresponsive, and I get "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout" from both Terraform and kubectl, or similar messages indicating that I cannot reach my cluster.
- ArgoCD (one of the apps I install in my cluster) becomes unstable and unresponsive.
All of these certainly look like network issues: the infrastructure is slow, resource provisioning is slow even when done from the UI, and I get disconnections and timeouts from Kubernetes and MySQL (on the rare occasions I manage to create them successfully).
At the same time, I can't rule out that the problem lies somewhere else.
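One way to tell intermittent loss apart from an endpoint that is simply down is to repeat a cheap probe and count the successes. A minimal sketch, assuming bash and GNU coreutils `timeout` (present in a standard WSL Ubuntu image); `success_rate` and `PROBE_TIMEOUT` are hypothetical names, not part of any Linode tool:

```shell
#!/usr/bin/env bash
# success_rate N CMD [ARGS...]
# Hypothetical helper: runs CMD N times, each attempt capped at
# PROBE_TIMEOUT seconds (default 10), and prints "successes/attempts".
# CMD must be an external program, because `timeout` executes it directly.
success_rate() {
  local n=$1; shift
  local limit=${PROBE_TIMEOUT:-10}
  local ok=0 i
  for ((i = 1; i <= n; i++)); do
    # Discard the probe's own output; only its exit status matters here.
    if timeout "$limit" "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  echo "$ok/$n"
}
```

For example, `success_rate 20 kubectl get --raw /healthz --request-timeout=5s` probes the cluster's API server. A result such as `13/20` suggests intermittent loss rather than a dead endpoint, and repeating the same probe against an unrelated host helps separate a local or ISP problem from one on the cluster side.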
Is anyone else experiencing this?
Any advice?
EDIT: Another strange and frustrating error:
After 28 minutes of DB provisioning through Terraform:
```text
module.lin_mysql_dev.linode_database_mysql.cluster: Still creating… [28m20s elapsed]
╷
│ Error: failed to wait for database active: failed to get db status: [502] Internal server error
```
When this happens, the next Terraform run fails with:
```text
│ Error: failed to get cert for the specified mysql database: [400] Your database is provisioning; please wait until provisioning is complete to perform this operation.
```
Finally, once the DB becomes active, Terraform destroys and recreates it.
I'm wasting HOURS!
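The `[400]` error asks you to wait until provisioning is complete, so one option is to poll until the database reports itself active and only then rerun Terraform. A minimal sketch, assuming bash and curl; `wait_until`, `db_is_active` and `LINODE_TOKEN` are hypothetical names, and the API path and `status` field used below are assumptions to verify against the Linode API v4 documentation for managed MySQL databases:

```shell
#!/usr/bin/env bash
# wait_until MAX_TRIES INTERVAL CMD [ARGS...]
# Hypothetical helper: re-runs CMD every INTERVAL seconds until it exits 0
# or MAX_TRIES attempts have been made. Returns 0 on success, 1 if it gave up.
wait_until() {
  local max=$1 interval=$2; shift 2
  local try
  for ((try = 1; try <= max; try++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $try/$max failed; retrying in ${interval}s" >&2
    # Skip the pause after the final attempt.
    if ((try < max)); then sleep "$interval"; fi
  done
  return 1
}

# Hypothetical readiness check. ASSUMPTIONS: the endpoint path, the
# LINODE_TOKEN variable and the "status": "active" field are taken to match
# the Linode API v4 managed MySQL docs; verify them before relying on this.
db_is_active() {
  curl -s --connect-timeout 5 \
    -H "Authorization: Bearer ${LINODE_TOKEN}" \
    "https://api.linode.com/v4/databases/mysql/instances/$1" |
    grep -q '"status": *"active"'
}
```

For example, `wait_until 60 60 db_is_active 12345 && terraform apply`, where `12345` is a placeholder database ID, checks once a minute for up to an hour. `wait_until` calls its command directly, so a shell function such as `db_is_active` works as the condition.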
Since you mentioned that this deployment worked well until about a week ago and that you believe this to be a network issue, I suggest first diagnosing any network connectivity problems you're experiencing. Running MTR reports to trace your network route can be extremely helpful. You may also want to run the following curl command:
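```shell
# Count how many of 50 requests return HTTP 200 (the final status, after redirects)
for i in $(seq 50); do curl -sL -o /dev/null --connect-timeout 5 -w '%{http_code}\n' <your.IP.address>; done | grep -c '^200$'
```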
For context, our Network Operations team is currently aware that some ISPs in Poland are experiencing intermittent connectivity issues to the Frankfurt data center. If you are in Poland, I suggest checking your MTR reports for packet loss attributable to your ISP and contacting them directly.
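To spot packet loss in MTR reports that could be attributed to an ISP, a small filter can list the hops whose loss exceeds a threshold. A minimal sketch, assuming bash, awk, and mtr's default `--report` table (a `Loss%` column right after the host, without `--aslookup`); `lossy_hops` is a hypothetical name:

```shell
#!/usr/bin/env bash
# lossy_hops THRESHOLD
# Hypothetical helper: reads an `mtr --report` table on stdin and prints
# each hop whose Loss% exceeds THRESHOLD percent, as "hop host loss%".
lossy_hops() {
  local threshold=$1
  awk -v t="$threshold" '
    # Hop rows look like "  3.|-- 203.0.113.1   12.0%   100 ...".
    $1 ~ /^[0-9]+\.\|--$/ {
      loss = $3
      sub(/%$/, "", loss)
      if (loss + 0 > t + 0) {
        hop = $1
        sub(/\.\|--$/, "", hop)
        printf "%s %s %s%%\n", hop, $2, loss
      }
    }'
}
```

For example: `mtr --report --report-cycles 100 <your.IP.address> | lossy_hops 5`. Loss that shows up at one intermediate hop but not at the hops after it is usually that router rate-limiting its ICMP replies rather than real loss; loss that starts at a hop and persists to the final hop is the pattern worth sharing with your ISP.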
If the issue turns out not to be network-related, you can review the recently closed PRs in the Linode Terraform provider's GitHub repository to see whether any of them might have affected your deployment. If you find that a recent change is causing this behavior, you can open a new issue to let the team know.