SSH SOCKS performance nosedive in new Linode architecture
Pages and resources (CSS, images, etc.) requested through the proxy load very slowly and often time out, with the browser reporting that the server did not respond in time. (The proxy on the old server introduces negligible delay in browsing.)
If I restart my browser, suddenly loading 15 or so tabs, they will generally all fail to load. (The proxy on the old server would successfully load such batched requests.)
What I've tried so far:
I've confirmed that sshd and firewall configurations are the same between the two servers.
I've dropped firewall rules on the new server entirely and set all chains to accept, but the performance problems remain.
I've compared sysctl variables between the two servers, and most are similar. I noted that kernel.threads-max was about 16k on the old server and about 8k on the new server, but I'd be surprised if 8k weren't enough headroom to proxy a web page (or even 15).
I've tried clients from different source networks, but they all exhibit these issues when connecting through the new proxy. (They also all work fine through the old proxy.)
Both servers have low load and CPU/memory usage reported through htop/top, /proc/meminfo, etc.
The output of ulimit -a is similar on both machines. Notable differences: pending signals is 4k on the new server instead of 16k, and max user processes is 4k on the new server instead of unlimited. Again, while the new server has lower values, I'd expect them to cover proxying general single-client web traffic.
The slow performance includes sites on the same Linode as the proxy server. (These Linode-hosted sites load quickly outside of the proxy.)
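For anyone repeating the sysctl comparison above, here is a minimal sketch of how it could be automated. It assumes the output of `sysctl -a` has been captured as text on each server; the function names and the sample values are illustrative only and are not taken from either machine.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no "="
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(old_text, new_text):
    """Return {key: (old_value, new_value)} for keys that differ or are missing."""
    old, new = parse_sysctl(old_text), parse_sysctl(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Illustrative input only: made-up values standing in for real dumps.
old_dump = "kernel.threads-max = 16000\nnet.ipv4.ip_forward = 0\n"
new_dump = "kernel.threads-max = 8000\nnet.ipv4.ip_forward = 0\n"

for key, (old_val, new_val) in diff_sysctl(old_dump, new_dump).items():
    print(f"{key}: old={old_val} new={new_val}")
```

Reading the two dumps from files instead of inline strings gives a complete list of differences rather than a spot check. A key that exists on only one server shows up with `None` on the other side.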
Theories I've considered:
SSH SOCKS proxies are not optimized for single-core systems, and work better on the 8-core old-architecture Linode. (I doubt this, as ssh was around long before multicore processors.)
There's something wrong on the client side. (I doubt this, as the configurations for connecting to both Linodes are the same.)
There's something wrong on the client networks. (I doubt this, as multiple networks have the same problems connecting to the new Linode, but they all work connecting to the old Linode.)
There's something on Linode's new infrastructure that impacts my use case. (I doubt this, as there aren't a lot of other complaints.)
My testing was incomplete or flawed, and some of those things above are actually causing the problem.
There's something I haven't found within my Linode that significantly affects performance.
My particular Linode's host hardware/network has issues.
I'm really eager to move forward on the new Linode infrastructure, but the current performance is effectively unusable. Any help would be greatly appreciated! Thanks!
3 Replies
The first thing I would check is how your DNS is being handled. Are you doing local DNS queries, or are you tunneling them through the SOCKS proxy (e.g. network.proxy.socks_remote_dns = true in Firefox)?
I have my DNS tunneled, and the presence or absence of a caching DNS server on the SOCKS server can make a pretty big difference in network performance.
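One way to check whether the resolver on the SOCKS server is the slow part is to time a few lookups there directly. The following is only a rough sketch: the function name `time_lookups` and the hostnames in the usage comment are placeholders, and it measures whatever resolver the system is configured to use, cache included.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.getaddrinfo):
    """Return {hostname: seconds} for one lookup each; failed lookups map to None."""
    timings = {}
    for name in hostnames:
        start = time.perf_counter()
        try:
            resolve(name, 80)
            timings[name] = time.perf_counter() - start
        except OSError:  # socket.gaierror is a subclass of OSError
            timings[name] = None
    return timings


# Example usage on the proxy server (placeholder hostnames):
#   for host, secs in time_lookups(["example.com", "example.org"]).items():
#       print(host, "failed" if secs is None else f"{secs:.3f}s")
```

If lookups here take seconds rather than milliseconds, every proxied request that tunnels its DNS pays that cost, which would match pages timing out when many tabs load at once.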
Additional testing:
Loading sites like CNN and Google via curl directly on the Linode is fast. In other words, the Linode isn't failing to contact these destination sites; the failure is in the proxy relaying them to the client.
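The same curl test can be repeated from a client through the tunnel to put numbers on the difference. This is a sketch under assumptions: it expects curl to be installed, the helper names are made up for illustration, and `localhost:1080` in the usage comment is a placeholder for wherever `ssh -D` is actually listening.

```python
import subprocess


def curl_timing_command(url, socks_proxy=None):
    """Build a curl command that prints only the total transfer time in seconds."""
    cmd = ["curl", "-s", "-o", "/dev/null", "-w", "%{time_total}", "--max-time", "30"]
    if socks_proxy:
        # --socks5-hostname makes curl resolve DNS on the proxy side as well.
        cmd += ["--socks5-hostname", socks_proxy]
    return cmd + [url]


def timed_fetch(url, socks_proxy=None):
    """Run curl and return the total time as a float, or None if curl failed."""
    result = subprocess.run(
        curl_timing_command(url, socks_proxy), capture_output=True, text=True
    )
    return float(result.stdout) if result.returncode == 0 else None


# Example usage (placeholder URL and proxy address):
#   print("direct:", timed_fetch("https://example.com/"))
#   print("proxied:", timed_fetch("https://example.com/", "localhost:1080"))
```

Swapping `--socks5-hostname` for `--socks5` makes curl resolve names locally instead, so comparing the two variants helps separate DNS delay from delay in the tunnel itself.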
P.S. The old machine, which didn't experience such problems, wasn't set to use a local resolver to answer DNS queries from local processes. That said, it was using different DNS servers, so the difference between those DNS servers likely caused the difference in perceived performance through the proxy.