Need help debugging a random connection timeout

forum:Krupux 11 years, 7 months ago

Hi everyone,

Linux newbie here. I need help debugging a random connection timeout between my app server and my database server.

Servers:

Linode 768, Debian 6 (64bit) - App server (www1)

Ruby 2

Rails 3 (with Rainbows! as server)

Sidekiq (async background message processor)

pgbouncer

Linode 512, Debian 6 (64bit) - DB server (db1)

Postgres 9.2

Sphinx Search

Redis 2.6.11 (with AOF persistence)

The two servers talk to each other over their private IPs. Redis is my main Rails cache store.

Problem:

Sometimes my application server throws errors like these:

Redis::TimeoutError (Connection timed out)

ActionView::Template::Error (Connection timed out):

It happens randomly, whether there are <10 or >60 people active on the site.

The strange thing is that my Postgres connection has NEVER had this problem (timing out).

Other things to note:

When I was still using memcached instead of Redis, I got the same random connection timeouts to memcached.

Likewise, when I was still using MySQL, my database connection never timed out.

Things I've tried:

I've monitored my servers with New Relic. CPU, memory, IO, and bandwidth all seem OK. The average response time is an acceptable 133ms.

I've upgraded to latest gems, ruby, redis, etc.

I've set Redis timeout = 0 and tcp-keepalive = 60. In the Redis "info" output, the rejected_connections stat is at 0.

I've opened a support ticket, and they suggested I run mtr, which looks OK:

mtr --report db1
HOST: www1                        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. db1                           0.0%    10    0.5   0.5   0.4   0.8   0.1

mtr --report www1
HOST: db1                         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. www1                          0.0%    10    0.5   0.6   0.4   1.0   0.2

However, I can't run mtr at the moment a timeout happens, because it's so random that I usually only see it afterwards in the Rails log.
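Since the timeouts are too random to catch by hand, one option is to leave a small probe running on www1 that keeps trying to connect to db1 and logs anything slow or failed with a timestamp. The following is only a rough sketch using the Ruby standard library, not a tested tool: `probe_once` and `watch` are made-up helper names, the 50ms "slow" threshold and 2s timeout are arbitrary, and 6379 is just the default Redis port. It also only tests new TCP connects, so it would not catch a stall on a connection that is already open.

```ruby
require "socket"
require "time"

# Try one TCP connect to host:port and time it.
# Returns [elapsed_seconds, error], where error is nil on success.
def probe_once(host, port, timeout)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = nil
  begin
    # The block form closes the socket again as soon as the connect succeeds.
    Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
  rescue SystemCallError, SocketError, IOError => e
    error = e
  end
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - started, error]
end

# Probe repeatedly and write one timestamped line for every failed or slow
# connect, so the lines can be matched against the Rails log afterwards.
# Returns the number of bad samples seen.
def watch(host, port, samples:, interval: 1.0, timeout: 2.0, slow: 0.05, out: $stdout)
  bad = 0
  samples.times do
    elapsed, error = probe_once(host, port, timeout)
    if error || elapsed > slow
      bad += 1
      status = error ? "FAIL #{error.class}" : "SLOW"
      out.puts format("%s %s:%d %s %.1fms",
                      Time.now.utc.iso8601, host, port, status, elapsed * 1000)
    end
    sleep interval
  end
  bad
end
```

Running something like `watch("db1", 6379, samples: 86_400)` from www1 would cover roughly a day at one probe per second. If a logged line lines up with a Redis::TimeoutError in the Rails log, that points at the network or the db1 host rather than at Redis itself.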

I hope I didn't miss any details. Any ideas on where to start pinpointing the problem?

1 Reply

forum:Krupux · Answer 1 · April 6, 2013, 4:37 a.m.

I'm moving Redis to app server (localhost) and see whether it stops the problem.

EDIT:

I've been monitoring for 2 days so far, and the problem seems to have magically gone away after I restarted both servers (both are now Linode 1GB) for the free Nextgen upgrade.

I also upgraded my Linux kernel to the latest (3.8.4 x64) and ran aptitude safe-upgrade on all of the installed packages.

So at this point I have no idea whether it's fixed because of the increased memory, the new machine/infrastructure, or something else.
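To check whether the timeouts have really stopped rather than just become rarer, the Rails log could be tallied per day. This is a minimal sketch, assuming the log contains the usual Rails `Started GET "/" for 1.2.3.4 at 2013-04-06 04:37:12 +0000` request lines and that the errors contain the text "Connection timed out" as in the messages above. `timeouts_per_day` is a made-up name, and because Rainbows! can interleave requests, each error is only approximately attributed to the most recent request line before it.

```ruby
# Matches a Rails request line and captures its date (YYYY-MM-DD).
STARTED_AT = /\AStarted \S+ ".*" for \S+ at (\d{4}-\d{2}-\d{2})/

# Count lines matching `pattern`, grouped by the date of the most recent
# "Started ..." request line seen before them.
def timeouts_per_day(lines, pattern = /Connection timed out/)
  counts = Hash.new(0)
  day = "unknown" # used for errors that appear before any request line
  lines.each do |line|
    if (m = STARTED_AT.match(line))
      day = m[1]
    elsif line =~ pattern
      counts[day] += 1
    end
  end
  counts
end
```

For example, `timeouts_per_day(File.foreach("log/production.log"))` would return a hash of date to count for the default production log. A run of days with zero after the upgrade would at least confirm the fix is holding, even if it doesn't say which change was responsible.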
