Getting random MySQL errors intermittently, any ideas?
Can not connect to the database. Lost connection to MySQL server during query
13 Replies
-Chris
I am on host45.linode.com
running Gentoo, kernel 2.4.29-linode39-1um
My mysql.log is empty, which probably means I never turned it on. I will have to read up and try to turn it on.
Here is what mysql.err shows
Number of processes running now: 1
mysqld process hanging, pid 4682 - killed
051229 10:30:42 mysqld restarted
Number of processes running now: 1
mysqld process hanging, pid 22519 - killed
051230 12:34:00 mysqld restarted
051230 12:34:04 mysqld ended
051230 13:50:00 mysqld started
Number of processes running now: 1
mysqld process hanging, pid 1255 - killed
mysqld restarted
Number of processes running now: 1
mysqld process hanging, pid 1780 - killed
051230 13:51:32 mysqld restarted
Number of processes running now: 1
mysqld process hanging, pid 1816 - killed
051230 13:54:17 mysqld restarted
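To see how often the restarts in a log like the one above happen, the timestamps can be pulled out with a few lines of Python. This is only a rough sketch: the function name `restart_times` is made up for illustration, and it assumes the `YYMMDD HH:MM:SS mysqld restarted` line format seen in this thread. Restart lines without a timestamp are skipped.

```python
import re
from datetime import datetime

# Matches lines such as "051229 10:30:42 mysqld restarted".
RESTART_RE = re.compile(r"^(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+mysqld restarted$")


def restart_times(log_text):
    """Return a datetime for every timestamped 'mysqld restarted' line."""
    times = []
    for line in log_text.splitlines():
        match = RESTART_RE.match(line.strip())
        if match:
            stamp = f"{match.group(1)} {match.group(2)}"
            times.append(datetime.strptime(stamp, "%y%m%d %H:%M:%S"))
    return times
```

Feeding it the contents of mysql.err (for example `restart_times(open("/var/log/mysql/mysql.err").read())`) gives a list that can be compared against cron jobs or traffic peaks.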
I noticed the same problem with mysql. I am also on host45, with Gentoo but with a 2.6 kernel. Basically, mysqld crashes randomly. Normally it then restarts automatically, but not every time; when it doesn't, my web site stops working and I have to restart mysql manually.
I tried to upgrade mysql from 4.0.25 to 4.1.14, but it didn't change anything …
On a side note, I noticed another weird thing: the last time I rebooted my Linode (a few days ago), one of my services (spamd) segfaulted on startup, so I had to start it manually.
What's happening is similar to what fifo reports. I get this kind of log in /var/log/mysql/mysql.err:
Number of processes running now: 1
mysqld process hanging, pid 18585 - killed
060101 08:21:03 mysqld restarted
and then in /var/log/mysql/mysqld.err :
060101 8:21:05 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files…
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer…
060101 8:21:06 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 43784.
InnoDB: Doing recovery: scanned up to log sequence number 0 43784
InnoDB: Last MySQL binlog file position 0 79, file name ./zlinode-bin.000007
060101 8:21:06 InnoDB: Flushing modified pages from the buffer pool…
060101 8:21:06 InnoDB: Started; log sequence number 0 43784
/usr/sbin/mysqld: ready for connections.
Version: '4.1.14-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Gentoo Linux mysql-4.1.14
mysql.err continues to show the following every time this happens.
Number of processes running now: 1
mysqld process hanging, pid 1181 - killed
060105 17:52:49 mysqld restarted
here is mysqld.err
060105 17:53:04 InnoDB: Started
/usr/sbin/mysqld: ready for connections.
Version: '4.0.25-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Gentoo Linux mysql-4.0.25-r2
Warning: mysql_connect(): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in /var/www/localhost/htdocs/xxxx.php on line 17
Can not connect to the database. Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
@fifo:
This problem keeps occurring. This morning it locked up the server and I couldn't access anything. The only thing that fixes it is a reboot. Anything I can try?
Locking up the Linode sounds like you might have been swap thrashing, or hit the IO-Limiter. Check /proc/swaps and /proc/io_status.
I'd give the mysql binary install from mysql.com a try. Also:
-Chris
Filename Type Size Used Priority
/dev/ubd/1 partition 263160 0 -1
here is the io_status
iocount=57000 iorate=0 iotokens=400000 tokenrefill=512 token_max=400000
I will have to give the mysql binary a try
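For anyone who wants to check these two files regularly rather than by eye, here is a minimal Python sketch that parses output like the /proc/swaps and /proc/io_status text quoted above. The function names are hypothetical, and the io_status field names are simply copied from this post; /proc/io_status is Linode-specific, so treat that part as an assumption about its format.

```python
def parse_swaps(text):
    """Parse /proc/swaps text into one dict per swap area (sizes in KB)."""
    lines = [line for line in text.splitlines() if line.strip()]
    areas = []
    for line in lines[1:]:  # the first line is the column header
        filename, kind, size, used, priority = line.split()
        areas.append({
            "filename": filename,
            "type": kind,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(priority),
        })
    return areas


def parse_io_status(text):
    """Parse 'key=value key=value ...' text into a dict of integers."""
    return {key: int(value)
            for key, value in (field.split("=", 1) for field in text.split())}
```

With the numbers posted here, `used_kb` is 0 and `iotokens` equals `token_max`, so neither swap use nor (if I read the fields correctly) the IO limiter looks like the culprit at the moment the files were read.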
One thing is that I haven't updated my fundamental packages (like glibc and gcc) since this summer, so maybe it is a bug in glibc or another library that mysql depends on that has since been fixed. How old is your glibc, fifo? Mine is at version 2.3.5-r1, and gcc is 3.3.5.20050130-r1. I'm about to upgrade to the latest versions and see if that improves things …
What makes me think it may be a glibc problem is that I have another weird problem with a completely different tool, ImageMagick. Images converted and resized with it sometimes get garbage in them, although the same version of ImageMagick on my local computer doesn't introduce this garbage. What's really weird is that the garbage appears at new random places each time, but only on certain photos and always on those same photos, and the original photos on the server are perfectly intact. It's as if a certain sequence of bytes in a file triggers some kind of stack overflow or something that later makes the program behave unpredictably.
I also tried NetPBM instead of ImageMagick, and I get garbage with it as well, but on DIFFERENT photos.
I don't think these are bugs in the software itself, because these are quite mature packages; besides, I've never had any problems with them on my own local computer.
All this leads me to think that something very fundamental is unreliable. I hope it's just a bad glibc version.
As for swap thrashing, it's very unlikely because traffic on my site is very low most of the time. I almost never use swap (I'm on a Linode 160).
UPDATE: I set the variable thread_cache_size to 80 (not 40 as I said above) in my.cnf and updated glibc to 2.3.5-r2, and since then the problem hasn't shown up (in about 24 hours). To set the variable in /etc/mysql/my.cnf, add this line to the [mysqld] section:
set-variable = thread_cache_size=80
However, I still have problems with ImageMagick and NetPBM. I'll try recompiling them (since I updated gcc too).
But since early this morning (midnight in France) it has started misbehaving again, exactly like two weeks ago before I changed the thread_cache_size variable.
Now I have a script that automatically restarts mysqld when it happens, but it's still very annoying, especially because the databases need to be repaired each time mysqld has crashed.
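For reference, a watchdog of that kind could look roughly like the Python sketch below. It is not the script mentioned above, just an illustration: the socket path comes from the error message earlier in this thread, while the restart command (`/etc/init.d/mysql restart`) and both function names are assumptions. It only tests whether the socket accepts a connection, so a mysqld that is hung but still listening would not be caught.

```python
import socket
import subprocess

# Socket path taken from the "Can't connect" error quoted in this thread.
SOCKET_PATH = "/var/run/mysqld/mysqld.sock"
# Assumption: the init script location; adjust for your distribution.
RESTART_CMD = ["/etc/init.d/mysql", "restart"]


def mysqld_is_up(sock_path=SOCKET_PATH, timeout=5.0):
    """Return True if something accepts connections on the Unix socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sock_path)
        return True
    except OSError:  # missing socket file, refused connection, or timeout
        return False
    finally:
        sock.close()


def check_and_restart(sock_path=SOCKET_PATH, restart_cmd=RESTART_CMD,
                      run=subprocess.run):
    """Restart mysqld if the socket is dead; return True if a restart ran."""
    if mysqld_is_up(sock_path):
        return False
    run(restart_cmd, check=False)
    return True
```

Calling `check_and_restart()` from cron every minute or so would cover the cases where mysqld does not come back by itself, though it does nothing about the table repairs needed afterwards.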