DRBD kills network connectivity

We are trying to set up a highly available file and database server (NFS, MySQL) following the instructions in the guide at

http://library.linode.com/linux-ha/highly-available-file-database-server-ubuntu-10.04

We get to the point in the guide where we need to start the DRBD service using the command

service drbd start

Right after we issue the command, we get the following errors on the console

[327432.458166] block drbd0: [drbd0_worker/773] sock_sendmsg time expired, ko = 4294967295
[327432.958202] block drbd0: PingAck did not arrive in time. 
[327432.958256] block drbd0: short read expecting header on sock: r=-512 
[327436.930217] block drbd1: PingAck did not arrive in time. 
[327436.930272] block drbd1: short read expecting header on sock: r=-512 
[327438.458203] block drbd0: drbd_send_block() failed

All network connectivity is then lost, but ONLY on the first node: it cannot be pinged or reached on either its public or private IP. Running /etc/init.d/networking restart does not restore connectivity, so we are forced to reboot the Linode.

Has anyone experienced an issue like this? We find it very strange that only one of the nodes is affected.

Our setup:

Linux 2.6.32-45-generic-pae #102-Ubuntu SMP Wed Jan 2 22:10:16 UTC 2013 i686 GNU/Linux

/etc/hosts (identical in both servers):

127.0.0.1       localhost.localdomain   localhost
66.228.39.73    db1.comcastlmd.com      db1
66.228.38.167   db2.comcastlmd.com      db2

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

/etc/drbd.d/global_common.conf (identical in both servers):

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb;
        }

        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }

        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }

        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}

r0.res (on the first node):

resource r0 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "St9PHefu";
    }
    on db1 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.87:7789;
        meta-disk internal;
    }
    on db2 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.92:7789;
        meta-disk internal;
    }
}

r1.res (on the first node):

resource r1 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "St9PHefu";
    }
    on db1 {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 192.168.152.87:7790;
        meta-disk internal;
    }
    on db2 {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 192.168.152.92:7790;
        meta-disk internal;
    }
}

r0.res (on the second node):

resource r0 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "St9PHefu";
    }
    on db1 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.87:7789;
        meta-disk internal;
    }
    on db2 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.92:7789;
        meta-disk internal;
    }
}

r1.res (on the second node):

resource r1 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "St9PHefu";
    }
    on db1 {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 192.168.152.87:7790;
        meta-disk internal;
    }
    on db2 {
        device /dev/drbd1;
        disk /dev/xvdd;
        address 192.168.152.92:7790;
        meta-disk internal;
    }
}
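
To rule out a typo in the addresses or ports, the `on` sections of each resource file can be extracted and compared between the two nodes with a short script. This is only a minimal sketch and not part of DRBD's tooling: the function name `drbd_endpoints` and the `SAMPLE_R0` string are our own, and the regular expressions assume the simple layout shown above (one IPv4 `address` line per `on` section, no nested braces).

```python
import re

# Matches one "on <host> { ... }" section. Assumes no nested braces inside it,
# which holds for the r0.res / r1.res layout shown in this post.
ON_SECTION = re.compile(r"\bon\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
# Matches an IPv4 "address <ip>:<port>;" line inside an "on" section.
ADDRESS = re.compile(r"address\s+(\d+\.\d+\.\d+\.\d+):(\d+)\s*;")


def drbd_endpoints(resource_text):
    """Return {hostname: (ip, port)} for each 'on' section in the text."""
    endpoints = {}
    for host, body in ON_SECTION.findall(resource_text):
        match = ADDRESS.search(body)
        if match:
            endpoints[host] = (match.group(1), int(match.group(2)))
    return endpoints


# Trimmed copy of r0.res from this post, used only as sample input.
SAMPLE_R0 = """
resource r0 {
    protocol C;
    on db1 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.87:7789;
        meta-disk internal;
    }
    on db2 {
        device /dev/drbd0;
        disk /dev/xvdc;
        address 192.168.152.92:7789;
        meta-disk internal;
    }
}
"""

if __name__ == "__main__":
    for host, (ip, port) in sorted(drbd_endpoints(SAMPLE_R0).items()):
        print(host, ip, port)
```

Running the function on the text of the same resource file from each node and comparing the two dictionaries should show at a glance whether the endpoints really match.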

Let me know if you need any additional information. I will be happy to provide you access to the nodes if you would like to connect and help.

Thank you!
