Unexpected downtime 1st of August 2014



Message boards : Server backend and mirrors : Unexpected downtime 1st of August 2014

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 13119 - Posted: 1 Aug 2014, 14:33:05 UTC
Last modified: 1 Aug 2014, 14:38:01 UTC

Yesterday we saw increased latency on one of our network cards (up to 450 ms instead of the usual <1 ms). Together with the ISP that provides the fibre-optic channel used for the bulk of the BURP data downloads, we started diagnostics on the link.

Today the network card and two hard drives (yes, two more...gah) just flat out died. The service is back up after around 1.5 hours of downtime, but work will continue on fixing the remaining hardware issues.

The downtime should have lasted no more than a few minutes, but it was extended because the replacement card was not compatible with the kernel configuration that the server was running. It took a little extra time to migrate the settings to a new kernel, compile it and install it.

Tomorrow we will be installing replacement drives, and the service will be slower than usual while the RAID arrays sync up.

To avoid similar issues in the future, there is a plan to migrate to a smarter network setup where the server's 3 network cards act as failovers for each other. The other network hardware at the site supports such a setup, so it is just a matter of actually getting it done.
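
As a rough illustration of the failover idea, here is a minimal Python sketch of how an active card could be chosen from several cards in priority order. It is not the planned setup, which would be handled by the operating system and the site's network hardware rather than by a script. The interface names, the `Nic` type, the `pick_active` function and the 50 ms latency threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Nic:
    name: str          # hypothetical interface name, e.g. "eth0"
    link_up: bool      # whether the link is reported as up
    latency_ms: float  # most recent measured round-trip latency


def pick_active(nics: Sequence[Nic], max_latency_ms: float = 50.0) -> Optional[Nic]:
    """Return the first NIC, in priority order, that is up and responsive."""
    for nic in nics:
        if nic.link_up and nic.latency_ms <= max_latency_ms:
            return nic
    # No fully healthy card: prefer a slow card with link over no card at all.
    for nic in nics:
        if nic.link_up:
            return nic
    return None


if __name__ == "__main__":
    cards = [
        Nic("eth0", link_up=True, latency_ms=450.0),  # degraded, like the card in this incident
        Nic("eth1", link_up=True, latency_ms=0.8),    # healthy
        Nic("eth2", link_up=False, latency_ms=0.0),   # dead link
    ]
    active = pick_active(cards)
    print(active.name if active else "no usable card")
```

With the sample values above, the degraded first card is skipped and the second card is selected. A card with high latency is still used as a last resort if nothing better has link, since a slow service is usually preferable to none.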

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 13138 - Posted: 2 Aug 2014, 14:36:58 UTC
Last modified: 2 Aug 2014, 14:37:44 UTC

Replacement drives have arrived (huh what?). They have started self-tests and will go into the RAID a few hours from now. Expect some slowdowns due to re-syncing.

funkydude
Joined: 23 Dec 13
Posts: 275
Credit: 2,478,281
RAC: 0
Message 13140 - Posted: 2 Aug 2014, 15:57:09 UTC

You certainly don't ride the luck train here.

