Unexpected downtime 1st of August 2014
Author | Message |
---|---|
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
Yesterday we saw increased latency on one of our network cards (up to 450 ms instead of the usual <1 ms). We started diagnostics on the link together with the ISP of the fibre optic channel that carries the bulk of the BURP data downloads. Today the network card and two hard drives (yes, two more... gah) just flat out died. The service is back up after around 1.5 hours of downtime, but work will continue on fixing the remaining hardware issues. The downtime should have been no more than a few minutes, but it was extended because the replacement card was not compatible with the kernel configuration the server was running. It took a little extra time to migrate the settings to a new kernel, compile it and install it. Tomorrow we will be installing replacement drives, and the service will be slower than usual while the RAID arrays sync up. To avoid similar issues in the future, there is a plan to migrate to a smarter network setup where the server's 3 network cards are used as failovers for each other. The other network hardware at the site supports such a setup, so it is just a matter of actually getting it done. |
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
Replacement drives have arrived (huh what?). They have started self-tests and will go into the RAID a few hours from now. Expect some slowdowns due to re-syncing. |
funkydude Joined: 23 Dec 13 Posts: 275 Credit: 2,478,281 RAC: 0 |
You certainly don't ride the luck train here. |