Server disk issues Oct 15-21

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 11062 - Posted: 15 Oct 2011, 17:23:55 UTC
Last modified: 15 Oct 2011, 19:29:13 UTC

One of the storage disks spontaneously started failing read requests today. Normally this would simply mean that the drive's 2TB of data is marked as "failed", a spare is activated, and the RAID array continues as if nothing had happened. Unfortunately, this particular drive also carries some of the boot data the storage server needs to start up, plus its swap space and system logs. Those portions sit outside the RAID and cannot simply be pulled without human intervention.
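The normal failure path described above, and why this drive is different, can be illustrated with a toy model. This is a hypothetical sketch only: the `Disk` and `RaidArray` classes, the disk names and the partition labels are assumptions for illustration, not a model of the actual storage server or its RAID controller. It shows that the RAID layer can mark a member as failed and swap in a hot spare on its own, but anything stored on that disk outside the array is left for a human to deal with.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    failed: bool = False
    # Hypothetical: partitions on this disk that live OUTSIDE the RAID array
    # (e.g. boot data, swap, logs). The RAID layer does not manage these.
    non_raid_partitions: list[str] = field(default_factory=list)


@dataclass
class RaidArray:
    members: list[Disk]
    spares: list[Disk]

    def handle_read_failure(self, disk: Disk) -> list[str]:
        """Mark `disk` as failed, swap in a hot spare, and return the
        non-RAID partitions that still need manual intervention."""
        if disk not in self.members:
            raise ValueError(f"{disk.name} is not an array member")
        disk.failed = True
        if not self.spares:
            raise RuntimeError("no hot spare available; array stays degraded")
        spare = self.spares.pop(0)
        # Replace the failed member in place; the array keeps running.
        self.members[self.members.index(disk)] = spare
        # The RAID layer can rebuild its own data onto the spare,
        # but it knows nothing about partitions outside the array.
        return list(disk.non_raid_partitions)


# Usage: a member that also carries boot/swap/log partitions fails.
faulty = Disk("sdb", non_raid_partitions=["boot", "swap", "logs"])
array = RaidArray(members=[Disk("sda"), faulty, Disk("sdc")], spares=[Disk("sdd")])
manual = array.handle_read_failure(faulty)
print([d.name for d in array.members])  # ['sda', 'sdd', 'sdc']
print(manual)                           # ['boot', 'swap', 'logs']
```

The array itself recovers automatically; the returned list is the part that, as in this incident, needs a human.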

The faulty drive slowed down data access for the database and caused it to crash, which in turn crashed or stalled all services that rely on the DB (BOINC, BURP, the website, etc.). There are no indications of data loss so far, but everything is being verified.

There may be some downtime over the next few days as a consequence of this.

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 11063 - Posted: 15 Oct 2011, 18:53:18 UTC
Last modified: 15 Oct 2011, 23:44:48 UTC

Plan for recovery:

1) [Done: 18:00 UTC] Deactivate swap
2) [Done: 19:20 UTC] Replicate readable portions of the disk to the mirror drive
3) [Done: 21:50 UTC] Assess damage and verify the copy against backups
   Damage to the root drive is critical; the drive cannot be used. Restoring from backups.
4) [Done: 22:20 UTC] Reinstall the bootloader on the mirror drive
5) [Done: 23:43 UTC] Boot from the mirror drive
6) [Done: 23:44 UTC] Remove the faulty drive
7) [In progress] Replace it with a new drive
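Step 2, replicating only the readable portions of a failing disk, follows the same basic idea as tools such as GNU ddrescue: copy block by block, and when a block cannot be read, skip it and record its offset so it can be filled in later (here, from backups, as in step 3). The sketch below is hypothetical and only illustrates that idea. The 4096-byte block size, the `copy_readable` function and the in-memory `FlakyDisk` stand-in are assumptions, not the tools or settings actually used on the server.

```python
import io

BLOCK_SIZE = 4096  # hypothetical block size for the copy


def copy_readable(src, dst, size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Copy `size` bytes from src to dst block by block.

    Blocks that fail to read are written as zeros on dst, and their
    offsets are returned so they can be restored from backups later.
    """
    bad_offsets = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            data = None
        if data is None or len(data) != length:
            bad_offsets.append(offset)
            data = bytes(length)  # placeholder zeros for the unreadable block
        dst.seek(offset)
        dst.write(data)
    return bad_offsets


class FlakyDisk:
    """Hypothetical in-memory disk whose reads fail inside given byte ranges."""

    def __init__(self, data: bytes, bad_ranges: list[tuple[int, int]]):
        self._buf = io.BytesIO(data)
        self.bad_ranges = bad_ranges

    def seek(self, offset: int) -> int:
        return self._buf.seek(offset)

    def read(self, size: int) -> bytes:
        start = self._buf.tell()
        end = start + size
        for lo, hi in self.bad_ranges:
            if start < hi and lo < end:  # request overlaps a damaged range
                raise OSError("simulated read error")
        return self._buf.read(size)


# Usage: 16 KiB "disk" with a damaged region inside the second block.
data = bytes(range(256)) * 64
disk = FlakyDisk(data, bad_ranges=[(5000, 6000)])
mirror = io.BytesIO(bytes(len(data)))
bad = copy_readable(disk, mirror, len(data))
print(bad)  # [4096]
```

Recording the offsets of unreadable blocks, rather than aborting the copy, is what makes a later "verify against backups" step possible: only the listed regions need restoring.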