Local weather and power issues

Message boards : Server backend and mirrors : Local weather and power issues

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 10761 - Posted: 15 Feb 2011, 20:00:30 UTC
Last modified: 15 Feb 2011, 20:15:17 UTC

The weather has been quite windy here recently, and today there have been several power outages and spikes. The last one took down the database server despite it being behind an uninterruptible power supply (UPS).

Downtime was around 2 minutes, and an additional 15 minutes were spent recovering the results and hosts database tables so that workunits could be sent out again. Around 300 workunits in the database were affected, and the server may no longer have any record of having sent them out for rendering.

This is the first time that the UPS has failed to kick in properly.

In related news: The motherboard of one of the machines that usually run workunit validation just died. It is under warranty and is being replaced as quickly as possible. However, this will temporarily reduce the speed at which the server can validate results and grant credit.
Janus
Message 10762 - Posted: 16 Feb 2011, 8:03:12 UTC - in response to Message 10761.  
Last modified: 16 Feb 2011, 10:33:04 UTC

It was confirmed earlier today that the incident was indeed caused by snow entering one of the local power transformation stations. The spikes and outages occurred when power was redirected through other nearby stations and then switched back again.

One funny detail: One of the machines outside of the UPS protection survived the outages without going down, while the server (which is inside UPS protection) went down. The power interruptions lasted around 0.25-0.5 seconds each.

[edit:] I just received word that the distributor has tested the motherboard and confirmed it is faulty, and a replacement will be made available asap.
Furthermore, the UPS is slightly overloaded now that the webserver is handling the entire validation load. This is why it failed to kick in properly. It will be replaced with a considerably larger model some time next week when the snow is gone.
