Things are back online after the two outages - one just before Xmas and another right after the beginning of the new year.
In the first outage, the "download" connection dropped into super-slow mode, as it has done a few times before. There is still no indication of why this happens from time to time. The primary connection (which usually only serves the webserver and other interactive traffic) took over part of the load, but to the BOINC client it would still look like a connection problem 2 out of 3 times. The situation was initially fixed remotely by switching entirely to the primary connection, and then later, on the 23rd, by remotely rebooting the download connection's modem.
On January 2nd, a power supply short-circuited. As a safety precaution against fire, the entire server cluster powered off immediately, along with everything else sharing the UPS that the PSU was connected to. The power supply was meant to power the server that post-processes the Sunflower workunits once the clients have rendered them.
It didn't take long to find the problem, replace the PSU with a spare, run the necessary file system integrity checks and start the servers back up. Unfortunately I forgot to start the feeder, and I've been too busy this week to notice until today. Luckily no data was lost and no hardware apart from the PSU was harmed - and I got to experience the terrifying howl of a UPS and power distribution system in short-circuit protection mode.
Apart from this slight stumble into 2013, things should now start moving forward again.