Short version: Stuff got too hot; things are fine now.
Long-winded and technical version:
Turns out this was caused by an I/O overload at around 18:05 UTC. The server was running a number of things at the same time (due to the maintenance, among other things):
- A session finished and went into encode mode. This triggers a very fast bulk read of a lot of data, followed by a slower trickle and heavy CPU use while the encoding runs. Sunflower sessions in particular cause extra load because they convert the raw EXR datafiles into PNGs before encoding.
- CATS was moving data to my workstation and offsite for backups. This kept a couple of the network cards pretty busy.
- The database was scanning through results to produce the hourly stats export
- A couple of Google and Baidu bots were crawling the old parts of the gallery.
- The RAID storage array was performing a consistency scan.
- Normal project operation was running on top of all this.
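To give an idea of why the EXR-to-PNG step in the first bullet is CPU-heavy: EXR files store linear high-dynamic-range floating-point pixels, while PNG stores gamma-encoded 8-bit values, so every pixel has to be clipped, gamma-encoded and quantized. This is only an illustrative sketch (the actual tone-mapping curve Sunflower uses may differ):

```python
import numpy as np

def hdr_to_png_pixels(hdr, gamma=2.2):
    """Tone-map linear HDR float pixels (as stored in EXR) to 8-bit PNG values.

    Illustrative only: a real converter may use a different curve, but the
    per-pixel clip/encode/quantize work is why encoding burns so much CPU.
    """
    clipped = np.clip(hdr, 0.0, 1.0)                 # drop out-of-range highlights
    encoded = clipped ** (1.0 / gamma)               # linear -> display gamma
    return (encoded * 255.0 + 0.5).astype(np.uint8)  # quantize to 8 bits
```
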
Apparently this was too much: the server got too hot and decided to slow down, then pause, first for about 10 seconds, then for a longer stretch of around 20 seconds. Eventually a hardware watchdog killed the system and restarted it. On reboot the filesystem dropped the last ~14 seconds of changes and reverted to the last known good state, which is why 2 rows were dropped from the result table.
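For anyone unfamiliar with hardware watchdogs: the running system has to "pet" the watchdog within a fixed timeout, and if it stalls long enough to miss a pet (as during those thermal pauses), the watchdog forces a reset. A minimal conceptual model, not the actual firmware behaviour:

```python
import time

class Watchdog:
    """Toy model of a hardware watchdog timer (illustrative only).

    The OS normally pets the watchdog on a regular schedule; if the machine
    freezes past the timeout, the real hardware pulls the reset line.
    """
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_pet = time.monotonic()

    def pet(self):
        # Called periodically by a healthy system.
        self.last_pet = time.monotonic()

    def expired(self):
        # True once the system has stalled past the timeout.
        return time.monotonic() - self.last_pet > self.timeout
```

A ~20 second stall against a shorter watchdog timeout is exactly the scenario that triggers the forced restart described above.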
At this point the server was simply sitting in failsafe mode, waiting for an admin to acknowledge the report about DB inconsistencies. Once I noticed the warning it took a few hours to post about it on the forums, run the database repairs and fire BOINC+BURP back up.
The ventilation around the data server has been improved a bit, and the ambient temperature will also be declining over the coming weeks and months, so hopefully this will not repeat.
Anyways, there you have it. Long-winded indeed.