Unexpected downtime September 2019
Author | Message |
---|---|
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
Storage issues

The main storage server is experiencing issues with the RAID cards and their connections to the disks. Disks are dropping out at random and it is not yet clear why. A replacement RAID card was ordered, delivered and installed, but it failed completely after the first few tests; it looks like it was simply dead on arrival. A second replacement is now being ordered.

We are now running on a temporary solution and have had to rebuild the site from weekly backups plus daily deltas. Fortunately no data was lost because the switch-over was instantaneous. In a few hours the disk resyncs should be complete and a number of general maintenance tasks will begin, followed by the creation of another full restore point.

More storage issues

A BIOS issue with the SATA controllers has also been causing a separate set of disk-related problems, and the BIOS has now been updated. This seems to have improved the situation. Unfortunately, updating the BIOS like this requires a full shutdown of the backend, so hopefully it will not be needed too often.

Storage space

We're trying to use less storage overall, and the first pass of the year-long compression project recently completed. Most of our old sessions have had their frames compressed with lossless compression, and the disk space occupied by the original files is now ready to be released.

Future work

Other issues currently on the list:
- We're running out of memory very often because the Java backend does not release it (and we currently have a few very memory-hungry sessions in the mix). An update to Java should hopefully fix this.
- An issue is causing the SQL database to be very slow at starting up.
- The RAID autodetect feature on the server is broken and needs to be fixed and tested. For now the RAID is initialized manually.
- A new version of Blender is available and the farm needs to be updated.

While this is going on I'm in the process of selling the apartment that I live in. Things are very much in a state of flux. |
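For readers unfamiliar with the "weekly backups + daily deltas" scheme mentioned in the post, here is a minimal sketch of how a restore chain is typically chosen: take the newest full backup on or before the target date, then replay every delta after it in order. The function name, the dates, and the idea that backups are identified by date are all assumptions made for illustration; nothing here describes the project's actual backup tooling.

```python
from datetime import date, timedelta


def restore_chain(full_backups, deltas, target):
    """Pick the full backup and ordered deltas needed to reach `target`.

    Hypothetical helper: backups are identified only by their date.
    """
    candidates = [d for d in full_backups if d <= target]
    if not candidates:
        raise ValueError("no full backup on or before the target date")
    base = max(candidates)  # newest full backup not after the target
    replay = sorted(d for d in deltas if base < d <= target)
    return base, replay


# Illustrative dates only (not taken from the post).
weekly = [date(2019, 9, 1), date(2019, 9, 8), date(2019, 9, 15)]
daily = [date(2019, 9, 1) + timedelta(days=n) for n in range(1, 21)]

base, replay = restore_chain(weekly, daily, date(2019, 9, 18))
print("restore full backup:", base)
print("then apply deltas:", [d.isoformat() for d in replay])
```

The fewer deltas there are between the last full backup and the failure, the shorter the replay, which is one reason to create a fresh full restore point once maintenance is done.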
Dark Angel Joined: 13 May 18 Posts: 10 Credit: 238,462 RAC: 3,398 |
Have been getting the same recurring error ever since this was done.

Tue 01 Oct 2019 13:18:22 AEST | BURP | update requested by user
Tue 01 Oct 2019 13:18:27 AEST | BURP | Sending scheduler request: Requested by user.
Tue 01 Oct 2019 13:18:27 AEST | BURP | Requesting new tasks for NVIDIA GPU
Tue 01 Oct 2019 13:18:30 AEST | BURP | Scheduler request completed: got 0 new tasks
Tue 01 Oct 2019 13:18:30 AEST | BURP | Project is temporarily shut down for maintenance |
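The log lines above follow a `timestamp | project | message` layout, so anyone who wants to check a saved client log for this condition could use something like the sketch below. The function names are hypothetical and the field layout is inferred only from the lines quoted in this post; it is not an official BOINC log parser.

```python
MAINTENANCE_TEXT = "temporarily shut down for maintenance"


def parse_log_line(line):
    """Split 'timestamp | project | message' into three stripped fields."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        raise ValueError(f"unexpected log line: {line!r}")
    return tuple(parts)


def project_in_maintenance(lines, project="BURP"):
    """Return True if any line for `project` reports a maintenance shutdown."""
    for line in lines:
        try:
            _, name, message = parse_log_line(line)
        except ValueError:
            continue  # skip lines that do not match the assumed layout
        if name == project and MAINTENANCE_TEXT in message:
            return True
    return False


sample = [
    "Tue 01 Oct 2019 13:18:27 AEST | BURP | Requesting new tasks for NVIDIA GPU",
    "Tue 01 Oct 2019 13:18:30 AEST | BURP | Scheduler request completed: got 0 new tasks",
    "Tue 01 Oct 2019 13:18:30 AEST | BURP | Project is temporarily shut down for maintenance",
]
print(project_in_maintenance(sample))
```

The timestamp is kept as a plain string here because timezone abbreviations such as AEST are not parsed reliably by `datetime.strptime`.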
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
Quick update:
- The RAID is back in shape, and the RAID autodetect issue was resolved and tested.
- Java was updated but is still using quite a lot of memory after EXR image operations. Previews of big EXR sessions may have to be disabled for a while until a better solution can be found.
- The database startup issue was resolved and it now starts really quickly. Reboots are no longer multi-hour endeavours.

Also, the SSL certificate for the site was updated today.

Ongoing work:
- Work has begun on support for Blender 2.80, with the interesting addition of a new rendering engine: Eevee.
- Clear broken sessions from the render queue so it can be restarted. |
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
A quick update on the current situation:
- Blender 2.80 is ready and has been deployed to the farm.
- The coronavirus took 2020 off the calendar.
- BURP is essentially in maintenance mode until the end of the year. |