Suspending and resuming the client
Author | Message |
---|---|
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
Here is a bit of technical information for those of you who are interested.

The nature of 3D rendering makes it very difficult to suspend a render completely (for instance by saving the current progress to disk), because a large portion of what remains to be done depends on what has already been done and on what is in memory. In other words, saving the current state to disk for checkpointing would be too large an undertaking, because it would involve saving A LOT of data (half a gigabyte or more in some cases).

Instead, the BURP client will tell the BOINC client that it is special, since it cannot do checkpointing. BURP will therefore always be suspended to memory instead of to disk, regardless of the configuration in your account preferences. Your operating system will take care of swapping out BURP if more memory is needed for other applications (or other BOINC projects). So one important note is that BURP will only support suspending to memory.

Now, how does it work then?

1) At some point the BOINC core client decides that BURP should suspend (either because the user hit suspend or because it is time for another project to start doing work).
2) The core client will then send a message to the BURP controller that tells it to go to sleep.
3) Before going to sleep, the controller tells the current renderer (Blender, for instance) to go to sleep too.

BOINC core client --> BURP controller --> Renderer (Blender)

Even though the BURP controller is sleeping, it will keep monitoring the messages it gets from the BOINC core client. If, at some point, it is told to wake up again, it will do so and also tell the renderer to wake up and continue its work.

Even more technical: the connection between the core client and the BURP controller is made with a shared memory segment communication channel. This channel is set up and maintained primarily by BOINC. The BURP controller signals the renderer using OS-specific calls (SIGSTOP for stopping and SIGCONT for resuming on Linux, OS X and other POSIX systems; thread/process API control calls on Windows).

The implementation of all this is expected to take a while, so don't assume that there will be any test workunits sent out right away. Now you know what is going on. You can follow the progress of this particular task on the front page progress meters. |
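For readers who want to see the POSIX half of that chain in action, here is a minimal sketch. It is not the BURP controller's code: the `Controller` class, its `handle` method and the "suspend"/"resume" message strings are made up for illustration, the shared-memory channel to the core client is left out, and it assumes the stop signal meant in the post is the standard `SIGSTOP`. A sleeping Python process plays the role of the renderer.

```python
import os
import signal
import subprocess
import sys


class Controller:
    """Hypothetical stand-in for the BURP controller (POSIX only)."""

    def __init__(self, renderer_cmd):
        # Start the renderer as a child process, as a controller would.
        self.renderer = subprocess.Popen(renderer_cmd)
        self.suspended = False

    def handle(self, message):
        """React to a 'suspend' or 'resume' message from the core client."""
        if message == "suspend" and not self.suspended:
            # The renderer stops running but stays in memory.
            os.kill(self.renderer.pid, signal.SIGSTOP)
            self.suspended = True
        elif message == "resume" and self.suspended:
            # The renderer continues exactly where it was stopped.
            os.kill(self.renderer.pid, signal.SIGCONT)
            self.suspended = False

    def shutdown(self):
        """Kill the renderer and reap it so no stray process stays behind."""
        self.renderer.kill()
        self.renderer.wait()


if __name__ == "__main__":
    # A sleeping Python process stands in for Blender here.
    ctl = Controller([sys.executable, "-c", "import time; time.sleep(60)"])
    ctl.handle("suspend")
    ctl.handle("resume")
    ctl.shutdown()
```

Because a stopped process keeps all of its memory, nothing has to be written to disk, which is the whole point of suspending to memory. The Windows side would need the process/thread API instead and is not shown.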
Brian Stansbury Project donor Joined: 4 Mar 05 Posts: 33 Credit: 2,219 RAC: 0 |
<blockquote> So one important note is that BURP will only support suspending to memory. </blockquote> One of my clients had already suspended the WU after 90 minutes of processing time before I noticed. I suspended all other projects, and this WU started again from the beginning and finished with no problems. This client does not have enough memory to keep the WU in memory while suspended (only 256 MB so far). So the suspend/restart function does work. Here is the WU, <a href="http://burp.boinc.dk/result.php?resultid=22058">22058</a>. Using 4.43 |
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
This is the current state of progress in the implementation of the suspend/resume features:

Problem: Currently suspending and resuming does not work if "suspend to memory" has not been selected in the preferences. In this case the WU will restart from the beginning the next time it is allowed to run. I guess this is not really a bug, since the BURP client does what it is told: to get out of memory right away.
Solution (not implemented yet): Perhaps at a later stage the BOINC core client scheduler will take non-preemptable workunits (like the BURP ones) into account when changing projects. Then it could let the BURP unit run until it completes and _then_ do the changeover.
Temporary work-around: Use "suspend to memory" or set "switch project every X min" higher than the length of a BURP unit.

Problem: The BURP client continues to measure CPU time used even when it has been suspended. In other words, the CPU time will keep increasing, as clearly seen in this example:

...
CCom: 0.349876 - 3505.230000
CCom: 0.352424 - 3557.625000
CCom: 0.354972 - 3609.891000
Suspending child...
Resuming child...
CCom: 0.357520 - 12137.342000
CCom: 0.360068 - 12191.020000
CCom: 0.362616 - 12246.308000
...

This causes the BURP unit to reach the maximum CPU time safety limit in some cases, and it also causes the client to request a wrong amount of credit for the workunit.
Solution (not implemented yet): Stop measuring time spent in the suspended state. Check whether the measured time is indeed CPU time and not wall-clock time, which is what the log seems to point towards.
Temporary work-around: Don't suspend BURP units for a long time.

I have no idea when or if the first problem will be addressed; however, I hope to be able to fix the second problem in the next client release. |
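To put a number on the log excerpt: the second column jumps from 3609.891 to 12137.342 across the suspend, a step of about 8527 seconds, whereas neighbouring lines differ by only 52 to 55 seconds. The proposed fix (don't count time spent suspended) can be sketched as below. This is only an illustration under stated assumptions, not the BURP client's timing code: `RunClock` and its methods are hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class RunClock:
    """Accumulates elapsed time only while the task is running."""

    def __init__(self, now=time.monotonic):
        self._now = now          # injectable clock, handy for testing
        self._total = 0.0        # time accumulated over finished run intervals
        self._started = now()    # start of the current run interval, None if suspended

    def suspend(self):
        """Close the current run interval; time stops accumulating."""
        if self._started is not None:
            self._total += self._now() - self._started
            self._started = None

    def resume(self):
        """Open a new run interval; time accumulates again from here."""
        if self._started is None:
            self._started = self._now()

    def elapsed(self):
        """Total running time, excluding all suspended periods."""
        running = 0.0 if self._started is None else self._now() - self._started
        return self._total + running
```

On the CPU-time versus wall-clock question: in Python, `time.monotonic()` behaves like a wall clock and keeps advancing while a process is stopped, while `time.process_time()` only counts CPU time used by the calling process. A counter that keeps growing during a suspend is therefore more likely to be reading the first kind of clock.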
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
The newly released client version 4.19 has a fix for the suspend/resume timing problem. Please check whether it does indeed work the way it is supposed to. |
Project donor Joined: 2 Apr 05 Posts: 19 Credit: 3,577 RAC: 0 |
Suspend/Resume doesn't seem to work here. On an application switch (with BURP left in memory), the CPU (99%) is split 50% with the other running application. Also, CPU time, Progress, and Time to complete do not advance. It shows that blender_4.19 was installed, but the blender app (using 311,728 K of memory) isn't allowing it to run. Suggestions? |
Project donor Joined: 14 Mar 05 Posts: 73 Credit: 25,881 RAC: 1 |
Just wait! CPU time will advance. There seems to be some initialisation work at the beginning of a WU. During this time the CPU time does not advance, and it seems that you cannot suspend either. It took over 8 minutes on my notebook (Celeron 2600) for the first 0.03%; then it caught up. It still says 22 hours left after 22 minutes of CPU time.
when life gives you lemons, make lemonade! |
Project donor Joined: 2 Apr 05 Posts: 19 Credit: 3,577 RAC: 0 |
I've had all other apps suspended for over 45 minutes, and it finally caught up. But the blender exe does stay in memory even after the BOINC client has been exited. Rebooted, opened the manager, and blender runs at 99%. It is set up to switch between apps every 15 minutes. My deadlines are far enough out that it can run until it's finished. I'm patient :-) I just don't know if anybody else has seen the same thing yet. CC4.45/AMD2000/M$ Windoze/XPHome/1GBRAM/40GBHDD :-) |
Volunteer moderator Project administrator Joined: 16 Jun 04 Posts: 4574 Credit: 2,100,463 RAC: 8 |
The BURP client can only do things (suspend, update stats, exit, etc.) when a full line of pixels has been rendered. In the case of a very complex scene this can take a while, so when suspending you should expect a delay before the app actually suspends. Likewise, when exiting there may be a certain delay as well, although I haven't tested this yet. When the subframe rendering feature is completed, this delay will be cut down considerably, since your client will only be working on a small portion of the actual image. The currently running session would be a good candidate for subframe rendering. |
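The delay described above follows from where the client looks for messages. A rough sketch, with invented names (`render_frame`, `inbox`, `render_line` are illustrative and not taken from BURP or Blender): control messages are only read between scanlines, so anything that arrives while a line is being rendered waits until that line is done.

```python
from collections import deque


def render_frame(height, inbox, render_line):
    """Render `height` scanlines, handling control messages between lines.

    Returns a log of (line_number, message) pairs showing after which
    line each message was actually acted upon.
    """
    handled = []
    for y in range(height):
        render_line(y)                  # cannot be interrupted part-way
        while inbox:                    # messages are only looked at here
            handled.append((y, inbox.popleft()))
    return handled


if __name__ == "__main__":
    inbox = deque()

    def slow_line(y):
        # Pretend a "suspend" request arrives while line 2 is still rendering.
        if y == 2:
            inbox.append("suspend")

    print(render_frame(4, inbox, slow_line))
```

In this toy run the request is only handled once line 2 has finished. The longer a single line takes to render, the longer that wait, which is why working on a smaller piece of the image (subframe rendering) would shorten it.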