Suspending and resuming the client

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 988 - Posted: 16 May 2005, 9:46:54 UTC
Last modified: 16 May 2005, 9:49:40 UTC

Here is a bit of technical information for those of you who are interested.

The nature of 3D rendering makes it very difficult to suspend a render completely (for instance by saving the current progress to disk), because a large portion of what remains to be done depends on what has already been done and on what is in memory. In other words, checkpointing by saving the current state to disk would be too large an undertaking, because it would involve saving A LOT of data (half a gigabyte or more in some cases).
Instead, the BURP client will tell the BOINC client that it is a special case because it cannot checkpoint. BURP will therefore always be suspended to memory instead of to disk, regardless of the configuration in your account preferences. Your operating system will take care of swapping BURP out if more memory is needed for other applications (or other BOINC projects).

So one important note is that BURP will only support suspending to memory.

Now, how does it work then?
1) At some point the BOINC core client decides that BURP should suspend (either because the user hit suspend or because it is time for another project to start doing work).
2) The core client will then send a message to the BURP controller that tells it to go to sleep.
3) Before going to sleep, the controller tells the current renderer (Blender for instance) to go to sleep too.

BOINC core client --> BURP controller --> Renderer (Blender)

Even though the BURP controller is sleeping it will keep on monitoring the messages it gets from the BOINC core client. If, at some point, it is told to wake up again it will do so and also tell the renderer to wake up and continue its work.
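
As a rough illustration of the chain described above, here is a minimal Python sketch of a controller that forwards suspend and resume requests to its renderer while continuing to process messages when asleep. The message names ("suspend", "resume"), the Controller class and the FakeRenderer stand-in are all hypothetical; this is not BOINC or BURP code.

```python
class FakeRenderer:
    """Stand-in for a renderer such as Blender; it only records its state."""

    def __init__(self):
        self.running = True

    def sleep(self):
        self.running = False

    def wake(self):
        self.running = True


class Controller:
    """Hypothetical controller that relays suspend/resume to the renderer."""

    def __init__(self, renderer):
        self.renderer = renderer
        self.suspended = False

    def handle(self, message):
        # Messages are handled even while suspended; otherwise a later
        # "resume" from the core client would never be noticed.
        if message == "suspend" and not self.suspended:
            self.renderer.sleep()   # tell the renderer to sleep first
            self.suspended = True
        elif message == "resume" and self.suspended:
            self.renderer.wake()    # wake the renderer and carry on
            self.suspended = False


def run(messages):
    """Feed a sequence of core-client messages through the controller."""
    renderer = FakeRenderer()
    controller = Controller(renderer)
    history = []
    for message in messages:
        controller.handle(message)
        history.append((message, controller.suspended, renderer.running))
    return history
```

Calling run(["suspend", "resume"]) returns one tuple per message, showing the controller's suspended flag and whether the renderer is running after that message.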

Even more technical:
The connection between the core client and the BURP controller is a shared-memory communication channel. This channel is set up and maintained primarily by BOINC.
The BURP controller signals the renderer using OS-specific calls (SIGSTOP for stopping and SIGCONT for resuming on Linux, OS X and other POSIX systems; thread/process control API calls on Windows).
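
For the POSIX side, a minimal sketch of stopping and continuing a child process with the standard SIGSTOP and SIGCONT signals could look like this. It assumes a POSIX system (it will not run on Windows), uses a sleeping Python child as a stand-in for the renderer, and the function names are made up for illustration; it is not the BURP controller's code.

```python
import os
import signal
import subprocess
import sys


def suspend(proc):
    """Send SIGSTOP and wait until the OS reports the child as stopped."""
    os.kill(proc.pid, signal.SIGSTOP)
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume(proc):
    """Send SIGCONT and wait until the OS reports the child as continued."""
    os.kill(proc.pid, signal.SIGCONT)
    _, status = os.waitpid(proc.pid, os.WCONTINUED)
    return os.WIFCONTINUED(status)


def demo():
    # A sleeping child process stands in for the renderer.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        stopped = suspend(child)
        continued = resume(child)
    finally:
        child.kill()
        child.wait()
    return stopped, continued
```

SIGSTOP cannot be caught or ignored by the child, which is why the renderer itself needs no special support for this approach.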

The implementation of all this is expected to take a while, so don't assume that there will be any test-workunits sent out right away.

Now you know what is going on. You can follow the progress of this particular task on the front page progress-meters.
Brian Stansbury
Project donor

Joined: 4 Mar 05
Posts: 33
Credit: 2,219
RAC: 0
Message 1024 - Posted: 21 May 2005, 1:37:54 UTC - in response to Message 988.  
Last modified: 21 May 2005, 1:40:20 UTC

<blockquote>
So one important note is that BURP will only support suspending to memory.
</blockquote>
One of my clients had already suspended the WU after 90 minutes of processing time before I noticed. I suspended all other projects, and this WU started again from the beginning and finished with no problems. This client does not have enough memory to leave the application in memory while suspended (yet; it has only 256 MB). So the suspend/restart function does work. Here is the WU: <a href="http://burp.boinc.dk/result.php?resultid=22058">22058</a>. Using 4.43.
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 1073 - Posted: 22 May 2005, 11:04:48 UTC
Last modified: 22 May 2005, 11:06:33 UTC

This is the current state of progress in the implementation of the suspend/resume features:

Problem:
Currently suspending and resuming does not work if "suspend to memory" has not been selected in the preferences. In this case the WU will restart from the beginning next time it is allowed to run. I guess this is not really a bug - since the BURP client does what it is told: to get out of memory right away.
Solution (not implemented yet):
Perhaps at a later stage the BOINC core client scheduler will take non-preemptable workunits (like the BURP ones) into account when changing projects. Then it could let the BURP unit run until it completes and _then_ do the changeover.
Temporary work-around:
Use "suspend to memory" or set "switch project every X min" higher than the length of a BURP unit.


Problem:
The BURP client continues to measure CPU time used even when it has been suspended - in other words the CPU time will keep increasing as clearly seen in this example:
...
CCom: 0.349876 - 3505.230000
CCom: 0.352424 - 3557.625000
CCom: 0.354972 - 3609.891000
Suspending child...
Resuming child...
CCom: 0.357520 - 12137.342000
CCom: 0.360068 - 12191.020000
CCom: 0.362616 - 12246.308000
...
This causes the BURP unit to reach the maximum CPU time safety limit in some cases and also causes the client to request a wrong amount of credit for the workunit.
Solution (not implemented yet):
Stop counting time spent while suspended. Also check whether the measured time really is CPU time and not real (wall-clock) time, which is what the evidence seems to point towards.
Temporary workaround:
Don't suspend BURP units for a large amount of time.
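
In the log above the second column jumps by about 8,527 seconds across the suspend (12137.342 - 3609.891), against roughly 52 to 55 seconds between neighbouring lines. One way to keep suspended time out of the total is sketched below in Python. The ActiveTimer class is a hypothetical helper, not part of the BURP client, and it tracks elapsed time while active rather than true CPU time.

```python
import time


class ActiveTimer:
    """Accumulates elapsed time only while not suspended (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._total = 0.0
        self._started = clock()   # set to None while suspended

    def suspend(self):
        # Bank the time of the interval that is ending.
        if self._started is not None:
            self._total += self._clock() - self._started
            self._started = None

    def resume(self):
        # Start a fresh interval; the suspended gap is never counted.
        if self._started is None:
            self._started = self._clock()

    def elapsed(self):
        if self._started is None:
            return self._total
        return self._total + (self._clock() - self._started)
```

The clock is passed in as a parameter so the behaviour can be checked with a fake clock instead of waiting in real time.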


I have no idea when/if the first problem will be addressed, however I hope to be able to fix the second problem in the next client release.
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 1167 - Posted: 27 Jun 2005, 20:43:33 UTC - in response to Message 1073.  

The newly released client version 4.19 has a fix for the suspend/resume timing problem. Please check whether it does indeed work the way it is supposed to.
Kajunfisher
Project donor
Joined: 2 Apr 05
Posts: 19
Credit: 3,577
RAC: 0
Message 1171 - Posted: 27 Jun 2005, 22:10:06 UTC

Suspend/Resume doesn't seem to work here. On an application switch (with the app left in memory), the CPU (99%) is split (50%) with the other running application. Also, CPU time, Progress and Time to complete do not advance.

It's showing that blender_4.19 was installed, but the blender app (using 311, 728k's) isn't allowing it to run.

Suggestions?

Raimund Barbeln
Project donor
Joined: 14 Mar 05
Posts: 73
Credit: 25,881
RAC: 1
Message 1172 - Posted: 27 Jun 2005, 22:22:54 UTC
Last modified: 27 Jun 2005, 22:23:46 UTC

Just wait!

CPU time will advance.

There seems to be some initialisation stuff at the beginning of a WU.
During this time there is no advance in CPU time, and it seems that you cannot suspend during this phase.
It took over 8 minutes on my notebook (Celeron 2600) for the first 0.03%; then it caught up. It still says 22 hours left after 22 minutes of CPU time.


when life gives you lemons, make lemonade!
Kajunfisher
Project donor
Joined: 2 Apr 05
Posts: 19
Credit: 3,577
RAC: 0
Message 1175 - Posted: 27 Jun 2005, 22:37:02 UTC

I've had all other apps suspended for over 45 minutes, and it finally caught up. But the blender exe stays in memory even after the BOINC client has been exited. Rebooted, opened the manager, and blender runs at 99%....

Setup to switch between apps every 15 minutes. My deadlines are far enough out it can run til it's finished.

I'm patient :-) Just don't know if anybody else has gotten the same as me yet.




CC4.45/AMD2000/M$ Windoze/XPHome/1GBRAM/40GBHDD :-)
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 1192 - Posted: 28 Jun 2005, 7:25:16 UTC
Last modified: 28 Jun 2005, 7:27:43 UTC

The BURP client can only do things (suspend, update stats, exit etc.) when a full line of pixels has been rendered. In the case of a very complex scene this can take a while.
So when suspending you should expect a delay before the app actually suspends. Likewise, when exiting there may be a delay as well, although I haven't tested this yet.
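
The delay can be pictured with a small Python sketch of a render loop that only looks for messages between scanlines. The function name, the callbacks and the "quit" message are hypothetical and only illustrate the idea; this is not how the BURP client or Blender is actually structured.

```python
def render_frame(height, render_line, poll_message):
    """Render `height` scanlines, checking for a message after each one.

    Returns the number of lines rendered before a "quit" message arrived.
    """
    for y in range(height):
        render_line(y)              # cannot be interrupted part-way through
        message = poll_message()    # messages are only seen between lines
        if message == "quit":
            return y + 1
    return height
```

In this model the worst-case reaction time equals the time needed to render one scanline, which is why a very complex scene makes suspending or exiting feel slow.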

When the subframe rendering feature is completed this delay will be cut down considerably since your client will only be working on a small portion of the actual image. The currently running session would be a good candidate for subframe rendering.