| Author | Message |
Janus (Volunteer moderator, Project administrator) Joined: 16 Jun 04 Posts: 3310 Credit: 924,442 RAC: 1
|
|
We've switched to Alpha now, which means a few changes in how things are run. First and foremost, BURP has changed from being completely experimental to being a tool, although it remains in development.
With the current settings you can expect the following quality requirements to be met during Alpha:
- Sessions will either be queued for rendering or rejected within 48 hours with at least 90% certainty
- Downtime will be announced and scheduled (however, accidental unscheduled downtime may happen from time to time)
- The system will pick out at most one session to render every hour
Unfortunately the client-side state of affairs isn't as good as I'd hoped for at this point, but due to time constraints this is how things are. The following issues still remain to be fully corrected:
- Timing isn't entirely accurate
- Sometimes the client leaves a Blender.exe process running out of control. This may cause the client to stall and may even interfere with other projects running on the same BOINC client.
- No support for resuming from disk
So far the best temporary workaround for the second client issue is to set your "switch between projects" time very high (like 20 hours) in your general preferences, or to avoid running BURP alongside other projects.
The best workaround for issue #3 is to use "Suspend to memory".
With all this in mind, Alpha here we go!
|
|
|
|
|
|
Good luck to everybody with the Alpha. I hope, Janus, that you get the time and/or "manpower" to get smoothly to the Beta phase :-)))
Let's render some cool stuff ;-)))
____________

Professor Desty Nova
Researching Karma the Hard way |
|
|
|
|
snip...
Unfortunately the client-side state of affairs isn't as good as I'd hoped for at this point, but due to time constraints this is how things are. The following issues still remain to be fully corrected:
- Timing isn't entirely accurate
- Sometimes the client leaves a Blender.exe process running out of control. This may cause the client to stall and may even interfere with other projects running on the same BOINC client.
- No support for resuming from disk
So far the best temporary workaround for the second client issue is to set your "switch between projects" time very high (like 20 hours) in your general preferences, or to avoid running BURP alongside other projects.
The best workaround for issue #3 is to use "Suspend to memory".
With all this in mind, Alpha here we go!
Hurrah!!! Bring on the next stage!
In my opinion the best solution to both #2 and #3 is upgrading to the latest client (5.8.16+).
There has been a code check-in that should kill Blender even if it does not exit smoothly on its own. This has not made it into a client at this time but should be in 5.8.16 and later.
All of the 5.8.x clients try to switch only at checkpoints. So if an app does not checkpoint it will tend to keep running until finished.
____________
BOINC WIKI

BOINCing since 2002/12/8
|
|
|
|
|
|
So far so good; I've only had small WUs of approx. 1 minute.
Edit: The website is going quite slow though, and BOINC keeps losing its connection to the project servers. They must be slightly overloaded?
____________
Join us in Chat (see the forum) Click the Sig

Join UBT |
|
|
|
|
|
Yes... Janus must be seeing the servers smoking ;-)
Edit: something must have broken with the change. The page for session 252 shows errors: http://burp.boinc.dk/session.php?id=252
____________

Professor Desty Nova
Researching Karma the Hard way |
|
|
|
|
|
Trying to attach to the project, unable to connect?!
'The project is temporarily unavailable.'
I tried two BOINC versions, 5.8.15 and 5.8.11. Haven't been able to connect all night.
|
|
|
|
|
Yeah, same problem with attaching to BURP. BOINC server maybe overloaded?
- baracutio
____________
|
|
|
|
|
|
Yep, overloaded at the moment from the looks of it. Lots of the files appear to be very small, so this is probably increasing the server load.
PS: Does anyone know if there are any graphs to show the server load?
____________
Join us in Chat (see the forum) Click the Sig

Join UBT |
|
|
|
|
|
I didn't like session 254, and I don't imagine you (Janus) or the servers liked it much either. Parts that take only half a minute are not so good; they cause way too much comm traffic.
There also seems to be another problem, or a code change has been made, because the client won't ask for more work until it has finished the part it is working on. So it finishes a part, then restarts another project for a few seconds while getting another part to do, then starts Blender again. All that stopping and starting wastes time, I think.
Session 255 is much better, actually takes some time to do a part.
Steve
____________
Win98SE XP2500+ Boinc v5.8.8
And God said"Let there be light."But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer. |
|
|
|
|
|
I have got some bigger ones now at last; things are kind of returning to normal.
____________
Join us in Chat (see the forum) Click the Sig

Join UBT |
|
|
|
|
|
Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr output for the other hosts that did this one, you will see it is way off.
my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec, correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec, wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec, wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec, wrong
This also means that we all got less credit than we should have: we were granted 0.45 credits when it should have been about 5 credits.
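To make the mismatch concrete, here is a small sketch that converts the Blender `Time: mm:ss.ss` values from stderr into seconds and compares them with the result-page figures listed above. The function names and the list layout are made up for illustration; only the numbers come from the post.

```python
def blender_time_to_seconds(stamp: str) -> float:
    """Convert a Blender 'mm:ss.ss' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

# (host id, render time from stderr, seconds shown on the result page)
REPORTS = [
    (1870, "28:31.86", 1787.00),
    (7333, "25:13.71", 240.23),
    (20247, "53:53.81", 542.59),
    (23564, "18:01.75", 183.56),
]

def compare(reports):
    """Return (host, stderr seconds, reported seconds, reported/stderr ratio)."""
    rows = []
    for host, stamp, reported in reports:
        rendered = blender_time_to_seconds(stamp)
        rows.append((host, rendered, reported, reported / rendered))
    return rows

if __name__ == "__main__":
    for host, rendered, reported, ratio in compare(REPORTS):
        print(f"host {host}: stderr {rendered:.2f} s, "
              f"result page {reported:.2f} s, ratio {ratio:.2f}")
```

Host 1870 comes out slightly above 1.0, while the other three hosts were credited for roughly a sixth of the time Blender says it spent rendering.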
Steve
____________
Win98SE XP2500+ Boinc v5.8.8
And God said"Let there be light."But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer. |
|
|
|
|
Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr output for the other hosts that did this one, you will see it is way off.
my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec, correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec, wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec, wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec, wrong
This also means that we all got less credit than we should have: we were granted 0.45 credits.
Steve
Must be the "old" problem of "Timing isn't entirely accurate" in the current Blender application (read the first post ;-).
But on my system it usually increases the time, not the other way around.
____________

Professor Desty Nova
Researching Karma the Hard way
|
|
|
|
|
Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr output for the other hosts that did this one, you will see it is way off.
my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec, correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec, wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec, wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec, wrong
This also means that we all got less credit than we should have: we were granted 0.45 credits.
Steve
Must be the "old" problem of "Timing isn't entirely accurate" in the current Blender application (read the first post ;-).
In the last two years I have never seen it be off like this. I would not mind if it were just a small amount, but the difference is very large.
____________
Win98SE XP2500+ Boinc v5.8.8
And God said"Let there be light."But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.
|
|
|
|
|
|
Janus may pop along shortly to advise. Things seem to be going quite well though, really.
____________
Join us in Chat (see the forum) Click the Sig

Join UBT |
|
|
|
|
|
I am not having issues with d/l and u/l (not much anyway; u/ls sometimes stall for 10 sec or so). However, the website does time out for me rather frequently. I had to refresh the message page 3 times just to post something.
The WUs are only taking 8-16 sec on my Core 2 Duo 2.66@3.05 and 30-40 sec on my P4 3.2@3.52. Both systems have a gig of RAM. There also seems to be a max of 10 WUs at a time. It looks like Predictor messed up from the frequent preempting due to this project finishing its small work so quickly. My system has nearly 3.5-4GB of mem & swap used. Can you increase the max # of WUs we can have at one time?
Edit: sorry, I had written that several hours ago and was only just now able to get it to post. It would seem that there is now no work? Also, the Predictor issue seems to be Predictor itself (1.3GB virtual RAM per WU!).
____________
 |
|
|
Janus (Volunteer moderator, Project administrator) Joined: 16 Jun 04 Posts: 3310 Credit: 924,442 RAC: 1
|
|
Hey all, and sorry for not posting before now. There have been plenty of issues to look at.
First and foremost, session 252 completely threw the image stitcher (the program that assembles image parts into frames) into a spinning frenzy. With its 3000x2000 pixels and 64 parts/frame it was a bit too much for the old stitcher. With memory allocations in the range of 3GB/frame and runtimes of 30 mins due to swapping, everything slowed down for a bit. This is what caused some strange numbers on some of the session pages.
The solution was to rewrite the stitcher from scratch (again, yes) so that its memory use and running time are O(size) rather than O(size*parts). The new stitcher has been running smoothly for 15 mins now and processes even very large frames in just seconds.
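As a rough illustration of the O(size) idea, and not the actual BURP stitcher (whose code is not shown in this thread), the sketch below allocates one frame buffer and copies each part's pixels into place exactly once. The function name, the grayscale one-byte-per-pixel layout and the `(x, y, part_width, rows)` tuple are all assumptions made for the example.

```python
def stitch(width, height, parts):
    """Assemble a grayscale frame from rectangular parts.

    `parts` is an iterable of (x, y, part_width, rows), where `rows` is a
    list of bytes objects, one per scanline of the part. One buffer of
    width * height bytes is allocated and every pixel is copied once, so
    memory and time grow with the frame size, not with size * parts.
    """
    frame = bytearray(width * height)
    for x, y, part_width, rows in parts:
        for dy, row in enumerate(rows):
            if len(row) != part_width:
                raise ValueError("row length does not match the part width")
            start = (y + dy) * width + x
            frame[start:start + part_width] = row
    return bytes(frame)

if __name__ == "__main__":
    # A 4x2 frame built from two 2x2 parts placed side by side.
    left = (0, 0, 2, [b"\x01\x02", b"\x05\x06"])
    right = (2, 0, 2, [b"\x03\x04", b"\x07\x08"])
    print(list(stitch(4, 2, [left, right])))
```

An approach that instead builds a new full-size image every time a part is merged would touch the whole frame once per part, which is one plausible way to end up with the O(size*parts) behaviour described above.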
Secondly, all the happy BOINC clients that suddenly discovered that BURP was back online got the MySQL server pretty loaded (the stitcher eating up memory and CPU on the same machine wasn't making it better...). Due to the slow SQL responses, Apache (on the webserver) got too many open connections at the same time and started timing out users to avoid too much work.
For those who know what it means, the main server was peaking at a load average of 14 with a bit more than 13GB swap used (and all system memory).
All of the above meant that the mirrors had to update at very high speed whenever they were able to get through (normally sync load is spread over time to avoid congestion, but when half of the attempts fail they use whatever chance they can get). This caused the main outgoing network pipeline to get quite loaded as well. The recently installed QoS module (quality of service) worked perfectly to make sure that bandwidth was allocated properly, so that connections that DID go through would not be dropped. Great to see that piece finally working as expected.
Getting started was a bit harsh on the system, but things are clearing up now - hopefully with no further big issues.
The new image stitcher is catching up where the old one left off and is hammering out completed frames like crazy right now.
About the timing issue currently in the client: this is twofold.
1) It measures something closer to wall-clock time than to the actual CPU time burned.
2) At some point (1hr 10mins?) it will randomly start wrapping to 0
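The difference in point 1 is easy to see with a small sketch. It has nothing to do with the BURP client code; it only uses Python's standard `time.perf_counter()` (wall clock) and `time.process_time()` (CPU time of the current process), and the `measure` helper is a made-up name.

```python
import time

def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

if __name__ == "__main__":
    # Sleeping takes wall-clock time but burns almost no CPU.
    wall, cpu = measure(lambda: time.sleep(0.5))
    print(f"sleeping: wall {wall:.2f} s, cpu {cpu:.2f} s")
    # A busy loop burns CPU, so both figures are close on an idle machine.
    wall, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
    print(f"busy loop: wall {wall:.2f} s, cpu {cpu:.2f} s")
```

A timer that follows the wall clock will over-report whenever the task is waiting or has been preempted by another project, which fits the reports of time going up rather than down.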
|
|
|
|
|
Given the short WUs earlier, can we have a larger daily quota?
2/25/2007 3:17:15 PM|BURP|Sending scheduler request: Requested by user
2/25/2007 3:17:15 PM|BURP|Requesting 25610 seconds of new work
2/25/2007 3:17:20 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 3:17:20 PM|BURP|Message from server: No work sent
2/25/2007 3:17:20 PM|BURP|Message from server: (reached daily quota of 400 results)
2/25/2007 3:17:20 PM|BURP|Deferring communication for 46 min 52 sec
2/25/2007 3:17:20 PM|BURP|Reason: requested by project |
|
|
|
|
|
I managed to get one WU which errored out.
Now I'm getting:
25/02/2007 23:24:11|BURP|Message from server: Your computer has 1535.49MB of memory, and a job requires 1536.00MB
____________

www.chris-kent.co.uk aka Chief.com |
|
|
Janus (Volunteer moderator, Project administrator) Joined: 16 Jun 04 Posts: 3310 Credit: 924,442 RAC: 1
|
Given the short WUs earlier, can we have a larger daily quota?
2/25/2007 3:17:15 PM|BURP|Sending scheduler request: Requested by user
2/25/2007 3:17:15 PM|BURP|Requesting 25610 seconds of new work
2/25/2007 3:17:20 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 3:17:20 PM|BURP|Message from server: No work sent
2/25/2007 3:17:20 PM|BURP|Message from server: (reached daily quota of 400 results)
2/25/2007 3:17:20 PM|BURP|Deferring communication for 46 min 52 sec
2/25/2007 3:17:20 PM|BURP|Reason: requested by project
Daily quota upped by 50%, from 200 to 300 (per CPU). This will take effect tomorrow (I guess).
|
|
|
|
|
|
I assume all the ones that come back failed are marked so someone can check to see what went wrong?
____________
Join us in Chat (see the forum) Click the Sig

Join UBT |
|
|
|
|
2/25/2007 3:49:34 PM|BURP|Sending scheduler request: Requested by user
2/25/2007 3:49:34 PM|BURP|Requesting 25619 seconds of new work
2/25/2007 3:49:39 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 3:49:39 PM|BURP|Message from server: No work sent
2/25/2007 3:49:39 PM|BURP|Message from server: (reached daily quota of 400 results)
2/25/2007 3:49:39 PM|BURP|Deferring communication for 1 hr 3 min 1 sec
2/25/2007 3:49:39 PM|BURP|Reason: requested by project
It is still doing it; I guess I will just wait till tomorrow then. ;) Of the 3 comps I have at home, only my Core 2 Duo is having problems.
|
|
|
|
|
|
Great work Janus, it's great to see BURP churning out results, as well as such speedy fixing of problems.
A couple of things:
There are several sessions that have been rendered that only have two frames - is this correct?
What version of Blender is BURP currently using? I seem to remember it being updated - was that to 2.42?
Thanks, and keep up the good work! |
|
|
|
|
|
I'm just wondering: there are many sessions shown "In Progress" with different memory requirements. However, my client keeps getting the message "Your computer has 883.14MB of memory, and a job requires 1536.00MB." (that would match session 271 "Robot"). All other sessions (at about 2 am UTC: 255, 272, 276, 278) still show something like
BOINC 25 Feb 2007 23:44:41 UTC Done
Render ETA: Unknown
End ... Session hasn't ended yet..
Does that mean there are no WUs generated for them yet? Or does the scheduler only check the topmost WU in its cache (simply FIFO) and ignore all other sessions (although I would meet their memory requirements)?
My current client is running on Linux. I don't even know if we have an application for that platform now that we have reached alpha?
I have kept getting that message for about two hours, so if you automated the creation process there could be something wrong?
____________
|
|
|
|
|
|
well, I was getting work for some time, but just as of now...
2/25/2007 9:22:16 PM|BURP|Sending scheduler request: To fetch work
2/25/2007 9:22:16 PM|BURP|Requesting 24785 seconds of new work, and reporting 10 completed tasks
2/25/2007 9:22:21 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 9:22:21 PM|BURP|Message from server: No work sent
2/25/2007 9:22:21 PM|BURP|Message from server: (reached daily quota of 600 results)
2/25/2007 9:22:21 PM|BURP|Deferring communication for 19 hr 20 min 41 sec
2/25/2007 9:22:21 PM|BURP|Reason: requested by project
man these core2duos rock!
|
|
|
Janus (Volunteer moderator, Project administrator) Joined: 16 Jun 04 Posts: 3310 Credit: 924,442 RAC: 1
|
There are several sessions that have been rendered that only have two frames - is this correct?
Yes, although they should have had only 1, since they are still images - there's an off-by-one error to hunt down somewhere.
What version of Blender is BURP currently using? I seem to remember it being updated - was that to 2.42?
Yes, 2.42a to be exact.
[...] Does that mean there are no WUs generated for them yet? Or does the scheduler only check the topmost WU in its cache (simply FIFO) and ignore all other sessions (although I would meet their memory requirements)?
Last night we were running FIFO; right now we are trying random scheduling. Later this week we'll try random scheduling with forced continuation (i.e. a client that has a particular session must finish all work on that before going on to the next).
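To picture why a FIFO queue can starve a low-memory host, here is a toy model. It is not the BOINC scheduler: the two functions, the queue layout and the memory figures for sessions 255 and 272 are invented for illustration, and only the 1536 MB requirement and the 883.14 MB host come from the messages quoted in this thread.

```python
import random

def fifo_pick(queue, host_mem_mb):
    """Look only at the head of the queue; return it if the host can run it."""
    if queue and queue[0]["mem_mb"] <= host_mem_mb:
        return queue[0]
    return None

def random_pick(queue, host_mem_mb, rng=random):
    """Pick at random among the jobs the host has enough memory for."""
    eligible = [job for job in queue if job["mem_mb"] <= host_mem_mb]
    return rng.choice(eligible) if eligible else None

# Hypothetical queue: a large job at the head, smaller jobs behind it.
QUEUE = [
    {"session": 271, "mem_mb": 1536.00},
    {"session": 255, "mem_mb": 512.00},   # assumed figure
    {"session": 272, "mem_mb": 256.00},   # assumed figure
]

if __name__ == "__main__":
    print("FIFO:", fifo_pick(QUEUE, 883.14))
    print("random:", random_pick(QUEUE, 883.14))
```

In this model the FIFO pick returns nothing for the 883.14 MB host as long as session 271 sits at the head, while the random pick can still hand out one of the smaller jobs.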
My current client is running on Linux. I don't even know if we have an application for that platform now that we have reached alpha?
Unfortunately, as mentioned earlier, I've run a little short of time and that has had to hurt this project. I decided to cut down on the amount of time I spend on the clients, since that's where I'm least effective. That includes the Linux client.
However, there will soon be an announcement with the current client code so that you can compile and/or fix it yourself for whatever system you like.
core2duos rock
Daily quota upped by a further 33%, from 300 to 400 per CPU.
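Since the quota is set per CPU while the client log reports a per-host total, the numbers can look inconsistent at first. A quick sketch of the arithmetic, assuming a Core 2 Duo is counted as two CPUs (the helper names are made up):

```python
def daily_quota(per_cpu, cpus):
    """Total results per day a host may receive."""
    return per_cpu * cpus

def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100

if __name__ == "__main__":
    for per_cpu in (200, 300, 400):
        print(f"{per_cpu} per CPU -> {daily_quota(per_cpu, 2)} per dual-core host")
    print(f"200 -> 300: {percent_increase(200, 300):.0f}%")
    print(f"300 -> 400: {percent_increase(300, 400):.0f}%")
```

Under that assumption, the "reached daily quota of 400 results" and "600 results" messages in the logs correspond to the 200 and 300 per-CPU settings.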
|
|
|
|
|
|
Thanks for upping the WU quota again. I still can't get work, as it is still saying 300/CPU; however, last time it seemed to take a while to pick up your new settings. I will check in the morning.
|
|
|
|
|
Is it normal that I don't get sessions anymore? There are more than 19,000 parts in the waiting queue, but my BOINC will not download any sessions...
|
|
|
|
|
It's still at 300 per CPU...
2/26/2007 7:10:06 AM|BURP|Sending scheduler request: Requested by user
2/26/2007 7:10:06 AM|BURP|Requesting 24721 seconds of new work
2/26/2007 7:10:11 AM|BURP|Scheduler RPC succeeded [server version 509]
2/26/2007 7:10:11 AM|BURP|Message from server: No work sent
2/26/2007 7:10:11 AM|BURP|Message from server: (reached daily quota of 600 results)
2/26/2007 7:10:11 AM|BURP|Deferring communication for 9 hr 42 min 36 sec
2/26/2007 7:10:11 AM|BURP|Reason: requested by project |
|
|
|
|
|
And that means?
Are there too many clients/too few servers? Or is there another problem? Sorry, I'm not so good at English.
Edit:
I think it's a good idea to send bigger parts to the clients, so that the server does not need so many connections and sends bigger files instead. A workunit with about 30 minutes of work seems good on my PC (AMD Athlon XP 2400+). At this time, the CPU's working time is about 3 minutes or less.
|
|
|
|
|
In all the other sessions over the last 2 years I have been able to have more than one BURP WU at a time, but not anymore. This causes some annoying behaviour, as you can see from my logs. It won't get more until after it has finished the one it is doing.
2/26/07 4:37:58 PM|BURP|Sending scheduler request: To fetch work
2/26/07 4:37:58 PM|BURP|Requesting 4084 seconds of new work
2/26/07 4:38:04 PM|BURP|Scheduler RPC succeeded [server version 509]
2/26/07 4:38:06 PM|BURP|[file_xfer] Started download of file 297in0.zip
2/26/07 4:40:16 PM|BURP|[file_xfer] Finished download of file 297in0.zip
2/26/07 4:40:16 PM|BURP|[file_xfer] Throughput 52919 bytes/sec
2/26/07 4:40:17 PM|BURP|Starting 297in0.zip__ses0000000297_frm0000000257_prt00007.wu_1
2/26/07 4:40:17 PM|BURP|Starting task 297in0.zip__ses0000000297_frm0000000257_prt00007.wu_1 using blender version 442
2/26/07 4:48:05 PM|BURP|Computation for task 297in0.zip__ses0000000297_frm0000000257_prt00007.wu_1 finished
2/26/07 4:48:05 PM|Einstein@Home|Starting l1_0577.0_S5R1__714_S5RIa_0
2/26/07 4:48:05 PM|Einstein@Home|Starting task l1_0577.0_S5R1__714_S5RIa_0 using einstein_S5RI version 424
2/26/07 4:48:08 PM|BURP|Sending scheduler request: To fetch work
2/26/07 4:48:08 PM|BURP|Requesting 4106 seconds of new work
2/26/07 4:48:09 PM|BURP|[file_xfer] Started upload of file 297in0.zip__ses0000000297_frm0000000257_prt00007.wu_1_0
2/26/07 4:48:13 PM|BURP|Scheduler RPC succeeded [server version 509]
2/26/07 4:48:14 PM|BURP|[file_xfer] Finished upload of file 297in0.zip__ses0000000297_frm0000000257_prt00007.wu_1_0
2/26/07 4:48:14 PM|BURP|[file_xfer] Throughput 52412 bytes/sec
2/26/07 4:48:15 PM|BURP|Starting 297in0.zip__ses0000000297_frm0000000469_prt00000.wu_1
2/26/07 4:48:15 PM|BURP|Starting task 297in0.zip__ses0000000297_frm0000000469_prt00000.wu_1 using blender version 442
2/26/07 4:52:54 PM|BURP|Computation for task 297in0.zip__ses0000000297_frm0000000469_prt00000.wu_1 finished
2/26/07 4:52:54 PM|Einstein@Home|Resuming task l1_0577.0_S5R1__714_S5RIa_0 using einstein_S5RI version 424
2/26/07 4:52:56 PM|BURP|Sending scheduler request: To fetch work
2/26/07 4:52:56 PM|BURP|Requesting 4119 seconds of new work, and reporting 1 completed tasks
2/26/07 4:52:57 PM|BURP|[file_xfer] Started upload of file 297in0.zip__ses0000000297_frm0000000469_prt00000.wu_1_0
2/26/07 4:53:00 PM|BURP|[file_xfer] Finished upload of file 297in0.zip__ses0000000297_frm0000000469_prt00000.wu_1_0
2/26/07 4:53:00 PM|BURP|[file_xfer] Throughput 39854 bytes/sec
2/26/07 4:53:01 PM|BURP|Scheduler RPC succeeded [server version 509]
2/26/07 4:53:03 PM|BURP|Starting 297in0.zip__ses0000000297_frm0000000516_prt00001.wu_2
2/26/07 4:53:03 PM|BURP|Starting task 297in0.zip__ses0000000297_frm0000000516_prt00001.wu_2 using blender version 442
2/26/07 4:57:41 PM|BURP|Computation for task 297in0.zip__ses0000000297_frm0000000516_prt00001.wu_2 finished
2/26/07 4:57:41 PM|Einstein@Home|Restarting task l1_0577.0_S5R1__714_S5RIa_0 using einstein_S5RI version 424
2/26/07 4:57:44 PM|BURP|[file_xfer] Started upload of file 297in0.zip__ses0000000297_frm0000000516_prt00001.wu_2_0
2/26/07 4:57:45 PM|BURP|Sending scheduler request: To fetch work
2/26/07 4:57:45 PM|BURP|Requesting 4131 seconds of new work, and reporting 1 completed tasks
2/26/07 4:57:49 PM|BURP|[file_xfer] Finished upload of file 297in0.zip__ses0000000297_frm0000000516_prt00001.wu_2_0
2/26/07 4:57:49 PM|BURP|[file_xfer] Throughput 13299 bytes/sec
2/26/07 4:57:50 PM|BURP|Scheduler RPC succeeded [server version 509]
2/26/07 4:57:52 PM|BURP|Starting 297in0.zip__ses0000000297_frm0000000523_prt00021.wu_1
2/26/07 4:57:52 PM|BURP|Starting task 297in0.zip__ses0000000297_frm0000000523_prt00021.wu_1 using blender version 442
____________
Win98SE XP2500+ Boinc v5.8.8
And God said"Let there be light."But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer. |
|
|
|
|
|
Good news, it finally started giving me more than 1 wu at a time. If you changed something Janus, then I thank you.
Steve
____________
Win98SE XP2500+ Boinc v5.8.8
And God said"Let there be light."But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer. |
|
|
|
|
snip...
Unfortunately the client-side state of affairs isn't as good as I'd hoped for at this point, but due to time constraints this is how things are. The following issues still remain to be fully corrected:
- Timing isn't entirely accurate
- Sometimes the client leaves a Blender.exe process running out of control. This may cause the client to stall and may even interfere with other projects running on the same BOINC client.
- No support for resuming from disk
So far the best temporary workaround for the second client issue is to set your "switch between projects" time very high (like 20 hours) in your general preferences, or to avoid running BURP alongside other projects.
The best workaround for issue #3 is to use "Suspend to memory".
With all this in mind, Alpha here we go!
Hurrah!!! Bring on the next stage!
In my opinion the best solution to both #2 and #3 is upgrading to the latest client (5.8.16+).
There has been a code check-in that should kill Blender even if it does not exit smoothly on its own. This has not made it into a client at this time but should be in 5.8.16 and later.
All of the 5.8.x clients try to switch only at checkpoints. So if an app does not checkpoint it will tend to keep running until finished.
Sorry gang, I was wrong here. The code to kill orphaned apps will not be in a client until the 5.10.x clients.
____________
BOINC WIKI

BOINCing since 2002/12/8
|
|
|
|
|
The solution was to rewrite the stitcher from scratch (again, yes) so that it uses memory and time proportional to O(size) and not O(size*parts). The new stitcher has been running smoothly for 15 mins now and processes even very large frames in just seconds.
Tiles are always vertical, aren't they? If you use libpng directly, I think it should be possible to stitch tiles with an O(image_width) memory usage, plus any internal libpng buffer.
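A minimal sketch of that row-streaming idea, reading "vertical" as full-width strips stacked top to bottom. It does not use libpng: the tiles are plain Python iterators of raw scanlines and the output is any writable binary stream, all of which are assumptions made for the example.

```python
import io

def stitch_rows(tiles, out, image_width):
    """Write tiles to `out` one scanline at a time.

    `tiles` is an iterable of row iterators, ordered top to bottom. Only the
    current row is held here, so the working memory is O(image_width).
    Returns the number of rows written.
    """
    written = 0
    for tile in tiles:
        for row in tile:
            if len(row) != image_width:
                raise ValueError("row width does not match the image width")
            out.write(row)
            written += 1
    return written

def strip(value, rows, width):
    """Hypothetical tile: `rows` scanlines filled with one byte value."""
    return (bytes([value]) * width for _ in range(rows))

if __name__ == "__main__":
    out = io.BytesIO()  # stands in for an output file
    count = stitch_rows([strip(1, 2, 4), strip(2, 1, 4)], out, 4)
    print(count, out.getvalue())
```

With a real PNG encoder the `out.write(row)` call would be replaced by the encoder's write-one-row call, and the encoder's own buffers come on top, as the post already notes.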
And yes, I realize I'm bumping an old thread.
____________

|
|
|
|
|
|
Interesting... so does any of this explain why I'm seeing new tasks coming in at 22 minutes per chunk?
____________
 |
|
|