Alpha


Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4358
Credit: 2,094,806
RAC: 0
Message 4593 - Posted: 25 Feb 2007, 13:01:31 UTC
Last modified: 25 Feb 2007, 13:11:53 UTC

We've switched to Alpha now. This means a few changes in how things are run: first and foremost, BURP has changed from being completely experimental to being a tool, although it still remains in development.
With the current settings you can expect the following quality requirements to be followed during Alpha:


  • Sessions will either be queued for rendering or rejected within 48 hours with at least 90% certainty
  • Downtime will be announced and scheduled (however, accidental unscheduled downtime may happen from time to time)
  • The system will pick out at most one session to render every hour



Unfortunately the client-side state of affairs isn't as good as I'd have hoped for at this point, but due to time constraints this is how things are. The following issues still remain to be fully corrected:


  • Timing isn't entirely accurate
  • Sometimes the client leaves a Blender.exe process running out of control. This may cause the client to stall and may even interfere with other projects running on the same BOINC client.
  • No support for resuming from disk


So far the best temporary solution to the second client issue is to set your "switch between projects" time very high (like 20 hours) in your general preferences, or to avoid running BURP alongside other projects.
The best solution for issue #3 is to use "Suspend to memory".

With all this in mind, Alpha here we go!

Professor Desty Nova
Joined: 21 Mar 05
Posts: 95
Credit: 248,304
RAC: 231
Message 4598 - Posted: 25 Feb 2007, 15:02:57 UTC
Last modified: 25 Feb 2007, 15:06:51 UTC

Good luck to everybody with the Alpha. I hope, Janus, that you get the time and/or "manpower" to get smoothly to the Beta phase :-)))

Let's render some cool stuff ;-)))
____________


Professor Desty Nova
Researching Karma the Hard way

Profile Keck_Komputers
Joined: 6 Mar 05
Posts: 94
Credit: 1,232,213
RAC: 685
Message 4599 - Posted: 25 Feb 2007, 15:48:20 UTC - in response to Message 4593.

snip...
Unfortunately the client-side state of affairs isn't as good as I'd have hoped for at this point, but due to time constraints this is how things are. The following issues still remain to be fully corrected:

  • Timing isn't entirely accurate
  • Sometimes the client leaves a Blender.exe process running out of control. This may cause the client to stall and may even interfere with other projects running on the same BOINC client.
  • No support for resuming from disk


So far the best temporary solution to the second client issue is to set your "switch between projects" time very high (like 20 hours) in your general preferences, or to avoid running BURP alongside other projects.
The best solution for issue #3 is to use "Suspend to memory".

With all this in mind, Alpha here we go!



Hurrah!!! Bring on the next stage!

In my opinion the best solution to both #2 and #3 is upgrading to the latest client (5.8.16+).

There has been a code check-in that should kill Blender even if it does not exit smoothly on its own. This has not made it into a client at this time but should be in 5.8.16 and later.

All of the 5.8.x clients try to switch only at checkpoints. So if an app does not checkpoint it will tend to keep running until finished.


____________
BOINC WIKI

BOINCing since 2002/12/8

Profile UBT - Halifax--lad
Joined: 10 Sep 05
Posts: 74
Credit: 589
RAC: 0
Message 4600 - Posted: 25 Feb 2007, 16:25:46 UTC
Last modified: 25 Feb 2007, 16:26:59 UTC

So far so good; I've only had small WUs of approx. 1 minute.

Edit: The website is going quite slow though, and BOINC keeps losing connection with the project servers. Must be slightly overloaded?
____________
Join us in Chat (see the forum) Click the Sig


Join UBT

Professor Desty Nova
Joined: 21 Mar 05
Posts: 95
Credit: 248,304
RAC: 231
Message 4602 - Posted: 25 Feb 2007, 18:06:25 UTC
Last modified: 25 Feb 2007, 18:10:07 UTC

Yes... Janus must be seeing the servers smoking ;-)

Edit: something must have broken with the change. The page for session 252 shows errors: http://burp.boinc.dk/session.php?id=252
____________


Professor Desty Nova
Researching Karma the Hard way

Profile mpan3
Joined: 24 Apr 06
Posts: 64
Credit: 29,899
RAC: 0
Message 4603 - Posted: 25 Feb 2007, 18:20:44 UTC
Last modified: 25 Feb 2007, 18:25:45 UTC

Trying to attach to the project, unable to connect?!
'The project is temporarily unavailable.'

I tried two BOINC versions, 5.8.15 and 5.8.11. Haven't been able to connect all night.

baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4604 - Posted: 25 Feb 2007, 18:23:57 UTC

Yeah, same problem with attaching to BURP. BOINC server maybe overloaded?



- baracutio
____________

Profile UBT - Halifax--lad
Joined: 10 Sep 05
Posts: 74
Credit: 589
RAC: 0
Message 4605 - Posted: 25 Feb 2007, 18:33:40 UTC

Yep, overloaded at the moment from the looks of it. Lots of the files appear to be very small, so this is probably increasing the server load.

PS: Does anyone know if there are any graphs to show the server load?
____________
Join us in Chat (see the forum) Click the Sig


Join UBT

Profile Steve Cressman
Joined: 27 Mar 05
Posts: 142
Credit: 3,243
RAC: 0
Message 4606 - Posted: 25 Feb 2007, 18:39:03 UTC
Last modified: 25 Feb 2007, 18:40:22 UTC

I didn't like session 254, and I don't imagine you (Janus) or the servers liked it much either. Parts that take only half a minute are not so good; they cause way too much comm traffic.

There also seems to be another problem, or a code change has been made, because it won't ask for more work until it has finished the part it is working on. So it finishes a part, then restarts another project for a few seconds while getting another part to do, then starts Blender again. All that stopping and starting wastes time, I think.

Session 255 is much better, actually takes some time to do a part.

Steve
____________
Win98SE XP2500+ Boinc v5.8.8

And God said "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.

Profile UBT - Halifax--lad
Joined: 10 Sep 05
Posts: 74
Credit: 589
RAC: 0
Message 4607 - Posted: 25 Feb 2007, 18:40:09 UTC

I have got some bigger ones now at last. Things are returning to normal, kind of.
____________
Join us in Chat (see the forum) Click the Sig


Join UBT

Profile Steve Cressman
Avatar
Send message
Joined: 27 Mar 05
Posts: 142
Credit: 3,243
RAC: 0
Message 4612 - Posted: 25 Feb 2007, 19:14:01 UTC
Last modified: 25 Feb 2007, 20:00:59 UTC

Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr out for the other hosts that did this one, you will see it is way off.

my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec wrong

This also means that we all got less credit than we should have: we were granted 0.45 credits when it should have been about 5.
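
To put the figures above on the same scale, the stderr times can be converted to seconds and compared with the result-page values. This is only a minimal sketch, assuming the `Time:` field is in `mm:ss.ff` form; the function name and the dictionary are made up for illustration and are not part of BURP or BOINC.

```python
def stderr_time_to_seconds(stamp):
    """Convert an 'mm:ss.ff' stamp (assumed format) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# host id -> (stderr render time, CPU seconds shown on the result page),
# copied from the post above
results = {
    1870: ("28:31.86", 1787.00),
    7333: ("25:13.71", 240.23),
    20247: ("53:53.81", 542.59),
    23564: ("18:01.75", 183.56),
}

for host, (stamp, reported) in results.items():
    rendered = stderr_time_to_seconds(stamp)
    # ratio > 1 means the result page shows less time than the render took
    print(host, round(rendered, 2), reported, round(rendered / reported, 2))
```

Under that assumption host 1870 comes out within about 5% of its reported time, while the other three hosts were credited with roughly a sixth of their render time.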

Steve
____________
Win98SE XP2500+ Boinc v5.8.8

And God said "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.

Professor Desty Nova
Joined: 21 Mar 05
Posts: 95
Credit: 248,304
RAC: 231
Message 4613 - Posted: 25 Feb 2007, 19:23:28 UTC - in response to Message 4612.
Last modified: 25 Feb 2007, 19:29:06 UTC

Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr out for the other hosts that did this one, you will see it is way off.

my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec wrong

This also means that we all got less credit than we should have; we were granted 0.45 credits.

Steve


Must be the "old" problem of "Timing isn't entirely accurate" in the current Blender application (read the first post ;-).
But on my system it usually increases time, not the other way around.

____________


Professor Desty Nova
Researching Karma the Hard way

Profile Steve Cressman
Joined: 27 Mar 05
Posts: 142
Credit: 3,243
RAC: 0
Message 4614 - Posted: 25 Feb 2007, 19:28:01 UTC - in response to Message 4613.

Seeing something strange with this WU, session 255. Mine is the only one that shows the right time on the result page. If you check the stderr out for the other hosts that did this one, you will see it is way off.

my host 1870, stderr - Saved: out Time: 28:31.86, result page - 1,787.00 sec correct
host 7333, stderr - Saved: out Time: 25:13.71, result page - 240.23 sec wrong
host 20247, stderr - Saved: out Time: 53:53.81, result page - 542.59 sec wrong
host 23564, stderr - Saved: out Time: 18:01.75, result page - 183.56 sec wrong

This also means that we all got less credit than we should have; we were granted 0.45 credits.

Steve


Must be the "old" problem of "Timing isn't entirely accurate" in the current Blender application (read the first post ;-).


In the last two years I have never seen it off like this. I would not mind if it was just a small amount, but the difference is very large.

____________
Win98SE XP2500+ Boinc v5.8.8

And God said "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.

Profile UBT - Halifax--lad
Joined: 10 Sep 05
Posts: 74
Credit: 589
RAC: 0
Message 4615 - Posted: 25 Feb 2007, 19:28:53 UTC

Janus may pop along shortly to advise. Things seem to be going quite well, though, really.
____________
Join us in Chat (see the forum) Click the Sig


Join UBT

Profile [SETI.USA]Tank_Master
Joined: 6 May 05
Posts: 49
Credit: 185,131
RAC: 0
Message 4616 - Posted: 25 Feb 2007, 19:55:45 UTC
Last modified: 25 Feb 2007, 20:00:57 UTC

I am not having issues with d/l and u/l (not much anyway, u/ls sometimes stall for 10 sec or so). However, the website does time out for me rather frequently. I had to refresh the message page 3 times just to post something.

The WUs are only taking 8-16 sec on my Core 2 Duo 2.66@3.05 and 30-40 sec on my P4 3.2@3.52. Both systems have a gig of RAM. There also seems to be a max of 10 WUs at a time. It looks like Predictor got messed up by the frequent preempting due to this project finishing its small work so quickly. My system has nearly 3.5-4GB of mem & swap used. Can you increase the max # of WUs we can have at one time?

Edit: sorry, I had written that several hours ago; I was just now able to get it to post. It would seem that there is now no work? Also, the Predictor issue seems to be Predictor itself. (1.3GB virtual RAM per WU!)
____________

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4358
Credit: 2,094,806
RAC: 0
Message 4621 - Posted: 25 Feb 2007, 21:13:08 UTC
Last modified: 25 Feb 2007, 21:22:16 UTC

Hey all, and sorry for not posting before now. There have been plenty of issues to look at.
First and foremost session 252 completely threw the image stitcher (the program that assembles image parts into frames) into a spinning frenzy. With its 3000x2000 pixels and 64 parts/frame it was a bit too much for the old stitcher. With memory allocations in the range of 3GB/frame and runtimes of 30mins due to swapping everything slowed down for a bit. This is what caused some strange numbers on some of the session pages.
The solution was to rewrite the stitcher from scratch (again, yes) so that it uses memory and time proportional to O(size) and not O(size*parts). The new stitcher has been running smoothly for 15 mins now and processes even very large frames in just seconds.
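
The difference between O(size) and O(size*parts) can be pictured with a small sketch. This is not the actual BURP stitcher, whose code is not shown in this thread; the `Part` type, the tile layout, and the one-byte-per-pixel format are all assumptions made for illustration. The point is that a single frame buffer is allocated once and each part is copied into its own region, instead of building a full-size intermediate image per part.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One rendered tile of a frame (hypothetical layout)."""
    x: int          # left edge of the tile inside the frame
    y: int          # top edge of the tile inside the frame
    width: int
    height: int
    pixels: bytes   # row-major, one byte per pixel for simplicity


def stitch(frame_w, frame_h, parts):
    """Assemble parts into one frame using a single frame-sized buffer."""
    frame = bytearray(frame_w * frame_h)  # the only O(size) allocation
    for part in parts:  # parts may be a generator, so only one is held at a time
        for row in range(part.height):
            src = row * part.width
            dst = (part.y + row) * frame_w + part.x
            # copy one row of the tile straight into its place in the frame
            frame[dst:dst + part.width] = part.pixels[src:src + part.width]
    return bytes(frame)


left = Part(x=0, y=0, width=2, height=2, pixels=b"\x01" * 4)
right = Part(x=2, y=0, width=2, height=2, pixels=b"\x02" * 4)
frame = stitch(4, 2, [left, right])
```

Every pixel is written once, so time and memory both grow with the frame size rather than with the frame size multiplied by the number of parts.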

Secondly, all the happy BOINC clients that suddenly discovered that BURP was back online got MySQL pretty loaded (the stitcher eating up memory and CPU on the same machine wasn't making it better...). And due to the slow SQL responses, Apache (on the webserver) got too many open connections at the same time and started timing out users to avoid too much work.

For those who know what it means the main server was peaking at a load average of 14 with a bit more than 13GB swap used (and all system memory).

All of the above meant that the mirrors had to update at very high speed whenever they were able to get through (normally sync load is spread over time to avoid congestion, but when half of the attempts fail they use whatever chance they can get). This caused the main outgoing network pipeline to get quite loaded as well. The recently installed QoS module (quality of service) worked perfectly to make sure that bandwidth was delegated properly, so that connections that DID go through would not be dropped. Great to see that piece finally working as expected.

Getting started was a bit harsh on the system, but things are clearing up now - hopefully with no further big issues.
The new image stitcher is picking up where the old one left off and is hammering out completed frames like crazy right now.

About the timing issue currently in the client - this is twofold.
1) It measures something which is more equal to the wall clock than to the actual CPU time burned.
2) At some point (1hr 10mins?) it will randomly start wrapping to 0
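
Both effects can be illustrated with a short sketch. It is not taken from the BURP client: `measure` and `wrapped_reading` are hypothetical helpers, and the 32-bit microsecond counter is purely an assumption chosen because 2^32 microseconds is about 71.6 minutes, which happens to be close to the "1hr 10mins?" guess. The real cause of the wrapping is not stated in this thread.

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent in work()."""
    wall_start = time.perf_counter()   # wall clock: keeps running while waiting
    cpu_start = time.process_time()    # CPU time actually burned by this process
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def wrapped_reading(true_seconds, counter_bits=32, ticks_per_second=1_000_000):
    """What a fixed-width tick counter would report (hypothetical counter)."""
    ticks = int(true_seconds * ticks_per_second)
    return (ticks % (1 << counter_bits)) / ticks_per_second


# Sleeping costs wall time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.05))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")

# Past the counter's range the reading starts again from (near) zero.
print(wrapped_reading(100.0), wrapped_reading(4300.0))
```

A task that is suspended or waiting on disk accumulates wall time but little CPU time, which is why a wall-clock-like measurement can overstate the work done.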

Profile [SETI.USA]Tank_Master
Joined: 6 May 05
Posts: 49
Credit: 185,131
RAC: 0
Message 4624 - Posted: 25 Feb 2007, 23:21:07 UTC

Given the short WUs earlier, can we have a larger daily quota?

2/25/2007 3:17:15 PM|BURP|Sending scheduler request: Requested by user
2/25/2007 3:17:15 PM|BURP|Requesting 25610 seconds of new work
2/25/2007 3:17:20 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 3:17:20 PM|BURP|Message from server: No work sent
2/25/2007 3:17:20 PM|BURP|Message from server: (reached daily quota of 400 results)
2/25/2007 3:17:20 PM|BURP|Deferring communication for 46 min 52 sec
2/25/2007 3:17:20 PM|BURP|Reason: requested by project

KWSN Sir Clark
Joined: 24 Jul 05
Posts: 11
Credit: 1,309
RAC: 0
Message 4625 - Posted: 25 Feb 2007, 23:27:06 UTC

I managed to get one WU which errored out.

Now I'm getting:
25/02/2007 23:24:11|BURP|Message from server: Your computer has 1535.49MB of memory, and a job requires 1536.00MB

____________


www.chris-kent.co.uk aka Chief.com

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4358
Credit: 2,094,806
RAC: 0
Message 4627 - Posted: 25 Feb 2007, 23:28:37 UTC - in response to Message 4624.
Last modified: 25 Feb 2007, 23:29:13 UTC

Given the short WUs earlier, can we have a larger daily quota?

2/25/2007 3:17:15 PM|BURP|Sending scheduler request: Requested by user
2/25/2007 3:17:15 PM|BURP|Requesting 25610 seconds of new work
2/25/2007 3:17:20 PM|BURP|Scheduler RPC succeeded [server version 509]
2/25/2007 3:17:20 PM|BURP|Message from server: No work sent
2/25/2007 3:17:20 PM|BURP|Message from server: (reached daily quota of 400 results)
2/25/2007 3:17:20 PM|BURP|Deferring communication for 46 min 52 sec
2/25/2007 3:17:20 PM|BURP|Reason: requested by project

Daily quota upped by 50%, from 200 to 300 (per CPU). This will take effect tomorrow (I guess).
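
As a rough illustration of the arithmetic, assuming a host's daily limit is simply the per-CPU quota multiplied by the number of CPUs it reports (the helper below is hypothetical, not BOINC scheduler code): a host reporting two CPUs would hit the "daily quota of 400 results" seen in the log at 200 per CPU, and would get 600 at the new setting.

```python
def host_daily_quota(per_cpu_quota, ncpus):
    """Daily result limit for a host, assuming quota scales linearly with CPUs."""
    return per_cpu_quota * ncpus


old_quota = host_daily_quota(200, 2)   # two reported CPUs is an assumption
new_quota = host_daily_quota(300, 2)
print(old_quota, new_quota)
```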

Profile UBT - Halifax--lad
Joined: 10 Sep 05
Posts: 74
Credit: 589
RAC: 0
Message 4629 - Posted: 25 Feb 2007, 23:35:42 UTC

I assume all the ones that come back failed are marked so someone can check to see what went wrong?
____________
Join us in Chat (see the forum) Click the Sig


Join UBT
