Modified 139

Description

An HD re-render of 139
This version has some extra effects and is in HD quality. It will be rendered in two portions, starting with the last (big) one.

Message boards : Comments and discussion : 726

Rollo
Project donor
Joined: 11 Mar 06
Posts: 20
Credit: 489,788
RAC: 77
Message 7320 - Posted: 5 Jan 2008, 10:33:39 UTC

This session seems to be a very long one. 128 parts per frame and each part taking its time. Wouldn't it be better to start with an initial replication of 3 instead of 4? In this case we don't waste time doing the 4th rendering.

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 7321 - Posted: 5 Jan 2008, 10:40:54 UTC - in response to Message 7320.
Last modified: 5 Jan 2008, 10:46:26 UTC

This session seems to be a very long one. 128 parts per frame and each part taking its time. Wouldn't it be better to start with an initial replication of 3 instead of 4? In this case we don't waste time doing the 4th rendering.

Once it is done creating the workunits it will switch to background mode, doing somewhat what you suggest.
Also background mode allows other sessions to take priority while rendering this one.

And yes, this is probably one of the longest sessions ever rendered on BURP so far. It was designed to push both the server and the clients to the limit.

Profile DangerNerd
Project donor
Joined: 31 Mar 06
Posts: 126
Credit: 263,029
RAC: 47
Message 7325 - Posted: 5 Jan 2008, 22:59:08 UTC

Janus, a suggestion:

Have the "unfinished units" page redirect somewhere on this session, at least until you can code something that will only show a max of x workunits on that page.

Right now the load from generating that page would be extreme.

Thanks for the lovely session.

DN.
____________
Our Advice is to support all useful BOINC projects. Smart people needed to give advice to those who seek answers: Give or Get Free Advice Here

Profile DangerNerd
Project donor
Joined: 31 Mar 06
Posts: 126
Credit: 263,029
RAC: 47
Message 7326 - Posted: 5 Jan 2008, 23:02:42 UTC - in response to Message 7325.

A little research indicates the page currently weighs in at 14MB.

Imagine the SQL load to generate that one? :-(

DN.

nick
Joined: 23 Aug 06
Posts: 23
Credit: 251,238
RAC: 0
Message 7327 - Posted: 6 Jan 2008, 2:49:28 UTC
Last modified: 6 Jan 2008, 3:05:16 UTC

so are you going to be working on making the time till finished more accurate? as this thing looks like it is going to be taking up most of the year.

also how many CPUs are currently working on and returning work?

AC
Project donor
Joined: 30 Sep 07
Posts: 121
Credit: 143,874
RAC: 0
Message 7328 - Posted: 6 Jan 2008, 3:10:02 UTC

What was changed to make it take so long, anyway? I only looked very quickly, but the scene looks pretty much the same... is it just a framerate increase?

I've noticed a lot of WUs erroring out on my main machine, but figured that's because I'm currently also rendering a [non-BURP] scene of my own and compiling several hundred thousand lines of code. It said it couldn't get heartbeat from BOINC, so I assumed something got starved for CPU time... but now I see one of my other boxes is also erroring out and it's doing nothing but BURPing.

Profile noderaser
Project donor
Joined: 28 Mar 06
Posts: 506
Credit: 1,548,030
RAC: 61
Message 7329 - Posted: 6 Jan 2008, 5:40:50 UTC

Sweet blender, over a mil and a quarter workunits! Looks like we'll be in BURP for a while.

No errors for me yet.

Profile Nick
Joined: 6 Jul 06
Posts: 4
Credit: 298,362
RAC: 0
Message 7330 - Posted: 6 Jan 2008, 6:57:26 UTC
Last modified: 6 Jan 2008, 6:58:21 UTC

After reading some of the other comments, I would like to add that while all of my Windows boxes were completing successfully, my Linux box wasn't. It seems that the application requires the i386 version of the SDL library, not the x86_64 version (link to the results).

Achim
Joined: 17 May 05
Posts: 182
Credit: 2,505,702
RAC: 0
Message 7332 - Posted: 6 Jan 2008, 7:55:47 UTC - in response to Message 7330.

It seems that the application requires the i386 version of the SDL library, not the x86_64 version

Yes. Linux (and Windows as well) can only link libraries of the same bitness together. That is the disadvantage of using the 32-bit app on a 64-bit OS.
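
One way to check which kind of library is installed is to look at the ELF header: the fifth byte (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit ones. A minimal Python sketch; the function names are made up for illustration, and where the SDL library actually lives depends on the distribution:

```python
def elf_bitness(header: bytes) -> int:
    """Return 32 or 64 given the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"unknown EI_CLASS {ei_class}")


def file_bitness(path: str) -> int:
    """Read the ELF identification bytes from a file on disk."""
    with open(path, "rb") as f:
        return elf_bitness(f.read(5))
```

Running `file_bitness` on the application binary and on the SDL library should give the same number; if the app reports 32 and the only SDL on the box reports 64, the loader will refuse to combine them.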

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 7333 - Posted: 6 Jan 2008, 9:55:55 UTC - in response to Message 7325.
Last modified: 6 Jan 2008, 10:11:54 UTC

At least until you can code something that will only show a max of x workunits on that page.

Done.

Right now the load from generating that page would be extreme.

Not really, the returned result set is static and is captured by the query cache and then served directly from there. Also the new server is insanely fast compared to what the dead one could do.
But since we cannot assume that this is the largest session that will ever be rendered (someone could render one with 100 times as many workunits...), I've implemented a limit on the page as suggested. The current limit is 10,000 unfinished workunits.
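
For illustration only, a cap like this amounts to putting a LIMIT on the listing query. The sketch below uses Python's sqlite3 with a made-up `workunit` table; the real BOINC/BURP schema and page code are not shown in this thread, so every table, column and function name here is hypothetical:

```python
import sqlite3

MAX_UNFINISHED = 10_000  # the cap mentioned above


def unfinished_workunits(conn, session_id, limit=MAX_UNFINISHED):
    """Return at most `limit` unfinished workunit ids for one session."""
    rows = conn.execute(
        "SELECT id FROM workunit "
        "WHERE session_id = ? AND finished = 0 "
        "ORDER BY id LIMIT ?",
        (session_id, limit),
    )
    return [row[0] for row in rows]


def demo_connection(n_workunits):
    """Build an in-memory table with n unfinished workunits for session 726."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workunit "
        "(id INTEGER PRIMARY KEY, session_id INTEGER, finished INTEGER)"
    )
    conn.executemany(
        "INSERT INTO workunit (session_id, finished) VALUES (?, 0)",
        [(726,)] * n_workunits,
    )
    return conn
```

However many rows the session has, the page then never has to render more than `limit` of them.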

so are you going to be working on making the time till finished more accurate? as this thing looks like it is going to be taking up most of the year.

A new renderqueue mode ("nullifying" mode) in combination with a smaller initial replication number (3 instead of 4) can do exactly what background rendering is doing right now. The only difference is that the new mode is measurable (it starts rendering from frame 1 and moves towards the end) while background mode is not (it renders random results, at some point they turn into full workunits but it is hard to estimate anything based on that).
We will be testing the new mode on the next large background render candidate.

also how many CPUs are currently working on and returning work?

1880 machines are considered "very active" - i.e. they have returned work within the past two days.
Your question was \"how many CPUs\": When taking the number of CPUs into account this results in 3421 CPUs. However, this number is slightly off because not all CPUs on those machines need to have been active in order for the machine to appear active.
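
Roughly how those two numbers relate, as a Python sketch. The host records and field names (`last_result`, `ncpus`) are invented for the example and are not the actual BOINC host table:

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=2)  # "returned work within the past two days"


def very_active(hosts, now):
    """Hosts whose most recent result came back inside the window."""
    return [h for h in hosts if now - h["last_result"] <= ACTIVE_WINDOW]


def cpu_upper_bound(hosts, now):
    """Sum of CPUs over very active hosts.

    This is an upper bound: a host counts as active even if only
    one of its CPUs actually returned work.
    """
    return sum(h["ncpus"] for h in very_active(hosts, now))
```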

What was changed to make it take so long, anyway? I only looked very quickly, but the scene looks pretty much the same... is it just a framerate increase?

Area lights were added with a high amount of multisampling (ouch!)
The complexity of the surface of the "ball" was upped and uses physics simulations, causing more complex reflections
Scene-based motionblur was added (increasing the rendertime by roughly a factor of 8+)
Radiosity calculations slightly changed

AC
Project donor
Joined: 30 Sep 07
Posts: 121
Credit: 143,874
RAC: 0
Message 7336 - Posted: 6 Jan 2008, 12:24:02 UTC - in response to Message 7333.
Last modified: 6 Jan 2008, 12:27:54 UTC

...the new server is insanely fast compared to what the dead one could do...


Around 7am here (12 UTC) the server was very slow and eventually started handing me these:

Sorry. The project server is currently under heavy load and has had to temporarily reject requests from your IP. Please try again after at least 17 seconds.


The CPUsec/sec was slightly above 1 at the time, and now I see the status shows stalled. On the up-side, the web server seems faster now.

Profile Velociraptor
Joined: 17 Dec 06
Posts: 18
Credit: 8,400
RAC: 0
Message 7337 - Posted: 6 Jan 2008, 13:45:18 UTC
Last modified: 6 Jan 2008, 13:52:08 UTC

would it be possible to calculate the estimated time with the average time per wu and wus returned per hour? and then somehow multiply with the outstanding wus?

edit:
average time per wu multiplied by the number of outstanding wus would give the time needed to complete the session on one machine? then dividing this by the cpusec/sec done by the grid would give the time needed to complete the task, or not?

edit2:
or how is it done now?
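
Written out, the idea would be something like this. Just a sketch: the function and argument names are made up, and it ignores replication (each workunit being rendered more than once) and any change in grid throughput over time:

```python
def eta_seconds(avg_cpu_seconds_per_wu, outstanding_wus, grid_cpusec_per_sec):
    """Estimate wall-clock seconds left for a session.

    avg_cpu_seconds_per_wu * outstanding_wus is the time one machine
    would need; dividing by the grid's cpusec/sec spreads that work
    over the whole grid.
    """
    if grid_cpusec_per_sec <= 0:
        raise ValueError("grid throughput must be positive")
    single_machine_seconds = avg_cpu_seconds_per_wu * outstanding_wus
    return single_machine_seconds / grid_cpusec_per_sec
```

For example, 1000 outstanding wus at 300 CPU seconds each on a grid doing 100 cpusec/sec gives 300 * 1000 / 100 = 3000 seconds.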

Paul.C
Joined: 6 Jan 08
Posts: 2
Credit: 264
RAC: 0
Message 7338 - Posted: 6 Jan 2008, 14:50:03 UTC

Hi guys, i just joined this project.
(i remember things at uni taking AGES to render, so this will be a handy project indeed).

I just finished my first result (in about 5 mins), and the status of this session 726, just went up by 0.0003% :o)

That's either due to me, or just coincidence from someone else :)

regards,
Paul

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 7339 - Posted: 6 Jan 2008, 14:59:04 UTC - in response to Message 7337.
Last modified: 6 Jan 2008, 15:34:19 UTC

or how is it done now?

That's somewhat how it's done now. There is a difference between counting completed workunits and counting incoming results. The number we have access to is the number of completed workunits.
Mostly it doesn't matter which number you use, but in background rendering mode the number of completed workunits drops to 0 in the beginning while the number of completed results stays as high as normal. About halfway through the session this reverses and the number of completed workunits is more than double what it usually is (in a non-bg render), while the number of incoming results is still the same.
As mentioned earlier, the basic idea is to make use of the "new background mode" instead of the current one because the new one allows the system to better monitor progress and is also slightly less painful for the database server to keep track of.
The new method, a.k.a. the nullify method, is so called because it counters the usual priority system completely by setting all results/workunits to a fixed low priority, whereas background mode sets them to a randomly low priority... a bit technical. It requires a few changes in the way that workunits are handled in general and will have to be combined with the boosting system that boosts late workunits. The boosting system is meant to speed up quick sessions while the background/nullify system is meant to speed up slow sessions and make room for other sessions too.
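
A toy Python sketch of the difference between the two modes. The priority scale, the constant `LOW_PRIORITY` and the tie-break by id are assumptions made for the example, not the actual scheduler logic:

```python
import random

LOW_PRIORITY = -100  # hypothetical "low" value; the real scale is not given here


def background_priorities(result_ids, rng):
    """Current background mode: each result gets a randomly low priority."""
    return {rid: LOW_PRIORITY - rng.randint(0, 1000) for rid in result_ids}


def nullify_priorities(result_ids):
    """Nullify mode: every result gets the same fixed low priority."""
    return {rid: LOW_PRIORITY for rid in result_ids}


def send_order(priorities):
    """Highest priority first; ties broken by id (assumed), lowest id first."""
    return sorted(priorities, key=lambda rid: (-priorities[rid], rid))
```

With a fixed priority every result ties, so under the assumed tie-break they go out in id order, from the first frame towards the last, which is what makes progress easy to measure. With random priorities the order is scattered across the whole session.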

the server was very slow and eventually started handing [out temporary IP bans]

With a very large amount of results in the result table the MySQL database sometimes gets quite busy (usually when more than one search/scrape bot comes by while the system is exporting stats, serving result-pages to people and updating the table with new incoming results at the same time).
When this happens the system will attempt to service as many people/bots as possible while at the same time taking into consideration what these people do. So if you access a lot of DB-heavy pages like the computer pages and result pages you may run out of "goodwill" from the server and get temporarily banned. On other pages, like the forum, you may be able to pull down several hundred pages without getting an ip-ban.
Usually the ip-banner only hits the scraper bots but when the system is already very busy it will start hitting real users too.

At the moment the result table is slightly too large, meaning that we get a 30sec load spike from time to time due to the way that BOINC handles the result table.
Tomorrow morning (UTC) the usual Monday maintenance will reduce the size of the result table dramatically, which should cure the annoying load spikes.

[Edit:] I modified the logic in the loadmonitor ip-banner slightly so that it is less likely to ban real users. It may still happen if you browse around in the same pages a lot - for instance by pressing next a lot of times on the workunit/result list or browsing more than one page per second for a sustained period of time.
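
The "goodwill" idea can be pictured as a per-IP budget that DB-heavy pages use up faster than light ones. A simplified Python sketch; the page costs, the budget and the class name are invented for the example, and the expiry that makes real bans temporary is left out:

```python
PAGE_COST = {"forum": 1, "result": 50, "computer": 50}  # hypothetical weights
GOODWILL_BUDGET = 500  # hypothetical per-IP budget


class LoadMonitor:
    def __init__(self, budget=GOODWILL_BUDGET):
        self.budget = budget
        self.spent = {}  # ip -> goodwill used so far

    def allow(self, ip, page, busy=False):
        """Charge the request; return False once the IP is out of goodwill."""
        cost = PAGE_COST.get(page, 1)
        if busy:
            cost *= 2  # when the server is already loaded, goodwill runs out sooner
        used = self.spent.get(ip, 0) + cost
        self.spent[ip] = used
        return used <= self.budget
```

With these numbers an IP could load several hundred forum pages but only a handful of result pages before being rejected, and even fewer while the server is busy.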

Profile noderaser
Project donor
Joined: 28 Mar 06
Posts: 506
Credit: 1,548,030
RAC: 61
Message 7348 - Posted: 6 Jan 2008, 22:34:53 UTC

The downside of having all this work, is that credit for it is going to be severely delayed.

nick
Joined: 23 Aug 06
Posts: 23
Credit: 251,238
RAC: 0
Message 7349 - Posted: 7 Jan 2008, 0:05:17 UTC

looks like the server has locked up again, is there anything that can be done to help?
and what are the new specs of this server?

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 7356 - Posted: 7 Jan 2008, 8:31:11 UTC - in response to Message 7349.
Last modified: 7 Jan 2008, 9:18:49 UTC

looks like the server has locked up again, is there anything that can be done to help?
and what are the new specs of this server?

In which way? No lockups registered in the logs here and the server has (generally, apart from the load spikes mentioned earlier - which are hopefully being removed as we speak) been sitting idle at between 0.2 and 0.5 load.

and what are the new specs of this server?

Since the server had so much extra power it was further stress tested by adding two more CPU-hungry processes to the mix to see if it had any negative effect whatsoever - and guess what those two CPU-hungry processes were? BOINC running BURP of course!
Here's a link.

Buster Gunn
Joined: 3 Oct 07
Posts: 1
Credit: 10,181
RAC: 0
Message 7359 - Posted: 7 Jan 2008, 15:25:53 UTC

Doesn't seem that replication results are being sent out. Everything remains in pending. Too many initial units? My pending credit is now over 800.

Ken
Joined: 3 Jul 07
Posts: 8
Credit: 46,883
RAC: 0
Message 7360 - Posted: 7 Jan 2008, 15:33:42 UTC - in response to Message 7359.

Doesn't seem that replication results are being sent out. Everything remains in pending. Too many initial units? My pending credit is now over 800.



I also did not get any credit for any work done yesterday.. but I presume it will show up eventually.

Ken

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4461
Credit: 2,094,806
RAC: 0
Message 7361 - Posted: 7 Jan 2008, 17:04:38 UTC
Last modified: 7 Jan 2008, 17:06:49 UTC

This is expected. Please read the in-depth description of background rendering as well as the note earlier in this thread about why we may want to switch to a new way of doing background rendering in the future - which is in part to be better able to monitor the session while at the same time granting credit as quickly as possible.
