A different way to create WUs

Message boards : Server backend and mirrors : A different way to create WUs
Istvan Burbank
Joined: 3 Apr 08
Posts: 312
Credit: 58,920
RAC: 0
Message 9367 - Posted: 16 Mar 2009, 16:16:50 UTC

So I was thinking: there is an option for parts per frame, and at one point I suggested having more than one frame per WU so that even shorter render times would be feasible here on BURP. That approach is awfully static, though, and might cause a lot of problems. Let's say I have a scene where characters walk in and out, with different backgrounds throughout. Some frames will take 10 seconds to render, others 10 hours. If I upload this to BURP, how many parts per frame should that be? Or sometimes the setting is just not optimal, and I end up with both 6-hour WUs and 15-minute WUs. I will try to describe my thoughts on how to overcome this challenge.

First off, when a user uploads a session they would not be asked how many parts per frame to use. When the WUs are distributed, the assignments would look less like 'render this much of frame 120' and more like 'render as much as you can in 3 hours, from frame 1 to frame 30'. To make sure that partial frames are not thrown away when they are clipped by the time limit, the client would split each frame into parts, maybe 10 or 100; I am not sure what would be optimal. It would then start rendering frame one part one, then frame one part two, then frame one part three, and so on, until it either reached three hours or reached the last frame, in this example frame 30. It would then upload the rendered parts to the server.

The server's behavior would change too. Because the server doesn't know how much to expect back, it would initially assign frames 1-30 to a few clients for redundancy and put a lock on those frames; the next WU would be for frames 101-130, again assigned to a few computers. This stepping of 100 frames is the key: the server can take the amount rendered and extrapolate a per-frame render time. If 30 frames come back in a WU that took one hour, the server finds that a frame takes 2 minutes. If only one part of 10 was uploaded after three hours because of the time limit, the same extrapolation says each frame takes 30 hours. These times would be plotted on a graph, frame number on one axis and render time for that frame on the other. With the 100-frame stepping, the server can build a rough graph of render times for the whole session fairly quickly.
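A minimal sketch of the extrapolation step described above (the function name and shape are made up for illustration; this is not BURP code):

```python
# Hypothetical sketch (not BURP code): extrapolate a per-frame render time
# from what a client managed to finish inside its wall-clock budget.

def estimate_frame_time(frames_done, parts_done, parts_per_frame, hours_spent):
    """Return estimated hours per frame, or None if nothing was finished."""
    completed = frames_done + parts_done / parts_per_frame
    if completed == 0:
        return None
    return hours_spent / completed

# 30 whole frames returned after 1 hour -> about 2 minutes per frame
minutes = estimate_frame_time(30, 0, 10, 1.0) * 60
# only 1 of 10 parts returned after the 3-hour limit -> 30 hours per frame
slow = estimate_frame_time(0, 1, 10, 3.0)
```

Both of the worked examples from the paragraph above fall out of the same formula, which is the point: the server does not need to know in advance whether a session is fast or slow.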

This graph could be used for the next level of server intelligence, beyond the simpler level I described above. At the simpler level the server already assigns work more efficiently: no WU renders longer than 3 hours, and the server is closer to a balance of sending more work at a time while using less network traffic and fewer database queries. But that is a preset guess at an optimal point that is always changing. If too many frames are assigned, the lock on those frames becomes inefficient. Say frames 1-30 are assigned in one WU and frames 31-60 in a later one, but only 20 frames of each could be rendered before the time limit: frames 1-20 and 31-50 get rendered, but frames 21-30 do not. This is fixable by simply assigning frames 21-30 to another WU, possibly together with other frames stuck in such gaps. But if one of the original WUs was rendered in 10 minutes, there would still be a lot of server traffic, meaning there would still be a large minimum render time. Because of these last two problems, I find this system inelegant. This is where the graph comes in.

With this graph of how long WUs take at a 100-frame interval, the server can go back and assign WUs more efficiently. If frames 1-30 took 1.5 hours to render and frames 100-130 took 2 hours, we can guess that frame 50 will take about 1.75 hours. To help visualize this, here is a graph:
[img]http://img22.imageshack.us/img22/6074/graphtco.jpg[/img]

So we guess that frame 50 takes 1.75 hours, and if we want to assign it to a client we should assign about 1.7 frames: at 1.75 hours per frame, that theoretically fills the 3 hours. However, computer speeds are not all the same (though since more than one computer rendered each batch of frames, the result will be closer to an average), and our frame time is only a guess, an educated guess, but a guess all the same. To build in a margin we could assign extra work to the WU. If we assign too much we end up with a gap again, though not nearly as large; if we don't assign enough there will be a lot of network traffic. This margin might need some tuning, but a starting point could be one hour of extra work: since each frame should take 1.75 hours, about 0.6 of a frame is around one hour, bringing the assignment to roughly 2.3 frames. Then, as each WU comes in, its times are plotted on the graph, so the estimate of WU render time keeps getting better.
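The interpolation and budgeting steps can be sketched like this (the sample points are simplified to the batch boundaries, and all names are invented for illustration):

```python
# Hypothetical sketch of the interpolation and budgeting step described above.

def interpolate(samples, frame):
    """samples: sorted (frame_number, hours_per_frame) pairs; straight-line
    interpolation between the two nearest sample points."""
    for (f0, t0), (f1, t1) in zip(samples, samples[1:]):
        if f0 <= frame <= f1:
            return t0 + (t1 - t0) * (frame - f0) / (f1 - f0)
    raise ValueError("frame outside sampled range")

def frames_for_budget(hours_per_frame, budget_hours=3.0, margin_hours=1.0):
    """How many frames to pack into one WU, including the safety margin."""
    return (budget_hours + margin_hours) / hours_per_frame

samples = [(0, 1.5), (100, 2.0)]   # measured batches, simplified to points
t50 = interpolate(samples, 50)     # 1.75 hours per frame
n = frames_for_budget(t50)         # roughly 2.3 frames per WU
```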

Why use a system like this? Well, unless I am missing something (in which case everything I've said is wrong, so please ignore it), this would make clients happy and mean less work for the server. It means fewer network connections, which is good for both parties. Taken to an extreme, a user could request XX hours of work at a time, so whether they want 15 minutes of work or 5 hours, they could have it. It also means that the hypothetical upload I mentioned, where render times vary greatly within the scene, can be rendered efficiently. Users could upload an animation that takes 5 seconds per frame, with thousands of frames, without nearly as many WUs having to be created. To make the uploads of these files more feasible, I might consider packing all of the results into one .zip and uploading that, to use fewer connections.

Keep in mind if you read this that I might know nothing about what I'm talking about, in which case ignore me. But as I read the code for BURP V3, it looks like this wouldn't be too major a change, and I am willing to help code some of it in PHP.

~please criticize, this may or may not be a worthless idea - Istvan
baracutio (Project donor)
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 9372 - Posted: 17 Mar 2009, 21:38:39 UTC

there are points to think about...

following situation:
a really short session is distributed to the hosts (let's say 600) and now everyone starts to render as much as possible in 3 hours. but the whole session only needs about 2 hours on the slowest machine. here comes the result... you have produced a lot of data and wasted much more cpu time.

in numbers:
600 x 5mb for result files and >800h cpu time (session render time ~1.5h)


a better solution might be some kind of pre-processing of sessions on the server with lower resolution / less detail. then you have numbers for each frame to compare, and you can easily make decisions about combining frames or splitting them into more parts.
do you see the difference?!



- bara
Istvan Burbank
Message 9373 - Posted: 17 Mar 2009, 21:56:26 UTC

I had indeed thought of that, which is why I put that other limit in:

render as much as you can in 3 hours from frame 1 to frame 30


this means the server knows there is a maximum number of frames being rendered, and it can control how many duplicates there are per frame.

If, in your scenario, the short session is rendered like this:

let's say there are 200 frames in this short animation and 20 clients. Extrapolate to larger numbers if you please; this is only to make a point.

First since there are 20 clients frames are assigned like this:


  • clients 1-5 render frames 1-10
  • clients 6-10 render frames 100-110
  • clients 11-15 render frames 50-60
  • clients 16-20 render frames 150-160



In this way each client is always used: first making notches at set marks, then chopping those intervals in half, and so on, using different numbers of frames as needed. Here I have shown redundancy per frame as well.
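The "notches at set marks, then chop in half" ordering could be sketched roughly like this (a hypothetical illustration, not actual scheduler code):

```python
# Hypothetical sketch of the assignment order: sample the start of the range
# first, then keep emitting the midpoints of ever-smaller gaps breadth-first.
from collections import deque

def sample_order(total_frames, chunk):
    """Return start frames of `chunk`-sized batches, widest spread first."""
    starts = [1]
    gaps = deque([(1, total_frames)])
    while gaps:
        lo, hi = gaps.popleft()
        mid = (lo + hi) // 2
        if mid - lo < chunk or hi - mid < chunk:
            continue                      # gap too small to sample again
        starts.append(mid)
        gaps.append((lo, mid))
        gaps.append((mid, hi))
    return starts

# With 200 frames and 10-frame batches the first four batches start at
# frames 1, 100, 50 and 150, matching the example above.
```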


~I may be misunderstanding the question, but I hope I have answered it, Istvan

baracutio (Project donor)
Message 9376 - Posted: 18 Mar 2009, 11:32:25 UTC

hm, time for a statement by janus;)

but i still think that pre-processing is more useful.

pros:

1. you have a session preview (low res / detail)
- maybe it could be used to validate results?! -> downscale res / detail of result frame and compare it with pre-processed frame?!

2. the pre-processing machine can create a render time table:

frame# | pre_time
------------------
frame01 | 10sec
frame02 | 5sec
frame03 | 5sec
frame04 | 30sec
... | ...

to get the real times you have to multiply these times by a fixed value (maybe 1000 - depends on how much you've downscaled the preview). the new table would look like this:

frame# | pre_time | real_time
------------------------------
frame01 | 10sec | 10000sec
frame02 | 5sec | 5000sec
frame03 | 5sec | 5000sec
frame04 | 30sec | 30000sec
... | ... |

now the server is able to optimize the wu length.

frame01 -> wu length ok
frame02 -> combine with frame03
frame03 -> combined with frame02
frame04 -> split into 4 parts

when all this is done you can think about distributing your wu's.
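bara's combine/split step could be sketched like this (names and the target length are invented; note the split here uses plain ceiling division, so frame04 comes out as 3 parts of 10000s each rather than the 4 parts in the example above, which leaves extra headroom):

```python
# Hypothetical sketch of the combine/split decision on the real_time table.

def plan_wus(real_times, target=10_000):
    """real_times: (frame, seconds) pairs. Returns ("combine"/"split",
    frames, parts) tuples, each describing one WU plan."""
    plans, batch, batch_secs = [], [], 0
    for frame, secs in real_times:
        if secs > target:
            parts = -(-secs // target)        # ceiling division
            plans.append(("split", [frame], parts))
            continue
        batch.append(frame)
        batch_secs += secs
        if batch_secs >= target:
            plans.append(("combine", batch, 1))
            batch, batch_secs = [], 0
    if batch:
        plans.append(("combine", batch, 1))   # leftover short frames
    return plans

table = [(1, 10_000), (2, 5_000), (3, 5_000), (4, 30_000)]
# -> frame 1 alone, frames 2+3 combined, frame 4 split into 3 parts
```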

questions / comments / other ideas?!



- bara
Istvan Burbank
Message 9377 - Posted: 18 Mar 2009, 12:22:57 UTC - in response to Message 9376.  

I don't mean to argue or be stubborn about pushing my point, so please don't take this the wrong way.

It seems we are suggesting the same thing, except that you would do some of the steps on the server instead. My reaction is this: BURP is largely community based, doing as much work as realistically possible on other people's computers rather than on the server. Also, I am not sure the render time scales by an easily guessable factor; from what I've seen it is at least linear (my dominoes render takes 25 minutes per frame at 25% resolution and over 3 days at 100%; I didn't actually wait to see how long). And it is redundant: you end up with the same frame being rendered multiple times at different resolutions. Your point about making a preview is a wonderful idea, but I think there might be a more efficient way: when the first clients send in their results, those could be made into a preview, scaled to any size you want. The same work is done, but not on the server, and it isn't redundant; some of the first steps of rendering are done, plus some more. Also, being a numbers type of person, this information could make some really cool graphs and statistics for the animations, showing predicted render times, real render times, and example frames (I see a great AJAX project in that... another thing I would love to code).

Again, please remember that everything I say should be taken with the usual grain of salt and common sense, and in my case some extra salt ;)

~ Istvan
Istvan Burbank
Message 9383 - Posted: 19 Mar 2009, 15:58:22 UTC
Last modified: 19 Mar 2009, 16:02:15 UTC

I am home sick, so I figured I'd whip up a simple test script. It can be found here:

http://istvan.us/php/graph.php?frame=33

The page explains it all.

Also, this script works no matter where the points are located: a point on a 100 mark is treated exactly the same as one on the 27 mark. It is not a best-fit line, though I guess it could be made into one with some more work. I am not sure which would be more accurate, but that may never need to be worried about, certainly not now.
Janus (Volunteer moderator, Project administrator)
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 9384 - Posted: 19 Mar 2009, 16:39:34 UTC
Last modified: 19 Mar 2009, 16:43:02 UTC

I guess this is really about two things:
1) Making sure that clients get enough work per workunit
2) Making sure that clients do not get too much work (i.e. much more than 3 hours of work) per workunit.

It is true that scheduling changes could help ensure that both targets are met. Also, taking samples or pre-rendering portions of frames could help give an idea about the amount of work that could be put into each workunit.

Personally I think it is more interesting to look at the reasons why the two limits are there in the first place, rather than to try to work around them.

(1) is due primarily to the fact that it is costly to initialize and terminate a session. This startup and shutdown occurs both for the session as a whole on the server and for each frame on the clients and in the validator.

Server: This part is pretty much taken care of in BURPv.3. The average cost to start up a session has been dramatically reduced in both time and storage. Validation has also been improved dramatically. Apart from the server->mirror transfers and storage of the final frames, there's nothing that limits the server from doing quick workunits anymore.
The mirror system is being redesigned to no longer require a startup phase, so we are down to pure storage. And storage is easy to get these days. Alternatively we could purge results (the raw frames, not the output movie) from quick sessions after some amount of time in order to keep the server clean.

Clients: The current approach in the client is suboptimal for rendering fast workunits. When a workunit starts the client will unpack the renderer, move it into the slot directory provided by BOINC, unpack the session file and then start the renderer. It is feasible to go with a different approach: Only ever unpack the renderer once and move it to the project directory. At any successive request for it just use a virtual link to it in the slot directory. Similarly a session (and associated library archives) will only ever be unpacked once and then linked into the slots. This would facilitate very fast startup/shutdown times since the only thing that needs to be created and cleaned on each successive workunit for a session is the output file.
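The unpack-once-and-link idea could be sketched like this (paths, names and the use of Python are purely illustrative; the real BOINC client works differently):

```python
# Hypothetical sketch: unpack each archive into the project directory the
# first time it is seen, then just link it into the slot for every later WU,
# so per-workunit startup cost is nearly zero.
import os
import zipfile

def prepare_slot(project_dir, slot_dir, archives):
    for archive in archives:
        unpacked = os.path.join(project_dir, archive + ".unpacked")
        if not os.path.isdir(unpacked):              # unpack only once
            with zipfile.ZipFile(os.path.join(project_dir, archive)) as z:
                z.extractall(unpacked)
        link = os.path.join(slot_dir, archive + ".unpacked")
        if not os.path.lexists(link):                # reuse on later WUs
            os.symlink(unpacked, link)
```

The only per-workunit state left to create and clean up is the output file, which is Janus's point above.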

In other words we will not need to bundle frames, since it will be equally fast to just run two workunits in succession.

(2) The 3 hour soft-limit on rendertime is there to avoid loss of CPU time due to restarting workunits. The one true and only good solution to this problem is checkpointing. Having looked into the Blender renderer it seems likely that someone (preferably the Blender devs) can add checkpointing support to it by storing the contents of every "bucket" (a small area, 32x32 or 64x64 pixels in size) when it has completed rendering. If a workunit has to restart, it only has to regenerate any maps used and load in the old buckets, and it can then quickly skip back to where it was rendering before being interrupted. This would essentially reduce the cost of restarting a workunit to just a few minutes.
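A rough illustration of the bucket-checkpointing idea (the render hook and file format are invented; real support would have to live inside the Blender renderer itself):

```python
# Hypothetical sketch: persist each finished tile so a restarted workunit
# skips straight past already-completed work.
import json
import os

CHECKPOINT = "buckets.json"

def render_frame(width, height, bucket, render_bucket):
    done = set()
    if os.path.exists(CHECKPOINT):                  # resume after a restart
        with open(CHECKPOINT) as f:
            done = set(map(tuple, json.load(f)))
    for y in range(0, height, bucket):
        for x in range(0, width, bucket):
            if (x, y) in done:
                continue                            # already rendered
            render_bucket(x, y)                     # the expensive part
            done.add((x, y))
            with open(CHECKPOINT, "w") as f:        # checkpoint per bucket
                json.dump(sorted(done), f)
    return done
```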

With (1) and (2) out of the way it is no longer as important to establish a good guess on frame complexity or to have a scheduler that works around the problem by creating timed workunits (which by the way is a mess in BOINC). Solving (1) and (2) is the direction that we are going in, and with BURPv.3 (1) is almost done.
Istvan Burbank
Message 9387 - Posted: 19 Mar 2009, 17:06:25 UTC
Last modified: 19 Mar 2009, 17:06:57 UTC

I am glad you made that post; it is interesting! I assume what you have said there is/will be implemented in V3?

How about sessions that take 1 minute per frame, or less? I have some renders that are 100,000 frames long, mostly physics simulations, where the render times per frame are low, on the order of one minute. At 1 minute per frame, 4,000 clients each finishing a WU every minute would hammer the database, and if the .blend is sent with each WU, that is a lot of transfer too. Some of the other features I mentioned were previews, where you can see a timeline of frames that are not sequential. Other applications could include reusing the function for things like baking: if the BURP system expanded to baking fluid sims or particles for people, the times would vary quite a bit there as well.

One other challenge this would overcome involves the setting 'use my network connection between the hours of XX and XX'. There is a setting asking the server to provide enough work for so many hours at a time, but those estimates are not very accurate, or so I find. And the uses of render-time accuracy don't stop there: the ETA on the session page speeds up and slows down, but this system could predict and compensate for that.

Also session optimization. If you see on the graph that the render time spikes at frame 120, stays high until frame 200, and then drops back to normal at frame 220, you have the information you need to look at what is happening at those times and, if possible, optimize it. I have often wanted the ability to do this locally as well. An example would be the session where there were many 100% transparent cubes inside each other, bogging down the render. If the camera panned to one side and the render times dropped sharply, but there was no visible difference in the scene, you would know to check it out.

There are also times when one cannot split a frame, namely in the VSE. If multiple scenes are being switched between in the VSE, and one of them takes seconds per frame while another takes 2 hours on a fast computer, BURP has a problem: either the settings are made to fit the shorter render times, and when the longer frames come around the slower computers have far too much work for 3 hours, or the settings fit the faster computers, and for the frames the slower computers could handle, they go unused. With the knowledge this graphing provides, either the shorter frames are assigned to the slower computers, or smaller parts of frames are assigned. And at this point we are no longer just talking about the VSE: my dominoes animation was rendered on ORE recently, and the frame times went from 4 minutes at the start (the camera was pulled in close to objects) to an hour or so at the end, where the camera pulled way out to show the whole set. I had to adjust settings for the longer times, because there is no feature to change the settings throughout an animation.

Just as I was about to hit 'post reply' I had another idea: not only could you do the frame splitting more efficiently, you could handle memory much better too. In my animations there have been scenes where memory usage was low, but then some particles or new characters came into the scene and memory consumption went up significantly. If two charts were used, one for parts per frame and one for memory usage, the result would be WUs highly tailored to each computer.

~the paragraph above has a line of spaces (at least in this window where I am writing it) around 1/3 way through. You have to re-size your window to make it work. kinda cool, Istvan.
Janus (Volunteer moderator, Project administrator)
Message 9389 - Posted: 20 Mar 2009, 8:24:10 UTC
Last modified: 20 Mar 2009, 8:26:59 UTC

As I said, nothing apart from storage would limit us in what we could render. The storage required for rendering 10000 frames in HD is quite immense, though.

Per-frame estimates are terribly expensive. There are people on the ORE project working on instant scenegraph-based estimates of memory consumption. Additionally, a session may be sampled in a complexity analysis during the timeframe where an admin has not yet accepted it. Sampling random points at random locations in random frames should give a better approximation than the current static one.

And yes, some of these things are already in BURPv.3 (faster scheduler, validator etc.), some of them are planned features (faster clients, complexity analysis, memory estimation), and yet some of them are features that could possibly be made to work given the new framework (checkpointing, session purging).
Istvan Burbank
Message 9390 - Posted: 20 Mar 2009, 11:21:01 UTC

That will be cool.

I have a question: it takes some time to download a WU. Is the .blend contained in every WU sent? If so, is there a way to avoid that, for example by doing what you said with the Blender version? If the BOINC client doesn't support this, you could pretend that each new file in the queue was a new version of Blender, and similarly when a file left the queue. This would mean computers only receive files when they are new, making it less costly to render large sessions. This is all assuming that the .blend is sent with every WU.

I am interested in learning more about that scenegraph thing you mentioned, Istvan
baracutio (Project donor)
Message 9391 - Posted: 20 Mar 2009, 12:08:38 UTC

your boinc client has to download the .blend file only once per session. all other wu's from that session are simple server replies like this: "render frame x, part y of session z"



- bara
