wu limit

baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4482 - Posted: 15 Jan 2007, 11:37:54 UTC

Hi all,

I think it would be good to limit the number of WUs per host (10 or 20). After crunching them, a host could get the next 10 (or maybe 20, depending on the crunching time for the WUs).
My problem is that some users get too many WUs, which they can't return in time. This slows down the end of each rendering project (see session 265).
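A minimal sketch of the cap proposed above, assuming the scheduler knows how many WUs a host already holds. The names `units_to_send` and `MAX_IN_PROGRESS_PER_HOST` are hypothetical and are not part of the BOINC scheduler code.

```python
# Hypothetical per-host cap; the post suggests 10 or 20.
MAX_IN_PROGRESS_PER_HOST = 10


def units_to_send(requested: int, in_progress: int,
                  cap: int = MAX_IN_PROGRESS_PER_HOST) -> int:
    """Return how many new WUs a host may receive under the cap.

    requested   -- number of WUs the host asks for
    in_progress -- WUs already on the host and not yet returned
    """
    free_slots = max(0, cap - in_progress)
    return min(max(0, requested), free_slots)


print(units_to_send(50, 0))   # host with an empty queue asks for 50
print(units_to_send(50, 7))   # host already holding 7 WUs
print(units_to_send(5, 10))   # host already at the cap
```

With a cap like this, a host with a 10-day cache would still only hold 10 WUs at a time and would have to return results before getting more.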



Best regards, bara
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4483 - Posted: 15 Jan 2007, 14:54:54 UTC

I can provide patches for the scheduler code to do this.
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 4484 - Posted: 15 Jan 2007, 15:20:59 UTC - in response to Message 4483.  

I can provide patches for the scheduler code to do this.

If your patches are of high enough quality, then talk to the BOINC people about making this a standard feature in the BOINC server-side framework.
baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4485 - Posted: 15 Jan 2007, 18:01:40 UTC

Janus, could you please use this patch in BURP for the next (test) session(s)? I think it's much better for the project if everybody gets work, not only a few people with a 10-day cache!



Best regards, bara
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4486 - Posted: 15 Jan 2007, 19:08:34 UTC - in response to Message 4484.  

I can provide patches for the scheduler code to do this.

If your patches are of high enough quality, then talk to the BOINC people about making this a standard feature in the BOINC server-side framework.

I did. I got comments back about how it should really be implemented, modified my code to follow those ideas (and sent it again), and then the whole thing was forgotten for a month. Then I got this: "Nicolas' idea (limiting the number of results queued on a host) is subsumed by an existing and more optimal idea: only send a host results if they will probably be finished by their deadline (taking into account results already on the host)." I got busy with other stuff, so I didn't keep trying to get it into the official code. Have a look around on the mailing lists.
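The quoted idea (only send a result if it will probably finish by its deadline, counting work already on the host) can be sketched roughly as follows. This is a simplified illustration, not the BOINC scheduler's actual logic: it assumes the host runs its queue in order, and the function name and the `availability` parameter are hypothetical.

```python
def can_finish_by_deadline(queued_seconds: float,
                           new_job_seconds: float,
                           seconds_until_deadline: float,
                           availability: float = 1.0) -> bool:
    """Estimate whether a new job would finish before its deadline.

    queued_seconds         -- estimated compute time of work already on the host
    new_job_seconds        -- estimated compute time of the candidate job
    seconds_until_deadline -- wall-clock time left before the deadline
    availability           -- assumed fraction of wall-clock time the host
                              actually spends computing (0 < availability <= 1)
    """
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    # Assumption: the new job only starts after everything already queued.
    expected_wall_clock = (queued_seconds + new_job_seconds) / availability
    return expected_wall_clock <= seconds_until_deadline


# A host with 5 hours of queued work and a 1-hour job, deadline in 24 hours.
print(can_finish_by_deadline(5 * 3600, 3600, 24 * 3600))
# A host sitting on 10 days of work, deadline in 7 days.
print(can_finish_by_deadline(10 * 86400, 3600, 7 * 86400))
```

Unlike a fixed per-host limit, a check like this adapts to how fast each host is and how much it already holds.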
baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4488 - Posted: 16 Jan 2007, 17:37:09 UTC

Janus, can you see why I want this new feature in BURP?

See here: http://burp.boinc.dk/session.php?id=265

The status of this session hasn't changed in the last 24h... hehe, ETA: in 1 hour (sure you mean 1 hour, not 1 week?^^)
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4489 - Posted: 16 Jan 2007, 17:45:46 UTC
Last modified: 16 Jan 2007, 17:46:01 UTC

Also, I know a way to send more copies of the still-unfinished workunits when there are only a few left (i.e. right now). Interested?
UPDATE workunit SET target_nresults=[some number larger than original replication], transition_time=UNIX_TIMESTAMP() WHERE server_state=4 AND id>=[first WU from batch];
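
To show what the statement above does with its bracketed placeholders filled in, here is a small self-contained sketch using Python's `sqlite3` module. The toy `workunit` table only mirrors the column names used in the post; it is an assumption for illustration, not the real BOINC schema. The timestamp is passed in from Python instead of calling MySQL's `UNIX_TIMESTAMP()`, and `bump_replication` is a hypothetical helper name.

```python
import sqlite3
import time

# Toy in-memory table mirroring only the columns named in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workunit ("
    "id INTEGER PRIMARY KEY, target_nresults INTEGER, "
    "transition_time INTEGER, server_state INTEGER)"
)
conn.executemany(
    "INSERT INTO workunit VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 5), (2, 1, 0, 4), (3, 1, 0, 4)],
)


def bump_replication(conn, new_target, first_wu_id, now=None):
    """Raise target_nresults for matching workunits from the batch.

    Returns the number of rows changed.
    """
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE workunit SET target_nresults = ?, transition_time = ? "
        "WHERE server_state = 4 AND id >= ?",
        (new_target, now, first_wu_id),
    )
    conn.commit()
    return cur.rowcount


changed = bump_replication(conn, new_target=2, first_wu_id=2, now=1168969546)
print(changed)
print(conn.execute("SELECT * FROM workunit ORDER BY id").fetchall())
```

Using bound parameters rather than pasting numbers into the SQL string keeps the batch ID and replication count easy to change between sessions.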

baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4490 - Posted: 16 Jan 2007, 18:11:06 UTC

I think sending more copies of the remaining WUs isn't the right way to solve this problem. The BOINC server can do that too, after the deadline is reached. Noticed that?^^
The best solution would be to send only a few WUs to each client. This would solve two problems at the same time: sessions would render fast and results would be returned immediately!
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4491 - Posted: 16 Jan 2007, 18:12:25 UTC - in response to Message 4490.  

I think sending more copies of the remaining WUs isn't the right way to solve this problem. The BOINC server can do that too, after the deadline is reached. Noticed that?^^

Sure, I did notice, but this means not having to wait until the deadline. And if the original results come back before the resent ones, they will still be accepted.
baracutio
Project donor
Joined: 29 Mar 05
Posts: 96
Credit: 174,604
RAC: 0
Message 4492 - Posted: 16 Jan 2007, 23:13:06 UTC

Day 6 - status of session 265: running

There have still been ~1.8% missing for 2 days! So we have to solve this problem. It can't be that such a "small" session needs more than 6 days?! We have over 3000 hosts (maybe 500 active)...
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4493 - Posted: 16 Jan 2007, 23:25:58 UTC - in response to Message 4492.  

There have still been ~1.8% missing for 2 days! So we have to solve this problem. It can't be that such a "small" session needs more than 6 days?! We have over 3000 hosts (maybe 500 active)...

But there is no more work to send :( So that 1.8% is stuck on a few hosts. Why not resend it to other hosts that will hopefully be faster?
indigomonkey
Joined: 31 May 06
Posts: 44
Credit: 18,633
RAC: 0
Message 4494 - Posted: 16 Jan 2007, 23:32:02 UTC

The majority of the WUs I got took less than an hour. It just needs an algorithm to find the average time taken, multiply that by, say, ten or so (to allow for very slow or powered-off computers), and release new WUs when this limit has been passed.
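The rule described above could look roughly like this. It is only a sketch of the idea as stated in the post: the function names and the default factor of ten are taken from or invented for this example, not from BOINC.

```python
from statistics import mean


def reissue_timeout(runtimes_seconds, slack_factor=10.0):
    """Average observed runtime multiplied by a slack factor.

    runtimes_seconds -- runtimes of WUs already returned for this session
    slack_factor     -- hypothetical multiplier ("ten or so" in the post)
    """
    if not runtimes_seconds:
        raise ValueError("need at least one returned WU to estimate a timeout")
    return mean(runtimes_seconds) * slack_factor


def is_overdue(sent_at, now, timeout):
    """True if a WU sent at `sent_at` has been out longer than `timeout`."""
    return now - sent_at > timeout


# Three returned WUs that took 30, 60 and 45 minutes.
timeout = reissue_timeout([1800, 3600, 2700])
print(timeout)
print(is_overdue(sent_at=0, now=30000, timeout=timeout))
```

A WU flagged by `is_overdue` would be the candidate for sending a fresh copy to another host.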
PovAddict
Joined: 25 Apr 05
Posts: 347
Credit: 4,618
RAC: 0
Message 4495 - Posted: 16 Jan 2007, 23:34:00 UTC - in response to Message 4494.  

The majority of the WUs I got took less than an hour. It just needs an algorithm to find the average time taken, multiply that by, say, ten or so (to allow for very slow or powered-off computers), and release new WUs when this limit has been passed.

If I understood correctly, something similar already exists (if calculated correctly): it's called the deadline ;)
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 4515 - Posted: 20 Jan 2007, 11:16:14 UTC
Last modified: 20 Jan 2007, 11:43:21 UTC

The fact that session 265 is missing 1.2% is unrelated to any BOINC scheduling issues. The remaining results have been correctly returned but were flagged as incorrect in the validation step. So far there's no strategy for incorrect results (i.e. they are neither rejected nor accepted, simply logged in order to understand what kinds of errors a session can have).
In other words, you could say that everything is working as planned so far, but the plan is going to change once a set strategy is made for erroneous results.

I triggered the debug cleanup script that looks at erroneous frames and finds appropriate replacement parts for them. (And sorry for being excessively out of spare time at the moment...)

There's a feature called "low-latency project" on the drawing board for BOINC. This will make the suggested feature obsolete for projects such as BURP, as any host will only ever download what it is expected to be able to crunch right now and return the result right away.

[Edit: output from the cleanup was:]
--- Correcting ---
Frame 247 done. Part 7 done.
Frame 249 done. Part 3 done.
Frame 250 done. Part 21 done.
Session 265 done
------------------
So the issue really only concerned 3 parts (hence the estimate of a few hours)

And even with a couple of days of slack (or 5?) on my part the render times are pretty impressive:
9 days 1 hours 52 min 11 sec (Real time)
(275 days 4 hours 58 min 13 sec CPU time)
Real time is the time from when a session is started until the final result is turned in. CPU time is the combined time it would have taken an average machine to compute the same session (calculated as the combined time for all returned parts).
That's quite a speedup factor!
[/edit]
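
For reference, the speedup implied by the two figures quoted above can be worked out as below. This assumes "speedup factor" means CPU time divided by real time, which the post does not spell out.

```python
from datetime import timedelta

# Figures quoted in the post for session 265.
real_time = timedelta(days=9, hours=1, minutes=52, seconds=11)
cpu_time = timedelta(days=275, hours=4, minutes=58, seconds=13)

# Dividing one timedelta by another yields a plain float ratio.
speedup = cpu_time / real_time
print(f"{speedup:.1f}x")  # prints "30.3x"
```

So the session finished roughly 30 times faster than a single average machine would have managed, even with the days of slack included in the real time.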