Request work unit feature



Message boards : Server backend and mirrors : Request work unit feature

Author Message
Profile mStuff
Joined: 29 Oct 09
Posts: 26
Credit: 67,721
RAC: 0
Message 13992 - Posted: 24 Jul 2015, 19:50:25 UTC

Sometimes, when rendering, one or more work units get stuck in a seemingly endless loop and never really get finished.
A 'request work unit' feature would be nice for those who want a work unit done quickly when it feels stuck.

Right now I would like to compile a test animation (.mp4) of one of my sessions, but it is stuck at a pretty awkward spot: http://burp.renderfarming.net/session_workunits.php?id=2804 (as of writing 166, with the next unfinished frame being 308).
If I could request that work unit for myself, even uncredited (the other remaining user could still finish it and be awarded credit), it would be a nice feature to have.

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4479
Credit: 2,094,806
RAC: 0
Message 13993 - Posted: 24 Jul 2015, 19:55:28 UTC

Noted.
Looks like just a random hiccup, though.

Profile mStuff
Joined: 29 Oct 09
Posts: 26
Credit: 67,721
RAC: 0
Message 13994 - Posted: 25 Jul 2015, 7:37:38 UTC - in response to Message 13993.

Yeah, the funny thing is, the workunit in question was finished just 5 minutes after I posted the thread. Still, the 'issue' of it being stuck switching between users for 3 days could easily have dragged on if 1) validation had been marked inconclusive or 2) it had gone to another client that couldn't finish it.
This is the workunit I was talking about: http://burp.renderfarming.net/workunit.php?wuid=2532334

Please don't see this thread as me complaining, I'm just trying to produce some constructive ideas :)

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4479
Credit: 2,094,806
RAC: 0
Message 13998 - Posted: 25 Jul 2015, 16:36:58 UTC
Last modified: 25 Jul 2015, 16:56:41 UTC

The server keeps track of how reliable each machine is. Whenever a workunit instance fails and a new one is generated, the new one is usually sent to a more reliable host.
Towards the end of a session the server even increases redundancy on workunits with "unreliable" hosts if there is free processing power available. This way a few failing hosts normally aren't able to stall a session that is otherwise nearly done.
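
A rough sketch of the two behaviours described above, purely for illustration: retrying a failed instance on a more reliable host, and adding redundancy late in a session when capacity is free. This is not the project's actual scheduler code. The `Host` class, the 0-to-1 reliability scale, and the `late_stage` and `unreliable_below` thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    reliability: float  # hypothetical score in [0, 1]; higher is better
    busy: bool = False


def pick_retry_host(hosts, failed_host):
    """Pick an idle host that is more reliable than the one that failed.

    Falls back to the most reliable idle host if none is strictly better,
    and returns None when no idle host is available.
    """
    idle = [h for h in hosts if not h.busy and h is not failed_host]
    if not idle:
        return None
    better = [h for h in idle if h.reliability > failed_host.reliability]
    return max(better or idle, key=lambda h: h.reliability)


def extra_copies(session_progress, assigned_host, idle_hosts,
                 late_stage=0.9, unreliable_below=0.5):
    """Return how many extra copies of a workunit to send out.

    Near the end of a session, a workunit held by a low-scoring host gets
    one extra copy, but only if there is spare processing power.
    """
    if session_progress < late_stage:
        return 0
    if assigned_host.reliability >= unreliable_below:
        return 0
    return 1 if idle_hosts > 0 else 0
```

With hosts scored 0.2, 0.9 and 0.7, a failure on the 0.2 host would be retried on the 0.9 host, and a session at 95% progress would add one extra copy for a workunit still held by the 0.2 host, provided some host is idle.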

Profile mStuff
Joined: 29 Oct 09
Posts: 26
Credit: 67,721
RAC: 0
Message 13999 - Posted: 25 Jul 2015, 16:55:22 UTC
Last modified: 25 Jul 2015, 16:56:09 UTC

That's awesome! I had no idea the backend kept track of this!
I guess I am quite unreliable, with all the tasks that I keep aborting when I know I can't finish them before turning off the PC.

Anyways, I turn my PC off at night since I have started getting BSODs often, though I will do a clean reinstall with Win10 in a couple of days. That should hopefully fix the problems.
I hope I can get my reliability points back :)

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4479
Credit: 2,094,806
RAC: 0
Message 14000 - Posted: 25 Jul 2015, 16:58:23 UTC - in response to Message 13999.
Last modified: 25 Jul 2015, 16:59:22 UTC

It tracks both the number of failures and the turn-around time. The actual "score" is a weighted number based on all the criteria. It is difficult to score 100% in everything, so it isn't a "reliable" vs "unreliable" classification but rather a full scale of grades in between.
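
To make the idea of a weighted score concrete, here is a minimal sketch that blends a failure rate and a turn-around time into a single grade between 0 and 1. The real formula and weights are not given in this thread; the function name, the 0.6/0.4 weights, and the use of a deadline to normalise turn-around time are assumptions for illustration only.

```python
def reliability_score(failed, total, avg_turnaround_h, deadline_h,
                      w_fail=0.6, w_time=0.4):
    """Combine failure rate and turn-around time into one grade in [0, 1].

    All parameters and weights are hypothetical, not the server's own.
    """
    if total <= 0 or deadline_h <= 0:
        raise ValueError("total and deadline_h must be positive")
    # Share of tasks that did not fail
    success = 1.0 - failed / total
    # Fraction of the deadline used on average, clamped to [0, 1]
    used = min(max(avg_turnaround_h / deadline_h, 0.0), 1.0)
    speed = 1.0 - used
    # Weighted average of the two criteria
    return (w_fail * success + w_time * speed) / (w_fail + w_time)
```

Under these assumptions a host only reaches a perfect grade with zero failures and near-instant returns, which matches the point that scoring 100% in everything is difficult and that most hosts land somewhere on the scale in between.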

Profile noderaser
Project donor
Joined: 28 Mar 06
Posts: 507
Credit: 1,549,902
RAC: 124
Message 14001 - Posted: 25 Jul 2015, 17:10:50 UTC - in response to Message 14000.

Is the reliability metric visible anywhere? Just curious; I know I have a few "high reliability" hosts, but I'm sure my mobile hosts are pretty "bad".

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4479
Credit: 2,094,806
RAC: 0
Message 14003 - Posted: 25 Jul 2015, 18:54:19 UTC - in response to Message 14001.

If you inspect the host page, some of the parameters are shown near the bottom of the page.

funkydude
Joined: 23 Dec 13
Posts: 275
Credit: 2,478,281
RAC: 0
Message 14004 - Posted: 25 Jul 2015, 21:49:06 UTC - in response to Message 13999.

> Anyways, I turn my PC off at night since I have started to often get BSODs, though I will do a clean reinstall with Win10 in a couple of days. Should hopefully fix the problems.
> I hope I can get my reliability points back :)


Disable downloading of GPU tasks. It has happened to me too. In my case it doesn't BSOD and manages to recover, but half of my running apps get killed.

It won't happen if you're only rendering CPU tasks.

Profile mStuff
Joined: 29 Oct 09
Posts: 26
Credit: 67,721
RAC: 0
Message 14006 - Posted: 26 Jul 2015, 15:23:02 UTC - in response to Message 14004.

Thanks, but actually it does also happen with only the CPU running.. something about DPC_WATCHDOG_VIOLATION, which should mean that some of my driver software is outdated.

Win10 in a couple of days - could it bring some issues to BOINC?

funkydude
Joined: 23 Dec 13
Posts: 275
Credit: 2,478,281
RAC: 0
Message 14008 - Posted: 26 Jul 2015, 15:28:58 UTC - in response to Message 14006.

> Thanks, but actually it does also happen with only the CPU running.. something about DPC_WATCHDOG_VIOLATION, which should mean that some of my driver software is outdated.
>
> Win10 in a couple of days - could it bring some issues to BOINC?


From searching:

Several things can cause a DPC Watchdog Violation to be triggered in Windows 8. One of the most common is an SSD drive that has old firmware not supported by Windows 8

Profile mStuff
Joined: 29 Oct 09
Posts: 26
Credit: 67,721
RAC: 0
Message 14013 - Posted: 26 Jul 2015, 22:50:14 UTC - in response to Message 14008.

Yeah, I read that too, though I have no idea how to update the firmware on my OCZ SSD, and it has been working without any issues so far (if that is even the thing causing trouble).
Otherwise the article says that pretty much any component that hasn't been updated could cause this, and I'm not really one to update all my drivers weekly.. :)
I also occasionally get a different BSOD (can't remember the error name) which, from what I read on Google, is specifically caused by AMD CPUs, and I have one.

