when can I get more than 2 work units per machine?

Message boards : Number crunching : when can I get more than 2 work units per machine?

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 7949 - Posted: 25 Mar 2008, 17:51:27 UTC

Ho hum

my 16-core server says...

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 7950 - Posted: 25 Mar 2008, 18:00:04 UTC - in response to Message 7949.  

Ho hum

my 16-core server says...

Hm... you should be getting 2 per core rather than 2 per machine. Is this not the case?

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 7952 - Posted: 25 Mar 2008, 18:14:28 UTC - in response to Message 7950.  

Ho hum

my 16-core server says...

Hm... you should be getting 2 per core rather than 2 per machine. Is this not the case?



oh wait, you're right.. but then again, I'm only seeing 16 tasks (instead of 32)

shouldn't I see 16 pending tasks?

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 7953 - Posted: 25 Mar 2008, 18:25:21 UTC - in response to Message 7952.  
Last modified: 25 Mar 2008, 18:26:16 UTC

shouldn't I see 16 pending tasks?

I guess that depends on the settings. On my machines (with a low connect-every-x setting) the client will fetch the next workunit just before the end of the previous one. This is in fact the optimal situation for a project like BURP where higher priority sessions may suddenly pop up and we don't want people to queue up too many workunits.

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 7954 - Posted: 25 Mar 2008, 18:31:28 UTC - in response to Message 7953.  

shouldn't I see 16 pending tasks?

I guess that depends on the settings. On my machines (with a low connect-every-x setting) the client will fetch the next workunit just before the end of the previous one. This is in fact the optimal situation for a project like BURP where higher priority sessions may suddenly pop up and we don't want people to queue up too many workunits.


hmm - the error I'm seeing in the logs is
Message from server: (reached per-host limit of 2 tasks)

but I'm seeing 1 task per CPU - which is more than 2 per host, but only 1 per CPU, which is 8 or 16 in some systems



Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 7955 - Posted: 25 Mar 2008, 19:22:21 UTC - in response to Message 7954.  
Last modified: 25 Mar 2008, 19:26:02 UTC

but I'm seeing 1 task per CPU - which is more than 2 per host, but only 1 per CPU, which is 8 or 16 in some systems

You are right, there's an off-by-one error in the scheduler code. I've fixed the configuration file to work around it until the next upgrade of the scheduler. The change will take effect next Monday at the latest.
I'll also have to take a look at that error message... it is confusing as it is right now.

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 7956 - Posted: 25 Mar 2008, 20:42:25 UTC - in response to Message 7955.  

but I'm seeing 1 task per CPU - which is more than 2 per host, but only 1 per CPU, which is 8 or 16 in some systems

You are right, there's an off-by-one error in the scheduler code. I've fixed the configuration file to work around it until the next upgrade of the scheduler. The change will take effect next Monday at the latest.
I'll also have to take a look at that error message... it is confusing as it is right now.


woohoo!! more work! :) LOL

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 7957 - Posted: 25 Mar 2008, 21:25:04 UTC - in response to Message 7956.  

FYI - I checked my MP server and it's saying

reached per-host limit of 3 tasks

maybe the code is looking at sockets versus cores?

DP = 2 CPU sockets (in this case 4 cores per socket) = 8 cores
MP = 4 CPU sockets (in this case 4 cores per socket) = 16 cores



loki
Joined: 7 Feb 07
Posts: 97
Credit: 158,266
RAC: 0
Message 7959 - Posted: 25 Mar 2008, 23:09:19 UTC
Last modified: 25 Mar 2008, 23:23:20 UTC

Just recently, I started seeing 2 running tasks and 4 ready-to-start tasks on my dual-core system.

This appears to be about when it switched in my local time (Eastern -4):
3/25/2008 3:22:08 PM|BURP|Message from server: (reached per-host limit of 3 tasks)


Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 7961 - Posted: 26 Mar 2008, 8:51:08 UTC

Hm, weird. I'll look into it this weekend.

Mark Reiss
Joined: 7 Aug 06
Posts: 21
Credit: 15,526
RAC: 0
Message 7986 - Posted: 29 Mar 2008, 16:32:49 UTC

Hi all: I am now getting only one wu at a time per cpu or core - where I was getting 2 or 3 before!

Mark Reiss


Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 7990 - Posted: 30 Mar 2008, 8:26:12 UTC

I've been looking into this issue over the weekend and the error message should now have been fixed to match what actually happens.

Please note that there are many ways that work can be limited for a host - some are based on the "connect every X" setting, some are based on other things like the maximum number of workunits per core (which is a server configuration).

The maximum number of workunits per core is now 2.

Once your client attempts to download more work when it already has 2 unfinished tasks per core, it will receive the following message:
(reached per-CPU limit of 2 tasks)

Note that finished but not-yet-uploaded-and-accepted tasks count as "in progress" too. This means you may not always have a spare workunit sitting around per CPU but rather have a few extra in total.
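
The rule described above can be sketched roughly as follows. This is only an illustration of the per-core limit as stated in this post, not the actual scheduler code; the function name `can_send_more_work` and its parameters are made up for the example.

```python
def can_send_more_work(n_cores: int, in_progress: int, max_per_core: int = 2) -> bool:
    """Return True if the host is still below its task limit.

    Hypothetical helper: in_progress counts every task the server has not
    yet accepted back, including tasks that are finished but not yet
    uploaded, as described in the post.
    """
    return in_progress < n_cores * max_per_core


# A dual-core host: 2 running, 1 queued, 1 finished but still uploading.
running, queued, awaiting_upload = 2, 1, 1
in_progress = running + queued + awaiting_upload
print(can_send_more_work(n_cores=2, in_progress=in_progress))  # False: 4 of 4 slots used

# A 16-core host with 31 unfinished tasks still has one slot left.
print(can_send_more_work(n_cores=16, in_progress=31))  # True: 31 of 32 slots used
```

The dual-core case shows why a host may have fewer queued tasks than expected: the task that is still uploading occupies a slot until the server has accepted it.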

loki
Joined: 7 Feb 07
Posts: 97
Credit: 158,266
RAC: 0
Message 7992 - Posted: 30 Mar 2008, 19:00:15 UTC

Looks like it's working as expected again.

vaughan
Joined: 12 Mar 05
Posts: 13
Credit: 2,825,598
RAC: 0
Message 8086 - Posted: 9 Apr 2008, 9:46:06 UTC

2 tasks per core seems to be too low, as the tasks don't take long to crunch. Are there any plans to increase this figure in the future? The issue regarding hoarding of cached units is nullified by the tasks having a short deadline.


noderaser
Project donor
Joined: 28 Mar 06
Posts: 516
Credit: 1,567,702
RAC: 0
Message 8097 - Posted: 10 Apr 2008, 4:18:29 UTC

The amount of time it takes for a workunit to complete can vary widely from session to session, and even frame to frame. The 2 units per core limit was put in place, because a computer would go through a bunch of short workunits and ask for a large queue, and then end up downloading a bunch of long workunits that would go way past the deadline.

xPOD
Joined: 1 May 07
Posts: 56
Credit: 56,023,852
RAC: 44
Message 8106 - Posted: 10 Apr 2008, 18:14:01 UTC - in response to Message 8097.  

The amount of time it takes for a workunit to complete can vary widely from session to session, and even frame to frame. The 2 units per core limit was put in place, because a computer would go through a bunch of short workunits and ask for a large queue, and then end up downloading a bunch of long workunits that would go way past the deadline.


agreed - the last session was taking me about 24 minutes per CPU, but some other projects are about 7 hours per task...

I'd like to see more than 2 per CPU myself for my multi-core systems - maybe an algorithm if CPU count > 2 or 4 or 6 or 8 or 16 or (gulp - 24!) then add more cycles

noderaser
Project donor
Joined: 28 Mar 06
Posts: 516
Credit: 1,567,702
RAC: 0
Message 8114 - Posted: 11 Apr 2008, 6:23:00 UTC

I guess I don't see the problem with having two WUs per CPU; you should always have one running, plus one being uploaded/downloaded/queued. As long as you have internet connectivity you should have BURP work, as long as your preferences don't give it a lower priority than other projects.

Achim
Joined: 17 May 05
Posts: 183
Credit: 2,642,713
RAC: 0
Message 8156 - Posted: 16 Apr 2008, 17:40:20 UTC

It depends on how long the WU crunches.

Let's assume you are crunching 1 WU and a second one is waiting.
Now you have set the client to keep work for X days, where X is more than it currently estimates for both WUs.
What happens is that the client requests some more seconds of work.
Now the BURP server says: error, you already have 2 per CPU.
Now the client thinks: error, OK fine, let's wait some time and try again.
After that time the situation is unchanged, so the BURP server responds with an error again.
Now the client waits some more time and tries again.
Unfortunately this wait time increases very fast.
Basically it can happen that, when WU 1 is finished, the wait time is longer than the crunching time of the second WU.
When this happens, the client gets into a situation where both WUs are finished but it is still waiting to connect to the server because of the number of error results.

The good side of this is that the more CPUs you have, the smaller the problem, because it is more likely that one WU finishes before the next retry, so the time until the next contact starts again from 0.
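
To make the scenario above concrete, here is a small simulation for a single CPU holding two WUs. It is a simplified sketch, not BOINC's real backoff logic: the starting delay of 60 seconds and the growth factor of 4 are assumptions chosen for illustration, and a request is treated as successful as soon as the first WU is finished (upload time is ignored).

```python
def idle_seconds(crunch_time: float, first_delay: float = 60.0, factor: float = 4.0) -> float:
    """Seconds one CPU sits idle before the first successful work request.

    Assumed model: the client is refused at t=0, then retries after a delay
    that is multiplied by `factor` on every refusal. Each WU takes
    `crunch_time` seconds and the two WUs run one after the other.
    """
    t = 0.0
    delay = first_delay
    # Every contact made before the first WU finishes is refused.
    while t < crunch_time:
        t += delay
        delay *= factor
    # Both WUs are done at 2 * crunch_time; any contact later than that
    # means the CPU had nothing to do in between.
    return max(0.0, t - 2 * crunch_time)


print(idle_seconds(24 * 60))  # 2220.0 (retries at 60, 300, 1260, 5100 s; work ends at 2880 s)
print(idle_seconds(20 * 60))  # 0.0 (the retry at 1260 s lands just after WU 1 finishes)
```

With these assumed numbers, a 24-minute WU leaves the CPU idle for 37 minutes, while a 20-minute WU causes no idle time at all, so the outcome depends heavily on where the retries happen to fall.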

noderaser
Project donor
Joined: 28 Mar 06
Posts: 516
Credit: 1,567,702
RAC: 0
Message 8172 - Posted: 17 Apr 2008, 6:00:34 UTC

Sounds like there needs to be a maximum scheduler delay.

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 8174 - Posted: 17 Apr 2008, 7:20:00 UTC - in response to Message 8172.  
Last modified: 17 Apr 2008, 7:21:41 UTC

Sounds like there needs to be a maximum scheduler delay.

Wouldn't be a bad idea. I added it to the configuration file (it isn't a new feature, so it's ok to "add" it to the alpha project too).
For now it is set to 1 hour to avoid swamping the server. How low it should be will be determined during the next few tests.
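
As a rough picture of what a 1 hour maximum does to a growing retry delay, consider the sketch below. Only the 3600 second cap comes from this post; the starting delay, the growth factor, and the function name `retry_delays` are assumptions for illustration and do not reflect the real client or server settings.

```python
def retry_delays(n: int, first_delay: float = 60.0, factor: float = 4.0,
                 max_delay: float = 3600.0) -> list[float]:
    """Return the first n retry delays in seconds, each capped at max_delay."""
    delays = []
    delay = first_delay
    for _ in range(n):
        # The cap keeps a long run of refusals from pushing the next
        # contact arbitrarily far into the future.
        delays.append(min(delay, max_delay))
        delay *= factor
    return delays


print(retry_delays(5))  # [60.0, 240.0, 960.0, 3600.0, 3600.0]
```

Under these assumptions the client never waits more than an hour between contacts, however many refusals it has collected.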