Message boards : Number crunching : Tasks are not MT

mmonnin

Joined: 21 Mar 17
Posts: 20
Credit: 512,196
RAC: 31
Message 15428 - Posted: 30 Jul 2018, 0:13:58 UTC

The tasks say they are MT (up to 8 cores) but are only using a single thread: CPU time = run time. This happens on both Windows and Linux, on 4-core and 8-core machines, and with anywhere from 1 to 4 tasks running on a 32-thread system.

This was fine like a week ago.
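
As a rough sketch of the "CPU time = run time" check: dividing a task's CPU time by its run time gives the average number of busy cores, so a ratio close to 1 means the task was effectively single-threaded. The function name and the sample figures are made up for illustration and are not taken from any real task.

```python
def average_busy_cores(cpu_time_s: float, run_time_s: float) -> float:
    """Return CPU time divided by wall-clock run time (average busy cores)."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


# Hypothetical numbers, for illustration only.
print(average_busy_cores(3600.0, 3600.0))   # CPU time equals run time
print(average_busy_cores(27000.0, 3600.0))  # CPU time is 7.5x the run time
```

A healthy MT task on an 8-core host would be expected to land well above 1, though the single-threaded start-up and finishing phases pull the average down.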
ID: 15428
DoctorNow
Project donor

Joined: 11 Apr 05
Posts: 403
Credit: 2,183,062
RAC: 0
Message 15430 - Posted: 30 Jul 2018, 9:43:36 UTC

Probably depends on the settings of the session.
I had some from session 3449 overnight that ran fine with MT, but 3450 seems to run slowly on only one core.
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
ID: 15430
Janus
Volunteer moderator

Joined: 16 Jun 04
Posts: 4563
Credit: 2,097,282
RAC: 0
Message 15431 - Posted: 30 Jul 2018, 17:58:04 UTC

Please keep an eye on this over the next few sessions. Recently the plan classes (that contain data about how BOINC should schedule work on clients) and the code relating to them were updated to include a new OpenCL plan-class that we are testing out these days (unfortunately the OpenCL client seems to be broken...). During the change the MT plan class was also upgraded from a max of 64 threads to a max of 128 threads. It is possible that something weird is going on after this change.

Keep in mind that the first few minutes and the last part of a WU will almost always be either single-threaded or use very few cores, as Blender is reading the scene file and performing final postprocessing respectively.

Also keep a keen eye on the type of WU. There have been a few GPU test units recently: they use just one CPU thread but 100% of a GPU, which is similar to what you describe, but they do say that they are GPU tasks and not MT like yours.

I can confirm that the WUs in question unintentionally contain an instruction that limits the number of active threads to 1. Looking into why this is happening.
ID: 15431
Janus
Volunteer moderator

Joined: 16 Jun 04
Posts: 4563
Credit: 2,097,282
RAC: 0
Message 15435 - Posted: 30 Jul 2018, 20:37:55 UTC

I've been trying to narrow this issue down further. It seems BOINC sometimes simply doesn't send the multithreading parameter to Glue3, in which case Glue3 defaults to 1 thread. What is weird is that this can differ even between two instances of the same WU, as in this case, where one host rendered with multithreading and the other didn't. The very top of the debug output shows the parsed parameters, and in one case --nthreads is simply missing.

Still looking into this.
ID: 15435
mmonnin

Joined: 21 Mar 17
Posts: 20
Credit: 512,196
RAC: 31
Message 15436 - Posted: 30 Jul 2018, 20:54:07 UTC

ID: 15436
glennpat

Joined: 23 Sep 10
Posts: 8
Credit: 844,434
RAC: 8
Message 15447 - Posted: 13 Aug 2018, 17:02:09 UTC - in response to Message 15428.

The new batch running right now is still using 1 thread even though BOINC says it has 16 threads.
ID: 15447
Janus
Volunteer moderator

Joined: 16 Jun 04
Posts: 4563
Credit: 2,097,282
RAC: 0
Message 15449 - Posted: 13 Aug 2018, 17:52:47 UTC

Really having a hard time nailing this issue but was finally able to reproduce it locally today.

BOINC clearly states that the WU is multithreaded but does not pass the number of threads that the WU should be started with (this differs from client to client depending on settings) to Glue3. Glue3 defaults to 1 thread and passes 1 thread to Blender.
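
The fallback described above can be illustrated roughly as follows. This is not Glue3's actual code, only a Python sketch of the same pattern: an optional `--nthreads` argument that silently falls back to 1 when the launcher omits it. The helper name `parse_nthreads` is hypothetical.

```python
import argparse


def parse_nthreads(argv):
    """Return the value of --nthreads, or 1 if the argument was not passed."""
    parser = argparse.ArgumentParser(add_help=False)
    # A missing --nthreads raises no error; it simply yields the default of 1.
    parser.add_argument("--nthreads", type=int, default=1)
    args, _unknown = parser.parse_known_args(argv)
    return args.nthreads


print(parse_nthreads(["--nthreads", "8"]))  # launcher passed the thread count
print(parse_nthreads([]))                   # launcher omitted it
```

With a silent default like this, a task that BOINC schedules as multithreaded still starts without error, which would explain why the problem only shows up as slow single-core rendering.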
ID: 15449
Janus
Volunteer moderator

Joined: 16 Jun 04
Posts: 4563
Credit: 2,097,282
RAC: 0
Message 15452 - Posted: 13 Aug 2018, 18:16:10 UTC

Looks like the new plan class may have been missing a flag specifically required to instruct BOINC to pass the number of threads. Really weird that it has been working somewhat for some people and not others.

I've tried cancelling the WU I had that was stuck at 1. The new one I got from the server correctly launched as a proper multithreaded unit. It may be fixed, maybe not, since it seemed a bit random. How does it look at your end?
I don't know if all downloaded WUs are stuck, so I just cancelled everything on my client to get a fresh WU.

You can check very quickly whether it launched correctly by opening the file "stderr.txt" in the slot directory. If it has "--nthreads detected with value" (and a number) among the first 20 lines or so, then it works properly; otherwise it is stuck at 1 thread.
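
For anyone who wants to script that check, here is a minimal sketch. It only looks for the marker text quoted above within the first 20 lines of a given file; the function name is made up, and the path you pass in depends on your own BOINC data directory and slot number.

```python
from itertools import islice
from pathlib import Path

MARKER = "--nthreads detected with value"


def launched_multithreaded(stderr_path, max_lines=20):
    """Return True if the marker appears within the first max_lines lines."""
    with Path(stderr_path).open("r", errors="replace") as fh:
        return any(MARKER in line for line in islice(fh, max_lines))
```

Calling `launched_multithreaded("stderr.txt")` from inside a slot directory returns `True` if the task picked up a thread count and `False` if it is likely stuck at 1 thread.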
ID: 15452
glennpat

Joined: 23 Sep 10
Posts: 8
Credit: 844,434
RAC: 8
Message 15457 - Posted: 13 Aug 2018, 20:30:51 UTC

It is working fine now. Thanks for the fast work.
ID: 15457
mmonnin

Joined: 21 Mar 17
Posts: 20
Credit: 512,196
RAC: 31
Message 15459 - Posted: 13 Aug 2018, 22:01:47 UTC

Most tasks are now MT for me. I have one task that is still single-threaded. Maybe it didn't get cancelled? The next task started on the same PC was MT.
ID: 15459
marsinph

Joined: 13 Apr 18
Posts: 13
Credit: 34,997
RAC: 0
Message 15460 - Posted: 14 Aug 2018, 3:43:10 UTC - in response to Message 15452.

Hello,
Below is the stderr. The task started running 40 minutes ago.
Since then nothing has changed in the slot directory!
It uses all cores (8) but seems to be stuck: the percentage shown (0.099%) has not changed at all.

boinc_init_diagnostics() completed
boinc_init_options() completed
boinc_get_init_data() completed
CPU performance profile completed: 3191100673.361846 fpops, 9231904991.951246 iops reported. p_c is 1532798069.898722
Checking if GPU should be enabled...
No, using CPU
Mapping logical files to physical destinations:
in => in
out.zip => ../../projects/burp.renderfarming.net/ses0000003458frm0000000569prt00001_0_r1207534512_0
./windows_zip.exe => ./windows_zip.exe
./windows_unzip.exe => ./windows_unzip.exe
Project Directory Base => C:\ProgramData\BOINC/projects/burp.renderfarming.net
Unpacking archives:
blender_5.13_windows_x86_64__mt.zip => blender_5.13_windows_x86_64__mt.zip
./windows_unzip.exe -o -d "." blender_5.13_windows_x86_64__mt.zip...done
Creating worker...
Worker constructing...
Worker constructed.
$Id: glue.cpp 1827 2014-08-02 13:28:01Z jbk$
$Id: BOINCHandler.cpp 1824 2014-07-29 12:27:55Z jbk$
$Id: Controller.cpp 1824 2014-07-29 12:27:55Z jbk$
$Id: ProgressMonitor.cpp 1278 2011-01-23 09:22:45Z jbk$
Executing blender.exe -noaudio --factory-startup -y -b in -P clirender.py -- -F PNG -T 16 -t 8 -f 569 0.0 0.0 1.0 1.0
po_r aft0xc1b430
po_r aft0xb4
Created pipes
Child created.
WorkerW orker tthrhreaed ad monsitor utp.
arted
|found bundled python: C:\ProgramData\BOINC\slots\9\2.79\python
|('Observer constructed',)
|('Python Main',)
Application reports 'Booted'
|("Preparing disk cache based on 'C:\\ProgramData\\BOINC\\slots\\9' basedir",)
|('Preparing scenes',)
|('Autodetected rendering engine: CYCLES',)
|('CPU rendering',)
|('Using cycles samples:', 1000)
|('Estimating render properties',)
|('Renderer: ', 'CYCLES')
|('Samples: ', 1000)
|('Total work: ', 8100189.0)
|('Scene parsing done',)
|('Cleaning old files',)
|('No need to delete out',)
|('No need to delete out.png',)
|('Launching Cycles Render',)
|Dependency cycle detected:
| rig depends on TaliUncle.LowRes_proxy through Proxy.
| TaliUncle.LowRes_proxy depends on rig through Driver.
|Dependency cycle detected:
| rig depends on Son.lowRes_proxy through Proxy.
| Son.lowRes_proxy depends on rig through Driver.
|Dependency cycle detected:
| body_variants depends on root through Parent Relation.
| root depends on body_variants through Pose Constraint.
|Dependency cycle detected:
| body_variants depends on root through Parent Relation.
| root depends on body_variants through Pose Constraint.
ID: 15460
