Blender crash caused by multiple instances?


Message boards : Problems and Help : Blender crash caused by multiple instances?

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11647 - Posted: 19 Mar 2013, 6:19:09 UTC

The single threaded blender application has been crashing with the following error whenever there is more than one instance running on my multi processor machine.

Problem Event Name: APPCRASH
Application Name: blender_4.79_windows_x86_64.exe
Application Version: 0.0.0.0
Application Timestamp: 4d5423fe
Fault Module Name: blender_4.79_windows_x86_64.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4d5423fe
Exception Code: 40000015
Exception Offset: 0000000000044338
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 3081
Additional Information 1: b34d
Additional Information 2: b34d2589e137ea366d62fffa8cb3ee21
Additional Information 3: c5e2
Additional Information 4: c5e24e3c1f1c2fb80a016de70f40a61d

The BURP client does recover from the error and continue processing the workunit.
The error occurs every few minutes until I suspend all but one BURP workunit.

To avoid the error I set "No New Tasks" for the BURP project, suspend all workunits then resume each work unit one at a time.
When all BURP workunits have completed I set "Allow New Tasks" for the BURP project.

This error does not occur for sunflower workunits.
____________

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4487
Credit: 2,094,806
RAC: 0
Message 11648 - Posted: 19 Mar 2013, 20:40:57 UTC - in response to Message 11647.

How many is it trying to run? Some of the current workunits are huge - like 12GB of RAM or so.

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11674 - Posted: 26 Mar 2013, 7:17:08 UTC - in response to Message 11648.

Hi Janus,

The scheduler has been trying to run as many as six workunits (CPU limit) at a time although the error has been occurring with as few as two.
Recent sessions that delivered several workunits at once were 1489 and 1566.

I have 32GB of RAM and my preferences restrict memory usage to 50% while the computer is active and 90% while it is idle. Applications are not left in memory while suspended.

I see now that both 1489 and 1566 have large memory requirements and running several workunits simultaneously would exceed either the physical memory of the PC or the memory restriction set by the preferences.
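
The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the 12GB figure is the rough estimate mentioned earlier in the thread, not the stated requirement of sessions 1489 or 1566, and the function name is made up for this example.

```python
import math

def max_simultaneous_workunits(total_ram_gb, allowed_fraction, workunit_ram_gb):
    """Count the workunits whose peak memory fits in the allowed share of RAM."""
    budget_gb = total_ram_gb * allowed_fraction
    return math.floor(budget_gb / workunit_ram_gb)

TOTAL_RAM_GB = 32      # physical memory reported in this post
WORKUNIT_RAM_GB = 12   # assumption: rough per-workunit figure quoted in the thread

# The two memory preferences described above: 50% while active, 90% while idle.
for label, fraction in (("active (50%)", 0.50), ("idle (90%)", 0.90)):
    fits = max_simultaneous_workunits(TOTAL_RAM_GB, fraction, WORKUNIT_RAM_GB)
    budget = TOTAL_RAM_GB * fraction
    print(f"{label}: budget {budget:.1f} GB, room for {fits} workunit(s)")
```

Under these assumed numbers only one such workunit fits within the 50% limit and two within the 90% limit, so six at once (72GB) is far beyond either the preference limits or the physical memory.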

Should the scheduler be able to suspend a workunit or project when the memory limit is exceeded? Does Blender fail before the scheduler is aware of the excess memory usage?

The issue occurs when the scheduler requests a quota of work and receives several workunits. The workunits are marked as high priority and all have very long ETA values. The scheduler will start as many BURP workunits as possible in an attempt to finish the jobs on time.

I am currently using BOINC client version 7.0.28.



Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4487
Credit: 2,094,806
RAC: 0
Message 11675 - Posted: 26 Mar 2013, 8:56:31 UTC - in response to Message 11674.

Well if BOINC starts up 6x12GB workunits on a 32GB machine then the client-side scheduler is clearly flawed somehow. The memory requirements are stated in the workunits up front.

Another thing: it is very beneficial to leave workunits in memory when they are suspended, because otherwise they will restart from 0%. Even for other projects you can save quite a few % by simply suspending them in memory (Windows/Linux will just slowly move them to swap, so it doesn't really matter performance-wise).

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11678 - Posted: 27 Mar 2013, 6:51:31 UTC - in response to Message 11675.
Last modified: 27 Mar 2013, 7:09:22 UTC

I have updated to BOINC client 7.0.58.
The new client seems to handle the situation slightly better than the old one.

I have 4 BURP units still in the queue. I have allowed all to run and the scheduler started all four at once.
The memory usage slowly increased, and when the physical memory was exhausted Blender failed with an APPCRASH error.

The scheduler temporarily suspended the task and removed the high priority status.
The crashed work unit is now queued with status "Waiting to run".

EDIT:
After changing the local preferences I believe the client may be responsible for causing the APPCRASH error.
I reduced the memory restriction to 10% of total memory and all three of the remaining Blender applications crashed.
The scheduler restarted one of the work units as high priority and queued the others with status "Waiting to run".
When I increased the memory limit back to 50% the queue status changed to "Ready to Start" and the units were re-started soon after.

It seems that the APPCRASH is triggered when the scheduler attempts to suspend the Blender process.
It also seems that the BOINC client is using the instantaneous memory usage for its scheduling rather than the pre-determined maximum memory requirement.
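
To make the difference between the two approaches concrete, here is a toy Python comparison. It is not BOINC's actual scheduling code; the function names, the 0.5GB starting footprint and the 12GB peak are all assumptions chosen only to illustrate the behaviour guessed at above.

```python
def admit_by_current_usage(tasks, limit_gb):
    """Start each task whose current footprint still fits under the limit."""
    started, used_gb = [], 0.0
    for name, current_gb, _peak_gb in tasks:
        if used_gb + current_gb <= limit_gb:
            started.append(name)
            used_gb += current_gb
    return started

def admit_by_declared_peak(tasks, limit_gb):
    """Start a task only if its declared peak can be reserved up front."""
    started, reserved_gb = [], 0.0
    for name, _current_gb, peak_gb in tasks:
        if reserved_gb + peak_gb <= limit_gb:
            started.append(name)
            reserved_gb += peak_gb
    return started

# Hypothetical queue: four workunits, each small at start-up but growing to 12 GB.
tasks = [(f"wu{i}", 0.5, 12.0) for i in range(1, 5)]
LIMIT_GB = 16.0  # 50% of 32 GB

print("by current usage:", admit_by_current_usage(tasks, LIMIT_GB))
print("by declared peak:", admit_by_declared_peak(tasks, LIMIT_GB))
```

In this sketch the first policy starts all four workunits because they look small at launch, which matches what I observed, while the second would start only one.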

Manually suspending a work unit does not cause an error.
____________

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11894 - Posted: 18 Jun 2013, 10:19:20 UTC - in response to Message 11678.

I am now on the latest BOINC client beta, 7.1.10, and large work units are still being run simultaneously.
Session 1602 does not recover if it crashes.

Is it possible to add a maximum number of processors setting to the BURP settings for each application some time in the future?

Can anyone suggest an app_info.xml that would achieve the same result for the current applications?

Thanks
____________

Profile DoctorNow
Project donor
Joined: 11 Apr 05
Posts: 392
Credit: 2,168,338
RAC: 3
Message 11895 - Posted: 18 Jun 2013, 14:37:18 UTC - in response to Message 11894.
Last modified: 18 Jun 2013, 14:39:28 UTC

Can anyone suggest an app_info.xml that would achieve the same result for the current applications?

I don't know whether users can do that themselves, but a workunit limit could be set on the server side. Some projects use such a feature.
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
My BOINC-Stats

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11954 - Posted: 29 Jul 2013, 10:27:22 UTC - in response to Message 11895.

I have had mostly Sunflower units recently so it has been less of a problem.

Revisiting this conversation I realised that it was actually the app_config.xml file that I was referring to.

See: http://boinc.berkeley.edu/wiki/Client_configuration
and: http://boinc.berkeley.edu/trac/wiki/ClientAppConfig

I am willing to try this but I would like confirmation of the "short application name" for the Blender and SunflowerBlender(mt) applications.

e.g.:
<app_config>
<app>
<name>Blender</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>

I assume that setting max_concurrent = 1 for sunflower would still use multiple processors as each instance of the app is multi threaded.

Limiting the single threaded application to one instance should at least stop blender from exhausting the system memory when multiple instances are run simultaneously.

Let me know what you think.

Regards,
LumenDan

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 11982 - Posted: 25 Aug 2013, 1:19:13 UTC - in response to Message 11954.

The app_config.xml file is working to restrict the number of Blender instances.

The contents of the file are as follows:
<app_config>
<app>
<name>blender</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>

The name is case sensitive.
The file is placed in the BURP data directory.
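
For anyone who would rather generate the file than type it by hand, a minimal Python sketch follows. The helper name `build_app_config` is made up for this example, and "blender" is the only application name confirmed in this thread; any other name you pass in is your own assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting one application to max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    lines = [
        "<app_config>",
        "  <app>",
        f"    <name>{escape(app_name)}</name>",  # case-sensitive short name
        f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>",
        "  </app>",
        "</app_config>",
    ]
    return "\n".join(lines) + "\n"

config_text = build_app_config("blender", 1)
ET.fromstring(config_text)  # raises ParseError if the text is not well-formed XML
print(config_text, end="")
```

Save the printed text as app_config.xml in the BURP data directory. The script only checks that the XML is well-formed; it cannot tell whether the application name matches what the project actually uses.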

As new BOINC client versions are released I will experiment with increasing the number of instances again.

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4487
Credit: 2,094,806
RAC: 0
Message 11988 - Posted: 31 Aug 2013, 8:42:53 UTC - in response to Message 11982.

Thanks for sharing the app_config.xml. Others may find it very useful too.

LumenDan
Project donor
Joined: 27 May 05
Posts: 57
Credit: 1,149,938
RAC: 364
Message 12017 - Posted: 18 Sep 2013, 13:00:47 UTC - in response to Message 11988.

Being able to restrict the number of instances has helped BURP fit into my computing schedule better.
I do run other projects and I do power down my computer 5 days of the week.
I usually sleep the computer when BURP has unfinished units and suspend work fetch once or twice a week to allow a clean restart before the longer run days.
The combination of fewer simultaneous tasks and recent improvements in work unit time estimates has helped me run BURP with less interaction.

The result has been better utilisation of my computing resources and a more natural flow of BURP work units at this stage of the project development.

I am currently running 2 instances and have only had one unit fail since (Session 1602).

Back on the APPCRASH issue that sparked this thread, could it be related to Bug #41?
Does the concept of suspending a process group apply to Windows versions?

It seems possible that the APPCRASH could be a side effect when only one of several processes is suspended by the client when memory limits are exceeded.
____________

Profile Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4487
Credit: 2,094,806
RAC: 0
Message 12022 - Posted: 21 Sep 2013, 19:47:23 UTC

Yes, there's definitely something odd going on when BOINC asks Glue to abort a workunit. Very often it results in a crash.

