Sunflower A.1.1.R2

Description

The first scene
This is the right-eye view of the stereo video.
See Session 1114 for more information.

Message boards : Comments and discussion : 1115

DoctorNow
Project donor
Joined: 11 Apr 05
Posts: 403
Credit: 2,189,214
RAC: 7
Message 10723 - Posted: 9 Feb 2011, 14:46:50 UTC
Last modified: 9 Feb 2011, 14:53:49 UTC

Had a problem with this WU.
It went past 100% to 101.35% and stayed there for over 2 hours, with very low CPU usage (up to 10%).
After 2.5 hours with no progress I decided to abort it.
Finally, a Windows popup appeared saying "SunFlowerBlender doesn't work anymore".
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
My BOINC-Stats
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4571
Credit: 2,100,463
RAC: 8
Message 10725 - Posted: 9 Feb 2011, 21:26:40 UTC
Last modified: 9 Feb 2011, 21:28:21 UTC

Weird, I can see that your machine spent around 8 times as much time as the other machines on the same workunit but didn't actually use the time for anything productive. I wonder what went wrong there.

Popup window:
There is a bug currently, where the BOINC portion of Glue will crash when it exits (with a nasty Windows bug report window) if a workunit is cancelled or otherwise forcefully aborted. Exactly why that happens is a good question.

Thanks for the report anyways - I'll be gathering the various issues and have a look at fixing them as we move along with the next sessions.
zombie67 [MM]
Project donor
Joined: 9 Dec 06
Posts: 93
Credit: 2,492,267
RAC: 649
Message 10728 - Posted: 9 Feb 2011, 23:26:17 UTC

Yeah, I got the pop up once too.

Also, I had about 10 tasks from this session error out. The odd thing is that these tasks were crunched successfully by all the rest of the wingmen. This is across 5 machines (XP64 and Win7 64), so it is not a problem specific to a single machine.

http://burp.renderfarming.net/result.php?resultid=6793130

02:42:06 (3276): No heartbeat from core client for 30 sec - exiting

---------------------------
Exception caught: Heartbeat detection indicates that BOINC is unresponsive.
Status: -7
---------------------------
terminate called after throwing an instance of 'Exception'

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
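For readers unfamiliar with the message in the log, here is a minimal sketch of a heartbeat watchdog of the kind it describes. This is not BOINC's or Glue's actual code: `HeartbeatMonitor` and `HeartbeatTimeout` are hypothetical names, and only the 30 second limit is taken from the log line above.

```python
import time


class HeartbeatTimeout(Exception):
    """Raised when no heartbeat has arrived within the timeout."""


class HeartbeatMonitor:
    """Hypothetical helper, not part of BOINC.

    The 30 s default mirrors the "No heartbeat ... for 30 sec" log line.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the monitor can be tested
        self.last_beat = clock()

    def beat(self):
        """Record that the client has just sent a heartbeat."""
        self.last_beat = self.clock()

    def check(self):
        """Raise HeartbeatTimeout if the client has been silent too long."""
        silent_for = self.clock() - self.last_beat
        if silent_for > self.timeout:
            raise HeartbeatTimeout(
                f"No heartbeat from core client for {self.timeout:.0f} sec"
            )
```

In a worker loop, `check()` would be called periodically, and the caller would catch `HeartbeatTimeout` and exit cleanly. The "terminate called after throwing an instance of 'Exception'" line in the log looks like an exception that escaped without being handled, though that is only a guess from the output.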

Reno, NV
Team: SETI.USA

DoctorNow
Project donor
Joined: 11 Apr 05
Posts: 403
Credit: 2,189,214
RAC: 7
Message 10729 - Posted: 10 Feb 2011, 5:45:07 UTC - in response to Message 10725.  
Last modified: 10 Feb 2011, 5:46:13 UTC

Weird, I can see that your machine spent around 8 times as much time as the other machines on the same workunit but didn't actually use the time for anything productive. I wonder what went wrong there.

???
I'm not sure I understand what you mean by that.
If I compare the runtimes of my quad with other machines I don't see a big difference except for the variations of the performance, which are obvious.
For example this WU:
First instance rendered in 996.13 seconds (an Intel Core @ 2.8 GHz)
Second instance rendered in 1526.49 seconds (my AMD X4 620 @ 2.6 GHz)
Third instance rendered in 2024.92 seconds (an AMD Phenom 9350e, don't know at what speed)
Looks pretty normal to me...
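To put numbers on that comparison, here is a quick sketch that divides each runtime by the fastest one. The runtimes are the three quoted above; the function name `relative_to_fastest` is just for illustration.

```python
# Runtimes in seconds, as quoted in the post above.
runtimes = {
    "Intel Core @ 2.8 GHz": 996.13,
    "AMD X4 620 @ 2.6 GHz": 1526.49,
    "AMD Phenom 9350e": 2024.92,
}


def relative_to_fastest(times):
    """Return each runtime divided by the shortest runtime."""
    fastest = min(times.values())
    return {name: seconds / fastest for name, seconds in times.items()}


ratios = relative_to_fastest(runtimes)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The slowest of the three takes roughly twice as long as the fastest, which is nowhere near a factor of 8.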

The only thing I can think of that could slow down my performance is that I'm currently crunching PrimeGrid's CW-sieve on my GPU, which makes the system a little bit sluggish. I will turn it off for a while to see how much the WUs improve.
DoctorNow
Project donor
Joined: 11 Apr 05
Posts: 403
Credit: 2,189,214
RAC: 7
Message 10730 - Posted: 10 Feb 2011, 7:14:20 UTC
Last modified: 10 Feb 2011, 7:21:31 UTC

Okay, report on that:
No, PG affects it very little; the CW-sieve app doesn't use enough CPU to make the WUs take minutes longer...

I watched the performance of the app a bit with the task-manager.
It seems it doesn't use the CPU very efficiently all the time, especially in the start and end phases. In the start phase, which lasts a few minutes and during which the progress bar stays at 0%, only one core is used. After that, in the main phase, all cores render at full load. In the last few minutes usage drops back down to one core or less...
Besides that, I'm surprised that it is the Blender app that shows the CPU usage.
The sunflower-x86_64.exe has 0% usage all the time.
I'm not sure, but shouldn't this app show the usage? I think I remember that it was different on the first session I rendered with that...

Edit:
Checked some of the other WUs again. It seems that SunFlowerBlender runs best and most efficiently on 64-bit Linux systems; this one here rendered my WU very fast, in almost a third of the time!
zombie67 [MM]
Project donor
Joined: 9 Dec 06
Posts: 93
Credit: 2,492,267
RAC: 649
Message 10731 - Posted: 10 Feb 2011, 14:42:40 UTC

Also, FWIW, it looks like the MT app hovers between 80-90% of CPU utilization, on an 8 core machine XP64. It is probably a bit less, since that machine is also running a GPU app, and a couple of non-cpu intensive apps.
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4571
Credit: 2,100,463
RAC: 8
Message 10732 - Posted: 10 Feb 2011, 15:59:44 UTC

Weird, I can see that your machine spent around 8 times as much time as the other machines on the same workunit but didn't actually use the time for anything productive. I wonder what went wrong there.

???
I'm not sure if I understand how you mean that.


The new BOINC clients track both runtime (in real time) and CPU time. The result that you linked indeed confirms what you said about the process being stuck: It used a lot more (real) time than the wingmen yet seemingly failed with almost exactly the same CPU time as what the others had when they finished the units.
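As a rough illustration of the difference, the ratio of CPU time to real (wall-clock) time can flag a task that sat idle. This is only a sketch: the function names, the 0.5 threshold and the example figures are assumptions, not values taken from the BOINC client or from the actual result.

```python
def cpu_to_wall_ratio(cpu_seconds, wall_seconds):
    """Fraction of elapsed real time that was spent on the CPU.

    For a multi-threaded app this can exceed 1.0, because CPU time is
    summed over all cores.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def looks_stalled(cpu_seconds, wall_seconds, threshold=0.5):
    """Heuristic: a very low ratio suggests the task mostly sat idle.

    The threshold is an arbitrary illustrative value.
    """
    return cpu_to_wall_ratio(cpu_seconds, wall_seconds) < threshold


# Illustrative figures only: same CPU time, but 8 times the real time.
normal = looks_stalled(cpu_seconds=1500.0, wall_seconds=1500.0)
stuck = looks_stalled(cpu_seconds=1500.0, wall_seconds=12000.0)
print(normal, stuck)
```

A wingman that finishes normally has a ratio near or above 1, while a task with the same CPU time spread over 8 times the real time drops to about 0.125.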

Your observations about the app and its use of multi-core processing are also correct:
1) The app loads all data into memory (single-threaded)
2) Movement vectors are generated for the previous and next frames (single-threaded)
3) The scene renders (multi-threaded) in multiple layers (causing the progress bar to move from 0 to 100% a couple of times). At the end of each layer the CPU utilisation will slowly drop a bit and then spike back to max when a new layer starts.
4) Blender runs compositing to stitch all the scene layers together (single-threaded)
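
The effect of the single-threaded steps above on overall CPU usage can be estimated with a small sketch. The stage durations below are invented purely for illustration, and the model ignores the gradual drop in utilisation at the end of each layer.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    seconds: float  # wall-clock duration (hypothetical value)
    threads: int    # number of cores kept busy during the stage


def average_utilisation(stages, cores):
    """Busy core-seconds divided by available core-seconds."""
    busy = sum(s.seconds * min(s.threads, cores) for s in stages)
    available = sum(s.seconds for s in stages) * cores
    return busy / available


CORES = 4
stages = [
    Stage("load data", 60, 1),
    Stage("movement vectors", 60, 1),
    Stage("render layers", 840, CORES),
    Stage("compositing", 40, 1),
]

print(f"{average_utilisation(stages, CORES):.0%}")
```

With these made-up durations a quad core averages 88% utilisation, so the cost of the single-threaded steps depends mainly on how short they are relative to the render phase.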
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4571
Credit: 2,100,463
RAC: 8
Message 10733 - Posted: 10 Feb 2011, 16:25:15 UTC
Last modified: 10 Feb 2011, 16:26:54 UTC

It went past 100% to 101.35% and stayed there for over 2 hours, with very low CPU usage (up to 10%).


I just had this happen to me as well. It went to 102% and just did nothing at all - it stalled in the middle of a motion blur pass and just stood there until the timer killed it (which of course brings up the nasty Windows bug window...).
Since there is nothing in the debug output that suggests an issue, I'm afraid this one will be hard to nail down.
DoctorNow
Project donor
Joined: 11 Apr 05
Posts: 403
Credit: 2,189,214
RAC: 7
Message 10734 - Posted: 10 Feb 2011, 16:32:36 UTC - in response to Message 10732.  
Last modified: 10 Feb 2011, 16:33:50 UTC

The new BOINC clients track both runtime (in real time) and CPU time. The result that you linked indeed confirms what you said about the process being stuck: It used a lot more (real) time than the wingmen yet seemingly failed with almost exactly the same CPU time as what the others had when they finished the units.

*slaps forehead* Ohh, now I see, it looks like it was too early this morning and I had no coffee yet, so I interpreted your post a bit wrong (and probably because English is not my native language)... ;-D

1) The app loads all data into memory (single-threaded)
2) Movement vectors are generated for the previous and next frames (single-threaded)
...
4) Blender runs compositing to stitch all the scene layers together (single-threaded)

Hm, these single-threaded steps bother me somehow.
Since it looks like the other cores are not released during these steps and don't do any other work, that is wasted CPU time. This is bad, because the lost time adds up with every additional WU...
But why does the Linux version seem to be so fast? Are the steps different there?
I just found another result here: the topmost computer did the WU in almost a third of the time, although it has the same number of cores and is not much faster than my quad (the third one in the list).
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4571
Credit: 2,100,463
RAC: 8
Message 10736 - Posted: 10 Feb 2011, 18:07:30 UTC - in response to Message 10734.  
Last modified: 10 Feb 2011, 18:09:27 UTC


Hm, these single-threaded steps bother me somehow.
Since it looks like the other cores are not released during these steps and don't do any other work, that is wasted CPU time. This is bad, because the lost time adds up with every additional WU...

Don't worry too much about it - as we move along, more and more parts will probably be made multithreaded. For Sunflower we will be using Blender 2.49b, while the rest of the farm will move to Blender 2.6 when it is released at some point. As far as I know there have been plenty of improvements to the rendering engine since 2.49b!

But why does the Linux version seem to be so fast? Are the steps different there?

No, the steps are exactly the same, but it is a different compiler, a different platform and different I/O strategies. It is still too early to say if it is faster than Windows in general.

A new Windows client should be available today with slightly improved performance and some additional debugging tools to try to figure out what goes wrong with it.