3468

Message boards : Comments and discussion : 3468

mmonnin
Joined: 21 Mar 17
Posts: 18
Credit: 508,435
RAC: 2
Message 15482 - Posted: 18 Sep 2018, 9:37:43 UTC

All tasks are 'Completed, waiting for validation' even though all tasks have been completed. :(
ID: 15482 · Rating: 0
KubeRoot
Joined: 6 Apr 15
Posts: 36
Credit: 452
RAC: 0
Message 15484 - Posted: 18 Sep 2018, 17:45:19 UTC

I hope the validator hasn't broken with the high resolution and EXR :/

That said, that might indeed be the cause: the scene is very high resolution and also uses a format with large file sizes. Hopefully Janus will figure it out, though.
ID: 15484 · Rating: 0
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4556
Credit: 2,097,282
RAC: 0
Message 15485 - Posted: 18 Sep 2018, 20:06:17 UTC
Last modified: 18 Sep 2018, 20:07:42 UTC

Yeah, this one is definitely a bit of a mouthful: each part is using almost 12GB of RAM while being checked. Previously we have been running multiple validators simultaneously, but in this case that is a really bad idea because the server simply runs out of memory.
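
To make the memory constraint concrete, here is a small arithmetic sketch (not BURP's actual scheduling code). The 12 GB per-part figure comes from the post above; the server's total RAM and the amount reserved for other services are hypothetical values chosen only for illustration.

```python
def max_concurrent_validators(total_ram_gb, per_validator_gb, reserve_gb=0.0):
    """Return how many validators fit in RAM at once (never negative)."""
    if per_validator_gb <= 0:
        raise ValueError("per_validator_gb must be positive")
    usable = total_ram_gb - reserve_gb
    return max(0, int(usable // per_validator_gb))


# Hypothetical server: 16 GB total, 2 GB kept free for the web server and database.
print(max_concurrent_validators(16, 12, reserve_gb=2))  # 1
```

Under these assumed numbers only one validator fits at a time, which matches the observation that running several in parallel exhausts memory.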

It is probably going to be even worse when it has to stitch it all together.

Switching the session to "best effort" mode, which means some results will be marked "invalid" for now, but they will be fixed and credited later.

I'll have another look at this tomorrow evening.
ID: 15485 · Rating: 0
KubeRoot
Joined: 6 Apr 15
Posts: 36
Credit: 452
RAC: 0
Message 15486 - Posted: 18 Sep 2018, 20:12:08 UTC

I feel kinda bad with my scene causing so much trouble, especially since at this point it might've been faster to just render locally with the RAM issues :/

At the very least, there's nothing that would've been running at the moment, right?
ID: 15486 · Rating: 0
mmonnin
Joined: 21 Mar 17
Posts: 18
Credit: 508,435
RAC: 2
Message 15487 - Posted: 18 Sep 2018, 22:20:33 UTC

How is only 1 user valid and 3 others invalid with a 5th in progress? Doesn't a valid task have to match someone else to be valid?
https://burp.renderfarming.net/workunit.php?wuid=3298327
https://burp.renderfarming.net/workunit.php?wuid=3298324

Meanwhile, this workunit has 4 valid users. I can understand this one, since more than the normal number of tasks were sent out per WU.
https://burp.renderfarming.net/workunit.php?wuid=3298357

Site seems slower than normal ATM.
ID: 15487 · Rating: 0
KubeRoot
Joined: 6 Apr 15
Posts: 36
Credit: 452
RAC: 0
Message 15489 - Posted: 19 Sep 2018, 18:41:24 UTC

And now the whole thing is down, at least the website part :/

The validation was finishing too, so I wonder if it happened when it was trying to stitch all the parts together.

I suppose BURP simply wasn't prepared for this kinda stuff? It feels like it should be possible to handle it, if the individual segments didn't have the full resolution, and the validator and stitching used some special file handling to not load it all at once, but rather take data in as it goes, at least in cases where it's possible.

Hopefully nobody wanted to submit a project in this time, since that'd look pretty awful to them :/

And if it's too much of an issue, as I stated in the description, this isn't important, I understand if this simply cannot go through, though I'd be happy if I could at least get the individual segments to try to stitch it together myself.


Whatever happens though, thank you for your work, Janus, I (and presumably everyone here) really appreciate you keeping this thing going, even when crap like this happens.
ID: 15489 · Rating: 0
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4556
Credit: 2,097,282
RAC: 0
Message 15490 - Posted: 19 Sep 2018, 20:32:54 UTC
Last modified: 19 Sep 2018, 20:54:13 UTC

Streaming the images in a tile-based fashion is actually the way we do it normally for this exact reason.
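
A minimal sketch of what a streaming comparison can look like, assuming two uncompressed 8-bit images of identical dimensions stored as raw bytes. The function name and the raw format are assumptions for illustration only; this is not the BURP validator, and it reads bands of rows, which is the simplest form of tiling.

```python
import io


def max_abs_diff_streaming(stream_a, stream_b, row_bytes, rows_per_band=64):
    """Compare two equally sized raw 8-bit images one band of rows at a time.

    Only one band per image is held in memory, so peak usage depends on
    rows_per_band rather than on the full image height.
    """
    band_bytes = row_bytes * rows_per_band
    worst = 0
    while True:
        chunk_a = stream_a.read(band_bytes)
        chunk_b = stream_b.read(band_bytes)
        if len(chunk_a) != len(chunk_b):
            raise ValueError("images differ in size")
        if not chunk_a:  # both streams exhausted
            return worst
        band_worst = max(abs(a - b) for a, b in zip(chunk_a, chunk_b))
        worst = max(worst, band_worst)


# Two tiny fake "images": 10 rows of 10 bytes, differing in the last byte.
image_a = io.BytesIO(bytes([10] * 100))
image_b = io.BytesIO(bytes([10] * 99 + [13]))
print(max_abs_diff_streaming(image_a, image_b, row_bytes=10, rows_per_band=2))  # 3
```

The same loop works on files opened in binary mode, since it only relies on `read()`.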

OpenEXR and OpenEXR multilayer support is still a bit experimental since it relies on a 3rd party piece of software to load and manipulate the files rather than the native support we have for PNG. Unfortunately it has to load an entire colour layer at once in order to allow us to inspect it - at least for now - and that just takes up a whole lot of memory.

There's something wrong with how our current code disposes of the colour layers once they have been used: some memory does not get released immediately. It is not a memory leak as such, but rather a question of needing to get rid of the previous layer before loading the next.
The stitcher definitely has the same issue.
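
The idea of releasing the previous layer before loading the next can be sketched as follows. This is an illustration in Python, not BURP's code: `load_layer` and `inspect_layer` are hypothetical callables standing in for the third-party EXR loader and the validator's checks.

```python
def process_layers(layer_names, load_layer, inspect_layer):
    """Inspect layers one by one, keeping at most one loaded at a time.

    load_layer and inspect_layer are hypothetical callables supplied by
    the caller; inspect_layer must return a small summary, not the layer.
    """
    results = {}
    layer = None
    for name in layer_names:
        # Drop the reference to the previous layer BEFORE loading the next.
        layer = None
        layer = load_layer(name)
        results[name] = inspect_layer(layer)
    return results
```

In CPython, without the `layer = None` line the old layer would stay referenced while `load_layer` builds the new one, so two layers would briefly be resident at once. Whether the same pattern applies to BURP depends on its language and loader, which the thread does not specify.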

Will have to dig a bit deeper.

"I feel kinda bad with my scene causing so much trouble"

No need to!

This is an interesting test, since the validator system is currently being rewritten to better handle Cycles (and similar physically based renderers with noisy results), and it would be nice to be able to handle large frames too, even when they are EXR.

Fun fact of the day: If you allocate enough RAM then "top", the Linux task manager, will show resource allocation in terabytes... never seen that before.
ID: 15490 · Rating: 0
KubeRoot
Joined: 6 Apr 15
Posts: 36
Credit: 452
RAC: 0
Message 15492 - Posted: 19 Sep 2018, 23:39:17 UTC

The session claims to be on 127/128, but no actual parts/workunits, or whatever I should be calling them, are reported as unfinished when I open the detailed view.

Is this a bug or just a result of things being stopped while you investigate?
ID: 15492 · Rating: 0
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4556
Credit: 2,097,282
RAC: 0
Message 15493 - Posted: 20 Sep 2018, 17:33:00 UTC - in response to Message 15492.  

Yes, you are perfectly right, stitching of this session has been manually stopped. Normally it happens as part of the last workunit to return to the server but in this case it will take quite some time before we can proceed.
ID: 15493 · Rating: 0
Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4556
Credit: 2,097,282
RAC: 0
Message 15503 - Posted: 24 Sep 2018, 18:43:12 UTC

Putting this session on hold until the new validator/stitcher code has been deployed
ID: 15503 · Rating: 0
JZD
Joined: 30 Dec 11
Posts: 100
Credit: 3,487,889
RAC: 0
Message 15511 - Posted: 22 Oct 2018, 8:50:37 UTC

What happened? Why is it still not rendered?
ID: 15511 · Rating: 0
KubeRoot
Joined: 6 Apr 15
Posts: 36
Credit: 452
RAC: 0
Message 15512 - Posted: 11 Dec 2018, 11:24:43 UTC - in response to Message 15511.  

You can see in the past comments, but the gist of it is:

The render is in high resolution in OpenEXR, meaning the data takes up a lot of space.
The render was split up into parts, which ended up with each individual part being successful.
However, when the time came to merge all of the parts into a single image, the stitcher simply could not handle it.
From what I understood, it comes down to using 3rd party software for EXR handling, which, combined with potentially suboptimal code, leads to straight up running out of memory.

Because of all this, Janus put it on hold until new code which can handle it is deployed.

I might be way too late in telling you this, but perhaps it'll help somebody who's confused.
ID: 15512 · Rating: 0
JZD
Joined: 30 Dec 11
Posts: 100
Credit: 3,487,889
RAC: 0
Message 15513 - Posted: 13 Dec 2018, 18:19:45 UTC - in response to Message 15512.  

Thank you for a brief explanation of the current situation.
ID: 15513 · Rating: 0