Server needs a kick!



Message boards : Number crunching : Server needs a kick!

Senilix
Joined: 15 Oct 14
Posts: 11
Credit: 822,856
RAC: 0
Message 14616 - Posted: 15 Jul 2016, 9:28:45 UTC

Just to let you know: for the past few hours no new work units have been generated and none have been validated. So I guess the server needs a good kick...

Woodles
Joined: 28 Jan 16
Posts: 5
Credit: 2,672,807
RAC: 4,185
Message 14617 - Posted: 15 Jul 2016, 9:58:41 UTC

I'd agree:

15/07/2016 10:57:17 | BURP | Requesting new tasks for CPU
15/07/2016 10:57:18 | BURP | Scheduler request completed: got 0 new tasks
15/07/2016 10:57:18 | BURP | Project has no tasks available

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4483
Credit: 2,094,806
RAC: 0
Message 14618 - Posted: 15 Jul 2016, 15:06:27 UTC

Yup, looking into it

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4483
Credit: 2,094,806
RAC: 0
Message 14619 - Posted: 15 Jul 2016, 15:57:01 UTC
Last modified: 15 Jul 2016, 16:25:34 UTC

It is back on track. It went down due to a parse error in a data exchange between the BOINC and BURP backends. The system is designed to fail in this situation so that a human being looks at the issue before things move on. Luckily it was nothing bad.

111aaa
Joined: 18 May 16
Posts: 8
Credit: 1,137,690
RAC: 0
Message 14622 - Posted: 16 Jul 2016, 2:03:22 UTC

Out again?

Woodles
Joined: 28 Jan 16
Posts: 5
Credit: 2,672,807
RAC: 4,185
Message 14623 - Posted: 16 Jul 2016, 9:06:37 UTC - in response to Message 14622.

Out again?

Yep :(

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4483
Credit: 2,094,806
RAC: 0
Message 14624 - Posted: 16 Jul 2016, 12:07:39 UTC

Same issue: data in the shared memory segment between BOINC and BURP was only partially filled out. Because the numbers didn't add up, the BURP queue and scheduler manager retried 10 times and then went into panic mode. This stops most of the processing around workunit creation and workunit handling in general.
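For anyone curious what "retry, then panic" looks like in practice, here is a minimal Python sketch. It is not the actual BURP code: `check_with_retries`, `read_counts` and `PanicError` are made-up names, and it assumes the 10 retries mean 10 attempts in total.

```python
class PanicError(RuntimeError):
    """Raised when the check keeps failing and a human should take a look."""


MAX_RETRIES = 10  # retry count mentioned in the post; everything else is assumed


def check_with_retries(read_counts, total_slots, max_retries=MAX_RETRIES):
    """Re-read the slot counts until they add up, or give up and panic.

    read_counts is a hypothetical callable returning (occupied, empty).
    Returns the number of attempts that were needed on success.
    """
    for attempt in range(1, max_retries + 1):
        occupied, empty = read_counts()
        if occupied + empty == total_slots:
            return attempt
    # Deliberately stop instead of guessing, so someone has to investigate.
    raise PanicError(
        f"slot counts still inconsistent after {max_retries} attempts"
    )
```

A transient mismatch is absorbed by the retries, while a persistent one stops processing instead of continuing with numbers that cannot be trusted.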

The BOINC scheduler keeps a list of workunits that are ready to send in a memory segment on the server. BURP monitors this segment and makes sure that there is a good balance of different workunits so that high-mem WUs don't block low-mem WUs etc.
This list is of a certain maximum size - currently around 300-450 slots - more than enough to hold a small cache of all the different kinds of workunits that clients can ask for.
When a client gets sent a WU it is removed from the memory and the memory slot is temporarily empty until a new WU can be prepared and loaded into the slot.
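As a rough illustration of the slot list described above, here is a small Python sketch. It only models the idea; the real list lives in a BOINC shared memory segment whose layout is not shown in this thread. `SlotList`, the `kind` field and the dict-based workunits are all assumptions made for the example.

```python
from collections import Counter


class SlotList:
    """Fixed-size list of ready-to-send workunits (hypothetical model)."""

    def __init__(self, size):
        self.slots = [None] * size  # None marks an empty slot

    def load(self, workunit):
        """Put a workunit into the first empty slot; return its index, or None if full."""
        for i, wu in enumerate(self.slots):
            if wu is None:
                self.slots[i] = workunit
                return i
        return None

    def send(self, kind):
        """Remove and return the first workunit of the given kind.

        The slot stays empty until load() fills it again.
        """
        for i, wu in enumerate(self.slots):
            if wu is not None and wu["kind"] == kind:
                self.slots[i] = None
                return wu
        return None

    def counts_by_kind(self):
        """Count cached workunits per kind, e.g. to check the high-mem/low-mem balance."""
        return dict(Counter(wu["kind"] for wu in self.slots if wu is not None))
```

For example, after loading one "high-mem" and one "low-mem" workunit into `SlotList(4)`, calling `send("low-mem")` returns the low-mem workunit and leaves its slot empty, so a second low-mem request gets nothing until the slot is refilled.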

What has happened twice now is that a safety check has tripped. The part of BURP that monitors this memory segment counts the occupied slots and the empty slots, and it came to the conclusion that the two counts combined no longer add up to the total size of the list.
It turns out there is a third kind of slot: the partially filled (or partially empty) slot.
I'll have to dig into why we see these partial entries in memory more often than usual right now - and where they come from. They don't really seem to do anything bad, though, so this is mostly just erring on the side of caution.
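To make the accounting problem concrete, here is a hedged Python sketch of a three-way slot count. The slot representation is an assumption (a pair of an "in use" flag and a payload); the real shared memory layout is not described in this thread, and `classify` and `tally` are hypothetical names.

```python
from collections import Counter


def classify(slot):
    """Return 'occupied', 'empty' or 'partial' for one (in_use, payload) pair."""
    in_use, payload = slot
    if in_use and payload is not None:
        return "occupied"
    if not in_use and payload is None:
        return "empty"
    # Flag and payload disagree: the slot is only half written or half cleared.
    return "partial"


def tally(slots):
    """Return (occupied, empty, partial) counts for a list of slots."""
    counts = Counter(classify(s) for s in slots)
    return counts["occupied"], counts["empty"], counts["partial"]


slots = [(True, "wu1"), (False, None), (True, None)]
occupied, empty, partial = tally(slots)
# A check that only knows two kinds of slot sees 1 + 1 != 3 here.
two_kind_check_passes = occupied + empty == len(slots)
three_kind_check_passes = occupied + empty + partial == len(slots)
```

With one partial slot in the list, the two-kind check fails while the three-kind check passes, which matches the symptom described above.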

