Request to cancel 310 and resubmit job

mpan3
Joined: 24 Apr 06
Posts: 64
Credit: 29,899
RAC: 0
Message 4826 - Posted: 4 Mar 2007, 18:31:55 UTC

I noticed that 310 is rendered at an octree resolution of 64; it would render MUCH faster (10 times) if set to 512. Also, the memory requirement is set to 1024 MB although each rendering process only takes up 50 MB. At the rate things are going, this job will tie up all the high-end systems for 100 days...
Contribution to BURP total: 0.5% (manually updated)

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4827 - Posted: 4 Mar 2007, 18:35:18 UTC
Last modified: 4 Mar 2007, 18:52:03 UTC

Like I said, my friend is responsible for this work. I'm only the uploader, and the FAQ on how to set things such as the required RAM or the other settings is really incomplete.

Edit:
It's true... It really does render faster. I have tested it. Should I upload the new version, and you delete the old one?
BURP4President

Peter0816
Joined: 8 Jun 06
Posts: 6
Credit: 194,809
RAC: 0
Message 4829 - Posted: 4 Mar 2007, 20:35:18 UTC

I have 1 GB of RAM, but BOINC shows the message that I only have 1023.5 MB, so no work is sent. I think it would run too if you set the minimum RAM to 1000 MB; I think Windows has a "swap" file like Linux.

Speedy
Joined: 25 May 06
Posts: 208
Credit: 676,104
RAC: 0
Message 4830 - Posted: 5 Mar 2007, 0:27:11 UTC - in response to Message 4829.  

I have 1 GB of RAM, but BOINC shows the message that I only have 1023.5 MB, so no work is sent. I think it would run too if you set the minimum RAM to 1000 MB; I think Windows has a "swap" file like Linux.


I have 1 gig of RAM and this is the message that I'm getting:
'3/5/2007 1:11:49 PM|BURP|Message from server: Your computer has 1023.48MB of memory, and a job requires 1024.00MB'
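
The rejection comes down to a simple comparison. A minimal Python sketch, assuming the server checks the reported memory against the job's requirement (the function name is hypothetical, and the real scheduler check may differ):

```python
def can_send(host_mem_mb: float, job_mem_mb: float) -> bool:
    """Strict check: the host must report at least as much memory as the job requires."""
    return host_mem_mb >= job_mem_mb


# Values taken from the server message above.
print(can_send(1023.48, 1024.00))  # False
# With the minimum lowered to 1000 MB, as suggested, the same host would qualify.
print(can_send(1023.48, 1000.00))  # True
```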

mpan3
Joined: 24 Apr 06
Posts: 64
Credit: 29,899
RAC: 0
Message 4831 - Posted: 5 Mar 2007, 0:28:09 UTC

Peter and Speedy, the problem you are describing is known to the developer (Janus); see here: http://burp.boinc.dk/forum_thread.php?id=623#4814


Contribution to BURP total: 0.5% (manually updated)

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 4842 - Posted: 5 Mar 2007, 11:51:33 UTC
Last modified: 5 Mar 2007, 11:57:00 UTC

Since it seems that aborting a session which is in progress is a much-needed feature, I wrote a script that will abort workunits that haven't started yet. This way it is possible to abort a session while people still get credit for the work they have done.
Please note that aborting a session this way does not stop workunits that have already been started (i.e. where at least one result-slot has been given out).
The query to do such a thing is pretty nasty to the database server, so expect a slowdown for the next few hours or so as the system deletes workunits. The good thing about this script is that it is able to run while the rest of the system is online; it doesn't require project downtime.
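
To illustrate the selection rule only, here is a minimal Python sketch. It is not the actual script, which runs as a database query; the `Workunit` class and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    wu_id: int
    session: int
    results_sent: int  # hypothetical field: result slots already given out
    aborted: bool = False


def abort_unstarted(workunits, session):
    """Abort workunits of `session` for which no result slot has been given out yet."""
    aborted_ids = []
    for wu in workunits:
        if wu.session == session and wu.results_sent == 0 and not wu.aborted:
            wu.aborted = True
            aborted_ids.append(wu.wu_id)
    return aborted_ids


wus = [Workunit(1, 310, 0), Workunit(2, 310, 1), Workunit(3, 312, 0)]
print(abort_unstarted(wus, 310))  # [1]
```

Workunit 2 is left alone because one of its result slots has already been handed out, so whoever is crunching it can still get credit.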

Currently aborting session 310.

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4845 - Posted: 5 Mar 2007, 13:05:35 UTC

I hope the new one works faster than this one.
BURP4President

AndyK
Project donor
Joined: 2 Apr 05
Posts: 137
Credit: 20,063
RAC: 0
Message 4851 - Posted: 5 Mar 2007, 15:33:10 UTC - in response to Message 4842.  

...
Please note that aborting a session this way does not stop workunits that have already been started (i.e. where at least one result-slot has been given out).
...
Currently aborting session 310.


Still getting "new" workunits for session 310.
Is that right?

AndyK

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4852 - Posted: 5 Mar 2007, 16:03:08 UTC - in response to Message 4851.  

...
Please note that aborting a session this way does not stop workunits that have already been started (i.e. where at least one result-slot has been given out).
...
Currently aborting session 310.


Still getting "new" workunits for session 310.
Is that right?

AndyK

I think you will get WUs until all parts are deleted. I'm also getting new WUs... I have stopped BURP until 312 is ready for rendering. Most of the WUs are already killed, but there are still 50,000 in the queue.

BURP4President

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 4853 - Posted: 5 Mar 2007, 16:04:56 UTC

There was a slight issue with the session abort script; it may have caused a few problems. Hopefully you should be able to get WUs for the new sessions now.

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4855 - Posted: 5 Mar 2007, 16:28:09 UTC - in response to Message 4853.  

There was a slight issue with the session abort script; it may have caused a few problems. Hopefully you should be able to get WUs for the new sessions now.

No, sorry... I only get WUs for 310.
BURP4President

Stephen Brown
Project donor
Joined: 29 Mar 05
Posts: 9
Credit: 22,773
RAC: 0
Message 4857 - Posted: 5 Mar 2007, 16:36:53 UTC

This is also happening to me. I keep aborting the WUs and I just get lots more of 310.

Fischer-Kerli
Project donor
Joined: 24 Mar 05
Posts: 70
Credit: 78,553
RAC: 0
Message 4858 - Posted: 5 Mar 2007, 16:37:31 UTC
Last modified: 5 Mar 2007, 16:41:31 UTC

I only get the "not enough memory" message (referring to the 1024 MB of session 310), no 312 or 313 units for which there would be enough RAM.

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4859 - Posted: 5 Mar 2007, 16:41:32 UTC - in response to Message 4858.  

I only get the "not enough RAM" message, no 312 units.

That's the problem with the 310 WUs. You have to wait for 312; then you only need 256 MB of RAM.
BURP4President

Fischer-Kerli
Project donor
Joined: 24 Mar 05
Posts: 70
Credit: 78,553
RAC: 0
Message 4860 - Posted: 5 Mar 2007, 16:43:41 UTC - in response to Message 4859.  

I only get the "not enough RAM" message, no 312 units.
That's the problem with the 310 WUs. You have to wait for 312; then you only need 256 MB of RAM.


I know (see edited post above), but they won't come.

Fischer-Kerli
Project donor
Joined: 24 Mar 05
Posts: 70
Credit: 78,553
RAC: 0
Message 4892 - Posted: 7 Mar 2007, 9:06:54 UTC

Yes, they came for a while the day before yesterday. But since Monday evening UTC, it's been back to "not enough memory" again and again. Now it's Wednesday morning UTC, and I'm starting to wonder. The status line shows around 290,000 parts to be sent out, decreasing VERY slowly. Are all of these 310 units? I think there must be a lot of 312 and 313 units in the queue as well: their status pages haven't even begun to show the status bar and "unfinished units" link; they just say "ETA: Unknown". I know that the distribution logic regarding computers with different amounts of RAM is still to be worked on, but even now, "larger" units shouldn't block the queue completely, and it seems to me that this is exactly what's happening. I don't think ANYONE else has gotten a "smaller" unit for over a day now: I have tons of pending "smaller" results, but my credit figure has remained constant for a very long time, meaning that not a SINGLE second result has been received for any of the pending WUs in my account. That would be statistically unlikely if there were a smaller unit now and then between the larger ones.

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4574
Credit: 2,100,463
RAC: 8
Message 4899 - Posted: 7 Mar 2007, 12:24:41 UTC
Last modified: 7 Mar 2007, 12:53:06 UTC

Yes, there's an issue with that, but so far I'm booked up with stuff to do until the weekend.

The problem is this:
The feeder can only hold X units to be sent out at a time. It uses some kind of queue strategy to pick those X units.

The first (default, trivial) strategy to use is FIFO (first in, first out): the first session that was submitted is the first one the feeder fetches units for (regardless of size). However, if a large session comes first, a smaller session submitted later may not fit inside the X units in the feeder because the first session uses up all the slots. Hence a lot of machines will idle until the first session is done and the feeder moves on to the next.
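
A minimal Python sketch of the FIFO case. The names, slot count and memory figures are made up for illustration; this is not the actual feeder code:

```python
from collections import deque


def fill_feeder_fifo(queue, slots):
    """Take the first `slots` units from the queue in submission order."""
    return [queue.popleft() for _ in range(min(slots, len(queue)))]


# Hypothetical workload: a large session (1024 MB units) was submitted
# before a small one (256 MB units). Each unit is (session, required_mb).
queue = deque([("310", 1024)] * 5 + [("312", 256)] * 5)
feeder = fill_feeder_fifo(queue, slots=3)

# A host with 512 MB finds nothing it can take: every slot holds a 310 unit.
host_mem_mb = 512
sendable = [unit for unit in feeder if unit[1] <= host_mem_mb]
print(feeder)    # [('310', 1024), ('310', 1024), ('310', 1024)]
print(sendable)  # []
```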

The second (trivial) strategy is random queuing:
To start out with, everything is fine: it picks some small units and some large ones. However, the larger ones will "take longer to get rid of" and, since the chance of picking a small or a large unit hasn't changed, the feeder will slowly fill up with large units waiting to be sent. If a large unit is sent out, the feeder may be lucky and burn through 3 small units before it gets a large unit again, but at some point it is simply destined to block again. That happens even when units are selected uniformly at random from the set of all waiting units.
This is the current strategy.
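
The drift towards a blocked feeder can be seen in a small simulation. This Python sketch uses a deliberately simplified model: it assumes the extreme case where no connecting host can take a large unit, and a fixed fraction `p_large` of large units among the waiting units. All names and numbers are hypothetical:

```python
import random


def simulate_random_feeder(slots=10, steps=60, p_large=0.5, seed=1):
    """Return the number of large units held in the feeder after each step."""
    rng = random.Random(seed)

    def pick():
        # Uniform random pick from a waiting pool with a fixed share of large units.
        return "large" if rng.random() < p_large else "small"

    feeder = [pick() for _ in range(slots)]
    history = [feeder.count("large")]
    for _ in range(steps):
        # Small units are sent out and their slots refilled at random;
        # large units stay put because no host can take them.
        feeder = [unit if unit == "large" else pick() for unit in feeder]
        history.append(feeder.count("large"))
    return history


history = simulate_random_feeder()
print(history[0], "->", history[-1])
```

In this model the count of large units can never go down, so the feeder ends up holding nothing but large units.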

The third (and first non-trivial) strategy is the one I'm planning to implement really soon:
SFQ (or stochastic fairness queue). The idea behind this queue type is that it must ensure fairness so that each session is able to have units handled in turn, thus preventing any single session from drowning out the rest. It can still block like the random queue in extreme cases, but the risk of that is much, much smaller than before.
However, it introduces a new kind of block (the reverse of a single session drowning out the rest): a big session can be partly blocked by the constant influx of new smaller ones. But that's at least fairer and more efficient than the current random queue, which blocks all smaller sessions instead of part of one big one.
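
The fairness idea can be sketched in Python as a plain round-robin over per-session queues. This is a simplification: SFQ as known from network schedulers additionally hashes flows into a fixed number of buckets, which is left out here. The function name and the data are hypothetical:

```python
from collections import deque


def fill_feeder_fair(waiting, slots):
    """Fill the feeder by taking one unit per session in turn."""
    queues = {}  # session -> deque of units, in order of first appearance
    for session, unit in waiting:
        queues.setdefault(session, deque()).append(unit)

    feeder = []
    while len(feeder) < slots and queues:
        for session in list(queues):
            if len(feeder) == slots:
                break
            feeder.append((session, queues[session].popleft()))
            if not queues[session]:
                del queues[session]  # session has no units left
    return feeder


# One big session (310) and two small ones (312, 313).
waiting = ([(310, i) for i in range(6)]
           + [(312, i) for i in range(2)]
           + [(313, i) for i in range(2)])
print(fill_feeder_fair(waiting, slots=6))
# [(310, 0), (312, 0), (313, 0), (310, 1), (312, 1), (313, 1)]
```

With FIFO, the same six slots would all have gone to session 310.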

Until it is possible to get the real queue strategy we will use the random queue with a backup strategy that temporarily delays units that were not sent for 10 mins and tries to find some other unit to send.
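
A rough Python sketch of that backup strategy, under the assumption that each feeder slot remembers when its unit entered. The function and data layout are hypothetical, not the real feeder:

```python
import random

STALE_AFTER = 10 * 60  # seconds a unit may sit unsent in the feeder


def swap_stale_units(feeder, waiting, now, rng):
    """Replace feeder entries older than STALE_AFTER with random waiting units.

    feeder:  list of (unit, entered_at) tuples
    waiting: list of units not currently in the feeder (mutated in place)
    Returns the new feeder and the units that were delayed; the caller is
    expected to put the delayed units back into `waiting` later.
    """
    new_feeder, delayed = [], []
    for unit, entered_at in feeder:
        if now - entered_at >= STALE_AFTER and waiting:
            delayed.append(unit)
            replacement = waiting.pop(rng.randrange(len(waiting)))
            new_feeder.append((replacement, now))
        else:
            new_feeder.append((unit, entered_at))
    return new_feeder, delayed


feeder = [("310-a", 0), ("312-a", 500)]
waiting = ["313-a"]
feeder, delayed = swap_stale_units(feeder, waiting, now=600, rng=random.Random(0))
print(feeder)   # [('313-a', 600), ('312-a', 500)]
print(delayed)  # ['310-a']
```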

Andre
Joined: 19 Aug 06
Posts: 30
Credit: 582
RAC: 0
Message 4900 - Posted: 7 Mar 2007, 13:02:31 UTC

But please delete the WUs of 310... I don't get any WUs for 312 or 313, just 310... It's annoying to cancel all these WUs manually.
BURP4President

Fischer-Kerli
Project donor
Joined: 24 Mar 05
Posts: 70
Credit: 78,553
RAC: 0
Message 4901 - Posted: 7 Mar 2007, 13:23:13 UTC
Last modified: 7 Mar 2007, 13:23:32 UTC

@ Janus: Thanks for taking the time to post here even when you're overworked. The feedback we get from you is one of the things that make this project really valuable.

@ Andre: I'm not sure if your manually deleting the WUs helps - I could imagine that they just get re-queued so that those who have already submitted results get their credit instantly without Janus having to manually grant it.

Fischer-Kerli
Project donor
Joined: 24 Mar 05
Posts: 70
Credit: 78,553
RAC: 0
Message 4904 - Posted: 7 Mar 2007, 15:09:52 UTC

The queue is (temporarily?) unblocked - small WUs are getting through, and a few pending results have been resolved.