Stats: Scraping the server is a bad idea

Message boards : Number crunching : Stats: Scraping the server is a bad idea

Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 684 - Posted: 25 Apr 2005, 15:16:26 UTC

Scraping the server is NOT a good idea, especially when it wastes server bandwidth with hundreds of lookups for the same non-existent user.

There will now be a 60-second delay on lookups of users that do not exist.
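A minimal sketch of how such a penalty could work, assuming a simple in-memory user table. The function `lookup_user`, its injectable `sleep` parameter and the constant name are hypothetical and not taken from the BURP or BOINC code.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical constant matching the 60 second penalty announced above.
MISSING_USER_DELAY_SECONDS = 60


def lookup_user(
    user_id: int,
    users: Dict[int, str],
    sleep: Callable[[float], None] = time.sleep,
    delay: float = MISSING_USER_DELAY_SECONDS,
) -> Optional[str]:
    """Return the user's name, or None after a deliberate delay if the user does not exist."""
    name = users.get(user_id)
    if name is None:
        # Penalize lookups of unknown users so a script hammering the same
        # missing ID is slowed to one request per delay period.
        sleep(delay)
    return name
```

Passing `sleep` as a parameter keeps the penalty testable without actually waiting 60 seconds.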

Please consider using the XML stats that are exported hourly instead of scraping the server for stats. There is no need to waste so many resources on something as simple as getting stats. Thank you.
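Since the export only changes once an hour, a stats script never needs to download it more often than that. A minimal sketch, assuming the stats directory URL given by Thor Prime in this thread; the file name `user.gz` and the helper names `needs_refresh` and `fetch_stats` are hypothetical.

```python
import os
import time
import urllib.request
from typing import Optional

# Assumptions: the one-hour window mirrors the hourly export described above,
# and "user.gz" is a hypothetical file name inside the stats directory.
STATS_URL = "http://burp.boinc.dk/stats/user.gz"
MAX_AGE_SECONDS = 3600


def needs_refresh(path: str, max_age: float = MAX_AGE_SECONDS, now: Optional[float] = None) -> bool:
    """Return True if the local copy is missing or at least max_age seconds old."""
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) >= max_age


def fetch_stats(path: str, url: str = STATS_URL) -> str:
    """Download the export only when the cached copy is stale, then return its local path."""
    if needs_refresh(path):
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `fetch_stats("user.gz")` as often as you like then costs at most one download per hour; every other call reuses the local copy.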
AndyK
Project donor
Joined: 2 Apr 05
Posts: 137
Credit: 20,063
RAC: 0
Message 694 - Posted: 26 Apr 2005, 1:50:45 UTC

What stats exactly do you mean?

The WAP stats, or some of the stats in the menu, etc.?


Andy
Thor Prime
Joined: 2 Mar 05
Posts: 7
Credit: 445
RAC: 0
Message 698 - Posted: 26 Apr 2005, 2:58:50 UTC - in response to Message 694.  

> What stats exactly do you mean?

I think Janus is talking about some stats sites that gather stats data from HTML pages (http://burp.boinc.dk/top_users.php) instead of downloading and decoding the XML stats files (http://burp.boinc.dk/stats/).
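Decoding an XML stats file takes only a few lines. This sketch assumes the usual BOINC export layout of a gzip-compressed `<users>` document containing `<user>` records with `<id>` and `<total_credit>` children; check the actual files before relying on it. The function name `read_user_credits` is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import IO, Dict


def read_user_credits(source: IO[bytes]) -> Dict[int, float]:
    """Map user id to total credit from a gzip-compressed user export."""
    credits: Dict[int, float] = {}
    with gzip.open(source, "rb") as f:
        # iterparse streams the document instead of building the whole tree first.
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == "user":
                user_id = elem.findtext("id")
                credit = elem.findtext("total_credit")
                if user_id is not None and credit is not None:
                    credits[int(user_id)] = float(credit)
                # Drop the finished record's children to keep memory use low.
                elem.clear()
    return credits
```

A single user's credit is then a dictionary lookup on the local file rather than another request to the server.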


Janus
Volunteer moderator
Project administrator
Joined: 16 Jun 04
Posts: 4570
Credit: 2,100,463
RAC: 8
Message 699 - Posted: 26 Apr 2005, 7:09:42 UTC - in response to Message 694.  

> What stats exactly do you mean?

The severely faulty script that looked up the same user many hundreds of times was hitting the userw.php page. But basically anything that doesn't fetch the XML files (or use a third-party site such as BOINC.dk for single user/team stats) is scraping.

Scraping is only a problem when the scraping server uses up all of the BURP server's bandwidth for a long period of time, because that degrades the website's performance for all users.

I'm particularly picky about this issue because the main BURP server doesn't have much bandwidth to waste on that kind of thing. Larger projects like SETI@home can afford it because their network throughput is usually much larger than that of the scraping server, so there is still bandwidth left to serve the real users.
