KWSN Orbiting Fortress
KWSN Distributed Computing Teams forum

SETI = borked again

 
JerWA
Prince

Joined: 01 Jan 2007
Posts: 1497
Location: WA, USA

Posted: Sun May 13, 2007 8:16 am    Post subject: SETI = borked again

Well, you may or may not have noticed that SETI came back online last night, with new work generated and sent for the first time in a while. Yay! Unfortunately, around midnight something broke and everything came to a screeching halt (see the Berkeley network activity link below). No word yet on what's up, but hopefully it's just a small hiccup (i.e. a machine that needs to be smacked) and we'll be rolling again soon.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets

In the meantime, it might be wise to set your clients not to download new work and/or disable network activity if SETI is your only project. Past experience has shown us that pestering the upload server with results tends to invalidate them.
_________________

Stats: [BOINC Synergy] - [Free-DC] - [MundayWeb] - [Netsoft] - [All Project Stats]
mohrorless
Mail Order Goat Bride
Prince

Joined: 09 Oct 2006
Posts: 11206
Location: NYC

Posted: Sun May 13, 2007 12:05 pm

Maybe the new server got overworked too soon and quit. ;)
_________________
Fetch me the Holy Hand Grenade!

#Usa


Keeper of the Unending keg of PGGBs
Taunter in Training
Campaign Manager for Sir Shrubbery



Plus

JerWA
Prince

Joined: 01 Jan 2007
Posts: 1497
Location: WA, USA

Posted: Sun May 13, 2007 1:56 pm

Dunno. It does this on a pretty regular basis (enough so that all the users there have these network activity links, hehe), so I don't think it's anything special caused by the new server.
_________________

Quixote
Duke

Joined: 06 Nov 2006
Posts: 355
Location: Aaaargh!

Posted: Sun May 13, 2007 4:42 pm

Actually, they seem to have been working on it quite heroically through the weekend - let's see what the "Monday blues" bring. I'm leaving the settings just as they are, this thing um here is cranking out "climate" and "little green men" only, for now.

By the way - does "cricket" work with Winders XP?
_________________
tilting windmills, rescuing damsels,etc
mohrorless
Mail Order Goat Bride
Prince

Joined: 09 Oct 2006
Posts: 11206
Location: NYC

Posted: Sun May 13, 2007 6:23 pm

I seem to have several WUs with a status of "Downloading" and 1 "Uploading"... :?
_________________
Tenebra
Prince

Joined: 16 Nov 2006
Posts: 2053
Location: Somewhere in the Outer Rim of a Galaxy far far away

Posted: Mon May 14, 2007 3:09 am

Yep, me too...
_________________

My Greek Blog: The Dark Side of the Force and Other Stories
My English Blog: Snippets and Other Stories from the Net
JerWA
Prince

Joined: 01 Jan 2007
Posts: 1497
Location: WA, USA

Posted: Mon May 14, 2007 10:07 am

If you have work showing as queued for download, you can kind of help that along (to keep work going if you need it) by following the instructions in this post:
http://setiathome.berkeley.edu/forum_thread.php?id=39438

Essentially, change your HTTP proxy in BOINC Manager to 128.32.18.173. Then go to your tasks list, find the ones that are downloading, and make a note of them (usually the last 3 digits are enough). Then go to the transfers window and hit "retry now" for those specific files. They will download immediately. Once finished, remove the HTTP proxy setting, as it will prevent uploads from working whenever the server is finally back online.

I did this and cleared all the pending download work for SETI from my clients, which should keep them busy for a few more hours. Some have also reported that you may actually get MORE pending downloads once you clear them out, so you can keep repeating this cycle to get work if you need it.
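
If you have a lot of stuck downloads, matching the noted digits against the transfers list by eye gets tedious. Here's a rough Python sketch of that matching step only. The function name and the file names are made up for illustration; this doesn't talk to BOINC at all, it just filters a list of names you've copied out yourself.

```python
def pick_transfers(noted_digits, transfer_names):
    """Return the transfer names that end with any of the noted digit strings."""
    endings = tuple(noted_digits)
    # str.endswith accepts a tuple and matches if any ending fits.
    return [name for name in transfer_names if name.endswith(endings)]


# Hypothetical example data: digits noted from the tasks list,
# and names copied from the transfers window.
noted = ["155", "208"]
transfers = ["wu_example.3.155", "wu_example.7.901", "wu_example.9.208"]

selected = pick_transfers(noted, transfers)
print(selected)  # ['wu_example.3.155', 'wu_example.9.208']
```

The names it prints are the ones you'd hit "retry now" on.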
_________________

mohrorless
Mail Order Goat Bride
Prince

Joined: 09 Oct 2006
Posts: 11206
Location: NYC

Posted: Mon May 14, 2007 10:53 am

Thanks JerWA,

It worked great!
_________________
Tenebra
Prince

Joined: 16 Nov 2006
Posts: 2053
Location: Somewhere in the Outer Rim of a Galaxy far far away

Posted: Mon May 14, 2007 12:08 pm

Thanks, worked for me too.
_________________

JerWA
Prince

Joined: 01 Jan 2007
Posts: 1497
Location: WA, USA

Posted: Tue May 15, 2007 6:15 pm

They're still working on it, just posted another update:
Matt Lebofsky wrote:
We had the usual outage today which was mostly a success. The database compressed and was backed up in just over an hour. Normally this takes almost twice as long but the result table has significantly shrunk over the past two weeks (wonder why?). After that we put the new thumper in the closet (we being me, Eric, Jeff, and Kevin - it's a heavy machine). We also rebooted bruno to cleanly pick up a new disk (replacing a failed disk from yesterday). And I rebooted penguin to attach koloth's old tape drive to it (so it could read the classic data tapes for splitting).

That all went well. We also updated all the BOINC-side code to bring the SETI@home project in line with the current BOINC source tree and a few things broke, namely our validators and assimilators. These aren't project critical for the time being, so we're postponing dealing with these until we deal with the real problem at hand: getting people to connect to our data servers.

I think this is the longest outage we've ever had (even though it wasn't a "complete" outage - just no work was available) and we're in a whole new network configuration since the last major outage (new OS, new servers, new ISP, new switches, new router). In short, we're being clobbered by the returning flood of work requests. The major bottleneck is somewhere in the direction of our Hurricane router or bruno. Or at least that's the way it seems right now and there's no guarantee that when we break that dam a new bottleneck won't arise. I don't have the time to spell out what is broken and what we tried and what failed and what yielded unexpected results. Just know we're working on it and we understand most connections are being dropped.

- Matt

_________________

cozycat
Squire

Joined: 14 Nov 2006
Posts: 3

Posted: Tue May 15, 2007 7:40 pm

I'm just going to continue letting my machine do its thing while I keep pestering my wife's primary professor to run S@H on the lab machines. Any good ideas on how to get a positive response? So far I have gotten threats about bad things being done to my person if I keep asking, lol.
JerWA
Prince

Joined: 01 Jan 2007
Posts: 1497
Location: WA, USA

Posted: Wed May 16, 2007 9:07 pm    Post subject: Fast One (May 16 2007)

Matt Lebofsky wrote:
Quick note as I gotta catch a bus..

Wow - what a mess. I think we're in the middle of our biggest outage recovery to date, and it's breaking everything. The good news is we're coming into some newer hardware which we'll get on line to help somehow.

See Eric's thread in the Staff Blog. He's been working overtime getting a new frankenstein machine together to act as another upload/download server and reduce the load on bruno. The scheduling server (galileo) has been choking - I just now moved all that over to bruno as well. So we may retire galileo soon, too. Jeff has been going nuts trying to track down errors in validator/assimilator code so we can get those on line as well. And our old friend "slow feeder query" is back, probably just being aggravated by the heavy load.

Gotta go..

- Matt


And the referenced post...

Eric Korpela wrote:
This one could probably go in the technical news, but since I haven't blogged in a while, I decided to jot it down here.

Following the large outage, bruno's been having some problems keeping up. Lots of dropped connections. I guess most of you noticed that. It's not a lack of hardware this time, just an over-abundance of connection attempts.

Some of the dropped connections were local file-server connections, which causes some of the HTTP processes to wait around, which causes more dropped connections. Changing some of the TCP tuning parameters helped, but didn't solve the problem.

We did some brainstorming before the outage and have come up with some tactics to combat these issues.

We're setting up our router to proxy the SYN/ACK handshakes. That way if we are flooded, the connections will be dropped before they get to bruno. That'll in turn prevent the NFS connections from getting dropped.

We're also getting rid of some configuration remnants from earlier BOINC server code. Currently bruno handles all of the incoming connections and forwards them to other machines when appropriate for uploads and downloads. We can designate other machines as upload or download handlers so that bruno won't have to touch those connections at all.

If that's not enough, we'll set up web servers on some of the other machines and get back to round robin DNS for the upload and download servers.

Well, that's enough typing for now. This weekend, one of my fingers had an unfortunate meeting with the leading edge of a 120mm fan blade inside a server case. Fortunately the fan blade broke and it doesn't look like I'll lose the fingernail. I've learned my lesson: always approach case fans from the trailing edge.

--
Eric
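
For anyone wondering what "round robin DNS" means in Eric's post: one host name gets several address records, and the order they're handed out in rotates with each lookup, so clients that take the first answer get spread across the machines. Below is a toy Python model of just that rotation. It is only a sketch of the general idea, not the project's actual setup; the class name is made up and the addresses come from the reserved 192.0.2.0/24 documentation range.

```python
from collections import deque


class RoundRobinRecordSet:
    """Toy model of one DNS name with several A records served in rotating order."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("need at least one address")
        self._records = deque(addresses)

    def resolve(self):
        """Return the current record order, then rotate it for the next query."""
        answer = list(self._records)
        self._records.rotate(-1)  # move the first record to the back
        return answer


# Hypothetical upload hosts (documentation-range addresses, not real servers).
uploads = RoundRobinRecordSet(["192.0.2.10", "192.0.2.11", "192.0.2.12"])

# A client that always uses the first answer lands on a different host each time.
first_choices = [uploads.resolve()[0] for _ in range(6)]
print(first_choices)
```

Six lookups in a row hit the three hosts twice each, in order. Real resolvers and caches make the spread less even than this, so treat it as the best case.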

_________________

All times are GMT - 5 Hours