[OpenAFS] Help: intermittent fileservice hangs

Tracy Di Marco White afs-info@gendalia.org
Tue, 03 Feb 2004 19:44:43 -0600


In message <Pine.GSO.4.58-035.0402031737220.29100@johnstown.andrew.cmu.edu>,
Derrick J Brashear writes:
>On Tue, 3 Feb 2004, Tracy Di Marco White wrote:
>
>The below looks an awful lot like a half-reachable client (it can talk to
>you, you can't reply)

I'm not sure that's what is causing my problems, since we see these messages a lot.

Today, with no reports of access problems:
Tue Feb  3 18:52:53 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba91af.63340
Tue Feb  3 18:53:49 2004 CB: RCallBackConnectBack (host.c) failed for host cd828c1.23904
Tue Feb  3 18:54:45 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba9376.7001
Tue Feb  3 18:55:42 2004 ProbeUuid failed for host 81ba4676.7001
Tue Feb  3 18:56:38 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba4213.7001
Tue Feb  3 18:57:34 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba8f12.7001
Tue Feb  3 18:57:42 2004 ProbeUuid failed for host 81ba3435.7001
Tue Feb  3 18:58:38 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba9c04.7001
Tue Feb  3 18:59:35 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba3850.7001
Tue Feb  3 19:00:31 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba743b.7001
Tue Feb  3 19:01:27 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba9c30.7001
Tue Feb  3 19:02:23 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba9367.7001

>> Before I turned up logging, possibly when things were getting bad:
>> Wed Jan 28 16:28:16 2004 CB: RCallBackConnectBack (host.c) failed for host 81badf78.7001
>> Wed Jan 28 16:29:12 2004 CB: RCallBackConnectBack (host.c) failed for host 81ba91af.62671
>> Wed Jan 28 16:29:44 2004 CB: WhoAreYou failed for 81ba8c1e.7001, error -3
>[]
>>
>> a little bit later, when no one could do things if their space was on afs-10:
>> Wed Jan 28 16:55:31 2004 CB: XCallBackBulk failed, host=81ba8c2a.7001; callback list follows:
>> Wed Jan 28 16:55:31 2004 CB: Host 81ba8c2a.7001, file 537818307.412.3015 (part of bulk callback)
>> Wed Jan 28 16:55:31 2004 CB: Host 81ba8c2a.7001, file 537947901.1.1 (part of  bulk callback)
>> Wed Jan 28 16:55:31 2004 BreakDelayedCallbacks FAILED for host 81ba8c2a which IS UP.  Possible network or routing failure.
>> Wed Jan 28 16:55:31 2004 MultiProbe failed to find new address for host81ba8c2a.7001
>> Wed Jan 28 16:55:32 2004 CB: XCallBackBulk failed, host=81ba8c1d.7001; callback list follows:
>
>ok, perhaps a whole network was half-reachable.
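
If a whole network were half-reachable, the failing clients should cluster
in a few subnets.  One way to check is to decode the hex host field in these
messages and group the failures by subnet.  Below is a minimal Python sketch,
assuming the hex field is the client's IPv4 address with the most significant
byte first (so 81ba91af would read as 129.186.145.175) and that /24 is a
useful grouping here.  The regex, function names and script name are
illustrative only, not part of OpenAFS.

```python
import ipaddress
import re
from collections import defaultdict

# Illustrative pattern (not from OpenAFS) for the host forms in these logs:
# "for host 81ba91af.63340", "host=81ba8c2a.7001", "Host 81ba8c2a.7001",
# "host81ba8c2a.7001" and "WhoAreYou failed for 81ba8c1e.7001".
# "cd828c1" has only seven digits, so leading zeros seem to be dropped: {1,8}.
HOST_RE = re.compile(r"(?:host[= ]?|for )([0-9a-f]{1,8})\.(\d+)", re.IGNORECASE)


def decode_host(hex_addr):
    """Treat the hex field as an IPv4 address, most significant byte first.

    Assumption: the fileserver printed the address in network byte order.
    If your server logs the bytes reversed, swap them before converting.
    """
    return ipaddress.IPv4Address(int(hex_addr, 16))


def failures_by_subnet(lines, prefix=24):
    """Map each /prefix network to the distinct client addresses that failed."""
    nets = defaultdict(set)
    for line in lines:
        match = HOST_RE.search(line)
        if match is None:
            # e.g. "BreakDelayedCallbacks FAILED for host 81ba8c2a ..." has no port
            continue
        addr = decode_host(match.group(1))
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        nets[net].add(addr)
    return dict(nets)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python cbsubnets.py /usr/afs/logs/FileLog
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            counts = failures_by_subnet(log)
        # Most-affected networks first.
        for net, hosts in sorted(counts.items(), key=lambda kv: -len(kv[1])):
            print(f"{net}\t{len(hosts)} client(s)")
```

If most failures land in one or two networks, that would support the
half-reachable-network theory; if they are spread all over, the fileserver
itself looks more suspect.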

I'm in the process of moving most of my servers from FDDI to 100 Mb/s
Ethernet, because I get communication failures on vos releases and
I have no network statistics for the FDDI.  afs-10, however, is already
on 100 Mb/s Ethernet.  My network manager sees no network problems when
we're having AFS problems.  I'm also getting communication timeouts when
moving a volume from /vicepc to /vicepd on afs-10.

>> I'm sure this is at least part of my problem, is there a way to fix it
>> or raise the limits?
>
>i don't think it's your problem.

Thoughts?

-Tracy