[OpenAFS] "VL_RegisterAddrs rpc failed"

Ty Sarna tsarna@sarna.org
Sun, 30 Jul 2006 15:57:04 -0400


Aha! This (from Ken, at both quote levels) helped:

> >I was thinking of removing the address that is in there.  From what I
> >remember, the problem is that the vlserver thinks another server is
> >the one that has that IP address (I never figured out how I got
> >in that situation); I believe you could probably do it by doing a
> >"vos changeaddr" to change the vlserver's notion of the server's address
> >to something else, starting the fileserver again, and doing vos syncvldb.
> 
> I meant to say, "I think you could probably FIX it by doing ...".
> My fix was more involved (it was more of a panic situation, and all
> I really remember was that after what seemed to be a dozen vos
> changeaddrs, everything was working again).

Here's what was screwed up, and why it didn't look screwed up.
Let's say I have a cell foo.bar with one server, afs.foo.bar, at IP 9.9.9.9.

foo$ vos listvldb
root.cell
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 9.9.9.9 ...

root.afs
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 9.9.9.9 ...
    
Now it looks like they're on the same server. But those two 9.9.9.9's
are *magically different*! Here's how you can see:

foo$ vos changeaddr 9.9.9.9 1.1.1.1
foo$ vos changeaddr 9.9.9.9 2.2.2.2
# wait, why no error? should be no 9.9.9.9 remaining, right?
foo$ vos changeaddr 9.9.9.9 3.3.3.3
# an error here, there are no more evil twins left

Now:

foo$ vos listvldb
root.cell
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 2.2.2.2 ...

root.afs
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 1.1.1.1 ...
    
Aha!
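
For anyone retracing this, here is a rough sketch of tallying how many
site lines point at each address in saved "vos listvldb" output.  It is
an illustration, not part of the original procedure.  The function name
count_sites is made up, and the match on "site <address>" follows the
abbreviated listing above; real output usually reads "server <name>
partition ...", so the sketch accepts either keyword.

```shell
#!/bin/sh
# Tally site lines per address from `vos listvldb` output on stdin
# (or from files given as arguments).
# Assumption: each site line starts with "site" or "server" followed
# by the address or host name; adjust the match for your version.
count_sites() {
    awk '$1 == "site" || $1 == "server" { n[$2]++ }
         END { for (a in n) print n[a], a }' "$@" |
        sort -k1,1nr -k2,2
}

# Sample input copied from the listing above.
count_sites <<'EOF'
root.cell
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 2.2.2.2 ...

root.afs
    RWrite: ... ROnly: ... Backup: ...
    number of sites -> 1
        site 1.1.1.1 ...
EOF
```

Each output line is a count followed by an address, largest count
first.  Note that it counts site lines, so a volume with several sites
on one address is counted once per site.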

Ok, so the fix:

Pick the phony IP that has the most entries, and changeaddr it back to
your real IP.  Now "vos syncvldb" plus "vos syncserv" should fix the
other one.  Trouble is, this takes forever, because it keeps timing out
trying to talk to the phony IP.  I had another box with OpenAFS installed
but not running, so I set it up as an empty server and changeaddr'd the
remaining phony IP to that.  After that, syncvldb/syncserv ran much
faster.  I assume that if you let it run against the phony IP it will
eventually finish and fix things too, but I didn't want to wait that long.

I hope that helps the next poor soul who runs into this.  And it would
still be nice if there were entries in VLLog about this, like FileLog
said there should be... 

-Ty