[OpenAFS] My salvager was cored by my volume.

Hartmut Reuter reuter@rzg.mpg.de
Thu, 28 Jun 2007 18:44:38 +0200


Harald Barth wrote:
> Yesterday I had a server crash after a HW-RAID box decided to go out
> for lunch without even pretending to have a reason. I restarted with
> fast-restart and then salvaged everything. First pass with
> -orphans ignore:
> 
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans ignore -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built  2007-04-25 
> 06/27/2007 20:07:27 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans ignore)
> 06/27/2007 20:07:28 2 nVolumesInInodeFile 64 
> 06/27/2007 20:07:28 CHECKING CLONED VOLUME 537045986.
> 06/27/2007 20:07:28 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 SALVAGING VOLUME 537045984.
> 06/27/2007 20:07:28 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 totalInodes 3019
> 06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 to 11697 -- deleted
> 06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 to 7491 -- deleted
> 06/27/2007 20:07:29 Vnode 449: link count incorrect (was 2, now 1)
> 06/27/2007 20:07:29 Vnode 453: link count incorrect (was 9, now 8)
> 06/27/2007 20:07:29 Found 2 orphaned files and directories (approx. 4 KB)
> 06/27/2007 20:07:29 Salvaged pdc.vol.module (537045984): 3012 files, 25862 block
> 
> Second pass with -orphans attach:
> 
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans attach -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built  2007-04-25 
> 06/28/2007 15:57:26 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans attach)
> 06/28/2007 15:57:27 2 nVolumesInInodeFile 64 
> 06/28/2007 15:57:27 CHECKING CLONED VOLUME 537045986.
> 06/28/2007 15:57:27 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 SALVAGING VOLUME 537045984.
> 06/28/2007 15:57:27 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 totalInodes 3019
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 451...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 455...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 "Salvage volume group" core dumped!
> 
> How unhappy is my volume (or my salvager), and where is that core?
> 
> Yes, I can access the volume, and no, it is not written to very often.
> 
> haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ ls
> amd64_fc3  i386_fc3  ia64_deb30  man          rs_aix43
> bin        i386_rh9  init        modulefiles  src
> haba@habarber /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ fs lq .
> Volume Name                   Quota      Used %Used   Partition
> pdc.vol.module                50000     25862   52%         69%  
> 
> # vos exa pdc.vol.module -local
> pdc.vol.module                    537045984 RW      25862 K  On-line
>     ruffe.pdc.kth.se /vicepa 
>     RWrite  537045984 ROnly          0 Backup  537045986 
>     MaxQuota      50000 K 
>     Creation    Fri May 16 10:20:22 2003
>     Copy        Wed May  2 21:42:08 2007
>     Backup      Thu Jun 28 02:18:52 2007
>     Last Update Wed Jun  1 14:10:44 2005
>     4874 accesses in the past day (i.e., vnode references)
> 
>     RWrite: 537045984     Backup: 537045986 
>     number of sites -> 1
>        server ruffe.pdc.kth.se partition /vicepa RW Site 
> 
> Any tips and tricks on how to proceed?

The best approach would certainly be to find out why and where it
core-dumped. Compile the salvager with -g and without -O, then either
run it under gdb with -debug (so that it does not fork) or load the
core file into gdb.
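
A minimal sketch of those two approaches as shell functions, assuming a
salvager rebuilt with -g and without -O has been installed at the path
that appears in the SalvageLog above, and reusing the partition, volume
ID and -orphans attach option from the pass that crashed. The helper
names debug_live, debug_core and run, and the DRY_RUN switch, are only
illustrations, not part of OpenAFS; DRY_RUN=1 prints each command
instead of executing it.

```shell
#!/bin/sh
# Sketch only: rerun the crashing salvage under gdb, or inspect a core.
# Assumptions (adjust to your site): SALVAGER, PARTITION and VOLUME_ID are
# copied from the SalvageLog above, and the binary at SALVAGER was rebuilt
# with -g and without -O. debug_live/debug_core/run/DRY_RUN are hypothetical
# helper names for illustration.

SALVAGER=${SALVAGER:-/usr/openafs/libexec/openafs/salvager}
PARTITION=${PARTITION:-/vicepa}
VOLUME_ID=${VOLUME_ID:-537045984}

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: start the salvager under gdb. -debug keeps it from forking,
# so gdb stays attached to the process that actually crashes.
debug_live() {
    run gdb --args "$SALVAGER" "$PARTITION" "$VOLUME_ID" -orphans attach -debug
}

# Option 2: load an existing core file (path given as $1) into gdb.
debug_core() {
    if [ -z "$1" ]; then
        echo "usage: debug_core /path/to/core" >&2
        return 2
    fi
    run gdb "$SALVAGER" "$1"
}
```

Source the file and call debug_live, or debug_core /path/to/core once
the core turns up; at the (gdb) prompt, bt prints the backtrace showing
where the salvager died.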

Hartmut
> 
> Harald.
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info


-- 
-----------------------------------------------------------------
Hartmut Reuter                           e-mail reuter@rzg.mpg.de
					   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------