commit 2ab21bd6c0bbcc0e663c3b8b9521724316f0a611
Author: Benjamin Kaduk
Date:   Thu Mar 18 18:50:35 2021 -0700

    Make OpenAFS 1.9.1

    Update version strings for the second 1.9.x development release.

    Change-Id: I318ff00f02f618e0a25571a3c957ae6a6500e65c
    Reviewed-on: https://gerrit.openafs.org/14560
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 8978f6a56b1ced2e8c75dd4feda7b8d602231268
Author: Michael Meffie
Date:   Thu Feb 18 18:35:33 2021 -0500

    Update NEWS for OpenAFS 1.9.1

    Change-Id: I20c23a3d0a84491c1eb4b9c36aee62726fb0b4e9
    Reviewed-on: https://gerrit.openafs.org/14539
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit f8f3532bf5e0f3e3bb0e09355ddf7c751fae246d
Author: Mark Vitale
Date:   Thu Mar 11 17:34:29 2021 -0500

    fstrace: remove common dead code

    Previous commits removed dead code from both fstrace.c and afs_icl.c.

    Now remove anything from config/icl.h that is no longer needed.

    No functional change is incurred by this commit.

    Change-Id: Ibdad10ec4c91cd8c2d3fbd637354357f05ac2621
    Reviewed-on: https://gerrit.openafs.org/14556
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit ba58d9912cff07a6f2af7275017cf70115f1a88d
Author: Mark Vitale
Date:   Thu Mar 11 15:36:54 2021 -0500

    afs: remove dead ICL (fstrace) code

    The ICL code (afs/afs_icl.c) which supports fstrace includes a number of functions that have been dead code since the original IBM code import. Some of these seem to have been intended to support fine-grained event tracing, but the implementation was never completed.

    Remove the dead code.

    No functional change is incurred by this commit.

    Change-Id: If4d6d993175df57d4c5d827ab178ed3ba0bc7ed8
    Reviewed-on: https://gerrit.openafs.org/14555
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit d60ca9e08f83e2ee12c4bb2f10ca674190b900d2
Author: Mark Vitale
Date:   Thu Mar 11 15:19:58 2021 -0500

    fstrace: remove dead code

    Numerous functions in venus/fstrace.c with names icl_* have been dead code since the original IBM code import.
    Furthermore, many of them have similar implementations in afs/afs_icl.c with names afs_icl_*.

    Remove the dead code.

    No functional change is incurred by this commit.

    Change-Id: I3943a9cf333c4044c877b46e7b2eec4285358c18
    Reviewed-on: https://gerrit.openafs.org/14554
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit c49d383f99969a98da34accf8666a5f3ae6c98d8
Author: Cheyenne Wills
Date:   Fri Mar 12 12:29:57 2021 -0700

    bozo: Fix memory leak, check for malloc failures

    While reading the BosConfig file, the buffer obtained to hold the notp (notify) parameter is never freed. Reading the BosConfig is only done once at bosserver start up, so this is a one-time memory allocation.

    There are no checks for malloc failures.

    Release the notp buffer and add checks for memory allocation errors.

    Change-Id: Iffcb0db12f983a6a6d6a810a98be30152fa73c89
    Reviewed-on: https://gerrit.openafs.org/14551
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit 1bd68506be3243c5670aaf53798b2e4e715d4c8b
Author: Cheyenne Wills
Date:   Fri Mar 5 16:31:03 2021 -0700

    Linux 5.12: Add user_namespace param to inode ops

    The Linux commits: "fs: make helpers idmap mount aware" (549c72977) and "attr: handle idmapped mounts" (2f221d6f7) that were merged into Linux-5.12-rc1 cause a build failure when creating the kernel module.

    Several functions within the inode_operations structure had their signature updated to include a user_namespace parameter. This allows a filesystem to support idmapped mounts.

    OpenAFS only implements some of the changed functions.

        LINUX/vnodeops function    inode_operation
        =====================      ===============
        afs_notify_change          setattr
        afs_linux_getattr          getattr
        afs_linux_create           create
        afs_linux_symlink          symlink
        afs_linux_mkdir            mkdir
        afs_linux_rename           rename
        afs_linux_permission       permission

    Update the autoconf tests to determine if the Linux kernel requires the user_namespace structure for inode_operations functions. If so, define a generic "IOP_TAKES_USER_NAMESPACE" macro.
    Update the above vnodeops functions to accept a 'struct user_namespace' parameter.

    When using the 'setattr_prepare' function a user namespace must now be provided. In order to provide compatibility as a non-idmapped mount filesystem the initial user namespace can be used. With OpenAFS, the initial user namespace obtained at kernel module load time is stored in a global variable 'afs_ns'.

    Update the call to setattr_prepare to pass the user namespace pointed to by the 'afs_ns' global variable.

    Update calls to setattr to pass the user namespace pointed to by the 'afs_ns' global variable.

    Notes: The changes introduced with Linux 5.12 allow a filesystem to support idmapped mounts if desired. This commit does not implement support for idmapped mounts, but will continue to use the same initial user namespace as prior to Linux 5.12.

    With Linux 5.12 the following autoconf checks fail:

        HAVE_LINUX_INODE_OPERATIONS_RENAME_TAKES_FLAGS
        HAVE_LINUX_SETATTR_PREPARE
        IOP_CREATE_TAKES_BOOL
        IOP_GETATTR_TAKES_PATH_STRUCT
        IOP_MKDIR_TAKES_UMODE_T

    The new macro 'IOP_TAKES_USER_NAMESPACE' covers the cases where these macros were used.

    Change-Id: Id450d5c716137340ed20af5531c0cd756e4435cd
    Reviewed-on: https://gerrit.openafs.org/14549
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 12ae2beeeb172cebdfa24d5ea149f73fd85541f8
Author: Cheyenne Wills
Date:   Mon Mar 8 09:22:04 2021 -0700

    Linux: Create wrapper for setattr_prepare

    Move call to setattr_prepare/inode_change_ok into an osi_compat.h wrapper called 'afs_setattr_prepare'. This moves some of the #if logic out of the mainline code.
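The wrapper pattern described in the two Linux commits above can be sketched in userland C. This is only an illustration under stated assumptions: the struct definitions and the three stub_* functions are hypothetical stand-ins for the kernel's types and for setattr_prepare/inode_change_ok, and the body of afs_setattr_prepare here is a guess at the shape of the wrapper, not the code in osi_compat.h. Only the macro names and 'afs_ns' come from the commit messages.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel types (the real ones are in Linux headers). */
struct user_namespace { int id; };
struct inode { int mode; };
struct dentry { struct inode *d_inode; };
struct iattr { int ia_valid; };

/* Hypothetical stubs for the three kernel entry points. Each returns a
 * distinct tag so a caller can see which variant the wrapper selected. */
int stub_setattr_prepare_ns(struct user_namespace *ns, struct dentry *dp,
                            struct iattr *ap)
{
    (void)ns; (void)dp; (void)ap;
    return 3;
}

int stub_setattr_prepare(struct dentry *dp, struct iattr *ap)
{
    (void)dp; (void)ap;
    return 2;
}

int stub_inode_change_ok(struct inode *ip, struct iattr *ap)
{
    (void)ip; (void)ap;
    return 1;
}

/* Stand-in for the initial user namespace saved at module load time. */
struct user_namespace initial_ns = { 0 };
struct user_namespace *afs_ns = &initial_ns;

/* All of the #if logic lives here, so callers need no preprocessor checks. */
int afs_setattr_prepare(struct dentry *dp, struct iattr *newattrs)
{
#if defined(IOP_TAKES_USER_NAMESPACE)
    return stub_setattr_prepare_ns(afs_ns, dp, newattrs);
#elif defined(HAVE_LINUX_SETATTR_PREPARE)
    return stub_setattr_prepare(dp, newattrs);
#else
    return stub_inode_change_ok(dp->d_inode, newattrs);
#endif
}
```

Built with neither macro defined, the wrapper falls through to the oldest variant; defining IOP_TAKES_USER_NAMESPACE selects the namespace-aware call without touching any caller.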
    Change-Id: Ie17cf4c645d754c9e9efd8a603f1bc752d07cf36
    Reviewed-on: https://gerrit.openafs.org/14548
    Tested-by: BuildBot
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk

commit bdd4a0c78b1acaf1c947ca53d16159ef95cc9840
Author: Andrew Deason
Date:   Wed Jan 1 17:09:24 2020 -0600

    FBSD: Avoid recursive osi_VM_StoreAllSegments lock

    Currently, osi_VM_StoreAllSegments calls vget() for the given vnode, which requires locking the vnode. However, the vnode should already be locked. For example, when called from the close syscall, we reach this function via: vn_close1 -> afs_vop_close -> afs_close -> afs_StoreOnLastReference -> afs_StoreAllSegments -> osi_VM_StoreAllSegments. This causes a panic like so:

        kernel: panic: lockmgr_xlock_hard: recursing on non recursive lockmgr 0x[...] @ /usr/src/sys/kern/vfs_subr.c:2730

    We can also reach this code path from the BOP_STORE background operation (BStore -> afs_StoreOnLastReference -> afs_StoreAllSegments -> osi_VM_StoreAllSegments), initiated from afs_close(), which has the vnode locked. In this case, we won't be recursively locking the vnode, since the process calling afs_close() is the one that holds the lock, and the background thread is the process trying to lock the vnode again. So we'll just deadlock.

    From the comments in this function, it seems like locking the vnode at all in here is unnecessary, since the vnode should be locked from the higher-level functions anyway. So just skip the vget and all of the related looping retry logic. As a result, this function can now become somewhat simplified.

    Change-Id: Ic5a18de46e51dc86190207163ad0fe73bc03cbd7
    Reviewed-on: https://gerrit.openafs.org/14000
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 14cbd02b8a1a4f1d3c30dd4fb2864d35f39a95eb
Author: Tim Creech
Date:   Thu Aug 29 21:35:36 2019 -0400

    FBSD: Accommodate 12.0's 64-bit inodes

    In FreeBSD 12 (see: https://reviews.freebsd.org/rS318736), the layout of struct dirent changed to allow for 64-bit inodes and a few other changes.
    Update our struct min_direct to accommodate, to allow our readdir() results to be accurate. Without this, readdir() can yield garbage entries, due to the mismatch in the structure definitions.

    Change-Id: I36c2bf1f35b4d1ab61a2b4d51da7514827b3551b
    Reviewed-on: https://gerrit.openafs.org/13854
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit c0b7367253eb6c346d577e099a0b0172d4d24ff3
Author: Andrew Deason
Date:   Fri Mar 5 22:20:35 2021 -0600

    dir: Explicitly 'make all' in src/dir/test

    Currently, we 'cd test' and then just run 'make', which makes the first target specified in the Makefile. On some platforms (FreeBSD), this results in 'make' trying to build '%.c', which of course we cannot do, since that's a pattern rule, and so 'make' fails.

    To fix this, just 'make all' explicitly, to make the intended targets in src/dir/test.

    Change-Id: Icbbf60c9c163c24fbbed01c754c4f1eefeae6b78
    Reviewed-on: https://gerrit.openafs.org/14550
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit cc1fd6a93d1475867ff2e88fb029a898424372d0
Author: Andrew Deason
Date:   Sat Apr 25 20:56:01 2020 -0500

    FBSD: Lock vm object before vm_page_undirty

    We must write-lock the underlying vm object before calling vm_page_undirty; otherwise vm_page_undirty asserts if INVARIANTS is defined.
    For example:

        kernel: panic: Lock vm object not exclusively locked @ /usr/src/sys/vm/vm_page.c:4487
        kernel:
        kernel: cpuid = 0
        kernel: time = 1587858280
        kernel: KDB: stack backtrace:
        kernel: #0 0xffffffff80bf0c07 at kdb_backtrace+0x67
        kernel: #1 0xffffffff80ba7f8d at vpanic+0x19d
        kernel: #2 0xffffffff80ba7d73 at panic+0x43
        kernel: #3 0xffffffff80ba3a7e at __rw_assert+0x17e
        kernel: #4 0xffffffff828da525 at vm_page_undirty+0x15
        kernel: #5 0xffffffff828da33e at afs_vop_putpages+0x36e
        kernel: #6 0xffffffff811ef0ae at VOP_PUTPAGES_APV+0x8e
        kernel: #7 0xffffffff80ef4c2d at vnode_pager_putpages+0x7d
        kernel: #8 0xffffffff80ee77cf at vm_pageout_flush+0xff
        kernel: #9 0xffffffff80edd1b9 at vm_object_page_collect_flush+0x239
        kernel: #10 0xffffffff80edce99 at vm_object_page_clean+0x179
        kernel: #11 0xffffffff828d681c at osi_VM_StoreAllSegments+0x18c
        kernel: #12 0xffffffff828508cd at afs_StoreAllSegments+0x9d
        kernel: #13 0xffffffff8287ae0e at afs_StoreOnLastReference+0x17e
        kernel: #14 0xffffffff8287c3a5 at afs_close+0x245
        kernel: #15 0xffffffff828d7766 at afs_vop_close+0x166
        kernel: #16 0xffffffff811eb7a8 at VOP_CLOSE_APV+0x88
        kernel: #17 0xffffffff80c80ba3 at vn_close1+0xe3

    So, lock the vm object before undirtying our pages in afs_vop_putpages.

    Change-Id: Ifd047e3caf8c2b3e624aaf2bbdb1235a8c38a414
    Reviewed-on: https://gerrit.openafs.org/14162
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit e50e5ede55497b0c02647d21905f4134919fbf05
Author: Tim Creech
Date:   Thu Aug 29 22:12:41 2019 -0400

    FBSD: Use VM_CNT_INC/VM_CNT_ADD on FreeBSD 12

    r317061 changed where v_vnodein &c are stored. Use the new VM_CNT_INC/VM_CNT_ADD macros when available to accommodate.
    Change-Id: I576e333ebdf9e1c6ebb14ff1a1af4c3ad89faa47
    Reviewed-on: https://gerrit.openafs.org/13859
    Tested-by: BuildBot
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk

commit 81ea654494f5c90f67eb54adbb722a95e0d11d82
Author: Benjamin Kaduk
Date:   Fri Sep 25 09:22:16 2020 -0700

    FBSD: avoid vrefl()

    Commit 20dc2832268eb (correctly) introduced changes so that we avoid interacting with vnodes marked as VI_DOOMED to the extent possible, but in doing so inadvertently used the vrefl() KPI that was only introduced in FreeBSD 11.0.

    Rewrite the relevant logic to use the older vref() KPI, at the cost of a few more unlock/locks, in order to have a single codepath that works on all supported FreeBSD versions.

    Change-Id: Ib315d59ea6c6208bbd0c908d8eaf502a4de51869
    Reviewed-on: https://gerrit.openafs.org/14373
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit 0066f4e9f27fedc4cf4df52eaf10d35ae5c7ad6e
Author: Tim Creech
Date:   Thu Aug 29 22:13:20 2019 -0400

    FBSD: Handle missing LINK_MAX

    LINK_MAX was removed in r327598. When we don't have a LINK_MAX, just use its value from before it was removed (32767).

    Change-Id: Id66a2ba8b7085b392def1d17eace22c7f742e1a4
    Reviewed-on: https://gerrit.openafs.org/13860
    Tested-by: BuildBot
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk

commit 2add334454019b4a8fd979fb16da686cf93b56c6
Author: Tim Creech
Date:   Thu Aug 29 21:55:05 2019 -0400

    FBSD: Use syscall "helper" functions

    syscall_register/syscall_deregister were effectively removed in r329647. Use syscall_helper_register/syscall_helper_unregister instead, which have existed since r205321 in FreeBSD 9.
    Change-Id: I2d5e3101024a44c18395d7eb95c644df6005e0aa
    Reviewed-on: https://gerrit.openafs.org/13858
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 3bc541743b09f408364a946139c524d53056d40a
Author: Tim Creech
Date:   Thu Aug 29 21:40:26 2019 -0400

    FBSD: Handle malloc/free changes in FBSD 12

    FreeBSD 12 (r328417) removed the deprecated compatibility macros MALLOC and FREE. Convert our users to just use the normal malloc and free, so we can build.

    FreeBSD 12 (r334545) also changed malloc() into a macro, which breaks our own malloc macro in our hcrypto config.h. To fix this, just undef malloc, if it's already a macro.

    Change-Id: I5c683e3834710a60cc78476cbaa7203218b11fe0
    Reviewed-on: https://gerrit.openafs.org/13856
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 7c89322c45605c90c8ce27a77695a1c291f0def4
Author: Andrew Deason
Date:   Sat Dec 21 18:34:20 2019 -0600

    FBSD: Use CK_STAILQ_FOREACH for ifaces on FBSD 12

    FreeBSD 12 changed how network interfaces and network addresses are linked together; we're supposed to use CK_STAILQ_FOREACH to traverse them now, instead of TAILQ_FOREACH. To try to keep this change simpler, introduce a new macro, AFS_FBSD_NET_FOREACH, which picks the right macro to use.

    Based on a commit by tcreech@tcreech.com.

    Change-Id: Iab0f93701dd60dcf4237a7fbbf461019bceaeb38
    Reviewed-on: https://gerrit.openafs.org/13999
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 9e98d61ff41709cee8d484be1ecd638a18e2ce0f
Author: Tim Creech
Date:   Sat Dec 21 18:22:40 2019 -0600

    FBSD: Add proper locks when traversing net ifaces

    When traversing the list of network interfaces, or the list of addresses for a network interface, we're supposed to lock the relevant resource with IFNET_RLOCK, if_addr_rlock, or IN_IFADDR_RLOCK. Add these locks around our code that examines network interfaces, to avoid issues if the interface or address list changes while we're traversing them.
    While we're doing this, move around some "AFS_DARWIN_ENV || AFS_FBSD_ENV" ifdefs, since these were getting a bit hard to read. This commit adds some duplicated code, but the result should be easier to follow.

    Also for FreeBSD 12, we must be in NET_EPOCH_ENTER when calling ifa_ifwithnet/rx_ifaddr_withnet (it panics if we don't, with INVARIANTS). Add the needed NET_EPOCH_ENTER/EXIT calls, but do so a bit higher up the call stack, since the returned structures are potentially no longer valid after we NET_EPOCH_EXIT. Since this means we're calling these in a few places in libafs, create a couple of rx abstractions (RX_NET_EPOCH_ENTER) to handle the relevant ifdefs.

    [adeason@dson.org: Various adjustments to locking calls; splitting up DARWIN/FBSD ifdefs.]

    Change-Id: I65d63b99b6f6ef3254325cce9338be27ef78478c
    Reviewed-on: https://gerrit.openafs.org/13998
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 70f3ac5d04a02470366a980224fdf8fadb31b463
Author: Andrew Deason
Date:   Sat Apr 25 17:20:54 2020 -0500

    rx: Indent ifdef maze in rx_kernel.h

    Change-Id: I3a10206234496b9de6f7ddeafebdee8ab10e5546
    Reviewed-on: https://gerrit.openafs.org/14161
    Tested-by: BuildBot
    Reviewed-by: Tim Creech
    Reviewed-by: Benjamin Kaduk

commit 08c769967ca12f1ac99c736789f1925763d8a115
Author: Andrew Deason
Date:   Fri Dec 20 22:09:35 2019 -0600

    rx: Indent ifdef maze in rx_kcommon.c

    Change-Id: I8b898fb5f7bcc142de3a111baaa6dfb9606fa199
    Reviewed-on: https://gerrit.openafs.org/13997
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 2a8db42664cc450c2db097fe19472fe7876203df
Author: Andrew Deason
Date:   Fri Dec 20 21:51:18 2019 -0600

    afs: Indent ifdef maze in afs_server.c

    Change-Id: I223b932490ca1e89711844e41cbff2cd9b50a0f4
    Reviewed-on: https://gerrit.openafs.org/13996
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit 7251f0991dd108366115e6c628b01d6f5824fa89
Author: Mark Vitale
Date:   Wed Feb 17 13:53:55 2021 -0500

    rx: correctly count RX_PACKET_TYPE_VERSION packets

    Since
    the original IBM code import, rx statistics for counting incoming packet types have inadvertently omitted RX_PACKET_TYPE_VERSION packets. This results in rxdebug -rxstats always reporting 0 for the number of version packets read.

    A similar bug causes a debugging facility in rxi_ReceivePacket to emit "*UNKNOWN*" instead of "version" for version packets.

    Correct all versions of the offending logic.

    Change-Id: I9e713eb595b75ef06a347a1c05edb9efffd0b366
    Reviewed-on: https://gerrit.openafs.org/14519
    Tested-by: Mark Vitale
    Reviewed-by: Cheyenne Wills
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Tested-by: Michael Meffie
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk

commit f3dcf1e07a88032b56deb2eee27d8515af183974
Author: Pat Riehecky
Date:   Fri Jun 1 15:32:57 2018 -0500

    Resolve missing printf args

    A handful of printf's requested more args than they were given. The missing args are now provided.

    (via cppcheck)

    Change-Id: I3d2bfd1b68a3518ee4c8a65f02446a2bae85d926
    Reviewed-on: https://gerrit.openafs.org/13155
    Reviewed-by: Andrew Deason
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 268025f841f1a2bd16b802459a8b590939331bcd
Author: Cheyenne Wills
Date:   Mon Feb 22 11:08:39 2021 -0700

    autoconf: use AC_CHECK_TOOL for as and ld

    Some platforms use the GNU target triplet as a prefix to the toolchain utilities (e.g. x86_64-pc-linux-gnu-as) to allow the use of alternative toolchains, cross-compiling, etc.

    The Gentoo Linux distribution has a mode of building packages (-native-symlinks) where the toolchain utilities only exist as their prefixed names (e.g. 'as' does not exist, but 'x86_64-pc-linux-gnu-as' does). This results in configure failing to locate the tools when using AC_CHECK_PROGS. (Gentoo uses the --host and --build configure parameters to specify the prefix names for the tools).

    Replace AC_CHECK_PROGS with AC_CHECK_TOOL for the toolchain related commands 'as' and 'ld'.
    AC_CHECK_TOOL works like AC_CHECK_PROGS but it will also look for the program with a prefix (specified by using configure's --host parameter).

    Note: libtool.m4 runs AC_CHECK_TOOL for ar.

    Change-Id: I8005c765d213b7d1d6292a7dd80f10a3d0e2ec68
    Reviewed-on: https://gerrit.openafs.org/14544
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit a9cff4ab4c9eba40145706778664318de2d89c12
Author: Pat Riehecky
Date:   Wed Jun 6 10:55:18 2018 -0500

    strlcpy restricted to array length.

    For kaprocs, size of 'caller.userID.name' is defined by a different macro than size of 'name'. They can become out of sync, so restricting to size of dest.

    For scout and afsmon-win, the if statement determined that the string was longer than the dest buffer. So we are using the size of the buffer as the max length to setup for truncation.

    (via cppcheck)

    Change-Id: I38a2bff1d59d17ea02e136c35cd5b132a75a8ed8
    Reviewed-on: https://gerrit.openafs.org/13163
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 4898be6278f6ff4e214dfe993c0aa31262b9036f
Author: Pat Riehecky
Date:   Tue Jun 12 13:33:31 2018 -0500

    localtime can return NULL if unable to read system clock

    This adds checks for some invocations of localtime() to avoid possible NULL dereference.

    (via facebook-infer)

    Change-Id: I2b779d8f60c032563eb4ee3cebe20b14afbb0fa3
    Reviewed-on: https://gerrit.openafs.org/13206
    Reviewed-by: Michael Meffie
    Reviewed-by: Cheyenne Wills
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit ec45ae60536190c2f5fbf272a9acfe0a85824e24
Author: Pat Riehecky
Date:   Wed Sep 19 15:51:00 2018 -0500

    configure.ac: Add missing double include guard

    This is primarily a sanity check (identified by clang-tidy).
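The localtime() commit above guards against a NULL return before the result is dereferenced. A minimal sketch of that kind of check follows; the helper name format_local_time and the fallback string are hypothetical and do not come from the OpenAFS tree.

```c
#include <stdio.h>
#include <time.h>

/* Format 'when' as local time into buf.
 * Returns 0 on success, or -1 if localtime() or strftime() fails. */
int format_local_time(time_t when, char *buf, size_t buflen)
{
    struct tm *tm = localtime(&when);

    if (tm == NULL) {
        /* localtime() could not convert the value: do not dereference. */
        snprintf(buf, buflen, "(unknown time)");
        return -1;
    }
    if (strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm) == 0) {
        return -1;      /* buffer too small for the formatted time */
    }
    return 0;
}
```

A caller can print the buffer in either case, since the failure path for localtime() still leaves a readable placeholder in it.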
    Change-Id: I92d05fdfed0e32c0e39cc2f8ce412b613c0a38fc
    Reviewed-on: https://gerrit.openafs.org/13333
    Reviewed-by: Cheyenne Wills
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 441c00c430f8af30be425a51bc5fbe0d6c005afa
Author: Pat Riehecky
Date:   Fri Jun 1 15:59:37 2018 -0500

    If realloc() == NULL we lost the pointer to old memory

    Systems under memory pressure may fail to realloc(). If so, the pointer to the old memory is lost, but not released. This code catches the pointer beforehand to ensure the memory isn't leaked.

    (via cppcheck)

    Change-Id: I4c5a11c1daf4e78f7ffde71af0175d9106f6c3cd
    Reviewed-on: https://gerrit.openafs.org/13156
    Reviewed-by: Joe Gorse
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 9338cb5fce2e38b864b8f957b6ea4c56c78d20f8
Author: Michael Meffie
Date:   Tue May 31 16:23:41 2016 -0400

    SOLARIS: provide cache manager stats via kstat

    Provide statistical information via the solaris kstat framework. Data can be examined with the kstat tool or the kstat userspace api.

    The kstat module is called openafs. Three kstat names are provided.

    The "param" name provides cache manager parameters as given by the cmdebug -cache program.

        # kstat -m openafs -n param

    The "cache" name provides cache manager statistics as given by the xstats plus some additional cache related stats. The "cache" name also provides the libafs kernel module version string and the current local cellname.

        # kstat -m openafs -n cache

    The "rx" name provides general rx statistics as given by rxdebug -rxstat.
        # kstat -m openafs -n rx

    Change-Id: Ic07e3b58fa5c79145f12f8519a6f7fce0d91138b
    Reviewed-on: https://gerrit.openafs.org/13170
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 6b96a49eb6268adf9fc7e077fe849af7802a1575
Author: Andrew Deason
Date:   Wed Aug 26 13:54:00 2020 -0500

    Retire AFS_MOUNT_AFS

    Currently, the AFS_MOUNT_AFS #define is used to mean two completely different things:

    - The string "afs", corresponding to the first argument to mount(2) on many platforms and some related calls inside libafs (e.g. getnewvnode() on FBSD).

    - An integer identifying the AFS filesystem (e.g. gfsadd() on AIX).

    Depending on the platform and the build context (UKERNEL vs KERNEL), AFS_MOUNT_AFS gets defined to one of those two things. This is very confusing, and has led to mistakes in the past, such as those fixed in commit 446457a1 (afs: Set AFS_VFSFSID to a numerical value).

    To avoid such confusion, get rid of AFS_MOUNT_AFS completely, and replace it with two new symbols:

    - AFS_MOUNT_STR, the string "afs".

    - AFS_FSNO, the integer given to gfsadd() et al.

    When AFS_MOUNT_AFS is split this way, AFS_MOUNT_STR then is always defined to the same value, so remove it from the param.h files for our platforms. Instead, define it in afs.h for libafs use, and in afsd_kernel.c (the only place outside of src/afs that uses it).

    Also remove the logic for conditionally defining MOUNT_AFS from the param.h files, moving the logic to the same locations as AFS_MOUNT_STR.

    Note that this commit removes the numeric definition for AFS_MOUNT_AFS in param.sgi_65.h (aka AFS_FSNO). We never actually used this value, since AFS_FSNO is not used on IRIX; instead, we tend to use the 'afs_fstype' global instead of a constant number.
    Change-Id: I6cbf051dc938cd1c456cbe236c0afe99a3c3dd87
    Reviewed-on: https://gerrit.openafs.org/14323
    Reviewed-by: Cheyenne Wills
    Reviewed-by: Michael Meffie
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit e0e0b3cea6305cdbccc71039a05d6121c32c51cf
Author: Andrew Deason
Date:   Mon Dec 21 20:43:56 2020 -0600

    Remove AFS_PARISC_LINUX24_ENV references

    Since commit 91713206 (Remove LINUX24 from src/afs), AFS_PARISC_LINUX24_ENV is never defined. Remove references to it.

    Change-Id: I854701f26ec86b9b9fb99dc57c36f04f78a09517
    Reviewed-on: https://gerrit.openafs.org/14472
    Reviewed-by: Andrew Deason
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 78ef922612bef5f5fd6904896e84b9d2ea802404
Author: Cheyenne Wills
Date:   Fri Jan 22 07:57:55 2021 -0700

    Linux 5.11: Test 32bit compat with in_compat_syscall

    Linux 5.11 removed the TIF_IA32 thread flag with commit: x86: Reclaim TIF_IA32 and TIF_X32 (8d71d2bf6efec)

    The flag TIF_IA32 was being used by openafs to determine if the task was handling a syscall request from a 32 bit process. Building against a Linux 5.11 kernel results in a build failure as TIF_IA32 is undefined.

    The function 'in_compat_syscall' was introduced in Linux 4.6 as the preferred method to determine if a syscall needed to handle a compatible call (e.g. 32bit application).

    To resolve the build problem, use 'in_compat_syscall' if present (Linux 4.6 and later) to determine if the syscall needs to handle a compatibility mode call.

    Add autoconf check for in_compat_syscall.

    Notes about in_compat_syscall:

    In Linux 4.6 'in_compat_syscall' was defined for all architectures with a generic return of 'is_compat_task', but allows architecture specific overriding implementations (x86 and sparc).
    At 4.6 (and later), the function 'is_compat_task' is defined only for the following architectures to return:

        Arch      Returns
        =======   ==============================
        arm64     test_thread_flag(TIF_32BIT);
        mips      test_thread_flag(TIF_32BIT_ADDR)
        parisc    test_ti_thread_flag(task_thread_info(t), TIF_32BIT)
        powerpc   is_32bit_task()
        s390      test_thread_flag(TIF_31BIT)
        sparc     test_thread_flag(TIF_32BIT)

    If the Linux kernel is not built with compat mode, is_compat_task and in_compat_syscall are set to always return 0.

    Linux commit that introduced in_compat_syscall: compat: add in_compat_syscall to ask whether we're in a compat syscall (5180e3e24fd3e8e7)

    Change-Id: I59deebfe5d8cddaf845b15ef69e65a684a961280
    Reviewed-on: https://gerrit.openafs.org/14499
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 32cc6b0796495e596262d84c428172a511f757c4
Author: Cheyenne Wills
Date:   Fri Jan 29 11:32:36 2021 -0700

    Linux: Refactor test for 32bit compat

    Refactor the preprocessor checks for determining the method to test for 32bit compatibility (64bit kernel performing work for a 32bit task) into a common inline function, 'afs_in_compat_syscall' that is defined in LINUX/osi_machdep.h.

    Update osi_ioctl.c and afs_syscall.c to use afs_in_compat_syscall.

    Add include afs/sysincludes into osi_machdep.h to ensure linux/compat.h is pulled for the functions called in afs_in_compat_syscall.

    Change-Id: I6610cc19fedd909de8e8941ded05ed1608e52403
    Reviewed-on: https://gerrit.openafs.org/14500
    Tested-by: BuildBot
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk

commit 0c1465e4f3310daa54f1e799f76237604222666d
Author: Andrew Deason
Date:   Thu Jan 28 16:59:47 2021 -0600

    LINUX: Fix includes for fatal_signal_pending test

    Commit 8b6ae289 (LINUX: Avoid lookup ENOENT on fatal signals) added a configure test for fatal_signal_pending().
    However, this check fails incorrectly ever since Linux 4.11, because fatal_signal_pending() was moved from linux/sched.h to linux/sched/signal.h in Linux commit 2a1f062a (sched/headers: Move signal wakeup [...]). Fix this by including linux/sched/signal.h if we have it during the configure test.

    A false negative on this configure test doesn't break the build, but it disables one of our safeguards preventing incorrect negative dentries at runtime. The function fatal_signal_pending() hasn't changed in quite some time (except for what header it lives in); it was introduced in Linux 2.6.25 via Linux commit f776d12d (Add fatal_signal_pending). So to try to avoid this mistake again in the future, make it so a missing fatal_signal_pending() breaks the build if we're on Linux 2.6.25+.

    Change-Id: Id0b91b2f24e2ea87c9c900076ab7ab1fcab3d304
    Reviewed-on: https://gerrit.openafs.org/14508
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 031ebf43a8d4db79ee1aa9aff571094354c548b1
Author: Cheyenne Wills
Date:   Tue Dec 22 11:06:42 2020 -0700

    afs: Cleanup afsincludes.h indentation

    Clean up the indentation of preprocessor statements

    Remove commented out code.

    Change-Id: I37fec6f15a8972651ef05aa00580a2628e0a1a46
    Reviewed-on: https://gerrit.openafs.org/14471
    Reviewed-by: Andrew Deason
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 873a5d9e8835b969370f1f031acef60745a0fff8
Author: Cheyenne Wills
Date:   Tue Dec 22 11:03:33 2020 -0700

    afs: Clean up VNOPS/afs_vnops_attrs.c indentation

    Clean up the indentation of preprocessor statements, add #endif comments where helpful.

    Clean up whitespace in code indentation.
    Change-Id: I5e6eb3d8ad2688f2b5a56b760d1c1f031f6ca9ec
    Reviewed-on: https://gerrit.openafs.org/14470
    Reviewed-by: Andrew Deason
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 2628e0b5070835ded51e61f33bbb3ecaa19da012
Author: Cheyenne Wills
Date:   Thu Nov 21 11:58:04 2019 -0700

    rx: Remove dead reference to rxk_ListenerProc

    rx_prototypes.h has an extern definition for rxk_ListenerProc. That function was removed in commit e261238470ed28ee7c1068d914de171b34033e09 'SOLARIS: Perform daemon syscalls as kernel threads'

    Remove the extern definition/reference in rx_prototypes.h

    Change-Id: I9f845f24f993f5a5cfb353e594ecdf3ec6de73ab
    Reviewed-on: https://gerrit.openafs.org/14038
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit d7469128ceefbd96b61f32f62fd1e11c3674dac8
Author: Cheyenne Wills
Date:   Wed Dec 23 13:25:31 2020 -0700

    afs: Clean up afs_init.c indentation

    Clean up the indentation of preprocessor statements, add #endif comments where helpful.

    Clean up whitespace in code indentation.

    Change-Id: Id7eeeabfea52c99f783e23468cfc89ce9ed8eae5
    Reviewed-on: https://gerrit.openafs.org/14469
    Reviewed-by: Andrew Deason
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit bf37aec672efaf7824d9c96bcff7a45eb47ef280
Author: Cheyenne Wills
Date:   Fri Jan 22 12:48:21 2021 -0700

    tests: fix potential divide by zero condition

    Running clang's static analysis revealed a possible divide by zero condition.

    There is a random chance of the divide by zero.

    - it has to be in the first pass of the main loop testing events (counter = 0)
    - 90% chance path : if (counter < (NUMEVENTS -1) && random() % 10 == 0) -- needs to be false
    - 25% chance path: if (random() % 4 == 0) -- needs to be true

    if the above conditions are met, the statement

        int victim = random() % counter

    is a divide by zero.

    Add a check to ensure the counter is greater than zero.

    Add a comment to document that only events prior to the current event are randomly selected.
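The guard described in the divide-by-zero commit above can be sketched as a small helper. This is a minimal illustration, not the test's actual code: the function name pick_victim is hypothetical, and standard rand() is used here in place of random() only to keep the example portable.

```c
#include <stdlib.h>

/* Return a random index in [0, counter), i.e. one of the events created
 * before the current one, or -1 when there are no earlier events. */
int pick_victim(int counter)
{
    if (counter <= 0) {
        /* First pass of the loop: nothing earlier to select, and
         * computing "% counter" would divide by zero. */
        return -1;
    }
    return rand() % counter;
}
```

Only events prior to the current one are eligible, which is why the modulus is the current counter and not the total number of events.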
    Change-Id: I4b4e73fa324842bb504bcc952079af15aea8a6a3
    Reviewed-on: https://gerrit.openafs.org/14501
    Tested-by: BuildBot
    Reviewed-by: Andrew Deason
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 43ef1f2a5d80aa1c3f5b4831ada8e776ac0c7d13
Author: Benjamin Kaduk
Date:   Thu Jan 14 10:20:59 2021 -0800

    Remove overflow check from update_nextCid

    The rx_nextCid global has been an unsigned type since http://gerrit.openafs.org/11106 (which was actually merged before the refactoring of overflow check to avoid signed integer overflow) and thus there is no need to avoid signed overflow. The per-connection cid has been unsigned since the IBM import.

    The natural unsigned behavior on overflow of wrapping is the desired behavior here, so just remove the extra logic and always increment.

    Change-Id: I2d9fd24082b762eb871199da3ac1cc0983764585
    Reviewed-on: https://gerrit.openafs.org/14496
    Reviewed-by: Jeffrey Hutzelman
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit 2c0a3901cbfcb231b7b67eb0899a3133516f33c8
Author: Jeffrey Altman
Date:   Thu Jan 14 09:57:13 2021 -0500

    rx: update_nextCid overflow handling is broken

    The overflow handling in update_nextCid() produces a rx_nextCid value of 0x80000001 which itself is out of the valid range. When used to construct the first call of a new connection the connection id for the call becomes 0x80000002, and all subsequent connections also trigger the overflow handling and thus also receive connection id 0x80000002.

    If the same connection id is used for multiple connections from the same endpoint the accepting rx peer will be very confused. When authenticated connections are used, the CHALLENGE/RESPONSE will fail because of a mismatch in the connection's callNumber array.

    If an initiator makes only a single connection to a given rx peer, that connection would succeed, but once multiple connections are initiated all communication from a broken initiator to any rx peer will fail.
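The unsigned wraparound that the update_nextCid commits above rely on can be sketched as follows. This is a hedged illustration, not the rx source: the helper name next_cid is hypothetical, and the value 2 for RX_CIDSHIFT is an assumption made for the example.

```c
#include <stdint.h>

/* Assumed value for illustration; the real definition lives in the rx headers. */
#define RX_CIDSHIFT 2

/* Advance a connection id by one connection's worth of call channels.
 * Unsigned arithmetic wraps modulo 2^32, so no overflow check is needed. */
uint32_t next_cid(uint32_t cid)
{
    return (uint32_t)(cid + (1u << RX_CIDSHIFT));
}
```

Because the low RX_CIDSHIFT bits are never touched by the increment, a cid that starts aligned stays aligned even after it wraps past the top of the 32-bit range.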
    The incorrect overflow calculation was introduced by 39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid generation into the rx core").

    This change corrects the overflow value to become 1 << RX_CIDSHIFT

    Change-Id: If36e3aa581d557cc0f4d2d478f84a6593224c3cc
    Reviewed-on: https://gerrit.openafs.org/14492
    Reviewed-by: Andrew Deason
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk

commit a3bc7ff1501d51ceb3b39d9caed62c530a804473
Author: Jeffrey Altman
Date:   Thu Jan 14 09:41:39 2021 -0500

    rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid

    39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid generation into the rx core") introduced the use of RAND_bytes() to generate the initial 'rx_nextCid' but failed to remove the

        rx_nextCid = ((tv.tv_sec ^ tv.tv_usec) << RX_CIDSHIFT);

    assignment inherited from IBM/Transarc.

    At Thu, 14 Jan 2021 08:25:36 GMT the IBM inherited calculation overflows the valid CID range. This triggers broken overflow logic in update_nextCid().

    Change-Id: Ib7283def1ded9792d394133a3969a6d86f3a6123
    Reviewed-on: https://gerrit.openafs.org/14491
    Reviewed-by: Andrew Deason
    Tested-by: Andrew Deason
    Reviewed-by: Jeffrey Hutzelman
    Reviewed-by: Cheyenne Wills
    Tested-by: Mark Vitale
    Reviewed-by: Benjamin Kaduk

commit 32dfff2c9881e76446446f8c8875b978ca4cbefb
Author: Mark Vitale
Date:   Tue Jan 12 15:50:07 2021 -0500

    volser: correctly attribute 'vos partinfo' errors

    Since the original IBM code import, the 'vos partinfo' error message blames the wrong command, 'vos listpart':

        $ vos partinfo afs01.sinenomine.net
        : Could not get afs tokens, running unauthenticated.
        Could not fetch the list of partitions from the server
        Possible communication failure
        Error in vos listpart command.
        Possible communication failure

    Correct the error message to specify 'vos partinfo' instead.
Change-Id: I966dee8c679db89c7ed5ce21d31ebc7424803cf2 Reviewed-on: https://gerrit.openafs.org/14489 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 750628da77bb71e24ed3061431bbb913ff8d5f72 Author: Mark Vitale Date: Thu Aug 20 16:09:02 2020 -0400 vol: prevent salvage segfault for orphaned vnode with out-of-range parent While salvaging a RW volume, salvager may segfault if it encounters an orphaned directory with a parent vnode that does not exist. For example, if the large vnode index contains a maximum vnode of 2901, any parent vnode encountered that is larger than 2901 will result in an out-of-bounds reference to our vnode essence array, leading to a segfault or undefined behavior. Modify the logic to check for out-of-bounds parent vnodes, and log them rather than segfaulting. Change-Id: I49f53935830fbb428fe0bff04c33248d3806a4b2 Reviewed-on: https://gerrit.openafs.org/14385 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 444a971edc47c34efbefed6e332ee6e843ae072b Author: Andrew Deason Date: Sat Jan 9 12:50:03 2021 -0600 afs: Remove SRXAFSCB_GetDE The GetDE RPC has been commented out from afscbint.xg effectively since it was introduced, but we still define the SRXAFSCB_GetDE server stub for it. This is useless, but also potentially dangerous, since the stub routine just returns success, without populating the output arguments. One of the output arguments is a string, and so if this RPC is actually run, the rxgen-generated server code will try to xdr_string() that string. Since we never set it to anything, this will result in xdr_string trying to dereference a NULL pointer. None of this actually happens currently, since the GetDE RPC is commented out. But to avoid the above situation if it's ever uncommented, remove the useless SRXAFSCB_GetDE function. 
Change-Id: I6ef478ee69a8de1ac14baa86aa82489181d67452 Reviewed-on: https://gerrit.openafs.org/14488 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2971dcb3b4da04fff3f4bd9c3d3e3f0ab7a94cae Author: Andrew Deason Date: Sat Jan 9 12:47:09 2021 -0600 WINNT: Restore missing '#ifdef PC' Commit 339167ef (Remove dead code) meant to remove the '#ifdef notdef' block in here, but we accidentally also removed the subsequent '#ifdef PC'. This file may not be very important, since WINNT still builds with this mistake, but an unbalanced #ifdef is potentially super confusing, so fix it. Change-Id: I100792830e1bed0af08bcb81a34bb185b2c6358a Reviewed-on: https://gerrit.openafs.org/14487 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 2630e70550defc664efa0952589cf82ed3c51796 Author: Andrew Deason Date: Mon Feb 10 15:57:43 2014 -0600 Move key-related warnings to common server code Each server process can log a couple of different warnings about the server keys found on disk: - If afsconf_GetLatestKey() returns success (indicating a single-DES key is present), we call LogDesWarning(). - If afsconf_CountKeys() returns 0 (indicating there are no keys at all on disk), we log a warning that all authenticated access will fail. Currently, the code to do these checks and log the relevant warning is duplicated across the startup code for nearly every server process. To avoid this duplication, and to make sure the checks aren't accidentally skipped for anyone, move these checks to afsconf_BuildServerSecurityObjects, which every server process calls. We must add an additional parameter to afsconf_BuildServerSecurityObjects to handle the different logging mechanism these servers use, but afsconf_BuildServerSecurityObjects is declared in a public header (cellconfig.h), and is exported in a public library (libafsauthent). So to avoid changing a public symbol, introduce a new variant of the function, called afsconf_BuildServerSecurityObjects_int. 
Declare this in a new internal header, authcon.h. We don't have easily-usable logging functions for upserver and butc, so just don't log the warnings for those. For ubik servers, don't update ubik_SetServerSecurityProcs to use the new function; the initial call to afsconf_BuildServerSecurityObjects_int in the server's startup code will cover logging the warning on startup. Change-Id: I5d5fceefdaf907f96db9f1c0d21ceb6957299a59 Reviewed-on: https://gerrit.openafs.org/10831 Tested-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit faa9d8f11f28232000446d787ebf53ab9345eb89 Author: Andrew Deason Date: Sat Dec 8 18:05:36 2018 -0600 rx: Split out rxi_ConnectionMatch Split out the connection-matching logic in rxi_FindConnection into a new function called rxi_ConnectionMatch, so we can use it in other functions in future commits. This commit should have no visible impact; it is just code reorganization. Change-Id: Ibacec68d268977a8a2a3aca172653fc088334da6 Reviewed-on: https://gerrit.openafs.org/13603 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit c43933279141c86f6029f14ca4aee06fe3addcf7 Author: Andrew Deason Date: Sat Dec 8 17:16:43 2018 -0600 rx: Remove unneeded rxi_ReceiveDataPacket params rxi_SplitJumboPacket doesn't use its 'host', 'port', or 'first' arguments, and rxi_ReceiveDataPacket only uses its 'host' and 'port' arguments to pass to rxi_SplitJumboPacket. Remove these unused parameters from both functions. While we're changing rxi_SplitJumboPacket anyway, move the declaration for rxi_SplitJumboPacket to rx_internal.h, so it's no longer in a public header. This commit should have no visible impact; it is just code reorganization. 
Change-Id: I16a7f613957d8cd2d415f65fa083e11d8a13edc8 Reviewed-on: https://gerrit.openafs.org/13602 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit a36832e2d891caab8644a3b4641c7c94fab4105f Author: Andrew Deason Date: Thu Sep 19 12:18:08 2019 -0500 rx: Avoid new server calls for big-seq DATA pkts We currently never open our receive window to more than 32 packets. If we received a DATA packet for an unrecognized call with a seq of 33 or more, the packet is almost certainly from a previously-running call that we were restarted during. As described in commit 7b204946 (rx: Avoid lastReceiveTime update for invalid ACKs) and commit "rx: Avoid new server calls for non-DATA packets", clients can get confused when we respond to calls in these situations, so drop the packets instead. Change-Id: I5b3a699bf245375e92ac97a24ad3638cbb3b8f3c Reviewed-on: https://gerrit.openafs.org/13876 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cd35aa9e2aec16d622177eeea1e1b3ec8aacdd45 Author: Andrew Deason Date: Wed Dec 23 12:44:35 2020 -0600 afs: Fix XBSD check for VNOVAL va_uid Commit e86eb73e (obsd-vattrs-20040125) introduced an XBSD-specific check to detect some unchanged attributes. But the #ifdef for XBSD for the va_uid section was added in the middle of an HPUX-specific block by mistake. Move this #ifdef one level higher, so it's actually used on BSD platforms. Change-Id: I606f87f21d6c4830ed8bcf50abd6fb5807868ff5 Reviewed-on: https://gerrit.openafs.org/14473 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit 3f9a08db86f951df3f6f69f1143f17dd7b43b150 Author: Mark Vitale Date: Thu Aug 8 18:18:22 2019 -0400 rx: Avoid new server calls for non-DATA packets Normally, a client starts a new Rx call by sending DATA packets for that call to a server, and rxi_ReceiveServerCall on the server creates a new call struct for that call (since we don't recognize it as an existing call). 
Under certain circumstances, it's possible for a server to see a non-DATA packet as the first packet for a call, and currently rxi_ReceiveServerCall will create a new server call for any packet type. The call cannot actually proceed until the server receives data from the client (and goes through the challenge/response auth handshake, if needed), but usually this is harmless, since the existence of any packets for a particular call channel indicate that the client is trying to run such a call. The server will respond to the client with ACKs to indicate that it is missing the needed DATA packet(s), and the client will send them and the call can proceed. However, if a call is in the middle of running when the server is restarted, the client may be sending ACKs for a pre-existing call that the server doesn't know about. In this case, the server generates ACKs that indicate the server has not received any DATA packets, which may appear to violate the protocol, depending on the prior state of the call (e.g. the server appears to try to move the window backwards). Clients should be able to detect this and kill the call, but many do not. For many OpenAFS releases before commit 7b204946 (rx: Avoid lastReceiveTime update for invalid ACKs), the client will get confused in this situation and will keep the call open forever, never making progress. There isn't any benefit to creating a new server call in these situations, so just ignore non-DATA packets for unrecognized calls, to avoid stalled calls from such clients. Those clients will not get a response from the server, and so the call will eventually die from the normal Rx call timeout. 
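The filtering described in this commit, together with the sequence-number limit from "rx: Avoid new server calls for big-seq DATA pkts", can be sketched as a single predicate. This is only an illustration: the enum, the constant, and the function name are hypothetical, and the real rxi_ReceiveServerCall logic involves more state than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet types for this sketch; not the real rx constants. */
enum sketch_pkt_type { SKETCH_PKT_DATA, SKETCH_PKT_ACK, SKETCH_PKT_ABORT };

/* Assumption: largest receive window, taken from the "32 packets" figure in the log. */
#define SKETCH_MAX_WINDOW 32

/*
 * Decide whether a packet for an unrecognized call should create a new
 * server call. Anything that is not a DATA packet is dropped, and so is a
 * DATA packet whose sequence number could not belong to a fresh call.
 */
static bool
sketch_should_create_call(enum sketch_pkt_type type, uint32_t seq)
{
    if (type != SKETCH_PKT_DATA)
        return false;           /* e.g. a stray ACK after a server restart */
    if (seq > SKETCH_MAX_WINDOW)
        return false;           /* seq of 33 or more: leftover from an old call */
    return true;
}
```

Dropping the packet silently is the point of the design: the client gets no response and the call ends through the normal Rx call timeout.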
Change-Id: I565625ba8b6901f9b745124a8816a9ba816c0264 Reviewed-on: https://gerrit.openafs.org/13758 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7b204946010673506e0f74991f59a0865292199c Author: Andrew Deason Date: Tue Aug 27 22:58:23 2019 -0500 rx: Avoid lastReceiveTime update for invalid ACKs Currently, we ignore ACK packets in a few cases: - If the ACK appears to move the window backwards (if firstPacket is smaller than call->tfirst). - If the ACK appears to have been received out of order (if previousPacket is smaller than call->tprev). - If the ACK packet appears truncated. In all of these cases, we ignore the ACK packet completely in our ACK processing code (rxi_ReceiveAckPacket), but we still process the packet at higher levels (rxi_ReceivePacket). Notably, this means we update call->lastReceiveTime after rxi_ReceiveAckPacket returns, even for ACK packets we haven't really looked at. Normally this does not cause any noticeable problems, because such packets should either never be encountered, or only consist of a small number of packets that are mixed in with valid packets. However, if our peer is a server, and it is restarted in the middle of a call, our peer may exclusively send us packets that fall into the above categories. (This does not happen if our peer is a client, because clients just ignore packets for calls they do not recognize.) For example: Consider a call where a client is sending data to a server, and the server restarts after the client has sent a DATA packet with sequence number 1000. The server may then start responding to the client with ACKs with firstPacket set to 1, since the restarted server has no knowledge of the call's state. In this case, a firstPacket of 1 is well below where our window was, so all of the ACKs from the server are ignored. But we keep updating call->lastReceiveTime for all of these packets, and so the call stays alive forever until an idle-dead or hard-dead timeout activates (if any are set). 
As another example, consider the case where a client is sending data to a server, and the server receives a full window of packets (say, 16 packets), has not yet passed any data to the application, and the server restarts. The restarted server then starts responding to the client with ACKs with firstPacket set to 1, and previousPacket set to 0. We also ignore all of the ACKs from the server in this case, because even though firstPacket looks sane, it looks like previousPacket has gone backwards. We still update call->lastReceiveTime for each ignored ACK we get, keeping the call alive. Before commit 4e71409f (Rx: Reject out of order ACK packets) was introduced in 1.6.0, neither of these issues could occur. That commit introduced the issue specifically if previousPacket goes backwards; that is, if the server restarts before firstPacket moves forwards. Commit 8d359e6d (rx: Remove duplicate out of order ACK check) in 1.8.0 introduced the issue when 'firstPacket' goes backwards, since previously the FIRSTACKOFFSET-based check caused us to ignore those packets without updating call->lastReceiveTime. That is, if the server restarts after firstPacket moves forwards. In this commit, we still ignore packets in the above cases, but we also avoid updating lastReceiveTime when we ignore such packets, to make sure that we do not keep a call alive solely from receiving these invalid packets. Alternatively, we could change our logic to immediately abort calls where firstPacket moves backwards (since this violates the Rx protocol), or to not ignore some packets where previousPacket goes backwards (since these calls may be recoverable). And we could also skip updating lastReceiveTime for invalid packets of other types. But for now, this commit just avoids updating lastReceiveTime for invalid ACK packets, in order to just try to restore our behavior before 1.6.0, while still retaining the benefits of ignoring out-of-order ACKs.
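A minimal sketch of the behavior described above, assuming simplified structures: an ACK that moves the window backwards or arrives out of order is ignored, and the receive timestamp is left untouched so the call can still time out. The struct and function names are hypothetical, the truncated-ACK case is omitted, and the real checks in rxi_ReceiveAckPacket are more nuanced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced call state. */
struct sketch_call {
    uint32_t tfirst;            /* first packet in the transmit window */
    uint32_t tprev;             /* last previousPacket value accepted */
    long last_receive_time;     /* keeps the call alive while it advances */
    unsigned spurious;          /* count of ignored packets */
};

/* Hypothetical, heavily reduced ACK contents. */
struct sketch_ack {
    uint32_t first_packet;
    uint32_t previous_packet;
};

static bool
sketch_ack_is_valid(const struct sketch_call *call, const struct sketch_ack *ack)
{
    if (ack->first_packet < call->tfirst)
        return false;           /* window appears to move backwards */
    if (ack->previous_packet < call->tprev)
        return false;           /* ACK appears to be out of order */
    return true;
}

/* Returns true if the ACK was processed, false if it was ignored. */
static bool
sketch_receive_ack(struct sketch_call *call, const struct sketch_ack *ack, long now)
{
    if (!sketch_ack_is_valid(call, ack)) {
        call->spurious++;       /* note the drop, but do NOT refresh the timestamp */
        return false;
    }
    call->tfirst = ack->first_packet;
    call->tprev = ack->previous_packet;
    call->last_receive_time = now;
    return true;
}
```

With the timestamp refreshed only on valid ACKs, a restarted server that keeps sending ACKs with firstPacket set to 1 no longer keeps the call alive.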
Further changes in this area can potentially be handled separately by future commits. Also increment the spuriousPacketsRead counter for packets that we ignore in this way (which we used to do for some packets before commit 8d359e6d), so we are not entirely silent about ignoring them. Written in collaboration with mvitale@sinenomine.net. Change-Id: Ibf11bcb2417d481ab80cf4104f2862d1d6502bf4 Reviewed-on: https://gerrit.openafs.org/13875 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit f6490629e1239c412002f316804c656c9be61400 Author: Andrew Deason Date: Wed Aug 28 17:12:53 2019 -0500 rx: Introduce ack_is_valid Take some of our existing logic for ignoring invalid ACK packets and split it out into a separate function, ack_is_valid. This just makes it easier to add more complex logic in here and write longer comments explaining the decisions. Note that the bug mentioned regarding the previousPacket field was introduced in IBM AFS 3.5, and was fixed in OpenAFS in commit bbf92017 (rx: rxi_ReceiveDataPacket do not set rprev on drop), included in OpenAFS 1.6.23. This commit incurs no functional change; it is just code reorganization. Change-Id: Idd569c6bc0c475e700935cf86780a04ab24102f4 Reviewed-on: https://gerrit.openafs.org/13874 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 5c9234694543f206174d30e21886286d419fd8df Author: Andrew Deason Date: Mon Nov 2 13:52:25 2020 -0600 rx: For AFS_RXERRQ_ENV, retry sendmsg on error When AFS_RXERRQ_ENV is defined, we currently end up doing something like this for our sendmsg abstractions: if (sendmsg(...) < 0) { while (rxi_HandleSocketError(sock)) ; return error; } return success; This means that when sendmsg() returns an error, we process the socket error queue before returning an error. The problem with this is that when we receive an ICMP error on our socket, it creates a pending socket error that is returned for any operation on the socket. 
So, if we receive an ICMP error after trying to contact any peer, sendmsg() could return an error when trying to send for any other peer. Even though there is no issue preventing us from sending the packet, we'll fail to actually send the packet because sendmsg() returned an error. This effectively causes an extra outgoing packet drop, possibly delaying the related RPC. To avoid this, change Rx to retry the sendmsg call when it returns an error, since the error may be due to an unrelated ICMP error. To avoid needing to implement this retry loop in multiple places, move around our sendmsg code for AFS_RXERRQ_ENV, so that the higher-level function rxi_NetSend performs the retry and checks for socket errors (instead of the lower-level rxi_Sendmsg or osi_NetSend). Also change our functions to process socket errors to be more consistent between kernel and userspace: now we always have rxi_HandleSocketErrors, which runs a loop around the platform-specific osi_HandleSocketError. With this commit, osi_HandleSocketError is now required to be implemented when AFS_RXERRQ_ENV is defined. We hadn't been implementing this for UKERNEL, so just turn off AFS_RXERRQ_ENV for UKERNEL. Thanks to mbarbosa@sinenomine.net for discovering and providing information about the relevant issue. Change-Id: Iccceddcd2d28992ed7a00dc308816a0cb1a0195f Reviewed-on: https://gerrit.openafs.org/14424 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit eff7fa4b2eb9a3001dc18dca157ccbd5f19f89b6 Author: Andrew Deason Date: Mon Nov 2 13:16:41 2020 -0600 rx: Save errno in pthread rxi_Sendmsg Currently, our pthread version of rxi_Sendmsg uses 'errno' in some logic if sendmsg fails, but we do so after calling functions that might alter errno (e.g. fflush). To make sure we get the correct errno value, save the value of errno right after sendmsg returns an error. Reorganize this function a bit to help make the logic easier to follow. 
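The errno-saving pattern from "rx: Save errno in pthread rxi_Sendmsg" can be sketched as follows. The function name is hypothetical, and returning `-errno` is an assumption borrowed from the convention described for the LWP variant, not a statement about the pthread code.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: returns 0 on success, or a negated errno value
 * on failure (the negated-errno convention is an assumption).
 */
static int
sketch_sendmsg(int sock, const struct msghdr *msg, int flags)
{
    int saved_errno;

    if (sendmsg(sock, msg, flags) >= 0)
        return 0;

    saved_errno = errno;    /* capture before any other call can change it */
    fflush(stderr);         /* library calls like this may modify errno */
    return -saved_errno;
}
```

Reading `errno` after the `fflush()` call instead would risk reporting whatever that call left behind rather than the reason sendmsg() failed.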
Change-Id: I6bf284bd75edb5404bb6771bb99a9381b0f8654d Reviewed-on: https://gerrit.openafs.org/14423 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2ad9190b838fd720f4788b18099cc6b6fd3a6727 Author: Michael Meffie Date: Sun May 22 20:35:26 2016 -0400 afsio: readdir/fidreaddir commands Add the readdir/fidreaddir sub-commands to afsio dump AFS3 directory objects. This command dumps the raw directory object to stdout. Pipe the output to a program, such as the afsdump_dirlist program (from the CMU dumpscan tool kit), to parse the directory object. Example usage: afsio readdir -dir /afs/mycell/mypath/somedir | afsdump_dirlist Change-Id: Ief181b432cdea6a11bbe61e781686ade2795faad Reviewed-on: https://gerrit.openafs.org/12381 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 1efa4e49f2dabe2f3a1ef235e21a96ae9d5ff6bf Author: Mark Vitale Date: Mon Dec 7 14:42:54 2020 -0500 vol: always build vol-bless utility In order to avoid future bit-rot, always build vol-bless. Also add it to the clean rule. However, continue to leave it undistributed and uninstalled by default. Change-Id: I3d2dc94c28a7feeb20167223655e97538e807ce6 Reviewed-on: https://gerrit.openafs.org/14464 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 4a45219eb7617e918761553a698d4c10a04f56dd Author: Benjamin Kaduk Date: Fri Dec 11 11:09:20 2020 -0800 Fix spelling of struct rx_ackPacket in comment A comment in rx_packet.h referred to the size of struct rx_ackpacket, but the actual structure is spelled with a majuscule 'P'. 
Change-Id: Iaf57f098b2e818fe0d492a89347a0a14bc3eb392 Reviewed-on: https://gerrit.openafs.org/14468 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7239565b0fea8504deebc5bd43c4fa1ea80fcb17 Author: Andrew Deason Date: Mon Nov 2 13:11:49 2020 -0600 rx: Reorganize LWP rxi_Sendmsg to use 'goto error' Our LWP version of rxi_Sendmsg can allocate an fd_set, but we don't free the fd_set if sendmsg() returns certain errors afterwards. To make sure we go through the same cleanup code for the different possible error code paths, reorganize the function to go through a 'goto error'-style destructor. This also makes our return codes a bit more consistent; we should always return -errno now for errors. Change-Id: I5eaeb7f4ea1d76acc3bd9c52dc258f53f59f631e Reviewed-on: https://gerrit.openafs.org/14422 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 01c10fe8a98ffabd5cf9ec27f4b51f7011c3f1af Author: Andrew Deason Date: Thu Dec 10 14:17:56 2020 -0600 audit: Add missing AUD_TSTT case In commit 9ebff4c6 (OPENAFS-SA-2018-001 audit: support butc types), several new butc-related audit data types were added. In the AIX-specific audmakebuf() function, the case for the AUD_TSTT type is missing the actual "case" clause in the code, causing AUD_TSTT types to be treated as invalid (and so falling through to the "AFS_Aud_EINVAL" case). Add the "case" for AUD_TSTT, so it's treated properly on AIX. Note that the non-AIX printbuf() already handled this properly, so no changes are needed there. Change-Id: Ic46c18b503bacb0901ff0a60534f6c45ce3c9a75 Reviewed-on: https://gerrit.openafs.org/14466 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 986ee6a0a70d70f366baeb43670eb367f0525b97 Author: Mark Vitale Date: Mon Dec 7 14:40:33 2020 -0500 vol: add vol-bless to .gitignore No functional change is incurred by this commit. 
Change-Id: If84ba946d43d67eb6c253462f5826f9a45a2df46 Reviewed-on: https://gerrit.openafs.org/14463 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit e1f20287a4d0cd80c6bfe7309b907fe5a4ac1464 Author: Mark Vitale Date: Mon Dec 7 13:13:28 2020 -0500 vol: make vol-bless buildable again The vol-bless utility is not built by default and so is subject to bit-rot. Thus commit 170dbb3ce301329ff127bb23fb588db31439ae8d 'rx: Use opr queues' overlooked vol-bless.c when adding includes for users of struct rx_queue. Add the required #include so vol-bless builds again. Note to maintainers: this change is only required for 1.8.x and later; vol-bless builds fine in 1.6.x and earlier releases. Change-Id: Ia0bb78e3e7dd74b2f65ac07707aced2c81aaa5d9 Reviewed-on: https://gerrit.openafs.org/14462 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 23bd776b0140deb596287869872a41de555ba99a Author: Mark Vitale Date: Thu Aug 9 17:40:09 2018 -0400 afs: consolidate duplicated wait-for-cache-drain code Consolidate duplicated logic into a new routine afs_MaybeWaitForCacheDrain(). Change-Id: I2e23b86eeaabe3bc559e3ddca5c1e03082af6a3f Reviewed-on: https://gerrit.openafs.org/13278 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 25792e246362a201743533a970f90dbc77d0ed5c Author: Michael Meffie Date: Mon Jun 20 15:29:45 2016 -0400 afs: more cache truncation stats Add counters for cache too full and waiting to drain occurrences. These will be used in later commits to indicate how often the cache truncation is required and how often the cache manager is waiting for cache truncation to complete. 
Change-Id: I4aa802729f0910dff1fb3e90b2d44d36df8bf8f3 Reviewed-on: https://gerrit.openafs.org/13168 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 611507d8b5f59b9f74fb19729026e3a48d716e5d Author: Cheyenne Wills Date: Wed Sep 25 13:39:40 2019 -0600 kauth: Add support for updated audit facility New functionality was added to the audit facility that allows multiple audit logs. The updated audit interfaces require a specific calling sequence even if multiple audit logs are not used. Multiple audit logs are not supported for kauth, since kauth does not use libcmd for processing the command line and adding support for multiple audit log instances would require additional effort that is not warranted. Update kauth to follow the proper calling sequences for the audit facility. Update help message and manpage entries for -auditlog and -audit-interface. Make note that multiple -auditlogs are not supported. Change-Id: I98111b1e399e6687fde235bc2eadf0a28fa8acf4 Reviewed-on: https://gerrit.openafs.org/13782 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5069c697c706c1b93b6c4881f07f5995a6c0d5d1 Author: Cheyenne Wills Date: Fri Dec 4 10:16:57 2020 -0700 Add command line support for multiple audit logs Gerrits #13774 (audit: Support multiple audit interfaces and interface options) and #13775 (audit: Add cmd helper for processing audit options) added support in the audit facility for multiple audit logs. Add command line support to use multiple audit logs for daemons that use libcmd for command line processing: bosserver, buserver, butc, fileserver, volserver, ptserver, and vlserver. Update the daemons to add a call to audit_open, and where possible add a call to audit_close when shutting down the daemon.
Update help message and manpage entries for -auditlog and -audit-interface. Change-Id: I4356e1aa84f580897a0e788e2a2829685be891aa Reviewed-on: https://gerrit.openafs.org/13776 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3e204354f5125816d0bd236e86839ccb1a9ff6e5 Author: Cheyenne Wills Date: Mon Nov 9 12:27:36 2020 -0700 audit: Add cmd helper for processing audit options osi_audit_cmd_Options will handle the processing for the -audit-interface and -auditlog command line options. The auditlog / audit-interface options are used by several services; this new helper routine provides a simple method to process the audit related command line options in a consistent fashion. Change-Id: I5acd12062dbfec23c1cbb0b2cdfc2d224354eed9 Reviewed-on: https://gerrit.openafs.org/13775 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 52da4b94889e09bc07aa51154810e5b9f909915f Author: Cheyenne Wills Date: Fri Nov 13 11:20:15 2020 -0700 audit: Support multiple audit interfaces and interface options Currently, the audit subsystem only allows for one audit log to exist for the entire process. This can make it cumbersome to use for sites that have multiple tools or destinations that want to read the audit data. For example, to feed the audit data to two separate scripts, one script needs to read the data, and retransmit the data to the second script. To make such a setup easier, change the audit system to allow for multiple audit logs to exist at once. To allow callers to associate each audit log with an interface, we change the syntax for the value to the -auditlog parameter to the following: [interface:]filespec[:options] For example: -auditlog sysvmq:/tmp/msgqueue To accommodate the existing -audit-interface parameter, change the behavior of -audit-interface so that it sets the default audit interface if none is specified for -auditlog. This allows existing users of -audit-interface to experience the same behavior as before.
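One way to read the `[interface:]filespec[:options]` syntax is sketched below. This is not the audit subsystem's parser: the struct, the function names, and the list of recognized interface names are assumptions made for illustration, and the rule used to tell an interface prefix from a filespec (matching against a known list) is a guess at a workable approach.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of splitting one -auditlog value. */
struct sketch_auditlog {
    char interface[32];
    char filespec[256];
    char options[128];
};

/* Assumption: interface names this sketch recognizes as a prefix. */
static const char *const sketch_known_interfaces[] = { "file", "sysvmq" };

static int
sketch_is_interface(const char *s, size_t len)
{
    size_t i;
    size_t n = sizeof(sketch_known_interfaces) / sizeof(sketch_known_interfaces[0]);

    for (i = 0; i < n; i++) {
        if (strlen(sketch_known_interfaces[i]) == len &&
            strncmp(s, sketch_known_interfaces[i], len) == 0)
            return 1;
    }
    return 0;
}

/* Returns 0 on success, -1 if a component is empty or does not fit. */
static int
sketch_parse_auditlog(const char *arg, const char *default_interface,
                      struct sketch_auditlog *out)
{
    const char *rest = arg;
    const char *colon = strchr(arg, ':');
    const char *opts;
    size_t flen;

    if (colon != NULL && sketch_is_interface(arg, (size_t)(colon - arg))) {
        /* Explicit interface prefix, e.g. "sysvmq:". */
        snprintf(out->interface, sizeof(out->interface), "%.*s",
                 (int)(colon - arg), arg);
        rest = colon + 1;
    } else {
        /* No prefix: fall back to the -audit-interface default. */
        if (strlen(default_interface) >= sizeof(out->interface))
            return -1;
        strcpy(out->interface, default_interface);
    }

    opts = strchr(rest, ':');
    flen = (opts != NULL) ? (size_t)(opts - rest) : strlen(rest);
    if (flen == 0 || flen >= sizeof(out->filespec))
        return -1;
    memcpy(out->filespec, rest, flen);
    out->filespec[flen] = '\0';

    if (opts != NULL) {
        if (strlen(opts + 1) >= sizeof(out->options))
            return -1;
        strcpy(out->options, opts + 1);
    } else {
        out->options[0] = '\0';
    }
    return 0;
}
```

In this sketch, `sysvmq:/tmp/msgqueue` yields the interface `sysvmq` and the filespec `/tmp/msgqueue`, while a bare path picks up whatever default the caller passes in, mirroring the role the log gives to -audit-interface.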
In order to implement this, change the audit API and all existing audit interfaces to avoid using per-interface globals, and instead allocate per-instance contexts during startup. Also change the code so the audit message is constructed inside audit.c, instead of via a per-interface callback, which eliminates the duplicated logic in each interface's append_msg(), and lets us avoid holding 'audit_lock' during message construction. While we're changing the audit API, also introduce a few new operations: open_interface, close_interface and set_options. This commit and the existing interfaces do not make use of these new functions, but future commits will do so. This commit also only changes the audit subsystem itself to be able to handle multiple audit logs, and doesn't change any command-line parsing logic. Future commits will add the command-line parsing logic changes required so daemons can actually configure multiple interfaces. Thanks to Andrew Deason (adeason@sinenomine.net) for providing the changes needed to reduce holding the 'audit_lock' and improve performance as well as providing input during the development of this change. Change-Id: I1311ea417fdd0ba38d2206083cd65bd7a054d017 Reviewed-on: https://gerrit.openafs.org/13774 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 78e5e1b0e54b31bb08b7578e86a6a2a95770d94c Author: Andrew Deason Date: Mon Oct 26 12:35:32 2020 -0500 LINUX: Return errors in our d_revalidate In our d_revalidate callback (afs_linux_dentry_revalidate), we currently 'goto bad_dentry' when we encounter any error. This can happen if we can't allocate memory or some other internal errors, or if the relevant afs_lookup call fails just due to plain network errors. For any of these cases, we'll treat the dentry as if it's no longer valid, so we'll return '0' and call d_invalidate() on the dentry. 
However, the behavior of d_invalidate changed, as mentioned in commit afbc199f1 (LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches()). After a certain point in the Linux kernel, d_invalidate() will also effectively d_drop() the given dentry, unhashing it. This can cause getcwd() calls to fail with ENOENT for those directories (as mentioned in afbc199f1), and can cause bind-mount calls to fail similarly during a small window. To avoid all of this, when we encounter an error that prevents us from checking if the dentry is valid or not, we need to return an error, instead of saying 'yes' or 'no'. So, change afs_linux_dentry_revalidate to jump to the 'done' label when we encounter such errors, and avoid calling d_drop/d_invalidate in such cases. This also lets us remove the 'lookup_good' variable and consolidate some of the related logic. Important note: in older Linux kernels, d_revalidate cannot return errors; callers just interpreted its return value as either 'valid' (non-zero) or 'not valid' (zero). The treatment of negative values as errors was introduced in Linux commit bcdc5e019d9f525a9f181a7de642d3a9c27c7610, which was included in 2.6.19. This is very old, but technically still above our stated requirements for the Linux kernel, so try to handle this case, by jumping to 'bad_dentry' still for those old kernels. Just do this with a version check, since no configure check can detect this (no function signatures changed), and the only Linux versions that are a concern are quite old. Change-Id: Ie530ce08463cf6b6899f056cb76ae4047c989ef2 Reviewed-on: https://gerrit.openafs.org/14417 Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4c33820525af510a8a937289005e39d5b6683b19 Author: Michael Meffie Date: Mon Aug 17 15:44:55 2020 -0400 vldb_check: Check for volume lock inconsistencies Verify that a lock timestamp is set if, and only if, a lock volume operation flag is also set.
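The "if, and only if" condition that vldb_check verifies can be written as a small predicate. This is a sketch under assumptions: the struct layout, the flag values, and the idea that a zero timestamp means "unset" are all hypothetical; only the name VLOP_DELETE comes from the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones are defined by the vlserver. */
#define SKETCH_VLOP_DELETE   0x80u
#define SKETCH_VLOP_ALLOPERS 0xf0u   /* assumed mask of all lock operation flags */

/* Hypothetical, reduced VLDB entry. */
struct sketch_entry {
    uint32_t lock_timestamp;    /* assumption: 0 means no lock timestamp */
    uint32_t flags;
};

/* True when the timestamp and the lock flags agree with each other. */
static bool
sketch_lock_is_consistent(const struct sketch_entry *e)
{
    bool has_timestamp = e->lock_timestamp != 0;
    bool has_lock_flag = (e->flags & SKETCH_VLOP_ALLOPERS) != 0;

    return has_timestamp == has_lock_flag;
}
```

Both inconsistent combinations fail the check: a timestamp with no lock flag, and a lock flag with no timestamp.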
When running vldb_check with the -fix option, fix the inconsistent entries by setting the lock timestamp to the current time if a lock flag is set, or by setting the VLOP_DELETE flag if the lock timestamp is set but no lock flags are set. (The VLOP_DELETE flag is the flag set by the 'vos lock' command, and is shown in vos output as "delete/misc".) Volume lock fields can be put into an inconsistent state, at least, by interrupted vos rename operations, due to bugs in vos rename. When the volume lock timestamp and lock flags are in this inconsistent state, the volume is locked, but that is not indicated by 'vos listvldb'. The volume can be unlocked by issuing 'vos unlock'. Change-Id: Idc4f821a9eb7675edd78a8547fdfe46e838b0c89 Reviewed-on: https://gerrit.openafs.org/14307 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 6779e30d372b2cd5e7995da23ed5e2971124b79c Author: Michael Meffie Date: Fri Dec 27 11:53:05 2019 -0500 vsprocs: Remove dead code Remove the dead code in UV_VolumeMove() commented out with the macro ENABLE_BUGFIX_1165. Remove two commented out lines of code in UV_ConvertRO(). Change-Id: Ic628c74df011b0f09be6b03f72ab1baac5e59caf Reviewed-on: https://gerrit.openafs.org/14004 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit 56aa396d8359276d778d41aa509041c8c75b4e96 Author: Cheyenne Wills Date: Thu Nov 5 13:50:59 2020 -0700 vos: Cleanup function definitions The functions defined within vos.c are not referenced outside of vos.c but are not declared as static. Convert the functions within vos.c to static declarations.
Change-Id: Ia684e698adc53ced964e10ee0496cb52a3af564e Reviewed-on: https://gerrit.openafs.org/14009 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a3be2c74a95489f63837840af8ec42049ce021bf Author: Cheyenne Wills Date: Thu Nov 5 13:49:54 2020 -0700 vos: Remove dead code Clean out dead code from vos.c GetVolumeType - not referenced anywhere CompareVLDBEntry - commented out since 1st git commit osi_audit - Comment indicates this might have been needed at one point. Builds without it. Does not look like the vos executable is pulling in any of the audit code. RestoreVolume - remove stale comment about typo previous to openafs 1.0 RemoveSite - remove commented out partition check Change-Id: I9c0b59d5c37d403610c7a904717ac9765598fc99 Reviewed-on: https://gerrit.openafs.org/14008 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 45a69b61133ae8ca8e49a002ddc1895386796d51 Author: Marcio Barbosa Date: Thu Sep 3 23:57:34 2020 +0000 volser: take RO volume offline during convertROtoRW The vos convertROtoRW command converts a RO volume into a RW volume. Unfortunately, the RO volume is not checked out from the fileserver during this process. As a result, accesses to the volume being converted can leave volume objects in an inconsistent state. Moreover, consider the following scenario: 1. Create a volume on host_b and add replicas on host_a and host_b. $ vos create host_b a vol_1 $ vos addsite host_b a vol_1 $ vos addsite host_a a vol_1 2. Mount the volume: $ fs mkmount /afs/.mycell/vol_1 vol_1 $ vos release vol_1 $ vos release root.cell 3. Shutdown dafs on host_b: $ bos shutdown host_b dafs 4. Remove RO reference to host_b from the vldb: $ vos remsite host_b a vol_1 5. Attach the RO copy by touching it: $ fs flushall $ ls /afs/mycell/vol_1 6.
Convert RO copy to RW: $ vos convertROtoRW host_a a vol_1 Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving the volume object for the RO copy set as VOL_STATE_ATTACHED (on success, this volume should be set as VOL_STATE_DELETED). 7. Add replica on host_a: $ vos addsite host_a a vol_1 8. Wait until the "inUse" flag of the RO entry is cleared (or force this to happen by attaching multiple volumes). 9. Release the volume: $ vos release vol_1 Failed to start transaction on volume 536870922 Volume not attached, does not exist, or not on line Error in vos release command. Volume not attached, does not exist, or not on line Notice that this happens because we cannot mark an attached volume as destroyed (FSYNC_com_VolDone). To avoid the problem mentioned above and to prevent accesses to the volume being converted, take the RO volume offline before converting it to RW. Change-Id: Ifd342e1f420dc42e5da49242a7aa70db7d97a884 Reviewed-on: https://gerrit.openafs.org/14340 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c17c157641d83226fee5bc20f588f14bb132bb68 Author: Cheyenne Wills Date: Tue Nov 10 09:17:16 2020 -0700 vos: Cleanup indentation whitespace Fix the indentation whitespace in vos.c, and remove double blank lines. No functional change. Change-Id: I97587779d6d2c131b5eac98bbee49efae73fafe9 Reviewed-on: https://gerrit.openafs.org/14007 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4bbf1239f8fcaaccb1ea7431a7eb3d42f8f194af Author: Michael Meffie Date: Wed Dec 18 20:30:17 2019 -0500 vos: Return true when GetServerAndPart finds a site Change the GetServerAndPart() function to return true when a volume site in the vldb entry is found. Do not change the output arguments unless the site is found. Also, add a function comment header and fix some comment typos in this function. 
Change-Id: I10b43054b1bf9e6757ccdc95cb4559ab8b6dc013 Reviewed-on: https://gerrit.openafs.org/14006 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit ace2f7f5ce13502f5cb6ec39a9e84864b80ec76b Author: Michael Meffie Date: Mon Dec 23 18:37:21 2019 -0500 vos: Add missing -partition requires -server checks The `vos remove` command was missing a check for the -server option when the -partition option is given. This command requires the -server option when the -partition is given, as documented in the man page. The `vos syncvldb` command performed the check for the -server option when the -partition option is given, but in the wrong location. As documented, the `vos unlockvldb` command permits the -partition option without a -server option, in which case all of the volumes listed in the VLDB with sites on the specified partition are unlocked. However, this command incorrectly issued an RPC to a volume server at address 0.0.0.0 when only the partition is given. Change-Id: I6b878678e28b34250e63d2d082747f6fd416972d Reviewed-on: https://gerrit.openafs.org/14005 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit de3e7289e227db057cb4eca431e47d5c5502da53 Author: Mark Vitale Date: Thu Nov 5 18:16:51 2020 -0500 vos: avoid double release of a volume lock To update a volume entry in the VLDB, vos commands typically lock the volume entry via VL_SetLock, then call VL_UpdateEntryN, then release the lock via VL_ReleaseLock. However, some vos commands exploit the optional lock release flags of VL_UpdateEntryN to combine the update and unlock operations into a single RPC. This approach requires extra care to ensure that VL_ReleaseLock is issued for a failed VL_UpdateEntryN, but NOT for a successful VL_UpdateEntryN. 
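The bookkeeping this requires can be sketched in a few lines of C. This is only an illustration under stated assumptions: fake_update_and_unlock, fake_release_lock and update_entry are hypothetical stand-ins that count calls, not the real VL_* RPCs or the vos code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VL_* RPCs; they only count calls. */
static int release_calls;

/* Pretend "update and release lock in one RPC"; fails when asked to. */
static int
fake_update_and_unlock(int fail)
{
    return fail ? -1 : 0;	/* on success the server has dropped the lock */
}

/* Pretend explicit lock release. */
static void
fake_release_lock(void)
{
    release_calls++;
}

/* Returns the update's error code; releases the lock at most once. */
static int
update_entry(int fail)
{
    int islocked = 1;		/* lock was taken earlier by the caller */
    int code;

    code = fake_update_and_unlock(fail);
    if (code == 0)
	islocked = 0;		/* update succeeded and released the lock */

    /* Common exit path, reached on both success and failure. */
    if (islocked)
	fake_release_lock();
    return code;
}
```

With this shape, a successful update reaches the exit path with islocked cleared, so the explicit release is issued only when the combined update failed.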
Unfortunately, the following commands have success paths that fall through to the error path, resulting in a double release of the volume lock: - vos convertROtoRW - vos release A second VL_ReleaseLock of a volume entry that has already been unlocked via VL_UpdateEntryN is essentially a harmless no-op (other than negating any benefit of exploiting the VL_UpdateEntryN lock flags). However, if there is a race with another volume operation on the same volume, it is possible for this bug to release the volume lock of a different volume operation. This problem has been present in 'vos release' since OpenAFS 1.0. This problem has been present in 'vos convertROtoRW' since the command's introduction in commit 8af8241e94284522feb77d75aee8ea3deb73f3cc vol-ro-to-rw-tool-20030314. Properly maintain state to avoid unlocking a volume (with VL_ReleaseLock) that has already been unlocked (via VL_UpdateEntryN). Thanks to Andrew Deason for discovering the issue and suggesting the fix. Change-Id: I757b4619b9431d1ca980f755349806993add14a5 Reviewed-on: https://gerrit.openafs.org/14426 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 8e1c321dc8e85966383760a6765f7f192ecf632a Author: Mark Vitale Date: Fri Aug 28 16:19:29 2020 -0400 volser: document 'vos restore -readonly' restriction Commit 0c03f8607e15 vos-command-enhancements-20011008 introduced the 'vos restore' -readonly option, which allows the restored volume to be RO instead of the default RW. The commit message documents the following restriction: - ... This option causes the restored volume to be an RO volume. It is not permitted to restore an RO volume when the associated RW volume already exists. While it is possible to restore an RW volume where an RO volume exists, caution should be used to avoid doing this with VLDB entries created by 'vos restore -readonly', since such entries have their ROVOL and RWVOL ID's set to the same thing. 
Document this restriction in the 'vos restore' man page, and in a code comment. No functional change is incurred by this commit. Change-Id: I34f6c5434b82da538a38a9d219207b33dcf62b17 Reviewed-on: https://gerrit.openafs.org/14348 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0bcfe89d68972bf67cf891c89da519a455401327 Author: Mark Vitale Date: Fri Aug 28 15:42:06 2020 -0400 volser: improve error checking for 'vos restore' UV_RestoreVolume2 calls VLDB_GetEntryByName to obtain information for sanity checking, but only checks for a VL_NOENT error code; other codes are thus ignored, which may lead to confusing results. Add an additional error check for 'vos restore' (and other callers of UV_RestoreVolume2) to stop and issue an error message if a non-VL_NOENT error code is received from VLDB_GetEntryByName. Change-Id: Idf41965fdd84fa282a3397215ec393ae10f72018 Reviewed-on: https://gerrit.openafs.org/14347 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a534ee83f2e2e153c298e5583e01ba0b3e4b8e4a Author: Mark Vitale Date: Mon Aug 31 14:30:26 2020 -0400 volser: fix 'cant' typos Correctly spell "can't" in a log message and a comment. Change-Id: I9d5c667d9c5ea3c5b726f958431c497353433239 Reviewed-on: https://gerrit.openafs.org/14346 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit aed4a0c4b91c5ce185547e83bfff443f3d3831f9 Author: Mark Vitale Date: Fri Jul 19 14:41:55 2019 -0400 afs: avoid panic in DNew when afs_WriteDCache fails afs_WriteDCache may fail for an IO error, or if interrupted (EINTR). Unfortunately, DNew will panic in this case, crashing the entire machine. In order to avoid an outage in this case, don't panic. Instead, reflect the error back to the caller of DNew. While here, add Doxygen comments to DNew. 
Change-Id: I27a8f89bab979c5691dded70e8b9eacbe8aff4fd Reviewed-on: https://gerrit.openafs.org/13804 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 1c04036b3425525cd94a9c9c47ca93de05c11c40 Author: Mark Vitale Date: Mon Aug 19 15:43:09 2019 -0400 afs: remove redundant assignment DRelease has two assignments for tp = entry->buffer; remove the second (redundant) one. Introduced with 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data'. No functional change should be incurred by this commit. Change-Id: If4a17862f451973075fa3fa267b5139046d97ede Reviewed-on: https://gerrit.openafs.org/13802 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6bd94fe29d1aa6ce61ba02e681defea79770ccdd Author: Mark Vitale Date: Wed Feb 5 17:49:03 2020 -0500 dir: check DNew return code Commit 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state buffer locations for data' changed DNew and DRead to return a return code. However, the callers of DNew were not modified to check the new return code. (This commit applied only to the implementations dealing with AFS directories, in afs/afs_buffer.c and dir/dir.c. The ubik implementations of DNew and DRead, dealing with ubik databases, were not modified.) Modify all (non-ubik) callers of DNew to check the return code. In addition, modify code as needed so return codes are properly propagated to the callers. While here, add Doxygen comments for AddPage and FindBlobs. Change-Id: Iabde6499745dd351f3fcda73c9f52c440a36490e Reviewed-on: https://gerrit.openafs.org/13801 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7b0a66f63b8fd0d332de4766b2472c0270f0f253 Author: Andrew Deason Date: Mon Oct 19 18:30:27 2020 -0500 Remove unused xdr types Numerous types and constants are defined in our various RPC-L files that are never used or referenced by anything. Remove them. 
Change-Id: I0b03be1ce0e186a88f80d2f3f7a66a1e25965ff3 Reviewed-on: https://gerrit.openafs.org/14404 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0787a2c8aed555df8b74635bd34588e7d20865ac Author: Benjamin Kaduk Date: Sun Sep 2 17:10:56 2018 -0500 volser: apply static keyword to VolPartitionInfo definition The function declaration was already marked as static; mark the definition as well for consistency (and consistency with the other helpers in this file). Change-Id: I642db1d27efd34ab2a09f7299791c19d07b1f923 Reviewed-on: https://gerrit.openafs.org/13321 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit dcce956df4fc8d368962cb36d8b3c801be69a85a Author: Mark Vitale Date: Sun Mar 3 20:20:58 2019 -0500 dir: check afs_dir_Create return code in afs_dir_MakeDir afs_dir_MakeDir() ignores the return code from afs_dir_Create() for the '.' and '..' ("dot" and "dotdot") directories. This has been the case from the earliest implementation (MakeDir() calling Create()) in the original IBM import. Instead, check the return codes to prevent the possibility of creating malformed directories. Change-Id: I60179488429dfa9afe60c4862c5e42b41f1e0048 Reviewed-on: https://gerrit.openafs.org/13800 Reviewed-by: Benjamin Kaduk Reviewed-by: Mark Vitale Tested-by: BuildBot commit 04805f48a2eb6ddaa604d8d0738888fd5f960f20 Author: Benjamin Kaduk Date: Sun Sep 2 17:06:38 2018 -0500 ptserver: rename NameToID and IDToName helpers These helper function names alias the names of public RPCs and can cause confusion when grepping the code. Rename them in a different style to provide greater hamming distance between the various functions involved in handling these RPCs. 
Change-Id: I0e2c7997bc145888affdac28716293ff820756c7 Reviewed-on: https://gerrit.openafs.org/13320 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0639ca8d221231309d59882a63e5a95a10cfdac3 Author: Mark Vitale Date: Sun Mar 3 20:51:45 2019 -0500 dir: check afs_dir_MakeDir return code in DirSalvage Since the original IBM import, DirSalvage() has ignored the return code from afs_dir_MakeDir() (f.k.a. MakeDir). This has been safe because, as the comment states, afs_dir_MakeDir returns no (non-zero) error code. In preparation for a future commit, add a check for the return from afs_dir_MakeDir and remove the comment. Change-Id: Ibb259a7aaeeb21ef70a7794143a0dadb2a75725d Reviewed-on: https://gerrit.openafs.org/13799 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 735fa5fb090ee0efc2161597a3974f6fa45126f6 Author: Mark Vitale Date: Thu Jan 30 14:04:05 2020 -0500 dir: distinguish logical and physical errors on reads The directory package (src/dir) salvage routines DirOK and DirSalvage check a global variable 'DErrno' to distinguish logical errors (e.g. short read) from physical errors (e.g. EIO). However, since the original IBM import, this logic has not worked correctly because there is no longer any code that sets the value of DErrno - its value is always zero. Instead, modify all implementations of ReallyRead to optionally return the errno for low-level IO errors. Also, create a new userspace-only variant - DReadWithErrno() - of the src/dir/buffer.c version of DRead (the version called by DirOK and DirSalvage, and the only caller of ReallyRead) to return the ReallyRead errno upon request. Also create an analogous variant of afs_dir_GetBlobs, afs_dir_GetBlobsWithErrno(). Finally, convert DirOK and DirSalvage to use the new variants and replace DErrno with equivalent logic. Remove all other references to DErrno. 
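The distinction between logical and physical read errors can be illustrated with a small sketch. It assumes a POSIX read(); read_page and its physerr parameter are hypothetical names chosen for this example and are not the ReallyRead or DReadWithErrno interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the OpenAFS ReallyRead. Try to read exactly
 * len bytes with a single read() call.
 * Returns 0 on success, -1 on failure. If physerr is non-NULL it is set
 * to the errno of a failed read() (a physical error), or to 0 when the
 * read was merely short (a logical error).
 */
static int
read_page(int fd, void *buf, size_t len, int *physerr)
{
    ssize_t got;

    if (physerr != NULL)
	*physerr = 0;
    got = read(fd, buf, len);
    if (got < 0) {
	if (physerr != NULL)
	    *physerr = errno;	/* low-level IO failure */
	return -1;
    }
    return ((size_t)got == len) ? 0 : -1;	/* short read: no errno */
}
```

A caller that passes NULL for physerr gets the plain success/failure behaviour; a salvage-style caller can pass a pointer and branch on whether the reported errno is non-zero.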
Change-Id: I3de182ce49c1682572142da594af5dc2c00ede74 Reviewed-on: https://gerrit.openafs.org/13798 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 1caeeea43c038011306dd1c391680c24fc318e3d Author: Andrew Deason Date: Mon Oct 26 12:19:19 2020 -0500 afs: Log pid with disk cache read errors Log the current pid (and procname) when we complain about an error when reading from CacheItems in afs_UFSGetDSlot. These errors can result in confusing situations, so it can be helpful to know at least what process saw the error. Our logic for logging this information is getting a bit large, so also move this to a new function, LogCacheError. Change-Id: I3427e736458784df0d516f4182684605e930e128 Reviewed-on: https://gerrit.openafs.org/14416 Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 98c1a8751c5d29e45a3839552d495c9616b82118 Author: Cheyenne Wills Date: Tue Oct 8 11:54:58 2019 -0600 roken: use strtok_r from roken Windows standard library doesn't provide strtok_r. Use the strtok_r that is provided from roken. 
Change-Id: I1bccb9a306c9dd1963f044127fb5dfe4da5728cc Reviewed-on: https://gerrit.openafs.org/13891 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e5abb348829c65e1322e876e9a4a1ad4059b2697 Author: Heimdal Developers Date: Tue Oct 8 10:47:05 2019 -0600 Import of code from heimdal This commit updates the code imported from heimdal to 5dfaa0d10b8320293e85387778adcdd043dfc1fe (git2svn-syncpoint-master-311-g5dfaa0d10) New files are: roken/strtok_r.c Change-Id: I27042f614c7d6ce9a95a80d01474e8bf401e4760 Reviewed-on: https://gerrit.openafs.org/13890 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit fe6a97b4d89cd15b6a298f69bea60aca961c2ec9 Author: Cheyenne Wills Date: Tue Oct 8 11:02:59 2019 -0600 roken: add strtok_r to the imported file list Import the strtok_r function which is needed by audit for parsing command line options. Change-Id: I8412c5a663dc3315c4146665edb72d9a6b8df5be Reviewed-on: https://gerrit.openafs.org/13889 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a912a29a4568e20af8b354fc901c557d67aca1f7 Author: Benjamin Kaduk Date: Sat Sep 1 21:47:39 2018 -0500 Detect realloc failure While reviewing other commits, a call to realloc() was discovered that would leak memory on failure (by virtue of always assigning the realloc() return value to the pointer holding the input address, even when the return value is NULL). Check for failure and return early in that case (giving an incomplete list of events). 
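The general pattern can be sketched as follows. This is a minimal illustration, not the code touched by this commit; struct event_list and event_list_append are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list; not an OpenAFS structure. */
struct event_list {
    int *items;
    size_t count;
};

/*
 * Append one value. Returns 0 on success, -1 if realloc() fails.
 * The realloc() result goes into a temporary first, so on failure the
 * existing buffer is still referenced by list->items: nothing leaks and
 * the caller keeps the (incomplete) list.
 */
static int
event_list_append(struct event_list *list, int value)
{
    int *grown;

    assert(list != NULL);
    grown = realloc(list->items, (list->count + 1) * sizeof(*grown));
    if (grown == NULL)
	return -1;		/* list->items is still valid */
    list->items = grown;
    list->items[list->count++] = value;
    return 0;
}
```

The caller remains responsible for eventually calling free() on list->items, whether or not an append failed along the way.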
Change-Id: Ic6e889f1d990bd289812ce4bf8e9cd4ebce488ec Reviewed-on: https://gerrit.openafs.org/13313 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7ede3fa17f1ba419e4b5febd36ccdc3af38f92bc Author: Benjamin Kaduk Date: Sun Sep 2 17:03:38 2018 -0500 ptserver: move IDToName, NameToID to ptprocs.c and make static These two helpers are only used in implementing server-side RPC handlers, and having to track the codeflow across files is unhelpful. Move them into the file where they're used, make them static, and remove the prototypes from ptprototypes.h (which is not an installed header, so there is no API/ABI breakage). Change-Id: I236d17865a296933f41aaee206535d341c3a955d Reviewed-on: https://gerrit.openafs.org/13319 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit fe4f6638d130819c48a9957901b38839bab134a4 Author: Benjamin Kaduk Date: Sun Sep 2 16:35:42 2018 -0500 Assign explicit opcodes to butc RPCs This should prevent inadvertent reassignment if additional RPCs are introduced in the future. Change-Id: I5645ca478d2ecef9962f4bde04ab8f9895dd9497 Reviewed-on: https://gerrit.openafs.org/13317 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit fd6add0aca03a5a17f7109c785b6027a76f13cdf Author: Andrew Deason Date: Mon Nov 12 15:06:09 2018 -0600 vlserver: Return VL_DBBAD on unhash failure If we try to delete a vlentry, and the vlentry cannot be found on one of its hash chains, we cannot unhash the vlentry properly and the operation fails with VL_NOENT. This results in the following error messages to the user: $ vos delentry 123456 Could not delete entry for volume 123456 You must specify a RW volume name or ID (the entire VLDB entry will be deleted) VLDB: no such entry Deleted 0 VLDB entries This is confusing, because VL_NOENT can also occur if the user specifies a volume that does not actually exist. 
This situation is indicative of database corruption, usually because of a ubik transaction that was only half-applied, or because of other ubik bugs in the past. The situation can only really be fixed by repairing the database, so return VL_DBBAD in this case instead, to more clearly indicate that something is wrong with the database, and not a problem with the arguments the caller provided. Change-Id: I6fc275c3ad05c108778f36687227b0a927cca5da Reviewed-on: https://gerrit.openafs.org/13384 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 878d27c845157bb64c32bbd6c3cacce17c681d70 Author: Andrew Deason Date: Mon Nov 12 14:41:44 2018 -0600 vlserver: Add VL_DBBAD error code The VL_ error table currently doesn't have an error code to indicate that an operation cannot succeed because the database is corrupted. There are a few error codes for specific cases of errors that are probably the result of corruption (like VL_IDALREADYHASHED, or VL_EMPTY), but these are only for specific cases and indicate rather low-level internal problems. There are some instances where the real problem preventing an operation from succeeding is that the database is just corrupt or inconsistent in some way, and the administrator must repair the database before it can succeed. And we currently don't have any way of indicating that situation via an error code. So, introduce the VL_DBBAD code, to indicate this situation. Error codes already exist in other tables for similar situations, such as PRDBBAD, and KADATABASEINCONSISTENT. This commit does not use the new error code anywhere; we just introduce it into the VL_ error table, so comerr-using applications will be able to interpret it. 
Note that the VL_DBBAD error code has been recognized by the AFS Assigned Numbers Registry as recorded in the ticket history of Change-Id: I8fea356a4e0db907ec8418efe6ef35d547be0a63 Reviewed-on: https://gerrit.openafs.org/13383 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit c1c4e308cfe0b189778259840f02183c83c1083e Author: Benjamin Kaduk Date: Thu Aug 30 09:54:23 2018 -0500 ptserver: move allocation out of put_prentries() into listEntries() put_prentries() is a helper function for listEntries(), but the contract between the two is rather odd -- put_prentries() is expected to notice when the backing store has not yet been allocated and silently allocate it, even though there is only the single caller and the allocation could be done in the caller. Move the allocation to the caller and adjust the "buffer is full" logic accordingly, and normalize the initialization of the output array to just use calloc() instead of individual memset()s when populating each entry. Change-Id: Icf84e3b60eae81a1570b12d7adbf006a24a104f3 Reviewed-on: https://gerrit.openafs.org/13315 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 61a8f0c5b8b1cf5022e4f20af4eb42cae1cb03f6 Author: Cheyenne Wills Date: Thu Jun 6 14:08:53 2019 -0600 volser: Avoid calling osi_audit before audit init volmain.c calls osi_audit before the audit facility is fully initialized. Commit 16d67791 (auditlogs-for-everyone-20050702) introduced the -auditlog parameter; it appears that it didn't remove the call to osi_audit (right after osi_audit_init) that was called before command line argument processing. This resulted in calling the audit facility before it was fully initialized with the -auditlog and -audit-interface parameters. The 16d67791 commit replicated the osi_audit call after command line processing. 
Change-Id: Ia0c0054a2fb11892b5b30c0f0838a4d6bbdf9bbb Reviewed-on: https://gerrit.openafs.org/13772 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 52c4bf4d18a82d70016b584db95ab66aaa7ffb15 Author: Andrew Deason Date: Mon Oct 19 16:07:44 2020 -0500 audit: Always call pthread_once in osi_audit_init Currently, we skip the pthread_once call in osi_audit_init if audit_lock_initialized is set. But this is somewhat pointless, since pthread_once will effectively do this check itself, and better (it will wait if osi_audit_init is actively running in another thread). So just get rid of audit_lock_initialized, and replace the other assert for audit_lock_initialized with another plain pthread_once call. Change-Id: I466c8ec2d1516edecaae23d4354892e7e3a88918 Reviewed-on: https://gerrit.openafs.org/14403 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit e83347ce6ada6576bb8cd6f2719540a5565b09c0 Author: Benjamin Kaduk Date: Thu Sep 6 18:51:06 2018 -0500 remove unused src/butc/common.h Change-Id: Ie25a9ca4f715c841a7f7fa130176cfbdc5ef18e7 Reviewed-on: https://gerrit.openafs.org/13322 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit b1975c27d2201aad607a90e3795f6d0d5c4d8a8a Author: Benjamin Kaduk Date: Sun Sep 2 16:37:44 2018 -0500 butc: consistently spell taskId parameter All but one RPC used the capitalization "taskId"; adjust the lone straggler for consistent style. 
Change-Id: I996d96a4fc67af7f745bf67041c90390073ca9ea Reviewed-on: https://gerrit.openafs.org/13318 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit f44d7e64bba10aa33f2188bf6629ab206b217f2d Author: Benjamin Kaduk Date: Sun Sep 2 16:18:31 2018 -0500 Remove commented-out butc RPC definitions These functions have been commented out since the original IBM import, and un-commenting them in their current location would be an ABI break (by causing opcodes to be reassigned for subsequent RPCs). Since they are just noise in the interface description file, remove them. Change-Id: I7e8cd2e7dfa4469e39e26a0437059c108f3ef218 Reviewed-on: https://gerrit.openafs.org/13316 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 465701c8f1a6dacc7e8adeef47a73eca85a928a8 Author: Benjamin Kaduk Date: Sat Sep 8 21:25:40 2018 -0500 butc: Initialize RPC outputs at top of function RPC handlers are a little bit special in that their output parameters are discarded on error and an Rx abort is sent instead of the usual response fields. Nonetheless, it is good code hygiene to adhere to the practices we use for the rest of the functions in the tree: initialize output variables before the first return. Change-Id: I6c2e25b04ccb6277bd28e398121723b92fe42b04 Reviewed-on: https://gerrit.openafs.org/13314 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3e3fce24da31a31ca9a3f4ad356c4e4eaf0ad897 Author: Andrew Deason Date: Mon Nov 12 15:01:18 2018 -0600 vlserver: Warn when we cannot unhash deleted entry If we are trying to delete an entry from the vldb, we fail with VL_NOENT if we cannot find the given entry on one of its hash chains. This is indicative of corruption in the vldb (since we have an entry not on a hash chain), but we don't really indicate this clearly. 
There are no log messages, and the user running 'vos' only sees an error like this: $ vos delentry 123456 Could not delete entry for volume 123456 You must specify a RW volume name or ID (the entire VLDB entry will be deleted) VLDB: no such entry Deleted 0 VLDB entries Which is the exact same error message if the user tries to delete a volume that does not actually exist. We currently do not have an error code that clearly says that the database appears corrupted and needs to be fixed, but we can at least log an error in VLLog for this case, to give the administrator a chance at fixing the situation. So, log a message in this situation. Change-Id: I4f0ee8749a90441e1f8d779890293dc5d1d9dbee Reviewed-on: https://gerrit.openafs.org/13382 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 48df3ac30210056ec046b48c28aee425b0690f92 Author: Mark Vitale Date: Tue Oct 6 00:02:53 2020 -0400 bos: do not assume fs just if dafs bnode is stopped If dafs is configured but stopped, 'bos salvage -forceDAFS' will fail with: bos: failed to get instance info for 'fs' (no such entity) bos: shutting down 'fs'. bos: can't stop 'fs' (no such entity) This is due to incomplete logic in IsDAFS, introduced with commit e46f10a0a0a930f318833a8a86b10c19744160c1 'bos: Do not assume DAFS just if DAFS bnode exists' Add logic to IsDAFS to work correctly when dafs is configured but stopped. Change-Id: I50f8209180536d25e68c0ad6fb826202d8f27ce7 Reviewed-on: https://gerrit.openafs.org/14382 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit f372ec041a83288a5d096360f0ad8589e4db666a Author: Mark Vitale Date: Tue Oct 6 10:18:11 2020 -0400 bozo: defer audit open until log dir is created and current On a new OpenAFS install where the log directory has not yet been created, 'bosserver -auditlog /usr/afs/logs/' (absolute path) fails with ENOENT because the log directory doesn't exist yet. 
Furthermore, 'bosserver -auditlog ' (relative path) succeeds, but the audit file is created in the current working directory when bosserver was started, not in the expected log directory (Transarc /usr/afs/logs). Both problems have been present since bosserver audit log support was introduced by commit 16d67791dce45e5d4ee9b854c796492ffcde2113 'auditlogs-for-everyone-20050702'. Reorder the bosserver initialization steps to ensure that the log directory has been created and is the current working directory, before creating and opening the audit log. Change-Id: I1dc3c136edd12c5425ef0b7a3212a18d4c3036f7 Reviewed-on: https://gerrit.openafs.org/14381 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 87041d676c93dfe35a085b9b5aaa73e74c08bc90 Author: Andrew Deason Date: Sat Oct 17 20:51:51 2020 -0500 bozo: Properly detect presence of -auditlog cmd_OptionAsString returns non-zero if the given option _isn't_ given (CMD_MISSING), so we need to call osi_audit_file only when cmd_OptionAsString returns 0. Since commit f6cdf71 (bozo: Use libcmd for command line options), this causes bosserver to complain on startup if no -auditlog was given: $ bosserver Warning: auditlog (null) not writable, ignored. To fix this, skip calling osi_audit_file if -auditlog was not given. While we're changing this anyway, change our processing of our audit-related options to more closely match what other daemons do, like ptserver or viced, so it's easier to see if we're doing the right thing. That is, just call cmd_OptionAsString() without a conditional, and just test if auditFileName is non-NULL later on, after options processing. 
Change-Id: I563c7efd02cb5210c32c0cc7f5a03683db792e98 Reviewed-on: https://gerrit.openafs.org/14402 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e8702e6a615a160cdbe464f76bd6f100667720d2 Author: Mark Vitale Date: Fri Oct 28 18:12:19 2016 -0400 afs: prevent double release of global lock afs_xvcb afs_GetServer calls ReleaseWriteLock(&afs_xvcb) twice within a few lines. The second one is spurious. Commits b18653de7ae90491c2e75f4a98410581655d776c 'xserver lock order violation' and f2bf60ed4f1323cd6f74f2f01114f7e4f714db53 'xvcb lock order violation' were written by the same author at the same time and apparently were victims of a bad merge. Discovered during a lock audit project as a panic during afsd startup: assertion failed: (&afs_xvcb)->excl_locked == WRITE_LOCK, file: /home/mvitale/src/sna-openafs/src/afs/afs_server.c, line: 2089 afs_GetServer is called frequently by many threads and so this bug could easily have released another thread's write lock on afs_xvcb. Remove the spurious second release. Change-Id: I495f4775e18ed37cfbccd03b5f25594586864b83 Reviewed-on: https://gerrit.openafs.org/14411 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fed176cc50512b4a5ae83c64b24c25e04198fa24 Author: Marcio Barbosa Date: Fri Feb 28 02:41:53 2020 +0000 ubik: Introduce IndexOf() To make the ubik_Call* functions cleaner, consolidate code that finds the index of the connection associated with a host into a new function. No functional change should be incurred by this commit. Change-Id: I320d7a41221cb533e8d077c412f872152ac43b75 Reviewed-on: https://gerrit.openafs.org/14060 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit ea9e5e8519dc486cfb019447ee5d695de104079d Author: Andrew Deason Date: Thu Jul 18 16:21:10 2019 -0500 afs: Handle osi_NewVnode failures Currently, code inside afs_vcache.c assumes that osi_NewVnode always returns non-NULL, which means that osi_NewVnode must panic if it cannot create a new vnode. 
All of the callers of afs_GetVCache, afs_NewVCache, etc, already handle getting a NULL return, though (after all, the given fid may not exist or be inaccessible due to network errors, etc). So, just propagate NULL returns from osi_NewVnode up to our callers, to avoid panics in these situations. Modify osi_NewVnode on many arches to return an error on allocation failure, instead of panic'ing. Change-Id: Ib578b1747590bdf65327d4674e0849811ed999eb Reviewed-on: https://gerrit.openafs.org/13701 Reviewed-by: Benjamin Kaduk Reviewed-by: Yadavendra Yadav Tested-by: BuildBot commit e1e5df918fee00d4d9152c31c24cc1e7f23b71a6 Author: Mark Vitale Date: Mon Sep 18 19:45:10 2017 -0400 stats: incorrect clock square algorithm Since the original IBM code import, OpenAFS has an algorithm for squaring clock values, implemented identically in three different places. This algorithm does not account correctly for microsecs overflow into seconds, resulting in incorrect "sum-of-squares" values for queue and execution time in several OpenAFS performance utilities. Specifically, this code: t1.tv_usec += (2 * t2.tv_sec * t2.tv_usec) % 1000000 \ + (t2.tv_usec / 1000)*(t2.tv_usec / 1000) \ + 2 * (t2.tv_usec / 1000) * (t2.tv_usec % 1000) / 1000 \ + (((t2.tv_usec % 1000) > 707) ? 1 : 0); \ Can allow for the tv_usec field to be increased by a theoretical max of around: t1.tv_usec += 999998 \ + 999*999 \ + 2 * 999 * 999 / 1000 \ + 1; \ Or: t1.tv_usec += 1999996; \ If t1.tv_usec is already 999999, after this calculation its value could be as high as 2999995. So just checking once if t1.tv_usec is over 1000000 is not sufficient, since the resulting value (1999995) is still over 1000000. 
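The worst case above can be reproduced with a short sketch. The struct and helper names here are hypothetical; only the tv_usec expression is taken from the text, and the carry loop is one possible way to normalize the result, not necessarily how the macros do it.

```c
#include <assert.h>

/* Hypothetical stand-in for the clock/timeval structs used by the macros. */
struct clock_val {
    long long sec;
    long long usec;
};

/* Add the microsecond terms of t2 squared to t1, as in the expression above. */
static void
add_usec_terms(struct clock_val *t1, const struct clock_val *t2)
{
    t1->usec += (2 * t2->sec * t2->usec) % 1000000
	+ (t2->usec / 1000) * (t2->usec / 1000)
	+ 2 * (t2->usec / 1000) * (t2->usec % 1000) / 1000
	+ (((t2->usec % 1000) > 707) ? 1 : 0);
}

/* Carry whole seconds out of usec; loop because one pass may not be enough. */
static void
normalize(struct clock_val *t)
{
    while (t->usec >= 1000000) {
	t->usec -= 1000000;
	t->sec++;
    }
}
```

With t2 = {1, 999999} and t1.usec starting at 999999, add_usec_terms leaves t1.usec at 2999995, so two carries are needed before the microsecond field is back in range.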
Correct all implementations by repeatedly checking if tv_usec is over 1000000 after the above calculation: macro affected utility ===================== ============================ afs_stats_SquareAddTo xstat_cm_test fs_stats_SquareAddTo xstat_fs_test clock_AddSq rxstat_get_process and _peer Change-Id: I3145d592ba6bc1556729eac657f43d476c99eede Reviewed-on: https://gerrit.openafs.org/14376 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit e985d43d99d93172b5608a3c73fd3201d3b3a212 Author: Mark Vitale Date: Mon Sep 28 16:35:38 2020 -0400 rxstats: correctly report vlserver VL_* RPC stats Since the original IBM code import, rxstat_get_process and rxstat_get_peer have reported vlserver VL_* RPC stats as for the "volserver interface". Correct this to read "vlserver interface". Change-Id: Ie65fd41150bed8180ad8792c21a67012084459ab Reviewed-on: https://gerrit.openafs.org/14375 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 18c345a9f8ee9b2ff73f23dae68757b19d3283f5 Author: Mark Vitale Date: Mon Sep 28 15:40:34 2020 -0400 rxstats: correctly distinguish client and server stats Commit d3eaa39da3693bba708fa2fa951568009e929550 'rx: Make the rx_call structure private' inadvertently caused all rxstats (aka rpcstats) to be recorded as client stats by hardcoding the value for isServer to 1. Therefore, when peer or process rxstats are enabled for a OpenAFS component, the rxstat_get_process and rxstat_get_peer utilities will erroneously report both client and server stats as "accessed as a client". This is particularly problematic for ubik VOTE_* and DISK_* RPC stats, for which a given ubik server may be both client and server over time. In this case, both client and server stats are conflated into the same "accessed as a client" counters. Instead, properly pass the value of isServer from rx_RecordCallStatistics through to rxi_IncrementTimeAndCount. 
Note to maintainers: This bug is only in master and all 1.8.x releases; no 1.6.x releases are affected. Note: Confusingly, isServer=1 indicates client stats and isServer=0 indicates server stats. However, this is a quirk of the original implementation and wire format of the RXSTATS_* RPCs and cannot be changed. isServer is actually shorthand for "remote is server"; thus all RPC client stubs record their rxstats with isServer == 1, and all RPC server stubs record their rxstats with isServer == 0. Change-Id: I2420f807e2c18ddfb9de7093a487825fa2d0a68e Reviewed-on: https://gerrit.openafs.org/14374 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit f18b58f8227df2ab420d69eb5937a99f747c7692 Author: Marcio Barbosa Date: Thu Sep 3 20:11:34 2020 +0000 volser: Close dirp on error in ConvertROtoRW Currently, if SAFSVolConvertROtoRWvolume cannot create a new transaction for the volume to be converted, it returns without closing the directory stream opened by it. To prevent this leak, go through a new 'goto done' destructor if NewTrans fails. Change-Id: Ie0580e7739ae667f1cd2f9cabb8aaf5e15d3f2dd Reviewed-on: https://gerrit.openafs.org/14342 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 47d809d4434f6724d0b6fbe2dcb54749486eeddb Author: Michael Meffie Date: Fri Aug 28 11:24:10 2020 -0400 bozo: Log each dir and file with bad access rights The bosserver directory and file access check stops after finding one directory or file with incorrect permissions or owner. A log message is written for this first one found, but more than one directory or file may have incorrect access rights. Instead, check all of them so the bosserver logs a warning message for each incorrect directory or file permission found. This should make it easier to fix all of the file permission problems at once.
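The "check everything, warn for each" approach can be sketched as below. This is only an illustration under stated assumptions: `struct sketch_entry`, `check_all_entries`, and the idea of a per-entry mask of forbidden mode bits are hypothetical and are not taken from the bosserver source.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of one directory or file to verify. */
struct sketch_entry {
    const char *path;        /* illustrative path only */
    unsigned int mode;       /* permission bits as found on disk */
    unsigned int forbidden;  /* bits that must not be set */
};

/*
 * Check every entry and log a warning for each bad one, instead of
 * returning at the first failure. Returns the number of bad entries.
 */
static int check_all_entries(const struct sketch_entry *entries, size_t n,
                             FILE *log)
{
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int extra = entries[i].mode & entries[i].forbidden;
        if (extra != 0) {
            fprintf(log, "warning: %s has unexpected mode bits 0%o\n",
                    entries[i].path, extra);
            bad++;
        }
    }
    return bad;
}
```

Returning a count rather than a boolean lets the caller decide how to react once all warnings have been written.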
Change-Id: Ia3f14800ce036aa390929109a286cf21828e8a35 Reviewed-on: https://gerrit.openafs.org/14330 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit a6b14ea90259fbc4ead62f5f4288e435801db81e Author: Michael Meffie Date: Fri Aug 28 11:23:00 2020 -0400 bozo: Add KeyFileExt and rxkad.keytab to access rights check When the KeyFileExt and rxkad.keytab were added to OpenAFS, they were not added to the bosserver's access rights check. Add these files to the bosserver access checks, with the same access rights needed for the original KeyFile. Also, add the full path for KeyFileExt to the dirpath package (not just the filename), which was not done when the KeyFileExt was introduced. This is needed to perform the access checks. Change-Id: I8c9028e846fad9f15823baeb7cc15a8f80ed5c1c Reviewed-on: https://gerrit.openafs.org/14329 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit e17bc8ce865f630d268c2a5e8cafb79ad8855f12 Author: Mark Vitale Date: Wed Sep 23 17:32:40 2020 -0400 afs: remove vestigial externs for afs_xvcache These have not been needed since src/afs/afs_prototypes.h gained 'extern afs_rwlock_t afs_xvcache' with commit 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821" Remove the vestigial extern references. Change-Id: Id6aceff0d5df1f1bed210a3fbf2951c62f35ddbb Reviewed-on: https://gerrit.openafs.org/14406 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit a3fc79633fb0601bf02508bd1e64652f403e4b7e Author: Mark Vitale Date: Wed Sep 23 17:02:52 2020 -0400 afs: remove vestigial externs for afs_xcbhash Commit 64cc7f0ca7a44bb214396c829268a541ab286c69 "afs: Create afs_StaleVCache" consolidated many references to afs_xcbhash into a new function afs_StaleVCache. However, this left many references to 'extern afs_wrlock_t afs_xcbhash' that are no longer needed. 
But actually, many of these have not been needed since src/afs/afs_prototypes.h gained 'extern afs_rwlock_t afs_xcbhash' with commit 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821" Remove the vestigial extern references. No functional change is incurred by this commit. Change-Id: Ie6cfb6d90c52951795378d3b42e041567d207305 Reviewed-on: https://gerrit.openafs.org/14405 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4e85324729f8c11135b131310089fff4d81692e9 Author: Mark Vitale Date: Fri Sep 18 12:46:57 2020 -0400 xstat: prevent CPU loop when -period 0 Historically xstat_cm_test and xstat_fs_test have supported option '-period ' to specify continuous operation for a length of time. If '-period 0' was specified, both programs exited immediately. Beginning with commits 2c1a7e47336c8f8d14dd6c65d53925a9e0e87c66 'xstat: add xstat_*_Wait functions' and 6b67cac432043a43d7cdfa6af972ab54412aff94 'convert xstat and friends to pthreads', xstat_cm_test and xstat_fs_test now support -period 0 to run "forever". This support is implemented in xstat_cm_Wait and xstat_fs_Wait, respectively. Although the "wait forever" logic was added to allow consolidation of similar code in afsmonitor, it also changed how xstat_cm_test and xstat_fs_test behave for '-period 0'. Unfortunately, there is a bug in this support, at least when running on pthreads. After the initial 24 minute timer expires, the while (1) will repeatedly run select with a timeout that is now 0. This causes the while loop to consume 100% of the CPU on which this thread is dispatched. Instead, modify the wait-forever logic to specify NULL for the select() timeout value. Also update the man page to document that '-period 0' means forever.
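The choice of timeout argument can be sketched as follows. This is a hedged illustration rather than the code in xstat_cm_Wait or xstat_fs_Wait: the helper name `sketch_wait_timeout` and its parameters are hypothetical. It relies only on the documented behaviour that select() blocks indefinitely when its timeout pointer is NULL.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: pick the timeout pointer to hand to select().
 * A period of 0 means "wait forever", expressed as a NULL timeout.
 * Otherwise, fill in caller-provided storage and return it.
 */
static struct timeval *sketch_wait_timeout(int period_secs,
                                           struct timeval *storage)
{
    if (period_secs == 0)
        return NULL;            /* select() blocks until an event or signal */
    storage->tv_sec = period_secs;
    storage->tv_usec = 0;
    return storage;
}
```

A caller would pass the result as the last argument of select(), for example `select(0, NULL, NULL, NULL, sketch_wait_timeout(period, &tv))`. Because some platforms update the timeval to reflect the time not slept, a loop that reuses a finite timeout should refill it before each call.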
Change-Id: I25d0d5be0eedb8bf3de495785b9b03a3e3d45221 Reviewed-on: https://gerrit.openafs.org/14366 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 74f46e0912b3f9061d7fadc3b3d08a11d6adda97 Author: Andrew Deason Date: Thu Dec 12 21:00:20 2019 -0600 afs: Return to userspace after AFS_NEW_BKG reqs Currently, for AFS_NEW_BKG, background daemons run in the context of a normal user process (afsd), in order to return to run userspace-handled background ops. For non-AFS_NEW_BKG when AFS_DAEMONOP_ENV is defined, background daemons run as kernel threads instead, and have no corresponding userspace process. On LINUX, whether or not we run as a kernel thread has some odd side-effects: at least one example of this is how open file handles (struct file) are treated when closed. When the last reference to a struct file is closed, the final free is deferred to an asynchronous call to be executed "later", in order to avoid issues with lock inversion. For kernel threads, "later" means the work is scheduled on the global system work queue (schedule_work()), but for userspace processes, it is scheduled on the task work queue (task_work_add()), which is run around when the thread returns to userspace. For background daemons, we never return from the relevant syscall until we get a userspace background request (or the client is shutting down). Commit ca472e66 (LINUX: Turn on AFS_NEW_BKG) changed LINUX to use AFS_NEW_BKG background daemons, so background requests now run as a normal userspace process, and last-reference file closes are deferred. Since we may never return to userspace, this means that our file handles (used for accessing the disk cache) may never get freed, leading to an unbounded number of file handles remaining open. This can be seen by seeing the first value in /proc/sys/fs/file-nr growing without bound (possibly slowly), as accessing /afs causes background requests.
Eventually the number of open files can exceed the /proc/sys/fs/file-max limit, causing further file opens to fail, causing various other problems and potentially panics. To avoid this issue, define a new userspace background op, called AFS_USPC_NOOP, which gets returned to the afsd background daemon process. When afsd sees this, it just does nothing and calls the AFSOP_BKG_HANDLER syscall again, to go into the background daemon loop again. In afs_BackgroundDaemon, we return the AFS_USPC_NOOP op whenever there are no pending background requests, or if we've run 100 background requests in a row without a break. This causes us to return to userspace periodically, preventing any such task-queued work from building up indefinitely. Do this for all platforms (currently just LINUX and DARWIN), in order to simplify the code, and possibly avoid other similar issues, since staying inside a syscall for so long while doing real work is arguably weird. Add a documentation comment block for afs_BackgroundDaemon while we're here. Thanks to mvitale@sinenomine.net for discovering the file leak. Change-Id: I1953d73b2142d4128b064f8c5f95a5858d7df131 Reviewed-on: https://gerrit.openafs.org/13984 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 83ce8d41c68a5ebcc84132d77af9024c6d285e05 Author: Andrew Deason Date: Tue Oct 13 20:18:59 2020 -0500 ubik: Remove unused sampleName The RPC-L type sampleName and related constant UMAXNAMELEN are not referenced by anything, and have been unused since OpenAFS 1.0. Remove the unused definitions. Change-Id: I21a11d9db9ed80547de8685623fb09f9a86934f1 Reviewed-on: https://gerrit.openafs.org/14386 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit bc6f50ca0ce6c17a5a9b1042fa90235613f80c95 Author: Andrew Deason Date: Tue Oct 13 12:17:37 2020 -0500 dir: Set srcdir correctly in src/dir/test srcdir is a magic variable that needs to be set to @srcdir@, not some relative path like ../../.. (which will usually be somewhere in the objdir, not srcdir). 
Set it correctly in here. Without this, objdir builds can fail with: make[4]: Entering directory '...obj/src/dir/test' make[4]: *** No rule to make target 'dtest.o', needed by 'dtest'. Stop. Which happens because the automatic rule for dtest.o can't be constructed, since we cannot find dtest.c automatically because srcdir isn't set properly. This has been broken since commit 37b4195d (dtest-20021111), but was not noticeable until commit 192a2ff4 (dir: make dtest buildable again), since that caused dtest to actually get built. Also set LIBS correctly in here, using the conventional ${TOP_LIBDIR}, since ${srcdir} no longer points to "../../..". Change-Id: I539e01a4397c558dc0eda492834b3f9913f71634 Reviewed-on: https://gerrit.openafs.org/14384 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit f6cdf7165b4e66772ee06314658b7c209928d611 Author: Cheyenne Wills Date: Fri Aug 21 12:53:30 2020 -0600 bozo: Use libcmd for command line options Update bosserver to use libcmd for command line parsing. Change-Id: Iaa55dc33b72983a48089a7b359260916bea2d1e7 Reviewed-on: https://gerrit.openafs.org/13845 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 1aa7d3c199e77e3ebdffe9cea4dee8ee82e81fcd Author: Mark Vitale Date: Mon Mar 4 01:37:53 2019 -0500 afs: refactor directory checking in DRead Commit d566c1cf874d15ca02020894ff0af62c4e39e7bb 'dread-do-validation-20041012' modified directory checking (in the afs_buffer.c implementation of DRead()) to use size information passed to DRead, rather than obtained from the cache via afs_CFileOpen. Because this directory checking does not require any information from the cache buffers or the cache partition, we can make the check right away, before searching the cache buffers or calling afs_newslot. To clarify and simplify, move the directory sanity checking logic to the beginning of DRead. Remove the afs_newslot cleanup logic which is no longer needed. 
While here, add Doxygen comments for DRead. Change-Id: I8cea4e885ece64e760271c8194c126250f87104e Reviewed-on: https://gerrit.openafs.org/13803 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6ac68ca514932262fa949eca50527735ff5c09a4 Author: Mark Vitale Date: Thu Mar 7 14:31:49 2019 -0500 dir: check afs_dir_MakeDir return code in dtest The dtest test program ignores the return from afs_dir_MakeDir. Fix this so errors may be identified in testing. While here, also improve the diagnostic message for afs_dir_Create failures, to make it consistent with the new diagnostic message for afs_dir_MakeDir failures. Change-Id: Ib882947e01c864344f17faad8a646b2487793f29 Reviewed-on: https://gerrit.openafs.org/13797 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c5a9d4447d69d72de3304781194fa392a7c6e1d8 Author: Mark Vitale Date: Wed Mar 6 11:27:58 2019 -0500 dir: dtest should flush on error when creating directories The dtest -f subcommand (CRTest()) exits immediately if there is an error while adding files. This may create an empty, incomplete, or corrupt directory object on disk because we neglected to call DFlush before exiting. Always call DFlush from CRTest() whether it fails or succeeds. Change-Id: Ia7b4ad00ea6f4f9f788cd75ae726bdadb60ee9c3 Reviewed-on: https://gerrit.openafs.org/13796 Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit 2a1a65faab0a13083a749a63dcf3ee0879823188 Author: Mark Vitale Date: Tue Mar 5 23:20:10 2019 -0500 dir: correct fid type for dtest The dtest utility has had its fid[] arrays defined as 'long' since the initial IBM import. Commit 0a98548832472152304410e41306adcc5b91f6a2 'dir: Make test utility build again' converted some - but not all - of the fid arrays to afs_int32.
Allow dtest to operate correctly by converting the rest of the fid arrays to afs_int32. Change-Id: I2ebe36272e02cf860577153ab94f3591e1d707e8 Reviewed-on: https://gerrit.openafs.org/13795 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Tested-by: BuildBot commit 192a2ff49af5dbbb4f8175eec7cb63bfe97e444e Author: Mark Vitale Date: Tue Mar 5 23:11:38 2019 -0500 dir: make dtest buildable again Commit 7fe4125fe3435092b75ed29b884d8d3c2d1a2cad 'dir/vol: Die() really does' overlooked src/dir/test/dtest.c, breaking its build. Fix the signature of Die() and the makefile so dtest can be built. In addition, change the Makefile so it is always built. Change-Id: I18129acbfdaa770987c7f0b8055ff593f776e518 Reviewed-on: https://gerrit.openafs.org/13794 Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 836b3da39a32116e80f21bbb274795936e27e21c Author: Mark Vitale Date: Fri Oct 4 14:52:21 2019 -0400 dir: remove unused test files Makefile rules for physio.c and test-salvage.c have been commented out since the original IBM code import, and were removed in commit 37b4195d603630498664fa0975ea5d5c82f9aa4f 'dtest-20021111' to fix dtest. However, that commit neglected to remove the source files and other references to them in Makefile.in Finish the job by removing the files and references to them. No functional change is incurred by this commit. Change-Id: I57527be99cd28a481a86b659d1eb3227af9f1c99 Reviewed-on: https://gerrit.openafs.org/14052 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2928dbd78f56ef25684618426df9905cf59c384b Author: Mark Vitale Date: Sun Mar 3 22:06:28 2019 -0500 vol: de-orbit test programs The updateDirInode and listVicepx utilities are obsolete; they no longer build, are severely bitrotted, and have been largely replaced by volscan. 
While here, also remove other objects that have not been built by default since before the original IBM import: - ILIST ilist.exe - NAMEI_PROGS nicreate, nincdec, nino, nilist Remove all of them from the tree. Change-Id: I8f68ec425cce5e84bcc5f41d598eec23102109de Reviewed-on: https://gerrit.openafs.org/13793 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0a7d0c30a940dbbafe3f97fa222750d95870df93 Author: Benjamin Kaduk Date: Fri Sep 18 08:56:44 2020 -0700 Make OpenAFS 1.9.0 Update version strings for the first 1.9.x development release. Change-Id: I0d0e204ffe8d64d7c0f794f313c0f24ccea12783 Reviewed-on: https://gerrit.openafs.org/14362 Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 26a3f43a18508aa6fe63ad267f3127555f123ab9 Author: Benjamin Kaduk Date: Fri Sep 4 08:56:36 2020 -0700 Import NEWS from OpenAFS 1.8.6 Stay up to date with the stable branch at least until the initial version of the new release series. Change-Id: Iefcd9cc039399cd4cbbcc0474c2cabffa7780305 Reviewed-on: https://gerrit.openafs.org/14344 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 67a4279b65cc5082e23e72964b3974e17eeb77a9 Author: Benjamin Kaduk Date: Fri Sep 4 08:55:19 2020 -0700 Update 1.9.0 NEWS for recent changes Add some entries for the commits that landed since the previous update. Change-Id: I74820ee5a07c3fb539f233b2bd0c30aab262ba74 Reviewed-on: https://gerrit.openafs.org/14343 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d2e755e33a266df17169a1fc05db1e540b5e76af Author: Mark Vitale Date: Tue May 12 12:59:31 2020 -0400 DARWIN: disable kextutil check for versions requiring notarization Our kextutil signing check will fail for releases that require notarization (Mojave 10.14.5 and up, Catalina 10.15 all versions), because we aren't notarized yet at the time of the check. Instead, disable the check for those releases. 
Change-Id: Iec1b74d18ae02cdd031ed3194ffb9900aa8a1b55 Reviewed-on: https://gerrit.openafs.org/14222 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4d6c255c816c0a4f765048792dea34671fff6e87 Author: Thomas L. Kula Date: Thu May 14 14:08:40 2009 -0400 dumpscan: Don't call cb_dirent twice This fixes a bug where p->cb_dirent is called twice, if it exists. Change-Id: I7a7a6abf522b62eb310d003a61b3bbcdcda9e850 Reviewed-on: https://gerrit.openafs.org/14308 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 85893ac3df0c2cb48776cf1203ec200507b6ce7d Author: Marcio Barbosa Date: Mon Aug 31 19:56:56 2020 +0000 Revert "vos: take RO volume offline during convertROtoRW" This reverts commit 32d35db64061e4102281c235cf693341f9de9271. While that commit did fix the mentioned problem, depending on "vos" to set the volume to be converted as "out of service" is not ideal. Instead, this volume should be set as offline by the SAFSVolConvertROtoRWvolume RPC, executed on the volume server. The proper fix for this problem will be introduced by another commit. Change-Id: I0ce5ba793fe3c07e535225191b74eeb402ab5bfd Reviewed-on: https://gerrit.openafs.org/14339 Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8b68f1a4e1e3ae06de0d6c5a8af60ef99cacb83a Author: Michael Meffie Date: Mon Aug 24 13:12:13 2020 -0400 build: Add rpm target Add a top-level makefile target to build RPMs for Red Hat distributions from the currently checked out commit. The resulting rpms are placed in the packages/rpmbuild/RPMS/ directory. The rpm target is intended to be a convenience for testing changes to the rpm packaging or generating packages for local testing. 
Change-Id: Id951eb2b03629be59f6258e89e8356fe1fde1ff5 Reviewed-on: https://gerrit.openafs.org/14114 Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 7cc6b97ad26089ecb88019468f3ef7c0222cebe1 Author: Michael Meffie Date: Fri May 1 14:05:24 2020 -0400 makesrpm: Support custom version strings The makesrpm.pl script generates a source RPM by creating a temporary rpmbuild workspace, populating the SOURCES and SPECS directories in that workspace, running rpmbuild to build the source RPM, and finally copying the resulting source RPM out of the temporary workspace. The name of the source RPM file created by rpmbuild depends on the package version and release strings. Unfortunately, the format of the source RPM file name changed around OpenAFS 1.6.0, so makesrpm.pl has special logic to find the version string and extra code depending on the detected OpenAFS version. Instead of trying to predict the name of the resulting source RPM file from the OpenAFS version string, and having different logic for old versions of OpenAFS, use a filename glob to find the resulting source RPM file name in the temporary rpmbuild workspace. Remove the major, minor, and patch level variables, which were only used to guess the name of the resulting source RPM file name. Convert '-' characters to '_' in the package version and package release, since the '-' character is reserved by rpm as a field separator. While here, add the --dir option to specify the path of the generated source RPM, and change the 'srpm' makefile target to use the new --dir option, instead of changing the current directory before running makesrpm.pl. Also, add a dependency on the 'dist' makefile target, since the source and document tarballs are required to build the source RPM. Add pod documentation and add the --help (-h) option to print a brief help message, and add the --man option to print the full man page.
With this change, we can build a source RPM even when the .version file in the src.tar.bz file has a custom format or was created from a checkout of the master branch or other non-release reference. Change-Id: I7320afe6ac1f77d4dd38fcc194d41678fde5c950 Reviewed-on: https://gerrit.openafs.org/14116 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 4f78b3fdf1b6df9a5da85fc8bcfae28857081799 Author: Stephan Wiesand Date: Tue Aug 25 23:34:39 2020 +0200 Correct our contributor's code of conduct There are no races. Racism does exist though. Change-Id: I0a4cde55a5f470649eb99c5d7f30c9cec86d9baa Reviewed-on: https://gerrit.openafs.org/14320 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit c4f853aa00f1650b678cbd22ad1e2a9cf01c1303 Author: Andrew Deason Date: Wed Aug 26 15:41:00 2020 -0500 UKERNEL: Build linktest with COMMON_CFLAGS Currently, 'linktest' in libuafs is built with a weird custom rule that specifies several various CFLAGS and LDFLAGS, etc. One side-effect of this is that linktest is built without specifying -O, even if optimization is otherwise enabled. Normally nobody would care about the optimization of linktest, since it's never supposed to be run, but this can cause an error when building with -D_FORTIFY_SOURCE=1 on some systems (such as RHEL7): In file included from /usr/include/sys/types.h:25:0, from /.../src/config/afsconfig.h:1485, from /.../src/libuafs/linktest.c:15: /usr/include/features.h:330:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp] # warning _FORTIFY_SOURCE requires compiling with optimization (-O) ^ cc1: all warnings being treated as errors make[3]: *** [linktest] Error 1 For now, to fix this just include $(COMMON_CFLAGS) in the flags we give for linktest, so $(OPTMZ) also gets pulled in, and building linktest gets a little closer to a normal compilation step. 
Change-Id: I3362dcfe8407825ab88854ae59da4188ed16be9d Reviewed-on: https://gerrit.openafs.org/14324 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 696f2ec67b049639abf04905255a7d6173dbb19e Author: Jan Iven Date: Tue Sep 1 14:51:25 2020 +0200 ptserver: Remove duplicate ubik_SetLock in listSuperGroups It looks like a call to ubik_SetLock(.. LOCKREAD) was left in place in listSuperGroups after locking was moved to ReadPreamble in commit a6d64d70 (ptserver: Refactor per-call ubik initialisation) When compiled with 'supergroups', and once contacted by "pts mem -expandgroups ..", ptserver will therefore abort() with Ubik: Internal Error: attempted to take lock twice This patch removes the superfluous ubik_SetLock. FIXES 135147 Change-Id: I8779710a6d68e4126fc482123b576690d86e4225 Reviewed-on: https://gerrit.openafs.org/14338 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 16bae98ec525fa013514fb46398df682d7637ae0 Author: Cheyenne Wills Date: Mon Aug 24 11:10:30 2020 -0600 INSTALL: document the minimum Linux kernel level The change associated with gerrit #14300 removed support for older Linux kernels (2.6.10 and earlier). The commit 'Import of code from autoconf-archive' (d8205bbb4) introduced a check for Autoconf 2.64. Autoconf 2.64 was released in 2009. The commit 'regen.sh: Use libtoolize -i, and .gitignore generated build-tools' (a7cc505d3) introduced a dependency on libtool's '-i' option. Libtool supported the '-i' option with libtool 1.9b in 2004. Update the INSTALL instructions to document a minimum Linux kernel level and the minimum levels for autoconf and libtool. Notes: RHEL4 (EOL in 2017) had a 2.6.9 kernel and RHEL5 has a 2.6.18 kernel. RHEL5 has libtool 1.5.22 and autoconf 2.59, RHEL6 has libtool 2.2.6 and autoconf 2.63, and RHEL7 has libtool 2.4.2 and autoconf 2.69. 
Change-Id: I235eeffa4adb152e05aab7aca839700816e62c83 Reviewed-on: https://gerrit.openafs.org/14305 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b968875a342ba8f11378e76560b46701f21391e8 Author: Yadavendra Yadav Date: Fri Aug 21 01:54:00 2020 +0530 afs: Avoid NatPing event on all connection Inside release_conns_user_server, the connection vector is traversed, and after destroying a connection, a new eligible connection is found on which the NatPing event will be set. Ideally there should be only one connection on which NatPing is set, but in the current code, while traversing all connections of a server, a NatPing event is set on all connections to that server. In cases where we have a large number of connections to a server, this can lead to a huge number of “RX_PACKET_TYPE_VERSION” packets being sent to that server. Since this happens during garbage collection of user structs, the following steps were used to simulate the issue: - Had one script which “cd”s to a volume mount and then sleeps for a long time. - Ran an infinite while loop in which the above script was called using PAG-based tokens (as a new connection will be created for each PAG). - Instrumented the code so that we hit the above code segment where the NatPing event is set; mainly, reduced NOTOKTIMEOUT to 60 sec. To fix this issue, set NatPing on one connection and, once it is set, break from the “for” loop traversing the server connections. Change-Id: Ia38cec0403fde76cdd59aa664bd261481e2edee6 Reviewed-on: https://gerrit.openafs.org/14312 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason commit 291bad659e26c21332abd2954ee8d49fccad90da Author: Mark Vitale Date: Mon Apr 20 14:51:08 2020 -0400 vos: avoid 'half-locked' volume after interrupted 'vos rename' Reported symptoms: If a 'vos rename' is interrupted after it has locked the volume and replaced the VLDB entry, but before it has unlocked the volume, the volume will remain locked.
However, the locked volume will NOT be listed as locked in any vos commands that display locked status (see below for details). Background: Most vos write operations lock the VLDB volume entry before proceeding, then release the volume lock when finished. This is accomplished via VL_SetLock and VL_ReleaseLock, respectively. VL_SetLock always sets these members in the VLDB volume entry: - flags is modified to set the required VLOP_* code bit as specified - LockAFSid is set to 0 (never implemented) - LockTimestamp is set to the current time VL_ReleaseLock always sets them as follows: - flags is cleared of any VLOP_* code bit - LockAFSid is set to 0 (never implemented) - LockTimestamp is set to 0 VL_ReplaceEntry(N) may also optionally clear each of these members: - flags operation bits may be explicitly cleared via LOCKREL_OPCODE - LockAFSid may be explicitly cleared via LOCKREL_AFSID - LockTimestamp may be explicitly cleared via LOCKREL_TIMESTAMP When all 3 options are specified, VL_ReplaceEntry also does the functional equivalent of a VL_ReleaseLock. Most vos operations use this method. However, when no lock release options are specified on VL_ReplaceEntry(N), the VLDB entry is simply replaced with the supplied entry. This includes whatever flags values are specified in the supplied entry; therefore, this amounts to an additional, implicit way to set or modify the flags. 
Root cause: 'vos rename' (UV_RenameVolume) is the only vos operation that does all of the following things: - accepts a replacement volume entry that was obtained before VL_SetLock (and thus does NOT have any lock flags set) - issues VL_SetLock (which sets the lock flag in the VLDB) - issues VL_ReplaceEntry(N) with the original unlocked entry, and with no lock release options (thus with explicit intent to leave the lock flag unchanged, but inadvertently doing an implicit clear of the lock flag in the VLDB) - (performs some additional volserver work) - issues VL_ReleaseLock to release the volume lock Therefore, if 'vos rename' is cancelled or killed before reaching the final VL_ReleaseLock step, the VLDB entry is left with the lock flags cleared but the LockTimestamp still set. As we will see below, this 'half-locked' state produces confusing results from other vos commands. Detection of locked state: The 'vos lock' command (and all other vos commands that issue VL_SetLock) use the lock timestamp to determine if a volume is locked. However, several other vos commands ('vos listvldb ', 'vos examine ', 'vos listvldb -locked') use the VLDB entry's lock flags (not the lock timestamp) to determine if the volume is locked. Therefore, if the lock flags have been cleared but the lock timestamp is still set, these commands fail to detect that the volume is still locked. Yet an administrator's 'vos lock ' will still fail with: Could not lock VLDB entry for volume VLDB: vldb entry is already locked This is the external manifestation of the 'half-locked' state. Workaround and fix: This scenario has a simple workaround: 'vos unlock '. However, to avoid this confusing outcome in the first place, modify the 'vos rename' logic so that the lock flags are no longer inadvertently cleared. Now, if the 'vos rename' is interrupted before the volume is unlocked, it will still appear locked in normal vos command output. 
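The disagreement between the two detection methods can be modelled with a small sketch. It is a simplified illustration, not vlserver or vos code: `struct sketch_vldb_entry`, `SKETCH_LOCK_OP_MASK`, and the three helper functions are hypothetical, and the mask value is chosen only for the example.

```c
#include <stdio.h>

/* Illustrative mask standing in for the VLOP_* operation bits. */
#define SKETCH_LOCK_OP_MASK 0xf0u

/* Hypothetical, reduced view of the lock-related VLDB entry members. */
struct sketch_vldb_entry {
    unsigned int flags;          /* includes the lock operation bits */
    unsigned int lock_timestamp; /* 0 when unlocked */
};

/* Detection as done when trying to take the lock: timestamp based. */
static int locked_by_timestamp(const struct sketch_vldb_entry *e)
{
    return e->lock_timestamp != 0;
}

/* Detection as done by the listing commands: flag based. */
static int locked_by_flags(const struct sketch_vldb_entry *e)
{
    return (e->flags & SKETCH_LOCK_OP_MASK) != 0;
}

/* 'Half-locked': timestamp says locked, flags say unlocked. */
static int half_locked(const struct sketch_vldb_entry *e)
{
    return locked_by_timestamp(e) && !locked_by_flags(e);
}
```

In this model, replacing the entry with a copy whose flags were read before the lock was taken clears the operation bits while leaving the timestamp set, which is exactly the state `half_locked` reports.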
Change-Id: I6cc16d20c4487de4e9a866c6f0c89d950efd2f7d Reviewed-on: https://gerrit.openafs.org/14157 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 21cd26cb0d0a37d9412c0285a3c73c693222fd8a Author: Mark Vitale Date: Tue Aug 25 12:37:09 2020 -0400 rxgen: remove dead code hndle_param_tail Since the original IBM code import, hndle_param_tail has been dead code. It was later ifdef'd out in commit 8f2df21ffe59 'pull-prototypes-to-head-20020821' Remove the dead code from the tree. No functional change is incurred by this commit. Change-Id: I29128eecc93a5871f5bb9369c3983baf5b537beb Reviewed-on: https://gerrit.openafs.org/14322 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d5f0e16ac44475be55a7cc3e2895fc4a3a923ece Author: Marcio Barbosa Date: Tue Aug 18 13:56:26 2020 +0000 bos: suppress unnecessary warn if -noauth Commit d008089a7 (Add interface to select client security objects) consolidated the code that selects the client security objects into a set of new interfaces. Before this commit, the "bos: running unauthenticated" message, which warns the user when an unauthenticated connection is established, used to be suppressed if the -noauth flag was specified. Similarly to commit b3c16324e (ubik: Make ugen_ClientInit honor noAuthFlag), recover the original behavior avoiding warn messages about unauthenticated connections if the -noauth flag is provided. Change-Id: Iaf0ac6bd91ea160256823512f060afc94b5926bf Reviewed-on: https://gerrit.openafs.org/14306 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 904f5bd398db248c11b30ef7e360ce5141dcd1f3 Author: Michael Meffie Date: Thu Apr 16 16:29:09 2020 -0400 vlserver: fix missing read-only entries from ListAttributesN2 The ListAttributesN2() RPC can fail to list read-only entries under certain circumstances. 
This RPC is used by the `vos listvldb` command to retrieve vldb entries (unless the -name option is given). The `vos listvldb` command fails to list volume entries when run with the '-server' option for volumes that have read-only replicas, but have not been released. Consider the following example volume: $ vos create fs1.example.com a test $ vos addsite fs1.example.com a test $ vos addsite fs2.example.com a test $ vos listvldb ... test RWrite: 536870921 number of sites -> 3 server fs1.example.com partition /vicepa RW Site server fs1.example.com partition /vicepa RO Site -- Not released server fs2.example.com partition /vicepa RO Site -- Not released `vos listvldb` fails to find the volume when the search is limited to server 'fs2': $ vos listvldb -server fs2.example.com VLDB entries for server fs2.example.com Total entries: 0 Instead of the expected results: $ vos listvldb -server fs2.example.com test RWrite: 536870921 number of sites -> 3 server fs1.example.com partition /vicepa RW Site server fs1.example.com partition /vicepa RO Site -- Not released server fs2.example.com partition /vicepa RO Site -- Not released This situation makes it difficult to remove old server addresses from the vldb. In this situation, the 'vos remaddrs' and 'vos changeaddr -remove' commands will complain that the server addresses are still in use by volume entries; however, running 'vos listvldb -server' will not show which volume entries are using them. The entries are not listed for unreleased volumes because the ListAttributesN2() RPC is currently checking the volume VLF_ROEXISTS flag, instead of the server site flags (serverFlags) to determine when the entry is a read-only site. The volume VLF_ROEXISTS flag is set when a volume is released. To fix this, make ListAttributesN2 check for the VLSF_ROVOL site flag, instead of the VLF_ROEXISTS entry flag.
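The difference between the entry-level check and the per-site check can be sketched as follows. This is a simplified model, not the vlserver code: the struct layout, the ex_* names, and the numeric flag values are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Illustrative values only; consult the vlserver headers for the real ones. */
#define EX_VLF_ROEXISTS 0x2000u /* entry flag: set once the volume is released */
#define EX_VLSF_ROVOL   0x02u   /* site flag: this site is a read-only site */
#define EX_VLSF_RWVOL   0x04u   /* site flag: this site is the read-write site */
#define EX_MAX_SITES    4

struct ex_entry {
    uint32_t flags;                    /* entry-level flags */
    int nsites;                        /* number of sites in use */
    int site_server[EX_MAX_SITES];     /* hypothetical server id per site */
    uint32_t site_flags[EX_MAX_SITES]; /* per-site flags (serverFlags) */
};

/* Does the entry have a read-only site on the given server? */
static int ex_has_ro_site(const struct ex_entry *e, int server)
{
    int i;
    for (i = 0; i < e->nsites; i++) {
        if (e->site_server[i] == server && (e->site_flags[i] & EX_VLSF_ROVOL))
            return 1;
    }
    return 0;
}

/* Old behavior (modelled): read-only sites count only after a release. */
static int ex_match_old(const struct ex_entry *e, int server)
{
    return (e->flags & EX_VLF_ROEXISTS) && ex_has_ro_site(e, server);
}

/* New behavior (modelled): the site flag alone decides. */
static int ex_match_new(const struct ex_entry *e, int server)
{
    return ex_has_ro_site(e, server);
}
```

For an unreleased volume whose only site on a server is a read-only site, the old check misses the entry and the new check finds it, which mirrors the 'fs2' example above.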
Change-Id: Ib636fbe016d1d2f5b117624d9930dba83ebcef8a Reviewed-on: https://gerrit.openafs.org/14154 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 13a49aaf0d5c43bce08135edaabb65587e1a8031 Author: Cheyenne Wills Date: Mon Aug 17 08:20:11 2020 -0600 LINUX 5.9: Remove HAVE_UNLOCKED_IOCTL/COMPAT_IOCTL Linux-5.9-rc1 commit 'fs: remove the HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL defines' (4e24566a) removed the two referenced macros from the kernel. Support for unlocked_ioctl and compat_ioctl was introduced in Linux 2.6.11. Remove references to HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL using the assumption that they were always defined. Notes: With this change, building against kernels 2.6.10 and older will fail. RHEL4 (EOL in March 2017) used a 2.6.9 kernel. RHEL5 uses a 2.6.18 kernel. In linux-2.6.33-rc1 the commit messages for "staging: comedi: Remove check for HAVE_UNLOCKED_IOCTL" (00a1855c) and "Staging: comedi: remove check for HAVE_COMPAT_IOCTL" (5d7ae225) both state that all new kernels have support for unlocked_ioctl/compat_ioctl so the checks can be removed along with removing support for older kernels. Change-Id: Idd2716f3573ea455f8a5e1535bca584af0787717 Reviewed-on: https://gerrit.openafs.org/14300 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit f5051b87a56b3a4f7fd7188cbd16a663eee8abbf Author: Michael Meffie Date: Fri May 15 12:01:44 2020 -0400 vos: avoid CreateVolume when restoring over an existing volume Currently, the UV_RestoreVolume2 function always attempts to create a new volume, even when doing an incremental restore over an existing volume. When the volume already exists, the volume creation operation fails on the volume server with a VVOLEXISTS error. The client will then attempt to obtain a transaction on the existing volume. If a transaction is obtained, the incremental restore operation will proceed.
If a full restore is being done, the existing volume is removed and a new empty volume is created. Unfortunately, the failed volume creation is logged by the volume server, and so litters the log file with: Volser: CreateVolume: Unable to create the volume; aborted, error code 104 To avoid polluting the volume server log with these messages, reverse the logic in UV_RestoreVolume2. Assume the volume already exists and try to get the transaction first when doing an incremental restore. Create a new volume if the transaction cannot be obtained because the volume is not present. When doing a full restore, remove the existing volume, if one exists, and then create a new empty volume. Change-Id: I8bdc13130d12c81cd2cd18a9484852708cac64d7 Reviewed-on: https://gerrit.openafs.org/14208 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Tested-by: Marcio Brito Barbosa Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 624219a1b2192e5c7b6b45e2cbe784a9c5f33a96 Author: Michael Meffie Date: Tue Aug 4 10:34:07 2020 -0400 tests: Accommodate c-tap-harness 4.7 The SOURCE and BUILD environment variables have been changed to C_TAP_SOURCE and C_TAP_BUILD in the new version of c-tap-harness. The runtests command syntax has changed as well. Convert all of the old SOURCE and BUILD environment variables to the new C_TAP_SOURCE and C_TAP_BUILD names. Add the required -l command line option to specify the test list. Add the new runtests -v option to run the tests in verbose mode to make it easier to see which tests failed.
Change-Id: I209a6dc13d6cd1507519234fce1564fc4641e70b Reviewed-on: https://gerrit.openafs.org/14295 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 3f377aa117273eba5c77ad652c0b086446b3f874 Author: Russ Allbery Date: Mon Aug 3 20:59:25 2020 -0400 Import of code from c-tap-harness This commit updates the code imported from c-tap-harness to abdb66561ffd4d2f238fdb06f448ccf09d80c059 (release/4.7) Upstream changes are: Daniel Collins (1): Add is_blob() test function. Daniel Kahn Gillmor (1): LICENSE: use https for all URLs Daria Brashear (1): Add verbose mode environment variable to runtests Julien ÉLIE (2): Document -v in usage and comments of runtests Avoid realloc of zero length in tests/runtests.c Marc Dionne (1): Add test_cleanup_register_with_data Russ Allbery (115): clang --analyze cleanups for runtests Modernize POD tests Update README to my current layout Explicitly note that test programs must be executable Fix comment typo in tests/runtests.c Switch to a copyright-format 1.0 LICENSE file Flush harness output after each line Show the test count as ? 
when the plan is deferred More correctly backspace over test counts when aborting Refactor test list handling Allow passing tests on the runtests command line Don't allow command-line arguments if a list was given Search for tests under the name given as well Release 2.0 Fix backward incompatibility when searching for tests Document decision to ignore TAP version directives Release 2.1 Document different runtests behavior in bail handling Change exit status of bail to 255 Release 2.2 Add a new test_cleanup_register C API Add warn_unused_result attributes Add portability for warn_unsed_result attributes to tap/macros.h Minor coding style fix (spacing) in runtests.c Split the runtests usage string for ISO C90 string limits Include stddef.h Diagnose failure to register the exit handler Use diag internally in the basic C TAP library Some additional comments about cleanup functions Move repetitive printing code in the C TAP library to a macro Set a flag when bailing for more correct cleanup Change my email address to eagle@eyrie.org Release 2.3 Add diag_file_add and diag_file_remove functions Don't die for unknown files passed to diag_file_remove Release 2.4 Update comment about AIX and WCOREDUMP Don't test for NULL before calling free Be more careful about file descriptors in child processes Run cleanup functions in non-primary processes as well Release 3.0 Update collective package copyright notices at start of LICENSE Check integer overflows on memory allocation, fix string creation Switch POD spelling test to use Lancaster consensus variable Add new bnrealloc API for brealloc with checked multiplication Rename nrealloc to reallocarray Return the test status from test functions Fix the overflow check for breallocarray Fix the overflow check for xreallocarray in runtests Restructure test result reallocation in runtests Change diag and sysdiag to always return true Release 3.1 Fix typos in basic.c and basic.h Fix usage message when running runtests with no arguments 
Update introductory runtests comments for current syntax Add the -l flag to suggested runtests invocation in README Support comments and blank lines in test lists Release 3.2 Update licensing information Various improvements to verbose support Compile warning-free with Clang, check Autoconf macros Release 3.3 Remove unnecessary assert.h include in tap/basic.c Fix some additional -v documentation issues Rebalance usage to avoid too-long strings Fix segfault in runtests with empty test list Release 3.4 Document running autogen if starting from Git Rename autogen to bootstrap Support and prefer C_TAP_SOURCE and C_TAP_BUILD Fix comment typo in tests/runtests.c Add missing va_end to is_double Release 4.0 Fix all non-https www.eyrie.org URLs Add is_bool C test function Add DocKnot metadata and a Markdown README file Update documentation for new DocKnot standards Release 4.1 Use more defaults from DocKnot templates Fix new fall-through warning in GCC 7 Use compiler warnings from rra-c-util, fix issues Merge pull request #4 from solemnwarning/master Coding style fixes and NEWS for is_blob Re-enable -Wunknown-pragmas for GCC Avoid zero-length realloc allocations in breallocarray Update copyright date on tests/runtests.c Release 4.2 Add SPDX-License-Identifier headers to source files Add and run new check-cppcheck target Fix instructions for running one test Identify values as left and right Fix is_string comparisons with NULL pointers Add support for running tests under valgrind Replace putc with fprintf Update shared files from rra-c-util Release 4.3 Update NEWS date for 4.3 release Collapse some copyright dates NEWS and coding style for test_cleanup_register_with_data Remove unused variables caught by Clang scan-build Update to rra-c-util 8.0 Fix error checking in bstrndup Release 4.4 Add support for C++ Document that C TAP Harness can be built as C++ Release 4.5 Regenerate README files Reformat using clang-format 10 Update to rra-c-util 8.1 Release 4.6 Fix spelling 
errors caught by codespell Protect the test suite against C_TAP_VERBOSE Switch to GitHub Actions for CI Add NEWS entry for GCC 10 warning fixes Release 4.7 Change-Id: I5a78215bf99b53bd848f0fa6bb9092deab38f24e Reviewed-on: https://gerrit.openafs.org/14294 Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit eccd4b9778014c36a4b3af6d9e80194066bd2195 Author: Andrew Deason Date: Tue Jun 2 13:37:00 2020 -0500 afs: Always define our own osi_timeval32_t Since OpenAFS 1.0, osi_GetTime has taken a timeval-like pointer, which contains 32-bit fields (the actual type has been called either osi_timeval_t or osi_timeval32_t over time). For platforms that have a native timeval-like type with 32-bit fields, we just define osi_timeval32_t to that type, and elsewhere we define our own struct to be osi_timeval32_t. For platforms that use the native timeval, we can then define osi_GetTime() to just be, e.g., microtime(). This approach is difficult to maintain, though, because we must keep track of whether 'struct timeval' contains 32-bit fields on each platform, which can depend on many factors. It's easy to make mistakes (the current tree already contains mistakes), and there's not much benefit. To avoid all of this, just always define osi_timeval32_t to be our own struct with afs_int32 fields, and provide definitions for osi_GetTime that convert from the native time struct to our osi_timeval32_t. This does mean that for some platforms we do an unnecessary type conversion, but this is a small price to pay for more straightforward and maintainable code. To be a little more sure that our types are correct, change osi_GetTime to be defined as an inline function instead of a macro. At the same time, do a similar conversion for the KERNEL implementation of the rx clock_GetTime function. Get rid of platform-specific mess, and do a straightforward type conversion between osi_timeval32_t and struct clock in an inline function. 
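The approach described above, a locally defined struct with 32-bit fields plus an inline conversion from the native time type, might look roughly like the following userspace sketch. The ex_* names are hypothetical, and gettimeofday() stands in for whatever native time source a given platform would use; this is not the OpenAFS definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Our own timeval-like type with fixed 32-bit fields on every platform. */
typedef struct {
    int32_t tv_sec;
    int32_t tv_usec;
} ex_timeval32_t;

/*
 * Explicit conversion from the native struct. The casts document that
 * tv_sec may be wider than 32 bits natively and is narrowed here.
 */
static inline void ex_from_native(const struct timeval *native,
                                  ex_timeval32_t *out)
{
    out->tv_sec = (int32_t)native->tv_sec;
    out->tv_usec = (int32_t)native->tv_usec;
}

/* An inline function rather than a macro, so argument types are checked. */
static inline void ex_GetTime(ex_timeval32_t *out)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    ex_from_native(&now, out);
}
```

Because ex_GetTime is a function, passing a pointer to the wrong struct type produces a compiler diagnostic, which a macro aliasing a native call would not reliably give.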
Change-Id: I18819acb556a2a7f1b6da6994db9783c48108934 Reviewed-on: https://gerrit.openafs.org/14238 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a5c3dfe99fa1831e3b416e89f52a03fd1cf9f73d Author: Andrew Deason Date: Tue Jun 2 13:12:14 2020 -0500 afs: Move osi_GetTime out of param.h Most platforms currently #define osi_GetTime in their param.h. This is really redundant, since the definition of osi_GetTime almost never changes for a given platform, so we end up with many copies of the same osi_GetTime definition for a given platform. Move osi_GetTime out of param.h for these platforms, and define it in osi_machdep.h instead, which is where most platform-specific definitions go. For DFBSD, we don't have an osi_machdep.h at all yet, so create a new one to contain the osi_GetTime definition. Currently we don't build libafs at all on DFBSD, but do this anyway so we don't lose the existing osi_GetTime definition. For NBSD, we were providing (conflicting!) definitions for osi_GetTime in param.h and in osi_machdep.h. Just remove the definitions in param.h, since those should have been getting overridden by the osi_machdep.h definition. Change-Id: I7097d9fe2fcd38c06ecc275e8fe3a2c69c9d0436 Reviewed-on: https://gerrit.openafs.org/14237 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit c56873bf95f6325b70e63ed56ce59a3c6b2b753b Author: Cheyenne Wills Date: Mon Jul 27 12:31:35 2020 -0600 afs: Avoid using logical OR when setting f_fsid Building with clang-10 produces the warning/error message warning: converting the result of '<<' to a boolean always evaluates to true [-Wtautological-constant-compare] for the expression abp->f_fsid = (AFS_VFSMAGIC << 16) || AFS_VFSFSID; The message is because a logical OR '||' is used instead of a bitwise OR '|'. The result of this expression will always set the f_fsid member to a 1 and not the intended value of AFS_VFSMAGIC combined with AFS_VFSFSID. 
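The effect of the two operators can be compared in a small sketch. The function names and the magic and fsid values used with them are placeholders, not the real AFS_VFSMAGIC and AFS_VFSFSID values.

```c
#include <stdint.h>

/* Logical OR: the result is only ever 0 or 1, regardless of the operands. */
static uint32_t ex_fsid_logical(uint32_t magic, uint32_t fsid)
{
    return (magic << 16) || fsid;
}

/* Bitwise OR: combines the shifted magic with the fsid bits. */
static uint32_t ex_fsid_bitwise(uint32_t magic, uint32_t fsid)
{
    return (magic << 16) | fsid;
}
```

With a placeholder magic of 0x1234 and an fsid of 99, the logical form yields 1 while the bitwise form yields 0x12340063.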
Update the expression to use a bitwise OR instead of the logical OR. Note: This will change the value stored in the f_fsid that is returned from statfs. Using a logical OR has existed since OpenAFS 1.0 for hpux/solaris and in UKERNEL since OpenAFS 1.5 with the commit 'UKERNEL: add uafs_statvfs' b822971a. Change-Id: I3e85ba48058ac68e3e3ac7f277623f660187926c Reviewed-on: https://gerrit.openafs.org/14292 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 446457a1240b88fd94fc34ff5715f2b7f2f3ef12 Author: Cheyenne Wills Date: Mon Jul 27 12:31:03 2020 -0600 afs: Set AFS_VFSFSID to a numerical value Currently when UKERNEL is defined, AFS_VFSFSID is always set to AFS_MOUNT_AFS, which is a string for many platforms for UKERNEL. Update src/afs/afs.h to ensure that the define for AFS_VFSFSID is a numeric value when building UKERNEL. Clean up the preprocessor indentation in src/afs/afs.h in the area around the AFS_VFSFSID defines. Thanks to adeason@sinenomine.net for pointing out a much easier solution for resolving this problem. Change-Id: I618fc4c89029a6cca2ca6f530b8f65399299a9d1 Reviewed-on: https://gerrit.openafs.org/14279 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e5f44f6e9af643cab3a66216dff901e0a4c5eda8 Author: Cheyenne Wills Date: Thu Jul 23 15:43:42 2020 -0600 clang-10: ignore fallthrough warning in generated code Clang-10 will not recognize '/* fall through */' as an indicator to turn off the fallthrough warning due to the lack of a 'break' in a case statement. Code generated by flex uses the '/* fall through */' comments to turn off compiler warnings for fallthroughs in case statements. For code generated by flex, ignore the implicit-fallthrough via pragma or disable the warning via a compile time flag. Add new env variable "CFLAGS_NOIMPLICIT_FALLTHROUGH" to selectively disable the compile check in Makefiles when checking is enabled.
Change-Id: I4c054defda03daa2aeb645ae2271dfa0cb54925f Reviewed-on: https://gerrit.openafs.org/14275 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 16f1b2f894c28614df0f096be8232b1176e87c70 Author: Cheyenne Wills Date: Mon Jul 27 08:33:03 2020 -0600 clang-10: use AFS_FALLTHROUGH for case fallthrough Clang-10 will not recognize '/* fallthrough */' as an indicator to turn off the fallthrough diagnostic due to the lack of a 'break' in a case statement. Clang-10 requires the '__attribute__((fallthrough))' statement to disable the diagnostic. In addition clang-10 is finding additional locations where fall throughs occur. Determine if the compiler supports '__attribute__((fallthrough))' to disable the implicit fallthrough diagnostic. Define a new macro 'AFS_FALLTHROUGH' that will disable the fallthrough diagnostic. Set it as a wrapper for the Linux kernel's 'fallthrough' macro if available, otherwise set it as a wrapper macro for '__attribute__((fallthrough))' if the compiler supports it. Update CODING to document the use of AFS_FALLTHROUGH when needing to fallthrough between case statements. Replace the '/* fallthrough */' comments with AFS_FALLTHROUGH, and add AFS_FALLTHROUGH as needed. Replace some fallthroughs with a break (or goto) if the flow was just to a break (or goto). e.g. case x: case x: somestmt; somestmt; break; case y: case y: break; break; Correct a mis-indented brace '}' in src/WINNT/afsd/smb3.c Note, the clang maintainers have rejected the use of comments as a flag to turn off the fall through warnings.
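A wrapper macro of the kind described above could be sketched like this. EX_FALLTHROUGH is a hypothetical name, and the detection here uses the compiler's __has_attribute at compile time rather than an autoconf test, so it should be read as an illustration and not as the AFS_FALLTHROUGH definition from the tree.

```c
/* Use the attribute when the compiler reports support for it. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define EX_FALLTHROUGH __attribute__((fallthrough))
# endif
#endif

/* Otherwise fall back to a harmless empty statement. */
#ifndef EX_FALLTHROUGH
# define EX_FALLTHROUGH do { } while (0)
#endif

static int ex_classify(int kind)
{
    int score = 0;

    switch (kind) {
    case 2:
        score += 10;
        EX_FALLTHROUGH;   /* intentional: case 2 also gets the case 1 work */
    case 1:
        score += 1;
        break;
    default:
        break;
    }
    return score;
}
```

The __has_attribute test is nested inside the defined() check because a compiler that lacks __has_attribute would otherwise fail to parse the combined #if expression.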
Change-Id: Ia5da10fc14fc1874baca035a3cf471e618e0d5f5 Reviewed-on: https://gerrit.openafs.org/14274 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit e61ab9353e99d3298815296abf6b02c50ebe3df0 Author: Michael Meffie Date: Wed Jul 1 21:50:09 2020 -0400 redhat: Add make to the dkms-openafs pre-requirements If `make` is not installed before dkms-openafs, the OpenAFS kernel module is not built during the dkms-openafs package installation. The failure happens in the "checking if linux kernel module build works" configure step, which invokes `make` to check the linux buildsystem. configure fails when `make` is not available, and gives the unhelpful suggestion (in this case) of configuring with --disable-kernel-module. Examining the configure.log in the dkms build directory shows: configure:7739: checking if linux kernel module build works make -C /lib/modules/4.18.0-193.6.3.el8_2.x86_64/build M=/var/lib/dkms/openafs/... ./configure: line 7771: make: command not found configure: failed using Makefile: Avoid this build failure by adding `make` to the list of dkms-openafs package pre-requirements. Change-Id: I98b3508341eea1df4fa7b6f43e88add1bda9ee2c Reviewed-on: https://gerrit.openafs.org/14266 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 2d01f35d05a71da3594569c66e688b4bc6b28401 Author: Andrew Deason Date: Fri May 29 12:57:50 2020 -0500 vol: Blank opts in VOptDefaults Instead of needing to set every single field in the 'opts' structure individually, blank the whole thing to make sure the entire struct is initialized. Remove the now-redundant lines that initialize various items to 0.
Change-Id: I799cdb55becd66a8f3d6ec2f81338843038d0abd Reviewed-on: https://gerrit.openafs.org/14280 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Kailas Zadbuke Reviewed-by: Yadavendra Yadav Reviewed-by: Benjamin Kaduk commit 4498bd8179e5e93a33468be3c8e7a30e569d560a Author: Andrew Deason Date: Mon Jun 22 22:54:52 2020 -0500 volser: Don't NUL-pad failed pread()s in dumps Currently, the volserver SAFSVolDump RPC and the 'voldump' utility handle short reads from pread() for vnode payloads by padding the missing data with NUL bytes. That is, if we request 4k of data for our pread() call, and we only get back 1k of data, we'll write 1k of data to the volume dump stream followed by 3k of NUL bytes, and log messages like this: 1 Volser: DumpFile: Error reading inode 1234 for vnode 5678 1 Volser: DumpFile: Null padding file: 3072 bytes at offset 40960 This can happen if we hit EOF on the underlying file sooner than expected, or if the OS just responds with fewer bytes than requested for any reason. The same code path tries to do the same NUL-padding if pread() returns an error (for example, EIO), padding the entire e.g. 4k block with NULs. However, in this case, the "padding" code often doesn't work as intended, because we compare 'n' (set to -1) with 'howMany' (set to 4k in this example), like so: if (n < howMany) Here, 'n' is signed (ssize_t), and 'howMany' is unsigned (size_t), and so compilers will promote 'n' to the unsigned type, causing this conditional to fail when n is -1. As a result, all of the relevant log messages are skipped, and the data in the dumpstream gets corrupted (we skip a block of data, and our 'howFar' offset goes back by 1). So this can result in rare silent data corruption in volume dumps, which can occur during volume releases, moves, etc. To fix all of this, remove this bizarre NUL-padding behavior in the volserver. Instead: - For actual errors from pread(), return an error, like we do for I/O errors in most other code paths. 
- For short reads, just write out the amount of data we actually read, and keep going. - For premature EOF, treat it like a pread() error, but log a slightly different message. For the 'voldump' utility, the padding behavior can make sense if a user is trying to recover volume data offline in a disaster recovery scenario. So for voldump, add a new switch (-pad-errors) to enable the padding behavior, but change the default behavior to bail out on errors. Change-Id: Ibd6e76c5ea0dea95e3354d9b34536296f81b4f67 Reviewed-on: https://gerrit.openafs.org/14255 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 37b55b30c65d0ab8c8eaabfda0dbd90829e2c46a Author: Cheyenne Wills Date: Thu Jul 16 15:52:00 2020 -0600 butc: fix int to float conversion warning Building with clang-10 results in 2 warnings/errors associated with trying to convert 0x7fffffff to a floating point value. tcmain.c:240:18: error: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Werror, -Wimplicit-int-float-conversion] if ((total > 0x7fffffff) || (total < 0)) /* Don't go over 2G */ and the same conversion warning on the statement on the following line: total = 0x7fffffff; Use floating point and decimal constants instead of the hex constants. For the test, use 2147483648.0 which is cleanly represented by a float. Change the comparison in the test from '>' to '>='. If the total value exceeds 2G, just assign the max value directly to the return variable. Change-Id: I79b2afa006496a756bd7b50976050c24827aa027 Reviewed-on: https://gerrit.openafs.org/14277 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 899b1af4183fb09fd55a36e3d10ffbdb9671a47e Author: Cheyenne Wills Date: Thu Jul 16 15:07:15 2020 -0600 autoconf: fix detection for fallthrough attribute Due to bug , ax_gcc_func_attribute.m4 fails to properly detect __attribute__((fallthrough)) in clang.
Until this is fixed in autoconf-archive upstream, fix our local copy of ax_gcc_func_attribute.m4, so we can detect __attribute__((fallthrough)) to make --enable-checking work with clang. Change-Id: I80a4557384f8e1438344e48bfe722e20c8773882 Reviewed-on: https://gerrit.openafs.org/14273 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 88da6b4dfa4ad2b53508f9e0b559392cecb69c86 Author: Cheyenne Wills Date: Thu Jul 16 15:05:13 2020 -0600 cf: Make local copy of ax_gcc_func_attribute.m4 Make a local copy of ax_gcc_func_attribute from autoconf-archive. This is needed in order to fix a bug in the detection of the fallthrough attribute. Remove ax_gcc_func_attribute.m4 from src/external/autoconf-archive/m4. Update LICENSE file to point to the local copy in src/cf. Change-Id: I6c4244d2cd4edab4262c1820435c00419d85303b Reviewed-on: https://gerrit.openafs.org/14272 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit bb5397e4c409e3c075ee73d6bf54a3b6eacc0060 Author: Mark Vitale Date: Fri Apr 20 00:57:28 2018 -0400 rx: prevent leakage of non-cached rx_connections (pthread) The rxi_connectionCache (AFS_PTHREAD_ENV only) allows applications to reuse rx_connection structs. Cached rx_connections are obtained via rx_GetCachedConnection and released via rx_ReleaseCachedConnection. This feature is used most heavily by libadmin and kauth, but there are other users in the tree as well. For instance, ubikclient routines ubik_ClientInit and ubik_ClientDestroy call rx_ReleaseCachedConnections (if AFS_PTHREAD_ENV) when disposing of their rx_connections. Unfortunately, in many cases these rx_connections were obtained via rx_NewConnection, _not_ from the cache via rx_GetCachedConnection. In those cases, rx_ReleaseCachedConnection will not find the rx_connection in the rxi_connectionCache, and thus it returns without doing anything. 
Therefore, when ubik_ClientInit is passed an existing ubik_client (for re-initialization) that contains rx_connections NOT allocated via rx_GetCachedConnection, those connections are not destroyed, but will be silently leaked. Similarly, ubik_ClientDestroy will leak its rx_connections when it frees the ubik_client struct. For example, the fileserver host package calls ubik_ClientInit (via hpr_Initialize) and ubik_ClientDestroy (via hpr_End) to manage connections to the ptserver. However, these connections were obtained via rx_NewConnection, not rx_GetCachedConnection. If the fileserver has a failed call to the ptserver that sets prfail=1, the next RPC scheduled for that client (in CallPreamble) will refresh the thread's ubik_client (viced_uclient_key) by calling hpr_End -> ubik_ClientDestroy -> rx_ReleaseCachedConnection. The "released" connections will be leaked. This problem exists in all versions of OpenAFS going back to IBM 1.0. Starting with 1.8.x, many components that were formerly LWP-only are now pthreaded and thus susceptible to this leak. It seems difficult and error-prone to identify all possible code paths that may pass a non-cached rx_connection to rx_ReleaseCachedConnection, and convert them to obtain connections via rx_GetCachedConnection. Instead, prevent all existing and future leaks by modifying the connection cache to: - flag all rx_connections it allocates - correctly release any rx_connection it is passed, whether they came from the cache or not.
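The two-part fix listed above can be illustrated with a heavily simplified model. Everything here is hypothetical (the ex_* names, the struct, the counters); the real rxi_connectionCache also handles lookup, reuse, and reference counting, all of which this sketch omits.

```c
#include <stdlib.h>

struct ex_conn {
    int from_cache; /* set only when the cache allocated this connection */
};

static int ex_live_conns;   /* allocated and not yet destroyed */
static int ex_cached_conns; /* currently tracked by the cache */

static struct ex_conn *ex_alloc(int from_cache)
{
    struct ex_conn *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->from_cache = from_cache;
    ex_live_conns++;
    return c;
}

/* Models a plain, non-cached connection. */
static struct ex_conn *ex_new_conn(void)
{
    return ex_alloc(0);
}

/* Models a cached connection: the cache flags what it allocates. */
static struct ex_conn *ex_get_cached_conn(void)
{
    struct ex_conn *c = ex_alloc(1);
    if (c != NULL)
        ex_cached_conns++;
    return c;
}

/* Release works for both kinds, so a non-cached connection is not leaked. */
static void ex_release_conn(struct ex_conn *c)
{
    if (c == NULL)
        return;
    if (c->from_cache)
        ex_cached_conns--; /* drop the cache's bookkeeping first */
    ex_live_conns--;
    free(c);
}
```

In the leaking behavior described earlier, the release path would return without doing anything for a connection it did not find in the cache; here the flag lets the release path tell the two cases apart and destroy the connection either way.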
Change-Id: Ibe164ccd30a8ddd799438c28fd6e1d8a0a9040dd Reviewed-on: https://gerrit.openafs.org/13042 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 55fca11421055d0bcee79f118ea2a035393cc6e5 Author: Mark Vitale Date: Mon Apr 30 18:34:28 2018 -0400 rx: fix out-of-range value for RX_CONN_NAT_PING Commit 496fb87372555f6acddd4fd88b03c94c85f48511 ("rx: avoid nat ping until connection is attached") introduced functionality to defer turning on NAT ping for server connections until after reachability had been established for the client. Unfortunately, this feature could never work correctly because it assigned an out-of-range flag value of 256 (0x100) for the u_char flags field. Instead of calling this out as an error, both gcc and Solaris cc elide this flag so that it is never set in rx_SetConnSecondsUntilNatPing(). Furthermore, the test in rxi_ConnClearAttachWait() will always fail; therefore rxi_ScheduleNatKeepAliveEvent is never called after attach wait has ended. Fortunately, this bug is currently moot - not actually exposed in OpenAFS. (It was discovered by inspection). This is because there are currently no rx_connection objects in the tree that have both NAT ping and checkReach (rx_SetCheckReach) enabled. I also searched git history and found no time when this bug could ever have been exposed. This does raise the question of why the original commit was needed; but instead of reverting the original commit, this commit attempts to fix it. To prevent problems if NAT ping and checkReach are ever both enabled for an rx_connection, enlarge the rx_connection flags member so that the RX_CONN_NAT_PING value is no longer out of range.
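Why a flag value of 0x100 can never be set in an 8-bit field can be shown with a short sketch. The struct and function names are hypothetical, the sketch assumes an 8-bit unsigned char, and the wider type chosen for comparison is an assumption rather than the type used in the actual fix.

```c
/* Narrow flags field, like the original u_char member. */
struct ex_conn_narrow {
    unsigned char flags;
};

/* Wider flags field, standing in for the enlarged member. */
struct ex_conn_wide {
    unsigned int flags;
};

/* Set the flag in the narrow field, then test for it. */
static int ex_narrow_set_and_test(unsigned int flag)
{
    struct ex_conn_narrow c = { 0 };
    c.flags |= flag;               /* bits above the low 8 are discarded */
    return (c.flags & flag) != 0;
}

/* Same operations on the wider field. */
static int ex_wide_set_and_test(unsigned int flag)
{
    struct ex_conn_wide c = { 0 };
    c.flags |= flag;
    return (c.flags & flag) != 0;
}
```

With a flag of 0x100, the narrow variant reports the flag as unset immediately after setting it, while the wide variant reports it as set; a flag such as 0x80 works in both.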
Change-Id: Ib667ece632f66fa5c63a76398acb3153fed6f9c3 Reviewed-on: https://gerrit.openafs.org/13041 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d231134aadcaf2bd3a91f26ba6d3d451713a6fba Author: Andrew Deason Date: Mon May 18 12:38:31 2020 -0500 auth: Avoid cellconfig.c stdio renaming Since commit 35777145 (solaris-fopen-sucks-20060916), cellconfig.c has redirected fopen, fclose, and fgets to local functions on non-64bit-sparc Solaris, in order to work around that platform's stdio limitations. Commit 7c431f7571 (auth: retire writeconfig.c) moved the contents of writeconfig.c into cellconfig.c. The previous writeconfig.c contained some calls to stdio, including calling fprintf() on a pointer returned by fopen() in that file. Because fopen() was redirected to our local version, this means that afsconf_SetExtendedCellInfo() calls fopen() to get an afsconf_iobuffer*, and passes that pointer to the real system fprintf() later on (instead of a native FILE*). The compiler does warn about this, but this only happens on Solaris, where --enable-checking is not implemented, so the build never fails. To avoid this, remove the #defines for fopen, fgets, and fclose. Instead, change all of the old cellconfig.c callers to explicitly call afsconf_fopen, afsconf_fgets, and afsconf_fclose. On the affected Solaris platforms, we keep our local definitions, and for other platforms, we just make those functions call their system stdio equivalents. For the code that was pulled in from writeconfig.c, callers will just call the system fopen, fprintf, and fclose. We still keep our local afsconf_FILE* definition on all platforms, so the compiler will still do typechecking for our local afsconf_f* functions on all platforms. So now if we make a mistake, it should be a mistake on all platforms, so platforms with --enable-checking should flag the error. 
Change-Id: I4064d7f5ee82d5acab04a33b01c0603564a391e8 Reviewed-on: https://gerrit.openafs.org/14214 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit cd65475e95e25c8e7071e099a682bdcc03d2cce1 Author: Andrew Deason Date: Fri Jul 26 15:28:44 2019 -0500 afs: Let afs_ShakeLooseVCaches run longer Currently, when afs_ShakeLooseVCaches runs osi_TryEvictVCache, we check if osi_TryEvictVCache slept (i.e. dropped afs_xvcache/GLOCK). If we sleep over 100 times, then we stop trying to evict vcaches and return. If we have recently accessed a lot of AFS files, this limitation can severely reduce our ability to keep our number of vcaches limited to a reasonable size. For example: Say a Linux client runs a process that quickly accesses 1 million files (a simple 'find' command) and then does nothing else. A few minutes later, afs_ShakeLooseVCaches is run, but since all of the newly accessed vcaches have dentries attached to them, we will sleep on each one in order to try to prune the attached dentries. This means that afs_ShakeLooseVCaches will evict 100 vcaches, and then return, leaving us with still almost 1 million vcaches. This will happen repeatedly until afs_ShakeLooseVCaches finally works its way through all of the vcaches (which takes quite a while, if we only clear 100 at once), or the dentries get pruned by other means (such as, if Linux evicts them due to memory pressure). The limit of 100 sleeps was originally added in commit 29277d96 (newvcache-dont-spin-20060128), but the current effect of it was largely introduced in commit 9be76c0d (Refactor afs_NewVCache). It exists to ensure that afs_ShakeLooseVCaches doesn't take forever to run, but the limit of 100 sleeps may seem quite low, especially if those 100 sleeps run very quickly. 
To avoid the situation described above, instead of limiting afs_ShakeLooseVCaches based on a fixed number of sleeps, limit it based on how long we've been running, and set an arbitrary limit of roughly 3 seconds. Only check how long we've been running after 100 sleeps like before, so we're not constantly checking the time while running. Log a new warning if we exit afs_ShakeLooseVCaches prematurely if we've been running for too long, to help indicate what is going on. Change-Id: I65729ace748e8507cc0d5c26dec39e74d7bff5d2 Reviewed-on: https://gerrit.openafs.org/14254 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9ff45e73cf3d91d12f09e108e1267e37ae842c87 Author: Andrew Deason Date: Mon Jul 16 16:53:34 2018 -0500 afs: Skip bulkstat if stat cache looks full Currently, afs_lookup() will try to prefetch dir entries for normal dirs via bulkstat whenever multiple pids are reading that dir. However, if we already have a lot of vcaches, ShakeLooseVCaches may be struggling to limit the vcaches we already have. Entering afs_DoBulkStat can make this worse, since we grab afs_xvcache repeatedly, we may kick out other vcaches, and we'll possibly create 30 new vcaches that may not even be used before they're evicted. To try to avoid this, skip running afs_DoBulkStat if it looks like the stat cache is really full. Change-Id: I1634530170a189f32cb962dd7df28f88bc758b71 Reviewed-on: https://gerrit.openafs.org/13256 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0532f917f29bdb44f4933f9c8a6c05c7fecc6bbb Author: Andrew Deason Date: Mon Jul 16 16:44:14 2018 -0500 afs: Log warning when we detect too many vcaches Currently, afs_ShakeLooseVCaches has a kind of warning that is logged when we fail to free up any vcaches. 
This information can be useful to know, since it may be a sign that users are trying to access way more files than our configured vcache limit, hindering performance as we constantly try to evict and re-create vcaches for files. However, the current warning is not clear at all to non-expert users, and it can only occur for non-dynamic vcaches (which is uncommon these days). To improve this, try to make a general determination if it looks like the stat cache is "stressed", and log a message if so after afs_ShakeLooseVCaches runs (for all platforms, regardless of dynamic vcaches). Also try to make the message a little more user-friendly, and only log it (at most) once per 4 hours. Determining whether the stat cache looks stressed or not is difficult and arguably subjective (especially for dynamic vcaches). This commit draws a few arbitrary lines in the sand to make the decision, so at least something will be logged in the cases where users are constantly accessing way more files than our configured vcache limit. Change-Id: I022478dc8abb7fdef24ccc06d477b349cca759ac Reviewed-on: https://gerrit.openafs.org/13255 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 42fb8786a8fff30ea97524f896c5aee4fa307f89 Author: Mark Vitale Date: Thu Jun 25 11:45:19 2020 -0400 viced: propagate return from CleanupTimedOutCallBacks_r The fileserver's FiveMinuteCheckLWP periodically calls CleanupTimedOutCallBacks, and logs an informational message if the return code indicates that any callbacks were discarded. However, since the original IBM code import, CleanupTimedOutCallBacks has 1) ignored the return value from CleanupTimedOutCallBacks_r and 2) unconditionally returned 0. This makes the informational message essentially dead code. Instead, check the code from CleanupTimedOutCallBacks_r and pass it back to the caller.
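The before-and-after shape of this fix can be sketched as follows. This is a simplified model, not the viced code: the lock macros, the counter, and every `sketch_*` name are hypothetical, and the assumption that the `_r` function returns the number of discarded callbacks is made only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the host lock macros. */
static int sketch_lock_depth = 0;
#define SKETCH_LOCK()   (sketch_lock_depth++)
#define SKETCH_UNLOCK() (sketch_lock_depth--)

/* Pretend number of callbacks that have timed out. */
static int sketch_expired = 0;

/* Caller must hold the lock. Returns how many callbacks were discarded
 * (an assumption made for this sketch). */
static int
sketch_cleanup_r(void)
{
    int discarded;
    assert(sketch_lock_depth > 0);
    discarded = sketch_expired;
    sketch_expired = 0;
    return discarded;
}

/* Old shape: the result of the _r call is dropped and 0 is always
 * returned, so the caller's "callbacks were discarded" log never fires. */
static int
sketch_cleanup_old(void)
{
    SKETCH_LOCK();
    sketch_cleanup_r();
    SKETCH_UNLOCK();
    return 0;
}

/* New shape: capture the result under the lock and pass it back. */
static int
sketch_cleanup_new(void)
{
    int code;
    SKETCH_LOCK();
    code = sketch_cleanup_r();
    SKETCH_UNLOCK();
    return code;
}
```

With the new shape, a periodic caller can test the return value and log only when something was actually cleaned up.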
Change-Id: I631831c398e43431b79f4a3a0c6f01307ac0c05e Reviewed-on: https://gerrit.openafs.org/14256 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f9d20c631d7280ce00125a1208331931a6e3f31c Author: Andrew Deason Date: Thu Jun 18 21:16:09 2020 -0500 LINUX: Close cacheFp if no ->readpage in fastpath In afs_linux_readpage_fastpath, if we discover that our disk cache fs has no ->readpage function, we'll 'goto out', but we never close our cacheFp. To make sure we close it, add a filp_close() call to the 'goto out' cleanup code. Change-Id: I371c1d7ec51b03447fbcbe58fb89be7be0235022 Reviewed-on: https://gerrit.openafs.org/14252 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit af73b9a3b1fc625694807287c0897391feaad52d Author: Cheyenne Wills Date: Thu Jul 2 13:39:27 2020 -0600 LINUX: Don't panic on some file open errors Commit 'LINUX: Return NULL for afs_linux_raw_open error' (f6af4a155) updated afs_linux_raw_open to return NULL on some errors, but still panics if obtaining the dentry fails. Commit 'afs: Verify osi_UFSOpen worked' (c6b61a451) updated callers of osi_UFSOpen to verify whether or not the open was successful. This meant osi_UFSOpen (and routines it calls) could pass back an error indication rather than panic when an error is encountered. Update afs_linux_raw_open to return a failure instead of panic if unable to obtain a dentry. Update osi_UFSOpen to return a NULL instead of panic if unable to obtain memory or fails to open the file. All callers of osi_UFSOpen handle a fail return, though some will still issue a panic. Update afs_linux_readpage_fastpath and afs_linux_readpages to not panic if afs_linux_raw_open fails. Instead of panic, return an error. For testing, an error can be forced by removing a file from the cache directory. 
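The pattern described above (the open routine reports failure, and the caller turns it into an error instead of a panic) can be sketched in user-space C. Everything here is an assumption for illustration: the `sketch_*` names, the struct, and the choice of EIO are hypothetical and do not mirror the kernel-side signatures of osi_UFSOpen or afs_linux_raw_open.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handle for an open cache file. */
struct sketch_osi_file {
    FILE *fp;
};

/* Returns NULL if memory cannot be obtained or the file cannot be
 * opened, rather than aborting. */
static struct sketch_osi_file *
sketch_cache_open(const char *path)
{
    struct sketch_osi_file *f = malloc(sizeof(*f));
    if (f == NULL)
        return NULL;
    f->fp = fopen(path, "rb");
    if (f->fp == NULL) {
        free(f);
        return NULL;
    }
    return f;
}

static void
sketch_cache_close(struct sketch_osi_file *f)
{
    fclose(f->fp);
    free(f);
}

/* Caller: returns 0 on success, or EIO if the cache file is missing or
 * empty. No panic on a failed open. */
static int
sketch_read_first_byte(const char *path, unsigned char *out)
{
    struct sketch_osi_file *f;
    int c;

    assert(out != NULL);
    f = sketch_cache_open(path);
    if (f == NULL)
        return EIO;
    c = fgetc(f->fp);
    sketch_cache_close(f);
    if (c == EOF)
        return EIO;
    *out = (unsigned char)c;
    return 0;
}
```

As in the testing hint above, the failure path can be exercised by pointing the caller at a file that has been removed.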
Note this work is based on a commit by pruiter@sinenomine.net Change-Id: Ic47e4868b4f81d99fbe3b2e4958778508ae4851f Reviewed-on: https://gerrit.openafs.org/14242 Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d2d27f975df13c3833898611dacff940a5ba3e2a Author: Cheyenne Wills Date: Fri Jun 19 08:01:14 2020 -0600 afs: Avoid panics on failed return from afs_CFileOpen afs_CFileOpen is a macro that invokes the open "method" of the afs_cacheOps structure, and for disk caches the osi_UFSOpen function is used. Currently osi_UFSOpen will panic if there is an error encountered while opening a file. Prepare to handle osi_UFSOpen function returning a NULL instead of issuing a panic (future commit). Update callers of afs_CFileOpen to test for an error and to return an error instead of issuing a panic. While this commit eliminates some panics, it does not address some of the more complex cases associated with errors from afs_CFileOpen. Change-Id: I2bdd525633dd44ebf8e26fcfd7059dfdfffb6142 Reviewed-on: https://gerrit.openafs.org/14241 Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7d85ce221d6ccc19cf76ce7680c74311e4ed2632 Author: Cheyenne Wills Date: Thu Jun 25 10:43:53 2020 -0600 LINUX 5.8: use lru_cache_add With Linux-5.8-rc1 commit 'mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()' (6058eaec), the lru_cache_add_file function is removed since it was functionally equivalent to lru_cache_add. Replace lru_cache_add_file with lru_cache_add. 
Introduce a new autoconf test to determine if lru_cache_add is present. For reference, the Linux changes associated with the lru caches: __pagevec_lru_add introduced before v2.6.12-rc2 lru_cache_add_file introduced in v2.6.28-rc1 __pagevec_lru_add_file replaces __pagevec_lru_add in v2.6.28-rc1 vmscan: split LRU lists into anon & file sets (4f98a2fee) __pagevec_lru_add removed in v5.7 with a note to use lru_cache_add_file mm/swap.c: not necessary to export __pagevec_lru_add() (bde07cfc6) lru_cache_add_file removed in v5.8 mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() (6058eaec) lru_cache_add exported mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() (6058eaec) OpenAFS will use: lru_cache_add on 5.8 kernels lru_cache_add_file from 2.6.28 through 5.7 kernels __pagevec_lru_add/__pagevec_lru_add_file on pre 2.6.28 kernels Change-Id: I79ebe4a81425bf8a8a327ddf2d3474aff9df039d Reviewed-on: https://gerrit.openafs.org/14249 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Yadavendra Yadav Reviewed-by: Benjamin Kaduk commit ae9ea8da699ba3f2ab0f7d76ae3333349fe3dfa3 Author: Benjamin Kaduk Date: Tue Jun 30 21:55:45 2020 -0700 Recode a couple files from ISO 8859-1 to UTF-8 Reported by Debian's lintian(1). The CellServDB, as an externally maintained file, is left unchanged. Change-Id: I3bf241b924cb8cd7799a4c3e799f6acd375b2e8a Reviewed-on: https://gerrit.openafs.org/14265 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit ba8b92401b8cb2f5a5306313c2702cb36cba083c Author: Andrew Deason Date: Sun Jul 8 15:00:02 2018 -0500 afs: Bound afs_DoBulkStat dir scan Currently, afs_DoBulkStat will scan the entire directory blob, looking for entries to stat. If all or almost all entries are already stat'd, we'll scan through the entire directory, doing nontrivial work on each entry (we grab afs_xvcache, at least). All of this work is pretty pointless, since the entries are already cached and so we won't do anything.
If many processes are trying to acquire afs_xvcache, this can contribute to performance issues. To avoid this, provide a constant bound on the number of entries we'll search through: nentries * 4. The current arbitrary limits cap nentries at 30, so this means we're capping the afs_DoBulkStat search to 120 entries. Change-Id: I66e9af5b27844ddf6cf37c8286fcc65f8e0d3f96 Reviewed-on: https://gerrit.openafs.org/13253 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6c808e05adb0609e02cd61e3c6c4c09eb93c1630 Author: Andrew Deason Date: Thu Jul 13 17:40:36 2017 -0500 afs: Avoid needless W-locks for afs_FindVCache The callers of afs_FindVCache must hold at least a read lock on afs_xvcache; some hold a shared or write lock (and set IS_SLOCK or IS_WLOCK in the given flags). Two callers (afs_EvalFakeStat_int and afs_DoBulkStat) currently hold a write lock, but neither of them need to. In the optimal case, where afs_FindVCache finds the given vcache, this means that we unnecessarily hold a write lock on afs_xvcache. This can impact performance, since afs_xvcache can be a very frequently accessed lock (a simple operation like afs_PutVCache briefly holds a read lock, for example). To avoid this, have afs_DoBulkStat hold a shared lock on afs_xvcache, upgrading to a write lock when needed. afs_EvalFakeStat_int doesn't ever need a write lock at all, so just convert it to a read lock. Change-Id: I5bd58b9e3a577c9e1ebf1bc3719e65a6c0af5cb8 Reviewed-on: https://gerrit.openafs.org/12656 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e44d6441c8786fdaaa1fad1b1ae77704c12f7d60 Author: Kailas Zadbuke Date: Wed Jun 3 15:44:08 2020 +0530 util: Handle serverLogMutex lock across forks If a process forks when another thread has serverLogMutex locked, the child process inherits the locked serverLogMutex. This causes a deadlock when code in the child process tries to lock serverLogMutex, since we can never unlock serverLogMutex because the locking thread no longer exists. 
This can happen in the salvageserver, since the salvageserver locks serverLogMutex in different threads, and forks to handle salvage jobs. To avoid this deadlock, we register handlers using pthread_atfork() so that the serverLogMutex will be held during the fork. The fork will be blocked until the worker thread releases the serverLogMutex. Hence the serverLogMutex will be held until the fork is complete and it will be released in the parent and child threads. Thanks to Yadavendra Yadav(yadayada@in.ibm.com) for working with me on this issue. Change-Id: I191c8272825c1667bb2150146e04b1dfe36a54e4 Reviewed-on: https://gerrit.openafs.org/14239 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 19cd454f11997d286bc415e9bc9318a31f73e2c6 Author: Andrew Deason Date: Mon Jul 16 16:08:13 2018 -0500 afs: Split out bulkstat conditions into a function Our current if() statement for determining whether we should run afs_DoBulkStat to prefetch dir entries is a bit large, and grows over time. Split this logic out into a separate function to make it easier to maintain, and add some comments to help explain each condition. This commit should have no visible effects; it's just code reorganization. Change-Id: I0086189308d2f5e4b321c63f24110d74cda6433c Reviewed-on: https://gerrit.openafs.org/13254 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a05d5b7503e466e18f5157006c1de2a2f7d019f7 Author: Andrew Deason Date: Thu Jul 13 17:40:21 2017 -0500 afs: Change VerifyVCache2 calls to VerifyVCache afs_VerifyVCache is a macro that (on most platforms) effectively expands to: if ((avc->f.states & CStatd)) { return 0; } else { return afs_VerifyVCache2(...); } Some callers call afs_VerifyVCache2 directly, since they already check for CStatd for other reasons. A few callers currently call afs_VerifyVCache2, but without guaranteeing that CStatd is not set. Specifically, in afs_getattr and afs_linux_VerifyVCache, CStatd could be set while afs_CreateReq drops GLOCK. 
And in afs_linux_readdir, CStatd could be cleared at multiple different points before the VerifyVCache call. This can result in afs_VerifyVCache2 acquiring a write-lock on the vcache, even when CStatd is already set, which is an unnecessary performance hit. To avoid this, change these call sites to use afs_VerifyVCache instead of calling afs_VerifyVCache2 directly, which skips the write lock when CStatd is already set. Change-Id: I7b75c9755af147b42a48160fa90c9849f2f03ddb Reviewed-on: https://gerrit.openafs.org/12655 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7c9fb4455745ed0015d4a6311bd4a7770efbf40d Author: Mark Vitale Date: Thu Jun 18 13:43:35 2020 -0400 LINUX: replace BUG() call with osi_Panic() in osi_linux_free If osi_linux_free fails, it printf's an error message, then calls BUG(). This is the sole open-coded call to BUG() in OpenAFS; all other calls to BUG() are indirect via osi_Panic(). For consistency, eliminate this direct BUG() call by replacing the printf and BUG() with an equivalent osi_Panic(). This also ensures that the error message is logged as critical, and prefixed with "openafs:". Change-Id: Id319dffa859308528a66991bbbc522ca49552d51 Reviewed-on: https://gerrit.openafs.org/14250 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit d8ec294534fcdee77a2ccd297b4b167dc4d5573d Author: Cheyenne Wills Date: Tue Jun 16 18:35:46 2020 -0600 LINUX 5.8: do not set name field in backing_dev_info Linux-5.8-rc1 commit 'bdi: remove the name field in struct backing_dev_info' (1cd925d5838) Do not set the name field in the backing_dev_info structure if it is not available.
Uses an existing config test 'STRUCT_BACKING_DEV_INFO_HAS_NAME' Note the name field in the backing_dev_info structure was added in Linux-2.6.32 Change-Id: I20b80e49e8a15a2949003101f24d9ce39f63b59b Reviewed-on: https://gerrit.openafs.org/14248 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c48072b9800759ef1682b91ff1e962f6904a2594 Author: Cheyenne Wills Date: Thu Jun 18 16:39:22 2020 -0600 LINUX 5.8: Replace kernel_setsockopt with new funcs Linux 5.8-rc1 commit 'net: remove kernel_setsockopt' (5a892ff2facb) retires the kernel_setsockopt function. In prior kernel commits new functions (ip_sock_set_*) were added to replace the specific functions performed by kernel_setsockopt. Define new config test 'HAVE_IP_SOCK_SET' if the 'ip_sock_set' functions are available. The config define 'HAVE_KERNEL_SETSOCKOPT' is no longer set in Linux 5.8. Create wrapper functions that replace the kernel_setsockopt calls with calls to the appropriate Linux kernel function(s) (depending on what functions the kernel supports). Remove the unused 'kernel_getsockopt' function (used for building with pre 2.6.19 kernels). For reference Linux 2.6.19 introduced kernel_setsockopt Linux 5.8 removed kernel_setsockopt and replaced the functionality with a set of new functions (ip_sock_set_*) Change-Id: I517b674303c5decc19313d9de51d04ddef36b421 Reviewed-on: https://gerrit.openafs.org/14247 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cbc5c4b51fcd0a990216fc31abe308a9e85fd9df Author: Andrew Deason Date: Wed Jun 17 12:23:46 2020 -0500 tests: Modernize writekeyfile.c tests/auth/writekeyfile.c contains some code used to generate tests/auth/KeyFile, which is used to test code interpreting the old-style KeyFile format. This code currently has a few problems: - We don't check the results of afstest_mkdtemp, which could allow symlink attacks from other users on the system. - We duplicate some logic from afstest_BuildTestConfig, in order to build a temporary config dir. 
- writekeyfile isn't built or run by default (it only exists to generate KeyFile, so it's almost never run), so eventual bitrot is quite likely, and the existing code already generates warnings. To avoid this, change writekeyfile.c to use the existing afstest_BuildTestConfig to generate a local config dir. To ensure we avoid bitrot, build writekeyfile by default, and create a test to run it, to make sure it can generate a KeyFile as expected. Note that the KeyFile.short we test against is different than the KeyFile currently in the tree. The existing KeyFile was generated from an older OpenAFS release, which always generated 100-byte KeyFiles, even if we only have a few keys. The current codebase only writes out as much key data as needed, so the generated KeyFiles are shorter (but still understandable by older OpenAFS releases). Keep the old 100-byte KeyFile around, since that's what older OpenAFS would generate, and create a new KeyFile.short to test against, to make sure our code for generating KeyFiles doesn't change any further. Change-Id: Ibe9246c6dd808ed2b2225dd7be2b27bbdee072fd Reviewed-on: https://gerrit.openafs.org/14246 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 22a66e7b7e1d73437a8c26c2a1b45bc4ef214e77 Author: Cheyenne Wills Date: Tue Jun 16 15:20:20 2020 -0600 tests: Use usleep instead of nanosleep Commit "Build tests by default" 68f406436cc21853ff854c514353e7eb607cb6cb changes the build so tests are always built. On Solaris 10 the build fails because nanosleep is in librt, which we do not link against. Replace nanosleep with usleep. This avoids introducing extra configure tests just for Solaris 10. Note that with Solaris 11 nanosleep was moved from librt to libc, the standard C library. 
Change-Id: I6639f32bb8c8ace438e0092a866f06561dad54f1 Reviewed-on: https://gerrit.openafs.org/14244 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5f4a681eeb5e353f09aa895770f7336a2b381467 Author: Cheyenne Wills Date: Wed Jun 17 13:08:18 2020 -0600 tests: Emulate mkdtemp when not available Commit "Build tests by default" 68f406436cc21853ff854c514353e7eb607cb6cb changes the build so tests are always built. On Solaris 10 Update 10 and earlier the build fails because the mkdtemp function is not available. Introduce a wrapper 'afstest_mkdtemp' that uses mkdtemp if available, otherwise uses mktemp/mkdir. Change-Id: I0118f838ed9a89927e2ddac4cad822574601558a Reviewed-on: https://gerrit.openafs.org/14243 Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 188ca8bf5276084a6892e5cfba3e24e478804382 Author: Michael Meffie Date: Thu Apr 16 09:41:41 2020 -0400 make-release: Run git describe once Run git describe once at the beginning of make-release to find the version information used to derive the tarball file names and saved in the .version file. This is a cleanup and refactoring change to prepare for a future commit. Change-Id: I0debeeffa5d2c63ab1498588766cb36424d15cd5 Reviewed-on: https://gerrit.openafs.org/14150 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit d0753c0ace8e43a7dc1db35c3f41130352278c04 Author: Michael Meffie Date: Fri Mar 27 11:29:24 2020 -0400 make-release: Create output directory if needed Automatically create the --dir directory if it does not already exist, which makes this script slightly easier to use. Remove the now unneeded mkdir from the top-level makefile.
Change-Id: I1f4561120a70263b0b2b194e65fec55fb5666f40 Reviewed-on: https://gerrit.openafs.org/14115 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit d20d392091a13c3944973bcb0ce84783a4e0d179 Author: Michael Meffie Date: Thu Apr 16 07:21:51 2020 -0400 make-release: Remove unused optional version argument The make-release help shows an optional version argument, but in fact the version info is always generated from the git tag name argument, which makes sense when creating releases. Continue to throw away the second positional argument just in case someone is still passing a second argument, but issue a warning if they do. Change-Id: Ie4c6e6efb7693e53a02fd009eecd64b47250c848 Reviewed-on: https://gerrit.openafs.org/14149 Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 46eb00ffa1c6d7deda2c1b1b4fa1780b36e64417 Author: Michael Meffie Date: Thu Apr 16 07:37:39 2020 -0400 make-release: Clean up whitespace and spelling Fix whitespace errors, convert tabs to spaces, fix spelling errors, and fix pod markup in the make-release script. Change-Id: I24ede59d44a8818d89de454c0935586fccbd5d9a Reviewed-on: https://gerrit.openafs.org/14148 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit c9eab4b1ee947067bfcc3678bb89896b66f404f8 Author: Andrew Deason Date: Tue Jun 2 11:12:58 2020 -0500 afs: Remove osi_GetuTime osi_GetuTime has always been #define'd to be the same thing as osi_GetTime, ever since OpenAFS 1.0. Get rid of this redundant macro, and just use osi_GetTime instead. 
Change-Id: Ic826aeaa17314019b79cfb2df04a79309aa31db5 Reviewed-on: https://gerrit.openafs.org/14236 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit dedb1aed97e64036d8098e12904c9eb54fda7010 Author: Jeffrey Altman Date: Sun May 31 13:05:02 2020 -0400 afs/viced: New UAE (unified_afs) error codes The following registrations were submitted to registrar@central.org as [rt.central.org #135105]. UAECANCELED, "Operation canceled" (49733499L) UAENOTRECOVERABLE, "State not recoverable" (49733500L) UAENOTSUP, "Not supported" (49733501L) UAEOTHER, "Other" (49733502L) UAEOWNERDEAD, "Owner dead" (49733503L) UAEPROCLIM, "Too many processes" (49733504L) UAEDISCON, "Graceful shutdown in progress" (49733505L) Change-Id: I1458b8a9441b3826756ca67af70eee5e835d989f Reviewed-on: https://gerrit.openafs.org/14235 Reviewed-by: Jeffrey Hutzelman Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit ed9a3b7165ae2300ebb185ca53e698e5ef93173b Author: Cheyenne Wills Date: Fri May 29 10:36:13 2020 -0600 util: Fix segfault in the func ConstructLocalPath The function ConstructLocalPath will segfault if passed a NULL for the command path parameter. Update ConstructLocalPath to test the passed command path for a NULL and return ENOENT. The segfault can be triggered by setting up a BosConfig with a dafs bnode that does not contain all the required parms. This setup results in bosserver segfaulting. With the fix, bosserver now logs an error and exits cleanly. Change-Id: I26015c8accd829f3101b073964777b41d16b07f7 Reviewed-on: https://gerrit.openafs.org/14223 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 336f5d91c6f4e93f77560d456fb29fbd82b237e5 Author: Mark Vitale Date: Sun May 10 20:53:22 2020 -0400 DARWIN: ensure OpenAFS.pkg is signed Installation fails because the OpenAFS.pkg was inadvertently omitted from the codesign logic. Ensure that the package is signed.
Change-Id: I0745146bc523750912dd6ee95fc16a70572be175 Reviewed-on: https://gerrit.openafs.org/14221 Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d3f8d8122880de9f5b25868b39efd1cc7d385ff6 Author: Mark Vitale Date: Sun May 10 20:51:59 2020 -0400 DARWIN: ensure PrefPane materials are properly signed Notarization fails because some prefPane materials were inadvertently omitted by the codesign logic. Ensure that these objects are properly signed. Change-Id: Ifc58e6f834a3237b7991257ee85de4e90fc3da12 Reviewed-on: https://gerrit.openafs.org/14220 Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 80afdc2adabb098394e1b2178ba301964868befe Author: Andrew Deason Date: Fri Dec 20 21:02:45 2019 -0600 vol: Avoid building devname.c on AFS_NAMEI_ENV Everything in devname.c is for the inode vol backend, so skip building it when AFS_NAMEI_ENV is defined. While we're doing this, alter the #ifdefs inside this file to assume that we're not on XBSD, DARWIN, or LINUX, since those platforms are all namei-only. 
Change-Id: I3a46568940e1a865a381c1ac7e98aea94df9f3ef Reviewed-on: https://gerrit.openafs.org/13995 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 99eedfdb1659dd48d12542ad063d4711d401e153 Author: Andrew Deason Date: Fri Dec 20 21:01:13 2019 -0600 vol: Indent ifdef maze in devname.c Change-Id: I371eb1d79ae9fb3f07af993be834af6f6b59c100 Reviewed-on: https://gerrit.openafs.org/13994 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 71ce9fff8e682a77e17490a54e091656cbf96925 Author: Tim Creech Date: Mon Dec 9 21:13:58 2019 -0500 FBSD: Add support for FreeBSD 12.1 Change-Id: I5779c586b6b1255de0ee0dea66b09f3a5dffddc1 Reviewed-on: https://gerrit.openafs.org/13982 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 20dc2832268eb81d40e798da0d424c98cf26062c Author: Andrew Deason Date: Sun Nov 24 22:36:17 2019 -0600 FBSD: Ignore VI_DOOMED vnodes Currently on FreeBSD, osi_TryEvictVCache calls vgone() for our vnode after checking if the given vcache is in use. vgone() then calls our VOP_RECLAIM operation, which calls afs_vop_reclaim, which calls afs_FlushVCache to finally actually flush the vcache. The current approach has at least the following major issues: - In afs_vop_reclaim, we return success even if afs_FlushVCache() fails. This allows FreeBSD to reuse the vnode for another file, but the vnode is still being referenced by our vcache, which is referenced by the global VLRU and various other structures. This causes all kinds of weird errors, since we try to use the underlying vnode for different files. - After the relevant checks in osi_TryEvictVCache are done, another thread can acquire a new reference to our vcache (this can happen while vgone() is running up until the vnode is locked). This new reference will cause afs_FlushVCache to fail. - Our afs_vop_reclaim callback is called while the vnode is locked, and can acquire afs_xvcache. Other code locks the vnode while afs_xvcache is already held (such as afs_PutVCache -> vrele). 
This can lead to deadlocks if two threads try to run these codepaths for the same vnode at the same time. - afs_vop_reclaim optionally acquires afs_xvcache based on the return value of CheckLock(&afs_xvcache). However, CheckLock just returns if that lock is locked by anyone, not if the current thread holds the lock. This can result in the rest of the function running without afs_xvcache actually being held if we drop AFS_GLOCK at any point. - osi_TryEvictVCache() tries to vn_lock() the target vnode, but we may already have another vnode locked in the current thread. If the vnode we're trying to evict is a descendant of a vnode we already have locked, this can deadlock. To fix these issues, make some changes to how our vcache management works on FreeBSD: - Do not allow anyone to hold a new reference on a VI_DOOMED vnode. We do this by checking for VI_DOOMED in osi_vnhold, and returning an error if VI_DOOMED is set. - In afs_vop_reclaim, panic if afs_FlushVCache fails. With the new VI_DOOMED check, afs_FlushVCache should now never fail; and if it somehow does, panic'ing immediately is better than corrupting various structures and panic'ing later on. - Move around some of the relevant locking in afs_vop_reclaim to fix the lock-related issues. - In osi_TryEvictVCache, don't wait for the vnode lock (LK_NOWAIT); treat the vnode as "in use" if we can't immediately obtain the lock. Thanks to tcreech@tcreech.com and kaduk@mit.edu for insight and help investigating the relevant issues.
FIXES 135041 Change-Id: I23e94ecebbddc8c68a8f4ea918d64efd0f9f9dfd Reviewed-on: https://gerrit.openafs.org/13972 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 145c90bdbeeff4ea95acacd7dc110f0c6fcba281 Author: Mark Vitale Date: Sun May 10 22:13:13 2020 -0400 DARWIN: remove vestigial etap_event_t typedefs These typedefs have been present since commit a41175cfbbf4d06ccfe14ae54bef8b7464ecd80b "initial-darwin-support-20010327"; at least some of this material was obtained directly from IBM after the initial code import. Based on research of old Darwin source code and kernel documentation, the Event Trace Analysis Package (ETAP) was a lock-profiling interface provided in older versions of Mach and xnu. ETAP was not enabled by default; the kernel had to be recompiled with certain options to enable it. Support for ETAP was removed from the xnu tree sometime between xnu-517 (10.3 Panther) and xnu-792 (10.4 Tiger), although some references remain in the latter under PPC support (osfmk/ppc/hw_lock.s). All remaining references to etap_event_t disappeared when PPC support was removed, some time between xnu-1456.1.26 (10.6 Snow Leopard) and xnu-1699.24.8 (10.7.2 Lion). Therefore, it is possible that these typedefs were needed in the past by (IBM/Transarc) AFS to support use of some lock APIs (e.g., simple_lock_init, usimple_lock_init) after the ETAP code was withdrawn from xnu. However, these typedefs have probably always been vestigial for OpenAFS, because OpenAFS has never used any lock API that took etap_event_t as an argument. Regardless, OpenAFS does not need these definitions to build and run on any currently supported version of macOS. Remove the vestigial code. No functional change should be incurred by this commit. 
Change-Id: I39b3f82a8933d15ef5b5de5eb92366c0a31f8bb6 Reviewed-on: https://gerrit.openafs.org/14219 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit f065706fed4edd53376a33339fe20de686eee6a1 Author: Mark Vitale Date: Sun May 10 22:07:39 2020 -0400 DARWIN: remove errant typedef for etap_event_t This code has been dead since its introduction, because XAFS_DARWIN_ENV is a typo for AFS_DARWIN_ENV. Introduced from day 1 of DARWIN support with commit a41175cfbbf4d06ccfe14ae54bef8b7464ecd80b "initial-darwin-support-20010327". No functional change should be incurred by this commit. Change-Id: I6b74f01b4dd1230559ac8d75f0644071357f38b7 Reviewed-on: https://gerrit.openafs.org/14218 Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c6eff25be9fc959f666b33425c9ee2635224826e Author: Mark Vitale Date: Mon May 18 14:19:25 2020 -0400 Convert all osi_timeval_t to osi_timeval32_t Since commit 130144850c6d05bc69e06257a5d7219eb98697d8 "xstat: cm xstat time values are 32 bit", OpenAFS has had two timeval definitions: osi_timeval_t and osi_timeval32_t. Since they are functionally equivalent, convert all references to osi_timeval_t to osi_timeval32_t. This makes clear that this struct is always expected to contain 32-bit members for tv_sec and tv_usec. There are still a few platforms where osi_timeval32_t is mistakenly defined with 64-bit members; these will be addressed in future commits. No functional change should be incurred by this commit. Change-Id: I3e8e44235e813571723fcd114194f6cb83de90e4 Reviewed-on: https://gerrit.openafs.org/14215 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit d6101128664918e6fcefbaeb68c4c1d439851411 Author: Mark Vitale Date: Mon May 4 17:35:05 2020 -0400 UKERNEL: remove dead code osi_SetTime osi_SetTime has been dead code since the original IBM code import. Remove it from the tree. 
No functional change is incurred by this commit. Change-Id: I25612a044ad550d798003979afc6845e502ebe3b Reviewed-on: https://gerrit.openafs.org/14191 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 03f44172180563cb9d12d79e5512aae815fee899 Author: Mark Vitale Date: Tue May 5 11:26:00 2020 -0400 UKERNEL: remove redundant declaration of osi_GetTime Commit c861bb0d779b54236b63eda87d9dfaf7792d1659 "Additional UKERNEL headers, prototyping and other fixes" added the following lines to src/rx/rx_prototypes.h: #if defined(UKERNEL) && !defined(osi_GetTime) extern int osi_GetTime(struct timeval *tv); #endif However, this appears to be redundant with the declaration in src/afs/afs_prototypes.h: #ifdef UKERNEL ... extern int osi_GetTime(struct timeval *tv); ... #endif which was added much earlier with commit 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821". Remove the redundant declaration in rx/rx_prototypes.h. No functional change is incurred by this commit. Change-Id: I2032d302e862eed47250357e604cba4f26e89814 Reviewed-on: https://gerrit.openafs.org/14192 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3ab022fda9d2bde603c032d4a5bff0f79e825f3d Author: Mark Vitale Date: Thu Apr 16 09:02:00 2020 -0400 afs: remove commented xstats externs Extern declarations for the xstats recording areas have been commented out since 8f2df21ffe59e9aa66219bf24656775b584c122d "pull-prototypes-to-head-20020821". Remove the vestigial comments. No functional change is incurred by this commit. Change-Id: Ieef9a4b21e78db8d5427bed7b621ba043663b1d1 Reviewed-on: https://gerrit.openafs.org/14197 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Tested-by: BuildBot commit 4caadf71f556f789bcdd2bcc80b9642630329421 Author: Mark Vitale Date: Sun Apr 5 17:10:42 2020 -0400 afs: remove stats dead code afs_GetCMSTats, afs_AddToMean, and macro AFS_MEANCNT have been dead code since the original IBM code import.
Remove them from the tree. No functional change is incurred by this commit. Change-Id: Icd6aeff7896d69a4d334531b5e0c632d807457ce Reviewed-on: https://gerrit.openafs.org/14196 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Tested-by: BuildBot commit 9a5790cfbb8e7b1a4a2e832911c71da49f604c20 Author: Mark Vitale Date: Mon May 18 17:20:26 2020 -0400 LINUX 5.6: define osi_timeval32_t for 32-bit Linux For 32-bit Linux (e.g., arch i586), AFS_LINUX_64BIT_KERNEL is not defined, so osi_timeval32_t is defined as a typedef of the native 'timeval'. However, as of commit c766d1472c70d25ad475cf56042af1652e792b23 "y2038: hide timeval/timespec/itimerval/itimerspec types" (Linux 5.6), the native timeval struct is no longer available. On such a kernel, the OpenAFS build will fail because osi_timeval32_t is not properly defined. Instead, add new conditionals to properly define osi_timeval32_t for this platform. Change-Id: I1eddeeb3651dcd3c55920ab1d2ad2838f4729bdd Reviewed-on: https://gerrit.openafs.org/14216 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 13e44b2b200cd99d0df4e03cf6413d3a6915783f Author: Andrew Deason Date: Mon Nov 18 23:17:12 2019 -0600 afs: Refactor osi_vnhold/AFS_FAST_HOLD Make a few changes to osi_vnhold and AFS_FAST_HOLD: - Currently, the second argument of osi_vnhold ("retry") is never used by any implementation. Get rid of it. - AFS_FAST_HOLD() is the same as osi_vnhold(). Get rid of AFS_FAST_HOLD, and just have all callers use osi_vnhold instead. - Allow osi_vnhold to return an error, and adjust callers to handle it. - Change osi_vnhold to be a real function, instead of a macro, to make nontrivial implementations less cumbersome. Most platforms never return an error from osi_vnhold(), so the added code paths to check the return value of osi_vnhold() will not trigger. However, this lets us add future commits that do make osi_vnhold() return an error. 
Change-Id: Id2f3717be6c305d06305685247ac789815e1ebf7 Reviewed-on: https://gerrit.openafs.org/13971 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d01398731550b8a93b293800642c3e1592099114 Author: Andrew Deason Date: Fri May 1 15:02:08 2020 -0500 vlserver: Return error when growing beyond 2 GiB In the vlserver, when we add a new vlentry or extent block, we grow the VLDB by doing something like this: vital_header.eofPtr += sizeof(item); Since we don't check for overflow, and all of our offset-related variables are signed 32-bit integers, this can cause some odd behavior if we try to grow the database to be over 2 GiB in size. To avoid this, change the two places in vlserver code that grow the database to use a new function, grow_eofPtr(), which checks for 31-bit overflow. If we are about to overflow, log a message and return an error. See the following for a specific example of our "odd behavior" when we overflow the 2 GiB limit in the VLDB: With 1 extent block, we can create 14509076 vlentries successfully. On the 14509077th vlentry, we'll attempt to write the entry to offset 2147483560 (0x7FFFFFA8). Since a vlentry is 148 bytes long, we'll write all the way through offset 2147483707 (0x8000003B), which is over the 31-bit limit. In the udisk subsystem, this results in writing to page numbers 2097151, and -2097152 (since our ubik pages are 1k, and going over the 31-bit limit causes us to treat offsets as negative). These pages start at physical offsets 2147482688 (0x7FFFFC40) and -2147483584 (-0x7FFFFFC0) in our vldb.DB0 (where offset is page*1024+64). Modifying each of these pages involves reading in the existing page first, modifying the parts we are changing, and writing it back. This works just fine for 2097151, but of course fails for -2097152. 
The latter fails in DReadBuffer when eventually our pread() fails with EINVAL, and causes ubik to log the message: Ubik: Error reading database file: errno=22 But when DReadBuffer fails, DReadBufferForWrite assumes this is due to EOF, and just creates a new buffer for the given page (DNewBuffer). So, the udisk_write() call ultimately succeeds. When we go to flush the dirty data to disk when committing the transaction, after we have successfully written the transaction log, DFlush() fails for the -2097152 page when the pwrite() call eventually fails with EINVAL, causing ubik to panic, logging the messages: Ubik PANIC: Writing Ubik DB modifications When the vlserver gets restarted by bosserver, we then process the transaction log, and perform the operations in the log before starting up (ReplayLog). The log records the actual data we wrote, not split into pages, and the log-replaying code writes directly to the db using uphys_write instead of udisk_write. So, because of this, the write actually succeeds when replaying the log, since we just write 148 bytes to offset 2147483624 (0x7FFFFFE8), and no negative offsets are used. The vlserver will then be able to run, but will be unable to read that newly-created vlentry, since it involves reading a ubik page beyond the 31-bit boundary. That means trying to look up that entry will fail with I/O errors, as will looking up any entry on the same hash chains as the new entry (since the new entry will be added to the head of the hash chain). Listing all entries in the database will also just show an empty database, since our vital_header.eofPtr will be negative, and we determine EOF by comparing our current blockindex to the value in eofPtr.
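As a rough illustration of the overflow check described in this message, the following is a minimal sketch in C. It is not the actual grow_eofPtr() from the vlserver; the struct, the function name grow_eofPtr_sketch, and the -1 error value are all hypothetical stand-ins, and the only facts taken from the message are that eofPtr is a signed 32-bit offset and that growth past the 31-bit limit must be refused.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vlserver's vital header; only eofPtr matters here. */
struct vital_header_sketch {
    int32_t eofPtr;    /* signed 32-bit end-of-file offset, as described in the message */
};

/*
 * Hypothetical sketch of a grow_eofPtr()-style helper (not the real OpenAFS code).
 * Advance eofPtr by 'bump' bytes, refusing to cross the 31-bit limit.
 * Returns 0 on success, or -1 if the growth would overflow.
 */
static int
grow_eofPtr_sketch(struct vital_header_sketch *hdr, int32_t bump)
{
    if (bump < 0 || hdr->eofPtr < 0)
        return -1;
    /* Compare against the remaining room, so the addition itself can never overflow. */
    if (bump > INT32_MAX - hdr->eofPtr)
        return -1;
    hdr->eofPtr += bump;
    return 0;
}
```

With the numbers from the message, an eofPtr of 2147483560 leaves only 87 bytes of room below the 31-bit limit, so a 148-byte vlentry is rejected instead of wrapping eofPtr negative.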
Change-Id: Ie0b7ac61f9121fa265686449efbae8e18edb1896 Reviewed-on: https://gerrit.openafs.org/14180 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Cheyenne Wills commit d73680c5f70ee5aeb634a9ec88bf1097743d0f76 Author: Cheyenne Wills Date: Mon May 11 14:06:19 2020 -0600 vol: Fix format-truncation warning with gcc-10.1 Building with gcc-10.1 produces a warning (error if --enable-checking) in vol-salvage.c error: ‘%s’ directive output may be truncated writing up to 755 bytes into a region of size 255 [-Werror=format-truncation=] 809 | snprintf(inodeListPath, 255, "%s" OS_DIRSEP "salvage.inodes.%s.%d", tdir, name, Use strdup/asprintf to allocate the buffer dynamically instead of using a buffer with a hardcoded size. Change-Id: Ib2f01c2eb73c7abc162be2b1939e55688a81f812 Reviewed-on: https://gerrit.openafs.org/14207 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c81579dc7b0c0ac6bc34f63384d705a4445c2bbd Author: Andrew Deason Date: Mon May 18 12:09:38 2020 -0500 auth: Close fd on SetExtendedCellInfo write error Currently, and since OpenAFS 1.0, if write() fails here, we leak the file descriptor. A write() failure should be very unlikely, but close the fd to make sure we avoid the leak. Change-Id: I4e8ed4216c4aa5041232fc798a7bc59f6a5570d9 Reviewed-on: https://gerrit.openafs.org/14213 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 85df3e3d43e033b1c25c33e4a74d4b7b59b567b5 Author: Andrew Deason Date: Sun Jul 21 18:55:49 2019 -0500 afs: Free rx/rxevent resources during shutdown Call shutdown_rx() and shutdown_rxevent() near the end of our shutdown sequence, in order to free various Rx resources and avoid memory leaks. 
Change-Id: Id2e912295cf760b5ad83057487e6c4c4fadda11b Reviewed-on: https://gerrit.openafs.org/13719 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 17b42fe67c18fab0003fb712092d36f06c93f2eb Author: Cheyenne Wills Date: Thu Apr 30 10:31:17 2020 -0600 LINUX-5.7: replace __pagevec_lru_add with lru_cache_add_file The Linux function __pagevec_lru_add is no longer exported in Linux 5.7-rc1 commit bde07cfc65da5fe6c63fe23f035f5ccc0ffd89e0 "mm/swap.c: not necessary to export __pagevec_lru_add()". As a replacement, the Linux function lru_cache_add_file can be used for adding a page to the lru cache. The internal processing of lru_cache_add_file manages its own internal pagevec and performs the following: get_page(...) if(!pagevec_add(...)) __pagevec_lru_add_file(...) Introduce an autoconf test for lru_cache_add_file and replace the calls associated with __pagevec_lru_add with lru_cache_add_file. NOTE: see Linux commit a0b8cab3b9b2efadabdcff264c450ca515e2619c "mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API" as a reference for this change. The lru_cache_add_file was introduced in Linux 2.6.28, therefore this change affects systems with Linux 2.6.28 kernels and later. Change-Id: I12b32fd5061fc136f8b96ef3605e0bab736ca9ed Reviewed-on: https://gerrit.openafs.org/14159 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit dca95bcb7efdff38564dcff3e8f4189735f13b3a Author: Cheyenne Wills Date: Wed Apr 29 16:26:02 2020 -0600 libafs: Abstract the Linux lru cache interface Define static functions afs_lru_cache_init, afs_lru_cache_add and afs_lru_cache_finalize to handle interfacing with Linux's lru facilities. This change's primary purpose is to isolate the preprocessor conditionals associated with the details of the system lru interfaces to just these functions and to simplify the areas that utilize lru caching by removing the preprocessor conditionals. 
As Linux's lru facilities change, additional conditional code will be needed. Change-Id: I74c94bb712359975e3fd1df85f1b338b215f61b0 Reviewed-on: https://gerrit.openafs.org/14167 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 44b7b93b593371bfdddd0be0ae603f4f8720f78b Author: Andrew Deason Date: Sat May 2 23:54:55 2020 -0500 afs: Drop GLOCK for RXAFS_GetCapabilities We are hitting the net here; we certainly should not be holding AFS_GLOCK while waiting for the server's response. Found via FreeBSD WITNESS. Change-Id: Ie727db27adaeed23ac8cff7665143bae2ce2ede8 Reviewed-on: https://gerrit.openafs.org/14181 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5d53ed0bdab6fea6d2426691bdef2b6f9cb7f2fe Author: Yadavendra Yadav Date: Wed Apr 29 05:10:05 2020 +0000 rxkad: Use krb5_enctype_keysize in tkt_DecodeTicket5 Inside the tkt_DecodeTicket5 function (rxkad/ticket5.c), keysize is calculated by calling krb5_enctype_keybits and dividing the number of bits by 8. For 3DES the number of key bits is 168, so keysize comes out to 21 (168/8). However, the actual keysize of a 3DES key is 24. This keysize is passed to _afsconf_GetRxkadKrb5Key, where a keysize comparison happens; since the keysizes do not match, it returns AFSCONF_BADKEY. To fix this issue, get keysize from the krb5_enctype_keysize function instead of krb5_enctype_keybits. Thanks to John Janosik (jpjanosi@us.ibm.com) for analyzing and fixing this issue. Change-Id: Ia6f70b878feaa91855f9544ec1de81a6196a85a8 Reviewed-on: https://gerrit.openafs.org/14203 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 9866511bb0a5323853e97e3ee92524198813776e Author: Andrew Deason Date: Sun Jul 21 18:48:51 2019 -0500 rx: Avoid osi_NetSend during rx shutdown Commit 8d939c08 (rx: avoid nat ping during shutdown) added a call to shutdown_rx() inside the DARWIN shutdown sequence, before the rx socket was closed.
From the commit message, it sounds like this was done to avoid NAT pings from calling osi_NetSend during the shutdown sequence after the rx socket was closed; calling shutdown_rx() before closing the socket would cause any connections we had to be destroyed first, avoiding that. The problem with this is that this means shutdown_rx() is called when osi_StopNetIfPoller is called, which is much earlier than some other portions of the shutdown sequence; some of which may hold references to e.g. rx connections. If we try to, for instance, destroy an rx connection after shutdown_rx() is called, we could panic. An earlier version of that commit (gerrit PS1) just tried to insert a check before the relevant osi_NetSend call, making us just skip the osi_NetSend if the shutdown sequence had been started. So to avoid the above issue, try to implement that approach instead. And instead of doing it just for NAT pings, we can do it for almost all osi_NetSend calls (besides those involved in the shutdown sequence itself), by checking this in rxi_NetSend. Also return an error (ESHUTDOWN) if we skip the osi_NetSend call, so we're not completely silent about doing so. This means we also remove the call to shutdown_rx() inside DARWIN's osi_StopNetIfPoller(). This allows us to interact with Rx objects during more of the shutdown process in cross-platform code. Change-Id: I4e631b28d090635aeacd59de0fd237d572f97e93 Reviewed-on: https://gerrit.openafs.org/13718 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 929d501421579290ce1d4f9aabe45980e5458a9a Author: Cheyenne Wills Date: Fri Apr 3 15:00:42 2020 -0600 Add more 'fall through' switch comments Commit a455452d (LINUX 5.3: Add comments for fallthrough switch cases) added the special /* fall through */ comment to various switch/case blocks, in order to avoid implicit-fallthrough warnings from causing the build to fail when building the Linux kernel module. 
In this commit, add additional /* fall through */ comments to the rest of the tree where falling through is intentional. Add a "break;" in one place in dumptool.c where falling through seems like a mistake, and flag certain functions as AFS_NORETURN to avoid needing to explicitly break or fallthrough. Check for the availability of the -Wimplicit-fallthrough compiler flag and use it when --enable-checking is set, to prevent additional cases from creeping into the tree. Note: the -Wimplicit-fallthrough compiler flag was added in gcc 7. Change-Id: Iae34e7969606603da8358d7cfa5fd04279b218dc Reviewed-on: https://gerrit.openafs.org/14125 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 4512d04a9b721cd9052c0e8fe026c93faf6edb9e Author: Kailas Zadbuke Date: Thu May 7 23:55:39 2020 -0400 salvaged: Fix "-parallel all" parsing In salvageserver, the -parallel option takes an "all" argument. However, the code does not parse the numeric part correctly. Because of this, only a single instance of the salvageserver process was running even if a larger number was provided with the "all" argument. With this fix, the numeric part of the "all" argument is parsed correctly and the required number of salvageserver instances is started. Change-Id: Ib6318b1d57d04fecb84915e2dabe40930ea76499 Reviewed-on: https://gerrit.openafs.org/14201 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 790824ff749b6ee01c4d7101493cbe8773ef41c6 Author: Cheyenne Wills Date: Sun Apr 5 15:51:17 2020 -0600 cf: Use common macro to test compiler flags Use the AX_APPEND_COMPILE_FLAGS macro to test and set compiler specific flags. Remove the OPENAFS_GCC_SUPPORTS_MARCH check entirely (and the associated P5PLUS_KOPTS), since nothing has used it for quite some time.
Change-Id: Ic9626c52ac62cf83d4b8c787aa5aa966e558a781 Reviewed-on: https://gerrit.openafs.org/14132 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 98b5ffb52117aefac5afb47b30ce9b87eb2fdebf Author: Andrew Deason Date: Mon Apr 20 13:03:15 2020 -0500 ubik: Avoid unlinking garbage during recovery In urecovery_Interact, if any of our operations fail around calling DISK_GetFile, we will jump to FetchEndCall and eventually unlink 'pbuffer'. But if we failed before opening our .DB0.TMP file, the contents of 'pbuffer' will not be initialized yet. During most iterations of the recovery loop, the contents of 'pbuffer' will be filled in from previous loops, and it should always stay the same, so it's not a big problem. But if this is the first iteration of the loop, the contents of 'pbuffer' may be stack garbage. Solve this in two ways. To make sure we don't use garbage contents in 'pbuffer', memset the whole thing to zeroes at the beginning of urecovery_Interact(). And then to make sure we're not reusing 'pbuffer' contents from previous iterations of the loop, also clear the first character to NUL each time we arrive at this area of the recovery code. And avoid unlinking anything if pbuffer starts with a NUL. Commit 44e80643 (ubik: Avoid unlinking garbage) fixes the same issue, but only fixed it in the SDISK_SendFile codepath in remote.c. Change-Id: Ica39e66efa89562068a4be3a14b2d13594b77f6d Reviewed-on: https://gerrit.openafs.org/14153 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit ca847ddf35e336a8bc3159ce4b26f0162417bbd5 Author: Andrew Deason Date: Sat Apr 4 22:35:07 2020 -0500 Use autoconf-archive m4 from src/external Switch to using the m4 macros from autoconf-archive in our src/external mechanism, instead of manually-copied versions in src/cf. The src/external copy of ax_gcc_func_attribute.m4 is identical to the existing copy in src/cf, so that should incur no changes. 
There are also a few new macros pulled in, but they are currently unused. Increase our AC_PREREQ in configure.ac to 2.64, to match the AC_PREREQ in some of the new files. Change-Id: I8acfe4df7b9a22d9b9e69004c3438034a2dacadb Reviewed-on: https://gerrit.openafs.org/14135 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit d8205bbb482554812fbe66afa3c337d991a247b6 Author: Autoconf Archive Maintainers Date: Tue Apr 7 10:23:16 2020 -0500 Import of code from autoconf-archive This commit updates the code imported from autoconf-archive to 24358c8c5ca679949ef522964d94e4d1cd1f941a (v2019.01.06) New files are: m4/ax_append_compile_flags.m4 m4/ax_append_flag.m4 m4/ax_check_compile_flag.m4 m4/ax_gcc_func_attribute.m4 m4/ax_require_defined.m4 Change-Id: I64e14d1b4d41ebfee82fa92da10239f73e28b4c9 Reviewed-on: https://gerrit.openafs.org/14138 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a072c65bba86cbcd81157e354d3719ac41a2c97d Author: Andrew Deason Date: Sat Apr 4 22:28:21 2020 -0500 Add autoconf-archive to src/external Add autoconf-archive to the src/external mechanism, so we can more easily import and update the AX_* m4 macros we pull in from autoconf-archive. Commits are imported from . We already have a copy of ax_gcc_func_attribute.m4 in the tree, so include that in the list of files. While we're here, also include a few more macros for checking compiler flags, which will be used in subsequent commits. Change-Id: I8c6288fc1d48a47837ca08f8b9207e0ada921af8 Reviewed-on: https://gerrit.openafs.org/14133 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c05d8b28d3213856d54896979382daa066b64673 Author: Michael Meffie Date: Fri Jul 5 09:28:50 2019 -0400 Update NEWS for OpenAFS 1.9.0 Add change descriptions for commits not in a stable release. 
Change-Id: Ib1d5ce9f558279660abb2473ce8a9fac4fcefa8d Reviewed-on: https://gerrit.openafs.org/13673 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 1547db22264f21b5d553f54498aee51879539786 Author: Benjamin Kaduk Date: Fri Mar 20 09:17:13 2020 -0700 Synchronize NEWS with 1.8.5 Pull in all the updates to NEWS that occurred on the 1.8.x branch in preparation for adding entries for 1.9.0. Change-Id: I713d1576ef96793f24824f909b26da802b21ec23 Reviewed-on: https://gerrit.openafs.org/14103 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit befc72749884c6752c7789479343ba48c7d5cea1 Author: Andrew Deason Date: Sun Apr 26 17:26:02 2020 -0500 rx: Use _IsLast to check for last call in queue Ever since commits 170dbb3c (rx: Use opr queues) and d9fc4890 (rx: Fix test for end of call queue for LWP), rx_GetCall checks if the current call is the last one on rx_incomingCallQueue by doing this: opr_queue_IsEnd(&rx_incomingCallQueue, cursor) But opr_queue_IsEnd checks if the given pointer is the _end_ of the list; that is, if it's the end-of-list sentinel, not an item on the actual list. Testing for the last item in a list is what opr_queue_IsLast is for. This is the same convention that the old Rx queues used, but 170dbb3c just accidentally replaced queue_IsLast with opr_queue_IsEnd (instead of opr_queue_IsLast), and d9fc4890 copied the mistake. So because this is inside an opr_queue_Scan loop, opr_queue_IsEnd will never be true, so we'll never enter this block of code (unless we are the "fcfs" thread). This means that an incoming Rx call can get stuck in the incoming call queue, if all of the following are true: - The incoming call consists of more than 1 packet of incoming data. - The incoming call "waits" when it comes in (that is, there are no free threads or the service is over quota). - The "fcfs" thread doesn't scan the incoming call queue (because it is idle when the call comes in, but the relevant service is over quota).
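The difference between an "is end" test and an "is last" test inside a scan loop, as described above, can be sketched with a tiny sentinel-headed list in C. This is only an illustration under stated assumptions: the q_sketch type and the q_* helpers are hypothetical and are not the real opr_queue macros; they only mirror the semantics the message describes (the "end" is the sentinel, the "last" is the final real item).

```c
#include <stddef.h>

/* Hypothetical circular doubly-linked list with a sentinel head (not opr_queue). */
struct q_sketch {
    struct q_sketch *next, *prev;
};

static void
q_init(struct q_sketch *head)
{
    head->next = head->prev = head;
}

static void
q_append(struct q_sketch *head, struct q_sketch *e)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* True only for the sentinel itself, never for an item on the list. */
static int
q_is_end(const struct q_sketch *head, const struct q_sketch *c)
{
    return c == head;
}

/* True for the last real item on the list. */
static int
q_is_last(const struct q_sketch *head, const struct q_sketch *c)
{
    return c->next == head;
}

/* Walk every real item and count how often each predicate fires. */
static void
q_scan_counts(struct q_sketch *head, int *end_hits, int *last_hits)
{
    struct q_sketch *c;

    *end_hits = *last_hits = 0;
    for (c = head->next; c != head; c = c->next) {
        if (q_is_end(head, c))
            (*end_hits)++;
        if (q_is_last(head, c))
            (*last_hits)++;
    }
}
```

Because the scan stops before it reaches the sentinel, the "is end" predicate never fires inside the loop, while the "is last" predicate fires exactly once on a non-empty list.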
To fix this, just use opr_queue_IsLast here instead of opr_queue_IsEnd. Change-Id: I04b90b1279f81dc518eb61e7bd450e3c0be37a77 Reviewed-on: https://gerrit.openafs.org/14158 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ebaefc5a06fb3b559ce3649676197d0a989efbde Author: Andrew Deason Date: Sat Apr 25 18:21:10 2020 -0500 tests: Give more leeway in rx/event-t Currently, the rx/event-t tests schedule a bunch of events up to 3 seconds in the future, and then we sleep for 3 seconds to give them a chance to run. Since we're cutting it so close, this can rarely result in a few events not being run (observed occasionally on FreeBSD 12.1, where we failed to run about 3 events out of 10000). To avoid this, just sleep for 4 seconds instead of 3. Also print out a little more info regarding the number of fired/cancelled events, so we can see the event count when it's wrong. Change-Id: I6269bea2c245aeed00c129ff638423d0fa81ad23 Reviewed-on: https://gerrit.openafs.org/14160 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2b4908d3be8c4bde135d836ccc4ca96e465628c3 Author: Mark Vitale Date: Thu Apr 23 17:49:20 2020 -0400 afs: fix afs_linux_mmap fstrace entry The format string for CM_TRACE_GMAP takes 4 substitutions, but afs_linux_mmap only supplies 3. This results in malformed output from fstrace: Type mismatch, using raw print. Gn_map vp 0x%lx addr 0x%lx len 0x%x off 0x%x (afs / zcm)raw op 701087775, time 715.322573, pid 9644 p0:0xc0a66ec0 p1:0x8b81a000 p2:131072 Repair the recording of CM_TRACE_GMAP. Change-Id: I2b7592e68cb42f5ae490ee8771558e5cc5a2181e Reviewed-on: https://gerrit.openafs.org/14168 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit df5480057c2994914e22bd14b169dbcd8857485a Author: Andrew Deason Date: Sun Apr 12 22:28:29 2020 -0500 tests: Skip SIGBUS test on FreeBSD Currently, 'softsig-helper -buserror' causes a SIGBUS on most platforms, but can result in SIGSEGV on FreeBSD by default (at least on 11.3-RELEASE). 
Skip the test on FreeBSD, until we can provide a more reliable way to generate SIGBUS. Note that when the sysctl machdep.prot_fault_translation is set to 1, 'softsig-helper -buserror' generates a SIGBUS instead of SIGSEGV, suggesting that generating a SIGBUS here is the old 'compat' behavior. When machdep.prot_fault_translation is 0 (the default), the code path in the FreeBSD kernel that dictates whether to send a SIGBUS or SIGSEGV in this situation depends on some autodetection heuristics, and so may produce different results depending on FreeBSD releases or even compiler settings (due to detection of ABI based on some ELF notes in the relevant binary). For some details on this sysctl, see or the FreeBSD source code. In 11.3-RELEASE, the decision to issue a SIGBUS or SIGSEGV can be found around sys/amd64/amd64/trap.c:355. Change-Id: Ib75b43cc12302532ee87a3744fc364424f2a3ca6 Reviewed-on: https://gerrit.openafs.org/14145 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 61993cf45a648906abb865756d5a98d9c2d7cc40 Author: Andrew Deason Date: Tue Nov 26 23:39:24 2019 -0600 FBSD: Avoid holding AFS_GLOCK during vinvalbuf Currently we call vinvalbuf(9) in a few places while holding AFS_GLOCK, but AFS_GLOCK is a non-sleepable lock (struct mtx), and vinvalbuf can sleep. This can trigger a panic in some rare conditions, with the message: Sleeping thread (tid 100179, pid 95481) owns a non-sleepable lock To avoid this, drop AFS_GLOCK around a few places that call vinvalbuf(). 
Change-Id: I58acb144b6ffa007675402e7639b63ff3745dec5 Reviewed-on: https://gerrit.openafs.org/13970 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e510e35b25f605090524598b6b48cd20d3102945 Author: Andrew Deason Date: Sun Sep 15 23:00:26 2019 -0500 afs: Fix ifdef indenting in afs_vcache.c Change-Id: Ib566156184cb3f64a0983babd5d9f7883c84cc85 Reviewed-on: https://gerrit.openafs.org/13877 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 7260c7164b9a2199c7b5f83279fa18af16e7d387 Author: Andrew Deason Date: Sun Sep 8 16:10:40 2019 -0500 FBSD: Remove MA_* abstractions In FBSD/osi_vnops.c, we have a few abstractions (e.g. MA_VOP_UNLOCK) that used to expand to different things for older FreeBSD versions. Currently, they always expand to the same thing, so just remove the abstractions. While we are changing these calls, also change one instance of MA_VOP_LOCK to vn_lock (instead of VOP_LOCK), since we're not usually supposed to call VOP_LOCK directly, according to the VOP_LOCK(9) manpage. The MA_VOP_LOCK call was added in commit bd707fb7 (freebsd-almost-working-client-20020216), seemingly by mistake. Change-Id: Ia0f28fe658057e87d9103a72296ab899dc762fb6 Reviewed-on: https://gerrit.openafs.org/13843 Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0ee53d2fe9341e60f420662749d5ae8c6d4b5f24 Author: Tim Creech Date: Fri Dec 13 22:24:57 2019 -0500 FBSD: Build vnode_if.h before libafs objs Currently, if we are building with -j2 or higher, we can easily fail to build some libafs objects because vnode_if.h does not exist yet. vnode_if.h is generated by the FreeBSD build, but none of our objects depend on it, so during parallel builds it may not be available by the time we build, for example, src/external/heimdal/hcrypto/sha256.c. This results in build errors that can look like this: --- sha256-kernel.o --- cc -I. -I.. 
-I../nfs [...]/src/external/heimdal/hcrypto/sha256.c In file included from [...]/src/external/heimdal/hcrypto/sha256.c:34: In file included from [...]/src/crypto/hcrypto/kernel/config.h:30: In file included from [...]/src/afs/sysincludes.h:354: /usr/src/sys/sys/vnode.h:588:10: fatal error: 'vnode_if.h' file not found #include "vnode_if.h" ^~~~~~~~~~~~ 1 error generated. *** [sha256-kernel.o] Error code 1 make[4]: stopped in [...]/src/libafs/MODLOAD 1 error To avoid this, make all of our libafs objects depend on vnode_if.h. [adeason@dson.org: Expanded commit message.] Change-Id: I5a7a6ece8d5fbe6cf1a5b94451c8e8ae93fdc55f Reviewed-on: https://gerrit.openafs.org/13983 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1bd03c9c22ca7f36b9f1647c258b5f18c8ac92c0 Author: Andrew Deason Date: Sun Apr 12 20:16:55 2020 -0500 tests: Run perl via 'env' The 'perl' binary may not be /usr/bin/perl, depending on the system. For example, on modern FreeBSD it tends to be /usr/local/bin/perl instead. To avoid relying on perl to be in a specific location, just run via /usr/bin/env instead, so we pick up perl from $PATH instead. Change-Id: Ic8dc247c82342ff79dfa80426c489ccb8e3e1450 Reviewed-on: https://gerrit.openafs.org/14144 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 17a845c8d44f453b09b21afd59182e616234e872 Author: Tim Creech Date: Sun Mar 5 18:15:58 2017 -0500 FBSD: Remove LOCKPARENT/ISLASTCN lookup logic Currently, our afs_vop_lookup on FBSD tries to only lock 'dvp' for ISDOTDOT requests when LOCKPARENT and ISLASTCN are set. There are a couple of problems with this: - The conditional locking logic involving LOCKPARENT/ISLASTCN is only relevant in very old FreeBSD releases (per-fs checking of these flags for parent locking went away around the FreeBSD 6 era). - Our current logic here is wrong anyway, since we try to lock 'dvp' twice when those flags are set.
This was mostly introduced by commit 2f6be821 (FBSD: band-aid vnode locking in lookup), which added a lock/unlock pair for 'dvp' around the lock for 'vp', even though 'dvp' was unlocked several lines earlier. This means that if we hit the relevant code path, we will deadlock, since we try to lock 'dvp' twice. To avoid this, just remove the relevant logic for LOCKPARENT/ISLASTCN, since it is only relevant for old FreeBSD releases that are not supported by us or FreeBSD. Add and rearrange some comments around here to try to more explicitly explain the relevant locking rules. [adeason@dson.org: Commit message rewrite, adding comments, removing old FreeBSD code.] Change-Id: Iaa2c55d82c50d5a8ab42c67b0996a2b4fb6e09e6 Reviewed-on: https://gerrit.openafs.org/12578 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7df5c003ed6eb17a693d67ffdfc0556f0c569cc1 Author: Andrew Deason Date: Sun Apr 12 22:40:14 2020 -0500 FBSD: Remove unused 'wantparent' logic In afs_vop_lookup, the 'wantparent' variable doesn't actually change any logic in the function. In the if() clause that it's used, the value of 'wantparent' is only ever used if cnp->cn_nameiop is RENAME and ISLASTCN is set. But if both of those are true, then the second half of the if() conditional will always be true, so the value of 'wantparent' doesn't matter. So to remove this confusing unused logic, remove the 'wantparent' local var, and all its associated logic. Issue spotted by kaduk@mit.edu. 
Change-Id: Ia63b88d67d21cc2b81a0c25aa31ea60ab202b0a7 Reviewed-on: https://gerrit.openafs.org/14143 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7116de596a8f1d0be3da6eebe92d486f57aefd02 Author: Andrew Deason Date: Sun Aug 18 19:59:50 2019 -0500 FBSD: Add support for FreeBSD 11.3 Change-Id: Ibe3496f06da83a0b30182ea92081bae41fe766f3 Reviewed-on: https://gerrit.openafs.org/13792 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8002a46125e8224ba697c194edba5ad09e4cfc44 Author: Yadavendra Yadav Date: Wed Apr 15 05:33:00 2020 -0500 LINUX: Always crref after _settok_setParentPag Commit b61eac78 (Linux: setpag() may replace credentials) changed PSetTokens2 to call crref() after _settok_setParentPag(), since changing the parent PAG may change our credentials structure. But that commit did not update the old pioctl PSetTokens, so -setpag functionality remained broken on Linux for utilities that called the old pioctl ('klog' is one such utility). To fix this, we could copy the same code from PSetTokens2 into PSetTokens. But instead just move this code into _settok_setParentPag itself, to avoid code duplication. This commit also refactors _settok_setParentPag a little to make the platform-specific ifdefs a little easier to read through. Change-Id: I65a165ebb1d823e690926de31b28a7728d2561b9 Reviewed-on: https://gerrit.openafs.org/14147 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Yadavendra Yadav Reviewed-by: Benjamin Kaduk commit 826bb826274e48c867b41cb948d031a423373901 Author: Yadavendra Yadav Date: Wed Apr 15 05:33:00 2020 -0500 LINUX: Copy session keys to parent in SetToken Commit 48589b5d (Linux: Restore aklog -setpag functionality for kernel 2.6.32+) added code to SetToken() to copy our session keyring to the parent process, in order to implement -setpag functionality. But this was removed from SetToken() in commit 1a6d4c16 (Linux: fix aklog -setpag to work with ktc_SetTokenEx), when the same code was moved to ktc_SetTokenEx(). 
Add this code back to SetToken(), so -setpag functionality can work again with utilities that use older functions like ktc_SetToken, like 'klog'. Change-Id: I68c9bf2e19783ea6f84b4c5ebf2ef188d1d8d6ad Reviewed-on: https://gerrit.openafs.org/14146 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit be50d9a517bda9f421414341bca34c0100d61ba0 Author: Michael Meffie Date: Fri Mar 20 18:17:56 2020 -0400 redhat: add make to the build requirements `make` is not necessarily installed, even when all the other build requirements are installed. Add `make` to the list of build requirements to complete it. With this change it is possible to build the packages after running `yum-builddep` to install all of the needed build requirements. Change-Id: I032ba1f23d08468c5e21edc5662b20cc9498d1c9 Reviewed-on: https://gerrit.openafs.org/14119 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 7e41ee0bd50d39a356f0435ff370a0a7be40306f Author: Andrew Deason Date: Tue Apr 7 13:15:31 2020 -0500 vlserver: Correctly pad nvlentry for "O" RPCs For our old-style "O" RPCs (e.g. VL_CreateEntry, instead of VL_CreateEntryN), vlserver calls vldbentry_to_vlentry to convert to the internal 'struct nvlentry' format. After all of the sites have been copied to the internal format, we fill the remaining sites by setting the serverNumber to BADSERVERID. For nvldbentry_to_vlentry, we do this for NMAXNSERVERS sites, but for vldbentry_to_vlentry, we do this for OMAXNSERVERS. The thing is, both functions are filling in entries for a 'struct nvlentry', which has NMAXNSERVERS 'serverNumber' entries. So for vldbentry_to_vlentry, we are skipping setting the last few sites (specifically, NMAXNSERVERS-OMAXNSERVERS = 13-8 = 5). This can easily cause our O-style RPCs to write out entries to disk that have uninitialized sites at the end of the array.
For example, an entry with one site should have server numbers that look like this: serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255} That is, one real serverid (a '1' here), followed by twelve BADSERVERIDs. But for a VL_CreateEntry call, the 'struct nvlentry' is zeroed out before vldbentry_to_vlentry is called, and so the server numbers in the written entry look like this: serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0} That is, one real serverid (a '1' here), followed by seven BADSERVERIDs, followed by five '0's. Most of the time, this is not noticeable, since our code that reads in entries from disk stops processing sites when we encounter the first BADSERVERID site (see vlentry_to_nvldbentry). However, if the entry has 8 sites, then none of the entries will contain BADSERVERID, and so we will actually process the trailing 5 bogus sites. This would appear as 5 extra volume sites for a volume, most likely all for the same server. For VL_CreateEntry, the vlentry struct is always zeroed before we use it, so the trailing sites will always be filled with 0. For VL_ReplaceEntry, the trailing sites will be unchanged from whatever was read in from the existing disk entry. To fix this, just change the relevant loop to go through NMAXNSERVERS entries, so we actually go to the end of the serverNumber (et al) array. This may appear similar to commit ddf7d2a7 (vlserver: initialize nvlentry elements after read). However, that commit fixed a case involving the old vldb database format (which hopefully is not being used). This commit fixes a case where we are using the new vldb database format, but with the old RPCs, which may still be used by old tools. 
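The padding problem described in the vlserver commit above can be sketched in a few lines of C. This is only an illustration, not the vlserver code: the struct and the sketch_* helpers are hypothetical stand-ins, and only the constants (8, 13, 255) come from the commit message.

```c
#include <assert.h>

#define OMAXNSERVERS 8      /* site slots in the old-style entry */
#define NMAXNSERVERS 13     /* site slots in struct nvlentry */
#define BADSERVERID  255

/* Hypothetical, trimmed-down stand-in for struct nvlentry. */
struct sketch_nvlentry {
    unsigned char serverNumber[NMAXNSERVERS];
};

/*
 * Copy 'nsites' real server ids into the entry, then pad the remaining
 * slots with BADSERVERID up to (but not including) index 'pad_limit'.
 * Slots at or beyond 'pad_limit' are left untouched.
 */
static void
sketch_fill_sites(struct sketch_nvlentry *entry, const unsigned char *sites,
                  int nsites, int pad_limit)
{
    int i;

    assert(nsites <= pad_limit && pad_limit <= NMAXNSERVERS);
    for (i = 0; i < nsites; i++)
        entry->serverNumber[i] = sites[i];
    for (; i < pad_limit; i++)
        entry->serverNumber[i] = BADSERVERID;
}

/* Count sites the way a reader would: stop at the first BADSERVERID. */
static int
sketch_count_sites(const struct sketch_nvlentry *entry)
{
    int i;

    for (i = 0; i < NMAXNSERVERS; i++) {
        if (entry->serverNumber[i] == BADSERVERID)
            break;
    }
    return i;
}
```

Padding a zeroed entry that holds 8 real sites with pad_limit set to OMAXNSERVERS leaves no BADSERVERID terminator, so the reader walks all 13 slots. Padding to NMAXNSERVERS stops it at 8.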
Change-Id: Ic6882d1452963ca93403748917c313068acfdaab Reviewed-on: https://gerrit.openafs.org/14139 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 30a47c3282cb405459a6fced1fe5b4c77f4afd64 Author: Michael Meffie Date: Fri Mar 20 17:53:22 2020 -0400 redhat: fix rpmbuild warnings Fix warnings issued by recent versions of rpmbuild: warning: Macro expanded in comment on line 110: %{afsvers}/... warning: extra tokens at the end of %endif directive in line 1469: %endif # build_userspace warning: line 331: It's not recommended to have unversioned Obsoletes: Obsoletes: openafs-client-compat The first two warnings are just issues with comments, which apparently are not completely ignored by rpmbuild. The third issue is a warning about an unversioned "Obsoletes" directive. Remove the old Obsoletes for openafs-client-compat, which was obsoleted no later than the 1.4.x series (more than 10 years ago). While here clean up the spec by removing the old cvs $Revsion$ keyword from the comments at the top of the file, and removing an old commented out setup directive. Change-Id: I8d7a050ea6a0cc7a2d9a6af9a91d25ce545586e7 Reviewed-on: https://gerrit.openafs.org/14118 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 19524a49d4389bff6f7ba9d9c355489450579c01 Author: Andrew Deason Date: Mon Mar 30 14:21:21 2020 -0500 opr: Allow non-2^x for n_buckets in opr_cache_init Currently, opr_cache_init requires that opts->n_buckets is a power of 2 (since our underlying opr_dict requires this). However, callers may want to pick a number of buckets based on some other value. Requiring each caller to calculate the nearest power-of-2 is annoying, so instead just have opr_cache_init itself calculate a nearby power of 2. That is, with this commit, opts->n_buckets is allowed to not be a power of 2; when it's not a power of 2, opr_cache_init will calculate the next highest power of 2 and use that as the number of buckets. 
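A minimal sketch of the rounding described in the opr_cache_init commit above, assuming a simple shift loop. The helper name is hypothetical, and the actual opr_cache_init code may compute this differently.

```c
#include <assert.h>

/*
 * Hypothetical helper: return n unchanged if it is already a power of 2,
 * otherwise the next higher power of 2. Requires 0 < n <= 2^31 so the
 * shift cannot overflow a 32-bit unsigned int.
 */
static unsigned int
sketch_round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n > 0 && n <= 0x80000000u);
    while (p < n)
        p <<= 1;
    return p;
}
```

With a helper like this, a caller can request, say, 1000 buckets and end up with 1024, without doing the calculation itself.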
Change-Id: Icd3c56c1fe0733e3dac964ea9a98ff7b436254e6 Reviewed-on: https://gerrit.openafs.org/14122 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 3db8c37e8ef6bea0f03ef6b8f82ed93d52937d7d Author: Andrew Deason Date: Sun Apr 5 16:29:52 2020 -0500 libafs: Serialize INSTDIRS/DESTDIRS and COMPDIRS Our libafs build logic involves a few targets that 'cd' into a per-kernel subdir: notably INSTDIRS and DESTDIRS (the targets to 'make install' or 'make dest' our kernel modules) and COMPDIRS (the target to setup/build the kernel module). Both of these potentially 'cd' into a subdirectory (e.g. MODLOAD64), and run some make rules. Since INSTDIRS and COMPDIRS are different targets and don't depend on each other for many platforms, running those rules can happen in parallel. After they 'cd' into the relevant dir, they run a new 'make' in a subshell, and so underlying rules for building e.g. AFS_component_version_number.c are not serialized. So for a parallel build on, say, Solaris, we can encounter errors when two sub-makes try to make AFS_component_version_number.c at the same time, which looks something like this (with various lines output from other sub-processes mixed in): cd src && cd sys && gmake install gmake[3]: Leaving directory '/[...]/src/libuafs' rm -f AFS_component_version_number.c.NEW /opt/developerstudio12.6/bin/cc [...] -D_KERNEL -DSYSV -dn -m64 -xmodel=kernel -xvector=%none -xregs=no%float -Wu,-save_args -o AFS_component_version_number.o -c AFS_component_version_number.c mv: cannot access AFS_component_version_number.c.NEW gmake[4]: *** [/[...]/src/config/Makefile.version:13: AFS_component_version_number.c] Error 2 gmake[4]: Leaving directory '/[...]/src/libafs/MODLOAD64' gmake[3]: *** [Makefile:85: solaris_instdirs] Error 2 gmake[3]: *** Waiting for unfinished jobs.... To avoid this, just make INSTDIRS and DESTDIRS depend on COMPDIRS, so we can make sure they don't run at the same time. 
Change-Id: I2510e1894c44dd0864cf2eab5613b805342b6718 Reviewed-on: https://gerrit.openafs.org/14137 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 80edcab9997807f91798dacc2cc59efdba74be56 Author: Cheyenne Wills Date: Wed Apr 1 09:38:05 2020 -0600 butc: rename local var tapeblocks to numTapeblocks The local variable tapeblocks in GetConfigParams matches a global variable. Rename the local variable to avoid confusion with the global name. Change-Id: I1c30433696a35a74978ef0c23881c82054b416c5 Reviewed-on: https://gerrit.openafs.org/14128 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 8ae4531c5720baff9e11e4b05706eab6c82de5f9 Author: Michael Meffie Date: Mon Mar 23 09:46:05 2020 -0400 build: remove unused LINUX_PKGREL from configure.ac This change removes the unused LINUX_PKGREL definition from the configure.ac file. Commit 6a27e228bac196abada96f34ca9cd57f32e31f5c converted the setting of the RPM package version and release values in the openafs.spec file from autoconf to the makesrpm.pl script. That commit left LINUX_PKGREL in configure.ac because it was still referenced by the Debian packaging, which was still in-tree at that time. Commit ada9dba0756450993a8e57c05ddbcae7d1891582 removed the last trace of the Debian packaging, but missed the removal of the LINUX_PKGREL. Change-Id: I17aeccdb38078faa413f2cd3a935b43238982606 Reviewed-on: https://gerrit.openafs.org/14117 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit f16d40ad26df3ec871f8c73952594ad2e723c9b4 Author: Andrew Deason Date: Wed Apr 1 22:59:38 2020 -0500 vos: Print "done" in non-verbose 'vos remsite' Currently, 'vos remsite' always prints the message "Deleting the replication site for volume %lu ...", and then calls VDONE if the operation is successful. 
VDONE prints the trailing "done", but only if -verbose is turned on, and so if -verbose is not specified, the output of 'vos remsite' looks broken: $ vos remsite fs1 vicepa vol.foo Deleting the replication site for volume 1234 ...Removed replication site fs1 /vicepa for volume vol.foo To fix this, unconditionally print the trailing "done", instead of going through VDONE, so 'vos remsite' output now looks like this: $ vos remsite fs1 vicepa vol.foo Deleting the replication site for volume 1234 ... done Removed replication site fs1 /vicepa for volume vol.foo Change-Id: I0b42f4cb9b695331bf047243bf6ae4a1cdbb89c4 Reviewed-on: https://gerrit.openafs.org/14127 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0e2072ae386d4111bef161eb955964b649c31386 Author: Cheyenne Wills Date: Wed Apr 1 09:48:57 2020 -0600 Avoid duplicate definitions of globals GCC 10 changed a default flag from -fcommon to -fno-common. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678 for some background. The change in gcc 10 results in build link-time errors. For example: ../../src/xstat/.libs/liboafs_xstat_cm.a(xstat_cm.o):(.bss+0x2050): multiple definition of `numCollections'; Ensure that only one definition for each global data object exists and change references to use "extern" as needed. To ensure that future changes do not introduce duplicated global definitions, add the -fno-common flag to XCFLAGS when using the configure --enable-checking setting. Change-Id: I6780dd995fe6fb6c2102765ff3484c18e1e1cd58 Reviewed-on: https://gerrit.openafs.org/14106 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit f841c189a53f3a6bcf5c25336e4e0ad5362036e2 Author: Andrew Deason Date: Tue Mar 31 21:19:18 2020 -0500 vos: Properly print volume transaction flags Currently, the code in 'vos status' treats the 'iflags' and 'vflags' of a transaction like an enumerated type; that is, we only check if 'iflags' is equal to ITOffline or ITBusy, etc.
But both of these flags fields are bitfields; any combination of the relevant flags could theoretically be set. Practically speaking, we only ever set at most one of the flags in 'iflags', but if anything ever did set more than one flag, our output would look broken (we'd print "attachFlags:" without any flags). For 'vflags', multiple flags are often set at once: the most common combination is VTDeleteOnSalvage|VTOutOfService. So currently, we usually print "attachFlags:" without any actual flags, since the 'vflags' field isn't exactly equal to VTDeleteOnSalvage (instead it's set to VTDeleteOnSalvage|VTOutOfService). And if we ever did see just VTDeleteOnSalvage set by itself, the way the switch() cases fall through to each other, we'd print out that _all_ flags are set. To fix all of this, just test for the individual flag bits instead. Change-Id: Ib4d207bc713f0ef8eb51b9dbeaf2af50395536ee Reviewed-on: https://gerrit.openafs.org/14126 Tested-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 4c4fb6e36634e5663c8be25acd4a1ac872e4738c Author: Andrew Deason Date: Tue Jul 23 13:50:31 2019 -0500 LINUX: Introduce afs_d_path Move our preprocessor logic around d_path into an osi_compat.h wrapper, called afs_d_path. This just makes it a little easier to use d_path, and moves a tiny bit of #ifdef cruft away from real code. Change-Id: I2032eda3fef18be6e77e3bf362ec5ce641e1d76d Reviewed-on: https://gerrit.openafs.org/13721 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 252b3bcc75ea141ff93a7b3147865f4b952fcaca Author: Andrew Deason Date: Fri Aug 24 13:03:24 2018 -0500 afs: Detect VIOCPREFETCH special case properly Currently, afs_syscall_pioctl handles the VIOCPREFETCH pioctl as a special case, calling into a different code path to handle backgrounding the prefetch operation. However, we detect that we're handling a VIOCPREFETCH operation just by looking at the lower 8 bits of the given opcode. 
This means that any pioctl that ends in 0x0F will trigger this codepath, such as if we add a 'C' or 'O' pioctl that uses code 0x0F. We only want to catch VIOCPREFETCH requests for this code path, so fix the check to also check if we're processing a 'V' pioctl. Change-Id: Ica8c2364f96aa3c8b4d2213bebd9a1e4cb6fa730 Reviewed-on: https://gerrit.openafs.org/13301 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 66d0f91791695ac585f0511d0dadafd4e570b1bf Author: Andrew Deason Date: Tue Mar 24 11:59:48 2020 -0500 tests: Wait for server start in auth/superuser-t The auth/superuser-t test runs an Rx server and client in two child processes. If the client process tries to contact the server before the server has started listening on its port, some tests involving RPCs can fail (notably test 39, "Can run a simple RPC"). Normally if we try to contact a server that's not there, Rx will try resending its packets a few times, but on Linux with AFS_RXERRQ_ENV, if the port isn't open at all, we can get an ICMP_PORT_UNREACH error, which causes the relevant Rx call to die immediately with RX_CALL_DEAD. This means that if the auth/superuser-t client is only just a bit faster than the server starting up, tests can fail, since the server's port is not open yet. To avoid this, we can wait until the server's port is open before starting the client process. To do this, have the server process send a SIGUSR1 to the parent after rx_Init() is called, and have the parent process wait for the SIGUSR1 (waiting for a max of 5 seconds before failing). This should guarantee that the server's port will be open by the time the client starts running. Note that before commit 086d1858 (LINUX: Include linux/time.h for linux/errqueue.h), AFS_RXERRQ_ENV was mistakenly disabled on Linux 3.17+, so this issue was probably not possible on recent Linux before that commit. 
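The startup handshake described in the auth/superuser-t commit above could look roughly like the following. This is a sketch under assumptions, not the test's actual code: it uses sigtimedwait() for the bounded wait, and the sketch_* names are hypothetical. In the real test the child would call rx_Init() before signalling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical server body: pretend the port is open, then tell the parent. */
static void
sketch_server_ready(void)
{
    kill(getppid(), SIGUSR1);
}

/* Hypothetical server body that never reports readiness. */
static void
sketch_server_silent(void)
{
}

/*
 * Fork a child running server_main() and wait up to timeout_secs for it to
 * send SIGUSR1. Returns the child's pid on success, or -1 on fork failure
 * or if the signal did not arrive in time. The caller is expected to reap
 * the child with waitpid() later.
 */
static pid_t
sketch_start_server(void (*server_main)(void), int timeout_secs)
{
    sigset_t set, oldset;
    struct timespec timeout;
    pid_t pid;

    assert(server_main != NULL);

    /* Block SIGUSR1 before forking, so it stays pending until we wait. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, &oldset) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        server_main();
        _exit(0);
    }
    if (pid > 0) {
        timeout.tv_sec = timeout_secs;
        timeout.tv_nsec = 0;
        if (sigtimedwait(&set, NULL, &timeout) != SIGUSR1) {
            kill(pid, SIGTERM);     /* server never became ready */
            pid = -1;
        }
    }
    sigprocmask(SIG_SETMASK, &oldset, NULL);
    return pid;
}
```

Blocking the signal before fork() matters: otherwise a fast child could deliver SIGUSR1 before the parent starts waiting, and the notification would be lost.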
Change-Id: I0032a640b83c24f72c03e7bea100df5bc3d9ed4c Reviewed-on: https://gerrit.openafs.org/14109 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Cheyenne Wills commit 18a0ea2f31e70e1bdbd7af40022ab107560ac0d0 Author: Andrew Deason Date: Tue Mar 24 11:34:51 2020 -0500 LINUX: Clear lock 'pid' fields with NULL Currently, when we release a lock, we set the e.g. pid_writer field to 0, to clear out any previous pid that was set. On Linux, the pid_writer field is a pointer, and sparse(1) complains about using a plain integer 0 in this way: CHECK [...]/afs_axscache.c [...]/afs_axscache.c:24:19: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:68:9: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:88:5: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:111:13: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:121:17: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:126:17: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:154:13: warning: Using plain integer as NULL pointer [...]/afs_axscache.c:165:9: warning: Using plain integer as NULL pointer This doesn't break anything, but it spews out quite a lot of warnings when building with sparse(1) available. To just reduce this noise a bit, assign these fields to actual NULL. Since some other platforms do use a plain integer in these fields (they are an actual pid), define 'MyPid_NULL' to use '0' or 'NULL' depending on the platform. Define MyPid_NULL to NULL only on Linux; this causes us to still assign 0 to a pointer on some platforms, but Linux is the only one that complains, so only bother using NULL on Linux for now. 
Change-Id: I35fcb896ceaa346c330622cfc2913b2975295836 Reviewed-on: https://gerrit.openafs.org/14108 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit abbadd3f4bf6fddc794b87d8d993ed6536c591e3 Author: Andrew Deason Date: Tue Mar 10 16:05:47 2020 -0500 rxgen: Properly generate brief union default arm Commit 13ae3de3 (Add "brief" option to rxgen) added the -b option to rxgen, which (among other things) makes rxgen stop including the name of an RPC-L union type within its fields. That is, instead of this: struct foo_type { afs_int32 foo_tag; union { /* ... */ } foo_type_u; }; rxgen -b generates this: struct foo_type { afs_int32 foo_tag; union { /* ... */ } u; }; And all of the autogenerated XDR code is altered to use the 'u' field instead of foo_type_u. However, if a 'default:' arm is defined in the definition for the RPC-L union, the autogenerated XDR code still tries to reference the non-brief name (e.g. foo_type_u). This causes a build failure when actually trying to compile the generated .xdr.c, like so: foo.xdr.c:809:39: error: 'foo_type' has no member named 'foo_type_u' if (!xdr_bytes(xdrs, (char **)&objp->foo_type_u.xxx, &__len, FOO_MAX)) { ^ foo.xdr.c:812:11: error: 'foo_type' has no member named 'foo_type_u' *(&objp->foo_type_u.xxx) = __len; This happens because the portion of emit_union() that generates the XDR code for the default arm wasn't updated to use a different formatting string when 'brief_flag' is set, like the rest of emit_union. To fix this, just check for brief_flag and use 'briefformat' accordingly, like the other code that checks for brief_flag. Currently nothing in the tree uses the default arm of RPC-L unions with 'rxgen -b', but external callers could, or our future code may do so. 
Change-Id: Ifcebfc48a3a64c68fee12ba0d177ae19b0956c58 Reviewed-on: https://gerrit.openafs.org/14107 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 8c335182115a1e16c66cde40c08ce9fd0144dccb Author: Marcio Barbosa Date: Thu Feb 27 22:28:14 2020 +0000 ubik: death to SVOTE_GetSyncSite The SVOTE_GetSyncSite RPC was intended to provide the IP address of the current sync-site. Unfortunately, the RPC-L incorrectly defined ahost as an input argument instead of an output argument. As a result, the IP address in question is not returned to the callers of SVOTE_GetSyncSite. Moreover, calls to this RPC must be made through connections associated with the VOTE_SERVICE_ID. Sadly, the ubik_Call* functions call SVOTE_GetSyncSite using connections associated with the USER_SERVICE_ID. Consequently, the server getting this request returns RXGEN_OPCODE, meaning that this RPC is not implemented by the service in question. Since RPC arguments cannot be changed without causing compatibility issues between different client / server versions and the RPC in question is being called through the wrong service id, remove SVOTE_GetSyncSite and its callers. Considering that in all versions of OpenAFS calls to this RPC always return RXGEN_OPCODE, no behavior change is introduced by this commit. Also, remove the "chaseCount logic" from the ubik_Call* functions. This logic prevents the loop counter from being moved backwards indefinitely, resulting in an infinite loop. Fortunately, without the VOTE_GetSyncSite() calls this counter cannot be moved backwards more than once. 
Change-Id: Idd071583e8f67109e003f7a5675de02a235e5809 Reviewed-on: https://gerrit.openafs.org/14043 Reviewed-by: Marcio Brito Barbosa Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d369f4e5c9f975d370ee1aa7546fe9da80e1e118 Author: Cheyenne Wills Date: Fri Mar 20 12:03:48 2020 -0600 tests: Add cache-t to .gitignore in tests/opr Commit 48fbb45 (opr: Introduce opr_cache) added a new test (cache-t), but did not update the .gitignore file for it. Change-Id: I6de6130257a62f495ac942c05937eb109ce84a75 Reviewed-on: https://gerrit.openafs.org/14102 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 59fef92683da7a8c6888e2f4f5127d7b437ac028 Author: Cheyenne Wills Date: Fri Mar 20 11:54:23 2020 -0600 tests: Add core to .gitignore in tests opr/softsig-t can produce a core file as part of its test. Change-Id: I3bc7e587151e5915038e31887018889a7ffa6993 Reviewed-on: https://gerrit.openafs.org/14101 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 32d35db64061e4102281c235cf693341f9de9271 Author: Marcio Barbosa Date: Thu Feb 13 00:39:00 2020 -0300 vos: take RO volume offline during convertROtoRW The vos convertROtoRW command converts a RO volume into a RW volume. Unfortunately, the RO volume in question is not set as "out of service" during this process. As a result, accesses to the volume being converted can leave volume objects in an inconsistent state. Consider the following scenario: 1. Create a volume on host_b and add replicas on host_a and host_b. $ vos create host_b a vol_1 $ vos addsite host_b a vol_1 $ vos addsite host_a a vol_1 2. Mount the volume: $ fs mkmount /afs/.mycell/vol_1 vol_1 $ vos release vol_1 $ vos release root.cell 3. Shut down dafs on host_b: $ bos shutdown host_b dafs 4. Remove RO reference to host_b from the vldb: $ vos remsite host_b a vol_1 5. Attach the RO copy by touching it: $ fs flushall $ ls /afs/mycell/vol_1 6.
Convert RO copy to RW: $ vos convertROtoRW host_a a vol_1 Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving the volume object for the RO copy set as VOL_STATE_ATTACHED (on success, this volume should be set as VOL_STATE_DELETED). 7. Add replica on host_a: $ vos addsite host_a a vol_1 8. Wait until the "inUse" flag of the RO entry is cleared (or force this to happen by attaching multiple volumes). 9. Release the volume: $ vos release vol_1 Failed to start transaction on volume 536870922 Volume not attached, does not exist, or not on line Error in vos release command. Volume not attached, does not exist, or not on line To fix this problem, take the RO volume offline during the vos convertROtoRW operation. Change-Id: I1e417a026ed819fab4435e8992311fcd4f339341 Reviewed-on: https://gerrit.openafs.org/14066 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 957b06984b77cba74bd90217b723220c1844809b Author: Marcio Barbosa Date: Fri Mar 6 15:15:38 2020 +0000 vol: fix namei_ConvertROtoRWvolume return code Commit 8632f23d6718a3cd621791e82d1cf6ead8690978 introduced checks for the return value of snprintf calls in namei_ops. On success, the value returned by this function represents the number of written characters. Unfortunately, the variable used to store this value is the same variable that represents the status code returned by namei_ConvertROtoRWvolume. Consequently, a successful execution of namei_ConvertROtoRWvolume results in a status code different from 0 (and equal to the number of written characters). To fix this problem, set the status code in question back to 0 after a successful execution of namei_ConvertROtoRWvolume.
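The return-code reuse described in the namei_ConvertROtoRWvolume commit above has roughly the following shape. This is a hypothetical sketch, not the namei_ops code: the function name, the path format, and the -1 error value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical function in which 'code' serves both as the snprintf()
 * result and as the function's status code.
 */
static int
sketch_convert(char *path, size_t pathlen, const char *dir, unsigned int volid)
{
    int code;

    assert(path != NULL && pathlen > 0);

    code = snprintf(path, pathlen, "%s/%u", dir, volid);
    if (code < 0 || (size_t)code >= pathlen)
        return -1;      /* formatting error or truncated path */

    /* ... the rest of the conversion would happen here ... */

    /*
     * Without this reset, a successful call would return the number of
     * characters written by snprintf() instead of 0.
     */
    code = 0;
    return code;
}
```

Using a separate local variable for the snprintf() result would avoid the problem altogether. The commit instead resets the shared variable after success.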
Change-Id: Ic6fd6483f8d94fd64587f8bae249b9d911d846b4 Reviewed-on: https://gerrit.openafs.org/14065 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 38d78e2496c3d242e44bad401ecffe15e3883388 Author: Cheyenne Wills Date: Fri Mar 6 10:00:25 2020 -0700 afs: Clean up compiler warning casting ptr to int In osi_probe.c, the macro 'check_result' casts a pointer to an int which on older Linux kernels (e.g. 2.6.18) produces several lines with the C warning: ... warning: cast from pointer to integer of different size Change the cast from int to long int. Linux 2.6.18 doesn't provide intptr_t or uintptr_t, and stdint.h is not available to kernel modules. But the size of a pointer is the size of a long (see uintptr_t in linux/types.h - Linux 2.6.24+), so change the cast from int to long. Note that this code by default only gets pulled in for older Linux kernels (e.g. 2.6.18). For newer kernels, ENABLE_LINUX_SYSCALL_PROBING is not defined, and so most of osi_probe.c is not built. Change-Id: If1b41e11c46f4a14ff5127ed4d602485645ddf2a Reviewed-on: https://gerrit.openafs.org/14092 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Andrew Deason commit 57b4f4f9be1e25d5609301c10f717aff32aef676 Author: Andrew Deason Date: Fri Mar 13 13:00:35 2020 -0500 LINUX: Properly revert creds in osi_UFSTruncate Commit cd3221d3 (Linux: use override_creds when available) caused us to force the current process's creds to the creds of afsd during osi_file.c file ops, to avoid access errors in some cases. However, in osi_UFSTruncate, one code path was missed to revert our creds back to the original user's creds: when the afs_osi_Stat call fails or deems the truncate unnecessary. In this case, the calling process keeps the creds for afsd after osi_UFSTruncate returns, causing our subsequent access-checking code to think that the current process is in the same context as afsd (typically uid 0 without a pag).
This can cause the calling process to appear to transiently have the same access as non-pag uid 0; typically this will be unauthenticated access, but could be authenticated if uid 0 has tokens. To fix this, modify the early return in osi_UFSTruncate to go through a 'goto done' destructor instead, and make sure we revert our creds in that destructor. Thanks to cwills@sinenomine.net for finding and helping reproduce the issue. Change-Id: I6820af675edcb7aa00542ba40fc52430d68c05e8 Reviewed-on: https://gerrit.openafs.org/14098 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Jeffrey Hutzelman Reviewed-by: Cheyenne Wills Tested-by: Cheyenne Wills commit a0071a30d532520e51262c3b6c194659e95bf389 Author: Andrew Deason Date: Thu Feb 20 09:37:28 2020 -0500 tests: Run more manpage tests by default Ever since commit f0774acd (Introduce TAP tests of man pages for command_subcommand), we've had tests to check that we have man pages for every subcommand in a command suite. This was done for several command suites, including 'bos', and 'fs', but the bos and fs tests were never added to the TESTS file. Add them, so the tests run by default in a 'make check'. Fortunately, the tests still pass today. Change-Id: I90c006845d054fa3e795203bb1deff675e558622 Reviewed-on: https://gerrit.openafs.org/14073 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e06b47fc0e63eff2098de422628b6c03396d419f Author: Andrew Deason Date: Thu Sep 12 14:36:04 2019 -0500 ubik: Rename flags to dbFlags Rename ubik_dbase->flags to ubik_dbase->dbFlags, to make it easier to distinguish between other fields and variables just called 'flags'. 
Change-Id: I17258f9a65e989943d066307e332550d66ca7500 Reviewed-on: https://gerrit.openafs.org/13864 Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e68109013d03829f2e9dc95586933212a0ea9ad7 Author: Andrew Deason Date: Thu Sep 12 12:37:04 2019 -0500 ubik: Clarify UBIK_VERSION_LOCK semantics Commit e4ac552a (ubik: Introduce version lock) added UBIK_VERSION_LOCK and version_data. The commit message mentions that holding either UBIK_VERSION_LOCK or DBHOLD is enough to be able to read the protected items and both locks must be held to modify them, but this isn't mentioned in the actual code. Add a comment explaining these locking rules, to make these rules clearer to readers. Change-Id: I715f89695add6d94e13d6ee1dc6addd1e748d3fd Reviewed-on: https://gerrit.openafs.org/13863 Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 086d185872da5f19447cf5ec7846e7ce5104563f Author: Cheyenne Wills Date: Wed Nov 20 12:43:03 2019 -0700 LINUX: Include linux/time.h for linux/errqueue.h The configuration test for errqueue.h fails with an undefined structure error on a Linux 3.17 (or higher) system. This prevents setting HAVE_LINUX_ERRQUEUE_H, which is used to define AFS_RXERRQ_ENV. Linux commit f24b9be5957b38bb420b838115040dc2031b7d0c (net-timestamp: extend SCM_TIMESTAMPING ancillary data struct) - which was picked up in linux 3.17 added a structure that uses the timespec structure. After this commit, we need to include linux/time.h to pull in the definition of the timespec struct. 
Change-Id: Ifab79f8454c771276d5fdf443c4d68400b70134a Reviewed-on: https://gerrit.openafs.org/13950 Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 660a0855bb9351a72ef45cd72e02503c86bf2cea Author: Andrew Deason Date: Wed Sep 11 16:42:47 2019 -0500 ubik: Log urecovery_CheckTid-aborted txes Log when urecovery_CheckTid aborts/ends a running remote transaction. This is usually a rare event, occurring when some ubik sites get "stuck" or confused about the state of the quorum. Logging some details when this happens can be useful when investigating issues post-mortem, or just to see why a transaction failed. Change-Id: If0a7cd134aaac3722fe7214a1d8f0efab550ad11 Reviewed-on: https://gerrit.openafs.org/13862 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 091e8e9ca52e408c52e3310588d6c959a517a15c Author: Andrew Deason Date: Fri Aug 23 12:21:54 2019 -0500 ubik: Introduce ubik_CallRock In OpenAFS 1.0, the way we made dbserver RPC calls was to pass the relevant RPC and arguments to ubik_Call()/ubik_Call_New(), which coerced all of the RPC arguments into 'long's. To make this more typesafe, in commit 4478d3a9 (ubik-call-sucks-20060703) most callers were converted to use ubik_RPC_name()-style calls, which used functions autogenerated by rxgen. This latter approach, however, only lets us use the ubik_Call-style site selection code with RPCs processed by rxgen; we can't insert additional code to run before or after the relevant RPC. To make our dbserver calls more flexible, but avoid coercing all of our arguments into 'long's again, move back to the ubik_Call()-style approach, but use actual typed arguments with a callback function and a rock. Call it ubik_CallRock(). 
With this commit rxgen still generates the ubik_RPC_name()-style stubs, but the stubs just call ubik_CallRock with a generated callback function, instead of spitting out the equivalent of ubik_Call() in the generated code itself. To try to ensure that this commit doesn't incur any unintended extra changes, make ubik_CallRock consist of the generated code that was inside rxgen before this commit. This is almost identical to ubik_Call, but not quite; consolidating these two functions can happen in a future commit if desired. Change-Id: I0c3936e67a40e311bff32110b2c80696414b52d4 Reviewed-on: https://gerrit.openafs.org/13987 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 78049987aa3e84865e2e7e0f3dd3b54d66258e74 Author: Cheyenne Wills Date: Tue Mar 3 15:39:49 2020 -0700 LINUX 5.6: define time_t and use timespec/timespec64 The time_t type and the structure timeval were removed for use in kernel space code in Linux commits: 412c53a680a97cb1ae2c0ab60230e193bee86387 y2038: remove unused time32 interfaces c766d1472c70d25ad475cf56042af1652e792b23 y2038: hide timeval/timespec/itimerval/itimerspec types Add an autoconf test for the time_t type. If time_t is missing, define the time_t type when building the kernel module. Change the vattr structure in LINUX/osi_vfs.h to use timespec/timespec64 instead of the timeval structure. Conditionalize the definition of gettimeofday (needed by rand-fortuna.c) in crypto/hcrypto/kernel/config.h. It is unused by the Linux kernel module and the function uses struct timeval that is no longer available. 
Change-Id: Idc9a1ded748f833d804164d29c49c9aee26ae8f5 Reviewed-on: https://gerrit.openafs.org/14083 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit b8088b49dec23da19406fcb014e7100695dc8322 Author: Andrew Deason Date: Mon Mar 2 16:17:55 2020 -0600 LINUX: Avoid building rand-fortuna-kernel.o Currently, we build rand-fortuna-kernel.o for libafs on all platforms, even though we only use the fortuna RNG on AIX, DragonFlyBSD, HP-UX, and Irix. Everywhere else, our RAND_bytes() in src/crypto/hcrypto/kernel/rand.c uses osi_readRandom() instead of going through heimdal. Building rand-fortuna.c causes occasional build headaches for the kernel on Linux (see cc7f942, "LINUX: Disable kernel fortuna large frame errors"). The most recent instance of this is that Linux 5.6 removes the definition for struct timeval, which is referenced in rand-fortuna.c. The Linux kernel is constantly changing, and so trying to keep rand-fortuna.c building on Linux seems like a waste of ongoing effort. So, just stop building rand-fortuna-kernel.o on Linux. The original intent of building this file on all platforms was to avoid bitrot, so still keep building rand-fortuna-kernel.o on all other platforms even when it's not used; just avoid it on Linux specifically, the platform that requires the most effort. To accomplish this, move rand-fortuna-kernel.o from AFSAOBJS to AFS_OS_OBJS, and remove it from the Linux-only AFSPAGOBJS. Also remove our configure tests for -Wno-error=frame-larger-than=, since they're no longer used by anything. Change-Id: I0d5f14f9f6ba2bdd7391391180d32383b4da89ed Reviewed-on: https://gerrit.openafs.org/14084 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 48fbb45967381f10df092a1ec18b5fb820387e05 Author: Andrew Deason Date: Fri Sep 20 14:19:23 2019 -0500 opr: Introduce opr_cache Add a simple general-purpose in-memory cache implementation, called opr_cache. 
Keys and values are simple flat opaque buffers (no complex nested structures allowed), hashing is done with jhash, and cache eviction is mostly random with some LRU bias. Partly based off a different implementation by mbarbosa@sinenomine.net. Change-Id: I16b5988947ff603dfe31613cd7be3908a69264e5 Reviewed-on: https://gerrit.openafs.org/13884 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4ce922d339777faf647f7129f5ae3f173a7870b1 Author: Andrew Deason Date: Tue Jan 14 10:51:42 2020 -0600 afs: Properly type afs_osi_suser cred arg Currently, afs_osi_suser is declared with a void* argument, even though its only argument is always effectively an afs_ucred_t*. This allows us to call afs_osi_suser with any pointer type without the compiler complaining. Currently, some callers call afs_osi_suser with an incorrectly-typed afs_ucred_t** instead, like so: func(afs_ucred_t **credpp) { afs_ucred_t **acred = *credpp; /* incorrect assignment */ if (afs_osi_suser(acred)) { /* ... */ } } The actual code in the tree hides this to some degree behind various function calls and layers of indirection (e.g. afs_suser()), but this is effectively what we do. This causes compiler warnings because we are doing incorrect pointer assignments, but the end result works because afs_osi_suser actually uses an afs_ucred_t*. The type confusion makes it very easy to accidentally give the wrong type to afs_osi_suser. This only really matters on SOLARIS, since that is the only platform that actually uses its argument to afs_osi_suser(). To fix all of this, just declare afs_osi_suser as taking an afs_ucred_t*, and fix all of the relevant functions to handle the right type.
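A minimal sketch of the corrected pattern, assuming hypothetical stand-in names (cred_sketch_t, suser_sketch, caller_sketch) rather than the real afs_ucred_t and afs_osi_suser. With a typed parameter, the caller dereferences the double pointer once, and passing the wrong pointer level draws a compiler diagnostic instead of compiling silently as it would with void*:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for afs_ucred_t; not the real structure. */
typedef struct cred_sketch {
    int uid;
} cred_sketch_t;

/* Typed argument: passing a cred_sketch_t ** here draws a compiler diagnostic. */
static int
suser_sketch(cred_sketch_t *cred)
{
    return cred != NULL && cred->uid == 0;
}

/* Caller that receives a double pointer and dereferences it exactly once. */
static int
caller_sketch(cred_sketch_t **credpp)
{
    cred_sketch_t *cred = *credpp;	/* correct assignment */
    return suser_sketch(cred);
}

/* Self-check of the sketch above. */
static void
suser_sketch_check(void)
{
    cred_sketch_t root = { 0 };
    cred_sketch_t user = { 1000 };
    cred_sketch_t *rootp = &root;
    cred_sketch_t *userp = &user;

    assert(caller_sketch(&rootp) == 1);
    assert(caller_sketch(&userp) == 0);
}
```

The uid-based check is only an illustration; the real superuser test is platform-specific.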
Change-Id: I1366aedf0f3d7689735a9424c5272233931e3bf2 Reviewed-on: https://gerrit.openafs.org/14085 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8d90a9d27b0ef28ddcdd3eb041c8a9d019b84b50 Author: Yadavendra Yadav Date: Thu Mar 5 07:21:55 2020 +0000 LINUX: Initialize CellLRU during osi_Init When the OpenAFS kernel module is loaded, it creates certain entries in the "proc" filesystem. One of those entries is "CellServDB". If we read "/proc/fs/openafs/CellServDB" without starting "afsd", the read crashes with a NULL pointer dereference. The reason for the crash is that CellLRU has not been initialized yet: since "afsd" has not been started, afs_CellInit has not been called, so the "next" and "prev" pointers are NULL. Inside "c_start()" we do not check for NULL pointers while traversing CellLRU, and this causes the crash. To avoid this, initialize CellLRU during module initialization. Change-Id: I21cbc0e016b384f0ab456c05087384b6ed986b0d Reviewed-on: https://gerrit.openafs.org/14093 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 914193fa31af1f2aa9d755ce2215608b643053d0 Author: Michael Meffie Date: Fri Jan 24 13:40:28 2020 -0500 Cleanup vestiges of old shared library build directories Remove traces of the old shlibrpc and shlibafsauthent build directories, which are no longer needed since the conversion to libtool for building shared libraries. Change-Id: I8dbfdf9908b4a5527470b7cb4b969e7a160cdd51 Reviewed-on: https://gerrit.openafs.org/14045 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 832d0ab3124c481858bc8f440309d431cc74331f Author: Michael Meffie Date: Thu Dec 12 15:58:32 2019 -0500 doc: Replace src/SOURCE-MAP with src/README.md Replace the old and poorly maintained "SOURCE-MAP" file with a markdown formatted README.md file. Try to organize the directories in sections to hopefully make a more useful guide to the source code and build directories. Thanks to Cheyenne Wills and Benjamin Kaduk for suggestions.
Change-Id: I50f58aa99453bc3412b60a7591d6957cfa83b5b1 Reviewed-on: https://gerrit.openafs.org/14003 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit df2688cf770ed2fd3f2c782f91fd576f098676cb Author: Michael Meffie Date: Fri Feb 21 10:08:42 2020 -0500 auth: accept a NULL afsconf_dir in afsconf_SetCellInfo again Commit 93b26c6f55245e2187e574eb928f5e0ce66a245e added the cellservDB field to the afsconf_dir structure to track the CellServDB pathname. That commit also changed the afsconf_SetCellInfo() and afsconf_SetExtendedCellInfo() functions to use the new cellservDB member to open the CellServDB file. Unfortunately, the bosserver intentionally calls afsconf_SetCellInfo() with a NULL afsconf_dir pointer when attempting to create the default CellServDB and ThisCell files (e.g., "localcell"), which causes the bosserver to crash on startup when the cell configuration is not present. Fix this by calling the static function to look up the CellServDB pathname when an afsconf_dir data object is not given. Change-Id: I8d36f7c8afe6b4e13bfd04c421bf1109d1eb4238 Reviewed-on: https://gerrit.openafs.org/14061 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 302a203cf99fc0f11a402a31121cbe306f9bed30 Author: Michael Meffie Date: Thu Feb 20 16:09:49 2020 -0500 auth: pass the directory name to _afsconf_CellServDBPath Change the signature of the _afsconf_CellServDBPath() static function to take just the base directory name of the CellServDB file instead of the entire afsconf_dir data object. This makes it clear we do not need other members of the afsconf_dir structure to compose the CellServDB path.
Change-Id: I57509b2ca09123e78df5533d63494c66b5b24cdf Reviewed-on: https://gerrit.openafs.org/14076 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Andrew Deason commit 7c431f7571bbc32b26180086d10932d41d0da08c Author: Michael Meffie Date: Thu Feb 20 15:58:27 2020 -0500 auth: retire writeconfig.c Move the afsconf_SetCellInfo() and afsconf_SetExtendedCellInfo() to the cellconfig.c file with the other afsconf_dir functions. Retire the now empty writeconfig.c file. At one point in the distant past afsconf_SetCellInfo() did not have a afsconf_dir argument, so it probably made sense to have a separate file to write the configuration. Later, the afsconf_dir argument was added to afsconf_SetCellInfo() and afsconf_SetExtendedInfo() to reset the auth cache, so these functions are now better placed in cellconfig.c. Note the contents of writeconfig.c were moved verbatim (including comments), so this commit should have no functional changes. Change-Id: Idff76f0d2dfa2383a8617373f0e38235a94f20f1 Reviewed-on: https://gerrit.openafs.org/14075 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit de031398c652045394adc150faaf0dcb6cf28bc3 Author: Andrew Deason Date: Wed Oct 2 15:14:21 2019 -0500 opr: Define opr_mutex_t in lockstub.h Like we do for opr_cv_t, define an opr_mutex_t to be a plain int, to allow opr mutexes to be defined easily without ifdef guards. Change-Id: Ib90017ac098ebc68ffd89890d448aabb2321f63e Reviewed-on: https://gerrit.openafs.org/13886 Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Tested-by: BuildBot commit 71a825a3d86faeaf69645d5faab1a14558069c4c Author: Benjamin Kaduk Date: Fri Jan 24 21:42:33 2020 -0800 RedHat: support the ppc64le architecture Reported by zhenjiang.cai@powercore.com.cn. 
FIXES 135065 Change-Id: I79718a8b4da8a73edf40e0221308c9babc5e85b5 Reviewed-on: https://gerrit.openafs.org/14046 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Michael Meffie Reviewed-by: Yadavendra Yadav Reviewed-by: Benjamin Kaduk commit cd3221d3532a28111ad22d4090ec913cbbff40da Author: Jeffrey Hutzelman Date: Thu May 2 16:02:47 2019 -0400 Linux: use override_creds when available Linux may perform some access control checks at the time of an I/O operation, rather than relying solely on checks done when the file is opened. In some cases (e.g. AppArmor), these checks are done based on the current task's creds at the time of the I/O operation, not those used when the file was opened. Because of this, we must use override_creds() / revert_creds() to make sure we are using privileged credentials when performing I/O operations on cache files. Otherwise, cache I/O operations done in the context of a task with a restrictive AppArmor profile will fail. Change-Id: Icbe60874c348d6cd92b0a186d426918b0db9b0f9 Reviewed-on: https://gerrit.openafs.org/13751 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 042f809ccfe12bafed73aa4eb4db2c86737e0b22 Author: Michael Meffie Date: Fri Oct 18 13:43:36 2019 -0400 warn when starting without keys The server processes will happily start without keys and then fail all authenticated access, including database synchronization and local commands with -localauth. At least issue warnings to let admins know the keys are missing and that akeyconvert or asetkey needs to be run. The situation is not helped by the fact that the filenames of the key files have changed between versions. In 1.6.x the (non-DES) keys were in the rxkad.keytab file and in later versions they are in the KeyFile* files, so if you are used to 1.6.x it is not obvious what is wrong.
Change-Id: Iff7fe9a5a5a0f5ea1f4e227d3f6129658f8eb598 Reviewed-on: https://gerrit.openafs.org/13911 Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a5f031d2fe50f068f5517ff8d64324c127b6420d Author: Mark Vitale Date: Wed Feb 19 14:48:07 2020 -0500 improve command-line help for --enable_peer_stats The command-line help for several OpenAFS servers lists an inaccurate description for the --enable_peer_stats option: "enable RX transport statistics" Improve the help description to be more clear and consistent with the description for --enable-process-stats. Introduced by the following commits: cd3492d volser: Convert command line parsing to cmd a5effd9 viced: Use libcmd for command line options 461603e vlserver: Use libcmd for command line parsing 0b9986c ptserver: Use libcmd for command line parsing Change-Id: Ibe23c61d4b838f3a3185390b18d25494fffde2ca Reviewed-on: https://gerrit.openafs.org/14072 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1626986bd6d70c526376cf7cedfd3ebbf6d3588a Author: Cheyenne Wills Date: Tue Feb 11 11:29:42 2020 -0700 LINUX 5.6: use struct proc_ops for proc_create The Linux commit d56c0d45f0e27f814e87a1676b6bdccccbc252e9 (proc: decouple proc from VFS with "struct proc_ops") was merged into Linux 5.6rc1. The commit replaces the 'file_operations' parameter for proc_create with a new structure 'proc_ops'. Conditionally initialize and use proc_ops structures instead of file_operations structures for calls to proc_create. 
Notes: * proc_ops.proc_ioctl is equivalent to file_operations.unlocked_ioctl * The macros HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL are both hardcoded to 1 in linux's fs.h * proc_ops.compat_ioctl is conditional on Linux's CONFIG_COMPAT macro which is a separate test from the HAVE_COMPAT_IOCTL macro Change-Id: I8570ca499696b4c31b381543107453fbfe355376 Reviewed-on: https://gerrit.openafs.org/14063 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 6d6a28720f4eae4652f2628fdfcc30983916f39d Author: Marcio Barbosa Date: Fri Feb 7 14:58:56 2020 -0300 macos: add anchors to synthetic.conf grep pattern The grep pattern that checks if /etc/synthetic.conf already has an entry for afs is intended to check if this file holds a single column entry named afs. Unfortunately, the current version does not completely enforce this restriction. To fix this problem, add anchors to the grep pattern in question. Change-Id: I15a1fa1c250027b7d3ab67e686cbfbae853251a2 Reviewed-on: https://gerrit.openafs.org/14062 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Yadavendra Yadav Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 09ec1073b4c5d2eb70dcf5d8063018bc82e5a35e Author: Mark Vitale Date: Sun Jan 26 20:17:40 2020 -0500 afs: silence bogus warning about dcListCount uninitialized Commit 3be5880d1d2a0aef6600047ed43d602949cd5f4d 'afs: Avoid panics in afs_InvalidateAllSegments' is correct, but at least one compiler (gcc 4.3.4 on SLES 11.3) is fooled into issuing a warning: [...]/afs_segments.c: In function 'afs_InvalidateAllSegments_once': [...]/afs_segments.c:506: error: 'dcListCount' may be used uninitialized in this function To silence the bogus warning, initialize dcListCount when defined. 
Change-Id: I5938c85c71d08ed61ec1f69a50afb19c9b31fa82 Reviewed-on: https://gerrit.openafs.org/14048 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 9238b1eb9ef02889855eaade76e5b7962e5f2f28 Author: Michael Meffie Date: Mon Jul 22 15:20:24 2019 -0400 vos: fix name availability check in vos rename The UV_RenameVolume() function first updates the volume name in the VLDB, then read-write volume header and backup volume header, and finally all of the read-only volume headers. If this function is interrupted or a remote site is not reachable, the names in some of the volume headers will be out of sync with name in the VLDB entry. The implementation of UV_RenameVolume() is idempotent, so can be safely called with the same name as in the volume's VLDB entry. This could be used to bring all the names in the volume headers in sync with the name in the VLDB. Unfortunately, due to the check of the -newname parameter, vos rename will not invoke UV_RenameVolume() when the name in the VLDB has already been changed. The vos rename command attempts to verify the desired name (-newname) is available before invoking UV_RenameVolume() by simply checking if a VLDB entry exists with that name, and incorrectly assumes when a VLDB entry exists with that name it is an entry for a different volume. Change the -newname check to allow vos rename to proceed when name has already been set in the VLDB entry of the volume being renamed. This allows admins to run vos rename command to complete a previously incomplete rename operation and bring the names in the volume headers in sync with the name in the VLDB entry. Note: Before this commit, administrators could workaround this vos rename limitation by renaming the volume twice, first to an unused volume name, then to the actual desired volume name. Remove the useless checks of the code1 return code after exit in the RenameVolume() function. 
These checks for code1 are never performed since the function exits early when the first VLDB_GetEntryByName() fails for any reason. Update the vos rename man page to show vos rename can be used to fix previously interrupted/failed rename. Also document the -oldname parameter accepts a numeric volume id to specify the volume to be renamed. Change-Id: Ibb5dbe3148e9b8295347925a59cd7bdbccbe8fe0 Reviewed-on: https://gerrit.openafs.org/13720 Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6c54bc9e121b923ec5fdd60ee510171987e55017 Author: Mark Vitale Date: Mon Jan 27 12:26:41 2020 -0500 uss: more gcc9 truncation warning appeasement uss_procs_PickADir needs a larger buffer to avoid a truncation warning. While here, replace some magic numbers with existing symbols. Change-Id: If981dddfa50bdbc8c4730cf8038429f071b1d5be Reviewed-on: https://gerrit.openafs.org/14049 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit bf1b3e2fc12a7502cfd74eb109eeb7131f7230d3 Author: Michael Meffie Date: Fri Jan 10 10:54:20 2020 -0500 tests: skip vos tests when a vlserver is already running The vos tests start a temporary vlserver process, which is problematic when the local system already has an installed vlserver. Attempt to temporarily bind a socket to the vlserver port, and if unable to bind with an EADDRINUSE error, assume the vlserver is already running and skip these tests. Change-Id: I1dd3bc4c7ebcd2c7bffc8aca422222a50058090e Reviewed-on: https://gerrit.openafs.org/14021 Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6d309f86089ea707dbeb6ab553e3dfd23b6c338c Author: Andrew Deason Date: Thu Jan 9 12:28:57 2020 -0600 afs: Remove osi_VMDirty_p The function osi_VMDirty_p is mentioned in a few places in src/afs, but it has always been ifdef'd or commented out, ever since OpenAFS 1.0. Remove the dead code. 
Change-Id: Ia7cad718114d91adf9e403e29f9ac976c3f08bfd Reviewed-on: https://gerrit.openafs.org/14023 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6ee2d6de7d87c93c849f3afbe4326906e4c10852 Author: Andrew Deason Date: Thu Jan 9 12:38:45 2020 -0600 aklog: Make dummy write AIX-specific This weird write() call exists to work around some old AIX-specific bug. The ifdef looks like it is intended to restrict this to pre-5 AIX, but it also turns this on for all non-AIX platforms. Make this area AIX-specific, to avoid this weird write on other platforms that have nothing to do with the relevant workaround. Change-Id: I092bcadb4ecc6277ae01e44e6a957e6bacc0cf2d Reviewed-on: https://gerrit.openafs.org/14022 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit dcf44ab5fc5c1f5e2e759ea4b6156f7e1faa4b7a Author: Michael Meffie Date: Fri Jan 10 09:06:38 2020 -0500 tests: do not resolve addresses in vos/vl test The vos-t test adds a set of 10.* test addresses to a test vlserver and runs vos to read them back. When the test is run in an environment where hosts have been assigned in the 10.* internal network, vos will resolve the addresses to hostnames and the test fails. Pass the -noresolve option to vos for this test when checking for the expected list of addresses. Example test output before this commit: ./vos-t ... # seen: 10.0.0.0 10.0.0.1 myhost.example.com 10.0.0.3 ... not ok 5 - vos output matches Change-Id: Ief43fe180a0dfff211f28d5f47be6224270907a3 Reviewed-on: https://gerrit.openafs.org/14020 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 37c5db3ce767868803135c916b282ff2e541d052 Author: Andrew Deason Date: Sun Dec 1 15:39:04 2019 -0600 FBSD: Declare vnops/vfsops static Declare our vnode and vfs operations as static functions, since they are not referenced outside of osi_vfsops.c/osi_vnodeops.c. Shuffle around the definitions in osi_vnodeops.c so that we don't need forward declarations for the functions. 
Change-Id: Idbbe05a8b248ac29c2795c365be6a4e99da536dd Reviewed-on: https://gerrit.openafs.org/13973 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a4e9365fff2b0e3daf7e9cf2b40e6027b7dd3a15 Author: Andrew Deason Date: Sun Dec 1 15:27:01 2019 -0600 FBSD: Remove support for 8.x and 9.x According to , FreeBSD 8.x EoL was on August 1, 2015, and FreeBSD 9.x EoL was on December 31, 2016. Remove our support for these versions, since they haven't been supported by FreeBSD itself for a while. FreeBSD 10.x EoL was on October 31, 2018, which has passed, but was less than a year ago. So keep 10.x in for now. Adjust our preprocessor checks accordingly: - In FBSD-specific dirs, assume AFS_FBSD100_ENV and lower is always true. Assume __FreeBSD_version is always at least 1000000. - In non-FBSD dirs, convert AFS_FBSD100_ENV and lower to AFS_FBSD_ENV. Change-Id: I965e65d3b95573bb374661217b24b686c7b68ed2 Reviewed-on: https://gerrit.openafs.org/13842 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit eab0bb0af87e9309bfb6b754f3521d24288bd933 Author: Andrew Deason Date: Wed Jan 1 20:25:05 2020 -0600 tests: Explicitly build target 'all' by default Commit 68f40643 (Build tests by default) added new targets in our top-level Makefile, that caused us to effectively run 'cd tests && make' as part of the default build. Since no explicit target is provided, 'make' tries to build the first target in the given Makefile. On some platforms (such as *BSD), 'make' finds the first defined target as a pattern rule (%.c) from our included makefiles, and tries to build the target %.c, which it cannot do. This causes the build to fail with: cd tests && make make[3]: don't know how to make %.c. Stop To fix this, just explicitly build the 'all' target when we build our tests by default. 
Change-Id: I319271482685ec35087c470d95fdcaec6e1d8c47 Reviewed-on: https://gerrit.openafs.org/13993 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit ce7a76a13e4009262dc42a6c93c371fb26116d41 Author: Andrew Deason Date: Tue Dec 31 12:25:32 2019 -0600 tests: Stop vlserver on errors Currently, if we encounter an error and 'goto out' after starting the test vlserver, we'll exit without stopping the test vlserver. This can confuse the test harness, causing 'runtests' to hang forever. To avoid this, move the afstest_StopServer() call to also run when we're bailing out, but only if the server has actually started, of course. Change-Id: Ice5a56c20bc8d2eac85b3e760850c4d85e4601a8 Reviewed-on: https://gerrit.openafs.org/13992 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit a21a2f8edb79d6190976e920a9a90d0878411146 Author: Andrew Deason Date: Tue Dec 31 12:04:48 2019 -0600 tests: Introduce afstest_GetProgname Currently, in tests/volser/vos-t.c we call afs_com_err as "authname-t", which is clearly a mistake during some code refactoring (introduced in commit 2ce3fdc5, "tests: Abstract out code to produce a Ubik client"). We could just change this to "vos-t", but instead of specifying constant strings everywhere, change this to figure out what the current command is called, and just use that. Put this code into a new function, afstest_GetProgname, and convert existing tests to use that instead of hard-coding the program name given to afs_com_err. Change-Id: I3ed02c89f93798568783c7d717e8fb2e39dcce14 Reviewed-on: https://gerrit.openafs.org/13991 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 48d181ca1f4d753a51305d0352dadefed4323c00 Author: Andrew Deason Date: Tue Jan 7 13:02:21 2020 -0600 libtool: Serialize building libfoo.la and libfoo.a We have a few libraries where we have separate targets to build libfoo.la (to get libfoo.so) and libfoo.a.
Currently, these targets can be built in parallel, and both are built with libtool. This can cause problems because of two behaviors with libtool: - When running --mode=link for libfoo.a or libfoo.la, it effectively runs 'rm -rf .libs/libfoo.*' to clean up its work area. - When running --mode=link for libfoo.a, libtool sets up some scratch space in .libs/libfoo.ax to unpack various static libs. So when 'make libfoo.a' is running, libtool creates a .libs/libfoo.ax dir, and unpacks various object files inside of it. If while that is running, 'make libfoo.la' runs, it causes libtool to remove that directory and all its contents. This causes 'make libfoo.a' to fail with confusing messages like this (for libafsrpc.a): /bin/sh ../../libtool --quiet --mode=link --tag=CC gcc -static -O -o libafsrpc.a [...] find: '.libs/libafsrpc.ax/libopr_pic.a': No such file or directory ar: .libs/libafsrpc.ax/libfsint_pic.a/afscbint.cs.o: No such file or directory make[3]: *** [Makefile:59: libafsrpc.a] Error To avoid this, prevent building libfoo.la and libfoo.a at the same time, by just making libfoo.la depend on libfoo.a. Do this for all of the libraries we build in this way: libafshcrypto, libkopenafs, libafsauthent, and libafsrpc. Change-Id: I821768b3b4cd99cf5bf98605068773347ada0fb2 Reviewed-on: https://gerrit.openafs.org/14017 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 057f848a9c7b12afbe6563878760c1eab64b99b3 Author: Andrew Deason Date: Fri Nov 1 15:19:23 2019 -0500 ubik: Introduce ugen_secproc_func We currently specify the signature of the 'secproc' function callback in multiple places. Consolidate them into a single typedef. 
Change-Id: Ic785f47fc726bff6c37f7fd826f1e2626d006776 Reviewed-on: https://gerrit.openafs.org/13986 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 86170750dd2cc49781fad53e539d67f4c1ed0a84 Author: Andrew Deason Date: Wed Oct 9 13:54:40 2019 -0500 doc: Document new rxgk options Commit e5b1e6f1 (Add rxgk client options to vl and pt utilities) added a couple of new command-line options related to rxgk, but didn't add them to the relevant man pages. Add a brief description of these new options to the manpages for pts, vos, ptserver, and vlserver. Change-Id: I2d9bfdeb0a31d396740ca2a4d42e14c025b6f79e Reviewed-on: https://gerrit.openafs.org/13947 Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit bebae936b4ef3bf47624c0ff0baae5521bad804e Author: Cheyenne Wills Date: Thu Jan 2 11:18:16 2020 -0700 afs: Fix EIO error when reading a 4G or larger file When reading a file with a file length of >= 4G, the cache manager is failing the read with an EIO error. In afs_GetDCache, the call to IsDCacheSizeOK is passed a parameter that contains only the lower 32bits of the file length (which requires a 64 bit value). This results in the EIO error if the length is over 2^32 -1. The AFSFetchStatus.Length member needs to be combined with the AFSFetchStatus.Length_hi to obtain the full 64bit file length. Fix the calls to IsDCacheSizeOK to use the full 64bit file length. Commit "afs: Check dcache size when checking DVs 7c60a0fba11dd24494a5f383df8bea5fdbabbdd7" - gerrit 13436 - added the IsDCacheSizeOK function and the associated calls. As a note, the AFSFetchStatus.DataVersion is the lower 32 bits of the full 64bit version number, AFSFetchStatus.dataVersionHigh contains the high order 32bits. The function IsDCacheSizeOK is passed just the 32bit component, the only use of the parameter is in an error message. 
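As an illustration of the 64-bit length handling described above, here is a minimal sketch. The struct and function names (fetch_status_sketch, full_length, truncated_length) are hypothetical stand-ins; only the Length and Length_hi field names come from the text, and the real AFSFetchStatus has many more members:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two length fields. */
struct fetch_status_sketch {
    uint32_t Length;	/* low-order 32 bits of the file length */
    uint32_t Length_hi;	/* high-order 32 bits of the file length */
};

/* Combine both halves into the full 64-bit file length. */
static uint64_t
full_length(const struct fetch_status_sketch *st)
{
    return ((uint64_t)st->Length_hi << 32) | st->Length;
}

/* The problematic pattern: only the low half is considered. */
static uint64_t
truncated_length(const struct fetch_status_sketch *st)
{
    return st->Length;
}

/* Self-check: a file of 4G + 5 bytes looks like 5 bytes when truncated. */
static void
length_sketch_check(void)
{
    struct fetch_status_sketch st = { 5, 1 };

    assert(full_length(&st) == (((uint64_t)1 << 32) + 5));
    assert(truncated_length(&st) == 5);
}
```

The cast to uint64_t has to happen before the shift; shifting a 32-bit value left by 32 is undefined behavior in C.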
Change-Id: Idbe6233bd6ef792ed2b92d9337aba334e23f1452 Reviewed-on: https://gerrit.openafs.org/14002 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit daf6616aab6732d6b417c15f6f401731ef8e44b5 Author: Marcio Barbosa Date: Sat Dec 21 19:56:41 2019 -0800 macos: add entry for afs into synthetic.conf The root mount point is read-only as of macOS 10.15. As a result, /afs cannot be created at this location. To workaround this restriction, macOS 10.15 provides an alternative way to create mount points at the root. To make it possible, an entry for the mount point in question must be added to /etc/synthetic.conf. The synthetic entities described in this file are not physically present on the disk. Instead, they are synthesized by the kernel during system boot. This commit adds an entry for afs into the file mentioned above. Knowing that this change only takes effect after reboot, also provide directions to the user during the installation process. Change-Id: I7a05f4b9a48e443dbaa20a624a92b8b54c510000 Reviewed-on: https://gerrit.openafs.org/13928 Tested-by: BuildBot Reviewed-by: Yadavendra Yadav Reviewed-by: Benjamin Kaduk commit 0563642cc1cb750c69a6471005adf36fabb2b7e3 Author: Marcio Barbosa Date: Sat Dec 21 19:11:57 2019 -0800 macos: add script to notarize OpenAFS In order to integrate the notarization process into our existing build scripts, this patch introduces a script to automatically notarize the OpenAFS package. Change-Id: Ia9743cd39485e68de540b79b165b9d92020ad187 Reviewed-on: https://gerrit.openafs.org/13671 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 10d176afd23bbf684017a7946dffb1d592ea04fa Author: Andrew Deason Date: Wed Oct 23 15:46:16 2019 -0500 Do not build shared-only libs for --disable-shared Commit 0f1e54c4 (Pass -shared when linking some shared libraries) changed some of our linking rules to pass -shared to libtool when linking. 
When building with the --disable-shared configure option, this causes those linker rules to fail, since shared libraries are disabled. Before commit 0f1e54c4, we could build with --disable-shared successfully. To allow us to build again with --disable-shared, just don't build the relevant shared-only libraries at all, when shared libraries are disabled. To accomplish this, introduce a new substitution variable, SHARED_ONLY, which allows certain lines in Makefiles to become commented-out when shared libraries are disabled. Update all of the shared-only libraries to be built conditionally based on this variable. Except for libuafs.la, which appears to be not referenced by anything. Just remove the rules for that instead. Change-Id: I82084a08d2f9c12ca438bd7b1626e1376159c975 Reviewed-on: https://gerrit.openafs.org/13927 Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d0941e81b2f1f499cebb57d8a81d82802913d9be Author: Andrew Deason Date: Fri Oct 25 19:04:44 2019 -0500 pts: Use cmd_AddParmAtOffset for common parms Update pts to use cmd_AddParmAtOffset and symbolic constants for our common parameters, instead of using bare literals like '16'. Change-Id: Ib8fe77983a6bba46c3182585774e067512449f0e Reviewed-on: https://gerrit.openafs.org/13946 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 90726f837cd03a4eef745ab6bc221987042a72a6 Author: Andrew Deason Date: Tue Oct 29 20:17:39 2019 -0500 tests: Check if vlserver died during startup Currently, the volser/vos test starts a local vlserver to communicate with. If the vlserver dies during startup, the spawned 'vos' subprocesses take forever to run, since we need to wait for our Rx calls to timeout for every operation. To make it less annoying to detect and investigate errors that might cause the vlserver to fail during startup, check if the vlserver dies right away. 
We already sleep for 5 seconds when starting the vlserver, so just check if the pid still exists after those 5 seconds. Change-Id: I6c33059542fa975e4cb389b718f9da190cd13289 Reviewed-on: https://gerrit.openafs.org/13942 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 94acb9f36b2e14d24a485e016ec7ab264115c0be Author: Andrew Deason Date: Mon Sep 9 14:27:40 2019 -0500 rx: Make rx_identity_free idempotent rx_identity_free sets the given identity to NULL, but it unconditionally derefs the given identity. Make it a no-op for NULL identities, to make related cleanup code and destructors simpler. Change-Id: I863c72be71fb4b3056a2cd8fc2bf19cfb2d5dfbb Reviewed-on: https://gerrit.openafs.org/13945 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d3d2530691a0d5e45e6752d5cc012357ecbd410e Author: Andrew Deason Date: Wed Aug 21 12:43:03 2019 -0500 rx: Make rx_opaque_free idempotent Currently rx_opaque_free sets the given argument to NULL, a style that helps prevent double-frees. However, it doesn't check if the given buffer is already NULL, which makes potential callers that use a 'goto done'-style cleanup block do something like: done: if (buf) rx_opaque_free(&buf); To avoid the extra if(), make rx_opaque_free a no-op if it's given a NULL buffer, similar to how free(NULL) is a no-op on most platforms. Slightly refactor how we reference our argument as well, to limit the number of layers of indirection the code needs to deal with. Do the same for rx_opaque_zeroFree. Note that there are currently no callers of rx_opaque_free/rx_opaque_zeroFree, but future commits will add some. 
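A minimal sketch of the NULL-tolerant free pattern described above, assuming a hypothetical opaque_sketch structure and opaque_sketch_free function; this is not the actual rx_opaque_free implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opaque buffer object. */
struct opaque_sketch {
    size_t len;
    void *val;
};

/* Free *a_buf and set it to NULL; do nothing if *a_buf is already NULL. */
static void
opaque_sketch_free(struct opaque_sketch **a_buf)
{
    struct opaque_sketch *buf = *a_buf;	/* one local, fewer indirections */

    if (buf == NULL)
	return;
    free(buf->val);
    free(buf);
    *a_buf = NULL;
}

/* Self-check: calling the function twice on the same pointer is safe. */
static void
opaque_sketch_check(void)
{
    struct opaque_sketch *buf = calloc(1, sizeof(*buf));

    assert(buf != NULL);
    buf->val = malloc(8);
    buf->len = 8;

    opaque_sketch_free(&buf);
    assert(buf == NULL);
    opaque_sketch_free(&buf);	/* no-op on a NULL buffer */
    assert(buf == NULL);
}
```

With this shape, a 'goto done'-style cleanup block can call the free function unconditionally.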
Change-Id: Ic86a9c63903bebbddd311912cfbcb61198e3f0b0 Reviewed-on: https://gerrit.openafs.org/13944 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 71bf9ac08c1dd7566fd5d6b438293614afdc1d13 Author: Andrew Deason Date: Mon Sep 23 22:43:30 2019 -0500 ptserver: Fix WhoIsThisWithName indentation Many lines in this block in WhoIsThisWithName are oddly indented by 1 more space than usual. Fix them. Change-Id: I5e3ec4974cebc694c7b02c1ea6e037d4ec335a12 Reviewed-on: https://gerrit.openafs.org/13943 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 68f406436cc21853ff854c514353e7eb607cb6cb Author: Andrew Deason Date: Tue Oct 29 17:22:04 2019 -0500 Build tests by default While it's not feasible to run all of our tests by default during the build, we should be able to at least make sure the tests can build. So, make the default build targets also build our tests, by making the 'finale' target build the tests. Change-Id: Ieadd48ba2774526de8a13136e6cc8a50434ed2f5 Reviewed-on: https://gerrit.openafs.org/13941 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 0b8b6683fb525bbeaf118014beb2371e0cf23d90 Author: Andrew Deason Date: Mon Nov 11 20:34:27 2019 -0600 tests: Fix manpage tests for objdir builds The manpage tests have a couple of problems when running for objdir builds: - We try to specify './tests-lib/perl5' as a directory to find our helper library. However, the cwd when we're running the tests is in an objdir build, where the helper library is in the srcdir. Fix this by using the SOURCE env var specified by the tests wrapper. - All of these tests specify the directory in which to find the man pages in a subdir of BUILD, but our manpages are located in the src dir (since they are built by regen.sh, not by configure/make). Fix this by specifying a SOURCE-based directory instead. 
To avoid needing to make the same change for each of these tests, also refactor the manpage tests so each test only needs to specify the subdirectory and command name, and get rid of some of the common boilerplate. Change-Id: I96be199b1dec8db0545ae3cf19d2595c4afe4cdd Reviewed-on: https://gerrit.openafs.org/13940 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 63fd13bf9e6af21136007c9980816875ebea5f7c Author: Marcio Barbosa Date: Tue Nov 26 11:41:36 2019 -0800 macos: prepare for notarization With the public release of macOS 10.14.5, all new and updated kernel extensions must be notarized by Apple. To be taken into consideration, all executables must be signed and the Hardened Runtime capability must be enabled. This patch adds the missing prerequisites mentioned above. Change-Id: I2d3ad66cb7ce062b91d0616955f3bc2b06ca5822 Reviewed-on: https://gerrit.openafs.org/13670 Reviewed-by: Cheyenne Wills Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit c7864b73603842b8beaee03fcbb2426890205410 Author: Marcio Barbosa Date: Fri Jun 28 00:40:55 2019 -0300 macos: packaging support for MacOS X 10.15 This commit introduces the new set of changes / files required to successfully create the dmg installer on OS X 10.15 "Catalina". Change-Id: I628a3210fa42b2f34ff78030930f83e836775392 Reviewed-on: https://gerrit.openafs.org/13669 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 93815caabc92acc6edc62b72805b44d2e46748cf Author: Marcio Barbosa Date: Mon Nov 18 06:34:08 2019 -0800 macos: add support for MacOS 10.15 This commit introduces the new set of changes / files required to successfully build the OpenAFS source code on OS X 10.15 "Catalina". 
Change-Id: I849d4c837bf9ae36fe5c33356bc1c66a2fc513ac Reviewed-on: https://gerrit.openafs.org/13668 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit d4302d42149988fa6d04d626967063dfa916c9fd Author: Marcio Barbosa Date: Thu Dec 12 19:03:04 2019 -0800 macos: upgrade *.xib files According to Xcode 11, the *.xib files updated by this commit use an older format that is potentially insecure when decoded. To fix this problem, Xcode automatically upgraded these files to the modern format. These changes are required to build OpenAFS on Catalina (Xcode 11). Change-Id: Ica8c464eff93496d87fc854b193bfb0dad07a3c2 Reviewed-on: https://gerrit.openafs.org/13935 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 677b038814817defec9421e698ce67b44a7fd7d1 Author: Marcio Barbosa Date: Thu Nov 7 23:56:13 2019 -0300 macos: tell the compiler the system include path In order to support multiple SDKs, macOS Catalina no longer has the /usr/include directory. As a result, the compiler needs to know where these headers can be found. To successfully build OpenAFS on OSX 10.15, set KROOT so the compiler knows the correct location of these headers. Change-Id: I5ef33b34b6a4e6111983a63a2d34326ca4af9d30 Reviewed-on: https://gerrit.openafs.org/13936 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit f4ab3767b7e65028b93e731da6f09ee385c51daf Author: Andrew Deason Date: Mon Nov 11 20:34:07 2019 -0600 tests: Fix most tests for objdir builds Fix a few miscellaneous issues with building and running our tests in objdir builds: - Our C tests use -I$(srcdir)/../.. in the CFLAGS, so we can #include . However, basic.h actually gets copied from src/external/c-tap-harness/tests/tap/ to tests/tap/ during the build, and so basic.h is available in the objdir, not srcdir. For objdir builds, this causes building the tests to fail with failing to find basic.h. Fix this to use TOP_OBJDIR as the include path instead. 
- Our 'make check' in tests/ tries to run ./libwrap; but our cwd will be in the objdir for objdir builds, and libwrap is a script in our srcdir. Fix this to run libwrap from the srcdir path. - In tests/opr/softsig-t, it tries to find the 'softsig-helper' binary in the same dir as 'softsig-t'. However, softsig-t is just a script in the srcdir, but softsig-helper is a binary built in the objdir. Fix this to use the BUILD env var provided by the tests wrapper, by default. Change-Id: Iff642613bfc88d0d7e348660dc62f59e6fa8af75 Reviewed-on: https://gerrit.openafs.org/13939 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 847b63af92dd527de31675a0c3c82c9a57e6c4b3 Author: Andrew Deason Date: Sun Aug 25 23:21:23 2019 -0500 FBSD: Remove pre-8 code Commit 123f0fb1 (config: remove support for old FreeBSD releases) removed our support for FreeBSD releases before FreeBSD 8. However, various areas of code still reference the symbols from those old versions (e.g. AFS_FBSD53_ENV). Remove our ifdef logic for these old symbols, according to the following rules: - In FBSD-specific dirs, assume AFS_FBSD80_ENV is always true (as well as the symbols for earlier versions) - In non-FBSD dirs, convert AFS_FBSD80_ENV to AFS_FBSD_ENV (and do the same for all earlier versions) This allows us to remove code that was specific to older FreeBSD versions, and simplify some ifdef conditionals. Also remove the definitions for AFS_FBSD80_ENV and earlier versions in our existing param.h files. With this commit, the functions afs_start, afs_vop_lock, afs_vop_unlock, and afs_vop_islocked are now always unreferenced, so remove them. Change-Id: Ia5a5ba5ee5b71a86cb4514305e20f1bb34487100 Reviewed-on: https://gerrit.openafs.org/13812 Tested-by: BuildBot Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit f9c716fca1becea5a41fbe86535759ef817c924d Author: Yadavendra Yadav Date: Fri Dec 6 15:23:34 2019 +0530 afs: Add ppc64le changes in osconf.m4 file. 
If the swig package is installed on a ppc64le system, the build fails for "libuafs" while running "shlib-build". "shlib-build" is executed to build ukernel.so, which is triggered if "LIBUAFS_BUILD_PERL" is not empty, and having the "swig" package on the system sets "LIBUAFS_BUILD_PERL" to a non-empty value. The build failed because, inside "shlib-build", 'linker' was not set (it was empty). The 'linker' value is set based on SHLIB_LINKER, which was not defined in osconf.m4 when the build system is ppc64le. To fix this, add a ppc64le_linux26 case to osconf.m4. Change-Id: I79d2f78b2af34207c81f4f5ab05fdc387404acad Reviewed-on: https://gerrit.openafs.org/13980 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d79a8e13e5c1f6d1cf13a308ea506609b578ed84 Author: Cheyenne Wills Date: Mon Dec 2 13:12:00 2019 -0700 util: Use a struct for afsUUID_to_string Replace the use of a character array with a structure that contains the size of the buffer that is needed. This allows the C compiler to perform a type check to ensure the correct sized buffer is used. In addition, the size of the buffer is now specified in just one location. Change the signature of the afsUUID_to_string function to return a pointer to the start of a formatted UUID. This allows the use of afsUUID_to_string in a way that is consistent with other object formatting functions: struct uuid_fmtbuf uuidstr; printf("... %s ...", afsUUID_to_string(uuid, &uuidstr)); Update callers to use the new uuid_fmtbuf struct when calling afsUUID_to_string. Change-Id: I6d6f86ce6c058defc6256e8e88dee4449dd4f7e6 Reviewed-on: https://gerrit.openafs.org/13831 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f5f8b9336919debc5c26c429b12a14b65e0b697c Author: Marcio Barbosa Date: Thu Nov 14 17:29:56 2019 -0300 viced: add opt to allow admin writes on RO servers Add the new option -admin-write to allow write requests from superusers on file servers running in readonly mode (-readonly).
This lets sites run fileservers in readonly mode for normal users, but allows members of the system:administrators group to modify content. Change-Id: Id8ed3513a748815c07cb98e426c1d21ac300b416 Reviewed-on: https://gerrit.openafs.org/13707 Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 7cdf1a93cfdfd4a0959200197f000679199abbd4 Author: Andrew Deason Date: Fri Nov 29 11:42:47 2019 -0600 afs: Skip checking chunkBytes sanity for RW files Currently, the IsDCacheSizeOK check can trigger a false positive for a dcache, if the data in the dcache was populated by a local write to a file that was later extended with sparse data. For example: say a client opens a new file, and writes 4 bytes to offset 0, and then writes 4 bytes to offset 0x400000. After the first write, the first chunk for the file will contain just 4 bytes, and after the second write, the first chunk is unchanged (since we're writing to a different area of the file), but the file is now 0x400004 bytes long. The sparse area of the file will be correctly filled with zeroes for local reads and on the fileserver, but the 4-byte chunk causes IsDCacheSizeOK to complain and mark the dcache as invalid. Even though nothing is wrong, this causes the following scary messages to potentially appear in the kernel log, and the relevant dcache to be invalidated: afs: Detected corrupt dcache for file 1.536870913.2.2: chunk 0 (offset 0) has 4 bytes, but it should have 131072 bytes afs: (dcache 0xfffffdeadbeefb4d, file length 4194308, DV 1, dcache mtime 1575049956, index 996, dflags 0x2, mflags 0x0, states 0x4, vcache states 0x1) afs: Ignoring the dcache for now, but this may indicate corruption in the AFS cache, or a bug. It's probably difficult or impossible to detect if this specific case is happening, so to avoid this scenario, just avoid doing the size check at all for RW data from the cache. 
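The false positive described above can be illustrated with a small sketch. The function names and the way the expected size is computed are assumptions made for illustration; this is not the actual IsDCacheSizeOK code. The numbers match the example log message (file length 4194308, chunk size 131072, 4 bytes actually present in chunk 0).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes a fully populated chunk starting at
 * 'chunk_off' would hold for a file of 'file_len' bytes.
 */
uint64_t demo_expected_chunk_bytes(uint64_t file_len, uint64_t chunk_off,
                                   uint64_t chunk_size)
{
    uint64_t remaining;

    if (chunk_off >= file_len)
        return 0;
    remaining = file_len - chunk_off;
    return remaining < chunk_size ? remaining : chunk_size;
}

/*
 * Hypothetical size check. For RW data the check is skipped entirely,
 * because a locally written chunk may legitimately be shorter than the
 * expected size after the file has been extended with sparse data.
 */
bool demo_chunk_size_ok(uint64_t file_len, uint64_t chunk_off,
                        uint64_t chunk_size, uint64_t actual_bytes,
                        bool is_rw)
{
    if (is_rw)
        return true;
    return actual_bytes ==
        demo_expected_chunk_bytes(file_len, chunk_off, chunk_size);
}
```

In this sketch, a 4-byte chunk 0 in a 4194308-byte file fails the strict comparison, but passes once the check is skipped for RW data.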
Change-Id: Ia40ec838c525d9abc13a03be39028e4ca04a9457 Reviewed-on: https://gerrit.openafs.org/13969 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0593017177edd5b3bc6609d9dfcce55f15bba3e9 Author: Marcio Barbosa Date: Thu Nov 14 01:15:47 2019 -0300 viced: prevent writes on readonly fileservers Currently, a fileserver can be initialized as readonly. In this mode, writes on this server should not be allowed. Unfortunately, updates on files stored by readonly fileservers are not completely prevented. In some situations, the check for RO server is omitted (e.g. if the user is the owner of the file to be updated). In other situations, the same check is redundant. To fix these problems, consolidate this check in one place. Change-Id: Id53e15216404dfe691a87c7b4964ff08924c262c Reviewed-on: https://gerrit.openafs.org/13934 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 2ae2a15c9dc9b26eaa15964cc96fdeeb6d82c74c Author: Marcio Barbosa Date: Mon Jun 6 14:03:54 2016 -0300 sys: retry lsetpag if errno is EINTR The variable errno might be set by some system calls to indicate the reason why the system call in question did not work as expected. If the setpag system call is interrupted by a signal, the value of errno will be EINTR. This value means that setpag did not succeed because it was interrupted. If lsetpag did not succeed and errno is equal to EINTR, try again. Change-Id: Ibf306d62fc8d2fa9ccb0692f9031c5aa659b2bfe Reviewed-on: https://gerrit.openafs.org/12295 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 9563807791e2402f7a214a90e96cf6ed8ea5abfb Author: Marcio Barbosa Date: Thu Nov 7 00:10:12 2019 -0300 afs: afs_pag_wait() makes process unkillable To enforce a maximum average rate of one PAG allocation per second, afs_pag_wait(), called by afs_setpag*(), sleeps until the difference between the current time and pag_epoch gets greater than pagCounter. 
Unfortunately, this function ignores the code returned by afs_osi_Wait(). As a result, it is not possible to kill the process that requested the new pag while afs_pag_wait() is sleeping. To fix this problem, do not ignore the code returned by afs_osi_Wait(). Change-Id: I6be11a569edcafa6ecdf716e5315fc75f5a128e8 Reviewed-on: https://gerrit.openafs.org/12260 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 9d0854547522f7b2fb1bb7aa876fe9f901674747 Author: Andrew Deason Date: Sun Nov 17 20:58:15 2019 -0600 afs: Ensure CDirty is set during afs_write loop Currently, in afs_write(), we set CDirty on the given vcache, and then write the given data into various dcaches. When writing to a dcache, we call afs_DoPartialWrite, which may cause us to flush the dirty data to the fileserver and clear the CDirty bit. If we were given more than 1 chunk of data to write, we will then go through another iteration of the loop, writing more dirty data into dcaches, but CDirty will not be set. This can cause issues with, for example, afs_SimpleVStat() or afs_ProcessFS(), which use CDirty to determine whether or not to merge in FetchStatus info from the fileserver into our local cache. This can cause our local cache to incorrectly reflect the state of the file on the fileserver, instead of the state of the locally-modified file in our cache. A more detailed example is as follows. 
Consider a small C program that copies a file, fchmod()ing the destination before closing it: void do_copy(char *src_name, char *dest_name) { /* error checking elided */ src_fd = open(src_name, O_RDONLY); dest_fd = open(dest_name, O_WRONLY|O_CREAT|O_TRUNC, 0755); fstat(src_fd, &st); src_buf = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, src_fd, 0); write(dest_fd, src_buf, st.st_size); munmap(src_buf, st.st_size); close(src_fd); fchmod(dest_fd, 0100644); close(dest_fd); } Currently, on FBSD, using this to copy a 7862648-byte file, using a smallish cache (10000 blocks) will cause the destination to appear to be truncated, because avc->f.m.Length will be incorrect, even though all of the relevant data was written to the fileserver. On most other platforms such as SOLARIS and LINUX, this is not a problem, since currently they only write one page of data at a time to afs_write(), and so they never hit multiple iterations of the while() loop inside afs_write(). To fix this, just set CDirty on every iteration of the while() loop in afs_write(). In general, we need to set CDirty after calling afs_DoPartialWrite() anywhere if the caller continues to write more data. But all callers already do this, except for this one instance in afs_write(). Thanks to tcreech@tcreech.com for helping find occurrences of the relevant issue. FIXES 135041 Change-Id: I0f7a324ea2d6987a576786292be2d06487359aa6 Reviewed-on: https://gerrit.openafs.org/13948 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 4a9078c6bbf51720a5eacf7e6ba21443e5103eee Author: Andrew Deason Date: Tue Nov 5 10:50:01 2019 -0600 afs: Avoid giving wrong 'tf' to afs_InitVolSlot Commit 75e3a589 (libafs: afs_InitVolSlot function) split out a bit of our code that initializes a struct volume into the afs_InitVolSlot function. However, it caused us to almost always pass a non-NULL 'tf' to afs_InitVolSlot, even if the target volume was not found.
That is, before that commit, our code roughly did this: for (...; j != 0; j = tf->next) { ...; tf = &staticVolume; if (tf->volume == volid) break; } if (tf && j != 0) { use_tf_data(); } else { use_blank_data(); } The reason for the extra 'j != 0' check after the loop is to see if we hit the end of the volume hash chain, or if we actually found a matching 'tf' in the loop. And after that commit, the code did this: for (...; j != 0; j = tf->next) { ...; if (j != 0) { tf = &staticVolume; if (tf->volume == volid) break; } } if (tf) { use_tf_data(); } else { use_blank_data(); } The check for 'j != 0' was moved to inside the for loop, but 'j' is always nonzero in the loop (otherwise, the for() would exit the loop). This means that if we didn't find a matching 'tf' in the loop, our 'tf' would be non-NULL anyway, and so we'd initialize our volume slot from just the last entry in the hash chain. This means that for volumes that are not found in the VolumeItems file, our struct volume will probably be initialized with arbitrary data from another volume, instead of being initialized to the normal defaults (the 'else' clause in afs_InitVolSlot). This means that the 'dotdot' entry for the volume may be wrong, and so we may report the wrong parent dir for the root of a volume. However, the 'dotdot' entry should be fixed when the volume root is accessed via a mountpoint, so any such issue should be temporary. And of course, on some platforms (LINUX) we don't ever use the 'dotdot' information for a volume, and even on other platforms, often resolving the '..' entry is handled by other means (e.g. shells often calculate it themselves). But some 'pwd' calculations and other '..' corner cases may be affected. To fix this, change the relevant loop so that we only set 'tf' to non-NULL when we actually find a matching entry. 
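The shape of the fixed loop can be sketched as below. This is an assumption-laden illustration, not the actual OpenAFS change: `struct demo_item`, `demo_find`, and the array-based hash chain (where index 0 terminates the chain) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical on-disk item: 'next' is the index of the next entry in the
 * hash chain, and 0 marks the end of the chain. */
struct demo_item {
    int volume;
    int next;
};

/*
 * Walk the chain starting at index 'head'. 'tf' stays NULL unless a
 * matching entry is actually found, so the caller can fall back to
 * default values when the volume is not in the chain.
 */
const struct demo_item *demo_find(const struct demo_item *items, int head,
                                  int volid)
{
    const struct demo_item *tf = NULL;
    int j;

    for (j = head; j != 0; j = items[j].next) {
        if (items[j].volume == volid) {
            tf = &items[j];     /* only set on a real match */
            break;
        }
    }
    return tf;
}
```

A caller can then test `if (tf)` alone, without a second check on the loop index, because a non-NULL result always means a match.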
Change-Id: I53118960462c0057725e749cbf588e98024217c3 Reviewed-on: https://gerrit.openafs.org/13933 Tested-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 360b9d5d71fb1de142ae4efd4660732476855a3f Author: Andrew Deason Date: Mon Nov 4 20:03:43 2019 -0600 afs: Avoid -1 error for vreadUIO/vwriteUIO Commit c6b61a45 (afs: Verify osi_UFSOpen worked) added various checks to return an error if a given osi_UFSOpen failed. However, two of these checks (in afs_UFSReadUIO and afs_UFSWriteUIO) result in us returning -1 on error, in functions that otherwise return errno codes (e.g. ENOSPC). An error code of -1 might get interpreted as RX_CALL_DEAD, which would be rather confusing, so use EIO as a generic error instead. Change-Id: I23b9a73b82d999d8ee4670b5e7ec39b9d820fb0f Reviewed-on: https://gerrit.openafs.org/13931 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit b3b56d79653566ef1442d296e31beb762d25ce42 Author: Andrew Deason Date: Mon Nov 4 16:10:25 2019 -0600 doc: Fix realm capitalization In this example, krbtgt.Example.COM clearly refers to the principal name converted from krbtgt/Example.COM, and so by convention the realm name would be in all caps. Fix this example to use the all-caps realm name, for consistency. This mistake was introduced by commit 1cc8feb6 (doc: replace hostnames with IETF example hostnames), the realm was in all caps before that commit. Mistake spotted by Chas Williams. 
Change-Id: Icaf4931868752064c4617c8ad778122e076ae3cb Reviewed-on: https://gerrit.openafs.org/13930 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 6ec46ba7773089e1549d27a0d345afeca65c9472 Author: Andrew Deason Date: Mon Sep 16 14:06:53 2019 -0500 OPENAFS-SA-2019-003: ubik: Avoid unlocked ubik_currentTrans deref Currently, SVOTE_Debug/SVOTE_DebugOld examine some ubik internal state without any locks, because the speed of these functions is more important than accuracy. However, one of the pieces of data we examine is ubik_currentTrans, which we dereference to get ubik_currentTrans->type. ubik_currentTrans could be set to NULL while this code is running, so there is a small chance of this code causing a segfault, if SVOTE_Debug() is running when the current transaction ends. We only ever initialize ubik_currentTrans as a write transaction (via SDISK_Begin), so this check is pointless anyway. Accordingly, skip the type check, and always assume that any active transaction is a write transaction. This means we only ever access ubik_currentTrans once, avoiding any risk of the value changing between accesses (and we no longer need to dereference it, anyway). Note that, since ubik_currentTrans is not marked as 'volatile', some C compilers, with certain options, can and do assume that its value will not change between accesses, and thus only fetch the pointer value once. This avoids the risk of NULL dereference (and thus, crash, if pointer stores/loads are atomic), but the value pointed to by ubik_currentTrans->type would be incorrect when the transaction ends during the execution of SVOTE_Debug().
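The idea of examining the shared pointer exactly once, without dereferencing it, can be sketched like this. The names `demo_trans`, `demo_current_trans`, and `demo_debug_has_write_trans` are hypothetical, and this single-threaded sketch does not model the locking or memory-ordering behavior of the real ubik code.

```c
#include <stddef.h>

/* Hypothetical transaction record; not the real ubik structure. */
struct demo_trans {
    int type;
};

/* Hypothetical shared pointer that another thread could reset to NULL. */
struct demo_trans *demo_current_trans;

/*
 * Report whether a write transaction appears to be active. The pointer is
 * read a single time into a local and never dereferenced, so a concurrent
 * reset to NULL cannot cause a crash here; at worst the answer is stale.
 * Any active transaction is assumed to be a write transaction.
 */
int demo_debug_has_write_trans(void)
{
    struct demo_trans *t = demo_current_trans;  /* the only access */

    return t != NULL;
}
```

The result is deliberately approximate, in the same spirit as a debug RPC that prefers speed over accuracy.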
Change-Id: Ia36c58e5906f5e8df59936f845ae11e886e8ec38 Reviewed-on: https://gerrit.openafs.org/13915 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 93aee3cf40622993b95bd1af77080a31670c24bb Author: Andrew Deason Date: Wed Aug 7 21:19:47 2019 -0500 OPENAFS-SA-2019-002: Zero all server RPC args Currently, our server-side RPC argument-handling code generated from rxgen initializes complex arguments like so (for example, in _RXAFS_BulkStatus): AFSCBFids FidsArray; AFSBulkStats StatArray; AFSCBs CBArray; AFSVolSync Sync; FidsArray.AFSCBFids_val = 0; FidsArray.AFSCBFids_len = 0; CBArray.AFSCBs_val = 0; CBArray.AFSCBs_len = 0; StatArray.AFSBulkStats_val = 0; StatArray.AFSBulkStats_len = 0; This is done for any input or output arguments, but only for types we need to free afterwards (arrays, usually). We do not do this for simple types, like single flat structs. In the above example, we do this for the arrays FidsArray, StatArray, and CBArray, but 'Sync' is not initialized to anything. If some server RPC handlers never set a value for an output argument, this means we'll send uninitialized stack memory to our peer. Currently this can happen in, for example, MRXSTATS_RetrieveProcessRPCStats if 'rxi_monitor_processStats' is unset (specifically, the 'clock_sec' and 'clock_usec' arguments are never set when rx_enableProcessRPCStats() has not been called). To make sure we cannot send uninitialized data to our peer, change rxgen to instead 'memset(&arg, 0, sizeof(arg));' for every single parameter. Using memset in this way just makes this a little simpler inside rxgen, since all we need to do this is the name of the argument. 
With this commit, the rxgen-generated code for the above example now looks like this: AFSCBFids FidsArray; AFSBulkStats StatArray; AFSCBs CBArray; AFSVolSync Sync; memset(&FidsArray, 0, sizeof(FidsArray)); memset(&CBArray, 0, sizeof(CBArray)); memset(&StatArray, 0, sizeof(StatArray)); memset(&Sync, 0, sizeof(Sync)); Change-Id: Iedccc25e50ee32bd1144e652b951496cb7dde5d2 Reviewed-on: https://gerrit.openafs.org/13914 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ea276e83e37e5bd27285a3d639f2158639172786 Author: Andrew Deason Date: Wed Aug 7 20:50:47 2019 -0500 OPENAFS-SA-2019-001: Skip server OUT args on error Currently, part of our server-side RPC argument-handling code that's generated from rxgen looks like this (for example): z_result = SRXAFS_BulkStatus(z_call, &FidsArray, &StatArray, &CBArray, &Sync); z_xdrs->x_op = XDR_ENCODE; if ((!xdr_AFSBulkStats(z_xdrs, &StatArray)) || (!xdr_AFSCBs(z_xdrs, &CBArray)) || (!xdr_AFSVolSync(z_xdrs, &Sync))) z_result = RXGEN_SS_MARSHAL; fail: [...] return z_result; When the server routine for implementing the RPC returns a non-zero value into z_result, the call will be aborted. However, before we abort the call, we still call the xdr_* routines with XDR_ENCODE for all of our output arguments. If the call has not already been aborted for other reasons, we'll serialize the output argument data into the Rx call. If we push more data than can fit in a single Rx packet for the call, then we'll also send that data to the client. Many server routines for implementing RPCs do not initialize the memory inside their output arguments during certain errors, and so the memory may be leaked to the peer. To avoid this, just jump to the 'fail' label when a nonzero 'z_result' is returned. This means we skip sending the output argument data to the peer, but we still free any argument data that needs freeing, and record the stats for the call (if needed).
This makes the above example now look like this: z_result = SRXAFS_BulkStatus(z_call, &FidsArray, &StatArray, &CBArray, &Sync); if (z_result) goto fail; z_xdrs->x_op = XDR_ENCODE; if ((!xdr_AFSBulkStats(z_xdrs, &StatArray)) || (!xdr_AFSCBs(z_xdrs, &CBArray)) || (!xdr_AFSVolSync(z_xdrs, &Sync))) z_result = RXGEN_SS_MARSHAL; fail: [...] return z_result; Change-Id: I2bdea2e808bb215720492b0ba6ac1a88da61b954 Reviewed-on: https://gerrit.openafs.org/13913 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a455452d7ee98d160620925bb8a0e3d0f4dfd7ec Author: Cheyenne Wills Date: Tue Oct 1 12:14:41 2019 -0600 LINUX 5.3: Add comments for fallthrough switch cases With commit 6e0f1c3b45102e7644d25cf34395ca980414317f (LINUX: Honor --enable-checking for libafs) building libafs against a linux 5.3 kernel compiles with errors due to fall through in case statements when --enable-checking / --enable-warning is used. e.g. src/opr/jhash.h:82:17: error: this statement may fall through [-Werror=implicit-fallthrough=] case 3 : c+=k[2]; ~^~~~~~ The GCC compiler will disable the implicit-fallthrough check for case statements that contain a "special" comment ( /* fall through */ ). Add the 'fall through' comment to indicate where fall throughs are acceptable. This commit only adds comments and does not alter any executable code. 
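A minimal sketch of the comment style described above follows. The function `demo_sum_tail` is hypothetical and only loosely modeled on the tail handling quoted from src/opr/jhash.h; it is not the real hash code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add up the last 1 to 3 words of 'k'. Each case deliberately continues
 * into the next one; the "fall through" comments mark that as intended so
 * that GCC's -Wimplicit-fallthrough check accepts it.
 */
uint32_t demo_sum_tail(const uint32_t *k, size_t n)
{
    uint32_t a = 0, b = 0, c = 0;

    switch (n) {
    case 3:
        c += k[2];
        /* fall through */
    case 2:
        b += k[1];
        /* fall through */
    case 1:
        a += k[0];
        break;
    default:
        /* nothing left to add */
        break;
    }
    return a + b + c;
}
```

The comments change nothing in the generated code; they only document that the missing `break` statements are deliberate.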
The -Wimplicit-fallthrough flag was enabled globally in the linux kernel build in 5.3-rc2 (commit: a035d552a93bb9ef6048733bb9f2a0dc857ff869 Makefile: Globally enable fall-through warning) Change-Id: Ie6ca425e04b53a22d07b415cb8afd172af7e8081 Reviewed-on: https://gerrit.openafs.org/13881 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 747afb94aa214217a749471679082c6ed8e81e92 Author: Marcio Barbosa Date: Thu Sep 20 08:44:59 2018 -0400 afs: avoid extra VL_GetEntryByName for .readonly's In the VLDB, there's only one logical entry for a volume and its associated clones; there are not separate entries for the RW volume "avol", the RO volume "avol.readonly", and the BK volume "avol.backup". And so, when looking up a volume in the VLDB by name, the vlserver ignores any trailing ".readonly" or ".backup" in the given name. More concretely, the result of calling VL_GetEntryByName*("avol") is identical to that from calling VL_GetEntryByName*("avol.readonly"). Accordingly, if afs_GetVolumeByName(name) failed because the volume was not found in the VLDB, afs_GetVolumeByName(name.readonly) will fail as well (barring a change in external circumstances, such as the volume being created or a network connection coming back up). Therefore, the extra call in EvalMountData() is not necessary and can be removed. Remove the extra call, to slightly improve the response time of the client if the volume in question does not exist, and to reduce vlserver load when patched clients are looking up nonexistent volumes. Change-Id: I4f2f668107281565ae72a563a263121bd9bb7e3c Reviewed-on: https://gerrit.openafs.org/13334 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 860cbec815d61db2d82870290652a3bc7471b8e3 Author: Michael Meffie Date: Tue Oct 1 16:16:16 2019 -0400 RedHat: package rxstat_* programs Install libadmin rxstat_* sample programs with 'make install'/'make dest'. 
Include these programs in the openafs rpm package. Change-Id: I81b965cf440c869072cce0065a3c74c4c699b8b8 Reviewed-on: https://gerrit.openafs.org/13883 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b03f3e6101ff21a6f148c555c213c47678482a7b Author: Cheyenne Wills Date: Thu Oct 3 10:21:43 2019 -0600 RedHat: Update makesrpm.pl to use @PACKAGE_VERSION@ instead of @VERSION@ Commit 2f2c2ce62aa17ecac3651d64c1168af926f7458b 'Remove automake autoconf vars' replaced the automake variable @VERSION@ with the autoconf variable @PACKAGE_VERSION@. (Gerrit #13357) The RedHat openafs.spec.in is not processed using autoconf, but by 'makesrpm.pl', which was not updated to use @PACKAGE_VERSION@. Update makesrpm.pl to use @PACKAGE_VERSION@ instead of @VERSION@. Change-Id: I74d1d61e40e660459942ec68cfdedfe569a6abeb Reviewed-on: https://gerrit.openafs.org/13887 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d9fc4890f01a41fa5a63f97f2446b3afc35b473f Author: Andrew Deason Date: Thu Sep 26 13:35:51 2019 -0500 rx: Fix test for end of call queue for LWP Commit 6ad3d646 (rx: Correctly test for end of call queue) fixed a broken end-of-queue check in rx_GetCall, but it only fixed the RX_ENABLE_LOCKS version of rx_GetCall. The non-locks version (i.e. the LWP version) still had this bug. Fix it for the LWP case, to avoid some rare cases where an Rx call can get stuck in the incoming queue. Also remove the comment added by commit 170dbb3c (rx: Use opr queues), since we're fixing the mentioned problem. Change-Id: I5b96d97d9aba7bc4b383133b2136f949f3ed22bc Reviewed-on: https://gerrit.openafs.org/13880 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit aefc4c4f46e13f59b4cbe043e1a2a6f4ed99e076 Author: Mark Vitale Date: Tue Sep 17 15:14:44 2019 -0400 viced: consistently enforce host thread quota for ICBS(3) From time to time, the fileserver may issue potentially long-running RXAFSCB_* RPCs back to a host (client).
If these are holding h_Lock_r (host->lock) while running, they may cause other service threads for the same host (client) to block. In order to prevent a given host from tying up too many service threads in this way, the fileserver enforces a quota limiting how many threads can be waiting for h_Lock_r on a particular host while waiting for one of the following RPCs to complete: - RXAFSCB_TellMeAboutYourself (TMAY) - RXAFSCB_WhoAreYou - RXAFSCB_ProbeUuid - RXAFSCB_InitCallBackState (ICBS) - RXAFSCB_InitCallBackState3 (ICBS3) Note: Although some of these RPCs are relatively lightweight, they may still experience network delays. This quota is enforced by calling h_threadquota() in h_Lookup_r and h_GetHost_r. The quota check is enabled for a given host by turning on host->hostFlags HWHO_INPROGRESS for the duration of the RXAFSCB_* RPC. The quota check is only needed, and should only be enabled, when the RPC is issued while h_Lock_r is held. However, there are a few paths to ICBS(3) where h_Lock_r is held but HWHO_INPROGRESS is not set. A delay in those paths may allow a host to consume an unlimited number of fileserver threads. One such path observed in a field report was SRXAFS_FetchStatus -> CallPreamble -> BreakDelayedCallBacks_r -> RXAFSCB_ICBS3. To close this gap, enable host thread quotas for all remaining unregulated ICBS(3) RPCs. Change-Id: I70b96055ff80d8650bdbaec0302b7d18a8f22d56 Reviewed-on: https://gerrit.openafs.org/13873 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a133f1b1e7eb605c36ac16a6ed115bef03e8a004 Author: Cheyenne Wills Date: Tue Sep 24 15:59:47 2019 -0600 Retire the AFS_PTR_FMT macro Originally '%x' was commonly used as the printf specifier for formatting pointer values. Commit 37fc3b01445cd6446f09c476ea2db47fea544b7d introduced the AFS_PTR_FMT macro to support platform-dependent printf format specifiers for pointer representation. This macro defined the format specifier as '%p' for Windows, and '%x' for non-Windows platforms.
Commit 2cf12c43c6a5822212f1d4e42dca7c059a1a9000 changed the printf pointer format specifier from '%x' to '%p' on non-Windows platforms as well, so at this point '%p' is the printf pointer format specifier for all supported platforms. Since the AFS_PTR_FMT macro is no longer platform-dependent, and all C89 compilers support the '%p' specifier, retire the macro to simplify the printf format strings. Change-Id: I0cb13cccbe6a8d0000edd162b623ddcdb74c1cf7 Reviewed-on: https://gerrit.openafs.org/13830 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit 0f1e54c47c179bdbd69799170d9740e3e58e86db Author: Andrew Deason Date: Fri Aug 16 12:48:21 2019 -0500 Pass -shared when linking some shared libraries Currently, we use $(LT_LDLIB_shlib) to build most of our shared libraries. This invokes libtool, passing our various flags like PTH_LDFLAGS and PTH_CFLAGS (since all of our shared-library code is for pthreads). Notably, we do NOT pass the -shared flag; the -shared flag tells libtool to only build a shared library, and to not also build a static library (on systems where libtool supports building shared and static libraries simultaneously). Because of this, our LT_LDLIB_shlib invocations build both, which is reasonably correct for our per-module convenience libraries (that end up getting linked statically into the binaries that we install), but is not entirely correct for the public libraries that we install. Specifically, for ABI compatibility purposes, we must provide both shared and static libraries of the public libraries that we install, and since libtool on AIX does not build (or install) a static library at all with --mode=link unless -static is passed, we have separate rules to build the shared and static libraries for final installation. This can cause install errors with parallel make (on non-AIX systems), and possibly other errors, when we go to install the relevant library into TOP_LIBDIR.
For example, in src/kopenafs, we have the following rules: ${TOP_LIBDIR}/libkopenafs.${SHLIB_SUFFIX}: libkopenafs.la ${LT_INSTALL_DATA} libkopenafs.la ${TOP_LIBDIR}/libkopenafs.la ${RM} ${TOP_LIBDIR}/libkopenafs.la ${TOP_LIBDIR}/libkopenafs.a: libkopenafs.a ${INSTALL_DATA} libkopenafs.a $@ The rule to install libkopenafs.so will invoke libtool to do the install, which will install libkopenafs.so, libkopenafs.so.X.Y, and libkopenafs.a (from .libs/libkopenafs.a, not the libkopenafs.a we built separately). If we are running the rule to install libkopenafs.a in parallel, it may fail with an error like so: /usr/bin/install -c -m 644 libkopenafs.a /home/buildbot/openafs/fedora26-x86_64/build/lib/libkopenafs.a /usr/bin/install: cannot create regular file '/home/buildbot/openafs/fedora26-x86_64/build/lib/libkopenafs.a': File exists make[3]: *** [Makefile:35: /home/buildbot/openafs/fedora26-x86_64/build/lib/libkopenafs.a] Error 1 Even without that error, this confusion means that the libkopenafs.a installed into TOP_LIBDIR may be the one from src/kopenafs/libkopenafs.a, or the one from libtool's src/kopenafs/.libs/libkopenafs.a; it depends on what order the rules are run. If those libraries are different, that could potentially cause all sorts of other problems. To avoid this, we can pass -shared to libtool when building our shared libraries. We used to pass -shared when building shared libraries, since -shared is almost always one of our SHLIB_LDFLAGS set in src/osconf.m4. However, ever since commit 2c3a517e (Retire Makefile.shared), SHD_CFLAGS, SHD_LDFLAGS, and SHD_CCRULE have all been unused, and SHD_LDFLAGS was the only place where we used SHLIB_LDFLAGS. As a result, we never use SHLIB_LDFLAGS anywhere, and so we never pass -shared to anything. However, we cannot pass -shared to libtool when building all of our shared libraries, since we do need the static library for our per-module convenience libraries.
For example, liboafs_rx.la has no separately-built static library (librx.a is for LWP, liboafs_rx.{so,a} is for pthreads), but liboafs_rx needs to be linked statically into all of our command-line tools. So to fix this, introduce a new linking rule, called LT_LDLIB_shlib_only, which causes the given library to be built only as a shared library (by giving -shared to libtool), and not as a static library. Update the build rules to use this new linking rule for the libraries that need it, and leave the others alone. Since the only use of LT_LDLIB_shlib_missing is also used for a public library (afshcrypto), also pass -shared in that rule. Also remove SHD_* and SHLIB_LDFLAGS variables, since they are unused. Change-Id: Ia9e040afa3819f1ff70d050a400fecb9624bb9ba Reviewed-on: https://gerrit.openafs.org/13786 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1de602aaada15df1008140784092c2a76a2613a1 Author: Yadavendra Yadav Date: Wed Aug 28 17:26:41 2019 +0530 aklog: avoid infinite lifetime tokens by default Currently we get tokens for infinite lifetime using aklog impersonate feature. Based on inputs from Ben, this was done for server to server tickets to be valid forever. However on 1.8.x we have other mechanisms that were usable for server-to-server authentication with strong enctypes, so we do not need to provide user level akimpersonate to generate tokens for infinite lifetime. For this we have added new option -token-lifetime , this can take values from 0 to 720 hours. If 0 is specified it means tokens will have infinite lifetime. By default 10 hours will be token lifetime for akimpersonate tokens. 
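The option semantics described above (0 to 720 hours, 0 meaning no expiry, 10 hours when the option is absent) could be sketched roughly as follows. This is only an illustration: token_lifetime_seconds and the TOKEN_LIFETIME_* constants are hypothetical names, not taken from aklog, and how aklog represents an infinite lifetime internally is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_LIFETIME_DEFAULT_HOURS 10   /* default stated in the commit message */
#define TOKEN_LIFETIME_MAX_HOURS     720  /* upper bound stated in the commit message */

/*
 * Hypothetical helper (not aklog code): turn a -token-lifetime value,
 * given in hours, into seconds.
 *
 * have_opt is 0 when the option was not given on the command line.
 * On success, returns 0, stores the lifetime in *seconds, and sets
 * *infinite to 1 if the caller explicitly asked for 0 hours.
 * Returns -1 if the value is outside 0..720.
 */
static int
token_lifetime_seconds(int have_opt, long hours, long *seconds, int *infinite)
{
    assert(seconds != NULL && infinite != NULL);

    if (!have_opt)
        hours = TOKEN_LIFETIME_DEFAULT_HOURS;
    if (hours < 0 || hours > TOKEN_LIFETIME_MAX_HOURS)
        return -1;

    *infinite = (have_opt && hours == 0);
    *seconds = hours * 60 * 60;
    return 0;
}
```

Keeping the "infinite" case as a separate flag, rather than overloading a lifetime of 0 seconds, is a design choice of this sketch so that callers cannot mistake "never expires" for "already expired".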
Change-Id: I8190be81771b34682cc000ac051888561dc63c2f Reviewed-on: https://gerrit.openafs.org/13828 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit dc99144da54d12e8a168c3dfb0255e2a40ba321f Author: Mark Vitale Date: Wed Jul 17 22:07:45 2019 -0400 rx: add missing CLEAR_CALL_QUEUE_LOCK to LWP rx_GetCall In all other places where we remove an rx_call from a queue, we also CLEAR_CALL_QUEUE_LOCK. This isn't necessary in the LWP (non-RX_ENABLE_LOCKS) version of rx_GetCall because rx_call does not have member call_queue_lock for LWP. However, for the sake of consistency for future maintainers, add a CLEAR_CALL_QUEUE_LOCK here as well; it is a no-op for LWP. No functional change is incurred by this commit. Change-Id: Ibbb005fa15dd517fc5282574d0d4abd74e937e02 Reviewed-on: https://gerrit.openafs.org/13695 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit fe6798d0d9e4df006ef96612b5c6e07fcc757b7e Author: Mark Vitale Date: Mon Sep 16 01:37:33 2019 -0400 SOLARIS: add autoconfig support for Studio 12.6 Add the canonical install path for Studio 12.6 to the autoconfig test. Change-Id: Id90ae1816845ed8aaa80be7b3d57846059084339 Reviewed-on: https://gerrit.openafs.org/13867 Tested-by: Mark Vitale Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e87c40f4546ee9c31b2eaad2a24be9fb9a0b25b1 Author: Mark Vitale Date: Thu Mar 14 23:15:29 2019 -0400 rx: clear call_queue_lock after removing call from queue The call_queue_lock is set to either rx_serverPool_lock or rx_freeCallQueue_lock, depending on whether an rx_call resides in the rx_incomingCallQueue or the rx_freeCallQueue, respectively. This value is used by rxi_ResetCall to lock the appropriate queue before removing a call. Therefore, the call_queue_lock should be cleared after a call is removed from a queue. This issue has no known external symptoms; however, repairing this is helpful to developers examining core files. 
Repair two instances where the call_queue_lock is not cleared. Change-Id: Id1d9ac8454c1e07c10766dffb2a2beac7122bf3e Reviewed-on: https://gerrit.openafs.org/13641 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 3be5880d1d2a0aef6600047ed43d602949cd5f4d Author: Andrew Deason Date: Mon Jul 8 14:49:23 2019 -0500 afs: Avoid panics in afs_InvalidateAllSegments Currently, afs_InvalidateAllSegments panics when afs_GetValidDSlot fails. We panic in these cases because afs_InvalidateAllSegments cannot simply return an error to its callers; we must invalidate all segments for the given vcache, or we risk serving incorrect data to userspace as explained in the comments. Instead of panicking, though, we could simply sleep and retry the operation until it succeeds. Implement this, retrying every 10 seconds, and logging a message every hour that we're stuck (in case we're stuck for a long time). When we retry the operation, do so in a background request, to avoid a somewhat common situation on Linux where we always get I/O errors from the cache when the calling process has a SIGKILL pending. Create a new background op for this, BOP_INVALIDATE_SEGMENTS. With this, the relevant vcache will be effectively unusable for the entire time we're stuck in this situation (avc->lock will be write-locked), but this is at least better than panicking the whole machine. Change-Id: Icdc58a94f0cd5857903836d94e5cf7814ce7e088 Reviewed-on: https://gerrit.openafs.org/13677 Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Tested-by: BuildBot commit 1c4e94da2a8fce9d79006ad6d6673d3d7de117d3 Author: Benjamin Kaduk Date: Fri Aug 9 07:59:44 2019 -0700 The interminable rework of afs_random() Commit f0a3d477d6109697645cfdcc17617b502349d91b restructured the operation on tv_usec to avoid using undefined behavior, but in the process introduced a behavior change.
Historically (at least as far back as AFS-3.3), we masked off the low nybble (four bits) of tv_usec before adding the low byte (eight bits) of the rxi_getaddr() output. Why there was a desire to combine two sources of input for the overlapping four bits remains unclear, but restore the historical behavior for now, as the intent of commit f0a3d477d6109697645cfdcc17617b502349d91b was to not introduce any behavior changes. Change-Id: Icb8bc1edd34ca29c3094b976436177b18bfc8d1d Reviewed-on: https://gerrit.openafs.org/13759 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 276bd5c7f8a2ec7673d2ad084566203eb2055938 Author: Yadavendra Yadav Date: Wed Aug 28 17:04:31 2019 +0530 aklog: use any enctype in get_credv5 We currently always pass DES as the requested enctype to get_credv5_akimpersonate, but this means we will fail to use our service princ if we're using another enctype (say, AES) with rxkad-k5. To allow this to work with any enctype, just don't pass any requested enctypes, and just use the enctype inside the 'entry' returned to us from krb5_kt_get_entry. Remove all of the logic associated with the now-unused "allowed_enctypes" argument. Also remove the logic handling the case where "service_principal" is NULL (since no callers pass a NULL service_principal), to make it easier to take out the allowed_enctypes related code. Change-Id: Id11514ead26e15a287791c40509a001a1861df97 Reviewed-on: https://gerrit.openafs.org/13827 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 7a13bce2513baf5a3a61db94f3d88232241cea5b Author: Yadavendra Yadav Date: Wed Aug 28 16:43:35 2019 +0530 aklog: retry getting tokens for KRB5_KT_NOTFOUND error If we're creating tokens with -keytab and our AFS service principal is afs@, we'll first try creating tokens with afs/@ and krb5_kt_get_entry will fail with KRB5_KT_NOTFOUND. Since we do not retry for KRB5_KT_NOTFOUND error, we will not get tokens. 
So in order to get tokens for principal afs@ we should retry on the KRB5_KT_NOTFOUND error. Thanks to jpjanosi@us.ibm.com for finding this issue and suggesting a fix. Change-Id: I8af9df9876973badc4631f509eebcda46d667cef Reviewed-on: https://gerrit.openafs.org/13826 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2a33a80f7026df6b5e47e42319c55d8b7155675a Author: Andrew Deason Date: Sun Jul 21 18:31:53 2019 -0500 rx: Introduce rxi_NetSend Introduce a small wrapper around osi_NetSend, called rxi_NetSend. This small wrapper allows future commits to change the code around our osi_NetSend calls, without needing to change every single call site, or every implementation of osi_NetSend. Change most call sites to use rxi_NetSend, instead of osi_NetSend. Do not change a few callers in the platform-specific kernel shutdown sequence, since those call osi_NetSend for platform-specific reasons. This commit on its own does not change any behavior with osi_NetSend; it is just code reorganization. Change-Id: I0a7eb39d85d4e542c2832bb40191ab49fb02d067 Reviewed-on: https://gerrit.openafs.org/13717 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6559297610de0f71c9050f3582d4d146e0cc1f3c Author: Yadavendra Yadav Date: Wed Aug 28 16:25:49 2019 +0530 aklog: Use HAVE_ENCODE_KRB5_ENC_TKT_PART for aklog impersonate In get_credv5_akimpersonate we use HAVE_ENCODE_KRB5_ENC_TKT, which is not defined; because of this, we always return -1 from this routine in the non-Heimdal case. There is another define, HAVE_ENCODE_KRB5_ENC_TKT_PART, which is defined if the encode_krb5_enc_tkt_part function is present. In the current code, encode_krb5_enc_tkt_part is called from krb5_encrypt_tkt_part, and krb5_encrypt_tkt_part is called from get_credv5_akimpersonate in the non-Heimdal case. So we should change HAVE_ENCODE_KRB5_ENC_TKT to HAVE_ENCODE_KRB5_ENC_TKT_PART.
Also while we're here, add a declaration for the internal function encode_krb5_ticket, so we can build this newly-enabled code without warnings. Change-Id: I8f740e319ad279e284efaa407e6f92d0dc7a1bf6 Reviewed-on: https://gerrit.openafs.org/13825 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Andrew Deason commit d1e90b82ebb2685cbac3ecb3fd99136328b35357 Author: Stephan Wiesand Date: Fri Sep 6 13:35:02 2019 +0200 ptserver: Increase length limit of namelist, idlist, prlist, prentries An implementation limit of those lists was introduced in commit a0ffea098d8c5c5b46c6bf86a12d28d6e7096685 to prevent using unlimited amounts of memory in ptserver and the client. Subsequent reports indicate that the chosen limits are small enough to restrict functionality currently in use at some sites where membership lists exceed the current limit. Since this is just an implementation- defined limit and can freely change from release to release, increase the threshold by an order of magnitude to preserve functionality for existing deployments while still retaining some protection against attacker-controlled excessive memory allocation. Change-Id: I857bb3b697909668eb71224b631dfbb7e3c03d3c Reviewed-on: https://gerrit.openafs.org/13838 Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 54150f381de34d2a0c85ab15cf25801effd0c154 Author: Andrew Deason Date: Fri Aug 9 22:36:17 2019 -0500 LINUX: Check for -Wno-error=frame-larger-than= Commit cc7f942a (LINUX: Disable kernel fortuna large frame errors) added -Wno-error=frame-larger-than= to the CFLAGS for a file, but older gcc (like 4.3.4 from SLES 11.x) does not support this flag, causing a compiler error. To avoid this, add a configure check for -Wno-error=frame-larger-than=, and only use it if the compiler supports it. Thanks to mvitale@sinenomine.net for discovering the error. 
Change-Id: I5486d2d4711f2c301be1cb79f0aaad69a22e9d3a Reviewed-on: https://gerrit.openafs.org/13762 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit ddf7d2a7f4bfdcab238e791cb8c49bb803e76b09 Author: Cheyenne Wills Date: Fri Aug 9 13:25:26 2019 -0600 vlserver: initialize nvlentry elements after read Commit 7620bd33487207b348ed7aeba45f8d743132ba84 (vlserver: fix vlentryread() for old vldb formats) leaves the tail end of the serverNumber, serverPartition and serverFlags arrays uninitialized since it only copies OMAXNSERVERS elements into arrays that have NMAXNSERVERS elements. Initialize the elements in the nvlentry server arrays that were not copied with BADSERVERID. Change-Id: I9533e3a40922c76d4179e0ada393103c2aa533dd Reviewed-on: https://gerrit.openafs.org/13755 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 83d9a86fb1af519a92ffc0d8f6d73cddded8f6f5 Author: Andrew Deason Date: Mon Aug 26 22:03:23 2019 -0500 opr: Include procmgmt_softsig.h for WINNT On WINNT, procmgmt_softsig.h exists to implement our opr softsig routines in terms of procmgmt routines. Any time we include opr/softsig.h in cross-platform code, we currently must also include afs/procmgmt_softsig.h so we can build on WINNT. We currently do not do this in src/xstat, causing build failures on WINNT. To avoid this, just make opr/softsig.h include procmgmt_softsig.h itself, so all of the opr/softsig.h users don't have to remember to do this. Link xstat_*_test against procmgmt, so linking will succeed for those tools.
Change-Id: I2dc8226d438be25cdccbe96474220d7c81ae25b9 Reviewed-on: https://gerrit.openafs.org/13824 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit ab8b28540ef17d67db02d5dbcb7585443c164e45 Author: Yadavendra Yadav Date: Sat Aug 10 02:54:38 2019 +0530 aklog: Free client/server princs in get_credv5 Inside get_credv5, client_principal is static so the first time get_credv5 runs we'll allocate memory for it, and on subsequent calls we'll reuse the same value. However, if we call get_credv5_akimpersonate, we'll free client_principal and never change what client_principal points to. If we need to call get_credv5 again (because we need to retry getting creds), we'll reuse the old value for client_principal, but since it points to free memory we'll segfault or cause other problems. To avoid this, change get_credv5 so we allocate the client and server principals on each invocation of get_credv5 and free them before returning from get_credv5. Since we free the client and server principals inside get_credv5, remove freeing the client and server principals inside get_credv5_akimpersonate. Change-Id: Ie263aa2c03efc75e818d9007347dca9e42380dd4 Reviewed-on: https://gerrit.openafs.org/13761 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 2336164d1bf63980419d3a870f908f1f384fdfc0 Author: Andrew Deason Date: Sun Jul 21 17:02:34 2019 -0500 afs: Actually free resources during warm shutdown Currently, the shutdown_*() code paths for several subsystems only free the memory for that subsystem for "cold" shutdowns, and not for "warm" shutdowns. This means the memory gets leaked during a "warm" shutdown, since we never free these resources anywhere else. Specifically, this happens in shutdown_bufferpackage, shutdown_AFS, and shutdown_osinet. To avoid these leaks for warm shutdowns, just move the afs_cold_shutdown check around a little, so we free the relevant items in either codepath. 
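The shape of the warm-shutdown fix described above might look roughly like this. It is a simplified sketch, not the actual shutdown_bufferpackage, shutdown_AFS, or shutdown_osinet code: cold_shutdown stands in for afs_cold_shutdown, and pool, pool_stats, and the function names are hypothetical. Which state stays behind the cold-shutdown check in the real code is an assumption here.

```c
#include <stdlib.h>

static int cold_shutdown;   /* stand-in for afs_cold_shutdown */
static char *pool;          /* hypothetical subsystem allocation */
static int pool_stats;      /* hypothetical state that is only reset on cold shutdown */

static void
subsystem_init(void)
{
    pool = malloc(64);
    pool_stats = 1;
}

/* Before: the free is nested under the cold-shutdown check, so a warm
 * shutdown leaves 'pool' allocated and nothing else ever frees it. */
static void
subsystem_shutdown_leaky(void)
{
    if (cold_shutdown) {
        free(pool);
        pool = NULL;
        pool_stats = 0;
    }
}

/* After: memory is released on every shutdown; only the extra state
 * reset stays behind the cold-shutdown check. */
static void
subsystem_shutdown_fixed(void)
{
    free(pool);
    pool = NULL;

    if (cold_shutdown) {
        pool_stats = 0;
    }
}
```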
Change-Id: I748311784f512b3e2f25bdcaa6629108a5790212 Reviewed-on: https://gerrit.openafs.org/13716 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 130a92214cc0b9a8f4ea24a3dcd3ed04575e3c4e Author: Yadavendra Yadav Date: Sat Aug 10 02:41:01 2019 +0530 aklog: free krb5_creds before returning from rxkad_get_token rxkad_get_ticket allocates 'v5cred' which should be freed when we return from rxkad_get_token. Change-Id: I09b20781f0856ab8e230e0af271e9d0c58fee90c Reviewed-on: https://gerrit.openafs.org/13760 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit fbdf126df02eacc0442d80cc5bca0e16ddafe55e Author: Andrew Deason Date: Sun Aug 25 19:30:30 2019 -0500 rx: Convert rx_FreeSQEList to rx_freeServerQueue Currently, rx_serverQueueEntry structs are placed on the rx_FreeSQEList linked list instead of being freed directly, but managing this list is done a bit oddly. The first field in struct rx_FreeSQEList is an opr_queue, but we don't use the opr_queue_* macros to manage the list. Instead, we just assume the first field in a struct rx_serverQueueEntry is a pointer that we can use to link entries together. This is currently true and works, but it's an odd way of maintaining such a list, and of course would break if we ever moved the fields around in struct rx_serverQueueEntry. Make this code more closely follow the normal way of managing opr_queue lists, by using opr_queue_* macros, and changing rx_FreeSQEList to be an opr_queue itself. Change the name to rx_freeServerQueue to ensure all callers are changed, and to match the naming convention for the other linked lists for rx_serverQueueEntry structs. Also move rx_freeServerQueue and its associated lock freeSQEList_lock to be declared static inside rx.c, since neither are referenced outside of rx.c. The general idea for this commit was suggested by kaduk@mit.edu.
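To illustrate why an embedded queue node is more robust than assuming the first struct field is a link pointer, here is a minimal sketch of an intrusive free list. It does not use the real opr_queue macros: struct queue, the queue_* helpers, struct server_queue_entry, and the sqe_* functions are hypothetical stand-ins, and the locking that freeSQEList_lock provides in rx.c is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly-linked queue node (stand-in for an opr_queue). */
struct queue {
    struct queue *next, *prev;
};

static void queue_init(struct queue *q) { q->next = q->prev = q; }
static int queue_is_empty(const struct queue *q) { return q->next == q; }

static void
queue_prepend(struct queue *head, struct queue *e)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

static void
queue_remove(struct queue *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
}

/* Recover the containing struct from a pointer to its embedded node. */
#define queue_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct server_queue_entry {
    int id;               /* the link is deliberately NOT the first field */
    struct queue entry;
};

/* Free list head; must be set up with queue_init() before first use. */
static struct queue free_server_queue;

/* Put an entry on the free list instead of freeing it. */
static void
sqe_release(struct server_queue_entry *sq)
{
    assert(sq != NULL);
    queue_prepend(&free_server_queue, &sq->entry);
}

/* Reuse an entry from the free list if possible, else allocate one. */
static struct server_queue_entry *
sqe_get(void)
{
    if (!queue_is_empty(&free_server_queue)) {
        struct queue *q = free_server_queue.next;
        queue_remove(q);
        return queue_entry(q, struct server_queue_entry, entry);
    }
    return calloc(1, sizeof(struct server_queue_entry));
}
```

Because the node is located through offsetof rather than by position, reordering the fields of the entry struct does not break the list handling.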
Change-Id: I2ea15af1ad3228fa5fdf9f323e9394838fba4bac Reviewed-on: https://gerrit.openafs.org/13811 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 3bc03e7a5f8ef521e71a30cb8e66e07e2d1b4605 Author: Andrew Deason Date: Sun Jun 23 17:48:53 2019 -0500 libafs: Create debug KMODDIR for FBSD debug inst Commit 99418024 (libafs: Create $(DESTDIR)$(KMODDIR) on FBSD inst) made it so we create the kmod installation dir before copying our module into it. However, if we build a 'debug' variant of our module, the FreeBSD build process also installs debug symbols in a different directory, ${DESTDIR}${KERN_DEBUGDIR}${KMODDIR}, which may not exist. So do the same thing for that dir too, if --enable-debug-kernel is turned on, so the build still works. To do this, introduce the LIBAFS_REQ_DIRS var, to make it easier to keep track of which dirs we may need to create. Change-Id: Id1ad72f6c19d5949d38ee97334b4014ae6ef16ad Reviewed-on: https://gerrit.openafs.org/13690 Reviewed-by: Benjamin Kaduk Tested-by: Andrew Deason commit f9e413eaa280377b7dca0214fe79668459035098 Author: Andrew Deason Date: Mon Aug 26 21:17:30 2019 -0500 xstat: Define AFS_PTHREAD_ENV on WINNT Commit 6b67cac4 (convert xstat and friends to pthreads) converted the xstat utilities to pthreads, but we still need to explicitly pass AFS_PTHREAD_ENV on WINNT to enable various pthread-specific code paths. So give -DAFS_PTHREAD_ENV for our objects in this dir. Change-Id: I222b99399a5fad3df528be2bc31823eb8bc52c62 Reviewed-on: https://gerrit.openafs.org/13823 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 7a76f4dc00984d42b0535a8edbedee034ada896f Author: Andrew Deason Date: Mon Aug 26 20:33:58 2019 -0500 WINNT: Link tbutc against mtafsutil.lib tbutc uses pthreads, not LWP, so link it against mtafsutil.lib (a pthread library), and not afsutil.lib (an LWP library). 
Change-Id: Id29888d88bfdd9585e017217a9951eb645c65336 Reviewed-on: https://gerrit.openafs.org/13822 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit c3716b3d7e32f47b084657e163b029e9f1756fa4 Author: Andrew Deason Date: Mon Aug 26 19:34:19 2019 -0500 rx: Export rx_GetCallStatus Commit 59d3a8b8 (vos: restore status information to 'vos status') added the function rx_GetCallStatus to Rx, and used it in the volserver, but didn't add the function to our .sym and .exp files, causing a linker error on at least WINNT. Add the function to the relevant .sym/.exp files, so we can link on all platforms. Change-Id: I859ac6d04d8a21eb6f8b4ba3f3720ca318e91334 Reviewed-on: https://gerrit.openafs.org/13820 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 90117793ca3000a20cb3bff8601e9f8ae56fb5db Author: Andrew Deason Date: Mon Aug 26 18:46:21 2019 -0500 WINNT: Do not link ptclient.obj in libafsauthent ptclient.c contains a stub definition for osi_audit, but audit.c already contains a real definition for osi_audit. libafsauthent doesn't seem to actually need anything from ptclient (and the Unix libafsauthent doesn't appear to use it), so just don't include ptclient when linking libafsauthent. Change-Id: I4172b80138e5ea121fc3ae2689cf4ed23c81e35b Reviewed-on: https://gerrit.openafs.org/13819 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit e4b689e8c7cb39b72854dd38b6a92134591c8bca Author: Andrew Deason Date: Mon Aug 26 18:14:48 2019 -0500 WINNT: Link butc against audit Since commit c43169fd (OPENAFS-SA-2018-001 Add auditing to butc server RPC implementations), butc references symbols from audit. So add audit to our libraries to link against, so we can link butc on WINNT. 
Change-Id: I65f4d87085a8917c9b11d7c27b8e3902cd2a1c1c Reviewed-on: https://gerrit.openafs.org/13818 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit f895a9b51671ffdc920fd9b4284337c5b737a0ef Author: Andrew Deason Date: Mon Aug 26 17:40:56 2019 -0500 WINNT: Make opr_threadname_set a no-op We don't supply an implementation for opr_threadname_set for WINNT; don't pretend that we do. Change-Id: Ifa8042253d0aa10f365356d93cea3fad4686371a Reviewed-on: https://gerrit.openafs.org/13817 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 75a5c1b06e44bb6207cee7bd653cda688869aade Author: Andrew Deason Date: Mon Aug 26 16:54:55 2019 -0500 rxkad: Improve ticket5 import from Heimdal The current method of importing our ticket5 code from Heimdal has a few issues: - The der-protos.h file we generate contains numerous function prototype declarations that looks like this: ret-type func(parm-list, type */* comment */); which cause numerous warnings on WINNT, because the '*/*' sequence looks like the end of a nonexistent comment. This was previously fixed manually in commit 8b5d3a73 (rxkad: remove warnings from der-protos.h), but each time we regenerated our ticket5 code, the same thing would happen. - We manually insert an include for "asn1_err.h" in our v5der.c, and the v5gen.c we pull in has an include for inside it. During a WINNT build, these can pull in different asn1_err.h files (one from us, and one from the "Heimdal compatibility layer SDK" or anything else in our include paths). Since the asn1_err.h in our tree doesn't have an include guard, the code for both gets included, which can cause various problems. - Our current asn1_err.h file that we include is ultimately generated by the awk-based compile_et from e2fsprogs, not the C-based compile_et from Heimdal. This likely happened by accident because the Heimdal build system uses the system compile_et by default. 
This flavor of compile_et generates arguably inferior comerr-based header files (they lack include guards, and they use #define constants instead of enums). Fix these issues with some edits to our README.v5 script: - Apply a simple sed filter when we pull in der-protos.h to change '*/*' into '* /*', to remove the relevant warnings. - Instead of inserting an include for asn1_err.h into v5der.c in our import script, just put it in ticket5.c, making it easier to see and edit. Change this to so it uses the same asn1_err.h as in v5gen.c. - Add a note to run the Heimdal build with COMPILE_ET=no, so the Heimdal build system uses the in-tree compile_et, instead of whatever is on the relevant system. With these changes, redo the Heimdal import from the same version of the Heimdal codebase. Change-Id: I01e06f2799f1c828b8224c3425079b313ffb5b6b Reviewed-on: https://gerrit.openafs.org/13816 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit b9b5385e6a04dcacd180f33e39495c7909fe4df3 Author: Andrew Deason Date: Mon Aug 26 16:08:31 2019 -0500 kauth: Move COUNT_REQ to beginning of block Commit b604ee7a (OPENAFS-SA-2018-002 kaserver: prevent KAM_ListEntry information leak) added a memset in kamListEntry before COUNT_REQ, but COUNT_REQ declares a local variable. This breaks the WINNT build, because we must declare variables at the beginning of a block. To fix this, just swap the two lines. Change-Id: I47eb61e6f95c2e38c619e90c8f093de325892c63 Reviewed-on: https://gerrit.openafs.org/13815 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 1534302d4489d2ba1c421077cdedb0187a2c1722 Author: Andrew Deason Date: Mon Aug 26 14:34:45 2019 -0500 rxgk: Add NTMakefile to install headers Commit 83eec909 (Implement afsconf_GetRXGKKey) added a reference to rx/rxgk_types.h inside cellconfig.p.h. Nothing ever added src/rxgk WINNT makefiles, so that include file is never installed into place, breaking the WINNT build when code tries to include cellconfig.h. 
To fix this and other code that needs rxgk header files, create an NTMakefile for src/rxgk, which just exists to install headers into place. Call it from the top-level NTMakefile right before copying in the auth headers. Change-Id: Id111479f55b4c330640e80d167a8af664fe3622e Reviewed-on: https://gerrit.openafs.org/13814 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 2df2de06e5df64f5666316b14d67de7e7c5dae70 Author: Andrew Deason Date: Sun Jul 21 21:15:11 2019 -0500 rx: Avoid leaking 'sq' in libafs rx_GetCall Currently, in rx_GetCall when building for the kernel, if we notice that we're shutting down (that is, if afs_termState has reached AFSOP_STOP_RXCALLBACK), we return immediately. However, 'sq' may have been allocated much earlier in this function, and if we return here, we never free 'sq' or set it on any list. Returning immediately is also unnecessary here; if we just 'break' out of our wait loop, 'call' will still be NULL, and we'll break out of the outer loop, and go through the rest of the function like normal. The only difference is, if we 'break' instead of 'return'ing, we'll put 'sq' on the free list before returning. So, just 'break' out of the loop instead of returning, so we put 'sq' on the free list and avoid leaking its memory. Change-Id: Ibb2f4e697a586392f76ccdbbefdae8d75740f6fe Reviewed-on: https://gerrit.openafs.org/13715 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 9eeb3ec09f5421ceab2be415a193bb3a3c44925f Author: Andrew Deason Date: Mon Aug 26 13:13:28 2019 -0500 WINNT: Build bubasics before audit Commit 9ebff4c6 (OPENAFS-SA-2018-001 audit: support butc types) made src/audit require the butc.h header, and updated Makefile.in to reflect this. However, this dir is also built on WINNT, and the NTMakefile was not updated to reflect this dependency. As a result, we might fail to build src/audit on WINNT, since butc.h may not exist yet, and we get an error like: cl [...] 
/c audit.c audit.c cl : Command line warning D9025 : overriding '/W4' with '/W3' audit.c(27) : fatal error C1083: Cannot open include file: 'afs/butc.h': No such file or directory NMAKE : fatal error U1077: 'C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\cl.EXE' : return code '0x2' To fix this, move 'bubasics' to be made before 'audit' in NTMakefile, so butc.h is available when we build 'audit'. Change-Id: I2053db7cd95353cf6b703b4033239810338890aa Reviewed-on: https://gerrit.openafs.org/13813 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 8bb9ae944ec7e101b6c8133fdb867c847164b5a7 Author: Andrew Deason Date: Wed Aug 21 12:04:45 2019 -0500 afs: Introduce afs_FreeFirstToken Change afs_FreeOneToken to unlink the given token from its container, instead of requiring its caller to do so. Rename the function to afs_FreeFirstToken, to help indicate the change in behavior. Also, while we are changing afs_FreeTokens to accommodate this change, simplify afs_FreeTokens a little, making it resemble afs_DiscardExpiredTokens a bit more. [kaduk@mit.edu: add note about dead store elimination] Change-Id: I0cf9d8b94236c736001a38cccfa7fdfff9f3e609 Reviewed-on: https://gerrit.openafs.org/13807 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 0a39efee224e8d4431ae79281ca353a7ba6fdce4 Author: Andrew Deason Date: Sun Jul 14 17:31:30 2019 -0500 FBSD: Use ucontext for FreeBSD 10+ on amd64 Currently, running any LWP program on recent FreeBSD on amd64 causes (or can cause) a SIGBUS very quickly. This is possibly because our stack management code in LWP only ensures our stacks are 4 or 8-byte aligned in most cases (except DARWIN, which gets 16-byte-aligned stacks), according to the value of STACK_ALIGN. The amd64 ABI mandates that stacks be 16-byte-aligned, and some function calls assume that this is followed, causing a SIGBUS when it is not. 
FreeBSD on amd64 currently uses process.amd64.s for its savecontext() implementation, which does not do any checking or fixup of the stack alignment. This behavior has been observed on amd64 with FreeBSD 11 specifically, but it probably happens on any FreeBSD release when using clang. FreeBSD switched to clang as the default compiler with FreeBSD 10, so this probably occurs with FreeBSD 10 and newer. We could perhaps try to fix this by changing our stack management code, but we can also avoid most of this nonsense by just using ucontext instead of our custom assembly code. So, do that, by setting USE_UCONTEXT for FreeBSD 10+. Also enable the same 'stackvar'-based workaround in savecontext() as Linux uses, since otherwise 'topstack' appears to always be NULL, and triggers our stack overflow checks. Note that while LWP use is deprecated, as of this commit many small utilities (like 'fs') are still linked to LWP, and so are unusable without a fix like this. Change-Id: Ie8e928bd71e7f6e9c0fb1379259c55527b6ccdf3 Reviewed-on: https://gerrit.openafs.org/13691 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 8f9c92a888df7b2fd61a3e84aaf1d2c96a8b10dd Author: Andrew Deason Date: Sun Jul 28 15:03:43 2019 -0500 FBSD: Set KERNBUILDDIR for --with-bsd-kernel-build Currently, specifying --with-bsd-kernel-build during configure causes us to set BSD_KERNEL_BUILD, which sets KBLD in MakefileProto.FBSD.in, but nothing ever uses KBLD. This means that when we use --with-bsd-kernel-build, we don't actually build against the configuration for that kernel, which can result in a libafs.ko that cannot be loaded or causes other errors. Specifically, if trying to build for a VIMAGE kernel, the kernel complains when trying to load libafs: [...] kernel: link_elf_obj: symbol in_ifaddrhead undefined [...] 
kernel: linker_load_file: Unsupported file type The FreeBSD module build system looks for KERNBUILDDIR for an alternative build, which it uses to pull in opt_global.h and other required pieces from the build tree. So just specify KERNBUILDDIR if we have one. At the same time, avoid setting our default value for BSD_KERNEL_BUILD for FBSD when the calculated dir doesn't exist. At least for the default GENERIC kernel on FreeBSD 11.2-RELEASE, there may not be a build dir on the running machine, and so setting BSD_KERNEL_BUILD to the calculated value causes the build to fail when it doesn't exist. Change-Id: Ib3079354f9f6dba13970de5308bbcecaf9b35059 Reviewed-on: https://gerrit.openafs.org/13746 Tested-by: BuildBot Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit 1effc3517fdb4b4653d47c59bf67076567209324 Author: Tim Creech Date: Sun Mar 5 18:18:01 2017 -0500 FBSD: Call CURVNET_SET/CURVNET_RESTORE for VIMAGE In commit 9703b023 (FBSD: VIMAGE support), we changed a couple of our variable references to their V_* equivalents, to accommodate kernels with VIMAGE turned on. This allows us to build, but causes us to crash whenever we hit that code when VIMAGE is enabled, because the relevant macros reference 'curvnet', which is NULL outside of networking code. What we're supposed to do is to set 'curvnet' before entering networking code by calling 'CURVNET_SET(xxx)', and reset it afterwards by calling 'CURVNET_RESTORE()'. We must make exactly one _RESTORE call for each _SET, and they are supposed to be run at the same level of scope. So to avoid the crashes, make the relevant CURVNET_* calls whenever we look at networking info. 
We currently only do this in a few places: - In afs_SetServerPrefs, to try to detect if a given server address is in the same network as one our local interfaces (V_in_ifaddrhead) - In rxi_GetIFInfo, for some MTU-related info (V_ifnet) - In rxi_FindIfnet, for some MTU-related info (ifa_ifwithnet) As for what vnet we actually set 'curvnet' to, we could set it to the vnet of the current thread (TD_TO_VNET(curthread)), or we could set it to the vnet of an associated network object (a socket, an interface, etc). Since all of our network-related code goes through Rx, in this commit we set curvnet to the vnet of the Rx socket (rx_socket->so_vnet). Note that VIMAGE is optional in 11-RELEASE, but is turned on by default in 12.0-RELEASE. For more information, see: https://wiki.freebsd.org/VIMAGE/porting-to-vimage [adeason@dson.org: Reworded commit message; moved some code around.] Change-Id: If631b8942d7ee5cfe38a8f0c32b282d015f0bf35 Reviewed-on: https://gerrit.openafs.org/12580 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 1d2a1002bd1bc8d82c05399c06836ede83f9eeea Author: Andrew Deason Date: Wed Aug 21 11:48:53 2019 -0500 afs: Update style in afs_tokens.c Fix a few style nits and other minor edits in afs_tokens.c. Mark a few functions 'static' that are not referenced outside of that file. 
Change-Id: Icdae1adb8282f96c7ccc6d4d053216b360adc38e Reviewed-on: https://gerrit.openafs.org/13806 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c6eb9375ffa081329d69b9a36b40b8edb199990a Author: Andrew Deason Date: Wed Aug 21 12:37:06 2019 -0500 rx: Update style in rx_opaque.c Fix a few style nits in rx_opaque.c Change-Id: Ia03ba3f95911b791c63b3a07f2ab887063da36a7 Reviewed-on: https://gerrit.openafs.org/13805 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 339167ef1fda899655969f4572ff95271dfdb7cf Author: Andrew Deason Date: Wed Jul 10 15:14:28 2019 -0500 Remove dead code There is a perhaps-surprisingly large amount of code disabled behind directives like '#if 0', '#ifdef notdef', and '#ifdef notyet'. At best, this code is clutter, and at worst some of it is confusing/outdated, and/or confusingly nested inside other preprocessor conditionals. Sometimes this disabled code shows up when grepping the tree, and causes a nuisance when refactoring related areas of code. Get rid of all of it. If anyone ever wants this code back, it can always be restored by reverting portions of this commit. Also delete some comments that clearly refer to the disabled code, and in some cases, adjust the adjacent comments to make sense accordingly. This commit doesn't touch any files in src/external/. 
Change-Id: If260a41257e8d107930bd3c177eddb8ab336f0d1 Reviewed-on: https://gerrit.openafs.org/13683 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 0d6a43e0699fca00bff87c5e16c901e4579d2285 Author: Benjamin Kaduk Date: Sat Apr 12 17:24:04 2014 -0400 Remove a couple more uses of libafsauthent.a Change-Id: Ic49d2f44293c1fbe909b61d7f4c9ac7d5a3636bb Reviewed-on: https://gerrit.openafs.org/11095 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 1b0bb8a7fcbd69d513ed30bb76fd0693d1bd3319 Author: Andrew Deason Date: Thu Jul 18 22:56:48 2019 -0500 LINUX: Make sysctl definitions more concise Our sysctl definitions are quite verbose, and adding new ones involves copying a bunch of lines. Make these a little easier to specify, by defining some new preprocessor macros. Change-Id: I45fc8122b18587f42f52b3d41a1f4c6937ec0f8a Reviewed-on: https://gerrit.openafs.org/13700 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 6e0f1c3b45102e7644d25cf34395ca980414317f Author: Andrew Deason Date: Wed Jul 10 12:42:54 2019 -0500 LINUX: Honor --enable-checking for libafs When we build the kernel module on LINUX, we don't pass in any of our CFLAGS, since the Linux buildsystem itself figures out what flags are needed. However, this means that we don't pass in -Werror when --enable-checking is turned on, so warnings may not cause the build to fail. To fix this, create a new autoconf variable, called CFLAGS_WERROR, that only contains -Werror if --enable-checking is turned on. We then pass that into the Linux module buildsystem, so -Werror is given to the compiler when building our module. 
Change-Id: I0f1ec8b1a8096d10642c67b86314604c20ea2c60 Reviewed-on: https://gerrit.openafs.org/13682 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 13acb6fbefd6c4f4af951270ca07a1a5541052fa Author: Andrew Deason Date: Sun Jul 21 19:21:44 2019 -0500 afs: Free afs_thiscell during shutdown Currently, afs_thiscell can be allocated (via strdup) during client startup, but is never freed. Free it in shutdown_cell() to avoid leaking the memory. Change-Id: I77954ef35f949c8a638ba15615148ab784f7f48f Reviewed-on: https://gerrit.openafs.org/13714 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 82118acb6ed4f6fb90f3d864f4045d9c6bc2a55c Author: Andrew Deason Date: Sun Jul 21 17:58:48 2019 -0500 afs: Introduce shutdown_dynroot() Add a shutdown sequence for dynroot, which frees the afs_dynrootDir and afs_dynrootMountDir blobs, if they exist. Otherwise, we can leak the memory allocated for those blobs. Change-Id: I80fe41a0fcacbd272677ff778cd4ba51399f32f9 Reviewed-on: https://gerrit.openafs.org/13713 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ad1fe5e1a825a3b3f88c04fd84613e4105206443 Author: Andrew Deason Date: Sun Jul 14 22:53:39 2019 -0500 FBSD: Remove unnecessary explicit osi_fbsd_alloc AFS_KALLOC is already defined to be osi_fbsd_alloc on FBSD, so this extra #ifdef here is completely unnecessary. Remove it. Do the same for AFS_KFREE/osi_fbsd_free. Change-Id: I3e42ec433a732402cc9de9ba9c035774ec29c2a5 Reviewed-on: https://gerrit.openafs.org/13708 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d13b647aa392e1d802be1023930a8e1a07fb11ab Author: Andrew Deason Date: Sat Jul 20 23:09:27 2019 -0500 FBSD: Give 0 'rootrefs' to vflush on unmount Currently, in afs_unmount, we give vflush a 'rootrefs' arg of 1, indicating that we hold 1 reference on the root vnode. But ever since commit 6eb1088a (freebsd: properly track vcache references), we drop the ref for the root vnode at the beginning of this function. 
What happens currently in afs_unmount for a normal successful umount is something like this (at least, on FreeBSD 11.2-RELEASE): - We afs_PutVCache the afs_globalVp vcache, reducing its v_usecount and v_holdcnt to 0, and afs_globalVp is set to NULL. - vflush calls afs_root() to get the root vnode, which sees that afs_globalVp is NULL, and so calls afs_GetVCache for the root fid and returns it (and sets afs_globalVp to that vcache), with a v_usecount of 1. - vflush tries to vgonel() all of our vnodes, which calls our afs_vop_reclaim, which calls afs_FlushVCache(). For the root vnode specifically, vflush() sees that v_usecount is nonzero, and so skips calling vgonel() at first, but later calls vgone() on it specifically because we gave a nonzero 'rootrefs'. The resulting afs_FlushVCache() for the root vnode fails, because the root vnode's v_usecount is still 1. Since a failure from afs_vop_reclaim would cause a panic, we just log a warning and try to continue on anyway. - vflush() calls vrele() on the root vnode, right before returning. All of this allows the unmount to proceed, but this means that most of afs_FlushVCache() doesn't actually run for the root vcache, and it means we always log a warning like this on unmount: afs_vop_reclaim: afs_FlushVCache failed code 16 [...] In addition, this means that setting afs_globalVp at the beginning of afs_unmount() is largely pointless, since it gets set to a vcache again near the beginning of vflush(). To avoid all of this, stop lying to vflush about how many references to the root vnode we hold, and just say that we hold 0 references. 
Change-Id: Ib434c5fc48e67c3863fcad41279c3d9e0e0b8c2b Reviewed-on: https://gerrit.openafs.org/13709 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit f5acf1b1bfe940faf0a6f4bd11c55d6c90f60242 Author: Tim Creech Date: Sun Mar 5 18:17:23 2017 -0500 FBSD: Handle F_UNLCK in VOP_ADVLOCK When a_fl->type is F_UNLCK, FreeBSD gives our VOP_ADVLOCK an a_op of F_UNLCK, instead of F_SETLK like we expect. This causes afs_lockctl to return EINVAL, since F_UNLCK isn't a normal fcntl lock op, and so userspace requests to unlock fcntl-style locks always fail. This can be seen, for example, when trying to use sqlite3 to access a database that lives in afs. This F_UNLCK behavior in FreeBSD seems a bit peculiar, but has been around effectively forever (since 4.4BSD-Lite). So just work around it. [adeason@dson.org: minor style adjustments and commit message/comment rewording.] Change-Id: I8bfaff9274e40761aa291930430a08b83b524d1b Reviewed-on: https://gerrit.openafs.org/12579 Reviewed-by: Tim Creech Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit ee7019a7630d01f29fecebd89ca69ad8a37e24e2 Author: Andrew Deason Date: Mon Jul 15 16:24:10 2019 -0500 afs: Fix a few ARCH/osi_vcache.c style errors Most of the ARCH/osi_vcache.c implementations were defining functions like: void osi_foo(args) { /* impl */ } But our prevailing style is: void osi_foo(args) { /* impl */ } Fix them to follow our prevailing style, and fix a couple of the more obvious errors with indentation and goto labels. Change-Id: Ie752ee67aa6acfec3bf9a28d7da41151f95fbbf6 Reviewed-on: https://gerrit.openafs.org/13699 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cba7c62f56f2a98b843fe6f83e22bc03f832e9aa Author: Andrew Deason Date: Mon Jul 15 17:51:41 2019 -0500 afs: Check for invalid afs_fakestat_enable values The only valid values for afs_fakestat_enable right now are 0, 1, and 2.
Check if the given value actually matches one of those, in case we have mismatched libafs/afsd versions, and future code adds new values. Return EINVAL and log a message if we're given an unknown value. Change-Id: I36ad4263e7e3ab311f6edb97a9c48edc035f6753 Reviewed-on: https://gerrit.openafs.org/13698 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ca472e66fb97572784be429ec264e0e38d1d546b Author: Andrew Deason Date: Tue Aug 14 15:54:29 2018 -0500 LINUX: Turn on AFS_NEW_BKG AFS_NEW_BKG allows libafs to request the afsd background daemon processes to do certain userspace operations. This is currently only used on DARWIN for handling EXDEV file moves, but this framework can be useful on LINUX, as well. So, turn it on for LINUX. This commit does not introduce any new background operations for LINUX to actually use; we're just turning on the new framework. Future commits will introduce new background operations. Change-Id: I5d371f85b87899ce6ab2d5e520954a893679d37e Reviewed-on: https://gerrit.openafs.org/13284 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d29ae454adfd135ca434d6d94968b5929efc8e46 Author: Andrew Deason Date: Wed Jul 10 16:24:11 2019 -0500 afs: Remove reference to nonexistent function The real lie here is that TellALittleWhiteLie exists in afs_vcache.c. That has never been true, ever since OpenAFS 1.0. Change-Id: I5ba121db5b4f0bbe7a37054a3d2d8c46f6c49c0a Reviewed-on: https://gerrit.openafs.org/13697 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 3b0a9ff6af68c88d656aefe2242f12a7a9e04969 Author: Andrew Deason Date: Wed Jul 10 12:42:44 2019 -0500 afs: Remove useless afs_GetVCache arguments The 'avc' argument in afs_GetVCache has never been used, all the way back to OpenAFS 1.0. The 'cached' argument was set correctly, but none of its callers ever looked at the result of 'cached'. Remove these useless arguments. 
afs_LookupVCache and afs_GetRootVCache also had the same 'cached' argument, which was also never used by callers. Remove it for those, as well. Change-Id: I3536259f26536acc02fbb058787f417bf0f50b9a Reviewed-on: https://gerrit.openafs.org/13681 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2b7af1243f46496c0b5973b3fa2a6396243f7613 Author: Cheyenne Wills Date: Fri Aug 9 14:25:03 2019 -0600 LINUX 5.3.0: Use send_sig instead of force_sig Linux 5.3.0 commit 3cf5d076fb4d48979f382bc9452765bf8b79e740 "signal Remove task parameter from force_sig" (part of siginfo-linus branch) changes the parameters for the Linux kernel function force_sig. See LKML thread starting at https://lkml.org/lkml/2019/5/22/1351 According to the LKML discussion and the above commit message force_sig is only safe to deliver a synchronous signal to the current task. To send a signal to another task, we're supposed to use send_sig instead, which has been available since at least linux 2.6.12-rc12. Currently, rx_knet calls force_sig to kill the rxk_ListenerTask. With the Linux 5.3.0 kernel, this module fails to compile due to the above noted changes. Replace the force_sig call with send_sig. In order to use send_sig, the rxk_listener thread must allow SIGKILL and during shutdown (umount) SIGKILL must be unblocked for the rxk_listener thread. Note that SIGKILL is initially blocked on rxk_listener and is only unblocked when shutting down the thread. Having the signal blocked is sufficient to prevent unwanted signals from reaching the rxk_listener thread during normal operation. 
Change-Id: I0c31d66f4ecd887ff9253ba506565592010e8bcb Reviewed-on: https://gerrit.openafs.org/13753 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 02d82275c17284d04629282aa374bb39f511c989 Author: Cheyenne Wills Date: Thu Aug 8 16:53:13 2019 -0600 LINUX 5.3.0: Check for 'recurse' arg in keyring_search Linux 5.3.0 commit dcf49dbc8077e278ddd1bc7298abc781496e8a08 "keys: Add a 'recurse' flag for keyring searches" adds a new parameter to Linux kernel keyring_search function. Update the call to keyring_search to include the recurse parameter if available. Setting the parameter to true (1) maintains the current search behavior. Change-Id: I54b7ed686bf1fb4c42789e5d251ae76789e9fc88 Reviewed-on: https://gerrit.openafs.org/13752 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason commit e3dbd8a5886734f6390126e155cc259b0de5af51 Author: Cheyenne Wills Date: Thu Aug 8 12:07:51 2019 -0600 rxkad: ticket5.c fix typo in #if statement commit 98ca332c4a5ac9e5687fb4fe21b350134bc74d1b (rxkad: v5der.c format truncation warnings) contains a typo in the test for clang (_clang instead of __clang__) Correct the typo in the #if statement to test for __clang__ Change-Id: I0dbe603072740fcf2fb2cb2cea464a48009fee74 Reviewed-on: https://gerrit.openafs.org/13754 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cc7f942a81a3bbdc8154f511d054a2a018b39ce5 Author: Andrew Deason Date: Wed Jul 10 23:40:55 2019 -0500 LINUX: Disable kernel fortuna large frame errors The rand-fortuna.c we get from Heimdal's hcrypto currently sometimes causes a warning on LINUX when building in the kernel, because fortuna_reseed() has a (potentially) large stack size: .../src/libafs/MODLOAD-.../rand-fortuna-kernel.c:549:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Currently this does not cause the build to fail, even with --enable-checking, since -Werror is not given in the CFLAGS when building our kernel module. 
But if -Werror is passed in CFLAGS (in a future commit), this would cause the build to fail. Since this is an external source file, we cannot change it directly. At least for now, just prevent this warning from breaking the build by passing -Wno-error=frame-larger-than= into the CFLAGS for that file. Change-Id: Ieefdf2dbc318fdcd559435e5f329eef5cf9bb9ba Reviewed-on: https://gerrit.openafs.org/13684 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit bf24b301a10dcb5710a98e58252213bd72c6f352 Author: Cheyenne Wills Date: Fri Aug 2 10:31:13 2019 -0600 restorevol: replace snprintf with asprintf GCC is generating format-truncation warnings. With newer levels of gcc (e.g. gcc8) and --enable-checking these warnings result in errors and failed builds. In addition clang8 static analysis tools are reporting memory leaks. Replace snprintf with asprintf and eliminate some of the large work buffers that are being placed on the stack. In order to correct some of the format-truncation errors the size of the buffers grew significantly (e.g. gcc is reporting the need to resize some of the buffers from 256 bytes to 4K in order to eliminate the warnings). Ensure allocated work buffers are freed before function return. Obtained a clean build with gcc9/clang8 with --enable-checking and a clean scan-build report with clang8. Change-Id: Ie8e22fdff2e0ba6494b1b449f413ecbe38f367bd Reviewed-on: https://gerrit.openafs.org/13494 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e6b97b337bc97fdb1c8e4f1a0572c62dfc82d979 Author: Andrew Deason Date: Mon Jul 29 18:17:21 2019 -0500 afs: Skip IsDCacheSizeOK for CDirty/VDIR IsDCacheSizeOK currently can incorrectly flag a dcache as corrupted, since the size of a dcache may not match the size of the underlying file in a couple of RW conditions: - If someone is writing to a file beyond EOF, the intermediate 'sparse' area may be populated by 0-length dcaches until the data is written to the fileserver.
- Directories may be modified locally instead of being fetched from the fileserver, which can sometimes result in a directory blob of differing sizes. To avoid false positives detecting dcache corruption, just skip the IsDCacheSizeOK check for directories, and any file with pending writes (CDirty). Also add some extra information to the logging messages when this "corruption" is detected, so false positives may be more easily detected in the future. Change-Id: I5130287d0de791cffea85aaec5a0899d5c8d092e Reviewed-on: https://gerrit.openafs.org/13747 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d6262c3f391e4176bec207fd0e8d4d6091a7f4e2 Author: Cheyenne Wills Date: Fri Jul 26 14:57:02 2019 -0600 gtx: Avoid incomplete function type in casts clang complains that these casts contain an incomplete function type (since the function argument is omitted rather than declared to be void). Since we just need the cast to pointer type, let the compiler do it implicitly and pass stock NULL, rather than trying to force a cast to function-pointer type. Change-Id: Ia2a4cf61d51faef3b4cd469133d9143ca5f57185 Reviewed-on: https://gerrit.openafs.org/13726 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5792e0211be275cf79d10e8c5f6ab2a14493e07a Author: Yadavendra Yadav Date: Fri Jul 26 19:59:25 2019 +0530 LINUX: Avoid re-taking global lock in afs_dentry_iput The “dput” function can internally call dentry_iput, which in turn calls afs_dentry_iput. If the global lock is held when “dput” is called, afs_dentry_iput will try to take the global lock again, which results in a deadlock. To avoid this deadlock, check whether the global lock is already held when afs_dentry_iput is called, and if so, don’t try to lock it again.
This issue was partially fixed in commit 0dac4de8 (Linux: drop GLOCK before calling dput) Change-Id: I71f18c58d5254f0cf0c68ef04c22268ed70dd50f Reviewed-on: https://gerrit.openafs.org/13725 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 104a9d357da9452305694e97752fe6313fcd22c0 Author: Michael Meffie Date: Wed Jul 24 11:39:43 2019 -0400 build: fix --enable-rxgk help format Move the dnl macros out of the AC_ARG_ENABLE to fix the formatting of the --enable-rxgk help string. Before this commit: $ ./configure --help | grep -C2 rxgk --enable-kauth install the deprecated kauth server, pam modules, and utilities (defaults to disabled) --enable-rxgk Include experimental support for the RXGK security class (defaults to disabled) --disable-strip-binaries After this commit: $ ./configure --help | grep -C2 rxgk --enable-kauth install the deprecated kauth server, pam modules, and utilities (defaults to disabled) --enable-rxgk Include experimental support for the RXGK security class (defaults to disabled) --disable-strip-binaries Change-Id: Iaf6695643f11c7b636e3fba33ee7161e21df23a6 Reviewed-on: https://gerrit.openafs.org/13722 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4a57cc54dfb6789a86ee735360ee44209c1a901a Author: Cheyenne Wills Date: Tue Jul 2 16:58:28 2019 -0600 ptserver: testpt.c format-overflow warning GCC 9 introduced new warnings/errors and is flagging a sprintf with a format-overflow warning. With --enable-checking, this error is causing testpt.c to fail during compile. Change the buffer size from 16 bytes to PR_MAXNAMELEN+1 and use snprintf instead of sprintf. Generate an error message and exit if snprintf truncates the string.
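The snprintf truncation check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from testpt.c: the helper name format_name, the macro EXAMPLE_MAXNAMELEN (a stand-in for PR_MAXNAMELEN), and the format string are all hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for PR_MAXNAMELEN; the real value is defined by ptserver. */
#define EXAMPLE_MAXNAMELEN 64

/*
 * Format "<prefix><id>" into buf.
 * Returns 0 on success, or -1 if snprintf failed or the result did not
 * fit. snprintf returns the length it wanted to write (excluding the
 * terminating NUL), so a value >= buflen means the output was truncated.
 */
static int
format_name(char *buf, size_t buflen, const char *prefix, int id)
{
    int n = snprintf(buf, buflen, "%s%d", prefix, id);

    if (n < 0 || (size_t)n >= buflen) {
        return -1;
    }
    return 0;
}
```

A caller would size its buffer as EXAMPLE_MAXNAMELEN + 1 and treat a -1 return as fatal, printing an error and exiting rather than continuing with a truncated name.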
Change-Id: I30fbe0971ba3e05dc6ac61e7b2ded2fd1777374d Reviewed-on: https://gerrit.openafs.org/13663 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 41ee558329560bce037ad2860282d8b49aa11b2d Author: Cheyenne Wills Date: Fri Jul 26 07:59:33 2019 -0600 uss: uss_procs.c format-overflow warning GCC 9 introduced new warnings/errors and is flagging a sprintf with a format-overflow warning. With --enable-checking, this error is causing uss_procs.c to fail during compile. A file name with the full path is being composed and the size of the buffer was triggering a possible format-overflow warning/error. Use asprintf to allocate the buffer dynamically instead of using a buffer sitting on the stack (reducing the stack requirements by 2K). Produce a new error message if asprintf returns an error. Change-Id: Ib233052aab9c3bc1ec24dac7e70f97933b478d3e Reviewed-on: https://gerrit.openafs.org/13664 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f938f5f248a3cb3f7ac871f5ef45a0e2d043706b Author: Cheyenne Wills Date: Tue Jun 25 15:39:40 2019 -0600 ptserver: Incorrect variable used to print error msg In testpt.c the variable cdir is used to print the name of the temporary dir. However at this point in the code cdir is NULL and the variable tmp_conf_dir contains the actual name that should be used in the error message. Flagged as an error when --enable-checking is on and using GCC 9. Change-Id: I0c854fd89c0bae1c313ae1f382e58fd410b719e6 Reviewed-on: https://gerrit.openafs.org/13662 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 98ca332c4a5ac9e5687fb4fe21b350134bc74d1b Author: Cheyenne Wills Date: Mon Jul 15 08:38:24 2019 -0600 rxkad: v5der.c format truncation warnings GCC 7 is producing new warnings due to better compile time analysis. With --enable-checking v5der.c is failing with 2 errors due to possible format-truncation in some snprintf calls.
The format strings are being used to format a date and time values from a tm structure. The actual warnings/errors are being triggered from arithmetic being performed on the year and month members of the structure. The resulting values should not exceed the format lengths, but the compilers are still flagging the statements. v5der.c is part of the heimdal package that is pulled into the openafs source tree. v5der.c is not compiled directly but is #included in ticket5.c Update ticket5.c to change the severity of the format-truncation diagnostic to a warning if using GCC 7 (or higher). Note: since v5der.c is pulled from an external source (heimdal), any changes to update v5der.c directly would need to be performed upstream. Change-Id: Icda0d86444f505604abe9fa1cc2450d7538be7ef Reviewed-on: https://gerrit.openafs.org/13661 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit eaae6eba8ca10ba7a5a20ee0d1b5f91bc2bac6c6 Author: Benjamin Kaduk Date: Thu Jul 11 21:07:35 2019 -0700 aklog: require opt-in to enable single-DES in libkrb5 Since the introduction of rxkad-k5 in response to OPENAFS-SA-2013-003, it is not strictly necessary to configure libkrb5 to allow weak crypto in order to obtain an AFS token. A sufficient amount of time has passed since then that it is safe to assume that the default behavior is the more-secure one, and require opt-in for the insecure behavior. To indicate that the use of single-DES is quite risky, add the "-insecure_des" argument to both klog and aklog, to gate the preexisting calls that enable weak crypto/single-DES. These calls, and the -insecure_des option, may be removed entirely in a future commit. 
Change-Id: If175d0f95f0ede0f252844086a2a023da5580732 Reviewed-on: https://gerrit.openafs.org/13689 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 5f48367f2bd5bf1c0e689c79508177b649b9113b Author: Andrew Deason Date: Mon Mar 25 16:33:39 2019 -0500 afs: Avoid non-dir ENOENT errors in afs_lookup Historically, there have been many subsystems in libafs that can generate ENOENT errors for a variety of reasons. In addition to the expected case where we look up a name that doesn't exist, other scenarios have caused ENOENT error codes to be generated, such as: internal inconsistencies, I/O errors, or even abort codes from the network. When one of these scenarios causes an ENOENT error code during afs_lookup() even though the target name does actually exist, it can be confusing to a user, or even result in incorrect application behavior. On Linux in particular, ENOENT results from a lookup are cached in negative dcache entries, and so can cause future lookups for the same name to yield ENOENT errors. Various commits have tried to avoid this abuse of the ENOENT error code, such as 2aa4cb04 (afs: Stop abusing ENOENT). But we cannot prevent receiving ENOENT abort codes from the network, and mistakes in the future may cause more scenarios incorrectly yielding ENOENTs. However, in afs_lookup, we do know that legitimate ENOENT errors can only occur in one situation: when we have a valid directory blob, and the afs_dir_Lookup() operation itself returns an ENOENT error for the target name. For all other areas of afs_lookup(), we know that an ENOENT error is not legitimate, since we may not be sure if the target name exists or not. So to proactively avoid incorrect ENOENT results, prevent afs_lookup from returning ENOENT, except in the specific code path where afs_dir_Lookup is called.
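The idea of letting ENOENT through only from the directory lookup path can be sketched as below. This is a hedged illustration, not the code in afs_lookup: the function name filter_lookup_error, the from_dir_lookup flag, and the choice of EIO as the replacement error are assumptions made for the example.

```c
#include <errno.h>

/*
 * Hypothetical filter applied to the final result of a lookup.
 * 'code' is the error about to be returned; 'from_dir_lookup' is nonzero
 * only when that error came from searching a valid directory blob.
 */
static int
filter_lookup_error(int code, int from_dir_lookup)
{
    if (code == ENOENT && !from_dir_lookup) {
        /*
         * We cannot tell whether the name exists, so do not claim that
         * it is missing. EIO is an assumed substitute for illustration.
         */
        return EIO;
    }
    return code;
}
```

With a filter like this, a negative entry is only cached when the directory contents really say the name is absent.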
Change-Id: I1c91600fd38b1179f02fa6eadea631b6eb8edb6d Reviewed-on: https://gerrit.openafs.org/13537 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit fa15fbda0aa0c3810695d9b867d3258b60e76b7c Author: Andrew Deason Date: Tue Jul 24 23:22:01 2018 -0500 LINUX: Minor osi_vfsop.c cleanup - Fix the formatting on afs_mount/afs_get_sb definitions - Declare a couple of functions static that are not referenced outside of this file Change-Id: I4880c27dbe2acd296262d29f91736d0028a029c0 Reviewed-on: https://gerrit.openafs.org/13282 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 397199a1992d74d8b7e693a2d76df836f7a70080 Author: Andrew Deason Date: Tue Aug 14 15:53:20 2018 -0500 afs: Add AFS_USPC_SHUTDOWN bkg request When AFS_NEW_BKG was added, the kernel module indicated to the relevant afsd process that it's time to shut down by returning -2. This works on DARWIN, but it's difficult to make this work on all platforms, because of the different way that platforms handle error codes from our pioctls and other AFS syscalls. Specifically, on LINUX, negative error codes are assumed to be negative errno codes, and so returning -2 from the syscall handler means we return -1 to userspace, with errno set to 2 (ENOENT). Getting this to work consistently across platforms is probably more trouble than it's worth, so instead of relying on specific return codes from the syscall, just add a new background daemon operation called AFS_USPC_SHUTDOWN, which just tells the background daemon to exit. Change-Id: I00b245c8f734dc9e49d6b4268cd0f6a4f1896894 Reviewed-on: https://gerrit.openafs.org/13281 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 79dffe29c8a0ec55c4231a18077efdfa7c1edf53 Author: Cheyenne Wills Date: Fri Jul 5 08:23:10 2019 -0600 libadmin: overlap warning in strcpy with gcc9 GCC 9 with --enable-checking produces a new warning/error in afs_utilAdmin.c associated with a strcpy with the potential of an overlap.
The index used is signed which triggers the new warning. The source and target of the strcpy are contained within the same higher level structure. Change the variable 'index' from signed to unsigned to resolve the warning/error. Change the variable 'total' in the same structure to unsigned to be consistent with its usage with 'index'. Change-Id: Icaa99e278a5d8262caeaec0b2723e826a57554aa Reviewed-on: https://gerrit.openafs.org/13660 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7c60a0fba11dd24494a5f383df8bea5fdbabbdd7 Author: Andrew Deason Date: Thu Jan 17 16:21:25 2019 -0600 afs: Check dcache size when checking DVs Currently, if the dcache for a file has nonsensical length (due to cache corruption or other bugs), we never notice, and we serve obviously bad data to applications. For example, the vcache metadata for a file may say the file is 2k bytes long, but the dcache for that file only has 1k bytes in it (or more commonly, 0 bytes). This situation is easily detectable, since the dcache and vcache refer to the same version of the same file (when the DVs match), and so we can check if the two lengths make sense together. So to avoid giving bad data to userspace applications, perform a sanity check on the lengths at the same time we check for DV matches (to see if the dcache looks "fresh" and not stale). If the lengths do not make sense together, we just pretend that the dcache is old, and so we'll ignore it and fetch a new copy from the fileserver. Also check the size of the data fetched from the fileserver for a newly-fetched dcache in afs_GetDCache, to avoid returning a bad dcache if the dcache isn't already present in the cache.
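A length sanity check of the kind described above might look like the following sketch. It is not the libafs implementation: the function name chunk_size_ok, its parameters, and the rule that a chunk holding fewer bytes than the file length implies is inconsistent are assumptions chosen to match the 2k/1k example in the message.

```c
#include <stdint.h>

/*
 * Hypothetical sanity check: does the number of bytes held by a cached
 * chunk make sense for a file of length 'file_len'?
 *   chunk_off  - file offset where this chunk starts
 *   chunk_max  - maximum number of bytes a chunk can hold
 *   cached_len - bytes actually present in the cached chunk
 * Returns 1 if the lengths are consistent, 0 if the chunk holds fewer
 * bytes than the file length says it should.
 */
static int
chunk_size_ok(uint64_t file_len, uint64_t chunk_off,
              uint64_t chunk_max, uint64_t cached_len)
{
    uint64_t expected;

    if (chunk_off >= file_len) {
        /* The chunk lies entirely past EOF, so no data is expected. */
        expected = 0;
    } else {
        expected = file_len - chunk_off;
        if (expected > chunk_max) {
            expected = chunk_max;
        }
    }
    return cached_len >= expected;
}
```

A caller that gets 0 back would treat the cached chunk as stale and fetch a new copy, just as it would for a data version mismatch.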
Change-Id: I338a4962322d8c0d06d1ea25fd7d252b5f83dc9f Reviewed-on: https://gerrit.openafs.org/13436 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit eed79e2d28dcab889d01869e57dec14fd30d421c Author: Andrew Deason Date: Wed Jul 3 12:55:53 2019 -0500 LINUX: Unlock page on afs_linux_read_cache errors When afs_linux_read_cache is called with a non-NULL task, it is responsible for unlocking 'page' (unless it's unlocked in a background task), even if we encounter an error. Currently we almost always do unlock the given page for a non-NULL task, but if we manage to hit one of the codepaths that 'goto out', we skip over the unlock_page() call near the end of the function, and the page never gets unlocked. As a result, the page stays locked forever. That generally means any future access to the same file will block forever, and when we try to flush the relevant vcache, we will block waiting for the page lock while holding GLOCK. (This can happen via the background daemon via e.g. afs_ShakeLooseVCaches -> osi_TryEvictVCache -> afs_FlushVCache -> osi_VM_FlushVCache -> vmtruncate -> ... -> truncate_inode_pages_range -> __lock_page on Linux 2.6.32-754.2.1.el6.) This quickly brings the whole client to a halt until the machine can be forcibly rebooted. To solve this, just move the 'out:' label to before the page unlock. Add a few locking-related comments around the relevant code to help explain some relevant details. The relevant code has changed and been refactored over the years, but this problem has probably existed ever since this code was originally converted to using the readpage() of the underlying cache fs, in commit 88a03758 (Use readpage, not read for fastpath access). 
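The effect of moving the 'out:' label to before the unlock can be sketched in plain userspace C. This is a simplified illustration, not the kernel code: struct example_page, example_unlock, and example_read are hypothetical stand-ins, and the non-NULL task condition is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for a locked page. */
struct example_page {
    int locked;
};

static void
example_unlock(struct example_page *pg)
{
    pg->locked = 0;
}

/*
 * Read into a locked page. Every early exit jumps to 'out', and 'out'
 * sits before the unlock call, so the page is unlocked on both the
 * success path and the error paths.
 */
static int
example_read(struct example_page *pg, int simulate_error)
{
    int code = 0;

    if (simulate_error) {
        code = EIO;
        goto out;
    }

    /* ... the actual read would happen here ... */

 out:
    example_unlock(pg);
    return code;
}
```

Had the label been placed after example_unlock(), the error path would return with the page still locked, which is the hang described in the message.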
Change-Id: If7e882ed54ca93ad6b9fdda938c606b241236241 Reviewed-on: https://gerrit.openafs.org/13672 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0d8ce846ab2e6c45166a61f04eb3af271cbd27db Author: Andrew Deason Date: Thu Jan 17 15:45:36 2019 -0600 afs: Introduce afs_IsDCacheFresh Numerous places in libafs check the DV of a dcache against the DV of the vcache for the same file, in order to check if the dcache is up to date and can be used. Consolidate all of these checks into a new function, afs_IsDCacheFresh, to make it easier for future commits to alter this logic. This commit should have no visible impact; it is just code reorganization. Change-Id: Iedc02b0f5d7d0542ab00ff1effdde03c2a851df4 Reviewed-on: https://gerrit.openafs.org/13435 Reviewed-by: Benjamin Kaduk Tested-by: Andrew Deason commit fb9de9e5fd4822df043a0d46e6a1101df2e08b85 Author: Andrew Deason Date: Thu Nov 15 12:37:16 2018 -0600 afscp: Add -l option Add the -l option to afscp, to "loop" the given FetchData/StoreData request over and over. When using this mode, we alternate between using a couple of rx calls, to avoid getting slowed down by rx BUSY packets when we start a new call on the same channel too quickly. Change-Id: I90ee8e9804a0bf59ff654398b1fe6e46a99a3062 Reviewed-on: https://gerrit.openafs.org/13657 Reviewed-by: Benjamin Kaduk Tested-by: Andrew Deason commit b0278994826f6bd1dfebc39f26282b8fbdadf1a0 Author: Mark Vitale Date: Wed May 22 22:50:00 2019 -0400 auth: make PGetTokens2 work with 3-char cellnames PGetTokens2 accepts two different types of input: - an integer 'iterator' to request the nth token set for a user - a string cellname to request the user's token set for that cell Unfortunately, it distinguishes between these by assuming if the input length is sizeof(afs_int32) (4 bytes), it must be an integer. This assumption is incorrect if the cellname is three (3) characters long plus a nul terminator. 
The result is that the cellname string is interpreted as a very large "n"; the subsequent search for the user's "very-large-nth-token" fails, making it appear that the user has no valid token for this cell. Improve on this heuristic by double-checking any putative integer input. If it is actually a 3-character string, then process the input as a cellname instead. Introduced by commit 5ec5ad5dcca84e99e5f55987cc4f787cd482fdde 'New GetToken pioctl'. While here, add doxygen comments. Change-Id: Ifa226fa1c35b95bc32642870f73359f97a9f1d61 Reviewed-on: https://gerrit.openafs.org/13599 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason commit 95ae30c30d98a3219fd021e0ed83200c1b6c266f Author: Mark Vitale Date: Wed May 22 23:03:11 2019 -0400 auth: eliminate pointless retries in ktc_ListTokensEx ktc_ListTokensEx is an iterator to provide the names of each cell for which a user has a token set. It does this by looking for the 1 through nth token set for a given user. However, as currently implemented, it always continues searching up to the 100x safety limit even when there are no more token sets for the user. Instead, return immediately when VIOC_GETTOK2 returns EDOM (no more tokens for this user). Introduced by commit a86ad262d2a8be36f43ab0885a84dde37ddfc464 'auth: Add the ktc_ListTokensEx function'. Change-Id: I880edc80fc6c5580e5919b74b0b561317a1455f0 Reviewed-on: https://gerrit.openafs.org/13598 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4eeed830fa31b7b8b5487ba619acbc8d30642aaa Author: Andrew Deason Date: Wed Jun 26 17:03:03 2019 -0500 afscp: Link against opr/roken/hcrypto Link afscp against libopr, libroken, and libafshcrypto, so afscp can be built again. 
Change-Id: I43ac3a8e7ed1ff012f4ae48ed6b81f5d0cd1d590 Reviewed-on: https://gerrit.openafs.org/13656 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f5f59cd8d336b153e2b762bb7afd16e6ab1b1ee2 Author: Cheyenne Wills Date: Tue Jun 25 10:40:53 2019 -0600 util: serverLog using memory after free clang's scan-build detected a "use of memory after it is freed" condition. The function OpenLogFile frees the variable ourName before creating a duplicate of the name passed to it. However there is a call that uses ourName as the parameter: OpenLogFile(ourName). This results in freeing ourName then doing a strdup of the same memory location. Test the passed parameter and if it's the same as ourName already skip the free and strdup. This bug was introduced in commit 340ec2f79208ee21c3130c4b1c13995947ce426c "util: allocate log filename buffers" Change-Id: I770008b074e0003c7c1532128f8322da811d6fcc Reviewed-on: https://gerrit.openafs.org/13659 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1210a8d6d96db2d84595d35ef81ec5d176de05e8 Author: Andrew Deason Date: Fri Jun 28 14:14:48 2019 -0500 LINUX: Run the 'sparse' checker if available The Linux kernel module buildsystem supports running an external tool (by default, the 'sparse' tool) during the build to run additional static checks on the source code to flag various warnings. Tell the kernel build to run such a tool, if 'sparse' is installed. This causes various new warnings in the build, such as: CHECK /.../src/libafs/MODLOAD-4.9.0-8-amd64-MP/afs_tokens.c /.../src/libafs/MODLOAD-4.9.0-8-amd64-MP/afs_tokens.c:73:1: warning: symbol 'afs_FreeOneToken' was not declared. Should it be static? /.../src/libafs/MODLOAD-4.9.0-8-amd64-MP/afs_tokens.c:160:1: warning: symbol 'afs_IsTokenExpired' was not declared. Should it be static? /.../src/libafs/MODLOAD-4.9.0-8-amd64-MP/afs_tokens.c:187:1: warning: symbol 'afs_IsTokenUsable' was not declared. Should it be static? 
None cause the build to fail currently, but are just printed for potential further investigation. To control detecting 'sparse', add the --with-sparse configure option and SPARSE configure variable. Default to checking if sparse is available, and enabling it if so. Further information on using sparse in the Linux kernel is available in Documentation/sparse.txt in the Linux tree. Using 'sparse' during the build was suggested by yadayada@in.ibm.com. Change-Id: I57944d792ba1c8093196a8b335a12dfa741b119b Reviewed-on: https://gerrit.openafs.org/13665 Reviewed-by: Cheyenne Wills Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3f0b9907d12c00725548dbaf84fee3e033cb974c Author: Pat Riehecky Date: Tue Jun 12 13:55:56 2018 -0500 afs: test condition mismatch resolved While it is unexpected, it is possible for the two disconnected flags to get out of sync, resulting in a path where an undefined variable is used. (via cppcheck) Change-Id: I995b402e73c2c330485050dd2594a62fe67d1bca Reviewed-on: https://gerrit.openafs.org/13207 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fbe2a03aa69bc19768302685d902a25e4d6e157a Author: khm Date: Tue Jun 25 12:51:21 2019 -0700 add dkms dependency in Red Hat unit file Currently, there is no explicit relationship between OpenAFS and dkms. If dkms needs to rebuild the kernel module, OpenAFS will fail to mount because modprobe will not load the module. This change specifies that OpenAFS should run after dkms if dkms is present.
Change-Id: I104cb3780bbc1196cf36852f094ca07c80279d01 Reviewed-on: https://gerrit.openafs.org/13654 Tested-by: BuildBot Reviewed-by: Michael Laß Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 877d9d79a32b9e81911cb567f844b11c693229f0 Author: Andrew Deason Date: Tue Oct 30 15:41:22 2018 -0500 aklog: Avoid misleading AFSCELL message Currently, if the AFSCELL environment variable is set, aklog (and other libauth-using utilities) print out a message when afsconf_GetLocalCell is called: Note: Operation is performed on cell env.example.com However, this message is also printed (with the AFSCELL cell) when aklog is given the -cell command-line argument, even though aklog actually uses the cell given on the command line. For example: $ AFSCELL=env.example.com aklog -cell cli.example.com -d Note: Operation is performed on cell env.example.com Authenticating to cell cli.example.com (server srv1.example.com). [...] libauth will normally not print the "Operation" message if we're not using the default cell, but it determines this by checking if someone called afsconf_GetCellInfo before calling afsconf_GetLocalCell. And currently, aklog calls afsconf_GetLocalCell before afsconf_GetCellInfo, so the message gets printed because libauth has no way of knowing that we're actually using a different cell. klog gets around this by making an additional ignored call to afsconf_GetCellInfo before afsconf_GetLocalCell, but we can fix this in aklog by just changing the order of the calls. So, just call afsconf_GetCellInfo first; if we're using the local cell, we can just give a NULL cell parameter, instead of looking up the local cellname first. 
Change-Id: I53469ee93d6e88632a944a87a031e0ffa4ede584 Reviewed-on: https://gerrit.openafs.org/13371 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e14a69cf925172d699c2ff31078f8a634a90747f Author: Andrew Deason Date: Sat Dec 8 15:08:26 2018 -0600 rx: Set listener pthread name When running under pthreads, set the name of the rx listener thread to "rx_Listener". This can be handy when investigating rx performance issues, since it makes it easier to identify which thread is the rx listener. Don't do this for "hot threads", since in that case we could return and stop being a listener thread. We could restore the original thread name, but doing so could have an impact on performance and "hot threads" should always be disabled these days, so don't bother. Change-Id: I24aebd4d7e4266cd06bb1a4314949d85835dfbaa Reviewed-on: https://gerrit.openafs.org/13600 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9d28f7390332c92b3d9e863c6fe70c26db28b5ad Author: Andrew Deason Date: Wed Jun 26 11:47:21 2019 -0500 Move afs_pthread_setname_self to opr Move the functionality in afs_pthread_setname_self from libutil to opr, in a new function opr_threadname_set. This allows us to more easily use the routine in more subsystems, since most code already uses opr. Change-Id: I79d49617a19cd292a3b09ccfd9c9f319355a184e Reviewed-on: https://gerrit.openafs.org/13655 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 99418024276c94da5982d7dad6126a8d53924d7e Author: Andrew Deason Date: Sun Jun 23 17:48:53 2019 -0500 libafs: Create $(DESTDIR)$(KMODDIR) on FBSD inst We rely on bsd.kmod.mk for our actual rules during 'make install', but that tries to install our kernel module into $(DESTDIR)$(KMODDIR), without creating it first.
If the user tries to 'make install DESTDIR=/some/path' and that path doesn't exist, we will fail with something like: make DESTDIR=/home/adeason/git/destdir single_instdir_libafs /usr/bin/install -c -T release -o root -g wheel -m 555 libafs.ko /home/adeason/git/destdir/boot/modules/ install: /home/adeason/git/destdir/boot/modules/: No such file or directory *** Error code 71 To avoid this, add a dependency on the 'install' target which causes our target dir to be created. Change-Id: Icacc507867420265383e411572006df47ef22815 Reviewed-on: https://gerrit.openafs.org/13653 Tested-by: BuildBot Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit 85d70ea953c6fb44f200ed4be13cded7413559b8 Author: Andrew Deason Date: Sun Jun 23 16:25:27 2019 -0500 asetkey: Fix random_key for Heimdal Go through our deref_key_length/deref_key_contents abstractions, so we can compile with Heimdal krb5. Also fix these macros to properly separate the 'key' macro argument, so we can use the macros in these new places. Change-Id: I3ee53bc70494a67ac5463819dc575c8ee37647c9 Reviewed-on: https://gerrit.openafs.org/13652 Tested-by: BuildBot Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit 34fd532e35b6f373304effaa16c9c65062b12cd9 Author: Andrew Deason Date: Wed Aug 1 18:38:51 2018 -0500 DARWIN: Use tb->code_raw for BOP_MOVE Currently, BOP_MOVE communicates its error code to the requestor via the 'retval' field in struct afs_uspc_param, and we assume ptr_parm[0] of the given brequest is for a struct afs_uspc_param. But this is unnecessary, since struct brequest already has fields for error codes; namely, code_raw and code_checkcode. To avoid afs_BackgroundDaemon needing to interpret ptr_parm[0] in this way (and assuming the type of the pointer's target), change BOP_MOVE to just use the code_raw field for error codes, instead of interpreting ptr_parm[0]. This makes it easier to add more AFS_NEW_BKG background operations that do not pass a struct afs_uspc_param in the brequest parameters. 
Change-Id: I90a564468862142777159fbb78234744840b59fb Reviewed-on: https://gerrit.openafs.org/13280 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0c1d124b0b6ea3117885d2bca163170515cb8713 Author: Andrew Deason Date: Mon Aug 20 15:47:13 2018 -0500 rxkad: Update ticket5 from heimdal This updates the rxkad code that we pull from heimdal to heimdal 7.7.0 (heimdal.git commit e1959605bd). This also updates the instructions in README.v5 to accommodate changes in the heimdal tree, and converts ticket5.c to use KRB5_ENCTYPE_* constants instead of ETYPE_* constants (since heimdal has also similarly converted in krb5_asn1.h). This removes a few -Werror=format-truncation warnings that were present in the heimdal code before this commit. README.v5 tweaked in collaboration with kaduk@mit.edu. Change-Id: I5fdaab600b4a1b42658a60259fde3fc9f7dced04 Reviewed-on: https://gerrit.openafs.org/13287 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 54c34d32e884a5bfb2352e7c8767d743ef3e4647 Author: Mark Vitale Date: Wed Jun 12 23:44:32 2019 -0400 afs: remove bogus comment from afs_IsTokenExpired Remove an incorrect comment, introduced with commit adf2e6e827c6caf55247c5e63b88775393156ae5 'Unix CM: Generalise token storage'. No functional change is incurred by this commit. Change-Id: Ie56c4f22a06321c56f62fce9704419ce3c4e7bf2 Reviewed-on: https://gerrit.openafs.org/13640 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 3a5ab19fe04058e002bfea90f8b64fab4676de67 Author: Benjamin Kaduk Date: Fri Apr 19 10:38:24 2019 -0500 afs: add a file-level comment to afs_osidnlc.c This file doesn't currently do a great job of telling the reader what it's used for. Let's give them a hint, especially for the expansion of "DNLC". 
Change-Id: Ie5d1f1162a4b59c479bc2961b33cd696e83bdc3a Reviewed-on: https://gerrit.openafs.org/13557 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 30a6ab30f2451b9788328336dd937a4263f5f5c7 Author: Andrew Deason Date: Tue Feb 26 20:47:00 2019 -0600 ptserver: Check for superuser in WhoIsThisWithName In WhoIsThisWithName, if we don't understand the rx security class being used (such as rxgk), we'll set the calling id to the anonymous user and return an error. But for SYSADMINID specifically, we don't really need to know any security-class-specific details; we just need to know that the caller is the superuser. So add a fallback case to check for that; if we don't understand the calling rx security class, just check if the calling user is RX_ID_SUPERUSER, and use SYSADMINID if so. This allows the ptserver to handle rxgk localauth requests (and, theoretically, localauth requests for any future security classes). Based on a commit from mvitale@sinenomine.net. Change-Id: Ia9bc91fb5a0d9ebf16b32659c9068aa5a9da8401 Reviewed-on: https://gerrit.openafs.org/13508 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 316b862af6b6731f57a21f81b0948f3718b4c9f3 Author: Mark Vitale Date: Mon Feb 11 01:21:08 2019 -0500 ptclient: rxgk support Allow ptclient to use rxgk, with the new -rxgk option. While we're here, also allow the user to specify a security level of 3, to turn on rxkad encryption for non-localauth conns.
Change-Id: I201154c1b5298f31912d8841f8310363e13afa08 Reviewed-on: https://gerrit.openafs.org/13501 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit e5b1e6f1adbe10e366bb4d9c745e90193badc1fb Author: Benjamin Kaduk Date: Sun Apr 13 22:01:59 2014 -0400 Add rxgk client options to vl and pt utilities Add options to use rxgk for outgoing connections to vlserver, vos, ptserver, and pts. For vlserver and ptserver, name the new option -s2scrypt, similar to the existing volserver option -s2scrypt. For vlserver and ptserver, specify 'rxgk-crypt' to turn on rxgk crypt connections for our server-to-server ubik communication. For vos and pts, just name the new option '-rxgk', and allow the user to specify the rxgk level to use ('clear', 'auth', or 'crypt'). The pts code is currently somewhat ill-suited to changing what rx security class and security level we use, but do the best we can without refactoring the whole thing. Change-Id: Iefae46291330d2b5e05b2a2bbaec1b9150b3c892 Reviewed-on: https://gerrit.openafs.org/11105 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit fc7e1700fe84f623fb9163466d24226df00b1a2c Author: Mark Vitale Date: Wed May 22 22:52:10 2019 -0400 pioctl: limit fruitless token searches getNthCell searches the afs_users table for the nth token set belonging to a given user. However, it is impossible for a user to have more than one token set per cell. If the caller specifies a number greater than the total number of cells this cache manager knows about, we know the search will be fruitless. Instead, return early in this case, avoiding both the lock and the search. 
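The early return described above can be sketched as follows. This is a simplified model, not the getNthCell code: struct token_set, find_nth_token_set, and the ncells parameter are hypothetical, and the real function walks the afs_users table under a lock. The point is only that, since a user has at most one token set per cell, a request for the nth set with n greater than the number of known cells can be rejected before any search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified token-set record: one entry per (uid, cell). */
struct token_set {
    int uid;
    int cell;
};

/*
 * Return the index of the nth (1-based) token set owned by uid, or -1 if
 * there is none. ncells is the number of cells known to the cache manager.
 */
static int
find_nth_token_set(const struct token_set *sets, size_t nsets,
                   int uid, int n, int ncells)
{
    size_t i;
    int seen = 0;

    /* At most one token set per cell, so n > ncells can never match. */
    if (n < 1 || n > ncells)
        return -1;

    for (i = 0; i < nsets; i++) {
        if (sets[i].uid == uid && ++seen == n)
            return (int)i;
    }
    return -1;
}
```

In this model, a caller that passes a very large n (for example, a misread cell name) gets its answer without the table being scanned at all.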
Change-Id: I509408d9aaa8f511813c4d82c121e199121bb8f3 Reviewed-on: https://gerrit.openafs.org/13597 Tested-by: BuildBot Tested-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 8d2306e1dae84af9ccbadd2518beaf8543d4413b Author: Andrew Deason Date: Wed May 15 14:35:41 2019 -0500 Add --quiet option to lwptool Add an option to lwptool, called --quiet, to suppress printing the literal commands run. On error, we still print the exact failed command to stderr. For "pretty" V=0 builds, use this new option, to make our lwptool-using compile rules look more like our other compile rules. Change-Id: I3fed6db3205f8de5e275e9b70aba9e1995afd02f Reviewed-on: https://gerrit.openafs.org/13594 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4b6a4ff31a4197504bbcf2d4c14c24dee672d40e Author: Andrew Deason Date: Thu May 16 20:01:17 2019 -0500 Use the ppc64le_linuxXX sysname for ppc64le builds Commit 191e18eb (Open ppc64le_linux sysname space) added the ppc64le_linux26 sysname, but it still must be manually specified when running on ppc64le. Use the ppc64le_linux26 by default on ppc64le, so we can compile without needing to specify an explicit sysname. Change-Id: I5abbdde06622d5f2b067bfd003f9d4cd51c56f1a Reviewed-on: https://gerrit.openafs.org/13593 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 46563f929a851032d785634763963808d6e2bfeb Author: Andrew Deason Date: Thu May 16 16:12:47 2019 -0500 Do not define AFS_SYSCALL for ppc64le_linux26 AFS_SYSCALL is defined to the syscall number we can use for a certain platform (for pioctls and other AFS-specific kernel calls). On many modern platforms, such as Linux, we don't use direct syscalls anymore, instead routing our AFS-specific syscalls through an ioctl, and AFS_SYSCALL is just used as a fallback for compatibility for older OpenAFS releases that might still be using the syscall. 
For new platforms, we have no need for this compatibility code path, since there is no existing code we might need to be compatible with. We should avoid defining AFS_SYSCALL for those, so we can avoid manually-issuing syscalls in more cases. The ppc64le_linux26 platform is a very new platform (introduced in 191e18eb "Open ppc64le_linux sysname space"), and so should not have AFS_SYSCALL defined. So, remove AFS_SYSCALL from ppc64le_linux26's param.h. Change-Id: I7811831b05a17c9428556aca49681cd544da4ff1 Reviewed-on: https://gerrit.openafs.org/13592 Tested-by: BuildBot Reviewed-by: Mark Vitale Tested-by: Andrew Deason Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 191e18ebcee3698a76b55912de0a41111c384128 Author: Nathaniel Filardo Date: Wed May 1 23:01:51 2019 +0100 Open ppc64le_linux sysname space While here, add config/param.ppc64le_linux26.h; it's just like ppc64_linux26.h, except not AFSBIG_ENDIAN. Change-Id: I6671405f829f2bf50b6e8d3355ab9e8aed384c02 Reviewed-on: https://gerrit.openafs.org/13562 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Jeffrey Altman Reviewed-by: Benjamin Kaduk commit 5cd5cd9fa8754a5af346fa6a392363b046316c75 Author: Pat Riehecky Date: Fri Jun 1 16:33:37 2018 -0500 Fix static expressions in conditionals The conditions in these if statements are always true (or always false). Remove the check in cmdebug.c, as it is unnecessary, and fix the check in vlclient.c to actually check for a valid voltype. 
(via cppcheck) Change-Id: Ica7dfc9b81fe8bd0f156f6e4e616ed45e205985a Reviewed-on: https://gerrit.openafs.org/13158 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 13817774518ada28f5fe68e0d00ef5dd00b67b55 Author: Cheyenne Wills Date: Thu Apr 18 09:55:09 2019 -0600 redhat: RHEL8 add elfutils-devel as build dependency for kernel module Building the kernel modules under RHEL8 produces the following error message: Makefile:952: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop. Add elfutils-devel to the BuildRequires in the rpm spec when building rhel >= 8 Add elfutils-devel to the BuildRequires in the rpm spec that openafs-kmodtool produces FIXES 134900 Change-Id: Ie3e03336d9599caa6ceb7879199eab3b12eb971b Reviewed-on: https://gerrit.openafs.org/13560 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 9779dd29e7bd76a2b3b759587d6eb919682dfba0 Author: Andrew Deason Date: Thu Nov 9 12:50:53 2017 -0600 asetkey: add 'add-random' command Add a new command, 'add-random', to allow the creation of a new key with random data. This is helpful for certain rxgk keys, which only need to exist in KeyFileExt and not in any other database (like a krb5 KDC), and so aren't derived from a krb5 keytab. Change-Id: I1f3b27e074b0931deb8645f7550e0b315d82e249 Reviewed-on: https://gerrit.openafs.org/12768 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5120409cc998284f2fb0467c2f88030976140341 Author: Andrew Deason Date: Thu Nov 9 12:47:57 2017 -0600 asetkey: Add new 'delete' command variants The current 'delete' command from asetkey only lets the user delete old-style rxkad keys. Add a couple of new variants to allow specifying the key type and subtype, so the user can delete specific key types and enctypes if they want. 
Change-Id: If0dfaa70ea0b749dadd52a6b7d62fd3ad2b61d18 Reviewed-on: https://gerrit.openafs.org/12767 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 12b46b6af778625a9c360dca61a59fcf30b76fd1 Author: Andrew Deason Date: Fri Sep 28 14:55:56 2018 -0500 afs: Raise osidnlc NCSIZE The current size of the osi DNLC is very small; only 300 entries. Raise it to 4096 entries, to give it some chance of actually helping. In the future, of course, this should be runtime configurable, and we should also raise the hash table size. For now, just raise the number of entries without changing anything else, to try to make sure nothing breaks. With the hash size of 256, this means our hash chains will be at least 16 items long. However, traversing even hundreds of hash items should still be better than frequently hitting the disk cache to find entries, and acquiring more locks, etc. Change-Id: I48f496e8c25fa869ded83e97ff686ed028c923c5 Reviewed-on: https://gerrit.openafs.org/13531 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit e02ae66c7eef1bfc5df9c3e9f2acde3bc3102390 Author: Andrew Deason Date: Mon Apr 1 12:57:42 2019 -0400 doc: Remove one lingering reference to src/mcas Change-Id: I8b137d28d33a805c4aa941cc64a89d6a504fabc6 Reviewed-on: https://gerrit.openafs.org/13539 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 5d0acbbbc0a7bb250886b3040d9e4de05d4fd27f Author: Benjamin Kaduk Date: Tue Aug 1 20:57:52 2017 -0500 Remove src/mcas This lock-free library toolkit is intriguing and may be the subject of future work, but currently nothing uses this code, and these files are just clutter. Remove src/mcas and stop mentioning it in SOURCE-MAP; don't reference it in the rpctests, either.
Reviewed-on: https://gerrit.openafs.org/12682 Tested-by: Benjamin Kaduk Reviewed-by: Mark Vitale Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk (cherry picked from commit bfc5d1ada2f5ce12bfafe65d352982adbefe9911) Change-Id: I98bec6f0a91e4aad05846a6791719cac63050f02 Reviewed-on: https://gerrit.openafs.org/13538 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3d22ce36dcb86df564d4d91ff0e174792b30d68f Author: Pat Riehecky Date: Wed Jun 6 10:01:02 2018 -0500 afsmonitor: avoid double free on exit The afsmonitor may leak memory and do a double free on shutdown when it was started with a non-zero -buffers parameter value. The deallocation of the cm results circular buffer incorrectly frees the base of the array of results instead of each result. The fs buffer clean up got this right. This fixes the clang scan-build warning: afsmonitor.c:461:7: warning: Attempt to free released memory free(tmp_cmlist); ^~~~~~~~~~~~~~~~ [mmeffie: update code and commit message] Change-Id: Ifd4ea5b9b865f04e5cf88560dd8a9dfdbe7e32cb Reviewed-on: https://gerrit.openafs.org/13161 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1b835d1c1a5d4a838ab1344abc6615626a28b715 Author: Andrew Deason Date: Thu Nov 9 00:03:04 2017 -0600 asetkey: Allow rxgk keys Add rxgk support to asetkey. This just allows asetkey to display rxgk keys more prettily, and allows the user to add literal rxgk key data on the command line, or add keytab-derived keys. Change-Id: Ic28fea628614be2b20276631bc7e7c2f85ccc154 Reviewed-on: https://gerrit.openafs.org/12766 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 5505ccbaf74f7d36cea180a65001d31bbc0abea0 Author: Benjamin Kaduk Date: Sun Apr 13 21:38:02 2014 -0400 auth: Add afsconf_ClientAuthRXGK variants Add various afsconf_ClientAuthRXGK* variants, to use local printed rxgk tokens with clear, auth, or crypt levels. 
Also add the flag AFSCONF_SECOPTS_RXGK for afsconf_PickClientSecObj, to let callers of afsconf_PickClientSecObj use rxgk connections. To allow selecting of the "clear" level, add the flag AFSCONF_SECOPTS_ALWAYSCLEAR. And to allow selecting the "auth" level but letting "crypt" be the default for rxgk, add the new flag AFSCONF_SECOPTS_NEVERENCRYPT. Change-Id: Ib27f2799eb927ac5aa71eab94212171344dd93df Reviewed-on: https://gerrit.openafs.org/11104 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0b3bd1b7cdc88ba62c8cd540e8628faa84e33cf9 Author: Andrew Deason Date: Thu Jan 17 00:04:36 2019 -0600 dir: Honor non-ENOENT lookup errors Currently, several places in src/dir/dir.c assume that any error from a lower-level function (e.g. FindItem) means that the item we're looking for does not exist in that directory. But if we encountered some other error, that may not be the case; the directory blob may be corrupt, we may have encountered some I/O error, etc. To detect cases like this, return the actual error code from FindItem &c, instead of always reporting ENOENT. For the code paths that are actually specifically looking for if the target exists (in afs_dir_Create), change our checks to specifically check for ENOENT, and return any other error. Do the same thing for a few similar callers in viced/afsfileprocs.c, as well. FIXES 134904 Change-Id: I41073464b9ef20e4cbb45bcc61a43f70380eb930 Reviewed-on: https://gerrit.openafs.org/13431 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 8b6ae2893b517bd4e008cae94acff70abe4d2227 Author: Andrew Deason Date: Thu Mar 21 15:24:06 2019 -0500 LINUX: Avoid lookup ENOENT on fatal signals Various Linux kernel operations on various Linux kernel versions can fail if the current process has a pending fatal signal (i.e. SIGKILL), including reads and writes to our local disk cache. 
Depending on what and when something fails because of this, some parts of libafs throw an ENOENT error, which may propagate up to callers, and be returned from afs_lookup(). Notably this can happen via some functions in src/dir/dir.c, and previously was possible with some code paths before they were fixed by commit 2aa4cb04 (afs: Stop abusing ENOENT). For the most part, the exact error given to the userspace caller doesn't matter, since the process will die as soon as we return to userspace. However, for ENOENT errors specifically for lookups, we interpret this to mean that the target filename is known to not exist, and so we create a negative dentry for that name, which is cached. Future lookups for that filename will then result in ENOENT before any AFS functions are called. The lingering abuses of the ENOENT error code should be removed from libafs entirely, but as an extra layer of safety, we can just avoid returning ENOENT from lookups if the current process has a pending fatal signal. So to do that, change all afs_lookup() callers in src/afs/LINUX to translate ENOENT to EINTR if we have a pending fatal signal. If fatal_signal_pending() is not available, then we don't do this translation. FIXES 134904 Change-Id: I00f1516c2aa0f45f1129f5d5a44150b7539c31cc Reviewed-on: https://gerrit.openafs.org/13530 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit b9f0b63792270383b23c6a6462cd5f4590db1975 Author: Andrew Deason Date: Sun Mar 4 17:33:47 2018 -0600 Use rxgk in afsconf_BuildServerSecurityObjects In afsconf_BuildServerSecurityObjects, create a server security object for rxgk. Currently, this will only accept printed rxgk tokens, not tokens negotiated via GSSNegotiate. Future commits will add functionality to handle user-negotiated tokens, fileserver-specific creds, etc. 
Change-Id: Ie2bbef0d591641e80bb85240316c4ee5f9f8ff05 Reviewed-on: https://gerrit.openafs.org/12941 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 83eec9093c8a3f177268a9164182e8ba3958dbc8 Author: Benjamin Kaduk Date: Wed Mar 26 06:24:02 2014 -0400 Implement afsconf_GetRXGKKey Also afsconf_GetLatestRXGKKey, as a side effect, since we want to have a single getkey function both for getting encrypting and decrypting keys; a kvno/enctype pair of 0/0 indicates that the "get latest" behavior is desired. Implement both functions in terms of an internal helper that takes as an argument the type of key to look for in the KeyFileExt. We can reuse these helpers wholesale for per-fileserver keys, later. This also requires implementing an ordering on the quality of the different RFC 3961 enctypes (which are stored as the subtype of keys of type afsconf_rxgk). This is subject to debate on the actual ordering, but since the IANA enctype registry changes rarely, just assign a full ordering on the standardized (symmetric!) enctypes. Implement this via a new function, rxgk_enctype_better, in rxgk_crypto_rfc3961.c. Introduce a new header file, rxgk_types.h, so we can avoid including the entire rxgk.h header in cellconfig.p.h. Change-Id: I81389b21238fd6588cc4381b026816005f81a30c Reviewed-on: https://gerrit.openafs.org/11099 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4091b9271b1bfbf27f9d6871aa884df81220861a Author: Ben Kaduk Date: Wed Dec 4 13:03:46 2013 -0500 Add rxgk support to userok Change-Id: I5da2a89532453b6bec61fc87218a61455e39f6f0 Reviewed-on: https://gerrit.openafs.org/10576 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 69e083d4aaf8731049cbedf85ee5ade31277f251 Author: Ben Kaduk Date: Fri Dec 13 18:46:11 2013 -0500 Build rxgk support into libafsrpc Add a dependency on the appropriate $(GSSAPI_LIBS) and link in the librxgk_pic.la helper. 
Careful control of what functions are exposed allows static linking to continue to work when rxgk is disabled, though a stub is needed for the case of rxgk_GetServerInfo, so that there is a symbol present to satisfy the export symbol list. Consumers of libafsrpc.a need not be modified in accordance with this change. Change-Id: I76c0329ba842fb0d4d66534810b114a0813c90a0 Reviewed-on: https://gerrit.openafs.org/10591 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 20b0f5b4d0b55e79e55442978c297663a5e18b76 Author: Benjamin Kaduk Date: Fri Sep 1 17:45:10 2017 -0500 Add rxgk_GetServerInfo stub Provide a stub function that libafsrpc can export when rxgk support is disabled. (It always returns failure, of course.) Change-Id: Id9f816d25c1a8f56995ec185ae83db0924de0010 Reviewed-on: https://gerrit.openafs.org/12721 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ce38ed952962b4bbba80a4d3bff1ee1ac01ca4e4 Author: Andrew Deason Date: Fri Mar 2 00:24:54 2018 -0600 rxdebug: Add rxgk support Change-Id: I6ffeb7b36f41816ca1c3d12bb5e8097dd5d7a3fd Reviewed-on: https://gerrit.openafs.org/12940 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 67da564a5b0acd01fe67829fe28ea808e0d278a4 Author: Ben Kaduk Date: Tue Dec 10 00:09:35 2013 -0500 Implement rxgk client security object routines Change-Id: Ic7e11b02cb1573cfdb6d11d4de9a77ab1c563262 Reviewed-on: https://gerrit.openafs.org/10573 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit cda288a2e4ebbd3c915f946a50fa2b59d7ee12b4 Author: Ben Kaduk Date: Mon Dec 9 22:13:16 2013 -0500 Implement the rxgk server security object routines Provide non-trivial implementations of the security class routines used by the server, along with helpers as necessary. 
The identity supplied in a client's token is given as a list of PrAuthNames; we assume that at most one name is supplied at present, as the meaning of compound identities (and the use of compound identities for keyed cache managers) is not fully specified yet. Convert the PrAuthName to an rx_identity for caching in the server connection state, as the rx_identity type is more compatible with superuser checks on the connection. Also provide an rxgk_GetServerInfo routine which extracts the cached identity, for use in libauth when making superuser checks. This moves our dependency on rx_identity from the private data structures into the public header, so move the nested include accordingly. Change-Id: I0f48b69d4ab758d8a4d76ebfb1daf3009c4fe060 Reviewed-on: https://gerrit.openafs.org/10572 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ae9b90170ffa02f7b65339b3c138709362f27d69 Author: Andrew Deason Date: Tue Mar 12 17:03:09 2019 -0500 rxgk: Avoid calling xdr_destroy on blank xdrs A couple of callers in rxgk_token.c call xdr_destroy(&xdrs) in a cleanup code path; at present the code is fine because we are careful to only jump to the cleanup path from a state where the xdrs are initialized, but this is needlessly fragile (and is an undocumented requirement of the code). Since xdr_destroy() unconditionally looks at xdrs.x_ops->x_destroy, this could cause a NULL dereference if an error is encountered in a future version where the 'xdrs' may be zeroed when the cleanup path runs. Change-Id: I23c1bd09c88238bc602cc92572df4cd2278c69c9 Reviewed-on: https://gerrit.openafs.org/13521 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit aa6661f653d86d4e792587eefbc37705b68e5137 Author: Andrew Deason Date: Tue Mar 12 18:42:42 2019 -0500 rxgk: Do not require gss_pseudo_random We actually do not yet call gss_pseudo_random anywhere in the rxgk codebase. 
We will need this later, so print a warning when we don't have it, but let rxgk build so we can build on platforms without gss_pseudo_random for now (Solaris/SEAM). Change-Id: I1cee935a12caad1ac00717f468d7e6661e0817c9 Reviewed-on: https://gerrit.openafs.org/13520 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit de883869d7ac2af6a640f8cf9f3d8c7c37433ce5 Author: Andrew Deason Date: Fri Feb 1 23:25:02 2019 -0600 auth: Make afsconf_PutTypedKeyList idempotent Currently, if we call afsconf_PutTypedKeyList on a key list, we set the key list to NULL. But then if we call afsconf_PutTypedKeyList on a NULL key list, we segfault because we try to dereference the list. Change afsconf_PutTypedKeyList to be a noop if we give it a NULL list, avoiding a segfault in such a situation. Change-Id: I2c1de0c0a05ab036667031eb0e765933917826a6 Reviewed-on: https://gerrit.openafs.org/13507 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 635594d6cceba6de4e09be5a9e9b908f7d16697d Author: Andrew Deason Date: Wed Mar 13 18:30:43 2019 -0500 rx: Do not ignore RXS_* op errors Several places in rx call an RXS_* security layer operation, but ignore the error code. Though errors for these operations are rare or impossible currently, if they ever do return an error there could be noticeable consequences, like a connection getting an uninitialized challenge nonce, or sending a challenge packet with uninitialized payload. Change these call sites to record and handle the error. Errors from the security class normally mean aborting the entire conn, but for many operations we need to behave differently: - For RXS_DestroyConnection, errors don't make sense, since we're just freeing an object. Change the op to return void, and update our implementations of DestroyConnection to match. - For RXS_GetStats, just clear the relevant stats structure on error instead. 
This change also results in us clearing the stats structure when there is no security class associated with the connection; previously we just reused the same struct data as the previous conn. - For RXS_CreateChallenge, aborting the entire conn is difficult, because some code paths have callers that potentially lock multiple calls on the same conn (rxi_UpdatePeerReach -> TryAttach -> rxi_ChallengeOn -> RXS_CreateChallenge), and aborting our conn requires locking every call on the conn. So instead we just propagate an error up to our callers, and we abort just the call we have. - For RXS_GetChallenge, we cannot abort the conn when rxi_ChallengeEvent is called directly, because the caller will have the call locked. But when rxi_ChallengeEvent is called as an event (when we retry sending the challenge), we can. - For RXS_SetConfiguration, propagate the error up to our caller. Update all rx_SetSecurityConfiguration callers to record and handle the error; all of these are during initialization of daemons, so have them log an error and exit. Change-Id: I138b3e06da00470c7d70c458879cc741d296d225 Reviewed-on: https://gerrit.openafs.org/13522 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2ee35afa339731f6a60f1e5e99ccaf63baa6c891 Author: Stephan Wiesand Date: Fri Mar 22 12:46:17 2019 +0100 Add param.h files and sysnames for FreeBSD 11.2 Thanks to Måns Nilsson for filing the bug. Note that this change differs from the proposed patch in the report, in that it doesn't define the 10.4 symbols in the 11.2 param.h files. FIXES 134850 Change-Id: I83b3a81609c109eef243533b0e1defa3aca0d526 Reviewed-on: https://gerrit.openafs.org/13534 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Stephan Wiesand commit e7ea4781f07b29f7f0fc0b5ba17303bd68022e54 Author: Karl Behler Date: Fri Mar 22 12:22:05 2019 +0100 man-pages: create the man3 subdirectory in prep-noinstall This should fix a build failure reported on the openafs-devel list today.
Change-Id: I227922f78aaa614b73dd1f5c1c61116168fc0b69 Reviewed-on: https://gerrit.openafs.org/13533 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 11cc0a3c4e0d76f1650596bd1568f01367ab5be2 Author: Andrew Deason Date: Sat Mar 2 15:58:00 2019 -0600 afs: Cleanup state on rxfs_*Init errors Currently, rxfs_storeInit and rxfs_fetchInit return early if they encounter an error while starting the relevant fetch/store RPC (e.g. StartRXAFS_FetchData64). In this scenario, they osi_FreeSmallSpace their rock before returning, but they never go through their destructor to free the contents of the rock (rxfs_storeDestroy/rxfs_fetchDestroy), leaking any resources inside that have already been initialized. The only thing that could have been initialized by this point is v->call, so hitting this condition means we leak an Rx call, and means we can report the wrong error code (since we never go through rx_EndCall, we never look at the call's abort code). For rxfs_fetchInit, most code paths call rx_EndCall explicitly, except for the code path where StartRXAFS_FetchData64 itself fails. For both fetches and stores, it's difficult to hit this condition, because this requires that the StartRXAFS_* call fails, before we have sent or received any data from the wire. However, this can be hit if the call is already aborted before we use it, which can happen if the underlying connection has already been aborted by a connection abort. Before commit 0835d7c2 ("afs: make sure to call afs_Analyze after afs_Conn"), this was most easily hit by trying to fetch data with a bad security object (for example, with expired credentials). After the first fetch failed due to a connection abort (e.g. RXKADEXPIRED), afs_GetDCache would retry the fetch with the same connection, and StartRXAFS_FetchData64 would fail because the connection and call were already aborted. In this case, we'd leak the Rx call, and we would throw an RXGEN_CC_MARSHAL error (-450), instead of the correct RXKADEXPIRED error. 
This causes libafs to report the target server as unreachable, due to the negative error code. With commit 0835d7c2, this doesn't happen because we call afs_Analyze before retrying the fetch, which detects the invalid credentials and forces creating a new connection object. However, this situation should still be possible if a different call on the same connection triggered a connection-level abort before we called StartRXAFS_FetchData64. To fix this and ensure that we don't leak Rx calls, explicitly call rxfs_storeDestroy/rxfs_fetchDestroy in this error case, before returning from rxfs_storeInit/rxfs_fetchInit. Thanks to yadayada@in.ibm.com for reporting a related issue and providing analysis. Change-Id: I15e02f8c9e620c5861e3dcb03c42510528ce9a60 Reviewed-on: https://gerrit.openafs.org/13510 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6e5638ac7297701a99ea396dee1df8f56a6a50da Author: Andrew Deason Date: Mon Feb 25 11:35:24 2019 -0600 Remove references to SunOS 4 We already removed support for Solaris versions before Solaris 8, in commit e4c2810f ("Remove support for Solaris pre-8"), but there are still some references to SunOS (meaning SunOS 4) in the tree. This is even older than Solaris (aka SunOS 5), so get rid of these. This commit removes most references to SunOS 4 regarding platform support, and a few comments. This also removes a few comments that were just wrong or nonsensical (e.g. CMAPPED in afs.h is used by other platforms; some comments in platform-specific osi_file.c files referenced SunOS for some reason).
Change-Id: I0dd3176c582409176fd898f9c9539fbd833ea789 Reviewed-on: https://gerrit.openafs.org/13506 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 872902dcf99186864cfcaf01ab945123f2506c6c Author: Andrew Deason Date: Wed Mar 6 23:06:16 2019 -0600 rx: Make rxi_Free(NULL, size) a no-op Commit 75233973 (afs: Make afs_osi_Free(NULL) a no-op) intended to make some of our free abstractions behave like the userspace free, so freeing NULL is a no-op. However, that commit still left rxi_Free such that rxi_Free(NULL, size) would decrement the relevant allocation counters. So to make our free abstractions more consistent, just skip all of rxi_Free when the given pointer is NULL. Change-Id: I89047e1846eb3e2932d2a125676fb7ffec8972dc Reviewed-on: https://gerrit.openafs.org/13514 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit df23589d2cc0419d8e74b5f1b824512d95623d2e Author: Ben Kaduk Date: Tue Dec 10 17:47:42 2013 -0500 Add rxgk_util.c A few helper routines for the security class implementation. Change-Id: I395802b6c3b2436df4b00906544fc797f3e12e9b Reviewed-on: https://gerrit.openafs.org/10937 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 23c4c3bc0cea68b8d05517065daea849fadad609 Author: Ben Kaduk Date: Mon Dec 9 23:07:17 2013 -0500 Add rxgk_packet.c Routines to apply and verify encryption and MICs to the data in rx packets. Backend to the rxgk_crypto framework for the actual crypto operations. Change-Id: I724efacf7df1d688c0d61a327fa9ee9c8168d715 Reviewed-on: https://gerrit.openafs.org/10571 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit aa231f105ea92275672941cbc2178d9ca26261e0 Author: Mark Vitale Date: Mon Feb 11 18:08:42 2019 -0500 rxgk: fix typo in make dest rule make dest should create directories in DEST, not DESTDIR. Fix the rule. 
Change-Id: I355e35cc6902517956935d3d2970836494490e69 Reviewed-on: https://gerrit.openafs.org/13489 Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6e988a5b3900fe73c314c9960d6fb7753ff98411 Author: Cheyenne Wills Date: Fri Mar 1 08:46:32 2019 -0700 bos: remove smail-notifier smail-notifier is a sample program that is undocumented and has not been well maintained. It produces copious compiler warnings, and would require effort to bring the code up to decent coding practices. The bosserver provides a -notifier feature that can be used for notifications, but that feature does not depend on this sample program. Removed the code, cleaned up the Makefiles and .gitignore. Change-Id: I6bd56559121d12ad007acc571b6653aa934eb97f Reviewed-on: https://gerrit.openafs.org/13509 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit df8534909fdc1fa8417aa788c0fa71c5dbe7eb30 Author: Benjamin Kaduk Date: Sat Feb 2 17:02:08 2019 -0600 scout: band-aid -Wformat-truncation gcc8 gets pretty confused about the bounds on these things (presumably due to our alignment options) and thinks this could potentially be a huge string. Check for truncation to appease the compiler, instead of trying to ensure that the buffer is big enough. Change-Id: I4c1e0e6a5a38ee67845cbb7791b280b965989bc8 Reviewed-on: https://gerrit.openafs.org/13470 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Tested-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 8632f23d6718a3cd621791e82d1cf6ead8690978 Author: Benjamin Kaduk Date: Sat Feb 2 12:49:07 2019 -0600 vol: check snprintf return values in namei_ops gcc8 is more aggressive about parsing format strings and computing bounds on the generated text from functions like snprintf. In this case it seems best to detect cases of truncation and error out, rather than trying to increase stack buffer sizes or switch to asprintf. 
These paths should be well-behaved since they are local to the fileserver, so this is mostly about appeasing the compiler's -Wformat-truncation checks to allow us to build with --enable-checking. Change-Id: Id3f15e450c0f03143c0cc7e40186d5944a8fa3b4 Reviewed-on: https://gerrit.openafs.org/13463 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 453060c27a5d33d3c27128d169298f9d66d06f1a Author: Benjamin Kaduk Date: Sat Feb 2 19:52:26 2019 -0600 libadmin: appease clang -Wsometimes-uninitialized clang thinks that 'time' can be used uninitialized: bos.c:1472:9: error: variable 'time' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (as->parms[TIME].items) { ^~~~~~~~~~~~~~~~~~~~~ bos.c:1478:57: note: uninitialized use occurs here if (!bos_ExecutableRestartTimeSet(bos_server, type, time, &st)) { ^~~~ bos.c:1472:5: note: remove the 'if' if its condition is always true if (as->parms[TIME].items) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~ bos.c:1445:5: note: variable 'time' is declared here bos_RestartTime_t time; ^ but in this command description, the TIME argument is required. Add a never-triggered error exit to appease the compiler when --enable-checking is activated. Change-Id: I38fac64fc5aba071f84f2f9e1b497df22df76f09 Reviewed-on: https://gerrit.openafs.org/13476 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 7c15e6efe62fb3fe1970c56331df09b257abf6d9 Author: Benjamin Kaduk Date: Sat Feb 2 19:48:20 2019 -0600 uss: signed/unsigned char fallout When char is signed, assigning 255 to a variable of type char changes the value, which causes clang to emit a warning and fail the --enable-checking build. 
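The signed/unsigned pitfall described in the uss commit above can be sketched as follows. This is an illustration only, not the uss code; `fits_in_plain_char` and `store_as_byte` are hypothetical names introduced here.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: does v fit in plain 'char' on this platform?
 * Where 'char' is signed, CHAR_MAX is typically 127, so 255 does not fit
 * and assigning it changes the stored value. */
int
fits_in_plain_char(int v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}

/* Hypothetical helper: store a byte value in an 'unsigned char', which
 * holds 0..255 unchanged whether or not plain 'char' is signed. */
int
store_as_byte(int v)
{
    unsigned char c;

    assert(v >= 0 && v <= UCHAR_MAX);
    c = (unsigned char)v;
    return c;
}
```

Declaring byte-valued variables as `unsigned char` is one way to avoid the warning; whether the commit took exactly this route is not stated in the message.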
Change-Id: Id02e2526a9a9dd6657dee55b9dc22da03d102d8c Reviewed-on: https://gerrit.openafs.org/13475 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit f0a3d477d6109697645cfdcc17617b502349d91b Author: Benjamin Kaduk Date: Sat Feb 2 19:45:31 2019 -0600 rework afs_random() yet again clang 7 notes that ~0 is signed and that left-shifting into the sign bit is undefined behavior. Use a new construction to clear the low byte of tv_usec with only bitwise operations that are independent of the width of tv_usec and stay within the realm of C's defined behavior. Change-Id: I3e4f0fa4a8b8b72df23ef0c8ad7c4a229ac942f3 Reviewed-on: https://gerrit.openafs.org/13474 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 96c0b88947c7aab605170bdca633d3716051a58e Author: Benjamin Kaduk Date: Sat Feb 2 18:39:53 2019 -0600 Avoid incomplete function type in casts clang complains that these casts contain an incomplete function type (since the function argument is omitted rather than declared to be void). Since we just need the cast to pointer type, let the compiler do it implicitly and pass stock NULL, rather than trying to force a cast to function-pointer type. Change-Id: I7f19f2936fe5425573c68fdd727ea90de02defd7 Reviewed-on: https://gerrit.openafs.org/13473 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 8f03ff3bdd8eb9f4557cdb7054aee9b8ea432160 Author: Benjamin Kaduk Date: Sat Feb 2 17:10:29 2019 -0600 dumpscan: appease gcc8 -Wformat-overflow gcc does not benefit from our external knowledge that tm_year is tightly bounded, and thinks it could still be in the range [-2147481748, 2147483647], which would overflow our string buffer. The function in question does not have error handling in place, so rather than adding some or trying to assert the proper bounds, just use a slightly larger buffer for safety.
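To illustrate the buffer sizing discussed in the dumpscan commit above, here is a minimal sketch, assuming a 32-bit int. `YEAR_BUFSZ` and `format_year` are hypothetical names, and the real dumpscan format string is not shown in the message, so a bare "%lld" stands in for it.

```c
#include <assert.h>
#include <stdio.h>

/* Worst case for tm_year + 1900 with a 32-bit int: "-2147481748" is
 * 11 characters, plus one byte for the trailing NUL. */
#define YEAR_BUFSZ 12

/* Hypothetical helper: format the calendar year into buf.
 * Returns 0 on success, -1 if the text did not fit. */
int
format_year(char *buf, size_t len, int tm_year)
{
    int n;

    assert(buf != NULL);
    /* Widen before adding 1900 so the addition itself cannot overflow. */
    n = snprintf(buf, len, "%lld", (long long)tm_year + 1900);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The commit itself chose to enlarge the buffer without adding error handling; the return-value check here is an extra shown only for illustration.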
Change-Id: Iafcba5588b805347ddcc0102969bd0e2a3173dd0 Reviewed-on: https://gerrit.openafs.org/13472 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit dff81f1b78fecc54f5af91f7d728925ffca62d2c Author: Benjamin Kaduk Date: Sat Feb 2 17:09:36 2019 -0600 venus: appease gcc8's -Wformat-string Interestingly, even before this commit, the buffer size was larger than what the kernel would accept. Since the kernel does its own length checking, it's simplest to just allow slightly larger requests here and have them fail later. Change-Id: I9ed636e4ad025240cb27b3cc066a8f2a72959396 Reviewed-on: https://gerrit.openafs.org/13471 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit a89297a066d8689f8fc29a7428cfe3ed6235d010 Author: Benjamin Kaduk Date: Sat Feb 2 15:44:54 2019 -0600 butc: -Wformat-truncation fallout Increase some buffer sizes to appease gcc8. While here, use snprintf instead of plain sprintf(!). Change-Id: I39d29522b92070ce2845ba3d392aaf2d97fc7b6e Reviewed-on: https://gerrit.openafs.org/13468 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 584b0f2b6b4391c0c879352bb1786c0f267666c9 Author: Benjamin Kaduk Date: Sat Feb 2 14:43:04 2019 -0600 vlserver: use large enough buffer for rxinfo string The "[dotted-quad] rxkad:name.inst@cell" construct can be as large as (3*4+3)+7+3*64+2+1 == 217 characters (including trailing NUL); size our buffer accordingly to avoid the risk of truncation. 
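The size arithmetic in the vlserver commit above can be written out as a sketch. The constant and function names are hypothetical, and reading the "7" as the seven characters of " rxkad:" is an assumption; only the total of 217 comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

enum {
    DEMO_DOTTED_QUAD = 3 * 4 + 3, /* "255.255.255.255": 12 digits, 3 dots */
    DEMO_PREFIX      = 7,         /* assumed: " rxkad:" */
    DEMO_COMPONENT   = 64,        /* each of name, inst, cell */
    DEMO_SEPARATORS  = 2,         /* '.' and '@' */
    DEMO_RXINFO_BUFSZ = DEMO_DOTTED_QUAD + DEMO_PREFIX
                        + 3 * DEMO_COMPONENT + DEMO_SEPARATORS + 1 /* NUL */
};

/* Hypothetical helper: compose "addr rxkad:name.inst@cell".
 * Returns 0 on success, -1 if the result was truncated. */
int
demo_rxinfo_format(char *buf, size_t len, const char *addr,
                   const char *name, const char *inst, const char *cell)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%s rxkad:%s.%s@%s", addr, name, inst, cell);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A caller would declare `char buf[DEMO_RXINFO_BUFSZ];` and pass `sizeof(buf)` as the length.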
Change-Id: Iee635aa66f5f639dfb0572c559a87b5313c305a9 Reviewed-on: https://gerrit.openafs.org/13466 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 7620bd33487207b348ed7aeba45f8d743132ba84 Author: Benjamin Kaduk Date: Sat Feb 2 14:23:03 2019 -0600 vlserver: fix vlentryread() for old vldb formats When we're using old format compatibility, use OMAXNSERVERS for the array lengths instead of MAXNSERVERS. Otherwise we'll try to copy more data than we've read. Detected by gcc8 as: vlutils.c:183:2: error: ‘memcpy’ forming offset [149, 151] is out of the bounds [0, 148] of object ‘tentry’ with type ‘struct vlentry’ [-Werror=array-bounds] memcpy(nbufp->serverFlags, oep->serverFlags, NMAXNSERVERS); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vlutils.c:141:26: note: ‘tentry’ declared here struct vlentry *oep, tentry; ^~~~~~ Change-Id: Ie720ca037c5a8bd6aaff5b6d5348161e0175b23b Reviewed-on: https://gerrit.openafs.org/13465 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d6b88e3bd5219a8dffebc07df23e30f1d16f095f Author: Benjamin Kaduk Date: Sat Feb 2 12:56:26 2019 -0600 vol: avoid -Wformat-truncation issues in vol-salvage.c Make some formerly-64-character buffers VMAXPATHLEN (plus a smidgeon) to give them space to hold the composed paths. Change-Id: I403c822a8b7376d08fb29f0127315ec439a5cf0d Reviewed-on: https://gerrit.openafs.org/13464 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 9a5ba85d1853327d8184287e58a6e03fabaaf23d Author: Benjamin Kaduk Date: Sat Feb 2 15:26:23 2019 -0600 uss: Allocate buffer space for trailing NUL Appease gcc8's -Wformat-truncation engine. 
Change-Id: I2113770f63357edf0f5ca273daf0c516a72034a8 Reviewed-on: https://gerrit.openafs.org/13467 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d1c32aed108b8ac013757be26052a82aa96bb52f Author: Ben Kaduk Date: Mon Dec 9 14:35:52 2013 -0500 Add rxgk_token.c Routines for constructing tokens (both regular and printed), extracting and decrypting tokens, and helpers therein. Provide the ability to print a token using a given session key and using a random session key; the former is useful for certain variants of localauth wherein a dummy GSS negotiation is performed with the same identity acting as initiator and acceptor. Include a paranoid sanity-check that only the routines intended to produce printed tokens can produce tokens with a zero-length identities list. Change-Id: I0cde7fd0cdf9a27777523cd502b21bdccef41dcc Reviewed-on: https://gerrit.openafs.org/10567 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 868e6248401756594f7abf985c2741d80d3a8517 Author: Mark Vitale Date: Mon Feb 11 02:54:31 2019 -0500 ptclient: enable pthreaded support ptclient has been essentially disabled for pthreads since the ibm-1.0 release. Remove the conditionals to make a functional pthreaded ptclient. Change-Id: Ib0f60b3ab395827b73e5646b014e28ab09607e0e Reviewed-on: https://gerrit.openafs.org/13500 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit ce0eb0f8b2936310eb1b55629772750103475d9a Author: Michael Meffie Date: Wed Nov 21 07:39:24 2018 -0500 auth: refactor afsconf_Open Move code to check the AFSCONF environment variable and read the .AFSCONF files to separate functions. Rename the internal functions afsconf_OpenInternal and afsconf_CloseInternal to the more aptly named LoadConfig and UnloadConfig in preparation for other changes. Add doxygen comments for these functions. 
Change-Id: Ie3361036c59c9e6ef99801891fff9fad63840344 Reviewed-on: https://gerrit.openafs.org/13397 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2aafe7df403e6a848185d15495139c07bced2758 Author: Andrew Deason Date: Wed Aug 9 20:06:05 2017 -0500 SOLARIS: Switch non-embedded vnodes for Solaris 11 Newer updates to Solaris 11 have been including several changes to the vnode struct. Since we embed a vnode in our struct vcache, our kernel module must be recompiled for any such change in order for the openafs client to work at all. To avoid the need for this, switch Solaris to using a non-embedded vnode in our struct vcache. Follow a similar technique as is used in DARWIN and XBSD, where we allocate a vnode in osi_AttachVnode, and free it in afs_FlushVCache. Change-Id: I85fd5d084a13bdea4353b5ad9840fddbc45ce8c0 Reviewed-on: https://gerrit.openafs.org/12696 Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Tested-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit a6499e0b086d964f3fcc65fe4be31edc33015061 Author: Andrew Deason Date: Wed Aug 9 20:06:03 2017 -0500 SOLARIS: Fix vnode/vcache casts A few places were using vnodes and vcaches interchangeably. This is incorrect, since they may not always be the same thing if we stop embedding vnodes directly in vcaches. Fix these to properly go through AFSTOV/VTOAFS to convert between vcaches and vnodes.
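The idea behind the two SOLARIS commits above can be sketched in user space. Everything here is hypothetical: the `demo_*` types and the `DEMO_AFSTOV`/`DEMO_VTOAFS` macros only mimic the shape of a non-embedded vnode and are not the OpenAFS or Solaris definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the OS-owned vnode; v_data points back at our object. */
struct demo_vnode {
    void *v_data;
};

/* Stand-in for struct vcache holding a pointer to, not a copy of, a vnode. */
struct demo_vcache {
    struct demo_vnode *v;
    int dirty;
};

/* With a non-embedded vnode a plain cast no longer converts between the two. */
#define DEMO_AFSTOV(avc) ((avc)->v)
#define DEMO_VTOAFS(vp)  ((struct demo_vcache *)(vp)->v_data)

/* Allocate and link a vnode, loosely in the spirit of osi_AttachVnode. */
int
demo_attach_vnode(struct demo_vcache *avc)
{
    struct demo_vnode *vp = calloc(1, sizeof(*vp));

    if (vp == NULL)
        return -1;
    vp->v_data = avc;
    avc->v = vp;
    return 0;
}

/* Release the vnode when the vcache is flushed. */
void
demo_flush_vcache(struct demo_vcache *avc)
{
    free(avc->v);
    avc->v = NULL;
}

/* A callback with the signature the OS expects: it receives a vnode and
 * translates it to the vcache before doing any work. */
int
demo_fsync(struct demo_vnode *vp)
{
    struct demo_vcache *avc = DEMO_VTOAFS(vp);

    assert(avc != NULL && DEMO_AFSTOV(avc) == vp);
    avc->dirty = 0;
    return 0;
}
```

Going through the conversion macros everywhere means the storage layout can change later without touching the callers.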
Change-Id: I8a2e42d7b83a5374d2b16b19c47417e7f44d4f27 Reviewed-on: https://gerrit.openafs.org/12695 Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: Mark Vitale commit 9a2b11747ce355d9adc8a5a646c88f8f3d9765ee Author: Andrew Deason Date: Wed Aug 9 20:06:00 2017 -0500 SOLARIS: Accept vnodes in vnode ops Currently, our vnode op callbacks look like this: int gafs_fsync(struct vcache *avc, afs_ucred_t *acred); And a pointer to gafs_fsync is given directly to Solaris. This cannot be correct, since 'struct vcache' is an OpenAFS type, so Solaris cannot possibly give us a 'struct vcache'. The actual correct signature for such a function is something like this: int gafs_fsync(struct vnode *vp, afs_ucred_t *acred); And then the 'gafs_fsync' function is supposed to translate 'vp' into a vcache. This works on Solaris right now because we embed the vnode as the first member in our vcache, and so a pointer to a vnode is also a pointer to a vcache. However, this would break if we ever change Solaris vcaches to use a non-embedded vnode (like on some other platforms). And even now, this causes a lot of warnings in osi_vnodeops.c, since the function signatures are wrong for our vnode callbacks. So to fix this, change all of these functions to accept a 'struct vnode', and translate to/from vnodes and vcaches appropriately. Change-Id: Ic1c4bfdb7675037d947273ed987cacd05eddfc92 Reviewed-on: https://gerrit.openafs.org/12694 Reviewed-by: Mark Vitale Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: Mark Vitale commit 41a22dbf719629e0977fa963b3d19c6594d0d729 Author: Andrew Deason Date: Wed Aug 9 20:05:56 2017 -0500 SOLARIS: Reorder definitions for vnode callbacks Currently, many of the functions for our vnode ops are forward-declared, right before they are referenced in the relevant vnop template array. 
Move the function definitions to before the references, so we can simply get rid of the forward declarations. These functions are also all only referenced in this file, so declare them 'static'. Change-Id: Icd82b6d6176342e2576ce333b40c4b79e8c692c1 Reviewed-on: https://gerrit.openafs.org/12693 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: Mark Vitale commit aa46af6ae35e4f026a8ed94012c3bc18c954de23 Author: Andrew Deason Date: Wed Aug 9 20:05:50 2017 -0500 SOLARIS: Clean up some osi_vnodeops func defs Currently, the Solaris osi_vnodeops.c file forward-declares many of its function definitions, but doesn't declare the arguments. For example: int afs_nfsrdwr(); This avoids type-checking for a few functions that are called before they are defined in this file. Furthermore, many of these functions are only used within this file, but are not declared 'static'. To fix this weirdness, remove most of the forward declarations (most are not referenced until the function is defined), and fully declare the rest. Declare functions 'static' that are not referenced outside of this file. This commit only changes functions up to the 'afs_getsecattr' definition. The rest of the file will be fixed in a future commit. Change-Id: I3f58b9ad8e9c3ea8b3fe3dffacd5118eee0a7ff2 Reviewed-on: https://gerrit.openafs.org/12692 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk Tested-by: Mark Vitale commit d0a2889098526aa148d99e042aa8c3f7855565f7 Author: Mark Vitale Date: Wed Feb 6 16:55:03 2019 -0500 auth: remove stale "magic number" comment A comment in GenericAuth() refers to a "magic number" which used to be present as: *aindex = 2; Commit d5622d03196762bd8a60404fea98b4bb044e076d made this a proper enum: *aindex = RX_SECIDX_KAD; Update the comment to remove mention of a "magic number". 
No functional change is incurred by this commit. Change-Id: I1d4770211fe4f88822426a9fe19db77bbb0d7738 Reviewed-on: https://gerrit.openafs.org/13490 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 297c479989efb6bd9d4011a43d6c0dc92596761b Author: Pat Riehecky Date: Fri Sep 21 10:05:24 2018 -0500 cmd: bail if out of memory while printing syntax Bail with an error message to stderr if we are unable to format the command syntax due to a string allocation error. Found via scan-build. [mmeffie: updated commit] Change-Id: Ib3bc7f53c295d8dde6c07b9c4990cd1b3bcee58c Reviewed-on: https://gerrit.openafs.org/13335 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 63f015d05293cd853dbd44e5115e6b378644dfb6 Author: Andrew Deason Date: Wed Jan 16 23:44:58 2019 -0600 LINUX: Propagate afs_linux_readdir BlobScan errors In afs_linux_readdir, if we detect an error code from BlobScan, currently we 'break' out of the current while() loop. But right after this loop, we reset 'code' to 0, ignoring the error we just got from BlobScan, and acting like we just reached the end of the directory. This means that if BlobScan could not process the given directory at all, we'll just fail to iterate through some of the entries in the given directory, and not report an error. To fix this, process errors from BlobScan like we do for afs_dir_GetVerifiedBlob, and return an error code and log a message about the corrupted dir. Change-Id: I8bd628624ffc04fc55fd6a0820c73018bd9e4a18 Reviewed-on: https://gerrit.openafs.org/13430 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2e556c0f23ae439c804352cf51fcf30878b03c7a Author: Andrew Deason Date: Sat Nov 3 01:04:43 2018 -0500 ptserver: Check for -restricted in SPR_Delete Currently, all prdb write operations, except for SPR_Delete, will fail with PRPERM if called by a non-system:administrators caller while restricted mode is active. SPR_Delete is missing this check, and so is not affected by the -restricted option. 
Fix this by inserting the same check for -restricted as all other code paths that check for -restricted. Change-Id: I35f19d0b715423cd91769e6de845efa330368e50 Reviewed-on: https://gerrit.openafs.org/13374 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit bfe912ede6f452d10cfbd5fd549f44ee027acb1b Author: Benjamin Kaduk Date: Sat Feb 2 12:25:35 2019 -0600 vol: fix vutil format-truncation nit We need one more byte for the trailing NUL. Change-Id: I1379e958e3b5ec92802060c4541f419599e49311 Reviewed-on: https://gerrit.openafs.org/13462 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3a8fa4ecd65d5d743fdc573c9f0f261aee2063b6 Author: Andrew Deason Date: Sat Nov 3 00:58:58 2018 -0500 ptserver: Fix AccessOK -restricted for SYSADMINID According to the documentation, as well as other code paths that check for -restricted, the -restricted option does not affect members of system:administrators. Currently, though, AccessOK only bypasses the -restricted check if the caller is SYSADMINID itself (i.e. localauth). Fix AccessOK to only do the -restricted checks if the caller is not in system:administrators, to match the documentation as well as other ptserver operations. 
Change-Id: I3074d4537845f1f4deb7f4b72cdb819391b617e3 Reviewed-on: https://gerrit.openafs.org/13373 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit dfc78d533ef64c8d6daf134e2a0f67c5c16f7369 Author: Andrew Deason Date: Tue Oct 30 14:29:24 2018 -0500 ptserver: Fix AccessOK -restricted for addToGroup The function AccessOK is used by all of ptserver RPC handlers that need to do an authorization check, and the last two arguments are set as such: - When adding a member to a group, 'mem' is PRP_ADD_MEM and 'any' is PRP_ADD_ANY - When removing a member from a group, 'mem' is PRP_REMOVE_MEM and 'any' is 0 - When modifying an entry (setFieldsEntry) or modifying some global database fields, 'mem' and 'any' are both set to 0 - When reading an entry and not modifying it, 'mem' and/or 'any' are set to other values (depending on if we're checking membership, examining the entry itself, etc) Commit 93ece98c (ptserver-restricted-mode-20050415) added a check to AccessOK to make it return false for -restricted mode when we are adding a member to a group, or when 'mem' and 'any' are both 0. This didn't catch the case when we are removing a member from a group, though, when 'mem' is PRP_REMOVE_MEM. It looks like commit a614a8d9 (ptutils-restricted-accessok-20081025) tried to fix this by adding a check for PRP_REMOVE_MEM, but it also required 'any' to be set to 0 for the conditional to succeed. This is true when removing a member from a group, but when adding a member to a group, 'any' is PRP_ADD_ANY, and so this check fails. This means that currently, when restricted mode is turned on, non-admins can still run addToGroup and setFieldsEntry successfully. Fix this by checking for PRP_ADD_MEM/PRP_REMOVE_MEM separately from checking if 'mem'/'any' are set to 0. Break up this conditional into separate if() statements with comments to try to make the checks more clear. 
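The restricted-mode logic described in the two AccessOK commits above might look roughly like the following once the conditional is split up. This is a sketch, not the ptserver source: the `DEMO_PRP_*` values are placeholders (the real PRP_* constants live in the ptserver headers), and `demo_restricted_denies` and its `is_admin` parameter are hypothetical.

```c
#include <assert.h>

/* Placeholder flag values, for illustration only. */
enum {
    DEMO_PRP_REMOVE_MEM = 1,
    DEMO_PRP_ADD_MEM    = 2,
    DEMO_PRP_ADD_ANY    = 4
};

/* Returns 1 if restricted mode should reject the request, 0 otherwise. */
int
demo_restricted_denies(int restricted, int is_admin, int mem, int any)
{
    assert(mem >= 0 && any >= 0);

    if (!restricted)
        return 0;

    /* Members of system:administrators are not affected by -restricted. */
    if (is_admin)
        return 0;

    /* Adding or removing a group member: decided by 'mem' alone, so that
     * 'any' being PRP_ADD_ANY does not let addToGroup slip through. */
    if (mem == DEMO_PRP_ADD_MEM || mem == DEMO_PRP_REMOVE_MEM)
        return 1;

    /* Modifying an entry or global database fields. */
    if (mem == 0 && any == 0)
        return 1;

    /* Read-only operations use other flag values and are allowed. */
    return 0;
}
```

Keeping each case in its own `if` makes it harder for one flag combination to mask another, which is the failure mode the commit message describes.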
Change-Id: I7e647865b772c42e70014f48ce9cd53ef511cd5b Reviewed-on: https://gerrit.openafs.org/13370 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 10f2c469f45eece0e12573388ae66e392e2dff1c Author: Cheyenne Wills Date: Fri Jan 25 17:35:51 2019 -0700 Redhat: 'clean build area' error message during dkms build/install dkms invokes a make clean command before and after building the kernel module. The make clean that is issued at the start of building results in a nuisance error message because the Makefile doesn't yet exist Building module: cleaning build area...(bad exit status: 2) In the dkms.conf file, built from within the openafs.spec, change the command defined in the CLEAN statement to test for the existence of the Makefile prior to running the actual make clean Change-Id: Ifc0d5eed6ef0cbc3ddfd193d27bbcb8a7cf52f2a Reviewed-on: https://gerrit.openafs.org/13460 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 26b1dc036719a588a5cadecb14053bd4079c1f48 Author: Andrew Deason Date: Fri Feb 1 16:31:50 2019 -0600 Avoid calling krb5_free_context(NULL) Several places in the code currently call krb5_free_context(ctx) in a cleanup code path, where 'ctx' may or may not be NULL. This is not guaranteed to be okay, so check for NULL to make sure we don't cause issues in these code paths. While we are here cleaning up krb5_free_context() calls, also fix a few call sites in afscp_util.c that were not calling krb5_free_context in all error paths. Change-Id: I881f01bdf94f00079f84c4bd4bcfa58998e51ac9 Reviewed-on: https://gerrit.openafs.org/13461 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 86d04ea70fd2e99606b1d1b5b68d980d92e7a3cd Author: Andrew Deason Date: Wed Jan 16 23:46:34 2019 -0600 afs: Throw EIO in DRead on empty dir blob DRead currently returns ENOENT if we try to read a page beyond the end of the given dir blob. 
We do this to indicate we've hit EOF, but we do this even if the dir blob is completely empty (which is not a valid dir blob). If a dir blob in the cache is truncated due to cache corruption issues, that means we'll indicate a normal EOF condition in that directory for most code paths. If someone is trying to list the directory's entries, for instance, we'll just return that there are no entries in the dir, even though the dir itself is just invalid. To avoid this for at least some cases, return an EIO error instead if the dir blob is completely empty. Change-Id: I8544e125ad12632523d7c514fe63ff9d87e1cd8f Reviewed-on: https://gerrit.openafs.org/13429 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1a0e5e867107b3f849c17f30976831b5bf5a0e94 Author: Andrew Deason Date: Thu Jan 31 15:44:38 2019 -0600 volser: Remove unused VolRestore flags args SAFSVolRestore has a 'flags' argument, which the volserver passes on to various internal functions, but the value of the flags never actually changes any behavior. Remove the 'aflags' argument (and the derived 'incremental' arg) from a few of our internal functions. The relevant arguments have been unused since OpenAFS 1.0. Change-Id: Ib6ba3d5d9aa3e29d720921cb32fe45c871cd803e Reviewed-on: https://gerrit.openafs.org/13458 Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 1637e0a220157f9e4eb82de82e7372216f95af4e Author: Michael Meffie Date: Tue Jan 29 11:22:41 2019 -0500 xstat: remove unused variable Fix unused variable warning for unused variable oneShotCode. Change-Id: I8c2a5e8bf0cfc2570985b17d8e250403d459e50a Reviewed-on: https://gerrit.openafs.org/13455 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie commit 2c6d979be68ee95c9928b91f328b03070342173e Author: Michael Meffie Date: Tue Jan 29 11:20:52 2019 -0500 scout: fix missing softsig header Fix implicit declaration of function opr_softsig_Init() in scout. 
Change-Id: I2bb9eb5240b053b2f16ef1f37035b01dbc42fb84 Reviewed-on: https://gerrit.openafs.org/13454 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie Tested-by: Michael Meffie commit c37cdbeab4e4675e71b7764994cd7e68ac46c111 Author: Michael Meffie Date: Tue Jun 12 11:37:01 2018 -0400 viced: use calloc in SRXAFS_GetXStats The file server stats are maintained in global static structures, which are zero-ed on program start. The full contents are memcpy-ed to allocated buffers as rx output arguments, so no uninitialized data is sent over the wire. However, this commit converts the output buffer allocation from malloc to calloc to make this more clear from code inspection and make the code more robust. While here, clean up the comments in SRXAFS_GetXStats and remove the commented out code for a collection type which was never implemented. Remove the comments about overwriting spare xstat values, which seems to be a remnant from an early version of the code. For informational purposes, add a note at the top of SRXAFS_GetXStats to make it clear the CallPreamble() is intentionally avoided in this implementation of the GetXStats RPC. Apparently, the CallPreamble() is omitted since the OpenAFS file server does not need to send callbacks to clients issuing only GetXStats RPCs, and so also avoids sending TMAY requests to clients like xstat_fs_test. Note that the presumably older GetStatistics and GetStatistics64 do unfortunately invoke CallPreamble(), so programs such as scout must be able to receive RXAFSCB RPCs from OpenAFS file servers. Change-Id: I7b90c7c6c561c74961fb7f7694a9576e1bed44d6 Reviewed-on: https://gerrit.openafs.org/13204 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6b67cac432043a43d7cdfa6af972ab54412aff94 Author: Michael Meffie Date: Tue Oct 17 16:39:50 2017 -0400 convert xstat and friends to pthreads Convert the xstat, fsprobe, and gtx libraries and test programs to pthreads.
Build these libraries with libtool. Build the scout and afsmonitor programs with pthreads instead of LWP. Change-Id: Ie1737e71b4e57735bf7b6c7dc3177d717ea35ac6 Reviewed-on: https://gerrit.openafs.org/12753 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 6575af97f4baf1728882ebe8f4ce474334f52ea5 Author: Michael Meffie Date: Thu Nov 15 16:19:51 2018 -0500 auth: fix afsconf_GetExtendedCellInfo memory leak Commit c4a127d0578e521b97131c5dedf9da58f71b0242 (ubik-clone-support-20010212) added changes to support ubik clone sites. This commit added the afsconf_GetExtendedCellInfo function, which returns the info given by the original afsconf_GetCellInfo, plus an array of booleans (as chars) to indicate which cell servers are ubik clones. Unfortunately, the afsconf_GetExtendedCellInfo function calls the afsconf_OpenInternal function on an already opened configuration. It does so to look for server entries which are marked as clone sites in the CellServDB file. Opening the already opened configuration leaks at least the cellName and local realms information, and is generally confusing. Instead, remember which sites are designated as clone sites when the CellServDB is read when the configuration is opened, and return that info to the callers of afsconf_GetExtendedCellInfo. This commit adds the clone array to the afsconf_cell structure and changes to afsconf_GetCellInfo() for this new server-related data. As part of this change, remove the no longer needed cell and clones arguments to the internal function afsconf_OpenInternal, which were added by commit c4a127d0578e521b97131c5dedf9da58f71b0242. Update the testcellconfig test program to output the new afsconf_cell clone member. This leak was found with valgrind. 
Change-Id: I73db60b6a4a77e620e0511ca45cc3418503278a4 Reviewed-on: https://gerrit.openafs.org/13396 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Reviewed-by: Mark Vitale Tested-by: BuildBot commit 80ed9d98779135d43f23c9e51e7bd6bce36405f1 Author: Michael Meffie Date: Fri Nov 16 10:00:17 2018 -0500 auth: plug auth realms memory leaks The function _afsconf_FreeRealms, called by afsconf_CloseInternal, leaks two afsconf_realms structures. The function _afsconf_LoadRealms also leaks those two structures when it fails. These memory leaks were discovered with valgrind. Change-Id: I1436ce21609951bc3433b6c91221cc45e78881bc Reviewed-on: https://gerrit.openafs.org/13395 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Reviewed-by: Mark Vitale commit 93b26c6f55245e2187e574eb928f5e0ce66a245e Author: Michael Meffie Date: Fri Dec 7 20:29:03 2018 -0500 Add the CellServDB pathname to the afsconf_dir The determination of the CellServDB pathname is platform-dependent. However, error reporting in the current code base assumes the CellServDB location is platform-independent. Add the pathname of the CellServDB file to the configuration directory structure and set the new cellservDB member when opening the configuration. Use this value when checking if the CellServDB has changed and update the callers to use the cellservDB member when reporting errors about the CellServDB file. 
Change-Id: I5a3393fb9d4ae3c637d5a0d773598115314bfe1c Reviewed-on: https://gerrit.openafs.org/13408 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit ce327b568f4ff522aa008f235d97e0d9144eb92c Author: Andrew Deason Date: Thu Jan 17 00:12:06 2019 -0600 afs: Do not ignore errors in afs_CacheFetchProc afs_CacheFetchProc currently has a section of code that looks like this pseudocode: if (!code) do { while (length > 0) { code = read_from_rx(); if (code) { break; } code = write_to_cache(); if (code) { break; } } code = 0; } while (moredata); return code; When we encounter an error when reading from rx or writing to the cache, we break out of the current loop to stop processing and return an error. But there are _two_ loops in this section of the code, so what we actually do is break out of the inner loop, set 'code' to 0, and then usually return (since 'moredata' is usually never set). This means that when we encounter an unexpected error either from the net or disk (or the memcache layer), we ignore the error and return success. This means that we'll store a subset of the relevant chunk's data to disk, and flag that chunk as complete and valid for the relevant DV. If the error occurred before we wrote anything to disk, this means we'll store an empty chunk and flag it as valid. The chunk will be flagged as valid forever, serving invalid data, until the cache chunk is evicted or manually kicked out. This can result in files and directories appearing blank or truncated to applications until the bad chunk is removed. Possibly the most common way to encounter this issue is when using a disk cache, and the underlying disk partition is full, resulting in an unexpected ENOSPC error. Theoretically this can be seen from an unexpected error from Rx, but we would have to see a short read from Rx without the Rx call being aborted. If the call was aborted, we'd get an error from the call to rx_EndCall() later on. 
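The nested-loop problem described above can be reproduced outside the cache manager. The following is a minimal sketch in C, not the afs_CacheFetchProc code: struct demo_io and the demo_* functions are hypothetical stand-ins for the rx call and the cache file, introduced only to show how a 'break' inside the inner loop lets the later 'code = 0' discard the error, and how jumping to a single exit label preserves it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the rx call and cache file; not OpenAFS types. */
struct demo_io {
    int read_err;      /* error returned by every read, 0 for success */
    int write_err;     /* error returned by every write, 0 for success */
    int chunks_stored; /* number of successful cache writes */
};

static int demo_read_from_rx(struct demo_io *io)
{
    return io->read_err;
}

static int demo_write_to_cache(struct demo_io *io)
{
    if (io->write_err)
        return io->write_err;
    io->chunks_stored++;
    return 0;
}

/* Shape of the problem: 'break' leaves only the inner loop. */
int demo_fetch_break(struct demo_io *io, int length)
{
    int code = 0;
    int moredata = 0;   /* never set in this sketch */

    assert(length >= 0);
    do {
        while (length > 0) {
            code = demo_read_from_rx(io);
            if (code)
                break;
            code = demo_write_to_cache(io);
            if (code)
                break;
            length--;
        }
        code = 0;       /* any error from the inner loop is overwritten */
    } while (moredata);
    return code;
}

/* Shape of the fix: jump to the exit label so 'code' survives. */
int demo_fetch_goto(struct demo_io *io, int length)
{
    int code = 0;
    int moredata = 0;

    assert(length >= 0);
    do {
        while (length > 0) {
            code = demo_read_from_rx(io);
            if (code)
                goto done;
            code = demo_write_to_cache(io);
            if (code)
                goto done;
            length--;
        }
        code = 0;
    } while (moredata);
 done:
    return code;
}
```

Setting write_err to ENOSPC in this sketch makes demo_fetch_break report success with nothing stored, while demo_fetch_goto returns the error to its caller.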
To fix this, change all of these 'break's into 'goto done's, to be more explicit about where we are jumping to. Convert all of the 'break's in this function in the same way, to make the code flow more consistent and easier to follow. Remove the 'if () do' on a single line, since it makes it a little harder to see from a casual glance that there are two nested loops here. This problem appears to have been introduced in commit 61ae8792 (Unite CacheFetchProcs and add abstraction calls), included in OpenAFS 1.5.62. Change-Id: Ib965a526604e629dc5401fa0f1e335ce61b31b30 Reviewed-on: https://gerrit.openafs.org/13428 Tested-by: BuildBot Reviewed-by: Cheyenne Wills Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 21ad6a0c826c150c4227ece50554101641ab4626 Author: Cheyenne Wills Date: Fri Jan 18 17:22:44 2019 -0700 Linux_5.0: replaced current_kernel_time with ktime_get_coarse_real_ts64 In Kernel commit fb7fcc96a86cfaef0f6dcc0665516aa68611e736 the current_kernel_time/current_kernel_time64 functions were renamed and the calling convention was standardized. According to the Linux Documentation/core-api/timekeeping.rst ktime_get_coarse_real_ts64 is the direct replacement for current_kernel_time64. Because of year 2038 issues, there is no replacement for current_kernel_time. Updated code that used current_kernel_time to use new name and calling convention. Updated autoconf test that sets IATTR_TAKES_64BIT_TIME as well. Change-Id: I607bdcf6f023425975e5bb747e0e780b3d2a7ce5 Reviewed-on: https://gerrit.openafs.org/13434 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit b892fb127815bdf72103ae41ee70aadd87931b0c Author: Cheyenne Wills Date: Fri Jan 18 16:53:58 2019 -0700 Linux_5.0: replace do_gettimeofday with ktime_get_real_ts64 In Kernel commit e4b92b108c6cd6b311e4b6e85d6a87a34599a6e3 the do_gettimeofday function was removed.
According to the Linux Documentation/core-api/timekeeping.rst ktime_get_real_ts64 is the direct replacement for do_gettimeofday Updated the macro osi_GetTime to use ktime_get_real_ts64 if it is available. Change-Id: I7fcd49958de83a6a040e40bd310a228247c481b2 Reviewed-on: https://gerrit.openafs.org/13433 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 10b02075a262dbe802266ea4bcac3936dff5dd23 Author: Mark Vitale Date: Fri Jan 18 17:05:49 2019 -0500 LINUX: correct include for ktime_get_coarse_real_ts64() The include for the ktime_get_coarse_real_ts64() autoconf test is incorrect; ktime_get_coarse_real_ts64() has always been in linux/ktime.h (via #include timekeeping.h), not linux/time.h. This autoconf test still ran correctly because the OpenAFS build was inadvertently picking up ktime.h via the default autoconf include path. Therefore, this commit is needed only to provide documentation and clarity to future maintainers. Introduced as a cut-n-paste error (from the current_kernel_time test) with commit 3c454b39d04f4886536267c211171dae30dc0344 for Linux 4.20. Change-Id: I994b03a1700330756216c7feab0121c82d0f3ee4 Reviewed-on: https://gerrit.openafs.org/13437 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3969bbca6017eb0ce6e1c3099b135f210403f661 Author: Cheyenne Wills Date: Thu Jan 17 16:00:37 2019 -0700 Linux_5.0: Use super_block flags instead of Mount flags when filling sb In Kernel commit e262e32d6bde0f77fb0c95d977482fc872c51996 the mount flags (MS_) were moved from uapi/linux/fs.h to uapi/linux/mount.h. This caused a compile failure in src/afs/LINUX/osi_vfsops.c The Linux documentation in uapi/linux/mount.h indicates that the MS_ (mount) flags should only be used when calling sys_mount and filesystems should use the SB_ (super_block) equivalent. src/afs/LINUX/osi_vfsops.c utilized the mount flag MS_NOATIME while filling the super_block. Changed to use SB_NOATIME (which has the same numeric value as MS_NOATIME) if available. 
Change-Id: I2b2199de566fbadd45e857b37d24ce63002c7736 Reviewed-on: https://gerrit.openafs.org/13432 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 892045a9803ed471986569705d9d727165ca7ecf Author: Marcio Barbosa Date: Sat Aug 11 13:17:28 2018 -0400 vol: remove empty directories left by vos zap -force The vos zap -force command does not remove the directories associated with the volume in question (AFS_NAMEI_ENV). When the vos zap -force command is executed, the volume server goes through the /vicep*/AFSIDat directories and removes the files associated with the volume id received as an argument. Unfortunately, the volume server does not remove the directories associated with this volume. As a result, empty directories are left behind. To fix this problem, remove the empty directories left behind when vos zap -force is executed. Change-Id: I56fd52918223f87e424121bac6a086d7b0a46284 Reviewed-on: https://gerrit.openafs.org/12879 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 345a739b7bb6c9c142a2b0fe584fed6c44d6c655 Author: Andrew Deason Date: Tue Nov 13 11:09:52 2018 -0600 roken: Use srcdir for roken-post.h roken-post.h is a source file, not a generated file in the objdir. Specify $(srcdir) so we can work with objdir builds. Change-Id: I1d00ba1f28bea99770c2af56890fbf22ee764820 Reviewed-on: https://gerrit.openafs.org/13387 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit a28f9d28aef18936eb0ea02491ce64c72eeb1fe9 Author: Cheyenne Wills Date: Wed Nov 28 15:45:20 2018 -0700 Redhat: correct path to kernel module in dkms.config This fix corrects some annoying error and warning messages during dkms install or uninstall. Install: DKMS: build completed. openafs: Running module version sanity check. 
ERROR: modinfo: could not open /lib/modules/2.6.32-754.6.3.el6.x86_64/weak-updates/openafs.ko: No such file or directory - Original module - No original module exists within this kernel - Installation - Installing to /lib/modules/2.6.32-754.6.3.el6.x86_64/extra/ Adding any weak-modules WARNING: Can't read module /lib/modules/2.6.32-754.6.3.el6.x86_64/weak-updates/openafs.ko: No such file or directory egrep: /lib/modules/2.6.32-754.6.3.el6.x86_64//weak-updates/openafs.ko: No such file or directory Remove Status: Before uninstall, this module version was ACTIVE on this kernel. Removing any linked weak-modules rmdir: failed to remove `.': Invalid argument WARNING: Can't read module /lib/modules/2.6.32-754.6.3.el6.x86_64/weak-updates/openafs.ko: No such file or directory egrep: /lib/modules/2.6.32-754.6.3.el6.x86_64//weak-updates/openafs.ko: No such file or directory openafs.ko: - Uninstallation - Deleting from:/lib/modules/2.6.32-754.6.3.el6.x86_64/extra/ - Original module - No original module was found for this module on this kernel - Use the dkms install command to reinstall any previous module version. Background: Commit 1c96127e37c0ec41c7a30ea3e4aa68f3cc8a24f6 standardized the location where the openafs.ko module is installed (from /kernel/3rdparty to /extra/). The RPM Spec file was not updated to build the dkms.conf file with the corrected location. From the documentation for dkms DEST_MODULE_LOCATION is ignored on Fedora Core 6 and higher, Red Hat Enterprise Linux 5 and higher, Novell SuSE Linux Enterprise Server 10 and higher, Novell SuSE Linux 10.0 and higher, and Ubuntu. Instead, the proper distribution-specific directory is used. However the DEST_MODULE_LOCATION is still used when saving and restoring old copies of the module. The NO_WEAK_MODULES parameter prevents dkms from creating a symlink into weak-updates directory, which can lead to broken symlinks when dkms-openafs is removed.
The weak modules facility was designed to eliminate the need to rebuild kernel modules when kernel upgrades occur and relies on the symbols within the kABI. Openafs uses symbols that are outside the kABI, and therefore is not a candidate for a weak module. Change-Id: I52a332036056a359a57a3ab34d56781c896a2eea Reviewed-on: https://gerrit.openafs.org/13404 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 0bd55a02bb5707b1b8b26347d5cb6ad71765f622 Author: Michael Meffie Date: Thu Dec 27 09:32:35 2018 -0500 build: declare test targets as phony Modern versions of `make` will not build the 'test' target since a directory exists with the same name. $ grep -C1 '^test:' Makefile test: cd test; $(MAKE) $ make test make: 'test' is up to date. Declare these targets as .PHONY to force make to build the test programs even when the 'test' directory is present. Also use '&&' to concatenate commands instead of ';' to avoid running the second command when the first fails. Change-Id: Id561d7610f80b87b59c632801fa0a4b216feb42d Reviewed-on: https://gerrit.openafs.org/13419 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f6182922455aa0cbee19d138b0827eb87dc2b7ce Author: Andrew Deason Date: Mon Jan 14 17:12:27 2019 -0600 lwp: Avoid freeing 'stackmemory' on AIX32 Commit 55013a11 (lwp: Fix possible memory leak from scan-build) added some free() calls to some otherwise-leaked memory. However, one of these calls frees the 'stackmemory' pointer, which on AIX32 is not a pointer from malloc/calloc, but calculated from reserveFromStack(). To avoid corrupting the heap, skip this free call on AIX32. This commit adds another #ifdef to avoid this, which is unfortunate, but this is also how the free is avoided in the existing code for Free_PCB().
Change-Id: I6c4518f810e56c362ee744f250747fe8fc765b13 Reviewed-on: https://gerrit.openafs.org/13426 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d0dbd0f12119f0e874ba30adec81061ac6ae27c7 Author: Mark Vitale Date: Fri Oct 5 10:39:23 2018 -0400 rx: remove rx_atomic bitops The rx_atomic bitops were introduced with commit 1839cdbe268f4b19ac8e81ae78548f5c78e0c641 ("rx: atomic bit ops"). The last (only) reference to them was recently removed with commit 5ced6025b9f11fadbdf2e092bf40cc87499ed277 ("rx: Convert rxinit_status to rx_IsRunning()"). Remove the now unreferenced bitops. This commit is comprised of partial or complete reverts of the following commits: ae4ad509d35 rx: fix rx_atomic warnings under Solaris (partial) c16423ec4e6 rx: fix atomics on darwin (partial) 9dc6dd9858a rx: Fix AIX test_and_set_bit (complete) 1839cdbe268 rx: atomic bit ops (complete) Note: The rx_atomic bitops for Linux systems are known to be broken due to incorrect casting of rx_atomic_t into the unsigned long operand expected by the native Linux bitops. The failure modes include silent overruns on little-endian and incorrect results on big-endian. Do not merely revert this commit in order to bring these bitops back into the tree. Change-Id: I6b63519f63d370ccc8df816b4388487909c17dcd Reviewed-on: https://gerrit.openafs.org/13390 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b2475c11f4d430402a82cb5b018dbccdaa0dccd8 Author: Andrew Deason Date: Thu Dec 20 14:29:47 2018 -0600 rx: Statically check rx_statisticsAtomic size Currently, rx_GetStatistics assumes that struct rx_statistics and rx_statisticsAtomic have the same size (we just memcpy between them). However, this is never checked, and rx_statistics contains many 'int' fields where rx_statisticsAtomic has rx_atomic_t fields. If these are not the same size, our rx stats will silently break, so add a static assert to make sure they are the same size. 
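A compile-time size check of the kind described above might look like the following minimal sketch. It uses the C11 _Static_assert keyword rather than whatever assertion macro the tree provides, and the demo_* types and field names are hypothetical stand-ins for rx_statistics, rx_statisticsAtomic and rx_atomic_t.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical atomic wrapper; the real rx_atomic_t differs per platform. */
typedef struct {
    int var;
} demo_atomic_t;

/* Plain counters, as handed back to callers. */
struct demo_stats {
    int packetRequests;
    int dataPacketsSent;
};

/* The same counters, kept internally as atomics. */
struct demo_stats_atomic {
    demo_atomic_t packetRequests;
    demo_atomic_t dataPacketsSent;
};

/* Refuse to compile if the memcpy below would copy mismatched layouts. */
_Static_assert(sizeof(struct demo_stats) == sizeof(struct demo_stats_atomic),
               "demo_stats and demo_stats_atomic must have the same size");

/* Copies the atomic counters into the plain structure in one step. */
void demo_get_stats(const struct demo_stats_atomic *src, struct demo_stats *dst)
{
    assert(src != 0 && dst != 0);
    memcpy(dst, src, sizeof(*dst));
}
```

If the atomic type were ever widened, the build would stop at the assertion instead of silently producing wrong statistics at run time.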
Change-Id: I889867f4a85530c30dd15d32d1822144ea128a95 Reviewed-on: https://gerrit.openafs.org/13414 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fa3ce81178b23ee2d96f4e496484c23ed0ce7bfc Author: Andrew Deason Date: Thu Dec 20 14:37:31 2018 -0600 Revert "rx: fix rx_atomic warnings under Solaris" This reverts commit ae4ad509d35aab73936a1999410bd80bcd711393. While that commit did fix the mentioned warnings on Solaris, it also changed the size of rx_atomic_t. Our code in rx_stats.c assumes that an rx_atomic_t is 4-bytes wide, and so changing the size of rx_atomic_t broke our reporting for stats in the 'rx_stats' structure. To fix this, revert that commit. This reintroduces the mentioned warnings, but those warnings are reported for our atomic bit-op functions, which are unused and will be removed by another commit. Change-Id: Ie3e72cc06690d9f8de79e8f0274ea51079004c38 Reviewed-on: https://gerrit.openafs.org/13415 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 67c406e57a9a4409b3da811546660ac596888b2f Author: Michael Meffie Date: Thu Nov 15 13:49:21 2018 -0500 auth: update the auth test programs Fix build errors for the auth test programs. Close the configuration directory before exiting the testcellconf program so we can check for leaks. Add a call to afsconf_GetExtendedCellInfo to the testcellconf test program. Use libcmd to parse the testcellconf command line options. Add the -reload option to testcellconf to perform an optional reload test. The user must have file permissions to touch the CellServDB to perform the reload test. 
Change-Id: I1cb4cacf9a15ccf7066fb32bfe5f5d03ef64bfd7 Reviewed-on: https://gerrit.openafs.org/13394 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit d6f52d11c358f71b2c4357cb135e898de7c6277b Author: Mark Vitale Date: Mon Oct 29 16:48:14 2018 -0400 afs: avoid afs_GetDownDSlot panic on afs_WriteDCache failure If afs_GetDownDSlot() finds insufficient free slots in the afs_freeDSList, it will walk the afs_DLRU attempting to flush and free eligible dcaches. However, if an error occurs during the flush to CacheItems (afs_WriteDCache()), e.g., -EINTR, afs_GetDownDSlot() will assert. However, a panic in this case is overkill, since afs_GetDownDSlot() is a best-effort attempt to free dslots. The caller (afs_UFSGetDSlot()) will allocate more dcaches if needed. Instead: - Refactor afs_GetDownDSlot() by moving the QRemove() call to after the afs_WriteDCache logic, so it accompanies the logic that puts the dcache back on the freelist. This is safe because we hold the afs_xdcache W lock for the duration of the routine. - If afs_WriteDCache() returns an error, return early and let the caller handle any recovery. Change-Id: Ifd0d56120095c9792998ff935776bbd339a76c8a Reviewed-on: https://gerrit.openafs.org/13364 Reviewed-by: Andrew Deason Tested-by: Andrew Deason Reviewed-by: Cheyenne Wills Reviewed-by: Benjamin Kaduk commit 59d3a8b86da648e3c5b9774183c6c8571a36f0c4 Author: Mark Vitale Date: Fri Nov 30 12:10:50 2018 -0500 vos: restore status information to 'vos status' Commit d3eaa39da3693bba708fa2fa951568009e929550 'rx: Make the rx_call structure private' created accessors for several rx_call members. However, it simply #ifdef'd out the packet counters and timestamps reported by 'vos status' (AFSVol_Monitor). This is a regression for the 1.8.x 'vos status' command. Instead, supply an accessor so 'vos status' can again be used to monitor the progress of certain volume operations.
FIXES 134856 Change-Id: I91f5831b21f128bd8e86db63387a454c9e57bcdf Reviewed-on: https://gerrit.openafs.org/13400 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Mark Vitale commit d9d9571785dabc5c311111b1263fe0881b0ccda5 Author: Andrew Deason Date: Thu Dec 13 12:25:32 2018 -0600 afs: Reword "cache is full" messages Currently, there are multiple different areas in the code that log a message that look like this, when we encounter an ENOSPC error when writing to the cache: *** Cache partition is FULL - Decrease cachesize!!! *** The message is a bit unclear, and doesn't even mention AFS at all. Reword the message to try to explain a little more what's happening. Also, since we log the same message in several different places, move them all to a common function, called afs_WarnENOSPC, so we only need to change the message in one place. Change-Id: If1c259bd22a382ff56ed29326aa20c86389d06bc Reviewed-on: https://gerrit.openafs.org/13410 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 16b981ec6697b511c36c09adfeb8b79eaf2345b0 Author: Mark Vitale Date: Thu Nov 15 15:41:24 2018 -0500 afs: remove dead code afs_osi_SetTime afs_osi_SetTime() has been dead code since -settime support was removed with commit 1d9888be486198868983048eeffabdfef5afa94b 'Remove -settime/RXAFS_GetTime client support'. Remove the dead code. No functional change is incurred by this commit. Change-Id: Ie5559325b4c98d7e0786c75ae6507ab9c2c47376 Reviewed-on: https://gerrit.openafs.org/13393 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit aa80f892ec39e2984818090a6bb2047430836ee2 Author: Mark Vitale Date: Thu Nov 15 15:31:37 2018 -0500 Linux 4.20: do_settimeofday is gone With Linux commit 976516404ff3fab2a8caa8bd6f5efc1437fed0b8 'y2038: remove unused time interfaces', do_settimeofday() is gone. 
However, OpenAFS only calls do_settimeofday() from afs_osi_SetTime(), which has been dead code since -settime support was removed from afsd with commit 1d9888be486198868983048eeffabdfef5afa94b 'Remove -settime/RXAFS_GetTime client support'. Instead of fixing afs_osi_SetTime() to use a current Linux API, remove it as dead code. No functional change is incurred by this commit. However, this change is required in order to build OpenAFS on Linux 4.20. Change-Id: I74913deb249de66b0da71539f2596c971f0fd99a Reviewed-on: https://gerrit.openafs.org/13392 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 3c454b39d04f4886536267c211171dae30dc0344 Author: Mark Vitale Date: Tue Nov 13 11:20:09 2018 -0500 Linux 4.20: current_kernel_time is gone With Linux commit 976516404ff3fab2a8caa8bd6f5efc1437fed0b8 'y2038: remove unused time interfaces' (4.20-rc1), current_kernel_time() has been removed. Many y2038-compliant time APIs were introduced with Linux commit fb7fcc96a86cfaef0f6dcc0665516aa68611e736 'timekeeping: Standardize on ktime_get_*() naming' (4.18). According to Documentation/core-api/timekeeping.rst, a suitable replacement for: struct timespec current_kernel_time(void) would be: void ktime_get_coarse_real_ts64(struct timespec64 *ts)) Add an autoconf test and equivalent logic to deal. Change-Id: I4ff622ad40cc6d398267276d13493d819b877350 Reviewed-on: https://gerrit.openafs.org/13391 Tested-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit bfb2ebdfc2c0bfd252a14ddbe1681ab22b6733c5 Author: Andrew Deason Date: Mon Oct 15 16:10:59 2018 -0500 ubik: calloc ubik_dbase Instead of using malloc and initializing various fields to 0, allocate our ubik_dbase using calloc, to more easily ensure all fields are initialized. 
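The difference between the two allocation styles can be shown with a minimal sketch. struct demo_dbase and its fields are hypothetical and much smaller than the real ubik_dbase; the point is only that calloc hands back zero-filled memory, so a field added later does not need its own explicit initialization.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for a database handle. */
struct demo_dbase {
    char *pathName;
    int version_epoch;
    int version_counter;
    void *cache;
};

/* Allocates a handle with every field already zeroed, or NULL on failure. */
struct demo_dbase *demo_dbase_alloc(void)
{
    struct demo_dbase *db = calloc(1, sizeof(*db));

    /* With malloc, each field would have to be cleared by hand here. */
    return db;
}

/* Releases a handle obtained from demo_dbase_alloc(). */
void demo_dbase_free(struct demo_dbase *db)
{
    free(db);
}
```

Callers still have to check the return value for NULL, exactly as they would with malloc.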
Change-Id: I5c2f345a82a2eb73d53ffc3e1b0fa408af6a8311 Reviewed-on: https://gerrit.openafs.org/13363 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 84b3e1c43685862c147603627a020a68650d6e1c Author: Mark Vitale Date: Fri Oct 26 09:12:44 2018 -0400 viced: fix typo in help for option -unsafe-nosalvage Change-Id: I4e72533747250cee1b7d8c091c63c78948be6c28 Reviewed-on: https://gerrit.openafs.org/13367 Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d058acb354cab9856303cc341a1f439e4f7f3454 Author: Mark Vitale Date: Thu Oct 25 10:27:41 2018 -0400 viced: correct option parsing for -vlru*, -novbc Commit a5effd9f1011aa319fdf432c67aec604053b8656 "viced: Use libcmd for command line options" modernized the option parsing for (da)fileserver, but introduced a few errors for the following options: -vlruthresh -vlruinterval -vlrumax -novbc Correct the errors. Change-Id: If57dfabaa8d4e456b63d47694d288bd8c4235ad2 Reviewed-on: https://gerrit.openafs.org/13365 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a8219383946b821a907d75b02b7255ca1a162d23 Author: Andrew Deason Date: Sat Oct 20 16:56:01 2018 -0500 budb: Remove db.lock Ever since commit dc8f18d6 (Protect ubik cache accesses), the 'lock' field in struct memoryDB has been unused. Remove it from the struct definition. Change-Id: I90131421ae2e2322debf4249e7464126480832d1 Reviewed-on: https://gerrit.openafs.org/13362 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7eeec611811ad81f55de4befd70ed47466a5b248 Author: Andrew Deason Date: Sat Oct 20 16:56:57 2018 -0500 ubik: Remove version_cond Several areas in the code do something like this whenever the database version is changed: #ifdef AFS_PTHREAD_ENV opr_cv_broadcast(&ubik_dbase->version_cond); #else LWP_NoYieldSignal(&ubik_dbase->version); #endif However, ever since commit 3fae4ea1 (ubik: remove unused code), nothing in the tree waits for this condvar, so it currently doesn't do anything. Remove this unneeded code. 
Change-Id: I6903ed89f9dcee2ce154be8883d656d297c97902 Reviewed-on: https://gerrit.openafs.org/13361 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0f65b40b24599d58cf30bfd47fae83ab54e1416a Author: Andrew Deason Date: Wed Oct 17 16:35:36 2018 -0500 Remove one more automake VERSION reference The configure summary was still referencing the old automake-specific VERSION var. Use the autoconf PACKAGE_VERSION var instead, so this actually shows our version. Change-Id: I18007935d0235931f1d2e023abddee7356e8ac2d Reviewed-on: https://gerrit.openafs.org/13360 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit db38561dea2dc092dcd74082676b2a7c7f56b51c Author: Michael Meffie Date: Wed Apr 4 18:42:46 2018 -0400 autoconf: remove unnecessary mkdir during configure Remove an unneeded mkdir command to create the JAVA/libjafs object directory, since this directory is automatically created by the config.status when generating the JAVA/libjafs/Makefile. Change-Id: Ib02a38c5c23790cb07e5c2433fd4870e8763c3a3 Reviewed-on: https://gerrit.openafs.org/12994 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit eb47fa9879785a8a88ef041667845bb4d005b77e Author: Michael Meffie Date: Wed Apr 4 18:20:02 2018 -0400 autoconf: remove spurious no-op Change-Id: I27242481dc3039f6776deb89e31793deee7f2840 Reviewed-on: https://gerrit.openafs.org/12993 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b1b3322a68d50318c44caeb7889fd181dc441149 Author: Michael Meffie Date: Wed Apr 4 18:13:24 2018 -0400 autoconf: fix pio checks name The autoconf macro to perform the positional i/o checks was misnamed as hpux checks (since there happens to be a specific check for hpux at the top of the macro). Change the macro name and m4 file name to be more accurately named.
Change-Id: Ib85728fbfe67930cb5f9f1f0e34f7aa1195fdfc6 Reviewed-on: https://gerrit.openafs.org/12992 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 65b55bcc26f69f25c67518f672b34be73f3be370 Author: Michael Meffie Date: Thu Dec 21 11:59:38 2017 -0500 vol: avoid query for parent id when deleting disk header When a DAFS volume server removes a volume disk header file (V*.vol), the volume server invokes an fssync command to have the file server delete the Volume Group Cache (VGC) entry corresponding to the volume id and the parent id of the removed volume header. The volume parent id is unknown to the volume server when removing a volume disk header on behalf of a "vos zap -force" operation. In this case, the volume server issues an fssync query to attempt to look up the parent id from the file server's VGC. If this fssync query fails for some reason, the volume server is unable to delete the VGC entry for the deleted volume header. The volume server logs an error and vos zap reports an undocumented error code. One common way this can be encountered is to issue a "vos zap -force" on a file server that has just been restarted. In this case, the VGC may not be fully populated yet, so the volume server is not able to look up the parent id of the given volume. With this commit, relax the requirement for the parent id when deleting VGC entries. A placeholder of 0 is used to mean any parent id for the given volume id. This obviates the need to query for the parent id when performing a "vos zap -force", and allows the volume server to remove any VGC entries associated with the volume id being zapped.
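The "0 means any parent id" rule described in the vol commit above can be illustrated with a small sketch. This is not the file server's VGC code: the struct, the table layout, and the names vgc_entry, vgc_delete, and VGC_ANY_PARENT are hypothetical and exist only to show the matching rule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one Volume Group Cache entry. */
struct vgc_entry {
    uint32_t parent;   /* volume group (parent) id */
    uint32_t child;    /* volume id */
    int in_use;        /* nonzero while the slot holds a live entry */
};

/* Assumed placeholder value: a parent id of 0 matches any parent. */
#define VGC_ANY_PARENT 0u

/*
 * Remove the entries for volume id 'child' from a flat table.
 * With parent == VGC_ANY_PARENT, every live entry for 'child' is removed,
 * whatever its parent id; otherwise the parent id must match as well.
 * Returns the number of entries removed.
 */
size_t
vgc_delete(struct vgc_entry *tbl, size_t n, uint32_t parent, uint32_t child)
{
    size_t removed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!tbl[i].in_use || tbl[i].child != child)
            continue;
        if (parent != VGC_ANY_PARENT && tbl[i].parent != parent)
            continue;
        tbl[i].in_use = 0;
        removed++;
    }
    return removed;
}
```

With this shape, a caller that does not know the parent id (the "vos zap -force" case) passes VGC_ANY_PARENT and needs no prior lookup.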
Change-Id: Iee8647902d93a3c992fca4c4f3880a3393f0b95f Reviewed-on: https://gerrit.openafs.org/12839 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2f2c2ce62aa17ecac3651d64c1168af926f7458b Author: Andrew Deason Date: Thu Oct 11 00:18:17 2018 -0500 Remove automake autoconf vars Commit 4706854f (autoconf: updates and cleanup) removed our invocation of AM_INIT_AUTOMAKE, which defines the output variables PACKAGE and VERSION. Several files in our build system are still referencing @PACKAGE@ and @VERSION@, though, leaving them un-substituted. This is most easily seen as the AFSVersion version string remaining as "@VERSION@" when the tree is built without git, but it also affects some packaging in the tree. Remove references to @VERSION@ and @PACKAGE@, replacing them with their autoconf equivalents @PACKAGE_VERSION@ and @PACKAGE_TARNAME@. Change-Id: I6c6a09a46c4af4259009a4a60cfdaee63d6258c2 Reviewed-on: https://gerrit.openafs.org/13357 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d75bc6370f625479a67c7c0a50cce23c4d4a4ce5 Author: Andrew Deason Date: Fri Sep 28 17:12:40 2018 -0500 afs: Remove afs_xosi Since OpenAFS 1.0, all platforms in libafs have a lock called afs_xosi, which is acquired and released around calls like VOP_GETATTR on cache files. However, this lock doesn't appear to protect anything; on all platforms, the code that runs while the lock is held only calls VOP_GETATTR and accesses local variables (aside from afs_osi_cred, which we use similarly in many other places). The purpose of the lock has never been documented, and is not mentioned at all in the afs_rwlocks text file. The comment by the afs_xosi lock declaration suggests that the lock was originally introduced to protect access to 'tvattr', which perhaps was a global variable in the past. All uses of 'tvattr' are local now, though, so protecting access to it doesn't make any sense. So, remove afs_xosi, to remove the unnecessary serialization of VOP_GETATTR calls.
Change-Id: Ib3764600ae0155057361418c86b49a3507bdcd94 Reviewed-on: https://gerrit.openafs.org/13350 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0548ee436d0f0f92a980d22e03149faedf38dc70 Author: Andrew Deason Date: Mon Oct 1 11:56:53 2018 -0400 afs: Free 'addrs' array Currently, 3 places in libafs allocate an 'addrs' array in a very similar way to loop through our list of servers: ForceAllNewConnections(), afs_LoopServers(), and PCallBackAddr(). Of these, only afs_LoopServers actually frees the array. ForceAllNewConnections and PCallBackAddr leak the memory, but these are only hit from infrequent pioctls that can only be run by root, so the impact is small. Fix ForceAllNewConnections and PCallBackAddr to free the array. Change-Id: Ic348e29cefa7c41cbcb30f738f943e8d022a97f0 Reviewed-on: https://gerrit.openafs.org/13355 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2aeabf8c5bca22b400653e2bc88b6f36d47b05ca Author: Marcio Barbosa Date: Sun Sep 30 17:38:53 2018 -0400 macos: packaging support for MacOS X 10.14 This commit introduces the new set of changes / files required to successfully create the dmg installer on OS X 10.14 "Mojave". Change-Id: Ia1238b454350777bbfbf3dfd2be0c6c523348928 Reviewed-on: https://gerrit.openafs.org/13349 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 72b2670a9e2e3937ed4e47485b9e9fa6953b5444 Author: Marcio Barbosa Date: Wed Sep 26 00:18:38 2018 -0300 macos: add support for MacOS 10.14 This commit introduces the new set of changes / files required to successfully build the OpenAFS source code on OS X 10.14 "Mojave". Change-Id: Ib7cbd531ad6db3340d59e76abdecbe75886a4d5c Reviewed-on: https://gerrit.openafs.org/13348 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit bd58bb85004a18bb6681ff2b0c13a04e23c4d9c4 Author: Marcio Barbosa Date: Mon Oct 1 17:44:22 2018 -0400 auth: check if argument of afsconf_Close* is null Currently, we do not check if the argument of afsconf_Close / afsconf_CloseInternal is equal to null. 
In order to avoid a possible segmentation fault, add the checks. Change-Id: I45635ad2d735505637072867edb7ff17da3c671a Reviewed-on: https://gerrit.openafs.org/13352 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie commit 0835d7c2a183f896096684df06258aefd297f080 Author: Michael Meffie Date: Fri Mar 16 09:25:18 2018 -0500 afs: make sure to call afs_Analyze after afs_Conn The afs_Conn function is used to pick a connection for a given RPC. The RPC is normally wrapped within a do-while loop which calls afs_Analyze to handle the RPC code and manage the server connection references. Among other things, afs_Analyze can mark the server as down, blacklist idle servers, etc. There are some special cases in which we break out of this do-while loop early, by putting the connection reference given by afs_Conn and then jumping out of the loop. In these cases, be sure to call afs_Analyze to put the server connection we got from afs_Conn, and to handle the RPC return code, possibly marking the server as down or blacklisted. Change-Id: Ic2c43f20d153376b93d79bbb5145914f8e478957 Reviewed-on: https://gerrit.openafs.org/13288 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 07ed94cfd817dc5a4e2d2712570087388fe7828f Author: Marcio Barbosa Date: Fri Oct 5 11:26:34 2018 -0400 DARWIN: replace macro exported by automake Commit 4706854f57043c8393baa922dd1974176e110a19 removed automake references from the source tree. As a result, VERSION (exported by AM_INIT_AUTOMAKE and obtained from Autoconf's AC_INIT macro) is not available anymore. Unfortunately, a reference to this macro can be found in src/afs/DARWIN/osi_module.c. Consequently, builds on OS X fail with the following message: osi_module.c:144:32: error: use of undeclared identifier 'VERSION' To fix this problem, replace VERSION by PACKAGE_VERSION (defined by AC_INIT). 
Change-Id: Ib3821d79c4cddd59c399985762e13dec755d8642 Reviewed-on: https://gerrit.openafs.org/13354 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f0bab78cbe4f59609fa18647a480cc6989948786 Author: Michael Meffie Date: Mon Oct 1 11:38:37 2018 -0400 ubik: do not reuse the offset variable for the sync site address The ubik SendFile function performs a sanity check of the host address before proceeding with the file transfer. Currently this check reuses the file offset local variable to hold the value of the sync site address, a 32-bit IPv4 address. Not only is this confusing, but also causes a signed/unsigned type mismatch when comparing host addresses. Instead of being so stingy with local variables, declare a new local variable of the correct type to hold the value of the sync site address. This separation is also a prerequisite for supporting larger address types in the future. Change-Id: I116fe210f418e6914afeff37c44d30bf795e2413 Reviewed-on: https://gerrit.openafs.org/13351 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d7ae7df42ced260471578dccc160f2f7a5bc686e Author: Andrew Deason Date: Mon Sep 24 15:41:23 2018 -0500 vlserver: Remove sascnvldb "sascnvldb" appears to be a variant of cnvldb that was used to convert vldb database blobs from even older versions than what cnvldb handles. However, it has never been built by default (some makefile rules reference the program, but it's never built unless the user explicitly runs 'make sascnvldb'), and it currently cannot build due to a variety of compiler errors. Remove the dead code. 
Change-Id: I5692d2cd058aa4ae9222ce25721001aabcca5eb7 Reviewed-on: https://gerrit.openafs.org/13345 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0796de43eaceb3a28799ad0bbe11e335a3f919bc Author: Mark Vitale Date: Fri Jun 22 16:52:08 2018 -0400 fsint: remove dead code The last references to these objects were removed with commit 3828c257ae33306bbdd3c6db9381601fe5b1b110 "dead-code-and-prototyes-20060214". A few mentions of CBS and BBS are left in the documentation as historical references: - doc/man-pages/pod1/rxgen.pod - src/kauth/AuthServer.mss Change-Id: Ia24eef7bb1509ff10d11de5c51e688e27f69417a Reviewed-on: https://gerrit.openafs.org/13324 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 77ae3dc899e89f327328c874628f100a765846c4 Author: Michael Meffie Date: Fri Apr 4 10:27:10 2014 -0400 cmd: improve help for programs without subcommands Some programs do not have subcommands (other than the standard "help" and "version" subcommands). The cmd library provides the "noopcode" mechanism for new subcommand-less programs, but older programs take advantage of the optional "initcmd" token to simulate subcommand-less programs. The "initcmd" token is optional to run the command, however it is required to display the command help. For example, running the xstat_cm_test program without any options gives a syntax error: $ xstat_cm_test xstat_cm_test: Missing required parameter '-cmname' ... Retrying with -help (or help, -h, --help), gives the rather unhelpful output: $ xstat_cm_test -help xstat_cm_test: Commands are: apropos search by help text help get help on commands initcmd initialize the program It is not obvious to the user how to get the command usage for the program, nor that the initcmd subcommand to "initialize the program" is actually a placeholder to run the program. Instead, display the command usage when help is requested and initcmd is the only defined subcommand for a program.
For example: $ xstat_cm_test -help Usage: src/xstat/xstat_cm_test [initcmd] -cmname + -collID + [-onceonly] [-frequency ] [-period ] [-debug] [-help] Where: -onceonly Collect results exactly once, then quit -debug turn on debugging output The libcmd library now supports a "noopcode", which should be used for future subcommand-less programs, but converting old programs to remove the initcmd opcode could break scripts which actually specify the optional initcmd token. This commit adds a new libcmd flag called CMD_IMPLICIT which is used to denote built-in subcommands such as "version" and "help". Change-Id: Iee9cb2761254543f74166e5c240685f85b6915b6 Reviewed-on: https://gerrit.openafs.org/10983 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2daa413e3ec061e0653adbd1d6549f15e0659a62 Author: Andrew Deason Date: Tue Aug 7 17:27:24 2018 -0500 Avoid format truncation warnings With gcc 7.3, we start getting several warnings like the following: vutil.c: In function ‘VWalkVolumeHeaders’: vutil.c:860:34: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 63 [-Werror=format-truncation=] snprintf(name, VMAXPATHLEN, "%s" OS_DIRSEP "%s", partpath, dentry->d_name); Most or all of these truncations should be okay, but increase the size of the relevant buffers so we can build with warning checking turned on. Change-Id: Iac62d6fcfa46f523c34bf1b0ebc2770d3d67c174 Reviewed-on: https://gerrit.openafs.org/13274 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit fa6edf73d4bbe39012f3431c60584a282a823233 Author: Andrew Deason Date: Tue Sep 25 16:52:14 2018 -0500 vlserver: Remove 'register' argument Commit 4a531cb7 (death to register) removed the 'register' declaration from variables/arguments. But commit 3bf03502 (vlserver: Add a struct for trans-specific data) accidentally added one back in at around the same time, probably due to a rebase/merge mistake.
Take the 'register' declaration back out. Change-Id: I73f206a57ab6b97195771e39556d2b0064be4cf3 Reviewed-on: https://gerrit.openafs.org/13346 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4a2b5101afda24b2d937e7350ca35b0b3d3c4af8 Author: Benjamin Kaduk Date: Wed May 30 19:38:57 2018 -0500 CellServDB update 14 May 2018 Update all three copies in the tree, and the rpm specfile. Change-Id: I572ff4e39ab757128f0082a4f447565e94b8dee5 Reviewed-on: https://gerrit.openafs.org/13134 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 02dede5d40a55421ab4f093c1c90b8f785a40ec1 Author: Andrew Deason Date: Mon Aug 20 14:53:35 2018 -0500 Log binding ip address and port during startup Many daemons currently have the ability to bind to a specific ip address using the -rxbind parameter. The behavior can be a little unintuitive, however, since we only bind to the ip address we find via NetInfo/NetRestrict processing, and only if we end up with a single ip address. Since that processing involves examining the set of ip addresses available, this can have confusing results if, for instance, a daemon starts up while an administrator is changing the local ip configuration. If a daemon binds to a different ip address than the administrator expects, this can be very confusing, especially since for most daemons we don't log our bound ip anywhere. To help alleviate this, change the startup code for all of our daemons to log what ip we are trying to bind to (or "0.0.0.0" if none), along with our local port. Change-Id: I18d3647c4d134177a0a17c6a64583d444558a9f6 Reviewed-on: https://gerrit.openafs.org/13272 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 794748af87134d0b89fbca3be6e0480a96a0655c Author: Michael Meffie Date: Tue Oct 10 22:57:01 2017 -0400 fsprobe: add fsprobe_Wait function Move the lwp code to wait in the fsprobe applications down to the fsprobe library. 
This is a non-functional change in anticipation of converting the fsprobe library and programs to pthreads. Change-Id: I2972b13e2e3eeb691c64c91b0640bbc97e7d0b21 Reviewed-on: https://gerrit.openafs.org/12747 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2c1a7e47336c8f8d14dd6c65d53925a9e0e87c66 Author: Michael Meffie Date: Mon Oct 9 22:23:31 2017 -0400 xstat: add xstat_*_Wait functions Add the xstat_cm_Wait and xstat_fs_Wait functions and move the code to wait for the xstat data collection to complete from the applications down to the xstat library. This is a non-functional change in anticipation of converting the xstat library and programs to pthreads. Change-Id: Ifd1d6bcda618c89b4ce46e1e64f33b0b30a89a72 Reviewed-on: https://gerrit.openafs.org/12746 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5ced6025b9f11fadbdf2e092bf40cc87499ed277 Author: Andrew Deason Date: Thu Nov 2 16:41:52 2017 -0500 rx: Convert rxinit_status to rx_IsRunning() Currently, all rx code examines the atomic rxinit_status to determine if rx is running (that is, if rx_InitHost has been called, and rx_Finalize/shutdown_rx hasn't been called). This is used in rx.c to see if we're redundantly calling our setup/teardown functions, and outside of rx.c in a couple of places to see if rx-related resources have been initialized. The usage of rxinit_status is a little confusing, since setting bit 0 indicates that rx is not running, and clearing bit 0 indicates rx is running. Since using rxinit_status requires atomic functions, this makes code checking or setting rxinit_status a little verbose, and it can be hard to see what it is checking for. (For example, does 'if (!rx_atomic_test_and_clear_bit(&rxinit_status, 0))' succeed when rx is running, or when rx is not running?) The current usage of rxinit_status in rx_InitHost also does not handle initialization errors correctly.
rx_InitHost clears rxinit_status near the beginning of the function, but does not set rxinit_status if an error is encountered. This means that any code that checks rxinit_status (such as another rx_InitHost call) will think that rx was initialized successfully, but various resources aren't actually set up. This can cause segfaults and other errors as the code tries to actually use rx. This can easily be seen in bosserver, if bosserver is started up while the local host/port is in use by someone else. bosserver will try to rx_InitHost, which will fail, and then we'll try to rx_InitHost again, which will immediately succeed without doing any init. We then segfault quickly afterwards as we try to use uninitialized rx resources. To fix all of this, refactor code using rxinit_status to use a new function, called rx_IsRunning(), to make it a little clearer what we're checking for. We also re-introduce the LOCK_RX_INIT locks to prevent functions like rx_InitHost and rx_Finalize from running in parallel. Note that non-init/shutdown code (such as rx_upcall or rx_GetIFInfo) does not need to wait for LOCK_RX_INIT to check if rx is running or not. These functions only care if rx is currently set up enough to be used, so we can immediately return a 'yes' or 'no' answer. That is, if rx_InitHost is in the middle of running, rx_IsRunning returns 0, since some resources may not be fully initialized. Change-Id: Ia14a6a725c9662b9db0adef48c33b48a93ffe051 Reviewed-on: https://gerrit.openafs.org/12761 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 00aa9200be86b187c903503e56b2af55639ea2b8 Author: Andrew Deason Date: Sat Sep 22 01:58:17 2018 -0500 SOLARIS: Fix libafs $(KOBJ) parallel make race Currently, our COMPDIRS make rule for SOLARIS libafs builds looks like this: ${COMPDIRS} ${INSTDIRS} ${DESTDIRS}: for t in $(KOBJ) ; do # set some variables ; \ cd $$t ; \ $(MAKE) $@_libafs || exit $$?
; \ cd ../ ;\ done And Makefile.common has this: all: setup $(COMPDIRS) Where the 'setup' rule creates the $(KOBJ) dirs and sets up some symlinks. For parallel builds, this means that our commands in the ${COMPDIRS} target can be running in parallel with the 'setup' target, and so our $(KOBJ) dirs may not exist by the time we try to 'cd $$t'. For single-KOBJ platforms this actually largely works, since the 'cd' will fail, but then the subsequent 'make' will run (just in the wrong dir), but this can cause us to wastefully re-compile the same source files (and cause some possibly confusing error messages). For platforms with multiple KOBJs, this causes obvious problems, since we don't cd into each KOBJ dir. To solve this, just have the ${COMPDIRS}/etc rule depend on setup, so we know that 'setup' has finished running. Also change our way of 'cd'ing into each KOBJ dir to actually cause the rule to fail, to make any errors here more obvious and consistent. Change-Id: Id2e662f36ef47a6182716728167b2da4713893c6 Reviewed-on: https://gerrit.openafs.org/13344 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 68be8d7a1884fe678016b5ea20c16b3b124e8406 Author: Andrew Deason Date: Fri Sep 21 22:13:25 2018 -0500 SOLARIS: Fix platforms for KOBJ definition Currently, we define KOBJ to "MODLOAD32 MODLOAD64" for the following platforms: Which doesn't make any sense, since "all" includes sun4x_511 and sunx86_511. The previous commits that modified this line, e4c2810f (Remove support for Solaris pre-8) and c6a22d67 (SOLARIS: Do not build x86 kernel module on 5.11), clearly meant to change the platforms sun4x_511 and sunx86_511 to use the KOBJ on the next line, but omitted the leading "-" for the platform. This doesn't break anything, since the Makefile on these platforms expands to: KOBJ = MODLOAD32 MODLOAD64 KOBJ = MODLOAD64 So the first KOBJ line is effectively ignored. 
It's confusing, though, so fix this line so these platforms only get one KOBJ definition. Change-Id: Idea9fdee4ac5883428748c2a5fdfa9707406436a Reviewed-on: https://gerrit.openafs.org/13343 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit c1d39153da00d5525b2f7874b2d214a7f1b1bb86 Author: Andrew Deason Date: Thu Sep 6 13:42:11 2018 -0500 Run ctfconvert/ctfmerge for all objects Commit 88cb536f (autoconf: detect ctf-tools and add ctf to libafs) introduced running ctfconvert and ctfmerge for libafs on Solaris, but didn't add any CTF data for userspace code. This commit causes the same commands to be run for every binary that we build (if the ctf tools are available). To accomplish this, also refactor how we run ctfconvert and ctfmerge. The approach in commit 88cb536f would require us to modify the makefile rule for every executable to run RUN_CTFCONVERT and RUN_CTFMERGE, which is somewhat impractical. So instead in this commit, we modify all of our *_CCRULE and *_LDRULE variables to wrap the compiler invocation with the new CC_WRAPPER script. This means our *RULE variables change from something like this: FOO_CCRULE = $(RUN_CC) $(CC) $(XXX_FLAGS) -o $@ to something like this: FOO_CCRULE = $(RUN_CC) $(CC_WRAPPER) $(CC) $(XXX_FLAGS) -o $@ CC_WRAPPER expands to the script src/config/cc-wrapper, which just runs ctfconvert or ctfmerge on the relevant files after the compiler/linker runs. If the CTF tools are not configured, CC_WRAPPER expands to nothing, to limit our impact on other platforms. This commit was developed in collaboration with mbarbosa@sinenomine.net.
Change-Id: Id19ba9d739edc68f01c2db7d5caa20758ec8144a Reviewed-on: https://gerrit.openafs.org/13308 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 78ed034603781a979687a45c08eb8b13e515e8bf Author: Andrew Deason Date: Tue Aug 7 11:17:43 2018 -0500 Call rx_InitHost once during daemon startup Currently, a few daemons calls rx_InitHost in different places, and under different conditions. For example, vlserver calls rx_InitHost only when we -rxbind to a specific ip address, and then also makes an additional rx_Init call. Other daemons always call rx_InitHost, or just call rx_InitHost sometimes and don't make an extra rx_Init call. To try to make the various daemons behave a little more consistently, change the startup code to always call rx_InitHost, and to only call it once. Note that rx_InitHost is the same as calling rx_Init with INADDR_ANY as the ip address, and calling rx_Init* after a previous rx_Init* call is effectively a no-op. Change-Id: Ifd15175349a7b4695e684ca82deb8a8af5063073 Reviewed-on: https://gerrit.openafs.org/13271 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 38a094137f067255c586dd5c85f3040d7a7c4486 Author: Andrew Deason Date: Fri Sep 21 17:16:52 2018 -0500 pthread.m4: Add missing 'test' to conditional Commit c5def62d (autoconf: update pthread checks) accidentally omitted a 'test' in one of the conditionals. This causes an ugly error message during configure: checking for pthread_attr_init in -lpthread... yes ./configure[31043]: x-lpthread: not found [No such file or directory] Replace the missing 'test'. Change-Id: I28b82594e43a4ab42a5eb9fcc78e0ce8c5517d8b Reviewed-on: https://gerrit.openafs.org/13342 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3fae4ea19a175aed7ff3f6e9c7fdf2aa2f1b5cb3 Author: Mark Vitale Date: Wed Nov 9 16:58:00 2016 -0500 ubik: remove unused code ubik_GetVersion and ubik_WaitVersion have been unused since at least OpenAFS 1.0. Remove them. No functional change should be incurred by this commit. 
Change-Id: Iee6952f35d8c34e9f05a4e6011f5795f7222fb08 Reviewed-on: https://gerrit.openafs.org/13325 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 809ee49b80d7bc0e720aaebe78fb9ecfd453065d Author: Andrew Deason Date: Fri Sep 21 12:11:46 2018 -0500 Remove alpha_dux/alpha_osf references Several files were still referencing the alpha_dux* and alpha_osf* sysnames. The code for these platforms has been removed, so get rid of this cruft. Change-Id: I042fcc29be322bf557829974242553bb6d5b2be4 Reviewed-on: https://gerrit.openafs.org/13339 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 42625220bb615e2fd7f0dc24e50a502e0596e546 Author: Andrew Deason Date: Fri Sep 21 12:03:37 2018 -0500 libafs: Remove .i Makefile rules Makefile.common.in defines a suffix rule to generate .i files from .c files, but we never actually need to do this. The rule originates from before OpenAFS 1.0, which also did not use the rule. Remove the unused definitions. Change-Id: I057b2aca7d17e3e85e93d886a65c954e8d9d708f Reviewed-on: https://gerrit.openafs.org/13338 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 930d8ee638112ca8bf27a9528c0a527cfab54c7d Author: Mark Vitale Date: Fri Aug 17 18:48:08 2018 -0400 volser: ensure GCTrans transaction walk remains valid Commit bc56f5cc97a982ee29219e6f258b372dbfe1a020 ("volser: Delete timed-out temporary volumes") introduced new logic to GCTrans(). Unfortunately, part of this logic temporarily drops VTRANS_LOCK in order to call VPurgeVolume(). While this lock is dropped, other volser_trans may be added or deleted from the allTrans list. Therefore, GCTrans should not trust the next pointer (nt = tt->next) which was obtained before the lock was dropped. One symptom observed in the field was a segfault while examining tt->volume. Neither tt nor volume were valid any longer, since tt had been set from a stale nt at the top of the loop. 
To repair, improve, and clarify this logic: - Refactor so nt is assigned correctly and as late as possible. - Add comments to explain the placement of the assigns to future maintainers. Change-Id: Ibd3a504bddd3622730aa349576341e20f2f27836 Reviewed-on: https://gerrit.openafs.org/13286 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 89b50fdec9ab2dafe24b873f25c2cdb71b154e44 Author: Marcio Barbosa Date: Sat Aug 11 15:51:05 2018 -0400 volser: add more logs for failures during restore In the current version of the volserver, some failures during volume restores are not logged. In order to help debugging, this commit introduces extra logs for possible failures during this process, so we guarantee that an error at any point during the restore causes a message to be logged. Change-Id: I3647155aeb3f10316d9d7fecb5b126efc909f7b4 Reviewed-on: https://gerrit.openafs.org/13252 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 7c27365ea24aed5787f6fc03f30f6085c78ece51 Author: Michael Meffie Date: Mon Oct 9 22:16:09 2017 -0400 afsmonitor: remove unused LWP_WaitProcess Remove the unimplemented once-only flag and the unused LWP_WaitProcess call. Change-Id: Idec5815f6f20019b9be4b973794d8b05cea7f6c9 Reviewed-on: https://gerrit.openafs.org/12745 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 95b0641ad8cfd0358576c6e1a93266fc59ecf710 Author: Mark Vitale Date: Thu Sep 6 14:09:26 2018 -0400 volser: combine GCTrans conditional clauses In preparation for a future commit, combine two conditional clauses in GCTrans(). No functional change should be incurred by this commit. 
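The list-walk hazard fixed in the earlier "volser: ensure GCTrans transaction walk remains valid" commit above can be sketched in isolation. This is not the volserver code: struct trans, gc_walk, and unlink_next are hypothetical names, the lock is only indicated by comments, and the sketch assumes the current node stays valid while the lock is dropped.

```c
#include <stdlib.h>

/* Hypothetical, simplified transaction list node. */
struct trans {
    struct trans *next;
    int id;
    int expired;       /* nonzero: needs work done with the list lock dropped */
};

/* Stand-in for work done while the list lock is dropped; it may modify the list. */
typedef void (*unlocked_work_fn)(struct trans *cur);

/* Example of such work: another thread's removal of the node after 'cur'. */
void
unlink_next(struct trans *cur)
{
    struct trans *victim = cur->next;

    if (victim != NULL) {
        cur->next = victim->next;
        free(victim);
    }
}

/*
 * Walk the list. The next pointer is read only after any window in which
 * the lock was dropped, so it cannot refer to a node that was unlinked and
 * freed in the meantime. Returns the number of nodes visited.
 */
int
gc_walk(struct trans *head, unlocked_work_fn work)
{
    struct trans *tt, *nt;
    int visited = 0;

    for (tt = head; tt != NULL; tt = nt) {
        visited++;
        if (tt->expired && work != NULL) {
            /* the list lock would be dropped here */
            work(tt);
            /* the list lock would be reacquired here */
        }
        nt = tt->next;  /* assigned as late as possible */
    }
    return visited;
}
```

Reading tt->next at the top of the loop body instead would leave nt pointing at the freed node whenever the unlocked work removes it, which matches the stale-pointer symptom the commit message describes.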
Change-Id: Ib08d5b83dd26327124fe0119e6e5f459adc5f78a Reviewed-on: https://gerrit.openafs.org/13303 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit f62fb17b3cf1c886f8cfc2fabe9984070dd3eec4 Author: Michael Meffie Date: Tue Apr 19 20:46:33 2016 -0400 ubik: positional io for db reads and writes The ubik library was written before positional i/o was available and issues an lseek system call for each database file read and write. This change converts the ubik database accesses to use positional i/o on platforms where pread and pwrite are available, in order to reduce system call load. The new inline uphys_pread and uphys_pwrite functions are supplied on platforms which do not supply pread and pwrite. These functions fall back to non-positional i/o. If these symbols are present in the database server binary then the server process will continue to call lseek before each read and write access of the database file. This change does not affect the whole-file database synchronization done by ubik during database recovery (via the DISK_SendFile and DISK_GetFile RPCs), which still uses non-positional i/o. However, that code does not share file descriptors with the phys.c code, so there is no possibility of mixing positional and non-positional i/o on the same FDs. Change-Id: I28accd24f7f27b5e8a4f1dd0e3e08bab033c16e0 Reviewed-on: https://gerrit.openafs.org/12272 Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 8375a7f7dd0e3bcbf928a23f874d1a15a952cdef Author: Marcio Barbosa Date: Sat Aug 11 14:00:18 2018 -0400 volser: warn if older version of volume is restored Volume restores work by overwriting vnodes with the data in the given volume dump. 
If we restore a partial incremental dump from an older version of the volume, this generally results in a partly-corrupted volume, since directory vnodes may contain references that don't exist in the current version of the volume (or are supposed to be in a different directory). Currently, the volserver does not prevent restoring older volume data to a volume, and this doesn't necessarily always result in corrupted data (for instance, if we are restoring a full volume dump over an existing volume). But restoring old volume data seems more likely to be a mistake, since reverting a volume back to an old version, even without corrupting data, is a strange thing to do and may cause problems with our methods of cache consistency. So, log a warning when this happens, so if this is a mistake, it doesn't happen silently. But we still do not prevent this action, since it's possible something could be doing this intentionally. We detect this just by checking if the updateDate in the given header is older than the current updateDate for the volume on disk. Note: Restoring a full dump file (-overwrite f) will not result in corrupted data. In this scenario, the restore operation removes the volume on disk first (if present). After that, the dump file is restored. In this case, we do not log anything (the volume is not corrupted). Change-Id: Iac55cc8bb1406ca6af9a5e43e7d37c6bfa889e91 Reviewed-on: https://gerrit.openafs.org/13251 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit dc25b9f509385bef7e6f73f03a796ea033922300 Author: Michael Meffie Date: Fri Oct 27 23:25:10 2017 -0400 update: convert upserver and client from LWP to pthreads Build the upserver and the upclient with pthreads instead of LWP and convert the IOMGR sleeps in the client to regular sleeps.
Change-Id: I183765ef180f34d38b87a13ec49f16f4a60afcc8 Reviewed-on: https://gerrit.openafs.org/12754 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 066b3a9fd7a4d99e8aefe5cc20e57b31b137f979 Author: Pat Riehecky Date: Fri Jun 1 16:29:25 2018 -0500 Correct some redundant if() clauses A few if() conditions currently contain redundant syntax, due to typos. Fix the conditions to actually check different things, according to what the author probably originally intended. (via cppcheck) Change-Id: I7e46217e1f84fe65677ada345d227f31f1988fe6 Reviewed-on: https://gerrit.openafs.org/13157 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6892bfbd701899281b34ee337637d438c7d8f8c6 Author: Michael Meffie Date: Wed Apr 20 18:17:16 2016 -0400 ubik: remove unnecessary lseeks in uphys_open The ubik database file access layer has a file descriptor cache to avoid reopening the database file on each file access. However, the file offset is reset with lseek on each and every use of the cached file descriptor, and the file offset is set twice when reading or writing data records. This change removes unnecessary and duplicate lseek system calls to reduce the system call load. Change-Id: I460b226d81e4eb64dc87918175acab495aa698cd Reviewed-on: https://gerrit.openafs.org/12271 Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Reviewed-by: Marcio Brito Barbosa Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit da699c8b81e818ba97ff8115397d7f7afe0bf512 Author: Michael Meffie Date: Mon Sep 10 23:47:33 2018 -0400 klog.krb5 -lifetime is not implemented The klog.krb5 -lifetime option was copied from earlier versions of log and klog, which had the ability to set the krb4 token lifetime. However, the -lifetime option is not feasible in the krb5 version, and so is not implemented in klog.krb5. Update the klog.krb5 man page to document that the -lifetime option has no effect.
Remove the code which unnecessarily checks the unused klog.krb5 -lifetime command line argument. The unused lifetime variable was discovered by Pat Riehecky using the clang scan-build static analyzer. Change-Id: I5f459ec46eaff87a69ccdf7de386a671d0944a5a Reviewed-on: https://gerrit.openafs.org/13309 Tested-by: BuildBot Reviewed-by: PatRiehecky Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 8f314560c9b00acb63e1929503f6bf2e43bb1ff6 Author: Michael Meffie Date: Tue Sep 11 12:03:30 2018 -0400 util: add defines for ktime never and now values Add preprocessor symbolic names for ktime values representing never and right now. The names are intended to be consistent with the ktime date never value definition. This commit does not make any functional change. Change-Id: Ia6735b585e50aeb018481f76552fbb4f607b8529 Reviewed-on: https://gerrit.openafs.org/13310 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 800318b43fdf461ad95cd7f3940718f3f0a609a7 Author: Andrew Deason Date: Thu May 10 16:22:52 2018 -0500 ubik: Buffer log writes with stdio Currently, when we write ubik i/o operations to the db log, we tend to issue several syscalls involving small writes and fstat()s. This is because each "log" operation involves at least one write, and each log operation tends to be pretty small. Each logged operation hitting disk separately is unnecessary, since the db log does not need to hit the disk at all until we are ready to commit the transaction. So to reduce the number of syscalls when writing to the db, change our log writes to be buffered in memory (using stdio calls). This also avoids needing to fstat() the underlying log file, since we open the underlying file in append-only mode, since we only ever append to (and truncate) the log file. 
To implement this, we introduce a new 'buffered_append' phys operation, to explicitly separate our buffered and non-buffered operations, to try to avoid any bugs from mixing buffered and non-buffered i/o. This new operation is only used for the db log. Change-Id: I5596117c6c71ab7c2d552f71b0ef038f387e358a Reviewed-on: https://gerrit.openafs.org/13070 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Joe Gorse Reviewed-by: Benjamin Kaduk Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot commit 93fd6d31ce441c5ab394f31355584d17ef6e455a Author: Marcio Barbosa Date: Mon Sep 10 18:14:55 2018 +0000 autoconf: Use `uname -p` instead of $HOST_CPU for ctf tools Currently, we check if the ctf tools are present searching for them in a few directories. One of these directories (/opt/onbld/bin/$HOST_CPU) looks at the $HOST_CPU variable, which on x86 can be 'x86_64' or 'i386', but the only valid directories for the onbld tools are 'i386' and 'sparc'. So instead of $HOST_CPU, just use $(uname -p), which is only ever 'i386' on x86, and 'sparc' on sparc. [adeason@sinenomine.net: reword commit message] Change-Id: I972cf1cc0dda81f5ee454b14ddbe2830c82c838d Reviewed-on: https://gerrit.openafs.org/13275 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fa55a3fe77b4adfce4071fe73f02687e65d4e027 Author: Michael Meffie Date: Sat Jun 9 05:16:02 2018 +0000 doc: the last partition name is /vicepiu The last valid partition name supported by OpenAFS is /vicepiu, not /vicepiv. Update the docs and man pages to say so. Change-Id: I6e1cce775d332d76f605a26f16502c651461994b Reviewed-on: https://gerrit.openafs.org/13177 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3588127191c3ebf2e411212bbea9a33a9081e009 Author: Michael Meffie Date: Sat Jun 9 04:39:49 2018 +0000 tests: partition name to id function tests Add unit tests for the utility functions to convert between partition names and partition ids. 
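For illustration only, a minimal sketch of the kind of name-to-id mapping such tests exercise. The names part_name_to_id and part_id_to_name are hypothetical and are not the in-tree utility functions. The mapping itself (single letters a..z as ids 0..25, then two-letter suffixes starting at 26, ending at /vicepiu) is an assumption based on the /vicepa../vicepiu naming; under it, /vicepiu gets id 254.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PART_ID 254  /* assumed id of /vicepiu, the last valid name */

/* Hypothetical helper: map "/vicepXX" (or a bare suffix) to an id, -1 on error. */
int part_name_to_id(const char *name)
{
    const char *prefix = "/vicep";
    const char *s = name;
    int id;

    if (strncmp(s, prefix, strlen(prefix)) == 0)
        s += strlen(prefix);
    if (s[0] < 'a' || s[0] > 'z')
        return -1;
    if (s[1] == '\0')
        return s[0] - 'a';                 /* one-letter suffix: 0..25 */
    if (s[1] < 'a' || s[1] > 'z' || s[2] != '\0')
        return -1;
    id = (s[0] - 'a') * 26 + (s[1] - 'a') + 26;  /* two-letter suffix */
    return id > SKETCH_MAX_PART_ID ? -1 : id;
}

/* Hypothetical inverse: write "/vicepXX" into buf; 0 on success, -1 on error. */
int part_id_to_name(int id, char *buf, size_t buflen)
{
    if (id < 0 || id > SKETCH_MAX_PART_ID || buflen < 9)
        return -1;
    if (id < 26)
        snprintf(buf, buflen, "/vicep%c", 'a' + id);
    else
        snprintf(buf, buflen, "/vicep%c%c",
                 'a' + (id - 26) / 26, 'a' + (id - 26) % 26);
    return 0;
}
```

A round trip through both helpers should return the original name, and the name one past /vicepiu should be rejected.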
Change-Id: I4b12f9d611cb9f3ce49909cda5cbcedd3e6c3d10 Reviewed-on: https://gerrit.openafs.org/13176 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 70926959094497d440daf9a78e1e1ea5a7ddc9b8 Author: Ben Kaduk Date: Mon Dec 9 15:26:06 2013 -0500 Add rxgk_crypto_rfc3961.c rxgk wrappers around an external crypto library, in this case, our in-tree rfc3961 library. Primitives for encryption/decryption and MIC/VerifyMIC, ways to generate and free rxgk_key objects, etc.. Change-Id: I7525086043baf54f5c3019b3f5ab3495760c4236 Reviewed-on: https://gerrit.openafs.org/10565 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 6534b10a4180ec10bceebbc11405718e7969fa21 Author: Andrew Deason Date: Thu Jul 26 15:48:00 2018 -0500 Remove DUX/OSF code Remove code for DUX/OSF platforms. DUX code was removed from the libafs client in commit 392dcf67 ("Complete removal of DUX client code") and the alpha_dux* param files were removed in dc4d9d64 ("afs: Remove AFS_BOZONLOCK_ENV"). This code has always been disabled since those commits, so remove any code referencing AFS_DUX*_ENV, AFS_OSF_ENV, and related symbols. Change-Id: I3787b83c80a48e53fe214fdecf9a9ac0b63d390c Reviewed-on: https://gerrit.openafs.org/13260 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 8ad4e15ffc883c9a99f9636d7d8a5ed0a2fcc26a Author: Marcio Barbosa Date: Tue May 31 09:08:08 2016 -0300 venus: fix memory leak In GetPrefCmd, when we request server prefs from the kernel and our output buffer is not big enough, pioctl() will return E2BIG and we allocate more memory and try again. However, if the size of the output buffer reaches 16k bytes and this space is still not enough (or if pioctl fails and errno != E2BIG), we return without releasing the memory that was previously allocated. To fix this problem, free our output buffer when this happens. 
Change-Id: Ib34cb12629528ddf2a763386f0ac5494eb8be695 Reviewed-on: https://gerrit.openafs.org/12293 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit c553170bcf3b97ba3745f21040c8e07b128ef983 Author: Jeffrey Altman Date: Wed Jun 6 21:23:14 2018 -0400 rx: reset packet header userStatus field on reuse OpenAFS Rx fails to set the rx packet header userStatus field for most packets sent other than type RX_PACKET_TYPE_ACK. If the userStatus field is not set, its value will be random garbage based upon the prior use of the memory allocated to the rx_packet. This change explicitly sets the userStatus field to zero for all DATA and Special packet types. Background ---------- OpenAFS Rx allocates a pool of rx_packet structures that are reused for both incoming and outgoing Rx packets throughout the lifetime of the process (or kernel module). The rx packet header field userStatus is set by rxi_Send() to rx_call.localStatus. rxi_Send() is called from both rxi_SendAck() when sending RX_PACKET_TYPE_ACK packets and from rxi_SendSpecial() when called with a non-NULL call structure (RX_PACKET_TYPE_BUSY, RX_PACKET_TYPE_ACKALL, or RX_PACKET_TYPE_ABORT). rx_call.localStatus defaults to zero and can be modified by the application calling rx_SetLocalStatus(). The userStatus field is neither set nor reset when sending RX_PACKET_TYPE_DATA packets and all packets sent without a call structure. When allocated packets are reused in these cases, the value of the userStatus leaks from the prior packet use. The userStatus field is expected to be zero unless intentionally set by the application protocol to another value. The AFS3 suite of rx services uses the rx_header.userStatus field only in the RXAFS service and only as part of the definition for RXAFS_StoreData and RXAFS_StoreData64 RPCs. 
The StoreData RPCs use the rx_header.userStatus field as an out-of-band communication mechanism that permits the fileserver to signal to the cache manager when the RXAFS_StoreData[64] has been assigned to an application worker (thread) and the worker has acquired all of the required locks and other resources necessary to complete the RPC. This signal can be sent before all of the application data has been received. The cache manager reads the userStatus value via rx_GetRemoteStatus(). When bit-0 of the remote status value equals one and CSafeStore mode is disabled, the cache manager can wakeup any threads blocked waiting for the store operation to complete. Cache managers that perform a workload heavy in RXAFS_StoreData[64] RPCs will end up with an increasing percentage of packets in which the userStatus field is one instead of zero. Fileservers processing a workload heavy in RXAFS_StoreData[64] RPCs will likewise end up with an increasing percentage of packets in which the userStatus field is one instead of zero. Cache managers and Fileservers will therefore send DATA and call free special packets with a non-zero userStatus field to peer services (RXAFS, RXAFSCB, VL, PR). The failure to reset the userStatus field has not been a problem in the past because only the OpenAFS cache manager has ever queried the userStatus via rx_GetRemoteStatus() and only when issuing RXAFS_StoreData[64] RPCs. Failure to correct this flaw interferes with future use of the userStatus field in yet to be registered AFS3 RPCs and existing non-AFS3 services that make use of the userStatus when sending data to a service. 
Change-Id: I32c0bba93b8e5197036d92168956b6e2a95406fb FIXES: 134554 Reviewed-on: https://gerrit.openafs.org/13165 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2d8045d67686fbb80696b47b4a60e48e7e74fec9 Author: Mark Vitale Date: Tue Sep 11 15:59:41 2018 -0400 budb: SBUDB_FindLatestDump should check result of FillDumpEntry FillDumpEntry may return an error, but FindLatestDump doesn't check its result. Therefore, SBUDB_FindLatestDump may return invalid results. Instead, check the return code from FillDumpEntry and abort the call if it fails. Change-Id: If0b44ba2a12a76511129d77110ef669b00780ff0 Reviewed-on: https://gerrit.openafs.org/13312 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 91bab84e7a3b7de2591c475ba4912b0db8899f05 Author: Mark Vitale Date: Tue Sep 11 16:29:59 2018 -0400 butc: repair build error Commit c43169fd36348783b1a5a55c5bb05317e86eef82 introduced a build error by invoking TLog with an extraneous set of internal parentheses. Remove the offending parentheses. Change-Id: Ibc52501b01ecbe9f86262566446d63e66486272f Reviewed-on: https://gerrit.openafs.org/13311 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d5816fd6cd1876760a985a817dbbb3940cf3bddb Author: Benjamin Kaduk Date: Tue Sep 11 10:51:01 2018 -0500 Fix typos in audit format strings Commit 9ebff4c6caa8b499d999cfd515d4d45eb3179769 introduced audit framework support for several butc-related data types, but had a typo ('$d' for '%d') in a couple of places, that was not reported by compiler format-string checking. Fix the typo to properly print all the auditable data. Change-Id: Ibefa9f8f1c0567bc6fe606327af26fcb0dbeadba commit 345ee34236c08a0a2fb3fff016edfa18c7af4b0a Author: Benjamin Kaduk Date: Sun Sep 9 10:44:38 2018 -0500 OPENAFS-SA-2018-001 backup: use authenticated connection to butc Use the standard routine to pick a client security object, instead of always assuming rxnull. 
Respect -localauth as well as being able to use the current user's tokens, but also provide a -nobutcauth argument to fall back to the historical rxnull behavior (but only for the connections to butc; vldb and budb connections are not affected). Change-Id: Ibf8ebe5521bee8d0f7162527e26bc5541d07910d commit 736364f1e3426b7b15836cd95ce25f0e516ce3f2 Author: Benjamin Kaduk Date: Thu Sep 6 18:50:39 2018 -0500 OPENAFS-SA-2018-001 butc: require authenticated connections with -localauth The butc -localauth option is available to use the cell-wide key to authenticate to the vlserver and buserver, which in normal deployments will require incoming connections to be authenticated as a superuser. In such cases, the cell-wide key is also available for use in authenticating incoming connections to the butc, which would otherwise have been completely unauthenticated. Because of the security hazards of allowing unauthenticated inbound RPCs, especially ones that manipulate backup information and are allowed to initiate outbound RPCs authenticated as the superuser, default to not allowing unauthenticated inbound RPCs at all. Provide an opt-out command-line argument for deployments that require this functionality and have configured their network environment (firewall/etc.) appropriately. Change-Id: Ia6349757a4c6d59d1853df1a844e210d32c14feb commit c43169fd36348783b1a5a55c5bb05317e86eef82 Author: Benjamin Kaduk Date: Sun Sep 9 11:49:03 2018 -0500 OPENAFS-SA-2018-001 Add auditing to butc server RPC implementations Make the actual implementations into helper functions, with the RPC stubs calling the helpers and doing the auditing on the results, akin to most other server programs in the tree. This relies on support for some additional types having been added to the audit framework.
Change-Id: Ic872d6dfc7854fa28bd3dc2277e92c7919d0d0c0 commit 9ebff4c6caa8b499d999cfd515d4d45eb3179769 Author: Benjamin Kaduk Date: Sat Sep 8 19:42:36 2018 -0500 OPENAFS-SA-2018-001 audit: support butc types Add support for several complex butc types to enable butc auditing. Change-Id: I6aedd933cf5330cda40aae6f33827ae65409df32 commit 50216dbbc30ed94f89bdd0e964f4891e87f28c0b Author: Benjamin Kaduk Date: Sat Sep 8 20:35:25 2018 -0500 OPENAFS-SA-2018-001 butc: remove dummy osi_audit() routine This local stub was present in the original IBM import and is unused. It will conflict with the real audit code once we start adding auditing to the TC_ RPCs, so remove it now. Change-Id: I3e74e01464af122f245c3b0fe8f3985e422d13b4 commit a4c1d5c48deca2ebf78b1c90310b6d56b3d48af6 Author: Mark Vitale Date: Fri Jul 6 03:14:19 2018 -0400 OPENAFS-SA-2018-003 rxgen: prevent unbounded input arrays RPCs with unbounded arrays as inputs are susceptible to remote denial-of-service (DOS) attacks. A malicious client may submit an RPC request with an arbitrarily large array, forcing the server to expend large amounts of network bandwidth, cpu cycles, and heap memory to unmarshal the input. Instead, issue an error message and stop rxgen when it detects an RPC defined with an unbounded input array. Thus we will detect the problem at build time and prevent any future unbounded input arrays. Change-Id: Ib110f817ed1c8132ea2549025876a5200c728fab commit 8b92d015ccdfcb70c7acfc38e330a0475a1fbe28 Author: Mark Vitale Date: Fri Jul 6 03:21:26 2018 -0400 OPENAFS-SA-2018-003 volser: prevent unbounded input to various AFSVol* RPCs Several AFSVol* RPCs are defined with an unbounded XDR "string" as input. RPCs with unbounded arrays as inputs are susceptible to remote denial-of-service (DOS) attacks. A malicious client may submit an AFSVol* request with an arbitrarily large string, forcing the volserver to expend large amounts of network bandwidth, cpu cycles, and heap memory to unmarshal the input. 
Instead, give each input "string" an appropriate size. Volume names are inherently capped to 32 octets (including trailing NUL) by the protocol, but there is less clearly a hard limit on partition names. The Vol_PartitionInfo{,64} functions accept a partition name as input and also return a partition name in the output structure; the output values have wire-protocol limits, so larger values could not be retrieved by clients, but for denial-of-service purposes, a more generic PATH_MAX-like value seems appropriate. We have several varying sources of such a limit in the tree, but pick 4k as the least-restrictive. [kaduk@mit.edu: use a larger limit for pathnames and expand on PATH_MAX in commit message] Change-Id: Iea4b24d1bb3570d4c422dd0c3247cd38cdbf4bab commit 97b0ee4d9c9d069e78af2e046c7987aa4d3f9844 Author: Mark Vitale Date: Fri Jul 6 01:09:53 2018 -0400 OPENAFS-SA-2018-003 volser: prevent unbounded input to AFSVolForwardMultiple AFSVolForwardMultiple is defined with an input parameter that is defined to XDR as an unbounded array of replica structs: typedef replica manyDests<>; RPCs with unbounded arrays as inputs are susceptible to remote denial-of-service (DOS) attacks. A malicious client may submit an AFSVolForwardMultiple request with an arbitrarily large array, forcing the volserver to expend large amounts of network bandwidth, cpu cycles, and heap memory to unmarshal the input. Even though AFSVolForwardMultiple requires superuser authorization, this attack is exploitable by non-authorized actors because XDR unmarshalling happens long before any authorization checks can occur. Add a bounding constant (NMAXNSERVERS 13) to the manyDests input array. This constant is derived from the current OpenAFS vldb implementation, which is limited to 13 replica sites for a given volume by the layout (size) of the serverNumber, serverPartition, and serverFlags fields. 
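The effect of bounding the array can be sketched as follows. This is not rxgen-generated code and not the volserver's decoder; decode_bounded_array and SKETCH_MAX_DESTS are hypothetical names, and the wire format (a big-endian 32-bit count followed by 32-bit items) is a simplification. The point it illustrates is that the count is checked against the bound before any memory is allocated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_DESTS 13  /* bound taken from the commit message (NMAXNSERVERS) */

/* Read a big-endian 32-bit value from buf. */
static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/*
 * Hypothetical decoder for a counted array of 32-bit items.
 * Returns 0 and sets *items / *nitems on success,
 * -1 on truncated or oversized input.
 */
int decode_bounded_array(const unsigned char *buf, size_t len,
                         uint32_t **items, uint32_t *nitems)
{
    uint32_t count, i;
    uint32_t *out;

    if (len < 4)
        return -1;
    count = get_be32(buf);
    if (count > SKETCH_MAX_DESTS)
        return -1;                  /* reject before allocating anything */
    if (len - 4 < (size_t)count * 4)
        return -1;                  /* not enough payload for count items */
    out = calloc(count ? count : 1, sizeof(*out));
    if (out == NULL)
        return -1;
    for (i = 0; i < count; i++)
        out[i] = get_be32(buf + 4 + (size_t)i * 4);
    *items = out;
    *nitems = count;
    return 0;
}
```

With an unbounded definition, the equivalent of the count check is effectively absent, so a claimed count of 0xFFFFFFFF would drive the allocation; here it is refused up front.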
[kaduk@mit.edu: explain why this constant is used] Change-Id: Id12c6a7da4894ec490691eb8791dcd3574baa416 commit 124445c0c47994f5e2efef30e86337c3c8ebc93f Author: Mark Vitale Date: Thu Jul 5 23:51:37 2018 -0400 OPENAFS-SA-2018-003 budb: prevent unbounded input to BUDB_SaveText BUDB_SaveText is defined with an input parameter that is defined to XDR as an unbounded array of chars: typedef char charListT<>; RPCs with unbounded arrays as inputs are susceptible to remote denial-of-service (DOS) attacks. A malicious client may submit a BUDB_SaveText request with an arbitrarily large array, forcing the budb server to expend large amounts of network bandwidth, cpu cycles, and heap memory to unmarshal the input. Modify the XDR definition of charListT so it is bounded. This typedef is shared (as an OUT parameter) by BUDB_GetText and BUDB_DumpDB, but fortunately all in-tree callers of the client routines specify the same maximum length of 1024. Note: However, SBUDB_SaveText server implementation seems to allow for up to BLOCK_DATA_SIZE (2040) = BLOCKSIZE (2048) - sizeof(struct blockHeader) (8), and it's unknown if any out-of-tree callers exist. Since we do not need a tight bound in order to avoid the DoS, use a somewhat higher maximum of 4096 bytes to leave a safety margin. [kaduk@mit.edu: bump the margin to 4096; adjust commit message to match] Change-Id: Ic3fe2758a9c97ed02c6e6d05f0de0865959b5b04 commit 7629209219bbea3f127b33be06ac427ebc3a559e Author: Mark Vitale Date: Thu Jul 5 21:11:30 2018 -0400 OPENAFS-SA-2018-003 vlserver: prevent unbounded input to VL_RegisterAddrs VL_RegisterAddrs is defined with an input argument of type bulkaddrs, which is defined to XDR as an unbounded array of afs_uint32 (IPv4 addresses): typedef afs_uint32 bulkaddrs<> The <> with no value instructs rxgen to build client and server stubs that allow for a maximum size of "~0u" or 0xFFFFFFFF. 
Ostensibly the bulkaddrs array is unbounded to allow it to be shared among VL_RegisterAddrs, VL_GetAddrs, and VL_GetAddrsU. The VL_GetAddrs* RPCs use bulkaddrs as an output array with a maximum size of MAXSERVERID (254). VL_RegisterAddrs uses bulkaddrs as an input array, with a nominal size of VL_MAXIPADDRS_PERMH (16). However, RPCs with unbounded array inputs are susceptible to remote denial-of-service attacks. That is, a malicious client may send a VL_RegisterAddrs request with an arbitrarily long array, forcing the vlserver to expend large amounts of network bandwidth, cpu cycles, and heap memory to unmarshal the argument. Even though VL_RegisterAddrs requires superuser authorization, this attack is exploitable by non-authorized actors because XDR unmarshalling happens long before any authorization checks can occur. Because all uses of the type that our implementation supports have fixed bounds on valid data (whether input or output), apply an arbitrary implementation limit (larger than any valid structure would be), to prevent this class of attacks in the XDR decoder. [kaduk@mit.edu: limit the bulkaddrs type instead of introducing a new type] Change-Id: Ibcc962ccc46aec7552b86d1d9fda7cc14310bc03 commit f5a80115f8f7f9418287547f0fc7fdb13d936f00 Author: Benjamin Kaduk Date: Thu Aug 30 10:38:56 2018 -0500 OPENAFS-SA-2018-002 butc: Initialize OUT scalar value In STC_ReadLabel, the interaction with the tape device is synchronous, so there is no need to allocate a task ID for status monitoring. However, we do need to initialize the output value, to avoid writing stack garbage on the wire. Change-Id: Id2066e1fe95fa1de02577dfd844697b1ae770f30 commit 7a7c1f751cdb06c0d95339c999b2c035c2d2168b Author: Mark Vitale Date: Tue Jun 26 06:01:16 2018 -0400 OPENAFS-SA-2018-002 ubik: prevent VOTE_Debug, VOTE_XDebug information leak VOTE_Debug and VOTE_XDebug (udebug) both leave a single field uninitialized if there is no current transaction.
This leaks the memory contents of the ubik server over the wire. struct ubik_debug - 4 bytes in member writeTrans In common code to both RPCs, ensure that writeTrans is always initialized. [kaduk@mit.edu: switch to memset] Change-Id: I91184b4ed0c159982a883ebaa9634406400eae93 commit b604ee7add7be416bf20973422a041e913d20761 Author: Mark Vitale Date: Tue Jun 26 05:26:21 2018 -0400 OPENAFS-SA-2018-002 kaserver: prevent KAM_ListEntry information leak KAM_ListEntry (kas list) does not initialize its output correctly. It leaks kaserver memory contents over the wire: struct kaindex - up to 64 bytes for member name - up to 64 bytes for member instance Initialize the buffer. [kaduk@mit.edu: move initialization to top of server routine] Change-Id: I5cc430fc996e7e89d38a384d092b9d4fad248fa4 commit be0142707ca54f3de99c4886530e7ac9f48dd61c Author: Mark Vitale Date: Tue Jun 26 05:12:32 2018 -0400 OPENAFS-SA-2018-002 butc: prevent TC_DumpStatus, TC_ScanStatus information leaks TC_ScanStatus (backup status) and TC_GetStatus (internal backup status watcher) do not initialize their output buffers. They leak memory contents over the wire: struct tciStatusS - up to 64 bytes in member taskName (TC_MAXNAMELEN 64) - up to 64 bytes in member volumeName " Initialize the buffers. [kaduk@mit.edu: move initialization to top of server routines] Change-Id: I0337d233e1dced56e351ed00471c9738fcd3b9db commit 52f4d63148323e7d605f9194ff8c1549756e654b Author: Mark Vitale Date: Tue Jun 26 05:00:25 2018 -0400 OPENAFS-SA-2018-002 butc: prevent TC_ReadLabel information leak TC_ReadLabel (backup readlabel) does not initialize its output buffer completely. It leaks butc memory contents over the wire: struct tc_tapeLabel - up to 32 bytes from member afsname (TC_MAXTAPELEN 32) - up to 32 bytes from member pname (TC_MAXTAPELEN 32) Initialize the buffer. 
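The general pattern behind this class of fix can be sketched as below. The structure and function names (sketch_label, fill_label) are hypothetical and do not reproduce the real tc_tapeLabel layout or the butc code; only the 32-byte field size comes from the commit message. The sketch shows why zeroing the whole output first matters: copying a short string into a fixed-size field leaves the bytes after the terminator untouched.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXTAPELEN 32  /* mirrors TC_MAXTAPELEN from the commit message */

/* Hypothetical reply structure; not the real tc_tapeLabel layout. */
struct sketch_label {
    char afsname[SKETCH_MAXTAPELEN];
    char pname[SKETCH_MAXTAPELEN];
};

/*
 * Fill an output label. The memset runs first, so every byte that is
 * later marshalled, including those after each string terminator,
 * has a defined value instead of stale memory contents.
 */
void fill_label(struct sketch_label *out, const char *afsname, const char *pname)
{
    memset(out, 0, sizeof(*out));
    snprintf(out->afsname, sizeof(out->afsname), "%s", afsname);
    snprintf(out->pname, sizeof(out->pname), "%s", pname);
}
```

Without the memset, a fixed-size field holding the string "tape1" would still carry 26 bytes of whatever was previously in that memory.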
[kaduk@mit.edu: move initialization to the RPC stub] Change-Id: I30f4aa32801791913b397a58c36c86c019dc51ef commit e96771471134102d3879a0ac8b2c4ef9d91a61b8 Author: Mark Vitale Date: Tue Jun 26 04:39:44 2018 -0400 OPENAFS-SA-2018-002 budb: prevent BUDB_* information leaks The following budb RPCs do not initialize their output correctly. This leaks buserver memory contents over the wire: BUDB_FindLatestDump (backup dump) BUDB_FindDump (backup volrestore, diskrestore, volsetrestore) BUDB_GetDumps (backup dumpinfo) BUDB_FindLastTape (backup dump) struct budb_dumpEntry - up to 32 bytes in member volumeSetName - up to 256 bytes in member dumpPath - up to 32 bytes in member name - up to 32 bytes in member tape.tapeServer - up to 32 bytes in member tape.format - up to 256 bytes in member dumper.name - up to 128 bytes in member dumper.instance - up to 256 bytes in member dumper.cell Initialize the buffer in common routine FillDumpEntry. Change-Id: Ic057a6c906ce2acd39e0e4ea0a0ba1e100bba3e9 commit 211b6d6a4307006da1467b3be46912a3a5d7b20b Author: Mark Vitale Date: Tue Jun 26 03:56:24 2018 -0400 OPENAFS-SA-2018-002 afs: prevent RXAFSCB_TellMeAboutYourself information leak RXAFSCB_TellMeAboutYourself does not completely initialize its output buffers. This leaks kernel memory over the wire: struct interfaceAddr Unix cache manager (libafs) - up to 124 bytes in array addr_in ((AFS_MAX_INTERFACE_ADDR 32 * 4) - 4)) - up to 124 bytes in array subnetmask " - up to 124 bytes in array mtu " Windows cache manager - 64 bytes in array addr_in ((AFS_MAX_INTERFACE_ADDR 32 - CM_MAXINTERFACE_ADDR 16)* 4) - 64 bytes in array subnetmask " - 64 bytes in array mtu " The following implementations of SRXAFSCB_TellMeAboutYourself are not susceptible: - fsprobe - libafscp - xstat_fs_test Initialize the buffer. 
Change-Id: I2ef868dd9269db7004a21cf913b6787948357d10 commit b52eb11a08f2ad786238434141987da27b81e743 Author: Mark Vitale Date: Tue Jun 26 03:47:41 2018 -0400 OPENAFS-SA-2018-002 afs: prevent RXAFSCB_GetLock information leak RXAFSCB_GetLock (cmdebug) does not correctly initialize its output. This leaks kernel memory over the wire: struct AFSDBLock - up to 14 bytes for member name (16 - '\0') Initialize the buffer. Change-Id: I4c5c8d67816c51645c0db44dc8f19b1b27c02757 commit 9d1aeb5d761581a35bef2042e9116b96e9ae3bf5 Author: Mark Vitale Date: Tue Jun 26 03:37:37 2018 -0400 OPENAFS-SA-2018-002 ptserver: prevent PR_ListEntries information leak PR_ListEntries (pts listentries) does not properly initialize its output buffers. This leaks ptserver memory over the wire: struct prlistentries - up to 62 bytes for each entry name (PR_MAXNAMELEN 64 - 'a\0') Initialize the buffer, and remove the now redundant memset for the reserved fields. Change-Id: I29d70c7e4dd567b8b046037f29f71911b8a0593f commit 26924fd508b21bb6145e77dc31b6cd0923193b72 Author: Mark Vitale Date: Tue Jun 26 03:00:02 2018 -0400 OPENAFS-SA-2018-002 volser: prevent AFSVolMonitor information leak AFSVolMonitor (vos status) does not properly initialize its output buffers. This leaks information from volserver memory: struct transDebugInfo - up to 29 bytes in member lastProcName (30-'\0') - 16 bytes in members readNext, transmitNext, lastSendTime, lastReceiveTime Initialize the buffers. This must be done on a per-buffer basis inside the loop, since realloc is used to expand the storage if needed, and there is not a standard realloc API to zero the newly allocated storage.
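The realloc limitation mentioned above can be illustrated with a small sketch. This is a variation, not the approach the commit takes: the commit zeroes each buffer inside the loop, whereas this hypothetical helper (grow_zeroed, with a stand-in sketch_entry type) zeroes the newly added tail at the moment the array grows. Either way, the bytes that realloc adds start out indeterminate and must be cleared explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type standing in for a per-transaction reply entry. */
struct sketch_entry {
    char lastProcName[30];
    int  fields[4];
};

/*
 * Grow an array from old_n to new_n entries and zero only the new tail.
 * Returns the (possibly moved) array, or NULL on failure, in which case
 * the original array is left untouched and still owned by the caller.
 */
struct sketch_entry *grow_zeroed(struct sketch_entry *arr,
                                 size_t old_n, size_t new_n)
{
    struct sketch_entry *tmp;

    if (new_n <= old_n || new_n > (size_t)-1 / sizeof(*arr))
        return NULL;                /* nothing to grow, or size overflow */
    tmp = realloc(arr, new_n * sizeof(*arr));
    if (tmp == NULL)
        return NULL;
    /* realloc preserves the first old_n entries; the rest is indeterminate. */
    memset(tmp + old_n, 0, (new_n - old_n) * sizeof(*tmp));
    return tmp;
}
```

Existing entries keep their contents across the call, and only the entries from old_n onward are cleared.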
[kaduk@mit.edu: update commit message] Change-Id: I79091fc63435ed2a795955f95bb867bc625ad398 commit 76e62c1de868c2b2e3cc56a35474e15dc4cc1551 Author: Mark Vitale Date: Tue Jun 26 02:33:05 2018 -0400 OPENAFS-SA-2018-002 volser: prevent AFSVolPartitionInfo(64) information leak AFSVolPartitionInfo and AFSVolPartitionInfo64 (vos partinfo) do not properly initialize their reply buffers. This leaks the contents of volserver memory over the wire: AFSVolPartitionInfo (struct diskPartition) - up to 24 bytes in member name (32-'/vicepa\0')) - up to 12 bytes in member devName (32-'/vicepa/Lock/vicepa\0')) AFSVolPartitionInfo64 (struct diskPartition64) - up to 248 bytes in member name (256-'/vicepa\0')) - up to 236 bytes in member devName (256-'/vicepa/Lock/vicepa\0') Initialize the output buffers. [kaduk@mit.edu: move memset to top-level function scope of RPC handlers] Change-Id: If64c02f36f10f52bfbab4b21ad1f60032c223c82 commit 70b0136d552a0077d3fae68f3aebacd985abd522 Author: Mark Vitale Date: Mon Jun 25 18:03:12 2018 -0400 OPENAFS-SA-2018-002 ptserver: prevent PR_IDToName information leak SPR_IDToName does not completely initialize the return array of names, and thus leaks information from ptserver memory: - up to 62 bytes per requested id (PR_MAXNAMELEN 64 - 'a\0') Use calloc to ensure that all memory sent on the wire is initialized, preventing the information leak. [kaduk@mit.edu: switch to calloc; update commit message] Change-Id: Iad623f2cc4c54b79f14a64b8714ba12579d05447 commit 03e804b629c17ca7a4e5789cf98b283c52bd59ed Author: Ben Kaduk Date: Thu Jan 10 11:57:00 2013 -0500 Configure glue for rxgk Add an --enable-rxgk switch to control whether the feature is used. For the sake of buildbot coverage, we still attempt to build the core subdirectory provided that a sufficiently usable GSS-API library is available, but do not install anything when rxgk is disabled at configure time. 
Future commits will use the configure argument to control the behavior of other rxgk-aware code in the tree. We provide a few new symbols to conditionally compile code for rxgk. The two new high-level symbols are: - AFS_RXGK_ENV: when defined, rxgk is available - AFS_RXGK_GSS_ENV: when defined, we can use GSS-API calls AFS_RXGK_GSS_ENV is turned on only for userspace pthread builds. For now, AFS_RXGK_ENV is only turned on for userspace pthread builds, and non-ukernel kernel builds. This effectively disables rxgk integration in any ukernel or LWP code, but this can be changed in the future by changing when AFS_RXGK_ENV is defined. Change-Id: Iab661d47aac77c1a238e809362015b869752df18 Reviewed-on: https://gerrit.openafs.org/10564 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Tested-by: BuildBot commit de43a0f8829e26b2c56347176d7938810a38469c Author: Michael Meffie Date: Thu Apr 12 23:18:55 2018 -0400 Suppress statement not reached warnings after noreturn functions Use the AFS_UNREACHED macro to suppress statement not reached warnings while building under Solaris Studio. These warnings are emitted for statements following functions declared with the noreturn function attribute. Change-Id: Ic18cbb3ea78124acbe69edc0eccb2473b46648fe Reviewed-on: https://gerrit.openafs.org/13010 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b7e9a9b28aaa024b6d6efc6ca74edc690500fc0d Author: Michael Meffie Date: Tue Apr 10 18:29:44 2018 -0400 lwp: add missing lwp prototypes for solaris Add missing lwp function prototypes for Solaris. 
This fixes the compile time warning messages: warning: implicit function declaration: LWP_NoYieldSignal Change-Id: I69c3660bb2631215cd296c08729c8e84d60660fd Reviewed-on: https://gerrit.openafs.org/13008 Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit ae4ad509d35aab73936a1999410bd80bcd711393 Author: Michael Meffie Date: Fri Jan 19 03:30:22 2018 -0500 rx: fix rx_atomic warnings under Solaris The Solaris implementation of the rx_atomic functions generates numerous compile time warnings due to an integer type mismatch. "rx_atomic.h", line xxx: warning: argument #1 is incompatible with prototype: The rx_atomic_t is an unsigned int under Solaris, however the Solaris atomic_set_long_excl and atomic_clear_long_excl functions take a ulong_t type. Solaris does not provide 'unsigned int' variants of these two functions. Fortunately, ulong_t variants of all the atomics we need for rx are available, in current as well as older versions of Solaris, so convert the Solaris rx_atomic_t type to be a ulong_t and convert all of the Solaris atomic calls to the ulong_t variants to avoid integer type mismatches. Change-Id: Ib54ca4bb8b9f044684301f0fb7971aec223e5993 Reviewed-on: https://gerrit.openafs.org/12991 Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit 3915911bcea2ede55799a15cec614e8291952e1f Author: Michael Meffie Date: Thu Aug 9 16:24:41 2018 -0400 afs: declare nfs translator dispatch functions static Declare the nfs translator dispatch functions to be static to enforce they are not to be called from outside of the translator.
Change-Id: I1c3d8917c080409424e21e377405472094941da0 Reviewed-on: https://gerrit.openafs.org/13277 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f8672d0c0f6f58e773ce0e6e4b2fc7b19a5e7ffe Author: Michael Meffie Date: Thu Mar 29 23:36:21 2018 -0400 afs: use void * for generic pointers in the nfs translator dispatcher Replace the use of char * and char ** with void * for representing generic pointers in the nfs dispatcher functions. This was done to fix a large number of compile time warnings, and allows us to remove a number of explicit casts. Also, remove the unnecessary char * casts of memset and memcpy arguments in the nfs translator dispatcher. This commit fixes a large number of Solaris Studio warning messages in the form: ... warning: argument #X is incompatible with prototype: Change-Id: I42e2d40b8112ada9417724282c0230f48a40324f Reviewed-on: https://gerrit.openafs.org/12989 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit dc2141bf56b43a9531335f581767d7766895b8d2 Author: Michael Meffie Date: Thu Mar 29 23:32:40 2018 -0400 afs: change afs_nfs{2,3}_dispatcher signature The fourth argument of the afs_nfs{2,3}_dispatcher functions is a pointer to a pointer to an exportinfo structure. However, this argument is not an output argument, so the extra level of indirection is unnecessary. A separate local variable is used as an output argument to the afs_nfsclient_reqhandler call within the dispatchers, which is not passed back to the afs_nfs{2,3}_dispatcher caller. In anticipation of other changes to fix warning messages, simplify the signature of the afs_nfs{2,3}_dispatcher functions to avoid taking the address of the exportinfo structure when calling afs_nfs{2,3}_dispatcher.
Change-Id: I6fb1a190e6aab286bfac41df783688a0be46a21f Reviewed-on: https://gerrit.openafs.org/12988 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e7678fb5fb6725055b576b86f6ef994594f0bb92 Author: Michael Meffie Date: Thu Mar 29 23:15:47 2018 -0400 afs: fix missing afs_nfs3_dispatcher return value Fix a missing early return value in the function afs_nfs3_dispatcher. All callers check the return code of afs_nfs3_dispatcher and interpret values greater than 1 to be errors. Return 3 as an error code for this code path, which is the next available error code in afs_nfs3_dispatcher. This commit fixes the following Solaris Studio warning message: ... warning: function expects to return value: afs_nfs3_dispatcher Change-Id: I47b545bd57a46c03006b9f031da3647c8a530377 Reviewed-on: https://gerrit.openafs.org/12987 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit 388eaec3452ed4b18a95ee34efcbe4cf64814701 Author: Michael Meffie Date: Thu Mar 15 18:53:59 2018 -0400 roken: do not clobber __attribute__ The roken-common.h header defines an empty macro called __attribute__ when HAVE___ATTRIBUTE__ is not defined. This macro conditionally removes the `format' function attributes in the roken headers at compile time. Unfortunately, the empty __attribute__ macro will also clobber other attribute types encountered after the roken.h header inclusion. This is not an issue when building under gcc or clang, since the empty attribute macro will not be defined. However, Solaris Studio supports only a subset of the function attribute types, with `format' not currently supported. This means roken will define an empty __attribute__ macro, which prevents the use of other attribute types. This commit does not change the roken files directly because they are external.
Instead, the processing of the roken.h.in file has been updated to undefine the __attribute__ macro at the end of the generated roken.h header. Change-Id: Iea5622ae175e7f82a60780838948178bd7f8b56f Reviewed-on: https://gerrit.openafs.org/12961 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 1711917e7ded7ebebae74d7bfeb8359a69db8869 Author: Andrew Deason Date: Fri Jun 29 14:48:58 2018 -0500 autoconf: Split out krb5/gss tests Move our krb5 and GSS-related autoconf tests into their own separate files, in src/cf/krb5.m4 and src/cf/gss.m4. Change-Id: I4202df5d810f2d3942fc4ffb3fd406869f68029b Reviewed-on: https://gerrit.openafs.org/13237 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 9d3ef9337fafe5dcf3865d3aced290be0f887c11 Author: Marcio Barbosa Date: Thu May 31 09:46:56 2018 -0300 autoconf: do not reference the missing script Currently, OpenAFS does not use automake. As a result, the missing script is not copied to the build-tools directory. Since this script is not present in the tree, am_missing_run is not initialized. Unfortunately, the current version still has a few references to this variable. In order to preserve a similar behavior, this commit replaces these references by AC_ERROR. While we are changing these, remove the AC_CHECK_PROGS calls for AR and STRIP, since libtool already checks these for us. Change-Id: I833dc6e8611dc7227db4ec77b0160dfa47b7e531 Reviewed-on: https://gerrit.openafs.org/12982 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a9644daa965fbf316943a07ad985b8ead2f4f31d Author: Peter Foley Date: Mon Feb 29 16:39:14 2016 -0500 Remove obsolete retsigtype Only relevant for pre-c89 K&R compilers.
[mmeffie@sinenomine.net: avoid changes to src/external] Change-Id: I1b3bf14ddd50f1a6b3d50e0376abffffdb64fb81 Reviewed-on: https://gerrit.openafs.org/12203 Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 451602a5e3a503d46eaecb3738d259e46023afcd Author: Michael Meffie Date: Sat May 26 19:52:27 2018 -0400 autoconf: reformat long lines The autoupdate tool was run to modernize the autoconf macros but generates very long lines. Manually reformat the long lines to make them more reasonable. Change-Id: I6f08138aa7134d8110da885ea4375cebbe903575 Reviewed-on: https://gerrit.openafs.org/13125 Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 2e23fceec872795a39b915b73e48eb77a5d65afe Author: Peter Foley Date: Mon Feb 29 13:28:28 2016 -0500 autoconf: autoupdate macros Run autoupdate on macros. [mmeffie@sinenomine.net: re-run autoupdate, no other edits] Change-Id: I8b45edea97cf2e065f23f02d2d7f6a0e7adcb8a5 Reviewed-on: https://gerrit.openafs.org/12202 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit f9c584a794c6a4c5d03fa1ee7f1b2b5e1309e7ee Author: Michael Meffie Date: Fri Apr 20 11:47:57 2018 -0400 autoconf: update curses.m4 Replace the obsolete AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM in the curses check for the getmaxyx macro. This change was done manually instead of using autoupdate because the program prologue argument for this particular check is an m4 macro, which will not expand to code when autoupdate adds m4 quotes to the AC_LANG_PROGRAM arguments. 
Change-Id: I85b65fb9b59b45d31286436a9f15110cec31bec8 Reviewed-on: https://gerrit.openafs.org/13021 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason commit c5def62d7be4891f534b753374acbf5b524701eb Author: Michael Meffie Date: Mon Apr 16 10:42:49 2018 -0400 autoconf: update pthread checks Replace obsolete AC_TRY_COMPILE with AC_COMPILE_IFELSE. Replace shell if/then conditionals with AS_IF macros. Reformat indentation and quoting. This change was done manually, since autoupdate copes poorly with the old, nested AC_TRY_COMPILE macros. Change-Id: I2c34d1426f154daff65999076821f49ddaa16a24 Reviewed-on: https://gerrit.openafs.org/13018 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa commit 4706854f57043c8393baa922dd1974176e110a19 Author: Peter Foley Date: Mon Feb 29 13:19:01 2016 -0500 autoconf: updates and cleanup Update autoconf macros to their modern equivalents, according to what the 'autoupdate' tool does. While we're here, remove automake references that aren't being used, and remove the obsolete AC_PROG_LIBTOOL in favor of AFS_LT_INIT. Change-Id: I71066d6d72f8b1d8663e26fec83ae23d7f73f059 Reviewed-on: https://gerrit.openafs.org/12199 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e01053e04a207bc0a7cf07cc9924e37450540fb4 Author: Michael Meffie Date: Thu Jan 25 18:27:00 2018 -0500 SOLARIS: suppress -xarch=amd64 is deprecated warnings The -m64 flag to specify 64bit builds was introduced in Sun Studio 10, circa 2005. The old flag -xarch=amd64 was deprecated as of Sun Studio 12, circa 2007. Ever since Sun Studio 12, the compiler complains with a warning message when the old -xarch=amd64 flag is given: cc: Warning: -xarch=amd64 is deprecated, use -m64 to create 64-bit programs Update the cflags when building the Solaris kernel module for x86 to use the modern -m64 under Solaris 11 or later. 
Since Solaris 11 has been available since 2010, it is very unlikely a compiler on Solaris 11 would not support the modern -m64 flag. Change-Id: Ib13c00f1c69f34ab1905a8dd4a46c90895046f25 Reviewed-on: https://gerrit.openafs.org/12959 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit cc1724e6f5a8f485197aba6246c909869e58d0b2 Author: Perry Ruiter Date: Thu Apr 23 21:33:27 2015 -0700 afsd: Improve syscall tracing When afsd is started with the -debug flag, extensive debug output is generated including tracing for each syscall. Unfortunately the existing syscall tracing is not especially helpful. It dumps out two constants that we already knew at compile time, the first parameter of the syscall along with the syscall's return code. Specifically it does not tell you which syscall is currently being traced. Here's a current example of afsd -debug: afsd: cacheFiles autotuned to 581250 afsd: dCacheSize autotuned to 10000 afsd: cacheStatEntries autotuned to 15000 SScall(183, 28, 6860800)=0 SScall(183, 28, -847416368)=0 SScall(183, 28, 1)=0 afsd: Forking rx listener daemon. afsd: Forking rx callback listener. afsd: Forking rxevent daemon. SScall(183, 28, 0)=0 SScall(183, 28, 1)=0 ... This patch drops the compile time constants (183 and 28 in the above sample output) and replaces them with the name of the syscall being traced. Additionally the first parameter to a syscall is as likely to be an address as a decimal value so display it in hex. Here's an example of afsd -debug with these changes: afsd: cacheFiles autotuned to 581250 afsd: dCacheSize autotuned to 10000 afsd: cacheStatEntries autotuned to 15000 os_syscall(AFSOP_SET_THISCELL, 0x68bf80)=0 os_syscall(AFSOP_SEED_ENTROPY, 0x7fff9ce40c10)=0 os_syscall(AFSOP_ADVISEADDR, 0x1)=0 afsd: Forking rx listener daemon. afsd: Forking rx callback listener. afsd: Forking rxevent daemon. os_syscall(AFSOP_RXEVENT_DAEMON, 0x0)=0 os_syscall(AFSOP_BASIC_INIT, 0x1)=0 ... 
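The tracing change described above amounts to looking up a symbolic name for the syscall opcode and printing the first parameter in hex. A minimal sketch of that idea follows; the SKETCH_* opcode values, the opnames table, and the helper names syscall_name and format_trace are all hypothetical and are not taken from afsd itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcode values for illustration only; the real AFSOP_*
 * constants are defined in the OpenAFS headers. */
#define SKETCH_AFSOP_BASIC_INIT     1
#define SKETCH_AFSOP_ADVISEADDR     2
#define SKETCH_AFSOP_RXEVENT_DAEMON 3

struct opname {
    int op;
    const char *name;
};

/* Plain positional initializers, searched linearly by opcode. */
static const struct opname opnames[] = {
    { SKETCH_AFSOP_BASIC_INIT, "AFSOP_BASIC_INIT" },
    { SKETCH_AFSOP_ADVISEADDR, "AFSOP_ADVISEADDR" },
    { SKETCH_AFSOP_RXEVENT_DAEMON, "AFSOP_RXEVENT_DAEMON" },
};

/* Return the symbolic name for an opcode, or "UNKNOWN" if not listed. */
static const char *
syscall_name(int op)
{
    size_t i;

    for (i = 0; i < sizeof(opnames) / sizeof(opnames[0]); i++) {
        if (opnames[i].op == op)
            return opnames[i].name;
    }
    return "UNKNOWN";
}

/* Format one trace line: syscall name, first parameter in hex, and
 * the return code.  Returns what snprintf returns. */
static int
format_trace(char *buf, size_t len, int op, unsigned long param, int code)
{
    assert(buf != NULL);
    return snprintf(buf, len, "os_syscall(%s, 0x%lx)=%d",
                    syscall_name(op), param, code);
}
```

With these made-up opcodes, format_trace(buf, sizeof(buf), SKETCH_AFSOP_ADVISEADDR, 0x1UL, 0) leaves the string os_syscall(AFSOP_ADVISEADDR, 0x1)=0 in buf, which has the same shape as the sample output above.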
[mmeffie@sinenomine.net: avoid c99 array initialization.] Change-Id: I4f3d46d420d19abeddbf719efa04aef7e553d51f Reviewed-on: https://gerrit.openafs.org/11858 Tested-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1f29c9f05f53966df1bbd9ece479155f78f995e0 Author: Michael Meffie Date: Fri Mar 16 20:51:42 2018 -0400 autoconf: attribute type checks Check for function attributes by type and update src/afs/stds.h to conditionally include the attributes detected, instead of checking for specific compilers and compiler versions. This allows attributes to be used when building under Solaris Studio. Change-Id: I8a4dbc1b2cb6032d28176349481085bf6deb309c Reviewed-on: https://gerrit.openafs.org/12963 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 8ecc4976b83a034263b348d1b001dda378b26932 Author: Michael Meffie Date: Thu Aug 9 15:18:50 2018 -0400 opr: avoid empty nonnull argument index lists Commit 71dc077831d339fc5822f2c2c79b65afe14b12f8 changed the AFS_NONULL macro in opr.h to fix a build error on Windows by adding an empty argument index list. However, Solaris compilers do not support empty parameter lists. Specify the argument index so that nonnull function attributes can be supported on Solaris. Change-Id: I3e629868374eb6484923c253da2cdd1d8eacdb2f Reviewed-on: https://gerrit.openafs.org/13276 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f9b3cf888304d42c2a1a8472fdeeab68a7347859 Author: Michael Meffie Date: Sun Jan 14 09:38:26 2018 -0500 autoconf: check for format __attribute__ to avoid warnings Building with Solaris Studio generates a ludicrous number of warnings in the form: roken.h, line ...: warning: attribute "format" is unknown, ignored Modern Solaris Studio supports several GCC-style function attributes, including the `noreturn' attribute, but does not support the `format' attribute. Currently, configure defines HAVE___ATTRIBUTE__ when the `noreturn' attribute is available.
roken headers conditionally declare printf-like functions with the `format' function attribute when HAVE___ATTRIBUTE__ is defined, leading to the warning messages when building under Solaris Studio. Unsupported function attributes generate warnings, not errors. Fix these warnings by defining HAVE___ATTRIBUTE__ if and only if the `format' attribute is supported by the compiler, instead of checking for `noreturn'. Note that the `format' type is currently the only attribute used by roken at this time. Change-Id: I569167333d65df2583befc19befa8d719b93d75a Reviewed-on: https://gerrit.openafs.org/12956 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b818854f19e33315d1b6453b72a55b54d740e976 Author: Michael Meffie Date: Fri Mar 16 20:41:35 2018 -0400 autoconf: import gcc function attribute check macro Import Gabriele Svelto's AC_GCC_FUNC_ATTRIBUTE autoconf macro to check for GCC-style function attributes. This macro is part of the GNU Autoconf Archive[1]. The imported file is distributed under an all-permissive license. [1] https://www.gnu.org/software/autoconf-archive/ Change-Id: I64ccd00717fa9606a26aeeeea9030f4fb4877cf8 Reviewed-on: https://gerrit.openafs.org/12962 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 0da5ac4d9fb2a9b46c7415403a3cd26e711554e2 Author: Andrew Deason Date: Tue Aug 7 17:08:26 2018 -0500 afs: Return memcache allocation errors During cache initialization, we can fail to allocate our dcache entries for memcache. Currently when this happens, we just log a message and try to disable dcache access. However, this results in at least one code path that causes a panic anyway during startup, since afs_CacheTruncateDaemon will try to trim the cache, and afs_GetDownD will call afs_MemGetDSlot, and we cannot find the given dslot. To avoid this, change our cache initialization to return an error, instead of trying to continue without a functional dcache. 
This causes afs_dcacheInit to return an error in this case, and by extension afs_CacheInit and the AFSOP_CACHEINIT syscall. Also change afsd to actually detect errors from AFSOP_CACHEINIT, and to bail out when it does. Thanks to gsgatlin@ncsu.edu for reporting the relevant panic. Change-Id: Ic89ff9638201faae6c4399a2344d4da3e251d537 Reviewed-on: https://gerrit.openafs.org/13273 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 0bc5c15029cf7e720731f1415fcf9dc972d57ef4 Author: Joe Gorse Date: Mon Jul 2 20:36:04 2018 +0000 LINUX: Update to Linux struct iattr->ia_ctime to timespec64 with 4.18 With 4.18+ Linux kernels we see a transition to 64-bit time stamps by default. current_kernel_time() returns the 32-bit struct timespec. current_kernel_time64() returns the 64-bit struct timespec64. struct iattr->ia_ctime expects struct timespec64 as of 4.18+. Timestamps limited to 31 bits roll over after 2147483647, or January 19, 2038 03:14:07 UTC. This is the same approach taken by the Linux developers for converting between timespec64 and timespec. Change-Id: Icc1cf5d1a6679f5c749f8720f225a9b293f675fd Reviewed-on: https://gerrit.openafs.org/13241 Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit ee66819a0c1a9efa98b76a1c18af6233bda1e233 Author: Andrew Deason Date: Thu Jul 26 17:57:38 2018 -0500 libuafs: Stop clobbering CFLAGS Currently, in the libuafs MakefileProto for every platform, CFLAGS is set to a bunch of flags, ignoring any CFLAGS set by the 'make' command-line provided by the user. Since most of the rest of the tree honors CFLAGS, it is confusing and can cause errors when src/libuafs ignores the user-set CFLAGS. One example of this breaking the build is when building RHEL RPMs for certain sub-architectures of the current machine.
If you try to 'rpmbuild --target=i686' on 32-bit x86 RHEL 5, we will build with -march=i686 in the CFLAGS, which will be used to build most objects and is used in our configure tests. As a result, our configure tests will say that gcc atomic intrinsics are available. But when we go to build libuafs objects, we will not have -march=i686 in our CFLAGS, which causes (on RHEL 5) gcc to default to building for i386, which does not have gcc atomic intrinsics available. This causes build errors like this: libuafs.a(rx.o): In function `rx_atomic_test_and_clear_bit': [...]/BUILD/openafs-1.8.0/src/rx/rx_atomic.h:462: undefined reference to `__sync_fetch_and_and_4' To fix this, change the libuafs MakefileProtos to not set CFLAGS directly; instead, set them in a new variable UAFS_CFLAGS. Makefile.common then pulls those flags into MODULE_CFLAGS, which is used in our *_CCRULE build rules. While we are here, also move the common set of CFLAGS set by each platform's MakefileProto into Makefile.common. Now, each MakefileProto only needs to set CFLAGS that are specific to that platform, which ends up being very few (since most platforms were using the exact same set of CFLAGS). Relevant issue identified and analyzed by mbarbosa@sinenomine.net. Change-Id: I1bd21a6e7669137be3e5edee86227fd37f841d62 Reviewed-on: https://gerrit.openafs.org/13262 Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a85aab9dfe7c2ee9e025bc15d849de2dd0a48913 Author: Marcio Barbosa Date: Thu Jul 26 10:30:35 2018 -0700 redhat: actually remove unused AFS::ukernel man page Commit 278581c24a802834719e0d57f27978321556c9bb (redhat: package libuafs perl bindings) added swig as a build dependency on RHEL 6+/Fedora 15+ to build and package AFS::ukernel perl bindings for libuafs. The man page for AFS::ukernel is generated from the pod files unconditionally, so needs to be removed from the staging directories when AFS::ukernel is not packaged. 
Unfortunately, the full path to the staged AFS::ukernel manpage was not given in that commit, so the rpmbuild will fail on RHEL 5 with the error: RPM build errors: Installed (but unpackaged) file(s) found: /usr/share/man/man3/AFS::ukernel.3.gz Fix this error by specifying the full path to the AFS::ukernel man page to actually remove it when we are not packaging AFS::ukernel files. [mmeffie: updated commit message] Change-Id: If43f083a1014216e2f9a2669bf9e834149a40944 Reviewed-on: https://gerrit.openafs.org/13257 Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 9ff5f8f7601cc9761cc6a4ef0e8b7c8c2c8dddb5 Author: Andrew Deason Date: Fri Jul 27 13:36:15 2018 -0500 ubik: Save errno before logging The value of errno can change after a syscall, and ViceLog may issue syscalls (such as write()). So, make sure we save errno here before calling ViceLog(). Issue spotted by kaduk@mit.edu. Change-Id: I0f3308d64cd779bd97c97834ec2b270f5edd7ba6 Reviewed-on: https://gerrit.openafs.org/13263 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0e1c042615d1aeb919a22568cdd2b2ea42c677ba Author: Mark Vitale Date: Fri May 4 17:32:51 2018 -0400 ubik: improve logging for database synchronizations As an aid for debugging database synchronization issues, ensure that the logging is consistent and unambiguous for both the client and server sides of DISK_GetFile and DISK_SendFile. Add new error messages as required. In addition, rework the "recovery sending version to " message in urecovery_Interact. This message is misleading because the new version database is only sent to a DB server if its version is not up to date. Instead, move this message into the version check block immediately below it. Also reword it for clarity and promote its log level from 5 to 0. Finally, remove the now-superfluous "recovery stating local database" log message.
Change-Id: If8bbaa1215cab9fd24b157a0ee57759b34e77e9c Reviewed-on: https://gerrit.openafs.org/13079 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit eac22d3e46c72c0e2b82f35c5187d50b6fa136a2 Author: Mark Vitale Date: Fri Mar 17 18:12:23 2017 -0400 ubik: urecovery_AbortAll diagnostic msgs As a troubleshooting aid for developers, add a few counters and a log msg so we know when transactions are being aborted (if any) by urecovery_AbortAll. Change-Id: I528df6d51acd5d10bb2de30f43b8d4415adc7f8a Reviewed-on: https://gerrit.openafs.org/12618 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit 8b0e312d043d435f0e55c6dc14f5446ffedc7ce4 Author: Mark Vitale Date: Mon May 8 21:11:27 2017 -0400 ubik: log important messages at default log level Many important ubik messages (e.g., errors, warnings, sync state changes) are logged at log level 5 (-d 5) or higher. Many sites are reluctant to run ubik servers at a logging level higher than the default due to the large number of extremely noisy informational messages at log level 5. Therefore, many important log messages are never seen. Instead, issue critical errors, warnings, and other important messages at log level 0 so that they are always seen, even at the default logging level. In addition, disambiguate the two "I am no longer sync-site" messages by adding a unique reason text to each. Change-Id: I057edf01e2502e39c5135836f1d0081d03559270 Reviewed-on: https://gerrit.openafs.org/12617 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Michael Meffie commit 483cad0121d848836b4155817b86231ef21be27a Author: Michael Meffie Date: Fri Jul 6 15:22:36 2018 -0400 vldb_check: write mh entry header flags in network order Commit 6b93ad695e53a86dbe9eea13bd0ff651e1d8c9b7 fixed a false error reported when the vldb contained more than one mh extent blocks. 
That fix changed the readMH() function to convert the flags field to host byte order of all the mh blocks, not just the first block, in order to check the value of those flags. Unfortunately, that commit missed converting non-zero blocks back to network byte order in the complementary writeMH() function, which is used to write the data back to disk when vldb_check is run with the -fix option. FIXES 134589 Change-Id: I4cdbd57b3336e78a9eb1e543ee6d09b33f5e6153 Reviewed-on: https://gerrit.openafs.org/13245 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7523397333c0f8c6a08312434968d84b8ff56306 Author: Andrew Deason Date: Fri Jun 29 15:25:48 2018 -0500 afs: Make afs_osi_Free(NULL) a no-op In userspace, we assume that free(NULL) does nothing, which makes certain cleanup code paths simpler. This may or may not be true for our free() abstractions that can run in the kernel (like afs_osi_Free, rxi_Free, etc), which is confusing. To make the higher-level free() abstractions more consistent, change afs_osi_Free to guarantee that passing a NULL pointer does nothing. Change-Id: If7c7011795f66464eeb578eacfc943475b4d59f8 Reviewed-on: https://gerrit.openafs.org/13236 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e60766286b7a581dcdd14466884ea7fdcae10918 Author: Stephan Wiesand Date: Mon Jul 2 14:05:47 2018 +0200 redhat: parallel builds Parallel builds can be an order of magnitude faster. Add the _smp_mflags macro to all invocations of make in the rpm spec, to make use of all available cores and SMT threads on the build system. This should also help noticing new dependency issues early. Note the macro can be overridden on the rpmbuild command line. 
Change-Id: Idddf8b867500d1ee73ff51de9d8a173bb4cc8c68 Reviewed-on: https://gerrit.openafs.org/13240 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit ab61bcffefdd0a431a435def193cd9a46e3b8ab6 Author: Stephan Wiesand Date: Mon Jul 2 13:33:20 2018 +0200 redhat: speed up userland-only rpm builds When building with --define "build_modules 0", have configure skip the Linux kernel tests, which are slow and many. Change-Id: Ie318bf4939776c9a3f8594dcdd5be54b446f33dd Reviewed-on: https://gerrit.openafs.org/13239 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit babf419886d687f8359159f35e8b89aff5e166f8 Author: Stephan Wiesand Date: Mon Jul 2 13:28:07 2018 +0200 redhat: package new file include/opr/lock.h Commit 792dd44ac57032a3f2a4743c83c8a0208a08ecec added the installation of include/opr/lock.h, but the rpm spec fails to pick it up, making rpm builds fail. Add the new file to the files list for the -devel package. FIXES 134579 Change-Id: I998f48bd88308d81779dd775b322590eda75d5c8 Reviewed-on: https://gerrit.openafs.org/13238 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 89e80c354c404dedc0e5197f99710db0e5e08767 Author: Andrew Deason Date: Thu Jul 5 17:16:48 2018 -0500 LINUX: Detect NULL page during write_begin In afs_linux_write_begin, we call grab_cache_page_write_begin to get a page to use for writing data when servicing a write into AFS. Under low-memory conditions, this can return NULL if Linux cannot find a free page to use. Currently, we always try to reference the page returned, and so this causes a BUG. To avoid this, check if grab_cache_page_write_begin returns NULL, and just return -ENOMEM, like other callers of grab_cache_page_write_begin do. Linux's fault injection framework is useful for testing code paths like these. 
The following settings made it possible to somewhat-reliably exercise the relevant code path on a test RHEL7 system: # grep ^ /sys/kernel/debug/fail_page_alloc/* /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:Y /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:N /sys/kernel/debug/fail_page_alloc/interval:1 /sys/kernel/debug/fail_page_alloc/min-order:0 /sys/kernel/debug/fail_page_alloc/probability:100 /sys/kernel/debug/fail_page_alloc/space:90 /sys/kernel/debug/fail_page_alloc/task-filter:Y /sys/kernel/debug/fail_page_alloc/times:-1 [...] Change-Id: I00908658ae43aa3c8e12f2a0b956016d4441016c Reviewed-on: https://gerrit.openafs.org/13242 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b1ad473be01162fe9b3835544a835c4dcf0fcb35 Author: Mark Vitale Date: Sat Jun 30 17:35:09 2018 -0400 rxevent: prevent negative rx_connection refCount rxi_ChallengeEvent is called directly from rxi_ChallengeOn to start the first challenge; subsequent calls to rxi_ChallengeEvent are from the event handler. When called as an event, we must putConnection the reference held by the event. But when called directly for the first time, the event has not been scheduled yet and so has not taken a reference on the connection. For this case, we must not putConnection or the rx_connection refCount will go negative. One reported symptom of this bug is a fileserver crash with: 'Assertion failed! file rx.c, line 1327.' Introduced by commit 304d758983b499dc568d6ca57b6e92df24b69de8 ('Standardize rx_event usage'). Change-Id: I67122ff84ac9b1b6445ad4005e76e5f8482fd7be Reviewed-on: https://gerrit.openafs.org/13228 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 328590dc5669cae3db6c509871b612b0384ea33d Author: Jeffrey Altman Date: Sat Mar 24 01:22:54 2018 -0400 volser: DoVolDelete returning VNOVOL is success When moving, copying or releasing volumes, do not treat a failure to delete a volume because the volume no longer exists as an error. 
The volume clone has flags VTDeleteOnSalvage | VTOutOfService assigned to it which means that the fileserver won't attach the volume and volume has its deleteMe field assigned the value of DESTROY_ME. Such a volume will be deleted the next time the salvager scans the partition. Once the transaction is complete the volume might be removed. Change-Id: I0bd38906e3836e0c96f3784a8bd9ad63f5b857c6 Reviewed-on: https://gerrit.openafs.org/12976 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0322dd56b20b2e2fd6eb7f217964174fb5d25cdd Author: Andrew Deason Date: Thu Jun 28 13:08:47 2018 -0500 afs: Change afs_AllocDCache to return error codes Currently, afs_AllocDCache can fail in 2 different situations: - When we are out of dslots on the free/discard lists - When we encounter an i/o error when trying to traverse the dslot lists But afs_AllocDCache cannot distinguish between these two cases to its caller in any way, since all we have to return is a struct dcache (and so we return NULL on any error). Currently, the caller of afs_AllocDCache in afs_GetDCache is determining which of these cases happened by looking at afs_discardDCList and afs_freeDCList, to see if they look empty. This is not great for at least a couple of reasons: - We are examining afs_discardDCList/afs_freeDCList after we drop afs_xdcache (but while still holding GLOCK) - If afs_discardDCList/afs_freeDCList are somehow changed while afs_AllocDCache is running, we may infer the wrong reason why afs_AllocDCache failed. (currently impossible, but this seems fragile) And in general, this check against afs_discardDCList/afs_freeDCList is rather indirect. It may be easier to follow if afs_AllocDCache just directly returned the reason why it failed. So do that, by changing afs_AllocDCache to return an error code, and providing the struct dcache in an output argument. This involves similarly changing several called functions in the same way, to return error codes.
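The calling convention described above (an error code as the return value, with the allocated object handed back through an output argument) can be sketched as follows. This is an illustration only: struct dcache_sketch, struct slot_list and alloc_dcache_sketch are hypothetical stand-ins and do not reproduce the real afs_dcache.c logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dcache. */
struct dcache_sketch {
    int index;
};

/* Hypothetical stand-in for the free/discard dslot lists. */
struct slot_list {
    struct dcache_sketch *slots;   /* backing array of entries */
    size_t nfree;                  /* entries still available */
    int io_failed;                 /* nonzero simulates a disk i/o error */
};

/*
 * Return 0 and set *out on success.  On failure, set *out to NULL and
 * return the reason directly, so the caller does not have to infer it
 * by inspecting the lists afterwards.
 */
static int
alloc_dcache_sketch(struct slot_list *list, struct dcache_sketch **out)
{
    assert(list != NULL && out != NULL);

    *out = NULL;
    if (list->io_failed)
        return EIO;        /* could not examine the slot list */
    if (list->nfree == 0)
        return ENOSPC;     /* no free/discard slots left */

    list->nfree--;
    *out = &list->slots[list->nfree];
    return 0;
}
```

A caller written against this shape branches on the returned code (for example, retrying after ENOSPC but failing the request on EIO) instead of re-examining shared list state after the allocation has already failed.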
We only define 2 such error codes with this commit: - ENOSPC, when we are out of free/discard dslots - EIO, when we encounter a disk i/o error when trying to examine the dslot list Note that this commit should not change any real logic; we're mostly just changing how errors are returned from these various functions. Change-Id: I07cc3d7befdcc98360889f4a2ba01fdc9de50848 Reviewed-on: https://gerrit.openafs.org/13227 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4ab70de9641807bd06056f0c1ac79550453b9574 Author: Andrew Deason Date: Thu Jun 28 12:50:52 2018 -0500 afs: Make afs_AllocDCache static Nothing uses afs_AllocDCache outside of afs_dcache.c. Declare the function static, to ensure that nobody else uses it, and to maybe allow for more compiler optimization. Change-Id: I4e4d1e77e20e853fc20b3d5c5289a5f4124de7a4 Reviewed-on: https://gerrit.openafs.org/13226 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e14ba54095ea44ca2d6e6833280a201186da91f8 Author: Mark Vitale Date: Fri Mar 17 21:42:31 2017 -0400 ubik: log when a server is marked down, and why In order to better manage voting and recovery, each ubik server tracks (in array ubik_servers) which of its fellow quorum members are 'up' or not. However, ubik currently logs only when a server is "back up"; that is, ubik_server->up transitions from 0 to 1. Add new log messages to identify the time and reason when a server is "marked down" (i.e., ubik_server->up transitions from 1 to 0). Also modify two existing messages to have consistent wording with the new "marked down" messages. Also change them to ViceLog (log level 0) so they will always be logged. Change-Id: I29ee93e96cb7b28b943171d1477671c540a10d78 Reviewed-on: https://gerrit.openafs.org/12616 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0839a3326858f7d7a0042614710dcf7316bb6018 Author: Mark Vitale Date: Thu Jun 14 14:38:54 2018 -0400 afs: remove dead code afs_CheckLocks has been dead code since openafs-ibm-1_0.
No functional change incurred. Change-Id: I9d57cf3bbbddef182fb128f65b04465bfe0fb492 Reviewed-on: https://gerrit.openafs.org/13210 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b4b50118d889999042e23507df6eab6eb164b38b Author: Mark Vitale Date: Thu Jun 14 14:03:45 2018 -0400 vol: remove dead code PartitionID has been dead code since openafs-ibm-1_0. No functional change incurred. Change-Id: I93da25ef853716db7a0b7f945f8b19a15a055a43 Reviewed-on: https://gerrit.openafs.org/13209 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9d0b2698ac7ab8bb689f30d819bbef08c05a8bf7 Author: Benjamin Kaduk Date: Fri Jun 15 09:07:04 2018 -0500 Comment out missing comerr functions from afsauthent.def Apparently commit 70c4922980d1596155b4021cd72d6895c2371e23 was overzealous in making Windows match Unix, as these functions are not available in the Windows build. Change-Id: Ia24430e5069cd61c0557a07d1bd2c35a6872db8c Reviewed-on: https://gerrit.openafs.org/13219 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 907e09ff2b7e86005765a594db27e1df194ec204 Author: Benjamin Kaduk Date: Fri Jun 15 08:39:47 2018 -0500 Comment out opr_AssertionFailed from afsrpc.def Apparently the Windows utilities link opr.lib directly, so this caused a "multiply defined symbol" error. Change-Id: I0499f789a493960b99052e00763703698b3f9517 Reviewed-on: https://gerrit.openafs.org/13216 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 94f1c1e2a7125e93ed49de31522be806af28626b Author: Benjamin Kaduk Date: Fri Jun 15 08:16:26 2018 -0500 Comment out (again!) xdr_Capabilities from afsrpc.def This shows up as an "unresolved external" when linking (though apparently this error does not cause a buildbot failure), noticed when viewing a related windows build log. 
Change-Id: I8bd5e344c1b0e12e0c70e0340bacbc6a94984767 Reviewed-on: https://gerrit.openafs.org/13215 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 472d6b1ee2f7de415e0fa0f8be0636f86956b6fc Author: Michael Meffie Date: Thu Jun 14 15:01:18 2018 -0400 ubik: do not assign variables in logging argument lists Several logging statements in ubik contain an assignment statement within the logging function call argument list, which would set a variable as side effect of evaluating the function call arguments. These embedded assignments are problematic since the logging function calls have been replaced by ViceLog macros, which avoid the overhead of a function call depending on logging levels. Remove the embedded assignments within the logging argument lists so the variables are always set regardless of the logging level. Change-Id: Ifc0f32df2d01f9d8105b49e2c56a95758b184449 Reviewed-on: https://gerrit.openafs.org/13211 Tested-by: BuildBot Reviewed-by: Joe Gorse Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit e08b9c8d36da3f37efabfb3f94476108a5985d23 Author: Benjamin Kaduk Date: Thu Jun 14 20:37:46 2018 -0500 Remove the unused opr_AssertFailU() function Change-Id: Idb55adeea508d3376269bce998eb8b1c3e4cbd59 Reviewed-on: https://gerrit.openafs.org/13213 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 691757576fb6d60a34fef2c4bc50ae581b65ad76 Author: Benjamin Kaduk Date: Thu Jun 14 20:35:46 2018 -0500 Un-export opr_AssertFailU It appears to have been created for parity with osi_AssertFailU, but was then never used. It is safe to remove the export line, since this export has never been in a released version of OpenAFS. 
Change-Id: Ia0bdaec891450fe9a3ca10badcaba68bea27c466 Reviewed-on: https://gerrit.openafs.org/13212 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 14da55719c9e9bff1f8e7e02c8a8d47c59fb7b4a Author: Pat Riehecky Date: Wed Jun 6 11:10:25 2018 -0500 mcas: Make sure 'padding' is null-terminated With 'padding' explicitly filled with all spaces string copy operations may result in unexpected values. Padding is extended by 1 and null terminated to avoid unexpected behavior. (via cppcheck) Change-Id: I8a9845ae87002018705ad23c2b089c8ef571b7bc Reviewed-on: https://gerrit.openafs.org/13164 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6e7db633efad1c88bb300089e3bd4c9feaea5f23 Author: Benjamin Kaduk Date: Thu May 31 19:02:18 2018 -0500 libafsrpc: export more xdr functions Most of the xdr functions in the library text are to support RXAFS and RXAFSCB RPCs, which we explicitly do not expose from libafsrpc. As such, they do not need to be in the export list, but a couple of generic ones probably should be exported. Do so, for both Unix and Windows. Change-Id: I12ddf2427d807f4ee7b07af1e1c498fc119a0f1c Reviewed-on: https://gerrit.openafs.org/13139 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0b1edd96ac7a952148ec14f8baaf60c8d8bbc04f Author: Benjamin Kaduk Date: Thu May 31 19:00:03 2018 -0500 libafsrpc: export some more rx functions Change-Id: I6aea7eff7a5bc957896a5a7457a945dd0feaec88 Reviewed-on: https://gerrit.openafs.org/13138 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f01ee714152a0a6247f2f456aa1f0a728d74373c Author: Benjamin Kaduk Date: Thu May 31 18:40:21 2018 -0500 Export missing opr functions from libafsrpc Our assertion macros expand to function calls, and we have assertions included in macros in installed headers, so the public needs to be able to link against them. Export for both Unix and Windows. 
Change-Id: Ibd1da844f274398e9296f00241b1be48bb95e4fe Reviewed-on: https://gerrit.openafs.org/13137 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c12cfd7331727142cb928e08ec32a708d0cfd1e9 Author: Benjamin Kaduk Date: Sun May 27 22:54:01 2018 -0500 libafsauthent: export additional xdr_ functions Formally, we need to use xdr_free to deallocate storage for RPC output variables, in case the XDR stack uses a different allocator than the standard application allocator. Some types have non-autogenerated wrappers exposed already (e.g., token_FreeSet()), but for a handful of the base ptint types we need to expose the xdr routines in order for a safe way to deallocate their storage to be available. Change-Id: Iaac349cfaa1a07d5908a88e4c230874c6301471a Reviewed-on: https://gerrit.openafs.org/13131 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 12f4fd2901fee8bf27c2cec97efd3d242c6ff025 Author: Andrew Deason Date: Thu Apr 26 12:27:12 2018 -0500 afs: Stop looking for dcaches on Get*DSlot errors In various places in the code, we'll be looking for a dslot, calling afs_GetValidDSlot (or afs_GetUnusedDSlot) in a loop. In a few places, we currently keep looking for the dslot when we get an error back, since afs_GetValidDSlot may return successfully for other slots, and we might find the dslot we're looking for. This behavior was introduced in a few commits, including: - commit 2679af76 (afs: Traverse discard/free dslot list if errors) - commit 00fd34a6 (afs: Handle easy GetValidDSlot errors) - commit 9a558660 (afs: Cope with afs_GetValidDSlot errors) This behavior means that if afs_GetValidDSlot/afs_GetUnusedDSlot returns an error for a particular dcache slot, but other slots are okay, then we may still find the dcache we're looking for. 
However, by far the most common reason that afs_GetValidDSlot/afs_GetUnusedDSlot fails is because our disk cache is completely unusable; it is very rare that only a few slots cannot be used, but others are fine (this would mean that the disk cache was corrupted in oddly specific ways, or there are small isolated errors in the underlying disk). So continuing the dcache search in these situations is not very useful. On Linux, this is most commonly seen by the underlying disk cache i/o calls returning -EINTR, which can happen if a SIGKILL signal is pending for the current process when we try to do the i/o. In this situation, all attempts to read in a dslot from disk will fail; trying other slots or waiting will not improve the situation. Depending on which specific code path encounters an afs_Get*DSlot error, we can then flood the log with "disk cache read error in CacheItems" messages emitted from afs_UFSGetDSlot, since we keep calling afs_Get*DSlot in our loop. The worst offender of this is usually afs_GetDSlotFromList via afs_AllocDCache, since we end up calling afs_GetUnusedDSlot for every single dslot in the free and discard lists. However, our other call sites that are looking for dcaches for a specific file can still generate quite a few of these messages, since we'll end up calling afs_GetValidDSlot for every slot in a dcache hash chain. So to avoid flooding the log in these situations, change most callers of afs_GetValidDSlot and afs_GetUnusedDSlot to stop on the first error, and act like we never found a dcache that we were looking for. This commit also adjusts one caller in afs_ProcessOpCreate, which was not handling errors from afs_GetValidDSlot at all, and changes FlushVolumeData to be able to return error codes. 
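The "stop on the first error" behavior described above can be sketched as follows. This is a minimal illustration, not the OpenAFS code: sketch_find_slot, sketch_get_slot_fn and sketch_broken_reader are hypothetical names, and the chain is modeled as a plain array of slot indices rather than a dcache hash chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot reader: returns 0 on success, nonzero on an i/o error.
 * On success it stores the id of the file that owns the slot in *owner. */
typedef int (*sketch_get_slot_fn)(int slot, int *owner);

/*
 * Walk a chain of slot indices looking for the slot owned by `wanted`.
 * Give up on the first read error instead of trying the remaining slots.
 * Returns the slot index, or -1 when nothing was found or a read failed.
 * *reads is set to the number of read attempts that were made.
 */
int
sketch_find_slot(const int *chain, size_t n, int wanted,
                 sketch_get_slot_fn get, int *reads)
{
    size_t i;

    assert(chain != NULL && get != NULL && reads != NULL);
    *reads = 0;
    for (i = 0; i < n; i++) {
        int owner;

        (*reads)++;
        if (get(chain[i], &owner) != 0)
            return -1;          /* stop here: act as if nothing was found */
        if (owner == wanted)
            return chain[i];
    }
    return -1;
}

/* Example reader for an unusable backing store: every read fails. */
int
sketch_broken_reader(int slot, int *owner)
{
    (void)slot;
    (void)owner;
    return 5;                   /* arbitrary nonzero error code */
}
```

With a reader that always fails, the loop performs one read and one failure report instead of one per slot in the chain, which is the log-flooding case the message describes.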
Change-Id: I3047da690d39c000ef59dfc0ad526ecc5e382104 Reviewed-on: https://gerrit.openafs.org/13034 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit bec329c1c81d96b5933527f7cdb3638f24833087 Author: Andrew Deason Date: Thu Apr 26 12:01:57 2018 -0500 afs: Avoid GetDCache delays on screwy cache Currently, if our afs_AllocDCache call fails in afs_GetDCache, we retry once per second for 5 minutes. The reasoning is that we're out of dcache slots, and so if we wait a little while, maybe something will become freeable and we can continue. However, afs_AllocDCache can also fail if we have plenty of free dslots, but we are unable to successfully call afs_GetUnusedDSlot() on any of them. This can happen if our disk cache is screwed up, and so waiting and retrying will not make things better (but we'll spew a ton of "disk cache read error in CacheItems slot" errors in the log each time, and do so 300 times). So instead, only do our sleep/retry loop if we actually appear to be out of free or discarded dslots. Otherwise, just return an error immediately, since sleeping and retrying will not make anything better. Change-Id: I331913ab882216e3f71cc44da91f7f7d33c34004 Reviewed-on: https://gerrit.openafs.org/13033 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0ff2364bd5e68c0a7587f8fbc552bf20b99d7039 Author: Andrew Deason Date: Thu Apr 26 12:02:18 2018 -0500 afs: Avoid GetDCache panic on AllocDCache failure Currently, in afs_GetDCache, if afs_AllocDCache fails, we retry for 5 minutes and then panic. Panicking in this situation is completely unnecessary; afs_GetDCache can fail for a variety of other mundane reasons (such as, if we can't fetch the requested data from the relevant fileserver). It may seem unusual for afs_AllocDCache to fail for over 5 minutes (this is supposed to mean that we're out of dslots, and our attempts to free up dslots have failed).
However, afs_AllocDCache can also fail if we are having issues in accessing the disk cache, and so we may not be out of cache space or dslots at all; we just can't access the cache. In this case, afs_AllocDCache can easily fail forever; waiting longer or trying to free up cache space isn't going to help. So, to avoid panicking in such situations, just make afs_GetDCache return an error. We just need to make sure afs_xdcache is unlocked, and then we can just jump to 'done', like plenty of other codepaths do; no extra cleanup is required. Also since we are removing a panic, add a log message when this situation happens, so EIO errors don't suddenly pop up silently. Change-Id: I9b8dd6c861b8066822c44758566c05abd7dc1660 Reviewed-on: https://gerrit.openafs.org/13032 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot commit 35b6d2a6d5a1ca13544a217a35688e9a0f6b6ec6 Author: Andrew Deason Date: Wed Feb 28 18:25:46 2018 -0600 rxgk: Define some protocol constants rxgk_int.xg is missing a few constants mentioned in the respective protocol specs: - The RPC-L definitions for PrAuthName are defined, but no PRAUTHTYPE_* constants for the 'kind' field are defined. Define at least PRAUTHTYPE_GSS, which rxgk uses. - The rxgk spec indicates a size of 20 for the nonces used in rxgk challenge and response packets. Define a constant (RXGK_CHALLENGE_NONCE_LEN) for this value, to make it easier to define similarly-sized structures. - The rxgk-afs spec defines the time value of 0 as a special "never expires" value. Define a constant (RXGK_NEVERDATE) to represent it.
Change-Id: I07e1a1b19d1c887fd3e1a1d0f270d5af7b8581b0 Reviewed-on: https://gerrit.openafs.org/12939 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 27d7b8fe4603c39362983758fe6a749fa5ffa4e5 Author: Mark Vitale Date: Fri May 4 15:42:14 2018 -0400 ubik: make ContactQuorum_* routines static Most of the ContactQuorum_* routines are only used in ubik.c, so make them all static - except for ContactQuorum_DISK_SetVersion, which is called from disk.c. Change-Id: I7d1ccd839e01ea8ee8d768dd369a892773361b05 Reviewed-on: https://gerrit.openafs.org/13078 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 8b1e730c11a6ed7dc067ef185302bd57a69f6d1e Author: Mark Vitale Date: Wed May 9 16:50:55 2018 -0400 ubik: remove unused ContactQuorum_DISK_Write This function is not used; remove it. No functional change is incurred by this commit. Change-Id: I7e3bb26fb62b0e28c8703154eb3df384d4dbc32d Reviewed-on: https://gerrit.openafs.org/13077 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b9fe4d4290ad19faf3b5fb5dc0c3b1ee3ee5ab69 Author: Mark Vitale Date: Mon May 8 17:50:00 2017 -0400 ubik: disambiguate "Synchonize database with server" msgs Ubik issues the same message in two very different cases: - sync server issues DISK_GetFile to obtain the latest version - non-sync server receives DISK_SendFile from the sync server Modify the messages so they provide more information and are distinguishable from each other. 
Change-Id: I99e8adc7229260f478a0df15791216e090d2e113 Reviewed-on: https://gerrit.openafs.org/12615 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit fdc8adbf0904cbbc0590379c5cb702a15273b40c Author: Mark Vitale Date: Tue Jun 5 14:12:20 2018 -0400 xdr: remove dead code, whitespace from xdr_enum The 'enum sizecheck' declaration has been unused since openafs-ibm-1_0; it is apparently vestigial from the original XDR code. Remove it, along with some extraneous whitespace. No functional change is incurred by this commit. Change-Id: I9f725ab6aff6cafa911975e9edaed8f07c8a328a Reviewed-on: https://gerrit.openafs.org/13076 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit eb1d2ef203a2a99c908b3b89d9ea8337a91b944b Author: Mark Vitale Date: Wed Jun 6 15:23:26 2018 -0400 xdr: avoid xdr_enum memory overrun Since openafs-ibm-1_0, xdr_enum has used xdr_long to read and write, even though enum_t is defined as int. For systems where sizeof(int) == sizeof(long), this works by accident. But other systems (e.g., DARWIN ARCHFLAGS=x86_64) xdr_enum will overrun its int-sized second parameter. For XDR_DECODE, this results in memory corruption. This was first noticed with OpenAFS 1.8.0 on macOS 10.13; if aklog is issued while already holding a token, it will fail in token_SetsEquivalent with a segfault in decodeToken. The root cause is that the address passed to decodeToken had been overwritten by a previous call to tokenType -> xdr_enum -> xdr_long. Instead, modify xdr_enum to use xdr_int for its work. 
Change-Id: I671d55588d88e0640f365624b83bd04b53dc97cc Reviewed-on: https://gerrit.openafs.org/13075 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit ef6a1e8118a25b885889179739a3539a598068bc Author: Benjamin Kaduk Date: Sun May 27 16:23:16 2018 -0500 libafsauthent: export ugen_ClientInit* Windows was only exporting the bare version and not the Cell/Flags/Server versions; Unix was exporting none of them. These routines for obtaining a ubik client are more generic than the historical (and already exported) ubik_ClientInit routine, allowing for the use of an alternative configuration directory, additional flags, and the like. Change-Id: I6577ef5f95d2b801c049befa9fddd3b605ff80f5 Reviewed-on: https://gerrit.openafs.org/13130 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 1974eac772157651594c1b76ea8f55e4567b3ec5 Author: Benjamin Kaduk Date: Sun May 27 16:03:12 2018 -0500 libafsauthent: Export more token-manipulation functions For both Windows and Unix. Change-Id: Icd90a2fd3f674b13dd44323d9bc20a8f1070a16e Reviewed-on: https://gerrit.openafs.org/13129 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 4008f83ca80c5ed7b612a13f760b4bb8b9866f2b Author: Benjamin Kaduk Date: Sun May 27 15:18:12 2018 -0500 libafsauthent: export ktc token 'Ex' routines for Unix We need these to handle the modern identity structures (they are already exported on Windows). Change-Id: I3a3f766e9c9a9fad96f2656c4f066a67cacee4a6 Reviewed-on: https://gerrit.openafs.org/13128 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit cdd1f16f5ef52093a8f7d3f87a45775d3c87b780 Author: Benjamin Kaduk Date: Sun May 27 14:18:07 2018 -0500 libafsauthent: export more afsconf_ functions We have new functions for (among other things) typed keys, and generic rx identity management; expose them as well as the legacy key- and user- management functions, on both Unix and Windows. 
Change-Id: Id9bc394d631f9c00915520aff763af497ef2035b Reviewed-on: https://gerrit.openafs.org/13127 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit bcce41bd99b4361631b64cf4749d1dcf80df1cd7 Author: Benjamin Kaduk Date: Sun May 27 13:11:05 2018 -0500 Synchronize libafsauthent afsconf_ exports with windows The Windows library was exporting several more afsconf_* symbols than the Unix one; bring them into sync. Change-Id: Ifba074124a0a3cfeed256553d7dbedbebd3c2996 Reviewed-on: https://gerrit.openafs.org/13126 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 1dc9bb4e7362029db073250f23a09f949e1655de Author: Mark Vitale Date: Fri May 25 17:05:28 2018 -0400 afs: fix broken volume callbacks (e.g. vos release) Commit e99bfcfaa3bca3e65f03928718c2c9eb5eff7c8c ('afs: use jenkins hash for dcache, vcache tables') introduced new hashing implementations for the dcache and vcache hash tables. Unfortunately, a typo introduced a bug into the VCHashV hash function; instead of hashing by volume id, it currently hashes by vnode. The most common symptom is that volume callbacks (RXAFSCB_Callback with fid :0:0) fail to find and invalidate all the files for the specified volume. This typically manifests as persistent stale RO content after a 'vos release' for new RW content. This bug only affects the Unix cache manager; the Windows cache manager implementation of RXAFSCB_Callback was unaffected. Change-Id: I7edca660671b880a69f0c499d54adffbbe62d2b2 Reviewed-on: https://gerrit.openafs.org/13090 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e71985bce593e9dba43443e084eb726fcc5259e3 Author: Pat Riehecky Date: Fri May 25 12:03:35 2018 -0500 Remove pointless assignments scan-build identified these variable assignments as being unused or redundant.
Change-Id: I3b51e3e1503c0724a2cf1bab37e1c02f4ae533b2 Reviewed-on: https://gerrit.openafs.org/13086 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 9670937d5f12f1edc7bdcb588133f53ec1af2d6f Author: Pat Riehecky Date: Fri May 25 12:48:15 2018 -0500 Convert extended character set to unicode Change-Id: I9989f16ac670e007827ecfe8e02daf9b36d98d4e Reviewed-on: https://gerrit.openafs.org/13088 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2b08d687b992f238fa59773ef2ff1710c520f861 Author: Pat Riehecky Date: Fri May 25 12:11:54 2018 -0500 Add missing va_end Per man va_start: Each invocation of va_start() must be matched by a corresponding invocation of va_end() in the same function. Change-Id: I703bb3e633435f9c9a62717333a6027476b6bab8 Reviewed-on: https://gerrit.openafs.org/13087 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a411366f57dcf39cc17b6d61d8332e520dff57d1 Author: Pat Riehecky Date: Wed May 23 15:50:45 2018 -0500 Add braces to empty conditional blocks GCC 7+ is able to quickly optimize away empty if/else blocks if the braces are provided. While this adds some additional syntax, it should also result in faster optimization, so change our empty blocks after conditionals to use braces. FIXES 134377 Change-Id: I2b5e39fd8a3819e07077c2a4f28a9aa5ac432e1e Reviewed-on: https://gerrit.openafs.org/13081 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 759f29cfdfabed4dc5c1b96a0b2b79a3f83c08e3 Author: Michael Meffie Date: Mon Apr 25 11:19:10 2016 -0400 Windows: define AFS_IHANDLE_PIO_ENV for ihandle pio Support for positional i/o in the ihandle package was added to the windows platform in commit 50b6a116a1c412d0e6d7442d13d6e92c9dbb35ee using native windows functions. That commit also defined HAVE_PIO in the windows version of the afsconfig.h file. Unfortunately, that definition of HAVE_PIO is not limited to the ihandle package. 
Remove the project-wide HAVE_PIO definition from the windows afsconfig.h file and define the new AFS_IHANDLE_PIO_ENV symbol when positional i/o support is available in the ihandle package. Build the fallback ih_pread and ih_pwrite functions (which use lseek) only when positional i/o is not available in the ihandle package for the current platform. Use AFS_IHANDLE_PIO_ENV instead of HAVE_PIO in ih_open() to determine when it is safe to share ihandles among threads. Change-Id: I39b078177bc5a2f1daf8a8f8e6bfb1c76e6dfaf7 Reviewed-on: https://gerrit.openafs.org/12270 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 343234d221ae8388f55748f5c494a42d5d69bfa0 Author: Michael Meffie Date: Mon Apr 25 11:06:11 2016 -0400 ubik: convert ubik_print to ViceLog Use the server logging macros instead of the utility functions to avoid function call overhead, especially at logging level 25. The server logging macros perform a logging level check in-line to avoid the unnecessary ubik_dprint* calls. Change-Id: Ia86efad6257b764f0922957017fe8326f0de76d3 Reviewed-on: https://gerrit.openafs.org/12619 Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 8225518cd08b810bf3d8c74e27e3d3a753b6b30b Author: Mark Vitale Date: Tue Apr 24 14:41:11 2018 -0400 ptserver: improve PR_GetHostCPS logging The IP address of the host is logged as a signed number. Instead, log it as the unsigned (and hex) representation of the host IP addr.
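The difference between the signed and the unsigned/hex rendering can be sketched as below. The function name sketch_format_host and the exact output layout are assumptions made for illustration; the commit message does not state the format string that ptserver uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical formatter: render a 32-bit host address, held in a signed
 * integer, as an unsigned decimal followed by its hex form.
 * Returns the value returned by snprintf.
 */
int
sketch_format_host(int32_t host, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu32 " (0x%08" PRIx32 ")",
                    (uint32_t)host, (uint32_t)host);
}
```

An address with the top bit set, such as 0xc0a80001, prints as a negative number under a signed conversion; converting to an unsigned type first keeps the decimal and hex forms consistent with each other.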
Change-Id: Ic8b2b7da852a3dc7e9984b63da70d0403845452e Reviewed-on: https://gerrit.openafs.org/13043 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 849ddd4fde0759e385cf3ed4054fc11c36a62fc3 Author: Benjamin Kaduk Date: Sat May 5 15:59:08 2018 -0500 Export afs_getDirPath from shared libraries Add this function to the export list for libafsauthent on Windows and Unix. Change-Id: Ib6f219e407b75a6052d6e29008977c8545b2aa36 Reviewed-on: https://gerrit.openafs.org/13059 Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 554c38473d1465af4c4613209229c274807fffd8 Author: Benjamin Kaduk Date: Sat May 5 15:42:51 2018 -0500 Rename getDirPath to afs_getDirPath in preparation for export The symbol name getDirPath is rather generic and we probably shouldn't squat on it in the application's namespace. In preparation for exporting this functionality from the Unix shared libraries, rename it to afs_getDirPath. Retain a Windows-only wrapper getDirPath that can continue to be exported from libafsauthent on Windows, for ABI compatibility. New consumers should use afs_getDirPath. Change-Id: Ie3f3f7b0662451353834d2e3b5c3dd1131c1935e Reviewed-on: https://gerrit.openafs.org/13058 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit b48fe6b57f13bacb368e27389ccd3f9c279822da Author: Benjamin Kaduk Date: Sat May 5 15:35:03 2018 -0500 Remove duplicates from liboafs_util.la.sym Remove the extra copy of things which appeared twice. 
Change-Id: I95542172f28759852a76589d05845869cf7e9c9a Reviewed-on: https://gerrit.openafs.org/13057 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3be1de0e823db7068e27b9c5c30a91673f058e52 Author: Benjamin Kaduk Date: Sat May 5 14:42:31 2018 -0500 Export ubik_PR_ symbols from libafsauthent Also export from liboafs_prot the ones missing from this set. This brings the unix exports in sync with the Windows exports (of ubik_PR_ symbols), and is tested as being sufficient to compile python-afs. Change-Id: I77941aa7fbbcb154c67769fe875474920d86d756 Reviewed-on: https://gerrit.openafs.org/13056 Tested-by: BuildBot Tested-by: Anders Kaseorg Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 70c4922980d1596155b4021cd72d6895c2371e23 Author: Benjamin Kaduk Date: Sat May 5 14:00:27 2018 -0500 Export comerr initialization functions from libafsauthent Add to the libafsauthent export symbol list these comerr initialization functions so that they are usable by consumers. Change-Id: I72c6f9402a46aff6fa2719c0b9e0974c7ff7b57e Reviewed-on: https://gerrit.openafs.org/13055 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 792dd44ac57032a3f2a4743c83c8a0208a08ecec Author: Benjamin Kaduk Date: Sat May 5 13:11:00 2018 -0500 opr: install afs/opr.h and opr/lock.h These headers are (transitively) referenced from rx_pthread.h, which is pulled in from rx.h when AFS_PTHREAD_ENV is defined. As such, we are presenting an incomplete public API without this header. 
Change-Id: I8afd1d635534910739ec37d56201a86998962cfa Reviewed-on: https://gerrit.openafs.org/13054 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 845c8927ef20e245bb88bc783dc2e581b61fbaba Author: Mark Vitale Date: Fri May 19 16:34:21 2017 -0400 ubik: remove redundant memset from udisk_write When udisk_write is extending the database, DRead will return a null buffer. udisk_write then calls DNew to get a brand new buffer for the extension write, and clears it with memset. However, this is redundant, since DNew has already cleared the new buffer. Remove the redundant memset. No functional change should be incurred by this commit. Change-Id: Ia6768098fb3c67475c8948c874b92b91bf17cdb7 Reviewed-on: https://gerrit.openafs.org/12621 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Tested-by: BuildBot commit e4c7321560acf0bd34eeee7d46269818d82fdb44 Author: Mark Vitale Date: Wed May 17 16:32:20 2017 -0400 ubik: death to orphaned signals ubik has a few very old "orphaned" LWP events that are signalled via LWP_NoYieldSignal, but have no matching waits (LWP_WaitProcess). Each "signal" runs the LWP waiting element list for each LWP on the blocked queue; this may add up to substantial wasted overhead on a heavily loaded ubik server. Remove the orphaned signals. No functional difference should be incurred by this commit. Change-Id: I66eba45975a829216e7af1927e51ec6aab63f570 Reviewed-on: https://gerrit.openafs.org/12620 Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Mark Vitale Tested-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 55013a111394052a0253c87a744d03dfabd1be75 Author: Pat Riehecky Date: Wed May 23 15:42:09 2018 -0500 lwp: Fix possible memory leak from scan-build It is possible for LWP_CreateProcess to return early. 
When it does, it should free up any memory it allocated before leaving scope. Change-Id: Ib5644d36dc01bbac33804f4a039661ce2c78969d Reviewed-on: https://gerrit.openafs.org/13080 Reviewed-by: Andrew Deason Reviewed-by: Marcio Brito Barbosa Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 850c7c50dccbdebb8e0a44da4fc7840760d9e02d Author: Michael Meffie Date: Fri Apr 27 23:08:34 2018 -0400 util: check for trailing characters in partition names The function which maps partition names to partition ids currently ignores trailing characters in the partition names. For example, the partition name "/vicepbogus" is currently considered a valid partition name ("/vicepbogus" maps to "bo" which is id 66). Although this is not a regression, it is problematic for several reasons. Firstly, this can lead to duplicate partition ids on the server, for example "/vicepbad" and "/vicepbar" both map to the same partition id ("ba" is id 52). Second, partitions are internally tracked by numeric id. The partition names are generated from numeric ids when reporting partition names. This means the trailing characters are lost when reporting the partition names. For example, vos reports the attached partition "/vicepbad" as "/vicepba". Third, it could be possible (but perhaps unlikely) in the future to extend the range of partition ids, so the trailing characters could become significant at that time. Finally, it could be confusing to admins that such partition names are attached by the fileserver. For example, "/vicepaa-backup" is attached and is used by the fileserver as partition id 26. This change adds a check for trailing characters in partition names in the volutil_GetPartitionID function, so it is more strict in what it accepts as a valid partition name. That function will now return -1 (illegal partition name) when trailing characters are found in partition names. 
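The stricter name check can be sketched as follows, using the mapping implied by the examples above ("aa" is 26, "ba" is 52, "bo" is 66). This is an illustration only: sketch_partition_id is a hypothetical helper that accepts just the "/vicepX" and "/vicepXY" forms, and the upper bound SKETCH_MAX_PART_ID is an assumed value that the commit message does not state.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_PART_ID 255  /* assumed upper bound, for illustration */

/*
 * Hypothetical helper: map "/vicepX" or "/vicepXY" to a numeric id.
 * Returns -1 for anything else, including names with trailing characters.
 * Assumes a character set where 'a'..'z' are contiguous (e.g. ASCII).
 */
int
sketch_partition_id(const char *name)
{
    static const char prefix[] = "/vicep";
    const char *p;
    int id;

    assert(name != NULL);
    if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p = name + sizeof(prefix) - 1;

    if (p[0] < 'a' || p[0] > 'z')
        return -1;
    if (p[1] == '\0')
        return p[0] - 'a';              /* "a".."z" map to 0..25 */
    if (p[1] < 'a' || p[1] > 'z')
        return -1;
    if (p[2] != '\0')
        return -1;                      /* trailing characters: reject */

    id = (p[0] - 'a' + 1) * 26 + (p[1] - 'a');  /* "aa" maps to 26 */
    return id > SKETCH_MAX_PART_ID ? -1 : id;
}
```

Under this sketch "/vicepbad" and "/vicepbar" are both rejected rather than collapsing to the same id, and "/vicepaa-backup" no longer resolves to partition 26.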
Change-Id: Iad9aee05fcf439cac9afcd89cf367be693261fbd Reviewed-on: https://gerrit.openafs.org/13039 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Andrew Deason commit c0f2c26e9298d12209fbb5e523ea3173202316e5 Author: Michael Meffie Date: Fri Apr 27 22:59:57 2018 -0400 vol: check for bad partition names Currently, servers attempt to attach any partition name starting with "/vicep", even partition names which map to out of range partition ids. Examples of such misnamed partitions are "/vicepzz", "/vicep0", and others. The presence of these misnamed partitions cause the server processes to crash on startup, since the out of range partition ids are used as an index. Add a check for the bad partition names in VCheckPartitions to avoid attaching them. Log a warning for such partitions to let the admins know why the partitions are not attached. Change-Id: I553ce6cc8bc751b9ed789312f7efb4e0f737a52e Reviewed-on: https://gerrit.openafs.org/13038 Reviewed-by: Benjamin Kaduk Reviewed-by: Marcio Brito Barbosa Reviewed-by: Andrew Deason Reviewed-by: Mark Vitale Tested-by: Benjamin Kaduk commit f1d389e80367c7ea532441f9aa27a6cc3e2853a7 Author: Andrew Deason Date: Thu May 10 16:23:48 2018 -0500 ubik: Make udisk_Log* functions static Nothing uses the udisk_Log* functions outside of disk.c. Declare these static to make sure they stay that way, to make it easier to change their semantics. Change-Id: I068684782b22af788ce892c995a6d80f2d9fb2e0 Reviewed-on: https://gerrit.openafs.org/13069 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b8617f08d1bf57a6b3fbba44e5b4de24dc84a9bb Author: Andrew Deason Date: Thu May 10 16:05:10 2018 -0500 ubik: Remove 'mtime' from ubik_stat Nothing uses the 'mtime' field from ubik_stat. Remove it. 
Change-Id: I7611a7ca5aa5743be43aefafeda5ecf9a5d47598 Reviewed-on: https://gerrit.openafs.org/13068 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f045de21a45fcc8f71e2b30e826c22c8a7b4d0f2 Author: Jeffrey Altman Date: Fri May 11 15:44:24 2018 -0400 viced: SRXAFS_InlineBulkStatus set InterfaceVersion on error AFSFetchStatus.InterfaceVersion is required to be "1" for any of the fields in the structure to be considered valid. Therefore, InterfaceVersion must be set to one when returning an 'errorCode' value. When RXAFS_InlineBulkStatus was introduced by OpenAFS in 362d26c733b086d26f013bd229af979a112098f5 not only wasn't InterfaceVersion set but neither was the memory allocated to OutStats initialized. As a result the InterfaceVersion field value could be not only zero but random. The OutStats memory was initialized to zeros beginning with 726e1e13ff93e2cc1ac21964dc8d906869e64406. Change-Id: I5ca1b08cb32d01843a1c6dee87d8ba1d560396c8 Reviewed-on: https://gerrit.openafs.org/13067 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3cc22a442e1dad628f0b11a32c4037fc7174dde4 Author: Marcio Barbosa Date: Tue May 15 17:10:45 2018 -0400 ubik: clones should not request votes Clones should not be able to become the sync-site. To make it possible, regular sites do not vote for a site tagged as clone. In other words, the clones ask for votes but they cannot be the sync-site. Knowing that their requests for votes should be refused by the regular sites, they should never have enough votes to win the election. In addition to the unnecessary network traffic created by these unnecessary requests, this current approach can be problematic in some specific situations. As an example, consider the following scenario: The user wants to turn a regular site, called host1, into a clone. To do so, he runs the following commands on every single server: $ bos removehost -server -host host1 $ bos addhost -server -host host1 -clone After that, he restarts the servers, one by one. 
Depending on the delay between the restarts, a clone can become the sync-site. This is possible because the clones request votes from the other sites. If enough regular sites are not aware (yet) that the request for vote came from a clone, the clone in question can get enough votes to win the election. To fix the problems mentioned above, do not request votes if you cannot be the sync-site. Change-Id: Ic3569af8264dfff32f2a86b8dd99b922193f010a Reviewed-on: https://gerrit.openafs.org/12654 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8e740aed774d4507e656e6ae743f6c6fe6c0e356 Author: Marcio Barbosa Date: Thu May 10 00:46:01 2018 -0300 afs: alloc openafs_lck_grp before osi_Init() on darwin Commit a27bed59cae1a4244429c752edfde0a8363c8a3b moved init_hckernel_init to osi_Init. On Darwin (AFS_DARWIN80_ENV), MUTEX_INIT (called by init_hckernel_init) uses openafs_lck_grp as the argument of one of the functions called during the initialization of the mutex in question. Since openafs_lck_grp was not allocated yet, we crash. To fix this problem, call MUTEX_SETUP() before osi_Init() on Darwin. Change-Id: Ib53118208d3ca7982e712768f334299e3d948805 Reviewed-on: https://gerrit.openafs.org/13065 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c16423ec4e678e5cb01dc99f4115065f8ef6caf7 Author: Marcio Barbosa Date: Mon May 14 16:46:26 2018 -0300 rx: fix atomics on darwin As described by commit b2a21422129ca1eeeb5ea1a1f7b08b537fd2a9f7, the API used for atomic operations in kernel space is not the same as the one used in user space. To fix this problem, the commit mentioned above introduced macros to correct the name of these functions in kernel space. Unfortunately, the return value of the functions used in kernel space is not the same as the ones used in user space. Generally speaking, the kernel space atomic functions return the original value of the variable received as an argument before the operation in question. 
On the other hand, the user space atomic functions return the new value, after the operation has been performed. To fix this problem, this commit provides a new set of inline functions (only used in kernel space) with the expected return values. Also, in order to get the inline implementations of the OSAtomic interfaces in terms of the primitives, commit 74f837fd943ddfa20d349a83d6286a0183cb4663 defines OSATOMIC_USE_INLINED on OS X 10.12. However, the definition of this macro only affects the user space legacy interfaces for atomic operations. The kernel space interfaces for atomics are not deprecated and OSATOMIC_USE_INLINED does not affect these functions. To fix this problem, only define OSATOMIC_USE_INLINED in user space (OS X 10.12+). Change-Id: Ia6cbc76daa7068625dc9f6dff385d0568d6503bd Reviewed-on: https://gerrit.openafs.org/13063 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 96a4bee20d42484148d163b85ca049dcc980a7a5 Author: Andrew Deason Date: Tue May 8 19:09:42 2018 -0500 LINUX: Remove unused osi_fetchstore.c Ever since commit ae5f411c (Linux 4.4: Do not use splice()), most of osi_fetchstore.c has been '#if 0'd out. The only portion that isn't is a function definition that is unreferenced (afs_linux_read_actor). Remove the unused code, and other '#if 0' references to it; the code can always be added back later when we can actually use it. Change-Id: Ifc062d5665393aa6693eb0db63aa23e4feb44df4 Reviewed-on: https://gerrit.openafs.org/13061 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 46d5695a383b2b993fdd598b770f4e3c0e1a41f3 Author: Andrew Deason Date: Mon Apr 30 17:58:43 2018 -0500 afs: WriteThroughDSlots: Avoid write error panic Currently, afs_WriteThroughDSlots panics if our call to afs_WriteDCache fails. 
Since afs_WriteThroughDSlots is called every minute by a background daemon, this means that if our cache fs becomes inaccessible (by being forced read-only, or for any other reason), we are virtually guaranteed to panic relatively quickly. To try to avoid this at least for some cases, change afs_WriteThroughDSlots to return an error to our caller when we encounter such an error. For our background task, we can just ignore the error and retry the writes on a future iteration. During shutdown, we still panic if we encounter an error, to try to avoid silently allowing a corrupt cache to be used on subsequent boots. Change-Id: Ia5f180a5c709881c3e884629c02e9ff93729fa88 Reviewed-on: https://gerrit.openafs.org/13047 Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: BuildBot commit 22e64df8e043fa7bd78bff263866ee2bd6a6e13d Author: Andrew Deason Date: Mon Apr 30 17:33:14 2018 -0500 afs: Avoid afs_GetDCache panic on cache open error When we need to populate a dcache entry, afs_GetDCache calls afs_CFileOpen to get a handle for our file backing that dcache. Currently, if we cannot open the file, we panic. To handle this a little more gracefully, just return an error from afs_GetDCache instead. The relevant userspace request will probably fail with EIO, but this is better than possibly crashing the whole system. Change-Id: If570ecc7f0fd0aab8340b568fc6cb2e2d316f35a Reviewed-on: https://gerrit.openafs.org/13046 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 3ec0414f769c37a19410fbd9aefb086cb5b69e55 Author: Benjamin Kaduk Date: Tue May 8 18:04:21 2018 -0500 Use afs_DestroyReq in afs_PrefetchNoCache() Since commit 76ad941902c650a4a716168d3cbe68f62aef109f we use afs_DestroyReq() instead of osi_Free() directly. Also update the UKERNEL version of the function to afs_CreateReq() properly. 
FIXES 134533 Change-Id: I4a13f6232dbed12ee00ce219cb5f515529fff58c Reviewed-on: https://gerrit.openafs.org/13060 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit f6af4a155d3636e8f812e40c7169dd8902ae64be Author: Andrew Deason Date: Mon Apr 30 17:30:56 2018 -0500 LINUX: Return NULL for afs_linux_raw_open error Currently, afs_linux_raw_open (and by extension, LINUX's implementation of osi_UFSOpen) panic when they are unable to open the given cache file. To allow callers to handle the error more gracefully, change afs_linux_raw_open and osi_UFSOpen to return NULL on error, instead of panic'ing. Expand the language a little on the message logged while we're here, since the system might keep running after this situation now. This commit also changes all callers that did not already handle afs_linux_raw_open/osi_UFSOpen errors to assert on errors, so we still panic for all situations where we encounter an error. More graceful behavior will be added in future commits; this commit does not change the behavior on its own. An error on opening cache files can legitimately happen when there is corruption in the filesystem backing the disk cache, but possibly the easiest way to generate an error is if the filesystem has been forcibly mounted readonly (which can happen at runtime due to filesystem corruption or various hardware faults). The latter will generate -EROFS (-30) errors, but of course other errors are probably possible. Change-Id: I1462ec43c76c0b07e9368b37a9dbaedf6b6f4409 Reviewed-on: https://gerrit.openafs.org/13045 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 54e84a98f9747bb5bb2ad4b8031115ad7684c914 Author: Benjamin Kaduk Date: Fri Apr 13 08:07:59 2018 -0500 BSD: Work around panic in FlushVCache Commit 64cc7f0ca7a44bb214396c829268a541ab286c69 created the very useful afs_StaleVCache() helper function, but unfortunately it also introduced a subtle change into how we check for whether a vcache may be a directory. 
Previously, we just used the low bit of the Fid's Vnode number, since files have an even number and non-files an odd number. The new version uses that check but also explicitly checks `vType(avc)` against VDIR, and this new check involves consulting information stored in the associated vnode entry, not the vcache directly. The afs_FlushVCache() implementation for XBSD and DARWIN removes the cross-linkage between vcache and vnode, so that AFSTOV(avc) becomes NULL. Just a few lines later, it calls afs_StaleVCacheFlags(), at which point vType() dereferences a bad pointer (offset from a NULL pointer) and panics. This would happen during shutdown, or other periodic reclaim/flush events that can be scheduled. Change-Id: I0800e5c743cedcbec628bfa8c8ea8978c2488c1c Reviewed-on: https://gerrit.openafs.org/13014 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit cfa74883e4996dfee2bd6ffaa3b967e5a7941e0b Author: Stephan Wiesand Date: Thu Apr 26 19:50:06 2018 +0200 redhat: PACKAGE_VERSION macro no longer exists Commit 0d0e7699c9f789214205fe6837cded1a4c95f9c0 replaced all uses of the %PACKAGE_VERSION macro in the spec with the %version one, but missed an instance in the kmodtool script. Fix this, to avoid a warning during rpmbuild. Change-Id: I363241f45c5261aaf2fa0619fb159022f6dbd56a Reviewed-on: https://gerrit.openafs.org/13031 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 076b73e06df8240f209470ea6ee19b66eb4166c3 Author: Stephan Wiesand Date: Thu Apr 26 19:33:31 2018 +0200 redhat: Make separate debuginfo for kmods work with recent rpm Commit 443dd5367e0cd9050ad39a6594c5be521271b4e9 introduced the creation of separate debuginfo packages for kmod packages, and commit 387ae9536888419d7b101513e04e1c644e3218d6 moved the code from the spec into the kmodtool script.
Recent versions of rpm (the issue was found on Fedora 27) extract the debuginfo data from a copy of the original files having the package version-release as a suffix. This broke the original change since the regular expression passed to find-debuginfo.sh no longer matched the name of the openafs.ko file. The file list for the -debuginfo package remained empty, which caused rpmbuild to fail. Relax the regex to match the previous and current file names we are after. It is possible but unlikely that .*openafs\.ko.* will ever match any file not being a kernel module. Change-Id: I57178ed2c593551ede6f4ab2679dd0360dc362cf Reviewed-on: https://gerrit.openafs.org/13030 Tested-by: BuildBot Reviewed-by: Michael Meffie Tested-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 09f31d4c21328bcdc1dccdedf7df53d77c22e3e3 Author: Jeffrey Altman Date: Fri Feb 23 18:47:46 2018 -0500 rx: connection aborts send serial zero when no conn available When no connection object is available, send serial number zero (0) instead of one (1). There is no harm in sending one (1) but it might be confused as the first packet sent on the connection. Multiple connection aborts sent would all be sent with serial one (1). Serial number zero (0) can be an indication to humans reading packet traces that the sender has no knowledge of the connection. Change-Id: I1951284f810170bd130e4f1d8ed93b903cd66659 Reviewed-on: https://gerrit.openafs.org/12932 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cacf2b646759132dbf21e9c04fb3cfc6c2f8f1f3 Author: Jeffrey Altman Date: Fri Feb 23 18:26:24 2018 -0500 rx: pass serial number to rxi_SendRawAbort The practice of stamping abort packets with the connection's next serial number was altered by a0ae8f514519b73ba7f7653bb78b9fc5b6e228f8. This change restores the prior behavior by passing a serial number as a parameter to rxi_SendRawAbort() so that the serial number can be obtained from the connection instead of hard coded as 1. 
Change-Id: I0fb516b2c596e675fa4bc44598a697de81d36d83 Reviewed-on: https://gerrit.openafs.org/12931 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3d3e7bc51aaf39b5ca04bfd36ff9017ab0622057 Author: Michael Meffie Date: Mon Apr 9 19:54:54 2018 -0400 autoconf: add kernel module to the summary Add the kernel module to the list of optional build items in the configure summary to indicate whether the kernel module build is enabled. Change-Id: I11d247ac66d8119910a90a0240b0ce5854449db4 Reviewed-on: https://gerrit.openafs.org/13005 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 85e9db22b265f9bb3745246fea3a07158b8a8c0e Author: Michael Meffie Date: Mon Apr 9 19:50:28 2018 -0400 autoconf: remove uss from configure summary Commit 00a33b26d74aa067086ddc340efb82184715857f (uss: always build uss) made the uss build unconditional. Remove it from the list of optional items in the configure summary. Change-Id: Ia249451c574974b4f0892c4d6d626c57404ea8ce Reviewed-on: https://gerrit.openafs.org/13004 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 833a81eeda6e48ea1ced92169434e843d054c44d Author: Michael Meffie Date: Mon Apr 9 16:42:41 2018 -0400 autoconf: remove more linux 2.4 references Remove old linux 2.2 and 2.4 references in the autoconf macros left over from the linux 2.2 and 2.4 days. Change-Id: Ie859d938fa1fee1d98a035b55e5e41120b66bc69 Reviewed-on: https://gerrit.openafs.org/13003 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 28ea20d03f8abd8109547d6825edad159748397a Author: Michael Meffie Date: Thu Apr 5 23:43:34 2018 -0400 redhat: remove the openafs-kernel-version.sh script Commit ec706b21530240d7fb66bad2f08513eff8f7c335 (Remove Linux 2.4 compat from RedHat packaging) removed the use of the script openafs-kernel-version.sh, which was used in the linux 2.4 days to look up the current kernel version. Nowadays, we use the openafs-kmodtool script to determine the kernel version. 
Remove the unused openafs-kernel-version.sh script from the package sources. Change-Id: I6494812004f7b59c786ff670ff37c2fdc354f371 Reviewed-on: https://gerrit.openafs.org/12996 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 9f0164f4254da39c3c31e0268da58ce7a6ccda1d Author: Michael Meffie Date: Thu Apr 5 22:56:50 2018 -0400 redhat: remove extra kernel version check Commit a1c072ac562ccf74e5afb8449db1bcef86aef362 (redhat: fix rpmbuild command line option defaults) added logic to set the default value of the kernvers variable when not specified as an rpmbuild command line option. This default value is not necessary, since 'kmodtool verrel' already returns the current running kernel version by default. The result of 'kmodtool verrel' sets the kverrel variable, which holds the value of the kernel version we are building. The kernvers variable is only used as an argument to 'kmodtool verrel' and may be empty by default to indicate the current version should be returned. Remove the unnecessary setting of the default value of kernvers. Also update the information banner to show the value of kverrel, which is the actual version we are building, instead of kernvers, which is empty by default. Change-Id: I45ded3b4f61ec60a64288b89c1d553df9fa7b867 Reviewed-on: https://gerrit.openafs.org/12995 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 909d8358109445fdb316b68a8e55e17626cf17c9 Author: Ian Wienand Date: Tue Mar 20 14:01:43 2018 +1100 Remove warning "find_preferred_connection: no connection and !create" find_preferred_connection() is called with !create via afs_ConnByHost->afs_ConnBySA to determine if there is a cached connection available. Don't warn, as it will next be called with the create flag to create the connection anyway.
Change-Id: I02c2150a04ef20c54da793926fb402b946311f9a Reviewed-on: https://gerrit.openafs.org/12964 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 154512831966d12c1e32e6271d4ab1440a25b96e Author: Stephan Wiesand Date: Wed Apr 4 17:09:39 2018 +0200 FBSD: param.h consistency Commit 88dc4d93f5ef080da8f56fac453f095e6c79d4a0 ("Add param.h files for recent FreeBSD") introduced an inconsistency between the i386 and amd64 param.h files for 11.1 and 12.0 regarding the *_FBSD101_ENV #defines. Citing Benjamin Kaduk: "Traditionally we have the param.h for a FreeBSD N.0 release include the (N-1).Y values that existed at the time of the N.0 release, and freeze that set of (N-1).Y values for the lifetime of FreeBSD N.x, if that makes sense." Given that FreeBSD 11.0 was released shortly after 10.3, and 12.0 is not yet released, consistently #define *_FBSD10{1..3}_ENV for 11.1 and *_FBSD10{1..4}_ENV for 12.0 Change-Id: Ibb7e6c4caaab7aa97b32eeec7aa0bbe998bb57f7 Reviewed-on: https://gerrit.openafs.org/12990 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1a0d68676526a5031d7f06f44d58c6dbb2b65da7 Author: Marcio Barbosa Date: Thu Mar 29 15:52:12 2018 -0300 autoconf: remove check for lorder Currently, lorder is not being used. Remove the conditional that checks if this binary exists. Change-Id: I5ccee8b34f33ba0bda38a1d0478ff7a46f73f79c Reviewed-on: https://gerrit.openafs.org/12981 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 387ae9536888419d7b101513e04e1c644e3218d6 Author: Stephan Wiesand Date: Mon Mar 26 20:21:19 2018 +0200 redhat: Create unique debuginfo packages for kmods Commit 443dd5367e0cd9050ad39a6594c5be521271b4e9 ("redhat: separate debuginfo package for kmod rpm") introduced the creation of separate debuginfo packages for the kmod packages. 
As such, this is useful, but all debuginfo packages for a given OpenAFS release ended up with the same name/version/release for the kmod debuginfo package, no matter which kernel release or variant the kmod was built for. Move the additional black magic from the spec into the kmodtool script where we have the means to do better: Use the same naming and versioning conventions as for the kmod-openafs packages themselves. Change-Id: Ibcb34e4c8efde13d0600005772751d8aeb8154aa Reviewed-on: https://gerrit.openafs.org/12977 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Michael Meffie Tested-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 60a006bdc43df42e40eb43f1e1af7fffe3e85763 Author: Ben Kaduk Date: Fri Dec 13 16:25:47 2013 -0500 Export {Get,Set}ServiceSpecific from liboafs_rx.la rxgk will use service-specific data. Change-Id: Id9e2d4b9920e771e1583b9362e61de6216c246b4 Reviewed-on: https://gerrit.openafs.org/10589 Reviewed-by: Daria Phoebe Brashear Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit f70ab59f88aa41074c9f075368137bd663cc8bce Author: Ben Kaduk Date: Mon Dec 9 14:42:13 2013 -0500 Add some time-related helpers RXGK_NOW(), a quick routine to get the current timestamp as an rxgkTime, and secondsToRxgkTime for the more general scaling factor. Change-Id: I0051b5c8e5ad61e35431d97454bf2741daba90cb Reviewed-on: https://gerrit.openafs.org/10566 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit f47cb2d4a957910c3e7d4b755f41ddef5dd103c5 Author: Michael Meffie Date: Sun Jan 21 18:38:11 2018 -0500 Suppress statement not reached warnings under Solaris Studio Solaris Studio issues warnings for statements which can not be reached, such as statements following an infinite loop. 
For example, the return statement will generate a 'statement not reached' warning in the following code: while (1) { /* no breaks or gotos in this body */ } return 0; Suppress these warnings by conditionally removing such statements when building under Solaris Studio. Change-Id: Ib4f465bf9c00eff0d603e5bd643db7d3a5aa0ba0 Reviewed-on: https://gerrit.openafs.org/12958 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 306f0f3100e453e165032ae3bc9022b4a9a9a4c5 Author: Michael Meffie Date: Sat Jan 13 20:14:59 2018 -0500 afs: squash empty declaration warning Remove spurious semi-colon which generates a warning when building under Solaris Studio. "./src/afs/UKERNEL/sysincludes.h", line ...: warning: syntax error: empty declaration Change-Id: I022728ddfd4b8229db0a247de2470846c802a462 Reviewed-on: https://gerrit.openafs.org/12957 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e0066095e7f74653c2c08d1b00010ba59f4c2cf3 Author: Michael Meffie Date: Sat Jan 20 18:34:18 2018 -0500 libafs: git ignore build artifacts on Solaris Ignore build artifacts generated when building the kernel module for Solaris: src/libafs/inet src/libafs/nfs src/libafs/ufs Change-Id: Ie791c45c48ffc15547864bee568f52f74ab6020f Reviewed-on: https://gerrit.openafs.org/12955 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 348dc87bb2eeb66d1e683dc91ee36724ee18f1af Author: Ben Kaduk Date: Fri Dec 13 16:17:54 2013 -0500 Export a few krb5 routines for rxgk We need oafs_h_krb5_generate_random_block when generating random keys and oafs_h_krb5_crypto_fx_cf2 for CombineTokens. Having oafs_h_krb5_crypto_prf_length proves very convenient for key derivation of transport keys, so move it to the public header and export it. oafs_h_krb5_enctype_keysize is needed so that we can tell whether or not we need to pass through random_to_key() when making rxgk_keys. oafs_h_krb5_random_to_key is needed for that random_to_key() operation. 
Change-Id: Ia34c8028b07df203b3885157e2d46c6bb512f608 Reviewed-on: https://gerrit.openafs.org/10936 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fe8a1f3a2b669057451cac358faa7320722dc053 Author: Ben Kaduk Date: Wed Dec 4 13:03:15 2013 -0500 auth: Let superuser identities be superusers We have a special rx_identity_kind for superusers, let it actually be useful for something. Change-Id: I1d551ed8e5fcfd6bdc29c6c27eee4c2ae67e1a89 Reviewed-on: https://gerrit.openafs.org/10575 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 00e12efa29659c28f0fd7b6acbfb57d91a6ca477 Author: Andrew Deason Date: Tue Mar 6 22:04:28 2018 -0600 SOLARIS: Check for map_addr() without 'vacalign' Add a configure check to see if the map_addr() function contains the 'vacalign' argument or not. The argument was removed sometime around Solaris 11.4. Change-Id: Id11c10cf849511635bd9490c97d978b4bdaa5e06 Reviewed-on: https://gerrit.openafs.org/12947 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 6082243e42525c738239fe429bcb64e0e4f22207 Author: Andrew Deason Date: Wed Mar 7 15:57:56 2018 -0600 hcrypto: Avoid arc4random in kernel Our HAVE_ARC4RANDOM symbol represents the availability of arc4random() in userspace, not in the kernel. On Solaris, we'll define HAVE_ARC4RANDOM, but the built kernel module will be unusable, since we cannot resolve the arc4random symbol. To avoid this, undef HAVE_ARC4RANDOM when building hcrypto for the kernel, just like we do with HAVE_GETUID. Change-Id: I17472420b35e7be6b4f698082714c2e51bdb064b Reviewed-on: https://gerrit.openafs.org/12946 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3e9ea6107973ccc4fa3d405f5b5d76666bfd624f Author: Andrew Deason Date: Wed Mar 7 13:28:34 2018 -0600 Avoid libtool 'nm' errors Starting around Solaris 11.3, '/usr/bin/nm -p' starts reporting some symbols with the 'C' code.
libtool cannot handle this (libtool bug #22373), which causes global_symbol_pipe in the generated libtool script to be empty. This causes a rather confusing error when we go to actually use libtool to link something ("syntax error near unexpected token '|'"; see libtool bug #20947), and prevents the build from continuing. Address this in two ways: For all Solaris 11 builds, default to /usr/sfw/bin/gnm over /usr/bin/nm. This avoids any interop issues with libtool and nm, since libtool of course works very well with GNU tooling. In addition, try to catch any nm-related errors with libtool at configure time, to provide a more helpful error message. To implement these changes, create a wrapper around LT_INIT, called AFS_LT_INIT. Change-Id: I7d47c17f9d9401dc5dcc9676279bf1e4f53554c4 Reviewed-on: https://gerrit.openafs.org/12945 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5a8b68153124c3a9224f0b6993df9de9c6c54541 Author: Michael Meffie Date: Thu Feb 22 13:23:18 2018 -0500 venus: convert fs.c to safer string functions Convert string handling to safer functions to avoid buffer overflows. Change-Id: Ibb4f18d78724d87a002e2b0458cba2cceee8670c Reviewed-on: https://gerrit.openafs.org/12923 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c84f36a9b8c6b6adb9c77bab1c814ccd3aaf6a5b Author: Michael Meffie Date: Mon Feb 19 14:01:56 2018 -0500 venus: fix format overflow warning Recent versions of gcc generate a format overflow warning on the dfsstring buffer in fs.c. Increase the size of the buffer to avoid a possible buffer overflow.
fs.c: In function ‘AclToString’: fs.c:770:30: error: ‘%s’ directive writing up to 1024 bytes into a region of size between 13 and 23 [-Werror=format-overflow=] sprintf(dfsstring, " dfs:%d %s", acl->dfs, acl->cell); ^~ fs.c:770:2: note: ‘sprintf’ output between 8 and 1042 bytes into a destination of size 30 sprintf(dfsstring, " dfs:%d %s", acl->dfs, acl->cell); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Change-Id: Iead8b153a62f2928fabaeee1ed126535f67d7d49 Reviewed-on: https://gerrit.openafs.org/12917 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 70b7f743550a8ce02292a12c4188deaf85b1a533 Author: Michael Meffie Date: Thu Feb 22 16:07:55 2018 -0500 butc: convert butc/dump.c to safer string handling Convert butc/dump.c to safer string handling functions to avoid buffer overflows. Change-Id: I36338804ee5d0ac2eb818c42cf2671497cd5967f Reviewed-on: https://gerrit.openafs.org/12922 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit cec45d59440f55316097cfd6652d2ea26cd55233 Author: Michael Meffie Date: Mon Feb 19 13:57:16 2018 -0500 butc: fix format overflow warning Recent versions of gcc generate an overflow warning in the butc DUMPNAME macro when copying values into the finishedMsg1 buffer. Increase the size of the destination buffer to avoid a possible buffer overflow. 
dump.c:88:24: error: ‘%s’ directive writing up to 63 bytes into a region of size 50 [-Werror=format-overflow=] sprintf(dumpname, "%s (DumpId %u)", name, dbDumpId); ^ dump.c:1294:5: note: in expansion of macro ‘DUMPNAME’ DUMPNAME(finishedMsg1, nodePtr->dumpSetName, dparams.databaseDumpId); ^~~~~~~~ dump.c:88:6: note: ‘sprintf’ output between 12 and 84 bytes into a destination of size 50 sprintf(dumpname, "%s (DumpId %u)", name, dbDumpId); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dump.c:1294:5: note: in expansion of macro ‘DUMPNAME’ DUMPNAME(finishedMsg1, nodePtr->dumpSetName, dparams.databaseDumpId); ^~~~~~~~ Change-Id: Iadf87a308ab6c500a8407a269bc0fd443ff0c735 Reviewed-on: https://gerrit.openafs.org/12916 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c44f6f7a8052bdd1fb021e07bb6ae142b61e6b5b Author: Andrew Deason Date: Wed Mar 7 11:32:43 2018 -0600 ubik: Log sync site for SDISK_SendFile USYNC error In SDISK_SendFile, we return a USYNC error if the caller is not the sync site. Say who the sync site is when we do this, to possibly help post-mortem debugging. Change-Id: I62a3565fca20171be20481638c261c4659c68ab2 Reviewed-on: https://gerrit.openafs.org/12943 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d0805d72b7a48dcaa7abe1aea136a8cd963d76c2 Author: Andrew Deason Date: Wed Mar 7 13:11:03 2018 -0600 Avoid empty libtool -export-symbols-regex pattern Currently, in LT_LDLIB_shlib_missing, we construct our -export-symbols-regex pattern like so (with some escaping): "($(sed -e 's/^/^/' -e 's/$/$/' xxx.sym | tr '\n' '|' | sed -e 's/|$//'))" The idea is that for a .sym file consisting of, for example: foo bar We then generate a regex like (^foo$|^bar$). However, since the 'tr' removes all newlines, the line given to the last 'sed' in the pipeline has no trailing newline. On some systems, such as Solaris, this causes sed to not output anything at all, resulting in a regex pattern of just "()". 
For example: # on Debian $ echo -n foo | sed -e 's/foo/bar/' bar$ # on Solaris $ echo -n foo | sed -e 's/foo/bar/' $ To avoid this, we can change the sed pipeline to not remove the newlines until the very end. Change the way we construct our regex to this instead: "($(sed -e 's/^/^/' -e 's/$/$|/' -e '$ s/|$//' xxx.sym | tr -d '\n'))" So the sed removes the extra '|' in the last element by looking at the last line, instead of looking at the end of the line after the 'tr' conversion. Change-Id: Id382132f6b400bf961dbaa52138a9abd0168118d Reviewed-on: https://gerrit.openafs.org/12944 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c818f86b79a636532d396887d4f22cc196c86288 Author: Mark Vitale Date: Thu Mar 1 23:16:56 2018 -0500 LINUX: fix RedHat 7.5 ENOTDIR issues Red Hat Linux 7.5 beta introduces a new file->f_mode flag FMODE_KABI_ITERATE as a means for certain in-tree filesystems to indicate that they have implemented file operation iterate() instead of readdir(). The kernel routine iterate_dir() tests this flag to decide whether to invoke the file operation iterate() or readdir(). The OpenAFS configure script detects that the file operation iterate() is available under RH7.5 and so implements iterate() as afs_linux_readdir(). However, since OpenAFS does not set FMODE_KABI_ITERATE on any of its files, the kernel's iterate_dir() will not invoke iterate() for any OpenAFS files. OpenAFS has also not implemented readdir(), so iterate_dir() must return -ENOTDIR. Instead, modify OpenAFS to fall back to readdir() in this case. Change-Id: I242276150ab2a506e1e9c5c752e3f17d36c98935 Reviewed-on: https://gerrit.openafs.org/12935 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 79f33b859aeb3c91f2cce7597fdc138978c4e1d9 Author: Benjamin Kaduk Date: Thu Mar 1 20:28:23 2018 -0600 afs_pioctl: avoid -Wpointer-sign Change the declaration of 'addr' to be a signed int, to match RXAFS_CallBackRxConnAddr() and the afsd_pd_GetInt() used with it. 
This was detected by clang 4.0 in FreeBSD 11.1, via -Wpointer-sign. Change-Id: Ibd2679e6a4519db46f57693ff58221f18f6a2fe1 Reviewed-on: https://gerrit.openafs.org/12934 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit bd6a2484011dad6298c4ce97dd0cd68e0834baa5 Author: Marcio Barbosa Date: Thu Feb 22 17:53:23 2018 -0500 ubik: don't set database epoch to 0 if not needed If our attempt to receive a fresh database from a peer fails, we will overwrite the version.epoch field of our current local copy of the database with an invalid value, "0". The idea behind this approach is to make sure that this database will not be seen as a legit copy if the transfer is not completed properly. Although it is questionable if this approach is still necessary (since the current version writes the data into a temporary file), it is undisputed that the database version does not have to be invalidated if the transfer fails in an early stage where no data has been written and we could safely continue to reuse the local copy for read-only queries. Early failures may happen if: 1. The peer sending the database to us is not the peer we believe to be the sync site; 2. The sender is not authorized to call DISK_SendFile; In both cases, the database epoch is invalidated. As a result of that, we may have the following consequences: 1. Reads may not be allowed Once the on disk epoch is invalidated, if the server in question is rebooted, the invalid on disk epoch will be used to initialize the in memory epoch. At this point, reads may not be allowed since urecovery_AllBetter checks if the in memory epoch is greater than 1. Reads should not be blocked forever since the sync-site will send a new database to this remote and, as a result of that, the invalid version will be corrected. 2. Data can be lost If the site with the invalid epoch is the one with the most recent database, the database can be rolled back to an earlier version during a new quorum establishment.
Consider the following scenario where we have three sites: Site A (up - database up to date) (sync-site) Site B (up - database up to date) Site C (down - old database) The epoch of B is invalidated due to the problem fixed by this patch. Then, A is turned off and C is turned on. In this scenario, the new sync-site will distribute the old database held by C since its epoch is greater than 0. To fix the problem in question, do not set the database epoch to 0 if the local database was not modified. Acknowledgements: Hartmut Reuter - found the problem; - suggested a possible solution; Benjamin Kaduk - submitted the first version; Andrew Deason - suggested changes; Change-Id: I4f6a6e92aa0bd4282fab4743ea622815a009fecf Reviewed-on: https://gerrit.openafs.org/12924 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Michael Meffie commit 6d74e3d6a1becf86cec30efc2d01a5692167afe1 Author: Michael Meffie Date: Tue Feb 20 11:51:01 2018 -0500 afs: improve -volume-ttl error messages Change the afs call which sets the volume ttl value to return EFAULT instead of EINVAL when given an out of range value for the volume ttl parameter. This is more consistent with the other op codes, which return EFAULT when given an out of range parameter and allows the caller to distinguish between an invalid opcode and a bad parameter. Move the volume ttl range constants to afs_args.h, which is where constants related to the op codes are supposed to be defined. This makes the constants available to the caller in afsd.c as well as the implementation in afs_call.c. Update afsd to print a more sensible error message when the volume ttl set call fails due to an out of range parameter.
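The error-code split described in the -volume-ttl commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in afs_call.c or afsd.c: the opcode name, the range limits, the function names, and the message strings are all hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical range limits; the real constants live in afs_args.h. */
#define SKETCH_VOLUME_TTL_MIN 600L
#define SKETCH_VOLUME_TTL_MAX 86400L

/* Hypothetical opcode number, for illustration only. */
enum sketch_opcode { SKETCH_OP_SET_VOLUME_TTL = 1 };

/*
 * Kernel-side sketch: an unknown opcode yields EINVAL, while a known
 * opcode with an out of range argument yields EFAULT.
 */
static int
sketch_afs_call(int opcode, long ttl_seconds, long *ttl_out)
{
    assert(ttl_out != NULL);

    switch (opcode) {
    case SKETCH_OP_SET_VOLUME_TTL:
        if (ttl_seconds < SKETCH_VOLUME_TTL_MIN ||
            ttl_seconds > SKETCH_VOLUME_TTL_MAX)
            return EFAULT;      /* bad parameter */
        *ttl_out = ttl_seconds;
        return 0;
    default:
        return EINVAL;          /* opcode not recognized */
    }
}

/* Caller-side sketch: the two codes can now get distinct messages. */
static const char *
sketch_ttl_error_text(int code)
{
    switch (code) {
    case 0:
        return "ok";
    case EFAULT:
        return "volume ttl value is out of range";
    case EINVAL:
        return "volume ttl is not supported by this kernel module";
    default:
        return "unexpected error";
    }
}
```

With the two cases separated this way, a caller no longer has to guess whether a failure means "old kernel module" or "bad value".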
Change-Id: I6b3ab7d38a60464017daf06f70080a90d2a7a429 Reviewed-on: https://gerrit.openafs.org/12918 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 278581c24a802834719e0d57f27978321556c9bb Author: Michael Meffie Date: Tue Feb 20 20:31:11 2018 -0500 redhat: package libuafs perl bindings Require the swig package as a build dependency. Build and package the libuafs perl bindings. Place these libraries in the openafs-devel package, along with the man page (moved from the openafs-client package). This fixes an rpm build error when the swig package is present on the build system, RPM build errors: Installed (but unpackaged) file(s) found: /usr/lib64/perl/AFS/ukernel.pm /usr/lib64/perl/ukernel.so FIXES 134470 Change-Id: Ifa8a0938f0c16e6099cd2923a71dd6466052a4d8 Reviewed-on: https://gerrit.openafs.org/12919 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f82d1c7d5aeae148305e867c1f79c6ea2f9e0a2a Author: Jeffrey Altman Date: Sat Feb 10 10:47:24 2018 -0500 rx: Do not count RXGEN_OPCODE towards abort threshold An RXGEN_OPCODE is returned for opcodes that are not implemented by the rx service. These opcodes might be deprecated opcodes that are no longer supported or more recently registered opcodes that have yet to be implemented. Clients should not be punished for issuing unsupported calls. The clients might be old and are issuing no longer supported calls or they might be newer and are issuing yet to be implemented calls as part of a feature test and fallback strategy. This change ignores RXGEN_OPCODE errors when deciding how to adjust the rx_call.abortCount. When an RXGEN_OPCODE abort is sent the rx_call.abortCount and rx_call.abortError are left unchanged which preserves the state for the next failing call. Note that this change intentionally prevents the incrementing of the abortCount for client connections as they never send delay aborts.
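The counting rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code in rx.c: the struct, the function names, the "same error increments, different error restarts" policy, and the numeric value given to RXGEN_OPCODE are all assumptions made for the sketch. Only the field names abortCount and abortError come from the commit message.

```c
/* Assumed value for this sketch; consult the rx headers for the real one. */
#define SKETCH_RXGEN_OPCODE (-455)

/* Hypothetical stand-in for the two rx_call fields named in the commit. */
struct sketch_call {
    int abortCount;   /* number of counted aborts carrying abortError */
    int abortError;   /* error code of the most recent counted abort */
};

/*
 * Record an outgoing abort. An RXGEN_OPCODE abort is ignored entirely, so
 * abortCount and abortError keep their state for the next failing call.
 * The increment/restart policy for other errors is a simplification.
 */
static void
sketch_note_abort(struct sketch_call *call, int error)
{
    if (error == SKETCH_RXGEN_OPCODE)
        return;
    if (error == call->abortError) {
        call->abortCount++;
    } else {
        call->abortError = error;
        call->abortCount = 1;
    }
}

/* Nonzero once the counted aborts reach a (hypothetical) threshold. */
static int
sketch_over_threshold(const struct sketch_call *call, int threshold)
{
    return call->abortCount >= threshold;
}
```

In this model a client that probes for an unimplemented opcode between two genuine failures neither advances nor resets the count, which is the behaviour the commit message describes.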
Change-Id: I87787e7ad0a85d52a01711bb75e2be1af9a868b8 Reviewed-on: https://gerrit.openafs.org/12906 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3ddae7d168ac08c46b4e31517fdb1f6ac1ae63ac Author: Andrew Deason Date: Thu Feb 15 18:40:07 2018 -0600 RHEL: Add aarch64/arm64 to spec file Change-Id: I2247f40a839e976605e80cf468d7a023598d5dc5 Reviewed-on: https://gerrit.openafs.org/12911 Tested-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e6c2624249a6ab96053c1d1134aec8e3f6bcee9e Author: Andrew Deason Date: Thu Feb 15 16:53:57 2018 -0600 doc: Edits to the 'afsd -volume-ttl' manpage Make a few misc changes to the text for the new -volume-ttl option: - Minor grammatical/typo fixes - Emphasize a little more that the default behavior allows for vldb info to be cached _forever_ - Provide some info on the effects of changing this value - Provide a suggested "typical" value, to give some clue as to what should be set here, so a curious user doesn't just set this to the first value they see (10 minutes) Change-Id: Ib6b2871b111c392260ea80e26273201b09d4c402 Reviewed-on: https://gerrit.openafs.org/12909 Reviewed-by: Benjamin Kaduk Tested-by: Andrew Deason commit a66629eac4dda4eea37b4f06e0850641cb2a7387 Author: Andrew Deason Date: Thu Feb 15 16:41:33 2018 -0600 rxdebug: NUL-terminate version before printing Currently, 'rxdebug -version' never initializes the buffer we read the version string into. Usually this is not noticeable, since all OpenAFS binaries tend to pad the Rx version response packet with NULs, so we get back several NULs to terminate the string. However, this is not guaranteed, and if we do not get back a NUL-terminated string, we can easily read beyond the end of the buffer. To avoid this, initialize the 'version' buffer with NULs before we do anything, and set the last byte to NUL, in case we exactly filled the buffer. 
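A minimal sketch of the pattern described above, assuming a fixed-size buffer and a reply payload that may or may not contain a NUL. The function name and parameters are hypothetical and are not taken from rxdebug.c; the sketch only shows the two steps the message names (zero the buffer first, then force the last byte to NUL).

```c
#include <string.h>

/*
 * Copy an untrusted, possibly unterminated payload into 'version' so that
 * the result is always a valid C string. Hypothetical helper for
 * illustration only.
 */
static void
sketch_store_version(char *version, size_t size,
                     const char *payload, size_t payload_len)
{
    size_t n;

    if (size == 0)
        return;

    /* Step 1: start from all NULs, so a short payload is terminated. */
    memset(version, 0, size);

    /* Never copy more than the buffer can hold. */
    n = payload_len < size ? payload_len : size;
    memcpy(version, payload, n);

    /* Step 2: the payload may have exactly filled the buffer. */
    version[size - 1] = '\0';
}
```

With a 4-byte buffer and an 8-byte unterminated payload, the stored string is truncated to three characters plus the terminator, so a later printf("%s") cannot run past the end of the buffer.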
Change-Id: I1b1ae546c01f018a9b4e198f918c2d9eb86015d6 Reviewed-on: https://gerrit.openafs.org/12908 Reviewed-by: Benjamin Kaduk Tested-by: Andrew Deason commit 4f7550dcaf9375046514cdd97cea0f667e955e9f Author: Andrew Deason Date: Sat Mar 7 17:27:47 2015 -0600 Add support for arm64_linux26 Add support for the arm64/aarch64 architecture on Linux 2.6+. The param header file is mostly combined from arm and amd64. Note that the code for syscall interception has not been updated for arm64, so this will not build on arm64 without support for kernel keyrings. This also does not define any AFS syscall number, since no number in the Linux arm64 syscall table is "free" for us to use, as far as I am aware. Adapted from initial patches from Micheal Waltz . Change-Id: I1ee239ded17d8fea3b91b70405215aa1b3f7a6e9 Reviewed-on: https://gerrit.openafs.org/11940 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b792dea0f1f83673b0b045adf608412901b3024c Author: Andrew Deason Date: Sun Mar 8 11:47:28 2015 -0500 hcrypto: Avoid 'double' param in arm64 kernel code Currently, the RAND_add function in hcrypto uses a floating point argument (specifically, a 'double'), as well as any implementations of RAND_add. On Linux arm64, we cannot use floating point code in the kernel, since the kernel module is compiled with -mgeneral-regs-only, which prevents the use of floating point registers. No code in the tree actually makes use of this argument, but its mere presence is enough to cause an error with at least some versions of gcc with certain arguments. To get around this, simply change all instances of 'double' in hcrypto to be a void pointer instead. This allows the code to compile as long as nobody actually uses that argument in the kernel. If the code is changed such that we do actually use that argument, the argument will be a void* and so will probably (hopefully) cause a compiler error, and the code will need to be examined to make sure this workaround doesn't break anything. 
We already do this on Solaris, which has similar issues for different compiler versions and compiler flags. Add arm64 Linux to the cases where we do this, but restrict this to kernel code only, to try to avoid doing this more often than necessary. Change-Id: Ifd10786cd9ac6c9d5152b927e180b7362131f359 Reviewed-on: https://gerrit.openafs.org/11939 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0a896b93c86e86f5b438880ef1634b4e39ee5779 Author: Andrew Deason Date: Fri Mar 13 10:33:05 2015 -0500 Do not set default AFS_SYSCALL Currently, afs_args.h will define an AFS_SYSCALL value by default (31) if the current platform does not define an AFS_SYSCALL value on its own (via its param.h info). This is dangerous, since if a platform does not define an AFS_SYSCALL, or if it happens to not be defined for any reason, some code may try to call syscall 31, which could be anything. So get rid of this. If this breaks the build on any platform, then that platform should define AFS_SYSCALL in its own platform-specific header, or get rid of the problematic AFS_SYSCALL usage. Change-Id: I9583c8e5adc4106848a437d81306000490787ef3 Reviewed-on: https://gerrit.openafs.org/11938 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ed513bb516acdb28fc6bbf01714ef2e1df422a8a Author: Andrew Deason Date: Wed Mar 11 12:55:42 2015 -0500 Do not require AFS_SYSCALL Various parts of the code make use of AFS_SYSCALL in order to communicate with the libafs kernel module. Even though most modern platforms do not use an actual syscall anymore (instead using an ioctl-based method or similar to emulate the traditional AFS syscall), some code paths rely on AFS_SYSCALL as a fallback, or just use AFS_SYSCALL because they were never updated to use the newer methods. Even platforms that do not use the traditional AFS syscall still define the AFS_SYSCALL number, in case someone still uses it for something. 
However, some platforms do not have an AFS syscall number; there is no "slot" allocated to us, so we cannot safely issue any syscall. For those platforms, we must not reference AFS_SYSCALL at all, or we will fail to build. So, get rid of these references to AFS_SYSCALL if it is not defined. In some places, we can just avoid the relevant code making the syscall. In a few other places, we just pretend like the libafs kernel module was not loaded and yield an ENOSYS error, to make the code simpler. Change-Id: I38e033caf7149c2b1b567f9877221ca8551db2ea Reviewed-on: https://gerrit.openafs.org/11937 Tested-by: BuildBot Reviewed-by: Ian Wienand Reviewed-by: Benjamin Kaduk commit f5794e029903db79f345f42582230a1fd0f7d823 Author: Andrew Deason Date: Mon Feb 5 00:07:10 2018 -0600 util: Add the AFS_STRINGIZE() macro Add a macro to help with easily printing the value of #define'd constants, called AFS_STRINGIZE(). For example: printf("The value of AFS_SYSCALL is: " AFS_STRINGIZE(AFS_SYSCALL) "\n"); Change-Id: I19a3e9d930f1ca2085506957b4e96dff5bf1c22e Reviewed-on: https://gerrit.openafs.org/12893 Tested-by: BuildBot Reviewed-by: Ian Wienand Reviewed-by: Benjamin Kaduk commit 32d0493a7e4f74f5e5efdfde5eca29ed7d1bf3ec Author: Caitlyn Marko Date: Thu Feb 9 09:16:17 2017 -0500 SOLARIS: save kernel module function arguments for debugging Add the -Wu,-save_args compiler option when building kernel modules under Solaris 10 and 11 for the amd64 architecture. Binaries generated with this option save function arguments on the stack during function entry for debugging purposes. Up to six integer arguments are saved on function entry, and are not modified during the execution of the function. 
[mmeffie: commit message update] Change-Id: I7ee50e5108a46685efa17d0380883c6d1702a5e4 Reviewed-on: https://gerrit.openafs.org/12798 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 88cb536f99dc58fdbeb9fa6c47c26774241a0cb6 Author: Marcio Barbosa Date: Mon Feb 5 21:16:17 2018 +0000 autoconf: detect ctf-tools and add ctf to libafs CTF is a reduced form of debug information similar to DWARF and stab. It describes types and function prototypes. The principal objective of the format is to shrink the data size as much as possible so that it could be included in a production environment. MDB, DTrace, and other tools use CTF debug information to read and display structures correctly. This commit introduces a new configure option called --with-ctf-tools. This option can be used to specify an alternative path where the tools can be found. If the path is not provided, the tools will be searched in a set of default directories (including $PATH). The CTF debugging information will only be included if the corresponding --enable-debug / --enable-debug-kernel is specified. Note: at the moment, the Solaris kernel module is the only module benefited by this commit. Change-Id: If0a584377652a573dd1846eae30d42697af398d0 Reviewed-on: https://gerrit.openafs.org/12680 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c7c71d2429cf685f3ffad6b2e6d102d900edc197 Author: Ian Wienand Date: Fri Feb 2 10:52:26 2018 +1100 Add .gitreview git-review [1] makes it much easier to submit changes. Add a default configuration file. 
[1] https://docs.openstack.org/infra/git-review/usage.html Change-Id: I9615a81c9b199c86e8de2fedc710e3246deeac84 Reviewed-on: https://gerrit.openafs.org/12884 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 5e09a694ec2c0cd20f5dee500eff6bc3dd04c097 Author: Mark Vitale Date: Tue Jun 30 01:54:21 2015 -0400 SOLARIS: Avoid vcache locks when flushing pages for RO vnodes We have multiple code paths that hold the following locks at the same time: - avc->lock for a vcache - The page lock for a page in 'avc' In order to avoid deadlocks, we need a consistent ordering for obtaining these two locks. The code in afs_putpage() currently obtains avc->lock before the page lock (Obtain*Lock is called before pvn_vplist_dirty). The code in afs_getpages() also obtains avc->lock before the page lock, but it does so in a loop for all requested pages (via pvn_getpages()). On the second iteration of that loop, it obtains avc->lock, and the page from the first iteration of the loop is still locked. Thus, it obtains a page lock before locking avc->lock in some cases. Since we have two code paths that obtain those two locks in a different order, a deadlock can occur. Fixing this properly requires changing at least one of those code paths, so the locks are taken in a consistent order. However, doing so is complex and will be done in a separate future commit. For this commit, we can avoid the deadlock for RO volumes by simply avoiding taking avc->lock in afs_putpages() at all while the pages are locked. Normally, we lock avc->lock because pvn_vplist_dirty() will call afs_putapage() for each dirty page (and afs_putapage() requires avc->lock held). But for RO volumes, we will have no dirty pages (because RO volumes cannot be written to from a client), and so afs_putapage() will never be called. So to avoid this deadlock issue for RO volumes, avoid taking avc->lock across the pvn_vplist_dirty() call in afs_putpage(). 
We now pass a dummy pageout callback function to pvn_vplist_dirty() instead, which should never be called, and which panics if it ever is. We still need to hold avc->lock a few other times during afs_putpage() for other minor reasons, but none of these hold page locks at the same time, so the deadlock issue is still avoided. [mmeffie: comments, and fix missing write lock, fix lock releases] [adeason: revised commit message] Change-Id: Iec11101147220828f319dae4027e7ab1f08483a6 Reviewed-on: https://gerrit.openafs.org/12247 Tested-by: BuildBot Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit 073522b3d49467af107d1143cfa015c53347e1e3 Author: Michael Meffie Date: Wed Jan 31 16:52:40 2018 -0500 add rfc3961.h to kernel sources Export this header to the kernel sources in the libafs_tree, since it is needed for the kernel module build. FIXES 134476 Change-Id: Id359c6d065c259601d14ee5c02b93647f86a0288 Reviewed-on: https://gerrit.openafs.org/12882 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3ca1352170f87994d42578c5bc75e52c4103bc69 Author: Michael Meffie Date: Mon Feb 8 12:12:22 2016 -0500 CellServDB update 14 Mar 2017 Update all remaining copies of CellServDB in the tree, and make the Red Hat packaging use it by default too. Change-Id: I5a70a7c658ad0056cd10945bb730e84f0edfb730 Reviewed-on: https://gerrit.openafs.org/12880 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 88dc4d93f5ef080da8f56fac453f095e6c79d4a0 Author: Benjamin Kaduk Date: Mon Jan 8 22:28:24 2018 -0600 Add param.h files for recent FreeBSD Add files for FreeBSD 10.4, 11.1, and 12.0 (12-CURRENT), for i386 and amd64. 
Change-Id: I904f576914bb965a659750e6302f011acf66ba81 Reviewed-on: https://gerrit.openafs.org/12863 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit c390f368a5012f866c1b4ce46d6ac6af6cef2fd5 Author: Benjamin Kaduk Date: Mon Jan 8 21:27:04 2018 -0600 FBSD: catch up to missing sysnames Add sysnames for i386 and amd64 10.4, 11.1, and 12.0 (12-CURRENT, at present). Change-Id: If38ecca7b2b3e40c186b7e9321ce017b4711139c Reviewed-on: https://gerrit.openafs.org/12862 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit f5c289d00aaf7c5525b477da5b89f6675456c211 Author: Marcio Barbosa Date: Wed Jun 21 16:24:05 2017 -0400 ubik: check if epoch is sane before db relabel The sync-site relabels its database at the end of the first write transaction. The new label will be equal to the time at which the sync-site in question first received its coordinator mandate. This time is stored by a global called ubik_epochTime. In order to make sure that the new database label is sane, only relabel the database if ubik_epochTime is within a specific range. Change-Id: I2408569e5de46d387f63cbc2fab05ea1264a505c Reviewed-on: https://gerrit.openafs.org/12640 Reviewed-by: Mark Vitale Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 50c1d1088d2adcbb37b6a9d23fdd63617b1267be Author: Marcio Barbosa Date: Mon Aug 21 15:50:14 2017 -0400 ubik: update ubik_dbVersion during SDISK_SendFile The ubik_dbVersion global represents the sync site's database version and it is mostly used by the remote sites for sanity checks. Currently, this global is updated when database changes are made on the sync site (SDISK_Commit or SDISK_SetVersion), as well as every time we vote "yes" for the sync-site in a beacon reply. Unfortunately, ubik_dbVersion is not updated when a copy of the sync site's database is received via DISK_SendFile, and it won't get updated until our next "yes" vote. 
During this window, the current database version will not match ubik_dbVersion. As a result, any write transaction during this time frame will fail on the remote site in question. To fix this problem, do not wait for the next beacon packet to update ubik_dbVersion when the sync site's database is received; just update it when we get the new database. Since no write transactions are allowed while the db is transferring, ubik_dbVersion can be safely updated. Change-Id: Ide7a695a69cb3229ad585d9e56c5ddc2efb76dd7 Reviewed-on: https://gerrit.openafs.org/12716 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk commit ef1d4c8d328e9b9affc9864fd084257e9fa08445 Author: Andrew Deason Date: Thu Jan 11 21:27:28 2018 -0600 LINUX: Avoid locking inode in check_dentry_race Currently, check_dentry_race locks the parent inode in order to ensure it is not running in parallel with d_splice_alias for the same inode. (For old Linux kernel versions; see commit b0461f2d: "LINUX: Workaround d_splice_alias/d_lookup race".) However, it is possible to hit this area of code when the parent inode is already locked. When someone tries to create a file, directory, or symlink, Linux tries to lookup the dentry for the target path, to see if it already exists. While looking up the last component of the path, Linux locks the directory, and if it finds a dentry for the target name, it calls d_invalidate on it while the parent directory is locked. For a dentry with a NULL inode, we'll then try to lock the parent inode in check_dentry_race. But since the inode is already locked, we will deadlock. From a user's point of view, the hang can be reproduced by doing something similar to: $ mkdir dir # succeeds $ rmdir dir $ ls -l dir ls: cannot access dir: No such file or directory $ mkdir dir # hangs To avoid this, we can just change which lock we're using to avoid check_dentry_race/d_splice_alias from running in parallel. 
Instead of locking the parent inode, introduce a new global lock (called dentry_race_sem), and lock that in check_dentry_race and around our d_splice_alias call. We know that those are the only two users of this new lock, so this should avoid any such deadlocks. This does potentially reduce performance, since all tasks that hit check_dentry_race or d_splice_alias will take the same global lock. However, this at least still allows us to make use of negative dentries, and this entire code path only applies to older Linux kernels. It could be possible to add a new lock into struct vcache instead, but using a global lock like this commit does is much simpler. Change-Id: Ide0f21145c83d6fbb34c637d8a36c8cd21549940 Reviewed-on: https://gerrit.openafs.org/12868 Tested-by: Benjamin Kaduk Reviewed-by: Benjamin Kaduk commit f599e1ce6354c42a9c0c8f7205ba8a03c35ea72b Author: Michael Meffie Date: Wed Jan 17 17:33:50 2018 -0500 redhat: fix conditional for kernel-debuginfo files directive Commit 443dd5367e0cd9050ad39a6594c5be521271b4e9 added support for a separate debuginfo package for the kernel module. Unfortunately, the %files directive for the kernel module debuginfo package was incorrectly placed in the %if stanza of the build_userspace condition, so the rpmbuild fails when attempting to build just the kernel module. That is, when running rpmbuild with the options: rpmbuild --define "build_userspace 0" --define "build_modules 1" ... rpmbuild fails with: RPM build errors: Installed (but unpackaged) file(s) found: /usr/lib/debug/lib/modules/.../extra/openafs/openafs.ko.debug Fix this by moving the new %files directive out of the build_userspace conditional. 
Change-Id: I46e74b660048022a4cc4327835c6055402a34ccf Reviewed-on: https://gerrit.openafs.org/12874 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 6a2b85cd4c00a08e165cb96d2cb56bf87c6324bc Author: Michael Meffie Date: Sat Dec 30 17:59:38 2017 -0500 autoconf: refactor linux-checks.m4 Further refactoring of the autoconf macros. Divvy up the linux kernel checks into smaller files. This is a non-functional change. Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Change-Id: I5ea4c9e3a0aeff1767ef561bdb8361781694ee28 Reviewed-on: https://gerrit.openafs.org/12844 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3c2e39bab7d927aa5f20d02a5e327927a4b2b553 Author: Michael Meffie Date: Sat Dec 30 12:12:59 2017 -0500 autoconf: refactor ostype.m4 Further refactoring of the autoconf macros. Move more linux and solaris specific checks into their own files. This is a non-functional change. Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Change-Id: Ib3e7b1270826970c541a695230f4e3cd13cf9e3d Reviewed-on: https://gerrit.openafs.org/12843 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c72622a244e561173e86ffe88ee3c9a8c823a76a Author: Michael Meffie Date: Fri Dec 29 14:24:28 2017 -0500 autoconf: refactor acinclude.m4 The acinclude.m4 file is very large and often needs to be changed by unrelated commits. Divvy up the large acinclude.m4 into a number of smaller files to avoid so much contention and to make the autoconf system easier to maintain. This is a non-functional change.
Care has been taken to preserve the ordering of the autoconf tests. Except for whitespace, the generated configure file has not been changed by this refactoring. This has been verified with a 'diff -u -w -B' comparison of the generated configure file before and after applying this commit. Change-Id: I70e7f846dea0055d00a60a47422aa73bff25c4c6 Reviewed-on: https://gerrit.openafs.org/12842 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0760feb7992e1e39f716c5f583fe7f6e85584262 Author: Benjamin Kaduk Date: Thu Jan 4 22:00:15 2018 -0600 rx: remove trailing semicolons from FBSD mutex operations Since the first introduction of FreeBSD support, the macros (MUTEX_ENTER, etc.) for kernel mutex operations have included trailing semicolons, unique among all the platforms. This did not cause problems until the recent work on rx event handlers, which put a MUTEX_ENTER() in the body of an 'if' clause with no brackets, and attempted to follow it with an 'else' clause. This results in the following (rather obtuse) compiler error: /root/openafs/src/rx/rx.c:3666:5: error: expected expression else ^ Which is more visible in the preprocessed source, as if (condition) expression;; else other_expression; is clearly invalid C. To fix the FreeBSD kernel module build, remove the unneeded semicolons. Change-Id: I191009ad412852dcc03cd71a0982fe41a953301d Reviewed-on: https://gerrit.openafs.org/12853 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit decb4308d4e18ad9f6f181e3df5f737698dba7ad Author: Benjamin Kaduk Date: Sat Dec 9 11:44:51 2017 -0600 libuafs: remove stale afs_nfsdisp.lo rule afs_nfsdisp.lo is not used, so we do not need a build rule for it.
Change-Id: I4ca53a4823b0ccd5bfd769867f6766bd05ea4ceb Reviewed-on: https://gerrit.openafs.org/12802 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit e443a9fb67dbc29e6cc36661a4ac6e91af113f23 Author: Benjamin Kaduk Date: Sat Dec 9 11:37:59 2017 -0600 Replace with Our in-tree xdr.h appears to have started life as a concatenation of rpc/types.h and rpc/xdr.h, and should include all the needed functionality. Indeed, commit 7293ddf325b149cae60d3abe7199d08f196bd2b9 even indicates that we expect to be using our in-tree XDR everywhere anyway, so the system XDR is superfluous. Note that afs/sysincludes.h (not afsincludes.h!) already includes rx/xdr.h ifndef AFS_LINUX22_ENV. This change should help systems running glibc 2.26 or newer, which has stopped providing the Sun RPC headers by default. While here remove some duplicate includes of rpc/types.h in the AIX-specific sources. The Solaris NFS translator bits cannot really be changed, since the system headers are used and have tight interdependencies. Update rxgen to not emit rpc/types.h inclusion. [mmeffie: squash 12801 to not emit rpc/types.h from rxgen] Change-Id: I0b195216affa06ab9e259cb0bab0c8286a1636d9 Reviewed-on: https://gerrit.openafs.org/12800 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit afbc199f152cc06edc877333f229604c28638d07 Author: Mark Vitale Date: Thu Nov 30 20:26:46 2017 -0500 LINUX: Avoid d_invalidate() during afs_ShakeLooseVCaches() With recent changes to d_invalidate's semantics (it returns void in Linux 3.11, and always returns success in RHEL 7.4), it has become increasingly clear that d_invalidate() is not the best function for use in our best-effort (nondisruptive) attempt to free up vcaches that is afs_ShakeLooseVCaches(). 
The new d_invalidate() semantics always force the invalidation of a directory dentry, which contradicts our desire to be nondisruptive, especially when that directory is being used as the current working directory for a process. Our call to d_invalidate(), intended to merely probe for whether a dentry can be discarded without affecting other consumers, instead would cause processes using that dentry as a CWD to receive ENOENT errors from getcwd(). A previous commit (c3bbf0b4444db88192eea4580ac9e9ca3de0d286) tried to address this issue by calling d_prune_aliases() instead of d_invalidate(), but d_prune_aliases() does not recursively descend into children of the given dentry while pruning, leaving it an incomplete solution for our use-case. To address these issues, modify the shakeloose routine TryEvictDentries() to call shrink_dcache_parent() and maybe __d_drop() for directories, and d_prune_aliases() for non-directories, instead of d_invalidate(). (Calls to d_prune_aliases() for directories have already been removed by reverting commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286.) Just like d_invalidate(), shrink_dcache_parent() has been around "forever" (since pre-git v2.6.12). Also like d_invalidate(), it "walks" the parent dentry's subdirectories and "shrinks" (unhashes) unused dentries. But unlike d_invalidate(), shrink_dcache_parent() will not unhash an in-use dentry, and has never changed its signature or semantics. d_prune_aliases() has also been available "forever", and has also never changed its signature or semantics. The lack of recursive descent is not an issue for non-directories, which cannot have such children. 
[kaduk@mit.edu: apply review feedback to fix locking and avoid extraneous changes, and reword commit message] Change-Id: Icb6138ee5785e0ef82a9b85b1d2651dfd0830043 Reviewed-on: https://gerrit.openafs.org/12830 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 5076dfc14b980aed310f3862875d5e9919fa199d Author: Mark Vitale Date: Thu Nov 30 17:56:13 2017 -0500 LINUX: consolidate duplicate code in osi_TryEvictDentries The two stanzas for HAVE_DCACHE_LOCK are now functionally identical; remove the preprocessor conditionals and duplicate code. A minor functional change is incurred for very old (before 2.6.38) Linux versions that have dcache_lock; we are now obtaining the d_lock as well. This is safe because d_lock is also quite old (pre-git, 2.6.12), and it is a spinlock that's only held for checking d_unhashed. Therefore, it should have negligible performance impact. It cannot cause deadlocks or violate locking order, because spinlocks can't be held across sleeps. Change-Id: I08faf204e6bd82c4401cdf6048d12cd551dd18fc Reviewed-on: https://gerrit.openafs.org/12792 Reviewed-by: Benjamin Kaduk Reviewed-by: Andrew Deason Tested-by: BuildBot commit 0678ad26b6069040a6ea86866fb59ef5968ea343 Author: Mark Vitale Date: Thu Nov 30 16:51:32 2017 -0500 LINUX: consolidate duplicate code in canonical_dentry The two stanzas for HAVE_DCACHE_LOCK are now identical; remove the preprocessor conditionals and duplicate code. No functional change should be incurred by this commit. Change-Id: I15cd4631d1932dcfb920313acb82fcbe570087e8 Reviewed-on: https://gerrit.openafs.org/12791 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 652cd597d9b3cf1a9daccbbf6bf35f1b0cd55a94 Author: Mark Vitale Date: Thu Nov 30 16:46:16 2017 -0500 LINUX: add afs_d_alias_lock & _unlock compat wrappers Simplify some #ifdefs for HAVE_DCACHE_LOCK by pushing them down into new helpers in osi_compat.h. No functional change should be incurred by this commit.
Change-Id: Ia0dc560bc84c8db4b84ddcc77a17bab5fbf93af9 Reviewed-on: https://gerrit.openafs.org/12790 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 74f4bfc627c836c12bb7c188b86d570d2afdcae8 Author: Mark Vitale Date: Thu Nov 30 16:08:38 2017 -0500 LINUX: create afs_linux_dget() compat wrapper For dentry operations that cover multiple dentry aliases of a single inode, create a compatibility wrapper to hide differences between the older dget_locked() and the current dget(). No functional change should be incurred by this commit. Change-Id: I2bb0d453417f37707018f6ba5859903c3d34c8ff Reviewed-on: https://gerrit.openafs.org/12789 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 367693bd7da2de593e3329f6acc4a4d07621fb97 Author: Mark Vitale Date: Thu Nov 30 13:45:27 2017 -0500 Revert "LINUX: do not use d_invalidate to evict dentries" Linux recently changed the semantics of d_invalidate() to: - return void - invalidate even a current working directory OpenAFS commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286 switched libafs to use d_prune_aliases() instead. However, since that commit, several things have happened: - RHEL 7.4 changed the semantics of d_invalidate() such that it invalidates the cwd, but did NOT change the return type to void. This broke our autoconf test for detecting the new semantics. - Further research reveals that d_prune_aliases() was not the best choice for replacing d_invalidate(). This is because for directories, d_prune_aliases() doesn't invalidate dentries when they are referenced by its children, and it doesn't walk the tree trying to invalidate child dentries. So it can leave dentries dangling, if the only references to those dentries are via children. In preparation for future commits, revert c3bbf0b4444db88192eea4580ac9e9ca3de0d286 .
Change-Id: Iafbef23a6070180c0e21eb01a2d59385ef52f55c Reviewed-on: https://gerrit.openafs.org/12788 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f8247078bd33a825d8734b2c8f05120d15ab3ffd Author: Mark Vitale Date: Thu Nov 30 14:04:48 2017 -0500 Revert "LINUX: eliminate unused variable warning" This reverts commit 19599b5ef5f7dff2741e13974692fe4a84721b59 to allow also reverting commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286 . Change-Id: I2780fe68d352f0f1def198f21127ec944d1d2c1d Reviewed-on: https://gerrit.openafs.org/12787 Reviewed-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit fb1f14d8ee963678a9caad0538256c99c159c2c4 Author: Stephan Wiesand Date: Fri Dec 22 14:40:32 2017 +0100 Linux 4.15: check for 2nd argument to pagevec_init Linux 4.15 removes the distinction between "hot" and "cold" cache pages, and pagevec_init() no longer takes a "cold" flag as the second argument. Add a configure test and use it in osi_vnodeops.c . Change-Id: Ia5287b409b2a811d2250c274579e6f15fd18fdbb Reviewed-on: https://gerrit.openafs.org/12824 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Tested-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit be5f5b2aff2d59986dd8e7dd7dd531be24c27cb2 Author: Stephan Wiesand Date: Fri Dec 22 14:17:09 2017 +0100 Linux: use plain page_cache_alloc Linux 4.15 removes the distinction between "hot" and "cold" cache pages, and no longer provides page_cache_alloc_cold(). Simply use page_cache_alloc() instead, rather than adding yet another test. 
Change-Id: I34e734223927030f7ff252acb61120366a808ad6 Reviewed-on: https://gerrit.openafs.org/12823 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Tested-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 443dd5367e0cd9050ad39a6594c5be521271b4e9 Author: Pat Riehecky Date: Thu Mar 12 14:33:10 2015 -0500 redhat: separate debuginfo package for kmod rpm Place the debuginfo for the kmod into its own rpm so that it doesn't have to track against the userspace packages. FIXES 132034 Change-Id: I60a753275d896a89c1f6896c653d78a4e1fe7e2c Reviewed-on: https://gerrit.openafs.org/11867 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit fd4eaebb60dbefc27be98015fee23a3cf5d9752d Author: Christof Hanke Date: Mon Dec 18 16:58:39 2017 +0100 Avoid gcc warning When using the configure option --enable-checking with gcc 7.2.1, the compilation fails with vutil.c:860:20: error: ‘%s’ directive writing up to 255 bytes into \ a region of size 63 [-Werror=format-overflow=] This can be seen in the logs of the openSUSE Tumbleweed builder for e.g. build 2368. Avoid this warning by using snprintf which is provided by libroken for all platforms. Change-Id: I6acd3a1c06760abc8144c0892812c3bb50477227 Reviewed-on: https://gerrit.openafs.org/12813 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6e57b22642bafb177e0931b8fb24042707d6d62f Author: Marcio Barbosa Date: Thu Oct 12 12:42:40 2017 -0300 macos: make the OpenAFS client aware of APFS Apple has introduced a new file system called APFS. Starting from High Sierra, APFS replaces Mac OS Extended (HFS+) as the default file system for solid-state drives and other flash storage devices. The current OpenAFS client is not aware of APFS. As a result, the installation of the current client into an APFS volume will panic the machine. To fix this problem, make the OpenAFS client aware of APFS. 
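The format-overflow error quoted in the "Avoid gcc warning" commit above (a "%s" of up to 255 bytes written into a 63-byte region) can be illustrated with a small sketch. This is not the actual vutil.c code: the function name format_name(), the ".vol" suffix and the buffer sizes are assumptions chosen only to show how snprintf() bounds the write and lets the caller detect truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<name>.vol" into a fixed-size buffer.
 * snprintf() never writes more than dstlen bytes (including the NUL) and
 * returns the length the full string would have had, so truncation can be
 * detected by comparing the return value with dstlen.
 * Returns 0 on success, -1 if the result did not fit.
 */
int
format_name(char *dst, size_t dstlen, const char *name)
{
    int n;

    assert(dst != NULL && dstlen > 0);
    n = snprintf(dst, dstlen, "%s.vol", name);
    if (n < 0 || (size_t)n >= dstlen)
        return -1;      /* output was truncated (or an encoding error) */
    return 0;
}
```

With a plain sprintf() the second case would overrun the destination; here the buffer stays NUL-terminated and the caller gets an error to act on.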
Change-Id: Ib5ac88b87f348744864f4e33f1f222efbc852d41 Reviewed-on: https://gerrit.openafs.org/12743 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit e533d0737058940d59d93467c9b4d6d3ec2834e6 Author: Marcio Barbosa Date: Fri Oct 6 10:01:12 2017 -0300 macos: packaging support for MacOS X 10.13 This commit introduces the new set of changes / files required to successfully create the dmg installer on OS X 10.13 "High Sierra". Change-Id: Id9da3cf959627a13d8cfd1d1d7412820e46ad63e Reviewed-on: https://gerrit.openafs.org/12742 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 804c9cbf501d4ca91b69ad8fd6d64e49efa25a47 Author: Marcio Barbosa Date: Tue Oct 3 17:01:56 2017 -0300 macos: add support for MacOS 10.13 This commit introduces the new set of changes / files required to successfully build the OpenAFS source code on OS X 10.13 "High Sierra". Change-Id: I51928279d97c9d86c67db7de5eb7fc9d317fd381 Reviewed-on: https://gerrit.openafs.org/12741 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit edc5463f3db4b6af2307741d9f4ee8f2c81cd98e Author: Benjamin Kaduk Date: Thu Dec 14 19:54:57 2017 -0600 Fix macro used to check kernel_read() argument order The m4 macro implementing the configure check is called LINUX_KERNEL_READ_OFFSET_IS_LAST, but it defines a preprocessor symbol that is just KERNEL_READ_OFFSET_IS_LAST. Our code needs to check for the latter being defined, not the former. Reported by Aaron Ucko. 
Change-Id: Id7cd3245b6a8eb05f83c03faee9c15bab8d0f6e8 Reviewed-on: https://gerrit.openafs.org/12808 Reviewed-by: Anders Kaseorg Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 894555f93a2571146cb9ca07140eb98c7a424b01 Author: Benjamin Kaduk Date: Mon Dec 4 17:20:57 2017 -0600 OPENAFS-SA-2017-001: rx: Sanity-check received MTU and twind values Rather than blindly trusting the values received in the (unauthenticated) ack packet trailer, apply some minimal sanity checks to received values. natMTU and regular MTU values are subject to Rx minimum/maximum packet sizes, and the transmit window cannot drop below one without risk of deadlock. The maxDgramPackets value that can also be present in the trailer already has sufficient sanity checking. Extremely low MTU values (less than 28 == RX_HEADER_SIZE) can cause us to set a negative "maximum usable data" size that gets used as an (unsigned) packet length for subsequent allocation and computation, triggering an assertion when the connection is used to transmit data. FIXES 134450 Change-Id: I37698ff166da47a57aa0d1962ae8effc74e30851 commit 4fa0ee620cfb9991ca9748b5ee116cc8e1e6c505 Author: Benjamin Kaduk Date: Mon Nov 27 22:17:28 2017 -0600 afs: Fix bounds check in PNewCell Reported by the opensuse buildbot: CC [M] /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/libafs/MODLOAD-4.13.12-1-default-MP/rx_packet.o /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c: In function ‘PNewCell’: /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c:3075:55: error: ‘*’ in boolean context, suggest ‘&&’ instead [-Werror=int-in-bool-context] if ((afs_pd_remaining(ain) < AFS_MAXCELLHOSTS +3) * sizeof(afs_int32)) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ The bug was introduced in commit 718f85a8b6.
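The compiler diagnostic above points at a misplaced parenthesis: the comparison result (0 or 1) is multiplied by sizeof(afs_int32), rather than the host count being scaled to bytes before the comparison. A minimal sketch of the two readings follows. MAXCELLHOSTS is a stand-in for AFS_MAXCELLHOSTS with an assumed value of 8, int32_t stands in for afs_int32, and both function names are hypothetical; the corrected expression is the presumed intent, not a quote of the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXCELLHOSTS 8  /* stand-in for AFS_MAXCELLHOSTS; value assumed */

/*
 * As written in the flagged line: the multiplication applies to the
 * boolean, so the test is true only when remaining < MAXCELLHOSTS + 3,
 * i.e. it compares a byte count against an element count.
 */
int
too_short_buggy(size_t remaining)
{
    return ((remaining < MAXCELLHOSTS + 3) * sizeof(int32_t)) != 0;
}

/* Presumed intent: compare the remaining bytes against the size in bytes. */
int
too_short_fixed(size_t remaining)
{
    return remaining < (MAXCELLHOSTS + 3) * sizeof(int32_t);
}
```

For example, with 20 bytes remaining the buggy form accepts the input (20 is not less than 11), while the byte-based check rejects it (20 is less than 44).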
Change-Id: Iae55a99e35266aa763fb431f2acc4eba09fa5357 Reviewed-on: https://gerrit.openafs.org/12782 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 66b74e78ba5fea6a8236dcd3b8b46e1dfa6a0ac7 Author: Benjamin Kaduk Date: Mon Nov 27 22:07:53 2017 -0600 rx: fix call refcount leak in error case The recent event handling normalization in commit 304d758983b499dc568d6ca57b6e92df24b69de8 had event handlers switch to dropping their reference on the associated connection/call just before return. An early return case was missed in the conversion, leading to a refcount leak in an error case. Change-Id: Ie3d0bc9474fdbc09be9c753f4d0192c8cca68351 Reviewed-on: https://gerrit.openafs.org/12781 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3ce55426ee6912b78460465bcaa1428333ad1fbc Author: Marcio Barbosa Date: Thu Nov 16 17:24:03 2017 -0500 afs: fix kernel_write / kernel_read arguments The order / content of the arguments passed to kernel_write and kernel_read are not right. As a result, the kernel will panic if one of the functions in question is called. [kaduk@mit.edu: include configure check for multiple kernel_read() variants, per linux commits bdd1d2d3d251c65b74ac4493e08db18971c09240 and e13ec939e96b13e664bb6cee361cc976a0ee621a] FIXES 134440 Change-Id: I4753dee61f1b986bbe6a12b5568d1a8db30c65f8 Reviewed-on: https://gerrit.openafs.org/12769 Tested-by: BuildBot Tested-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 50a3eb7b7ee94bffaadc98429bd404164e89ec7f Author: Michael Meffie Date: Mon Nov 6 17:37:46 2017 -0500 tests: fix out of bounds access in the rx-event test Use the NUMEVENTS symbol which defines the array size instead of an incorrect hard coded number when checking if a second event can be added to be fired at the same time. This fixes a potential out of bounds access of the event test array. Also update the comment which incorrectly mentions the incorrect number of events in the test. 
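The out-of-bounds fix described above amounts to bounding the "is there room for a second event" check by the symbol that sizes the array, not by a separate literal. A rough sketch, in which the struct layout, the NUMEVENTS value and the function schedule_pair() are all assumptions made for illustration and are not taken from the rx-event test itself:

```c
#include <assert.h>
#include <stddef.h>

#define NUMEVENTS 10000         /* array size; value assumed */

struct testEvent {
    int scheduled;
};

static struct testEvent events[NUMEVENTS];

/*
 * Mark event i as scheduled and, when a neighbouring slot exists, schedule
 * a second event alongside it. The neighbour check uses NUMEVENTS, so it
 * cannot drift out of sync with the array declaration.
 * Returns the number of slots touched.
 */
int
schedule_pair(size_t i)
{
    if (i >= NUMEVENTS)
        return 0;
    events[i].scheduled = 1;
    if (i + 1 < NUMEVENTS) {
        events[i + 1].scheduled = 1;
        return 2;
    }
    return 1;
}
```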
Change-Id: I4f993b42e53e7e6a42fa31302fd1baa70e9f5041 Reviewed-on: https://gerrit.openafs.org/12762 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2ae84bf053fe66b73a2c77b5d71305bae2c17587 Author: Benjamin Kaduk Date: Thu Nov 16 04:49:49 2017 -0600 Sprinkle rx_GetConnection() for concision Instead of inlining the body (taking the lock, incrementing the refcount, and dropping the lock), use the convenience function designed for this purpose. Change-Id: I674d389e61e42710ef340e202992748e66c5e763 Reviewed-on: https://gerrit.openafs.org/12772 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 01bcfd3e14f6ee1faa4b8ce5a7932de37d585fd3 Author: Benjamin Kaduk Date: Thu Nov 16 04:48:02 2017 -0600 rx: fix mutex leak in error case Reported by Mark Vitale Change-Id: I3269fbb0f87285bcb9af64f4ad81791177582e6d Reviewed-on: https://gerrit.openafs.org/12771 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a7a3108e602c83176c5578c9f28b6312f71aba78 Author: Benjamin Kaduk Date: Tue Oct 31 19:49:09 2017 -0500 Add event-related mutex assertions In utility functions that access fields of type struct rxevent *, assert that the appropriate lock is held for the access in question. These assertions are only compiled in when built with -DOPR_DEBUG_LOCKS, which can be enabled by --debug-locks at configure time. Change-Id: I16885a4d37a0f094f0d365c54e8157ed92070c69 Reviewed-on: https://gerrit.openafs.org/12757 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 304d758983b499dc568d6ca57b6e92df24b69de8 Author: Benjamin Kaduk Date: Sat Oct 7 22:42:38 2017 -0500 Standardize rx_event usage Go over all consumers of the rx event framework and normalize its usage according to the following principles: rxevent_Post() is used to create an event, and it returns an event handle (with a reference on the event structure) that can be used to cancel the event before its timeout fires.
(There is also an additional reference on the event held by the global event tree.) In all(*) usage within the tree, that event handle is stored within either an rx_connection or an rx_call. Reads/writes to the member variable that holds the event handle require either the conn_data_lock or call lock, respectively -- that means that in most cases, callers of rxevent_Post() and rxevent_Cancel() will be holding one of those aforementioned locks. The event handlers themselves will need to modify the call/connection object according to the nature of the event, which requires holding those same locks, and also a guarantee that the call/connection is still a live object and has not been deallocated! Whether or not rxevent_Cancel() succeeds in cancelling the event before it fires, whenever passed a non-NULL event structure it will NULL out the supplied pointer and drop a reference on the event structure. This is the correct behavior, since the caller has asked to cancel the event and has no further use for the event handle or its reference on the event structure. The caller of rxevent_Cancel() must check its return value to know whether or not the event was cancelled before its handler was able to run. The interaction window between the call/connection lock and the lock protecting the red/black tree of pending events opens up a somewhat problematic race window. Because the application thread is expected to hold the call/connection lock around rxevent_Cancel() (to protect the write to the field in the call/connection structure that holds an event handle), and rxevent_Cancel() must take the lock protecting the red/black tree of events, this establishes a lock order with the call/connection lock taken before the eventTree lock. This is in conflict with the event handler thread, which must take the eventTree lock first, in order to select an event to run (and thus know what additional lock would need to be taken, by virtue of what handler function is to be run). 
The conflict is easy to resolve in the standard way, by having a local pointer to the event that is obtained while the event is removed from the red/black tree under the eventTree lock, and then the eventTree lock can be dropped and the event run based on the local variable referring to it. The race window occurs when the caller of rxevent_Cancel() holds the call/connection lock, and rxevent_Cancel() obtains the eventTree lock just after the event handler thread drops it in order to run the event. The event handler function begins to execute, and immediately blocks trying to obtain the call/connection lock. Now that rxevent_Cancel() has the eventTree lock it can proceed to search the tree, fail to find the indicated event in the tree, clear out the event pointer from the call/connection data structure, drop its caller's reference to the event structure, and return failure (the event was not cancelled). Only then does the caller of rxevent_Cancel() drop the call/connection lock and allow the event handler to make progress. This race is not necessarily problematic if appropriate care is taken, but in the previous code such was not the case. In particular, it is a common idiom for the firing event to call rxevent_Put() on itself, to release the handle stored in the call/connection that could have been used to cancel the event before it fired. Failing to do so would result in a memory leak of event structures; however, rxevent_Put() does not check for a NULL argument, so a segfault (NULL dereference) was observed in the test suite when the race occurred and the event handler tried to rxevent_Put() the reference that had already been released by the unsuccessful rxevent_Cancel() call. Upon inspection, many (but not all) of the uses in rx.c were susceptible to a similar race condition and crash. 
The test suite also papers over a related issue in that the event handler in the test suite always knows that the data structure containing the event handle will remain live, since it is a global array that is allocated for the entire scope of the test. In rx.c, events are associated with calls and connections that have a finite lifetime, so we need to take care to ensure that the call/connection pointer stored in the event remains valid for the duration of the event's lifecycle. In particular, even an attempt to take the call/connection lock to check whether the corresponding event field is NULL is fraught with risk, as it could crash if the lock (and containing call/connection) has already been destroyed! There are several potential ways to ensure the liveness of the associated call/connection while the event handler runs, most notably to take care in the call/connection destruction path to ensure that all associated events are either successfully cancelled or run to completion before tearing down the call/connection structure, and to give the pending event its own reference on the associated call/connection. Here, we opt for the latter, acknowledging that this may result in the event handler thread doing the full call/connection teardown and delay the firing of subsequent events. This is deemed acceptable, as pending events are for intentionally delayed tasks, and some extra delay is probably acceptable. (The various keepalive events and the challenge event could delay the user experience and/or security properties if significantly delayed, but I do not believe that this change admits completely unbounded delay in the event handler thread, so the practical risk seems minimal.) Accordingly, this commit attempts to ensure that: * Each event holds a formal reference on its associated call/connection. * The appropriate lock is held for all accesses to event pointers in call/connection structures. 
* Each event handler (after taking the appropriate lock) checks whether it raced with rxevent_Cancel() and only drops the call/connection's reference to the event if the race did not occur. * Each event handler drops its reference to the associated call/connection *after* doing any actions that might access/modify the call/connection. * The per-event reference on the associated call/connection is dropped by the thread that removes the event from the red/black tree. That is, the event handler function if the event runs, or by the caller of rxevent_Cancel() when the cancellation succeeds. * No non-NULL event handles remain in a call/connection being destroyed, which would indicate a refcounting error. (*) There is an additional event used in practice, to reap old connections, but it is effectively a background task that reschedules itself periodically, with no handle to the event retained so as to be able to cancel it. As such, it is unaffected by the concerns raised here. While here, standardize on the rx_GetConnection() function for incrementing the reference count on a connection object, instead of inlining the corresponding mutex lock/unlock and variable access. Also enable refcount checking unconditionally on unix, as this is a rather invasive change late in the 1.8.0 release process and we want to get as much sanity checking coverage as possible.
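The reference-counting rules listed above can be walked through with a single-threaded model. This is only a sketch under stated assumptions: the types and the functions ev_put(), ev_cancel(), ev_dequeue() and ev_run() are hypothetical stand-ins (not the rx API), locks are reduced to comments, and the dispatcher is assumed to keep the tree's reference on the event until the handler returns.

```c
#include <assert.h>
#include <stddef.h>

struct event { int refcnt; int in_tree; };
struct call  { int refcnt; struct event *pending; };

/* Drop the reference held through a handle and clear the handle. */
void
ev_put(struct event **handle)
{
    assert(*handle != NULL);    /* a NULL handle here is the crash described above */
    (*handle)->refcnt--;
    *handle = NULL;
}

/*
 * Cancel: always clears the handle and drops its reference. Returns 1 only
 * if the event was still queued, in which case its handler will never run
 * and the caller must drop the event's reference on the call.
 */
int
ev_cancel(struct event **handle)
{
    struct event *ev = *handle;
    int cancelled = 0;

    if (ev == NULL)
        return 0;
    if (ev->in_tree) {          /* tree lock would be held for this block */
        ev->in_tree = 0;
        ev->refcnt--;           /* the tree's reference */
        cancelled = 1;
    }
    ev_put(handle);
    return cancelled;
}

/* Dispatcher, step 1: remove the event from the tree under the tree lock. */
void
ev_dequeue(struct event *ev)
{
    ev->in_tree = 0;
}

/* Dispatcher, step 2: run the handler with the tree lock already dropped. */
void
ev_run(struct event *ev, struct call *call)
{
    /* call lock taken here */
    if (call->pending == ev)
        ev_put(&call->pending); /* no race: release the call's handle */
    /* else a racing cancel already cleared the handle; do not put again */

    /* ... act on the call ... */

    /* call lock released here */
    call->refcnt--;             /* the event's reference on the call, dropped last */
    ev->refcnt--;               /* the tree's reference, held while running */
}
```

In the racing order (ev_dequeue(), then ev_cancel(), then ev_run()), the cancel reports failure and clears the handle, and the handler sees a mismatched handle and skips the put, so each reference is dropped exactly once.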
Change-Id: I27bcb932ec200ff20364fb1b83ea811221f9871c Reviewed-on: https://gerrit.openafs.org/12756 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit bdb509fb1d8e0fdca05dffecdbcbf60a95ea502e Author: Benjamin Kaduk Date: Wed Oct 4 23:03:44 2017 -0500 Adjust rx-event test to exercise cancel/fire race We currently do not properly handle the case where a thread runs rxevent_Cancel() in parallel with the event-handler thread attempting to fire that event, but the test suite only picked up on this issue in a handful of the Debian automated builds (somewhat less-resourced ones, perhaps). Modify the event scheduling algorithm in the test so as to create a larger chunk of events scheduled to fire "right away" and thereby exercise the race condition more often when we proceed to cancel a quarter of events "right away". Change-Id: I50f55fd532901147cfda1a5f40ef949bf3270401 Reviewed-on: https://gerrit.openafs.org/12755 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 311f1d28a2f626350b33ad432e674055b62511bd Author: Michael Laß Date: Thu Nov 2 21:16:49 2017 +0100 gtx: link against libtinfo if termlib is seperated If ncurses is built with "./configure --with-termlib=tinfo", gtx fails to link because of an undefined reference to the LINES symbol which is then provided by libtinfo.so and not libncurses.so. If ncurses is present, additionally check whether LINES is provided by ncurses or tinfo and set $LIB_curses accordingly. This change is based on a patch provided by Bastian Beischer. 
FIXES 134420 Change-Id: I3e29c61405d90d0b850bafe4c51125bef433452b Reviewed-on: https://gerrit.openafs.org/12760 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e0c5ada214596d5adb6798682d5e280cc99f447c Author: Benjamin Kaduk Date: Mon Oct 16 16:53:22 2017 -0500 Correct m4 conditionals in curses.m4 AS_IF does not invoke the test(1) shell builtin for us, so we must take care to consistently use it ourself. While here, sprinkle some missing double-quotes around variable expansions in AS_IF statements in this file. Submitted by Bastian Beischer. FIXES 134414 Change-Id: Iccfe311011f17de6317cf64abdc58b0812b81b8c Reviewed-on: https://gerrit.openafs.org/12738 Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 5ee516b3789d3545f3d78fb3aba2480308359945 Author: Damien Diederen Date: Mon Sep 18 12:18:39 2017 +0200 Linux: Use kernel_read/kernel_write when __vfs variants are unavailable We hide the uses of set_fs/get_fs behind a macro, as those functions are likely to soon become unavailable: > Christoph Hellwig suggested removing all calls outside of the core > filesystem and architecture code; Andy Lutomirski went one step > further and said they should all go. https://lwn.net/Articles/722267/ Change-Id: Ib668f8fdb62ca01fe14321c07bd14d218744d909 Reviewed-on: https://gerrit.openafs.org/12729 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit a71288a387095ccb4be83c1abae34ada80f53185 Author: Michael Meffie Date: Fri Jul 21 22:30:43 2017 -0400 redhat: avoid rpmbuild exclude directives Older versions of rpmbuild do not support the files exclude directive, so fall back to the old way in which we remove the files to be excluded and list the files to be included. 
Change-Id: If64df382ef372aa1078f1703a34942a1930bdc88 Reviewed-on: https://gerrit.openafs.org/12733 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4d247e1ae446c512031511273d556ef1fd32dca1 Author: Michael Meffie Date: Fri Jul 21 22:16:44 2017 -0400 redhat: move .krb variants to the kauth-client subpackage Move the deprecated klog.krb, pagsh.krb, and tokens.krb programs and man pages to the optional openafs-kauth-client subpackage. Change-Id: I09a2e36b60f9d47726a6a314a26db88e44575567 Reviewed-on: https://gerrit.openafs.org/12732 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 671db4ca5a76625d9b7133510cc1cbdda8a5d9b9 Author: Michael Meffie Date: Thu Jul 20 04:13:04 2017 -0400 redhat: specify man pages without wildcards Currently, some of the man pages are specified with the full name and some are specified with a wildcard for the filename extension. Instead, specify all the man pages without wildcards to be more consistent and to avoid putting incorrect man pages in packages. This change removes a stray copy of the klog.krb5.1 man page from openafs-kauth-client subpackage and moves the AuthLog/AuthLog.dir man pages to the optional openafs-kauth-server subpackage. Change-Id: Id30a6174c532a9a00f850d6ca2722158293d5118 Reviewed-on: https://gerrit.openafs.org/12731 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a9810b829bdccfed4d1718b11cf4dd51f9565e00 Author: Michael Meffie Date: Fri Jul 21 18:05:48 2017 -0400 redhat: remove afsd.fuse man page The afsd.fuse binary is not currently packaged; do not package the man page. Change-Id: Ia0dd4fa72dc8a87e2c835798b6fbe1213d71da5f Reviewed-on: https://gerrit.openafs.org/12730 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 68ec78950a6e39dc1bf15012d4b889728086d0b7 Author: Marcio Barbosa Date: Mon Aug 21 14:21:54 2017 -0400 ubik: avoid DISK_Begin on sites that didn't vote for sync As already described on 7c708506, SDISK_Begin fails on remotes if lastYesState is not set.
To fix this problem, 7c708506 does not allow write transactions until we know that lastYesState is set on at least quorum (ubik_syncSiteAdvertised == 1). In other words, if enough sites received a beacon packet informing that a sync-site was elected, write transactions will be allowed. This means that ubik_syncSiteAdvertised can be true while lastYesState is not set in a few sites. Consider the following scenario in a cell with frequent write transactions: Site A => Sync-site (up) Site B => Remote 1 (up) Site C => Remote 2 (down - unreachable) Since A and B are up, we have quorum. After the second wave of beacons, ubik_syncSiteAdvertised will be true and write transactions will be allowed. At some point, C becomes reachable again. Site A sends a copy of its database to C, but C did not vote for A yet (lastYesState == 0). A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. Since C is reachable, A will mark this remote site as up. The sync-site will send its database to C, but C did not vote for A yet. A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. In a cell with frequent write transactions, this cycle will repeat forever. As a result, the sync-site will be constantly sending its database to C and quorum will be operating with fewer sites, increasing the chances of re-elections. To fix this problem, do not call DISK_Begin on remotes that did not vote for the sync-site yet.
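The fix described above can be pictured as a filter in the loop that starts a write transaction on the remotes. The sketch below is a simplified model rather than the ubik code: struct remote, its field names (in particular voted_yes, standing for "this remote is known to have voted for the sync-site") and begin_on_remotes() are hypothetical, and how the sync-site actually learns that a remote has voted is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a remote site as seen by the sync-site. */
struct remote {
    int up;             /* remote is reachable */
    int voted_yes;      /* remote has voted for this sync-site (assumed field) */
    int begun;          /* set once the begin-transaction call was issued */
};

/*
 * Issue the begin-transaction call only to remotes that are up and have
 * already voted for the sync-site. A remote that has not voted yet would
 * reject the call and be marked down, restarting the cycle described above.
 * Returns the number of remotes contacted.
 */
int
begin_on_remotes(struct remote *sites, size_t n)
{
    size_t i;
    int contacted = 0;

    assert(sites != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!sites[i].up || !sites[i].voted_yes)
            continue;
        sites[i].begun = 1;
        contacted++;
    }
    return contacted;
}
```

In the A/B/C scenario above, C would be up but not yet voted, so it is skipped for this transaction and is not marked down again.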
Change-Id: I27f5122a089064e7b83beba3533261d8a4e31c64 Reviewed-on: https://gerrit.openafs.org/12715 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 929e77a886fc9853ee292ba1aa52a920c454e94b Author: Damien Diederen Date: Mon Sep 18 11:59:40 2017 +0200 Linux: Test for __vfs_write rather than __vfs_read The following commit: commit eb031849d52e61d24ba54e9d27553189ff328174 Author: Christoph Hellwig Date: Fri Sep 1 17:39:23 2017 +0200 fs: unexport __vfs_read/__vfs_write unexports both __vfs_read and __vfs_write, but keeps the former in fs.h--as it is still being used by another part of the tree. This situation results in a false positive in our Autoconf check, which does not see the export statements, and ends up marking the corresponding API as available. That, in turn, causes some code which assumes symmetry with __vfs_write to fail to compile. Switch to testing for __vfs_write, which correctly marks the API as unavailable. Change-Id: I392f2b17b4de7bd81d549c84e6f7b5ef05e1b999 Reviewed-on: https://gerrit.openafs.org/12728 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0a9a6b57ce6e1c97fcc651c8cb74e66fc8422a1e Author: Anders Kaseorg Date: Fri Sep 1 23:37:07 2017 -0400 vol: Fix two buffers being one char too short Fixes these warnings: namei_ops.c: In function 'namei_copy_on_write': namei_ops.c:1328:31: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=] snprintf(path, sizeof(path), "%s-tmp", name.n_path); ^~~~~~~~ namei_ops.c:1328:2: note: 'snprintf' output between 5 and 260 bytes into a destination of size 259 snprintf(path, sizeof(path), "%s-tmp", name.n_path); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vol_split.c: In function 'split_volume': vol_split.c:576:22: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=] sprintf(symlink, "#%s", V_name(newvol)); ^~~~~ vol_split.c:576:5: note: 'sprintf' output between 2 and 33 bytes
into a destination of size 32 sprintf(symlink, "#%s", V_name(newvol)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Change-Id: If212ebc29fa3fe10fe1e2f70dfb5f7509c269ae9 Reviewed-on: https://gerrit.openafs.org/12722 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 962f4838dc461567d896304f617a0923745d13d5 Author: Seth Forshee Date: Tue Aug 22 07:59:11 2017 -0500 Linux: Include linux/uaccess.h rather than asm/uaccess.h if present Starting with Linux 4.12 there is a module build error on s390 due to asm/uaccess.h using a macro defined in the common header. The common header has been around since 2.6.18 and has always included asm/uaccess.h, so switch to using the common header whenever it is present. Change-Id: Iaab0d7652483a2a2b1f144f3e90b6d3b902c146d Signed-off-by: Seth Forshee Reviewed-on: https://gerrit.openafs.org/12714 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit e739eaa650ee30dcce54d05908b062839eafbf73 Author: Michael Meffie Date: Fri Apr 14 20:38:27 2017 -0400 redhat: move bosserver and fssync-debug man pages Move the bosserver and fssync-debug/dafssync-debug man pages to the openafs-server package, which distributes those programs. Change-Id: I9c84ad485834177fd43b28acd444d3d54c648cc8 Reviewed-on: https://gerrit.openafs.org/12601 Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 5b79f95f7457f203213a9170389b17ffcc0208f7 Author: Michael Meffie Date: Thu Apr 13 21:48:06 2017 -0400 redhat: kauth client and server sub-packages Move the kaserver and kauth client programs to conditionally built packages called openafs-kauth-server and openafs-kauth-client. Packagers can build these by specifying '--with kauth'. They are not built by default to discourage use. This commit subsumes the openafs-kpasswd package into the openafs-kauth-client package. 
Change-Id: I1322f05d7fe11d466c9ed71a5059c21b759d95ab Reviewed-on: https://gerrit.openafs.org/12600 Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 54e478328fa24aa2629398c5ddfad7b50d353dd7 Author: Michael Meffie Date: Mon Apr 10 15:06:02 2017 -0400 redhat: do not package kauth by default Do not package kaserver and related programs by default to discourage use. Add the '--with kauth' rpmbuild option to allow packagers to continue to include the kauth programs for compatibility. Change-Id: I8bf9f6dc221afc22ed6c9a33cf101d705e6c4920 Reviewed-on: https://gerrit.openafs.org/12597 Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 6d59b7c4b4b712160a6d60491c95c111bb831fbb Author: Benjamin Kaduk Date: Sun Jul 30 20:57:05 2017 -0500 Default to crypt mode for unix clients Though the protection offered by rxkad, even with rxkad-k5 and rxkad-kdf, is insufficient to protect traffic from a determined attacker, it remains the case that the internet is not a safe place for user data to travel in the clear, and has not been for a long time. The Windows client encrypts by default, and all or nearly all the Unix client packaging scripts set crypt mode by default. Catch up to reality and default to crypt mode in the Unix cache manager. Change-Id: If0061ddca3bedf0df1ade8cb61ccb710ec1181d4 Reviewed-on: https://gerrit.openafs.org/12668 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f7ccf0aa00459cda4579a3838b5bd59ba69c03ea Author: Marcio Barbosa Date: Mon Jul 31 15:27:10 2017 -0400 ubik: remove useless signal call The current version does not have a corresponding LWP_WaitProcess call for the beacon_globals.ubik_amSyncSite global. As a result, the LWP_NoYieldSignal(&beacon_globals.ubik_amSyncSite) signal call can be safely removed.
Change-Id: I72c4ccfe8e68551673dc728dd699ba8c561d76d1 Reviewed-on: https://gerrit.openafs.org/12673 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b4c3baa2e24890face6433fcb160e85b7409df4c Author: Michael Meffie Date: Wed Aug 2 15:25:45 2017 -0400 doc: add a document to describe rx debug packets This document gives a basic description of Rx debug packets, the protocol to exchange debug packets, and the version history. Change-Id: Ic040d336c1e463f7da145f1a292c20c5d5f215df Reviewed-on: https://gerrit.openafs.org/12677 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b8e8145fa97e3edb6e4157f0d60d3d5e8db597fe Author: Michael Meffie Date: Tue Aug 1 20:36:18 2017 -0400 doc: add kolya's rx-spec to doc/txt Add rx protocol spec and rx debug spec written by Nickolai Zeldovich. Rx protocol specification draft (2002) Nickolai Zeldovich, kolya@MIT.EDU Change-Id: I65a9a83a8889503f3a82c8fde7a87f84d2736c8d Reviewed-on: https://gerrit.openafs.org/12676 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c6f5ebc4cf95b0f1d3acc7a0a8678ba0d4378243 Author: Michael Meffie Date: Tue Aug 1 20:10:32 2017 -0400 doc: relocate notes from arch to txt The doc/txt directory has become the de facto home for text-based technical notes. Relocate the contents of the doc/arch directory to doc/txt. Relocate doc/examples to doc/txt/examples. Update the doc/README file to be more current and remove old work in progress comments. Change-Id: Iaa53e77eb1f7019d22af8380fa147305ac79d055 Reviewed-on: https://gerrit.openafs.org/12675 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 57d32c29146167ff54d3221ed761a5973776ae93 Author: Benjamin Kaduk Date: Tue Aug 1 20:50:37 2017 -0500 Add NEWS entry for recent ubik changes Of the ubik-fix-write-after-recovery topic, this seems like the most noteworthy portion, with the other bits wrapped up in the preface.
Change-Id: Icc1afb9f851ef2d7ade49c2382cc023997f1bf26 Reviewed-on: https://gerrit.openafs.org/12679 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit da704137f4bf766250ca87dbdc5a85c2024cb0a6 Author: Marcio Barbosa Date: Thu Jul 20 23:02:15 2017 -0400 ubik: update epoch as soon as sync-site is elected The ubik_epochTime represents the time at which the coordinator first received its coordinator mandate. However, this global is currently not updated at the moment when a new sync-site is elected. Instead, ubik_epochTime is only updated at the very end of the first write transaction, when a new database label is written (in udisk_commit). This causes at least 2 different issues: For one, this means that we change ubik_epochTime while a remote transaction is in progress. If VOTE_Beacon is called after ubik_epochTime is updated, but before the remote transaction ends, the remote sites will detect that the transaction id in ubik_currentTrans is wrong (via urecovery_CheckTid(), since the epoch doesn't match), and they will abort the transaction. This means the transaction will fail, and it may cause a loss of quorum until another election is completed. Another issue is that ubik_epochTime can be 0 at the beginning of a write transaction, if this is the first election that this site has won. Since ubik_epochTime is used to construct transaction ids, this means that we can have different transactions that originate from different sites at different times, but they have the same epoch in their tid. For example, say a write transaction starts with epoch 0, but the originating site is killed/interrupted before finishing. That write transaction will linger on remote sites in ubik_currentTrans with an epoch of 0 (since the originating site will never call DISK_ReleaseLocks, or DISK_Abort, etc). 
Normally the sync site will kill such a lingering transaction via urecovery_CheckTid, but since the epoch is 0, and the election winner's epoch is also 0, the transaction looks valid and may never be killed. If that transaction is holding a lock on the database, this means that the database will forever remain locked, effectively preventing any access to the db on that site. To fix both of these issues, update ubik_epochTime with the current time as soon as we win the election. This ensures that the epoch is not updated in the middle of a transaction, and it ensures that all transactions are created with a unique epoch: the epoch of the election that we won. Note that with this commit, we do not ever set ubik_epochTime to the magic value of '2' during database init. The special '2' epoch only needs to be set in the database itself, and it is never an actual epoch that represents a real quorum that went through the election process. The database will be labelled with a 'real' epoch after the first write, like normal. [kaduk@mit.edu: comment the locking strategy in ubeacon_Interact()] Change-Id: I6cdcf5a73c1ea564622bfc8ab7024d9901af4bc8 Reviewed-on: https://gerrit.openafs.org/12609 Tested-by: BuildBot Reviewed-by: Marcio Brito Barbosa Reviewed-by: Benjamin Kaduk commit 32ddf88547f921b33dd93473883928051faab950 Author: Joe Gorse Date: Thu Jul 6 15:47:24 2017 -0400 LINUX: afs_create infinite fetchStatus loop For a file in a directory with the CStatd bit cleared, we can get an infinite fetchStatus loop. In afs_create(), afs_getDCache() may return NULL due to an error. If unchecked it will loop which may produce multiple fetchStatus() calls to the fileserver. Credit: Yadav Yadavendra for identifying and analysing this issue. 
Change-Id: Iecd77d49a5f3e8bb629396c57246736b39aa935f Reviewed-on: https://gerrit.openafs.org/12651 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 18fabf9aecf358e0f45e25f6249685f7f2e32485 Author: Benjamin Kaduk Date: Wed Aug 2 19:31:17 2017 -0500 Update NEWS for volume stats default change Change-Id: I1a184bf638609866f6f7f1d11c224dfee1113eef Reviewed-on: https://gerrit.openafs.org/12678 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 8e1ca72b1cbed930d3661dee5cb742cab52737e9 Author: Michael Meffie Date: Tue Aug 1 17:21:13 2017 -0400 volser: preserve volume stats by default Commit dfceff1d3a66e76246537738720f411330808d64 added the -preserve-vol-stats flag to the volume server. This enabled a change in the volume server to preserve volume usage statistics during reclone and restore operations. Otherwise, volume usage counters of read-only volumes are cleared when volumes are released, making it difficult to track usage with the volume stats. Make this feature the default behavior of the volume server and provide the option -clear-vol-stats to use the old behavior if so desired. This change makes the -preserve-vol-stats the default, and keeps it as a hidden flag for sites which may already have that flag set in the BosConfig. Since this changes a default behavior of the volume server, this change is only appropriate on a major or minor release boundary, not in the middle of a stable series. Change-Id: I3706ede64b7b18a80b39ebd55f2e1824bb7dbc57 Reviewed-on: https://gerrit.openafs.org/12674 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7c7085061580ccce7b2d9c17df5604e5e97fcd81 Author: Marcio Barbosa Date: Mon May 22 12:55:32 2017 -0400 ubik: avoid early DISK_Begin calls we know will fail Currently, we can start a write transaction on a site immediately after it is elected as the sync site. 
However, after commit d47beca1, SDISK_Begin on remote sites will fail right after an election occurs (since lastYesState is not set, and so urecovery_AllBetter will fail). And after commit fac0b742, this error is always noticed and propagated back to the application. As a result, when we try to write immediately after a sync site is elected, the transaction will fail with UNOQUORUM, the remote sites will be marked as down, and we may lose quorum and require another election to be performed. This can easily happen repeatedly for a site that frequently tries to make changes to a ubik database. To avoid marking other sites down and going through another election process, do not allow write transactions until we know that lastYesState is set on the remote sites. We do this by waiting until the next wave of beacons are sent, which tell the remote sites that we are the sync site. In other words, only allow write transactions after the sync site knows that the remote sites also know that the sync site has been elected. With this commit, a write transaction immediately after an election will still fail with UNOQUORUM, but we avoid triggering an error on the remote sites, and avoid losing quorum in this situation. Change-Id: I9e1a76b4022e6d734af1165d94c12e90af04974d Reviewed-on: https://gerrit.openafs.org/12592 Reviewed-by: Andrew Deason Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 8f46ca082653116c9c42a69e2535be1bb2f0a2a9 Author: Marcio Barbosa Date: Wed Jun 21 17:42:37 2017 -0300 ubik: allow remote dbase relabel if up to date When a site is elected the sync-site, its database is not immediately relabeled. The database in question will be relabeled at the end of the first write transaction (in udisk_commit). To do so, the dbase->version is updated on the sync-site first (1) and then the versions of the remote sites are updated through SDISK_SetVersion() (2). 
In order to make sure that the remote site holds the same database as the sync-site, the SDISK_SetVersion() function checks if the current version held by the remote site (ubik_dbVersion) is equal to the original version stored by the sync-site (oldversionp). If ubik_dbVersion is not equal to oldversionp, SDISK_SetVersion() will fail with USYNC. However, ubik_dbVersion can be updated by the vote thread at any time. That is, if the sync site calls VOTE_Beacon() on the remote site between events (1) and (2), the remote site will set ubik_dbVersion to the new version, while ubik_dbase->version is still set to the old version. As a result, ubik_dbVersion will not be equal to oldversionp and SDISK_SetVersion() will fail with USYNC. This failure may cause a loss of quorum until another election is completed. To fix this problem, let SDISK_SetVersion() relabel the database when ubik_dbase->version is equal to oldversionp. In order to try to only affect the scenario described above, also check if ubik_dbVersion is equal to newversionp. Change-Id: I97e6f8cacd1c9bca0b4c72374c058c5fe5b638b3 Reviewed-on: https://gerrit.openafs.org/12613 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3c12ff9fbb2724b6e430f3eeeb2c2a1d2ae3f884 Author: Joe Gorse Date: Wed May 10 11:38:25 2017 -0400 afs: fix repeated BulkStatus calls for directories. There is a filetype comparison check in afs_DoBulkStat just after BulkFetch RPC. This check will fail for directories even though bulkStatus was done for directories. This code is apparently necessary for Darwin, but it causes this problem otherwise. Thus it is removed from the rest of the builds using the AFS_DARWIN_ENV preprocessor variable. Credit: Yadav Yadavendra for identifying and analysing this issue. 
Change-Id: I9645f0e7a3327cb5f20cdf3ba2bf1cc5b1509bb5 Reviewed-on: https://gerrit.openafs.org/12610 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 90acda692a589eb177dc5dee99490947106f8141 Author: Michael Meffie Date: Thu Jul 20 00:12:05 2017 -0400 relocate old afs docs to doc/txt Move the afs/DOC files to the top-level doc/txt directory, since this has become the home for developer-oriented documentation. Change-Id: I128d338c69534b4ee6043105a7cfd390b280afe3 Reviewed-on: https://gerrit.openafs.org/12662 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5a88209a0ff0cefb7ec1a810e25011ee9795d2fe Author: Michael Meffie Date: Wed Jul 19 23:48:42 2017 -0400 Incorporate old release notes into NEWS Cleanup the doc/txt directory by incorporating the old release notes into the NEWS file. Change-Id: I63911fc5cb0b476e201148c6d3fa3441f4746ab7 Reviewed-on: https://gerrit.openafs.org/12661 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3629ae4a682d648d6830bf551ed78faaa4cfc477 Author: Michael Meffie Date: Wed Jul 19 22:39:51 2017 -0400 Update NEWS for 1.8.0pre2 Change-Id: I5f83e81f25177bde1ea691e756359563e80ee3f2 Reviewed-on: https://gerrit.openafs.org/12660 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1d5b255ff68af03da891a0babefdadd85f48def0 Author: Michael Meffie Date: Wed Jul 19 23:09:01 2017 -0400 Import NEWS from openafs-stable-1_6_x Import change descriptions for 1.6.20.1, 1.6.20.2, 1.6.21. Change-Id: Ib4f06c7046eb6e1bb0a1ccfb9f6c45191154fe0e Reviewed-on: https://gerrit.openafs.org/12659 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 77c5e4f3fba57c85fd664f64dba2c44a44a4fb5c Author: Stephan Wiesand Date: Wed Jul 26 15:18:08 2017 +0200 Linux: fix whitespace in osi_sysctl.c Remove dozens of trailing spaces and make consistent use of tabs for indentation throughout the file.
Change-Id: Ibbd17d2b9828590ffd84b76aac70646e9fe9cb2c Reviewed-on: https://gerrit.openafs.org/12665 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b0461f2def17fe3d3f49e51e3c4a1df81a921eee Author: Andrew Deason Date: Thu Jun 15 15:32:41 2017 -0500 LINUX: Workaround d_splice_alias/d_lookup race Before Linux kernel commit 4919c5e45a91b5db5a41695fe0357fbdff0d5767, d_splice_alias in some cases can d_rehash the given dentry without attaching it to the given inode, right before the dentry is unhashed again. This means that for a few moments, that negative dentry is visible to __d_lookup, and thus is visible to path lookup and can be given to afs_linux_dentry_revalidate. Currently, afs_linux_dentry_revalidate will say that the dentry is valid, because d_time and other fields are set; it's just not attached to an inode. This causes an ENOENT error on lookup, even though the file is there (and no OpenAFS code said otherwise). Normally this race is rare, but it can be frequently exercised if we access the same directory via different names at the same time. This can happen with multiple mountpoints to the same volume, or by accessing an @sys directory via its abbreviated and expanded forms. To get around this, make afs_linux_dentry_revalidate check negative dentries to see if they are unhashed. We also lock the parent inode, in order to guarantee that a problematic d_splice_alias call isn't running at the same time (and thus, we know the dentry will not be unhashed immediately afterwards). This slows down afs_linux_dentry_revalidate for valid negative dentries a little, but it allows us to use negative dentries at all. Linux kernel commit 4919c5e45a91b5db5a41695fe0357fbdff0d5767 fixes this issue, which was included in 2.6.34, so don't do this workaround for 2.6.34 and on.
Change-Id: I8e58ebed4441151832054b1ef3f1aa5af1c4a9b5 Reviewed-on: https://gerrit.openafs.org/12638 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d55b41072ce873210481baa4cae5c7143011869b Author: Stephan Wiesand Date: Mon Jul 24 11:37:54 2017 +0200 Linux 4.13: use designated initializers where required struct path is declared with the "designated_init" attribute, and module builds now use -Werror=designated-init. Cope. And as pointed out by Michael Meffie, struct ctl_table has the same requirement now, so use a designated initializer for the final element of the sysctl table too. Change-Id: I0ec45aac961dcefa0856a15ee218085626a357c7 Reviewed-on: https://gerrit.openafs.org/12663 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 030a9849e22f443492342794f436e2c86c98a903 Author: Michael Meffie Date: Fri Jul 7 11:11:12 2017 -0400 afs: fix afs_xserver deadlock in afsdb refresh When setting up a new volume, the cache manager calls afs_GetServer() to setup the server object for each fileserver associated with the volume. The afs_GetServer() function locks afs_xserver and then, among other things, calls afs_GetCell() to lookup the cell info by cell number. When the cache manager is running in afsdb mode, afs_GetCell() will attempt to refresh the cell info if the time-to-live has been exceeded since the last call to afs_GetCell(). During this refresh the AFSDB calls afs_GetServer() to update the vlserver information. The afsdb handler thread and the thread processing the volume setup become deadlocked since the afs_xserver lock is already held at this point. This bug will manifest when the DNS SRV record TTL is smaller than the time the fileservers respond to the GetCapabilities RPC within afs_GetServer() and there are multiple read-only servers for a volume. Avoid the deadlock by using the afs_GetCellStale() variant within afs_GetServer(). This variant returns the memory resident cell info without the afsdb upcall and the subsequent afs_GetServer() call. 
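The lock re-entry described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not OpenAFS code: every demo_* name is hypothetical, the two kernel threads (volume setup and afsdb handler) are collapsed into a single call chain, and afs_xserver is modelled as a plain flag so that a re-entry is counted instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the afs_xserver lock (non-recursive). */
static bool xserver_held = false;
/* Counts attempts to take the lock while it is already held. */
static int reentry_attempts = 0;

static int demo_get_server(bool use_stale);

/* Refreshing variant: the refresh calls back into demo_get_server(),
 * much as the afsdb refresh ends up calling afs_GetServer(). */
static void demo_get_cell(void)
{
    (void)demo_get_server(true);
}

/* Stale variant: returns in-memory info, no upcall, no lock needed. */
static void demo_get_cell_stale(void)
{
}

static int demo_get_server(bool use_stale)
{
    if (xserver_held) {
        /* A real non-recursive lock would block here forever. */
        reentry_attempts++;
        return -1;
    }
    xserver_held = true;
    if (use_stale)
        demo_get_cell_stale();
    else
        demo_get_cell();
    xserver_held = false;
    return 0;
}

/* Exercise both variants; returns the number of re-entry attempts seen. */
static int demo_run(void)
{
    reentry_attempts = 0;
    int rc_refresh = demo_get_server(false);
    int after_refresh = reentry_attempts;
    int rc_stale = demo_get_server(true);
    assert(rc_refresh == 0 && rc_stale == 0);
    /* The stale variant never tries to re-take the lock. */
    assert(reentry_attempts == after_refresh);
    return reentry_attempts;
}
```

Calling demo_run() shows one re-entry attempt, caused by the refreshing variant only; the stale variant completes without touching the lock a second time.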
Change-Id: Iad57870f84c5e542a5ee20f00ea03b3fc87683a1 Reviewed-on: https://gerrit.openafs.org/12652 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a6ad67485bf23084c06e1de1a424b2e375ee70f3 Author: Michael Meffie Date: Tue Jul 11 08:51:08 2017 -0400 afs: restore force_if_down check when getting connections Commit cb9e029255420608308127b0609179a46d9983ad removed the force_if_down check in afs_ConnBySA, which effectively turned on the force_if_down flag for every call to afs_ConnBySA. This caused afs_ConnBySA to always return connections, even for server addresses marked down and force_if_down set to 0. One serious consequence of this bug is the cache manager will retry the preferred vlserver indefinitely when it is unreachable. This is because the loop in afs_ConnMHosts always tries hosts in preferred order and expects afs_ConnBySA to return a NULL if the server address has no connections because it is marked down. Restore the check for server addresses marked down to honor the force_if_down flag again so we do not get connections for down servers unless requested. Change-Id: Ia117354929a62b0cedc218040649e9e0b8d8ed23 Reviewed-on: https://gerrit.openafs.org/12653 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a1c072ac562ccf74e5afb8449db1bcef86aef362 Author: Michael Meffie Date: Mon Apr 10 14:23:12 2017 -0400 redhat: fix rpmbuild command line option defaults Fix the handling of default values for the various rpmbuild options which can be given. These have been broken as code was shuffled around over the years. Remove obsolete comments about detecting what to build based on the architecture. Provide the '--without authlibs' option to disable the openafs-authlibs package.
Change-Id: I6c8db1f3163ee241f9a4d1282345a0ddeabd284c Reviewed-on: https://gerrit.openafs.org/12596 Reviewed-by: Stephan Wiesand Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit a5bedda935c8147517bcbb56858dd88288fdf9da Author: Christof Hanke Date: Tue Jul 18 12:04:11 2017 +0200 mkvers: fix potential buffer overflow The space allocated for outputFileBuf is only 2 bytes larger than sizeof(VERS_FILE). But we add potentially 4 extra bytes like ".txt" or ".xml". Just allocate enough space for all file suffixes. Change-Id: Ic0f97590be208deaf9c4a5c25e21056ea9d2cd6f Reviewed-on: https://gerrit.openafs.org/12657 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d7211350eec18b30e9ccf30f5e95fb58162c637d Author: Andrew Deason Date: Thu Jun 15 15:29:17 2017 -0500 afs_linux_lookup: Avoid d_add on afs_lookup error Currently, afs_linux_lookup looks roughly like this pseudocode:
    {
        code = afs_lookup(&vcp);
        if (!code) {
            ip = AFSTOV(vcp);
            error = process_ip(ip);
            if (error) {
                goto done;
            }
        }
        process_dp(dp);
        newdp = d_splice_alias(ip, dp);
     done:
        cleanup();
    }
Note that if there is an error while processing the looked-up inode (ip), we jump over d_splice_alias. But if we encounter an error from afs_lookup itself, we do not jump over d_splice_alias. This means that if afs_lookup encounters any error, we initialize the given dentry (dp) as a negative entry, effectively telling the Linux kernel that the requested name does not exist. This is correct for ENOENT errors, of course, but is incorrect for any other error. For non-ENOENT errors we later return an error from the function, but this does not invalidate the generated dentry. The result is that when afs_lookup encounters an error, that error will be propagated to userspace, but subsequent lookups for the same name will yield an ENOENT error (until the dentry is invalidated).
This can easily cause a file to seem to mysteriously disappear, if a transient error like network problems caused the afs_lookup call to fail. To fix this, treat ENOENT as a non-error, like the comments already suggest. In our case, ENOENT is not really an error; it just means we populate the given dentry differently. So if we get ENOENT from afs_lookup, set our vcache to NULL and clear the error, and continue. This also has the side effect of not treating ENOENT errors from afs_CreateAttr identically to ENOENT errors from afs_lookup. That shouldn't happen, but there have been abuses of the ENOENT error code in the past, so it is probably better to be cautious. Many thanks to Gaja Sophie Peters for assistance in tracking down and testing fixes for this issue, including providing access to test systems experiencing the buggy behavior. FIXES 133654 Change-Id: Ia9aab289d5c041557ab6b00f1d41de2edfc97a89 Reviewed-on: https://gerrit.openafs.org/12637 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Joe Gorse Reviewed-by: Michael Meffie Tested-by: Michael Meffie commit 5dd2ce2043f53e80e1ded25abcfd565b4071a3ad Author: Andrew Deason Date: Thu Jun 15 15:29:48 2017 -0500 LINUX: Rearrange afs_linux_lookup cleanup Currently, the cleanup and error handling in afs_linux_lookup is structured similar to this pseudocode:
    if (!code) {
        if (!IS_ERR(newdp)) {
            return no_error;
        } else {
            return newdp_error;
        }
    } else {
        return code_error;
    }
The multiple different nested error cases make this a little complex. To make this easier to follow for subsequent changes, alter this structure to be more like this:
    if (IS_ERR(newdp)) {
        return newdp_error;
    }
    if (code) {
        return code_error;
    }
    return no_error;
There should be no functional change in this commit; it is just code reorganization. Technically the ordering of these checks is changed, but there is no combination of conditions that actually results in different code being hit.
That is, if 'code' is nonzero and IS_ERR(newdp) is true, then we would go through a different path. But that cannot happen, since if 'code' is nonzero, we have no inode and so IS_ERR(newdp) cannot be true (d_splice_alias cannot return an error for a NULL inode). So there is no functional change. Change-Id: I94a3aef5239358c3d13fe5314044dcc85914d0a4 Reviewed-on: https://gerrit.openafs.org/12636 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Joe Gorse Reviewed-by: Michael Meffie Tested-by: Michael Meffie commit d0b64a4a1b61b5e22f0e3fe509f8facd30bc2b74 Author: Stephan Wiesand Date: Thu Jun 29 16:57:42 2017 +0200 doc: Add introduction and credits to ubik.txt Credit where it's due. And the remainder of the introduction may provide some useful context too. Change-Id: I99c7e599363126c581ae1ac00da67c33acc3687f Reviewed-on: https://gerrit.openafs.org/12644 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d1c4dbf28ae28bbfac3d8bc96d0fa5ae3d422bfd Author: Benjamin Kaduk Date: Sun Jun 25 13:56:04 2017 -0500 Put jhutz's ubik analysis in doc/txt A file in the source tree is much easier to locate than an old mailing list post; it's quite handy to have this at hand as a reference. Change-Id: I5267a2f86b36e92b05249364085bdd33aeb28d1b Reviewed-on: https://gerrit.openafs.org/12642 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 0327ead297e3cf395cced1e6690b901e445f074c Author: Andrew Deason Date: Fri Jun 23 17:20:11 2017 -0500 afs: Improve "Corrupt directory" warning This warning is a bit confusing to see, since it doesn't say anything about AFS (making it unclear where it's coming from), and it lacks a trailing newline (making it ugly). Fix both of these. 
Change-Id: I92a3d07fd193bf99b545aef9b21f52d23c356a2d Reviewed-on: https://gerrit.openafs.org/12641 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cdb92f94598e5b25fbcdfc6fb1650218ec05d63f Author: Jeffrey Altman Date: Thu Jun 1 22:25:49 2017 -0400 vol: modify volume updateDate upon salvage change If the salvager changed the volume, set the VolumeDiskData.updateDate field so that 1. the change is visible via "vos examine" 2. backup services will backup the corrected volume Teradactyl pointed out the problem which forces cell administrators to manually trigger a backup for each volume that has been salvaged. Change-Id: I9a35b92e8abbe3b54b08e64ac13de44442736c72 Reviewed-on: https://gerrit.openafs.org/12629 Tested-by: BuildBot Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit f5491119ff7d422b1c0c311a50e30bec1c15296c Author: Michael Meffie Date: Fri Jun 2 15:19:26 2017 -0400 bozo: do not fail silently on unknown bosserver options Instead of failing silently when the bosserver is started with an unknown option, print an error message and exit with a non-zero value. Continue to exit with 0 when the -help option is given to request the usage message. This change should help make bosserver startup failures more obvious when an unsupported option is specified. Example systemd status message: systemd[1]: Starting OpenAFS Server Service... bosserver[32308]: Unrecognized option: -bogus bosserver[32308]: Usage: bosserver [-noauth] .... systemd[1]: openafs-server.service: main process exited, code=exited, status=1/FAILURE Change-Id: I8717fb4a788fbcc3d1e2d271dd03511c5b504f10 Reviewed-on: https://gerrit.openafs.org/12630 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit aaa47dc1077f0dd5b0040006c831f64cc8a303b5 Author: Jeffrey Altman Date: Sat May 27 14:59:04 2017 -0400 rx: wake up send after 'twind' has been updated Beginning in AFS 3.4 and 3.5 the ack trailer includes the size of the peer's receive window. 
This value is used to update the sender's transmit window (twind). When the twind is increased the application thread is signaled to indicate that more packets can be sent. This change wakes the application thread after twind is updated by the peer's receive window instead of beforehand. Failure to do so can result in 100ms transmit delays when the receive window transitions from closed to open. Change-Id: Id129ea93e94612a4b8cce9f8cbddde9c779ff26b Reviewed-on: https://gerrit.openafs.org/12625 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 63e530e7df0b8013bcc4421b0bba558d4f1d2d57 Author: Joe Gorse Date: Tue May 16 07:29:30 2017 +0000 LINUX: Switch to new bdi api for 4.12. super_setup_bdi() dynamically allocates backing_dev_info structures for filesystems and cleans them up on superblock destruction. Appears with Linux commit fca39346a55bb7196888ffc77d9e3557340d1d0b Author: Jan Kara Date: Wed Apr 12 12:24:28 2017 +0200 Change-Id: I67eed0fcb8c96733390579847db57fb8a4f0df3e Reviewed-on: https://gerrit.openafs.org/12614 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b47dc5482da614742b01dcc62d5e11d766a9432f Author: Joe Gorse Date: Wed May 10 19:46:38 2017 +0000 LINUX: CURRENT_TIME macro goes away. Check if the macro exists, define it if it does not. Change-Id: I9990579f94bfba0804e60fa6ddcc077984cc46c3 Reviewed-on: https://gerrit.openafs.org/12611 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 7af9554bed2d906615e0f5a94537d3d553ca2d1e Author: Michael Meffie Date: Thu Apr 6 22:50:41 2017 -0400 redhat: update rpm spec file Update the spec file to keep up with accumulated changes. * Correct installation location of db check programs. * Install afsd to the legacy location to avoid breaking init scripts and systemd configs. * Exclude yet another duplicated copy of kpwvalid. * libubik_pthread.a is gone. * Install the kpwvalid man page. * Continue to remove the obsolete kdb program. * Update the names of the pam_afs symlinks.
* Add libkopenafs to authlibs. * Package dafssync-debug man pages. * Package opr/queue.h in devel. * Package akeyconvert and man page. * Do not package fuse version of afsd. A separate sub-package for afsd.fuse is warranted, since it adds new libfuse dependencies. * Package new server man pages, including dafssync-* pages. * Package libafsrfc3961.a as a devel lib. * Continue to package kauth programs. Change-Id: I875c3b8dee53abbc67b0f05f8b291bb58abf41a5 Reviewed-on: https://gerrit.openafs.org/12595 Reviewed-by: Michael Meffie Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit dd80f081663c50f93618da7a309b390f2fbdbc59 Author: Tim Creech Date: Sun Mar 5 18:13:45 2017 -0500 FBSD: build fix for FreeBSD 11 r285819 eliminated b_saveaddr from struct buf, while r292373 changed the arguments to VOP_GETPAGES. The approach used by this patch to address these changes was inspired by FreeBSD's nfs and samba clients. Change-Id: Ibcf6b6fde6c86f96aa814af2bca08f1a8b286740 Reviewed-on: https://gerrit.openafs.org/12575 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit dcfebc7ca2923c1f93df9105e493bd4228ea8a0e Author: Michael Meffie Date: Wed Apr 5 16:48:36 2017 -0400 redhat: convert rpm spec file to make install Convert the build and install from the deprecated 'make dest' to the modern 'make install' method. * Clarify the install section by unrolling the shell scripts, reorganizing, and improving the comments. * Remove the gzip glob of the man pages; rpmbuild automatically compresses the man pages and will handle symlinks correctly. * Remove the generated temporary list file and specify files directly. * Remove the extra tar commands to install the man pages out of the doc directory; 'make install..' installs the man pages. * Remove code in the install section which determines the sysname. This is no longer needed during the install. * Update the kernel module install commands to accommodate the conversion from 'make dest'.
Change-Id: I97ec80185a2b11704b27ea74941b50ff4a5aca8c Reviewed-on: https://gerrit.openafs.org/12594 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit bd8bec5b474315cd28df5a4741c1e07d48c7250a Author: Michael Meffie Date: Tue Apr 25 18:34:47 2017 -0400 redhat: fix whitespace errors in the rpm spec file Remove trailing whitespace characters that have crept into the rpm spec file over the years. Change-Id: I08c7ad926ddb524d6938b26513963c28c70b4195 Reviewed-on: https://gerrit.openafs.org/12606 Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 6b7b4239ab22fbb301e3b50e2ca4072445ba4e9e Author: Stephan Wiesand Date: Tue Apr 11 11:58:55 2017 +0200 Linux: only include cred.h if it exists Commit c89fd17df1032ec2eacc0d0c9b73e19c5e8db7d2 introduced an explicit include of linux/cred.h since the latest kernel no longer includes it implicitly in sched.h. Alas, older kernels (like 2.6.18) don't have this file. Add a configure test for the existence of cred.h and only include it if actually present. Change-Id: Ia7e38160492b1e03cdb257e4b2bef4d18c4a28fb Reviewed-on: https://gerrit.openafs.org/12593 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c89fd17df1032ec2eacc0d0c9b73e19c5e8db7d2 Author: Mark Vitale Date: Thu Mar 23 18:36:44 2017 -0700 Linux v4.11: cred.h is no longer included in sched.h With Linux commit e26512fea5bcd6602dbf02a551ed073cd4529449, cred.h is no longer included in sched.h. Several components of libafs which require cred.h were picking it up by including sched.h. Instead, explicitly add an include for cred.h. cred.h begins with a customary one-shot to prevent multiple loads:
    #ifndef _LINUX_CRED_H
    #define _LINUX_CRED_H
Therefore we don't need a new autoconf test or preprocessor conditional to prevent redundant includes on older Linux releases.
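The one-shot guard mentioned above can be illustrated in a single file. This is a minimal sketch, not kernel code: DEMO_CRED_H, struct demo_cred and demo_uid() are hypothetical names, and the guarded block is pasted twice to stand in for the same header being included both directly and through another header.

```c
#include <assert.h>

/* First "inclusion" of the hypothetical header body. */
#ifndef DEMO_CRED_H
#define DEMO_CRED_H
struct demo_cred {
    int uid;
};
#endif

/* Second "inclusion": DEMO_CRED_H is already defined, so the
 * preprocessor skips this copy. Without the guard, the compiler
 * would reject the second definition of struct demo_cred. */
#ifndef DEMO_CRED_H
#define DEMO_CRED_H
struct demo_cred {
    int uid;
};
#endif

/* Uses the struct to show it was defined exactly once. */
static int demo_uid(void)
{
    struct demo_cred c = { 42 };
    return c.uid;
}

/* Small self-check that can be called from any test harness. */
static void demo_check(void)
{
    assert(demo_uid() == 42);
}
```

The guard only covers the redundant-include case; it does not help when the header file is absent altogether, which is a separate concern.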
Change-Id: Ifc496c83141d2cfbd417133feb6d87c1146e5014 Reviewed-on: https://gerrit.openafs.org/12574 Tested-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Joe Gorse Tested-by: Joe Gorse Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie commit ad001550949b612ff6b4899fa8da50ee58f87533 Author: Mark Vitale Date: Thu Mar 23 15:10:03 2017 -0700 Linux v4.11: signal stuff moved to sched/signal.h In Linux commit c3edc4010e9d102eb7b8f17d15c2ebc425fed63c, signal_struct and other signal handling declarations were moved from sched.h to sched/signal.h. This breaks existing OpenAFS autoconf tests for recalc_sigpending() and task_struct.signal->rlim, so that the OpenAFS kernel module can no longer build. Modify OpenAFS autoconfig tests to cope. Change-Id: Ic9f174b92704eabcbd374feffe5fbeb92c8987ce Reviewed-on: https://gerrit.openafs.org/12573 Tested-by: BuildBot Reviewed-by: Joe Gorse Tested-by: Joe Gorse Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie commit de5ee1a67d1c3284d65dc69bbbf89664af70b357 Author: Joe Gorse Date: Mon Mar 20 14:30:46 2017 +0000 Linux v4.11: getattr takes struct path With Linux commit a528d35e8bfcc521d7cb70aaf03e1bd296c8493f statx: Add a system call to make enhanced file info available The Linux getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr*() function. - int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *); + int (*getattr) (const struct path *, struct kstat *, + u32 request_mask, unsigned int sync_mode); The first argument, request_mask, indicates which fields of the statx structure are of interest to the userland call. The second argument, flags, currently may take the values defined in include/uapi/linux/fcntl.h and are optionally used for cache coherence: (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does. 
(2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. This patch provides a new autoconf test and conditional compilation to cope with the changes in our getattr implementation. Change-Id: Ie4206140ae249c00a8906331c57da359c4a372c4 Reviewed-on: https://gerrit.openafs.org/12572 Reviewed-by: Joe Gorse Tested-by: Joe Gorse Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit c666bfee8848183ccbc566c9e0fa019088e56505 Author: Jonathon Weiss Date: Thu Nov 10 17:06:18 2016 -0500 Prevent double-starting client on RHEL7 On RHEL7 if the AFS client is stopped with 'service openafs-client stop', but that fails for some reason (most commonly because some process has a file or directory in AFS open) systemd will decide that the openafs-client is in a failed state when it is actually running. If one then runs 'service openafs-client start' systemd will start a new AFS client. At this point AFS access will continue to work until the functional AFS client is (successfully) stopped, at which point a reboot is required to recover. Have systemd check the status of 'fs sysname' before starting the AFS client, so we avoid getting into a state that requires a reboot. 
Change-Id: I28a5cca746823d69183ea5ce65c10e1725009c5c Reviewed-on: https://gerrit.openafs.org/12443 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit d2721be299c124d76b611ab2980c51be148fa1a7 Author: Benjamin Kaduk Date: Mon Feb 20 22:18:09 2017 -0600 XBSD: do not claim AFS_VM_RDWR_ENV The AFS_VM_RDWR_ENV configuration parameter (defined or not defined in each platform's param.h) is undocumented, but appears to be an indication of a property of the platform OS's VFS layer, or perhaps just of the complexity of the read/write vnops that we implement for it. That is, AFS_VM_RDWR_ENV is defined when the read/write vnops implement partial write logic (and presumably when they interact with the OS VM layer in ways not expressed by the afs_write() abstraction); systems that do not define AFS_VM_RDWR_ENV can use the afs_write() function fairly directly as the vnode operation. Use of AFS_VM_RDWR_ENV evolved over time, with the original (AFS 3.2/3.3-era) code using a simple scheme that handled partial writes directly in afs_write() and avoided complexity in callers. In AFS 3.4, sunos and solaris gained a more complicated read/write vnop that incorporated the afs_DoPartialWrite() call itself, and eventually in 3.6 we see the behavior established at the original IBM import, with all the (Unix) OSes supported at that time defining AFS_VM_RDWR_ENV. When DARWIN support was brought in in commit a41175cfbbf4d06ccfe14ae54bef8b7464ecd80b, its param.h properly did not define AFS_VM_RDWR_ENV, as OS X uses a VFS interface that shares some level of abstraction with the traditional BSD VFS and its read/write/getpages/putpages operations, so the afs_write() behavior was natural and no extra complications needed for integration with the VM layer or other optimizations. 
However, when the initial FreeBSD support came in a few months later, it seems to have taken inspiration from the OSes that were supported in the initial IBM import, and kept the AFS_VM_RDWR_ENV definition. This was then propagated to all the later BSDs as they were added. Perhaps the most noticeable consequence of this definition is that the calls to afs_DoPartialWrite() from afs_write() are bypassed, with a comment that "[i]f write is implemented via VM, afs_DoPartialWrite() is called from the high-level write op" (and calls to afs_FakeOpen() and afs_FakeClose() are similarly skipped). This means that attempting to write a file larger than the local cache will hang waiting for more space to be freed, which will never happen as the vcache remains locked and will not be written out in the background. In the absence of a documented meaning for AFS_VM_RDWR_ENV, this also gives us a proxy that can be used to indicate whether a given OS's support intended to claim the AFS_VM_RDWR_ENV -- such platforms will actually contain the call to afs_DoPartialWrite() in the appropriate vnode operation. This can be used to sanity-check the places where AFS_VM_RDWR_ENV is removed by this commit. Interestingly, HP-UX does not call afs_DoPartialWrite() but also is clearly in a VFS that uses a rdwr()-based approach, as the corresponding vnode operation is implemented by mp_afs_rdwr(), so leave it unchanged for now. Tim Creech is responsible for noting the lack of calls to afs_DoPartialWrite() on FreeBSD, and Chaskiel Grundman for the historical research into pre-OpenAFS AFS behavior. Designing and implementing more complicated BSD read/write vnops that incorporate afs_DoPartialWrite() and other improvements is left for future work. 
Change-Id: I8e89855ac31303934f97d0753b64899fb7e3867c Reviewed-on: https://gerrit.openafs.org/12520 Tested-by: BuildBot Reviewed-by: Antoine Verheijen Reviewed-by: Tim Creech Reviewed-by: Benjamin Kaduk commit 2421da2bf327525216ec7e79b9aa81fa2c4f77d5 Author: Marcio Barbosa Date: Tue Jan 31 11:43:18 2017 -0300 vol: detach offline volumes on dafs Taking a volume offline always clears the inService bit. Taking a volume out of service also takes it offline. Therefore, if the inService flag is false, the volume in question should be offline. On dafs, an offline volume should be unattached. The attach2() function does not change the state of the volume received as an argument to unattached when the inService flag is false. Instead, this function changes the state of the volume in question to pre-attached and returns VNOVOL to the client. As a result, subsequent accesses to this volume will make the server try and fail to attach this offline volume over and over again, writing to the FileLog each time. To fix this problem, detach the volume received as an argument if the inService flag is false. Since the new state of this volume will be unattached, subsequent accesses will not hit attach2(). This situation where a volume is not offline but is also not in service can occur if a volume is taken offline with vos offline and some time later the DAFS fileserver is shut down and restarted; the volume is placed into the preattach state by default when the server restarts. Each access to the volume by clients then causes the fileserver to attempt to attach the volume, which fails, since the in-service flag in the volume header is false from the previous vos offline. The fileserver will log a warning to the FileLog on each attempt to attach the volume, and this will fill the FileLog with duplicate messages corresponding to the number of attempted accesses.
Change-Id: Ifce07c83c1e8dbf250b88b847d331234bdaa9df5 Reviewed-on: https://gerrit.openafs.org/12515 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 22d841a45fff7026318b529a41dd957ce8bb0ddf Author: Mark Vitale Date: Tue Feb 28 18:02:39 2017 -0500 SOLARIS: prevent BAD TRAP panic with Studio 12.5 Starting with Solaris Studio 12.3, it is documented that Solaris kernel modules (such as libafs) must not use any floating point, vector, or SIMD/SSE instructions on x86 hardware. However, each new Studio compiler release (12.4 and especially 12.5) is more likely to use these types of instructions by default. If the libafs kernel module includes any forbidden kernel instructions, Solaris will panic the system with: BAD TRAP: type=7 (#nm Device not available) Provide a new autoconfig test to specify the required compiler options (-xvector=%none -xregs=no%float) when building the OpenAFS kernel module for Solaris, so that no invalid x86 instructions are used. In addition, reinstate default kernel module optimization for Solaris. It had been disabled in commit 80592c53cbb0bce782eb39a5e64860786654be9f to address this same issue in Studio 12.3 and 12.4. However, Studio 12.5 started using some SSE instructions even with no optimization. This commit has been tested with OpenAFS master and Studio 12.5 at all optimization levels (none, -xO1 through -xO5) and verified to contain no XMM register instructions via the following command: $ gobjdump -dlr libafs64.o | grep xmm | wc -l Change-Id: Ic3c7860f7d524162fd9178a1dab5dd223722ee43 Reviewed-on: https://gerrit.openafs.org/12558 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 38a3f51fb8b3910ecdd7cacb06f35ec681990aea Author: Mark Vitale Date: Mon Feb 20 20:16:47 2017 -0500 DAFS: do not save or restore host state if CPS in progress If a fileserver is shutdown while one or more PR_GetHostCPS calls are in progress, this state is saved in the fsstate.dat file as hostFlags HCPS_WAITING, HCPS_INPROGRESS. 
Other hosts that are merely waiting will have HCPS_WAITING recorded. However, it makes no sense to restore host structs in this state, because the GetCPS calls will no longer be in progress. Once these hosts become active, they will block server threads and quickly cause all server threads to be exhausted as other CPS requests are blocked behind them. Instead, exclude these states from both save and restore. Change-Id: I3fad67b70c96dc967d6f8e3a7b393cfda076c91d Reviewed-on: https://gerrit.openafs.org/12561 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit bd15a5f56fde98983464acf5fd4cdd731d206d9f Author: Stephan Wiesand Date: Thu Mar 2 12:52:10 2017 +0100 doc: clarify the fs wscell manpage What's displayed by fs wscell is not necessarily the current content of ThisCell, but that at the time of starting the client. Say so. FIXES 133339 Change-Id: Id3351f1236e5061340eb07041d4ce3e4de69a1a1 Reviewed-on: https://gerrit.openafs.org/12537 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d39e7c7af77b4e1b043611e1a6e78267f5f956ef Author: Marcio Barbosa Date: Thu Mar 2 18:01:48 2017 -0300 osx: build afscell only for active architecture The InstallerPlugins framework provided by the MacOSX10.12.sdk does not define symbols for architecture i386. As a result, the OpenAFS code cannot be built on OS X 10.12. To fix this problem, build the afscell xcode project only for active architecture. Change-Id: I2a2bd5694826b668fceb7402567fba1d0f128479 Reviewed-on: https://gerrit.openafs.org/12531 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 2a13973985bc7e190364d208c590ec42dbccf81b Author: Michael Meffie Date: Thu Jun 11 13:14:27 2015 -0400 libafs: vldb cache timeout option (-volume-ttl) The unix cache manager caches VLDB information for read-only volumes as long as a volume callback is held for a read-only volume. The volume callback may be held as long as files in the read-only volume are being accessed. 
The cache manager caches VLDB information for read/write volumes as long as volume level errors (such as VMOVED) are not returned by a fileserver while accessing files within the volume. Add a new option to set the maximum amount of time VLDB information will be cached, even if a callback is still held for a read-only volume, or no volume errors have been encountered while accessing files in read/write volumes. This avoids situations where the VLDB information is cached indefinitely for read-only and read/write volumes. Instead, the VL servers will be periodically probed for volume information. Change-Id: I5f2a57cdaf5cbe7b1bc0440ed6408226cc988fed Reviewed-on: https://gerrit.openafs.org/11898 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 3893ed397283b0c3605def102004a645a325d476 Author: Michael Meffie Date: Mon Feb 27 01:40:51 2017 -0500 SOLARIS: update convert from ancient _depends_on Commit 37db7985fde9e6a5e71ae628d0b7124a27bf31c3 modernized how we declare module dependencies on Solaris 10 and newer. Instead of explicitly specifying recent Solaris version numbers, specify old versions for the old method. This should be more future proof. Thanks to Ben Kaduk for the suggestion. Change-Id: I7b3c90803825e2c0736548b56deb354183e81b15 Reviewed-on: https://gerrit.openafs.org/12529 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 69aadea298825f1f224406064b83d1a947abf96b Author: Michael Meffie Date: Sat Feb 25 20:33:00 2017 -0500 build: update search paths for solaris cc Move the macros to search for the solaris cc to a separate macro and update the search paths to keep up with released versions.
Change-Id: Iaba816f1acf5f45d4e147ae517e73949eb8fe949 Reviewed-on: https://gerrit.openafs.org/12528 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 6ea6c182c7fb6c22dafbbf203abcc23726e06cba Author: Sergio Gelato Date: Wed Feb 22 13:55:33 2017 -0800 LINUX: Debian/Ubuntu build regression on kernel 3.16.39 Now that kernel 4.9 has hit jessie-backports, it becomes desirable to also backport the associated openafs patches. Unfortunately, Linux-4.9-inode_change_ok-becomes-setattr_prepare.patch causes a build failure against jessie's current default kernel, 3.16.39-1, due to the fact that setattr_prepare() is available (it was cherrypicked to address CVE-2015-1350) but file_dentry() is not (it was introduced in kernel 4.6). This makes it difficult to have a version of openafs for jessie that supports both kernels. To deal with this, follow the implementation of file_dentry() in 4.6, and simplify it to account for the lack of d_real() support in older kernels. Note that inode_change_ok() has been added back to 3.16.39-1 to avoid ABI changes. That means the current openafs packages in jessie continue to work with kernel 3.16.39-1 since they do not include Linux-4.9-inode_change_ok-becomes-setattr_prepare.patch. Originally reported at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855366 FIXES RT134158 Change-Id: I157aa2ff25945c1c6e3b8e4a600557125711a681 Reviewed-on: https://gerrit.openafs.org/12523 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 789319bf0f2b26ad67995f8cbe88cee87a1bbdc0 Author: Mark Vitale Date: Wed Dec 7 11:11:45 2016 -0500 Linux 4.10: have_submounts is gone Linux commit f74e7b33c37e vfs: remove unused have_submounts() function (v4.10-rc2) removes have_submounts from the tree after providing a replacement (path_has_submounts) for its last in-tree caller, autofs. However, it turns out that OpenAFS is better off not using the new path_has_submounts. 
Instead, OpenAFS could/should have stopped using have_submounts() much earlier, back in Linux v3.18 when d_invalidate became void. At that time, most in-tree callers of have_submounts had already been converted to use check_submounts_and_drop back in v3.12. At v3.18, a series of commits modified check_submounts_and_drop to automatically remove child submounts (instead of returning -EBUSY if a submount was detected), then subsumed it into d_invalidate. The end result was that VFS now implicitly handles much of the housekeeping previously called explicitly by the various filesystem d_revalidate routines: - shrink_dcache_parent - check_submounts_and_drop - d_drop - d_invalidate All in-tree filesystem d_revalidate routines were updated to take advantage of this new VFS support. Modify afs_linux_dentry_revalidate to no longer perform any special handling for invalid dentries when D_INVALIDATE_IS_VOID. Instead, allow our VFS caller to properly clean up any invalid dentry when we return 0. Change-Id: I0c4d777e6d445857c395a7b5f9a43c9024b098e9 Reviewed-on: https://gerrit.openafs.org/12506 Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 961cee00b8f5c302de5f66beb81caa33242c7971 Author: Joe Gorse Date: Thu Feb 16 18:01:50 2017 -0500 LINUX: Bring debug symbols back to the Linux kernel module. Starting with 4.8 Linux kernels our existing build script generator, make_kbuild_makefile.pl, does not pass the debugging symbols CFLAGS that were present when building for previous kernels. This fix appends the $(KERN_DBG) variable which will only be defined when the configuration includes the --enable-debug-kernel option. 
Change-Id: I9a85dc0311a3a706239bc9e471b2d7197ebe1946 Reviewed-on: https://gerrit.openafs.org/12519 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 9bc6fd9312a2be591cc831d9b0afd91e53eec6fc Author: Michael Meffie Date: Fri Feb 10 10:39:09 2017 -0500 build: add --without-swig to override swig check Add the --without-swig option to disable the automatic swig detection and disable the optional features which depend on swig. This allows builders to avoid swig even if present on the build system. Also, add the --with-swig option to force the check and fail if not detected. This allows builders to declare the swig features are mandatory. The default continues to be to check for swig, and if present, build the optional features which require swig. To disable the automatic check for swig and disable the features which depend on swig: ./configure --without-swig # or --with-swig=no To force the check and fail if swig is not present on the system: ./configure --with-swig # or --with-swig=yes If --with-swig is given and swig is not detected, then configure will fail with the message: configure: error: swig requested but not found The Perl 5 bindings for libuafs is the only feature which requires swig at this time. Change-Id: I0726658a9cc7b1b2a9d5e5d306adb6e36ad3c0f6 Reviewed-on: https://gerrit.openafs.org/12518 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit dd97cb7a7447313dbc1da65104786fe03ede7c8d Author: Andrew Deason Date: Fri Feb 10 01:29:28 2017 -0600 PERLUAFS: Modernize lang-specific swig typemaps Currently, our swig bindings for PERLUAFS define a couple of typemaps like so: %typemap(in, numinputs=1, perl5) (char *READBUF, int LENGTH) { [...] } Embedding the target language name in the typemap arguments is a very old way of specifying what language the typemap is for; they were removed after swig 1.1. With swig 3.0.x releases (and possibly others), the specific combination of this deprecated syntax and some other features we're using causes a segfault. 
That's clearly a bug in swig, but we shouldn't be using the deprecated syntax anyway. Update this to instead use preprocessor symbols to specify language-specific typemaps (#ifdef SWIGPERL). We only actually define these for perl right now, so make sure to throw an error if we're not running for perl. FIXES 134103 Change-Id: I14264a2dfada53d99413808ed5d60b79b1ee44f3 Reviewed-on: https://gerrit.openafs.org/12517 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 5dc53812df9e5a42fa822c9b890c1b8a442bed64 Author: Anders Kaseorg Date: Tue Dec 6 10:48:31 2016 -0500 AFS_component_version_number.c: Respect SOURCE_DATE_EPOCH if set To improve build reproducibility, if the SOURCE_DATE_EPOCH environment variable is set, use it to deterministically replace the embedded build date, and do not include the username or hostname in this case. https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal Change-Id: I9ba951f1836385ffd14aad95f071bf8c672a01bb Reviewed-on: https://gerrit.openafs.org/12471 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 872a63bbfb04addbdc17dc5c09ec018bb9ddf515 Author: Michael Meffie Date: Mon Jan 9 23:55:32 2017 -0500 redhat: move the klog.krb5 man page to openafs-krb5 Move the klog.krb5 man page to the openafs-krb5 package, which distributes the klog.krb5 binary, instead of the general openafs package. Change-Id: I6dc3896f330bb0c639cc75155f611ddaf11b9b75 Reviewed-on: https://gerrit.openafs.org/12509 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b146c2d54ff3bd99f2c4674eb88d5af417a194d7 Author: Michael Meffie Date: Thu Jan 12 12:27:36 2017 -0500 SOLARIS: fix for AFS_PAG_ONEGROUP_ENV for Solaris 11 Fix a bug introduced in commit aab1e71628e6a4ce68c5e59e2f815867438280d1 in which a pointer was incorrectly checked for a NULL value. Fixes a crash when a PAG is set on Solaris. # mdb unix.1 vmcore.1 > ::status ... 
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffffc802ba8f0 addr=0 occurred in module "afs" due to a NULL pointer dereference > ::stack pag_to_gidset+0x145() setpag+0xcc() AddPag+0x3a() afs_setpag+0x58() Afs_syscall+0x115() The crash occurs since gidslot is NULL during the assignment: *gidslot = pagvalue; Change-Id: Ic4d50c6b046d10faa49cd4363692e0302707583d Reviewed-on: https://gerrit.openafs.org/12508 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a92a3a0675d941536103b60d708a6b3305b9b8fa Author: Marcio Barbosa Date: Wed Jan 11 06:05:04 2017 -0800 osx: let prefpane knows where binaries can be found Starting from OS X 10.11, the OpenAFS binaries were moved to the following directories: /opt/openafs/bin and /opt/openafs/sbin. However, the OpenAFS prefpane is not aware of the change mentioned above. As a result, some functionalities provided by the OpenAFS prefpane are not working properly. To fix this problem, add the new paths to the proper environment variable. Change-Id: Idaa2f0329af2092cf9ad1d63f1a01300b150227a Reviewed-on: https://gerrit.openafs.org/12507 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 19599b5ef5f7dff2741e13974692fe4a84721b59 Author: Mark Vitale Date: Sat Jan 7 06:22:47 2017 -0500 LINUX: eliminate unused variable warning Commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286 added routine osi_TryEvictDentries and included new logic for D_INVALIDATE_IS_VOID. Unfortunately, this new code path no longer uses dentry; it also should have been made conditional at that time. Wrap the declaration of dentry in #ifndef D_INVALIDATE_IS_VOID to eliminate the unused variable warning. 
Change-Id: I89c1430ba984539ca775da2540ea966030de0701 Reviewed-on: https://gerrit.openafs.org/12505 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2207dcdaad40beed29b0326153dbb76bdf91564d Author: Michael Meffie Date: Tue Jan 3 14:41:36 2017 -0500 cleanup afs_args.h Collect the syscall op code (AFSOP_) defines in one section and cleanup the use of whitespace and tabs. This should be a non-functional change. Change-Id: I1ba763a445b938fcb3677a388a703e1405ee166e Reviewed-on: https://gerrit.openafs.org/12501 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit aab1e71628e6a4ce68c5e59e2f815867438280d1 Author: Andrew Deason Date: Sat Aug 8 16:49:50 2015 -0500 SOLARIS: Use AFS_PAG_ONEGROUP_ENV for Solaris 11 On Solaris 11 (specifically, Solaris 11.1+), the supplemental group list for a process is supposed to be sorted. Starting with Solaris 11.2, more authorization checks are done that assume the list is sorted (e.g., to do a binary search), so having them out of order can cause incorrect behavior. For example: $ echo foo > /tmp/testfile $ chmod 660 /tmp/testfile $ sudo chown root:daemon /tmp/testfile $ cat /tmp/testfile foo $ id -a uid=100(adeason) gid=10(staff) groups=10(staff),12(daemon),20(games),21(ftp),50(gdm),60(xvm),90(postgres) $ pagsh $ cat /tmp/testfile cat: cannot open /tmp/testfile: Permission denied $ id -a uid=100(adeason) gid=10(staff) groups=33536,32514,10(staff),12(daemon),20(games),21(ftp),50(gdm),60(xvm),90(postgres) Solaris sorts the groups given to crsetgroups() on versions which required the group ids to be sorted, but we currently manually put our PAG groups in our own order in afs_setgroups(). This is currently required, since various places in the code assume that PAG groups are the first two groups in a process's group list. To get around this, do not require the PAG gids to be the first two gids anymore. 
To more easily identify PAG gids in group processes, use a single gid instead of two gids to identify a PAG, like modern Linux currently uses (under the AFS_PAG_ONEGROUP_ENV). High-numbered groups have been possible for quite a long time on Solaris, allegedly further back than Solaris 8. Only do this for Solaris 11, though, to reduce the platforms we affect. [mmeffie@sinenomine.net: Define AFS_PAG_ONEGROUP_ENV in param.h.] Change-Id: I44023ee8aa42f3f69bb0c8a8e9178abd513951a1 Reviewed-on: https://gerrit.openafs.org/11979 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 97fec642e591762391e6d453874ff9b5c9ba0c1e Author: Benjamin Kaduk Date: Mon Dec 26 12:15:35 2016 -0600 afsd_kernel: remove gratuitous OS dependence Commit 94c15f62 in 2010 gave NetBSD and only NetBSD the debug printing of errno and the strerror() output, with no justification in the commit message. In the interest of unifying behavior and avoiding unnecessary OS dependence, give all platforms the errno and strerror() behavior. [mmeffie@sinenomine.net: print errno iff syscall returns -1.] Change-Id: If3c4e0ded54bbd4d5c2573f7d7ee1c82ee3e7223 Reviewed-on: https://gerrit.openafs.org/12500 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 481047d2a2660609091dc04253d136f527469ceb Author: Michael Meffie Date: Mon Sep 12 22:21:59 2016 -0400 afsd: print syscalls on separate lines with afsd -debug afsd prints information to standard output for testing and debugging when the -debug option is given. However, syscall tracing is emitted without trailing newlines on all platforms except NetBSD, creating an unreadable wall of text. # afsd -debug ... afsd: Forking 4 background daemons. SScall(183, 28, 0)=0 183, 28, 6583200)=0 SScall(183, 28, 6583 200)=0 SScall(183, 28, 6583200)=0 SScall(183, 28, 6583200)=0 S Scall(183, 28, 6583200)=0 SScall(183, 28, 6583200)=0 SScall(18 ... Make the syscall tracing usable by printing each one on a separate line. # afsd -debug ... afsd: Forking 4 background daemons.
SScall(183, 28, 0)=0 183, 28, 6583200)=0 SScall(183, 28, 6583200)=0 SScall(183, 28, 6583200)=0 ... Change-Id: Ic9208243c1e05352744fb6f575384e00d0e3e59c Reviewed-on: https://gerrit.openafs.org/12385 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9c0db059b6585959e151f7acce845de280952c55 Author: Michael Meffie Date: Mon Sep 26 11:19:13 2016 -0400 vol: convert vnode macros to inline functions Convert the vnode macros to inline functions to fix integer overflows for very large vnode numbers (and generally improve the code robustness and readability). The macro version of vnodeIndexOffset() will evaluate to an incorrect offset for very large vnode numbers due to 32-bit integer overflow. The vnode index file will then be corrupted when writing to the incorrect offset. In code paths where the vnode number is incorrectly defined as a signed 32-bit integer, this change prevents vnodeIndexOffset() from evaluating to a negative result when a vnode number is larger than 2^31. Thanks to Mark Vitale for reporting and providing analysis. Change-Id: Ia6e0f2d2f97fa1091e0b5a4029d40098692ee681 Reviewed-on: https://gerrit.openafs.org/12397 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0ae62bfa99df8ef5d85b4848783f59a041f82828 Author: Michael Meffie Date: Fri Jun 3 15:33:19 2016 -0400 doc: add the PtLog man page Clone the VLLog man page to create a man page for ptserver log as well. Fix the spelling of the PtLog file and add a link to the new PtLog man page in the ptserver man page. Add the missing PtLog log file name to the bos getlog man page. Change-Id: I95ad4a2cf380077780160ec78fd1f9bdec132ba7 Reviewed-on: https://gerrit.openafs.org/12294 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9ec765d8b4a327ae36c26e38a84dae215d3a2664 Author: Anders Kaseorg Date: Fri Dec 16 02:43:48 2016 -0500 opr: Make opr_uuid_hash endian-independent And also make sure it doesn’t use unaligned accesses. Fixes a ‘make check’ failure on big-endian architectures.
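Illustrative sketch only, not the code from this change: one common way to make a byte-oriented hash independent of host endianness and free of unaligned accesses is to assemble each 32-bit word from individual bytes with shifts, instead of casting the buffer to a uint32_t pointer. The helper name load_le32 and the choice of little-endian order are assumptions made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build a 32-bit value from four bytes in a fixed
 * (little-endian) order. Only single-byte loads are used, so the result
 * is the same on any host and the pointer needs no particular alignment.
 */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A hash that consumes its input through a helper of this kind sees the same word values on little-endian and big-endian machines, which is the property a test comparing against fixed expected hashes depends on.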
Change-Id: I490174f8d1eecb5f20969b4ef12ff16d0dd3806a Reviewed-on: https://gerrit.openafs.org/12495 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk Reviewed-by: Michael Meffie Tested-by: Michael Meffie commit 5151c03351e8a4d2bd1e212720d7ec9144bf23f0 Author: Anders Kaseorg Date: Fri Dec 16 03:04:18 2016 -0500 opr: Make opr_jhash_opaque consistent with opr_jhash Change-Id: I42e1030f8c841dcb974476012a774b91c87d3fb0 Reviewed-on: https://gerrit.openafs.org/12494 Tested-by: BuildBot Reviewed-by: Michael Meffie Tested-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 958120b89d62c8567ab00bc697c4fabdfecd46b4 Author: Anders Kaseorg Date: Fri Dec 16 02:16:20 2016 -0500 opr: Make opr_jhash_opaque endian-independent gcc -O2 produces exactly the same code for this on little-endian systems, but now big-endian systems have a chance of passing ‘make check’. Change-Id: Ifc6350648355a0a9f79184439e3f9522cd6f2ffa Reviewed-on: https://gerrit.openafs.org/12493 Reviewed-by: Michael Meffie Tested-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit eb7d3ac4bbd30fc31741cea74fe2b23577deb61e Author: Anders Kaseorg Date: Wed Dec 14 17:52:35 2016 -0500 opr: ExitHandler: re-raise the signal instead of exiting with that code This fixes a ‘make check’ failure introduced by commit 803d15b6aa1e65b259ba11ca30aa1afd2e12accb “vlserver: convert the vlserver to opr softsig”: $ make check … volser/vos..............FAILED 6 … $ cd tests $ ./libwrap ../lib ./runtests -o volser/vos 1..6 ok 1 - Successfully got security class ok 2 - Successfully built ubik client structure ok 3 - First address registration succeeds ok 4 - Second address registration succeeds ok 5 - vos output matches Server exited with code 15 # wanted: 0 # seen: -1 not ok 6 - Server exited cleanly # Looks like you failed 1 test of 6 afstest_StopServer has a check for the process terminating with signal 15 (SIGTERM), but not for the process exiting with code 15. 
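Illustrative sketch only, not the code from this change: the difference between exiting with a signal's number as the exit code and actually dying from that signal is visible to the parent through waitpid(). The sketch below restores the default disposition and re-raises the signal, so the parent sees WIFSIGNALED rather than WIFEXITED. The function names reraise_and_die and termsig_of_child are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler body: terminate by the signal itself. */
static void reraise_and_die(int signo)
{
    sigset_t set;

    /* Make sure the signal is not blocked, then take the default action. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    signal(signo, SIG_DFL);
    raise(signo);

    /* Not expected to be reached; fall back to a plain exit. */
    _exit(signo);
}

/* Fork a child that dies via reraise_and_die() and report how it ended. */
static int termsig_of_child(int signo)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        reraise_and_die(signo);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* 0 means the child exited normally instead of being signalled. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With this shape, a harness that checks for termination by signal 15 is satisfied, whereas a child that called exit(15) would be reported as a normal exit with code 15.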
Change-Id: I022965ea2b5440486ea1cf562551d3bbd0516104 Reviewed-on: https://gerrit.openafs.org/12489 Tested-by: Anders Kaseorg Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit eee532ac13a680bfb4cc857485cbaf5e454ab492 Author: Anders Kaseorg Date: Fri Dec 16 00:29:21 2016 -0500 doc/man-pages/Makefile.in: mkdir man[158] in case we did regen.sh -q Fixes this error: $ git clean -xdf $ ./regen.sh -q $ ./configure $ make […] make[3]: Entering directory '/…/openafs/doc/man-pages' rm -f man*/*.noinstall if [ "no" = "no" ] ; then \ for M in man1/klog.1 man1/knfs.1 […] man8/kpwvalid.8 man1/klog.krb.1; do \ touch $M.noinstall; \ done; \ fi touch: cannot touch 'man1/klog.1.noinstall': No such file or directory touch: cannot touch 'man1/knfs.1.noinstall': No such file or directory […] touch: cannot touch 'man8/kpwvalid.8.noinstall': No such file or directory touch: cannot touch 'man1/klog.krb.1.noinstall': No such file or directory Makefile:34: recipe for target 'prep-noinstall' failed make[3]: *** [prep-noinstall] Error 1 make[3]: Leaving directory '/…/openafs/doc/man-pages' Change-Id: I95098fb2b27f1d87fc9769497b225e9f91f72266 Reviewed-on: https://gerrit.openafs.org/12492 Tested-by: BuildBot Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 93a7e754a44c333140e75e93cac09f61320f7cc9 Author: Anders Kaseorg Date: Wed Dec 14 15:47:21 2016 -0500 tests/opr/softsig-t: Avoid hanging due to intermediate sh -c If the build directory happened to contain shell metacharacters, like the ~ in /build/openafs-vb8tid/openafs-1.8.0~pre1 used by the Debian builders, Perl was running softsig-helper via an intermediate sh -c, which would then intercept the signals we tried to send to softsig-helper. Use the list syntax to avoid this sh -c. 
Change-Id: I054b9c8f606e197accb414bfe3f89719255c62c4 Reviewed-on: https://gerrit.openafs.org/12488 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9fd396adabaa1868517fdb3d7cfcbe9412c35b0b Author: Benjamin Kaduk Date: Thu Dec 15 22:12:01 2016 -0600 tests: use exec to call libwrap'd executables No need to leave the shell process hanging around. In particular, if we are manually running softsig-helper under libwrap to debug test failures, the child process of the shell is another shell, which interprets some signals that we wanted to be passed through, like SIGTERM. On the other hand, once the softsig-helper is exec()'d, you basically need another shell to terminate it, which is a different problem.... Change-Id: Iff7c519886a018cb68e692746d40c427b6299457 Reviewed-on: https://gerrit.openafs.org/12490 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Tested-by: Anders Kaseorg Reviewed-by: Benjamin Kaduk commit 8b2c4665aabece187759157bda0e26c4b566dd2f Author: Michael Meffie Date: Tue Aug 16 12:56:47 2016 -0400 tests: fix signo to signame lookup in opr/softsig tests Fix the loop condition when scanning the signal number to name table to convert a signal number to a name. Instead of looping sizeof(size_t) times, loop for the number of elements in the table. This bug was masked on 64-bit platforms, since the signal number to name table currently has 8 elements, which is coincidentally the same as sizeof(size_t) on 64-bit platforms. The bug becomes apparent on 32-bit systems; only the first 4 elements of the table are checked. Example error output before this fix: $ cd tests $ ./libwrap ../lib ./runtests -o opr/softsig 1..11 ok 1 ok 2 ok 3 ok 4 ok 5 not ok 6 # Failed test in ./opr/softsig-t at line 57. # got: 'Received UNK # ' # expected: 'Received TERM # ' not ok 7 # Failed test in ./opr/softsig-t at line 60. # got: 'Received UNK # ' # expected: 'Received USR1 # ' not ok 8 # Failed test in ./opr/softsig-t at line 63.
# got: 'Received UNK # ' # expected: 'Received USR2 # ' ok 9 - Helper exited on KILL signal. ok 10 - Helper exited on SEGV signal. ok 11 # skip Skipping buserror test; SIGBUS constant is not defined. # Looks like you failed 3 tests of 11. Change-Id: I863cc9f3650c4a5e9ac9159d90e063b986a8460a Reviewed-on: https://gerrit.openafs.org/12367 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1d8cb56999a4ab25ae4cbc8e8a688b8100aedd3b Author: Neale Ferguson Date: Thu Dec 8 11:47:09 2016 -0500 s390: desupport 32-bit Linux kernels on s390/s390x Remove the obsolete and custom lwp assembler for the s390 and s390x architectures. That assembler is no longer needed since 32-bit mainframe Linux distributions are no longer supported and are very unlikely to be in use. The generic process.default.s is sufficient for modern 64-bit Linux distributions on s390/s390x. [mmeffie@sinenomine.net: commit message wording] Change-Id: I654b10dfc257e7de90c9a50048982427276f4d61 Reviewed-on: https://gerrit.openafs.org/12475 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit b5e4e8c14130f601bbf43dee5927222ebf7613fa Author: Mark Vitale Date: Tue Jan 12 18:06:51 2016 -0500 afs: fs getcacheparms miscounts dcaches for large files fs getcacheparms issued with the -excessive option tabulates in-memory dcaches ("DCentries") by size. However, any dcache with validPos > 2^31 is miscounted in the 4k-16k bucket. This is caused by a type mismatch between 'validPos' (afs_size_t) and 'size' (int) which leads to a negative value for size by sign-extension. The size comparison "sieve" fails for negative numbers; it skips the first bucket (0-4K) and dumps them in the second one (4k-16k). Move the declaration of 'size' closer to its use, and declare it with the same type as 'validPos' (afs_size_t) so the comparison sieve correctly places these dcaches in the last (>=1M) bucket. 
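Illustrative sketch only, not the code from this change: the miscount described above can be reproduced with a small comparison sieve. The bucket boundaries follow the commit message; the exact shape of the comparisons, the function names bucket_narrow and bucket_wide, and the use of uint64_t as a stand-in for afs_size_t are assumptions made for this example.

```c
#include <stdint.h>

/* Bucket indices: 0 = 0-4k, 1 = 4k-16k, 2 = 16k-64k,
 * 3 = 64k-256k, 4 = 256k-1M, 5 = >=1M. */

/* Sieve over a signed int size: a negative value (what a large offset
 * can turn into after narrowing) fails the first test and is then
 * caught by the second one. */
static int bucket_narrow(int size)
{
    if (size >= 0 && size < 4096)
        return 0;
    if (size < 16384)
        return 1;               /* negative sizes land here */
    if (size < 65536)
        return 2;
    if (size < 262144)
        return 3;
    if (size < 1048576)
        return 4;
    return 5;
}

/* Same sieve with a 64-bit unsigned size: no narrowing, so large
 * offsets fall through to the last bucket. */
static int bucket_wide(uint64_t size)
{
    if (size < 4096)
        return 0;
    if (size < 16384)
        return 1;
    if (size < 65536)
        return 2;
    if (size < 262144)
        return 3;
    if (size < 1048576)
        return 4;
    return 5;
}
```

On common platforms, narrowing an offset such as 3 GiB (0xC0000000) to int yields a negative value, which bucket_narrow() then files under 4k-16k; bucket_wide() places the same offset in the >=1M bucket.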
Change-Id: Ib0d973da92865043a4f1c068de5e9b81bcde2b9a Reviewed-on: https://gerrit.openafs.org/12347 Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c966c0b8414ef0a041b1a8d5261c9eccd4d39d99 Author: Mark Vitale Date: Tue Jan 12 17:50:36 2016 -0500 afs: fs getcacheparms miscounts zero-length dcaches When fs getcacheparms is issued with the -excessive option, it tabulates all in-memory dcaches ("DCentries") by size. dcaches with validPos == 0 were being tabulated in the 4k-16k bucket. Fix the first comparison in the 'sieve' so these dcaches will be counted in the correct 0-4k bucket instead. Introduced by commit 176c2fddb95ced6c13e04e7492fc09b5551f273c Change-Id: I60acb0f115dad9f7951f0b17e5b3e37dc94321b9 Reviewed-on: https://gerrit.openafs.org/12346 Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 7442752ba6ad618bcdf2185f699d90c56838e89e Author: Benjamin Kaduk Date: Mon Dec 5 18:11:22 2016 -0600 Make OpenAFS 1.8.0pre1 Update version strings for the first 1.8.0 prerelease. Change-Id: I4f534c9934f6eb1baac9a784fb7c357b19924fb0 Reviewed-on: https://gerrit.openafs.org/12470 Reviewed-by: Michael Meffie Reviewed-by: Mark Vitale Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit edcafa93b6c4744e0747842a2e115df27e20fd93 Author: Michael Meffie Date: Fri Sep 23 00:22:22 2016 -0500 Update NEWS for 1.8 [kaduk@mit.edu: adjust sorting, rewrap, reword a few entries and remove some entries that will not be applicable] Change-Id: Ifbadc31e3f201e05617a26c12e5e725a5f3c9195 Reviewed-on: https://gerrit.openafs.org/12393 Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 37c47e5da1cfcceb3b14e5a0c4064a6ca5806bd0 Author: Benjamin Kaduk Date: Fri Sep 23 00:14:09 2016 -0500 Import NEWS from openafs-stable-1_6_x The 1.6.x changelog entries have been going directly on the openafs-stable-1_6_x branch for ease of maintenance. 
However, we don't want to skip those changes when mentioning changes in OpenAFS 1.8, so pull back a copy onto master before adding things for 1.8. Change-Id: I545c19db9854300a84295d3ca8b1f301756c38b0 Reviewed-on: https://gerrit.openafs.org/12392 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 35f2b8cd49477b10cf358d853f5864b8ad24ba03 Author: Benjamin Kaduk Date: Tue Dec 6 17:07:40 2016 -0500 Update libafsdep files for in-kernel fortuna Commit 0d67b00ff9db added heimdal's rand-fortuna PRNG to the kernel module on all architectures, even though it is only needed on the small subset that do not provide a cryptographically strong random number generator to kernel module consumers. This was done to ensure that the build infrastructure for it gets regularly exercised by developers. However, not all build infrastructure was exercised at the time of that submission; in particular, the make_libafs_tree.pl script was not tested. This led to a situation where the libafs tree generated by that script omitted several files that were now referenced by the kernel build due to the fortuna import. To remedy the situation, list the additional files that are needed, so that they will be copied into the build area for this class of kernel module builds. Since the libafs-tree functionality is used to build the Debian kernel-module source packages, this fix is needed in order to have a tree that can be built into debian packages without patching. 
Change-Id: I81502fb61d7fc718d337c5f73a51b88f6a433d6a Reviewed-on: https://gerrit.openafs.org/12473 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 85c7d31cf2dacdbcd8a053fdc3f66952e7126528 Author: Anders Kaseorg Date: Tue Dec 6 10:53:40 2016 -0500 src/cf/roken.m4: Escape buildtool_roken correctly Fixes these errors from configure: ./configure: line 32154: LDFLAGS_roken: command not found ./configure: line 32154: LIB_roken: command not found Change-Id: I63b9ade5c8f55948ea2a3f7ae023de4ed9f62341 Reviewed-on: https://gerrit.openafs.org/12472 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 4c03e42f91b36a0bf59398b0f649aa0b31b02975 Author: Andrew Deason Date: Wed Oct 26 16:04:51 2016 -0500 rx: Add rxi_FlushWriteLocked Currently, a couple of places in Rx do this: MUTEX_EXIT(&call->lock); rxi_FlushWrite(call); MUTEX_ENTER(&call->lock); This is a little silly, because if rxi_FlushWrite has anything to do, it just acquires/drops call->lock again. This seems like a very minor performance penalty, but in the right situation it can become more noticeable. Specifically, when an Rx call on the server ends successfully, rx_EndCall will rxi_FlushWrite (to send out the last Rx packet to the client) before marking the call as finished. If the client receives the last Rx packet and starts a new Rx call on the same channel before the server locks the call again, the client can receive a BUSY packet (because it looks like the previous call on the server hasn't finished yet). Nothing breaks, but this means the client waits 3 seconds to retry. This situation can probably happen with various rates of success in almost any situation, but I can see it consistently happen with 'vos move' when running 'vos' on the same machine as the source fileserver. It is most noticeable when moving a large number of small volumes (since you must wait an extra 3+ seconds per volume, where nothing is happening). To avoid this, create a new variant of rxi_FlushWrite, called rxi_FlushWriteLocked. 
This just assumes the call lock is already held by the caller, and avoids one extra lock/unlock pair. This is not the only place where we unlock/lock the call during the rx_EndCall situation described above, but it seems to be easiest to solve, and it's enough (for me) to avoid the 3-second delay in the 'vos move' scenario. Ideally, Rx should be able to atomically 'end' a call while sending out this last packet, but for now, this commit is easy to do. Note that rxi_FlushWrite previously didn't do much of note before locking the call. It did call rxi_FreePackets without the call lock, but calling that with the call lock is also fine; other callers do that. Change-Id: I8f71e86f6c1f6019abea21c205d2b26b7da0d808 Reviewed-on: https://gerrit.openafs.org/12421 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f413fd927af14a9a87034e47125a78eec63e599e Author: Benjamin Kaduk Date: Tue Jan 13 21:39:57 2015 -0500 pts: add some sanity checks in ptuser.c Double-check that when we're expecting two entries back, we actually got two entries, in addition to the RPC return value. Change-Id: I34631ac542667c337ed3268153eb61c70e3fa87e Reviewed-on: https://gerrit.openafs.org/11668 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 32901c58b29ba4ac666f1dba9915cae2c1f03b52 Author: Andrew Deason Date: Mon Mar 9 18:01:29 2015 -0500 LINUX: Don't compile syscall code with keyrings osi_syscall_init() is not currently called if we have kernel keyrings support, since we don't need to set up or alter any syscalls if we have kernel keyrings (we track PAGs by keyrings, and we use ioctls instead of the AFS syscall now). Since we don't call it, this commit makes us also not compile the relevant syscall-related code. This allows new platforms to be added without needing to deal with any platform-specific code for handling 32-bit compat processes and such, since usually we don't need to deal with intercepting syscalls. 
To do this, we just define osi_syscall_init and osi_syscall_cleanup as noops if we have keyrings support. This allows us to reduce the #ifdef clutter in the actual callers. Note that the 'afspag' module does currently call osi_syscall_init unconditionally, but this seems like an oversight. With this change, the afspag module will no longer alter syscalls when we have linux keyrings support. Change-Id: I219b92d89303975765743712587ff897b55a2631 Reviewed-on: https://gerrit.openafs.org/11936 Reviewed-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Perry Ruiter Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a6e96a8f10df738eb9b69227d344a72eb830e02e Author: Michael Meffie Date: Wed Nov 30 08:48:06 2016 -0500 vos: fix vos release -verbose output Fix incorrect vos release -verbose messages introduced by commit 9f4684cd5fac5eacf571b882e965150943383170. The commit 9f4684cd5fac5eacf571b882e965150943383170 did not take into account the change from commit 3fc800be9c702c1a40869908831a9895602909cb in which a partial commit is performed when just new sites are added and the read-write volume was not changed since the previous release. Change-Id: If4b3ab81cd810df2e866d6eca0152f475c5448d6 Reviewed-on: https://gerrit.openafs.org/12455 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 5b28061fb593f5f48df549b07f0ccd848348b93c Author: Marcio Barbosa Date: Mon Nov 28 09:42:44 2016 -0500 afs: release the packets used by rx on shutdown When the OpenAFS client is unmounted on DARWIN, the blocks of packets allocated by RX are released. Historically, the memory used by those packets was never properly released. Before 230dcebcd61064cc9aab6d20d34ff866a5c575ea, only the last block of packets used to be released: ... struct rx_packet *rx_mallocedP = 0; ... void rxi_MorePackets(int apackets) { ... getme = apackets * sizeof(struct rx_packet); p = rx_mallocedP = (struct rx_packet *)osi_Alloc(getme); ... } ... void rxi_FreeAllPackets(void) { ... 
osi_Free(rx_mallocedP, ...); ... } ... As we can see, ‘rx_mallocedP’ is a global pointer that stores the first address of the last allocated block of packets. As a result, when ‘rxi_FreeAllPackets’ is called, only the last block is released. However, 230dcebcd61064cc9aab6d20d34ff866a5c575ea moved the global pointer in question to the end of the last block. As a result, when the OpenAFS client is unmounted on DARWIN, the ‘rxi_FreeAllPackets’ function releases the wrong block of memory. This problem was exposed on OS X 10.12 Sierra where the system crashes when the OpenAFS client is unmounted. To fix this problem, store the address of every single block of packets in a queue and release one by one when the OpenAFS client is unmounted. Change-Id: Ibd6bd1a8bc45bb4802f9381a8e600c20ee85a59e Reviewed-on: https://gerrit.openafs.org/12427 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f591f6fae3d8b8d44140ca64e53bad840aeeeba0 Author: Mark Vitale Date: Mon Nov 7 14:16:50 2016 -0500 dir: do not leak contents of deleted directory entries Deleting an AFS directory entry (afs_dir_Delete) merely removes the entry logically by updating the allocation map and hash table. However, the entry itself remains on disk - that is, both the cache manager's cache partition and the fileserver's vice partitions. This constitutes a leak of directory entry information, including the object's name and MKfid (vnode and uniqueid). This leaked information is also visible on the wire during FetchData requests and volume operations. Modify afs_dir_Delete to clear the contents of deleted directory entries. Patchset notes: This commit only prevents leaks for newly deleted entries. Another commit in this patchset prevents leaks of partial object names upon reuse of pre-existing deleted entries. A third commit in this patchset prevents yet another kind of directory entry leak, when internal buffers are reused to create or enlarge existing directories. 
All three patches are required to prevent new leaks. Two additional salvager patches are also included to assist administrators in the cleanup of pre-existing leaks. [kaduk@mit.edu: style nit for sizeof() argument] Change-Id: Iabaafeed09a2eb648107b7068eb3dbf767aa2fe9 Reviewed-on: https://gerrit.openafs.org/12460 Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit a26c5054ee501ec65db3104f6a6a0fef634d9ea7 Author: Benjamin Kaduk Date: Sun Nov 6 23:29:22 2016 -0600 afs: do not leak stale data in buffers Similar to the previous commit, zero out the buffer when fetching a new slot, to avoid the possibility of leaving stale data in a reused buffer. We are not supposed to write such stale data back to a fileserver, but this is an extra precaution in case of bugs elsewhere -- memset is not as expensive as it was in the 1980s. Change-Id: I344e772e9ec3d909e8b578933dd9c6c66f0a8cf6 Reviewed-on: https://gerrit.openafs.org/12459 Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 70065cb1831dbcfd698c8fee216e33511a314904 Author: Mark Vitale Date: Fri May 13 00:01:31 2016 -0400 dir: fileserver leaks names of file and directories Summary: Due to incomplete initialization or clearing of reused memory, fileserver directory objects are likely to contain "dead" directory entry information. These extraneous entries are not active - that is, they are logically invisible to the fileserver and client. However, they are physically visible on the fileserver vice partition, on the wire in FetchData replies, and on the client cache partition. This constitutes a leak of directory information. Characterization: There are three different kinds of "dead" residual directory entry leaks, each with a different cause: 1. There may be partial name data after the null terminator in a live directory entry. 
This happens when a previously used directory entry becomes free, then is reused for a directory entry with a shorter name. This may be addressed in a future commit. 2. "Dead" directory entries are left uncleared after an object is deleted or renamed. This may be addressed in a future commit. 3. Residual directory entries may be inadvertently picked up when a new directory is created or an existing directory is extended by a 2 KiB page. This is the most severe problem and is addressed by this commit. This third kind of leak is the most severe because the leaked directory information may be from _any_ other directory residing on the fileserver, even if the current user is not authorized to see that directory. Root cause: The fileserver's directory/buffer package shares a pool of directory page buffers among all fileserver threads for both directory reads and directory writes. When the fileserver creates a new directory or extends an existing one, it uses any available unlocked buffer in the pool. This buffer is likely to contain another directory page recently read or written by the fileserver. Unfortunately, the fileserver only initializes the page header fields (and the first two "dot" and "dotdot" entries in the case of a new directory). Any residual entries in the rest of the directory page are now logically "dead", but still physically present in the directory. They can easily be seen on the vice partition, on the wire in a FetchData reply, and on the cache partition. Note: The directory/buffer package used by the fileserver is also used by the salvager and the volserver. Therefore, salvager activity may also leak directory information to a certain extent. The volserver vos split command may also contribute to leaks. Any volserver operation that creates volumes (create, move, copy, restore, release) may also cause minor leaks. These less significant leaks are addressed by this commit as well.
Exploits: Any AFS user authorized to read directories may passively exploit this leak by capturing wire traffic or examining his local cache as he/she performs authorized reads on existing directories. Any leaked data will be for other directories the fileserver had in the buffer pool at the time the authorized directories were created or extended. Any AFS user authorized to write a new directory may actively exploit this leak by creating a new directory, flushing cache, then re-reading the newly created directory. Any leaked data will be for other directories the fileserver had in the buffer pool within the last few seconds. In this way an authorized user may sample current fileserver directory buffer contents for as long as he/she desires, without being detected. Directories already containing leaked data may themselves be leaked, leading to multiple layers of leaked data propagating with every new or extended directory. The names of files and directories are the most obvious source of information in this leak, but the FID vnode and uniqueid are leaked as well. Careful examination of the sequences of leaked vnode numbers and uniqueids may allow an attacker to: - Discern each layer of old directories by observing breaks in consecutive runs of vnode and/or uniqueid numbers. - Infer which objects may reside on the same volume. - Discover the order in which objects were created (vnode) or modified (uniqueid). - Know whether an object is a file (even vnode) or a directory (odd vnode). Prevent new leaks by always clearing a pool buffer before using it to create or extend a directory. Existing leaks on the fileserver vice partitions may be addressed in a future commit. 
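The idea of clearing a pool buffer before reuse can be sketched as follows. This is a simplified illustration under stated assumptions, not the fileserver's directory/buffer code: init_dir_page, page_tail_is_clear, DIR_PAGE_SIZE and the two-field page_header are all hypothetical, and the real page header layout differs.

```c
#include <string.h>

#define DIR_PAGE_SIZE 2048      /* directory pages are 2 KiB */

/* Hypothetical, simplified page header; the real layout differs. */
struct page_header {
    unsigned short pgcount;
    unsigned short tag;
};

/*
 * Prepare a pool buffer for use as a fresh directory page.
 * Zero the whole page first so that entries left over from whatever
 * directory last occupied the buffer cannot survive, then fill in
 * the header.
 */
void init_dir_page(unsigned char *buf, unsigned short tag)
{
    struct page_header hdr;

    memset(buf, 0, DIR_PAGE_SIZE);
    hdr.pgcount = 1;
    hdr.tag = tag;
    memcpy(buf, &hdr, sizeof(hdr));
}

/* Return 1 if every byte after the header is zero, else 0. */
int page_tail_is_clear(const unsigned char *buf)
{
    size_t i;

    for (i = sizeof(struct page_header); i < DIR_PAGE_SIZE; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

Initializing only the header, without the memset, would leave the rest of the page holding whatever the buffer last contained, which is the third kind of leak characterized above.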
Change-Id: Ia980ada6a2b1b2fd473ffc71e9fd38255393b352 Reviewed-on: https://gerrit.openafs.org/12458 Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk commit 1637c4d7c1ce407390f65509a3a1c764a0c06aa6 Author: Benjamin Kaduk Date: Sun Nov 6 15:06:02 2016 -0600 bos: re-add -salvagedirs for use with -all The MR-AFS support code had a -salvagedirs option that was passed through to the salvager (when running, and when -all was used), that was removed in commit a9301cd2dc1a875337f04751e38bba6f1da7ed32 along with the rest of the MR-AFS commands and options. However, it is useful in its own right, so add it back and allow the use of -salvagedirs -all to rebuild every directory on the server. Change-Id: Ifc9c0e4046bf049fe04106aec5cad57d335475e3 Reviewed-on: https://gerrit.openafs.org/12457 Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 9e66234951cca3ca77e94ab431f739e85017a23a Author: Michael Meffie Date: Sun Nov 6 14:31:22 2016 -0600 dafs: honor salvageserver -salvagedirs Do not ignore the -salvagedirs option when given to the salvageserver. When the salvageserver is running with this option, all directories will be rebuilt by salvages spawned by the dafs salvageserver, including all demand attach salvages and salvages of individual volumes initiated by bos salvage. This does not affect the whole partition salvages initiated by bos salvage -all. 
Change-Id: I4dd515ffa8f962c61e922217bee20bbd88bcd534 Reviewed-on: https://gerrit.openafs.org/12456 Reviewed-by: Mark Vitale Tested-by: Mark Vitale Reviewed-by: Michael Meffie Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3704fc6f2e6716d95446cd10aa2ec798be13472c Author: Anders Kaseorg Date: Fri Nov 4 20:17:32 2016 -0400 Remove NULL checks for AFS_NONNULL parameters Recent GCC warns about opr_Assert(p != NULL), where p is an __attribute__((__nonnull__)) parameter, just like clang did before those clang warnings were silenced by 11852, 11853. Now, we could go and add more autoconf tests and pragmas to silence the GCC versions of these warnings. However, I maintain that silencing the warnings is the wrong approach. The asserts in question have no purpose. They do not add any safety, because GCC and clang are optimizing them away at compile time (without proof!—they take the declaration at its word that NULL will never be passed). Just remove them. Fixes these warnings (errors with --enable-checking) from GCC 6.2: In file included from casestrcpy.c:17:0: casestrcpy.c: In function ‘opr_lcstring’: casestrcpy.c:26:31: error: nonnull argument ‘d’ compared to NULL [-Werror=nonnull-compare] opr_Assert(s != NULL && d != NULL); ^ /…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’ do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^~ casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(s != NULL && d != NULL); ^~~~~~~~~~ casestrcpy.c:26:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare] opr_Assert(s != NULL && d != NULL); ^ /…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’ do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^~ casestrcpy.c:26:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(s != NULL && d != NULL); ^~~~~~~~~~ casestrcpy.c: In function ‘opr_ucstring’: casestrcpy.c:46:31: error: nonnull argument ‘d’ compared to NULL 
[-Werror=nonnull-compare] opr_Assert(s != NULL && d != NULL); ^ /…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’ do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^~ casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(s != NULL && d != NULL); ^~~~~~~~~~ casestrcpy.c:46:18: error: nonnull argument ‘s’ compared to NULL [-Werror=nonnull-compare] opr_Assert(s != NULL && d != NULL); ^ /…/openafs/include/afs/opr.h:28:15: note: in definition of macro ‘__opr_Assert’ do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^~ casestrcpy.c:46:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(s != NULL && d != NULL); ^~~~~~~~~~ casestrcpy.c: In function ‘opr_strcompose’: /…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘buf’ compared to NULL [-Werror=nonnull-compare] do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^ /…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’ # define opr_Assert(ex) __opr_Assert(ex) ^~~~~~~~~~~~ casestrcpy.c:98:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(buf != NULL); ^~~~~~~~~~ kalocalcell.c: In function ‘ka_CellToRealm’: /…/openafs/include/afs/opr.h:28:12: error: nonnull argument ‘realm’ compared to NULL [-Werror=nonnull-compare] do {if (!(ex)) opr_AssertionFailed(__FILE__, __LINE__);} while(0) ^ /…/openafs/include/afs/opr.h:37:25: note: in expansion of macro ‘__opr_Assert’ # define opr_Assert(ex) __opr_Assert(ex) ^~~~~~~~~~~~ kalocalcell.c:117:5: note: in expansion of macro ‘opr_Assert’ opr_Assert(realm != NULL); ^~~~~~~~~~ Change-Id: I6fd618ed49255d7b3de2f8f3424d9659890829c0 Reviewed-on: https://gerrit.openafs.org/12442 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 822ca15a0e760ad9f2c04cd177ca5634f85ee8d6 Author: Dave Botsch Date: Thu Nov 17 13:22:17 2016 -0500 Mac OS Sierra deprecates syscall() The syscall() function has been deprecated in MacOS 10.12 - Sierra. 
After discussions with developers, it would appear that syscall() isn't really needed anymore, so we can just do away with it. Change-Id: I60e4220168b097bbae7a5ebaceb2d32276aad3e5 Reviewed-on: https://gerrit.openafs.org/12452 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 74f837fd943ddfa20d349a83d6286a0183cb4663 Author: Dave Botsch Date: Thu Nov 3 12:22:21 2016 -0400 Define OSATOMIC_USE_INLINED to get usable atomics on DARWIN In Mac OS 10.12, legacy interfaces for atomic operations have been deprecated. Defining OSATOMIC_USE_INLINED gets us inline implementations of the OSAtomic interfaces in terms of the primitives. This is a transition convenience. Also indent preprocessor directives within the main DARWIN block to improve readability. Change-Id: Id10ae007d5427486f1b0a307a04a90f263201150 Reviewed-on: https://gerrit.openafs.org/12433 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f5f057ce8198480fb9c67f2a8c8eee906f8a7c4a Author: Michael Meffie Date: Thu Jul 7 15:51:18 2016 -0400 doc: update information about vlserver logging Mention that the vlserver -d option can be used to set the initial logging level. Thanks to Mark Vitale for the suggestion. Change-Id: Ia17a2063432343c2cf78e1b01c5897751625aae8 Reviewed-on: https://gerrit.openafs.org/12324 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 37db7985fde9e6a5e71ae628d0b7124a27bf31c3 Author: Michael Meffie Date: Sat Nov 5 12:42:19 2016 -0400 SOLARIS: convert from ancient _depends_on to ELF dependencies The ancient way of declaring module dependencies with _depends_on has been deprecated since SunOS 2.6 (circa 1996). The presence of the old _depends_on symbol triggers a warning message on the console starting with Solaris 12, and the kernel runtime loader (krtld) feature of using the _depends_on symbol to load dependencies may be removed in a future version of Solaris. Convert the kernel module from the ancient _depends_on method to modern ELF dependencies.
Remove the old _depends_on symbol and specify the -dy and -N linker options to set the ELF dependencies at link time, as recommended in the Solaris device driver developer guidelines [1]. This commit does not change the declared dependencies, which may be vestiges of ancient afs versions. [1]: http://docs.oracle.com/cd/E19455-01/805-7378/6j6un037u/index.html#loading-16 Change-Id: Ic5abd82108cd59c0796a8d7659ddaffa791dbeee Reviewed-on: https://gerrit.openafs.org/12453 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3af0460a4a6d7bf22e1789fd9e375659e20c3a55 Author: Mark Vitale Date: Mon Nov 21 13:25:40 2016 -0500 doc: correct help for 'bos getlog' -restricted mode Commit f085951d39c0d6c1e6a626177c30235704317600 introduced an error in the bos getlog helpfile. Modify the helpfile to describe the actual restrictions imposed by -restricted mode. Change-Id: I8d8fedb558a1bdbd55d80046b2011f3aacc71b3f Reviewed-on: https://gerrit.openafs.org/12454 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit c3bbf0b4444db88192eea4580ac9e9ca3de0d286 Author: Mark Vitale Date: Thu Aug 4 18:42:27 2016 -0400 LINUX: do not use d_invalidate to evict dentries When working within the AFS filespace, commands which access large numbers of OpenAFS files (e.g., git operations and builds) may result in active files (e.g., the current working directory) being evicted from the dentry cache. One symptom of this is the following message upon return to the shell prompt: "fatal: unable to get current working directory: No such file or directory" Starting with Linux 3.18, d_invalidate returns void because it always succeeds. Commit a42f01d5ebb13da575b3123800ee6990743155ab adapted OpenAFS to cope with the new return type, but not with the changed semantics of d_invalidate. Because d_invalidate can no longer fail with -EBUSY when invoked on an in-use dentry, OpenAFS must no longer trust it to preserve in-use dentries.
Modify the dentry eviction code to use a method (d_prune_aliases) that does not evict in-use dentries. Change-Id: I1826ae2a89ef4cf6b631da532521bb17bb8da513 Reviewed-on: https://gerrit.openafs.org/12363 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 9d4be0bd01696768602a313f627a802b358b5885 Author: Marcio Barbosa Date: Fri Nov 11 13:21:58 2016 -0800 macos: do not quit prefpane unexpectedly If the user opens the OpenAFS preference pane and chooses the Mounts tab, the preference pane crashes. To fix the problem, do not assume that we can cast an NSDictionary object to NSMutableDictionary. Change-Id: I3b5f6cb324a6b53c6b53606f71185f61450ee793 Reviewed-on: https://gerrit.openafs.org/12446 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 3e8529b6efec4625a4c67e6779fc8367291461a0 Author: Mark Vitale Date: Wed May 18 00:36:12 2016 -0400 salvager: fix error message for invalid volumeid If the specified volumeid is invalid (e.g. volume name was specified instead of volume number), the error is reported via Log(). However, commit 24fed351fd13b38bfaf9f278c914a47782dbf670 moved the log opening logic from before this check to after it, effectively making this Log() call a no-op. Instead, use fprintf to issue the error message. Change-Id: I488bc93b178c7973e48d7c9ef4e7ecde9ba62696 Reviewed-on: https://gerrit.openafs.org/12288 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e8f066dede63648d7d54c632e0e257c80db6effa Author: Anders Kaseorg Date: Fri Nov 4 20:48:02 2016 -0400 src/tools/rxperf/rxperf.c: Fix misleading indentation Fixes these warnings (errors with --enable-checking) from GCC 6.2: rxperf.c: In function ‘rxperf_server’: rxperf.c:930:4: error: this ‘if’ clause does not guard...
[-Werror=misleading-indentation] if (ptr && *ptr != '\0') ^~ rxperf.c:932:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ break; ^~~~~ rxperf.c: In function ‘rxperf_client’: rxperf.c:1102:4: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (ptr && *ptr != '\0') ^~ rxperf.c:1104:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ break; ^~~~~ Change-Id: I4e8e1f75ec14fa9f95441275cfc136adbb448e9e Reviewed-on: https://gerrit.openafs.org/12440 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 85cf397ec18ecfde36433fb65e5d91ecd325b76e Author: Anders Kaseorg Date: Fri Nov 4 20:46:22 2016 -0400 src/gtx/curseswindows.c: Fix misleading indentation Fixes these warnings (errors with --enable-checking) from GCC 6.2: curseswindows.c: In function ‘gator_cursesgwin_drawchar’: curseswindows.c:574:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (params->highlight) ^~ curseswindows.c:576:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ if (code) ^~ curseswindows.c:579:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (params->highlight) ^~ curseswindows.c:581:9: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ if (code) ^~ curseswindows.c: In function ‘gator_cursesgwin_drawstring’: curseswindows.c:628:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (params->highlight) ^~ curseswindows.c:630:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ if (code) ^~ curseswindows.c:633:5: error: this ‘if’ clause does not guard... 
[-Werror=misleading-indentation] if (params->highlight) ^~ curseswindows.c:635:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ if (code) ^~ Change-Id: Ib53eb5755eebb5e22a5414ced8a2540825b41e15 Reviewed-on: https://gerrit.openafs.org/12439 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 86153c65cad10b0459d0f87bbe227a1ebe40f4ea Author: Anders Kaseorg Date: Fri Nov 4 20:44:00 2016 -0400 src/afsd/afsd.c: Fix misleading indentation Fixes these warnings (errors with --enable-checking) from GCC 6.2: afsd.c: In function ‘afsd_run’: afsd.c:2176:6: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (enable_rxbind) ^~ afsd.c:2178:3: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ afsd_syscall(AFSOP_ADVISEADDR, code, addrbuf, maskbuf, mtubuf); ^~~~~~~~~~~~ afsd.c:2487:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (afsd_debug) ^~ afsd.c:2490:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ afsd_syscall(AFSOP_GO, 0); ^~~~~~~~~~~~ Change-Id: Ic4769046dc06bb58d61428ac08ea12a2f70743e9 Reviewed-on: https://gerrit.openafs.org/12438 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 38040db3bb7b5ae4d5b2c710da17ba60abe39935 Author: Anders Kaseorg Date: Fri Nov 4 20:39:34 2016 -0400 src/ubik/uinit.c: Fix misleading indentation Fixes this warning (error with --enable-checking) from GCC 6.2: uinit.c: In function ‘internal_client_init’: uinit.c:96:2: error: this ‘if’ clause does not guard... 
[-Werror=misleading-indentation] if (code) ^~ uinit.c:98:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ return code; ^~~~~~ Change-Id: Ib03c4128e206194fa5c34fa3c49bb06beb70e6d0 Reviewed-on: https://gerrit.openafs.org/12437 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0aeb8c17a2701169ddb7397d951c73cf361087c8 Author: Anders Kaseorg Date: Fri Nov 4 20:38:08 2016 -0400 src/rx/rx_packet.c: Fix misleading indentation Fixes these warnings (errors with --enable-checking) from GCC 6.2: rx_packet.c: In function ‘rxi_ReceiveDebugPacket’: rx_packet.c:2009:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (rx_stats_active) ^~ rx_packet.c:2011:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ s = (afs_int32 *) & rx_stats; ^ rx_packet.c:2017:9: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (rx_stats_active) ^~ rx_packet.c:2019:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ rxi_SendDebugPacket(ap, asocket, ahost, aport, istack); ^~~~~~~~~~~~~~~~~~~ Change-Id: Iaecedf63e9ed393607b8700b892aea7678c774b3 Reviewed-on: https://gerrit.openafs.org/12436 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit bd70a176c19c09c49c6c3c01ea088ca947c45966 Author: Anders Kaseorg Date: Fri Nov 4 20:36:51 2016 -0400 src/rxgen/rpc_parse.c: Fix misleading indentation Fixes this warning (error with --enable-checking) from GCC 6.2: rpc_parse.c: In function ‘analyze_ProcParams’: rpc_parse.c:861:5: error: this ‘if’ clause does not guard... 
[-Werror=misleading-indentation] if (tokp->kind != TOK_RPAREN) ^~ rpc_parse.c:863:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘if’ *tailp = decls; ^ Change-Id: Ia63311c20eb8cd96123ba97b0bf7621b82956e79 Reviewed-on: https://gerrit.openafs.org/12435 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a7cc505d3be81e6aaf755bcc83d0dbcab85dbdad Author: Anders Kaseorg Date: Fri Nov 4 20:18:52 2016 -0400 regen.sh: Use libtoolize -i, and .gitignore generated build-tools Recent libtoolize actually deletes build-tools/missing, which Git was treating as a change to the working copy. Besides, we should let libtoolize copy in its more recent version of config.guess, config.sub, and install-sh. Change-Id: If21f22649e1e1015ad3bcfbf6d34f297b56993a1 Reviewed-on: https://gerrit.openafs.org/12434 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 22933e02e2510f25b79230964f135571c7bfe710 Author: Benjamin Kaduk Date: Thu Oct 27 17:27:26 2016 -0500 Reformat src/afs/LINUX/osi_vcache.c Apply the GNU indent options from CODING, with manual adjustments to leave jump labels in column zero. Also rename and mark static a function-local helper function. Change-Id: I50b8300b675b2a3f76ae743136b204473ac0c8b0 Reviewed-on: https://gerrit.openafs.org/12422 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 742643e306929ac979ab69515a33ee2a3f2fa3fa Author: Mark Vitale Date: Thu Aug 4 18:18:15 2016 -0400 LINUX: split dentry eviction from osi_TryEvictVCache To make osi_TryEvictVCache clearer, and to prepare for a future change in dentry eviction, split the dentry eviction logic into its own routine osi_TryEvictDentries. No functional difference should be incurred by this commit. 
Change-Id: I5b255fd541d09159d70f8d7521ca8f2ae7fe5c2b Reviewed-on: https://gerrit.openafs.org/12362 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot Reviewed-by: Joe Gorse commit 0bed87a15db11bccb693b3a54f704ee5751ae553 Author: Marcio Barbosa Date: Sun Oct 23 12:52:49 2016 -0700 macos: packaging support for MacOS X 10.12 This commit introduces the new set of changes / files required to successfully create the dmg installer on OS X 10.12 "Sierra". Change-Id: I8e715240c4b230c39c26c418324c0184268e1f73 Reviewed-on: https://gerrit.openafs.org/12420 Reviewed-by: Joe Gorse Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0bdf750a962a81b9b2e61387d7a3340dabb13395 Author: Marcio Barbosa Date: Tue Oct 25 19:33:38 2016 -0700 macos: add support for MacOS 10.12 This commit introduces the new set of changes / files required to successfully build the OpenAFS source code on OS X 10.12 "Sierra". Change-Id: I42326cd271d84735188f9e3003e292afe5ee34be Reviewed-on: https://gerrit.openafs.org/12419 Reviewed-by: Joe Gorse Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8aeb711eeaa5ddac5a74c354091e2d4f7ac0cd63 Author: Mark Vitale Date: Thu Oct 20 00:49:37 2016 -0400 Linux 4.9: inode_change_ok() becomes setattr_prepare() Linux commit 31051c85b5e2 "fs: Give dentry to inode_change_ok() instead of inode" renames and modifies inode_change_ok(inode, attrs) to setattr_prepare(dentry, attrs). Modify OpenAFS to cope. Change-Id: I72f8dfbdbd25d7c775e9c35116e323ea4359e95c Reviewed-on: https://gerrit.openafs.org/12418 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f21e3ef8ce5093b4e0578d29666f76bd99aef1a2 Author: Mark Vitale Date: Fri Sep 16 19:01:19 2016 -0400 Linux 4.9: inode_operation rename now takes flags In Linux 3.15 commit 520c8b16505236fc82daa352e6c5e73cd9870cff, inode_operation rename2() was added. It takes the same arguments as rename(), with an added flags argument supporting the following values: RENAME_NOREPLACE: if "new" name exists, fail with -EEXIST. 
Without this flag, the default behavior is to replace the "new" existing file. RENAME_EXCHANGE: exchange source and target; both must exist. OpenAFS never implemented a .rename2() routine because it was optional when introduced at Linux v3.15. In Linux 4.9-rc1 the following commits remove the last in-tree uses of .rename() and converts .rename2() to .rename(). aadfa8019e81 vfs: add note about i_op->rename changes to porting 2773bf00aeb9 fs: rename "rename2" i_op to "rename" 18fc84dafaac vfs: remove unused i_op->rename 1cd66c93ba8c fs: make remaining filesystems use .rename2 e0e0be8a8355 libfs: support RENAME_NOREPLACE in simple_rename() f03b8ad8d386 fs: support RENAME_NOREPLACE for local filesystems With these changes, it is now mandatory for OpenAFS afs_linux_rename() to accept a 5th flag argument. Add an autoconfig test to determine the signature of .rename(). Use this information to implement afs_linux_rename() with the appropriate number of arguments. Implement "toleration support" for the flags option by treating a zero flag as a normal rename; if any flags are specified, return -EINVAL to indicate the OpenAFS filesystem does not yet support any flags. Change-Id: I165d2b7956942446d97beda8504ac1ed5185a036 Reviewed-on: https://gerrit.openafs.org/12391 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 8e81b182e36cde28ec5708e5fcbe56e4900b1ea3 Author: Mark Vitale Date: Wed Sep 14 18:01:22 2016 -0400 Linux 4.9: deal with demise of GROUP_AT Linux commit 81243eacfa40 "cred: simpler, 1D supplementary groups" refactors the group_info struct, removing some members (which OpenAFS references only through the GROUP_AT macro) and adding a gid member. The GROUP_AT macro is also removed from the tree. Add an autoconfigure test for the new group_info member gid and define a replacement GROUP_AT macro to do the right thing under the new regime. 
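The replacement GROUP_AT macro described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `demo_group_info`, `demo_gid_t`, and `STRUCT_GROUP_INFO_HAS_GID` are hypothetical stand-ins for the kernel's struct group_info, kgid_t, and whatever symbol the configure test actually defines.

```c
/* Hypothetical stand-in for the post-4.9 struct group_info, which holds
 * the supplementary groups in one flat array named gid. */
typedef unsigned int demo_gid_t;

struct demo_group_info {
    int ngroups;
    demo_gid_t gid[8];
};

/* Assumed name for the symbol a configure test would define when the
 * gid member exists. */
#define STRUCT_GROUP_INFO_HAS_GID 1

/* Supply GROUP_AT only when the tree no longer provides it. */
#if defined(STRUCT_GROUP_INFO_HAS_GID) && !defined(GROUP_AT)
# define GROUP_AT(gi, i) ((gi)->gid[(i)])
#endif

/* Callers keep using GROUP_AT exactly as before. */
static demo_gid_t
demo_first_group(struct demo_group_info *gi)
{
    return GROUP_AT(gi, 0);
}
```

Because the macro expands to an lvalue, existing callers that both read and assign through GROUP_AT would continue to compile unchanged.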
Change-Id: I85a52c0ae0d91fc141a523f443a4ffc05eb72a2b Reviewed-on: https://gerrit.openafs.org/12390 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit e17cd5df703b8a924591f92c76636dd9e0d9eaf9 Author: Anders Kaseorg Date: Sun Oct 9 06:39:12 2016 -0400 tests/util/ktime-t.c: Specify EST offset in TZ This fixes test failures observed on new Debian build servers that no longer install tzdata by default. As the tests expect, EST is defined as UTC−05:00 with no daylight saving time. Change-Id: Ida8cb33687b5d87761cb0422e446afd99246d47a Reviewed-on: https://gerrit.openafs.org/12414 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1cd86de2912af9ad709d2d7cf8aa35d5d28fb6b3 Author: Yadav Yadavendra Date: Mon Oct 3 15:25:08 2016 -0400 afs: afs_linux_write_end only commit copied In afs_linux_write_end() only commit the number of bytes actually copied to the page. Change-Id: I3576a28302d35917019d369adc9d1013ad5870c5 Reviewed-on: https://gerrit.openafs.org/12409 Reviewed-by: Jeffrey Altman Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 0fdbc0754be58a50f60e3187fc4b34f057faf198 Author: Daria Phoebe Brashear Date: Sun Sep 25 19:45:48 2016 -0400 git: add a mailmap file I'd like the source tree to stop deadnaming me, so, sharing this change to do it Change-Id: Iee65d1c8e7e695ea939485db5b148615e052f953 Reviewed-on: https://gerrit.openafs.org/12394 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 2fe3a28c6ec0ff9d19ddec5500b3a5e69b483210 Author: Michael Meffie Date: Mon Aug 22 19:53:34 2016 -0400 tests: avoid passing NULL strings to vprintf Some libc implementations will crash when NULL string arguments are given to *printf. Avoid passing NULL string arguments in the make check tests that did so, and pass the string "(null)" instead. 
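A minimal sketch of the NULL-avoidance pattern described above. The helper names `demo_safe_str` and `demo_format_name` are hypothetical and are not taken from the OpenAFS test suite.

```c
#include <stdio.h>

/* Substitute a printable placeholder so that a NULL pointer is never
 * handed to a %s conversion. */
static const char *
demo_safe_str(const char *s)
{
    return (s != NULL) ? s : "(null)";
}

/* Example caller: formats a possibly-NULL name into buf and returns
 * the number of characters snprintf would have written. */
static int
demo_format_name(char *buf, size_t len, const char *name)
{
    return snprintf(buf, len, "name=%s", demo_safe_str(name));
}
```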
Change-Id: I65f11a3eef88d1c7b210c867ae0c40018160f55a Reviewed-on: https://gerrit.openafs.org/12377 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 4e0bc086d6d09db66b3dd26d221ff712ff351386 Author: Michael Meffie Date: Sat Aug 6 10:41:24 2016 -0400 afsd: fix afsd -help crash afsd crashes after the usage is displayed with the -help option. $ afsd -help Usage: ./afsd [-blocks <1024 byte blocks in cache>] [-files ] ... Segmentation fault (core dumped) The backtrace shows the crash occurs when calling afsconf_Open() with an invalid pointer argument, even though afsconf_Open() is not even needed when -help is given. (gdb) bt #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:32 #1 0x00007ffff726fc36 in *__GI___strdup (s=0x0) at strdup.c:42 #2 0x0000000000408383 in afsconf_Open (adir=0x0) at cellconfig.c:444 #3 0x00000000004054d5 in afsd_run () at afsd.c:1926 #4 0x0000000000407dc5 in main (argc=2, argv=0x7fffffffe348) at afsd_kernel.c:577 afsconf_Open() is called with an uninitialized pointer because commit d72df5a18e0bb8bbcbf23df3e8591072f0cdb770 changed the libcmd cmd_Dispatch() to return 0 after displaying the command usage when the -help option is specified. (That fix was needed for scripts which use the -help option to inspect command options with the -help option.) The afsd_kernel main function then incorrectly calls the afsd_run() function, even though mainproc() was not called, which sets up the afsd option variables. The afsconf_Open() is the first function we call in afsd_run(). Commit f77c078a291025d593f3170c57b6be5f257fc3e5 split afsd into afsd.c and afsd_kernel.c to support libuafs (and fuse). This split the parsing of the command line arguments and the running of the afsd command into two functions. The mainproc(), which originally did both, was split into two functions; one (still called mainproc) to check the option values given and setup/auto-tune values, and another (called afsd_run) to do the actual running of the afsd command. 
The afsd_parse() function was introduced as a wrapper around cmd_Dispatch() which "dispatches" mainproc. With this fix, take the opportunity to rename mainproc() to the now more accurately named CheckOptions() and change afsd_parse() to parse the command line options with cmd_Parse(), instead of abusing cmd_Dispatch(). Change the main function to avoid running afsd_run() when afsd_parse() returns the CMD_HELP code which indicates the -help option was given. afsd.fuse splits the command line arguments into afsd recognized options and fuse options (everything else), so only afsd recognized arguments are passed to afsd_parse(), via uafs_ParseArgs(). The -help argument is processed as part of that splitting of arguments, so afsd.fuse never passes -help as an argument to afsd_parse(). This means we do not need to check for CMD_HELP as a return value from uafs_ParseArgs(). But since this is all a bit confusing, at least check the return value in uafs_ParseArgs(). Change-Id: If510f8dc337e441c19b5e28685e2e818ff57ef5a Reviewed-on: https://gerrit.openafs.org/12360 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 644d3b6ec4afb5e9c0f35f48058d20f791806a9d Author: Michael Meffie Date: Tue Aug 2 16:52:42 2016 -0400 revert: "LINUX: Fix oops during negative dentry caching" Commit fd23587a5dbc9a15e2b2e83160b947f045c92af1 was done to fix an oops when parent_vcache_dv() was called without the GLOCK held. Since the lockless code paths have been removed, and parent_vcache_dv() is always called with the GLOCK held, revert the extra locked flag argument and the calls to obtain and release the GLOCK within parent_vcache_dv().
Change-Id: I21c3272ec4ed5d4fa1a746a0f783cccfc14e0c22 Reviewed-on: https://gerrit.openafs.org/12354 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 74d4fea1683ccd5b4db53709fc2b5053062ea052 Author: Andrew Deason Date: Wed Mar 4 14:10:23 2015 -0600 Revert "Lockless path through afs_linux_dentry_revalidate" This reverts commit 3ecd65d3375f0a4fa4c28f9b59cdf6a1f6fd51b8. This commit made it possible to execute afs_linux_dentry_revalidate without taking the GLOCK under some circumstances. However, it achieved this by examining structure members outside of the GLOCK that were previously only examined under the GLOCK (such as vcp->f.states and vcp->f.m.DataVersion). While that does of course improve performance, it is not known to be completely safe. Revert this commit so we may implement a fastpath through afs_linux_dentry_revalidate using more trusted lockless techniques (atomics, RCU, etc). Change-Id: Ia3ca2cf53f97244e4e548db7c1caf218c16aca5c Reviewed-on: https://gerrit.openafs.org/11793 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a13ea7038ebe262ba1e5387f4a3b12897bd8822b Author: Andrew Deason Date: Fri Feb 13 13:11:09 2015 -0600 opr: Add opr_StaticAssert Add a static assert macro, for asserting that certain build-time expressions are true. Change-Id: I33b0e7168f041e8e8406710d05689e044af45fad Reviewed-on: https://gerrit.openafs.org/11792 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 7b99f2e4a8b7071930a5851c5f6c6ab6ddc0dd57 Author: Andrew Deason Date: Thu Jun 26 15:47:46 2014 -0700 afs: Create afs_SetDataVersion Several different places in the codebase change avc->f.m.DataVersion for a particular vcache, when we've noticed that the DV for the vcache has changed. Consolidate all of these occurrences into a single afs_SetDataVersion function, to make it easier to change what happens when we notice a change in DV number. This should incur no behavior change; it is just simple code reorganization. 
Change-Id: I5dbf2678d3c4b5a2fbef6ef045a0b5bfa8a49242 Reviewed-on: https://gerrit.openafs.org/11791 Reviewed-by: Marc Dionne Reviewed-by: Daria Phoebe Brashear Reviewed-by: Benjamin Kaduk Reviewed-by: Thomas Keiser Tested-by: BuildBot commit fac0b742960899123dca6016f6ffc6ccc944f217 Author: Andrew Deason Date: Sun May 22 21:54:30 2016 -0500 ubik: Return an error from ContactQuorum when inquorate Currently, when we need to contact all other servers in the ubik quorum (to create a write transaction, and send db changes, etc), we call the ContactQuorum_* family of functions. To contact each server, those functions follow an algorithm like the following pseudocode: { int rcode = 0; int code; int okcalls = 0; for (ts = ubik_servers; ts; ts = ts->next) { if (ts->up) { code = contact_server(ts); if (code) { rcode = code; } else { okcalls++; } } } if (okcalls + 1 >= ubik_quorum) { return 0; } else { return rcode; } } This means that if we successfully contact a majority of ubik sites, we return success, even if some sites returned an error. If most sites fail, then we return an error (we arbitrarily pick the last error we got). This means that in most situations, a successful write transaction is guaranteed to have been transmitted to a majority of ubik sites, so the written data cannot be lost (at least one of the sites that got the new data will be in a future elected quorum). However, if a site is already known to be down (ts->up is 0), then we skip trying to contact that site, but we also don't set any errors. This means that if a majority of sites are already known to be down (ts->up is 0), then we can indicate success for a write transaction, even though the relevant data has not been written to a majority of sites. In that situation, it is possible to lose data. Most of the time this is not possible, since a majority of sites must be 'up' for the sync site to be elected and to allow write transactions at all. 
There are a few ways, though, in which we can get into a situation where most other sites are 'down', but we still let a write transaction go through. An example scenario: Say we have sites A, B, and C. All 3 sites come up at the same time, and A is the lowest IP so it starts an election (after around BIGTIME seconds). Right after A is elected the sync site, sites B and C will have 'lastYesState' set to 0, since site A hasn't yet sent out a beacon as the sync site. A client can then start a write to the ubik database on site A, which site A will allow since it's the sync site (and presumably all the relevant recovery flags are set). Site A will try to contact sites B and C for a DISK_Begin call, but lastYesState is set to 0 on those sites. This will cause DISK_Begin to return UNOQUORUM (urecovery_AllBetter will return 0, because uvote_HaveSyncAndVersion will return 0, because lastYesState is not set). So site A will get a UNOQUORUM error from sites B and C, and so site A will set 'ts->up' to 0 for sites B and C, and will return UNOQUORUM to the client. The client may then try to retry the call (because UNOQUORUM is not treated as a 'global' error in ubikclient.c's ubik_Call_New), or another client write request could come in. Now that 'ts->up' is unset for both sites B and C, we skip trying to contact any remote sites, and the ContactQuorum functions will return success. So the ubik write will go through successfully, but the new data will only be on site A. At this point, if site A crashes, then sites B and C will elect a quorum, and will not have the modifications that were written to site A (so the data written to site A is lost). If site A stays up, then it will go through database recovery, sending the entire database file to sites B and C. In addition, it's very possible in this scenario for a client to write to the database, and then try to read back data and confusingly get a different result. 
For example, if someone issues the following two commands while triggering the above scenario: $ pts createuser testuser $ pts examine testuser If the second command contacts site B or C, then it will always fail, saying that the user doesn't exist (even though the first command succeeded). This is because sites B and C don't have the new data written to site A, at least temporarily. While this confusing behavior is not completely avoidable in ubik (this can always happen 'sometimes' due to network errors and such), with the scenario described here, it happens 100% of the time. The general scenario described above can also happen if sites B and C are suddenly legitimately unreachable from site A, instead of throwing the UNOQUORUM error. All of the steps are pretty much the same, but there is a bit of a delay while we wait for the DISK_Begin call to fail. To fix this, do not let 0 be returned if a quorum has not been reached. In some sense, UNOQUORUM could *always* be returned in that case, but it is more in keeping with historical behavior to return a "real" error if there is one available. It is somewhat questionable whether we should even be propagating errors received from calls like DISK_Begin/DISK_Commit to the ubik client (e.g. if we get a -1 from trying to contact a remote site, we return -1 to the client, so the client may think it couldn't reach the site at all). But this commit does not change any of that logic, and should only change behavior when a majority of sites have 'ts->up' unset. A later commit might effect the change to always return UNOQUORUM and ignore the actual error values from the DISK_ calls, but that is not needed to fix the immediate issue. An important note: Before this commit, there was a window of about 15 seconds after a sync site is elected where a write to the ubik db would appear to be successful, but would only modify the ubik db on the sync site. (Details described above.) 
With this commit, writes during that 15-second window will instead fail, because we cannot guarantee that we won't lose that data. If someone relies on 'udebug' data from the sync site to let them know when writes will go through successfully, this commit could appear to cause new errors. [kaduk@mit.edu: transfer long commit message describing the issue from an alternative fix, and tidy up accordingly] Change-Id: If6842d7122ed4d137f298f0f8b7f20350b1e9de6 Reviewed-on: https://gerrit.openafs.org/12289 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 64cc7f0ca7a44bb214396c829268a541ab286c69 Author: Andrew Deason Date: Wed May 14 19:56:58 2014 -0500 afs: Create afs_StaleVCache In numerous different places in the code, we do something like this to mark a vcache as stale: ObtainWriteLock(&afs_xcbhash, somenumber); avc->f.states &= ~CStatd; afs_DequeueCallback(avc); ReleaseWriteLock(&afs_xcbhash); if (avc->f.fid.Fid.Vnode & 1 || (vType(avc) == VDIR)) osi_dnlc_purgedp(avc); There are some variations here and there, but all locations usually involve at least some code like that. But they all do the same general thing: invalidate a vcache so we hit the net the next time we need that vcache. In order to make it easier to modify what happens when we invalidate a vcache, and just to improve the code, take all of these instances and put the functionality in a single function, called afs_StaleVCache, which marks the vcache as 'stale'. To handle a few different situations that must be handled, we have some flags that can also be passed to the new function. These are primarily necessary to handle variations in the circumstances under which we hit this code path; for instance, we may already have afs_xcbhash locked, or we may be invalidating the entire osidnlc (if we're invalidating vcaches in bulk, for example). This should result in the same general behavior in all cases. 
The only slight difference in a few cases is that we hold locks for a few more operations than we used to; for example, we may clear an osidnlc entry while holding the vcache lock. But these are minor and shouldn't result in any actual differences in behavior. So, this commit should just be code reorganization and should incur no behavior change. However, this reorganization is complex, and should not be considered a simple risk-free refactoring. [kaduk@mit.edu: implement Tom Keiser's suggestion of a third argument to afs_StaleVCacheFlags, add AFS_STALEVC_CLEARCB and AFS_STALEVC_SKIP_DNLC_FOR_INIT_FLUSHED] Change-Id: I2b2f606c56d5b22826eeb98471187165260c7b91 Reviewed-on: https://gerrit.openafs.org/11790 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 733dcec01784617e3354c2b8b29f50b09464a4bb Author: Matt K. Light Date: Tue Sep 13 14:18:38 2016 -0500 Fix compile error for PPC64 gcc 6.1.1 Cast function pointer stubs to remove compile errors on Fedora 24 PPC64 with gcc 6.1.1. FIXES 133407 Change-Id: I59a191f7f8123ce17bfa6175b989ae14b5eab5a4 Reviewed-on: https://gerrit.openafs.org/12386 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit f2f5a7bca5e77971ef71bf2ddabf93868fe79f1d Author: Michael Meffie Date: Wed Aug 17 10:57:48 2016 -0400 CODING: one-line if statements should not have braces Update the style guide with a declaration of the prevailing and preferred brace style for one-line if statements and loops. Provide an example and counter-example. Change-Id: Iafeea977203b76c0e67385779fb4ed57f3c6699a Reviewed-on: https://gerrit.openafs.org/12370 Reviewed-by: Stephan Wiesand Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f0fa5a5327c7440070d34127a124d6b7eb4bd32d Author: Michael Meffie Date: Thu Jun 11 11:25:51 2015 -0400 libafs: update the volume setup time when the vldb is rechecked The vldb is rechecked when the fileserver returns certain error codes, such as VMOVED.
When the vldb is rechecked, update the volume setupTime to reflect the most recent time the volume vldb information is known to be correct. Be sure the VRecheck flag is cleared after checking the vldb, since the volume write lock was dropped after finding the volume. Change-Id: I0ba389ee408de602e0059fbe8013012501c337d3 Reviewed-on: https://gerrit.openafs.org/11897 Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit ee08dbe37d9db4fe314bd88b9280bf73c92c37bd Author: Andrew Deason Date: Sat Aug 8 16:13:54 2015 -0500 afs: Make ONEGROUP_ENV not Linux-specific The functionality in AFS_LINUX26_ONEGROUP_ENV does not really need to be Linux-specific (it's just only implemented for Linux right now). Rename it to AFS_PAG_ONEGROUP_ENV, and remove some Linux-specific checks when checking for "onegroup" PAG GIDs. [mmeffie@sinenomine.net: Move AFS_PAG_ONEGROUP_ENV to param.h] Change-Id: I01d29fff309337ae95b9b6c65db3d2212cf4bf89 Reviewed-on: https://gerrit.openafs.org/11978 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b39095c3a7e1c631bb17816b7e707bc21a6b8c71 Author: Michael Meffie Date: Fri Sep 9 16:23:46 2016 -0400 afs: define NUMPAGGROUPS once Define the number of groups per PAG in one place. Prefix the define with AFS_ to avoid name conflicts in the future (unlikely as it may be). Fix the misnamed AFSPAGGGROUPS symbol in linux implementation of two groups per PAG. Change-Id: I78bb42913f2a5d84c9f323f17dc36d800d8acb84 Reviewed-on: https://gerrit.openafs.org/12382 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit 0028ea92ad3e7aac6a4c51f63703a4d9d7b9dcd6 Author: Michael Meffie Date: Wed Apr 29 12:00:24 2015 -0400 afs: add afsd -inumcalc option This commit adds the afsd -inumcalc command line switch to specify the inode number calculation method in a platform neutral way. 
Inode numbers reported for files within the AFS filesystem are generated by the cache manager using a calculation which derives a number from a FID. Long ago, a new type of calculation was added which generates inode numbers using a MD5 message digest of the FID. The MD5 inode number calculation variant is computationally more expensive but greatly reduces the chances for inode number collisions. The MD5 calculation can be enabled on the Linux cache manager using the Linux sysctl interface. Other than the sysctl method of selecting the inode calculation type, the MD5 inode number calculation method is not specific to Linux. This change introduces a command-line option which accepts a value to indicate the calculation method, instead of a simple flag to enable MD5 inode numbers. This should allow for new inode calculation methods in the future without the need for additional afsd command-line flags. Two values are currently accepted for -inumcalc. The value of 'compat' specifies the legacy inode number calculation. The value 'md5' indicates that the new MD5 calculation is to be used. Change-Id: I0257c68ca1a32a7a4c55ca8174a4926ff78ddea4 Reviewed-on: https://gerrit.openafs.org/11855 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit c17d14223044936a5de5007052eff3488350e9d4 Author: Michael Meffie Date: Sat Aug 6 12:57:59 2016 -0400 CODING: update style guide for multiline comments Document the preferred style for multiple line comment blocks and give an example. Change-Id: I73d6183da9014a943316e5aea1d43be2acc81ad7 Reviewed-on: https://gerrit.openafs.org/12361 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit aca8ac83bd456862815a7f247e9a7b89583517a8 Author: Benjamin Kaduk Date: Wed Jul 13 18:23:50 2016 -0500 Document minimum supported compiler versions Pick some fairly old versions of clang and gcc and document them as the minimum supported version. This will let us make assumptions about compiler features that are available when using those compilers. 
Change-Id: Ibb8df72c9b12cc7adff39ece9708a428975ba703 Reviewed-on: https://gerrit.openafs.org/12331 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 83a0f2a9ef88e63fbd300fbb436c17ca80c245b4 Author: Anders Kaseorg Date: Mon Jul 25 21:04:59 2016 -0400 Linux 4.7: Follow key_alloc API change Linux v4.7-rc1~124^2~2^2^2~9 adds an eighth optional argument restrict_link. The same commit adds a KEY_ALLOC_BYPASS_RESTRICTION macro, which we test so we can avoid adding another configure test. Change-Id: I83e27b54ba5711124dccaa41de7155be77054f47 Reviewed-on: https://gerrit.openafs.org/12345 Tested-by: BuildBot Reviewed-by: Anders Kaseorg Reviewed-by: Benjamin Kaduk commit fa5af899319b69fa9542add78beca388521e3450 Author: Mark Vitale Date: Fri May 27 16:44:17 2016 -0400 SOLARIS: corrupted content of mmap'd files over 4GiB Many Solaris programs and utilities (notably mdb and cp) use mmap() in their implementation. When AFS files exceeding 4GiB are mmap'd, the contents of the file will be incorrectly mapped into memory. Starting at 4GiB + 1, the first 4GiB will be repeated for the remainder of the file. If the mmap'd file is written back to storage (AFS or otherwise), the newly created file will also be corrupted. This is due to a bug in the afs_map() routine that supports mmap() of AFS files on Solaris. The segvn_crarg.offset passed to the Solaris virtual memory APIs is incorrectly cast to u_int, causing it to wrap at 4GiB. Although Solaris passes the offset from fop_map() to afs_map() as type offset_t, the destination segvn_crargs.offset is actually type u_offset_t. Existing examples of other Solaris filesystems (e.g. zfs_map() ) cast the offset from offset_t to u_offset_t when assigning to segvn_crargs.offset. If it's good enough for ZFS, it's good enough for AFS. Correctly cast the offset to u_offset_t. Thanks to Robert Milkowski for the report and diagnosis. 
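The wrap at 4GiB described above can be reproduced in a few lines of portable C. This is a sketch only: the fixed-width types stand in for Solaris's offset_t, u_int, and u_offset_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* Mirrors the bug: narrowing a 64-bit file offset to 32 bits discards
 * the high bits, so offsets past 4GiB alias the start of the file. */
static uint64_t
demo_offset_truncated(int64_t off)
{
    return (uint32_t)off;
}

/* Mirrors the fix: convert to an unsigned type of the same width, so
 * the full offset survives the assignment. */
static uint64_t
demo_offset_widened(int64_t off)
{
    return (uint64_t)off;
}
```

For an offset of 4GiB + 4096, the truncating version yields 4096, which matches the reported symptom of the first 4GiB repeating for the remainder of the file.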
Change-Id: Id25363255ec011f2ad7e003ca3e4a1385bebff7e Reviewed-on: https://gerrit.openafs.org/12292 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 75325fc9ab1cec4a338e1aaf1b32de1922492b12 Author: Mark Vitale Date: Thu May 26 16:53:47 2016 -0400 SOLARIS: support mmap() over 4GiB When mmap() is issued for exactly 4GiB of a large AFS-resident file, mmap() fails with ENOMEM. This is because the AFS code is handling the requested length as u_int instead of size_t, resulting in a 0 being passed back to the caller. When mmap() is issued for non-multiples of 4GiB, the subsequent mapping will not contain all the requested pages, and for the same reason - the mapped size has been truncated to 32 bits. This results in SIGSEGV when accessing the non-mapped page(s). Fix the signature of afs_map() to specify the correct type for the length. Thanks to Robert Milkowski for the report and diagnosis. Change-Id: I8a9f0cb04ff9b80de5516e14d0679b06ef0b3f9a Reviewed-on: https://gerrit.openafs.org/12291 Tested-by: BuildBot Tested-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 19ffa2b7f09bffea816dda4713ad53f4d8cb93cb Author: Marcio Barbosa Date: Wed Jul 20 15:09:43 2016 -0400 macos: pkgbuild.sh should not be tracked by git The automatically generated pkgbuild.sh file should not be tracked by git. To fix this problem, add the name of this file to the proper .gitignore file. Change-Id: I9bdbad8e7cc02926de61e337ccb94d8a2c27ae43 Reviewed-on: https://gerrit.openafs.org/12343 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 7f8af1b384cfdc2964a122953e4102b4d82e6cb1 Author: Mark Vitale Date: Thu Jun 18 15:32:36 2015 -0400 afs: incorrect comments for afs_ClearStatus The brief description was identical to the one for afs_Analyze. Update it to accurately describe afs_ClearStatus. 
Change-Id: I70ceca41342c1b47950c35f567f8ae5a2566f925 Reviewed-on: https://gerrit.openafs.org/12005 Reviewed-by: Perry Ruiter Reviewed-by: Mark Vitale Reviewed-by: Michael Meffie Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit d3dbdade7e8eaf6da37dd6f1f53d9f1384626071 Author: Andrew Deason Date: Sun May 1 11:24:30 2016 -0500 ubik: Don't RECFOUNDDB if can't contact most sites Currently, the ubik recovery code will always set UBIK_RECFOUNDDB during recovery, after asking all other sites for their dbversions. This happens regardless of how many sites we were actually able to successfully contact, even if we couldn't contact any of them. This can cause problems when we are unable to contact a majority of sites with DISK_GetVersion. Since, if we haven't contacted a majority of sites, we cannot say with confidence that we know what the best db version available is (which is what UBIK_RECFOUNDDB represents; that we've found which database is the one we should be using). This can also result in UBIK_RECHAVEDB in a similar situation, indicating that we have the best db version locally, even though we never actually asked anyone else what their db version was. For example, say site A is the sync site going through recovery, and DISK_GetVersion fails for the only other sites B and C. Site A will then set UBIK_RECFOUNDDB, and will claim that site A has the best db version available (UBIK_RECHAVEDB). This allows site A to process ubik write transactions (causing the db to be labelled with a new epoch), or possibly to send the db to the other sites via DISK_SendFile, if they quickly become available during recovery. Ubik write transactions can succeed in this situation, because our ContactQuorum_* calls will succeed if we never try to contact a remote site ('rcode' defaults to 0). This situation should be rather rare, because normally a majority of sites must be reachable by site A for site A to be voted the sync site in the first place. 
However, it is possible for site A to lose connectivity to all other sites immediately after sync site election. It is also possible for site A to proceed far enough in the recovery process to set UBIK_RECHAVEDB before it loses its sync site status. As a result of all of this, if a site with an old database comes online and there are network connectivity problems between the other sites and a ubik write request comes in, it's possible for the "old" database to overwrite the "new" database. This makes it look as if the database has "rolled back" to an earlier version. This should be possible with any ubik database, though how to actually trigger this bug can change due to different ubik servers setting different network timeouts. It is probably most likely with the VLDB, because the VLDB is typically the most frequently written database. If a VLDB reverts to an earlier version, it can cause existing volumes to appear not to exist in the VLDB, and can result in new volumes re-using volume IDs from existing volumes. This can result in rather confusing errors. To fix this, ensure that we have contacted a majority of sites with DISK_GetVersion before indicating that we have located the best db version. If we've contacted a majority of sites, then we are guaranteed (under ubik assumptions) that we've found the best version, since previous writes to the database should be guaranteed to hit a majority of sites (otherwise they wouldn't be successful). If we cannot reach a majority of sites, we just don't set UBIK_RECFOUNDDB, and the recovery process restarts. Presumably on the next iteration we'll be able to contact them, or we'll lose sync site status if we can't reach the other sites for long enough.
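The majority check described above can be sketched as follows. This is not the code from the commit: `demo_have_majority`, `demo_update_flags`, and `DEMO_RECFOUNDDB` are hypothetical names, and counting the local site as one successful contact is an assumption made for the illustration.

```c
/* Hypothetical stand-in for the UBIK_RECFOUNDDB recovery flag. */
#define DEMO_RECFOUNDDB 0x1

/* nservers counts every site including the local one; nresponses counts
 * the remote sites that answered the version query. The local site is
 * assumed to count toward the majority. */
static int
demo_have_majority(int nservers, int nresponses)
{
    return (nresponses + 1) > nservers / 2;
}

/* Set the "found best db" flag only once a majority has been heard
 * from; otherwise leave the flags alone so that recovery restarts. */
static int
demo_update_flags(int flags, int nservers, int nresponses)
{
    if (demo_have_majority(nservers, nresponses))
        flags |= DEMO_RECFOUNDDB;
    return flags;
}
```

With three sites, zero remote responses leaves the flag unset, while a single remote response is enough to set it.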
Change-Id: I84f745b5e017bb62d93b538dbc9c7de845bee1bd Reviewed-on: https://gerrit.openafs.org/12281 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 3e531db9ce50dd41f0c64a11ab3bfcf0239ba0cd Author: Andrew Deason Date: Thu May 12 21:34:31 2016 -0500 vlserver: rx_SetRxDeadTime before ubik init Currently, vlserver calls rx_SetRxDeadTime to set the default rx deadtime to 50 seconds, but it does so after calling ubik_ServerInitByInfo. ubik_ServerInitByInfo creates several rx connections before it returns, and so these connections get the default rx deadtime (12 seconds), instead of the 50 seconds vlserver tries to set. When ubik detects that a remote site is down, ubik recreates the rx connections for that site, and this new connection gets the new deadtime of 50 seconds. This means that ubik behavior can have different timings in the vlserver, depending on if any remote sites have ever been detected as being 'down' or not. This can result in seemingly-inconsistent or confusing behavior, since some sequences of operations that appear identical can produce different results, depending on if the 12-second timeout or the 50-second timeout is being used. This behavior is not directly to blame for any problems, but it can be very confusing, especially when trying to diagnose or reproduce bugs. So to make things more consistent, just call rx_SetRxDeadTime earlier, so all conns always get the 50-second timeout. In order to do this, though, we must also ensure that rx_Init is called before rx_SetRxDeadTime (otherwise, rx_Init will overwrite our configured deadtime). So also call rx_Init earlier; rx_Init is idempotent, so it's okay that it may be called again after or before this. Note that vlserver is currently the only ubik server that sets a deadtime of 50 seconds, and it's not clear why. Another way to solve this is to just remove the call to rx_SetRxDeadTime, to make vlserver behave more similar to ptserver. 
But this commit takes a conservative approach to result in a deadtime that is probably the most common in current use, since most long-running vlservers will probably eventually lose contact with remote sites at one time or another, and so will eventually use a deadtime of 50 seconds. Change-Id: I49430144d9a62eb8cad1509c1aeafc9fcc927f8e Reviewed-on: https://gerrit.openafs.org/12285 Tested-by: Andrew Deason Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 48ce41a447c354b8a20b769e4aa5b502ba5bcc09 Author: Marcio Barbosa Date: Fri Jul 15 12:22:11 2016 -0300 macos: use pkgbuild to build the package on 10.10/10.11 PackageMaker is no longer part of OS X. As a result, it is not possible to build the package on OS X 10.10 and OS X 10.11 using the existing code. To solve this problem, a new script, along with a couple of new files, is provided. - pkgbuild.sh This script uses the command line tools pkgbuild and productbuild to build the package on OS X 10.10 and OS X 10.11. By default, the package built by this script will not be signed. Optionally, the package can be signed. - Distribution.xml This file is nothing more than an XML file used by productbuild. It is mainly used to configure how the installer will look and behave. - conclusion.txt Contains the text that is displayed by Installer at the end of the installation process. Only used by El Capitan and later. - Uninstall.14.15 This script can be used by OS X 10.10/10.11 users to uninstall OpenAFS. Notes: - This work is based on a patch made by Brandon Allbery with fixes and updates from Andrew Deason. - El Capitan and later prevent us from touching /usr/bin directly. As a result, /opt is used. - If the package is not signed, the user will have to disable the OS X security protections. Otherwise, the client will not work. - Now we have two different scripts to build the package on OS X. For OS X 10.10 and newer versions, pkgbuild.sh will be used. For older versions, the existing buildpkg.sh will be used.
Change-Id: If8320666c553b82af450c0263f5e80a00c33e3b8 Reviewed-on: https://gerrit.openafs.org/12239 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 1bfc24dda0f391b88d7617c6947d03216abb0d80 Author: Marcio Barbosa Date: Wed Jul 6 09:56:26 2016 -0300 pam: avoid warning messages In order to avoid some warning messages, do not ignore the code returned by some functions. Change-Id: Ie01fa98b54010d566fb5b980b001d58989ef9a67 Reviewed-on: https://gerrit.openafs.org/12298 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit a0417565a3ab7e6a49d7c48efd72d62bdeb4436c Author: Garrett Wollman Date: Sat Jul 28 18:35:13 2012 -0400 ptuser: guarantee that all names are valid C strings The prname type is represented in XDR as a vector[PR_MAXNAMELEN] of char, not as a string, which means that the XDR (de)serializer will not guarantee null-termination. Guarantee that all buffers used in the public protection server API are in fact valid strings by disallowing any names that are exactly PR_MAXNAMELEN (64) characters long. DO NOT silently truncate names that are even longer than this. Consistently use the prname typedef in declarations to reinforce the length limitation to those reading the header file. Introduces a new protection error code, PRNAMETOOLONG, which will be returned if either IN or OUT parameters would exceed the limit. 
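The length rule for prname buffers can be sketched as below. This is an illustration only, not the code from the commit: `SKETCH_MAXNAMELEN` mirrors the value 64 given for PR_MAXNAMELEN in the message, while `SKETCH_NAMETOOLONG`, `sketch_check_name`, and `sketch_set_name` are hypothetical names, and the real PRNAMETOOLONG error value is not reproduced.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXNAMELEN  64   /* same size the message gives for PR_MAXNAMELEN */
#define SKETCH_NAMETOOLONG 1    /* hypothetical stand-in for PRNAMETOOLONG */

/*
 * Check a fixed-size name buffer, such as one filled in by a deserializer
 * that does not guarantee NUL termination.  Returns 0 if a NUL byte occurs
 * within the buffer, so the contents are a valid C string of at most
 * SKETCH_MAXNAMELEN - 1 characters.
 */
static int
sketch_check_name(const char name[SKETCH_MAXNAMELEN])
{
    if (memchr(name, '\0', SKETCH_MAXNAMELEN) == NULL)
        return SKETCH_NAMETOOLONG;
    return 0;
}

/*
 * Copy a caller-supplied C string into a fixed-size name buffer.  Names of
 * SKETCH_MAXNAMELEN characters or more are rejected, never truncated.
 */
static int
sketch_set_name(char dst[SKETCH_MAXNAMELEN], const char *src)
{
    size_t len = strlen(src);

    if (len >= SKETCH_MAXNAMELEN)
        return SKETCH_NAMETOOLONG;
    memcpy(dst, src, len + 1);  /* len + 1 copies the terminating NUL too */
    return 0;
}
```

Rejecting over-long names instead of truncating them avoids two distinct names silently mapping to the same stored value.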
[kaduk@mit.edu convert macro to static_inline function and expand at call sites; add string_ wrapper to add checking to viced and libadmin; export the string_ wrapper from libafsauthent for the windows build] Change-Id: I65f850afcfea2fd2bc0110ca7b7f6ecca247dd58 Reviewed-on: https://gerrit.openafs.org/7896 Reviewed-by: Chas Williams <3chas3@gmail.com> Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit f14d263a73f0be75e4de92f62e836fb2e55680dd Author: Joe Gorse Date: Thu Jun 9 14:11:23 2016 -0400 Linux 4.6: rm PAGE_CACHE_* and page_cache_{get,release} macros This is an automatic patch generated by Coccinelle (spatch) from the commit message of the linked commit: https://github.com/torvalds/linux/commit/09cbfeaf1a5a67bfb3201e0c83c810cecb2efa5a We will not add an autoconfig test because the PAGE_{...} macros should exist where the PAGE_CACHE_{...} were previously. The spatch used: @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Change-Id: Iabe29b1349ab44282c66c86eced9e5b2056c9efb Reviewed-on: https://gerrit.openafs.org/12297 Reviewed-by: Michael Laß Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Stephan Wiesand Tested-by: Stephan Wiesand Reviewed-by: Benjamin Kaduk commit 16463b602a210768f80bec9ef7c6896ea8a9909d Author: Stephan Wiesand Date: Wed Jul 13 16:55:11 2016 +0200 redhat: Use a secure URL to retrieve CellServDB By default, makesrpm.pl will use wget to retrieve the CellServDB as specified in the spec file. Even though the script need not and thus should not be run by a privileged UID, make this a bit more secure by specifying an https URL. 
Change-Id: I0f14bbac35e7dc30a6e194f8706f7f3674d15a3f Reviewed-on: https://gerrit.openafs.org/12329 Reviewed-by: Benjamin Kaduk Tested-by: Benjamin Kaduk commit 8b57f9fc423c6a69a0fb8147d0621cb703e1374e Author: Marcio Barbosa Date: Thu Jun 9 15:04:18 2016 -0300 build-sys: do not capitalize value of HAVE_PAM The value assigned to HAVE_PAM should not be capitalized. If it is, the PAM source files will not be compiled. To fix this problem, convert to lowercase one of the values assigned to HAVE_PAM. Change-Id: I4973394f8d398bbea0f578fadb04aedee6fd1fc0 Reviewed-on: https://gerrit.openafs.org/12296 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit a443accfdf8771b90e2b06da04e7e3d1e88028fd Author: Michael Meffie Date: Thu Jun 11 11:02:20 2015 -0400 libafs: rename volume accessTime to setupTime Since OpenAFS 1.0, the struct volume accessTime member has been the time the volume structure is set up, not the last time the volume was used (as indicated by the comments). This time stamp is only used to find the oldest available volume slot in the disk-backed volume cache. (Perhaps in pre-OpenAFS this was updated each time the volume was referenced.) Rename this structure member and update the comments for it. Change-Id: I33a6371e8800b2d0f7b2700db0785fc365a8649e Reviewed-on: https://gerrit.openafs.org/11896 Reviewed-by: Perry Ruiter Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit c5b52c815972b4f623defaec9e0d8c235228b7b8 Author: Michael Meffie Date: Mon Apr 4 12:35:11 2016 -0400 vlserver: --enable-ubik-read-while-write configure option Commit a0f416e3504929b304fefb5ca65e2d6a254ade2e unconditionally turned on the new ubik_BeginTransReadAnyWrite functionality for the vlserver, which allows us to read data from ubik during a conflicting ubik write lock. This feature is not ready for production use. Make it a build time option, marked as experimental, until more testing can be done.
Change-Id: If64702e7a7ed2340066df5faf82ce8b0875fc610 Reviewed-on: https://gerrit.openafs.org/12240 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit cd52915b3e8c8249c5af1cfebd57276cd34a00b9 Author: Benjamin Kaduk Date: Tue Oct 7 17:17:08 2014 -0400 LWP fileserver is no more Don't mention it in the man pages. Change-Id: I8a6d706f055545642116af5a98fa8c04f533b990 Reviewed-on: https://gerrit.openafs.org/11529 Reviewed-by: Marcio Brito Barbosa Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 43a66de66c40171fedcf0450e9fa93b47c0d9f2e Author: Michael Meffie Date: Fri Jun 5 10:09:54 2015 -0400 libafs: avoid resetting the dynroot volume every 10 minutes The dynroot volumes are synthetic, so do not need to be reset every time the background daemon checks the volumes. The result of osi_Time() is a signed 32-bit integer, and the volume expireTime is a signed 32-bit integer, so use signed 32-bit integers for the expiry check. Change-Id: Ib92157686c1d8b84a63d409cb148155705953b6d Reviewed-on: https://gerrit.openafs.org/11895 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot commit b3e85976936239e30d44da00bf28fbe8487f6998 Author: Mark Vitale Date: Thu Jun 18 15:54:28 2015 -0400 afs: document missing afs_Analyze parm rxconn was missing from the comments; add it.
Change-Id: I8c0cf212ca2952d3a23c3bb5db1857dfd9a8f41e Reviewed-on: https://gerrit.openafs.org/12004 Reviewed-by: Perry Ruiter Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit dda47aab6179b6940aa994a0cd7b88a4b0942fe6 Author: Benjamin Kaduk Date: Mon Jul 4 20:13:31 2016 -0500 Add sysname IDs for FreeBSD 10.2 and 10.3 While here, de-conflict the numbers for 10.0/10.1 and 7.2/7.3 Change-Id: I87697587359a26258298f4710c7232bea417f807 Reviewed-on: https://gerrit.openafs.org/12321 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk commit 683acaed17da90455aab0cbb3d1539c51415b137 Author: Benjamin Kaduk Date: Sun May 15 13:51:56 2016 -0500 viced: make -vhashsize usable for non-DAFS The ability to set the size of the volume hash table was added at the same time that DAFS was introduced, and got caught up in the same preprocessor conditional. However, -vhashsize can be useful for the traditional fileserver as well (even though we recommend DAFS over the traditional fileserver), so let it be used in that case. Update the man pages accordingly and fix some grammar while here. Noted by Mark Vitale. Change-Id: Ic3282c9d661d60cf36f9ffb197e723a3f71da167 Reviewed-on: https://gerrit.openafs.org/12287 Reviewed-by: Mark Vitale Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit d3b8a05d229a80100f40fca4dfdcd820313fcea8 Author: Marcio Barbosa Date: Tue Jun 28 12:48:06 2016 -0300 venus: fix memory leak The fs getserverprefs command displays preference ranks for file / volume location server machine interfaces. In order to get the complete set of preference ranks, the VIOC_GETSPREFS system call might have to be called several times. If so, the memory previously allocated should be released. 
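The pattern behind this kind of fix can be sketched as follows: when a result is fetched over several calls and each call allocates a fresh buffer, the previous buffer has to be freed before the next call. This is a generic illustration, not the fs command's code; `struct sketch_batch`, `sketch_fetch`, and `sketch_walk_all` are hypothetical and do not model the VIOC_GETSPREFS interface.

```c
#include <stdlib.h>

/* Hypothetical result of one fetch round. */
struct sketch_batch {
    int *ranks;       /* heap buffer, owned by the caller after the call */
    int count;        /* number of valid entries in 'ranks' */
    int next_offset;  /* where to resume, or 0 when nothing remains */
};

/*
 * Hypothetical stand-in for one round of fetching preference ranks.
 * Allocates a new buffer on every call and fills in at most 'per_call'
 * entries starting at 'offset'.  'per_call' is assumed to be positive.
 */
static int
sketch_fetch(int offset, int total, int per_call, struct sketch_batch *out)
{
    int i;
    int n = total - offset;

    if (n > per_call)
        n = per_call;
    out->ranks = malloc(sizeof(int) * (size_t)(n > 0 ? n : 1));
    if (out->ranks == NULL)
        return -1;
    for (i = 0; i < n; i++)
        out->ranks[i] = offset + i;
    out->count = n;
    out->next_offset = (offset + n < total) ? offset + n : 0;
    return 0;
}

/*
 * Fetch every batch and return the number of entries seen, or -1 on
 * allocation failure.  Each batch buffer is released before the next
 * round is requested, so repeated rounds do not leak.
 */
static int
sketch_walk_all(int total, int per_call)
{
    struct sketch_batch b;
    int offset = 0;
    int seen = 0;

    do {
        if (sketch_fetch(offset, total, per_call, &b) != 0)
            return -1;
        seen += b.count;
        offset = b.next_offset;
        free(b.ranks);   /* release this round's buffer before looping */
        b.ranks = NULL;
    } while (offset != 0);

    return seen;
}
```

Freeing inside the loop body, rather than once after it, is the design point: a single free after the loop would release only the last round's buffer.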
Change-Id: I8491117ead626e70aac40343923d52284f274efd Reviewed-on: https://gerrit.openafs.org/12315 Tested-by: BuildBot Reviewed-by: Benjamin Kaduk commit 360f4ef53c454494cd5212a5ea46c658bdb2879c Author: Benjamin Kaduk Date: Sun May 1 19:48:40 2016 -0400 Linux 4.5: don't access i_mutex directly Linux commit 5955102c, in preparation for future work, introduced wrapper functions to lock/unlock inode mutexes. This is to prepare for converting it to a read-write semaphore, so that lookup can be done with only the shared lock held. Adopt the afs_linux_*lock_inode() functions accordingly, and convert afs_linux_fsync() to using those wrappers, since the FOP_FSYNC_TAKES_RANGE case appears to be the current case. Amusingly, afs_linux_*lock_inode() already have a branch to handle the case when inode serialization is protected by a semaphore; it seems that this is going to come full-circle. Change-Id: Ia5a194acc559de21808655ef066151a0a3826364 Reviewed-on: https://gerrit.openafs.org/12268 Tested-by: BuildBot Reviewed-by: Joe Gorse Tested-by: Joe Gorse Reviewed-by: Benjamin Kaduk commit 2ef27ea1bb032cee8d26980e60e02b52a0805763 Author: Chaskiel Grundman Date: Thu May 5 12:35:08 2016 -0400 Linux 4.5: get_link instead of follow_link+put_link In linux commit 6b255391, the follow_link inode operation was replaced by the get_link operation, which is basically the same but takes the inode and dentry separately, allowing for the possibility of staying in RCU mode. For now, only support this if page_get_link is available and we are using the USABLE_KERNEL_PAGE_SYMLINK_CACHE The previous test for USABLE_KERNEL_PAGE_SYMLINK_CACHE used a bogus, undefined configure variable (ac_cv_linux_kernel_page_follow_link). 
Remove it, as it was not needed. Change-Id: I2d7851d31dd4b1b944b16fad611addb804930eca Reviewed-on: https://gerrit.openafs.org/12265 Tested-by: BuildBot Reviewed-by: Joe Gorse Tested-by: Joe Gorse Reviewed-by: Benjamin Kaduk commit d9cfc1f3f5a75f1dbb14a56cd3da9db6b7a48065 Author: Benjamin Kaduk Date: Sun May 1 19:04:45 2016 -0400 Linux 4.5: no highmem in symlink ops Symlink bodies in the pagecache should not be in highmem, as upstream converted in commit 21fc61c73. Change-Id: I1e4c3c51308df096cdfa4d5e7b16279e275e7f41 Reviewed-on: https://gerrit.openafs.org/12264 Tested-by: BuildBot Reviewed-by: Joe Gorse Tested-by: Joe Gorse Reviewed-by: Benjamin Kaduk commit 49106a54993a0c9c64b407f05deaabe8f64e742d Author: Nathaniel Wesley Filardo Date: Fri Aug 1 02:48:21 2014 -0400 Use rxkad_crypt for inter-volser traffic, if asked Add a -s2scrypt option to the volume server, with possible options: * never -- the existing behavior * always -- switch to using afsconf_ClientAuthSecure, which uses rxkad_crypt, for ForwardVolume calls. * inherit -- encrypt inter-server traffic if the causal client connection is encrypted. This has the effect of "inheriting" the "-encrypt" flag given to "vos release", for example. Thanks to Jeffrey Altman for pointers and to Andrew Deason