Wednesday, August 12, 2009

Safe Removal of Active Disk Devices

Earlier this year, the Foundation sponsored Edward Tomasz Napierala to fix FreeBSD's #1 reported bug: a USB disk causing a panic when detached before unmounting. Edward describes the project as follows:

One of the long-standing problems encountered by FreeBSD users was that the system could often crash after a mounted disk device, such as a USB flash drive, was removed. This behavior was not only annoying, but also made a bad impression of the overall stability and robustness of the operating system.

The project was not about fixing one buggy driver, as it might seem at first glance. Fixing the problem involved changes in CAM (Common Access Method, the FreeBSD SCSI subsystem), the GEOM framework, the Virtual Filesystem layer, and finally the UFS filesystem. (Ironically, there were no problems with the USB code itself.) There were no major design changes of any sort; instead, it was an iterative process of finding a way to crash the system, tracking down the bug that caused it, fixing it, and proceeding to the next one. Most of the fixes were backported to FreeBSD 7-STABLE and will appear in FreeBSD 7.2.

It is now possible to remove mounted devices, and to unmount them afterwards, without any user-unfriendly behavior such as crashes. The system has also become more robust when non-USB disks are removed, for example when a SCSI or SATA drive is detached or fails.


  1. Great news!

    Is that going to appear in 7.2 or 7.3? Because 7.2-RELEASE is out already.

  2. @flauran:

    "Most of the fixes were backported to FreeBSD 7-STABLE and will appear in FreeBSD 7.2."

    So it will, of course, also be in 7.3.

  3. Can you point me to the actual commits?

  4. "... fixing the problem involved making changes to CAM, SCSI, GEOM, VFS, UFS ... "

    Does this mean there are architectural issues and deficiencies in FreeBSD, wherein fixing a 'umount' issue requires touching so many subsystems?


  5. Hi Saifi:

    Call it less an "architectural deficiency" so much as a change in requirements. As with other UNIX systems, FreeBSD has its origins in micro-computer and server installations where hard disks were not run-time removable media. These historic computing environments (and even current server environments) are highly controlled, and hard disks are rarely accidentally removed without first unmounting file systems. Administrator-controlled removal of storage devices has been well-supported for years (certainly as long as I've been using FreeBSD, which is since about 1994), but at issue is the unexpected disappearance of in-use storage devices.

    As computers have moved down-market and become more portable over the years, FreeBSD has grown support for adding and removing a variety of devices at run time, from mice and keyboards to network devices and hard disks. Hardware platforms have also matured in that time: you may recall that in the 1990s (and even today), PS/2 keyboard ports did not really support runtime configuration changes, but the advent of USB has changed the playing field significantly.

    This Foundation-funded project continues that trend by improving the gracefulness with which the unexpected disappearance of storage devices is handled. There are limits to how gracefully this can ever be handled: fundamentally, a storage device that was in active use has been removed without notice, which means that write-behind caches can't be flushed, programs actively using the device will be exposed to potential data loss, and so on.

    Nevertheless, making sure the kernel doesn't panic, and instead reports I/O errors to applications, was an important change that required a sweep of in-kernel locations where successful completion of I/O had previously been treated as a code invariant. Much of this code was written with graceful removal in mind, so in some ways this was a sweeping up of loose ends, albeit critical ones.

    Edward's changes have significantly improved the robustness of FreeBSD in the presence of unexpected failures, be it a user removing a USB key without unmounting first, or a disk array mysteriously disappearing in a server environment. This is something I know all of our users, low-end and high-end, will appreciate greatly.