ZFS Troubleshooting and Data Recovery
This chapter describes how to identify and recover from ZFS failure modes. Information for preventing failures is provided as well.
ZFS Failure Modes
As a combined file system and volume manager, ZFS can exhibit many different failure modes. This chapter begins by outlining the various failure modes, then discusses how to identify them on a running system, and concludes by discussing how to repair the problems. ZFS can encounter three basic types of errors:
Missing devices
Damaged devices
Corrupted data
Note that a single pool can experience all three errors, so a complete repair procedure involves finding and correcting one error, proceeding to the next error, and so on.
Missing Devices in a ZFS Storage Pool
If a device is completely removed from the system, ZFS detects that
the device cannot be opened and places it in the
Depending on the data replication level of the pool, this might or might not
result in the entire pool becoming unavailable. If one disk in a mirrored
or RAID-Z device is removed, the pool continues to be accessible. If all components
of a mirror are removed, if more than one device in a RAID-Z device is removed,
or if a single-disk, top-level device is removed, the pool becomes
FAULTED. No data is accessible until the device is reattached.
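The availability rules above can be sketched as a small function. This is only an illustration of the reasoning, not ZFS code: the vdev labels ("disk", "mirror", "raidz") and the parity parameter are assumptions made for the sketch, and the default of one parity device matches the single-parity RAID-Z case described in the text.

```python
def vdev_is_faulted(kind: str, total: int, missing: int, parity: int = 1) -> bool:
    """Return True if a top-level virtual device can no longer serve data.

    kind    -- "disk", "mirror", or "raidz" (labels assumed for this sketch)
    total   -- number of component devices in the vdev
    missing -- number of component devices that cannot be opened
    parity  -- devices a RAID-Z vdev can lose (assumed 1, as in the text)
    """
    if not 0 <= missing <= total:
        raise ValueError("missing must be between 0 and total")
    if kind == "disk":
        # A single-disk, top-level device has no redundancy at all.
        return missing >= 1
    if kind == "mirror":
        # A mirror survives until every component is gone.
        return missing == total
    if kind == "raidz":
        # RAID-Z survives the loss of up to `parity` devices.
        return missing > parity
    raise ValueError(f"unknown vdev kind: {kind}")


def pool_is_faulted(vdevs) -> bool:
    """A pool is faulted as soon as any one top-level vdev is faulted."""
    return any(vdev_is_faulted(*vdev) for vdev in vdevs)
```

For example, pool_is_faulted([("mirror", 2, 1), ("raidz", 4, 1)]) is False, because each top-level device still has enough healthy components.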
Damaged Devices in a ZFS Storage Pool
The term “damaged” covers a wide variety of possible errors. Examples include the following errors:
Transient I/O errors due to a bad disk or controller
On-disk data corruption due to cosmic rays
Driver bugs resulting in data being transferred to or from the wrong location
Simply another user overwriting portions of the physical device by accident
In some cases, these errors are transient, such as a random I/O error while the controller is having problems. In other cases, the damage is permanent, such as on-disk corruption. Even so, whether the damage is permanent does not necessarily indicate whether the error is likely to occur again. For example, if an administrator accidentally overwrites part of a disk, no type of hardware failure has occurred, and the device need not be replaced. Identifying exactly what went wrong with a device is not an easy task and is covered in more detail in a later section.
Corrupted ZFS Data
Data corruption occurs when one or more device errors (indicating missing or damaged devices) affect a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data is the result.
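The mirror example can be illustrated with a minimal sketch. It is not how ZFS is implemented: the function names are hypothetical, and SHA-256 from the Python standard library merely stands in for whatever checksum the pool uses. The point is that a block is lost only when every copy at that location fails verification.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum for this sketch (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def mirror_read(copies, expected):
    """Read one block from a mirror.

    copies   -- one entry per side of the mirror; None models an I/O error
    expected -- checksum recorded when the block was written

    Returns (data, bad_sides). data is None only if every side failed,
    which is the data corruption case described above.
    """
    good = None
    bad_sides = []
    for side, block in enumerate(copies):
        if block is not None and checksum(block) == expected:
            if good is None:
                good = block
        else:
            bad_sides.append(side)
    return good, bad_sides


payload = b"important data"
expected = checksum(payload)

# One damaged side: the read still succeeds and side 0 can be healed.
data, bad = mirror_read([b"garbage", payload], expected)

# Both sides damaged at the same location: the block is corrupted.
lost, all_bad = mirror_read([b"garbage", None], expected)
```

In the first call the good copy is returned and the damaged side is identified for repair. In the second call no good copy exists, so the data cannot be recovered from the pool itself.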
Data corruption is always permanent and requires special consideration during repair. Even if the underlying devices are repaired or replaced, the original data is lost forever. Most often this scenario requires restoring data from backups. Data errors are recorded as they are encountered, and can be controlled through routine disk scrubbing as explained in the following section. When a corrupted block is removed, the next scrubbing pass recognizes that the corruption is no longer present and removes any trace of the error from the system.
Checking ZFS Data Integrity
No fsck utility equivalent exists for ZFS. This utility has traditionally served two purposes, data repair and data validation.
With traditional file systems, the way in which data is written is inherently vulnerable to unexpected failure causing data inconsistencies. Because a traditional file system is not transactional, unreferenced blocks, bad link counts, or other inconsistent data structures are possible. The addition of journaling does solve some of these problems, but can introduce additional problems when the log cannot be rolled back. With ZFS, none of these problems exist. The only way for inconsistent data to exist on disk is through hardware failure (in which case the pool should have been redundant) or a bug in the ZFS software.
Given that the fsck utility is designed to repair known pathologies specific to individual file systems, writing such a utility for a file system with no known pathologies is impossible. Future experience might prove that certain data corruption problems are common enough and simple enough that a repair utility can be developed, but these problems can always be avoided by using redundant pools.
If your pool is not redundant, the chance that data corruption can render some or all of your data inaccessible is always present.
In addition to data repair, the fsck utility validates that the data on disk has no problems. Traditionally, this task is done by unmounting the file system and running the fsck utility, possibly taking the system to single-user mode in the process. This scenario results in downtime that is proportional to the size of the file system being checked. Instead of requiring an explicit utility to perform the necessary checking, ZFS provides a mechanism to perform routine checking of all data. This functionality, known as scrubbing, is commonly used in memory and other systems as a method of detecting and preventing errors before they result in hardware or software failure.
Controlling ZFS Data Scrubbing
Whenever ZFS encounters an error, either through scrubbing or when accessing a file on demand, the error is logged internally so that you can get a quick overview of all known errors within the pool.
Explicit ZFS Data Scrubbing
The simplest way to check your data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the file system should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. For example:
# zpool scrub tank
The status of the current scrub can be displayed in the zpool status output. For example:
# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Aug 30 14:02:24 2006
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
errors: No known data errors
Note that only one active scrubbing operation per pool can occur at one time.
You can stop a scrub that is in progress by using the -s option. For example:
# zpool scrub -s tank
In most cases, a scrub operation to ensure data integrity should continue to completion. Stop a scrub at your own discretion if system performance is impacted by a scrub operation.
Performing routine scrubbing also guarantees continuous I/O to all disks on the system. Routine scrubbing has the side effect of preventing power management from placing idle disks in low-power mode. If the system is generally performing I/O all the time, or if power consumption is not a concern, then this issue can safely be ignored.
For more information about interpreting zpool status output, see Querying ZFS Storage Pool Status.
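When scrubs are run routinely, it can be convenient to pull the result of the last scrub out of the zpool status text. The following is a minimal sketch that assumes the scrub line has the form shown in the examples in this chapter; other releases might word this line differently, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   scrub: scrub completed with 0 errors on Wed Aug 30 14:02:24 2006
SCRUB_DONE = re.compile(
    r"^\s*scrub:\s+(scrub|resilver) completed with (\d+) errors on (.+)$",
    re.MULTILINE,
)


def last_scrub_result(status_output: str) -> Optional[Tuple[str, int, str]]:
    """Return (kind, error_count, finished) or None if no completed scrub is shown.

    The timestamp is returned as the raw string from the output.
    """
    match = SCRUB_DONE.search(status_output)
    if match is None:
        return None
    kind, errors, finished = match.groups()
    return kind, int(errors), finished.strip()


sample = (
    "  pool: tank\n"
    " state: ONLINE\n"
    " scrub: scrub completed with 0 errors on Wed Aug 30 14:02:24 2006\n"
    "config:\n"
)
result = last_scrub_result(sample)
```

A return value of None covers both a scrub that is still in progress and the "none requested" case, so a caller that needs to tell them apart would have to inspect the line further.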
ZFS Data Scrubbing and Resilvering
When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can happen at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing, and restarts it after the resilvering is complete.
For more information about resilvering, see Viewing Resilvering Status.
Identifying Problems in ZFS
The following sections describe how to identify problems in your ZFS file systems or storage pools.
You can use the following features to identify problems with your ZFS configuration:
Detailed ZFS storage pool information with the zpool status command
Pool and device failures are reported with ZFS/FMA diagnostic messages
Previous ZFS commands that modified pool state information can be displayed with the zpool history command
Most ZFS troubleshooting is centered around the zpool status command. This command analyzes the various failures in the system and identifies the most severe problem, presenting you with a suggested action and a link to a knowledge article for more information. Note that the command only identifies a single problem with the pool, though multiple problems can exist. For example, data corruption errors always imply that one of the devices has failed. Replacing the failed device does not fix the data corruption problems.
In addition, a ZFS diagnostic engine is provided to diagnose and report pool failures and device failures. Checksum, I/O, device, and pool errors associated with pool or device failures are also reported. ZFS failures as reported by fmd are displayed on the console as well as the system messages file. In most cases, the fmd message directs you to the zpool status command for further recovery instructions.
The basic recovery process is as follows:
If appropriate, use the zpool history command to identify the previous ZFS commands that led up to the error scenario. For example:
# zpool history
History for 'tank':
2007-04-25.10:19:42 zpool create tank mirror c0t8d0 c0t9d0 c0t10d0
2007-04-25.10:19:45 zfs create tank/erick
2007-04-25.10:19:55 zfs set checksum=off tank/erick
Notice in the above output that checksums are disabled for the tank/erick file system. This configuration is not recommended.
Identify the errors through the fmd messages that are displayed on the system console or in the /var/adm/messages files.
Find further repair instructions in the zpool status -x command.
Repair the failures, such as:
Replace the faulted or missing device and bring it online.
Restore the faulted configuration or corrupted data from a backup.
Verify the recovery by using the zpool status -x command.
Back up your restored configuration, if applicable.
This chapter describes how to interpret zpool status output in order to diagnose the type of failure and directs you to one of the following sections on how to repair the problem. While most of the work is performed automatically by the command, it is important to understand exactly what problems are being identified in order to diagnose the type of failure.
Determining if Problems Exist in a ZFS Storage Pool
The easiest way to determine if any known problems exist on the system is to use the zpool status -x command. This command describes only pools exhibiting problems. If no bad pools exist on the system, then the command displays a simple message, as follows:
# zpool status -x
all pools are healthy
Without the -x flag, the command displays the complete status for all pools (or the requested pool, if specified on the command line), even if the pools are otherwise healthy.
For more information about command-line options to the zpool status command, see Querying ZFS Storage Pool Status.
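A monitoring script can rely on the simple messages that zpool status -x prints for healthy pools. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the only message strings it recognizes are the two that appear in this chapter. Any other output is treated as a sign that a pool needs attention.

```python
import subprocess


def is_healthy(status_x_output: str) -> bool:
    """Interpret the text printed by 'zpool status -x'."""
    text = status_x_output.strip()
    if text == "all pools are healthy":
        return True
    # Form printed when a single pool is named, for example:
    #   pool 'tank' is healthy
    return text.startswith("pool '") and text.endswith("' is healthy")


def check_pools() -> bool:
    """Run 'zpool status -x' on a system with ZFS and report pool health."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True
    )
    return is_healthy(result.stdout)
```

Only is_healthy is pure text handling. check_pools needs a system where the zpool command is installed and is shown solely to indicate how the two pieces fit together.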
Reviewing zpool status Output
The complete zpool status output looks similar to the following:
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  OFFLINE      0     0     0
errors: No known data errors
This output is divided into several sections:
Overall Pool Status Information
This header section in the zpool status output contains the following fields, some of which are only displayed for pools exhibiting problems:
pool – The name of the pool.
state – The current health of the pool. This information refers only to the ability of the pool to provide the necessary replication level. Pools that are ONLINE might still have failing devices or data corruption.
status – A description of what is wrong with the pool. This field is omitted if no problems are found.
action – A recommended action for repairing the errors. This field is an abbreviated form directing the user to one of the following sections. This field is omitted if no problems are found.
see – A reference to a knowledge article containing detailed repair information. Online articles are updated more often than this guide can be updated, and should always be referenced for the most up-to-date repair procedures. This field is omitted if no problems are found.
scrub – Identifies the current status of a scrub operation, which might include the date and time that the last scrub was completed, a scrub in progress, or if no scrubbing was requested.
errors – Identifies known data errors or the absence of known data errors.
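The header fields described above can be collected into a dictionary for scripting purposes. This is a rough sketch under the assumption that the output follows the layout of the examples in this chapter, with wrapped status and action text on indented continuation lines. The function name is hypothetical, and output that lists several pools would need to be split per pool first.

```python
HEADER_FIELDS = ("pool", "state", "status", "action", "see", "scrub", "errors")
WRAPPING_FIELDS = ("status", "action")  # fields whose text can span lines


def parse_header(status_output: str) -> dict:
    """Collect the header fields of a single pool's 'zpool status' output."""
    fields = {}
    current = None
    for line in status_output.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition(":")
        if sep and key in HEADER_FIELDS:
            current = key
            fields[key] = value.strip()
        elif stripped.startswith("config:"):
            # Device rows follow; they are not header text.
            current = None
        elif current in WRAPPING_FIELDS and stripped:
            # Continuation of a wrapped status or action description.
            fields[current] += " " + stripped
    return fields


SAMPLE = """\
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
errors: No known data errors
"""
header = parse_header(SAMPLE)
```

Because the see field is omitted when no knowledge article applies, callers should use header.get("see") rather than indexing the key directly.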
The config field in the zpool status output describes the configuration layout of the devices comprising the pool, as well as their state and any errors generated from the devices. The states that appear in the examples in this chapter are ONLINE, DEGRADED, FAULTED, OFFLINE, and UNAVAIL. If the state is anything but ONLINE, the fault tolerance of the pool has been compromised.
The second section of the configuration output displays error statistics. These errors are divided into three categories:
READ – I/O error occurred while issuing a read request.
WRITE – I/O error occurred while issuing a write request.
CKSUM – Checksum error. The device returned corrupted data as the result of a read request.
These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If the device is in a redundant configuration, the disk devices might show uncorrectable errors, while no errors appear at the mirror or RAID-Z device level. If this scenario is the case, then ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.
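To compare the counters at the disk level with those at the mirror or RAID-Z level, the config rows can be read into simple records. The sketch below assumes the column layout used in the examples in this chapter and plain integer counters; releases that abbreviate large counts would need extra handling. The names DeviceRow and parse_config are hypothetical.

```python
from typing import List, NamedTuple


class DeviceRow(NamedTuple):
    name: str
    state: str
    read: int
    write: int
    cksum: int
    note: str  # auxiliary text from the last column, if any


def parse_config(status_output: str) -> List[DeviceRow]:
    """Extract the device rows from the config section of 'zpool status'."""
    rows = []
    in_config = False
    for line in status_output.splitlines():
        parts = line.split()
        if parts[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config or not parts:
            continue
        if len(parts) < 5 or not all(p.isdigit() for p in parts[2:5]):
            break  # first non-device line ends the config section
        rows.append(
            DeviceRow(parts[0], parts[1], int(parts[2]), int(parts[3]),
                      int(parts[4]), " ".join(parts[5:]))
        )
    return rows


SAMPLE = """\
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     1
          mirror    DEGRADED     0     0     1
            c1t0d0  ONLINE       0     0     2
            c1t1d0  UNAVAIL      0     0     0  corrupted data
errors: The following persistent errors have been detected:
"""
rows = parse_config(SAMPLE)
with_errors = [r.name for r in rows if r.read or r.write or r.cksum]
```

The sketch discards indentation, so the parent and child relationship between rows is lost. A fuller version would track the leading whitespace of each name to rebuild the device tree.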
For more information about interpreting these errors to determine device failure, see Determining the Type of Device Failure.
Finally, additional auxiliary information is displayed in the last column of the zpool status output. This information expands on the state field, aiding in diagnosis of failure modes. If a device is FAULTED, this field indicates whether the device is inaccessible or whether the data on the device is corrupted. If the device is undergoing resilvering, this field displays the current progress.
For more information about monitoring resilvering progress, see Viewing Resilvering Status.
The third section of the zpool status output describes the current status of any explicit scrubs. This information is distinct from whether any errors are detected on the system, though this information can be used to determine the accuracy of the data corruption error reporting. If the last scrub ended recently, most likely, any known data corruption has been discovered.
For more information about data scrubbing and how to interpret this information, see Checking ZFS Data Integrity.
Data Corruption Errors
The zpool status command also shows whether any known errors are associated with the pool. These errors might have been found during disk scrubbing or during normal operation. ZFS maintains a persistent log of all data errors associated with the pool. This log is rotated whenever a complete scrub of the system finishes.
Data corruption errors are always fatal. Their presence indicates that at least one application experienced an I/O error due to corrupt data within the pool. Device errors within a redundant pool do not result in data corruption and are not recorded as part of this log. By default, only the number of errors found is displayed. A complete list of errors and their specifics can be found by using the zpool status -v option. For example:
# zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
 scrub: resilver completed with 1 errors on Fri Mar 17 15:42:18 2006
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     1
          mirror    DEGRADED     0     0     1
            c1t0d0  ONLINE       0     0     2
            c1t1d0  UNAVAIL      0     0     0  corrupted data
errors: The following persistent errors have been detected:
          DATASET  OBJECT  RANGE
          5        0       lvl=4294967295 blkid=0
A similar message is also displayed by fmd on the system console and the /var/adm/messages file. These messages can also be tracked by using the fmdump command.
For more information about interpreting data corruption errors, see Identifying the Type of Data Corruption.
System Reporting of ZFS Error Messages
In addition to persistently keeping track of errors within the pool, ZFS also displays syslog messages when events of interest occur. The following scenarios generate events to notify the administrator:
Device state transition – If a device becomes FAULTED, ZFS logs a message indicating that the fault tolerance of the pool might be compromised. A similar message is sent if the device is later brought online, restoring the pool to health.
Data corruption – If any data corruption is detected, ZFS logs a message describing when and where the corruption was detected. This message is only logged the first time it is detected. Subsequent accesses do not generate a message.
Pool failures and device failures – If a pool failure or device failure occurs, the fault manager daemon reports these errors through syslog messages as well as the fmdump command.
If ZFS detects a device error and automatically recovers from it, no notification occurs. Such errors do not constitute a failure in the pool redundancy or data integrity. Moreover, such errors are typically the result of a driver problem accompanied by its own set of error messages.
Repairing a Damaged ZFS Configuration
ZFS maintains a cache of active pools and their configuration on the root file system. If this file is corrupted or somehow becomes out of sync with what is stored on disk, the pool can no longer be opened. ZFS tries to avoid this situation, though arbitrary corruption is always possible given the qualities of the underlying file system and storage. This situation typically results in a pool disappearing from the system when it should otherwise be available. This situation can also manifest itself as a partial configuration that is missing an unknown number of top-level virtual devices. In either case, the configuration can be recovered by exporting the pool (if it is visible at all), and re-importing it.
For more information about importing and exporting pools, see Migrating ZFS Storage Pools.
Repairing a Missing Device
If a device cannot be opened, it displays as UNAVAIL in the zpool status output. This status means that ZFS was unable to open the device when the pool was first accessed, or the device has since become unavailable. If the device causes a top-level virtual device to be unavailable, then nothing in the pool can be accessed. Otherwise, the fault tolerance of the pool might be compromised. In either case, the device simply needs to be reattached to the system to restore normal operation.
For example, you might see a message similar to the following from fmd after a device failure:
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Aug 31 11:40:59 MDT 2006
PLATFORM: SUNW,Sun-Blade-1000, CSN: -, HOSTNAME: tank
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: e11d8245-d76a-e152-80c6-e63763ed7e4e
DESC: A ZFS device failed. Refer to http://illumos.org/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run 'zpool status -x' and replace the bad device.
The next step is to use the zpool status -x command to view more detailed information about the device problem and the resolution. For example:
# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Aug 31 11:45:59 MDT 2006
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0t1d0  UNAVAIL      0     0     0  cannot open
            c1t1d0  ONLINE       0     0     0
You can see from this output that the missing device c0t1d0 is not functioning. If you determine that the drive is faulty, replace the device.
Then, use the zpool online command to online the replaced device. For example:
# zpool online tank c0t1d0
Confirm that the pool with the replaced device is healthy.
# zpool status -x tank
pool 'tank' is healthy
Physically Reattaching the Device
Exactly how a missing device is reattached depends on the device in question. If the device is a network-attached drive, connectivity should be restored. If the device is a USB or other removable media, it should be reattached to the system. If the device is a local disk, a controller might have failed such that the device is no longer visible to the system. In this case, the controller should be replaced at which point the disks will again be available. Other pathologies can exist and depend on the type of hardware and its configuration. If a drive fails and it is no longer visible to the system (an unlikely event), the device should be treated as a damaged device. Follow the procedures outlined in Repairing a Damaged Device.
Notifying ZFS of Device Availability
Once a device is reattached to the system, ZFS might or might not automatically detect its availability. If the pool was previously faulted, or the system was rebooted as part of the attach procedure, then ZFS automatically rescans all devices when it tries to open the pool. If the pool was degraded and the device was replaced while the system was up, you must notify ZFS that the device is now available and ready to be reopened by using the zpool online command. For example:
# zpool online tank c0t1d0
For more information about bringing devices online, see Bringing a Device Online.
Repairing a Damaged Device
This section describes how to determine device failure types, clear transient errors, and replace a device.
Determining the Type of Device Failure
The term damaged device is rather vague, and can describe a number of possible situations:
Bit rot – Over time, random events, such as magnetic influences and cosmic rays, can cause bits stored on disk to flip unpredictably. These events are relatively rare but common enough to cause potential data corruption in large or long-running systems. These errors are typically transient.
Misdirected reads or writes – Firmware bugs or hardware faults can cause reads or writes of entire blocks to reference the incorrect location on disk. These errors are typically transient, though a large number might indicate a faulty drive.
Administrator error – Administrators can unknowingly overwrite portions of the disk with bad data (such as copying /dev/zero over portions of the disk) that cause permanent corruption on disk. These errors are always transient, in the sense that the mistake is not expected to recur even though the resulting on-disk corruption is permanent.
Temporary outage – A disk might become unavailable for a period of time, causing I/Os to fail. This situation is typically associated with network-attached devices, though local disks can experience temporary outages as well. These errors might or might not be transient.
Bad or flaky hardware – This situation is a catch-all for the various problems that bad hardware exhibits. This could be consistent I/O errors, faulty transports causing random corruption, or any number of failures. These errors are typically permanent.
Offlined device – If a device is offline, it is assumed that the administrator placed the device in this state because it is presumed faulty. The administrator who placed the device in this state can determine if this assumption is accurate.
Determining exactly what is wrong can be a difficult process. The first step is to examine the error counts in the zpool status output as follows:
# zpool status -v pool
The errors are divided into I/O errors and checksum errors, both of which might indicate the possible failure type. Typical operation predicts a very small number of errors (just a few over long periods of time). If you are seeing large numbers of errors, then this situation probably indicates impending or complete device failure. However, the pathology for administrator error can result in large error counts. The other source of information is the system log. If the log shows a large number of SCSI or fibre channel driver messages, then this situation probably indicates serious hardware problems. If no syslog messages are generated, then the damage is likely transient.
The goal is to answer the following question:
Is another error likely to occur on this device?
Errors that happen only once are considered transient, and do not indicate potential failure. Errors that are persistent or severe enough to indicate potential hardware failure are considered “fatal.” The act of determining the type of error is beyond the scope of any automated software currently available with ZFS, so much of this work must be done manually by you, the administrator. Once the determination is made, the appropriate action can be taken. Either clear the transient errors or replace the device due to fatal errors. These repair procedures are described in the next sections.
Even if the device errors are considered transient, they might still have caused uncorrectable data errors within the pool. These errors require special repair procedures, even if the underlying device is deemed healthy or otherwise repaired. For more information on repairing data errors, see Repairing Damaged Data.
Clearing Transient Errors
If the device errors are deemed transient, in that they are unlikely to affect the future health of the device, then the device errors can be safely cleared to indicate that no fatal error occurred. To clear error counters for RAID-Z or mirrored devices, use the zpool clear command. For example:
# zpool clear tank c1t0d0
This syntax clears any errors associated with the device and clears any data error counts associated with the device.
To clear all errors associated with the virtual devices in the pool, and clear any data error counts associated with the pool, use the following syntax:
# zpool clear tank
For more information about clearing pool errors, see Clearing Storage Pool Devices.
Replacing a Device in a ZFS Storage Pool
If device damage is permanent or future permanent damage is likely, the device must be replaced. Whether the device can be replaced depends on the configuration.
Determining if a Device Can Be Replaced
For a device to be replaced, the pool must be in the
The device must be part of a redundant configuration, or it must be healthy (in the ONLINE state). If the disk is part of a redundant configuration, sufficient replicas from which to retrieve good data must exist.
If two disks in a four-way mirror are faulted, then either disk can be replaced
because healthy replicas are available. However, if two disks in a four-way
RAID-Z device are faulted, then neither disk can be replaced because not enough
replicas from which to retrieve data exist. If the device is damaged but otherwise online, it can be replaced as long as the pool is not in the FAULTED state. However, any bad data on the device is copied to the new device unless there are sufficient replicas with good data.
In the following configuration, the disk c1t1d0 can be replaced, and any data in the pool is copied from the good replica, c1t0d0.
    mirror            DEGRADED
      c1t0d0          ONLINE
      c1t1d0          FAULTED
The disk c1t0d0 can also be replaced, though no self-healing of data can take place because no good replica is available.
In the following configuration, neither of the faulted disks can be replaced. The ONLINE disks cannot be replaced either, because the pool itself is faulted.
    raidz             FAULTED
      c1t0d0          ONLINE
      c2t0d0          FAULTED
      c3t0d0          FAULTED
      c3t0d0          ONLINE
In the following configuration, either top-level disk can be replaced, though any bad data present on the disk is copied to the new disk.
    c1t0d0            ONLINE
    c1t1d0            ONLINE
If either disk were faulted, then no replacement could be performed because the pool itself would be faulted.
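The replacement rules illustrated by the three configurations above can be condensed into a short check. This is a simplification for illustration only: the function name and arguments are hypothetical, and "redundant" here simply means that the device belongs to a mirror or RAID-Z device. The requirement for sufficient replicas is captured indirectly, because a redundant device without enough replicas leaves the pool FAULTED.

```python
def can_replace(pool_state: str, device_state: str, redundant: bool) -> bool:
    """Decide whether a device is a candidate for 'zpool replace'.

    pool_state   -- state of the pool, for example "ONLINE" or "FAULTED"
    device_state -- state of the device to be replaced
    redundant    -- True if the device is part of a mirror or RAID-Z device
    """
    if pool_state == "FAULTED":
        # Not enough good replicas remain to rebuild anything.
        return False
    # Otherwise the device must be redundant, or healthy enough to copy from.
    return redundant or device_state == "ONLINE"


# Degraded mirror: both the faulted and the healthy disk can be replaced.
mirror_faulted_disk = can_replace("DEGRADED", "FAULTED", redundant=True)
mirror_online_disk = can_replace("DEGRADED", "ONLINE", redundant=True)

# Faulted RAID-Z: nothing can be replaced.
raidz_online_disk = can_replace("FAULTED", "ONLINE", redundant=True)

# Two healthy top-level disks: either can be replaced.
plain_online_disk = can_replace("ONLINE", "ONLINE", redundant=False)
```

The check says nothing about the quality of the copied data. As noted above, replacing an ONLINE device without good replicas copies any bad data to the new device.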
Devices That Cannot be Replaced
If the loss of a device causes the pool to become faulted, or the device contains too many data errors in a non-redundant configuration, then the device cannot safely be replaced. Without sufficient redundancy, no good data with which to heal the damaged device exists. In this case, the only option is to destroy the pool and re-create the configuration, restoring your data in the process.
For more information about restoring an entire pool, see Repairing ZFS Storage Pool-Wide Damage.
Replacing a Device in a ZFS Storage Pool
Once you have determined that a device can be replaced, use the zpool replace command to replace the device. If you are replacing the damaged device with another different device, use the following command:
# zpool replace tank c1t0d0 c2t0d0
This command begins migrating data to the new device from the damaged device, or other devices in the pool if it is in a redundant configuration. When the command is finished, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single device form of the command. For example:
# zpool replace tank c1t0d0
This command takes an unformatted disk, formats it appropriately, and then begins resilvering data from the rest of the configuration.
For more information about the zpool replace command, see Replacing Devices in a Storage Pool.
Viewing Resilvering Status
The process of replacing a drive can take an extended period of time, depending on the size of the drive and the amount of data in the pool. The process of moving data from one device to another device is known as resilvering, and can be monitored by using the zpool status command.
Traditional file systems resilver data at the block level. Because ZFS eliminates the artificial layering of the volume manager, it can perform resilvering in a much more powerful and controlled manner. The two main advantages of this feature are as follows:
ZFS only resilvers the minimum amount of necessary data. In the case of a short outage (as opposed to a complete device replacement), the disk can be brought up to date in a matter of minutes or seconds, without resilvering the entire disk or complicating matters with the “dirty region” logging that some volume managers support. When an entire disk is replaced, the resilvering process takes time proportional to the amount of data used on disk. Replacing a 500-Gbyte disk can take seconds if only a few gigabytes of used space are in the pool.
Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.
To view the resilvering process, use the zpool status command. For example:
# zpool status tank
  pool: tank
 state: DEGRADED
reason: One or more devices is being resilvered.
action: Wait for the resilvering process to complete.
   see: http://illumos.org/msg/ZFS-XXXX-08
 scrub: none requested
config:
        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          mirror          DEGRADED     0     0     0
            replacing     DEGRADED     0     0     0  52% resilvered
              c1t0d0      ONLINE       0     0     0
              c2t0d0      ONLINE       0     0     0
            c1t1d0        ONLINE       0     0     0
In this example, the disk c1t0d0 is being replaced by c2t0d0. This event is observed in the status output by the presence of the replacing virtual device in the configuration. This device is not real, nor is it possible for you to create a pool by using this virtual device type. The purpose of this device is solely to display the resilvering process, and to identify exactly which device is being replaced.
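A script that waits for a replacement to finish can read the progress figure from the replacing row. The following sketch assumes the "NN% resilvered" wording shown in the example output above; other releases might report progress differently, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches the auxiliary column text, for example "52% resilvered".
RESILVER_PROGRESS = re.compile(r"(\d+(?:\.\d+)?)%\s+resilvered")


def resilver_percent(status_output: str) -> Optional[float]:
    """Return the resilvering progress as a percentage, or None if absent."""
    match = RESILVER_PROGRESS.search(status_output)
    return float(match.group(1)) if match else None


in_progress = (
    "          mirror          DEGRADED     0     0     0\n"
    "            replacing     DEGRADED     0     0     0  52% resilvered\n"
    "              c1t0d0      ONLINE       0     0     0\n"
)
finished = "          mirror          ONLINE       0     0     0\n"

progress = resilver_percent(in_progress)
done = resilver_percent(finished)
```

A value of None means only that no progress text was found. The pool state should still be checked to confirm that resilvering completed without errors.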
Note that any pool currently undergoing resilvering is placed in the
DEGRADED state, because the pool cannot provide the desired level
of redundancy until the resilvering process is complete. Resilvering proceeds
as fast as possible, though the I/O is always scheduled with a lower priority
than user-requested I/O, to minimize impact on the system. Once the resilvering
is complete, the configuration reverts to the new, complete, configuration.
# zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Aug 31 11:20:18 2006
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0

errors: No known data errors
The pool is once again ONLINE, and the original bad disk
(c1t0d0) has been removed from the configuration.
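The monitoring step described above can be scripted. The following is a minimal sketch, not part of ZFS itself: the function name resilver_progress is hypothetical, and the parsing assumes the output layout shown in this section, where the replacing line ends with a percentage followed by the word "resilvered". Other releases may report progress in a different place, in which case the pattern needs adjusting.

```shell
# Print the resilver percentage found in `zpool status` output read from stdin.
# Assumption: the "replacing" vdev line ends with "NN% resilvered", as in the
# sample output in this section. Returns non-zero if no such line is present.
resilver_progress() {
    awk '
        $1 == "replacing" && $NF == "resilvered" { print $(NF - 1); found = 1 }
        END { exit !found }
    '
}

# Hypothetical usage: poll once a minute until the replacing vdev disappears.
#   while pct=$(zpool status tank | resilver_progress); do
#       echo "resilver: $pct"
#       sleep 60
#   done
```

Because the function only reads text from standard input, it can be tried against saved zpool status output before it is pointed at a live pool.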
Repairing Damaged Data
The following sections describe how to identify the type of data corruption and how to repair the data, if possible.
ZFS uses checksumming, redundancy, and self-healing data to minimize the chances of data corruption. Nonetheless, data corruption can occur if the pool isn't redundant, if corruption occurred while the pool was degraded, or if an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: The data is corrupted and therefore no longer accessible. The action taken depends on the type of data being corrupted, and its relative value. Two basic types of data can be corrupted:
Pool metadata – ZFS requires a certain amount of data to be parsed to open a pool and access datasets. If this data is corrupted, the entire pool or complete portions of the dataset hierarchy will become unavailable.
Object data – In this case, the corruption is within a specific file or directory. This problem might result in a portion of the file or directory being inaccessible, or this problem might cause the object to be broken altogether.
Data is verified during normal operation as well as through scrubbing. For more information about how to verify the integrity of pool data, see Checking ZFS Data Integrity.
Identifying the Type of Data Corruption
By default, the zpool status command shows only that corruption has occurred, but not where this corruption occurred. For example:

# zpool status
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE       0     0     0
          c1t1d0s6  ONLINE       0     0     0
          c1t1d0s7  ONLINE       0     0     0

errors: 8 data errors, use '-v' for a list

With the -v option, the output also identifies the affected dataset, object, and range. For example:

# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     0     0
          mirror    ONLINE       1     0     0
            c2t0d0  ONLINE       2     0     0
            c1t1d0  ONLINE       2     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          tank     6       0-512
Each error indicates only that an error occurred at the given point in time. The error is not necessarily still present on the system, although under normal circumstances it is. Certain temporary outages, however, might result in data corruption that is automatically repaired once the outage ends. A complete scrub of the pool is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If you determine that the errors are no longer present, and you don't want to wait for a scrub to complete, reset all errors in the pool by using the zpool online command.
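When a pool has many devices, it can help to list only those whose error counters are non-zero. The following is a rough sketch under stated assumptions: devices_with_errors is a hypothetical helper name, and the parsing assumes the config table layout shown in the examples in this section (a NAME STATE READ WRITE CKSUM header, one device per line, and a blank line or the errors: line ending the table).

```shell
# List devices whose READ, WRITE, or CKSUM counter is non-zero.
# Reads `zpool status` output from stdin and prints: name read write cksum.
# Assumption: the table layout matches the samples in this section.
devices_with_errors() {
    awk '
        # The column header marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line or the errors: summary marks its end.
        /^errors:/ || NF == 0         { in_config = 0 }
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }
    '
}

# Hypothetical usage:
#   zpool status tank | devices_with_errors
```

As noted above, a non-zero counter shows only that an error occurred at some point, so this listing is a starting point for investigation rather than proof of current damage.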
If the data corruption is in pool-wide metadata, the output is slightly different. For example:
# zpool status -v morpheus
  pool: morpheus
    id: 1422736890544688191
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-72
config:

        morpheus    FAULTED   corrupted data
          c1t10d0   ONLINE
In the case of pool-wide corruption, the pool is placed into the
FAULTED state, because the pool cannot possibly provide the needed level of redundancy.
Repairing a Corrupted File or Directory
If a file or directory is corrupted, the system might still be able to function depending on the type of corruption. Any damage is effectively unrecoverable if no good copies of the data exist anywhere on the system. If the data is valuable, you have no choice but to restore the affected data from backup. Even so, you might be able to recover from this corruption without restoring the entire pool.
If the damage is within a file data block, then the file can safely
be removed, thereby clearing the error from the system. Use the
zpool status -v command to display a list of file names
with persistent errors. For example:
# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE       0     0     0
          c1t1d0s6  ONLINE       0     0     0
          c1t1d0s7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

/monkey/a.txt
/monkey/bananas/b.txt
/monkey/sub/dir/d.txt
/monkey/ghost/e.txt
/monkey/ghost/boo/f.txt
The preceding output is described as follows:
If the full path to the file is found and the dataset is mounted, the full path to the file is displayed. For example: /monkey/a.txt
If the full path to the file is found, but the dataset is not mounted, then the dataset name with no preceding slash (/), followed by the path within the dataset to the file, is displayed.
If the object number cannot be successfully translated to a file path, either due to an error or because the object doesn't have a real file path associated with it, as is the case for a dnode_t, then the dataset name followed by the object's number is displayed.
If an object in the meta-object set (MOS) is corrupted, then a special tag of <metadata>, followed by the object number, is displayed.
To remove a file whose damage is within a data block, first try to
locate the file by using the find command. Specify the object number that is
identified in the zpool status output under the OBJECT column
as the inode number to find. For example:
# find -inum 6
Then, try removing the file with the rm command. If this command doesn't work, the corruption is within the file's metadata, and ZFS cannot determine which blocks belong to the file in order to remove the corruption.
If the corruption is within a directory or a file's metadata, the only choice is to move the file elsewhere. You can safely move any file or directory to a less convenient location, allowing the original object to be restored in place.
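The find step described above can be wrapped in a small helper. This is an illustrative sketch only: the function name find_by_object is hypothetical, and it assumes the affected dataset is mounted and that you know its mount point. The -xdev option keeps the search within that one file system, which matters because inode numbers are unique only within a single file system.

```shell
# Locate the file whose inode number equals a reported object number.
# $1: mount point of the affected dataset (assumed to be mounted)
# $2: object number taken from the zpool status output
find_by_object() {
    # -xdev stops find from descending into other mounted file systems,
    # where the same inode number could belong to an unrelated file.
    find "$1" -xdev -inum "$2" -print
}

# Hypothetical usage, assuming the dataset tank is mounted at /tank:
#   find_by_object /tank 6
```

If the helper prints nothing, the object might not correspond to a regular file path, which is consistent with the cases described earlier in which only an object number is displayed.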
Repairing ZFS Storage Pool-Wide Damage
If the damage is in pool metadata and that damage prevents the pool from
being opened, then you must restore the pool and all its data from backup.
The mechanism you use varies widely by the pool configuration and backup strategy.
First, save the configuration as displayed by zpool status so
that you can recreate it once the pool is destroyed. Then, use
zpool destroy -f to destroy the pool. Also, keep a file
describing the layout of the datasets and the various locally set properties
somewhere safe, as this information will become inaccessible if the pool is
ever rendered inaccessible. With the pool configuration and dataset layout,
you can reconstruct your complete configuration after destroying the pool.
The data can then be populated by using whatever backup or restoration strategy
you use.
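The record-keeping described above can be captured in a short script that is run periodically while the pool is still healthy. This is a sketch under stated assumptions: save_pool_layout is a hypothetical name, the target directory is assumed to be on a different pool or host, and the file names are arbitrary. It relies on the -r (recursive) and -s local (locally set values only) options of zfs get.

```shell
# Save what is needed to recreate a pool: its device layout, its dataset
# hierarchy, and the locally set properties of every dataset.
# $1: pool name
# $2: directory to write to (assumed to be outside the pool being recorded)
save_pool_layout() {
    pool=$1
    dir=$2
    mkdir -p "$dir" || return 1
    # Device configuration, as displayed by zpool status.
    zpool status "$pool" > "$dir/$pool.status" || return 1
    # Names of all datasets in the pool.
    zfs list -r -o name "$pool" > "$dir/$pool.datasets" || return 1
    # -s local restricts the listing to properties that were set explicitly,
    # leaving out inherited and default values.
    zfs get -r -s local all "$pool" > "$dir/$pool.properties" || return 1
}

# Hypothetical usage:
#   save_pool_layout tank /var/tmp/pool-layouts
```

The saved files are for reference when recreating the pool by hand. They are not a backup of the data, which still has to be restored separately.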
Repairing an Unbootable System
ZFS is designed to be robust and stable despite errors. Even so, software bugs or certain unexpected pathologies might cause the system to panic when a pool is accessed. As part of the boot process, each pool must be opened, which means that such failures will cause a system to enter into a panic-reboot loop. In order to recover from this situation, ZFS must be informed not to look for any pools on startup.
ZFS maintains an internal cache of available pools and their configurations
in /etc/zfs/zpool.cache. The location and contents of
this file are private and are subject to change. If the system becomes unbootable,
boot to the
none milestone by using the
-m milestone=none boot option. Once the system is up, remount your root file system
as writable and then remove /etc/zfs/zpool.cache. These
actions cause ZFS to forget that any pools exist on the system, preventing
it from trying to access the bad pool causing the problem. You can then proceed
to a normal system state by issuing the svcadm milestone all command.
You can use a similar process when booting from an alternate root to perform
repairs.
Once the system is up, you can attempt to import the pool by using the zpool import command. However, doing so will likely cause the same error that occurred during boot, because the command uses the same mechanism to access pools. If more than one pool is on the system and you want to import a specific pool without accessing any other pools, you must re-initialize the devices in the damaged pool, at which point you can safely import the good pool.