Why power fail management is an essential feature for industrial applications

The goal of power fail management is to avoid corrupted data and failing devices in situations of unexpected power loss, excessive supply voltage variations or hot unplugging. Since sudden power failures (SPF) are by nature unexpected, they can occur during any operation, such as writing, reading, erasing, mapping updates and background firmware operations. Hyperstone NAND flash controllers use several algorithms to protect data in these instances, and understanding these processes is invaluable when designing a NAND flash based storage system.

First of all, both the risk of a power failure and the bottleneck need to be understood. NAND flash is a form of non-volatile memory, which means that data remains stored even when no power is applied to the system. The controller, on the other hand, is the bottleneck. NAND flash controllers are inseparable from the flash: they manage the data being transferred onto it, and by design they contain volatile memory, which can lose data in the event of a power failure. Data is processed in these volatile components before being stored on the flash. This means that data being processed at the moment of a power failure might be lost. This can affect user data as well as management data, which is needed to store the data properly and to find it again. Flash controller manufacturers handle power fail robustness differently depending on how they weigh reliability, performance and system trade-offs.

In order to manage a power failure and minimize its potential damage, the flash controller needs to detect the event early. Hyperstone controllers contain internal voltage sensors, which monitor the external supply voltage. If the power supply falls below a certain threshold, the firmware finishes the currently running command and immediately triggers the flash write protect. The flash write protect trigger signals the host application not to send any further user data. Simultaneously, the controller writes and updates management data to better ensure that any lost or corrupted data can be restored.
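The sequence above can be sketched as a small model. This is only an illustration under stated assumptions, not Hyperstone firmware: the class `ControllerModel`, its method names and the threshold value `BROWNOUT_THRESHOLD_MV` are hypothetical, and the steps run one after another here although the text describes the management data update as happening simultaneously.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical threshold; real values depend on the controller and supply rail.
BROWNOUT_THRESHOLD_MV = 2700


@dataclass
class ControllerModel:
    """Toy model of the power-fail sequence (all names are assumptions)."""
    write_protected: bool = False
    pending_command: Optional[str] = None
    mapping_dirty: bool = True
    log: List[str] = field(default_factory=list)

    def on_voltage_sample(self, millivolts: int) -> None:
        # Nothing to do while the supply is at or above the threshold,
        # or once the power-fail sequence has already run.
        if millivolts >= BROWNOUT_THRESHOLD_MV or self.write_protected:
            return
        # 1. Let the command that is already running complete.
        if self.pending_command is not None:
            self.log.append(f"finish:{self.pending_command}")
            self.pending_command = None
        # 2. Assert write protect so the host stops sending user data.
        self.write_protected = True
        self.log.append("write_protect")
        # 3. Persist management data so the state can be rebuilt later.
        #    (Sequential here for simplicity.)
        if self.mapping_dirty:
            self.log.append("flush_mapping")
            self.mapping_dirty = False

    def host_write(self, data: bytes) -> bool:
        """Return True if the write is accepted, False if it is rejected."""
        if self.write_protected:
            return False
        self.pending_command = f"write({len(data)})"
        return True
```

In this model, a write issued before the voltage drop is still completed, while any write issued after write protect is asserted is rejected.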

In the event of a sudden power failure, Hyperstone controllers can minimize the damage to data that was supposed to be written onto the flash. This is because the log-book and the mapping data are always stored and updated on the flash, so that the data cannot be entirely lost. Even if data becomes corrupted through a power failure, it can be restored and corrected. It is especially important that the firmware stays intact, because if the firmware is damaged, the entire system can fail. Therefore, Hyperstone controllers have redundant firmware: the firmware is stored twice on the flash. Thus, if one copy of the firmware is damaged, the backup copy can be used to repair it.
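The redundant firmware idea can be illustrated with a short sketch. The source does not say how a damaged copy is detected, so the CRC32 trailer used here is an assumption, as are the function names `make_image`, `is_valid` and `select_and_repair`.

```python
import zlib
from typing import Optional, Tuple


def make_image(payload: bytes) -> bytes:
    """Append a CRC32 trailer to a firmware payload (illustrative format)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def is_valid(image: bytes) -> bool:
    """Check that the stored CRC32 trailer matches the payload."""
    if len(image) < 4:
        return False
    payload, stored = image[:-4], image[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


def select_and_repair(
    primary: bytes, backup: bytes
) -> Tuple[Optional[bytes], bytes, bytes]:
    """Return (image_to_boot, primary, backup).

    A damaged copy is overwritten with the intact one. If both copies
    are damaged, image_to_boot is None and nothing is changed.
    """
    p_ok, b_ok = is_valid(primary), is_valid(backup)
    if not p_ok and not b_ok:
        return None, primary, backup
    if p_ok and not b_ok:
        backup = primary      # repair the backup from the primary
    elif b_ok and not p_ok:
        primary = backup      # repair the primary from the backup
    return primary, primary, backup
```

Keeping two independently checked copies means a power failure that corrupts one copy during an update still leaves a bootable image, which can then be used to restore the other.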

Some applications require that user data be stored on the flash before the system shuts down. In such a case, the host can prepare the controller for the power failure. This gives the controller more time to carry out the operations necessary for a safe shut-down. In this instance the controller focusses purely on writing the user data onto the flash, which means that all other operations, such as wear levelling or housekeeping, become second priority. This allows the controller to reach the necessary write performance and carry out its data transfer as efficiently as possible.
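A minimal sketch of this prioritization, assuming a simple task queue: once the host has announced the power failure, user writes are moved ahead of all background work. The task representation, the `schedule` function and the `power_fail_announced` flag are hypothetical and only illustrate the ordering.

```python
from typing import List, Tuple

USER_WRITE = "user_write"

# A task is (kind, name); kinds other than USER_WRITE are background work
# such as wear levelling or housekeeping.
Task = Tuple[str, str]


def schedule(tasks: List[Task], power_fail_announced: bool) -> List[Task]:
    """Return the order in which tasks would be executed."""
    if not power_fail_announced:
        # Normal operation: keep the queue order unchanged.
        return list(tasks)
    # Announced power failure: user writes first, background work after.
    # sorted() is stable, so the order within each group is preserved.
    return sorted(tasks, key=lambda task: task[0] != USER_WRITE)
```

For example, a queue of wear levelling, user write, housekeeping, user write would be reordered so that both user writes run first and the two background tasks follow in their original order.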

In order to minimize the ramifications of a power failure within a NAND flash based application, many different aspects need to be considered. The controller, the bottleneck, is itself the solution. Corrupted and lost data is a major issue in electronics today, and reliability is key, especially in industrial applications. Hyperstone designs controllers with this in mind, targeting industrial embedded storage solutions with state-of-the-art power fail robustness for a safe system shutdown.