NAND Flash controllers – The key to endurance and reliability

The Flash controller handles the interface between a host system and Flash memory. It has to manage how the memory is used and optimise performance, endurance, reliability and lifetime.

Flash memory has some specific characteristics that the controller needs to handle correctly to achieve these goals. Among these are the process for writing data, the limited number of program/erase cycles of Flash memory cells, and error handling.

Programming data: pages and blocks

The architecture of NAND Flash means that data can be read and programmed in pages, typically between 4 KB and 16 KB in size, but can only be erased at the level of entire blocks, each consisting of multiple pages and megabytes in size. The fact that the cells within pages and blocks need to be programmed and erased ‘in a flash’ is the source of the name for the technology.

When a block is erased, all the cells are logically set to 1. Data can only be programmed in one pass to a page in a block that was erased. Any cells that have been set to 0 by programming can only be reset to 1 by erasing the entire block. This means that before new data can be programmed into a page that already contains data, the current contents of the page plus the new data must be copied to a new, erased page. If a suitable page is available, the data can be written to it immediately. If no erased page is available, a block must be erased before copying the data to a page in that block. The old page is then marked as invalid and is available for erase and reuse.
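The program and erase rules described above can be illustrated with a toy model. This is only a sketch: the `Block` class, its method names and the tiny page size are assumptions made for the example, not part of any real controller or Flash interface.

```python
class Block:
    """Toy model of one NAND block (hypothetical, for illustration only)."""

    def __init__(self, pages=4, page_bytes=4):
        # An erased block has every bit set to 1, i.e. every byte is 0xFF.
        self.pages = [bytearray([0xFF] * page_bytes) for _ in range(pages)]
        self.programmed = [False] * pages

    def program(self, page, data):
        # A page can be programmed only once after the block was erased.
        if self.programmed[page]:
            raise ValueError("page already programmed; erase the block first")
        if len(data) != len(self.pages[page]):
            raise ValueError("data must fill exactly one page")
        # Programming can only change bits from 1 to 0, modelled as bitwise AND.
        self.pages[page] = bytearray(
            old & new for old, new in zip(self.pages[page], data)
        )
        self.programmed[page] = True

    def erase(self):
        # Erase always covers the whole block and resets every bit to 1.
        for i in range(len(self.pages)):
            self.pages[i] = bytearray([0xFF] * len(self.pages[i]))
            self.programmed[i] = False


blk = Block()
blk.program(0, b"\x12\x34\x56\x78")
stored = bytes(blk.pages[0])
```

Trying to program page 0 a second time raises an error in this model; the only way to reuse the page is to call `erase()`, which also wipes every other page in the block.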

The controller has to manage the process of choosing pages to use, keeping track of invalid pages that need to be erased and, when necessary, performing ‘garbage collection’: consolidating pages of valid data into other blocks, so that blocks holding only invalid pages can be erased and reused. With all this data movement, the controller needs to keep track of the mapping from the logical addresses used by the host to the physical locations in the memory. The controller also needs to ensure data integrity if there is a power failure while data is being moved.
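A minimal sketch of the bookkeeping described above is shown here. The `TinyFTL` class and all its names are hypothetical, and the design (one erased block held in reserve as the garbage collection destination, and a greedy choice of the block with the fewest valid pages as victim) is one simple policy assumed for the example. It does not model power-fail protection.

```python
class TinyFTL:
    """Toy logical-to-physical mapping with garbage collection (illustrative)."""

    def __init__(self, blocks=3, pages_per_block=2):
        self.ppb = pages_per_block
        # Physical page content: None = erased, else (logical_address, data).
        self.flash = [[None] * pages_per_block for _ in range(blocks)]
        self.valid = [[False] * pages_per_block for _ in range(blocks)]
        self.l2p = {}             # logical address -> (block, page)
        self.spare = blocks - 1   # erased block reserved for garbage collection

    def _free_page(self):
        for b, block in enumerate(self.flash):
            if b == self.spare:
                continue
            for p, content in enumerate(block):
                if content is None:
                    return (b, p)
        return None

    def _program(self, loc, lba, data):
        b, p = loc
        self.flash[b][p] = (lba, data)
        self.valid[b][p] = True
        old = self.l2p.get(lba)
        if old is not None:
            self.valid[old[0]][old[1]] = False  # old copy becomes invalid
        self.l2p[lba] = loc

    def write(self, lba, data):
        loc = self._free_page()
        if loc is None:
            self.garbage_collect()
            loc = self._free_page()
            if loc is None:
                raise RuntimeError("no reclaimable space")
        self._program(loc, lba, data)

    def read(self, lba):
        b, p = self.l2p[lba]
        return self.flash[b][p][1]

    def garbage_collect(self):
        candidates = [b for b in range(len(self.flash)) if b != self.spare]
        victim = min(candidates, key=lambda b: sum(self.valid[b]))
        next_page = 0
        for p in range(self.ppb):
            if self.valid[victim][p]:
                lba, data = self.flash[victim][p]
                # Copy still-valid data into the reserved block.
                self._program((self.spare, next_page), lba, data)
                next_page += 1
        # Erase the victim; it becomes the new reserved block.
        self.flash[victim] = [None] * self.ppb
        self.valid[victim] = [False] * self.ppb
        self.spare = victim


ftl = TinyFTL()
for lba, data in [(0, "a"), (0, "b"), (1, "c"), (0, "d"), (2, "e")]:
    ftl.write(lba, data)
```

The host only ever sees logical addresses 0, 1 and 2. The fifth write finds no erased page, so the sketch reclaims the block that holds only stale copies of address 0 before continuing.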

Limited program/erase cycles

The choice of which blocks and pages to use is further complicated by the limited number of program/erase cycles that Flash cells can undergo, a limit caused by the physical characteristics of the cells and the relatively high voltages used for these operations. To prevent this causing early failure of heavily used blocks, the controller uses ‘wear levelling’ to ensure all the Flash blocks are used equally. This means selecting a page in one of the least-used blocks when deciding where to move data.
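As a rough illustration of the selection step, the sketch below picks the erased block with the lowest erase count. The function name, the per-block erase counter and the sample numbers are assumptions for the example; real controllers use more elaborate policies.

```python
def pick_least_worn(erase_counts, free_blocks):
    """Return the free block that has been erased the fewest times."""
    if not free_blocks:
        raise ValueError("no erased block available")
    return min(free_blocks, key=lambda block: erase_counts[block])


# Hypothetical erase counters, keyed by block number.
erase_counts = {0: 120, 1: 3, 2: 57, 3: 4}
free_blocks = [0, 2, 3]   # block 1 currently holds data, so it is not a candidate
chosen = pick_least_worn(erase_counts, free_blocks)
```

Here block 3 is chosen: block 1 has seen fewer erases, but it is not free, so the least-worn available block wins.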

Error correction

The effective use of Error Correction Coding (ECC) is critical to detecting and correcting errors when reading data, whether they are caused by wear, radiation or other disturb effects. Recent ECCs can correct over 100 bit errors within 1 KB of user data. Yes, if you do the maths, this means that roughly 1 out of every 80 bits read may be incorrect, a figure that relates to the so-called Raw Bit Error Rate (RBER)! The required ECC strength depends on the quality of the Flash technology used and is specified by the Flash vendor.
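The arithmetic behind the "1 out of 80" figure can be checked in a few lines. The sketch assumes that 1 KB means 1024 bytes and counts only user data bits, ignoring the ECC parity bits stored alongside them.

```python
correctable_bit_errors = 100          # figure quoted in the text
user_data_bits = 1024 * 8             # 1 KB of user data = 8192 bits

# Worst-case fraction of bits that may be wrong and still be corrected.
tolerated_error_fraction = correctable_bit_errors / user_data_bits

# Expressed as "one bit in every N".
one_in_n = user_data_bits / correctable_bit_errors

print(f"{tolerated_error_fraction:.4f}")   # about 0.0122
print(f"1 in {one_in_n:.0f} bits")         # about 1 in 82
```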

Bad block management

If there are repeated failures in a block, or a block fails to erase, it is marked as bad and will not be used in future. Flash memories are built with excess spare blocks so they can cope with a certain number of bad blocks before the memory becomes unusable.
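The retirement logic can be sketched as follows. The `BadBlockManager` class, its method names and the threshold of three failures are hypothetical choices for the example; the text does not say how many failures count as "repeated".

```python
class BadBlockManager:
    """Toy bad-block table with a pool of spare blocks (illustrative only)."""

    def __init__(self, total_blocks, spare_blocks, max_failures=3):
        first_spare = total_blocks - spare_blocks
        self.usable = set(range(first_spare))
        self.spares = list(range(first_spare, total_blocks))
        self.bad = set()
        self.failures = {}
        self.max_failures = max_failures   # assumed threshold, not from the text

    def report_failure(self, block, erase_failed=False):
        """Record a failure; return the replacement block if one is swapped in."""
        if block in self.bad:
            return None
        self.failures[block] = self.failures.get(block, 0) + 1
        if erase_failed or self.failures[block] >= self.max_failures:
            return self._retire(block)
        return None

    def _retire(self, block):
        # Mark the block as bad so that it is never used again.
        self.usable.discard(block)
        self.bad.add(block)
        if not self.spares:
            raise RuntimeError("no spare blocks left; memory is unusable")
        replacement = self.spares.pop(0)
        self.usable.add(replacement)
        return replacement


mgr = BadBlockManager(total_blocks=10, spare_blocks=2)
first = mgr.report_failure(4)
second = mgr.report_failure(4)
third = mgr.report_failure(4)                       # third failure retires block 4
after_erase = mgr.report_failure(5, erase_failed=True)
```

In this run the first two failures on block 4 are only counted, the third swaps in spare block 8, and the erase failure on block 5 immediately swaps in spare block 9. A further retirement would raise an error because the spare pool is exhausted.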

Health monitoring

Tracking the current status and expected lifetime of the Flash memory is important to avoid unexpected failures and data loss. As with hard disk drives, the standard self-monitoring, analysis and reporting technology (SMART) allows the controller to report the health of the Flash memory and provide early warnings of potential failures well before they occur. A good Flash controller is therefore key to enabling Flash-based storage to achieve better levels of endurance, reliability and operating life compared to hard disk storage.