Which Error Correction Codes are Most Suitable for Industrial Storage?

Why LDPC is not suitable for industrial flash storage

Low-density parity-check (LDPC) codes have become increasingly common in NAND flash storage devices such as SSDs. Here we consider their suitability for different classes of applications and show that there are better solutions for industrial storage.

How has the history of LDPC codes shaped their usage?

LDPC codes were originally developed in the 1960s but, because of their complexity, implementation was not practical until the late '90s. They were first used in applications with noisy, error-prone communication channels, such as digital video satellite broadcasting.

What are the benefits of LDPC? 

The error correction performance of LDPC codes approaches the theoretical maximum, the Shannon limit, particularly with high input error rates. Very few other systems can achieve similar levels of performance.

What are the drawbacks of LDPC? 

However, LDPC is not as effective when error rates are lower. This makes it a very good match for applications where occasional errors are acceptable, but high performance is needed when error rates are high. For example, in digital video broadcasting a small number of uncorrected errors may result in a few pixels being the wrong color, which is probably acceptable. On the other hand, when there is a very high error rate in the input data it is important to ensure that the video stream is not lost completely, resulting in a blank screen. LDPC works well in both of these cases.

More recently, LDPC has been applied in flash memory to cope with the higher error rates seen in modern, high-density technologies such as triple-level cell (TLC) and quad-level cell (QLC) devices. The high error rates would seem to make LDPC codes an ideal solution.

However, the raw error rate of flash memory can change over the lifetime of the device. LDPC may give excellent results late in the life of the device, when raw error rates are highest, but can be less effective while the input error rates are lower.

This is particularly important for enterprise and industrial applications, where the widely used JEDEC specification requires an uncorrectable bit error rate below 10^-16 over the entire lifetime of the device.
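To put that figure in perspective, the sketch below converts an uncorrectable bit error rate (UBER) into an expected number of uncorrected bit errors for a given read volume. It assumes UBER is simply the ratio of uncorrected bit errors to bits read; the function names are illustrative and not taken from any specification or product.

```python
def expected_uncorrectable_bits(bytes_read: float, uber: float) -> float:
    """Expected number of uncorrected bit errors after reading bytes_read bytes.

    Assumes UBER = (uncorrected bit errors) / (bits read).
    """
    return bytes_read * 8 * uber


def bytes_per_expected_error(uber: float) -> float:
    """Read volume, in bytes, at which one uncorrected bit error is expected."""
    return 1.0 / uber / 8


if __name__ == "__main__":
    target = 1e-16  # the requirement quoted above
    petabytes = bytes_per_expected_error(target) / 1e15
    print(f"One uncorrected bit expected per {petabytes:.2f} PB read")
```

Under this assumption, a device that meets the 10^-16 target is expected to return roughly one uncorrected bit for every 1.25 PB (decimal petabytes) read.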

Also, with LDPC the error correction performance for low to medium error rates can only be estimated, not precisely calculated. This makes it impossible to guarantee a specified level of performance.

What is GCC and why was it developed? 

Because of these problems with LDPC, Hyperstone worked with the University of Applied Sciences Konstanz to develop a different approach based on a generalized concatenated code (GCC). This performs much better than LDPC for low to medium input error rates. It also allows the output error rate to be calculated so that performance guarantees can be made.
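The article does not describe how the GCC output error rate is calculated. As a rough illustration of what "calculable" means, the sketch below uses a much simpler model: a hypothetical block code of n bits whose decoder corrects up to t bit errors, with independent bit errors at raw rate p. In that idealized case the probability that a codeword cannot be corrected is an exact binomial tail. The parameters n, t and p are assumptions for the example only, not GCC or Hyperstone values.

```python
import math


def log_binom_pmf(n: int, i: int, p: float) -> float:
    """Natural log of P(exactly i bit errors in n bits), computed in the log
    domain so that large n does not overflow."""
    log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_comb + i * math.log(p) + (n - i) * math.log1p(-p)


def codeword_failure_probability(n: int, t: int, p: float) -> float:
    """P(more than t errors in an n-bit codeword) for independent bit errors.

    Models an idealized decoder that corrects up to t errors and fails
    otherwise. This is an illustration, not the GCC analysis itself.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if t < n else 0.0
    # Sum the tail terms directly; computing 1 - CDF would lose the tiny
    # probabilities that matter here.
    tail = sum(math.exp(log_binom_pmf(n, i, p)) for i in range(t + 1, n + 1))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the trend.
    for raw_ber in (1e-4, 1e-3, 5e-3):
        p_fail = codeword_failure_probability(n=1000, t=10, p=raw_ber)
        print(f"raw BER {raw_ber:g}: codeword failure probability {p_fail:.3e}")
```

The point of the model is that the failure probability follows directly from the code parameters and the raw error rate, so a target can be checked by calculation rather than estimated from simulation.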

To make sure that this approach can be used over the lifetime of the device, a number of techniques are used to keep the raw error rate in the “low to medium” range where GCC is most effective. The Hyperstone flash controller uses calibration to ensure that the output levels from the memory cells are always in the optimal range, in order to minimize errors. This can be combined with other mechanisms such as read-retry and dynamic data refresh to further minimize the error rates from the flash memory. This results in error rates that are in the optimum range for GCC, throughout the lifetime of the device.

These techniques would bring no benefit to LDPC-based systems because they would keep the input error rate in the range where LDPC is least effective.

What is the conclusion?

LDPC has become a popular solution for error correction in NAND flash storage. It provides a good solution for consumer-level devices based on TLC or QLC technology with potentially high error rates. However, this approach is not suitable for commercial and industrial applications where higher reliability levels must be guaranteed. Hyperstone addresses this by using GCC, a more appropriate error-correcting code, along with calibration to maintain lower raw error rates. This allows us to guarantee conformance to performance standards.