Published on Sep 16, 2019
With the continuous decrease in the minimum feature size and increase in the chip density due to technology scaling, on-chip L2 caches are becoming increasingly susceptible to multi-bit soft errors.
The increase in multi-bit errors could lead to higher risk of data corruption and potentially result in the crashing of application programs. Traditionally, the L2 caches have been protected from soft errors using techniques such as:
1) error detection/correction codes;
2) physical interleaving of cache bit lines to convert multi-bit errors into single-bit errors; and
3) cache scrubbing.
In this paper, we investigate in detail the multi-bit soft error rates in large L2 caches and propose a framework of solutions for their correction based on the amount of redundancy present in the memory hierarchy. We investigate several new techniques for reducing multi-bit errors in large L2 caches, in which, the multi-bit errors are detected using simple error detection codes and corrected using the data redundancy in the memory hierarchy.
We also propose several techniques to control/mine the redundancy in the memory hierarchy to further improve the reliability of the L2 cache. The proposed techniques were implemented in the Simple scalar framework and validated using the SPEC 2000 integer and floating point benchmarks for L2 cache vulnerability, global cache miss-rate, average cycle count and main memory write back rate, considering the area and power overheads.
Experimental results indicate that the vulnerability of L2 caches can be decreased by 40% on the average for integer benchmarks and 32% on the average for floating point benchmarks, with an average multi-bit error coverage of about 96%, with significantly less area and power overheads and with virtually no performance penalty. The proposed techniques are applicable to both single and multi-core processor-based systems
In this proposed architecture area consumed by the architecture and power consumption becomes less. And t he multi-bit errors are detected using simple error detection codes and corrected using the data redundancy
Synthesis: Xilinx 9.1