Mitigating Silent Data Corruptions In Integer Matrix Products: Toward Reliable Multimedia Computing On Unreliable Hardware

Ijeoma Anarado (Corresponding Author), Mohammad Ashraful Anam, Fabio Verdicchio, Yiannis Andreopoulos

Research output: Contribution to journal (Article)

3 Citations (Scopus)
6 Downloads (Pure)

Abstract

The generic matrix multiply (GEMM) routine comprises the compute- and memory-intensive part of many information retrieval, machine learning and object recognition systems that process integer inputs. Therefore, it is of paramount importance to ensure that integer GEMM computations remain robust to silent data corruptions (SDCs), which stem from accidental voltage or frequency overscaling, or other hardware non-idealities. In this paper, we introduce a new method for SDC mitigation based on the concept of numerical packing. The key difference between our approach and all existing methods is the production of redundant results within the numerical representation of the outputs, rather than as a separate set of checksums. Importantly, unlike well-known algorithm-based fault tolerance (ABFT) approaches for GEMM, the proposed approach can reliably detect the locations of the vast majority of all possible SDCs in the results of GEMM computations. An experimental investigation of voltage-scaled integer GEMM computations for visual descriptor matching within state-of-the-art image and video retrieval algorithms running on an Intel i7-4578U 3 GHz processor shows that SDC mitigation based on numerical packing leads to comparable or lower execution and energy-consumption overhead in comparison to all other alternatives.
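
The general idea of carrying redundancy inside the numerical representation of the outputs, rather than in separate checksums, can be illustrated with a minimal sketch. This is not the packing scheme of the paper, whose details the abstract does not give. It is a simplified illustration that assumes non-negative integer inputs, assumes every output entry fits in a hypothetical `SHIFT`-bit field, and uses plain Python integers so that the packed values cannot overflow. The function names `matmul`, `pack` and `unpack_and_check` are likewise hypothetical.

```python
SHIFT = 32  # assumed field width: every true output entry must be < 2**SHIFT


def matmul(a, b):
    """Plain integer matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pack(a, shift=SHIFT):
    """Place a second copy of each entry in the upper bits: v + v * 2**shift."""
    return [[v + (v << shift) for v in row] for row in a]


def unpack_and_check(r, shift=SHIFT):
    """Split each packed output into its two copies and flag mismatches.

    Returns the matrix read from the lower fields and a list of (row, col)
    positions where the lower and upper fields disagree.
    """
    mask = (1 << shift) - 1
    result, suspects = [], []
    for i, row in enumerate(r):
        out = []
        for j, v in enumerate(row):
            low, high = v & mask, v >> shift
            if low != high:
                suspects.append((i, j))  # location of a suspected corruption
            out.append(low)
        result.append(out)
    return result, suspects


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
packed = matmul(pack(a), b)       # one product carries both copies of a @ b
packed[0][1] ^= 1 << 3            # simulate a single bit flip in one output
c, suspects = unpack_and_check(packed)
```

Because `(v + v * 2**SHIFT) * w = v*w + (v*w) * 2**SHIFT`, a single packed product yields each output twice, and a disagreement between the two fields points to the affected output position. A corruption that alters both fields identically would go unnoticed in this simplified form.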
Original language: English
Pages (from-to): 2476-2489
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 27
Issue number: 11
Early online date: 11 Jul 2016
Publication status: Published - Nov 2017

Keywords

  • integer matrix multiplication
  • dependable systems
  • fault tolerance
  • hardware
  • voltage scaling
  • Fault tolerant systems
  • Integrated circuit reliability
  • Error correction codes
  • Proposals
