ReTMiC: Reliability-Aware Thermal Management in Multicore Mixed-Criticality Embedded Systems

As the number of cores in multicore platforms increases, temperature constraints may prevent all cores from being powered simultaneously at the maximum voltage and frequency levels. Thermal hot spots and unbalanced temperatures across the processing cores may also degrade reliability. This paper introduces ReTMiC, a reliability-aware thermal management scheduling method for mixed-criticality embedded systems. At design time, ReTMiC satisfies the Thermal Design Power (TDP) as the chip-level power constraint. To balance the temperature of the processing cores, the method determines balancing points in each frame of the schedule; at run time, a lightweight online re-mapping technique is activated at each of these balancing points. The online mechanism uses the proposed temperature-aware factor to reduce the system’s temperature based on the current temperature of the processing cores and the behavior of the tasks running on them. Experimental results show that ReTMiC achieves up to a 12.8°C reduction in chip temperature and a 3.5°C reduction in spatial thermal variation compared with state-of-the-art techniques, while keeping system reliability at the required level.
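The re-mapping step at a balancing point can be illustrated with a minimal sketch. The abstract does not define the temperature-aware factor, so the code below substitutes a simple greedy rule as an assumption: the task with the highest average power is moved to the currently coolest core. The function names, the power-based notion of task "behavior", and the numeric values are all hypothetical and are not taken from the paper.

```python
def meets_tdp(task_power, tdp_watts):
    """Chip-level check: total task power must not exceed the TDP budget."""
    return sum(task_power.values()) <= tdp_watts


def remap_at_balancing_point(core_temps, task_power):
    """Greedy stand-in for the online re-mapping (not the paper's factor).

    core_temps: core id -> current temperature in degrees Celsius
    task_power: task name -> average power in watts (assumed proxy
                for the task's thermal behavior)
    Returns a dict mapping each task name to its new core id.
    """
    # Cores ordered from coolest to hottest.
    cores_coolest_first = sorted(core_temps, key=core_temps.get)
    # Tasks ordered from most to least power-hungry.
    tasks_hottest_first = sorted(task_power, key=task_power.get, reverse=True)
    # Pair the most power-hungry task with the coolest core, and so on.
    return dict(zip(tasks_hottest_first, cores_coolest_first))


# Hypothetical snapshot taken at one balancing point.
core_temps = {0: 70.0, 1: 55.0, 2: 62.0}
task_power = {"A": 1.0, "B": 3.0, "C": 2.0}

mapping = remap_at_balancing_point(core_temps, task_power)
print(mapping)  # {'B': 1, 'C': 2, 'A': 0}
print(meets_tdp(task_power, tdp_watts=8.0))  # True
```

A real implementation would also have to respect task deadlines, criticality levels, and the reliability target, none of which this sketch models.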

View this article on IEEE Xplore