2014-06 Elsevier Environmental Modelling & Software: Monitoring Power Data

Overview

Research paper: Hayk Shoukourian, Torsten Wilde, Axel Auweter, Arndt Bode: Monitoring Power Data: A first step towards a unified energy efficiency evaluation toolset for HPC data centers (e-print version attached)

published by Elsevier in: Environmental Modelling & Software (Thematic issue on Modelling and evaluating the sustainability of smart solutions), Volume 56, June 2014, Pages 13–26; DOI: http://dx.doi.org/10.1016/j.envsoft.2013.11.011


The energy consumption of High Performance Computing (HPC) systems, which are the key technology for many modern computation-intensive applications, is rapidly increasing in parallel with their performance improvements. This increase leads data centers to focus on three major challenges: the reduction of overall environmental impacts, which is driven by policy makers; the reduction of operating costs, which are increasing due to rising system density and electrical energy costs; and the 20 MW power consumption boundary for Exascale computing systems, which represent the next thousandfold increase in computing capability beyond the currently existing petascale systems. The improvement of the energy efficiency of data centers will play a major part in addressing these challenges.
In order to improve the energy efficiency in a data center, the following items are necessary: a) collected data from all aspects of the data center (e.g. environmental information, site infrastructure, information technology systems, resource management systems, and applications); b) the ability to correlate data in order to better understand the interactions between different components of the data center, to assess the status of current key performance indicators (KPI), to identify improvement areas, and to verify the success of the optimization; and c) the provision of KPI information to data center operators and policy makers. Currently, no tool addresses item a) and, consequently, none addresses items b) or c) either.
This paper presents a toolset, called Power Data Aggregation Monitor (PowerDAM), which addresses item a) and is, therefore, the first step towards a truly unified energy efficiency toolset for data centers encompassing items a), b), and c). PowerDAM was specifically designed to collect data from all aspects of a data center and to be independent of specific vendor hardware or existing data center tools. It currently monitors two HPC systems, as well as their respective resource management systems (also called batch scheduling systems), at the Leibniz Supercomputing Centre of the Bavarian Academy of Sciences (BAdW-LRZ). PowerDAM already covers some of the functionalities related to item b). For example, it can calculate and display the Energy-to-Solution (EtS) KPI for an application which has been run on those systems.
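To make the EtS idea concrete, the following is a minimal sketch of how an Energy-to-Solution value could be derived from timestamped power samples. It is not PowerDAM's implementation: it assumes that EtS is the energy drawn by the nodes of a job over its runtime, that power is sampled per node as (time in seconds, power in watts) pairs, and that the trapezoidal rule is an acceptable approximation between samples. The names `node_energy_joules` and `energy_to_solution_kwh` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumed sample format: (timestamp in seconds, power in watts)
Sample = Tuple[float, float]


def node_energy_joules(samples: List[Sample]) -> float:
    """Approximate one node's energy by trapezoidal integration of power over time."""
    ordered = sorted(samples)  # tolerate samples that arrive out of order
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(ordered, ordered[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # watts * seconds = joules
    return energy


def energy_to_solution_kwh(per_node: Dict[str, List[Sample]]) -> float:
    """Sum the energy of all nodes used by a job and convert joules to kWh."""
    total_joules = sum(node_energy_joules(s) for s in per_node.values())
    return total_joules / 3.6e6  # 1 kWh = 3.6e6 J
```

Under these assumptions, a node drawing a constant 100 W for 10 s contributes 1000 J. A real toolset would additionally have to align samples with the job's start and end times reported by the batch scheduling system.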
This paper starts by highlighting the energy efficiency improvement domains for HPC data centers, then introduces the important design aspects of the developed PowerDAM toolset, and finally presents the current functionality of the toolset using an example application on two different HPC systems.

Keywords: energy consumption, PowerDAM, Energy-to-Solution (EtS), energy measurement, energy efficiency toolset, data center