BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D174
DTSTART;TZID=America/Chicago:20181116T111000
DTEND;TZID=America/Chicago:20181116T113000
UID:submissions.supercomputing.org_SC18_sess146_ws_ftxs115@linklings.com
SUMMARY:CPU Overheating Characterization in HPC Systems: a Case Study
DESCRIPTION:Workshop\nResiliency, Scientific Computing, Workshop Reg Pass\
 n\nCPU Overheating Characterization in HPC Systems: a Case Study\n\nPlatin
 i, Ropars, Pelletier, De Palma\n\nWith the increase in size of supercomput
 ers, the number of abnormal events also increases. Some of these events mi
 ght lead to an application failure. Others might simply impact the system 
 efficiency. CPU overheating is one such event that decreases the system ef
 ficiency: when a CPU overheats, it reduces its frequency. This paper studi
 es the problem of CPU overheating in supercomputers. In a first part, we a
 nalyze data collected over one year on a supercomputer of the Top500 list 
 to understand under which conditions CPU overheating occurs. Our analysis 
 shows that overheating events are due to some specific applications. In a
  second part, we evaluate the impact of such overheating events on the
  perfo
 rmance of MPI applications. Using 6 representative HPC benchmarks, we show
  that for a majority of the applications, a frequency drop on one CPU impa
 cts the execution time of distributed runs proportionally to the duration 
 and to the extent of the frequency drop.
URL:https://sc18.supercomputing.org/presentation/?id=ws_ftxs115&sess=sess1
 46
END:VEVENT
END:VCALENDAR

