BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D174
DTSTART;TZID=America/Chicago:20181116T094000
DTEND;TZID=America/Chicago:20181116T100000
UID:submissions.supercomputing.org_SC18_sess146_ws_ftxs112@linklings.com
SUMMARY:A Comprehensive Informative Metric for Analyzing HPC System Status
  Using the LogSCAN Platform
DESCRIPTION:Workshop\nResiliency, Scientific Computing, Workshop Reg Pass\
 n\nA Comprehensive Informative Metric for Analyzing HPC System Status Usin
 g the LogSCAN Platform\n\nHui, Park, Engelmann\n\nLog processing by Spark 
 and Cassandra-based ANalytics (LogSCAN) is a newly developed analytical pl
 atform that provides flexible and scalable data gathering, transformation 
 and computation. One major challenge is to effectively summarize the statu
 s of a complex computer system, such as the Titan supercomputer at the Oak
  Ridge Leadership Computing Facility (OLCF). Although there is plenty of o
 perational and maintenance information collected and stored in real time, 
 which may yield insights about short- and long-term system status, it is d
 ifficult to present this information in a comprehensive form. In this work
 , we present system information entropy (SIE), a newly developed metric th
 at leverages the power of traditional machine learning techniques and
  information theory. By compressing the multivariate, multidimensional
  event information recorded during the operation of the targeted system
  into a si
 ngle time series of SIE, we demonstrate that the historical system status
  can be represented sensitively, concisely and comprehensively. Given a
  sharp indicator such as SIE, we argue that follow-up analytics based on
  SIE will re
 veal in-depth knowledge about system status using other sophisticated appr
 oaches, such as pattern recognition in the temporal domain or causality an
 alysis incorporating extra independent metrics of the system.
URL:https://sc18.supercomputing.org/presentation/?id=ws_ftxs112&sess=sess1
 46
END:VEVENT
END:VCALENDAR

