BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D174
DTSTART;TZID=America/Chicago:20181116T092000
DTEND;TZID=America/Chicago:20181116T094000
UID:submissions.supercomputing.org_SC18_sess146_ws_ftxs104@linklings.com
SUMMARY:Improving Application Resilience by Extending Error Correction wit
 h Contextual Information
DESCRIPTION:Workshop\nResiliency, Scientific Computing, Workshop Reg Pass\
 n\nImproving Application Resilience by Extending Error Correction with Con
 textual Information\n\nPoulos, Wallace, Robey, Monroe, Job...\n\nExtreme-s
 cale systems are growing in scope and complexity as we approach exascale. 
 Uncorrectable faults in such systems are also increasing, so resilience ef
 forts addressing these are of great importance. In this paper, we extend a
  method that augments hardware error detection and correction (EDAC) conte
 xtually, and show an application-based approach that takes detectable unco
 rrectable (DUE) data errors and corrects them.\n\nWe applied this applicat
 ion-based method successfully to data errors found using common EDAC, and 
 discuss operating system changes that will make this possible on existing 
 systems. We show that even when there are many acceptable correction choic
 es (which may be seen in floating point), a large percentage of DUEs are c
 orrected, and even the miscorrected data are very close to correct. We dev
 eloped two different contextual criteria for this application: local avera
 ging and global conservation of mass. Both did well in terms of closeness,
  but conservation of mass outperformed averaging in terms of actual correc
 tness.\n\nThe contributions of this paper are: 1) the idea of application-
 specific EDAC-based contextual correction, 2) its demonstration with great
  success on a real application, 3) the development of two different contex
 tual criteria, and 4) a discussion of attainable changes to the OS kernel 
 that make this possible on a real system.
URL:https://sc18.supercomputing.org/presentation/?id=ws_ftxs104&sess=sess1
 46
END:VEVENT
END:VCALENDAR

