BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160903Z
LOCATION:C148
DTSTART;TZID=America/Chicago:20181111T083000
DTEND;TZID=America/Chicago:20181111T170000
UID:submissions.supercomputing.org_SC18_sess236_tut115@linklings.com
SUMMARY:Fault-Tolerance for High Performance and Distributed Computing: Th
 eory and Practice
DESCRIPTION:Tutorial\nResiliency, Tutorial Reg Pass\n\nFault-Tolerance for
  High Performance and Distributed Computing: Theory and Practice\n\nBosilc
 a, Bouteiller, Herault, Robert\n\nReliability is one of the major concerns
  when envisioning future exascale platforms. The International Exascale So
 ftware Project forecasts an increase in node performance and concurrency b
 y one or two orders of magnitude, which translates, even under the most op
 timistic perspectives, into a mechanical decrease of the mean time to
  interruption of at least one order of magnitude. Because of this trend,
  platform
  providers, software implementors, and high-performance application users 
 who target capability runs on such machines cannot regard the occurrence o
 f interruption due to a failure as a rare dramatic event, but must conside
 r faults inevitable, and therefore design and develop software components 
 that have some form of fault-tolerance integrated at their core.\n\nIn thi
 s tutorial, we present a comprehensive survey of the techniques proposed t
 o deal with failures in high-performance and distributed systems. At the e
 nd of the tutorial, each attendee will have a better understanding of the 
 fault-tolerance premises and constraints, will know some of the available 
 techniques, and will be able to determine, integrate, and adapt the techni
 que which best suits their applications. In addition, the participants wil
 l learn how to employ existing fault-tolerant infrastructure software to s
 upport more productive application development and deployment.
URL:https://sc18.supercomputing.org/presentation/?id=tut115&sess=sess236
END:VEVENT
END:VCALENDAR

