BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160906Z
LOCATION:C156
DTSTART;TZID=America/Chicago:20181111T083000
DTEND;TZID=America/Chicago:20181111T170000
UID:submissions.supercomputing.org_SC18_sess257_tut189@linklings.com
SUMMARY:Node-Level Performance Engineering
DESCRIPTION:Tutorial\nHeterogeneous Systems, Performance, Tutorial Reg Pas
 s\n\nNode-Level Performance Engineering\n\nHager, Wellein\n\nThe advent of
  multi- and manycore chips has led to a further opening of the gap between
  peak and application performance for many scientific codes. This trend is
  accelerating as we move from petascale to exascale. Paradoxically, bad no
 de-level performance helps to "efficiently" scale to massive parallelism, 
 but at the price of increased overall time to solution. If the user cares 
 about time to solution on any scale, optimal performance on the node level
  is often the key factor. We convey the architectural features of current 
 processor chips, multiprocessor nodes, and accelerators, as far as they ar
 e relevant for the practitioner. Peculiarities like SIMD vectorization, sh
 ared vs. separate caches, bandwidth bottlenecks, and ccNUMA characteristic
 s are introduced, and the influence of system topology and affinity on the
  performance of typical parallel programming constructs is demonstrated. P
 erformance engineering and performance patterns are suggested as powerful 
 tools that help the user understand the bottlenecks at hand and assess th
 e impact of possible code optimizations. A cornerstone of these concepts
  is the roofline model, which is described in detail, including useful cas
 e studies, limits of its applicability, and possible refinements.
URL:https://sc18.supercomputing.org/presentation/?id=tut189&sess=sess257
END:VEVENT
END:VCALENDAR