BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D171/173
DTSTART;TZID=America/Chicago:20181116T110000
DTEND;TZID=America/Chicago:20181116T111500
UID:submissions.supercomputing.org_SC18_sess145_ws_p3hpc111@linklings.com
SUMMARY:Delivering Performance-Portable Stencil Computations on CPUs and G
 PUs Using Bricks
DESCRIPTION:Workshop\nHeterogeneous Systems\, Performance\, Workshop Re
 g Pass\n\nDelivering Performance-Portable Stencil Computations on CPU
 s and GPUs Using Bricks\n\nZhao\, Williams\, Hall\, Johansen\n\nAchiev
 ing high performance on stencil computations poses a number of challen
 ges on modern architectures. The optimization strategy varies signific
 antly across architectures\, types of stencils\, and types of applicat
 ions. The standard approach to adapting stencil computations to differ
 ent architectures\, used by both compilers and application programmers
 \, is through the use of iteration space tiling\, whereby the data foo
 tprint of the computation and its computation partitioning are adjuste
 d to match the memory hierarchy and available parallelism of different p
 latforms. In this paper\, we explore an alternative performance portab
 ility strategy for stencils\, a data layout library for stencils calle
 d bricks\, that adapts data footprint and parallelism through fine-gra
 ined data blocking. Bricks are designed to exploit the inherent multi-
 dimensional spatial locality of stencils\, facilitating improved code g
 eneration that can adapt to CPUs or GPUs\, and reducing pressure on th
 e memory system. We demonstrate that bricks are performance-portable a
 cross CPU and GPU architectures and afford performance advantages over v
 arious tiling strategies\, particularly for modern multi-stencil and h
 igh-order stencil computations. For a range of stencil computations\, w
 e achieve high performance on both the Intel Knights Landing (Xeon Phi
 ) and Skylake (Xeon) CPUs as well as the Nvidia P100 (Pascal) GPU\, de
 livering up to a 5x speedup against tiled code.
URL:https://sc18.supercomputing.org/presentation/?id=ws_p3hpc111&sess=sess
 145
END:VEVENT
END:VCALENDAR

