BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:D171/173
DTSTART;TZID=America/Chicago:20181116T103000
DTEND;TZID=America/Chicago:20181116T104500
UID:submissions.supercomputing.org_SC18_sess145_ws_p3hpc104@linklings.com
SUMMARY:Performance Portability of an Unstructured Hydrodynamics Mini-Appl
 ication
DESCRIPTION:Workshop\nHeterogeneous Systems, Performance, Workshop Reg Pas
 s\n\nPerformance Portability of an Unstructured Hydrodynamics Mini-Applica
 tion\n\nLaw, Kevis, Powell, Dickson, Maheswaran...\n\nIn this work we stud
 y the parallel performance portability of BookLeaf: a recent 2D unstructur
 ed hydrodynamics mini-application. The aim of BookLeaf is to provide a sel
 f-contained and representative testbed for exploration of the modern hydro
 dynamics application design-space.\n\nWe present a previously unpublished 
 reference C++11 implementation of BookLeaf parallelised with MPI, alongsid
 e hybrid MPI+OpenMP and MPI+CUDA versions, and two implementations using C
 ++11 performance portability frameworks: Kokkos and RAJA, which both targe
 t a variety of parallel back-ends. We assess the scalability of our implem
 entations on the ARCHER Cray XC30 up to 4096 nodes (98,304 cores) and on t
 he Ray EA system at Lawrence Livermore National Laboratory up to 16 nodes 
 (64 Tesla P100 GPUs), with a particular focus on the overheads introduced 
 by Kokkos and RAJA relative to our handwritten OpenMP and CUDA implementat
 ions. We quantify the performance portability achieved by our Kokkos and R
 AJA implementations across five modern architectures using a metric previo
 usly introduced by Pennycook et al.\n\nWe find that our BookLeaf implement
 ations all scale well, in particular the hybrid configurations (the MPI+Op
 enMP variant achieves a parallel efficiency above 0.8 running on 49,152 co
 res). The Kokkos and RAJA variants exhibit competitive performance in al
 l experiments, though their CPU performance is best in memory-bound situa
 tions where the overhead introduced by the frameworks is partially shadow
 ed by the need to wait for data. The overheads seen in the GPU experiment
 s are extremely low. We observe overall performance portability scores o
 f 0.928 for Kokkos and 0.876 for RAJA.
URL:https://sc18.supercomputing.org/presentation/?id=ws_p3hpc104&sess=sess
 145
END:VEVENT
END:VCALENDAR

