BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160725Z
LOCATION:D175
DTSTART;TZID=America/Chicago:20181111T093600
DTEND;TZID=America/Chicago:20181111T100000
UID:submissions.supercomputing.org_SC18_sess155_ws_waccpd102@linklings.com
SUMMARY:Heterogeneous Programming and Optimization of Gyrokinetic Toroidal
  Code Using Directives
DESCRIPTION:Workshop\nAccelerators, Heterogeneous Systems, Parallel Progra
 mming Languages, Libraries, and Models, Workshop Reg Pass\n\nHeterogeneous
  Programming and Optimization of Gyrokinetic Toroidal Code Using Directive
 s\n\nZhang\n\nThe latest production version of the fusion particle simulat
 ion code, Gyrokinetic Toroidal Code (GTC), has been ported to and optimize
 d for the next-generation exascale GPU supercomputing platform. Heterogene
 ous programming using directives has been utilized to fuse and thus balanc
 e the continuously implemented physical capabilities and rapidly evolving 
 software/hardware systems. The original code has been refactored to a set 
 of unified functions/calls to enable the acceleration for all the species 
 of particles. Binning and GPU texture caching techniques have also been
  used to boost the performance of the particle push and shift operations.
  To identify the hotspots, the GPU version of GTC was first benchmarked
  on up to 8000 nodes of the Titan supercomputer, which showed an overall
  speedup of about 2–3 times for NVIDIA M2050 GPUs over Intel Xeon X5670
  CPUs. This Phase I optimization was followed by further optimizatio
 ns in Phase II, where single-node tests show an overall speedup of about 3
 4 times on SummitDev and 7.9 times on Titan. Real physics tests on the
  Summit machine showed impressive scaling properties that reach roughly
  50% efficiency on 928 nodes of Summit. The GPU+CPU speedup over CPU-only
  execution is over 20 times, leading to unparalleled speed.
URL:https://sc18.supercomputing.org/presentation/?id=ws_waccpd102&sess=ses
 s155
END:VEVENT
END:VCALENDAR

