BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160903Z
LOCATION:C2/3/4 Ballroom
DTSTART;TZID=America/Chicago:20181115T083000
DTEND;TZID=America/Chicago:20181115T170000
UID:submissions.supercomputing.org_SC18_sess324_post135@linklings.com
SUMMARY:Multi-GPU Accelerated Non-Hydrostatic Numerical Ocean Model with G
 PUDirect RDMA Transfers
DESCRIPTION:Poster\nTech Program Reg Pass, Exhibits Reg Pass\n\nMulti-GPU 
 Accelerated Non-Hydrostatic Numerical Ocean Model with GPUDirect RDMA Tran
 sfers\n\nYamagishi, Matsumura, Hasumi\n\nWe have implemented our “kinaco” 
 numerical ocean model on Tokyo University’s Reedbush supercomputer, which 
 utilizes the latest Nvidia Pascal P100 GPUs with GPUDirect technology. We 
 have also optimized the model’s Poisson/Helmholtz solver by adjusting the 
 global memory alignment and thread block configuration, introducing shuffl
 e functions to accelerate the creation of coarse grids and merging small k
 ernels in the multigrid preconditioner. We also utilize GPUDirect RDMA tra
 nsfers to improve MPI communication efficiency. By exploiting the GPUs’ ca
 pabilities, the GPU implementation is now twice as fast as the CPU version
 , and it shows good weak scalability to multiple GPUs. Most of the GPU ker
 nels are accelerated, and the velocity diagnosis functions in particular a
 re now approximately seven times faster. The performance of inter-node dat
 a transfers using a CUDA-aware MPI library with GPUDirect RDMA is
  comparable to that on CPUs.
URL:https://sc18.supercomputing.org/presentation/?id=post135&sess=sess324
END:VEVENT
END:VCALENDAR

