BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:C141/143/149
DTSTART;TZID=America/Chicago:20181114T160000
DTEND;TZID=America/Chicago:20181114T163000
UID:submissions.supercomputing.org_SC18_sess215_pap464@linklings.com
SUMMARY:Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixe
 d-Precision Iterative Refinement Solvers
DESCRIPTION:Paper\nAlgorithms, Applications, Architectures, Compiler Analy
 sis and Optimization, Floating Point, Performance, Precision, Programming 
 Systems, Tools, Tech Program Reg Pass\n\nHarnessing GPU's Tensor Cores Fas
 t FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers\
 n\nHaidar, Tomov, Dongarra, Higham\n\nThe use of low-precision arithmetic
  has been a powerful tool for accelerating numerous scientific computing
  applications, including artificial intelligence. We present
  an investigation showing that other HPC applications can harness this
  power too, in particular for the general HPC problem of solving Ax = b,
  where A is a large dense matrix and the solution is needed in FP64
  accuracy. Our approach is based on the mixed-precision (FP16->FP64)
  iterative refinement technique: we generalize and extend prior advances
  into a framework, for which we develop architecture-specific algorithms
  and highly tuned implementations. We show that FP16-TC (Tensor Core)
  arithmetic can provide up to a 4X speedup and improve energy
  consumption by a factor of 5, achieving 74 Gflop/Watt. These gains are
  due to the performance boost that the FP16 Tensor Cores provide and to
  their better accuracy, which exceeds that of classical FP16.
URL:https://sc18.supercomputing.org/presentation/?id=pap464&sess=sess215
END:VEVENT
END:VCALENDAR