BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160731Z
LOCATION:C140/142
DTSTART;TZID=America/Chicago:20181114T143000
DTEND;TZID=America/Chicago:20181114T150000
UID:submissions.supercomputing.org_SC18_sess204_pap133@linklings.com
SUMMARY:High-Performance Dense Tucker Decomposition on GPU Clusters
DESCRIPTION:Paper\nAlgorithms, Applications, Computational Physics, Scient
 ific Computing, Tech Program Reg Pass\n\nHigh-Performance Dense Tucker Dec
 omposition on GPU Clusters\n\nChoi, Liu, Chakaravarthy\n\nThe Tucker decom
 position method is one of the most popular algorithms for analyzing and co
 mpressing data with multi-way relationships. Its execution time is typ
 icall
 y dominated by dense matrix multiplication, which makes it well-suited for
  GPU acceleration. State-of-the-art distributed dense Tucker implementatio
 ns for CPU clusters adopt multi-dimensional partitioning that optimizes fo
 r storage and communication. This, however, leads to smaller matrix dimens
 ions that result in under-utilizing the GPU.\n\nIn this paper, we present
  our optimized implementation and performance analysis of dense Tucker dec
 omposition on a multi-GPU cluster. We propose three optimizations: a new p
 artitioning strategy that improves GPU performance, a new tensor matriciza
 tion layout that halves the number of communication/matricization steps, a
 nd a variation of the randomized SVD algorithm to overcome the eigenvalue 
 bottleneck that arises from the high speedups gained from GPU acceleration
 . Our GPU implementation employing all three optimizations achieves up to
  11.8x speedup on 64 nodes over state-of-the-art TuckerMPI.
URL:https://sc18.supercomputing.org/presentation/?id=pap133&sess=sess204
END:VEVENT
END:VCALENDAR