BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160726Z
LOCATION:D166
DTSTART;TZID=America/Chicago:20181111T113000
DTEND;TZID=America/Chicago:20181111T120000
UID:submissions.supercomputing.org_SC18_sess174_ws_exampi101@linklings.com
SUMMARY:Optimal Algorithms for Half-Duplex Inter-Group All-to-All Broadcas
 t on Fully Connected and Ring Topologies
DESCRIPTION:Workshop\nExascale, MPI, Networks, System Software, Workshop R
 eg Pass\n\nOptimal Algorithms for Half-Duplex Inter-Group All-to-All Broad
 cast on Fully Connected and Ring Topologies\n\nKang, Choudhary, Agrawal, L
 iao\n\nHalf-duplex inter-group collective communications are bipartite mes
 sage transfer patterns such that the processes in a sender group pass mess
 ages to the processes in a receiver group. These communication patterns se
 rve as basic operations for scientific application workflows. In this pape
 r, we present optimal parallel algorithms for half-duplex inter-group all-
 to-all broadcast under a bidirectional communication constraint on fully
  con
 nected and ring topologies. We implement the algorithms using MPI communic
 ation functions and perform experiments on Cori. For the fully connected t
 opology case, we compare our algorithms with production MPI libraries. For
  the ring topology case, we implement our proposed algorithms using the
  MPI_Sendrecv function to emulate a ring topology environment. The
  proposed algo
 rithms are compared with the intra-group Allgather algorithm emulated unde
 r the same environment. Message sizes ranging from 32KB to 4MB are used fo
 r evaluations. The proposed algorithms for fully connected topology are up
  to 5 times faster than the root gathering algorithm adopted by MPICH. The
  proposed algorithms for the ring topology are up to 1.4 times faster than
  the intra-group Allgather algorithm.
URL:https://sc18.supercomputing.org/presentation/?id=ws_exampi101&sess=ses
 s174
END:VEVENT
END:VCALENDAR

