Presentation
Improving MPI Reduction Performance for Manycore Architectures with OpenMP and Data Compression
Author/Presenters
Event Type
Workshop
Benchmarks
Parallel Programming Languages, Libraries, and Models
Performance
Simulation
Time
Monday, November 12th, 9am - 9:30am
Location
D165
Description
MPI reductions are widely used in scientific applications and often become the bottleneck to scaling. For reductions on vectors, different algorithms have been developed to balance messaging overhead against bandwidth. However, most implementations have ignored the fact that single-thread performance is not scaling as fast as aggregate network bandwidth. In this work, we propose, implement, and evaluate two approaches (threading and exploitation of sparsity) to accelerate MPI reductions on large vectors when running on manycore-based supercomputers. Our benchmark results show that the new techniques improve MPI_Reduce performance by up to 4x and BIGSTICK application performance by up to 2.6x.
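The two ideas named in the description can be illustrated with a minimal sketch in C. It is not the authors' implementation: it assumes a sum reduction on double-precision vectors, it leaves out the MPI message exchange entirely, and the function names (combine_sum, sparse_pack, sparse_accumulate) are hypothetical. The first function shows the local combine step of a reduction spread across OpenMP threads; the other two show one possible way to exploit sparsity, by packing only the nonzero entries as (index, value) pairs before they would be sent and adding them into a dense accumulator on the receiving side.

```c
#include <assert.h>
#include <stddef.h>

/* Threaded local combine for a sum reduction: acc[i] += in[i].
 * When the compiler is not run with OpenMP enabled, the pragma is
 * skipped and the loop simply runs on one thread. */
void combine_sum(double *acc, const double *in, size_t n)
{
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long i = 0; i < (long)n; i++)
        acc[i] += in[i];
}

/* Pack the nonzero entries of v into (idx, val) pairs.
 * idx and val must each have room for n entries.
 * Returns the number of pairs written. */
size_t sparse_pack(const double *v, size_t n, size_t *idx, double *val)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] != 0.0) {
            idx[k] = i;
            val[k] = v[i];
            k++;
        }
    }
    return k;
}

/* Add k packed (idx, val) pairs into a dense accumulator of length n. */
void sparse_accumulate(double *acc, size_t n,
                       const size_t *idx, const double *val, size_t k)
{
    for (size_t j = 0; j < k; j++) {
        assert(idx[j] < n); /* reject out-of-range indices */
        acc[idx[j]] += val[j];
    }
}
```

In this sketch the packed form only pays off when few entries are nonzero, since each pair carries an index in addition to its value; a real implementation would need a rule for falling back to the dense path.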


