BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160729Z
LOCATION:C141/143/149
DTSTART;TZID=America/Chicago:20181113T143000
DTEND;TZID=America/Chicago:20181113T150000
UID:submissions.supercomputing.org_SC18_sess209_pap421@linklings.com
SUMMARY:HPL and DGEMM Performance Variability on the Xeon Platinum 8160 Pr
 ocessor
DESCRIPTION:Paper\nOpenMP, Performance, Power, Tools, Tech Program Reg Pas
 s\n\nHPL and DGEMM Performance Variability on the Xeon Platinum 8160 Proce
 ssor\n\nMcCalpin\n\nDuring initial testing of a large cluster equipped wit
 h Xeon Platinum 8160 processors, we observed infrequent, but significant, 
 performance drops in HPL benchmark results. The variability was seen in bo
 th single node and multi-node runs, with approximately 0.4% of results mor
 e than 10% slower than the median. We were able to reproduce this behavior
  with a single-socket (24-core) DGEMM benchmark. Performance counter analy
 sis of several thousand DGEMM runs showed that increased DRAM read traffic
  is the primary driver of increased execution time. Increased DRAM traffic
  in this benchmark is primarily generated by dramatically elevated snoop f
 ilter evictions, which arise due to the interaction of high-order (physica
 l) address bits with the hash used to map addresses across the 24 coherenc
 e agents on the processor. These conflicts (and the associated performance
  variability) were effectively eliminated (for both DGEMM and HPL) by usin
 g 1 GiB large pages.
URL:https://sc18.supercomputing.org/presentation/?id=pap421&sess=sess209
END:VEVENT
END:VCALENDAR