BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:Booth 619
DTSTART;TZID=America/Chicago:20181112T190000
DTEND;TZID=America/Chicago:20181112T210000
UID:submissions.supercomputing.org_SC18_sess508_emt104@linklings.com
SUMMARY:A Cost-Effective Flexible System Optimized for DNN and ML
DESCRIPTION:Emerging Technologies\nTech Program Reg Pass, Exhibits Reg Pas
 s, Exhibits - Exhibit Hall Only Reg Pass\n\nA Cost-Effective Flexible Syst
 em Optimized for DNN and ML\n\nChung, Yeh, Yang, Pan\n\nHardware accelerat
 ors (e.g., GPUs) are increasingly used for compute-intensive tasks (e.g.
 , AI and HPC). When multiple accelerator and storage devices are present
 , direct data paths between the devices that bypass host memory may be u
 sed (P2P). The P2P support currently provided by the NVIDIA CUDA driver i
 s limited to NVIDIA GPUs under the same PCIe root complex, with at most 
 9 GPUs participating in P2P communication.\n\nIn our design, we use a sim
 plified architecture as the basic building block. The new PCIe switch al
 lows PCIe ID translation between different PCIe domains as well as custo
 mized routing. Together with PCIe Gen 4, the blocks can be stacked to sc
 ale out. This design is especially desirable for the collective communic
 ations in DNN/ML and many HPC applications. Compared to other PCIe expan
 sion enclosures, our design allows a CPU card to be installed, making th
 e system self-sufficient and operational on its own.\n\nOn the system sof
 tware side, our solution removes the 9-GPU limit under a single PCIe roo
 t complex and is not restricted to NVIDIA GPUs. For example, data can be 
 transferred directly between NVMe storage and GPU memory.\n\nOverall, th
 e new design provides a more cost-effective, robust, and flexible soluti
 on optimized for DNN/ML and HPC applications.
URL:https://sc18.supercomputing.org/presentation/?id=emt104&sess=sess508
END:VEVENT
END:VCALENDAR