
PEOPLE@HES-SO
Directory and Skills Repository

Sébastien Rumley

Associate HES Professor

Main competencies

  • High Performance Computing
  • Energy Systems and Informatics
  • DevOps
  • Computer Architecture
  • Green IT
  • Software Engineering
  • IT Infrastructures


Main contract

Associate HES Professor

Haute école d'ingénierie et d'architecture de Fribourg
Boulevard de Pérolles 80, 1700 Fribourg, CH
HEIA-FR
Teaching

BSc HES-SO in Computer Science - Haute école d'ingénierie et d'architecture de Fribourg
  • Software Engineering
  • DevOps and Robust Code
  • Distributed Infrastructures
  • Software Optimization

Publications

2023

Peta-scale embedded photonics architecture for distributed deep learning applications
Scientific article ArODES

Zhenguo Wu, Liang Yuan Dai, Asher Novick, Madeleine Glick, Ziyi Zhu, Sébastien Rumley, George Michelogiannakis, John Shalf, Keren Bergman

Journal of Lightwave Technology, 2023, vol. 41, no. 12, pp. 3737-3749

Link to the publication

Abstract:

As Deep Learning (DL) models grow larger and more complex, training jobs are increasingly distributed across multiple Computing Units (CU) such as GPUs and TPUs. Each CU processes a sub-part of the model and synchronizes results with others. Communication among these CUs has emerged as a key bottleneck in the training process. In this work, we present SiPAC, a Silicon Photonic Accelerated Compute cluster. SiPAC accelerates distributed DL training by means of two co-designed components: a photonic physical layer and a novel collective algorithm. The physical layer exploits embedded photonics to bring peta-scale I/O directly to the CUs of a DL optimized cluster and uses resonator-based optical wavelength selectivity to realize hardware multi-casting. The collective algorithm builds on the hardware multi-casting primitive. This combination expedites a variety of collective communications commonly employed in DL training and has the potential to drastically ease the communication bottlenecks. We demonstrate the feasibility of realizing the SiPAC architecture through 1) an optical testbed experiment where an array of comb laser wavelengths is shuffled by a cascaded ring switch, with each ring selecting and forwarding multiple wavelengths to increase the effective communication bandwidth, thereby demonstrating the hardware multicasting primitive, and 2) a four-GPU testbed running a realistic DL workload that achieves 22% system-level performance improvement relative to a similarly-sized leaf-spine topology. Large scale simulations show that SiPAC achieves a 1.4× to 5.9× communication time reduction compared to state-of-the-art compute clusters for representative collective communications.
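As a rough, back-of-the-envelope illustration of why a hardware multicast primitive helps collective communication (the model, link bandwidth and message sizes below are illustrative assumptions, not figures from the paper), the following Python sketch compares the time to broadcast a gradient shard from one CU to several peers using sequential unicast transfers versus a single optically replicated transmission:

# Illustrative model only: broadcast of `size_bytes` from a root CU to `peers` receivers.
# With unicast links the root sends the data once per receiver; with an optical multicast
# primitive a single transmission reaches every receiver whose ring resonator selects
# that wavelength.
def broadcast_time_unicast(size_bytes, peers, bandwidth_bps):
    # Sequential unicasts from the root: `peers` full transfers.
    return peers * (8 * size_bytes / bandwidth_bps)

def broadcast_time_multicast(size_bytes, peers, bandwidth_bps):
    # One transmission, replicated in the optical domain.
    return 8 * size_bytes / bandwidth_bps

if __name__ == "__main__":
    size = 100e6   # 100 MB gradient shard (hypothetical)
    bw = 400e9     # 400 Gb/s per link (hypothetical)
    for peers in (3, 7, 15):
        t_uni = broadcast_time_unicast(size, peers, bw)
        t_mc = broadcast_time_multicast(size, peers, bw)
        print(f"{peers} peers: unicast {t_uni*1e3:.1f} ms vs multicast {t_mc*1e3:.1f} ms")

Under this simplified model the unicast cost grows linearly with the number of peers, while the multicast cost stays constant, which is the intuition behind building the collective algorithm on the multicast primitive.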

2022

Performance trade-offs in reconfigurable networks for HPC
Scientific article ArODES

Min Yee Teh, Zhenguo Wu, Madeleine Glick, Sébastien Rumley, Manya Ghobadi, Keren Bergman

Journal of Optical Communications and Networking, 2022, vol. 14, no. 6, pp. 454-468

Link to the publication

Abstract:

Designing efficient interconnects to support high-bandwidth and low-latency communication is critical toward realizing high performance computing (HPC) and data center (DC) systems in the exascale era. At extreme computing scales, providing the requisite bandwidth through overprovisioning becomes impractical. These challenges have motivated studies exploring reconfigurable network architectures that can adapt to traffic patterns at runtime using optical circuit switching. Despite the plethora of proposed architectures, surprisingly little is known about the relative performances and trade-offs among different reconfigurable network designs. We aim to bridge this gap by tackling two key issues in reconfigurable network design. First, we study how cost, power consumption, network performance, and scalability vary based on optical circuit switch (OCS) placement in the physical topology. Specifically, we consider two classes of reconfigurable architectures: one that places OCSs between top-of-rack (ToR) switches—ToR-reconfigurable networks (TRNs)—and one that places OCSs between pods of racks—pod-reconfigurable networks (PRNs). Second, we tackle the effects of reconfiguration frequency on network performance. Our results, based on network simulations driven by real HPC and DC workloads, show that while TRNs are optimized for low fan-out communication patterns, they are less suited for carrying high fan-out workloads. PRNs exhibit better overall trade-off, capable of performing comparably to a fully nonblocking fat tree for low fan-out workloads, and significantly outperform TRNs for high fan-out communication patterns.
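To make the reconfiguration-frequency trade-off concrete, here is a minimal duty-cycle sketch in Python (an illustrative assumption of mine, not the paper's simulation methodology; the link bandwidth and switching times are hypothetical): while an optical circuit switch reconfigures, the affected links carry no traffic, so the more often circuits are torn down and re-established, the lower the average usable bandwidth.

# Simplified duty-cycle model (illustrative only): effective bandwidth of a link whose
# circuit is held for `hold_time_s` seconds between reconfigurations that each blank
# the link for `reconfig_delay_s` seconds.
def effective_bandwidth(link_bw_bps, hold_time_s, reconfig_delay_s):
    duty_cycle = hold_time_s / (hold_time_s + reconfig_delay_s)
    return link_bw_bps * duty_cycle

if __name__ == "__main__":
    link_bw = 400e9    # hypothetical 400 Gb/s optical link
    reconfig = 20e-6   # hypothetical 20 us switching time
    for hold in (100e-6, 1e-3, 10e-3):
        bw = effective_bandwidth(link_bw, hold, reconfig)
        print(f"hold {hold*1e3:.2f} ms -> {bw/1e9:.0f} Gb/s effective")

The sketch only captures the idle time lost to switching; the paper's simulations additionally account for traffic patterns, topology, and OCS placement.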

2020

Evolving Requirements and Trends of HPC
Book chapter

Sébastien Rumley, Keren Bergman, M. Ashkan Seyedi, Marco Fiorentino

In: Springer Handbook of Optical Networks, 2020, Cham: Springer

Link to the publication

Abstract:

High-performance computing (HPC) denotes the design, build or use of computing systems substantially larger than typical desktop or laptop computers, in order to solve problems that are unsolvable on these traditional machines. Today's largest high-performance computers, a.k.a. supercomputers, are all organized around several thousand compute nodes, which are collectively leveraged to tackle heavy computational problems. This orchestrated operation is only possible if compute nodes are able to communicate among themselves with low latency and high bandwidth.

This chapter presents the evolution of HPC architectures over the years, with a particular focus on interconnection networks. The future role of emerging optical technologies in building these interconnects is also discussed.

2018

Recent advances in optical technologies for data centers: a review
Scientific article

Qixiang Cheng, Meisam Bahadori, Madeleine Glick, Sébastien Rumley, Keren Bergman

Optica, 2018, vol. 5, no. 11, pp. 1354-1370

Link to the publication

Conferences

2023

Performance losses with virtualization: comparing bare metal to VMs and containers
Conference paper ArODES

Jonatan Baumgartner, Christophe Lillo, Sébastien Rumley

Proceedings of the ISC High Performance 2023 International Workshops

Link to the conference paper

Abstract:

The use of virtualization technologies has become widespread with the advent of cloud computing. The purpose of this study is to quantify the performance losses caused by all kinds of virtualization/containerization configurations. A benchmark suite was designed, consisting of tools that stress specific components as well as four real applications commonly used in computing centers. A system to schedule the execution of these benchmarks and to collect the results was developed. Finally, a procedure was implemented that calls all the benchmarks in a consistent and reproducible way, either within a container or in a machine (virtual or not). These developments then made it possible to compare bare metal with four hypervisors and two container runtimes, as well as containers running inside virtual machines. The results show that the performance differences vary greatly depending on the workload and the virtualization software used. With the right virtualization software, the estimated performance losses are around 5% for a container and 10% for a virtual machine. Combining the two adds these losses up, to roughly 15%. With non-optimized software, a performance loss of up to 72% can be observed. We also observed that containers and virtual machines can outperform bare metal when it comes to file access. Overall, we conclude that virtualization has become very mature and that performance losses no longer seem to be a concern.
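As a minimal sketch of how such relative overheads can be derived from benchmark runtimes (the runtimes below are made-up placeholders, not data from the study), the following Python snippet computes the slowdown of each virtualized configuration against a bare-metal baseline:

# Illustrative overhead calculation (hypothetical runtimes, not the study's measurements):
# overhead is the slowdown of a virtualized run relative to the bare-metal baseline.
def overhead_percent(bare_metal_s, virtualized_s):
    return 100.0 * (virtualized_s - bare_metal_s) / bare_metal_s

if __name__ == "__main__":
    baseline = 120.0                    # bare-metal runtime in seconds (hypothetical)
    runs = {
        "container": 126.0,             # ~5% slower (hypothetical)
        "virtual machine": 132.0,       # ~10% slower (hypothetical)
        "container inside VM": 138.0,   # losses roughly add up (hypothetical)
    }
    for config, runtime in runs.items():
        print(f"{config}: {overhead_percent(baseline, runtime):.1f}% overhead")

A negative overhead in such a comparison would correspond to the case reported in the abstract where containers or VMs outperform bare metal, for instance on file-access workloads.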

