
Zapater Sancho Marina

Associate Professor HES

Office: A21

Haute Ecole d'Ingénierie et de Gestion du Canton de Vaud
Route de Cheseaux 1, 1400 Yverdon-les-Bains, CH

Institute
ReDS - Institut Reconfigurable & Embedded Digital Systems

Marina Zapater has been an Associate Professor at the REDS Institute since 2020. She obtained a Master's degree in Electronic Engineering and a Master's degree in Telecommunication Engineering from Universitat Politècnica de Catalunya (UPC), Barcelona, Spain, in 2010. After obtaining a PhD fellowship from the Spanish government's International Program of Talent Recruitment, she started her PhD at Universidad Politécnica de Madrid (UPM), Madrid, Spain. During her PhD, she completed two research stays, totalling 9 months, at the PeacLab group at Boston University, where she worked on a project in collaboration with Oracle, Inc.

After obtaining her PhD in 2015, she became an Assistant Professor at Universidad Complutense de Madrid. In parallel, she started collaborating with the Embedded Systems Laboratory (ESL) at the Swiss Federal Institute of Technology Lausanne (EPFL). In July 2016, she moved to Switzerland to join ESL at EPFL as a postdoctoral researcher. In February 2020, she joined the REDS Institute at HEIG-VD (HES-SO) as an Associate Professor.

Her research interests include the design and optimisation of novel complex heterogeneous architectures to increase their performance and energy efficiency. Within this framework, her work spans from embedded systems (edge and IoT) to high-performance computing processors, including servers and data centers, as well as the efficient execution of artificial intelligence and deep learning workloads. She applies her expertise in edge-AI acceleration and edge-to-cloud inference to several application domains, including autonomous driving and biomedical applications.

In this domain, she has co-authored more than 80 papers in top conferences and journals, which have received over 1000 citations, and she has an h-index above 20. Since 2016, Marina Zapater has been Scientific Coordinator of 4 European H2020 projects and has participated in 3 others. Since 2020, she has been PI of 4 projects with industrial partners (including Facebook, Intel and Huawei), as well as PI of one Innosuisse project and one project funded directly by HES-SO.

A full publication list is available on Google Scholar.

BSc in Computer Science and Communication Systems - Haute école d'Ingénierie et de Gestion du Canton de Vaud

Numerical Computing and Hardware Acceleration (CNM)
Artificial Intelligence for Autonomous Systems (IAA)
Computer Architecture (ARO)

Ongoing

IGNITE: Real-time processing and visualization of high-density EEG brain signals

AGP

Role: Co-applicant

Applicants: FR - EIA - Institut HumanTech

Funding: HES-SO Rectorat

Project description: Sleep studies (also called polysomnography) are well-known non-invasive, painless, often overnight tests that allow doctors to monitor patients while asleep to uncover problems related to their brain and body. Sleep studies are a powerful diagnostic technique for a wide range of sleep disorders, such as insomnia, sleep apnoea, narcolepsy, or sleepwalking. Sleep studies lasting 1-2 hours are often used to diagnose different types of epilepsy, since sleep disorders and epilepsy are often comorbid and intertwined in complex ways, which greatly hinders diagnosis. Regardless of the application, sleep studies monitor brain activity using a standard electroencephalogram (EEG), which measures the electrical activity of brain cells through small electrodes attached to the scalp. Because brain cells keep producing electrical currents even during sleep, EEG aims to capture changes in the pattern of waveforms. Such changes are specific to different brain areas; therefore, to capture global brain activity, the EEG electrodes must cover the whole head. The denser the placement of EEG electrodes, the higher the resolution of the recorded brain electrical activity. For this reason, sleep studies nowadays employ high-density EEG (hdEEG) with up to 256 channels. The algorithms required to pre-process, filter, and analyse hdEEG signals are complex and computationally heavy, as they require feature extraction in both the time and frequency domains, and thus at least one Fourier transform per channel across up to 256 channels. Real-time analysis and visualization of hdEEG therefore remains a challenge today. As a result, EEG sleep data are usually recorded together with video of the sleeping patients and analysed offline, often manually, by doctors or clinicians. However, clinicians typically analyse only a subset of channels, which greatly diminishes the benefits and purpose of hdEEG and overlooks the global view.
The goal of this project is to tackle this challenge by developing a tool for real-time processing and visualization of hdEEG. We aim to enable clinicians to observe, in real time, events that are important for sleep studies, such as spindles, K-complexes, and slow waves, as well as fluctuations of specific EEG signal features, such as spectral power, entropy, and spectral edge frequency, associated with electrode location and thus mapped onto the scalp. This will not only allow much faster evaluation of sleep-related disorders, but also provide a more global and comprehensive view of brain activity, opening new avenues of research in much broader application domains. To accomplish this goal, the project brings together expertise in two different domains:
1. The group of Prof. Alena Simalatsar, the PI of this project, at HES-SO Valais, Sion, will propose algorithms for noise and artifact elimination from EEG signals, feature extraction, and detection of different sleep-related events using signal processing techniques and machine learning algorithms.
2. The group of Prof. Marina Zapater at HEIG-VD, Yverdon-les-Bains, will develop hardware and software acceleration techniques using heterogeneous systems composed of CPUs, GPUs and/or FPGAs, together with adequate workload management and partitioning from the edge (EEG machine) to the server, to enable real-time processing and visualization of hdEEG without quality loss.
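
As an illustration of the per-channel frequency-domain feature extraction described above, the following minimal Python sketch (not the project's implementation) computes the power of the non-negative-frequency DFT bins, the power in an assumed delta band (0.5-4 Hz) and a 95% spectral edge frequency for each channel. It uses a naive O(N²) DFT to stay self-contained; a real-time hdEEG pipeline would use an FFT, offloaded to CPUs, GPUs or FPGAs. The function names, band limits, sampling rate and 95% fraction are hypothetical choices.

```python
import cmath
import math


def power_spectrum(samples, fs):
    """Power of the non-negative-frequency DFT bins (naive O(N^2) DFT; real pipelines use an FFT)."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        freqs.append(k * fs / n)          # frequency of bin k in Hz
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, lo_hz, hi_hz):
    """Sum of spectral power in the half-open band [lo_hz, hi_hz)."""
    return sum(p for f, p in zip(freqs, power) if lo_hz <= f < hi_hz)


def spectral_edge_frequency(freqs, power, fraction=0.95):
    """Lowest bin frequency below which `fraction` of the total power lies."""
    total = sum(power)
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= fraction * total:
            return f
    return freqs[-1]


def channel_features(channels, fs):
    """Per-channel features, ready to be mapped onto electrode positions on the scalp."""
    features = []
    for samples in channels:
        freqs, power = power_spectrum(samples, fs)
        features.append({
            "delta_power": band_power(freqs, power, 0.5, 4.0),  # assumed delta band
            "sef95": spectral_edge_frequency(freqs, power),
        })
    return features


if __name__ == "__main__":
    fs = 128  # Hz, hypothetical sampling rate
    slow = [math.sin(2 * math.pi * 2 * i / fs) for i in range(fs)]   # 2 Hz test tone
    fast = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]  # 10 Hz test tone
    for name, feats in zip(("slow", "fast"), channel_features([slow, fast], fs)):
        print(name, feats)
```

With 256 channels, the same loop simply runs once per channel, which is why the per-channel transforms dominate the computational cost.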

Research team within HES-SO: Hennebert Jean, Brunet Yorick, Petraglio Enrico, Chacun Guillaume, Simalatsar Alena, Da Rocha Carvalho Bruno, Extrat Bastien, Zapater Sancho Marina, Maillard Philippe

Academic partners: ReDS; FR - EIA - Institut HumanTech

Project duration: 01.05.2025 - 31.07.2026

Total project budget: 220'000 CHF

Status: Ongoing

C4liTwin: Machine-Learning self-calibration and modelling of robotic arms using Digital Twins

AGP

Role: Main Applicant

Funding: Innosuisse; Trimos

Project description: The C4liTwin project (pronounced "calitwin") combines machine learning and digital twin technology to develop the framework and algorithms required to self-calibrate the C4 robotic arms of Trimos: a digital twin models the arm's behaviour and enables automatic re-calibration.

Research team within HES-SO: Brunet Yorick, Chacun Guillaume, Extrat Bastien, Zapater Sancho Marina, Dieperink Clément, Akeddar Mehdi

Academic partners: ReDS; Zapater Sancho Marina, ReDS

Project duration: 15.11.2023 - 30.06.2026

Total project budget: 398'080 CHF

Status: Ongoing

Midgard: Virtual Memory for Post-Moore Servers

Role: Partner

Applicants: EPFL

Project description:

The goal of this project is to support the Midgard technology in the gem5 simulator. Midgard rethinks the virtual memory design of current servers by exposing a global but sparse intermediate address space in a coherent cache hierarchy. Midgard eliminates TLBs by translating directly in hardware from existing operating system (OS) virtual memory (VM) software abstractions (called VMAs), and performs page-level translations only when accessing physical memory or I/O. As such, in contrast to state-of-the-art page-based VM, Midgard's overall address translation overhead decreases as cache hierarchy capacity increases. By eliminating the need for deep TLB hierarchies, Midgard not only reclaims the silicon provisioned for TLBs, but also offers orders-of-magnitude faster address translation, shootdown, and access control creation/revocation compared to conventional page-based VM.
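
The two-step translation described above can be sketched in a few lines. This is a minimal model under stated assumptions, not Midgard's hardware design: each VMA is represented by a hypothetical (start, end, offset) triple that maps a virtual address to an intermediate (Midgard) address, a sorted list with binary search stands in for the hardware VMA lookup, and a dict stands in for the page table that is consulted only when an access reaches memory.

```python
import bisect
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page size


@dataclass(frozen=True)
class VMA:
    start: int   # first virtual address covered (inclusive)
    end: int     # end of the range (exclusive)
    offset: int  # added to a virtual address to obtain its intermediate address


class VMATable:
    """Range-based (not page-based) virtual-to-intermediate translation."""

    def __init__(self, vmas):
        self.vmas = sorted(vmas, key=lambda v: v.start)
        self.starts = [v.start for v in self.vmas]

    def to_intermediate(self, va):
        i = bisect.bisect_right(self.starts, va) - 1  # last VMA starting at or before va
        if i >= 0 and va < self.vmas[i].end:
            return va + self.vmas[i].offset
        raise LookupError(f"no VMA covers address {va:#x}")


def to_physical(ia, page_table):
    """Page-level translation, performed only when an access reaches memory."""
    frame = page_table[ia // PAGE_SIZE]
    return frame * PAGE_SIZE + ia % PAGE_SIZE


if __name__ == "__main__":
    table = VMATable([VMA(0x1000, 0x3000, 0x100000), VMA(0x8000, 0x9000, 0x200000)])
    ia = table.to_intermediate(0x1234)  # the cache hierarchy would be indexed with ia
    print(hex(ia), hex(to_physical(ia, {ia // PAGE_SIZE: 7})))  # prints 0x101234 0x7234
```

Because one VMA covers a whole range, the number of lookup entries grows with the number of VMAs rather than with the number of pages, which is the intuition behind dropping deep TLB hierarchies.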

Prof. Zapater is an affiliated faculty member of the Midgard project, a joint project between EPFL, Yale University and the University of Edinburgh. Within this project, she collaborates with EPFL to provide simulation support for Midgard in gem5.

First, because Midgard requires multi-level TLBs, simulating Midgard on gem5 requires setting up the whole simulation environment in the latest gem5 version, namely gem5-22. However, the gem5-X simulator developed at EPFL only supports older (2019) gem5 versions, which lack adequate support for Midgard. Therefore, within this project we plan to port the main features of gem5-X to gem5-22, creating a new release named gem5-X-22 in which all Midgard support will be released.

Second, we will create the models needed in gem5 to simulate Midgard at the hardware and architectural levels. We will also develop the framework needed to create cache-coherent, Midgard-compliant accelerators, by enhancing the ALPINE simulation framework and integrating it into gem5.

Finally, to showcase and fully simulate Midgard, we will focus on a proof-of-concept Midgard-compliant OS implementation in SO3, running in gem5. SO3 is a simple Linux-based operating system used for teaching and research at the REDS institute. It provides all the base functionalities of a Linux kernel while being lightweight and easy to modify. The goal is to provide Midgard support in SO3 while maintaining compatibility with Linux-based systems, to enable advances in the design of cache-coherent accelerators.

Research team within HES-SO: Zapater Sancho Marina

Academic partners: EPFL

Project duration: 01.10.2022 - 31.12.2024

Project website: https://midgard.epfl.ch

Status: Ongoing

ECO4AI - efficient Edge-to-Cloud workload allOcation for Artificial Intelligence applications

Role: Main Applicant

Funding: HES-SO - Call for projects for young researchers

Project description:

The main goal of ECO4AI is to propose workload allocation techniques that efficiently and transparently distribute workloads between the edge and the cloud for AI-based IoT applications, increasing efficiency in terms of performance per watt. This will be accomplished by exploiting the heterogeneous capabilities of novel edge and cloud architectures, and by proposing elastic edge-cloud resource allocation and management techniques.

The project exploits open hardware architectures such as RISC-V and proposes a hardware/software ecosystem that will be released as open source, increasing its visibility and impact.

ECO4AI will focus on three use cases that play a key role in today's AI-based IoT landscape: (1) video surveillance and object tracking; (2) autonomous driving, which represents an important opportunity for edge-edge and edge-cloud cooperation; and (3) e-health, more specifically bio-signal monitoring for cardiac diseases and epileptic seizures.

Research team within HES-SO: Zapater Sancho Marina

Project duration: 01.01.2022 - 30.06.2023

Total project budget: 100'000 CHF

Status: Ongoing

Completed

An Edge-to-Cloud platform for semi-supervised multi-source data labelling to enable horse health diagnosis.

AGP

Role: Main Applicant

Funding: HES-SO Rectorat; Alogo Analysis SA

Project description: In this project, the main goal is to tackle objectives (1) and (2) presented above. For this purpose, we plan to develop an edge-to-cloud server platform enabling the collection, synchronization, and labelling of real-time accelerometry data from Alogo Move Pro, as well as video data captured by a high-resolution smartphone camera. The accelerometry and video data will be synchronized, and the video will be pre-processed via machine learning techniques to adequately crop, refocus and resize the images of horses, and to automatically select regions of interest (running, jumping, etc.). An interactive graphical user interface (GUI), developed in collaboration with Alogo, will enable fast yet accurate data labelling, allowing horse experts and veterinarians to label data and assess horse condition. The overarching goal is to create a platform with an interactive visualization and labelling tool that enables the creation of a dataset, which future projects will use to develop AI algorithms able to diagnose horse condition.

Research team within HES-SO: Convers Anthony, Chacun Guillaume, Zapater Sancho Marina, Akeddar Mehdi

Academic partners: ReDS; Zapater Sancho Marina, ReDS

Project duration: 01.05.2024 - 30.04.2025

Total project budget: 55'000 CHF

Status: Completed

C4libre: Machine-Learning self-calibration of the C4 robotic arm

AGP

Role: Main Applicant

Funding: Innosuisse

Project description: The main goal is to propose novel machine-learning-based algorithms to enable self-calibration of robotic coordinate measuring machines (CMMs). CMMs are micrometer-accurate robotic arms capable of providing high-precision measurements of industrial components. Calibration is a mandatory step that remains very time-consuming and costly today. In the specific case of the C4 robotic arm of Trimos, the calibration process must achieve a maximum measurement error below 8 µm over the whole measurement volume, while automating most of the process to reduce human-related costs. The calibration algorithms proposed in this project represent a significant departure from current techniques. We will build on formal mathematical models that describe the arm's behavior (using trigonometry) and propose evolutionary techniques to tune the static and dynamic corrections and attain sub-8 µm errors. Evolutionary techniques (genetic algorithms and genetic programming) have great potential when the underlying physical/mathematical laws governing the system are known, but its dynamics are too complex to be described formally, so a data-driven approach is beneficial. Moreover, the placement of the artifacts required for calibration plays an important role in achieving a uniform error across the whole volume under test. Therefore, we also plan to analyze the error distribution to understand the impact of artifact placement on error.
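
To illustrate how a genetic algorithm can tune correction parameters against reference measurements, the sketch below evolves a hypothetical two-parameter (gain, offset) correction for a toy one-dimensional instrument, minimizing the worst-case error over all reference points. The actual project builds on trigonometric models of the C4 arm; the toy model, synthetic reference data, population size, mutation width and function names are all assumptions, and genetic programming is not shown.

```python
import random


def max_error(corrections, reference_points, measure):
    """Worst-case absolute error over all (raw reading, reference value) pairs."""
    return max(abs(measure(raw, corrections) - ref) for raw, ref in reference_points)


def evolve(reference_points, measure, n_params, generations=200, pop_size=40,
           sigma=0.05, seed=0):
    """Elitist genetic algorithm with uniform crossover and Gaussian mutation."""
    rng = random.Random(seed)

    def fitness(individual):
        return max_error(individual, reference_points, measure)

    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 4]  # keep the best quarter unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            child = [g + rng.gauss(0.0, sigma) for g in child]  # Gaussian mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


def toy_measure(raw, corrections):
    """Hypothetical instrument model: corrected = raw * (1 + gain) + offset."""
    gain, offset = corrections
    return raw * (1.0 + gain) + offset


if __name__ == "__main__":
    # Synthetic references from a device with a 2% gain error and a 0.3 offset.
    refs = [(raw, 1.02 * raw + 0.3) for raw in range(11)]
    best = evolve(refs, toy_measure, n_params=2)
    print(best, max_error(best, refs, toy_measure))
```

Minimizing the maximum rather than the mean error mirrors the requirement stated above that the worst-case error over the whole volume stays below the threshold.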

Research team within HES-SO: Extrat Bastien, Zapater Sancho Marina, Akeddar Mehdi

Academic partners: ReDS

Project duration: 29.03.2023 - 28.09.2023

Total project budget: 15'000 CHF

Status: Completed

2026

SigmaQuant: hardware-aware heterogeneous quantization method for edge DNN inference

Scientific paper ArODES

Pengbo Yu, Marina Zapater, David Atienza

IEEE Transactions on Circuits and Systems for Artificial Intelligence, 2026, 1-14

Summary:

Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward way to compress models and reduce hardware requirements, it fails to fully leverage the varying robustness across layers, and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require a huge brute-force design-space search or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. Filling these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments. SigmaQuant allocates layer-wise bitwidths based on weight standard deviation and KL divergence, enabling adaptive quantization under real hardware constraints and balancing accuracy, memory, and latency without exhaustive search. We validate its practicality through ASIC integration in a shift-add-based accelerator, analyzing the power, performance, and area (PPA) trade-offs of the resulting mixed-precision models. Experiments on CIFAR-100 and ImageNet show that SigmaQuant consistently outperforms both uniform and state-of-the-art heterogeneous quantization. At equal model size, it achieves up to 2.0% higher accuracy; at equal accuracy, it reduces memory by up to 40.0%.
Hardware evaluation demonstrates up to 22.3% area savings and 20.6% lower energy cost compared to widely used INT8 quantization and implementation, with a slight latency overhead and comparable accuracy. These results confirm the effectiveness of SigmaQuant for edge AI deployment.
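
The summary mentions choosing per-layer bitwidths from weight statistics and KL divergence. The sketch below is not SigmaQuant's algorithm; it only illustrates the general idea under simple assumptions: symmetric uniform quantization, 32-bin histograms, a hypothetical KL threshold, and candidate bitwidths of 2, 4 and 8. For a given layer, it picks the smallest candidate whose quantized weight distribution stays close, in KL divergence, to the original one. Memory, energy and latency constraints are not modelled, and all function names are hypothetical.

```python
import math
import random


def quantize(weights, bits):
    """Symmetric uniform quantization; returns the de-quantized values."""
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1  # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def histogram(values, lo, hi, n_bins=32):
    """Normalized histogram on [lo, hi]; out-of-range values are clamped to the edge bins."""
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    eps = 1e-9  # avoids log(0) in the KL term
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def pick_bitwidth(weights, candidates=(2, 4, 8), kl_threshold=0.05):
    """Smallest candidate bitwidth whose quantized histogram stays within the KL threshold."""
    lo, hi = min(weights), max(weights)
    p = histogram(weights, lo, hi)
    for bits in sorted(candidates):
        q = histogram(quantize(weights, bits), lo, hi)
        if kl_divergence(p, q) <= kl_threshold:
            return bits
    return max(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.05) for _ in range(2000)]  # synthetic layer weights
    print(pick_bitwidth(weights))
```

Running `pick_bitwidth` once per layer yields a mixed-precision assignment without searching over all combinations of layer bitwidths.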

2025

SideDRAM: integrating SoftSIMD datapaths near DRAM banks for energy-efficient variable precision computation

Scientific paper ArODES

Rafael Medina Morillas, Pengbo Yu, Dwaipayan Biswas, Marina Zapater, Giovanni Ansaloni, Francky Catthoor, David Atienza

ACM Transactions on Embedded Computing Systems, 2025, 24, 5s, 1-24, 111

Summary:

By interfacing computing logic directly to the DRAM banks, bank-level Compute-near-Memory (CnM) architectures promise to mitigate the bottleneck at the memory interconnect. While this computing paradigm greatly reduces the energy required for data movement across the system, current solutions fail to co-optimize hardware and software to further increase efficiency. In this manuscript, we present SideDRAM, a co-designed bank-level CnM architecture that enables massively parallel and energy-efficient computation near DRAM. In contrast with past solutions, we support flexible data typing and heterogeneous quantization, relying on the robustness of workloads to employ small bitwidths, and enable row-wide access to the banks to exploit parallelism and spatial locality. To this end, SideDRAM integrates (1) software-defined SIMD (SoftSIMD) datapaths, supporting low-energy computing with flexible precision; (2) an interface to the banks based on very wide registers (VWRs), enabling asymmetric data access to both utilize the full DRAM bank bandwidth and leverage data locality at the datapath; and (3) a low-overhead distributed control plane, allowing efficient handling of variable data typing. We benchmark SideDRAM as a near-DRAM solution by analyzing the area, performance and energy consumption of an HBM2 CnM channel executing heterogeneously quantized machine learning models. The results show that, compared to the state-of-the-art FIMDRAM design, energy improvements of up to 67% are achieved when DeiT-S inference is executed with a batch size of 16 under the same area constraints, resulting in energy-delay-area product (EDAP) savings of up to 83%. Compared with a massively parallel mixed-signal CnM solution, SideDRAM consistently achieves similar performance and better energy efficiency (a geometric-mean improvement of 15× across workloads) at a lower area overhead.
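
The summary does not detail how SoftSIMD datapaths work. As a generic illustration of software-defined lanes, the classic SIMD-within-a-register (SWAR) technique below lets a single wide adder perform independent additions on packed lanes whose width is chosen in software. It is not SideDRAM's datapath, and the helper names are hypothetical; overflow wraps modulo 2^lane_bits within each lane.

```python
def pack(values, lane_bits):
    """Pack unsigned integers into one word, lane 0 in the least significant bits."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * lane_bits)
    return word


def unpack(word, lane_bits, n_lanes):
    """Split a packed word back into its unsigned lanes."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(n_lanes)]


def swar_add(a, b, lane_bits, n_lanes):
    """Lane-wise addition modulo 2**lane_bits; no carry propagates into the next lane."""
    msb = sum(1 << (i * lane_bits + lane_bits - 1) for i in range(n_lanes))
    low = ((1 << (lane_bits * n_lanes)) - 1) & ~msb
    partial = (a & low) + (b & low)   # carries stop at each lane's MSB position
    return partial ^ ((a ^ b) & msb)  # fold the operands' MSBs back in


if __name__ == "__main__":
    a = pack([250, 3, 100, 7], 8)
    b = pack([10, 4, 200, 1], 8)
    print(unpack(swar_add(a, b, 8, 4), 8, 4))  # prints [4, 7, 44, 8]
```

Changing `lane_bits` (for example from 8 to 4) doubles the number of lanes processed by the same adder, which is the kind of precision/parallelism trade-off that low-bitwidth heterogeneous quantization exploits.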

Towards accurate RISC-V full system simulation via component-level calibration

Scientific paper ArODES

Karan Pathak, Joshua Klein, Giovanni Ansaloni, Said Hamdioui, Georgi Gaydadjiev, Marina Zapater, David Atienza

ACM Transactions on Embedded Computing Systems, 2025, 24, 4, 57, 1-19

Summary:

Full-system (FS) simulation is essential for evaluating the performance of complete systems that execute complex applications on a full software stack consisting of an operating system and user applications. Nevertheless, FS simulators require careful fine-tuning against real hardware to obtain reliable performance statistics, which can become tedious, error-prone, and time-consuming with typical trial-and-error approaches. To address these shortcomings, we propose a novel, streamlined, component-level calibration methodology for validating FS simulation models. Our methodology greatly accelerates the validation process without sacrificing accuracy. It is Instruction Set Architecture (ISA)-agnostic and can handle hardware specifications at different levels of detail. We demonstrate its effectiveness by validating FS models against both open-hardware and IP-protected (closed-hardware) RISC-V silicon, achieving a mean error of 19-23% on the SPEC CPU2017 suite in the two cases. We introduce the first open-source, FS-validated RISC-V simulation models, together with a complete and replicable methodology.

LIONHEART: a layer-based mapping framework for heterogeneous systems with analog in-memory computing tiles

Scientific paper ArODES

Corey Lammie, Yuxuan Wang, Flavio Ponzina, Joshua Klein, Hadjer Benmeziane, Marina Zapater Sancho, Irem Boybat, Abu Sebastian, Giovanni Ansaloni, David Atienza

IEEE Transactions on Emerging Topics in Computing, 2025, vol. 13, no 4, 1383 - 1395

Summary:

When arranged in a crossbar configuration, resistive memory devices can execute Matrix-Vector Multiplications (MVMs), the dominant operation of many Machine Learning (ML) algorithms, in constant time. However, performing computations in the analog domain introduces new challenges in terms of arithmetic precision and stochasticity, due to non-ideal circuit and device behaviour. Moreover, these non-idealities have a temporal dimension, causing application accuracy to degrade over time. To face these challenges, we propose a novel framework, named LionHeart, to obtain hybrid analog-digital mappings for executing Deep Learning (DL) inference workloads on heterogeneous accelerators. Across different Convolutional Neural Networks (CNNs) and one transformer-based network, the accuracy-constrained mappings derived by LionHeart show high accuracy and potential for speedup. Full-system simulation results highlight runtime reductions and energy efficiency gains exceeding 6× with respect to a fully digital floating-point implementation, while meeting a user-defined accuracy threshold.

2024

Bank on compute-near-memory: design space exploration of processing-near-bank architectures

Scientific paper ArODES

Rafael Medina, Giovanni Ansaloni, Marina Zapater, Saeideh Alinezhad Chamazcoti, Timon Evenblij, Dwaipayan Biswas, Francky Catthoor, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, 43, 11, 4117-4129

Summary:

Near-DRAM computing strategies advocate for providing computational capabilities close to where data is stored. Although this paradigm can effectively address the memory-to-processor communication bottleneck, it also presents new challenges: the strict resource constraints in the memory periphery demand careful tailoring of architectural elements. We propose a novel framework and methodology to explore compute-near-memory designs that interface to DRAM memory banks, exposing the area, energy, and performance trade-offs of each architectural configuration. We exemplify this methodology with two studies on compute-near-bank designs: 1) analyzing the interaction between control and data resources, and 2) exploring the integration of processing units with different DRAM standards. According to our study, the optimal size ratio between instruction and data capacity varies from 2× to 4× across benchmarks from representative application domains. The Pareto-optimal solutions retrieved by our framework improve on state-of-the-art designs, e.g., achieving a 50% performance increase on matrix operations with a 15% energy overhead relative to the FIMDRAM design. In addition, the DRAM exploration shows the interplay between available internal bandwidth, performance, and area overhead. For example, a threefold increase in bandwidth raises performance by 47% across workloads at a 34% extra area cost.
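
Pareto-optimal designs, as retrieved by the exploration framework described above, are those that no other design beats on every metric at once. A minimal sketch of that filter follows, assuming each candidate configuration is summarized by hypothetical (latency, energy, area) costs where lower is better; how the framework actually evaluates and searches configurations is not shown.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Names of the designs not dominated by any other design (lower cost is better)."""
    return [name for name, costs in designs.items()
            if not any(dominates(other, costs)
                       for other_name, other in designs.items() if other_name != name)]


if __name__ == "__main__":
    # Hypothetical (latency, energy, area) figures for four near-bank configurations.
    designs = {"A": (1.0, 5.0, 2.0), "B": (2.0, 2.0, 2.0),
               "C": (2.0, 5.0, 3.0), "D": (1.5, 2.0, 2.0)}
    print(pareto_front(designs))  # prints ['A', 'D']
```

The pairwise check is quadratic in the number of designs, which is acceptable for a sketch; larger explorations typically sort by one metric first to prune comparisons.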

Which coupled is best coupled? An exploration of AIMC tile interfaces and load balancing for CNNs

Scientific paper ArODES

Joshua Klein, Irem Boybat, Giovanni Ansaloni, Marina Zapater, David Atienza

IEEE Transactions on Parallel and Distributed Systems, 2024, 35, 10, 1780-1795

Summary:

Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that utilize both general-purpose CPUs and accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplication operations (MVMs) in constant time. However, the tiles of AIMC-based accelerators are limited by the number of weights they can hold. State-of-the-art research often sizes neural networks to AIMC tiles (or vice versa), but does not consider cases where AIMC tiles cannot cover the whole network due to a lack of tile resources or the network size. In this work, we study the trade-offs between available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load balancing techniques. We first study the single-layer performance and energy scalability of AIMC tiles for the two most typical AIMC acceleration targets: dense/fully-connected layers and convolutional layers. This study guides our approach to allocating parameters to AIMC tiles in the context of large edge neural networks, both where AIMC tiles are close to the CPU (tightly coupled) and cannot share resources across the system, and where AIMC tiles are far from the CPU (loosely coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs using different load balancing methods for differently coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be made highly effective even on under-provisioned systems. For example, we measured a 5.9× speedup and 5.6× energy gains on an 8-core system with 41% coverage of the neural network parameters.

Intermediate address space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

Scientific paper ArODES

Qunyou Liu, Darong Huang, Luis Costero, Marina Zapater, David Atienza

ACM Transactions on Architecture and Code Optimization, 2024, 21, 3, n°50, p. 1-23

CloudProphet: a machine learning-based performance prediction for public clouds

Scientific paper ArODES

Darong Huang, Luis Costero, Ali Pahlevan, Marina Zapater, David Atienza

IEEE Transactions on Sustainable Computing, 2024, 9, 4, 661-676

Summary:

Computing servers have played a key role in developing and processing emerging compute-intensive applications in recent years. Consolidating multiple virtual machines (VMs) inside one server to run various applications introduces severe competition for limited resources among VMs. Many techniques, such as VM scheduling and resource provisioning, have been proposed to maximize the cost-efficiency of computing servers while alleviating performance interference between VMs. However, these management techniques require accurate performance prediction for the application running inside each VM, which is challenging to obtain in the public cloud due to the black-box nature of VMs. From this perspective, this paper proposes a novel machine learning-based performance prediction approach for applications running in the cloud. To achieve high-accuracy predictions for black-box VMs, the proposed method first identifies the application running inside the virtual machine. It then selects highly correlated runtime metrics as inputs to the machine learning model to accurately predict the performance level of the cloud application. Experimental results with state-of-the-art cloud benchmarks demonstrate that our method outperforms existing prediction methods by more than 2× in terms of worst-case prediction error. In addition, we tackle performance prediction for applications with variable workloads by introducing a performance degradation index, which other methods fail to consider. The versatility of the proposed workflow has been verified on different modern servers and VM configurations.
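
The metric-selection step described above can be sketched as a Pearson-correlation ranking. This is an assumption, not CloudProphet's exact procedure: the metric names and data are hypothetical, selection keeps the k metrics with the largest absolute correlation with observed performance, and the application-identification and prediction stages are omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_metrics(metrics, performance, k=2):
    """Names of the k runtime metrics most correlated (in absolute value) with performance."""
    scores = {name: abs(pearson(values, performance)) for name, values in metrics.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    performance = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical throughput samples
    metrics = {
        "ipc": [2.0, 4.0, 6.0, 8.0, 10.0],
        "llc_misses": [5.0, 4.0, 3.0, 2.0, 1.0],
        "net_rx": [3.0, 1.0, 4.0, 1.0, 5.0],
    }
    print(select_metrics(metrics, performance))
```

Ranking by absolute correlation keeps strongly anti-correlated metrics (such as cache misses rising as performance drops), which are as informative to a predictor as positively correlated ones.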

2023

HackRF + GNU Radio: A software-defined radio to teach communication theory

Scientific paper ArODES

Alberto A Del Barrio, José P. Manzano, Victor M. Maroto, Álvaro Villarín, Josué Pagán, Marina Zapater, José Ayala, Román Hermida

The International Journal of Electrical Engineering & Education, 60, 1, 23-40

Summary:

In this paper, an alternative to the traditional methodology for teaching signal processing-like subjects is proposed. These subjects require a deep mathematical and theoretical basis, but their practical purpose is often not emphasized, which leads students to lose interest. Thus, a software-defined radio environment is proposed to provide a more practical view of the subject. This solution consists of an open hardware-software platform able to capture and process a wide range of frequencies. HackRF is the hardware component, while GNU Radio provides the graphical support for this device. Tests performed with a group of 36 students revealed that they were more satisfied with this framework than with a traditional equation-based environment such as MATLAB. Furthermore, their exam scores also support the suitability of the proposed platform.

2022

ALPINE: analog in-memory acceleration with tight processor integration for deep learning

Scientific paper ArODES

Joshua Klein, Irem Boybat, Yasir Qureshi, Martino Dazzi, Giovanni Ansaloni, Marina Zapater, Abu Sebastian, David Atienza

IEEE Transactions on Computers, 2023, vol. 72, no. 7, pp. 1985 - 1998

Summary:

Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference with respect to digital logic (e.g., CPUs). AIMCs accelerate matrix-vector multiplications, which dominate these applications' run-time. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hard-coded data flows and can only support a limited set of processing functions. To bridge this flexibility gap, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems. We developed ALPINE, a powerful gem5-based full-system simulation framework within the gem5-X simulator, which enables an in-depth characterization of the proposed architecture. ALPINE allows simulation of the entire computer architecture stack, from major hardware components to their interactions with the Linux OS. Within ALPINE, we defined a custom ISA extension and a software library to facilitate the deployment of inference models. We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5×/20.8× performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional neural networks, multi-layer perceptrons, and recurrent neural networks.

Thermal and voltage-aware performance management of 3D MPSoCs with flow cell arrays and integrated SC converters

Scientific paper ArODES

Halima Najibi, Alexandre Levisse, Giovanni Ansaloni, Marina Zapater, Miroslav Vasic, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, vol. 42, no. 1, pp. 2-15

Link to the publication

Summary:

Flow cell arrays (FCAs) concurrently provide efficient on-chip liquid cooling and electrochemical power generation. This technology is especially promising for three-dimensional multi-processor systems-on-chip (3D MPSoCs) realized in deeply scaled technologies, which present very challenging power and thermal requirements. Indeed, FCAs effectively improve power delivery network (PDN) performance, particularly if switched capacitor (SC) converters are employed to decouple the flow cell and system-on-chip voltages, allowing each to operate at its optimal point. Nonetheless, the design of FCA-based solutions entails non-obvious considerations and trade-offs, stemming from their dual role in governing both the thermal and power delivery characteristics of 3D MPSoCs. We showcase these trade-offs in this paper by exploring multiple FCA design configurations, and demonstrate that this technology can decrease the temperature of a heterogeneous 3D MPSoC by 78 °C, and its total power consumption by 46%, compared to a high-performance cold-plate based liquid cooling solution. At the same time, FCAs enable up to 90% voltage drop recovery across dies, using SC converters occupying a small fraction of the chip area. These outcomes provide an opportunity to boost 3D MPSoC computing performance by increasing the operating frequency of dies. Leveraging these results, we introduce a novel temperature and voltage-aware model predictive control (MPC) strategy that optimizes power efficiency at run-time. We achieve application-wide speedups of up to 16% on various machine learning (ML), data mining, and other high-performance benchmarks while keeping the 3D MPSoC temperature below 83 °C and voltage drops below 5%.
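The receding-horizon idea behind an MPC controller can be illustrated with a deliberately tiny example, assuming a hypothetical first-order thermal model, four frequency levels, and a short horizon searched exhaustively. At each step the controller plans the best feasible frequency sequence, applies only its first element, and re-plans. Apart from the 83 °C limit quoted above, every constant is hypothetical, and voltage-drop constraints and the FCA power model are omitted entirely; this is a sketch of the general technique, not the paper's controller.

```python
import itertools
from typing import List, Optional, Tuple

# All constants are hypothetical, chosen only to make the example behave sensibly.
FREQS_GHZ: Tuple[float, ...] = (1.0, 1.5, 2.0, 2.5)  # available DVFS levels
T_AMB = 45.0   # coolant/ambient temperature, degrees C
T_MAX = 83.0   # temperature limit, matching the bound quoted in the summary
A = 0.8        # fraction of the temperature excess that persists per control step
R_K = 3.0      # steady-state rise per unit of f**3 (thermal resistance x power coefficient)


def predict(temp: float, f_ghz: float) -> float:
    """First-order thermal model: temperature after one control step at f_ghz."""
    steady_rise = R_K * f_ghz ** 3  # dynamic power assumed to scale roughly with f^3
    return T_AMB + A * (temp - T_AMB) + (1.0 - A) * steady_rise


def mpc_choose(temp: float, horizon: int = 3) -> float:
    """Return the first frequency of the best feasible plan over the horizon.

    The objective is total frequency (a proxy for performance); plans whose
    predicted temperature exceeds T_MAX at any step are discarded.
    """
    best_plan: Optional[Tuple[float, ...]] = None
    best_score = float("-inf")
    for plan in itertools.product(FREQS_GHZ, repeat=horizon):
        t, feasible = temp, True
        for f in plan:
            t = predict(t, f)
            if t > T_MAX:
                feasible = False
                break
        if feasible and sum(plan) > best_score:
            best_plan, best_score = plan, sum(plan)
    # Receding horizon: apply only the first decision and re-plan at the next step.
    return best_plan[0] if best_plan is not None else min(FREQS_GHZ)


def simulate(steps: int, temp: float = T_AMB) -> List[Tuple[float, float]]:
    """Closed loop in which the model also acts as the plant (no model error)."""
    trace = []
    for _ in range(steps):
        f = mpc_choose(temp)
        temp = predict(temp, f)
        trace.append((f, temp))
    return trace
```

Exhaustive search is only viable because the horizon and the number of levels are tiny; practical controllers usually solve an optimization problem instead.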

Reinforcement learning-based joint reliability and performance optimization for hybrid-cache computing servers

Scientific paper ArODES

Darong Huang, Ali Pahlevan, Luis Costero, Marina Zapater, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 41, no. 12, pp. 5596-5609

Link to the publication

Summary:

Computing servers have played a key role in the development and processing of emerging compute-intensive applications in recent years. However, they need to operate efficiently from an energy perspective, while maximizing the performance and lifetime of the hottest server components (i.e., cores and cache). Previous methods focused either on improving energy efficiency by adopting new hybrid-cache architectures combining resistive random-access memory (RRAM) and static random-access memory (SRAM) at the hardware level, or on exploring trade-offs between lifetime limitations and performance of multi-core processors under stable workload conditions. However, no work has so far proposed a co-optimization method for hybrid-cache-based server architectures in real-life dynamic scenarios that takes into account scalability, performance, lifetime reliability, and energy efficiency at the same time. In this paper, we first formulate a reliability model for the hybrid-cache architecture to enable precise lifetime reliability management and energy efficiency optimization. We also include the performance and energy overheads of cache switching, and optimize the benefits of hybrid-cache usage for better energy efficiency and performance. Then, we propose a runtime Q-Learning-based reliability management and performance optimization approach for multi-core microprocessors with the hybrid-cache architecture, combined with a dynamic preemptive priority queue management method that improves overall task performance by aiming to respect each task's end time limit. Experimental results show that our proposed method achieves up to 44% average performance (i.e., task execution time) improvement, while maintaining a whole-system design lifetime longer than 5 years, when compared to the latest state-of-the-art energy efficiency optimization and reliability management methods for computing servers.

Energy-aware task scheduling in data centers using an application signature

Scientific paper ArODES

Juan Carlos Salinas-Hilburg, Marina Zapater, José M. Moya, José L. Ayala

Computers & Electrical Engineering, 2022, vol. 97, article no. 107630

Link to the publication

Summary:

Data centers are power-hungry facilities. Energy-aware task scheduling approaches are of utmost importance to improve energy savings in data centers, although they require prior knowledge of the energy consumption of the applications that will run on the servers. This knowledge is usually obtained through full profiling of the applications, which is not feasible in long-running application scenarios due to their long execution times. In the present work we use an application signature that allows the energy to be estimated without executing the application completely. We combine different scheduling approaches with the information from the application signature to improve the makespan of the scheduling process and therefore improve the energy savings in data centers. We evaluate the accuracy of the application signature by comparing it against an oracle method, obtaining an error below 1.5% and Compression Ratios of 39.7 to 45.8.

2021

Gem5-X: a many-core heterogeneous simulation platform for architectural exploration and optimization

Scientific paper ArODES

Yasir Mahmood Qureshi, Marina Zapater, Katzalin Olcoz, David Atienza

ACM Transactions on Architecture and Code Optimization, 2021, vol. 18, no. 4, article no. 44, pp. 1-27

Link to the publication

Summary:

The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework based on gem5, for architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, real-time video analytics is used as a case study. It is composed of two kernels, namely, video encoding and image classification using convolutional neural networks (CNNs). First, we explore through gem5-X the benefits of the latest 3D high bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop a new optimized clustered-heterogeneous architecture with HBM2 in gem5-X for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice to run visual recognition using CNNs, as they outperform DDR4-based systems by up to 30% both in terms of performance and energy savings.

Interpreting deep learning models for epileptic seizure detection on EEG signals

Scientific paper ArODES

Valentin Gabeff, Tomas Teijeiro, Marina Zapater, Leila Cammoun, Sylvie Rheims, Philippe Ryvlin, David Atienza

Artificial Intelligence in Medicine, 2021, vol. 117, article no. 102084

Link to the publication

Summary:

While Deep Learning (DL) is often considered the state of the art for Artificial Intelligence-based medical decision support, it remains sparsely implemented in clinical practice and poorly trusted by clinicians due to the insufficient interpretability of neural network models. We have approached this issue in the context of online detection of epileptic seizures by developing a DL model from EEG signals, and associating certain properties of the model behavior with expert medical knowledge. This guided the preparation of the input signals, the network architecture, and the post-processing of the output in line with the domain knowledge. Specifically, we focused the discussion on three main aspects: (1) how to aggregate the classification results on signal segments provided by the DL model into a larger time scale, at the seizure level; (2) the relevant frequency patterns learned in the first convolutional layer of different models, and their relation to the delta, theta, alpha, beta and gamma frequency bands on which the visual interpretation of EEG is based; and (3) the identification of the signal waveforms with larger contribution towards the ictal class, according to the activation differences highlighted using the DeepLIFT method. Results show that the kernel size in the first layer determines the interpretability of the extracted features and the sensitivity of the trained models, even though the final performance is very similar after post-processing. We also found that amplitude is the main feature leading to an ictal prediction, suggesting that a larger patient population would be required to learn more complex frequency patterns. Still, our methodology was able to generalize across inter-patient variability for the majority of the studied population, with a classification F1-score of 0.873 while detecting 90% of the seizures.
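Aspect (1), aggregating segment-level outputs into seizure-level decisions, can be illustrated with a common post-processing pattern: smooth the per-segment probabilities with a causal moving average, threshold them, and merge consecutive positive segments into events. This is a minimal sketch, not the paper's exact procedure; the window length, threshold, and minimum event length are hypothetical parameters.

```python
from typing import List, Optional, Tuple


def moving_average(p: List[float], window: int) -> List[float]:
    """Trailing (causal) moving average, so it could run online."""
    out = []
    acc = 0.0
    for i, v in enumerate(p):
        acc += v
        if i >= window:
            acc -= p[i - window]
        out.append(acc / min(i + 1, window))
    return out


def detect_events(probs: List[float], window: int = 3, threshold: float = 0.5,
                  min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) segment indices (end exclusive) of detected seizures.

    Smoothing suppresses isolated positive segments; runs shorter than
    min_len smoothed segments are discarded.
    """
    smooth = moving_average(probs, window)
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(smooth):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                events.append((start, i))
            start = None
    if start is not None and len(smooth) - start >= min_len:
        events.append((start, len(smooth)))
    return events
```

With these parameters, a single high-probability segment surrounded by low ones is smoothed below the threshold, while a sustained run of high probabilities becomes one event.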

Multi-agent reinforcement learning for hyperparameter optimization of convolutional neural networks

Scientific paper ArODES

Arman Iranfar, Marina Zapater, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 41, no. 4, pp. 1034-1047

Link to the publication

Summary:

Nowadays, Deep Convolutional Neural Networks (DCNNs) play a significant role in many application domains, such as computer vision, medical imaging, and image processing. Nonetheless, designing a DCNN able to outperform the state of the art is a manual, challenging, and time-consuming task, due to the extremely large design space resulting from the large number of layers and their corresponding hyperparameters. In this work, we address the challenge of performing hyperparameter optimization of DCNNs through a novel Multi-Agent Reinforcement Learning (MARL)-based approach that eliminates manual effort. In particular, we adapt Q-learning and define one learning agent per layer to split the design space into independent, smaller design sub-spaces, such that each agent fine-tunes the hyperparameters of its assigned layer with respect to a global reward. Moreover, we provide a novel formulation of Q-tables along with a new update rule that facilitates communication among agents. Our MARL-based approach is data-driven and able to consider an arbitrary set of design objectives and constraints. We apply our MARL-based solution to different well-known DCNNs, including GoogLeNet, VGG, and U-Net, and various datasets for image classification and semantic segmentation. Our results show that, compared to the original CNNs, the MARL-based approach can reduce the model size, training time, and inference time by up to 83x, 52%, and 54%, respectively, without any degradation in accuracy. Moreover, our approach is very competitive with state-of-the-art neural architecture search methods in terms of the designed CNN accuracy and its number of parameters, while significantly reducing the optimization cost.
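The core idea of one agent per layer, each searching only its own sub-space while all agents learn from one shared global reward, can be sketched with plain tabular learning. This is a simplified, stateless (bandit-style) variant of Q-learning: the candidate kernel sizes, learning rate, exploration rate, and the toy reward (which stands in for a real accuracy/size objective and would require training a network in practice) are all hypothetical, and the paper's own Q-table formulation and agent-communication update rule are not reproduced.

```python
import random
from typing import Callable, Dict, List, Sequence

KERNEL_CHOICES = (1, 3, 5, 7)  # hypothetical per-layer hyperparameter values
TARGET = (3, 5, 3)             # hypothetical "best" kernel sizes used by the toy reward


def toy_reward(config: Sequence[int]) -> float:
    """Hypothetical global reward; a real one would train and evaluate the DCNN."""
    return -float(sum(abs(k - t) for k, t in zip(config, TARGET)))


class LayerAgent:
    """One agent per layer; it only tunes the hyperparameter of its own layer."""

    def __init__(self, choices: Sequence[int], alpha: float = 0.1, epsilon: float = 0.1):
        self.choices = list(choices)
        self.q: Dict[int, float] = {c: 0.0 for c in self.choices}
        self.alpha = alpha
        self.epsilon = epsilon

    def act(self, rng: random.Random) -> int:
        if rng.random() < self.epsilon:
            return rng.choice(self.choices)             # explore
        return self.best()                               # exploit

    def update(self, action: int, reward: float) -> None:
        # Stateless Q-learning update: Q(a) <- Q(a) + alpha * (r - Q(a))
        self.q[action] += self.alpha * (reward - self.q[action])

    def best(self) -> int:
        return max(self.choices, key=lambda c: self.q[c])


def optimize(n_layers: int, reward_fn: Callable[[Sequence[int]], float],
             episodes: int = 3000, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    agents = [LayerAgent(KERNEL_CHOICES) for _ in range(n_layers)]
    for _ in range(episodes):
        config = [agent.act(rng) for agent in agents]   # each agent picks for its layer
        reward = reward_fn(config)                      # one reward shared by all agents
        for agent, action in zip(agents, config):
            agent.update(action, reward)
    return [agent.best() for agent in agents]
```

Each agent's table has only as many entries as its own layer has options, instead of one entry per combination across all layers, which is the size reduction that splitting the design space buys.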

ECOGreen: electricity cost optimization for green datacenters in emerging power markets

Scientific paper ArODES

Ali Pahlevan, Marina Zapater, Ayse K. Coskun, David Atienza

IEEE Transactions on Sustainable Computing, 2021, vol. 6, no. 2, pp. 289 - 305

Link to the publication

Summary:

Modern datacenters need to efficiently tackle the increasing demand for computing resources while minimizing energy usage and monetary costs. Power market operators have recently introduced new demand-response programs, in which electricity consumers regulate their power usage following provider requests in order to reduce monetary costs. Among the different programs, regulation service (RS) reserves are particularly promising for datacenters due to the high credit gain possibilities and datacenters' flexibility in regulating their power consumption. Therefore, it is essential to develop bidding strategies for datacenters to participate in emerging power markets, together with power management policies that are aware of power market requirements at runtime. In this paper we propose ECOGreen, a holistic strategy to jointly optimize the datacenter RS problem and virtual machine (VM) allocation that satisfies the hour-ahead power market constraints in the presence of electrical energy storage (EES) and renewable energy. We first find the best power and reserve bidding values, as well as the number of active servers, with a fast analytical method that works well in practice. Then, we present an online adaptive policy that modulates datacenter power consumption by controlling the VMs' CPU resource limits and efficiently utilizing demand-side EES and renewable power, while guaranteeing quality-of-service (QoS) constraints. Our results demonstrate that ECOGreen can provide, on average, 76 percent of the datacenter power consumption as reserves to the market, thanks to largely operating on renewable sources and EES. This translates into ECOGreen saving up to 71 percent in electricity costs when compared to other state-of-the-art datacenter electricity cost minimization techniques that participate in the power market.

3D-ICE 3.0: efficient nonlinear MPSoC thermal simulation with pluggable heat sink models

Scientific paper ArODES

Federico Terraneo, Alberto Leva, William Fornaciari, Marina Zapater, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 41, no. 4, pp. 1062-1075

Link to the publication

Summary:

The increasing power density in modern high-performance multi-processor systems-on-chip (MPSoCs) is fueling a revolution in thermal management. On the one hand, thermal phenomena are becoming a critical concern, making accurate and efficient simulation a necessity. On the other hand, a variety of physically heterogeneous solutions are coming into play: liquid, evaporative, thermoelectric cooling, and more. A new generation of simulators with unprecedented flexibility is thus required. In this paper, we present 3D-ICE 3.0, the first thermal simulator to allow for accurate nonlinear descriptions of complex and physically heterogeneous heat dissipation systems, while preserving the efficiency of the latest compact modeling frameworks at the silicon die level. 3D-ICE 3.0 allows designers to extend the thermal simulator with new heat sink models while simplifying the time-consuming step of model validation. Support for nonlinear dynamic models is included, for instance to accurately represent variable coolant flows. Our results present validated models of a commercial water heat sink and of an air heat sink plus fan that achieve an average error below 1 °C and simulate up to 3x and 12x faster than real time, respectively.

Fast energy estimation framework for long-running applications

Scientific paper ArODES

Juan Carlos Salinas-Hilburg, Marina Zapater, José M. Moya, José L. Ayala

Future Generation Computer Systems, 2021, vol. 115, pp. 20-33

Link to the publication

Summary:

The computation power in data center facilities is increasing significantly, bringing with it an increase in data center power consumption. Techniques such as power budgeting or resource management are used in data centers to increase energy efficiency. These techniques require prior knowledge of the applications' energy consumption, obtained through full profiling of the applications. This is not feasible in scenarios with long-running applications that have long execution times. To tackle this problem we present a fast energy estimation framework for long-running applications. The framework is able to estimate the dynamic CPU and memory energy of the application without the need to perform a complete execution. For that purpose, we leverage the concept of application signature: a reduced version, in terms of execution time, of the original application. Our fast energy estimation framework is validated with a set of long-running applications and obtains RMS values of 11.4% and 12.8% for the CPU and memory energy estimation errors, respectively. We define the concept of Compression Ratio as an indicator of the acceleration of the energy estimation process. Our framework is able to obtain Compression Ratio values in the range of 10.1 to 191.2.
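The two figures of merit in this summary can be made concrete with a small sketch. It assumes, as the description suggests but does not state explicitly, that the Compression Ratio is the execution time of the full application divided by the execution time of its signature, and that the reported error is the root mean square of per-application relative errors. The function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence


def compression_ratio(full_time_s: float, signature_time_s: float) -> float:
    """Speed-up of estimating energy from the signature instead of a full run (assumed definition)."""
    if signature_time_s <= 0:
        raise ValueError("signature execution time must be positive")
    return full_time_s / signature_time_s


def rms_relative_error(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square of per-application relative errors, as a percentage."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("need two equally long, non-empty sequences")
    squared = [((e - m) / m) ** 2 for e, m in zip(estimated, measured)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))
```

For example, a one-hour application whose signature runs in 36 seconds would have a Compression Ratio of 100 under this definition, and energy estimates that are 10% high for one application and 10% low for another give an RMS error of 10%.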

COCKTAIL: multi-core co-optimization framework with proactive reliability management

Scientific paper ArODES

Darong Huang, Ali Pahlevan, Marina Zapater, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, vol. 41, no. 2, pp. 386-399

Link to the publication

Summary:

High performance computing (HPC) servers must cope with an increase in the number and complexity of tasks and, consequently, address the energy efficiency challenge. In addition to energy efficiency, it is essential to manage the lifetime limitations of power-hungry server components (e.g., cores and cache), hence avoiding server failure before the end of its expected lifetime. Traditional approaches focus either on using hybrid caches to reduce the leakage power of traditional static random-access memory (SRAM) caches, and thus increase energy efficiency, or on the trade-off between the lifetime and performance of multi-core processors. However, these approaches lack flexibility and applicability for HPC tasks when it comes to multi-parametric optimization including quality-of-service (QoS), lifetime reliability, and energy efficiency. As a result, in this paper we propose COCKTAIL, a holistic strategy framework to jointly optimize the energy efficiency of multi-core server processors and task performance in the HPC context, while guaranteeing lifetime reliability. First, we analyze the best cache technology among traditional SRAM and resistive random access memory (RRAM), within the context of hybrid cache architectures, to improve energy efficiency and manage cache endurance limits with respect to task requirements. Second, we introduce a novel, efficient proactive queue optimization policy that reorders HPC tasks for execution considering their end time and the possible reliability effects of using the hybrid caches. Third, we present a dynamic model predictive control (MPC)-based reliability management method to maximize task performance by controlling the frequency, temperature, and target lifetime of the server processor. Our results demonstrate that, while consuming similar energy, COCKTAIL provides up to 60% QoS improvement when compared to the latest state-of-the-art energy optimization and reliability management techniques in the HPC context. Moreover, our strategy guarantees a design lifetime longer than 5 years for the whole HPC system.

2020

Resource management for power-constrained HEVC transcoding using reinforcement learning

Scientific paper ArODES

Luis Costero, Arman Iranfar, Marina Zapater, Francisco D. Igual, Katzalin Olcoz, David Atienza

IEEE Transactions on Parallel and Distributed Systems, 2020, vol. 31, no. 12

Link to the publication

Summary:

The advent of online video streaming applications and services, along with users' demand for high-quality content, requires High Efficiency Video Coding (HEVC), which provides higher video quality and more compression at the cost of increased complexity. On the one hand, HEVC exposes a set of dynamically tunable parameters that provide trade-offs among Quality-of-Service (QoS), performance, and power consumption of multi-core servers in the video providers' data centers. On the other hand, resource management of modern multi-core servers is in charge of adapting system-level parameters, such as operating frequency and multithreading, to deal with concurrent applications and their requirements. Therefore, efficient multi-user HEVC streaming necessitates joint adaptation of application- and system-level parameters. Nonetheless, such a large and dynamic design space is difficult to address through conventional resource management strategies. Thus, in this work, we develop a multi-agent Reinforcement Learning framework to jointly adjust application- and system-level parameters at runtime to satisfy the QoS of multi-user HEVC streaming in power-constrained servers. In particular, the design space, composed of all design parameters, is split into smaller independent sub-spaces. Each design sub-space is assigned to a particular agent so that it can explore it faster, yet accurately. The benefits of our approach are revealed in terms of adaptability and quality (with up to 4× improvements in terms of QoS when compared to a static resource management scheme) and learning time (6× faster than an equivalent mono-agent implementation). Finally, we show that the proposed power-capping techniques outperform hardware-based power capping with respect to quality.

The RECIPE approach to challenges in deeply heterogeneous high performance systems

Scientific paper ArODES

Giovanni Agosta, William Fornaciari, David Atienza, Ramon Canal, Alessandro Cilardo, Marina Zapater

Microprocessors and Microsystems, 2020, vol. 77, article no. 103185

Link to the publication

Summary:

RECIPE (REliable power and time-ConstraInts-aware Predictive management of heterogeneous Exascale systems) is a recently started project funded within the H2020 FETHPC programme, which is expressly targeted at exploring new High-Performance Computing (HPC) technologies. RECIPE aims at introducing a hierarchical runtime resource management infrastructure to optimize energy efficiency and minimize the occurrence of thermal hotspots, while enforcing the time constraints imposed by the applications and ensuring reliability for both time-critical and throughput-oriented computations that run on deeply heterogeneous accelerator-based systems. This paper presents a detailed overview of RECIPE, identifying the fundamental challenges as well as the key innovations addressed by the project. In particular, the need for predictive reliability approaches to maximize hardware lifetime and guarantee application performance is identified as the key concern for RECIPE. We address it through hierarchical resource management of the heterogeneous architectural components of the system, driven by estimates of the application latency and hardware reliability obtained, respectively, through timing analysis and through modeling of the thermal properties and mean-time-to-failure of subsystems. We show the impact of prediction accuracy on the overheads imposed by the checkpointing policy, as well as a possible application to a weather forecasting use case.

BLADE: an in-cache computing architecture for edge devices

Scientific paper ArODES

Yasir Mahmood Qureshi, Marco Rios, Marina Zapater

IEEE Transactions on Computers, 2020, vol. 69, no. 9, pp. 1349 - 1363

Link to the publication

Summary:

Area- and power-constrained edge devices are increasingly used to perform compute-intensive workloads, necessitating increasingly area- and power-efficient accelerators. In this context, in-SRAM computing performs hundreds of parallel operations on spatially local data common in many emerging workloads, while reducing the power consumption due to data movement. However, in-SRAM computing faces many challenges, including integration into the existing architecture, arithmetic operation support, data corruption at high operating frequencies, inability to run at low voltages, and low area density. To meet these challenges, this article introduces BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an in-SRAM computing architecture that utilizes local wordline groups to perform computations at a frequency 2.8× higher than state-of-the-art in-SRAM computing architectures. BLADE is integrated into the cache hierarchy of low-voltage edge devices, and is simulated and benchmarked at the transistor, architecture, and software abstraction levels. Experimental results demonstrate performance/energy gains over an equivalent NEON-accelerated processor for a variety of edge device workloads, namely, cryptography (4× performance gain/6× energy reduction), video encoding (6×/2×), and convolutional neural networks (3×/1.5×), while maintaining the highest frequency/energy ratio (up to 2.2 GHz @ 1 V) of any conventional in-SRAM computing architecture, and a low area overhead of less than 8 percent.

Genome sequence alignment - design space exploration for optimal performance and energy architectures

Scientific paper ArODES

Yasir Mahmood Qureshi, Jose Manuel Herruzo, Marina Zapater, Katzalin Olcoz, Sonia Gonzalez-Navarro

IEEE Transactions on Computers, 2020, vol. 14, no. 8, pp. 1-14

Link to the publication

Summary:

Next generation workloads, such as genome sequencing, have an astounding impact on the healthcare sector. Sequence alignment, the first step in genome sequencing, has experienced recent breakthroughs, which resulted in next generation sequencing (NGS). As NGS applications are memory-bound with random memory access patterns, we propose the use of high bandwidth memories like 3D stacked HBM2, instead of traditional DRAMs like DDR4, along with energy-efficient compute cores to improve both performance and energy efficiency. Three state-of-the-art NGS applications, Bowtie2, BWA-MEM and HISAT2, are used as case studies to explore and optimize NGS computing architectures. Then, using the gem5-X architectural simulator, we obtain an overall 68% performance improvement and 71% energy savings using HBM2 instead of DDR4. Furthermore, we propose an architecture based on ARMv8 cores and demonstrate that 16 ARMv8 64-bit OoO cores with HBM2 outperform 32 cores of the Intel Xeon Phi Knights Landing (KNL) processor with 3D stacked memory. Moreover, we show that frequency scaling can achieve up to 59% and 61% energy savings for ARM in-order and OoO cores, respectively. Lastly, we show that many ARMv8 in-order cores at 1.5 GHz match the performance of fewer OoO cores at 2 GHz, while attaining 4.5x energy savings.

2019

MAGNETIC: multi-agent machine learning-based approach for energy efficient dynamic consolidation in data centers

Scientific paper ArODES

Kawsar Haghshenas, Ali Pahlevan, Marina Zapater, Siamak Mohammadi, David Atienza

IEEE Transactions on Services Computing, Early Access

Link to the publication

Summary:

Improving the energy efficiency of data centers while guaranteeing Quality of Service (QoS), together with detecting performance variability of servers caused by either hardware or software failures, are two of the major challenges for efficient resource management of large-scale cloud infrastructures. Previous works in the area of dynamic Virtual Machine (VM) consolidation mostly focus on addressing the energy challenge, but fall short in proposing comprehensive, scalable, and low-overhead approaches that jointly tackle energy efficiency and performance variability. Moreover, they usually assume overly simplistic power models, and fail to accurately consider all the delay and power costs associated with VM migration and host power mode transition. These assumptions are no longer valid in modern servers executing heterogeneous workloads, and they lead to unrealistic or inefficient results. In this paper, we propose a centralized-distributed, low-overhead, failure-aware dynamic VM consolidation strategy to minimize energy consumption in large-scale data centers. Our approach selects the most adequate power mode and frequency of each host during runtime using a distributed multi-agent Machine Learning (ML) based strategy, and migrates the VMs accordingly using a centralized heuristic. Our Multi-AGent machine learNing-based approach for Energy efficienT dynamIc Consolidation (MAGNETIC) is implemented in a modified version of the CloudSim simulator and considers the energy and delay overheads associated with host power mode transition and VM migration. It is evaluated using power traces collected from various workloads running on real servers and resource utilization logs from cloud data center infrastructures.
Results show that our strategy reduces data center energy consumption by up to 15% compared to other works in the state of the art (SoA), guaranteeing the same QoS and reducing the number of VM migrations and host power mode transitions by up to 86% and 90%, respectively. Moreover, it shows better scalability than all other approaches, with less than 0.7% time overhead for a data center with 1500 VMs. Finally, our solution is capable of detecting host performance variability due to failures, automatically migrating VMs away from failing hosts and draining their workload.

2018

Machine learning-based quality-aware power and thermal management of multistream HEVC encoding on multicore servers

Scientific paper ArODES

Arman Iranfar, Marina Zapater, David Atienza

IEEE Transactions on Parallel and Distributed Systems, 2018, vol. 29, no. 10, pp. 2268 - 2281

Link to the publication

Summary:

The emergence of video streaming applications, together with users' demand for high-resolution content, has led to the development of new video coding standards, such as High Efficiency Video Coding (HEVC). HEVC provides high efficiency at the cost of increased complexity. This higher computational burden results in increased power consumption in current multicore servers. To tackle this challenge, algorithmic optimizations need to be accompanied by content-aware application-level strategies able to reduce power while meeting compression and quality requirements. In this paper, we propose a machine learning-based power and thermal management approach that dynamically learns and selects the best encoding configuration and operating frequency for each of the videos running on multicore servers, using information from frame compression, quality, encoding time, power, and temperature. In addition, we present a resolution-aware video assignment and migration strategy that reduces the peak and average temperature of the chip while maintaining the desired encoding time. We implemented our approach in an enterprise multicore server and evaluated it under several common scenarios for video providers. On average, compared to a state-of-the-art technique, for the most realistic scenario, our approach improves BD-PSNR and BD-rate by 0.54 dB and 8 percent, respectively, and reduces the encoding time, power consumption, and average temperature by 15.3, 13, and 10 percent, respectively. Moreover, when power and temperature constraints are relaxed, our proposed approach improves BD-PSNR and BD-rate compared to the HEVC Test Model (HM) by 1.19 dB and 24 percent, respectively, without any encoding time degradation.

Exploring manycore architectures for next-generation HPC systems through the MANGO approach

Scientific paper ArODES

José Flich, Giovanni Agosta, Philipp Ampletzer, David Atienza Alonso, Carlo Brandolese, Marina Zapater

Microprocessors and Microsystems, 2018, vol. 61, pp. 154-170

Link to the publication

Summary:

The Horizon 2020 MANGO project aims at exploring deeply heterogeneous accelerators for use in High-Performance Computing systems running multiple applications with different Quality of Service (QoS) levels. The main goal of the project is to exploit customization to adapt computing resources to reach the desired QoS. For this purpose, it explores different but interrelated mechanisms across the architecture and system software. In particular, in this paper we focus on the runtime resource management, the thermal management, and the support provided for parallel programming, and we introduce three applications on which the project's results will be validated.

Integrating heuristic and machine-learning methods for efficient virtual machine allocation in data centers

Scientific paper ArODES

Ali Pahlevan, Xiaoyu Qu, Marina Zapater, David Atienza

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, vol. 37, no. 8, pp. 1667-1680

Link to the publication

Summary:

Modern cloud data centers (DCs) need to efficiently handle the increasing demand for computing resources and address the energy efficiency challenge. Therefore, it is essential to develop resource provisioning policies that are aware of virtual machine (VM) characteristics, such as CPU utilization and data communication, and applicable in dynamic scenarios. Traditional approaches fall short in terms of flexibility and applicability for large-scale DC scenarios. In this paper, we propose a heuristic-based and a machine learning (ML)-based VM allocation method and compare them in terms of energy, quality of service (QoS), network traffic, migrations, and scalability for various DC scenarios. Then, we present a novel hyper-heuristic algorithm that exploits the benefits of both methods by dynamically finding the best algorithm according to a user-defined metric. For optimality assessment, we formulate an integer linear programming (ILP)-based VM allocation method that minimizes energy consumption and data communication; it obtains optimal results but is impractical at runtime. Our results demonstrate that, for large-scale scenarios, the ML approach improves server-to-server network traffic by up to 24% and reduces execution time by up to 480× compared to conventional approaches. Conversely, the heuristic outperforms the ML method in terms of energy and network traffic for smaller scenarios. We also show that the heuristic and ML approaches have up to 6% energy consumption overhead compared to the ILP-based optimal solution. Our hyper-heuristic integrates the strengths of both the heuristic and the ML methods by selecting the best one at runtime.

Power transmission and workload balancing policies in eHealth mobile cloud computing scenarios

Scientific paper ArODES

Josué Pagán, Marina Zapater, José L. Ayala

Future Generation Computer Systems, 2018, vol. 78, no. 2, pp. 587-601

Link to the publication

Summary:

The Internet of Things (IoT) holds great promise for healthcare, especially for proactive personal eHealth. Predicting symptomatic crises in chronic diseases in the IoT scenario leads to the deployment of ambulatory monitoring systems. A major concern for these systems is the amount of data to be processed and the intelligent management of energy consumption. The huge amount of data generated by these systems requires high computing capabilities that are only available in Data Centers. This paper presents a real case of prediction in the eHealth scenario, devoted to neurological disorders. The case study focuses on migraine headache, a disease that affects around 15% of the European population. This paper extrapolates results from real data and simulations in a study where migraine patients are monitored using an unobtrusive Wireless Body Sensor Network. Low-power techniques, such as on-node signal processing and radio policies, are applied in the monitoring nodes to extend node autonomy and save energy. Workload balancing policies are applied in the coordinator nodes and Data Centers to reduce the computational burden on these facilities and minimize their energy consumption. Our results show average savings of €288 million in this eHealth scenario when applied to only 2% of European migraine sufferers, in addition to savings of €1272 million due to the benefits of migraine prediction.

PowerCool: simulation of cooling and powering of 3D MPSoCs with integrated flow cell arrays

Scientific paper ArODES

Artem Aleksandrovich Andreev, Arvind Sridhar, Mohamed M. Sabry, Marina Zapater, Patrick Ruch

IEEE Transactions on Computers, 2018, vol. 67, no. 1, pp. 73-85

Link to the publication

Summary:

Integrated Flow-Cell Arrays (FCAs) combine integrated liquid cooling and on-chip power generation, converting the chemical energy of the flowing electrolyte solutions into electrical energy. FCA technology provides a promising way to address both heat removal and power delivery issues in 3D Multiprocessor Systems-on-Chip (MPSoCs). In this paper, we motivate the benefits of FCAs in 3D MPSoCs via a qualitative analysis and explore the capabilities of the proposed technology using our extended PowerCool simulator. PowerCool is a tool that performs combined compact thermal and electrochemical simulation of 3D MPSoCs with inter-tier FCA-based cooling and power generation. We validate our electrochemical model against experimental data obtained using a micro-scale FCA, and extend PowerCool with a compact thermal model (3D-ICE) and subthreshold leakage estimation. We show the sensitivity of FCA cooling and power generation to design-time (FCA geometry) and run-time (fluid inlet temperature, flow rate) parameters. Our results show that we can optimize the FCA to keep the maximum chip temperature below 95 °C for an average chip power consumption of 50 W/cm² while generating up to 3.6 W per cm² of chip area.

2017

Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark

Scientific paper ArODES

José L. Risco-Martin, Saurabh Mittal, Juan Carlos Fabero Jiménez, Marina Zapater, Román Hermida Correa

SIMULATION, 2017, vol. 93, no. 6, pp. 459–476

Link to the publication

Summary:

The discrete event system specification (DEVS) formalism, which supports hierarchical and modular model composition, has been widely used to understand, analyze and develop a variety of systems. DEVS has been implemented in various languages and platforms over the years. The DEVStone benchmark was conceived to generate a set of models with varied structure and behavior, and to automate the performance evaluation of DEVS-based simulators. However, DEVStone is still in a preliminary phase and requires further model analysis. In this paper, we revisit DEVStone, introducing new equations to compute the number of events triggered. We also introduce a new benchmark with central processing unit and memory requirements similar to those of the most complex benchmark in DEVStone, but easier to implement and more manageable analytically. Finally, we compare both the performance and memory footprint of five different DEVS simulators on two different hardware platforms.

2016

Runtime data center temperature prediction using grammatical evolution techniques

Scientific paper ArODES

Marina Zapater, José L. Risco-Martín, Patricia Arroba, José L. Ayala, José M. Moya

Applied Soft Computing, 2016, vol. 49, pp. 94-107

Link to the publication

Summary:

Data Centers are huge power consumers, both because of the energy required for computation and because of the cooling needed to keep servers below thermal redlining. The most common technique to minimize cooling costs is increasing the data room temperature. However, to avoid reliability issues and to enhance energy efficiency, there is a need to predict the temperature attained by servers under variable cooling setups. Due to the complex thermal dynamics of data rooms, accurate runtime data center temperature prediction has remained an important challenge. Using Grammatical Evolution techniques, this paper presents a methodology for generating temperature models for data centers and for the runtime prediction of CPU and inlet temperature under variable cooling setups. Unlike time-costly Computational Fluid Dynamics techniques, our models do not need specific knowledge about the problem, can be used in arbitrary data centers, can be re-trained if conditions change, and have negligible overhead during runtime prediction. Our models have been trained and tested using traces from real Data Center scenarios. Our results show that we can fully predict the temperature of the servers in a data room, with prediction errors below 2 °C and 0.5 °C for CPU and server inlet temperature, respectively.

2015

Leakage-aware cooling management for improving server energy efficiency

Scientific paper ArODES

Marina Zapater, Ozan Tuncer, José L. Ayala, José M. Moya, Kalyan Vaidyanathan

IEEE Transactions on Parallel and Distributed Systems, 2015, vol. 26, no. 10, pp. 2764-2777

Link to the publication

Summary:

The computational and cooling power demands of enterprise servers are increasing at an unsustainable rate. Understanding the relationship between computational power, temperature, leakage, and cooling power is crucial to enable energy-efficient operation at the server and data center levels. This paper develops empirical models to estimate the contributions of static and dynamic power consumption in enterprise servers for a wide range of workloads, and analyzes the interactions between temperature, leakage, and cooling power for various workload allocation policies. We propose a cooling management policy that minimizes the server energy consumption by setting the optimum fan speed during runtime. Our experimental results on a presently shipping enterprise server demonstrate that including leakage awareness in workload and cooling management provides additional energy savings without any impact on performance.

Self-organizing maps versus growing neural gas in detecting anomalies in data centres

Scientific paper ArODES

Marina Zapater, David Fraga, Pedro Malagón, Zorana Bankovic, José M. Moya

Logic Journal of IGPL, 2015, vol. 23, no. 3, pp. 495–505

Link to the publication

Summary:

Reliability is one of the key performance factors in data centres. The out-of-scale energy costs of these facilities lead data centre operators to increase the ambient temperature of the data room to decrease cooling costs. However, increasing the ambient temperature reduces safety margins and can result in a higher number of anomalous events. Anomalies in the data centre need to be detected as soon as possible to optimize cooling efficiency and mitigate their harmful effects on servers. This article proposes using clustering-based outlier detection techniques coupled with a trust and reputation system engine to detect anomalies in data centres. We show how self-organizing maps and growing neural gas can be applied to detect cooling and workload anomalies, respectively, in a real data centre scenario with very good detection and isolation rates, in a way that is robust to malfunctions of the sensors that gather server and environmental information.

Energy-aware policies in ubiquitous computing facilities

Book chapter ArODES

Marina Zapater, Patricia Arroba, José Luis Ayala Rodrigo, Katzalin Olcoz Herrero, José Manuel Moya Fernandez

Cloud computing with e-science applications (20 p.). 2015, Boca Raton : CRC Press

Link to the publication

Summary:

This chapter provides a vision of the growing energy problem in computing facilities, with a focus on cloud computing under the new computational paradigms, and proposes solutions from a global, multilayer perspective, describing a novel system architecture, power models, and optimization algorithms. Researchers have done a massive amount of work to address these issues and provide energy-aware computing environments. Consolidation reduces the number of operating servers needed to process the same workload, minimizing static consumption, which leads to policies that select the set of operating servers and turn off the rest. Cloud computing, mobile cloud computing, and even modern high-performance computing all start with data centers. We can dream of a world in which anyone is allowed to sell their excess computing capacity as virtualized resources to anyone else, or where ubiquitously sensed information is processed in a center kilometers away from the source.

Comparative study of meta-heuristic 3D floorplanning algorithms

Scientific paper ArODES

Alfredo Cuesta-Infante, J. Manuel Colmenar, Zorana Bankovic, José L. Risco-Martín, Marina Zapater

Neurocomputing, 2015, vol. 150, part A, pp. 67-81

Link to the publication

Summary:

The constant need to improve performance has led to the invention of 3D chips. The improvement is achieved through the reduction of wire length, which results in decreased interconnection delay. However, 3D stacks dissipate heat less effectively because of their inner layers, which leads to increased temperature and the appearance of hot spots. This problem can be mitigated through appropriate floorplanning. For this reason, in this work we present and compare five different solutions for the floorplanning of 3D chips. Each solution uses a different representation, and all are based on meta-heuristic algorithms: three of them are based on simulated annealing, while the other two are based on evolutionary algorithms. The results show the strong capability of all the solutions in optimizing temperature and wire length, as they all exhibit significant improvements compared to the benchmark floorplans.

Enhancing regression models for complex systems using evolutionary techniques for feature engineering

Scientific paper ArODES

Patricia Arroba, José L. Risco-Martín, Marina Zapater, José M. Moya, José L. Ayala

Journal of Grid Computing, 2015, vol. 13, pp. 409–423

Link to the publication

Summary:

This work proposes an automatic methodology for modeling complex systems. Our methodology combines Grammatical Evolution and classical regression to obtain an optimal set of features that take part in a linear and convex model. This technique provides both Feature Engineering and Symbolic Regression in order to infer accurate models without requiring effort or designer expertise. As advanced Cloud services are becoming mainstream, the contribution of data centers to the overall power consumption of modern cities is growing dramatically. These facilities consume 10 to 100 times more power per square foot than typical office buildings. Modeling the power consumption of these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling is a complex challenge for high-end servers that analytical approaches have not yet met. For this case study, our methodology minimizes the error in power prediction. This work has been tested using real Cloud applications, resulting in an average error in power estimation of 3.98%. Our work improves the possibilities of deriving energy-efficient policies for Cloud data centers and is applicable to other computing environments with similar characteristics.

2014

A novel energy-driven computing paradigm for e-health scenarios

Scientific paper ArODES

Marina Zapater, Patricia Arroba, José L. Ayala, José M. Moya, Katzalin Olcoz

Future Generation Computer Systems, 2014, vol. 34, pp. 138-154

Link to the publication

Summary:

A first-rate e-Health system saves lives, provides better patient care, allows complex but useful epidemiologic analysis and saves money. However, there may also be concerns about the costs and complexities associated with e-Health implementation, and about the need to address the energy footprint of the high-demand computing facilities involved. This paper proposes a novel and evolved computing paradigm that: (i) provides the required computing and sensing resources; (ii) allows population-wide diffusion; (iii) exploits the storage, communication and computing services provided by the Cloud; (iv) tackles energy optimization as a first-class requirement, taking it into account during the whole development cycle. The novel computing concept and the multi-layer top-down energy-optimization methodology obtain promising results in a realistic scenario for cardiovascular tracking and analysis, making Home Assisted Living a reality.

2012

GreenDisc: a HW/SW energy optimization framework in globally distributed computation

Book chapter ArODES

Marina Zapater, José L. Ayala, Jose M. Moya

Ubiquitous Computing and Ambient Intelligence (8 p.). 2012, Heidelberg : Springer

Link to the publication

Summary:

In the near future, wireless sensor networks (WSNs) will experience a broad, large-scale deployment (millions of nodes nationwide) with multiple information sources per node and with very specific requirements for signal processing. In parallel, the broad deployment of WSNs facilitates the definition and execution of ambitious studies with large input data sets and high computational complexity. These computational demands, very often heterogeneous and on-demand, can only be satisfied by high-performance Data Centers (DCs). The high economic and environmental impact of energy consumption in DCs requires aggressive energy optimization policies. The need for such policies has already been identified, but they have not yet been successfully proposed. In this context, this paper presents the following ongoing research lines and the results obtained. In the field of WSNs: energy optimization in the processing nodes at different abstraction levels, including reconfigurable application-specific architectures, efficient customization of the memory hierarchy, energy-aware management of the wireless interface, and design automation for signal processing applications. In the field of DCs: energy-optimal workload assignment policies in heterogeneous DCs, energy-conscious resource management policies, and efficient cooling mechanisms that will cooperate in minimizing the electricity bill of the DCs that process the data provided by the WSNs.

Ubiquitous green computing techniques for high demand applications in smart environments

Scientific paper ArODES

Marina Zapater, Cesar Sanchez, Jose L. Ayala, Jose M. Moya, José L. Risco-Martín

Sensors, 2012, vol. 12, no. 8, pp. 10659-10677

Link to the publication

Summary:

Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, have constantly increasing computational demands in order to process data and offer services to users. The nature of these applications implies the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSN infrastructures. However, supercomputing facilities are the ones with a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.

Compiler optimizations as a countermeasure against side-channel analysis in MSP430-based devices

Scientific paper ArODES

Pedro Malagón, Juan-Mariano de Goyeneche, Marina Zapater, José M. Moya, Zorana Bankovic

Sensors, 2012, vol. 12, no. 6, pp. 7994-8012

Link to the publication

Summary:

Ambient Intelligence (AmI) requires devices everywhere: dynamic and massively distributed networks of low-cost nodes that, among other data, manage private information or control restricted operations. MSP430, a 16-bit microcontroller, is used in WSN platforms such as the TelosB. Physical access to these devices cannot be restricted, so attackers consider them a target of their malicious attacks in order to obtain access to the network. Side-channel analysis (SCA) easily exploits leakages, dependent on critical data, from the execution of encryption algorithms to guess the key value. In this paper, we present an evaluation framework that facilitates the analysis of the effects of compiler and backend optimizations on the resistance against statistical SCA. We propose an optimization-based software countermeasure that can be used in current low-cost devices to radically increase resistance against statistical SCA, as analyzed with the new framework.

2024

Cross-layer exploration of 2.5D energy-efficient heterogeneous chiplets integration: from system simulation to open hardware

Conference ArODES

Anna Burdina, Gabriel Catel Torres, Davide Schiavone, Miguel Peón-Quirós, Giovanni Ansaloni, David Atienza, Marina Zapater

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design

Main skills

High Performance Computing

Embedded Systems

Energy efficiency

Power Management

Deep Learning

Thermal modelling

Novel architectures
