PEOPLE@HES-SO - Directory of staff and skills

Percia David Dimitri

Associate Professor (HES)

Main skills

Data Science

Applied Machine Learning

Large Language Models

Technology Mining

Information Economics

Main contract

Associate Professor (HES)

Phone: +41 58 606 97 59

Office: FOY

HES-SO Valais-Wallis - Haute Ecole de Gestion
Route de la Plaine 2, Case postale 80, 3960 Sierre, CH
HEG - VS
Field
Business and Services (Economie et services)
Main degree programme
Business Administration (Economie d'entreprise)
BSc HES-SO in Business Administration - HES-SO Valais-Wallis - Haute Ecole de Gestion
  • Statistics
  • Probabilities
  • Data Science
  • Combinatorics

Ongoing

SwissCyber Initiative

Role: Main applicant

Funding: Swiss State Secretariat for Education, Research, and Innovation (SERI)

Project description:

The Swiss cybersecurity project, funded by the Swiss State Secretariat for Education, Research, and Innovation (SERI) with a budget of CHF 1 million, seeks to provide a comprehensive overview of Switzerland’s cybersecurity research landscape at both academic and business levels. This initiative addresses critical gaps in knowledge by building on previous studies of university research and expanding to include startups, SMEs, and private research institutions. The project is led by Prof. Dimitri Percia David (HES-SO Valais-Wallis) in collaboration with SATW (Swiss Academy of Technical Sciences), armasuisse Science and Technology (CYD Campus), and Softcom, among others.

The project is structured into five interconnected work packages:

  1. Work Package 1: Data Collection and Analysis
    Quantitative and qualitative data will be gathered from diverse sources, including patents, financial performance, publications, and expert interviews. Advanced data science methods, such as machine learning, network analysis, and natural language processing, will create a structured database mapping Switzerland’s cybersecurity ecosystem.

  2. Work Package 2: Target-State Analysis
    Expert interviews, focus groups, and scenario analyses will define Switzerland’s desired cybersecurity capabilities over a 5-10-year horizon, identifying areas where capabilities need to be developed or dependencies reduced.

  3. Work Package 3: Platform Development
    An interactive online platform will visually present the findings, allowing stakeholders to access up-to-date insights on research efforts, technological capabilities, and gaps. The platform will support long-term use and updates beyond the project's duration.

  4. Work Package 4: Delta Analysis and Recommendations
    A comparative analysis of the current ("ACTUAL") and desired future ("TARGET") states will identify blind spots and actionable recommendations for policymakers, research funding organizations, and industry stakeholders.

  5. Work Package 5: Dissemination and Networking
    The project will organize annual networking events, develop communication materials (e.g., reports, articles, presentations), and promote collaboration between academia, SMEs, startups, and government agencies.
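The delta analysis of Work Package 4 can be pictured as a comparison between two capability maps. The following is only a minimal sketch: the capability names and the 0-10 maturity scale are hypothetical, since the project's actual taxonomy and scoring are not described here.

```python
# Hypothetical capability levels on a 0-10 scale (illustrative values only).
actual = {"cryptography": 8, "malware analysis": 6, "ICS security": 3}
target = {
    "cryptography": 8,
    "malware analysis": 7,
    "ICS security": 7,
    "post-quantum migration": 5,
}


def delta_analysis(actual, target):
    """Return (capability, gap) pairs where TARGET exceeds ACTUAL, largest gap first."""
    gaps = {}
    for capability, wanted in target.items():
        have = actual.get(capability, 0)  # a capability absent from ACTUAL is a blind spot
        if wanted > have:
            gaps[capability] = wanted - have
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


for capability, gap in delta_analysis(actual, target):
    print(f"{capability}: gap of {gap}")
```

Capabilities missing from the ACTUAL map are treated as level 0, so blind spots surface at the top of the list.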

Key collaborations include SATW, which leads dissemination efforts and oversees the advisory board, armasuisse S+T, which contributes data and expertise through the CYD Campus, and Softcom, which supports platform development. The project also leverages tools like the TMM 2.0 platform and existing databases from Trust Valley, Startupticker, Crunchbase, and academic institutions.

The project’s findings will guide strategic investments, policy development, and education initiatives to strengthen Switzerland’s position in the global cybersecurity landscape. The initiative aims to enhance the country’s resilience and innovation in a rapidly evolving field by fostering collaboration across sectors and providing a detailed understanding of Switzerland's cybersecurity ecosystem.

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei, Sternfeld Alexander

Professional partners: Mermoud Alain, CYD Campus; Wettstein Nicole, SATW

Project duration: 01.01.2025 - 31.12.2026

Total project budget: 1'000'000 CHF

Status: Ongoing

Characterizing and Mitigating Attacks on Large Language Models in Code Generation and Privacy

Role: Main applicant

Funding: Cyber-Defence Campus

Project description:

WP1: Evaluation of the susceptibility of LLM-generated code to injected vulnerabilities, and potential mitigations: Whether by allowing non-coders to realize software projects or by making programmers more productive and more competent, code-generating LLMs are a focus of the largest LLM providers, from OpenAI with Codex to GitHub with Copilot to Meta with Code-LLaMA. While it remains to be seen whether the promises of code-generating LLMs will be realized, preliminary research from the cyber-security community and the GenLearning Center has demonstrated that the code LLMs generate is just as prone to bugs, making it vulnerable. Unfortunately, unlike code written by humans, code generated by LLMs can be poisoned through the datasets used in their training and cannot be trivially corrected at scale when the vulnerabilities affecting it are discovered and released. This work package focuses on indirect code vulnerability injection through training dataset poisoning, malicious fine-tuning, or adversarial pre-prompting, with particular attention to less easily detectable vulnerabilities. The end goal is to investigate detection and mitigation avenues for such malicious tampering.

WP2: Defining vulnerabilities in LLMs and developing guidelines for characterizing, disclosing, and mitigating them: While there is rising awareness in the cyber-security community that LLMs introduce novel threats in cyber-space and amplify existing ones, there is still no understanding of how to report them systematically or how to link them to the implementation details of LLM-derived tools, so that mitigation measures can propagate through the ecosystem in the same way security patches propagate through the software ecosystem. In this project, the GenLearning Center aims to generalize the definitions of LLM vulnerabilities to cover aspects such as private data leakage, dangerous information generation, and failures in information retrieval and summarization, and to establish guidelines for characterizing, disclosing, and mitigating them. This project would enable the compilation of standards for accepting or rejecting LLM-based solutions as part of cyber-physical systems used by the DDPS and for keeping them operating safely in a sustained manner, as well as the sharing of information about potential failure modes and mitigations with Swiss partners in defense.

WP3: Secure LLM privacy leakage detection: One of the significant issues with LLMs is their uncanny ability to memorize information they saw in their training only once. With the broad deployment of Conversational Agent LLMs and the use of conversations with users to fine-tune conversational agents further, they potentially memorize critical non-public information from such exchanges. The goal of this WP will be to develop a protocol to securely exchange information between LLM developers and entities who want to verify if their private information was leaked in a way that does not require either party to disclose potentially sensitive information. This project will specifically explore the private set intersection approach with additional safeguards from the differential privacy domain.
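To make the private set intersection idea concrete, here is a toy sketch of one classic construction (Diffie-Hellman-style double blinding). It is an assumption for illustration, not the protocol developed in this work package: the modulus is far too small for real use, both parties run in a single process, the example strings are invented, and the differential privacy safeguards mentioned above are not modelled.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used as a toy modulus (not a secure choice)


def hash_to_group(item: str) -> int:
    """Map a string to a non-zero residue modulo P."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1


def random_exponent() -> int:
    """Pick a secret exponent coprime with P - 1, so that blinding is a bijection."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def blind(values, secret):
    """Raise each value to a secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def private_set_intersection(client_items, server_items):
    a = random_exponent()  # known only to the client (e.g. the data owner)
    b = random_exponent()  # known only to the server (e.g. the LLM developer)
    # Client -> server: items blinded with a.
    client_blinded = blind([hash_to_group(x) for x in client_items], a)
    # Server -> client: client values blinded again with b (same order),
    # plus the server's own items blinded with b.
    client_double = blind(client_blinded, b)
    server_blinded = blind([hash_to_group(y) for y in server_items], b)
    # Client: blind the server values with a; equal items now collide.
    server_double = set(blind(server_blinded, a))
    return [x for x, d in zip(client_items, client_double) if d in server_double]


print(private_set_intersection(
    ["record-A", "record-B"],   # strings the client worries were leaked
    ["record-B", "record-C"],   # strings the server found memorized
))
```

Neither side sends raw items: each only ever sees values blinded by the other party's secret exponent, and the client learns just which of its own items match.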

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei, Vallez Cyril

Professional partners: Dr. Ljiljana Dolamic, Cyber-Defence Campus

Project duration: 01.01.2024 - 31.12.2024

Total project budget: 138'871 CHF

Project site URL: https://www.hevs.ch/en/projects/defining-vulnerabilities-in-llms-and-developing-guidelines-for-characterizing-disclosing-and-mitigating-them-208974

Status: Ongoing

Quantitative Technology Assessment, Monitoring & Forecasting Models for Cyber-Defense

Role: Main applicant

Funding: Cyber-Defence Campus

Project description:

Objectives of the project: The project provides quantitative technology assessment, monitoring, and forecasting models for cyber-defence. This effort aims to contribute to the Technology Monitoring (TM) portfolio of the Cyber-Defence (CYD) Campus by (i) fulfilling the first measure of the NCS[1] attributed to armasuisse S+T, (ii) writing technical reports on specific cyber-defense technologies for the CYD Campus clients, and (iii) contributing to the development of the Swiss Technology Observatory[2], all three objectives being attached to the Strategy Cyber DDPS (see figure below). While the project aims to present concrete cybersecurity-technology assessments, monitoring, and forecasting through case studies defined by the needs of the TM portfolio, we want these results to be backed by coherent, relevant, and solid scientific methodologies that will be published in Q1 journals and A conferences. By applying (i) advanced natural language processing (NLP) methods, (ii) forecasting techniques based on machine-learning models and quantitative analysis (data science), and (iii) algorithmic economics, our work aims to rethink traditional technology mining methodologies by developing dynamic and holistic approaches that provide concrete cyber-defense insights in terms of technology assessment, monitoring, and forecasting.

WP1: Editing and coordination of the CYD TM “Safety of LLMs in Cybertechnology” overview book: The successful public demonstration of high-performance LLMs in late 2022 led to a push for their generalized introduction across a range of software services, including mission-critical systems such as intelligence report generation, retrieval, and summarization, or integration with operations systems as user interfaces. Unfortunately, LLMs are still a new technology, and the new risks they introduce to the security of cyber-physical systems have yet to be discovered. To respond to this risk, the CYD TMM center is preparing the “Safety of LLMs in Cyber” book. While it results from a collaboration between dozens of cyber-security and machine learning experts worldwide, their domain expertise until 2023 usually had minimal to no overlap with LLMs. In turn, this means that they rely critically on information provided in an accessible manner by an LLM expert. As a former distinguished CYD Postdoctoral Fellow specializing in generative learning in cyber-security and cyber-defense and current co-leader of the GenLearning Center at HES-SO Valais-Wallis, Dr. Kucharavy is well suited for this task. In the book's first chapters, Dr. Kucharavy will provide a solid base for others to build on, notably by introducing the principles behind the current generation of LLMs, an overview of existing models and of approaches to adapt them to novel tasks, and their fundamental limitations. This will give other authors a solid basis for evaluating LLM capabilities and for contributing input in their own domain of expertise.

WP2: Identification of persistently robust technological monitoring proxies: Developing new defense capabilities fundamentally differs from fundamental or applied research in the civil environment. Because of the length of procurement and lifecycles, the technologies that provide them must still be relevant decades later. This leads to a conundrum. On the one hand, the technologies must be novel enough to be still relevant to the delivery time of the new capabilities. On the other hand, they must be mature enough to be ready for use by then. Errors in either direction are measured in lives lost or tens of billions wasted in procurement. MRAPs and DD-21 (Zumwalt) are recent impressive examples, but cyber-defense is rife with similar failures. In the 2010s, NATO lacked social media information operations defense, and despite repeated promises since the 1980s, expert verification systems have not yet gotten rid of all bugs in software. Quantitative technology monitoring and forecasting tools have been developed to address this problem. They rely on hard-to-falsify proxies, ranging from patent citation structure to bibliometrics, journalistic coverage analysis, and social media conversation sentiment. However, with the recent advances in Generative ML, such proxies are no longer hard to falsify. Given the IP-based investment and addition of AI tools to better evaluate patent analysis by several patent offices worldwide, they are likely to be falsified. To retain the robust technological monitoring capabilities of TMM, this project aims to identify novel robust proxies, notably by examining the novelty of terms and correlations and statement factuality coherence, and to apply it to current novel technology with high long-term novel capabilities potential – quantum technologies.

WP3: Novel NLP methods of evaluating short-term technological convergence potential: Key technologies underlying defensive operations rely on a steady effort and funding to be progressively developed and brought to a maturity level when applicable. However, novel operational capabilities often rely on technological convergence, obtaining a massive synergy from well-developed but previously unconnected technologies, such as DDoS and HTTP/2 parallel page loading assets logic, resulting in an overnight tripling of traffic load available to the attackers. Such convergences present a unique opportunity for offensive use, given that systems able to use them can be developed rapidly and deployed without warning. Because of that, it is critical for the side in a defensive posture, such as Switzerland, to anticipate short-term technological convergence potential and forecast the threat posed by systems resulting from such convergence. This working package aims to investigate how well recently developed NLP methods – notably entailment on scientific texts – could assist technological convergence analysis. Specifically, this WP will support the development of a prototype tool to perform such forecasting on DDoS, in addition to the above-mentioned HTTP/2 convergence previously seen with IoT.


[1] https://www.ncsc.admin.ch/ncsc/en/home/strategie/strategie-ncss-2018-2022.html

[2] https://technology-observatory.ch/

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei

Professional partners: Dr. Alain Mermoud, Cyber-Defence Campus

Project duration: 01.01.2024 - 31.12.2024

Total project budget: 148'002 CHF

Project site URL: https://www.hevs.ch/en/projects/novel-nlp-methods-to-evaluate-short-term-technological-convergence-potential-208972

Status: Ongoing

Distributed, Scalable, and Sustainable Digital Trust

Role: Co-applicant

Funding: FNS Bridge (Proof-of-Concept)

Project description:

Central to our modern lives, the Internet faces challenges due to the predominance of centralized service models. Centralized solutions compromise digital sovereignty, promote unsustainable resource usage, fail to meet diverse user needs, and intensify cybersecurity threats. Traditional technological solutions like blockchains fall short, often exacerbating the very problems they aim to solve. Based on ground-breaking distributed computing research recognized at top-tier conferences, my Proof of Concept reimagines this framework. I propose a two-tier server system that, in tests, successfully handled over 43 million transactions per second using minimal resources while providing enhanced security. This innovative approach empowers any entity to launch a high-performance, secure Internet service, realigning the power dynamics of the digital world. Switzerland, renowned for its trust-based economy, offers the perfect environment for developing and leveraging this transformative solution. While the Internet itself is a distributed system, with parts being able to fail without bringing the whole system down, the vast majority of services providing the Internet with economic utility are highly centralized. This poses several problems due to the inherent flaws of centralized architectures and the misalignment of incentives between Internet-scale service providers and their users:

  • Failure of a single centralized component compromises the entire system, leading to bug resilience, cyber-security, and privacy issues.
  • Economies of scale push Internet-scale service providers to become transnational, becoming a digital sovereignty threat to smaller countries.
  • Global price competition favors unsustainable resource usage, notably cheaper fossil energy, excessive water usage, or uncontrolled e-waste generation.
  • Global market focus leads to neglecting more niche needs, starting with those of more local communities, people left out of the digital transformation, and users of rare languages.

Unfortunately, until now, the technology that promised to solve these issues—blockchain—has exacerbated most of them. Retrospectively, this is not surprising—the resilience of a blockchain is guaranteed by wasting more resources than any single attacker could afford. This leads back to centralisation, but with additional complexity and resource waste.

This Proof of Concept aims to build an alternative software platform for creating distributed Internet-scale services. To achieve this, I will use the results of my PhD work, which has been published at top-tier international conferences, has won two Best Paper Awards, and has been demonstrated to work as intended at Internet scale. Specifically, during my PhD, I found a provably optimal algorithm to distribute any Internet-scale, general-purpose computation among a limited number of servers. The critical innovation that allowed this is the separation of servers into two tiers: a small number of lightly used validators, which ensure the security of the computation, and a large number of brokers, which preprocess requests to minimise validator complexity. Unlike validators, brokers have no security requirements: even if all brokers were controlled by a malicious adversary, the system's security would not be compromised. Thanks to this two-tier architecture, my algorithm can ensure security at scale without requiring heavy investment or wide adoption: as few as 4 cheap, well-protected validators are sufficient to start the system, and as requests scale, more and more brokers can be spun up to meet client demand.
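The broker/validator split can be illustrated with a toy model. This sketch is not the Chop Chop algorithm: the HMAC tags are a stand-in for public-key signatures, the client keys are invented, and the quorum rule (2f+1 approvals out of n = 3f+1 validators) is the standard Byzantine fault tolerance threshold, assumed here rather than taken from the text.

```python
import hashlib
import hmac

# Hypothetical client keys, known to validators only (stand-in for public keys).
CLIENT_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}


def sign(client: str, payload: str) -> str:
    """Client side: authenticate a request."""
    return hmac.new(CLIENT_KEYS[client], payload.encode(), hashlib.sha256).hexdigest()


def broker_batch(requests):
    """Untrusted tier: only groups requests into a batch; it holds no keys."""
    return list(requests)


def validator_check(batch) -> bool:
    """Trusted tier: approve a batch only if every request authenticates."""
    for client, payload, tag in batch:
        key = CLIENT_KEYS.get(client)
        if key is None:
            return False
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False
    return True


def commit(batch, n_validators: int = 4) -> bool:
    """Commit the batch if at least 2f+1 of n = 3f+1 validators approve."""
    f = (n_validators - 1) // 3
    votes = sum(validator_check(batch) for _ in range(n_validators))
    return votes >= 2 * f + 1


honest = broker_batch([
    ("alice", "pay bob 5", sign("alice", "pay bob 5")),
    ("bob", "pay alice 2", sign("bob", "pay alice 2")),
])
# A malicious broker rewrites a payload but cannot produce a valid tag for it.
tampered = [("alice", "pay mallory 500", honest[0][2])] + honest[1:]

print(commit(honest), commit(tampered))
```

The point of the sketch is that the broker never needs to be trusted: anything it alters fails validation, so adding more brokers increases capacity without widening the attack surface.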

This security at scale is not just a theory. I conducted experiments in which an implementation of my algorithm, Chop Chop, scaled past 43 million transactions per second with only 64 servers distributed across the globe and communicating over the open Internet. For comparison, Chop Chop is 43x faster than WhatsApp, 86’400x faster than the SWIFT payment network, and 2’800’000x faster than the Ethereum blockchain, all while operating on a budget accessible to a single PhD student. My algorithm is a new frontier in digital trust with the potential to revolutionize the Web. It is easily packageable as a product, a software platform, enabling other companies to build world-scale, sustainable, secure solutions locally and affordably, thus addressing the limitations of a centralized Web. Switzerland's reputation as a world-class provider of trusted and safe solutions, combined with the Swiss digital trust ecosystem, makes Switzerland a perfect place to build a startup to develop and bring to market such a product.
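The comparison factors can be turned back into the baseline throughputs they imply. The short calculation below assumes the 43 million transactions per second figure as the reference point (the text says "past", so the results are lower bounds); the baseline numbers are derived arithmetic, not figures stated by the author.

```python
CHOP_CHOP_TPS = 43_000_000  # reference throughput quoted in the text

# Speed-up factors quoted in the text.
speedups = {"WhatsApp": 43, "SWIFT": 86_400, "Ethereum": 2_800_000}

# Implied baseline throughput = reference throughput / speed-up factor.
implied = {name: CHOP_CHOP_TPS / factor for name, factor in speedups.items()}

for name, tps in implied.items():
    print(f"{name}: about {tps:,.0f} transactions per second")
```

Note that 86’400 is the number of seconds in a day, so the SWIFT factor corresponds to roughly 43 million transactions per day.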

Research team within HES-SO: Percia David Dimitri, Monti Matteo

Project duration: 01.01.2024 - 31.12.2024

Total project budget: 121'200 CHF

Project site URL: https://www.hevs.ch/en/projects/distributed-scalable-and-sustainable-digital-trust-208977

Status: Ongoing

Completed

General equilibrium model of electricity prices: effects of cross-border transmission and of the development of new renewable energies on the European market

Role: Main applicant

Funding: RCSO

Project description:

Our industrialized societies are undeniably dependent on electricity supply, which makes it indispensable to the operational continuity of the overwhelming majority of our goods and services, for both their production and their consumption. Yet electricity supply is itself subject to strong geostrategic, political, operational, logistical, and technological tensions, which bring high volatility to the electricity market and thereby disrupt not only purchasing power but also the long-term viability of the supply and consumption of goods and services. Anticipating such price fluctuations therefore becomes essential, not only to support energy-market players in setting up contingency measures to secure electricity supply in the short and medium term, but also to make their long-term procurement strategies sustainable. While several electricity-market price forecasting models have emerged in industry and research institutes, the vast majority of them study market effects only on limited economic and geographical sectors, and thus fail to capture a market that is intrinsically determined by its multiregional interconnections. Moreover, the incremental impact of electricity transmission capacity, in the form of new cross-border interconnections, and the impact of the development of new renewable energies (NER) are research gaps, which translate into a lack of transparency about the determinants of electricity market prices in Europe.

Research team within HES-SO: Percia David Dimitri, Genoud Stéphane, Principe Biagio

Project duration: 01.09.2023 - 31.08.2024

Total project budget: 99'960 CHF

Project site URL: https://www.hevs.ch/fr/projets/forcasting-electricity-pricing-12524

Status: Completed

Fine-Tuning of Generative Language Models On-Premises: Usefulness/Safety Balance of Patient-Facing LLM-Based Conversational Agents

Role: Main applicant

Funding: Axe Transformation Numérique (HES-SO Valais-Wallis)

Project description:

Generative language models (GLMs) gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational agents). Conversational fine-tuning revealed the extent of their true capabilities in a real-world environment, and eHealth applications are no exception. This has generated both industry excitement about their potential applications in eHealth and concerns about their ability to assist health professionals while properly preserving patients' data privacy. How to keep patients' data secure while using GLMs is a significant concern. Recent research shows that, in the case of GLM usage, even federated learning is not a solution for ensuring privacy protection. In this case, the only plausible solution is to deploy models on-premises, safeguarding data privacy in-house.

In this project, we propose to explore the effectiveness of the following methodology: applying pre-prompt response fine-tuning combined with queries to personalized databases, in order to leverage GLMs for deploying a tailor-made eHealth solution in-house. Fine-tuning GLMs on-premises means adapting a large pre-trained language model to a specific task or domain using a smaller dataset, without sending data to a cloud server. This can be useful for privacy-preserving applications, such as eHealth, that involve sensitive personal information. Here, we want to explore how effective on-premises fine-tuning of different GLMs is for:

1. Fine-tuning language models from human preferences, where human feedback guides the model towards desired behaviors or styles in natural language tasks such as text continuation and summarization.
2. Fine-tuning language models for zero-shot learning, where natural language instructions teach the model to perform various tasks without any labeled data or examples.
3. Fine-tuning language models for domain-specific generation, where the model is adapted to produce more relevant and coherent text for a particular topic or audience.
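The combination of a pre-prompt with a lookup in a personalized, locally held database can be sketched without any model at all. The example below is only an illustration under stated assumptions: the records, the pre-prompt text, and the keyword-overlap retrieval are hypothetical stand-ins, and the fine-tuning itself is not shown. Everything runs locally, which is the property the on-premises setting requires.

```python
import re


def tokenize(text: str) -> set:
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, records: list, k: int = 2) -> list:
    """Return up to k local records sharing at least one word with the question."""
    q = tokenize(question)
    ranked = sorted(records, key=lambda r: len(q & tokenize(r)), reverse=True)
    return [r for r in ranked[:k] if q & tokenize(r)]


def build_prompt(pre_prompt: str, question: str, records: list) -> str:
    """Assemble pre-prompt, retrieved context, and question into one prompt string."""
    context = retrieve(question, records)
    lines = [pre_prompt, "Context:"]
    lines += [f"- {c}" for c in context]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical patient-specific records kept on the local server.
records = [
    "Physiotherapy appointment on Tuesday at 10:00",
    "Allergy noted: penicillin",
    "Parking is available behind the clinic",
]
prompt = build_prompt(
    "You are a clinic assistant. Answer only from the context.",
    "When is my physiotherapy appointment?",
    records,
)
print(prompt)
```

The resulting string would then be passed to a locally hosted model; only the records relevant to the question leave the database, and nothing leaves the premises.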

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei, Vallez Cyril

Project duration: 03.04.2023 - 31.01.2024

Total project budget: 40'000 CHF

Project site URL: https://www.hevs.ch/en/projects/fine-tuning-of-generative-language-models-on-premises-usefulness-safety-balance-of-patient-facing-llm-based-conversational-agents-206556

Status: Completed

Evaluating the Robustness of Large Language Models to Abuse in the Swiss Cyber-Defense Landscape

Role: Main applicant

Funding: Cyber-Defence Campus

Project description:

This project aims to lay the preliminary groundwork to enable the Swiss cyber-defense ecosystem to prepare for the large-scale deployment of LLMs and the new attack surface such a deployment would entail, contributing to the CYD Campus missions of developing the means to counter novel cyber threats and training partners responsible for defense in cyber-space within the Confederation. To achieve this goal, we plan to leverage (i) manual and automated LLM red-teaming, (ii) domain-specific knowledge screening, (iii) generation bias evaluation, and (iv) self-censored generation to evaluate the robustness of the current generation of LLMs to misuse in cyber-offensive operations against targets within Switzerland.

WP1: Identify, Define, and Catalog publicly disclosed LLM jailbreak vulnerabilities: The conversational agent fine-tuning of LLMs promises to solve a long-standing problem in user interfaces: full natural language conversation capabilities. While the usability advantage such applications offer is alluring, they also come with potential vulnerabilities. LLMs are prone to constraint escapes, so-called “jailbreaks.” As such, any user interface based on LLMs that has access to a database of user data or can control a program's execution flow becomes a potential target of attack for a malicious user. While the scale of the problem is not yet clear, this project aims to prepare a unified reporting interface dedicated to known approaches for exploiting LLM interfaces, modeled on the MITRE CVE system and potentially to be integrated with it. To achieve that, we will define what constitutes an LLM-based interface exploit and develop a classification system for such exploits. We will then design a reporting and validation interface based on publicly available LLMs and common private LLMs from CYD research partners (notably OpenAI GPT4) and test it against known jailbreaks reported in public sources, such as Reddit r/ChatGPT. The final database and reporting system could be used by CSIRTs to refine MITRE ATT&CK, for instance under the T1562 Impair Defenses technique.

WP2: Evaluation of LLM-generated software for vulnerabilities with automated tools: Another LLM application that has generated a lot of excitement and extensive demos has been made possible with its text-to-code capabilities, made more accessible by conversational agent fine-tunes and more capable with further base LLM pretraining on code documentation - code pairs. While the accumulating evidence suggests that some SotA LLMs can generate sufficiently functional code to be included in working products, the general resilience of that code to cyber-attacks has not been systematically examined until now. Laying the groundwork for such a systematic evaluation is the goal of this working package. For this, we will evaluate the robustness of the LLM-generated code against standard open-source vulnerability testing and fuzzing tools and how this robustness reacts to the prompt language modification to emulate comments in higher or lower-quality code. The resulting report would be transferred to the CYD campus for further action and training of partners in the cyber-security and cyber-defense spaces. 
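As a rough picture of what automated screening of generated code involves, the sketch below flags a few risky calls in a Python snippet using the standard `ast` module. It is a deliberately small stand-in for the open-source vulnerability testing and fuzzing tools mentioned above, not one of them; the list of risky calls and the sample "generated" snippet are illustrative assumptions.

```python
import ast

# Illustrative, far from exhaustive, list of calls worth flagging.
RISKY_CALLS = {"eval", "exec", "os.system", "pickle.loads", "subprocess.call"}


def call_name(node: ast.Call):
    """Return 'name' or 'module.attr' for simple calls, else None."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def find_risky_calls(source: str):
    """Return sorted (line number, call name) pairs for risky calls in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical snippet standing in for LLM-generated code.
generated = (
    "import os\n"
    "user = input()\n"
    "os.system('ping ' + user)\n"
    "print(eval(user))\n"
)
print(find_risky_calls(generated))
```

Running the same check over code generated from differently worded prompts would give a crude version of the prompt-sensitivity comparison described in this work package.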

WP3: Evaluation of LLMs for information operations in Switzerland: Until recently, information operations in Switzerland have been hampered by two significant peculiarities of the Swiss operational environment. The first is linguistic. Whereas the vast majority of states have a single primary language, often one belonging to a major linguistic group such as English, French, German, Spanish, Portuguese, or Mandarin that is extensively taught outside the country, Switzerland has four national languages, one of which is unique, while the three others show significant variation compared to their counterparts taught abroad, notably the informal Swiss German (Schwiizerdütsch). The second is political organization. Unlike most countries, where a single person presides over the executive branch (president or prime minister), Switzerland has a Federal Council, multiple federal and cantonal popular votes, and local political organizations that are arcane to foreigners. These specificities make information operations in the Swiss cyber-informational space extremely hard for foreign adversaries. The ability of LLMs to learn such specificities during pretraining and to generate such texts based on prompts in other languages seriously threatens this status quo. This work package aims to evaluate the severity of the threat posed by LLMs and to implement a persistent monitoring solution that automatically evaluates new LLMs as they are released. The resulting methodology would be put up for expert scrutiny through a scientific article publication, whereas the actual evaluation prompt bank will be kept private to avoid fine-tuning against it.

Research team within HES-SO: Percia David Dimitri

Project duration: - 29.12.2023

Total project budget: 117'541 CHF

Project site URL: https://www.hevs.ch/en/projects/evaluating-the-robustness-of-large-language-models-to-abuse-in-the-swiss-cyber-defense-landscape-206552

Status: Completed

Cybersecurity-Technology Assessment, Monitoring & Forecasting

Role: Main applicant

Funding: Cyber-Defence Campus

Project description:

Objectives of the project: The project provides quantitative technology assessment, monitoring, and forecasting models for cyber-defence. This effort aims to contribute to the Technology Monitoring (TM) portfolio of the Cyber-Defence (CYD) Campus by (i) fulfilling the first measure of the NCS[1] attributed to armasuisse S+T, (ii) writing technical reports on specific cyber-defense technologies for the CYD Campus clients, and (iii) contributing to the development of the Swiss Technology Observatory[2], all three objectives being attached to the Strategy Cyber DDPS (see figure below). While the project aims to present concrete cybersecurity-technology assessments, monitoring, and forecasting through case studies defined by the needs of the TM portfolio, we want these results to be backed by coherent, relevant, and solid scientific methodologies that will be published in Q1 journals and A conferences. By applying (i) Solomonoff Bayesian algorithmic probability, (ii) the symbolic theory of evolution in the Gillespie-Orr formulation, (iii) quantitative analysis (data science), and (iv) algorithmic economics, our work aims to rethink traditional technology mining methodologies by developing dynamic and holistic approaches that provide concrete cyber-defense insights in terms of technology assessment, monitoring, and forecasting.

WP1: Quantification of adoption processes in cyber-security technologies as evolutionary selection: In this work package, we will formalize the equivalence between the Gillespie-Orr model of evolution and the innovation process, and derive from first principles well-known properties of innovation processes that have not yet been quantified, such as the hype curve or the iterate-and-pivot innovation model, based on attention and adoption metrics such as Wikipedia article views and GitHub repository creation, modification, and starring. We expect our model to allow us to detect and quantify previously ignored phenomena in technological forecasting, such as neutral drift: the widespread adoption, due to random factors, of technologies offering no benefit (e.g., blockchains in a trusted environment). We aim to provide a quantitative evaluation of the time-to-market readiness of technologies relevant to cyber-defense and of the likelihood of them reaching the deployment readiness plateau. To validate our approach, we will retrospectively use contact tracing applications as vectors of target identification attacks, as well as generative machine learning in the context of information operations.
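Neutral drift can be demonstrated with a few lines of simulation. The sketch below uses Wright-Fisher resampling, a standard population-genetics model, as a stand-in; it is not the Gillespie-Orr formulation the project builds on, and the population size and trial count are arbitrary illustrative choices. A technology with no advantage over the alternative still takes over the whole population in a fraction of runs roughly equal to its initial share.

```python
import random


def reaches_fixation(n_adopters, initial_count, rng, max_generations=10_000):
    """Simulate one neutral adoption process; True if the technology takes over."""
    count = initial_count
    for _ in range(max_generations):
        if count == 0:
            return False  # technology lost
        if count == n_adopters:
            return True  # technology adopted by everyone
        share = count / n_adopters
        # Each adopter in the next generation copies a random current adopter:
        # no advantage, no disadvantage, only chance.
        count = sum(rng.random() < share for _ in range(n_adopters))
    return count == n_adopters


def fixation_rate(n_adopters, initial_count, trials, seed=0):
    """Fraction of independent runs in which the neutral technology takes over."""
    rng = random.Random(seed)
    wins = sum(reaches_fixation(n_adopters, initial_count, rng) for _ in range(trials))
    return wins / trials


# 5 initial adopters out of 20: theory predicts takeover in about 5/20 of runs.
print(fixation_rate(n_adopters=20, initial_count=5, trials=2000))
```

A forecasting model that reads every widespread adoption as evidence of benefit would misclassify these runs, which is the kind of phenomenon the work package sets out to quantify.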

WP2: Optimization of the timing of performance metric re-adjustment: In this work package, we further develop the Gillespie-Orr view on technical innovation in cyber-security to predict the best time to revise technology performance metrics. While Goodhart's law is an empirical adage stating that any good measure ceases to be one once it is used to guide decisions, it has fundamental theoretical reasons to exist, owing to a combination of Rice's theorem and properties of evolutionary processes. We aim to formalize it, leveraging the quantitative cyber-security technology evolution framework, and to predict the optimal time to revise performance metrics based on signals from quantitative data collection points. This is critical for cyber-security, given the speed at which new technologies emerge and are deprecated. A concrete case study would compare the effectiveness saturation of software-mediated de-fanging of email attack vectors (e.g., rejecting emails with attachments or with links to outside domains) with that of human training (phishing exercises), based on the speed of their adoption, the attention they receive, and their initial effectiveness. This would allow the CYD Campus to re-evaluate the performance metrics of deployed cyber-defense technologies before bypasses become feasible. This process will be illustrated with a case study on a specific technology identified by the CYD Campus as a critical future technology.

WP3: Development of a computational model to predict the indirect impact of new technologies on cyber-defense: In this work package, we will develop a formal language to describe critical infrastructures across different scales of granularity. Our goal is to quantitatively evaluate, through a semantic knowledge graph, the potential points of vulnerability introduced by seemingly unrelated technologies. A motivating example is the widespread adoption of smart electricity meters and smart devices, which are becoming a vector of attack on the stability of the electric grid. Specifically, without proper security validation, smart meters can be tricked into thinking that the price of electricity is low while the load on the electric grid is at its peak, and into giving a “full electricity usage now” command to a large number of heavy-load intelligent appliances such as electric water boilers or electric car charging stations. In turn, if the adoption of such meters and appliances is sufficiently widespread, the sudden overload can bring down the electric grid by tripping substations, inducing blackouts that can be exploited for military or information operations. Our approach is to computationally evaluate the impact of new technologies on a grid of controlled-vocabulary term relationships in order to detect similar attack points.
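The indirect-impact reasoning in WP3 can be illustrated with a small reachability search over a dependency graph. This is only a sketch under assumptions: the node names, the edges, and the function `indirect_paths` are hypothetical and are not taken from the project, which describes a semantic knowledge graph without specifying its implementation.

```python
from collections import deque

# Hypothetical "affects" relationships, loosely following the smart-meter example.
AFFECTS = {
    "smart meter": ["water boiler", "EV charging station"],
    "water boiler": ["grid load"],
    "EV charging station": ["grid load"],
    "grid load": ["substation"],
    "substation": ["electric grid"],
    "electric grid": [],
}


def indirect_paths(graph, source, target):
    """Return all simple paths from source to target, shortest first (breadth-first)."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in path:  # skip nodes already on this path to avoid cycles
                queue.append(path + [neighbour])
    return paths


paths = indirect_paths(AFFECTS, "smart meter", "electric grid")
for p in paths:
    print(" -> ".join(p))
```

Each returned path is a chain of intermediate technologies through which a seemingly unrelated device could affect the grid. A real model would also need edge semantics and weights, which this sketch leaves out.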

 


[1] https://www.ncsc.admin.ch/ncsc/en/home/strategie/strategie-ncss-2018-2022.html

[2] https://technology-observatory.ch/

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei

Professional partners: Dr. Alain Mermoud, Cyber-Defence Campus

Project duration: 02.01.2023 - 29.12.2023

Total project amount: 144'753 CHF

Project website URL: https://www.hevs.ch/en/projects/quantification-of-adoption-processes-in-cyber-security-technologies-as-evolutionary-selection-208968

Status: Completed

Improving knowledge transfer between professors and students: fine-tuning locally hosted generative language models

Role: Principal applicant

Funding: Guichet permanent sur l'expérimentation digitale (permanent desk for digital experimentation) of the HES-SO digital competence centre

Project description:

This project aims to develop and test a new teaching tool for knowledge transfer assisted by artificial intelligence (AI). The goal is to optimize university students' learning through tailored fine-tuning of open-source generative large language models (LLMs). The novelty of the project lies in the responsible and deliberate use of LLMs that are fine-tuned to the subjects taught and on site, using reinforcement learning from human feedback (RLHF). The teaching tool will allow professors and students to customize LLMs for different courses and subjects, while ensuring data protection (since the models will be hosted on site) and reducing bias in the outputs produced by the LLMs.

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei, Vallez Cyril

Project duration: 01.05.2023 - 31.10.2023

Total project amount: 15'000 CHF

Project website URL: https://www.hes-so.ch/la-hes-so/soutien-a-lenseignement/projets-enseignement/detail-projet/ameliorer-le-transfert-de-connaissances-entre-les-professeur-e-s-et-les-etudiant-e-s-adaptation-des-modeles-de-langage-generatif-llms-en-hebergement-local

Status: Completed

Digital transformation and artificial intelligence: evaluation of solutions applicable to the OIC's processes

Role: Principal applicant

Funding: Innosuisse

Project description:

The purpose of this mandate is to evaluate in what ways and in which areas the adaptation and adoption of conversational agents (CAs) based on generative large language models (LLMs) can help the OIC and its staff be more efficient and effective in their work and missions. In particular, it evaluates possible solutions for implementing these CAs in different areas of the organization's activity, such as human resources management, accounting, marketing, or customer service.

Implementing LLM-based CAs would thus allow the OIC and its staff to achieve the following strategic objectives:

  • Increase the share of working time devoted to their core business and services by reducing the time spent on “parasitic” but necessary tasks (such as administration, emails, forms, etc.).
  • Train staff in the use of LLM-based CAs by providing suitable training and tailored technical support.
  • Improve job satisfaction by freeing up time to be more efficient and creative in daily tasks.

Research team within HES-SO: Percia David Dimitri, Kucharavy Andrei, Seppey Sherine

Professional partners: Robin Zambaz, Organisme Intercantonal de Certification (OIC)

Project duration: 01.03.2023 - 30.04.2023

Total project amount: 15'000 CHF

Status: Completed

2024

TechRank
Scientific article

Anita Mezzetti, Loïc Maréchal, Percia David Dimitri, Thomas Maillart, Alain Mermoud

The Journal of Alternative Investments, 2024, vol. 26, no. 3, pp. 57-83

Link to publication

Abstract:

This article introduces TechRank, a recursive algorithm based on a bipartite graph with weighted nodes that the authors developed to link companies and technologies based on the reflection method. They allow the algorithm to incorporate exogenous variables that reflect an investor’s preferences and calibrate the algorithm in the cybersecurity sector. First, their results help estimate each entity’s influence and explain companies’ and technologies’ ranking. Second, the results provide investors with an optimal quantitative ranking of technologies and thus help them design their optimal portfolio. The authors propose this static method as an alternative to traditional portfolio management and, in the case of private equity investments, as a new way to optimize portfolios of assets for which cash flows are not observable.

Key Findings

  • This article introduces a recursive algorithm based on a bipartite graph linking companies and technologies.

  • The authors’ method overcomes the typical caveats of asset pricing in the context of private equity, where cash flows are not observable.

  • The authors’ method is flexible enough to allow investors to plug their preferences directly into the model.
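The abstract describes TechRank as a recursive algorithm on a bipartite company-technology graph, based on the reflection method. As a rough illustration of that underlying idea only, the sketch below runs the generic method of reflections on a small binary matrix. It is not the authors' implementation: it omits node weights and the exogenous investor preferences, and the matrix `M` and the function name are hypothetical.

```python
def method_of_reflections(M, iterations=2):
    """Generic method of reflections on a binary company x technology matrix M.

    Returns (k_c, k_t): one score per company and one per technology.
    With iterations=0 the scores are the node degrees.
    Assumes every company and every technology has at least one link.
    """
    n_c, n_t = len(M), len(M[0])
    deg_c = [sum(row) for row in M]
    deg_t = [sum(M[c][t] for c in range(n_c)) for t in range(n_t)]
    k_c = [float(d) for d in deg_c]
    k_t = [float(d) for d in deg_t]
    for _ in range(iterations):
        # Each company averages the previous scores of its technologies, and vice versa.
        new_c = [sum(M[c][t] * k_t[t] for t in range(n_t)) / deg_c[c] for c in range(n_c)]
        new_t = [sum(M[c][t] * k_c[c] for c in range(n_c)) / deg_t[t] for t in range(n_t)]
        k_c, k_t = new_c, new_t
    return k_c, k_t


# Hypothetical example: rows are companies, columns are technologies.
M = [
    [1, 1, 1],  # company A is linked to all three technologies
    [1, 1, 0],  # company B
    [1, 0, 0],  # company C
]
k_c, k_t = method_of_reflections(M, iterations=1)
print(k_c, k_t)
```

After one step, each company's score is the average degree of its technologies, so a company linked only to a ubiquitous technology scores differently from a diversified one. Further iterations propagate this information back and forth across the two sides of the graph.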

Measuring technological convergence in encryption technologies with proximity indices
Scientific article
A text mining and bibliometric analysis using OpenAlex

Alessandro Tavazzi, Percia David Dimitri, Jang-Jaccard Julian, Alain Mermoud

arXiv, 2024, arXiv:2403.01601v1

Link to publication

Abstract:

Identifying technological convergence among emerging technologies in cybersecurity is a crucial task for advancing science and fostering innovation. Unlike previous studies that focus on the binary relationship between a paper and the concept it attributes to technology, our approach utilizes attribution scores to enhance the relationships between research papers, combining keywords, citation rates, and collaboration status with specific technological concepts. The proposed method integrates text mining and bibliometric analyses to formulate and predict technological proximity indices for encryption technologies using the OpenAlex catalog. Our case study findings highlight a significant convergence between blockchain and public-key cryptography, evident in the increasing proximity indices. These results offer valuable strategic insights for those contemplating investments in these domains.

LLM-Resilient Bibliometrics
Scientific article
Factual Consistency Through Entity Triplet Extraction

Alexander Sternfeld, Kucharavy Andrei, Percia David Dimitri, Alain Mermoud, Julian Jang-Jaccard

Swiss Technology Observatory, 2024

Link to publication

Abstract:

The increase in power and availability of Large Language Models (LLMs) since late 2022 led to increased concerns with their usage to automate academic paper mills. In turn, this poses a threat to bibliometrics-based technology monitoring and forecasting in rapidly moving fields. We propose to address this issue by leveraging semantic entity triplets. Specifically, we extract factual statements from scientific papers and represent them as (subject, predicate, object) triplets before validating the factual consistency of statements within and between scientific papers. This approach heavily penalizes blind usage of stochastic text generators such as LLMs while not penalizing authors who used LLMs solely to improve the readability of their paper. Here, we present a pipeline to extract such triplets and compare them. While our pipeline is promising and sensitive enough to detect inconsistencies between papers from different domains, the intra-paper entity reference resolution needs to be improved to ensure that triplets are more specific. We believe that our pipeline will be useful to the general research community working on the factual consistency of scientific texts.
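To make the (subject, predicate, object) representation concrete, the sketch below compares triplets that have already been extracted and flags pairs that disagree. It rests on assumptions: the conflict rule (same subject and predicate, different object), the `Triplet` type, and the example statements are hypothetical, and the extraction step of the actual pipeline is not reproduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


def find_conflicts(triplets):
    """Group triplets by (subject, predicate) and keep groups with more than one distinct object."""
    objects = defaultdict(set)
    for t in triplets:
        key = (t.subject.lower(), t.predicate.lower())
        objects[key].add(t.obj.lower())
    return {key: sorted(objs) for key, objs in objects.items() if len(objs) > 1}


# Made-up statements, for illustration only.
paper_a = [
    Triplet("AES-256", "has key length", "256 bits"),
    Triplet("RSA", "is a", "public-key cryptosystem"),
]
paper_b = [Triplet("AES-256", "has key length", "128 bits")]

conflicts = find_conflicts(paper_a + paper_b)
print(conflicts)
```

This naive rule would over-flag predicates that legitimately take several objects, which is one reason specific entity resolution matters, as the abstract notes.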

Measuring the performance of investments in information security startups
Scientific article
An empirical analysis by cybersecurity sectors using Crunchbase data

Loïc Maréchal, Alain Mermoud, Percia David Dimitri, Mathias Humbert

arXiv, 2024, arXiv:2402.04765v2

Link to publication

Abstract:

Early-stage firms play a significant role in driving innovation and creating new products and services, especially for cybersecurity. Therefore, evaluating their performance is crucial for investors and policymakers. This work presents a financial evaluation of early-stage firms' performance in 19 cybersecurity sectors using a private-equity dataset from 2010 to 2022 retrieved from Crunchbase. We observe firms, their primary and secondary activities, funding rounds, and pre and post-money valuations. We compare cybersecurity sectors regarding the amount raised over funding rounds and post-money valuations while inferring missing observations. We observe significant investor interest variations across categories, periods, and locations. In particular, we find the average capital raised (valuations) to range from USD 7.24 mln (USD 32.39 mln) for spam filtering to USD 45.46 mln (USD 447.22 mln) for the private cloud sector. Next, we assume a log process for returns computed from post-money valuations and estimate the expected returns, systematic and specific risks, and risk-adjusted returns of investments in early-stage firms belonging to cybersecurity sectors. Again, we observe substantial performance variations with annualized expected returns ranging from 9.72% for privacy to 177.27% for the blockchain sector. Finally, we show that overall, the cybersecurity industry performance is on par with previous results found in private equity. Our results shed light on the performance of cybersecurity investments and, thus, on investors' expectations about cybersecurity.
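The abstract mentions returns computed from post-money valuations under a log process. A minimal sketch of the basic building block, the annualized log return between two funding rounds, might look as follows. The function name, the 365.25-day year convention, and the valuations are assumptions for illustration; the estimation of expected returns and risks in the paper is not reproduced.

```python
import math
from datetime import date


def annualized_log_return(valuation_start, valuation_end, start, end):
    """Continuously compounded return per year between two post-money valuations."""
    years = (end - start).days / 365.25  # assumed day-count convention
    if years <= 0:
        raise ValueError("end date must be after start date")
    return math.log(valuation_end / valuation_start) / years


# Made-up example: a valuation doubling from USD 10 mln to USD 20 mln over two years.
r = annualized_log_return(10.0, 20.0, date(2020, 1, 1), date(2022, 1, 1))
print(round(r, 4))
```

Log returns are additive over time, which makes them convenient when the intervals between funding rounds are irregular.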

Monitoring Emerging Trends in LLM Research
Book chapter

Maxime Würsch, Percia David Dimitri, Alain Mermoud

Large Language Models in Cybersecurity. 2024, Cham: Springer Cham

Link to publication

Abstract:

Established methodologies for monitoring and forecasting trends in technological development fall short of capturing advancements in Large Language Models (LLMs). This chapter suggests a complementary and alternative approach to mitigate this concern. Traditional indicators, such as search volumes and citation frequencies, are shown to reflect the rapid evolution of LLM-related technologies inadequately because of biases, semantic drifts, and inherent lags in data documentation. Our methodology analyzes the proximity of technological terms related to LLMs, leveraging the OpenAlex and arXiv databases, and focuses on extracting nouns from scientific papers to provide a nuanced portrayal of advancements in LLM technologies. The approach aims to counteract the inherent lags in data, accommodate semantic drift, and clearly differentiate between topics, offering both retrospective and prospective insights. The insights derived underline the need for refined, robust, adaptable, and precise forecasting models as LLMs intersect with domains like cyber defense, while accounting for the limitations of singular ontologies and integrating advanced anticipatory measures for a nuanced understanding of evolving LLM technologies.

2023

Efficient collective action for tackling time-critical cybersecurity threats
Scientific article ArODES

Sébastien Gillard, Dimitri Percia David, Alain Mermoud, Thomas Maillart

Journal of Cybersecurity, 2023, vol. 9, no. 1, pp. 1-13

Link to publication

Abstract:

The latency reduction between the discovery of vulnerabilities, the build-up, and the dissemination of cyberattacks has put significant pressure on cybersecurity professionals. For that, security researchers have increasingly resorted to collective action in order to reduce the time needed to characterize and tame outstanding threats. Here, we investigate how joining and contribution dynamics on Malware Information Sharing Platform (MISP), an open-source threat intelligence sharing platform, influence the time needed to collectively complete threat descriptions. We find that performance, defined as the capacity to characterize quickly a threat event, is influenced by (i) its own complexity (negatively), by (ii) collective action (positively), and by (iii) learning, information integration, and modularity (positively). Our results inform on how collective action can be organized at scale and in a modular way to overcome a large number of time-critical tasks, such as cybersecurity threats.

Measuring security development in information technologies
Scientific article
A scientometric framework using arXiv e-prints

Percia David Dimitri, Loïc Maréchal, William Lacube, Sébastien Gillard, Michael Tsesmelis, Thomas Maillart, Alain Mermoud

Technological Forecasting and Social Change, 2023, vol. 188, no. 122316, pp. 1-18

Link to publication

Abstract:

We study security-development patterns in computer-science technologies through (i) the security attention among technologies, (ii) the relation between technological change and security development, and (iii) the effect of opinion on security development. We perform a scientometric analysis on arXiv e-prints (n=340,569) related to 20 computer-science technology categories. Our contribution is threefold. First, we characterize both processes of technological change and security development: while most technologies follow a logistic-growth process, the security development follows an AR(1) process or a random walk with positive drift. Moreover, over the lifetime of computer-science technologies, the security development surges at a late stage. Second, we document no relation between the technological change and the security development. Third, we identify an inverse relation between security attention and experts’ opinion. Along with these results, we introduce new methods for modeling security-development patterns for broader sets of technologies.

Highlights:

  • Quantitative framework for assessing security-development patterns in technologies.
  • Identification of 3 patterns among 20 computer-science technology categories: 1. Security gains more attention at a later stage of technology development; 2. Technology and security developments are not correlated; 3. Opinion on technology is associated with security development.
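The two process types named in the abstract, logistic growth for technological change and an AR(1) process or random walk with positive drift for security development, can be sketched as follows. The parameter values and function names are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math
import random


def logistic(t, K, r, t0):
    """Logistic growth curve with saturation level K, growth rate r, and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def simulate_ar1(n, phi, drift, sigma, x0=0.0, seed=0):
    """Simulate x_t = drift + phi * x_{t-1} + Gaussian noise.

    phi = 1 gives a random walk with drift; |phi| < 1 gives a stationary AR(1).
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        xs.append(drift + phi * xs[-1] + rng.gauss(0.0, sigma))
    return xs


# Made-up parameters: publication counts saturating at 100, security attention drifting upward.
technology_curve = [logistic(t, K=100, r=0.8, t0=5) for t in range(11)]
security_walk = simulate_ar1(n=10, phi=1.0, drift=0.5, sigma=1.0)
print(technology_curve[5], len(security_walk))
```

The logistic curve flattens as it approaches its saturation level, whereas a random walk with positive drift keeps trending upward on average. This is consistent with security attention surging late in a technology's lifetime.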

Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense
Report

Kucharavy Andrei, Zachary Schillaci, Loïc Maréchal, Maxime Würsch, Ljiljana Dolamic, Remi Sabonnadiere, Percia David Dimitri, Alain Mermoud, Vincent Lenders

2023, Cornell: arXiv, 50 p.

Link to publication

Abstract:

Generative Language Models gained significant attention in late 2022 / early 2023, notably with the introduction of models refined to act consistently with users' expectations of interactions with AI (conversational models). Arguably the focal point of public attention has been such a refinement of the GPT3 model -- the ChatGPT and its subsequent integration with auxiliary capabilities, including search as part of Microsoft Bing. Despite extensive prior research invested in their development, their performance and applicability to a range of daily tasks remained unclear and niche. However, their wider utilization without a requirement for technical expertise, made in large part possible through conversational fine-tuning, revealed the extent of their true capabilities in a real-world environment. This has garnered both public excitement for their potential applications and concerns about their capabilities and potential malicious uses. This review aims to provide a brief overview of the history, state of the art, and implications of Generative Language Models in terms of their principles, abilities, limitations, and future prospects -- especially in the context of cyber-defense, with a focus on the Swiss operational environment.

Scientometric and Wikipedia Pageview Analysis
Book chapter

Alexander Glavackij, Sarah Ismail, Percia David Dimitri

Trends in Data Protection and Encryption Technologies. 2023, Cham: Springer Nature Switzerland

Link to publication

Abstract:

This chapter explores trends in data protection and encryption technologies across different technologies. The technologies analyzed are taken from the previous chapters. Any trend assessment concerning data protection and encryption technologies constitutes a challenging task for various reasons. The swift development of the security technologies brings a myriad of novel protocols, tools, and procedures, whose technological readiness levels (TRL) also evolve rapidly. Also, while some technologies thrive, others stagnate or vanish in favour of more market-adapted technologies or enhanced operational implementation. Moreover, in such a fast-paced and growing environment, opportunities and threats evolve quickly, making it difficult to evaluate the whole spectrum of technologies available on the market. Consequently, evaluations of the security consequences of the arrival and evolution of such technologies on data protection are complex. Following the previous individual analysis of the data protection and encryption technologies, we evaluate these technologies through time by benchmarking a development indicator: the attention paid by different communities.

Efficient collective action for tackling time-critical cybersecurity threats
Scientific article

Percia David Dimitri, Gillard Sébastien, Alain Mermoud, Thomas Maillart

Journal of Cybersecurity, 2023, vol. 9, no. 1, pp. 21-34

Link to publication

Forecasting labor needs for digitalization
Scientific article
A bi-partite graph machine learning approach

Percia David Dimitri, Santiago Anton Moreno, Loïc Maréchal, Thomas Maillart, Alain Mermoud

World Patent Information, 2023, vol. 73, no. 102193, pp. 102-193

Link to publication

Abstract:

We use a unique database of digital and cybersecurity hires from Swiss organizations and develop a method based on a temporal bi-partite network, which combines local and global indices through a Support Vector Machine. We predict the appearance and disappearance of job openings at horizons of one to six months. We show that global indices yield the highest predictive power, although the local network does contribute to long-term forecasts. At the one-month horizon, the “area under the curve” and the “average precision” are 0.984 and 0.905, respectively. At the six-month horizon, they reach 0.864 and 0.543, respectively. Our study highlights the link between the skilled workforce and the digital revolution and the policy implications regarding intellectual property and technology forecasting.
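For readers unfamiliar with the two metrics reported above, the following sketch computes the area under the ROC curve and the average precision from scratch for a binary link-prediction task. The labels and scores are made up, and the functions are a plain illustration of the standard definitions, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive example")
    return sum(precisions) / len(precisions)


# Made-up example: 1 = a job opening appears, 0 = it does not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(auc, ap)
```

Average precision is sensitive to class imbalance in a way the area under the curve is not, which may explain why the two metrics can diverge at longer horizons.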

Identification of Future Cyberdefense Technology by Text Mining
Book chapter

Percia David Dimitri, William Blonay, Sébastien Gillard, Thomas Maillart, Alain Mermoud, Loïc Maréchal, Michael Tsesmelis

Cyberdefense. 2023, Cham: Springer International Publishing

Link to publication

Abstract:

We propose a reproducible, automated, scalable, and free method for automated bibliometric analysis that requires little computing power. We explain how firms can use this method with open source data from public repositories to generate unbiased insights about future technology developments, and to assess the maturity, security and likely future development of particular technology domains. The method is demonstrated by systematic text mining of more than 400,000 e-prints from the arXiv repository.

A Novel Algorithm for Informed Investment in Cybersecurity Companies and Technologies
Book chapter

Anita Mezzetti, Loïc Maréchal, Percia David Dimitri, William Blonay, Sébastien Gillard, Michael Tsesmelis, Thomas Maillart, Alain Mermoud

Cyberdefense. 2023, Cham: Springer International Publishing

Link to publication

Abstract:

We introduce a novel recursive algorithm that analyzes and ranks the relative influence that companies and technologies have in a technology landscape. The algorithm also incorporates exogenous variables that reflect investor preferences. The results provide investors with an optimal ranking of technologies and thus help them to make more informed decisions about companies and technologies, in particular vis-à-vis traditional portfolio theory and in a private equity setting where cash flows are not directly observable.

Identifying Emerging Technologies and Influential Companies Using Network Dynamics of Patent Clusters
Book chapter

Michael Tsesmelis, Ljiljana Dolamic, Marcus M Keupp, Percia David Dimitri, Alain Mermoud

Cyberdefense. 2023, Cham: Springer International Publishing

Link to publication

Abstract:

The need for dependable and real-time insights on technological paradigm shifts requires objective information. We develop a lean recommender system which predicts emerging technology by a sequential blend of machine learning and network analytics. We illustrate the capabilities of this system with patent data and discuss how it can help organizations make informed decisions.

From Scattered Sources to Comprehensive Technology Landscape
Scientific article
A Recommendation-based Retrieval Approach

Chi Thang Duong, Percia David Dimitri, Ljiljana Dolamic, Alain Mermoud, Vincent Lenders, Karl Aberer

World Patent Information, 2023, vol. 73, no. 102198

Link to publication

Abstract:

Mapping the technology landscape is crucial for market actors to take informed investment decisions. However, given the large amount of data on the Web and its subsequent information overload, manually retrieving information is a seemingly ineffective and incomplete approach. In this work, we propose an end-to-end recommendation based retrieval approach to support automatic retrieval of technologies and their associated companies from raw Web data. This is a two-task setup involving (i) technology classification of entities extracted from company corpus, and (ii) technology and company retrieval based on classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT which is a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique to simultaneously support retrieving related companies, technologies related to a specific company and companies relevant to a technology. To evaluate these tasks, we also construct a data set that includes company documents and entities extracted from these documents together with company categories and technology labels. Experiments show that our approach is able to return 4 times more relevant companies while outperforming traditional retrieval baseline in retrieving technologies.

LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature
Scientific article

Maxime Würsch, Kucharavy Andrei, Percia David Dimitri, Alain Mermoud

arXiv, 2023, arXiv:2312.07110v1

Link to publication

Abstract:

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometrics approaches show their limits in such a fast-evolving domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but our results show some potential for noun extractors. For this reason, we developed a noun extractor boosted with some statistical analysis to extract specific and relevant compound nouns from the domain. Later, we tested our model to identify trends in the LLM domain. We observe some limitations, but it offers promising results to monitor the evolution of emergent trends.

2022

Cybersecurity Technologies
Report
An Overview of Trends & Activities in Switzerland and Abroad

Michael Tsesmelis, Percia David Dimitri, Thomas Maillart, Ljiljana Dolamic, Giorgio Tresoldi, William Lacube, Colin Barschel, Quentin Ladetto, Claudia Schärer, Vincent Lenders, Kilian Cuche, Alain Mermoud

2022, Thun: Cyber-Defence Campus, armasuisse S+T, 53 p.

Link to publication

Abstract:

Several severe cyberincidents are pushing the Swiss federal government to take extensive measures against cyberthreats. The National Strategy for the protection of Switzerland against Cyber risks 2018-2022 (NCS) details the various measures that the government is implementing to adapt Switzerland to the emerging cybersecurity challenges. The NCS delegates to the Cyber-Defence Campus (CYD Campus) of armasuisse S+T the role of monitoring cybersecurity trends. For this purpose, the CYD Campus and its stakeholders from academia and industry have developed qualitative and quantitative technology-intelligence methods for early identification and anticipation of technology development. This report focuses on four technologies of importance for cybersecurity: 5G, Big Data & Machine Learning, Blockchain and Contact-Tracing methods. Interest in these emerging technologies is quickly gathering pace, as is visible thanks to data analytics methods using data on job openings, patents and publications. Relevant startups and companies in Switzerland and abroad have also been identified through data-driven methods and with the help of scouting efforts in startup centers around the world. Overall, this report describes the efforts to identify, analyse and forecast trends related to cybersecurity technologies. Therefore, the insights therein allow for more informed decision-making in technology investment, technology assessment, as well as technology roadmapping.

2021

5G System Security Analysis
Report

Gerrit Holtrup, William Lacube, Percia David Dimitri, Alain Mermoud, Gérôme Bovet, Vincent Lenders

2021, Thun: Cyber-Defence Campus, armasuisse S+T, 47 p.

Link to publication

Abstract:

Fifth generation mobile networks (5G) are currently being deployed by mobile operators around the globe. 5G acts as an enabler for various use cases and also improves the security and privacy over 4G and previous network generations. However, as recent security research has revealed, the standard still has security weaknesses that may be exploitable by attackers. In addition, the migration from 4G to 5G systems is taking place by first deploying 5G solutions in a non-standalone (NSA) manner where the first step of the 5G deployment is restricted to the new radio aspects of 5G, while the control of the user equipment is still based on 4G protocols, i.e. the core network is still the legacy 4G evolved packet core (EPC) network. As a result, many security vulnerabilities of 4G networks are still present in current 5G deployments. This paper presents a systematic risk analysis of standalone and non-standalone 5G networks. We first describe an overview of the 5G system specification and the new security features of 5G compared to 4G. Then, we define possible threats according to the STRIDE threat classification model and derive a risk matrix based on the likelihood and impact of 12 threat scenarios that affect the radio access and the network core. Finally, we discuss possible mitigations and security controls. Our analysis is generic and does not account for the specifics of particular 5G network vendors or operators. Further work is required to understand the security vulnerabilities and risks of specific 5G implementations and deployments.
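The step from STRIDE-classified threats to a risk matrix based on likelihood and impact can be illustrated with a small scoring sketch. Everything specific here is an assumption: the 1-3 rating scale, the product rule, the thresholds, and the three scenarios are hypothetical and are not taken from the report, which analyzes 12 threat scenarios with its own ratings.

```python
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


def risk_level(likelihood, impact):
    """Map 1-3 likelihood and impact ratings to a qualitative risk level via their product."""
    if not (1 <= likelihood <= 3 and 1 <= impact <= 3):
        raise ValueError("ratings must be between 1 and 3")
    score = likelihood * impact
    if score >= 6:  # assumed threshold
        return "high"
    if score >= 3:  # assumed threshold
        return "medium"
    return "low"


# Made-up threat scenarios: (name, STRIDE category, likelihood, impact).
scenarios = [
    ("Fake base station", "Spoofing", 2, 3),
    ("Signalling flood", "Denial of service", 3, 1),
    ("Log deletion", "Repudiation", 1, 2),
]
matrix = {name: risk_level(l, i) for name, category, l, i in scenarios if category in STRIDE}
print(matrix)
```

Multiplying likelihood by impact is one common convention for risk matrices. A lookup table per cell is an equally valid design when the organization wants asymmetric treatment of high-impact, low-likelihood threats.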

Critical Information Infrastructures Security
Buch

Percia David Dimitri, Alain Mermoud, Thomas Maillart

2021, Lausanne, Switzerland: Springer LNCS, 221 p.

Link zur Publikation

Zusammenfassung:

Conference post-proceedings of the 16th International Conference, CRITIS 2021.

2020

Knowledge Absorption for Cyber-Security
Wissenschaftlicher Artikel
The Role of Human Beliefs

Percia David Dimitri, Alain Mermoud, Marcus M Keupp

Computers in Human Behavior, 2020, vol. 106, no. 106255, pp. 1-11

Link zur Publikation

Zusammenfassung:

We investigate how human beliefs are associated with the absorption of specialist knowledge that is required to produce cyber-security. We ground our theorizing in the knowledge-based view of the firm and transaction-cost economics. We test our hypotheses with a sample of 262 members of an information-sharing and analysis center who share sensitive information related to cyber-security. Our findings suggest that resource belief, usefulness belief, and reciprocity belief are all positively associated with knowledge absorption, whereas reward belief is not. The implications of these findings for practitioners and future research are discussed.

2019

To share or not to share
Wissenschaftlicher Artikel
A behavioral perspective on human participation in security information sharing

Alain Mermoud, Marcus M Keupp, Kévin Huguenin, Maximilian Palmié, Percia David Dimitri

Journal of Cybersecurity, 2019, vol. 5, no. 1, pp. 1-13

Link zur Publikation

Zusammenfassung:

Security information sharing (SIS) is an activity whereby individuals exchange information that is relevant to analyze or prevent cybersecurity incidents. However, despite technological advances and increased regulatory pressure, individuals still seem reluctant to share security information. Few contributions have addressed this conundrum to date. Adopting an interdisciplinary approach, our study proposes a behavioral framework that theorizes how and why human behavior and SIS may be associated. We use psychometric methods to test these associations, analyzing a unique sample of human Information Sharing and Analysis Center members who share real security information. We also provide a dual empirical operationalization of SIS by introducing the measures of SIS frequency and intensity. We find significant associations between human behavior and SIS. Thus, the study contributes to clarifying why SIS, while beneficial, is underutilized by pointing to the pivotal role of human behavior for economic outcomes. It therefore extends the growing field of the economics of information security. By the same token, it informs managers and regulators about the significance of human behavior as they propagate goal alignment and shape institutions. Finally, the study defines a broad agenda for future research on SIS.

The Persistent Deficit of Militia Officers in the Swiss Armed Forces
Wissenschaftlicher Artikel
An Opportunity-Cost Explanation

Percia David Dimitri, Marcus M Keupp, Ricardo Marino, Patrick Hofstetter

Defence and Peace Economics, 2019, vol. 30, no. 1, pp. 111-127

Link zur Publikation

Zusammenfassung:

The Swiss Armed Forces are suffering from a structural deficit of militia officers despite good pay and a generally supportive attitude in the population. Whereas prior studies have focused on motivation to explain understaffing in armed forces, we offer an alternative approach based on opportunity cost. We model decision alternatives both within and outside a military organization, taking private sector employment as the reference point. We then monetize opportunity costs of leisure, fringe benefits, and private sector income not compensated. Our results suggest that in terms of opportunity cost, service as a militia officer is the least attractive option, an effect that we believe explains the persistent staff deficit. Implications of these findings for the literature and recruitment policy are discussed.

2024

GenAI: The Dark Side
Konferenz
Overview of Security & Privacy Challenges

Percia David Dimitri

EPFL Wired-Brains, 06.06.2024 - 06.06.2024, EPFL Innovation Park

Zusammenfassung:

The speaker explores the evolving landscape of threats and mitigations associated with the rise of generative machine-learning technologies. He highlights how generative machine learning can be exploited for malicious purposes, such as generating deep fakes, automated misinformation, and cyberattacks. The speaker outlines the need for robust security measures and proactive mitigation strategies, including enhanced AI model security, regulatory frameworks, and cross-sector collaboration. The discussion emphasizes balancing innovation with ethical considerations to ensure the safe deployment of generative machine learning in various sectors.

LLM-Resilient Bibliometrics
Konferenz
Factual Consistency Through Entity Triplet Extraction

Alexander Sternfeld, Kucharavy Andrei, Percia David Dimitri, Julian Jang-Jaccard, Alain Mermoud

EEKE-AII 2024, 23.04.2024 - 24.04.2024, Changchun, China

Link zur Konferenz

Zusammenfassung:

The increase in power and availability of Large Language Models (LLMs) since late 2022 led to increased concerns with their usage to automate academic paper mills. In turn, this poses a threat to bibliometrics-based technology monitoring and forecasting in rapidly moving fields. We propose to address this issue by leveraging semantic entity triplets. Specifically, we extract factual statements from scientific papers and represent them as (subject, predicate, object) triplets before validating the factual consistency of statements within and between scientific papers. This approach heavily penalizes blind usage of stochastic text generators such as LLMs while not penalizing authors who used LLMs solely to improve the readability of their paper. Here, we present a pipeline to extract such triplets and compare them. While our pipeline is promising and sensitive enough to detect inconsistencies between papers from different domains, the intra-paper entity reference resolution needs to be improved to ensure that triplets are more specific. We believe that our pipeline will be useful to the general research community working on the factual consistency of scientific texts.
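To make the (subject, predicate, object) representation concrete, here is a minimal sketch of one possible consistency check: two triplets are flagged when they share a subject and predicate but disagree on the object. This rule, the `Triplet` type, and the sample statements are assumptions for illustration; the abstract does not specify how the authors' pipeline extracts or compares triplets.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


def find_conflicts(triplets):
    """Return {(subject, predicate): objects} for keys asserted with more than one object.

    Matching is by case-insensitive string equality, a deliberate
    simplification: a real pipeline would need entity resolution first.
    """
    objects = defaultdict(set)
    for t in triplets:
        key = (t.subject.lower(), t.predicate.lower())
        objects[key].add(t.obj.lower())
    return {key: objs for key, objs in objects.items() if len(objs) > 1}


# Hypothetical statements, as if extracted from two papers.
paper_a = [Triplet("AES", "is a", "block cipher"), Triplet("SHA-256", "outputs", "256 bits")]
paper_b = [Triplet("AES", "is a", "stream cipher"), Triplet("SHA-256", "outputs", "256 bits")]

conflicts = find_conflicts(paper_a + paper_b)
for (subject, predicate), objs in conflicts.items():
    print(f"Inconsistent: {subject} {predicate} -> {sorted(objs)}")
```

Exact string matching makes the check strict; as the abstract notes, vague or unresolved entity references would weaken any such comparison.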

Expertise à l’ère des IA génératives
Konferenz
Le fond dans la forme

Kucharavy Andrei, Percia David Dimitri

3ème Symposium des Entreprise responsable de Valais Excellence (SERVE 2024), 17.04.2024 - 17.04.2024, Martigny, Valais, Switzerland

Link zur Konferenz

LLM-resilient bibliometrics :
Konferenz ArODES
factual consistency through entity triplet extraction

Alexander Sternfeld, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud, Julian Jang-Jaccard

Proceedings of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2024)

Link zur Konferenz

Zusammenfassung:

The increase in power and availability of Large Language Models (LLMs) since late 2022 led to increased concerns with their usage to automate academic paper mills. In turn, this poses a threat to bibliometrics-based technology monitoring and forecasting in rapidly moving fields. We propose to address this issue by leveraging semantic entity triplets. Specifically, we extract factual statements from scientific papers and represent them as (subject, predicate, object) triplets before validating the factual consistency of statements within and between scientific papers. This approach heavily penalizes blind usage of stochastic text generators such as LLMs while not penalizing authors who used LLMs solely to improve the readability of their paper. Here, we present a pipeline to extract such triplets and compare them. While our pipeline is promising and sensitive enough to detect inconsistencies between papers from different domains, the intra-paper entity reference resolution needs to be improved to ensure that triplets are more specific. We believe that our pipeline will be useful to the general research community working on the factual consistency of scientific texts.

On-Premises LLMs
Konferenz
The Safe Way

Mehdi Zayenne, Vallez Cyril, Kucharavy Andrei, Zachary Schillaci, Percia David Dimitri

Applied Machine Learning Days (AMLD 2024), 23.03.2024 - 26.03.2024, EPFL, Lausanne, Switzerland

Link zur Konferenz

Zusammenfassung:

Open-access Large Language Models (LLMs) are on the rise, thanks in large part to the public release of state-of-the-art models from organizations such as BigScience (BLOOM) and the Technology Innovation Institute (Falcon) as well as companies such as Meta (LLaMA, LLaMA 2) and Mistral (Mistral 7B). Driven by a dedicated open-source community, there has been a proliferation of fine-tuned models - often derived from foundation models such as Meta AI's LLaMA - tailored for diverse use cases and rapid progress in developing and deploying applications with these models. Meanwhile, techniques from the research community, such as quantization and parameter efficient fine-tuning (or PEFT), have lowered the barrier to entry for running and fine-tuning LLMs - now possible even on widely available consumer hardware. The performance gaps between open-access models and top-tier proprietary models, notably OpenAI's GPT-4, are quickly closing, with specialized open-access models excelling on various benchmarks while using a fraction of the parameter count.
Developing and deploying local LLMs provides users complete control over the model's training - or at least fine-tuning - and usage, a scenario with benefits and drawbacks regarding cybersecurity. Entities in sensitive sectors such as public health, private banking, and defense can judiciously employ these local models without sending classified data to the model vendors. At the same time, some can misuse this autonomy for malicious intentions. Moreover, the legality and safety of some open-access models remain unclear, primarily when they are traced back to foundation models where the training data is not fully disclosed.
The workshop aims to educate participants about the evolving domain of open-access LLMs, offering guidance on model selection, application development, and deployment while highlighting such efforts' security ramifications and potential hazards.
Prerequisites:
- Basic understanding and familiarity with LLMs;
- Familiarity with general security practices;
- Programming/developer experience to follow exercises;
- Docker, command line, Python, and web interfaces;
- Laptop to follow exercises (preferably MacOS/Linux or Windows with WSL).

Evaluation of LLM-Generated Software for Vulnerabilities with Automated Tools
Konferenz

Vallez Cyril, Kucharavy Andrei, Percia David Dimitri, Ljiljana Dolamic

AI Days 2024, 06.02.2024 - 07.02.2024, Sion, Valais, Switzerland

Link zur Konferenz

Zusammenfassung:

Text-to-code generation is another LLM application that has generated a lot of excitement and extensive demos. It is made more accessible by conversational-agent fine-tunes and more capable by further pretraining of base LLMs on pairs of code documentation and code. While the accumulating evidence suggests that some state-of-the-art (SotA) LLMs can generate sufficiently functional code to be included in working products, the general resilience of that code to cyberattacks has not yet been systematically examined. This applied research lays the groundwork for such a systematic evaluation. For this, we evaluate the robustness of LLM-generated code against standard open-source vulnerability testing and fuzzing tools, and how this robustness reacts when the prompt language is modified to emulate comments in higher- or lower-quality code. The resulting report has been transferred to the Cyber-Defence (CYD) Campus of the Swiss Department of Defense (DDPS) for further action and training of partners in the cyber-security and cyber-defense spaces.

2023

Evaluating Generative-AI Usage for Knowledge Transfer
Konferenz
Fine-Tuning of LLMs On-Premises

Kucharavy Andrei, Percia David Dimitri

RESER 2023, 07.12.2023 - 09.12.2023, Sierre, Valais, Switzerland

Link zur Konferenz

Zusammenfassung:

Amidst the transformative impact of Large Language Model-based Conversational Agents (LLM-CAs) on education, this study explores an innovative approach to enhance student learning and accelerate the transition to domain expertise. By designing LLM-CAs with intentional knowledge gaps on specific topics, we aim to familiarize students with the model's limitations, encourage deeper engagement with learned material, and prepare them for knowledge transfer in professional settings. Using established LLM-CAs like LLaMA and GPT-neo-X, we introduce these gaps through various methods, including Prompt Optimization, Actor Agents, and Parameter-Efficient Fine-Tuning (PEFT). Our methodology involves deploying these modified LLM-CAs in beginner-level algorithmics and functional analysis classes, followed by a comprehensive Bayesian analysis to determine the most beneficial approach.

Evaluating generative-AI usage for knowledge transfer :
Konferenz ArODES
fine-tuning of LLMs on-premises

Andrei Kucharavy, Dimitri Percia David

Digital Transformation and the Service Economy: Exploring the Societal Impact

Link zur Konferenz

Coaching nutritionnel avec des agents conversationnels basés sur des modèles de langage génératif
Konferenz
Retours issus de la recherche appliquée

Vallez Cyril, Kucharavy Andrei, Percia David Dimitri

Digital Health Connect (DHC 2023), 17.11.2023 - 17.11.2023, Martigny, Valais, Switzerland

Link zur Konferenz

Zusammenfassung:

NutriBot is a conversational agent designed and developed based on the architecture of two open-source Large Language Models (LLMs) and multimodal solutions for assisting dieticians in providing accurate, privacy-preserving nutritional advice. The project addresses challenges such as data privacy, ensuring reliable outputs, and integrating AI within ethical guidelines. NutriBot aims to improve performance and accuracy in delivering nutritional advice, reduce data leakage risks, and enable customization and control over the model's output. This innovative approach to digital health represents a significant contribution to the field, combining advanced AI technology with practical, user-focused applications. A live demonstration will complement the presentation of NutriBot's development.

Noun Extraction Still Resists LLM-based Extractors for Cybersecurity
Konferenz

Percia David Dimitri

The 13th Annual Global Tech Mining Conference (GTM 2023), 10.11.2023 - 11.11.2023, Atlanta, United States

Link zur Konferenz

Zusammenfassung:

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context.

Cybersecurity Technologies Emergence, Adoption, and Diffusion from Evolutionary Perspective
Konferenz

Kucharavy Andrei, Percia David Dimitri, Alain Mermoud

Technological Forecasting & Social Change Conference (TFSC 2023), 28.10.2023 - 29.10.2023, Hsinchu City, Taiwan

Link zur Konferenz

Zusammenfassung:

The rapid pace of cybersecurity-related technology emergence, diffusion, adoption, and obsolescence means that numerous tools developed for forecasting technological change are hardly applicable. Moreover, the ability of organizations to adapt is slower than the speed of technological development. Yet, with the change in the geopolitical environment towards a multipolar competition, where offensive operations in cyber-space are commonplace, forecasting cyber technologies is becoming more and more vital. To address this situation, we formalize the innovation in cybersecurity technologies as an evolutionary process in the Gillespie-Orr Fitness Landscapes evolution model with attention - the Attentioned Fitness Landscapes Model (AFLM). Applying AFLM to GitHub open source software (OSS) repositories as proxies for technological systems, we show that features of innovation processes critical to forecasting -- such as S-curves and Hype Cycles -- arise naturally. This allows for a direct forecast in a rapidly moving technological field with minimal assumptions. We demonstrate an exceptional fit of the model with observed trends in software repositories and use AFLM to evaluate domestic cybersecurity industry development scenarios.

Cybersecurity technologies emergence, adoption, and diffusion from evolutionary perspective
Konferenz ArODES

Andrei Kucharavy, Dimitri Percia David, Alain Mermoud

Proceedings of TFSC 2023

Link zur Konferenz

Zusammenfassung:

The rapid pace of cybersecurity-related technology emergence, diffusion, adoption, and obsolescence means that numerous tools developed for forecasting technological change are all but inapplicable. With the change in the geopolitical environment towards a multipolar competition, where offensive operations in the cyber-space are commonplace, forecasting cyber technologies is becoming vital to modern states. In this work, we formalize the innovation in cybersecurity technologies as an evolutionary process in the Gillespie-Orr Fitness Landscapes evolution model with attention - the Attentioned Fitness Landscapes Model (AFLM). We show that empirically derived features of innovation processes, such as S-curves and Hype Curves, naturally arise from this formalization and can be used to evaluate scenarios of geopolitical change. This opens avenues for their derivation from first principles, potentially allowing for earlier forecasting.

Capturing Trends Using OpenAlex and Wikipedia Page Views as Science Indicators
Konferenz
The Case of Data Protection and Encryption Technologies

Sarah Ismail, Alain Mermoud, Loïc Maréchal, Samuel Orso, Percia David Dimitri

27th International Conference on Science, Technology and Innovation Indicators (STI 2023), 27.09.2023 - 29.09.2023, Leiden, Netherlands

Link zur Konferenz

Zusammenfassung:

This paper presents a novel science indicator to identify, analyze, and capture technology trends based on Wikipedia page views and OpenAlex presented at STI2022. Our webometric methodology is grounded in open science practices and applied to crowd-sourced, open, and free data. We explore the relationships between 36 data protection and encryption technologies, by measuring and classifying their time-varying attention. These highly research-intensive technologies are particularly suitable to illustrate our approach. We first find that Blockchain, Hash Function, and Asymmetric Encryption are the technologies that generate significant public interest. Conversely, niche or longstanding technologies such as Disk Encryption and Email Encryption are considered low-interest technologies with no growth. Our findings suggest that monitoring public attention on Wikipedia can serve as a scientific indicator to provide valuable information on technology trends and inform decision-making related to investment, assessment, and technology road-mapping.

LLM-based entity extraction is not for cybersecurity
Konferenz

Maxime Würsch, Kucharavy Andrei, Percia David Dimitri, Alain Mermoud

Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and AI+ Informetrics (AII2023), 26.06.2023 - 26.06.2023, Santa Fe, New Mexico, USA

Link zur Konferenz

Zusammenfassung:

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. For this purpose, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs regarding entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context.

2022

Towards an Information Technology Convergence Index
Konferenz
A Keyword Extraction Approach Using arXiv Data

Jacques Roitel, Percia David Dimitri, Alexander Glavackij, Alain Mermoud, Thomas Maillart, Alessandro Tavazzi

The 12th Global Tech Mining Conference 2022, 03.11.2022 - 04.11.2022, Rio de Janeiro, Brazil

Link zur Konferenz

Zusammenfassung:

The paper, "Towards a Technology Convergence Index for Information Technologies: A Keyword Extraction Approach Applied to ArXiv," addresses the challenge of quantifying technological convergence, a phenomenon where multiple technologies merge to form new, hybrid technologies. Traditional studies using patent data suffer from timing issues, as patents are often registered well after the initial technological development. To overcome this, the authors analyze arXiv preprints, which are more current and cover a broad spectrum of technological fields. They introduce a "Technological Convergence Index" that uses keyBERT for extracting keywords from titles and abstracts of these preprints to measure semantic proximity between technology categories over time. The premise is that technologies sharing more keywords are more likely to converge. Focusing on the 'cryptography and security' subsection, the study reveals diverging, stagnating, and converging relationships with other technology areas like information theory, databases, and machine learning respectively. The index, which ranges from 0 to 1, dynamically quantifies these relationships, providing a valuable tool for decision-makers to anticipate technological integrations. This method offers a significant improvement over older techniques by utilizing timely data sources and providing a quantifiable measure of technological convergence, enhancing strategic management and forecasting in the field of TechMining.
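The keyword-overlap idea can be sketched as follows. This is a simplified illustration, assuming that the index is the Jaccard similarity of the two categories' keyword sets in a given year; the abstract only states that the index ranges from 0 to 1 and rests on shared keywords, so the exact formula may differ. The keyword sets are invented, and the keyword extraction step (keyBERT in the paper) is not reproduced here.

```python
def convergence_index(keywords_a: set[str], keywords_b: set[str]) -> float:
    """Jaccard similarity of two keyword sets: 0 = disjoint, 1 = identical."""
    union = keywords_a | keywords_b
    if not union:
        return 0.0  # no keywords at all: treat as no measurable convergence
    return len(keywords_a & keywords_b) / len(union)


# Hypothetical keywords per year for two categories:
# (cryptography and security, machine learning).
by_year = {
    2019: ({"encryption", "protocol", "attack"},
           {"neural network", "training", "dataset"}),
    2021: ({"encryption", "attack", "adversarial example"},
           {"neural network", "training", "adversarial example"}),
}

# A rising value over time suggests the two categories are converging.
trend = {year: convergence_index(a, b) for year, (a, b) in by_year.items()}
for year, value in sorted(trend.items()):
    print(year, round(value, 2))
```

Tracking the value per year, rather than once over the whole corpus, is what lets the index separate converging, stagnating, and diverging pairs.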

Building collaborative cybersecurity for critical infrastructure protection
Konferenz
Empirical evidence of collective intelligence information sharing dynamics on threatfox

Eric Jollès, Sébastien Gillard, Percia David Dimitri, Martin Strohmeier, Alain Mermoud

The 17th International Conference on Critical Information Infrastructures Security (CRITIS 2022), 14.09.2022 - 16.09.2022, Berlin, Germany

Link zur Konferenz

Zusammenfassung:

This article describes three collective intelligence dynamics observed on ThreatFox, a free platform operated by abuse.ch that collects and shares indicators of compromise. These three dynamics are empirically analyzed with an exclusive dataset provided by the sharing platform. First, participants’ onboarding dynamics are investigated and the importance of building collaborative cybersecurity on an established network of trust is highlighted. Thus, when a new sharing platform is created by abuse.ch, an existing trusted community with ‘power users’ will migrate swiftly to it, in order to enact the first sparks of collective intelligence dynamics. Second, the platform publication dynamics are analyzed and two different superlinear growths are observed. Third, the rewarding dynamics of a credit system are described - a promising incentive mechanism that could improve cooperation and information sharing in open-source intelligence communities through the gamification of the sharing activity. Overall, our study highlights future avenues of research to study the institutional rules enacting collective intelligence dynamics in cybersecurity. Thus, we show how the platform may improve the efficiency of information sharing between critical infrastructures, for example within Information Sharing and Analysis Centers using ThreatFox. Finally, a broad agenda for future empirical research in the field of cybersecurity information sharing is presented - an important activity to reduce information asymmetry between attackers and defenders.

Building collaborative cybersecurity for critical infrastructure protection
Konferenz
Empirical evidence of collective intelligence information sharing dynamics on threatfox

Eric Jollès, Sébastien Gillard, Percia David Dimitri, Martin Strohmeier, Alain Mermoud

The 17th International Conference on Critical Information Infrastructures (CRITIS 2022), 14.09.2022 - 16.09.2022, München, Germany

Link zur Konferenz

Zusammenfassung:

This article describes three collective intelligence dynamics observed on ThreatFox, a free platform operated by abuse.ch that collects and shares indicators of compromise. These three dynamics are empirically analyzed with an exclusive dataset provided by the sharing platform. First, participants’ onboarding dynamics are investigated and the importance of building collaborative cybersecurity on an established network of trust is highlighted. Thus, when a new sharing platform is created by abuse.ch, an existing trusted community with ‘power users’ will migrate swiftly to it, in order to enact the first sparks of collective intelligence dynamics. Second, the platform publication dynamics are analyzed and two different superlinear growths are observed. Third, the rewarding dynamics of a credit system are described - a promising incentive mechanism that could improve cooperation and information sharing in open-source intelligence communities through the gamification of the sharing activity. Overall, our study highlights future avenues of research to study the institutional rules enacting collective intelligence dynamics in cybersecurity. Thus, we show how the platform may improve the efficiency of information sharing between critical infrastructures, for example within Information Sharing and Analysis Centers using ThreatFox. Finally, a broad agenda for future empirical research in the field of cybersecurity information sharing is presented - an important activity to reduce information asymmetry between attackers and defenders.

Beyond S-curves
Konferenz
Recurrent Neural Networks for Technology Forecasting

Alexander Glavackij, Percia David Dimitri, Alain Mermoud, Angelika Romanou, Karl Aberer

Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2022), 11.07.2022 - 15.07.2022, Milan, Italy

Link zur Konferenz

Zusammenfassung:

Because of the considerable heterogeneity and complexity of the technological landscape, building accurate forecasting models is a challenging endeavor. Due to their high prevalence in many complex systems, S-curves are a popular forecasting approach in previous work. However, their forecasting performance has not been directly compared to other technology forecasting approaches. Additionally, recent developments in time series forecasting that claim to improve forecasting accuracy are yet to be applied to technological development data. This work addresses both research gaps by comparing the forecasting performance of S-curves to a baseline and by developing an autoencoder approach that employs recent advances in machine learning and time series forecasting. S-curve forecasts largely exhibit a mean absolute percentage error (MAPE) comparable to a simple ARIMA baseline. However, for a minority of emerging technologies, the MAPE increases by two orders of magnitude. Our autoencoder approach improves the MAPE by 13.5% on average over the second-best result. It forecasts established technologies with the same accuracy as the other approaches. However, it is especially strong at forecasting emerging technologies with a mean MAPE 18% lower than the next best result. Our results imply that a simple ARIMA model is preferable over the S-curve for technology forecasting. Practitioners looking for more accurate forecasts should opt for the presented autoencoder approach.
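For readers unfamiliar with the metric, the following sketch shows how a mean absolute percentage error (MAPE) is computed and how a relative improvement between two forecasts can be expressed. The series and forecasts are made-up numbers, not data from the paper, and the models themselves (S-curve, ARIMA, autoencoder) are not implemented here.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up observations and two competing forecasts.
actual = [100.0, 200.0, 400.0]
forecast_baseline = [110.0, 180.0, 400.0]
forecast_candidate = [100.0, 190.0, 380.0]

mape_baseline = mape(actual, forecast_baseline)
mape_candidate = mape(actual, forecast_candidate)

# Relative improvement of the candidate over the baseline, in percent.
improvement = 100.0 * (mape_baseline - mape_candidate) / mape_baseline
print(f"baseline {mape_baseline:.2f}%, candidate {mape_candidate:.2f}%, "
      f"improvement {improvement:.1f}%")
```

Because each error is divided by the actual value, series that start near zero, as emerging technologies often do, can inflate the MAPE sharply.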

2019

Governance Models Preferences for Security Information Sharing
Konferenz
An Institutional Economics Perspective for Critical Infrastructure Protection

Alain Mermoud, Marcus M Keupp, Percia David Dimitri

The 14th International Conference on Critical Information Infrastructures Security (CRITIS 2019), 23.09.2019 - 25.09.2019, Linköping, Sweden

Link zur Konferenz

Zusammenfassung:

Empirical studies have analyzed the incentive mechanisms for sharing security information between human agents, a key activity for critical infrastructure protection. However, recent research shows that most Information Sharing and Analysis Centers do not perform optimally, even when properly regulated. Using a meso-level of analysis, we close an important research gap by presenting a theoretical framework that links institutional economics and security information sharing. We illustrate this framework with a dataset collected through an online questionnaire addressed to all critical infrastructures (N = 262) operating at the Swiss Reporting and Analysis Centre for Information Security (MELANI). Using descriptive statistics, we investigate how institutional rules offer human agents an institutional freedom to self-design an efficient security information sharing artifact. Our results show that a properly designed artifact can positively reinforce human agents to share security information and find the right balance between three governance models: (A) public-private partnership, (B) private, and (C) government-based. Overall, our work lends support to a better institutional design of security information sharing and the formulation of policies that can avoid non-cooperative and free-riding behaviors that plague cybersecurity.

2018

Incentives for human agents to share security information
Konferenz
A model and an empirical test

Alain Mermoud, Marcus Keupp, Kévin Huguenin, Maximilian Palmié, Percia David Dimitri

The 17th Workshop on the Economics of Information Security (WEIS 2018), 07.05.2018 - 10.05.2018, Innsbruck, Austria

Link zur Konferenz

Zusammenfassung:

In this paper, we investigate the role of incentives for Security Information Sharing (SIS) between human agents working in institutions. We present an incentive-based SIS system model that is empirically tested with an exclusive dataset. The data was collected with an online questionnaire addressed to all participants of a deployed Information Sharing and Analysis Center (ISAC) that operates in the context of critical infrastructure protection (N=262). SIS is measured with a multidimensional approach (intensity, frequency) and regressed on five specific predictors (reciprocity, value of information, institutional barriers, reputation, trust) that are measured with psychometric scales. We close an important research gap by providing, to the best of our knowledge, the first empirical analysis on previous theoretical work that assumes SIS to be beneficial. Our results show that institutional barriers have a strong influence on our population, i.e., SIS decision makers in Switzerland. This lends support to a better institutional design of ISACs and the formulation of incentive-based policies that can avoid non-cooperative and free-riding behaviours. Both frequency and intensity are influenced by the extent to which decision makers expect to receive valuable information in return for SIS, which supports the econometric structure of our multidimensional model. Finally, our policy recommendations support the view that the effectiveness of mandatory security-breach reporting to authorities is limited. Therefore, we suggest that a conducive and lightly regulated SIS environment, as in Switzerland, with positive reinforcement and indirect suggestions can “nudge” SIS decision makers to adopt a productive sharing behaviour.

2016

Using incentives to foster security information sharing and cooperation
Konferenz
A general theory and application to critical infrastructure protection

Alain Mermoud, Marcus M Keupp, Solange Ghernaouti, Percia David Dimitri

The 11th International Conference on Critical Information Infrastructures, 10.10.2016 - 12.10.2016, Paris, France

Link zur Konferenz

Cyber-Security Investment in the Context of Disruptive Technologies
Konferenz
Extension of the Gordon-Loeb Model and Application to Critical-Infrastructure Protection

Percia David Dimitri, Marcus M Keupp, Solange Ghernaouti, Alain Mermoud

The 11th International Conference on Critical Information Infrastructures, 10.10.2016 - 12.10.2016, Paris, France

Link zur Konferenz

Zusammenfassung:

We propose an extension of the Gordon-Loeb model by considering multiple periods and relaxing the assumption of a continuous security breach probability function. Such adaptations allow capturing dynamic aspects of information security investment such as the advent of a disruptive technology and its consequences. In this paper, the case of big data analytics (BDA) and its disruptive effects on information security investment is theoretically investigated. Our analysis suggests a substantive decrease in such investment due to a technological shift. While we believe this case should be generalizable across the information security milieu, we illustrate our approach in the context of critical infrastructure protection (CIP), in which security cost reduction is of prime importance since potential losses reach unaffordable dimensions. Moreover, although BDA has been considered a promising method for CIP, its concrete effects have been discussed little.
