PEOPLE@HES-SO Directory and Skills Repository

Unlocking the potential of PubMed Central supplementary data files

Scientific article ArODES

Julien Gobeill, Déborah Caucheteur, Alexandre Flament, Pierre-André Michel, Anaïs Mottaz, Emilie Pasche, Patrick Ruch

Bioinformatics Advances, 2025, 5, 1, vbaf155

2024

Gender and geographical bias in the editorial decision-making process of biomedical journals :

a case-control study

Scientific article ArODES

Angèle Gayet-Ageron, Khaoula Ben Messaoud, Mark Richards, Cyril Jaksic, Julien Gobeill, Jeevanthi Liyanapathirana, Luc Mottin, Nona Naderi, Patrick Ruch, Zoé Mariot, Alexandra Calmy, Julia Friedman, Leonard Leibovici, Sara Schroter

BMJ evidence-based medicine, 2025, Vol. 30, 3, bmjebm-2024-113083

Abstract:

Objectives: To assess whether the gender (primary) and geographical affiliation (post-hoc) of the first and/or last authors are associated with publication decisions after peer review. Design: Case-control study. Setting: Biomedical journals. Participants: Original peer-reviewed manuscripts submitted between 1 January 2012 and 31 December 2019. Main outcome measure: Manuscripts accepted (cases) and rejected for publication (controls). Results: Of 6213 included manuscripts, 5294 (85.2%) first and 5479 (88.1%) last authors' gender were identified; 2511 (47.4%) and 1793 (32.7%) were women, respectively. The proportion of women first and last authors was 48.4% (n=1314) and 32.2% (n=885) among cases and 46.4% (n=1197) and 33.2% (n=908) among controls. After adjustment, the association between the first author's gender and acceptance for publication remained non-significant 1.04 (0.92 to 1.17). Acceptance for publication was lower for first authors affiliated to Asia 0.58 (0.46 to 0.73), Africa 0.75 (0.41 to 1.36) and South America 0.68 (0.40 to 1.16) compared with Europe, and for first author affiliated to upper-middle country-income 0.66 (0.47 to 0.95) and lower-middle/low 0.69 (0.46 to 1.03) compared with high country-income group. It was significantly higher when both first and last authors were affiliated to different countries from same geographical and income groups 1.35 (1.03 to 1.77), different countries and geographical but same income groups 1.50 (1.14 to 1.96) or different countries, geographical and income groups 1.78 (1.27 to 2.50) compared with authors from similar countries. The study funding was independently associated with the acceptance for publication (when compared with no funding, 1.40; 1.04 to 1.89 for funding by association & foundations, 2.76; 1.87 to 4.10 for international organisations, 1.30; 1.04 to 1.62 for non-profit & associations & foundations). 
The reviewers' recommendations of the original submitted version were significantly associated with the outcome (unadjusted 5.36; 4.98 to 5.78 for acceptance compared with rejection). Gender of the first author was not associated with reviewers' recommendations (adjusted 0.96, 0.87 to 1.06). Conclusions: We did not identify evidence of gender bias during the editorial decision-making process for papers sent out to peer review. However, the under-representation in manuscripts accepted for publication of first authors affiliated to Asia, Africa or South America and those affiliated to upper/lower-middle and low country-income group, indicates poor representation of global scientists' opinion and supports growing demands for improving equity, diversity and inclusion in biomedical research. The more diverse the countries and incomes of the first and last authors, the greater the chances of the publication being accepted.

DOME Registry :

implementing community-wide recommendations for reporting supervised machine learning in biology

Scientific article ArODES

Omar Abdelghani Attafi, Damiano Clementel, Konstantinos Kyritsis, Emidio Capriotti, Gavin Farrell, Styliani-Christina Fragkouli, Leyla Jael Castro, András Hatos, Tom Lenaerts, Stanislav Mazurenko, Soroush Mozaffari, Franco Pradelli, Patrick Ruch, Castrense Savojardo, Paola Turina, Federico Zambelli, Damiano Piovesan, Alexander Miguel Monzon, Fotis Psomopoulos, Silvio C. E. Tosatto

GigaScience, 13, giae094

A research data management (RDM) community for ELIXIR

Scientific article ArODES

Flora D'Anna, Niclas Jareborg, Mijke Jetten, Minna Ahokas, Pinar Alper, Robert Andrews, Korbinian Bösl, Teresa D'Altri, Daniel Faria, Nazeefa Fatima, Siiri Fuchs, Clare Garrard, Wei Gu, Katharina F. Heil, Yvonne Kallberg, Flavio Licciulli, Nils-Christian Lübke, Anna M. P. Melo, Ivan Micetic, Jorge Oliveira, Anastasis Oulas, Patricia M. Palagi, Krzysztof Poterlowicz, Xenia Perez-Sitja, Patrick Ruch, Susanna-Assunta Sansone, Helena Schnitzer, Celia van Gelder, Thanasis Vergoulis, Daniel Wibberg, Ulrike Wittig, Brane Leskošek, Jiri Vondrasek, Munazah Andrabi

F1000Research, 13, 230

Abstract:

Research data management (RDM) is central to the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) and Open Science principles. Recognising the importance of RDM, ELIXIR Platforms and Nodes have invested in RDM and launched various projects and initiatives to ensure good data management practices for scientific excellence. These projects have resulted in a rich set of tools and resources highly valuable for FAIR data management. However, these resources remain scattered across projects and ELIXIR structures, making their dissemination and application challenging. Therefore, it becomes imperative to coordinate these efforts for sustainable and harmonised RDM practices with dedicated forums for RDM professionals to exchange knowledge and share resources. The proposed ELIXIR RDM Community will bring together RDM experts to develop ELIXIR’s vision and coordinate its activities, taking advantage of the available assets. It aims to coordinate RDM best practices and illustrate how to use the existing ELIXIR RDM services. The Community will be built around three integral pillars, namely, a network of RDM professionals, RDM knowledge management and RDM training expertise and resources. It will also engage with external stakeholders to leverage benefits and provide a forum to RDM professionals for regular knowledge exchange, capacity building and development of harmonised RDM practices, keeping in line with the overall scope of the RDM Community. In the short term, the Community aims to build upon the existing resources and ensure that their content remains up to date and fit for purpose. In the long run, the Community will aim to strengthen the skills and knowledge of its RDM professionals to support the emerging needs of the scientific community. The Community will also devise an effective strategy to engage with other ELIXIR structures and international stakeholders to influence and align with developments and solutions in the RDM field.

Exploring the dual role of LLMs in cybersecurity :

threats and defenses

Book chapter ArODES

Ciarán Bryce, Alexandros Kalousis, Ilan Leroux, Hélène Madinier, Thomas Pasche, Patrick Ruch

In Kucharavy, Andrei, Lenders, Vincent, Mermoud, Alain, Mulder, Valentin, Plancherel, Octave, Large language models in cybersecurity (8 p.). 2024, Cham : Springer

2023

Multilingual RECIST classification of radiology reports using supervised learning

Scientific article ArODES

Luc Mottin, Jean-Philippe Goldman, Christoph Jäggli, Rita Achermann, Julien Gobeill, Julien Knafou, Julien Ehrsam, Alexandre Wicky, Camille L. Gérard, Tanja Schwenk, Mélinda Charrier, Petros Tsantoulis, Christian Lovis, Alexander Leichtle, Michael K. Kiessling, Olivier Michielin, Sylvain Pradervand, Vasiliki Foufi, Patrick Ruch

Frontiers in digital health, 2023, vol. 5

Assessing the use of supplementary materials to improve genomic variant discovery

Scientific article ArODES

Emilie Pasche, Anaïs Mottaz, Julien Gobeill, Pierre-André Michel, Déborah Caucheteur, Nona Naderi, Patrick Ruch

Database, Published online 31 March 2023

2022

COVoc and COVTriage :

novel resources to support literature triage

Scientific article ArODES

Déborah Caucheteur, Zoë May Pendlington, Paola Roncaglia, Julien Gobeill, Luc Mottin, Nicolas Matentzoglu, Donat Agosti, David Osumi-Sutherland, Helen Parkinson, Patrick Ruch

Bioinformatics, 2023, Vol. 39, no. 1, btac800

Variomes :

a high recall search engine to support the curation of genomic variants

Scientific article ArODES

Emilie Pasche, Anaïs Mottaz, Déborah Caucheteur, Julien Gobeill, Pierre-André Michel, Patrick Ruch

Bioinformatics, 2022, vol. 38, no. 9, pp. 2595–2601

Biodiversity community integrated knowledge library (BiCIKL)

Scientific article ArODES

Lyubomir Penev, Dimitrios Koureas, Quentin Groom, Jerry Lanfear, Donat Agosti, Ana Casino, Joe Miller, Christos Arvanitidis, Guy Cochrane, Donald Hobern, Olaf Banki, Wouter Addink, Urmas Kõljalg, Kyle Copas, Patricia Mergen, Anton Güntsch, Laurence Benichou, Jose Benito Gonzalez Lopez, Patrick Ruch, Corinne S. Martin, Boris Barov, Iliyana Demirova, Kristina Hristova

Research ideas and outcomes, 2022, vol. 8, article no. e81136, pp. 1-115

2021

Ensemble of deep masked language models for effective named entity recognition in health and life science corpora

Scientific article ArODES

Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro

Frontiers in research metrics and analytics, 2021, vol. 6, article no 689803, pp. 1-15

UniProt :

the universal protein knowledgebase in 2021

Scientific article ArODES

The UniProt Consortium, Patrick Ruch, Douglas Teodoro

Nucleic Acids Research, 2021, vol. 49, no. D1, pp. D480–D489

2020

SIB literature services :

RESTful customizable search engines in biomedical literature, enriched with automatically mapped biomedical concepts

Scientific article ArODES

Julien Gobeill, Déborah Caucheteur, Pierre-André Michel, Luc Mottin, Emilie Pasche, Patrick Ruch

Nucleic Acids Research, 2020, vol. 48, no W1, pp. W12-W16

UPCLASS :

a deep learning-based classifier for UniProtKB entry publications

Scientific article ArODES

Douglas Teodoro, Julien Knafou, Nona Naderi, Emilie Pasche, Julien Gobeill, Cecilia N. Arighi, Patrick Ruch

Database, 2020, vol. 2020, baaa026, pp. 1-13

Machine learning for automatic encoding of French electronic medical records :

is more data better?

Book chapter ArODES

Julien Gobeill, Patrick Ruch, Rodolphe Meyer

In Pape-Haugaard, Louise B., Digital Personalized Health and Medicine (pp. 312-316). 2020, Amsterdam, The Netherlands : IOS Press

Text-mining services of the Swiss variant interpretation platform for oncology

Book chapter ArODES

Déborah Caucheteur, Julien Gobeill, Anaïs Mottaz, Emilie Pasche, Pierre-André Michel, Luc Mottin, Daniel J. Stekhoven, Valérie Barbié, Patrick Ruch

In Pape-Haugaard, Louise B., Digital personalized health and medicine (pp. 884-888). 2020, Amsterdam, The Netherlands : IOS Press

DisProt :

intrinsic protein disorder annotation in 2020

Scientific article ArODES

András Hatos, Borbála Hajdu-Soltész, Alexander M. Monzon, Nicolas Palopoli, Julien Gobeill, Emilie Pasche, Patrick Ruch

Nucleic Acids Research, 2020, vol. 48, no D1, pp. D269–D276

2019

La filière Information Documentaire a 100 ans !

Professional article ArODES

Hors-Texte, 2019, no 116, pp. 4-6

2018

Apprentissage et classification automatiques pour améliorer la pertinence d’un corpus d’articles

Professional article ArODES

Julien Gobeill, Matthias Van den Heuvel, Laura Minu Nowzohour, Joëlle Noailly, Gaétan de Rassenfosse, Patrick Ruch

RESSI : revue électronique suisse en science de l'information, 2018, no 19

Abstract:

As part of a project studying the development of environmental and climate policies over the past four decades, one approach considered by researchers in economics is to build and then exploit a corpus of press articles on this topic. The first year of the project focused solely on the archives of the New York Times. Even so, 2.6 million articles had to be processed, a volume too large for manual handling. Researchers in information science and text mining were therefore brought into this information retrieval task. In a first step, the 2.6 million articles were harvested from the Web and indexed in a search engine. A complex search query was designed to select an intermediate corpus of 170,000 articles, whose precision (the proportion of relevant articles) was evaluated at 14%. In a second step, a machine learning algorithm was therefore trained and used to predict whether or not an article is relevant. To feed the algorithm, a sample of 700 articles was manually labelled by the economics researchers. Applying the classifier to the whole intermediate corpus produced a final corpus of 15,000 articles, whose precision was evaluated at 83%. Our results show that about a hundred labelled articles seems to be sufficient here to maximise the classifier's performance and to obtain a final corpus of a quality close to that obtained by human experts. Text mining is no longer an emerging discipline, nor one external to information science; it is a mature discipline that can already be used to assist the document retrieval specialist in a corpus construction or document classification task, especially with large volumes of information.

PowerCool :

simulation of cooling and powering of 3D MPSoCs with integrated flow cell arrays

Scientific article ArODES

Artem Aleksandrovich Andreev, Arvind Sridhar, Mohamed M. Sabry, Marina Zapater, Patrick Ruch

IEEE Transactions on Computers, 2018, vol. 67, no. 1, pp. 73 - 85

ORBDA :

an openEHR benchmark dataset for performance assessment of electronic health record servers

Scientific article ArODES

Douglas Teodoro, Erik Sundvall, Mario João Junior, Patrick Ruch, Sergio Miranda Freire

PLOS ONE,

2017

Improving average ranking precision in user searches for biomedical research datasets

Scientific article ArODES

Douglas Teodoro, Luc Mottin, Julien Gobeill, Arnaud Gaudinat, Thérèse Vachon, Patrick Ruch

Database, 2017, vol. 2017, pp. 1-18

Abstract:

Availability of research datasets is a keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participants’ best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed a positive impact on the system’s performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system’s performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw conclusive results.

Triage by ranking to support the curation of protein interactions

Scientific article ArODES

Luc Mottin, Emilie Pasche, Julien Gobeill, Valentine Rech de Laval, Anne Gleizes, Pierre-André Michel, Amos Bairoch, Pascale Gaudet, Patrick Ruch

Database, 2017, vol. 2017

Abstract:

Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration in curation workflow has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named neXtA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it not only requires the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the neXtA5 triage service is improved by 190% for the prioritization of papers with PPI information and by 260% for papers with PTM information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents is effective to improve triage in complex curation tasks such as the curation of PPIs and PTMs.

Development and evaluation of a case-based retrieval service

Book chapter ArODES

Emilie Pasche, Marcello Chinali, Julien Gobeill, Patrick Ruch

Informatics for health: connected citizen-led wellness and population health (pp. 186-190). 2017, Amsterdam : IOS Press

2016

Text mining to support gene ontology curation and vice versa

Book chapter ArODES

In Dessimoz, Christophe, Škunca, Nives, The gene ontology handbook (pp. 69-84). 2016, New York : Springer

neXtA5 :

accelerating annotation of articles via automated approaches in neXtProt

Scientific article ArODES

Luc Mottin, Julien Gobeill, Emilie Pasche, Pierre-André Michel, Isabelle Cusin, Pascale Gaudet, Patrick Ruch

Database : the journal of biological databases and curation, 2016, baw098

Abstract:

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline.

2015

The SIB Swiss institute of bioinformatics’ resources :

focus on curated databases

Scientific article ArODES

Patrick Ruch, Luc Mottin, Julien Gobeill, Emilie Pasche, Arnaud Gaudinat

Nucleic Acids Research, November 2015, Vol. 44, no.4, pp. 27-37

Deep Question Answering for protein annotation

Scientific article ArODES

Julien Gobeill, Arnaud Gaudinat, Emilie Pasche, Dina Vishnyakova, Pascale Gaudet, Amos Bairoch, Patrick Ruch

Database : the journal of biological databases and curation, 2015

2014

Closing the loop : from paper to protein annotation using supervised Gene Ontology classification

Scientific article ArODES

Patrick Ruch, Julien Gobeill, Emilie Pasche, Dina Vishnyakova

Database : the journal of biological databases and curation. 2014. 7 p.

Overview of the gene ontology task at BioCreative IV

Scientific article ArODES

Patrick Ruch, Julien Gobeill, et al.

Database : the journal of biological databases and curation. 2014. 14 p.

Development and tuning of an original search engine for patent libraries in medicinal chemistry

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Olivier Kreim, Fatma Oezdemir-Zaech, Thérèse Vachon, Christian Lovis, Patrick Ruch

BMC Bioinformatics. 2014. Vol. 15, suppl. 1, 9 p.

2013

Improving data and knowledge management to better integrate health care and research

Scientific article ArODES

Patrick Ruch, et al.

Journal of internal medicine. 2013. Vol. 274, no. 4. Pp. 321-328

Electronic processing of informed consents in a global pharmaceutical company environment

Scientific article ArODES

Patrick Ruch, Julien Gobeill, Dina Vishnyakova, et al.

Studies in health technology and informatics. 2013. Vol. 205, p. 995-999

Application of text-mining for updating protein post-translational modification annotation in UniProtKB

Scientific article ArODES

Anne-Lise Veuthey, Alan Bridge, Julien Gobeill, Patrick Ruch, Johanna R. McEntyre, Lydie Bougueleret, Ioannis Xenarios

BMC Bioinformatics. 2013. Vol. 14. P. 104-113

The COMBREX project : design, methodology, and initial results

Scientific article ArODES

Julien Gobeill, Patrick Ruch

PLOS Biology. 2013. Vol. 11, no. 8. 8 p.

Utilization of ontology look-up services in information retrieval for biomedical literature

Scientific article ArODES

Dina Vishnyakova, Emilie Pasche, Patrick Ruch

Studies in health technology and informatics. 2013. Vol. 186. P. 155-159

Assisted knowledge discovery for the maintenance of clinical guidelines

Scientific article ArODES

Emilie Pasche, Patrick Ruch, Douglas Teodoro, Angela Huttner, Stephan Harbarth, Julien Gobeill, Rolf Wipfli, Christian Lovis

PLoS One. 2013. Vol. 8, no. 4. 11 p.

Managing the data deluge : data-driven GO category assignment improves while complexity of functional annotation increases

Scientific article ArODES

Julien Gobeill, Emilie Pasche, Dina Vishnyakova, Patrick Ruch

Database (Oxford). 2013. 9 p.

Use of controlled vocabularies to improve biomedical information retrieval tasks

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Dina Vishnyakova, Patrick Ruch, Christian Lovis

Studies in Health Technology and Informatics. 2013. Vol. 192. p. 1068

2012

Answering gene ontology terms to proteomics questions by supervised macro reading in Medline

Scientific article ArODES

Julien Gobeill, Emilie Pasche, Douglas Teodoro, Patrick Ruch, Christian Lovis

EMBnet.journal, 2012, vol. 18b, p. 29-31

Using binary classification to prioritize and curate articles for the comparative toxicogenomics database

Scientific article ArODES

Dina Vishnyakova, Emilie Pasche, Patrick Ruch

Database, 2012, p. 1-9

A medical informatics perspective on findings from the yearbook 2012 section on decision support systems

Scientific article ArODES

Yearbook of medical informatics, 2012, vol. 7, p. 113-116

Pathogens and gene product normalization in the biomedical literature

Scientific article ArODES

Dina Vishnyakova, Emilie Pasche, Douglas Teodoro, Christian Lovis, Patrick Ruch

Studies in health technology and informatics, 2012, vol. 174, p. 89-93

Development of a text search engine for medicinal chemistry patents

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Fatma Oezdemir-Zaech, Thérèse Vachon, Christian Lovis, Patrick Ruch

EMBnet.journal, 2012, vol. 18b, p. 44-46

Building a transnational biosurveillance network using semantic web technologies : requirements, design, and preliminary evaluation

Scientific article ArODES

Douglas Teodoro, Emilie Pasche, Julien Gobeill, Stéphane Emonet, Patrick Ruch, Christian Lovis

Journal of medical internet research, 2012, vol. 14, no. 3, p. 1-17

System zur Unterstützung der Kuration der Toxikogenomik in der medizinischen Literatur

Scientific article ArODES

Dina Vishnyakova, Emilie Pasche, Christian Lovis, Patrick Ruch

Swiss medical informatics, 2012, vol. 28, p. 1

Assister la création de guides de bonnes pratiques par des techniques de recherche d’information

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Douglas Teodoro, Dina Vishnyakova, Angela Huttner, Patrick Ruch, Christian Lovis

Swiss medical informatics, 2012, vol. 28, p. 1-5

An advanced search engine for patent analytics in medicinal chemistry

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Douglas Teodoro, Arnaud Gaudinat, Dina Vishnyakova, Christian Lovis, Patrick Ruch

Studies in health technology and informatics, 2012, vol. 180, p. 204-209

2011

The gene normalization task in Biocreative III

Scientific article ArODES

Patrick Ruch, et al.

BMC Bioinformatics, 2011, vol. 12, p. 1-19

Interoperability driven integration of biomedical data sources

Scientific article ArODES

Douglas Teodoro, Rémy Choquet, Daniel Schober, Giovanni Mels, Emilie Pasche, Patrick Ruch, Christian Lovis

Studies in health technology and informatics, 2011, vol. 169, p. 185-189

Using multimodal mining to drive clinical guidelines development

Scientific article ArODES

Emilie Pasche, Julien Gobeill, Douglas Teodoro, Dina Vishnyakova, Arnaud Gaudinat, Patrick Ruch, Christian Lovis

Studies in health technology and informatics, 2011, vol. 169, p. 477-481

A medical informatics perspective on findings from the yearbook 2011 section on web 3.0 decision support systems

Scientific article ArODES

Yearbook of medical informatics, 2011, vol. 6, p. 30-32

2010

Review of the book Literature-based discovery, by Peter Bruza and Marc Weeber (eds)

Scientific article ArODES

Journal of the American society for information science and technology, 2010, vol. 61, no. 7, pp. 1506-1508

De la recherche d’informations hautement spécialisées : le cas de la recherche d’informations dans les brevets de chimie

Professional article ArODES

RESSI : revue électronique suisse en science de l'information [online]. December 2010, no. 11

2009

Journée de la recherche HEG 2009 : recueil des communications : cahier de recherche

Report ArODES

Enrica Ferrini Tinguely, Sacha Varone, David Billard, Rahel Birri-Blezon, Andrea Baranzini, Anne-Kathrin Faust, Lorraine Filippozzi, Hélène Madinier, Nicolas Montandon, Catherine Equey, Sébastien Jossi, Julien Gobeill, Emilie Pasche, Douglas Teodoro, Patrick Ruch, et al.

Genève : Haute école de gestion de Genève, 2009. 57 p. Cahier de recherche no HES-SO/HEG-GE/C--09/5/1--CH

Comparing a rule based vs. statistical system for automatic categorization of MEDLINE documents according to biomedical specialty

Scientific article ArODES

Patrick Ruch, Julien Gobeill, Susanne M. Humphrey, Aurélie Neveol, Allen Browne, Stéfan J. Darmoni

Journal of the American Society for Information Science and Technology, 2009, vol. 60, no. 12, p. 2530-2539

2025

TransBERT :

A Framework for synthetic translation in domain-specific language modeling

Conference ArODES

Julien Knafou, Luc Mottin, Anaïs Mottaz, Alexandre Flament, Patrick Ruch

Findings of the Association for Computational Linguistics: EMNLP 2025

Manuscript classification to support the analysis of biases in publication opportunities

Conference ArODES

Luc Mottin, Julien Gobeill, Jeevanthi Liyanapathirana, Nona Naderi, Anaïs Mottaz, Khaoula Ben Messaoud, Angèle Gayet-Ageron, Patrick Ruch

Intelligent Health Systems - From Technology to Data and Knowledge : proceedings of MIE 2025

2024

SIB Text-Mining at TREC PLABA 2024

Conference ArODES

Luc Mottin, Anaïs Mottaz, Alexandre Flament, Julien Gobeill, Patrick Ruch

NIST Special Publication: 33rd Text REtrieval Conference Proceedings (TREC 2024)

Comparing sequence-based and literature-based pathogenicity scoring methods for human variants

Conference ArODES

Luc Mottin, Nona Naderi, Anaïs Mottaz, Pierre-André Michel, Gerieke Been, Lennart Johansson, Morris Swertz, Andrew Stubbs, Emilie Pasche, Julien Gobeill, Patrick Ruch

Digital health and informatics innovations for sustainable health care systems : proceedings of MIE 2024

2022

Analyzing the information content of text-based files in supplementary materials of biomedical literature

Conference ArODES

Nona Naderi, Anaïs Mottaz, Douglas Teodoro, Patrick Ruch

Challenges of trustable AI and added-value on health : proceedings of MIE 2022

Designing an optimal expansion method to improve the recall of a genomic variant curation-support service

Conference ArODES

Anaïs Mottaz, Emilie Pasche, Pierre-André Michel, Luc Mottin, Douglas Teodoro, Patrick Ruch

Challenges of trustable AI and added-value on health : proceedings of MIE 2022

2020

BiTeM at WNUT 2020 shared task-1 :

named entity recognition over wet lab protocols using an ensemble of contextual language models

Conference ArODES

Julien Knafou, Nona Naderi, Douglas Teodoro, Patrick Ruch

Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

SIB text mining at TREC precision medicine 2020

Conference ArODES

Emilie Pasche, Déborah Caucheteur, Luc Mottin, Anaïs Mottaz, Julien Gobeill, Patrick Ruch

Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020)

SIB text mining at TREC 2020 deep learning track

Conference ArODES

Julien Knafou, Matthew Jeffryes, Sohrab Ferdowsi, Patrick Ruch

Proceedings of the Twenty-Ninth Text REtrieval Conference (TREC 2020)

An extended overview of the CLEF 2020 ChEMU Lab :

information extraction of chemical reactions from patents

Conference ArODES

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Jingqi Wang, Yuankai Ren, Zhi Zhang, Yaoyun Zhang, Mai Hoang Dao, Pedro Ruas, Andre Lamurias, Francisco M. Couto, Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro, Daniel Lowe, John Mayfield, Abdullatif Köksal, Hilal Dönmez, Elif Özkirimli, Arzucan Özgür, Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, Bridget T. McInness, C.S. Malarkodi, Pattabhi Rk Rao, Sobha Lalitha Devi, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, Karin Verspoor

Proceedings of the CLEF 2020 conference

Named entity recognition in chemical patents using ensemble of contextual language models

Conference ArODES

Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro

Proceedings of the CLEF 2020 conference

Contextualized French language models for biomedical named entity recognition

Conference ArODES

Julien Knafou, Nona Naderi, Claudia Moro, Patrick Ruch, Douglas Teodoro

Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition) Nancy, France, 08-19 juin 2020. Atelier DÉfi Fouille de Textes

2019

Designing retrieval models to contrast precision-driven ad hoc search vs. recall-driven treatment extraction in precision medicine

Conference ArODES

Déborah Caucheteur, Emilie Pasche, Julien Gobeill, Anaïs Mottaz, Luc Mottin, Patrick Ruch

Proceedings of the Twenty-Eighth Text REtrieval Conference (TREC 2019)

Abstract:

The TREC 2019 Precision Medicine Track repeats the general structure and evaluation of the 2018 track. Our team participated in both tasks of the track, covering scientific abstracts and clinical trials. The competition provided 40 topics, each giving patient data (demographic data, disease, gene and genetic variant) and modelling the description of a clinical case. The aim was to retrieve the scientific abstracts and clinical trials relevant to each topic. In the first task, we aim to retrieve scientific abstracts that introduce relevant treatments for a given case. Our system first collects a large set of abstracts related to a particular case using various strategies, such as keyword search within abstracts, search with normalized entities within annotated abstracts, and the linear combination of various queries. We then apply different strategies to re-rank the resulting set of abstracts. In particular, we tested two re-ranking strategies intended to return a large variety of treatments in the top articles. Almost two thirds of the top-10 returned documents are judged relevant, while nearly a quarter of the relevant treatments are returned in the top-10 abstracts. The second task aims to retrieve clinical trials for which patients are eligible. The criteria used to determine patient eligibility are those found in the topics. Information such as trial location or trial status, which is important from a patient's point of view, is, questionably, not used in these topics. Several strategies were tested: relaxing constraints (data required or not), expanding information requests with synonyms or regular expressions, and boosting the retrieval status value for some criteria or fields. After judging, for almost half of the topics at least 50% of the retrieved documents are relevant, rising to 90% for 10 of the 38 topics provided.
Our best runs achieve highly competitive results depending on the measure, ranking on average #2 or #3 in the official results for the literature task.
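
As an illustration of the "linear combination of various queries" mentioned in this abstract, here is a minimal Python sketch. It is not the authors' system: the function name combine_runs, the weights, and the document identifiers are hypothetical, and the abstract does not say how scores were normalized or weighted.

```python
from collections import defaultdict


def combine_runs(runs, weights):
    """Linearly combine document scores from several query runs.

    runs    -- list of dicts mapping doc_id -> score (one dict per query)
    weights -- list of floats, one weight per run (assumed values)
    Returns (doc_id, combined_score) pairs, best first.
    """
    combined = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc_id, score in run.items():
            # A document missing from a run simply contributes 0 for that run.
            combined[doc_id] += weight * score
    # Sort by descending score; break ties on doc_id for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a keyword query and an entity-based query.
keyword_run = {"a": 1.0, "b": 0.5}
entity_run = {"b": 1.0, "c": 0.5}
ranking = combine_runs([keyword_run, entity_run], [0.5, 0.5])
```

In this toy example, document "b" is returned by both queries and therefore ends up ranked first.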

Data-driven approach for measuring the severity of the signs of depression using reddit posts :

women and men in the orchestra

Conference ArODES

Paul van Rijen, Douglas Teodoro, Nona Naderi, Luc Mottin, Julien Knafou, Patrick Ruch

Proceedings of CLEF (Conference and Labs of the Evaluation Forum) 2019 Working Notes

A baseline approach for early detection of signs of anorexia and self-harm in reddit posts

Conference ArODES

Nona Naderi, Julien Gobeill, Douglas Teodoro, Emilie Pasche, Patrick Ruch

Proceedings of CLEF (Conference and Labs of the Evaluation Forum) 2019 Working Notes

Utilisation de méthodes de traitement automatique du langage pour assister la traduction de terminologies dans le cadre du projet EXPAND

Conference ArODES

Luc Mottin, Anaïs Mottaz, Arnaud Gaudinat, Stéphane Spahni, Adrian Schmid, Stefan Wyss, Patrick Ruch

Actes de TALMED 2019 : Symposium satellite francophone sur le traitement automatique des langues dans le domaine biomédical

2018

SIB text mining at TREC 2018 precision medicine track

Conference ArODES

Emilie Pasche, Paul van Rijen, Julien Gobeill, Anaïs Mottaz, Luc Mottin, Patrick Ruch

Proceedings of the TREC 2018 Conference

Abstract:

The TREC 2018 Precision Medicine Track largely repeats the structure and evaluation of the 2017 track. The collection remains identical. Again, our team participated in both tasks of the track: 1) retrieving scientific abstracts addressing relevant treatments for a given case and 2) retrieving clinical trials for which a patient is eligible. Regarding the retrieval of scientific abstracts, we queried all abstracts concerning one of the entities of the topic (i.e. the disease, the gene or the genetic variant) using various strategies (e.g. search in the annotations of the collection, free-text search with or without synonyms, search in the MeSH terms, etc.). Then, for a given topic, the complete set of abstracts was built by generating different queries with decreasing levels of specificity. The idea was to start with a very specific query containing gene, disease and variant, from which less specific queries would be inferred. Abstracts were then re-ranked using different strategies to favor abstracts that we considered more relevant to the given task. In 2017 we tested the use of drug densities to identify abstracts related to treatment. This year we refined this strategy by giving more weight to drugs related to cancer treatment. Secondly, we used demographic information to favor abstracts concerning patients of the specified age group and gender, and to disfavor abstracts targeting patients of other age groups or genders. For the third strategy we used a word-level convolutional neural network to increase the rank of abstracts related to precision medicine. The fourth strategy consisted of expanding the query to parent and child diseases. Finally, we tested an exact run which only retrieved abstracts matching all the information given in the topic. Results showed that all strategies but the last one resulted in some improvement of the retrieval power of the engine.
As expected, our final run, focusing on precision, gave our best results for precision at rank 10, while other measures were negatively impacted. Regarding the retrieval of scientific abstracts, we boosted last year's approach, which achieved competitive results, with supplementary strategies drawn from other participants. Regarding the retrieval of clinical trials, we investigated filtering strategies for managing the condition (disease), and standard information retrieval for managing the gene and genetic variant. The results show that, despite the presence of a structured condition tag in the document, better performance is obtained when relaxing constraints: using synonyms and detecting the diseases in various fields, such as the summary.
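
The idea of starting from a very specific query and inferring less specific ones can be sketched as follows. This is only an illustration under stated assumptions: the function name relaxed_queries, the dictionary-based topic representation, and the example topic values are hypothetical, and the abstract does not say in which order the intermediate queries were generated.

```python
from itertools import combinations


def relaxed_queries(topic):
    """Return entity subsets ordered from most to least specific.

    topic -- dict with optional "gene", "disease" and "variant" entries
    The first query uses every available entity; later queries drop
    entities one at a time until single-entity queries remain.
    """
    fields = [name for name in ("gene", "disease", "variant") if topic.get(name)]
    queries = []
    for size in range(len(fields), 0, -1):
        for subset in combinations(fields, size):
            queries.append({name: topic[name] for name in subset})
    return queries


# Hypothetical topic, for illustration only.
topic = {"gene": "BRAF", "disease": "melanoma", "variant": "V600E"}
queries = relaxed_queries(topic)
```

For a topic with all three entities this yields seven queries: one with three entities, three with two, and three with one.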

2017

Customizing a variant annotation-support tool :

an inquiry into probability ranking principles for TREC precision medicine

Conference ArODES

Emilie Pasche, Julien Gobeill, Luc Mottin, Anaïs Mottaz, Douglas Teodoro, Paul van Rijen, Patrick Ruch

Proceedings of the Twenty-Sixth Text REtrieval Conference (TREC 2017)

2016

BiTeM at CLEF eHealth Evaluation Lab 2016 Task 2 :

Multilingual Information Extraction

Conference ArODES

Luc Mottin, Julien Gobeill, Anaïs Mottaz, Arnaud Gaudinat, Patrick Ruch

CEUR Workshop Proceedings, vol. 1609 - Working Notes of CLEF 2016 - Conference and Labs of the Evaluation Forum

Effect of the named entity recognition and sliding window on the HONcode automated detection of HONcode criteria for mass health online content

Conference ArODES

Celia Boyer, Ljiljana Dolamic, Patrick Ruch, Gilles Falquet

Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies

2015

Exploiting incoming and outgoing citations for improving Information Retrieval in the TREC 2015 Clinical Decision Support Track

Conference ArODES

Julien Gobeill, Arnaud Gaudinat, Patrick Ruch

Proceedings of The 24th Text REtrieval Conference (TREC 2015)

Instance-based learning for tweet monitoring and categorization

Conference ArODES

Julien Gobeill, Arnaud Gaudinat, Patrick Ruch

Experimental IR meets multilinguality, multimodality, and interaction

2014

Full-texts representations with medical subject headings, and co-citations network reranking strategies for TREC 2014 clinical decision support track

Conference ArODES

Patrick Ruch, Julien Gobeill, Arnaud Gaudinat, Emilie Pasche

In: Proceedings of Text REtrieval Conference (TREC), Washington, USA, November 19-21 2014. 5 p.

Instance-based learning for tweet categorization in CLEF REPLAB 2014

Conference ArODES

Patrick Ruch, Julien Gobeill, Arnaud Gaudinat

In: Proceedings of Conference and Labs of the Evaluation Forum (CLEF), Sheffield, United Kingdom, 15-18 September 2014, p. 1491-1499

2013

Using a question-answering approach in machine reading task of biomedical texts about the Alzheimer disease

Conference ArODES

Dina Vishnyakova, Julien Gobeill, Patrick Ruch

In: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. CEUR Workshop proceedings, 2013, vol. 1179, 6 p.

ToxiCat : hybrid named entity recognition services to support curation of the Comparative Toxicogenomic Database

Conference ArODES

Julien Gobeill, Emilie Pasche, Dina Vishnyakova, Patrick Ruch

In: Proceedings of the 4th BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, October 7-9 2013, vol. 1, p. 108-113

BiTeM/SIBtex group proceedings for BioCreative IV, Track 4 : Gene Ontology curation

Conference ArODES

Julien Gobeill, Emilie Pasche, Dina Vishnyakova, Patrick Ruch

In: Proceedings of the 4th BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, October 7-9 2013, vol. 1, p. 108-113

2012

Selection of relevant articles for curation for the comparative toxicogenomic database

Conference ArODES

Dina Vishnyakova, Emilie Pasche, Patrick Ruch

In: Proceedings of BioCreative Workshop, 2012, Washington, USA, April 4-5, p. 31-38

Bitem site report for the claims to passage task in CLEF-IP 2012

Conference ArODES

Julien Gobeill, Patrick Ruch

In: Proceedings of CLEF Initiative, 2012, Rome, Italy, September 17-20

2011

Pathogens and genome normalization for literature-based knowledge discovery

Conference ArODES

Dina Vishnyakova, Emilie Pasche, Douglas Teodoro, Patrick Ruch, Christian Lovis

In: Proceedings of the 23rd International Conference of the European Federation for Medical Informatics, User Centred Networked Health Care, MIE 2011, Oslo, Norway

Bitem group report for TREC medical records track 2011

Conference ArODES

Julien Gobeill, Arnaud Gaudinat, Patrick Ruch, Emilie Pasche, Douglas Teodoro, Dina Vishnyakova

Proceedings of the Twentieth Text REtrieval Conference, TREC, 2011

KART, a knowledge authoring and refinement tool for clinical guidelines development

Conference ArODES

Emilie Pasche, Douglas Teodoro, Julien Gobeill, Dina Vishnyakova, Patrick Ruch, Christian Lovis

BMC proceedings, 2011, vol. 5, p. 49

Bitem group report for TREC chemical IR track 2011

Conference ArODES

Julien Gobeill, Arnaud Gaudinat, Patrick Ruch, Emilie Pasche, Douglas Teodoro, Dina Vishnyakova

In: Proceedings of the Twentieth Text REtrieval Conference, TREC, 2011

2008

From episodes of care to diagnosis codes : automatic text categorization for medico-economic encoding

Conference ArODES

Patrick Ruch, Julien Gobeill, Imad Tbahriti, Antoine Geissbuehler

AMIA Annual Symposium proceedings, 2008, vol. 2008, p. 636-640