Abstract:
The TREC 2018 Precision Medicine Track largely repeats the structure and evaluation of the 2017 track. The collection remains identical. Again, our team participated in both tasks of the track: 1) retrieving scientific abstracts addressing relevant treatments for a given case and 2) retrieving clinical trials for which a patient is eligible. For the retrieval of scientific abstracts, we queried all abstracts concerning one of the entities of the topic (i.e. the disease, the gene or the genetic variant) using various strategies (e.g. search in the annotations of the collection, free-text search with or without synonyms, search in the MeSH terms, etc.). Then, for a given topic, the complete set of abstracts was built by generating different queries with decreasing levels of specificity. The idea was to start with a very specific query containing gene, disease and variant, from which less specific queries would be inferred. Abstracts were then re-ranked using different strategies to favor those that we considered more relevant to the task. In 2017 we tested the use of drug densities to identify abstracts related to treatment. This year we refined this strategy by giving more weight to drugs related to cancer treatment. Second, we used demographic information to favor abstracts concerning patients of the specified age group and gender, and to disfavor abstracts targeting patients of other age groups or genders. For the third strategy, we used a word-level convolutional neural network to increase the rank of abstracts related to precision medicine. The fourth strategy consisted of expanding the query to parent and child diseases. Finally, we tested an exact run that retrieved only abstracts matching all the information given in the topic. Results showed that all strategies but the last one yielded some improvement in the retrieval power of the engine.
As expected, our final run, focusing on precision, gave our best results for precision at rank 10, while other measures were negatively impacted. Regarding the retrieval of scientific abstracts, we boosted last year's approach, which achieved competitive results, with supplementary strategies drawn from other participants. Regarding the retrieval of clinical trials, we investigated filtering strategies for handling the condition (disease) and standard information retrieval for handling the gene and genetic variant. The results show that, despite the presence of a structured condition tag in the document, better performance is obtained when relaxing constraints: using synonyms and detecting the diseases in various fields, such as the summary.
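
The generation of queries with decreasing specificity, described above for the abstract retrieval task, can be illustrated with a minimal sketch. The function name `relaxed_queries`, the topic dictionary layout, the example topic values and the order in which fields are dropped are all assumptions made for illustration; the abstract does not specify how the less specific queries were actually inferred.

```python
from itertools import combinations

# Hypothetical field names; the abstract only names disease, gene and variant.
TOPIC_FIELDS = ("disease", "gene", "variant")


def relaxed_queries(topic):
    """Return field subsets of a topic, from most to least specific.

    The first query uses every available field; later queries drop
    fields one at a time. The drop order is an assumption, not the
    authors' documented method.
    """
    fields = [name for name in TOPIC_FIELDS if topic.get(name)]
    queries = []
    # Larger subsets first, so specificity decreases along the list.
    for size in range(len(fields), 0, -1):
        for combo in combinations(fields, size):
            queries.append({name: topic[name] for name in combo})
    return queries


# Illustrative topic values, not taken from the track's topic set.
example_topic = {"disease": "melanoma", "gene": "BRAF", "variant": "V600E"}
example_queries = relaxed_queries(example_topic)
```

In such a scheme, abstracts returned by earlier (more specific) queries would naturally be ranked above those found only by later ones, before any of the re-ranking strategies are applied.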