Summary:
This thesis aims to contribute to the research field of Video Optical Character Recognition (VOCR) by developing novel approaches that automatically detect and recognize embedded Arabic text in news videos. We introduce a two-stage method for Arabic text detection in video frames. The first stage, the connected-component (CC) based detection part, first extracts text candidates with the Stroke Width Transform (SWT) algorithm, then filters them with a set of heuristic rules, and finally groups the survivors with a proposed text-line formation technique. The second stage, the machine-learning verification part, uses Convolutional Auto-Encoders (CAE) and Support Vector Machines (SVM) for text/non-text classification.

For text recognition, we adopt a segmentation-free methodology based on Multidimensional Recurrent Neural Networks (MDRNN) coupled with a Connectionist Temporal Classification (CTC) decoding layer. This system also includes a new preprocessing step and a compact representation of character models. Throughout this thesis, we aim to stand out from the dominant methodology that relies on hand-crafted features: the deep learning methods we employ, i.e. CAE and MDRNNs, learn features automatically.

When this work began, no publicly available dataset existed for artificially embedded text in Arabic news videos, so creating one was indispensable. The proposed dataset, named AcTiV, contains 189 video clips recorded from a DBS system; this raw material yields 4,063 text frames for detection tasks and 10,415 cropped text-line images for recognition purposes. AcTiV is freely available to the scientific community. It is worth noting that the dataset served as the benchmark for two international competitions held in conjunction with the ICPR 2016 and ICDAR 2017 conferences, respectively.
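To make the heuristic-filtering step of the detection stage concrete, the sketch below shows the kind of rules that can prune SWT candidates. It is a minimal illustration, not the thesis's actual pipeline: the thresholds and the assumption that an SWT map (per-pixel stroke widths, 0 for background) is already available are placeholders for the real design choices.

```python
import numpy as np
from scipy import ndimage

def filter_swt_components(swt_map, min_height=8, max_aspect=10.0, max_sw_var_ratio=0.5):
    """Prune connected components of an SWT map with simple heuristic rules.

    swt_map: 2-D array of per-pixel stroke widths (0 = background), assumed
    to come from a prior Stroke Width Transform step. All thresholds here
    are illustrative, not the values used in the thesis.
    """
    labels, n = ndimage.label(swt_map > 0)
    kept = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        h = ys.max() - ys.min() + 1
        w = xs.max() - xs.min() + 1
        widths = swt_map[ys, xs]
        # Rule 1: discard very small components (likely noise).
        if h < min_height:
            continue
        # Rule 2: text characters have bounded aspect ratios.
        if max(w / h, h / w) > max_aspect:
            continue
        # Rule 3: stroke width is roughly constant within a character.
        if widths.std() / (widths.mean() + 1e-6) > max_sw_var_ratio:
            continue
        kept.append((xs.min(), ys.min(), w, h))
    return kept  # candidate character boxes, to be grouped into text lines
```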
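The verification stage pairs unsupervised feature learning with a discriminative classifier. The following sketch shows one plausible realization of that CAE + SVM combination in PyTorch and scikit-learn; the layer sizes, patch size (grayscale 32x32) and hyperparameters are assumptions for illustration, not the architecture reported in the thesis.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class ConvAutoEncoder(nn.Module):
    """Small convolutional auto-encoder; the encoder output serves as a
    learned feature vector for the SVM text/non-text classifier."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 8x8 -> 16x16
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 16x16 -> 32x32
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_cae(cae, patches, epochs=10):
    """Unsupervised reconstruction training on (N, 1, 32, 32) patches."""
    opt = torch.optim.Adam(cae.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        loss = loss_fn(cae(patches), patches)
        opt.zero_grad()
        loss.backward()
        opt.step()

def fit_svm_on_features(cae, patches, labels):
    """Encode patches and fit an SVM on the flattened latent codes."""
    with torch.no_grad():
        feats = cae.encoder(patches).flatten(1).numpy()
    svm = SVC(kernel="rbf")
    svm.fit(feats, labels)  # labels: 1 = text, 0 = non-text
    return svm
```

Training the auto-encoder needs no labels; only the final SVM step uses the text/non-text annotations, which is the practical appeal of this two-part design.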
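Finally, the segmentation-free recognizer can be illustrated with the sketch below. Since off-the-shelf multidimensional LSTM layers are not available in standard PyTorch, it substitutes a 1-D bidirectional LSTM that reads the line image column by column; this is a simplified stand-in for the MDRNN of the thesis, and the image height, alphabet size and hidden size are illustrative assumptions. The CTC loss is what removes the need for explicit character segmentation.

```python
import torch
import torch.nn as nn

class LineRecognizer(nn.Module):
    """Segmentation-free text-line recognizer: a bidirectional LSTM reads
    the line image column by column, and a CTC layer aligns its outputs
    with the target character sequence. A 1-D BLSTM stand-in for the
    multidimensional RNN used in the thesis."""
    def __init__(self, img_height=32, n_classes=100, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(img_height, hidden, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_classes + 1)  # +1 for the CTC blank

    def forward(self, lines):
        # lines: (batch, height, width) -> column sequence (width, batch, height)
        x = lines.permute(2, 0, 1)
        out, _ = self.rnn(x)
        return self.proj(out).log_softmax(-1)  # (width, batch, n_classes + 1)

# One CTC training step on dummy data: no character segmentation needed.
model = LineRecognizer()
ctc = nn.CTCLoss(blank=model.proj.out_features - 1)
lines = torch.rand(4, 32, 200)             # batch of 4 grayscale line images
targets = torch.randint(0, 100, (4, 20))   # dummy label sequences
log_probs = model(lines)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 20))
loss.backward()
```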