Multimodal Deep Learning for Cyanobacteria Classification: A Fusion of CNN and Transformer Architectures


Abstract

Cyanobacteria play a fundamental role in aquatic ecosystems, contributing to global biogeochemical cycles and serving as indicators of environmental change. Their classification is critical for monitoring water quality, detecting harmful algal blooms, and understanding ecosystem dynamics. However, accurate identification remains a major challenge due to their vast taxonomic diversity and significant morphological similarities. Visual inspection alone is often insufficient, highlighting the need for computational approaches to enhance classification accuracy. In this study, we present a multimodal deep learning model that combines convolutional neural networks (CNNs) for image-based feature extraction with bidirectional transformers for text embedding. These complementary features are fused via concatenation to improve species-level classification. To our knowledge, this is the first application of a multimodal neural architecture integrating CNNs and bidirectional transformers for cyanobacteria classification. We evaluate five CNN backbones of varying depth, resulting in eight model configurations, and benchmark their performance against unimodal CNN models that rely solely on image data. The model is trained and validated on a dataset of 1,660 microscopic images and corresponding textual descriptions, covering nine cyanobacterial genera across three taxonomic orders. Results demonstrate the potential of multimodal deep learning to improve classification performance, supporting the development of scalable and accurate identification tools in microbiology and environmental monitoring.
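The fusion design described above (CNN image features concatenated with transformer text embeddings, then classified over nine genera) can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the tiny CNN stands in for the deeper backbones evaluated in the paper, and the vocabulary size, embedding width, and layer counts are placeholder values.

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Sketch: CNN image branch + bidirectional transformer text branch,
    fused by concatenation, as outlined in the abstract."""

    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=9):
        super().__init__()
        # Image branch: a small CNN stand-in for a deeper backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 32)
        )
        # Text branch: token embeddings + a non-causal (bidirectional)
        # transformer encoder over the textual description.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Fusion head: concatenated image + text features -> class logits.
        self.head = nn.Linear(32 + embed_dim, num_classes)

    def forward(self, image, tokens):
        img_feat = self.cnn(image)                              # (batch, 32)
        txt_feat = self.encoder(self.embed(tokens)).mean(dim=1) # (batch, 64)
        fused = torch.cat([img_feat, txt_feat], dim=1)          # concatenation fusion
        return self.head(fused)

model = MultimodalClassifier()
logits = model(torch.randn(2, 3, 64, 64),          # two dummy RGB micrographs
               torch.randint(0, 1000, (2, 12)))    # two dummy token sequences
print(logits.shape)  # torch.Size([2, 9]) -- one logit per genus
```

Mean-pooling the encoder output and a single linear fusion head are simplifying choices here; the paper's eight configurations vary the CNN backbone depth rather than this fusion mechanism.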
