The Mpox Contextual Data Specification Package: A Data Curation Toolkit to Support Collaborative Pathogen Genomic Surveillance

Abstract

The Mpox virus (MPXV) is known to cause severe blisters, swollen lymph nodes, body aches, and other symptoms. Mpox mortality rates vary by lineage, and deaths occur mostly among children and immunocompromised individuals. A sudden increase in the number of cases worldwide prompted the WHO to declare a Public Health Emergency of International Concern in 2022, and again in 2024. Public health genomic surveillance of MPXV is ongoing, with a growing number of sequences available in public sequence repositories. Critical to genomic surveillance is well-curated and harmonized contextual data: the sample metadata, epidemiological and clinical data, lab results, and method information that enable the interpretation of sequence data for public health responses and decision making. Contextual data, however, is often unstructured or highly variable in format, granularity, and terminology. This variability usually requires extensive manual clean-up before the data can be integrated and analyzed, a process that is laborious, time-consuming, and error-prone. To facilitate harmonization of contextual data for genomic surveillance during the 2022 and 2024 epidemics, an MPXV contextual data specification was developed by the Centre for Infectious Disease Genomics and One Health (Simon Fraser University, Canada) in collaboration with several teams at Canada’s National Microbiology Lab (Public Health Agency of Canada, PHAC) as well as provincial public health laboratories. The MPXV specification provides standardized ontology-based fields and terms for capturing information about MPXV samples and infections, and prioritizes geo-temporal, data provenance, and sampling strategy information for surveillance.
The specification uses the same semantic framework as the contextual data standard developed by the Public Health Alliance for Genomic Epidemiology (PHA4GE) for SARS-CoV-2 and a specification developed by the Canadian inter-agency Genomics Research and Development Initiative for One Health Antimicrobial Resistance surveillance (AMR-GRDI2), thus demonstrating the adaptability of the core framework to additional infectious diseases. The specification has been implemented as a template within an open-source, spreadsheet-style data harmonization application known as the DataHarmonizer, which was previously used to standardize SARS-CoV-2 contextual data during the COVID-19 pandemic. The DataHarmonizer enables public health practitioners to put the specification into practice by providing curation, validation, and data transformation features. The MPXV specification and DataHarmonizer templates are already in use to harmonize contextual data for genomic surveillance of MPXV (and other pathogens) in Canada, and are freely available for international use. The MPXV specification adds to a growing library of interoperable, harmonized, community-consensus contextual data standards for public health pathogen genomics.
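
To make the idea of harmonization and validation concrete, the following is a minimal Python sketch of checking one contextual data record against a controlled pick list and a date format. It is an illustration only: the field names (`country`, `collection_date`), the pick list, and the synonym map are hypothetical and are not taken from the MPXV specification or the DataHarmonizer, whose actual templates and validation rules may differ.

```python
from datetime import date

# Hypothetical pick list and synonym map, for illustration only.
# These are NOT the terms defined in the MPXV specification.
COUNTRY_TERMS = {"Canada", "Nigeria", "United Kingdom"}
COUNTRY_SYNONYMS = {"uk": "United Kingdom", "u.k.": "United Kingdom", "can": "Canada"}


def harmonize_country(raw):
    """Map a free-text country value to a controlled term, or None if unknown."""
    cleaned = raw.strip().lower()
    # First try a case-insensitive match against the pick list itself.
    for term in COUNTRY_TERMS:
        if cleaned == term.lower():
            return term
    # Otherwise fall back to the known synonyms.
    return COUNTRY_SYNONYMS.get(cleaned)


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    if harmonize_country(record.get("country", "")) is None:
        problems.append("country: not in pick list")
    try:
        date.fromisoformat(record.get("collection_date", ""))
    except ValueError:
        problems.append("collection_date: expected YYYY-MM-DD")
    return problems
```

In this sketch, a record such as `{"country": " uk ", "collection_date": "2022-06-01"}` passes validation because the free-text value resolves to a controlled term, whereas an unrecognized country or a date written as `01/06/2022` is reported as a problem for a curator to fix.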
