


Artificial intelligence discovers new genes

Our environment is full of organisms carrying genetic information, and analysing that information is complicated. The discovery of new genes in microorganisms opens the door to a fascinating world of genetic diversity. A team of scientists, including Martin Pospíšek from the Department of Genetics and Microbiology at the Faculty of Science, Charles University, has helped develop an algorithm that lets us better understand the complex world around us.

Metagenomics is a branch of molecular biology that deals with the study of genetic material (DNA and RNA) contained in a complex mixture of microorganisms in different environments. This analysis allows the study of microbial biodiversity and is crucial for understanding complex ecosystems and their functions. During the analysis of DNA from the environment, genome assembly, annotation, and gene prediction are performed. Exons (stretches of DNA containing information for protein formation) are an important element of genome annotation, while introns (stretches of DNA not containing this information) can cause some difficulties in gene annotation. Incorrect inclusion or omission of an intron can lead to misidentification of exon boundaries, which will affect the prediction of protein function.

Illustration of DNA. Source: Freepik.com

The potential of metagenomics for prokaryotic taxa is enormous. However, gene identification and annotation in eukaryotic genomes remain limited, which complicates the exploration of environments dominated by eukaryotes, especially fungi. This motivated a group of scientists to develop an algorithm to improve the annotation of fungal genomes. The resulting tool, called SVMmycointron, predicts introns and removes them from the DNA sequence.

Pictures: Wastewater as one of the possible environments to study. Source: Wikipedia

To maximize precision, the study focused on the two most abundant fungal phyla, Basidiomycota and Ascomycota. Together, these two phyla comprise more than 93% of the fungal taxa observed. An algorithm based on machine learning (a subfield of artificial intelligence) learned from publicly available fungal genomes. Based on already known intron sequences, it identifies potential paired intron excision sites, which indicate where non-coding stretches might be located. Once these sites are identified, the algorithm excises the introns from the analyzed DNA sequence. Removing introns increased the number of predicted genes by up to 9.1%. The genes that were newly identified after intron removal were most commonly assigned to fungi and other eukaryotes.
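The idea of pairing excision sites and cutting out the stretch between them can be illustrated with a minimal Python sketch. It is not the SVMmycointron implementation: instead of a trained classifier it assumes the simple rule that an intron starts with GT and ends with AG (the canonical splice-site dinucleotides), and the function names and length limits are hypothetical choices made only for this illustration.

```python
import re
from typing import List, Tuple


def candidate_introns(seq: str, min_len: int = 10, max_len: int = 50) -> List[Tuple[int, int]]:
    """Pair every GT (donor) with every AG (acceptor) within the length limits.

    Assumption: a plain GT...AG rule stands in for the trained classifier
    that a real tool would use to accept or reject each candidate pair.
    """
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    # The span ends just after the AG, so add 2 to the match position.
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptors if min_len <= a - d <= max_len]


def excise(seq: str, spans: List[Tuple[int, int]]) -> str:
    """Remove the given (start, end) spans, skipping any that overlap an earlier one."""
    kept = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos:
            continue  # overlaps a span that was already removed
        kept.append(seq[pos:start])
        pos = end
    kept.append(seq[pos:])
    return "".join(kept)


# Toy sequence: two short "exons" separated by a 14-base GT...AG stretch.
toy = "ATGCCC" + "GT" + "C" * 10 + "AG" + "TTTTAA"
spliced = excise(toy, candidate_introns(toy))
```

In this toy case the only candidate within the length limits is the 14-base stretch, so `spliced` contains just the two flanking pieces joined together. Real sequences contain many chance GT and AG pairs, which is why a learned model is needed to decide which candidates are genuine introns.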

Improving genome annotation with algorithms such as SVMmycointron provides new insights into genetic diversity in microbial communities, with the potential to reveal previously unrecognized functions and relationships between genes in eukaryotic environments. This study thus represents a valuable step forward in the field of metagenomics and microorganism research, contributing to our knowledge of biodiversity and the evolution of eukaryotic genomes in different environments.

Yulia Dyachenko

Le AV, Větrovský T, Barucic D, Saraiva JP, Dobbler PT, Kohout P, Pospíšek M, da Rocha UN, Kléma J, Baldrian P. Improved recovery and annotation of genes in metagenomes through the prediction of fungal introns. Mol Ecol Resour. 2023 Nov;23(8):1800-1811. doi: 10.1111/1755-0998.13852

Published: Jan 08, 2024 02:50 PM
