Artificial intelligence discovers new genes
Metagenomics is a branch of molecular biology that deals with the study of genetic material (DNA and RNA) contained in a complex mixture of microorganisms in different environments. This analysis allows the study of microbial biodiversity and is crucial for understanding complex ecosystems and their functions. During the analysis of DNA from the environment, genome assembly, annotation, and gene prediction are performed. Exons (stretches of DNA containing information for protein formation) are an important element of genome annotation, while introns (stretches of DNA not containing this information) can cause some difficulties in gene annotation. Incorrect inclusion or omission of an intron can lead to misidentification of exon boundaries, which will affect the prediction of protein function.
Metagenomics has enormous potential for the study of prokaryotic taxa. Gene identification and annotation in eukaryotic genomes remain limited, however, which complicates the exploration of environments dominated by eukaryotes, especially fungi. This motivated a group of scientists to develop an algorithm that improves the annotation of fungal genomes. The resulting tool, called SVMmycointron, predicts introns and removes them from the DNA sequence.
To maximize precision, the study focused on the two most abundant fungal phyla: Basidiomycota and Ascomycota. Together, these two phyla comprise more than 93% of the fungal taxa observed. An algorithm based on machine learning (a subfield of artificial intelligence) learned from publicly available fungal genomes. Using already known intron sequences, it identifies potential paired intron excision (splice) sites, which mark where a non-coding stretch may begin and end. Once these sites are identified, the algorithm excises the introns from the analyzed DNA sequence. Removing introns increased the number of predicted genes by up to 9.1%. The genes newly identified after intron removal were most commonly assigned to fungi and other eukaryotes.
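The two steps described above, pairing candidate excision sites and then cutting out the span between them, can be illustrated with a minimal Python sketch. It is not the SVMmycointron implementation: it assumes the canonical GT (start) and AG (end) intron dinucleotides, uses illustrative length bounds, and leaves out the machine-learning step that decides which candidates are real introns. The function names `find_candidate_introns` and `excise` are hypothetical.

```python
import re


def find_candidate_introns(seq, min_len=40, max_len=100):
    """List (start, end) spans that begin with GT and end with AG.

    `end` is exclusive. The length bounds are illustrative assumptions,
    not values taken from the study.
    """
    seq = seq.upper()
    # Lookahead patterns so that every occurrence is reported.
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer(r"(?=AG)", seq)]
    candidates = []
    for start in donors:
        for end in acceptors:
            if min_len <= end - start <= max_len:
                candidates.append((start, end))
    return candidates


def excise(seq, introns):
    """Remove the given spans from seq, skipping any that overlap an earlier one."""
    kept = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos:
            continue  # overlaps a span that was already removed
        kept.append(seq[pos:start])
        pos = end
    kept.append(seq[pos:])
    return "".join(kept)


# Toy sequence: two short coding stretches around one 14-base intron.
exon1, intron, exon2 = "ATGCCC", "GT" + "C" * 10 + "AG", "CCCTAA"
dna = exon1 + intron + exon2
spans = find_candidate_introns(dna, min_len=14, max_len=14)
spliced = excise(dna, spans)
```

In a real pipeline, a trained classifier would score each candidate span and only the accepted ones would be passed to `excise`; here every candidate is accepted because the toy sequence contains exactly one.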
Improving genome annotation with algorithms such as SVMmycointron provides new insights into genetic diversity in microbial communities, with the potential to reveal previously unrecognized gene functions and relationships between genes in eukaryote-dominated environments. The study thus represents a valuable step forward in metagenomics and microorganism research, contributing to our knowledge of biodiversity and the evolution of eukaryotic genomes in different environments.
Yulia Dyachenko