□ Novel 4D tensor decomposition-based approach integrating tri-omics profiling data can identify functionally relevant gene clusters
https://www.biorxiv.org/content/10.64898/2026.03.19.712900v1
Tri-omics profiles are organized into a tensor to which tensor decomposition is applied. After features and genes are selected, enrichment analysis and generative-AI analysis are performed, and various functional clusters are derived.
The three layers were organized into a tensor and analyzed to extract singular value vectors representing coordinated variation across omics, conditions, and replicates.
This approach distinguished patterns consistent with ribosome stacking, in which the transcriptome and translatome increase while the proteome decreases, from those of translational buffering, in which the proteome remains stable despite variations in upstream layers.
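A minimal sketch of the core idea, not the paper's actual pipeline: stack three omics layers into a (genes, conditions, omics) tensor, take the SVD of its omics-mode unfolding, and read the sign pattern of the leading omics-mode singular vector. A (+, +, -) pattern over (transcriptome, translatome, proteome) matches the ribosome-stacking signature described above. The toy data below is synthetic.

```python
import numpy as np

def omics_mode_singular_vectors(tensor):
    """SVD of the omics-mode unfolding of a (genes, conditions, omics) tensor.

    The columns of u are omics-mode singular vectors; the sign pattern of
    each column shows how the omics layers co-vary in that component.
    """
    g, c, o = tensor.shape
    unfolded = tensor.transpose(2, 0, 1).reshape(o, g * c)  # omics x (genes*conditions)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return s, u

# Toy example: transcriptome and translatome rise together while the
# proteome moves in the opposite direction (synthetic signal).
rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 4))
tensor = np.stack([signal, signal, -signal], axis=2)  # (genes, conditions, omics)
s, u = omics_mode_singular_vectors(tensor)
pattern = np.sign(u[:, 0])
# first two layers share a sign, the third is opposite
```

The sign pattern is only identified up to a global flip, so what matters is agreement between layers, not the absolute signs.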
□ scSAGA: Single-cell Sampled Gromov Wasserstein Alignment for Scalable and Memory-efficient Integration of Multi-modal Single Cell Data
https://www.biorxiv.org/content/10.64898/2026.03.26.714573v1
scSAGA (Single-Cell Sampled Gromov-Wasserstein Alignment) is a geometry-preserving, scalable, and memory-efficient method for integrating paired and unpaired scRNA-seq and scATAC-seq datasets.
scSAGA combines sparse kNN graph geometry with on-demand geodesic distances, plan-guided sampled Gromov-Wasserstein optimization, and a matrix-free joint embedding computed via sparse iterative linear algebra.
□ SpacerScope: Binary-vectorized, genome-wide off-target profiling for RNA-guided nucleases without prior candidate-site bias
https://www.biorxiv.org/content/10.64898/2026.03.28.715005v1
SpacerScope integrates binary vectorization with a bitwise-operation-based pre-filtering mechanism, eliminating the need for biased candidate-site reduction. It identifies candidate spacer sites satisfying PAM constraints from both the query sequence and the reference genome.
SpacerScope employs different optimized encoding strategies for different search tasks: in mismatch search, it uses compact 2-bit sequence encoding and batch scanning for rapid Hamming-distance evaluation.
SpacerScope first encodes sequences into multi-channel binary features, then uses channel filtering and batch mask fusion to greatly compress the candidate set before performing exact alignment on the retained small subset.
SpacerScope employs right-end-anchored edit-distance dynamic programming, rather than simple local alignment, making it better suited to CRISPR spacer scenarios where insertions, deletions, and terminal constraints coexist.
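The 2-bit encoding plus bitwise Hamming evaluation mentioned above can be sketched in a few lines; this is a generic illustration of the technique, not SpacerScope's implementation. Each base becomes a 2-bit code, a XOR exposes differing groups, and collapsing each group to a single bit before popcount yields the base-level mismatch count.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode2bit(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    v = 0
    for base in seq:
        v = (v << 2) | _CODE[base]
    return v

def hamming2bit(a, b, length):
    """Base-level Hamming distance between two 2-bit-packed sequences."""
    x = a ^ b
    mask = int("01" * length, 2)      # low bit of every 2-bit group
    lo = x & mask
    hi = (x >> 1) & mask
    # a base differs iff either of its two bits differs
    return bin(lo | hi).count("1")
```

Because the distance is computed with a handful of word-level operations, many candidate sites can be scanned in batch before any exact alignment is attempted.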
□ BioPathNet: Enhancing link prediction in biomedical knowledge graphs
https://www.nature.com/articles/s41551-025-01598-z
BioPathNet, a message-passing neural network for path representation learning, built on the neural Bellman–Ford network (NBFNet).
As opposed to node-embedding approaches, BioPathNet uses path-based reasoning to learn representations between source and target nodes on the basis of relations along the path.
□ Chiron3D: an interpretable deep learning framework for understanding the DNA code of chromatin looping
https://www.biorxiv.org/content/10.64898/2026.03.20.713211v1
Chiron3D, a DNA-only attention model, initialized with Borzoi embeddings, designed to predict CTCF HiChIP contact maps.
Chiron3D is competitive with baselines that use CTCF ChIP-seq as additional input, while enabling nucleotide-level attribution of predictions to the input DNA sequence. It provides mechanistic insights into the physical control of loop dynamics.
□ scMagnifier: resolving fine-grained cell subtypes via GRN-informed perturbations and consensus clustering
https://www.biorxiv.org/content/10.64898/2026.03.26.714385v1
scMagnifier, a consensus clustering framework that leverages gene regulatory network (GRN)-informed in silico perturbations to amplify subtle transcriptional differences and uncover latent cell subpopulations.
scMagnifier perturbs candidate transcription factors (TFs), propagates perturbation effects through cluster-specific GRNs to simulate post-perturbation expression profiles, and integrates clustering results across multiple perturbations into stable subtype assignments.
scMagnifier introduces regulatory perturbation consensus UMAP (rpcUMAP), a perturbation-aware visualization that provides clearer separation between cell subtypes and guides the selection of the optimal number of clusters.
□ STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings
https://doi.org/10.1093/bioinformatics/btag146
STAR-GO, a Transformer-based framework embedding structural and semantic characteristics of GO terms for zero-shot prediction. STAR-GO integrates hierarchical relations and textual definitions, aligning ontology-informed embeddings with protein sequences to predict unseen functions.
STAR-GO refines GO term embeddings derived from a language model via a structure-recovering autoencoder trained with multi-task supervision, preserving both semantic similarity and hierarchical dependencies for zero-shot inference without retraining.
STAR-GO incorporates these enriched embeddings into an encoder–decoder transformer, where GO terms are decoded in topological order using causal self-attention and linked to protein embeddings through cross-attention.
□ GEMINI: Genetically encoded assembly recorder temporally resolves cellular history
https://www.nature.com/articles/s41586-026-10323-y
GEMINI (granularly expanding memory for intracellular narrative integration), an in cellulo recording platform that leverages a computationally designed protein assembly as an intracellular memory device to record the history of individual cells.
GEMINI functions like molecular ‘tree rings’: as the assembly expands through continued subunit addition, it lays down successive fluorescent layers that encode timing and amplitude of cellular events.
□ DeepLMI: Deep Feature Mining with a Globally Enhanced Graph Convolutional Network for Robust lncRNA–miRNA Interaction Prediction
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag145/8541986
For lncRNAs, DeepLMI combines sequence pre-training with self-attention mechanisms to learn multi-scale semantic representations; for miRNAs, DeepLMI fuses heterogeneous features through a graph convolutional encoder.
To further address the sparsity and structural complexity of known RNA interaction networks, DeepLMI employs a Global-Enhanced Graph Convolutional Network (GE-GCN) that jointly models local neighborhood information and global topological signals.
□ GraphHDBSCAN*: Graph-based Hierarchical Clustering on High Dimensional Single-cell RNA Sequencing Data
https://www.biorxiv.org/content/10.64898/2026.03.24.713924v1
GraphHDBSCAN*, a graph-based, hyperparameter-free extension of HDBSCAN* that performs hierarchical density-based clustering on a graph representation of the data, enabling robust recovery of both single-level and hierarchical relationships in high-dimensional and sparse datasets.
GraphHDBSCAN* offers an alternative by combining graph topology and density directly on a sparse neighborhood graph, without requiring a learned low-dimensional representation.
Using a weighted structural-similarity transformation and a graph-adapted HDBSCAN* accelerated with CORE-SG, it shifts computation away from the full pairwise-distance space while preserving the strengths of density-based clustering: handling heterogeneous densities, identifying noise, and producing an interpretable hierarchy.
□ NodeGWAS: Leveraging Graph Pangenomes for Sensitive and Accurate Association Analysis in Diverse Diploid and Polyploid Species
https://www.cell.com/plant-communications/fulltext/S2590-3462(26)00143-4
NodeGWAS, a novel genotyping framework that works directly on the graph pangenome and fills a critical gap in applying GWAS to species with high genetic diversity or complex structural variation – both of which create mappability challenges.
In a graph pangenome, each node represents a non-redundant, variable-length "DNA word" (a variable-length k-mer) that captures sequence across multiple genomes.
NodeGWAS can mitigate the alignment bias caused by using a single reference, capture comprehensive genetic diversity, and effectively resolve alignment challenges in polyploids.
By using node coverage counts as GWAS predictors, NodeGWAS sidesteps common errors associated with traditional polyploid genotyping while maintaining the accuracy and completeness of genetic information.
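A minimal, hypothetical sketch of the "node coverage as GWAS predictor" idea, not NodeGWAS itself: given a samples-by-nodes coverage matrix and a quantitative trait, score each pangenome node by its squared correlation with the trait (a stand-in for a proper association test with covariates and population structure).

```python
import numpy as np

def node_association(coverage, phenotype):
    """Per-node association between pangenome-node coverage and a phenotype.

    coverage:  (samples, nodes) matrix of read counts per graph node
    phenotype: (samples,) trait values
    Returns the squared Pearson correlation (r^2) per node.
    """
    c = coverage - coverage.mean(axis=0)
    p = phenotype - phenotype.mean()
    denom = np.sqrt((c ** 2).sum(axis=0) * (p ** 2).sum())
    r = (c * p[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    return r ** 2

# Synthetic toy data: node 2 carries the causal signal.
rng = np.random.default_rng(1)
cov = rng.poisson(10, size=(100, 5)).astype(float)
pheno = cov[:, 2] + rng.normal(scale=0.5, size=100)
r2 = node_association(cov, pheno)
```

Working on node coverage rather than reference-aligned genotypes is what lets graph-based GWAS include sequence absent from any single reference.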
□ EvoRMD: Integrating Biological Context and Evolutionary RNA Language Models for Interpretable Prediction of RNA Modifications
https://www.biorxiv.org/content/10.64898/2026.03.22.713386v1
EvoRMD integrates contextual sequence embeddings from a large-scale RNA language model with structured biological metadata, including species, organ, cell type, and subcellular localization.
EvoRMD employs a shared multi-class classifier to generate a context-conditioned plausibility distribution over eleven modification types, consistent with the single-positive, multiple-unlabeled setting, producing calibrated multi-label predictions via sigmoid-transformed logits.
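The sigmoid-transformed-logits step above is the standard way to turn one shared classifier head into independent multi-label calls (as opposed to a softmax, which forces the classes to compete). A generic sketch, not EvoRMD's code:

```python
import math

def multilabel_probs(logits):
    """Independent per-class probabilities via sigmoid-transformed logits."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def predict(logits, threshold=0.5):
    """Multi-label call: every modification type whose probability clears the threshold."""
    return [i for i, p in enumerate(multilabel_probs(logits)) if p >= threshold]
```

Under the single-positive, multiple-unlabeled setting, per-class sigmoids let several modification types be plausible at once without the labels summing to one.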
□ GRNFormer: Accurate Gene Regulatory Network Inference Using Graph Transformer
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag144/8540455
GRNFormer integrates a transformer-based gene expression encoder (Gene-Transcoder) with a variational graph autoencoder (GraViTAE) employing pairwise attention to jointly learn the representations of genes (nodes) and their co-expression relationships (edges).
GRNFormer employs TFWalker, a transcription factor-centered de novo subgraph sampling approach that constructs localized gene co-expression subgraphs from a full gene co-expression network (GCEN), capturing the neighborhood context around each transcription factor.
□ TimeVault: A molecular time machine for single cells
https://www.cell.com/molecular-cell/abstract/S1097-2765(26)00133-4
TimeVault acts as a durable, inducible recording device that captures snapshots of transcriptional activity, allowing researchers to see what genes were active before a cellular decision or treatment, rather than just the final state.
TimeVault works by fusing poly(A)-binding protein (PABP) with vault proteins, forcing the sequestration of messenger RNA (mRNA) into the stable, naturally occurring, hollow vault structures in the cytoplasm. TimeVault tracks cell lineage and transient states over long periods.
□ Why phylogenies compress so well: combinatorial guarantees under the Infinite Sites Model
https://www.biorxiv.org/content/10.64898/2026.03.18.712055v1
The Infinite Sites Model (ISM) instantiates a perfect phylogeny for genomes and point mutations, producing a rooted binary tree.
Binary matrices representing genome collections (SNP, k-mer, unitig, and unique-row matrices derived from ISM-compliant genomes) satisfy the four-gamete condition and are thus themselves ISM-compliant, inheriting an additive structure between columns that can be recovered via Neighbor Joining.
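The four-gamete condition invoked above is easy to check directly: a binary matrix admits a perfect phylogeny iff no pair of columns exhibits all four combinations 00, 01, 10, 11. A small self-contained check:

```python
from itertools import combinations

def four_gamete_compatible(matrix):
    """Check the four-gamete condition on a binary matrix
    (rows = genomes, columns = sites): no column pair may
    exhibit all of 00, 01, 10, 11 across the rows."""
    ncols = len(matrix[0])
    for i, j in combinations(range(ncols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return False
    return True
```

This quadratic-in-columns scan is for illustration; the paper's point is that ISM-compliant matrices pass it by construction, which is what makes them so compressible.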
□ evedesign: accessible biosequence design with a unified framework
https://www.biorxiv.org/content/10.64898/2026.03.17.712115v1
EveDesign frames biomolecular design as a conditional modeling problem: the user specifies a molecular system composed of individual entities such as proteins, DNA or RNA chains, or ligands.
EveDesign incorporates any known information about each entity, such as primary sequence, 3D structure, homologs, binding partners, post-translational modifications, or multimeric state, supplied as a standardized declarative data structure.
□ LongcallD: joint calling and phasing of small, structural and mosaic variants from long reads
https://www.biorxiv.org/content/10.64898/2026.03.20.713111v1
LongcallD explicitly distinguishes clean and noisy genomic regions, applies haplotype-aware multiple sequence alignment within noisy regions to derive consensus sequences, and integrates clean- and noisy-region variant calls through an iterative phasing procedure.
By leveraging established haplotype information together with a stringent, context-aware filtering strategy, LongcallD distinguishes true mosaic mutations from long-read sequencing artifacts.
□ scComm: a contrastive learning framework for deciphering cell–cell communications at single-cell resolution
https://link.springer.com/article/10.1186/s13059-026-04043-9
scComm applies a data-adaptive weighting and scoring module to assign weight to L-R pairs according to their importance and employs a supervised contrastive learning framework to detect significant CCC events.
scComm generates a cell feature matrix where each element of the cell feature vector represents the interaction scores between the specific cell and other cell types on the given L-R pairs.
□ Super Bloom: Fast and precise filter for streaming k-mer queries
https://www.biorxiv.org/content/10.64898/2026.03.17.712354v1
The Super Bloom filter is a Bloom filter variant designed for streaming k-mer queries on biological sequences, using minimizers to group adjacent k-mers into super-k-mers and assigning all k-mers of a group to the same memory block.
Super Bloom thereby amortizes random accesses over consecutive k-mer queries, improves cache efficiency, and combines this layout with the findere scheme to reduce false positives by requiring consistent evidence across overlapping subwords.
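The minimizer/super-k-mer grouping behind that layout can be sketched as follows; this is the generic technique, with a plain lexicographic minimizer standing in for whatever ordering the tool actually uses. Consecutive k-mers that share a minimizer form one super-k-mer, so their queries hit the same block.

```python
def minimizer(kmer, m):
    """Lexicographically smallest m-mer inside a k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))

def super_kmers(seq, k, m):
    """Group consecutive k-mers sharing a minimizer (super-k-mer binning).

    Returns a list of (minimizer, [k-mers]) groups in sequence order.
    """
    groups, current, cur_min = [], [], None
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        mini = minimizer(kmer, m)
        if mini != cur_min and current:
            groups.append((cur_min, current))
            current = []
        cur_min = mini
        current.append(kmer)
    if current:
        groups.append((cur_min, current))
    return groups
```

Since adjacent k-mers usually share a minimizer, one random memory access serves a whole run of queries, which is the cache-efficiency argument above.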
□ ggvariant: Tidy, 'ggplot2'-Native Visualization for Genomic Variants
https://cran.r-project.org/web/packages/ggvariant/index.html
ggvariant, a simple, opinionated toolkit for visualizing genomic variant data using ggplot2. It accepts VCF files or plain data frames and produces publication-ready lollipop plots, consequence summaries, mutational spectrum charts, and cohort-level comparisons with minimal code.
read_vcf() parses standard VCF v4.x files — including gzipped files and multi-sample VCFs — and returns a tidy data frame called a gvf object. Functional annotations from SnpEff (ANN) or VEP (CSQ) INFO fields are extracted automatically.
□ scCChain: Mapping spatial cell-cell communication programs by tailoring chains of cells for transformer neural networks
https://www.biorxiv.org/content/10.64898/2026.03.18.712664v1
scCChain, a transformer-based framework that integrates ligand-receptor activity into spatially resolved communication programs and localizes hotspots at spot and single-cell resolution.
scCChain derives programs using structured dimensionality reduction. It samples program-specific communication chains by linking transcriptionally similar sender cells to receivers via weighted random walks on a distance-informed cell graph, borrowing signal from neighbors.
Transformer-based modeling then scores chains to prioritize communication programs and pinpoint hotspots across the tissue. scCChain supports both exploratory communication program discovery and targeted analysis of user-specified ligand-receptor pairs.
□ Helicase: Vectorized parsing and bitpacking of genomic sequences
https://www.biorxiv.org/content/10.64898/2026.03.19.712912v1
Helicase, a high-throughput Rust library for parsing FASTA and FASTQ files that exploits SIMD vectorization to maximize single-threaded throughput on both x86 and ARM.
At the core of Helicase is a vectorized lexing stage based on bitmask classifiers derived from the theory of counter-free automata.
□ aaKomp: Alignment-free amino acid k-mer matching for genome completeness assessment at scale
https://www.biorxiv.org/content/10.64898/2026.03.19.713078v1
aaKomp employs aaHash, a recursive hashing algorithm with BLOSUM62-based substitution tolerance, combined with a multi-index Bloom filter (miBf) for efficient k-mer storage and querying.
aaKomp bypasses sequence alignment entirely while maintaining robust gene detection when there is sequence divergence between the analyzed genome and reference protein set.
aaKomp computes a proportional completeness score that provides a finer resolution than threshold-based classifications and supports user-defined gene sets for customized assessments across any organism or lineage.
□ REGEN: Learning gene interactions from tabular gene expression data using Graph Neural Networks
https://www.biorxiv.org/content/10.64898/2026.03.19.712949v1
REGEN (REconstruction of GEne Networks), a GNN-based framework that simultaneously learns latent gene interaction networks from bulk transcriptomic profiles and predicts patient vital status.
REGEN employs an efficient kNN-based method to perform graph-level classification tasks. It uses a standard gradient-based interpretability pipeline built on the Integrated Gradients algorithm.
□ Rastair: an integrated variant and methylation caller
https://www.biorxiv.org/content/10.64898/2026.03.19.712983v1
Rastair, an integrated software toolkit for simultaneous SNP detection and methylation calling from mC→T sequencing data such as those created with Watchmaker's TAPS+ and Illumina's 5-Base chemistries.
Rastair combines machine-learning-based variant detection with genotype-aware methylation estimation, adjusting the estimated methylation depending on patient genotype for C→T SNPs at CpG sites.
Rastair explicitly reports "de novo" CpG positions formed by SNPs, yielding nearly 500,000 additional CpG positions relative to the reference sequence for a typical human genome.
□ AutoGERN: Single-Cell RNA-Seq Gene Regulatory Network Inference via Explicit Link Modeling and Adaptive Architectures
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag143/8537914
AutoGERN explicitly models regulatory information in the message-passing space by learning expressive link (edge) embeddings, which are subsequently scored by a lightweight multilayer perceptron to infer TF-target interactions.
AutoGERN integrates two complementary message-passing spaces, intra-layer and inter-layer, capturing regulatory dependencies at multiple levels.
AutoGERN employs a robust, AutoGNN-based architecture search procedure that adapts the GNN design to the distributional characteristics of each dataset, mitigating the brittleness of one-size-fits-all architectures.
□ NLCD: A method to discover nonlinear causal relations among genes
https://www.biorxiv.org/content/10.64898/2026.03.20.713150v1
NLCD (NonLinear Causal Discovery) employs conditional importance scoring of features in a general nonlinear regression model, in order to generalize a key conditional independence test of causality from linear to nonlinear settings.
NLCD extends a linear causal discovery method, Causal Inference Test (CIT), by conducting a series of permutation-based statistical tests of causality in the nonlinear setting.
The maximum p-value from these tests helps decide if data on a given triplet supports causality vs. independence between the traits. NLCD can flexibly work with any nonlinear regression model - Kernel Ridge Regression, Support Vector Regression, Artificial Neural Network models.
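A minimal sketch of a permutation-based nonlinear conditional test in the spirit described above, not NLCD's actual procedure: ask whether x predicts y beyond z by comparing the residual-sum-of-squares gain of a nonlinear fit (here a simple polynomial regressor standing in for kernel ridge, SVR, or a neural network) against gains obtained with permuted x.

```python
import numpy as np

def _poly_design(cols, degree=3):
    """Polynomial features of each column: a simple nonlinear regressor."""
    feats = [np.ones(len(cols[0]))]
    for c in cols:
        for d in range(1, degree + 1):
            feats.append(c ** d)
    return np.column_stack(feats)

def _rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def perm_test_extra_signal(y, z, x, n_perm=200, seed=0):
    """Permutation p-value for whether x predicts y beyond z.

    Small p suggests conditional dependence of y on x given z.
    """
    rng = np.random.default_rng(seed)
    base = _rss(_poly_design([z]), y)
    gain = base - _rss(_poly_design([z, x]), y)
    null = [base - _rss(_poly_design([z, rng.permutation(x)]), y)
            for _ in range(n_perm)]
    return (1 + sum(g >= gain for g in null)) / (1 + n_perm)

# Synthetic check: y_dep depends nonlinearly on x given z; y_ind does not.
rng = np.random.default_rng(42)
z = rng.normal(size=200)
x = rng.normal(size=200)
y_dep = np.sin(z) + 0.5 * x ** 2 + 0.1 * rng.normal(size=200)
y_ind = np.sin(z) + 0.1 * rng.normal(size=200)
p_dep = perm_test_extra_signal(y_dep, z, x)
p_ind = perm_test_extra_signal(y_ind, z, x)
```

Swapping `_poly_design` for any fitted nonlinear model is what makes this family of tests regressor-agnostic, as the summary notes.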
□ GraPhens: Solving the Diagnostic Odyssey with Synthetic Phenotype Data
https://www.biorxiv.org/content/10.64898/2026.03.19.712946v1
GraPhens, a simulation framework that uses gene-local HPO structure together with two empirically motivated soft priors (over the number of observed phenotypes per case and over phenotype specificity) to generate synthetic phenotype–gene pairs that are novel yet clinically plausible.
These simulated cases are used to train GenPhenia, a graph neural network that operates on patient-specific phenotype subgraphs rather than flat phenotype sets. GenPhenia trained entirely on synthetic cases outperforms existing phenotype-driven prioritization methods.
□ FFC: A Scalable FASTA Compressor
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag132/8538003
FFC (Fast FASTA Compressor) is a highly optimized multi-threaded tool, based on simple but DNA-dedicated ideas, that prioritizes high compression and decompression speed over maximal compression ratio.
FFC achieves average compression speeds 4.7x and 11.4x higher than two high-performance compressors, zstd and NAF, respectively, across a benchmark set of seven single genomes. It also delivers average decompression speeds 3.5x and 2.7x higher than zstd and NAF, respectively.
□ PACMON: Pathway-guided Multi-Omics data integration for interpreting large-scale perturbation screens
https://www.biorxiv.org/content/10.64898/2026.03.20.713295v1
PACMON (Pathway guided Multi-Omics data integration for interpreting large-scale perturbation screens), a Bayesian latent factor model that jointly infers pathway-level programs and their modulation by experimental perturbations.
PACMON decomposes multimodal molecular measurements into shared latent factors aligned with known biological pathways through structured sparsity priors, while simultaneously estimating how each perturbation activates or represses these pathway programs.
□ dreampy: Pseudobulk mixed-model differential expression for single-cell RNA-seq in Python
https://www.biorxiv.org/content/10.64898/2026.03.21.713408v1
dreampy, a native Python implementation of the dreamlet pseudobulk mixed-model workflow. dreampy reimplements the full pipeline — from pseudobulk aggregation and TMM normalization through voom precision weighting, mixed-model fitting, and empirical Bayes moderation.
dreampy supports both fixed-effect and random-effect formulas, dispatching to ordinary least squares or restricted maximum likelihood estimation as appropriate, and exposes every pipeline stage as an individual, inspectable function call.
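The first stage of that pipeline, pseudobulk aggregation, can be sketched without any framework: sum per-cell counts within each (sample, cell type) stratum. This is a generic illustration, not dreampy's API.

```python
from collections import defaultdict

def pseudobulk(counts, samples, cell_types):
    """Aggregate per-cell counts into per-(sample, cell type) pseudobulk sums.

    counts:     list of per-cell gene-count vectors (equal length)
    samples:    per-cell sample labels
    cell_types: per-cell cell-type labels
    Returns {(sample, cell_type): summed count vector}.
    """
    agg = defaultdict(list)
    for vec, s, ct in zip(counts, samples, cell_types):
        key = (s, ct)
        if not agg[key]:
            agg[key] = list(vec)
        else:
            agg[key] = [a + b for a, b in zip(agg[key], vec)]
    return dict(agg)
```

Downstream normalization, voom weighting, and mixed-model fitting then treat each pseudobulk vector as one observation, which is what restores proper sample-level replication.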
□ ‘Zombie cells’ return from the dead — after a genome transplant
https://www.nature.com/articles/d41586-026-00938-6
Researchers have resurrected ‘dead’ bacterial cells by replacing their defunct DNA with the working genome of another species.
After killing Mycoplasma capricolum cells by chemically crosslinking their genome with Mitomycin C (MMC), they installed synthetic Mycoplasma mycoides genomes into the resulting dead cells using Whole Genome Transplantation (WGT).
During WGT, a synthetic donor genome is placed into a recipient cell, thereby reprogramming that cell to adopt a new genetic identity.
□ scellop: A Scalable Redesign of Cell Population Plots for Single-Cell Data
https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbag083/8533210
Scellop (previously CellPop) is an interactive visualization tool for cell-type compositions. Scellop provides a flexible heatmap with side views showing layered bar charts.
scellop is implemented in React, using visx to incorporate D3-based visualizations for various scales and axis rendering. scellop supports all desired interactions identified from our design study, including normalization, grouping and filtering.
□ PerturbGraph: Predicting Unseen Gene Perturbation Response Using Graph Neural Networks with Biological Priors
https://www.biorxiv.org/content/10.64898/2026.03.23.713780v1
PerturbGraph represents perturbations as stable transcriptional shift programs derived from pseudo-bulk perturbation signatures. PerturbGraph propagates information across the interaction network using message passing.
PerturbGraph integrates multiple sources of biological knowledge, including STRING interactions, graph embeddings obtained via Node2Vec, baseline transcriptional statistics, and Gene Ontology functional annotations.
□ Tripso: Self-supervised learning for a gene program-centric view of cell states
https://www.biorxiv.org/content/10.64898/2026.03.24.713961v1
Tripso (Transformers for learning Representations of Interpretable gene Programs in Single-cell transcriptOmics) learns contextualized gene, GP, and cell embeddings, quantifies gene- and GP-level contributions to GPs and cell identity respectively, and enables discovery of novel GPs.
By combining the interpretability of GPs with the flexibility of transformer architectures, Tripso supports principled comparison of cellular states across tissues and conditions.
Tripso provides a scalable foundation for interpretable single-cell modeling that enables hypothesis generation and experimental optimization across development, disease, and engineered in vitro systems, and supports the development of interpretable virtual cell models.
□ Emergent Biological Realism in RL-Trained DNA Language Models
https://www.biorxiv.org/content/10.64898/2026.03.24.713963v1
Reinforcement learning substantially increases the probability of generating plasmids that pass their bioinformatics quality control (QC) pipeline, while supervised fine-tuning provides modest improvements.
When prompted with a simple start codon, the base model rarely produces a valid plasmid, whereas SFT provides modest improvements, and RL dramatically increases pass rates across all prompt types.
A similar pattern is observed in codon distribution and Gibbs free energy. Although not included in the reward function, the RL model generates tokens with codon distributions more similar to real plasmids than models trained on the correct distribution.
□ Hypergraph Representations of Single-Cell RNA Sequencing Data for Improved Cell Clustering
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag148/8551568
In a hypergraph random walk process, the walker alternates between transitioning from a hyperedge to a node and from a node to a hyperedge, following the node-to-edge transition probabilities and edge-to-node transition probabilities.
Dual-Importance Preference Hypergraph Walk (DIPHW) is a new hypergraph-based random walk algorithm that computes cell embeddings by considering the relative importance of genes to cells and cells to genes, incorporating a preference exponent to facilitate clustering.
CoMem-DIPHW (Co-expression and Memory-Integrated DIPHW) integrates two unipartite projections, the gene co-expression and cell co-expression networks, along with the cell-gene expression hypergraph derived from single-cell abundance count data into the random walk model.
CoMem-DIPHW leverages cell and gene co-expression networks to account for previously visited nodes and edges during transitions to capture both local expression relationships and global co-expression patterns.
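The alternating node-to-hyperedge / hyperedge-to-node walk described above can be sketched with uniform transition probabilities; DIPHW's importance weighting, preference exponent, and memory are omitted, so this is only the skeleton of the idea.

```python
import random

def hypergraph_walk(hyperedges, start, steps, seed=0):
    """Alternating hypergraph random walk.

    From a node, pick an incident hyperedge uniformly; from the
    hyperedge, pick a member node uniformly.
    hyperedges: {edge_id: set_of_nodes}
    Returns the sequence of visited nodes.
    """
    rng = random.Random(seed)
    incident = {}
    for e, nodes in hyperedges.items():
        for n in nodes:
            incident.setdefault(n, []).append(e)
    path = [start]
    node = start
    for _ in range(steps):
        e = rng.choice(sorted(incident[node]))     # node -> hyperedge
        node = rng.choice(sorted(hyperedges[e]))   # hyperedge -> node
        path.append(node)
    return path
```

In the single-cell setting, cells are nodes and genes (or gene sets) are hyperedges, so co-visitation frequencies along such walks become the raw material for cell embeddings.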
□ A run-length-compressed skiplist data structure for dynamic GBWTs supports time and space efficient pangenome operations over syncmers
https://www.biorxiv.org/content/10.64898/2026.03.26.714584v1
A computational framework for efficient computation over graph Burrows–Wheeler transforms (GBWTs), the counterpart of the PBWT for pangenome graphs, together with a practical implementation.
Two doubly-linked skiplist variants over run-length-compressed BWTs support O(log N) rank, access, and insert operations. These structures are used to store and search paths through a syncmer graph built from Robert C. Edgar's closed syncmers, equivalent to a sparse de Bruijn graph.
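Closed syncmers, per Edgar's definition, are k-mers whose lexicographically smallest s-mer sits at the first or last position; sampling only those k-mers gives the sparse graph the summary mentions. A minimal reference implementation:

```python
def is_closed_syncmer(kmer, s):
    """Closed syncmer (Edgar): the smallest s-mer of the k-mer
    occupies its first or last position."""
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    m = min(smers)
    return smers[0] == m or smers[-1] == m

def closed_syncmers(seq, k, s):
    """Start positions of closed-syncmer k-mers in seq."""
    return [i for i in range(len(seq) - k + 1)
            if is_closed_syncmer(seq[i:i + k], s)]
```

Unlike minimizers, syncmer selection depends only on the k-mer itself, not its neighbors, which makes the sampling stable under read errors elsewhere in the window.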
□ Panmap: Scalable phylogeny-guided alignment, genotyping, and placement on pangenomes
https://www.biorxiv.org/content/10.64898/2026.03.29.711974v1
Panmap, a tool that leverages evolutionary structure to place, align, and genotype sequencing reads against mutation-annotated pangenomes containing up to millions of genomes.
Panmap introduces a phylogenetically compressed k-mer index that stores only sequence differences along branches, enabling efficient comparison of reads to both sampled genomes and inferred ancestors.
□ CCIDeconv: Hierarchical model for deconvolution of subcellular cell-cell interactions in single-cell data
https://www.biorxiv.org/content/10.64898/2026.03.26.714643v1
CCIDeconv effectively deconvolutes CCI scores into interactions primarily occurring in the nucleus or cytoplasm, leveraging information from sST. CCIDeconv provides insight into the location of CCIs and can therefore be used to further refine their communication scores.
The CCIDeconv communication score is a modified score derived from CellChat, with LR pairs from the CellChatDB.human database used to calculate the scores. The architecture is a hierarchical supervised machine-learning framework with a classification model followed by two regression models.
□ BCAR: A fast and general barcode-sequence mapper for correcting sequencing errors
https://www.biorxiv.org/content/10.64898/2026.03.27.714882v1
BCAR (Barcode Collapse by Aligning Reads) quickly aligns the sequencing reads associated with each barcode in a dataset and generates a maximum-likelihood consensus sequence, including confidence scores for each base call.
Reads are represented as arrays with entries corresponding to quality scores, then progressively aligned using a modified implementation of the Needleman-Wunsch algorithm.
At each step, the new read is aligned against the current consensus, using a scaled cosine similarity to determine match scores between ambiguous positions. Gaps are removed from the final alignment during consensus generation.
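The final consensus step can be sketched as quality-weighted column voting over already-aligned reads; this simplified version skips BCAR's progressive Needleman-Wunsch stage and cosine-similarity scoring, assuming the reads arrive pre-aligned with '-' for gaps.

```python
def consensus(aligned_reads, quals):
    """Quality-weighted consensus over aligned reads.

    aligned_reads: equal-length strings, '-' marking gaps
    quals:         matching per-base quality weights
    Returns (consensus_string, per-base confidence); columns whose
    winning symbol is a gap are dropped from the output.
    """
    out, conf = [], []
    for i in range(len(aligned_reads[0])):
        scores = {}
        for read, q in zip(aligned_reads, quals):
            scores[read[i]] = scores.get(read[i], 0) + q[i]
        base = max(scores, key=scores.get)
        if base != "-":
            out.append(base)
            conf.append(scores[base] / sum(scores.values()))
    return "".join(out), conf
```

Weighting votes by quality is what makes the consensus approximately maximum-likelihood: a single high-quality base can outvote several low-quality disagreements.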