□ EDEN: an evolution-scale DNA foundation model for designing programmable therapeutics
EDEN (environmentally-derived evolutionary network) includes a 28 billion parameter model trained on 9.7 trillion nucleotide tokens from BaseData. This dataset, at the time of training, contained more than 10 billion novel genes from over 1 million new species.
EDEN can generate diverse large serine recombinases (LSRs) when prompted with only the first 30% of a protein sequence, and it extends this capability to the design of site-specific recombinases guided by a short DNA prompt containing only the desired genomic target site.
□ A cross-population compendium of gene–environment interactions
https://www.nature.com/articles/s41586-025-10054-6
A cross-population atlas of gene–environment interactions comprising 440,210 individuals from European and Japanese populations, with replication in 539,794 individuals from diverse populations.
By decomposing the contributions from age, sex and lifestyles, we delineate the aetiology of these gene–environment interactions, including a reverse-causality from a disease-related dietary change.
□ BIOS: Rethinking the AI Scientist: Interactive Multi-Agent Workflows for Scientific Discovery
https://x.com/bioaidevs/status/2016544598529638492
BIOS is powered by Deep Research, an iterative workflow that completes research cycles in minutes rather than hours. Where batch-processing systems like Kosmos require extended runtimes before producing results, Deep Research lets you steer the investigation as insights emerge.
BIOS incorporates a specialized data analysis agent designed to autonomously process datasets through iterative code generation and execution. This subsystem operates through a multi-node workflow architecture that decomposes complex analytical tasks into discrete computation.
□ AlphaGenome: Advancing regulatory variant effect prediction
https://www.nature.com/articles/s41586-025-10014-0
AlphaGenome advances efforts to decipher the regulatory code of the genome, offering a unified sequence model that simultaneously predicts diverse functional genomic signals from megabase-scale DNA sequences.
AlphaGenome learns to reproduce predictions from frozen all-fold teacher models using augmented and mutationally perturbed input sequences. AlphaGenome simultaneously scores variant impacts across all predicted modalities in a single inference pass.
□ EDEN: multiscale expected density of nucleotide encoding for enhanced DNA sequence classification with hybrid deep learning
https://link.springer.com/article/10.1186/s12859-026-06367-6
EDEN (Expected Density of Nucleotide Encoding), a unified multiscale encoding framework based on kernel density estimation (KDE). EDEN captures position-specific and context-dependent nucleotide patterns and integrates them into a hybrid deep learning architecture.
EDEN transforms symbolic sequences into spatially-aware density profiles using KDE. This representation bridges the conceptual divide between one-hot encoding and k-mer analysis by modeling nucleotide distributions across multiple biologically-relevant scales.
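To make the idea of a density profile concrete, here is a minimal sketch, assuming a Gaussian kernel and one channel per (base, bandwidth) pair. The function names, the bandwidth values, and the unnormalized sum of kernels are illustrative assumptions, not the encoding defined in the paper.

```python
import math

def kde_density_profile(seq, base, bandwidth):
    """Sum of Gaussian kernels centred on each occurrence of `base`,
    evaluated at every sequence position (assumed kernel choice)."""
    positions = [i for i, b in enumerate(seq) if b == base]
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    profile = []
    for x in range(len(seq)):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        profile.append(total / norm)
    return profile

def multiscale_encoding(seq, bandwidths=(1.0, 3.0, 9.0)):
    """One density channel per (base, bandwidth); bandwidths are hypothetical."""
    return {(base, h): kde_density_profile(seq, base, h)
            for base in "ACGT" for h in bandwidths}

encoding = multiscale_encoding("ACGTTTGACA")
t_profile = encoding[("T", 1.0)]
# Position with the highest T density at the finest scale
peak = max(range(len(t_profile)), key=t_profile.__getitem__)
```

Small bandwidths behave much like a smoothed one-hot encoding, while larger ones summarize local composition in the way k-mer counts do, which is the bridge the summary describes.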
□ Negative global-scale association between genetic diversity and speciation rates in mammals
https://www.nature.com/articles/s41467-025-56820-y
This study provides a thorough characterization of the relationship between genetic diversity and speciation by assembling a dataset encompassing the entire extant mammalian radiation.
The dataset is based on a mitochondrial gene (cytochrome b). Compared to nuclear genes, mitochondrial genes are characterized by high mutation rates, low effective population sizes, strong purifying selection, the absence of recombination, and strong linkage.
□ BiOmics: A Foundational Agent for Grounded and Autonomous Multi-omics Interpretation
https://www.biorxiv.org/content/10.64898/2026.01.17.699830v1
BiOmics is a foundational agent that introduces a novel dual-track architecture, comprising a harmonized explicit reasoning space for grounded logic and a unified latent embedding space for high-dimensional association mapping.
BiOmics enables a transformative "Retrieving-Reasoning-Predicting" paradigm for purposeful, cross-scale inference traversing the biological hierarchy, from molecular variants to disease phenotypes.
□ scAURA: Alignment- and Uniformity-based Graph Debiased Contrastive Representation Architecture for Self-Supervised Clustering of Single-Cell Transcriptomics
https://www.biorxiv.org/content/10.64898/2026.01.25.701579v1
scAURA (single cell Alignment- and Uniformity-based Graph Debiased Contrastive Representation Architecture), a unified framework that integrates graph debiased contrastive learning with self-supervised clustering.
scAURA learns latent representations that are robust to noisy or biased graph construction while iteratively refining cluster assignments. scAURA employs an adaptive k-nearest neighbor (kNN) strategy that dynamically adjusts neighborhood size to capture rare cell-type clusters.
□ cellGeometry: ultra-fast single-cell deconvolution of bulk RNA-Seq using a geometric solution
https://www.biorxiv.org/content/10.64898/2026.01.24.701240v1
cellGeometry employs non-negative geometric deconvolution (NGD), an intuitive vector projection method featuring non-negative matrix regularisation in high-dimensional gene space. Because it relies on matrix operations, it scales to massive datasets and is ultrafast.
Sphere scaling of genes means that each gene has equal weighting in the vector projection, so that the process is not dominated by the most highly expressed genes. Here, genes are dimensions, in contrast to signature generation where cell types are dimensions.
□ Chromnitron: Decoding the gene regulatory landscape through multimodal learning of protein-DNA interactions
https://www.biorxiv.org/content/10.1101/2025.08.17.670761v2
Chromnitron (Chromatin omni-modal transformer), a biologically grounded multimodal foundation model that learns the principles of protein-DNA interaction by integrating 3 core modalities: DNA sequence, cell-type-specific chromatin accessibility, and protein amino acid sequences.
Chromnitron reconstructs causal regulatory programs and discovers key regulatory factors during cell fate transition.
□ OptMini: Generating minimum-density minimizers
https://www.biorxiv.org/content/10.64898/2026.01.25.701585v1
OptMini is an optimized search method that finds a minimum-density minimizer in time linear in w and doubly exponential in k (solving the ILP is doubly exponential in (k+w)).
OptMini can compute the average density over all minimizers in the same runtime. In practice, OptMini runs much faster than this bound predicts, thanks to several additional tricks that shrink the search space without harming optimality.
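For readers unfamiliar with the quantity being minimized, the following sketch measures the density of a minimizer scheme for a given k-mer order. It does not implement OptMini's search. The lexicographic default order, the leftmost tie-break, and the function names are assumptions for illustration.

```python
import random

def minimizer_positions(seq, k, w, order=None):
    """Positions of the k-mers selected as minimizers: in every window of
    w consecutive k-mers, pick the smallest under `order` (leftmost on ties)."""
    order = order or (lambda kmer: kmer)  # lexicographic order by default
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (order(kmers[i]), i)))
    return selected

def density(seq, k, w, order=None):
    """Fraction of k-mer positions that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers

random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(20000))
d = density(random_seq, k=5, w=8)
```

Every window must contribute at least one selected position, so the density of any scheme cannot fall much below 1/w; a minimum-density minimizer is an order on k-mers that gets as close to that floor as possible.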
□ NanoSimFormer: An end-to-end Transformer-based simulator for nanopore sequencing signal data
https://www.biorxiv.org/content/10.64898/2026.01.20.700442v1
NanoSimFormer leverages a frozen basecalling model as a discriminator and is optimized via a multi-objective training strategy to produce signals specifically tuned for accurate basecalling.
NanoSimFormer derives the nucleotide sequences and low-resolution sequence-signal alignments, which map blocks of 6 signal timepoints to basecalled bases, thereby eliminating reliance on static k-mer pore models.
□ X-intNMF: A Cross- and Intra-Omics Regularized NMF Framework for Multi-Omics Integration
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag046/8442894
X-intNMF, a network-regularized NMF framework that simultaneously integrates intra- and cross-omics feature interactions into a shared low-dimensional representation.
X-intNMF leverages feature-feature interaction networks to model both within-layer and between-layer relationships, enabling the incorporation of known biological interactions such as mRNA-miRNA regulatory links.
□ STransfer: A Transfer Learning-Enhanced Graph Convolutional Network for Clustering Spatial Transcriptomics Data
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag049/8442896
STransfer employs a dual-graph strategy to model both local spatial relationships using standard GCNs and global statistical dependencies through positive pointwise mutual information (PPMI)-based adjacency matrices.
STransfer captures fine-grained spatial interactions and long-range co-expression patterns. An attention-based module fuses features from multiple graphs into unified node embeddings, enabling low-dimensional embeddings that jointly encode gene expression and spatial context.
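As a reminder of what a PPMI-based adjacency matrix looks like, here is a minimal sketch that converts a co-occurrence count matrix into PPMI weights. How STransfer obtains the counts is not stated in this summary, so the toy `counts` matrix and the function name are assumptions.

```python
import math

def ppmi_matrix(counts):
    """Positive pointwise mutual information:
    PPMI[i][j] = max(0, log(p_ij / (p_i * p_j)))."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    ppmi = [[0.0] * len(counts[0]) for _ in counts]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # zero counts stay at 0 (PMI would be -inf)
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                ppmi[i][j] = max(0.0, pmi)
    return ppmi

# Hypothetical symmetric co-occurrence counts between three spots
counts = [[0, 8, 1],
          [8, 0, 4],
          [1, 4, 0]]
ppmi = ppmi_matrix(counts)
```

Pairs that co-occur less often than chance would predict are clipped to zero, so the resulting graph keeps only statistically enriched associations, which need not be spatial neighbors.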
□ TeraTools: RLBWT-Based LCP Computation in Compressed Space for Terabase-Scale Pangenome Analysis
https://www.biorxiv.org/content/10.64898/2026.01.23.701410v1
TeraTools provides algorithms for constructing run-length Burrows-Wheeler transform (RLBWT)-based compressed full-text indexes and their supporting data structures in compressed space.
The algorithms have a space complexity of O(r) words and run in O(n) time for repetitive datasets, where r is the number of runs in the BWT, n is the length of the text, and a dataset counts as repetitive when its average run length is at least log n.
At the core of the approach is a novel O(n)-time, O(r)-space algorithm for computing irreducible PLCP values from a sample of the inverse suffix array, enabling computation of LCP thresholds and summaries for advanced matching-statistics and maximal exact match queries.
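The parameter r is easiest to appreciate on a toy input. The sketch below builds a BWT by naive suffix sorting and run-length encodes it; it is a quadratic-space illustration of what r measures, not the compressed-space construction the paper describes, and the function names are assumptions.

```python
def bwt(text):
    """Burrows-Wheeler transform via naive suffix sorting (toy sizes only)."""
    text = text + "$"  # sentinel, smaller than any letter in ASCII
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in suffix_array)

def run_length_encode(s):
    """Collapse consecutive equal characters into (char, length) runs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(c, n) for c, n in runs]

repetitive = "ACGT" * 8
transformed = bwt(repetitive)
runs = run_length_encode(transformed)
n, r = len(transformed), len(runs)
```

For this highly repetitive text, n is 33 while r is only 5, which is why structures of size O(r) can be far smaller than the input on pangenome-like data.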
□ COSMIC: Generative modeling reveals the connection between cellular morphology and gene expression
https://www.biorxiv.org/content/10.64898/2026.01.22.700673v1
COSMIC, a bidirectional generative framework that enables quantitative decomposition of transcriptional variance reflected in morphology and morphological variance explained by gene expression.
COSMIC builds on a foundation model trained on over 21 million segmented nuclei and couples it with existing transcriptomic embeddings. COSMIC leverages a newly generated multimodal dataset acquired using IRIS.
□ Sidewinder: Construction of complex and diverse DNA sequences using DNA three-way junctions
https://www.nature.com/articles/s41586-025-10006-0
Sidewinder, a DNA-assembly technique based on the DNA three-way junction (3WJ) that can be reliably applied towards the construction of any DNA sequence without limitation.
The Sidewinder helix winds up orthogonally on the side of the final assembled sequence. It is not part of the final assembled sequence and therefore removes constraints on where assembly occurs, what sequences are being assembled and how many DNA fragments can be assembled at once.
□ A comprehensive survey of genome language models in bioinformatics
https://academic.oup.com/bib/article/27/1/bbaf724/8426124
A comprehensive survey of contemporary gLM architectures, including Transformer models, Hyena convolutions, and state space models, as well as various sequence tokenization strategies, assessing their applicability and effectiveness across diverse genomic applications.
Single-nucleotide embedding is a fundamental tokenization strategy in gLMs: each nucleotide is treated as a token in a vocabulary, and is then embedded into a learnable dense vector.
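A minimal sketch of single-nucleotide tokenization followed by an embedding lookup is shown here. The five-symbol vocabulary, the embedding dimension, and the random initialization are assumptions; in a real gLM the table would be a trainable parameter updated by gradient descent.

```python
import random

# Hypothetical vocabulary: four bases plus N for anything else
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

def tokenize(seq):
    """Map each nucleotide to its integer token id."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]

def init_embedding(vocab_size, dim, seed=0):
    """Randomly initialised embedding table (one dense vector per token)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Embedding lookup: replace each token id with its dense vector."""
    return [table[t] for t in token_ids]

table = init_embedding(len(VOCAB), dim=8)
vectors = embed(tokenize("ACGA"), table)
```

The trade-off against k-mer or BPE tokenization is sequence length: single-nucleotide tokens preserve base-pair resolution but produce one token per base.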
□ Quantum spin resonance in engineered proteins for multimodal sensing
https://www.nature.com/articles/s41586-025-09971-3
MagLOV is a class of magneto-sensitive fluorescent proteins. Through directed evolution, it is possible to engineer these proteins to alter the properties of their response to magnetic fields and radio frequencies.
MagLOV exhibits optically detected magnetic resonance in living cells, at sufficiently high signal-to-noise for single-cell detection. These effects are explained through the radical-pair mechanism, which involves the protein backbone and a bound flavin cofactor.
□ A holocentric pangenome links karyotype evolution to meiotic recombination
https://www.biorxiv.org/content/10.64898/2026.01.17.700048v2
Chromosomal rearrangements frequently driven by Tyba-mediated fusions and fissions reconfigure recombination landscapes by altering chromosome size, chromatin loop architecture, and synapsis dynamics.
These chromosome fissions create karyotypes with smaller chromosomes folded into shorter loops, thereby increasing the axial substrate accessible for double-strand break formation and elevating recombination frequency.
□ JanusX: an integrated and high-performance platform for scalable genome-wide association studies and genomic selection
https://www.biorxiv.org/content/10.64898/2026.01.20.700366v1
JanusX significantly reduces memory overhead and computational time by restructuring the Linear Mixed Model (LMM) algorithms and implementing chunk-based streaming with multi-core parallel computing.
□ VAETracer: Mutation-Guided Lineage Reconstruction and Generational State Inference from scRNA-seq
https://www.biorxiv.org/content/10.64898/2026.01.19.700238v1
VAETracer reconstructs cellular lineages by extracting cellular generation index (CGI) from mutation profiles of 3′ UTR in scRNA-seq data, thus enabling the inference of developmental trajectories without relying on noisy mutation signals.
□ SPSID: A single-parameter shrinkage inverse-diffusion for denoising gene regulatory networks
https://www.biorxiv.org/content/10.64898/2026.01.19.700249v1
SPSID (Single-Parameter Shrinkage Inverse-Diffusion) employs a principled spectral filter, built upon a shrinkage-regularized inverse-diffusion operator, to mathematically distinguish direct, one-step interactions from multi-step, indirect paths.
□ orthogene: a Bioconductor package to easily map genes within and across hundreds of species
https://www.biorxiv.org/content/10.64898/2026.01.17.700094v1
orthogene integrates automated species and identifier standardization, homolog inference across multiple databases, flexible handling of ambiguous homolog relationships, and transformation of gene lists, tables, and high-dimensional matrices into analysis-ready formats.
□ DIVAS: an R package for identifying shared and individual variations of multiomics data
https://www.biorxiv.org/content/10.64898/2026.01.12.698985v1
DIVAS (Data Integration Via Analysis of Subspaces) employs angle-based subspace analysis with rigorous statistical inference through rotational bootstrap. DIVAS hierarchically searches through all possible combinations of data modalities, providing complete decomposition of multiomics data into interpretable components.
□ Stack: In-Context Learning of Single-Cell Biology
https://www.biorxiv.org/content/10.64898/2026.01.09.698608v1
Stack, a foundation model trained on 149 million uniformly preprocessed human single cells that leverages tabular attention to generate representations for each cell informed by the cells in its context.
Stack makes use of a rectangular mask pre-training task that prevents simple imputation shortcuts and enforces single-cell level resolution. Within a mini-batch, a randomly sampled list of genes is masked for all cells.
Stack abstracts the latent state of each cell as an ensemble of token vectors. Tokens are generated by projecting gene expression vectors into a latent space. The module is trained end-to-end alongside the rest of the model, without relying on external gene semantic information.
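The rectangular mask described above can be sketched as follows: one gene subset is sampled per mini-batch and hidden in every cell, so the masked entries form a rectangle in the cells-by-genes matrix. The mask fraction, the `None` placeholder, and the function name are assumptions for illustration.

```python
import random

def rectangular_mask(batch, mask_fraction=0.3, mask_value=None, seed=None):
    """Mask the same randomly sampled genes (columns) in every cell (row)."""
    rng = random.Random(seed)
    n_genes = len(batch[0])
    n_masked = max(1, round(mask_fraction * n_genes))
    masked_genes = set(rng.sample(range(n_genes), n_masked))
    masked = [
        [mask_value if g in masked_genes else value for g, value in enumerate(cell)]
        for cell in batch
    ]
    return masked, sorted(masked_genes)

# Toy mini-batch: 3 cells x 10 genes
batch = [[float(c * 10 + g) for g in range(10)] for c in range(3)]
masked_batch, masked_genes = rectangular_mask(batch, mask_fraction=0.3, seed=1)
```

Because a masked gene is hidden in all context cells at once, the model cannot simply copy its value from a neighboring cell, which is the imputation shortcut the task is designed to block.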

□ CIPHER: An end-to-end framework for designing optimized aggregated spatial transcriptomics experiments
https://www.biorxiv.org/content/10.64898/2026.01.08.698503v1
CIPHER (Cell Identity Projection using Hybridization Encoding Rules), a neural-network framework that jointly optimizes the experimental encoding matrix and the downstream cell-type embedding.
CIPHER learns to encode high-dimensional gene expression data into a low-dimensional "bit" representation (projection space) that can be measured using multiplexed in situ hybridization probes.
CIPHER incorporates physical limits of imaging assays directly into its loss function, shaping the latent space to maximize discriminability. It performs a linear projection from gene space to latent space, directly mirroring the physical encoding of gene combinations in situ.
□ esloco: simulation-based estimation of local coverage in long-read DNA sequencing
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag009/8418384
esloco, a Monte Carlo-based simulation framework for estimating local coverage in long-read sequencing experiments, including scenarios with unknown target regions (e.g. viral integration, CRISPR-Cas9) or PCR-free designs (e.g. base modifications).
The simulation models the inherent variability in local coverage based on predefined whole-genome coverage levels and read length distributions.
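A Monte Carlo estimate of local coverage can be sketched in a few lines. This is a simplified illustration, assuming uniformly placed reads, an empirical list of read lengths, and "coverage" counted as reads overlapping the target; none of the names or defaults come from esloco itself.

```python
import random
import statistics

def simulate_local_coverage(genome_len, target, coverage, read_lengths,
                            n_iter=200, seed=0):
    """Per-iteration count of reads overlapping `target` (start, end),
    given a whole-genome coverage level and a read length distribution."""
    rng = random.Random(seed)
    t_start, t_end = target
    n_reads = int(coverage * genome_len / statistics.mean(read_lengths))
    results = []
    for _ in range(n_iter):
        hits = 0
        for _ in range(n_reads):
            length = rng.choice(read_lengths)
            start = rng.randrange(genome_len - length + 1)
            if start < t_end and start + length > t_start:
                hits += 1
        results.append(hits)
    return results

local = simulate_local_coverage(
    genome_len=1_000_000, target=(500_000, 501_000),
    coverage=10, read_lengths=[5_000, 10_000, 20_000],
)
mean_local = statistics.mean(local)
spread = statistics.stdev(local)
```

The spread across iterations is the useful output: it shows how often a run at the chosen genome-wide coverage would leave the region of interest with too few reads.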
□ veloAgent: Dissecting and steering cell dynamics using spatially-informed RNA velocity
https://www.biorxiv.org/content/10.64898/2026.01.09.698589v1
veloAgent, a deep generative and agent-based framework that estimates gene- and cell-specific transcriptional kinetics while integrating spatial information through agent-based simulations of local microenvironments.
veloAgent improves velocity accuracy and achieves sublinear memory scaling, enabling efficient analysis of large and multi-batch spatial datasets.
veloAgent incorporates an in silico perturbation module that enables targeted manipulation of spatial velocity vectors to simulate regulatory interventions and predict their impact on cell fate dynamics.
□ scSNViz: Visualization and analysis of Cell-Specific expressed SNVs
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag023/8425595
scSNViz, a tool for the exploration, quantification, and visualization of expressed SNVs from cell-barcoded scRNA-seq. It supports estimation of variant allele fractions, clustering of SNV expression profiles, and 2D/3D visualization of individual SNVs or user-defined SNV groups.
scSNViz facilitates investigation of cell-, cluster-, or lineage-specific variant expression patterns, as well as allelic dynamics including imprinting, random allele inactivation, and transcriptional bursting.
scSNViz interoperates seamlessly with established single-cell frameworks - Seurat for clustering, Slingshot for trajectory inference, scType for cell-type annotation, and CopyKat for copy-number profiling - enabling integrative multi-omic analyses of expressed variation.
□ NetCrafter: Ontology-derived gene network modeling and interpretation
https://www.biorxiv.org/content/10.64898/2026.01.16.699831v1
NetCrafter transforms enrichment results into quantitative semantic similarity scores between genes, enabling the creation of context-specific statistical networks. These networks can be further decomposed into optimal sub-networks, facilitating multi-functional interpretation.
NetCrafter calculates gene–gene semantic similarity as the probabilistic sum of all overlapping ontology terms; networks with higher semantic similarity are defined by applying increasingly stringent cutoffs and are visualized using a force-directed layout algorithm.
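The probabilistic sum combines scores as 1 - prod(1 - s_i). The sketch below applies it to the ontology terms shared by each gene pair and then thresholds the result; the per-term scores in [0, 1], the toy inputs, and the function names are assumptions, since the summary does not say how terms are scored.

```python
from itertools import combinations

def probabilistic_sum(scores):
    """1 - prod(1 - s) for scores in [0, 1]; an empty input gives 0."""
    remaining = 1.0
    for s in scores:
        remaining *= (1.0 - s)
    return 1.0 - remaining

def gene_similarity(term_scores, gene_terms):
    """Similarity of each gene pair from the terms both genes are annotated to."""
    sims = {}
    for g1, g2 in combinations(sorted(gene_terms), 2):
        shared = gene_terms[g1] & gene_terms[g2]
        sims[(g1, g2)] = probabilistic_sum(term_scores[t] for t in shared)
    return sims

def threshold_network(sims, cutoff):
    """Keep only edges whose similarity reaches the cutoff."""
    return {pair for pair, s in sims.items() if s >= cutoff}

# Hypothetical enrichment-derived term scores and gene annotations
term_scores = {"T1": 0.5, "T2": 0.5, "T3": 0.2}
gene_terms = {"g1": {"T1", "T2"}, "g2": {"T1", "T2", "T3"}, "g3": {"T3"}}
sims = gene_similarity(term_scores, gene_terms)
edges = threshold_network(sims, cutoff=0.5)
```

Raising the cutoff prunes weakly supported edges first, which matches the description of increasingly stringent networks.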
□ ChromoMapper: a new tool to quickly compare large genome assemblies
https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbag005/8418497
ChromoMapper uses the information provided about aligned blocks, combined with additional annotations, to represent the main alignment regions at chromosomal or sub-chromosomal scale.
ChromoMapper highlights similarities and collinearity between compared sequences, points of inconsistency, discontinuities, repeated regions and interruptions in the assembled sequences.
□ VIST: variational inference for single cell time series
https://link.springer.com/article/10.1186/s13059-025-03874-2
VIST (Variational Inference of Single cell Time series) implements a variational autoencoder (VAE) framework to simultaneously decompose the gene expression profile of each cell into time-dependent and time-independent components in a low-dimensional latent space.
The time-independent component learned by the VAE thus encodes the identity of each cell, and is combined with a time-dependent factor to reconstruct the full gene expression profile.
□ OKR-Cell: Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training
https://www.biorxiv.org/content/10.64898/2026.01.09.698573v1
OKR-CELL uses a Large Language Model (LLM)-based workflow with retrieval-augmented generation (RAG) to enrich cell textual descriptions with open-world knowledge. It devises a Cross-modal Robust Alignment (CRA) objective that incorporates sample reliability assessment, curriculum learning, and coupled momentum contrastive learning to strengthen the model's resistance to noisy data.
OKR-CELL learns intra-modal cellular information using scGPT’s masked gene modeling objective. It also aligns cell and textual representations in a shared embedding space to convey complementary information across modalities.
□ SDrecall: a sensitive approach for variant detection in segmental duplications
https://link.springer.com/article/10.1186/s13059-025-03928-5
SDrecall is designed to provide sensitive variant detection in segmental duplications (SDs) supplementary to state-of-the-art variant callers like GATK and DeepVariant.
SDrecall constructs a comprehensive SD catalog as a network to delineate groups of homologous regions sharing similar sequences and their convoluted relationships.
□ PathDiffusion: modeling protein folding pathway using evolution-guided diffusion
https://www.biorxiv.org/content/10.64898/2026.01.16.699856v1
PathDiffusion extracts structure-aware evolutionary information from 52 million predicted structures in the AlphaFold database. Then an evolution-guided diffusion model with a dual-score fusion strategy is trained to generate high-fidelity folding pathways.
□ Bayesian Inference of Gene Regulatory Networks at Stochastic Steady State
https://www.biorxiv.org/content/10.64898/2026.01.10.698684v1
A novel Bayesian inference approach based on using the Chemical Langevin Equation (CLE) as a model of gene expression dynamics at stochastic equilibrium.
Because interactions in GRNs are sparse, the approach uses a regularized horseshoe prior, enabling selective shrinkage of unsupported interactions while identifying strong regulatory edges.
The resulting posterior distributions determine network topology (presence/absence of edges), interaction types (activation or repression), and reaction rates, while providing principled uncertainty quantification for all inferred quantities.
□ GenCore: Genomic distance estimation using Locally Consistent Parsing
https://www.biorxiv.org/content/10.64898/2026.01.10.698768v1
GenCore processes each input genomic data set (reads or assembled genome) up to a specified Locally Consistent Parsing level to calculate the cores. It calculates the distance score for the input pair of genomic data sets to construct the distance matrix.
□ UniWave: A Waveform-Based Encoding Framework for Nucleic Acid Feature Extraction
https://www.biorxiv.org/content/10.64898/2026.01.12.698567v1
UniWave, a dynamic waveform-based encoding framework that converts discrete nucleotide sequences into biologically informative one-dimensional continuous waveforms through base mapping, windowed sine interpolation, and wavelet-based downsampling.
UniWave employs a learnable positional encoding module, WavePosition, which incorporates positional information to project the one-dimensional waveform into a two-dimensional continuous representation.
□ HERMES: Hierarchical Encoding of Regulatory Mechanisms and Expression Syntax by a foundational genomic sequence-to-function model
https://www.biorxiv.org/content/10.64898/2026.01.10.698352v1
HERMES (Hierarchical Encoding of Regulatory Mechanisms and Expression Syntax), a framework that progressively defines a fundamental sequence vocabulary for the complex regulatory genome and parses gene expression syntax into transparent biological insights.
HERMES is a foundational sequence-to-function model on a massive compendium of 137,127 functional genomics profiles spanning diverse cellular conditions: DNA methylation, transcription factor binding, polymerase binding, histone marks, chromatin accessibility and RNA expression.
□ LongFUSE: Fusion gene discovery in single cells from high throughput long read single cell transcriptomes
https://www.biorxiv.org/content/10.64898/2026.01.13.699333v1
LongFUSE devises an XOR logic circuit to achieve fast identification of fusion candidates among large numbers of high-softclipping reads, followed by a selection process for true fusion genes with stringent filtering criteria.
□ LoopBin, a VaDE-based neural network for chromatin loop classification
https://www.biorxiv.org/content/10.64898/2026.01.13.699359v1
LoopBin, a framework based on a variational deep embedding (VaDE) neural network. LoopBin learns the most significant features that can be used for clustering (reconstruction). LoopBin is trained with loop-rich Micro-C data from human cells and matching CUT&Tag epigenomic data.
The VaDE architecture interprets the lower-dimensional representation in terms of different clusters through a regularization process that minimizes the KL divergence. By jointly optimizing reconstruction and regularization loss functions, a continuous cluster-friendly latent space is learned.
□ AtlasMap: enabling low-cost, map-style exploration of million-cell single-cell atlases
https://www.biorxiv.org/content/10.64898/2026.01.14.699595v1
AtlasMap, a scalable visualization framework that overcomes these bottlenecks through a multi-resolution, tile-based architecture.
AtlasMap decouples visualization performance from dataset size. While point-based approaches incurred prohibitive client-side memory costs or failed entirely at the 11-million-cell scale, AtlasMap maintained sub-second startup latency and a negligible browser footprint.
□ DeepSpaceDB 2.0: an interactive spatial transcriptomics database for large-scale Xenium data exploration
https://www.biorxiv.org/content/10.64898/2026.01.15.699623v1
DeepSpaceDB integrates large-scale single-cell spatial transcriptomics data from the Xenium platform, systematically collecting 628 public datasets and processing them through a pipeline that validates, repairs, and harmonizes heterogeneous inputs into a unified representation.
□ CellFluxV2: An Image Generative Foundation Model for Virtual Cell Modeling
https://www.biorxiv.org/content/10.64898/2026.01.19.696785v1
CellFluxV2 learns distribution-level transformations from unperturbed to perturbed cells within the same experimental batch using flow matching, enabling it to disentangle true perturbation effects from confounding batch effects.
CellFluxV2 effectively isolates true perturbational signals. It learns smooth, bidirectional transitions between control and perturbed states, and its transitional states match the true morphology of perturbed cells at intermediate time points.
□ Benchmarking Next-Generation Sequencing Platforms: A Comprehensive Comparison Of Single-Cell RNA-Seq from Ultima UG 100 vs. Illumina NovaSeq X Plus
https://www.biorxiv.org/content/10.64898/2026.01.16.699571v1
Ultima Genomics has introduced a platform leveraging a distinct, flow-based "mostly natural sequencing-by-synthesis" (mnSBS) chemistry, where nucleotides are introduced sequentially rather than simultaneously, and a single-end, variable-length read architecture.
□ Harmonizing single-cell 3D genome data with STARK and scNucleome
https://link.springer.com/article/10.1186/s13059-026-03938-x
STARK is a powerful software suite capable of processing extensive datasets from various sc3DG-seq techniques. The framework implements computational optimizations including parallel processing and Monte Carlo simulation acceleration.
□ MetaFX: feature extraction from whole-genome metagenomic sequencing data
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag018/8431606
MetaFX - an open-source library for feature extraction from whole-genome metagenomic sequencing data and classification of groups of samples. MetaFX compares samples grouped by metadata criteria and constructs genomic features distinct for certain types of communities.
□ cyto: ultra high-throughput processing of 10x-flex single cell sequencing
https://www.biorxiv.org/content/10.64898/2026.01.21.700936v1
cyto exploits the fixed sequence geometry of Flex libraries through direct k-mer lookup rather than alignment-based mapping, and introduces IBU (Indexed-Barcode-UMI), a compact binary format for efficient read processing.
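The contrast between direct k-mer lookup and alignment can be illustrated with a small sketch: when the informative sequence always sits at a known offset in the read, assignment reduces to a dictionary lookup. The probe sequences, the offset, and k below are hypothetical values, not the actual Flex read layout or cyto's implementation.

```python
def build_probe_index(probes):
    """Map each probe sequence to its gene for O(1) exact-match lookup."""
    return {seq: gene for gene, seq in probes.items()}

def assign_reads(reads, index, offset, k):
    """Count reads per gene by slicing a fixed window and looking it up."""
    counts = {}
    for read in reads:
        gene = index.get(read[offset:offset + k])
        if gene is not None:
            counts[gene] = counts.get(gene, 0) + 1
    return counts

# Hypothetical probes and reads; offset=2 and k=6 are made-up layout values
probes = {"GENE_A": "ACGTAC", "GENE_B": "TTGGCC"}
reads = ["NNACGTACNN", "NNTTGGCCNN", "NNACGTACGG", "NNAAAAAANN"]
gene_counts = assign_reads(reads, build_probe_index(probes), offset=2, k=6)
```

Each read costs one slice and one hash lookup, independent of reference size, which is where the throughput advantage over alignment-based mapping comes from.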
□ Optimizing sparse and skew hashing: faster k-mer dictionaries
https://www.biorxiv.org/content/10.64898/2026.01.21.700884v1
SSHash is based on a refined data structure that enables simpler and faster algorithms for streaming lookup queries. Compared with Burrows–Wheeler transform–based indexes with similar capabilities, such as SBWT and FMSI, SSHash is significantly faster to build and query.
□ ReCoN: Modelling multicellular coordination by bridging cell-cell communication and intracellular regulation through multilayer networks
https://www.biorxiv.org/content/10.64898/2026.01.20.700561v1
ReCoN (REconstruction of multicellular COordination Networks from single-cell data) combines tissue- or cell type-specific intracellular regulatory networks with inferred ligand–receptor communication graphs in a multilayer network.
ReCoN formulates the direct and indirect effects across tissues separately, allowing for weighing their individual contributions.
□ Geometric Multidimensional Representation of Omic Signatures
https://www.biorxiv.org/content/10.64898/2026.01.26.701791v1
A geometric framework that reconceptualizes omic signatures as multidimensional informational entities whose biological meaning arises from structural organization rather than molecular membership alone.
This representation preserves internal organization and enables intrinsic geometric measurements - including barycenter distance, volume, anisotropy, and asymmetry - that quantify concordance, divergence, and latent complexity.
□ PIMO: Pathway-based Interpretable Multi-Omics interactions for multi-omics integration
https://www.biorxiv.org/content/10.64898/2026.01.27.702136v1
PIMO captures relationships among transcriptomics, DNA methylation, and copy number alterations. PIMO employs an interaction-aware network inspired by cross-attention mechanisms, leveraging key and query representations.
□ AniAnn's: alignment-free annotation of tandem repeat arrays using fast average nucleotide identity estimates
https://www.biorxiv.org/content/10.64898/2026.01.27.702063v1
AniAnn's, an algorithm for annotating large blocks of tandemly repeating DNAs. AniAnn's exploits the high Average Nucleotide Identity (ANI) shared between repeat units of the same array.
AniAnn's infers the location of satellite arrays based on the formation of squares along the main diagonal of the matrix.
□ scTREND: An annotation-free single-cell time-resolved and condition-dependent hazard model
https://www.biorxiv.org/content/10.64898/2026.01.26.701686v1
scTREND learns variational single-cell embeddings, deconvolves bulk or spatial samples without labels, and fits a conditional piecewise-constant hazard model to estimate cell-level hazard coefficients across discrete time bins and conditions.
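For a piecewise-constant hazard, the survival function follows directly from the cumulative hazard: S(t) = exp(-sum_j h_j * time spent in bin j up to t). The sketch below computes it for fixed bin edges and hazards; the values and function names are illustrative assumptions, and the conditional, cell-level part of scTREND is not modelled here.

```python
import math

def cumulative_hazard(t, bin_edges, hazards):
    """Integrate a piecewise-constant hazard from 0 to t."""
    total = 0.0
    for lo, hi, h in zip(bin_edges[:-1], bin_edges[1:], hazards):
        if t <= lo:
            break
        total += h * (min(t, hi) - lo)  # exposure time inside this bin
    return total

def survival(t, bin_edges, hazards):
    """S(t) = exp(-H(t)) for the piecewise-constant hazard."""
    return math.exp(-cumulative_hazard(t, bin_edges, hazards))

# Hypothetical time bins [0, 1) and [1, 3) with hazards 0.1 and 0.2
bin_edges = [0.0, 1.0, 3.0]
hazards = [0.1, 0.2]
s_two = survival(2.0, bin_edges, hazards)
```

With these values the cumulative hazard at t = 2 is 0.1 * 1 + 0.2 * 1 = 0.3, so the survival probability is exp(-0.3).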




























































































