lens, align.

Long is the time, but what is true comes to pass.

 □ Novel 4D tensor decomposition-based approach integrating tri-omics profiling data can identify functionally relevant gene clusters

 

https://www.biorxiv.org/content/10.64898/2026.03.19.712900v1

 

Tri-omics profiles are transformed into a tensor to which tensor decomposition is applied. After features and genes are selected, enrichment analysis and generative AI analysis are performed. Finally, various functional clusters are derived.

 

The three layers were organized into a tensor and analyzed to extract singular value vectors representing coordinated variation across omics, conditions, and replicates.

 

This approach distinguished patterns consistent with ribosome stacking, in which the transcriptome and translatome increase while the proteome decreases, from those of translational buffering, in which the proteome remains stable despite variations in upstream layers.

 

 

□ scSAGA: Single-cell Sampled Gromov Wasserstein Alignment for Scalable and Memory-efficient Integration of Multi-modal Single Cell Data

 

https://www.biorxiv.org/content/10.64898/2026.03.26.714573v1

 

scSAGA (Single-Cell Sampled Gromov-Wasserstein Alignment) is a geometry-preserving, scalable, and memory-efficient method for integrating paired and unpaired scRNA-seq and scATAC-seq datasets. 

 

scSAGA combines sparse kNN graph geometry with on-demand geodesic distances, plan-guided sampled Gromov-Wasserstein optimization, and a matrix-free joint embedding computed via sparse iterative linear algebra.

 

 

□ SpacerScope: Binary-vectorized, genome-wide off-target profiling for RNA-guided nucleases without prior candidate-site bias

 

https://www.biorxiv.org/content/10.64898/2026.03.28.715005v1

 

SpacerScope integrates binary vectorization with a bitwise-operation-based pre-filtering mechanism, eliminating the need for biased candidate-site reduction. It identifies candidate spacer sites satisfying PAM constraints from both the query sequence and the reference genome.

 

SpacerScope employs different optimized encoding strategies for different search tasks: in mismatch search, it uses compact 2-bit sequence encoding and batch scanning for rapid Hamming-distance evaluation.
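
The 2-bit mismatch search described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SpacerScope's implementation: the bit assignment (A=00, C=01, G=10, T=11), the names `encode_2bit`, `hamming_2bit`, and `scan`, and the one-window-at-a-time loop are all chosen here for clarity, and the batch scanning mentioned in the text is not reproduced.

```python
# Hypothetical 2-bit code; any fixed assignment of the four bases works.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def hamming_2bit(a: int, b: int, length: int) -> int:
    """Count mismatching bases between two packed sequences of equal length."""
    diff = a ^ b
    # 0b0101...01 with one set bit per base: (4**n - 1) / 3.
    low_mask = (4 ** length - 1) // 3
    # A base differs if either of its two bits differs.
    per_base = (diff | (diff >> 1)) & low_mask
    return bin(per_base).count("1")


def scan(genome: str, spacer: str, max_mismatches: int):
    """Return (position, mismatches) for every window within the threshold."""
    n = len(spacer)
    query = encode_2bit(spacer)
    hits = []
    for i in range(len(genome) - n + 1):
        d = hamming_2bit(encode_2bit(genome[i:i + n]), query, n)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits
```

For example, `scan("TTACGTACGATT", "ACGT", 1)` reports the exact site at position 2 and a one-mismatch site at position 6. A production scanner would update the packed window incrementally instead of re-encoding it at every position.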

 

SpacerScope first encodes sequences into multi-channel binary features, then uses channel filtering and batch mask fusion to greatly compress the candidate set before performing exact alignment on the retained small subset.

 

SpacerScope employs right-end-anchored edit-distance dynamic programming, rather than simple local alignment, making it better suited to CRISPR spacer scenarios where insertions, deletions, and terminal constraints coexist.
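
One way to read "right-end-anchored" is that the spacer's right end is pinned to the right end of a genomic window while its left end may start anywhere in that window. The sketch below computes the minimum edit distance between the spacer and any suffix of the window under that reading. The interpretation, the unit costs, and the function name `right_anchored_edit_distance` are assumptions made here, not details taken from the paper.

```python
def right_anchored_edit_distance(spacer: str, window: str) -> int:
    """Minimum edit distance between `spacer` and any suffix of `window`.

    Both strings are reversed so that the anchored right end becomes the
    start of the dynamic program; the free left end becomes a free end
    in the last row.
    """
    q = spacer[::-1]
    t = window[::-1]
    # prev[j] = distance between the empty query prefix and t[:j].
    prev = list(range(len(t) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            substitution = prev[j - 1] + (0 if q[i - 1] == t[j - 1] else 1)
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, substitution)
        prev = cur
    # The whole spacer must be consumed; the window may stop anywhere.
    return min(prev)
```

Unlike local alignment, the score cannot improve by trimming bases at the anchored end, so mismatches or indels next to that end are always counted.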

 

 

□ BioPathNet: Enhancing link prediction in biomedical knowledge graphs with BioPathNet

 

https://www.nature.com/articles/s41551-025-01598-z

 

BioPathNet, a message-passing neural network for path representation learning, built on neural Bellman–Ford network (NBFNet).

 

As opposed to node-embedding approaches, BioPathNet uses path-based reasoning to learn representations between source and target nodes on the basis of relations along the path.
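
The path-based idea can be illustrated with the classical recursion that NBFNet generalizes. The sketch below runs a Bellman-Ford style iteration from one source node and scores every other node by its best path, using fixed relation weights and a (max, product) rule. In the actual models the message and aggregation functions are learned; the fixed weights, the toy graph, and the name `path_scores` are assumptions for illustration only.

```python
def path_scores(edges, relation_weight, source, n_iters):
    """Score each node by its best-weighted path from `source`.

    edges: list of (head, relation, tail) triples.
    relation_weight: dict mapping relation name to a weight in [0, 1].
    """
    nodes = {source} | {h for h, _, _ in edges} | {t for _, _, t in edges}
    score = {v: 0.0 for v in nodes}
    score[source] = 1.0  # boundary condition at the source node
    for _ in range(n_iters):
        new = dict(score)
        for head, relation, tail in edges:
            message = score[head] * relation_weight[relation]
            new[tail] = max(new[tail], message)
        score = new
    return score


# Hypothetical toy knowledge graph.
edges = [
    ("drugA", "targets", "gene1"),
    ("gene1", "associated_with", "diseaseX"),
    ("drugA", "similar_to", "drugB"),
    ("drugB", "treats", "diseaseX"),
]
weights = {"targets": 0.9, "associated_with": 0.8, "similar_to": 0.5, "treats": 0.9}
scores = path_scores(edges, weights, "drugA", n_iters=2)
```

Here the score for `diseaseX` comes from the `targets` then `associated_with` path, which outweighs the `similar_to` then `treats` path. The representation depends on the source-target pair and on the relations along the path, not on a standalone embedding of either node.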

 

 

□ Chiron3D: an interpretable deep learning framework for understanding the DNA code of chromatin looping

 

https://www.biorxiv.org/content/10.64898/2026.03.20.713211v1

 

Chiron3D, a DNA-only attention model initialized with Borzoi embeddings designed to predict CTCF HiChIP contact maps. 

 

Chiron3D is competitive with baselines that use CTCF ChIP-seq as additional input, while enabling nucleotide-level attribution of predictions to the input DNA sequence. It provides mechanistic insights into the physical control of loop dynamics.

 

 

 □ scMagnifier: resolving fine-grained cell subtypes via GRN-informed perturbations and consensus clustering

 

https://www.biorxiv.org/content/10.64898/2026.03.26.714385v1

 

scMagnifier, a consensus clustering framework that leverages gene regulatory network (GRN)-informed in silico perturbations to amplify subtle transcriptional differences and uncover latent cell subpopulations.

 

scMagnifier perturbs candidate transcription factors (TFs), propagates perturbation effects through cluster-specific GRNs to simulate post-perturbation expression profiles, and integrates clustering results across multiple perturbations into stable subtype assignments.

 

scMagnifier introduces regulatory perturbation consensus UMAP (rpcUMAP), a perturbation-aware visualization that provides clearer separation between cell subtypes and guides the selection of the optimal number of clusters.

 

 

□ STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings 

 

https://doi.org/10.1093/bioinformatics/btag146

 

STAR-GO, a Transformer-based framework embedding structural / semantic characteristics of GO terms for zero-shot prediction. STAR-GO integrates hierarchical relations and textual definitions, aligning ontology-informed embeddings with protein sequences to predict unseen functions.

 

STAR-GO refines GO term embeddings derived from a language model via a structure-recovering autoencoder trained with multi-task supervision, preserving both semantic similarity and hierarchical dependencies for zero-shot inference without retraining.

 

STAR-GO incorporates these enriched embeddings into an encoder–decoder transformer, where GO terms are decoded in topological order using causal self attention and linked to protein embeddings through cross attention.
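
Decoding in topological order means that every GO term appears after all of its parents, so causal self-attention only ever looks back at ancestors and earlier terms. A minimal sketch of such an ordering, using Kahn's algorithm on a hypothetical five-term hierarchy, is shown below. The term labels and the function name `topological_order` are illustrative assumptions, not STAR-GO's code.

```python
from collections import deque


def topological_order(parents):
    """Return terms ordered so that each term follows all of its parents.

    parents: dict mapping a term to the list of its parent terms.
    """
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    children = {t: [] for t in terms}
    remaining = {t: 0 for t in terms}
    for term, ps in parents.items():
        for p in ps:
            children[p].append(term)
            remaining[term] += 1
    # Start from root terms; sorting keeps the output deterministic.
    queue = deque(sorted(t for t in terms if remaining[t] == 0))
    order = []
    while queue:
        term = queue.popleft()
        order.append(term)
        for child in sorted(children[term]):
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if len(order) != len(terms):
        raise ValueError("the hierarchy contains a cycle")
    return order


# Hypothetical miniature hierarchy (labels are not real GO identifiers).
toy_parents = {
    "binding": ["root"],
    "catalytic": ["root"],
    "protein_binding": ["binding"],
    "kinase_binding": ["protein_binding", "catalytic"],
}
order = topological_order(toy_parents)
```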

 

 

□ GEMINI: Genetically encoded assembly recorder temporally resolves cellular history

 

https://www.nature.com/articles/s41586-026-10323-y

 

GEMINI (granularly expanding memory for intracellular narrative integration), an in cellulo recording platform that leverages a computationally designed protein assembly as an intracellular memory device to record the history of individual cells.

 

GEMINI functions like molecular ‘tree rings’: as the assembly expands through continued subunit addition, it lays down successive fluorescent layers that encode timing and amplitude of cellular events.

 

 

□ DeepLMI: Deep Feature Mining with a Globally Enhanced Graph Convolutional Network for Robust lncRNA–miRNA Interaction Prediction

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag145/8541986

 

For lncRNAs, DeepLMI combines sequence pre-training with self-attention mechanisms to learn multi-scale semantic representations; for miRNAs, DeepLMI fuses heterogeneous features through a graph convolutional encoder.

 

To further address the sparsity and structural complexity of known RNA interaction networks, DeepLMI employs a Global-Enhanced Graph Convolutional Network (GE-GCN) that jointly models local neighborhood information and global topological signals.

 

 

□ GraphHDBSCAN*: Graph-based Hierarchical Clustering on High Dimensional Single-cell RNA Sequencing Data 

 

https://www.biorxiv.org/content/10.64898/2026.03.24.713924v1

 

GraphHDBSCAN*, a graph-based, hyperparameter-free extension of HDBSCAN* that performs hierarchical density-based clustering on a graph representation of the data, enabling robust recovery of both single-level and hierarchical relationships in high-dimensional and sparse datasets.

 

GraphHDBSCAN* offers an alternative by combining graph topology and density directly on a sparse neighborhood graph, without requiring a learned low-dimensional representation.

 

Using weighted structural similarity transformation and a graph-adapted HDBSCAN* accelerated with CORE-SG, it shifts computation from full distance space while preserving density strengths: handling heterogeneous densities, identifying noise and producing interpretable hierarchy.

 

 

□ NodeGWAS: Leveraging Graph Pangenomes for Sensitive and Accurate Association Analysis in Diverse Diploid and Polyploid Species

 

https://www.cell.com/plant-communications/fulltext/S2590-3462(26)00143-4

 

NodeGWAS, a novel genotyping framework that works directly on the graph pangenome and fills a critical gap in applying GWAS to species with high genetic diversity or complex structural variation – both of which create mappability challenges.

 

In a graph pangenome, each node represents a non-redundant, variable-length "DNA word" (a variable-length k-mer) that captures sequence shared across multiple genomes.

 

NodeGWAS can mitigate the alignment bias caused by using a single reference, capture comprehensive genetic diversity, and effectively resolve alignment challenges in polyploids.

 

NodeGWAS effectively sidestepped common potential errors associated with traditional polyploid genotyping while also maintaining the accuracy and completeness of genetic information, by using node coverage/counts as GWAS predictors.

 

 

□ EvoRMD: Integrating Biological Context and Evolutionary RNA Language Models for Interpretable Prediction of RNA Modifications 

 

https://www.biorxiv.org/content/10.64898/2026.03.22.713386v1

 

EvoRMD integrates contextual sequence embeddings from a large-scale RNA language model with structured biological metadata, including species, organ, cell type, and subcellular localization.

 

EvoRMD employs a shared multi-class classifier to generate a context-conditioned plausibility distribution over eleven modification types, consistent with the single-positive, multiple-unlabeled setting, producing calibrated multi-label predictions via sigmoid-transformed logits.

 

 

 □ GRNFormer: Accurate Gene Regulatory Network Inference Using Graph Transformer 

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag144/8540455

 

GRNFormer integrates a transformer-based gene expression encoder (Gene-Transcoder) with a variational graph autoencoder (GraViTAE) employing pairwise attention to jointly learn the representations of genes (nodes) and their co-expression relationships (edges).

 

GRNFormer employs TFWalker, a transcription factor-centered de novo subgraph sampling approach that constructs localized gene co-expression subgraphs from a full gene co-expression network (GCEN), capturing the neighborhood context around each transcription factor.

 

 

□ TimeVault: A molecular time machine for single cells

 

https://www.cell.com/molecular-cell/abstract/S1097-2765(26)00133-4

 

TimeVault acts as a durable, inducible recording device that captures snapshots of transcriptional activity, allowing researchers to see what genes were active before a cellular decision or treatment, rather than just the final state.

 

TimeVault works by fusing poly(A)-binding protein (PABP) with vault proteins, forcing the sequestration of messenger RNA (mRNA) into the stable, naturally occurring, hollow vault structures in the cytoplasm. TimeVault tracks cell lineage and transient states over long periods.

 

 

□ Why phylogenies compress so well: combinatorial guarantees under the Infinite Sites Model

 

https://www.biorxiv.org/content/10.64898/2026.03.18.712055v1

 

The Infinite Sites Model (ISM) instantiates a perfect phylogeny for genomes and point mutations, producing a rooted binary tree.

 

Binary matrices representing genome collections: SNP, k-mer, unitig and unique-row matrices derived from ISM-compliant genomes satisfy the four-gamete condition and are thus ISM-compliant, inheriting the additive structure between columns which can be recovered via Neighbor Joining.
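
The four-gamete condition is easy to check directly: for every pair of columns, the rows may show at most three of the four combinations 00, 01, 10, and 11. The sketch below is a straightforward quadratic check over column pairs, with the function name `satisfies_four_gamete` chosen here for illustration.

```python
from itertools import combinations


def satisfies_four_gamete(matrix):
    """Check the four-gamete condition on a binary matrix.

    matrix: list of rows (genomes), each a list of 0/1 values (sites).
    Returns False as soon as some pair of columns shows all four
    combinations (0,0), (0,1), (1,0), (1,1).
    """
    n_cols = len(matrix[0]) if matrix else 0
    for a, b in combinations(range(n_cols), 2):
        gametes = {(row[a], row[b]) for row in matrix}
        if len(gametes) == 4:
            return False
    return True


compatible = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
conflicting = compatible + [[0, 1, 1]]
```

In `compatible`, every column pair is consistent with a single tree. Adding the row `[0, 1, 1]` makes columns 1 and 2 show all four combinations, so `conflicting` cannot come from a perfect phylogeny.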

 

 

 □ evedesign: accessible biosequence design with a unified framework

 

https://www.biorxiv.org/content/10.64898/2026.03.17.712115v1

 

EveDesign frames biomolecular design as a conditional modeling problem: the user specifies a molecular system composed of individual entities such as proteins, DNA or RNA chains, or ligands.

 

EveDesign incorporates any known information about each entity, such as primary sequence, 3D structure, homologs, binding partners, post-translational modifications, or multimeric state, supplied as a standardized declarative data structure.

 

 

□ LongcallD: joint calling and phasing of small, structural and mosaic variants from long reads

 

https://www.biorxiv.org/content/10.64898/2026.03.20.713111v1

 

LongcallD explicitly distinguishes clean and noisy genomic regions, applies haplotype-aware multiple sequence alignment within noisy regions to derive consensus sequences, and integrates clean- and noisy-region variant calls through an iterative phasing procedure.

 

By leveraging established haplotype information together with a stringent, context-aware filtering strategy, LongcallD distinguishes true mosaic mutations from long-read sequencing artifacts.

 

 

□ scComm: a contrastive learning framework for deciphering cell–cell communications at single-cell resolution

 

https://link.springer.com/article/10.1186/s13059-026-04043-9

 

scComm applies a data-adaptive weighting and scoring module to assign weight to L-R pairs according to their importance and employs a supervised contrastive learning framework to detect significant CCC events.

 

scComm generates a cell feature matrix where each element of the cell feature vector represents the interaction scores between the specific cell and other cell types on the given L-R pairs.

 

 

□ Super Bloom: Fast and precise filter for streaming k-mer queries

 

https://www.biorxiv.org/content/10.64898/2026.03.17.712354v1

 

Super Bloom Filter is a Bloom filter variant designed for streaming k-mer queries on biological sequences, using minimizers to group adjacent k-mers into super-k-mers and assigning all k-mers of a group to the same memory block.

 

Super Bloom thereby amortizes random accesses over consecutive k-mer queries, improves cache efficiency, and combines this layout with the findere scheme to reduce false positives by requiring consistent evidence across overlapping subwords.
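
The block layout can be sketched as follows. This toy version groups consecutive k-mers that share a minimizer and routes the whole group to one block of a blocked Bloom filter. It is a simplified illustration, not the Super Bloom implementation: the lexicographic minimizer, the CRC32-based hashes, the block sizes, and the class name `BlockedBloom` are assumptions, and the findere scheme is not included.

```python
import zlib


def minimizer(kmer: str, m: int) -> str:
    """Lexicographically smallest m-mer (a stand-in for a hashed minimizer)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(seq: str, k: int, m: int):
    """Group consecutive k-mers sharing a minimizer: [(minimizer, [kmers])]."""
    groups = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        mz = minimizer(kmer, m)
        if groups and groups[-1][0] == mz:
            groups[-1][1].append(kmer)
        else:
            groups.append((mz, [kmer]))
    return groups


class BlockedBloom:
    def __init__(self, n_blocks=64, block_bits=512, n_hashes=3):
        self.blocks = [0] * n_blocks  # each block is a small bitset
        self.block_bits = block_bits
        self.n_hashes = n_hashes

    def _block(self, mz: str) -> int:
        return zlib.crc32(mz.encode()) % len(self.blocks)

    def _bits(self, kmer: str):
        data = kmer.encode()
        return [zlib.crc32(data, seed) % self.block_bits
                for seed in range(1, self.n_hashes + 1)]

    def add_sequence(self, seq: str, k: int, m: int) -> None:
        for mz, kmers in super_kmers(seq, k, m):
            block = self._block(mz)  # one block lookup per super-k-mer
            for kmer in kmers:
                for bit in self._bits(kmer):
                    self.blocks[block] |= 1 << bit

    def contains(self, kmer: str, m: int) -> bool:
        block = self.blocks[self._block(minimizer(kmer, m))]
        return all((block >> bit) & 1 for bit in self._bits(kmer))
```

Because consecutive k-mers of a read usually share a minimizer, a streaming query touches the same block many times in a row, which is where the cache benefit comes from.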

 

 

□ ggvariant: Tidy, 'ggplot2'-Native Visualization for Genomic Variants

 

https://cran.r-project.org/web/packages/ggvariant/index.html

 

ggvariant, a simple, opinionated toolkit for visualizing genomic variant data using ggplot2. It accepts VCF files or plain data frames and produces publication-ready lollipop plots, consequence summaries, mutational spectrum charts, and cohort-level comparisons with minimal code.

 

read_vcf() parses standard VCF v4.x files — including gzipped files and multi-sample VCFs — and returns a tidy data frame called a gvf object. Functional annotations from SnpEff (ANN) or VEP (CSQ) INFO fields are extracted automatically.

 

 

 □ scCChain: Mapping spatial cell-cell communication programs by tailoring chains of cells for transformer neural networks 

 

https://www.biorxiv.org/content/10.64898/2026.03.18.712664v1

 

scCChain, a transformer-based framework that integrates ligand-receptor activity into spatially resolved communication programs and localizes hotspots at spot and single-cell resolution.

 

scCChain derives programs using structured dimensionality reduction. It samples program-specific communication chains by linking transcriptionally similar sender cells to receivers via weighted random walks on a distance-informed cell graph, borrowing signal from neighbors.

 

Transformer-based modeling then scores chains to prioritize communication programs and pinpoint hotspots across the tissue. scCChain supports both exploratory communication program discovery and targeted analysis of user-specified ligand-receptor pairs.

 

 

□ Helicase: Vectorized parsing and bitpacking of genomic sequences

 

https://www.biorxiv.org/content/10.64898/2026.03.19.712912v1

 

Helicase, a high-throughput Rust library for parsing FASTA and FASTQ files that exploits SIMD vectorization to maximize single-threaded throughput on both x86 and ARM.

 

At the core of Helicase is a vectorized lexing stage based on bitmask classifiers derived from the theory of counter-free automata.

 

 

□ aaKomp: Alignment-free amino acid k-mer matching for genome completeness assessment at scale

 

 

https://www.biorxiv.org/content/10.64898/2026.03.19.713078v1

 

aaKomp employs aaHash, a recursive hashing algorithm with BLOSUM62-based substitution tolerance, combined with a multi-index Bloom filter (miBf) for efficient k-mer storage and querying. 

 

aaKomp bypasses sequence alignment entirely while maintaining robust gene detection when there is sequence divergence between the analyzed genome and reference protein set.

 

aaKomp computes a proportional completeness score that provides a finer resolution than threshold-based classifications and supports user-defined gene sets for customized assessments across any organism or lineage.

 

 

□ REGEN: Learning gene interactions from tabular gene expression data using Graph Neural Networks

 

https://www.biorxiv.org/content/10.64898/2026.03.19.712949v1

 

REGEN (REconstruction of GEne Networks), a GNN-based framework that simultaneously learns latent gene interaction networks from bulk transcriptomic profiles and predicts patient vital status.

 

REGEN employs an efficient kNN based method to perform graph-level classification tasks. It uses a standard gradient based interpretability pipeline using the Integrated Gradients algorithm.

 

 

□ Rastair: an integrated variant and methylation caller

https://www.biorxiv.org/content/10.64898/2026.03.19.712983v1

 

Rastair, an integrated software toolkit for simultaneous SNP detection and methylation calling from mC→T sequencing data such as those created with Watchmaker's TAPS+ and Illumina's 5-Base chemistries.

 

Rastair combines machine-learning-based variant detection with genotype-aware methylation estimation. Rastair adjusts the estimated methylation depending on patient genotype for C→T SNPs at CpG sites.

 

Rastair explicitly reports "de novo" CpG positions that are created by SNPs, adding nearly 500,000 CpG positions relative to the reference sequence for a typical human genome.

 

 

□ AutoGERN: Single-Cell RNA-Seq Gene Regulatory Network Inference via Explicit Link Modeling and Adaptive Architectures

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag143/8537914

 

AutoGERN explicitly models regulatory information in the message-passing space by learning expressive link (edge) embeddings, which are subsequently scored by a lightweight multilayer perceptron to infer TF-target interactions.

 

AutoGERN integrates two complementary message-passing spaces, intra-layer and inter-layer, capturing regulatory dependencies at multiple levels.

 

AutoGERN employs a robust, AutoGNN-based architecture search procedure that adapts the GNN design to the distributional characteristics of each dataset, mitigating the brittleness of one-size-fits-all architectures.

 

 

□ NLCD: A method to discover nonlinear causal relations among genes

 

https://www.biorxiv.org/content/10.64898/2026.03.20.713150v1

 

NLCD (NonLinear Causal Discovery) employs conditional importance scoring of features in a general nonlinear regression model, in order to generalize a key conditional independence test of causality from linear to nonlinear settings.

 

NLCD extends a linear causal discovery method, Causal Inference Test (CIT), by conducting a series of permutation-based statistical tests of causality in the nonlinear setting.

 

The maximum p-value from these tests helps decide if data on a given triplet supports causality vs. independence between the traits. NLCD can flexibly work with any nonlinear regression model - Kernel Ridge Regression, Support Vector Regression, Artificial Neural Network models.

 

 

□ GraPhens: Solving the Diagnostic Odyssey with Synthetic Phenotype Data

 

https://www.biorxiv.org/content/10.64898/2026.03.19.712946v1

 

GraPhens, a simulation framework that uses gene-local HPO structure together with two empirically motivated soft priors, over the number of observed phenotypes per case and phenotype specificity, to generate synthetic phenotype–gene pairs that are novel yet clinically plausible.

 

These simulated cases are used to train GenPhenia, a graph neural network that operates on patient-specific phenotype subgraphs rather than flat phenotype sets. GenPhenia trained entirely on synthetic cases outperforms existing phenotype-driven prioritization methods.

 

 

□ FFC: A Scalable FASTA Compressor 

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag132/8538003

 

FFC (Fast FASTA Compressor) is a highly optimized multi-threaded tool, based on simple but DNA-dedicated ideas, that aims for high (de)compression speed rather than the highest possible compression ratio.

 

FFC achieves average compression speeds 4.7x and 11.4x higher than two high-performance compressors, zstd and NAF, respectively, across a benchmark set of seven single genomes. It also delivers average decompression speeds 3.5x and 2.7x higher than zstd and NAF, respectively.

 

 

□ PACMON: Pathway-guided Multi-Omics data integration for interpreting large-scale perturbation screens

 

https://www.biorxiv.org/content/10.64898/2026.03.20.713295v1

 

PACMON (Pathway guided Multi-Omics data integration for interpreting large-scale perturbation screens), a Bayesian latent factor model that jointly infers pathway-level programs and their modulation by experimental perturbations.

 

PACMON decomposes multimodal molecular measurements into shared latent factors aligned with known biological pathways through structured sparsity priors, while simultaneously estimating how each perturbation activates or represses these pathway programs.

 

 

□ dreampy: Pseudobulk mixed-model differential expression for single-cell RNA-seq in Python

 

https://www.biorxiv.org/content/10.64898/2026.03.21.713408v1

 

dreampy, a native Python implementation of the dreamlet pseudobulk mixed-model workflow. dreampy reimplements the full pipeline — from pseudobulk aggregation and TMM normalization through voom precision weighting, mixed-model fitting, and empirical Bayes moderation.

 

dreampy supports both fixed-effect and random-effect formulas, dispatching to ordinary least squares or restricted maximum likelihood estimation as appropriate, and exposes every pipeline stage as an individual, inspectable function call.
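
The first stage of such a pipeline, pseudobulk aggregation, amounts to summing raw counts over all cells that share a sample and a cell type. The sketch below shows only that step, followed by a plain counts-per-million scaling as a simple stand-in for TMM normalization. It does not use or reproduce dreampy's API; the function names and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum gene counts over cells sharing (sample, cell_type).

    cells: iterable of (sample, cell_type, {gene: count}) tuples.
    Returns {(sample, cell_type): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, cell_type, counts in cells:
        for gene, n in counts.items():
            totals[(sample, cell_type)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}


def counts_per_million(profile):
    """Library-size scaling (a simpler stand-in for TMM normalization)."""
    library_size = sum(profile.values())
    if library_size == 0:
        return {gene: 0.0 for gene in profile}
    return {gene: 1e6 * n / library_size for gene, n in profile.items()}


# Hypothetical toy input: three cells from two samples.
toy_cells = [
    ("s1", "T", {"CD3E": 2, "ACTB": 5}),
    ("s1", "T", {"CD3E": 3, "ACTB": 7}),
    ("s2", "T", {"CD3E": 1}),
]
bulk = pseudobulk(toy_cells)
```

Mixed-model fitting then operates on one aggregated profile per sample and cell type rather than on individual cells.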

 

 

□ ‘Zombie cells’ return from the dead — after a genome transplant

 

https://www.nature.com/articles/d41586-026-00938-6

 

Researchers have resurrected ‘dead’ bacterial cells by replacing their defunct DNA with the working genome of another species.

 

After killing Mycoplasma capricolum cells by chemically crosslinking their genome with Mitomycin C (MMC), they installed synthetic Mycoplasma mycoides genomes into the resulting dead cells using Whole Genome Transplantation (WGT).

 

During WGT, a synthetic donor genome is placed into a recipient cell, thereby reprogramming that cell to adopt a new genetic identity.

 

 

□ scellop: A Scalable Redesign of Cell Population Plots for Single-Cell Data 

 

https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbag083/8533210

 

Scellop (previously CellPop) is an interactive visualization tool for cell type compositions. Scellop provides a flexible heatmap and side views with extending layered bar charts.

 

scellop is implemented in React, using visx to incorporate D3-based visualizations for various scales and axis rendering. scellop supports all desired interactions identified from our design study, including normalization, grouping and filtering.

 

 

□ PerturbGraph: Predicting Unseen Gene Perturbation Response Using Graph Neural Networks with Biological Priors 

 

https://www.biorxiv.org/content/10.64898/2026.03.23.713780v1

 

PerturbGraph represents perturbations as stable transcriptional shift programs derived from pseudo-bulk perturbation signatures. PerturbGraph propagates information across the interaction network using message passing.

 

PerturbGraph integrates multiple sources of biological knowledge, incl. STRING interactions, graph embeddings obtained via Node2Vec, baseline transcriptional statistics, and Gene Ontology functional annotations.

 

 

□ Tripso: Self-supervised learning for a gene program-centric view of cell states

 

https://www.biorxiv.org/content/10.64898/2026.03.24.713961v1

 

Tripso (Transformers for learning Representations of Interpretable gene Programs in Single-cell transcriptOmics) learns contextualized gene, gene program (GP), and cell embeddings, quantifies gene- and GP-level contributions to GPs and cell identity respectively, and enables discovery of novel GPs.

 

By combining the interpretability of GPs with the flexibility of transformer architectures, Tripso supports principled comparison of cellular states across tissues and conditions.

 

Tripso provides a scalable foundation for interpretable single-cell modeling that enables hypothesis generation and experimental optimization across development, disease, and engineered in vitro systems, and supports the development of interpretable virtual cell models.

 

 

□ Emergent Biological Realism in RL-Trained DNA Language Models 

 

https://www.biorxiv.org/content/10.64898/2026.03.24.713963v1

 

Reinforcement learning substantially increases the probability of generating plasmids that pass their bioinformatics quality control (QC) pipeline, while supervised fine-tuning provides modest improvements.

 

When prompted with a simple start codon, the base model rarely produces a valid plasmid, whereas SFT provides modest improvements, and RL dramatically increases pass rates across all prompt types.

 

A similar pattern is observed in codon distribution and Gibbs free energy. Although not included in the reward function, the RL model generates tokens with codon distributions more similar to real plasmids than models trained on the correct distribution.

 

 

□ Hypergraph Representations of Single-Cell RNA Sequencing Data for Improved Cell Clustering

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag148/8551568

 

In a hypergraph random walk process, the walker alternates between transitioning from a hyperedge to a node and from a node to a hyperedge, following the node-to-edge transition probabilities and edge-to-node transition probabilities.
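
The alternating walk can be written as a two-step transition: pick a hyperedge from the current node, then pick a node from that hyperedge. The sketch below composes the two steps into a node-to-node transition matrix for a plain weighted hypergraph. It omits DIPHW's preference exponent and the memory mechanism, and the weighting rule (probabilities proportional to incidence weights) and the function name are assumptions for illustration.

```python
def node_to_node_transition(weights):
    """Compose node->edge and edge->node steps of a hypergraph random walk.

    weights[v][e] >= 0 is the weight of node v in hyperedge e
    (0 means v does not belong to e). Returns P with
    P[v][u] = sum_e P(e | v) * P(u | e).
    """
    n_nodes = len(weights)
    n_edges = len(weights[0]) if weights else 0
    node_total = [sum(row) for row in weights]
    edge_total = [sum(weights[v][e] for v in range(n_nodes))
                  for e in range(n_edges)]
    P = [[0.0] * n_nodes for _ in range(n_nodes)]
    for v in range(n_nodes):
        for e in range(n_edges):
            if weights[v][e] == 0:
                continue
            p_edge = weights[v][e] / node_total[v]          # node -> edge
            for u in range(n_nodes):
                p_node = weights[u][e] / edge_total[e]      # edge -> node
                P[v][u] += p_edge * p_node
    return P


# Three nodes, two hyperedges: edge 0 = {0, 1}, edge 1 = {1, 2}.
toy_weights = [[1, 0], [1, 1], [0, 2]]
P = node_to_node_transition(toy_weights)
```

In the toy example node 0 can never reach node 2 in one composed step, because they share no hyperedge, while node 1 sits in both hyperedges and can reach every node.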

 

Dual-Importance Preference Hypergraph Walk (DIPHW) is a new hypergraph-based random walk algorithm that computes cell embeddings by considering the relative importance of genes to cells and cells to genes, incorporating a preference exponent to facilitate clustering.

 

CoMem-DIPHW (Co-expression and Memory-Integrated DIPHW) integrates two unipartite projections, the gene co-expression and cell co-expression networks, along with the cell-gene expression hypergraph derived from single-cell abundance count data into the random walk model.

 

CoMem-DIPHW leverages cell and gene co-expression networks to account for previously visited nodes and edges during transitions to capture both local expression relationships and global co-expression patterns.

 

 

□ A run-length-compressed skiplist data structure for dynamic GBWTs supports time and space efficient pangenome operations over syncmers

 

https://www.biorxiv.org/content/10.64898/2026.03.26.714584v1

 

This work develops a computational framework for efficient calculation over graph Burrows-Wheeler transforms (GBWTs), the counterpart of the PBWT for pangenome graphs, and presents a practical implementation.

 

Two doubly-linked skiplist variants over run-length-compressed BWTs support O(logN) rank, access, insert operations. These structures are used to store and search paths through a syncmer graph built from Robert C. Edgar’s closed syncmers, equivalent to a sparse de Bruijn graph.
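
To make the rank and access operations concrete, here is a static sketch over a run-length-compressed string. It swaps the skiplist for sorted arrays with binary search, so queries take time logarithmic in the number of runs but insertion is not supported. That substitution, the class name `RunLengthBWT`, and its layout are assumptions made for illustration, not the paper's data structure.

```python
from bisect import bisect_right
from itertools import groupby


class RunLengthBWT:
    """Static rank/access over a run-length-compressed string."""

    def __init__(self, bwt: str):
        self.run_starts = []    # start offset of every run
        self.run_symbols = []   # symbol of every run
        self.sym_starts = {}    # per symbol: start offsets of its runs
        self.sym_lengths = {}   # per symbol: lengths of its runs
        self.sym_before = {}    # per symbol: occurrences before each run
        seen = {}
        pos = 0
        for symbol, group in groupby(bwt):
            length = len(list(group))
            self.run_starts.append(pos)
            self.run_symbols.append(symbol)
            self.sym_starts.setdefault(symbol, []).append(pos)
            self.sym_lengths.setdefault(symbol, []).append(length)
            self.sym_before.setdefault(symbol, []).append(seen.get(symbol, 0))
            seen[symbol] = seen.get(symbol, 0) + length
            pos += length
        self.length = pos

    def access(self, i: int) -> str:
        """Symbol at position i."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.run_symbols[bisect_right(self.run_starts, i) - 1]

    def rank(self, symbol: str, i: int) -> int:
        """Number of occurrences of `symbol` in positions [0, i)."""
        starts = self.sym_starts.get(symbol)
        if not starts:
            return 0
        j = bisect_right(starts, i) - 1  # last run of `symbol` starting <= i
        if j < 0:
            return 0
        inside = min(self.sym_lengths[symbol][j], i - starts[j])
        return self.sym_before[symbol][j] + inside
```

A skiplist keeps the same per-run bookkeeping but in linked towers, which is what allows runs to be inserted or split in logarithmic time as new paths are added.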

 

 

□ Panmap: Scalable phylogeny-guided alignment, genotyping, and placement on pangenomes

 

https://www.biorxiv.org/content/10.64898/2026.03.29.711974v1

 

Panmap, a tool that leverages evolutionary structure to place, align, and genotype sequencing reads against mutation-annotated pangenomes containing up to millions of genomes.

 

Panmap introduces a phylogenetically compressed k-mer index that stores only sequence differences along branches, enabling efficient comparison of reads to both sampled genomes and inferred ancestors.

 

 

□ CCIDeconv: Hierarchical model for deconvolution of subcellular cell-cell interactions in single-cell data 

 

https://www.biorxiv.org/content/10.64898/2026.03.26.714643v1

 

CCIDeconv effectively deconvolutes CCI scores into interactions primarily occurring in nucleus or cytoplasm leveraging information from sST. CCIDeconv provides insight into the location of CCI, and therefore can be used to further refine their communication scores.

 

The CCIDeconv communication score is a modified score derived from CellChat. LR pairs from the CellChatDB.human database are used to calculate the scores. The architecture is a hierarchical supervised machine learning framework with a classification followed by two regression models.

 

 

□ BCAR: A fast and general barcode-sequence mapper for correcting sequencing errors

 

https://www.biorxiv.org/content/10.64898/2026.03.27.714882v1

 

BCAR (Barcode Collapse by Aligning Reads) quickly aligns the sequencing reads associated with each barcode in a dataset and generates a maximum-likelihood consensus sequence, including confidence scores for each base call.

 

Reads are represented as arrays with entries corresponding to quality scores, then progressively aligned using a modified implementation of the Needleman-Wunsch algorithm.

 

At each step, the new read is aligned against the current consensus, using a scaled cosine similarity to determine match scores between ambiguous positions. Gaps are removed from the final alignment during consensus generation.
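
The alignment step can be sketched as Needleman-Wunsch over quality vectors. In the toy version below each position is a 4-vector over A, C, G, T, the match score is a cosine similarity rescaled to the range [-1, 1], and gaps cost a fixed penalty. The vector layout, the rescaling `2 * cos - 1`, the gap penalty, and all function names are assumptions for illustration; BCAR's modified algorithm and its scoring details may differ.

```python
import math

BASES = "ACGT"


def to_vectors(seq, quals):
    """One 4-vector per position: the called base carries its quality score."""
    return [[float(q) if base == b else 0.0 for b in BASES]
            for base, q in zip(seq, quals)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_score(u, v):
    """Cosine similarity rescaled from [0, 1] to [-1, 1] (assumed scaling)."""
    return 2.0 * cosine(u, v) - 1.0


def nw_score(x, y, gap=-1.0):
    """Global alignment score between two lists of quality vectors."""
    rows, cols = len(x) + 1, len(y) + 1
    S = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap
    for j in range(1, cols):
        S[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            S[i][j] = max(
                S[i - 1][j - 1] + match_score(x[i - 1], y[j - 1]),
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
            )
    return S[-1][-1]


read_a = to_vectors("ACGT", [30, 30, 30, 30])
read_b = to_vectors("ACT", [30, 30, 30])
```

A consensus position that has accumulated evidence for two bases, such as `[30, 0, 30, 0]`, scores between a clean match and a clean mismatch against a read calling either base, which is how ambiguity is handled gracefully under this kind of scoring.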

 

□ "LATE SHIFT" (Japanese title: ナースコール; original title: Heldin)

 

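We relive one night of a female nurse on a brutally overstretched, fully occupied Swiss ward. Leonie Benesch, who completed an internship before shooting, works without skipping a single step, and the tension never lets up. Devotion and its price, and the breaking point of humanity laid bare by the flaws of the system. A hard-edged, highly concrete piece of filmmaking.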

 

Swiss / German

Zodiac Pictures (2025)

Director: Petra Biondina Volpe

Producer: Reto Schärli, Lukas Hobi

Cinematographer: Judith Kaufmann

Editor: Hansjörg Weißbrich

Composer: Emilie Levienaise-Farrouch

 

 

□ Joep Beving / “Liminal”

 

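A piano piece in the manner of contemporary poetry, adapted from Guillaume Logé's "Wild Renaissance".

A system transitions organically from instability to natural dynamics;

that liminal "fluctuation"

is expressed in a meditative timbre that seems to dive deep.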

 

 

□ “voda”

 

Deutsche Grammophon

Release: 2026-03-20

Composer / Programming: Joep Beving

Mastering Engineer: Maria Triana

□ 『Project Hail Mary』(プロジェクト・ヘイル・メアリー)

 

Pascal Pictures (2026)

Director: Phil Lord / Christopher Miller

Based on the novel by Andy Weir

Screenplay by Drew Goddard

Cinematography by Greig Fraser

Composed by Daniel Pemberton

Production Design by Charles Wood

 

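From a present he was never supposed to be in, he follows wisdom and empathy back toward "the future that should have been." A hymn to life at the opposite pole from "AD ASTRA." The parallel timeline of the past rewrites the coordinates under his feet. The film gives emotional, concrete form to a meta-structure: translation between alien sign systems, and the audiovisual rendering of a story.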

 

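With the novel, I felt the work centered on reliving "symbiosis" and "intellectual inquiry," but the film deliberately sheds light on another side. Intelligences drawn to each other on a cosmic scale, and the network formed as resonating emotions propagate, gain momentum from the dynamics of the visuals and from humor.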

 

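The novel and the film share almost the same narrative time structure, so the second pass is the one that makes you cry the most. Especially in the film, Gosling's performance 🥹 Once the first pass has shown you how he came to be "Hail Mary-ed," the weight of "Grace go Home" on the second pass is something else. And only the audience (the reader) knows it.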

 

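The third protagonist: Director Eva Stratt. Resigned about humanity's goodness, she does not shrink from extralegal means to save humanity, even if she herself will be judged for it. And once the law itself has ceased to exist, who could judge her? Hüller also played a character who holds the truth within in "Anatomy of a Fall." And the karaoke song choice was reportedly her idea.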

 

 

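"Project Hail Mary": second viewing, in 4DX. What I recommend is definitely IMAX, but for the atmospheric-entry sequence at Tau Ceti e, where the wind and vibration effects run wild, the immersion of 4DX wins after all. The scene where the Hail Mary plunges, in third-person view, from the lower edge of the ionospheric aurora to just above the clouds makes your heart feel as if it is floating up.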

 

 


 

 

□ Daniel Pemberton / “A Moment”

 

The score is built around chorus from a large choir, while electronica textures, orchestra, and modern percussion depict the mystery of outer space.

 

 

 □ Daniel Pemberton / “Time Go Fishing”

 

 

 

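The complete soundtrack box for "Project Hail Mary" is far too stylish! Its design is modeled on the Voyager Golden Record, and it reportedly includes 15 minutes of additional tracks that can only be heard on this medium. #プロジェクト・ヘイル・メアリー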

 

 

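With "Project Hail Mary" running a long 157 minutes, people say things like "if you need a bathroom break, go during the karaoke scene", but it is an important scene in which the true feelings of Dr. Eva, the protagonist in the shadows, seep through, and the lyrics of "Sign of the Times" could be called a symbol of the story. Speaking of Gosling, the karaoke scene in "The Fall Guy" was good too 🥹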

 

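A standee display at the IMAX theater. You can actually sit in the chair! Incidentally, a playlist of the songs featured in the film has been published, and the selection has a vintage feel throughout, which is very nice.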

 

 


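□ The scene in question. As an SF fan, the first thing I thought of was Carl Sagan's "Contact", but in every age and culture the "shoreline" motif has been a favored subject for contemplative quotation, so there may be no particular need to read an intended homage into it. Just as Grace is fond of "fog".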

 

 

□ Turakina Maori Girls’ Choir / “Po Atarau”

 

□ "Cosmic Princess Kaguya!" (超かぐや姫!)

 

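An intelligent life form arriving from the lunar world sweeps unrivaled through the streamer culture of a virtual world, a premise that would make even William Gibson go pale. A hidden, full-fledged cyberpunk work that combines an epic time trick with the momentum of juvenile SF. As a film with a story within a story, its catharsis is beyond reproach.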

 

 

 

 

“ray”

 

 

Twin Engine (2026)

Director: Shingo Yamashita

Writers: Saeri Natsuo / Shingo Yamashita

Producer: Koji Yamamoto

Music: Conisch

Cinematography: Daisuke Chiba

Art Director: Taichi Shishido

□ Quantum Hamiltonian Learning using Time-Resolved Measurement Data and its Application to Gene Regulatory Network Inference

 

https://www.biorxiv.org/content/10.64898/2026.03.05.709897v1 

 

The quantum Hamiltonian-based gene-expression model (QHGM) encodes gene interactions as a parameterized Hamiltonian that governs gene expression evolution over pseudotime. It generates gene-expression data by modeling regulatory interactions as quantum-like couplings.

 

The QHGM employs a scalable variational quantum network inference algorithm (VQ-Net). VQ-Net is built on an empirical risk minimization framework and minimizes the negative log-likelihood loss over mini-batches of scRNA-seq data collected at multiple pseudotime bins.

 

 

□ X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models

 

https://www.cdn.xaira.com/papers/X_CELL_V1_0316_final.pdf

 

X-Cell, a diffusion language model (LM) capturing the transcriptomic shift from a control state to a perturbed state via an iterative diffusion process incorporating multi-modal biological priors directly into the generative architecture through cross-attention.

 

X-Cell enables robust prediction of perturbation effects across diverse cellular contexts. X-Cell scales from 55M parameters to 4.9 billion parameters (X-Cell-Ultra), exceeding the size of existing single-cell foundation models.

 

 

□ Manufacturing-aware generative models enable petascale synthesis of designed DNA 

 

https://www.nature.com/articles/s41587-026-03020-8

 

Manufacturing-aware generative models, or "variational synthesis" models, enable the simultaneous design and physical, high-throughput synthesis of trillions of DNA sequences at petascale, reducing gene synthesis costs by up to one trillion-fold.

 

The method demonstrated the manufacturing of 10-100 quadrillion independent DNA sequences of highly realistic biological designs (such as antibodies, TCRs, or polymerases) at a cost of roughly $10^3, overcoming the limitations of traditional, expensive individual-sequence synthesis.

 

 

 □ AetherCell: A generative engine for virtual cell perturbation and in vivo drug discovery 

 

https://www.biorxiv.org/content/10.64898/2026.03.13.710968v1

 

AetherCell, a deep generative foundation model that unifies transcriptomic measurements into a shared, platform-aligned representation and enables transfer of perturbation effects into clinically grounded contexts.

 

AetherCell learns a common coordinate system from large-scale RNA-seq and aligns high-throughput perturbation signatures to the same space, while conditioning generation on multi-modal priors capturing chemical structure and genetic regulatory logic.

 

 

 □ scTimeBench: A streamlined benchmarking platform for single-cell time-series analysis

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712069v1

 

scTimeBench, a modular and scalable benchmark designed to assess methods across three critical tasks: forecast accuracy (temporal cell alignment) for projecting cells to unseen time points, embedding coherence between original and projected data, and cell-type lineage fidelity.

 

scTimeBench uses Wasserstein Distance, Gaussian maximum mean discrepancy, and Hausdorff loss, which respectively measure the cost of transporting projected cells to the ground truth, compare distributions via kernel embeddings, and capture worst-case mismatches in the projections.
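
Two of these metrics are easy to sketch with NumPy. The snippet below is a minimal illustration, not scTimeBench's own code: the function names, the kernel bandwidth, and the toy data are assumptions made here. It computes a biased Gaussian-kernel MMD estimate and a symmetric Hausdorff distance between projected cells and the ground truth. The Wasserstein distance is left out because an exact multivariate version needs an optimal-transport solver.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def gaussian_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD under a Gaussian kernel."""
    gamma = 1.0 / (2.0 * bandwidth ** 2)
    k_xx = np.exp(-gamma * pairwise_sq_dists(x, x))
    k_yy = np.exp(-gamma * pairwise_sq_dists(y, y))
    k_xy = np.exp(-gamma * pairwise_sq_dists(x, y))
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def hausdorff(x, y):
    """Symmetric Hausdorff distance: the worst nearest-neighbour mismatch."""
    d = np.sqrt(pairwise_sq_dists(x, y))
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy data (hypothetical): a ground-truth cloud, a close projection, a shifted one.
rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 5))
good = truth + 0.05 * rng.normal(size=truth.shape)
bad = truth + 3.0

print(gaussian_mmd2(truth, good), gaussian_mmd2(truth, bad))
print(hausdorff(truth, good), hausdorff(truth, bad))
```

Both scores should be smaller for the close projection than for the shifted one, which matches the intuition that MMD compares whole distributions while Hausdorff reacts to the single worst-matched cell.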

 

scTimeBench trains a random forest classifier on the original data and calculates the average normalized entropy per cell. If the embeddings preserve cell-type-specific signals, low entropy is expected in both the original and projected cells.
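
The entropy score can be sketched as follows, assuming the classifier exposes per-cell class probabilities (for example, the output of scikit-learn's `RandomForestClassifier.predict_proba`). The function name and the small `eps` guard are assumptions for illustration; the benchmark's own normalization may differ.

```python
import numpy as np

def mean_normalized_entropy(proba, eps=1e-12):
    """Average per-cell entropy of class probabilities, scaled to [0, 1].

    proba: array of shape (n_cells, n_classes), each row summing to 1.
    """
    proba = np.asarray(proba, dtype=float)
    n_classes = proba.shape[1]
    entropy = -np.sum(proba * np.log(proba + eps), axis=1)
    return float(np.mean(entropy / np.log(n_classes)))

# Hypothetical classifier outputs: confident calls versus uninformative calls.
confident = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
uncertain = np.full((2, 3), 1.0 / 3.0)

print(mean_normalized_entropy(confident))
print(mean_normalized_entropy(uncertain))
```

A value near 0 means the classifier assigns cells to one type with confidence; a value near 1 means the cell-type signal has been washed out.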

 

 

□ PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling 

 

https://arxiv.org/pdf/2602.19685v1 

 

PerturbDiff shifts modeling from individual cells to entire distributions. By embedding distributions as points in a Hilbert space, it defines a diffusion-based generative process operating directly over probability distributions.

 

PerturbDiff embeds cell distributions into a reproducing kernel Hilbert space (RKHS), where each distribution is represented by its kernel mean embedding as a single point in this function space.

 

PerturbDiff introduces Gaussian Random Elements as a generalization of Gaussian measures to infinite-dimensional Hilbert spaces, which play an analogous role to Gaussian noise vectors in Euclidean diffusion models.

 

 

□ RNAElectra: An ELECTRA-Style RNA Foundation Model for RNA Regulatory Inference

 

https://www.biorxiv.org/content/10.64898/2026.03.15.711950v1

 

RNAElectra, a single-nucleotide resolution RNA foundation model pretrained on diverse non-coding RNAs from RNAcentral using ELECTRA-style replaced-token detection (RTD).

 

RTD trains a discriminator with a loss defined over all input positions on realistically corrupted sequences, providing dense supervision that better aligns pretraining with sequence-to-function fine-tuning.

 

RNAElectra combines nucleotide-resolution tokenization with an efficient attention design to capture local regulatory motifs and longer-range dependencies within a single reusable backbone.

 

 

 □ scRGCL: Neighbor-Aware Graph Contrastive Learning for Robust Single-Cell Clustering

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712039v1

 

scRGCL, a single-cell clustering method that learns a regularized representation guided by contrastive learning. Specifically, scRGCL captures the cell-type-associated expression structure by clustering similar cells together while ensuring consistency.

 

scRGCL performs negative sampling by selecting cells from distinct clusters, thereby ensuring semantic dissimilarity between the target cell and its negative pairs.

 

scRGCL introduces a neighbor-aware re-weighting strategy that increases the contribution of samples from clusters closely related to the target. This mechanism prevents cells from the same category from being mistakenly pushed apart, effectively preserving intra-cluster compactness.

 

 

□ scAPEX-seq: Subcellular transcriptome sequencing with single cell APEX-seq identifies regulators of cell-cell interactions

 

https://www.biorxiv.org/content/10.64898/2026.03.17.712496v1

 

Single-cell APEX-seq (scAPEX-seq), a proximity labeling-based method for mapping subcellular transcriptomes at single-cell resolution. scAPEX-seq resolves distinct cell states and captures coculture-induced changes that are missed by conventional scRNA-seq.

 

scAPEX-seq refines interaction predictions and reveals functionally important cell states and regulatory pathways that are difficult to resolve at the level of total transcript abundance alone. 

 

scAPEX-seq should be easily extensible to other important subcellular regions, such as the mitochondria, stress granules, and synapses, enabling systematic investigation of how RNA localization and dynamics shape complex cell and tissue behaviors.

 

 

 □ DiffEvol: Constrained Diffusion as a Paradigm for Evolution

 

https://www.biorxiv.org/content/10.64898/2026.03.10.710948v1

 

DiffEvol, a framework that models evolution as constrained diffusion over a discrete genotype space, in contrast to classical diffusive systems where all states are accessible. 

 

Using frequency and sequence data alone, DiffEvol estimates these constraints by inverting the diffusion dynamics to recover the constrained subspace representing the viable genotype manifold, as well as its evolution over time.

 

 

□ DNA-MGC+: A versatile codec for reliable and resource-efficient data storage on synthetic DNA

 

https://www.biorxiv.org/content/10.64898/2026.03.11.711016v1

 

DNA-MGC+, a DNA storage codec designed to enable reliable and resource-efficient data retrieval under diverse operating conditions.

 

DNA-MGC+ achieves simultaneous gains across several key performance metrics under explicit reliability constraints, including the minimum required sequencing depth, read cost, decoding time, maximal error-correction capability, and storage density.

 

At the inner level, an MGC+ code protects against base-level errors by enabling the correction of insertions, deletions, and substitutions (IDS) within individual DNA sequences.

 

At the outer level, a Reed-Solomon code mitigates the effects of coverage bias by enabling recovery from sequence dropouts, while also correcting residual errors that remain after inner decoding.

 

 

□ AnewSampling: Learning the All-Atom Equilibrium Distribution of Biomolecular Interactions at Scale

https://www.biorxiv.org/content/10.64898/2026.03.10.710952v1

 

AnewSampling, a transferable generative foundation framework designed for the high-fidelity sampling of all-atom equilibrium distributions, which is the first model to faithfully reproduce MD at the all-atom level.

 

AnewSampling uses a novel quotient-space generative framework to ensure mathematical consistency and leverages the largest self-curated database of protein-ligand trajectories to date, with over 15 million conformations.

 

 

□ CellDEEP: Cell DiffErential Expression by Pooling highlights issues in differential gene expression in scRNA-seq

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710522v1

 

CellDEEP aggregates gene read counts to construct a metacell. By selectively pooling cells prior to differential-expression (DE) testing, CellDEEP reduces noise and zero inflation while retaining resolution and biological signal.

 

CellDEEP pools cells either randomly within a cluster or via k-means clustering in the embedding space, generates metacell UMI counts by summation or averaging, and makes the number of cells per metacell adjustable.
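
The random-within-cluster variant of this pooling can be sketched in a few lines. This is an illustration under stated assumptions, not CellDEEP's interface: the function name, its arguments, and the handling of leftover cells (the last metacell in a cluster may hold fewer cells) are choices made here.

```python
import numpy as np

def pool_metacells(counts, clusters, cells_per_metacell=3, mode="sum", seed=0):
    """Randomly pool cells within each cluster into metacells.

    counts:   (n_cells, n_genes) UMI count matrix
    clusters: (n_cells,) cluster label per cell
    mode:     "sum" or "mean" aggregation of the pooled counts
    """
    rng = np.random.default_rng(seed)
    pooled, labels = [], []
    for cluster in np.unique(clusters):
        # Shuffle the cells of this cluster, then cut them into groups.
        idx = rng.permutation(np.flatnonzero(clusters == cluster))
        for start in range(0, len(idx), cells_per_metacell):
            group = counts[idx[start:start + cells_per_metacell]]
            pooled.append(group.sum(axis=0) if mode == "sum" else group.mean(axis=0))
            labels.append(cluster)
    return np.vstack(pooled), np.array(labels)

# Toy example (hypothetical data): 7 cells, 4 genes, two clusters.
counts = np.random.default_rng(1).poisson(2.0, size=(7, 4))
clusters = np.array([0, 0, 0, 0, 1, 1, 1])
pooled, labels = pool_metacells(counts, clusters, cells_per_metacell=2)
print(pooled.shape, labels)
```

With summation, the total UMI count is preserved while the fraction of zero entries drops, which is the effect on zero inflation the paragraph above describes.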

 

 

 □ Benchmarking zero-shot single-cell foundation model embeddings for cellular dynamics reconstruction

 

https://www.biorxiv.org/content/10.64898/2026.03.10.710748v1

 

A systematic benchmark was conducted to compare single-cell foundation models (SCFMs) with a highly variable gene (HVG) baseline for cellular dynamics reconstruction across three tasks: backtracking, interpolation, and extrapolation.

 

All methods were evaluated in a shared aligned embedding space using complementary metrics that capture different aspects of dynamical reconstruction:

 

(i) distributional recovery, quantified by the Wasserstein-1 distance (Earth Mover's Distance);

(ii) global agreement with reference pseudotime, quantified by Spearman correlation; and

(iii) local velocity coherence, which measures neighborhood-level consistency of inferred velocity vectors.
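
The first two metrics can be illustrated with NumPy alone. The sketch below makes simplifying assumptions that are not from the paper: the Wasserstein-1 distance is computed in one dimension for equally sized samples (where it reduces to the mean absolute difference of sorted values), and the Spearman correlation ignores ties.

```python
import numpy as np

def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two equally sized 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized samples")
    return float(np.mean(np.abs(x - y)))

def spearman(a, b):
    """Spearman rank correlation (no tie correction)."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(len(v))
        return r
    ra, rb = ranks(np.asarray(a)), ranks(np.asarray(b))
    return float(np.corrcoef(ra, rb)[0, 1])

print(wasserstein1_1d([0, 1, 2], [1, 2, 3]))
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

The second call shows why a rank-based measure suits pseudotime: a monotone but non-linear relation to the reference still scores as perfect agreement.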

 

 

□ Trivial Tangle Traverser: Automatic Generation of Model Sequences for Complex Regions in Assembly Graphs

 

https://www.biorxiv.org/content/10.64898/2026.03.06.710180v1

 

The Trivial Tangle Traverser (TTT) algorithm finds optimized resolutions of assembly graph tangles. TTT uses depth of coverage and read-to-graph alignment information in a two-stage process to identify evidence-based traversals.

 

TTT enables estimation of graph-edge multiplicities within a tangle. Sequence multiplicities are then estimated through mixed-integer linear programming, after which an Eulerian path is found in the derived multigraph and optimized by gradient descent.
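
The Eulerian-path step can be sketched with Hierholzer's algorithm on a directed multigraph. This is a generic illustration, not TTT's implementation: the input format (a dictionary from edge to multiplicity), the node names, and the toy tangle are assumptions, and the MILP and gradient-descent stages are not shown.

```python
from collections import defaultdict

def eulerian_path(edge_multiplicity, start):
    """Hierholzer's algorithm on a directed multigraph.

    edge_multiplicity: {(u, v): m}, meaning edge u->v must be used m times.
    Returns a list of nodes that uses every edge copy exactly once.
    """
    out = defaultdict(list)
    for (u, v), m in edge_multiplicity.items():
        out[u].extend([v] * m)

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            stack.append(out[node].pop())  # follow an unused edge copy
        else:
            path.append(stack.pop())       # dead end: emit node on backtrack
    path.reverse()

    if len(path) != sum(edge_multiplicity.values()) + 1:
        raise ValueError("no Eulerian path from this start node")
    return path

# Toy tangle (hypothetical): a repeat node "r" whose self-loop is traversed twice.
edges = {("in", "r"): 1, ("r", "r"): 2, ("r", "out"): 1}
path = eulerian_path(edges, "in")
print(path)
```

Once the multiplicities fix how many times each edge is used, any Eulerian path is a valid traversal; choosing among them is where the read-alignment evidence comes in.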

 

 

□ scEvolver: Prototype-Based Continual Learning for Single-Cell Annotation

 

https://www.biorxiv.org/content/10.64898/2026.03.05.709973v1

 

scEvolver enables robust analysis through continual reference atlas construction, accurate query mapping into a harmonized latent space, detection of outlier / novel cell populations, and prototype-correlated gene signatures capturing cellular heterogeneity and state transitions.

 

scEvolver incorporates a similarity score between cells and prototypes, spanning a continuum from prototypical to peripheral cells within a class. It further incorporates memory prototypes and data replay strategies to accumulate knowledge while mitigating catastrophic forgetting.

 

 

□ BranchSBM: Branched Schrödinger Bridge Matching

 

https://arxiv.org/abs/2506.09007

 

BranchSBM, a framework for solving the branched Schrödinger bridge problem by parameterizing diverging velocity fields and branch-specific growth rates, which together define a set of conditional stochastic bridges from the common source to multiple terminal modes.

 

BranchSBM learns distinct, non-linear branched paths that curve along the 3-dimensional manifold while minimizing kinetic energy and state-cost. It captures the true differentiation dynamics through the combined influence of the neural interpolant and the path-energy objective.

 

 

□ PerturbGen: Predicting how perturbations reshape cellular trajectories 

 

https://www.biorxiv.org/content/10.64898/2026.03.04.709254v1

 

PerturbGen, a foundation model trained on 100 million single-cell transcriptomes that predicts perturbation responses along cellular trajectories. It predicts how a genetic perturbation at a source state shapes downstream states and alters gene programs and trajectories across time.

 

PerturbGen generates a token sequence representing the target state using an encoder–decoder architecture, which is subsequently converted back into gene expression. PerturbGen represents a step toward AI-driven construction of virtual cells and in silico perturbation atlases.

 

 

□ CellSweep: Single-Cell Genomics Decontamination 

 

https://www.biorxiv.org/content/10.64898/2026.03.04.709349v1

 

CellSweep incorporates explicit mixture components for cell-type expression and for ambient / global bulk contamination, and uses an expectation-maximization algorithm for inference. It adopts a fully tractable generative likelihood that admits closed-form E- and M-steps.
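
To show what closed-form E- and M-steps look like in this setting, here is a deliberately reduced sketch. It assumes a single cell, known cell-type and ambient expression profiles, and one unknown contamination fraction `rho`; this two-component model and all names are assumptions for illustration, not CellSweep's actual likelihood.

```python
import numpy as np

def em_contamination(counts, cell_profile, ambient_profile, n_iter=200, rho=0.5):
    """Estimate the ambient fraction rho of one cell by EM.

    Each UMI of gene g is assumed to come from the ambient profile with
    probability rho and from the cell-type profile otherwise.
    Profiles must be strictly positive and sum to 1.
    """
    counts = np.asarray(counts, dtype=float)
    for _ in range(n_iter):
        amb = rho * ambient_profile
        own = (1.0 - rho) * cell_profile
        resp = amb / (amb + own)                             # E-step: P(ambient | gene)
        rho = float(np.sum(counts * resp) / np.sum(counts))  # M-step: closed form
    return rho

# Toy data (hypothetical): expected counts under 20% contamination.
cell_profile = np.array([0.70, 0.20, 0.05, 0.05])
ambient_profile = np.array([0.10, 0.10, 0.40, 0.40])
counts = 1000 * (0.8 * cell_profile + 0.2 * ambient_profile)

rho_hat = em_contamination(counts, cell_profile, ambient_profile)
print(round(rho_hat, 4))
```

Both steps are plain arithmetic on vectors, with no inner numerical optimization, which is what makes such a likelihood cheap to fit across many cells.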

 

 

□ RNA-seq analysis in seconds using GPUs 

 

https://www.biorxiv.org/content/10.64898/2026.03.04.709526v1

 

By redesigning the core algorithms (pseudoalignment, equivalence class intersection, and the EM algorithm) for massively parallel execution on GPUs, the method achieves a 30-50× speedup over multithreaded CPU kallisto.

 

 

□ GPU-accelerated single-cell analysis at scale with rapids-singlecell 

 

https://arxiv.org/abs/2603.02402

 

Rapids-singlecell integrates into the scverse ecosystem and operates directly on the AnnData data structure, a community standard. Built on the NVIDIA and scverse ecosystems, rapids-singlecell accelerates single-cell workflows by up to several orders of magnitude.

 

 

□ Generative models of cell dynamics: from Neural ODEs to flow matching

 

https://www.nature.com/articles/s42003-026-09758-w

 

Neural Ordinary Differential Equations (Neural ODEs) allow for coupling genes' dynamics and joint learning of the mapping function and velocity, providing a more accurate representation of cellular dynamics.

 

Neural ODEs capture the underlying continuous dynamics of noisy/irregularly sampled data. Their causal interpretation allows connections to Structural Causal Models in equilibrium states, demonstrated by translating deterministic behavior in ODE systems into a causal framework.

 

 

□ SPAE: Deciphering Cell Cycle Dynamics and Cell States in Single-cell RNA-seq data 

 

https://www.biorxiv.org/content/10.64898/2026.03.05.709782v1

 

SPAE (Sinusoidal and Piecewise AutoEncoder) employs an autoencoder integrating nonlinear and piecewise linear components, uniquely utilizing sine and cosine functions within the decoder to fit the periodicity of the cell cycle, facilitating precise pseudotime estimation.
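
The role of the sine and cosine terms can be seen in isolation with a small sketch. It is not the SPAE autoencoder: it assumes the cell-cycle angle `theta` is already known and fits only a linear decoder on `[1, cos(theta), sin(theta)]` by least squares; all names and the toy gene are hypothetical.

```python
import numpy as np

def design_matrix(theta):
    """Periodic basis for a cell-cycle angle theta (radians)."""
    theta = np.asarray(theta, dtype=float)
    return np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])

def fit_sinusoidal_decoder(theta, expression):
    """Least-squares fit of expression (n_cells, n_genes) on the periodic basis."""
    coef, *_ = np.linalg.lstsq(design_matrix(theta), expression, rcond=None)
    return coef  # shape (3, n_genes): offset, cosine weight, sine weight

def decode(theta, coef):
    """Reconstruct expression from the angle; periodic in theta by construction."""
    return design_matrix(theta) @ coef

# Toy gene (hypothetical) peaking once per cycle.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
expression = (2.0 + 1.5 * np.cos(theta) - 0.5 * np.sin(theta))[:, None]

coef = fit_sinusoidal_decoder(theta, expression)
print(coef.ravel())
```

Because the decoder depends on the angle only through its sine and cosine, a cell at the end of the cycle is reconstructed next to a cell at the start, which a plain linear pseudotime axis cannot do.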

 

SPAE learns continuous pseudotime and discrete cell clusters. SPAE maps pseudotime to cell cycle phases with a Gaussian Mixture Model, though it may not capture cyclic continuity. SPAE provides a practical approximation for clinical stage definition from continuous trajectories.

 

 

□ Perseus: Lineage-Aware Refinement of Kraken2 Taxonomic Classification for Long Read Metagenomes

 

https://www.biorxiv.org/content/10.64898/2026.03.06.710148v1

 

Perseus, a lineage-aware confidence estimation framework for taxonomic classification that models the spatial distribution and hierarchical consistency of k-mer evidence along sequences. Perseus reframes taxonomic classification as a hierarchical confidence estimation problem.

 

Perseus is built on a multi-headed 1-D convolutional neural network that refines k-mer-level taxonomic signals from Kraken2 and estimates calibrated confidence scores for taxonomic correctness at each canonical rank.

 

 

□ geneSTRUCTURE: A Modern Platform for Visualization of Gene Structures

 

https://www.biorxiv.org/content/10.64898/2026.03.05.709980v1

 

geneSTRUCTURE, a highly customizable command-line tool for gene structure visualization. It supports multiple annotation layers, including mutations (insertions, deletions, and SNPs) and multiple domains, with extensive options for fine-tuning visual output. 

 

 

□ Benchmarking tissue- and cell type-of-origin deconvolution in cell-free transcriptomics

 

https://www.biorxiv.org/content/10.64898/2026.03.05.709833v1

 

A systematic benchmarking of tissue- and cell type-of-origin deconvolution for plasma cfRNA that considers both methodological and reference-related sources of variability under realistic cfRNA simulation settings.

 

 

□ Gene Portals: A Framework for Integrating Clinical, Functional, and Structural Evidence into Rare Disease Variant Classification 

 

https://www.medrxiv.org/content/10.64898/2026.03.05.26347086v1

 

Gene Portals, a framework for gene-centered multimodal knowledge bases that co-localize expert-harmonized clinical data, functional assays, population variation, structural annotations and gene-specific ACMG/AMP specifications within a single resource.

 

 

□ NIRD: Inferring large networks with matrix factorisation to capture non-linear dependencies among genes using sparse single-cell profiles 

 

https://www.biorxiv.org/content/10.64898/2026.03.08.710347v1

 

Network Inference in Reduced Dimension (NIRD) handles sparsity and computational complexity while still inferring non-linear dependencies among genes (features) using large, sparse gene-expression matrices.

 

NIRD is based on matrix factorisation of the gene-expression matrix to facilitate internal imputation as well as network inference using tree ensemble-based non-linear regression. NIRD can also be used with RNA velocity for better inference of non-linear causality.

 

 

□ LongHap: Harnessing methylation signals inherent in long-read sequencing data for improved variant phasing

 

https://www.biorxiv.org/content/10.64898/2026.03.11.710820v1

 

LongHap, a read-based variant phasing method that seamlessly integrates sequence and 5mC methylation signals from PacBio HiFi and ONT sequencing data. LongHap significantly improves haplotype reconstruction.

 

LongHap creates phase blocks based on overlapping heterozygous sequence variants, accurately phasing complex variants by embedding them into the broader haplotype context through belief propagation.

 

LongHap then dynamically identifies differentially methylated sites that are informative for phasing to refine and extend initial phase blocks.

 

 

□ CDS-BART: A BART-Based Foundation Model for mRNA Sequence Analysis

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710670v1

 

CDS-BART integrates SentencePiece sub-word tokenization with the denoising sequence-to-sequence training of Bidirectional and Auto-Regressive Transformers (BART).

 

 

□ STAR Suite: Integrating transcriptomics through AI software engineering in the NIH MorPhiC consortium

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710580v1

 

STAR Suite, a human-engineered and AI-implemented modernization that integrates functionality directly into the C++ source. STAR-core restores parity with Cell Ranger 9.0.1 for bulk and single-cell RNA-seq.

 

In STAR-Perturb, feature barcode searches execute in parallel with genomic alignment. STAR-Perturb also auto-detects the barcode chemistry, and unlike Cell Ranger, supports processing multiple feature libraries (e.g., gRNA and lineage barcodes) in a single run.

 

STAR-Flex builds a hybrid reference genome with synthetic probe pseudo-chromosomes, then uses STAR's alignment engine to quantify probe hits while using genomic alignments to confirm matches and detect off-probe noise.

 

Running SLAM-seq analysis in external tools creates "logic drift," because those tools must approximate the aligner's internal counting decisions. STAR-SLAM eliminates this by performing mutation detection and background modeling directly within the aligner's critical path.

 

 

□ MESSI: Multimodal Experiments with SyStematic Interrogation using nextflow

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710452v1

 

Multimodal Experiments with SyStematic Interrogation (MESSI) enforces a consistent, leakage-free model assessment strategy across diverse classes of multimodal integration methods.

 

Grounded in nested cross-validation, MESSI ensures unbiased hyperparameter tuning and generalization estimates regardless of whether the underlying method follows early, intermediate, or late integration principles.
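
Nested cross-validation itself can be sketched compactly. The example below is a generic illustration, not MESSI's Nextflow pipeline: it uses a ridge regression as a stand-in model, and the function names, fold counts, and toy data are assumptions. The point is that hyperparameters are chosen using outer-training samples only, so the outer test fold never leaks into tuning.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split range(n) into k shuffled folds."""
    return np.array_split(rng.permutation(n), k)

def ridge_fit(x, y, lam):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)

def mse(x, y, w):
    return float(np.mean((x @ w - y) ** 2))

def nested_cv(x, y, lambdas, outer_k=5, inner_k=3, seed=0):
    rng = np.random.default_rng(seed)
    outer_scores = []
    for test_idx in kfold_indices(len(y), outer_k, rng):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        inner_folds = kfold_indices(len(train_idx), inner_k, rng)

        def inner_score(lam):
            # Tuning sees outer-training samples only.
            scores = []
            for fold in inner_folds:
                val = train_idx[fold]
                fit = np.setdiff1d(train_idx, val)
                scores.append(mse(x[val], y[val], ridge_fit(x[fit], y[fit], lam)))
            return np.mean(scores)

        best_lam = min(lambdas, key=inner_score)
        w = ridge_fit(x[train_idx], y[train_idx], best_lam)
        outer_scores.append(mse(x[test_idx], y[test_idx], w))
    return outer_scores

# Toy regression problem (hypothetical data).
rng = np.random.default_rng(42)
x = rng.normal(size=(60, 4))
y = x @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)
scores = nested_cv(x, y, lambdas=[0.01, 1.0, 100.0])
print(np.mean(scores))
```

The same outer/inner structure applies whether the model inside is an early, intermediate, or late integration method; only `ridge_fit` and `mse` would change.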

 

 

□ TEgenomeSimulator: A Flexible Framework for Simulating Genomes with Configurable Transposable Element Landscapes 

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710711v1

 

TEgenomeSimulator, a flexible framework for generating synthetic genomes with configurable TE landscapes. It supports both randomly generated and biologically derived backbone sequences, enabling the modeling of TE insertions under diverse structural and evolutionary contexts.

 

 

□ Porting AlphaGenome to PyTorch

 

https://genomicsxai.github.io/blogs/2026-004/

 

alphagenome-pytorch is a faithful port of the full AlphaGenome architecture in PyTorch, with weights released for fold-specific and distilled models.

 

AlphaGenome-PyTorch ensures close numerical equivalence with the JAX model and includes tests verifying numerical equivalence of outputs from individual model heads as well as full forward and backward passes, gradients, and loss values.

 

Each convolutional block, attention mechanism, and transformer layer produces outputs within numerical precision of the JAX implementation.

 

Backpropagation yields equivalent gradients, ensuring training dynamics remain faithful to the original implementation. End-to-end predictions across all genomic tracks match within floating-point precision.

 

 

□ Benchmarking DNA Foundation Models: Biological Blind Spots in Evo2 Variant-Effect Prediction

 

https://www.biorxiv.org/content/10.64898/2026.03.10.710786v1

 

A DNA foundation model claiming biological utility must have internalized the statistical and structural grammar of genomes. Genomic signals operate across a wide range of length scales, and a model's blind spots may depend on scale.

 

 

□ DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling

 

https://www.biorxiv.org/content/10.64898/2026.03.09.710665v1

 

DEX (DISTATIS-based consensus of experimental exchangeability) introduces a consensus-based experimental exchangeability measure that best fits real codon substitution patterns across three diverse lineages.

 

Based on model rank, DEX fit best across all alignments compared with the individual measures. DEX could also be useful as a static comparison for future variant effect predictors.

 

 

□ HitAnno: Atlas-level cell type annotation based on scATAC-seq data via a hierarchical language model

https://www.biorxiv.org/content/10.64898/2026.03.10.710729v1

 

HitAnno represents each cell as a structured "cell sentence", constructed from accessibility profiles on specific peaks with consideration of both major and rare cell types.

 

 

□ BioPipelines: Accessible Computational Protein and Ligand Design for Chemical Biologists

 

https://www.biorxiv.org/content/10.64898/2026.03.11.711024v1

 

BioPipelines allows researchers to define multi-step computational design workflows in just a few lines of code. Its architecture provides a straightforward way to extend the toolkit with additional functionalities, particularly by leveraging coding agents with minimal effort.

 

 

□ Longdust: Finding low-complexity DNA sequences 

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag112/8519623

 

Longdust identifies long highly repetitive STRs, VNTRs, satellite DNA and other low-complexity regions (LCRs) in a genome. Longdust also overlaps with tandem repeat finders (e.g. TRF, TANTAN and ULTRA) in functionality.

 

Longdust defines string complexity by statistically modeling the k-mer count distribution with the parameters: the k-mer length, the context window size and a threshold on complexity.

 

 

 □ ctOTVelo: Cell type-specific gene regulatory network inference from single cell transcriptomics

 

https://www.biorxiv.org/content/10.64898/2026.03.11.711174v1

 

ctOTVelo first learns a probabilistic mapping between cells of adjacent timepoints through optimal transport. ctOTVelo utilizes a cell type-aware time-lagged correlation to infer cell type specific correlations between genes. The aggregated correlation can be interpreted as a cell type specific GRN.

 

 

 □ ANNEVO: Highly accurate ab initio gene annotation 

 

https://www.nature.com/articles/s41592-026-03036-7

 

ANNEVO, a mixture of experts-based genomic language model that directly models distal sequence dependencies and joint evolutionary relationships from diverse genomes, enabling precise ab initio gene annotation.

 

ANNEVO is capable of modeling distal sequence information and joint evolutionary relationships across diverse species directly from genomes.

 

 

□ FlashS: Frequency-domain kernels enable atlas-scale detection of spatially variable genes

 

https://www.biorxiv.org/content/10.64898/2026.03.12.711372v1

 

FlashS detects spatially variable genes by testing whether gene expression associates with spatial structure captured by Random Fourier Features.

 

For each spatial location, RFF constructs a D-dimensional feature vector whose inner products approximate a Gaussian kernel, converting kernel-based testing into linear operations.
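
The kernel approximation behind this step is the standard Random Fourier Features construction, sketched below with NumPy. The function name, the feature count, and the bandwidth are assumptions for illustration, and the association test that FlashS builds on top of the features is not shown.

```python
import numpy as np

def random_fourier_features(coords, n_features=5000, bandwidth=1.0, seed=0):
    """Map spatial coordinates to features whose inner products approximate
    the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = np.random.default_rng(seed)
    dim = coords.shape[1]
    w = rng.normal(scale=1.0 / bandwidth, size=(dim, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)             # random phases
    return np.sqrt(2.0 / n_features) * np.cos(coords @ w + b)

# Toy spatial locations (hypothetical data).
rng = np.random.default_rng(1)
coords = rng.uniform(size=(50, 2))
z = random_fourier_features(coords, bandwidth=0.5)

# Compare against the exact Gaussian kernel matrix.
sq = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
exact = np.exp(-sq / (2.0 * 0.5 ** 2))
max_error = float(np.max(np.abs(z @ z.T - exact)))
print(z.shape, max_error)
```

With `z` in hand, anything that would have needed the n-by-n kernel matrix can be written as products with the n-by-D feature matrix, which is what turns kernel-based testing into linear operations.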

 

 

□ scAttnVI: Gateway analysis reveals transient molecular programs at cell-fate transitions

 

https://www.biorxiv.org/content/10.64898/2026.03.12.711328v1

 

scAttnVI (single-cell Attention-weighted Variational Inference), a binary mutual information (BMI)-regularized variational autoencoder that preserves BMI-defined neighborhoods during latent learning through BMI-derived neighbor weights. Here, the "attention" weights are fixed BMI-derived coefficients rather than learned transformer-style attention.

 

 

 □ SC-BIG: A Hierarchical Bayesian Model for Bulk-Informed Single Nucleotide Variant Calling in Single Cells

 

https://www.biorxiv.org/content/10.64898/2026.03.12.705671v1

 

SC-BIG, a hierarchical Bayesian model that leverages bulk sequencing data from a representative tumor sample to improve SNV detection. SC-BIG propagates uncertainty across multiple biological parameters, including copy number alterations, sample purity, and SNV clonality.

 

 

□ SCPRO-VI: Explainable graph learning for multimodal single-cell data integration 

 

https://link.springer.com/article/10.1186/s12859-026-06413-3

 

The Single-Cell PROteomics Vertical Integration (SCPRO-VI) method is a similarity graph fusion approach that incorporates a multi-view variational graph auto-encoder (VGAE) for embedding modalities into a latent space.

 

 

□ NYX: Format-aware, learned compression across omics file types

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712193v1

 

NYX, a format-aware compression system for FASTA, FASTQ, VCF, WIG, H5AD, and BED files. The reconstructed output is byte-for-byte identical to the original. Less storage, faster I/O, zero data loss.

 

 

□ OmicClaw: executable and reproducible natural-language multi-omics analysis over the unified OmicVerse ecosystem

 

https://www.biorxiv.org/content/10.64898/2026.03.13.711464v1

 

OmicClaw, an executable natural-language framework for multi-omics analysis built on the unified OmicVerse ecosystem and the J.A.R.V.I.S. runtime.

 

OmicVerse organizes upstream processing, preprocessing, single-cell, spatial, bulk-transcriptomic and foundation-model workflows into a shared AnnData-centered interface spanning over 100 methods.

 

 

 □ SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712056v1

 

SpatialFusion fuses molecular and morphological information to generate biologically grounded representations of spatial niches. SpatialFusion jointly captures cellular composition, morphological context, and functional signaling.

 

SpatialFusion incorporates key strengths of both spatial foundation models and specialist niche-discovery approaches: it learns neighborhood-level rather than single-cell embeddings, integrates H&E and transcriptomic modalities, and explicitly encodes pathway activation patterns.

 

 

□ SLAB: A Sweep Line Algorithm in PBWT for Finding Haplotype Block Cores

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712201v1

 

SLAB is a set of algorithms for characterizing structural patterns within the block overlap graph. SLAB identifies all width-maximal haplotype blocks. Using additional thresholds to define block overlaps, it constructs the block overlap graph, where nodes represent haplotype blocks and edges indicate pairwise overlaps that satisfy these criteria.

 

 

□ New Space-Time Tradeoffs for Subset Rank and k-mer Lookup

 

https://www.biorxiv.org/content/10.64898/2026.03.16.712042v1

 

Faster subset rank data structures using less than 3 bits per k-mer are designed. Experiments show that this translates to new Pareto-optimal SBWT-based k-mer lookup structures at the low-memory end of the space-time spectrum.

 

 □ HKS: Hierarchical genomic feature annotation with variable-length queries

 

https://www.biorxiv.org/content/10.64898/2026.03.15.711907v1

 

HKS, a data structure for exact hierarchical variable-length k-mer annotation. Building on the Spectral Burrows-Wheeler Transform (SBWT), a single HKS index is constructed for a specified maximum query length, and supports queries at any length.

 

HKS enables exact k-mer-based sequence annotation with respect to a user-defined category hierarchy, which may represent a taxonomy, a set of chromosomes, repeat families, or any other hierarchical organization of genomic labels.

 

 

 □ Integration of large, complex single-cell datasets with Harmony2 

 

https://www.biorxiv.org/content/10.64898/2026.03.16.711825v1

 

Harmony2 incorporates optimized data structures, including a hybrid sparse-dense matrix backend and closed-form inversion for arrowhead-structured regression problems, enabling linear scaling in both cells and batches.
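
Why an arrowhead structure admits a closed-form solve can be shown with a small sketch. It assumes a symmetric arrowhead system with one dense first row and column and a diagonal remainder, solved through the Schur complement in linear time. The layout and names are assumptions for illustration; the exact system Harmony2 solves may be arranged differently.

```python
import numpy as np

def solve_arrowhead(alpha, b, d, y):
    """Solve A x = y for A = [[alpha, b^T], [b, diag(d)]] in O(n).

    alpha: scalar top-left entry
    b:     (n,) dense border (first row and first column)
    d:     (n,) non-zero diagonal of the remaining block
    y:     (n + 1,) right-hand side
    """
    b, d, y = (np.asarray(v, dtype=float) for v in (b, d, y))
    y0, y1 = y[0], y[1:]
    schur = alpha - np.sum(b * b / d)          # Schur complement of diag(d)
    x0 = (y0 - np.sum(b * y1 / d)) / schur
    x1 = (y1 - b * x0) / d
    return np.concatenate([[x0], x1])

# Check against a dense solver on toy values (hypothetical data).
rng = np.random.default_rng(0)
n = 5
alpha, b, d = 50.0, rng.normal(size=n), rng.uniform(1.0, 2.0, size=n)
y = rng.normal(size=n + 1)

dense = np.zeros((n + 1, n + 1))
dense[0, 0], dense[0, 1:], dense[1:, 0] = alpha, b, b
dense[1:, 1:] = np.diag(d)

x_fast = solve_arrowhead(alpha, b, d, y)
x_dense = np.linalg.solve(dense, y)
print(np.allclose(x_fast, x_dense))
```

The dense solve costs cubic time in the number of rows, while the arrowhead version touches each entry once, which is consistent with the linear scaling described above.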

 

Harmony2 introduces automated batch pruning and dynamic parameter tuning that reduce the influence of outliers or non-overlapping populations.

 

Using the batch-corrected Harmony2 embeddings computed on all cells, a marked increase in single-cell plate entropy was observed across all cell lines and across plates, indicating that the original dataset exhibited moderate plate-driven batch effects mitigated by Harmony2.
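
One common way to quantify this kind of mixing is the Shannon entropy of plate labels among each cell's nearest neighbours in the embedding: zero when all neighbours come from one plate, higher when plates are interleaved. The sketch below uses that formulation with a brute-force neighbour search. The exact definition of plate entropy used in the paper is not given here, so treat the function and the toy coordinates as assumptions.

```python
import math
from collections import Counter

def neighborhood_entropy(points, labels, k):
    """Shannon entropy (bits) of labels among each point's k nearest neighbours."""
    out = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        counts = Counter(labels[j] for j in order[:k])
        out.append(-sum((c / k) * math.log2(c / k) for c in counts.values()))
    return out

plates = ["A", "A", "A", "A", "B", "B", "B", "B"]
# Toy "uncorrected" embedding: each plate forms its own cluster
before = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
# Toy "corrected" embedding: both plates are present in each cluster
after = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (1, 1), (11, 10), (11, 11)]

entropy_before = neighborhood_entropy(before, plates, k=3)
entropy_after = neighborhood_entropy(after, plates, k=3)
print(sum(entropy_before) / 8, sum(entropy_after) / 8)
```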

 

 

□ SNMF: Ultrafast, Spatially-Aware Deconvolution for Spatial Transcriptomics

 

https://www.biorxiv.org/content/10.64898/2026.03.17.712043v1

 

SNMF (Spatial Non-negative Matrix Factorization) extends the standard NMF framework with a spatial mixing matrix that models neighborhood influences, guiding the factorization toward spatially coherent solutions.

 

 

□ scZiva: imputation method for single-cell RNA-seq data with zero-inflated variational autoencoder

 

https://link.springer.com/article/10.1186/s12859-026-06422-2

 

scZiva is a structured probabilistic imputation framework built on a VAE for scRNA-seq data. The framework is implemented using a Zero-Inflated Negative Binomial (ZINB) likelihood with a convolution-enhanced encoder architecture.

 

scZiva adopts a probability-guided selective imputation mechanism to recover likely technical dropouts while preserving biologically meaningful zeros.
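
A probability-guided rule of this kind can be sketched from the ZINB likelihood alone: given an observed zero, Bayes' rule gives the probability that it came from the zero-inflation component rather than from the negative binomial itself. The sketch assumes per-entry parameters `pi` (zero-inflation probability), `mu` (NB mean) and `theta` (NB dispersion), which in a VAE would come from the decoder. The 0.5 threshold and the choice to impute with `mu` are illustrative assumptions, not scZiva's documented rule.

```python
def nb_zero_prob(mu, theta):
    """P(count = 0) under a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta

def dropout_posterior(pi, mu, theta):
    """P(zero came from the zero-inflation component | observed zero)."""
    return pi / (pi + (1.0 - pi) * nb_zero_prob(mu, theta))

def selective_impute(counts, pi, mu, theta, threshold=0.5):
    """Replace only zeros that are likely dropouts; keep everything else."""
    imputed = []
    for x, p, m, t in zip(counts, pi, mu, theta):
        if x == 0 and dropout_posterior(p, m, t) > threshold:
            imputed.append(m)   # likely technical dropout: fill with NB mean
        else:
            imputed.append(x)   # observed count or plausible biological zero
    return imputed

counts = [0, 0, 5]
pi = [0.3, 0.3, 0.3]
mu = [8.0, 0.1, 5.0]
theta = [2.0, 2.0, 2.0]
result = selective_impute(counts, pi, mu, theta)
print(result)
```

With these toy values the first zero is imputed because a gene with mean 8 rarely yields a true zero, while the second zero is kept because a mean of 0.1 makes a biological zero plausible.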

□ Noémi Büchi / “Exuvie”

 

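A sonic rendering of the "absence and aftermath" of matter by a cutting-edge Swiss IDM/sound artist. With repetitions of fine-grained, brutal rhythms and a metallic synth orchestra, she carves out the boundary between music and non-music as it ebbs and flows along the time axis.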

 

«I am interested in the quiet resonance of what is no longer, yet still vibrates.» — Noémi Büchi

 

She holds a master's degree in electroacoustic composition from the Zurich University of the Arts and explores new musical structures in a highly abstract context.

 

 

□ “After the Fold”

 

 

□ “dislocated bodies (feat. Ana Chkheidze)”

 

 

"Her music is defined by a delicate synthesis of textural rhythms and electroacoustic-orchestral abstraction. She contrasts rhythmic physicality with disruption and playfully emphasizes irregularities, creating an expansive listening experience marked by detail and elevation."

 

Cat.No.: OUS058

Release Date: 2026-03-31

Written and produced by Noémi Büchi 

Mixing and mastering by Manuel Oberholzer, Suoni Speziali

Album cover by Brigitte Fässler

Graphic design by Lydia Perrot

□ Alice Sara Ott / “Jóhann Jóhannsson: Piano Works”

 

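Every so often, a composer I deeply revere passes away.

The Icelandic contemporary composer Jóhann Jóhannsson is one of them for me.

Alice Sara Ott visits his homeland and plays a piano tinged with sorrow.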

 

 

□ “The Sun’s Gone Dim and the Sky’s Turned Black”

 

 

 

□ “Flugeldar II”

 

 

Deutsche Grammophon GmbH, Berlin

 

Released on: 2026-03-06

Performed by Alice Sara Ott

Mastering Engineer: Bergur Þórisson

Composer: Jóhann Jóhannsson

Photographer: Jónatan Gretarsson

Art director: Anders Ladegaard

Liner notes written by Wyndham Wallace

 

□ Hans Zimmer・Richard Harvey / “Poisoned Chalice”

 

https://youtu.be/97GEFrrpTtQ

 

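Until the early 2000s I regarded Hans Zimmer as a genius at skillfully quoting classical and avant-garde music, but it was this piece, “Poisoned Chalice,” that raised my estimation of him to that of a "true artist." Fragments of melody flowing from darkness to darkness, and a finale that feels sacred and even cosmic in scale.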

 

Composer, Producer: Hans Zimmer

Studio Personnel, Recording Engineer: Geoff Foster

Mixer, Studio Personnel: Alan Meyerson

Studio Personnel, Mix Engineer: Al Clay

Editor: Simon Charger

Associated Performer, Orchestra Leader: Gavyn Wright

Soprano, Associated Performer: Hila Plitmann

 

 

 

 

□ Richard Harvey / “The Mirror of Time” (CHORAL SOUNDSCAPES)

 

https://youtu.be/ZnpNq0Q7Ciw

 

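Richard Harvey is the co-composer of ”Da Vinci Code”, one of Hans Zimmer's masterpieces. He comes more from the New Age side, and he appears to have been mainly in charge of the choral parts. A dark, mystical choral piece.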

□ “Hoppers” (Japanese title: 私がビーバーになる時, "When I Become a Beaver")

 

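A masterpiece. The protagonist's runaway "righteousness" brings an unpredictable catastrophe upon both the natural world and human society, two realms that ought to be at odds. From the second half on it gets outrageously chaotic and edgy, yet it never falls apart and sticks the landing. What do you have to eat every day to come up with a chase scene that insane...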

 

PIXAR (2026)

Director: Daniel Chong

Written by Jesse Andrews

Produced by Nicole Paradis Grindle

Composed by Mark Mothersbaugh

Cinematography by Jeremy Lasky / Ian Megibben

 

 

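It also feels as if this film got in first with material that Avatar wanted to do down the line (material I was expecting from Avatar), so I find myself worrying, unprompted, that Cameron must be tearing his hair out right about now.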

 

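What I find interesting in reading everyone's reactions is that some people praise the part where "the protagonist's ideals trigger unexpected repercussions and a painful lesson," while others criticize the film, saying "the protagonist (the filmmakers) pushed their ideals on me and I couldn't get into it." Somehow the same film is producing diametrically opposite readings.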

 

 

 

 

□ Mark Mothersbaugh / “Grandma Tanaka”