lens, align.

Long is the time, but the true comes to pass.

 

『MoN Takanawa: The Museum of Narratives』

 

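The monumental museum that our own Kengo Kuma built at Takanawa Gateway. This is exactly the place where I wish he had used real solid wood 😆. The walkway from THE LINKPILLAR felt genuinely futuristic, and it was a pleasure to walk it in the cool May breeze.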

 

 

□ RAYE / “Click Clack Symphony (feat. Hans Zimmer)”

 

 

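A collaboration between RAYE, the new diva of R&B/neo-soul, and Hans Zimmer, the master of film music. You would never guess it from the vintage-pop opening, but the track unfolds into a grand symphony reminiscent of 『Interstellar』.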

 

Released on: 2026-03-20

Lyrics & melodies - RAYE & Hans Zimmer

Composed - RAYE, Mike Sabath, Hans Zimmer, Hendric Buenck, Russell Emanuel & Billie Ray Fingers

Produced - RAYE, Mike Sabath, Hans Zimmer, Hendric Buenck & Russell Emanuel.

Recorded with Nashville Music Scoring Orchestra

Arranged by Hendric Buenck & Russell Emanuel for Bleeding Fingers Music.

 

 

□ RAYE / "Life Boat."

『The Sheep Detectives (ひつじ探偵団)』

 

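I expected a classic whodunit, but as a tour through the hell of human society seen from a sheep's point of view, it is also close to Bresson's 『Au Hasard Balthazar』. Among a flock that could not bear the pain of loss and chose to forget, one keeps the memories close and pursues the truth. The final scene is simply unfair. Tears guaranteed 🐏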

 

 

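Watched it at a BESTIA screening at Grand Cinema Sunshine. It was fun from start to finish and a euphoric couple of hours, and it shares the theme of "names" with Project Hail Mary, but honestly, right up until the end credits I had written it off as "well, about what I expected." Then came that final shot. My tear ducts were blown apart 😭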

 

 

 

□ Christophe Beck / “Every Sheep Should Have a Name”

 

2026 Amazon MGM Studios

Directed by Kyle Balda

Screenplay by Craig Mazin

Based on the novel by Leonie Swann

 

Composed by Christophe Beck

Cinematography by George Steel

Production Design by Suzie Davies

□ 1000 Rabbits / “Are we friends yet?”

 

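A folk-punk band I am personally watching very closely. They come out of the prestigious Trinity conservatoire in the UK. Pastoral yet unsettling violin and synth, intricate drums. Vocals that ring out gently and, at times, wildly. I want to soak in songs like this at a suburban live venue and then walk home along a quiet night road.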

 

□ "Rubik's Cube"

 

 

2026 Young

River Fernandez - vocal

Laura Hussey - violin

Luke Brueck Seeley - drums

Olivia Hughes - synthesiser

Paolo Guglielminotti - guitar

 

 

□ Klur / “Laniakea (INNERVERSE Remix)”

□ 『The Life of Chuck』(サンキュー、チャック) 

 

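A screen adaptation of Stephen King's SF short story, itself a reworking of Walt Whitman's poem 『Song of Myself』. It also somehow calls Blake's poetry to mind. The birth and end of the universe are equivalent to the time that each and every life lives through. 〝A world in a grain of sand, eternity in an hour〟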

 

 

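Even more moving than I had heard. Everyone is entitled to think of themselves as a special being, worthy of having the people they love live on forever. Thorns and pain certainly exist along the way. The dancing of Benjamin Pajak, who plays the young Chuck, is breathtaking!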

 

 

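『The Life of Chuck』: if it seems hard to approach despite its acclaim, that is down to its far-from-straightforward, very Stephen King structure, in which the timeline runs backward from Act Three to Act One. The writing finds miracles in the structure of the miniature world he created and in its "very way of being." The film could not have been achieved without the uncanny touch of director Flanagan, who is said to have been a devoted King reader since boyhood.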

 

 

NEON (2025)

Directed by Mike Flanagan

Based on the Story by Stephen King

Produced by Trevor Macy

Production Design by Steve Arnold

Cinematography by Eben Bolter

Composed by The Newton Brothers

I visited IMA today. The Grand Gallery features a brutalist spatial design that boldly incorporates natural light.

 

 

□ 『The Devil Wears PRADA 2 (プラダを着た悪魔2)』

 

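A reunion at the crossroads of aesthetics and conviction. Shot on lavish location in Milan, it depicts the last pride of values and traditions gasping under the strain of the times. While it takes on contemporary themes such as the publishing slump, compliance, diversity, and AI, it finds bright hope in the trust between people. Andy wearing Margiela as everyday clothes feels very current.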

 

 

Lady Gaga, Doechii / “RUNWAY”

 

 

 

Theodore Shapiro / “Satan Meets Satin”

 

Wendy Finerman Productions / 20th Century Studios (2026)

 

Director: David Frankel

Writers: Aline Brosh McKenna / Lauren Weisberger

Production Design: Jess Gonchor

Art Director: Christopher J. Morris

Costume Designer: Molly Rogers

Cinematography: Florian Ballhaus

Composer: Theodore Shapiro

 

 

□ 『The Devil Wears Prada』(2006)

 

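A classic that defined the workplace-movie template. These days I find myself siding with Miranda. For someone reigning at the summit, how dazzling Andy's choice to step off in the very midst of 〝Runway〟 must have been. The film is packed tight with no blank space, yet at the end it thrusts a brand-new blank notebook at you. It is about time we checked our answers.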

 

(Image by HHMI)

 

 

□ SCIGMA: Scalable, Generalizable, and Uncertainty-Aware Integration of Spatial Multi-Omics Across Diverse Modalities and Platforms

 

https://www.biorxiv.org/content/10.64898/2026.04.19.718223v1

 

SCIGMA (uncertainty-aware Spatially-informed Contrastive learning-based Integration with Graph neural networks for spatial Multi-modal Analysis) employs a novel contrastive learning framework that preserves modality-specific features while learning a cohesive joint representation.

 

SCIGMA leverages a multi-view graph capturing both spatial and feature-based similarities to dissect complex tissue organization, ensuring distinct molecular signals contribute meaningfully to biological interpretation without artificially enforcing similarity.

 

□ CELLULAR LEARNING: A mechanism for adaptive genome regulation in cancer  

 

https://www.nature.com/articles/s41586-026-10269-1

 

A theoretical framework for how such cellular adaptation in cancer drug resistance could be ‘learned’ by the AP-1 family of transcription factors.

 

AP-1-mediated gene regulation is a conserved mechanism across diverse biological systems through which cells fine-tune gene expression and establish lasting transcriptional responses to environmental stimuli.

 

A central challenge now lies in deciphering the regulatory code that orchestrates AP-1 cooperation with other transcription factors and the epigenetic machinery to produce cell-type and stimulus-specific responses.

 

According to the stress-induced evolutionary innovation model, ancestral stress responses - initially transient mechanisms for survival - were potentially stabilized by natural selection, giving rise to new cell types.

 

 

□ scDIVIDE: Inferring division-associated stochasticity from time-series single-cell transcriptomes

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718485v1

 

scDIVIDE (single-cell DIVision-linked Inference of stochastic Dynamics in Expression states), a neural stochastic differential equation framework that infers continuous cell dynamics from temporal scRNA-seq data while accounting for division-associated partitioning noise.

 

scDIVIDE integrates a theoretical formulation of partitioning noise, based on birth–death–mutation processes, into neural stochastic differential equations-based inference of population dynamics. It provides a biologically informed constraint for continuous dynamics inference.

 

 

□ OrionGeno: Advancing ab initio genome annotation

 

https://www.biorxiv.org/content/10.64898/2026.04.26.720859v1

 

OrionGeno, a multispecies phylogeny-aware deep learning framework for end-to-end eukaryotic genome annotation. It resolves complex gene structure variations across divergent lineages, jointly predicting exon-intron architectures, UTRs, and repeats directly from genomic sequences.

 

OrionGeno employs a hybrid backbone combining bidirectional mamba blocks (BiMamba) with a U-Net-like encoder-decoder. This design effectively captures both local and long-range genomic dependencies while maintaining subquadratic computational complexity.

 

 

□ Fourier Kolmogorov-Arnold Network integrated into BioBERT-based model for Biomedical Named Entity Recognition

 

https://www.nature.com/articles/s41746-026-02677-4

 

FRKAN-BioNER integrates BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) with the Fourier Kolmogorov-Arnold Network, namely FourierKAN, as the classifier component. FRKAN-BioNER is designed to replace conventional MLP classifiers with KAN-based alternatives for biomedical sequence labeling.

 

 

□ scPert: A Multi-modal LLM-Knowledge Fusion Framework for Predicting Single-cell Genetic Perturbation Effects 

 

https://www.biorxiv.org/content/10.64898/2026.04.24.720560v1

 

scPert, a multi-modal embedding fusion strategy framework based on Transformer architecture that integrates large language model embeddings with structured biological knowledge to predict single-cell transcriptomic responses to genetic perturbations.

 

scPert captures the semantic similarity between related perturbations. scPert predicts post-perturbation gene expression changes across individual cells, enabling computational prediction of cellular responses to genetic interventions.

 

 

□ Learning biophysical models of gene regulation with probability flow matching 

 

https://arxiv.org/abs/2604.25062

 

Probability Flow Matching (PFM), a scalable and simulation-free framework for inferring mechanistic models of gene expression dynamics from multi-marginal single-cell omics data.

 

PFM enables direct regression of gene regulatory dynamics without high-dimensional ODE solves, all while accommodating stochasticity of arbitrary nature. It ensures numerical stability by parameterizing conditional Gaussian paths using spectrally regularized Chebyshev interpolants.

 

 

□ RVQ-Alpha: Bridging Single-Cell Transcriptomics and Large Language Models via Discrete Tokenization and Verifiable Reinforcement Learning

 

https://www.biorxiv.org/content/10.64898/2026.04.20.719773v1

 

RVQ-Alpha employs a Residual Vector Quantization (RVQ) tokenizer that compresses each cell into a fixed 10-token sequence via eight residual codebooks embedded directly in the LLM vocabulary, enabling bidirectional unification of cell interpretation and generation.
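Residual vector quantization itself is easy to sketch: each codebook quantizes whatever the previous one left over, so a vector becomes one index per stage. The toy below only illustrates that general idea. The 2-D codebooks and the names `rvq_encode` / `rvq_decode` are made up for this note and are not RVQ-Alpha's tokenizer or API.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], vec))


def rvq_encode(vec, codebooks):
    """Encode vec as one index per codebook, quantizing the residual at each stage."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next codebook sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct the vector as the sum of the selected codebook entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]

codes = rvq_encode([4.9, 4.1], CODEBOOKS)
print(codes, rvq_decode(codes, CODEBOOKS))
```

Each index in `codes` could then be mapped to its own vocabulary token, which is the step that lets a language model read and emit such sequences.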

 

scCoT-Synth, a teacher–student engine that grounds newly added biological tokens through evidence-before-conclusion reasoning, using the language modeling objective as the cross-modal alignment signal without a separate projection network.

 

A Fact-Aware RLVR system combines an ontology-grounded answer judge with saliency-weighted verification of biological claims against actual expression data, under dynamic gating that conditions hallucination suppression on task competence.

 

The semantic confusion structure of predictions is examined with the embedding-space organization, showing off-diagonal mass concentrates between adjacent cell types rather than across unrelated lineages, confirming RVQ tokenization preserves local gene expression structure.

 

 

□ GANGE: Achieving Sequencing Without Sequencing With Diffusion Guided Generative Genomic Transformer

 

https://www.biorxiv.org/content/10.64898/2026.04.15.718133v1

 

GANGE (Generative Additive Nucleotides based Genome Evolver), an innovative generative deep-learning framework for read error correction and read extension that can work with coverage as low as 4x for genome assembly.

 

GANGE seamlessly integrates the Denoising Diffusion Probabilistic Model with a transformer-based encoder-decoder framework, combining the stochastic generative power of diffusion models with the contextual learning and long-range dependency detection capabilities of transformers.

 

 

□ MIMIC: A Generative Multimodal Foundation Model for Biomolecules 

 

https://arxiv.org/abs/2604.24506

 

MIMIC, a generative foundation model with a novel split-track architecture. MIMIC learns to translate fluently between the languages of the genome, the transcriptome, and the proteome while interfacing with context-aware text representations that emphasize cellular specificity.

 

LORE, a curated dataset released alongside MIMIC. LORE aligns heterogeneous, multi-scale biological data into unified per-entity snapshots, providing the critical cross-modal supervision.

 

 

□ GenNA: Conditional generation of nucleotide sequences guided by natural-language annotations

 

https://www.biorxiv.org/content/10.64898/2026.04.22.720063v1

 

GenNA is a generative nucleotide foundation model guided by natural language annotations. It adopts a decoder-only Transformer architecture, configured with reference to Qwen3, and uses a custom cross-modal BPE vocabulary, resulting in approximately 3.6 billion parameters.

 

GenNA is pretrained on a unified nucleotide-text-structure-label corpus spanning 2,221 eukaryotic species and approximately 416 billion characters.

 

GenNA employs a cross-modal BPE tokenizer, mapping frequent functional phrases, tag fragments, and nucleotide patterns into a discrete token space. It enables joint modeling of sequence patterns, functional semantics, and structural boundaries within a unified representation space.

 

 

□ scConcept enables concept-level exploration of single-cell transcriptomic data

 

https://www.biorxiv.org/content/10.64898/2026.04.21.719959v1

 

scConcept, a framework that introduces concept-level representation by transforming gene-level topic representations into structured, human-interpretable biological concepts.

 

scConcept integrates neural topic modeling with LLMs. It distills fragmented gene programs into semantically coherent concepts defined by a biological label, description, and gene set, and quantitatively maps them back to individual cells.

 

 

 

□ CHORD: a framework for cross-species single-cell integration across gene, cell and cell-type levels

 

https://www.biorxiv.org/content/10.64898/2026.04.19.719426v1

 

CHORD (Cross-species Hierarchical Orthologous Relationship Discovery) learns gene embeddings for all input genes by combining one-to-one ortholog anchors with cell-type-averaged expression profiles, and links cells to their types to model explicit cell-type embeddings.

 

 

□ Benchmarking single-cell foundation models for real-world RNA-seq data integration

 

https://www.biorxiv.org/content/10.64898/2026.04.17.719314v1

 

Benchmarking of leading single-cell foundation models (scGPT; scGPT_CP, a continually pretrained checkpoint of scGPT; scFoundation; scMulan; CellFM) against established baseline methods (scVI; Harmony) for data integration using over 1.5 million cells from clinical samples.

 

In practice, scVI remains the safest default across diverse datasets and evaluation criteria. For embedding reuse, scFoundation offers zero-shot stability and scGPT_CP provides stronger integration, while performance on downstream tasks should also be verified.

 

 

□ scTrends: automated classification and strength quantification of gene expression trends along pseudotime in single-cell RNA-seq 

 

https://www.biorxiv.org/content/10.64898/2026.04.21.719599v1

 

scTrends, an automated and interpretable framework for gene-level trend classification and strength quantification along a given pseudotime trajectory. It operates downstream of pseudotime methods to characterize expression dynamics once a temporal ordering is available.

 

scTrends models pseudotime-binned gene expression profiles and assigns genes to predefined temporal trend categories through a hierarchical, rule-based procedure combined with empirical significance testing, parameter selection, and quantitative assessment of trend strength.
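To make the "bin along pseudotime, then apply rules" idea concrete, here is a minimal sketch. The two categories, the `min_change` threshold, and the function names are assumptions for illustration. scTrends' actual categories, hierarchy, and significance testing are not reproduced here.

```python
def bin_means(pseudotime, expression, n_bins):
    """Average expression within equal-width pseudotime bins (empty bins dropped)."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, x in zip(pseudotime, expression):
        idx = min(int((t - lo) / width), n_bins - 1)
        sums[idx] += x
        counts[idx] += 1
    return [s / c for s, c in zip(sums, counts) if c]


def classify_trend(means, min_change=0.5):
    """Toy rule set: monotone rise, monotone fall, or anything else."""
    diffs = [b - a for a, b in zip(means, means[1:])]
    change = means[-1] - means[0]
    if all(d >= 0 for d in diffs) and change >= min_change:
        return "increasing"
    if all(d <= 0 for d in diffs) and -change >= min_change:
        return "decreasing"
    return "other"


pseudotime = [0, 1, 2, 3, 4, 5]
gene = [0, 0, 1, 1, 2, 2]
print(classify_trend(bin_means(pseudotime, gene, n_bins=3)))
```

Here `change` is a crude stand-in for a trend-strength score. An empirical test (for example, permuting the pseudotime order) would be the natural next step before trusting a label.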

 

 

□ GeneBench: Assessing AI Agents for Multi-Stage Inference Problems in Genomics and Quantitative Biology

 

https://www.biorxiv.org/content/10.64898/2026.04.22.720113v1

 

GeneBench, a novel benchmark spanning industry- and academia-relevant subdomains of genomics as well as adjacent 'omics and quantitative biology topics.

 

 

 

□ HMCVelo: A Deterministic Model for Hydroxymethylation Velocity in Single Cells

 

https://www.biorxiv.org/content/10.64898/2026.04.20.719607v1

 

HMCVelo (hydroxymethylation velocity), the first velocity framework for DNA methylation dynamics. HMCVelo is a deterministic ordinary differential equation (ODE) model that computes the time derivative of hydroxymethylation state for individual cells / genes.

 

HMCVelo exploits a recent advance in single-cell epigenomics, Joint-snhmC-seq, enabling subtraction-free quantification of 5hmC / 5mC at single-cell resolution and resolving temporal methylation dynamics from static snapshots.

 

 

□ SPLICE: Long-read sequencing with isoform-level resolution, for bulk tissue or single cells. 

 

https://hox.bio/splice

 

Nearly all human genes produce multiple transcript isoforms. SPLICE explores cell populations and expression patterns with integrated tools for read processing, V(D)J resolution, and interactive exploration.

 

SPLICE enables responsive, fluid navigation across full-length alignments and supports 5′ long-read sequencing with untargeted polyA transcript capture for both bulk tissue and single-cell samples.

 

 

□ SVScope improves somatic structural variations detection via graph-genome optimization

 

https://link.springer.com/article/10.1186/s13059-026-04076-0

 

SVScope leverages full-length sequence information and local graph genome optimization. SVScope utilizes read alignment breakpoint information from the whole-genome scale to cluster and identify split-alignment somatic SVs and candidate inner-alignment somatic SVs. 

 

SVScope re-analyzes the alignment relationships among all full-length sequences spanning the candidate somatic SV interval using a partial order alignment (POA) graph with multi-sequence alignment representation and accurately clusters reads with a sequence mixture model.

 

 

□ Spark: Sparse Hierarchical Energy Minimization for Scalable Prediction of RNA Pseudoknots 

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag194/8660451

 

The time and space complexities of Spark, which parallel those of previous sparse pseudoknot-free energy minimization algorithms, arise from general sparsification strategies combined with careful rewriting, adaptation, and optimization.

 

The time complexity arises from iterating over O(n^2) pairs and minimizing over candidates, whose total per outer iteration is Z. Interior loop sizes are limited, restricting minimizations over all possible interior loops.

 

 

□ Signal, Bounds, and Baselines: Principles for Rigorous Evaluation of High-Dimensional Biological Perturbation Prediction 

 

https://www.biorxiv.org/content/10.64898/2026.04.20.719650v1

 

Signal: Verify metric sensitivity using meta-metrics, such as the BDS and PDS, before benchmarking. When sensitivity is insufficient, apply DEG-based weighting, top-n DEG filtering, or another signal exposure technique and report the effective gene number.

 

Bounds: Report performance relative to empirically grounded reference points, not only in absolute terms. For instance, use perturbation-wise calibrated scores relative to the technical duplicate and uninformative control bounds.

 

Baselines: Compare models against a hierarchy of unlearned, learned, and oracle baselines to establish what simple approaches already capture.
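The "Bounds" principle can be read as a rescaling. The sketch below places a raw score between the uninformative-control score (mapped to 0) and the technical-duplicate score (mapped to 1). The linear form and the name `calibrated_score` are assumptions for illustration, not the paper's exact definition, and the sketch assumes a higher-is-better metric.

```python
def calibrated_score(model, control, duplicate):
    """Rescale a raw metric so control -> 0.0 and technical duplicate -> 1.0.

    Returns None when the two bounds coincide, because the metric then carries
    no usable signal for that perturbation.
    """
    span = duplicate - control
    if span == 0:
        return None
    return (model - control) / span


# Hypothetical per-perturbation scores: (model, control bound, duplicate bound).
scores = {
    "pert_A": (0.50, 0.25, 0.75),
    "pert_B": (0.25, 0.25, 0.75),
    "pert_C": (0.40, 0.40, 0.40),
}
for name, (m, c, d) in scores.items():
    print(name, calibrated_score(m, c, d))
```

A value near 0 means the model does no better than the uninformative control on that perturbation, however respectable the raw number looks.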

 

 

□ Recursive Repeat Extender (RRE): A recursive approach to automatically extend repeat element models

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718546v1

 

Recursive Repeat Extender (RRE) employs a novel recursive extension approach and profile HMMs (hidden Markov models) to extend repeat sequences in a genome, thereby overcoming the challenges of extending highly degenerate and fragmented repeats.

 

RRE uses a dynamic search approach in which the repeat model extended in one round is used to query the genome again in the next. This recursive procedure enables the progressive merging of repeat regions linked only through transitive adjacencies.
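The recursive part can be shown with a toy that works on intervals instead of sequences. Each round "queries" with the current model, absorbs whatever touches it, and the enlarged model is used for the next round. Everything here (interval hits, `max_gap`, the function names) is a hypothetical simplification, and the real tool searches with profile HMMs.

```python
def extend_once(model, hits, max_gap):
    """Absorb every hit that overlaps or abuts the model as it stands this round."""
    start, end = model
    touching = [h for h in hits if h[0] <= end + max_gap and h[1] >= start - max_gap]
    new_start = min([start] + [h[0] for h in touching])
    new_end = max([end] + [h[1] for h in touching])
    return (new_start, new_end)


def recursive_extend(seed, hits, max_gap=0, max_rounds=100):
    """Re-query with the extended model until it stops growing."""
    model = seed
    for _ in range(max_rounds):
        extended = extend_once(model, hits, max_gap)
        if extended == model:
            break
        model = extended
    return model


# (29, 45) does not touch the seed (10, 20); it is reached only via (18, 30).
print(recursive_extend((10, 20), [(18, 30), (29, 45), (100, 120)]))
```

The second hit is merged only in round two, which is the "linked only through transitive adjacencies" behaviour in miniature.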

 

 

□ DeSCOPE: Decoding Single-Cell Omics of Perturbation 

 

https://www.biorxiv.org/content/10.64898/2026.04.13.718147v1

 

DeSCOPE, a lightweight and efficient virtual cell model for accurate prediction of single-cell responses to genetic perturbations. DeSCOPE leverages gene embeddings derived from the protein language model ESM2.

 

DeSCOPE explicitly decouples the latent distributions of control and perturbed cells, enabling robust out-of-distribution generalization across two demanding scenarios: unseen genes and unseen cell types.

 

 

□ Hybrid Gated Fusion: A Multimodal Deep Learning Framework for Protein Function Annotation

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718564v1

 

Hybrid Gated Fusion, a multimodal framework for GO annotation that integrates intrinsic protein evidence (sequence and structure) with extrinsic functional context (text and interaction networks) under incomplete availability of all input features.

 

Hybrid Gated Fusion applies a learnable Bilinear Gated Early Fusion module to estimate how informative each available modality is and how well it agrees with the others, producing a fused representation that emphasizes complementary evidence.

 

Hybrid Gated Fusion employs Residual Late Fusion, in which modality-specific auxiliary predictions are combined with the same gating weights so that decision-level contributions remain aligned with feature-level evidence quality.

 

 

□ cellNexus: Quality control, annotation, aggregation and analytical layers for the Human Cell Atlas data

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718336v1

 

cellNexus, a comprehensive tool and resource that converts the Human Cell Atlas collection into analysis-ready data by linking quality control layers, metadata enrichment, expression normalisation, analysis and data aggregation.

 

cellNexus enables robust statistical modelling across studies, exemplified by a multi-tissue map of immune cell communication during ageing, which reveals macrophage-muscle axes as among the most depleted regenerative interactions with age.

 

All harmonised layers, including pseudobulk and cell-cell communication summaries, are accessible via a public web interface and with R and Python APIs.

 

cellNexus provides continuous integration with CELL×GENE releases and transforms large cell atlas corpora into an accessible, reproducible, interoperable foundation for large-scale biological discovery and the next generation of single-cell foundation models.

 

 

□ HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data

 

https://arxiv.org/abs/2506.11152

 

HEIST encodes both gene co-expression networks and spatial cell graphs to support downstream tasks such as cell clustering, gene imputation, and clinical outcome prediction. The HEIST decoder can be fine-tuned while the encoder remains frozen.

 

HEIST captures fine-grained gene co-expression within cells and long-range cellular interactions through novel cross-level message passing, producing biologically contextualized embeddings.

 

 

□ scDisent: disentangled representation learning with causal structure for multi-omic single-cell analysis 

 

https://www.biorxiv.org/content/10.64898/2026.04.12.717909v1

 

scDisent, a generative framework for disentangled representation learning that separates expression-associated variables (zexpr) from regulation-associated variables (zreg) and links them through a sparse directed mapping.

 

scDisent combines modality-specific encoding, variational disentanglement with total-correlation and orthogonality constraints, and a Gumbel-gated causal module protected by detach-based gradient isolation.

 

 

□ HySimODE: A hybrid stochastic-deterministic simulation framework for multiscale models of biological systems

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag185/

 

HySimODE uses a short deterministic pre-simulation and a machine-learning classifier to automatically assign each species to a stochastic or deterministic regime, and then combines a simple stochastic update rule with a stiff ODE solver in a single event-driven simulation loop.

 

HySimODE was trained and validated on a diverse dataset of biochemical ODE models spanning multiple dynamical regimes, enabling robust stochastic-deterministic partitioning beyond simple abundance thresholds.

 

 

□ Reconstructing cell-cell interaction network in single-cell spatial transcriptomics via directed heterogeneous graph autoencoder 

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag130/8658569

 

DualCellChat is a directed graph auto-encoder that endogenously models cell-cell interaction (CCI) directionality to reconstruct a complete interaction network. DualCellChat learns two feature spaces for cell sending and receiving signals in the encoding phase to clarify the dual role of a cell as a source and target.
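One common way to get directionality out of a graph autoencoder is to give every node a "sending" embedding and a "receiving" embedding and score an edge i → j from the pair. The sketch below shows that generic construction. The sigmoid-of-dot-product decoder and the toy embeddings are assumptions, not necessarily what DualCellChat uses.

```python
import math


def interaction_score(send, recv, source, target):
    """Score the directed edge source -> target from the two embedding spaces."""
    dot = sum(a * b for a, b in zip(send[source], recv[target]))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical embeddings: cell A is a strong sender, cell B a strong receiver.
send = {"A": [2.0, 0.0], "B": [0.0, 0.0]}
recv = {"A": [0.0, 0.0], "B": [1.0, 0.0]}

print(interaction_score(send, recv, "A", "B"))  # A -> B
print(interaction_score(send, recv, "B", "A"))  # B -> A
```

Because the two scores come from different embedding pairs, A → B and B → A need not agree, which a single shared embedding with a symmetric decoder cannot express.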

 

 

□ iClust: Interpretable Biological Sequence Clustering

 

https://www.biorxiv.org/content/10.64898/2026.04.13.718335v1

 

iClust, an interpretable clustering method that characterizes each cluster by a representative prototype and an adaptive radius. 

 

iClust estimates local radius from local neighborhoods and iteratively refines both the prototype and the radius of each cluster, so that cluster centers and boundaries are characterized jointly rather than by a representative sequence alone.

 

 

□ LingoDNABench: Canonical self-supervised pretraining paradigm constrains the capacity of genomic language models on regulatory decoding

 

https://www.biorxiv.org/content/10.64898/2026.04.13.715198v1

 

LingoDNABench, a comprehensive regulatory-oriented benchmark suite to evaluate whether gLMs can extract transferable sequence representations across the full regulatory hierarchy: the prediction of chromatin profiling, transcription regulation, post-transcription regulation, and gene expression.

 

 

 

□ π-MSNet: A billion-scale, AI-ready living proteomics data portal 

 

https://www.biorxiv.org/content/10.64898/2026.04.13.718149v1

 

π-MSNet provides an AI-ready data framework for efficient training and systematic benchmarking of multiple models across three representative tasks: MS/MS spectrum prediction, retention time prediction, and de novo peptide sequencing.

 

 

□ A deterministic computational kernel encoded in the human genome

 

https://www.biorxiv.org/content/10.64898/2026.04.12.718009v1

 

A computational kernel is defined by four properties: boot sequence, instruction set, process table, and dispatch network.

 

The human genome satisfies all four, validated by five null model tests, fifteen robustness analyses, and independent biological validation against gene essentiality and protein-protein interaction data.

 

The same pipeline rejects random sequences of equivalent length and composition. The genome does not merely resemble a computational kernel. It satisfies the definition of one.

 

 

□ BABAPPAlign: A Multiple Sequence Alignment Engine with a Learned Residue-Level Scoring Function

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag189/8656831

 

BABAPPAlign is an embedding-first progressive multiple sequence alignment engine for protein and coding nucleotide sequences. It integrates pretrained protein language model embeddings with a learned neural residue–residue scoring function within an affine-gap dynamic programming framework.

 

 

□ vcfilt: A Zero-Allocation Streaming Filter for High-Throughput VCF Processing

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718370v1

 

vcfilt, a streaming, batch-parallel VCF filter implemented in Go that restricts its scope to three high-frequency filter criteria (INFO/DP, INFO/AF, and QUAL) and applies them via a zero-allocation byte-scan parser.

 

vcfilt achieves 147,000 variants/second on a single thread, a 12.2x improvement over bcftools 1.18 on identical hardware under identical conditions.
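For a feel of what "streaming filter on QUAL, INFO/DP, and INFO/AF" means, here is a small Python sketch that processes one line at a time and never loads the file. It is not vcfilt: the real tool is written in Go with a zero-allocation byte scanner, while this version allocates freely, and the thresholds and function names are invented for the example.

```python
def info_value(info, key):
    """Float value of key in a VCF INFO string, or None if absent or not numeric."""
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if name == key and sep:
            try:
                # Multi-allelic records carry comma-separated values; use the first.
                return float(value.split(",")[0])
            except ValueError:
                return None
    return None


def passes(line, min_qual, min_dp, min_af):
    """True if a data line meets all three thresholds."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return False
    try:
        qual = float(cols[5])  # a missing QUAL is written as "." and fails here
    except ValueError:
        return False
    dp = info_value(cols[7], "DP")
    af = info_value(cols[7], "AF")
    return (qual >= min_qual
            and dp is not None and dp >= min_dp
            and af is not None and af >= min_af)


def filter_vcf(lines, min_qual=30.0, min_dp=10.0, min_af=0.01):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes(line, min_qual, min_dp, min_af):
            yield line
```

Since `filter_vcf` is a generator, it can be fed an open file handle and written straight back out, for example `sys.stdout.writelines(filter_vcf(sys.stdin))`.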

 

 

□ QCatch: A framework for quality control assessment and analysis of single-cell sequencing data

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag184/8654715

 

QCatch provides a robust, automated solution for quality control tailored specifically to the alevin-fry and simpleaf frameworks.

 

QCatch introduces two novel visualizations that leverage the splicing-aware quantification capabilities of alevin-fry in USA mode, which categorizes UMIs into Unspliced (U), Spliced (S), and Ambiguous (A) categories.

 

The first is a bar chart that displays the total number of UMIs in each splicing category, providing a snapshot of the transcript composition. The second is a histogram illustrating the distribution of the spliced ratio, calculated as (S + A) / (S + U + A), across all cells.
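The spliced ratio is simple enough to compute by hand. The sketch below applies (S + A) / (S + U + A) per cell and bins the results for a histogram. The per-cell counts are made up, and the helper names are not part of QCatch.

```python
def spliced_ratio(u, s, a):
    """(S + A) / (S + U + A) for one cell; None when the cell has no UMIs."""
    total = s + u + a
    if total == 0:
        return None
    return (s + a) / total


def histogram(values, n_bins=10):
    """Count values in [0, 1] into n_bins equal-width bins (1.0 goes in the last bin)."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical (U, S, A) UMI counts per cell.
cells = {"cell_1": (10, 80, 10), "cell_2": (50, 40, 10), "cell_3": (0, 0, 0)}

ratios = {name: spliced_ratio(u, s, a) for name, (u, s, a) in cells.items()}
valid = [r for r in ratios.values() if r is not None]
print(ratios)
print(histogram(valid, n_bins=4))
```

Summing the U, S, and A columns across all cells gives the three totals for the bar chart.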

 

 

□ fastVEP: A Fast, Comprehensive Variant Effect Predictor Written in Rust

 

https://www.biorxiv.org/content/10.64898/2026.04.14.718452v1

 

fastVEP, a complete reimplementation of the VEP variant consequence prediction engine in Rust. Rust is a systems programming language that guarantees memory safety without garbage collection, enabling both the performance of C/C++ and the safety of managed languages.

 

 

□ SpaNiche: spatial niche analysis to explore colocalization patterns and cellular interactions in spatial transcriptomics data

 

https://link.springer.com/article/10.1186/s13059-026-04069-z

 

SpaNiche leverages graph-regularized joint non-negative matrix factorization to integrate information from cell abundance and ligand-receptor expression, identifying colocalization patterns among cell types while providing insights into associated ligand-receptor interactions.

 

 

□ GenePT Revisited: Do Better Text Embeddings Make Better Gene Embeddings?

 

https://www.biorxiv.org/content/10.64898/2026.04.16.718976v1

 

Since GenePT's release, embedding models have improved rapidly, with many strong open and commercial encoders benchmarked on suites such as the Massive Text Embedding Benchmark (MTEB).

 

Across gene-gene interaction, gene classification, cell type classification, and perturbation prediction tasks, replacing the GenePT embedding backbone with stronger general-purpose text embedding models yields consistent improvements of 1-17%, depending on task and metric.

 

 

□ SpaFlow depicts the dynamics of ligand-receptor interaction in spatial transcriptomics data

 

https://www.biorxiv.org/content/10.64898/2026.04.17.719264v1

 

SpaFlow, a reaction-diffusion framework that models ligand diffusion, binding, dissociation, production and degradation to infer spatially resolved LRI activity and hotspots from ST data.

 

In SpaFlow, each spatial unit is represented as a node in a graph, and neighboring units are connected by edges, enabling tissue architecture to be captured without imposing a regular grid.
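A graph makes the diffusion term easy to write down: each node exchanges ligand with its neighbours in proportion to the concentration difference. The explicit-Euler step below sketches only diffusion, production, and degradation. Binding and dissociation are left out, and the parameter names and update rule are assumptions rather than SpaFlow's actual equations.

```python
def diffusion_step(conc, neighbors, rate, production, decay, dt):
    """One explicit Euler step of graph diffusion with production and decay.

    conc:       node -> ligand concentration
    neighbors:  node -> list of adjacent nodes (undirected graph)
    production: node -> production rate (missing nodes produce nothing)
    """
    new = {}
    for node, c in conc.items():
        # Net inflow from neighbours: the graph-Laplacian term.
        flux = sum(conc[n] - c for n in neighbors[node])
        new[node] = c + dt * (rate * flux + production.get(node, 0.0) - decay * c)
    return new


# Three spots in a row, with all ligand starting at one end.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
conc = {"a": 1.0, "b": 0.0, "c": 0.0}
print(diffusion_step(conc, neighbors, rate=1.0, production={}, decay=0.0, dt=0.25))
```

With no production or decay the total amount of ligand is conserved, which is a quick sanity check on the neighbour lists. Irregular spot layouts need nothing more than different neighbour lists.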

 

 

□ RICE: Robust causal gene network estimation for large-scale single-cell perturbation screens using reduced control function

 

https://www.biorxiv.org/content/10.64898/2026.04.20.719759v1

 

RICE, a scalable framework for causal gene network estimation that integrates a reduced control function to address latent confounding with a constrained generalized linear model accommodating both hard and soft interventions.

 

RICE enables efficient GPU-based optimization for large-scale data. RICE achieves higher accuracy and robustness than existing methods and remains stable under strong confounding and high multiplicity-of-infection (MOI) settings.

 

RICE produces a long tail of near-zero edge weights, which is removed by thresholding. The detected change point defines a threshold that effectively separates true causal signals from background noise.

 

 

□ PathPinpointR: Predicting the progression of sc-RNAseq samples through reference trajectories. 

 

https://www.biorxiv.org/content/10.64898/2026.04.21.715327v1

 

PathPinpointR (PPR) predicts the positions of scRNA-seq samples along reference biological trajectories, such as those created from large cell atlas projects.

 

PPR utilises sets of switching-gene events from reference trajectories as indicators of cellular progression. PPR uses the switching-gene events as checkpoints to rapidly predict a pseudotime for each cell within a query dataset.
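The checkpoint idea can be sketched as voting. Every switching gene that is already "on" (or, for a down-switching gene, already "off") says the cell sits after that gene's switch point, and the opposite state says before. The position with the most votes wins. The data layout, the tie-breaking by averaging, and the function name are assumptions, and PPR's own implementation may differ.

```python
def predict_position(cell_on, switches, n_positions):
    """Estimate a cell's position on a reference trajectory from binarized genes.

    cell_on:  gene -> True/False (binarized expression in the query cell)
    switches: gene -> (switch_index, "up" or "down") taken from the reference
    """
    votes = [0] * n_positions
    for gene, (switch_at, direction) in switches.items():
        on = cell_on.get(gene)
        if on is None:
            continue  # gene not measured in this cell
        after = on if direction == "up" else not on
        span = range(switch_at, n_positions) if after else range(0, switch_at)
        for pos in span:
            votes[pos] += 1
    best = max(votes)
    candidates = [pos for pos, v in enumerate(votes) if v == best]
    return sum(candidates) / len(candidates)


# Hypothetical reference: three switching genes on a 10-step trajectory.
switches = {"g1": (2, "up"), "g2": (6, "up"), "g3": (4, "down")}
cell = {"g1": True, "g2": False, "g3": True}
print(predict_position(cell, switches, n_positions=10))
```

In this example g1 places the cell at or after step 2, while g2 and g3 place it before steps 6 and 4, so the estimate lands between steps 2 and 3.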

 

 

□ scSketch: Interactive Sketch-based Trajectory Exploration and Pathway-Aware Analysis of Single-Cell Data

 

https://www.biorxiv.org/content/10.64898/2026.04.16.718997v1

 

scSketch, a tool that enables users to iteratively explore and test trajectory hypotheses in single-cell data while maintaining statistical validity and biological interpretability.

 

scSketch enables users to apply interactive directional sketching to draw trajectories across embeddings and probe continuous processes such as cellular differentiation and cell state transitions.

 

scSketch automatically computes gene-trajectory correlations and applies online false discovery rate (FDR) control to maintain statistical validity during iterative exploration.

 

 

□ Kernel Matrix Completion with Topological and Spectral Features for Multi-Modal Classification

 

https://www.biorxiv.org/content/10.64898/2026.04.19.713528v1

 

A computational pipeline based on kernel matrix completion, in which topological data analysis (TDA) and persistent spectral analysis are incorporated into the classification setting.

 

Topological features capture geometric structure across scales, while spectral descriptors reflect connectivity patterns through Laplacian eigenvalues.

 

 

 □ IDEAL-GENOM: Integrated Downstream Analytical Toolkit for Genomic Analysis

 

https://www.biorxiv.org/content/10.1101/2025.08.27.672528v2

 

IDEAL-GENOM (Integrated Downstream Analytical Toolkit for Genomic Analysis) is a Python-based wrapper designed to streamline the analytical workflow commonly used in genome-wide association studies (GWAS).

 

 

□ BioEngine: scalable execution and adaptation of bioimage AI through agent-readable interfaces

 

https://www.biorxiv.org/content/10.64898/2026.04.19.719496v1

 

BioEngine is an open-source platform that manages AI model execution, fine-tuning, and application deployment on any GPU hardware, from a single laptop or workstation to a multi-node cluster, with a one-time setup.

 

 

□ LCPAN: efficient variation graph construction using locally consistent parsing

 

https://link.springer.com/article/10.1186/s13059-026-04088-w

 

Locally Consistent Parsing (LCP) is a string processing technique that partitions and labels strings into nearly equal-length substrings, ensuring a uniform positional distribution of cores over the input string.

 

 

 □ SpatialQuery: scalable discovery and molecular characterization of multicellular motifs from spatial omics data

 

https://www.biorxiv.org/content/10.64898/2026.04.22.720136v1

 

SpatialQuery is a framework that can both identify cellular motifs, i.e. recurrent multicellular co-localization patterns, and perform molecular analyses focused on those motifs. It uncovers genes modulated by spatial context through differential expression analysis.

 

SpatialQuery detects coordinated expression changes through covariation analysis. SpatialQuery can identify functional tissue units, and goes beyond pairwise analyses to characterize multicellular interactions.

 

 

 □ SNPic: SNP Topic Modeling for Interpretable Clustering of Complex phenotypes

 

https://www.biorxiv.org/content/10.64898/2026.04.22.720106v1

 

SNPic (SNP topic model) is a generative probabilistic framework that reframes GWAS summary statistics as a structured corpus and models genetic architecture.

 

SNPic applies Latent Dirichlet Allocation (LDA) to infer latent "genetic topics", which represent interpretable, overlapping biological modules that jointly explain complex traits.

 

 

 □ MultiOmicsXplorer, a tool to browse, access and analyse multi-omics data 

 

https://link.springer.com/article/10.1186/s12859-026-06460-w

 

MultiOmicsXplorer builds on the SignalingProfiler pipeline to extract protein activities from proteogenomic data, thereby reducing data complexity and dimensionality while enabling mechanistic hypothesis generation.

 

 

□ FUSE: Data-driven FUnctional SEgmentation of DNA Methylation Data

 

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag201/8665234

 

FUSE is a data-driven segmentation method that extracts intrinsic methylation blocks from whole-genome bisulfite sequencing (WGBS) data by jointly analyzing multiple samples. It identifies blocks of CpG sites that capture spatially coherent methylation patterns while allowing methylation states to vary across cell types.

 

 

□ Topology-driven classification of time series

 

https://www.biorxiv.org/content/10.64898/2026.04.25.720787v1

 

A geometric framework that establishes a direct correspondence between the generative structure of a time series and the topology of its delay embedding.

 

Broad classes of signals (including exponential, harmonic, and exponentially modulated oscillatory processes) induce invariant low-dimensional subspaces in Hankel embedding space, whose dimension is determined solely by the number and type of latent dynamical components.
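The subspace-dimension claim can be checked numerically on toy signals. The sketch below builds a Hankel (delay-embedding) matrix and estimates its rank by Gaussian elimination; the helper names `hankel` and `numerical_rank`, the window of 8 rows, the tolerance, and the signal parameters are all assumptions for illustration and are not taken from the paper.

```python
import math


def hankel(x, rows):
    """Hankel matrix H[i][j] = x[i + j] with the given number of rows."""
    cols = len(x) - rows + 1
    return [[x[i + j] for j in range(cols)] for i in range(rows)]


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank via row echelon form with partial pivoting and a relative tolerance."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    tol = rel_tol * max(abs(v) for row in a for v in row)
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # column depends on earlier pivot columns
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            f = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= f * a[rank][c]
        rank += 1
    return rank


t = range(40)
signals = {
    "exponential": [0.9 ** k for k in t],
    "harmonic": [math.cos(0.3 * k) for k in t],
    "damped_oscillation": [0.95 ** k * math.cos(0.3 * k) for k in t],
    "exponential_plus_harmonic": [0.9 ** k + math.cos(0.3 * k) for k in t],
}
ranks = {name: numerical_rank(hankel(x, rows=8)) for name, x in signals.items()}
print(ranks)
```

A real exponential contributes one dimension and an oscillatory component (damped or not) contributes two, so the ranks come out as 1, 2, 2 and 3, independent of amplitude or the particular rate and frequency values chosen here.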

 

Time series classification is reformulated as the problem of separating equivalence classes defined by ε-neighborhoods of subspaces on a Grassmann manifold.