INSAIT, part of Sofia University “St. Kliment Ohridski”, announced that StateSpaceDiffuser, a new diffusion-based world model that achieves substantial improvements in spatial and temporal consistency, has been accepted for presentation at NeurIPS 2025, one of the world’s leading conferences in artificial intelligence, taking place this December in San Diego, USA.
Modern video diffusion models often struggle to maintain context over time, causing inconsistencies in scene structure and object placement. StateSpaceDiffuser addresses this challenge by combining diffusion modeling with a state-space architecture, enabling the system to retain information across frames and generate coherent, high-quality video sequences. Experiments demonstrate that the model maintains a temporal context an order of magnitude longer than that of standard diffusion-based world models, while requiring minimal additional computation.
The work also introduces a new methodology for evaluating temporal consistency. Across extensive 2D and 3D benchmarks, StateSpaceDiffuser consistently outperforms existing approaches, delivering significantly improved stability and scene understanding.
This achievement contributes to a record year for INSAIT at NeurIPS. A total of six papers from INSAIT researchers have been accepted for NeurIPS 2025 (four in the main track and two in workshops), underscoring Bulgaria’s growing presence among Europe’s leading AI research centers.
The authors of the paper are Nedko Savov, Naser Kazemi, Deheng Zhang, Dr. Danda Pani Paudel, Dr. Xi Wang, and Prof. Luc Van Gool.


