This vision paper presents a new generation of multimodal streaming systems that embed Multimodal Large Language Models (MLLMs) as first-class operators, enabling real-time query processing across multiple modalities. While recent work has integrated MLLMs into databases for multimodal queries, streaming systems demand fundamentally different approaches because of their strict latency and throughput requirements. Novel optimizations at the logical, physical, and semantic query transformation levels reduce the load placed on the model, improving throughput while preserving accuracy. The prototype, Samsara, leverages these optimizations to improve performance by more than an order of magnitude.