This vision paper presents a new generation of multimodal streaming systems that embed Multimodal Large Language Models (MLLMs) as first-class operators, enabling real-time query processing across multiple modalities. While recent work has integrated …
Learned cost models (LCMs) have recently gained traction as a promising alternative to traditional cost estimation techniques in data management, offering improved accuracy by capturing complex interactions between queries, data, and runtime …
The growing number of IoT devices has led to decentralized networks for handling unbounded data streams, but traditional centralized window aggregation results in high network overhead and processing bottlenecks. Current decentralized solutions only …
PDSP-Bench is a novel benchmarking system designed for a systematic understanding of the performance of parallel stream processing in a distributed environment. While existing benchmarking systems focus on analyzing stream processing systems using …
COSTREAM provides a learned cost model for Distributed Stream Processing Systems that can accurately predict the execution costs of a streaming query in an edge-cloud environment. The model can be used to find an initial placement of operators across …
ZeroTune introduces a novel cost model for parallel and distributed stream processing that can be used to effectively set initial parallelism degrees of streaming queries. Unlike existing models, which rely mainly on online learning statistics that …
Stream processing systems designed to process data streams in real time must handle sensitive or personal data across multilayered systems (sensor, fog, and cloud layers), which raises privacy concerns as data may be subject to unauthorized access …
This paper presents zero-shot cost models for parallel stream processing, enabling accurate cost predictions for parallel streaming queries without having observed any query deployment. The approach leverages data-efficient zero-shot learning …
Distributed Stream Processing (DSP) systems rely heavily on parallelism mechanisms to deliver high performance in terms of latency and throughput. Yet the development of such parallel systems comes with numerous challenges. In this paper, …
This paper proposes a learned cost estimation model for Distributed Stream Processing Systems (DSPS) that aims to provide accurate cost predictions for executing queries. A major premise of this work is that the proposed learned model can generalize …