Scalable Distributed Stream Processing

Mitch Cherniack,, Hari Balakrishnan, Magdalena Balazinska, Don Carney, Ugur Cetintemel, Ying Xing, and Stan Zdonik

Stream processing fits a large class of new applications for which conventional DBMSs fall short. Because many stream-oriented systems are inherently geographically distributed and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. They will run across the Internet on computers typically owned by multiple cooperating administrative domains. This paper describes the architectural challenges facing the design of large-scale distributed stream processing systems, and discusses novel approaches for addressing load management, high availability, and federated operation issues. We describe two stream processing systems, Aurora* and Medusa, which are being designed to explore complementary solutions to these challenges.

CIDR 2003 - First Biennial Conference on Innovative Data Systems Research, Asilomar, California, January 2003.

[PDF (515K)]