The Department of Computer Science at the University of Cyprus cordially invites you to the Colloquium entitled:
ECM-sketches and Multi-cloud MapReduce
Speaker: Dr. Odysseas Papapetrou
In this talk I will present my recent work on big data management. I will start with the ECM-sketch, a compact and efficient sketch that enables a wide range of sliding-window queries over distributed high-dimensional data streams. The sketch allows effective summarization of streaming data over both time-based and count-based sliding windows, and supports point and inner-product queries with probabilistic accuracy guarantees. It can be employed to address a broad range of problems over centralized and distributed data streams, such as maintaining frequency statistics, finding heavy hitters, and computing quantiles in the sliding-window model. The ECM-sketch was recently published in VLDB and the VLDB Journal. Furthermore, we are currently working towards an FPGA-based implementation of the sketch, which can further increase performance and drastically reduce energy cost.
In the second part of the talk, I will introduce a new programming model that enables better utilization of the computational and network resources of multiple distributed clouds. Existing cloud programming models (e.g., MapReduce) assume that all cloud resources (and consequently all data) are located within a single data center that supports a high-speed network, e.g., InfiniBand. This is a restrictive assumption for many real-world scenarios, where the data to be processed is physically distributed, e.g., over cloud resources hosted by different providers, or even over multi-site data centers. I will explain why existing programming models fail in such scenarios, and describe a novel model suitable for this kind of distribution. The model enables MapReduce computations to scale across large distributed cloud federations, and has a very gentle learning curve for existing MapReduce developers.
The core innovation of the model is that it enables developers to clearly distinguish between local and holistic reductions, i.e., reductions that can be performed in isolation inside each individual cloud, versus reductions that need to incorporate data from all clouds. The execution engine can then exploit this information to alleviate network and processing bottlenecks and to increase parallelism. This work is currently under preparation.
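The distinction between local and holistic reductions can be illustrated with a small example. The following is only a minimal sketch under stated assumptions, not the programming model presented in the talk: the function names `local_reduce` and `holistic_reduce`, the cloud names, and the word-count workload are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def local_reduce(records: Iterable[str]) -> Counter:
    """Hypothetical local reduction: runs inside one cloud and
    counts words using only the data stored in that cloud."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def holistic_reduce(partials: Iterable[Counter], k: int) -> List[Tuple[str, int]]:
    """Hypothetical holistic reduction: needs the partial results of
    all clouds, merges them, and returns the k most frequent words."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    # Sort by descending count, then alphabetically, for a deterministic result.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Illustrative data: each (assumed) cloud holds its own lines of text.
clouds = {
    "cloud_a": ["big data big streams", "data streams"],
    "cloud_b": ["big clouds", "data data"],
}

# Local reductions run independently, so only compact partial counts
# (not the raw lines) would have to cross the network between clouds.
partials = [local_reduce(lines) for lines in clouds.values()]
top = holistic_reduce(partials, k=2)
print(top)
```

In this sketch the per-cloud word counts are the local reductions, while the global top-k is holistic because no single cloud can determine it from its own data alone.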
Odysseas Papapetrou received his PhD in Computer Science from the University of Hannover, after obtaining an M.Sc. from Saarland University, and a B.Sc. and M.Sc. from the University of Cyprus. Since 2011, he has been a researcher at the Software Technology and Network Applications Laboratory of the Technical University of Crete. His research focuses on big data management, with a special interest in distributed data.
Mailing List: https://listserv.cs.ucy.ac.cy/mailman/listinfo/cs-colloquium
Sponsor: The CS Colloquium Series is supported by a generous donation from