Idit Keidar: Concurrent Big Data Processing – Data Structures & Semantics
Abstract:
Big data processing systems often employ batched updates and data sketches to estimate properties of massive data sets. For example, a Θ sketch estimates the number of unique items in a data stream, and a CountMin sketch approximates the frequencies with which distinct stream elements occur. This talk will focus on concurrent (multi-threaded) implementations of such objects.
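For readers unfamiliar with the sketches mentioned above, the following is a minimal, sequential CountMin sketch in Python. It is only a sketch of the idea, not the implementation discussed in the talk; the class and parameter names are illustrative, and the width/depth choice follows the standard CountMin analysis (width about e/ε, depth about ln(1/δ)).

# Minimal, illustrative CountMin sketch (sequential).
import math
import random

class CountMinSketch:
    def __init__(self, epsilon, delta, seed=42):
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        # One salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(self.depth)]
        self.counters = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.counters[row][self._index(row, item)] += count

    def query(self, item):
        # Overestimates the true frequency by at most epsilon * N
        # (N = total count) with probability at least 1 - delta.
        return min(self.counters[row][self._index(row, item)]
                   for row in range(self.depth))

if __name__ == "__main__":
    cms = CountMinSketch(epsilon=0.01, delta=0.01)
    for word in ["a", "b", "a", "c", "a"]:
        cms.update(word)
    print(cms.query("a"))  # at least 3, here almost certainly exactly 3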
First, we will present an efficient generic approach to parallelizing data sketches that allows them to be queried in real time, while bounding the error that such parallelism introduces. This solution achieves high scalability while keeping the error small, and it is now publicly available as part of the popular open-source DataSketches library.
Second, we will discuss the correctness semantics of such objects. Specifically, we will consider (ε,δ)-bounded objects that estimate some quantity with an error of at most ε with probability at least 1-δ. We will define Intermediate Value Linearizability (IVL), a correctness criterion that relaxes linearizability to allow more parallelism, and yet preserves the error bounds of sequential (ε,δ)-bounded objects. To illustrate the power of this result, we will show a straightforward and efficient concurrent implementation of an (ε,δ)-bounded CountMin sketch, which is IVL (albeit not linearizable).
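As a rough illustration of the kind of object the abstract describes (and not the implementation from the talk), the sketch below makes each per-row increment atomic with a per-row lock, while a query reads the rows without taking any global snapshot. A query that overlaps an update may therefore return a value between the estimates before and after that update; this is the intermediate-value behavior IVL permits while the (ε,δ) bound of the sequential sketch is preserved.

# Illustrative concurrent CountMin: locked per-row updates, snapshot-free queries.
import math
import random
import threading

class ConcurrentCountMin:
    def __init__(self, epsilon, delta, seed=1):
        self.w = math.ceil(math.e / epsilon)
        self.d = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(self.d)]
        self.rows = [[0] * self.w for _ in range(self.d)]
        self.locks = [threading.Lock() for _ in range(self.d)]

    def _idx(self, r, item):
        return hash((self.salts[r], item)) % self.w

    def update(self, item, count=1):
        for r in range(self.d):
            with self.locks[r]:  # each single-row increment is atomic
                self.rows[r][self._idx(r, item)] += count

    def query(self, item):
        # No lock: the row reads are not an atomic snapshot of the sketch,
        # so the returned minimum may mix pre- and post-update rows.
        return min(self.rows[r][self._idx(r, item)] for r in range(self.d))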
Based on joint work with Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Lee Rhodes, and Hadar Serviansky.
About the speaker:
Idit Keidar is the Lord Leonard Wolfson Professor at the Technion’s Viterbi Faculty of Electrical Engineering. She received her BSc (summa cum laude), MSc (summa cum laude), and PhD from the Hebrew University of Jerusalem in 1992, 1994, and 1998, respectively. Subsequently, she was a Rothschild Postdoctoral Fellow at MIT’s Laboratory for Computer Science. Prof. Keidar has served as program chair for a number of leading conferences (PODC, DISC, PPoPP, and SYSTOR). She currently heads the Technion Rothschild Scholars Program for Excellence and consults for Yahoo Labs and Orbs.
Torsten Hoefler: High-performance distributed memory systems – from supercomputers to data centers
(slides)
Abstract:
We will cover distributed memory programming of high-performance supercomputers and datacenter computers. Starting from the Message Passing Interface (MPI), we examine abstractions for distributed computations and carry them through optimizations such as topology mapping and collective communication optimization. We then discuss efficient correction protocols that enable fault tolerance in such high-performance distributed systems. Armed with these insights, we observe that supercomputers are likely to migrate into mega-datacenter installations, leading to a general convergence of the two architectures. The first step, converging the network interfaces, is well underway with the growing acceptance of Remote Direct Memory Access (RDMA) networking. RDMA moves distributed systems closer to shared memory, albeit with a weakly consistent memory model. We discuss several algorithmic and systems approaches that use RDMA to accelerate distributed replicated state machines, databases, and locking systems by orders of magnitude. Finally, if time allows, we will outline parametric program graphs, a sound abstraction for analyzing and optimizing applications. For each topic, we will identify open problems and provide ideas for further work to deepen our understanding of high-performance distributed memory systems.
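As a small, concrete reference point for the collective communication mentioned above, here is a minimal MPI example (assuming mpi4py, NumPy, and an MPI runtime are available; the file and variable names are arbitrary): each rank contributes a vector, and a single Allreduce produces the element-wise sum on every rank, the kind of collective operation whose implementation such optimizations target.

# allreduce.py: each rank contributes a vector; Allreduce sums them on all ranks.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(4, rank, dtype=np.int64)   # this rank's contribution
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)   # collective reduction over all ranks

if rank == 0:
    print("sum over all ranks:", total)

Run with, for example, mpirun -n 4 python allreduce.py.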
About the speaker:
Torsten is a Professor of Computer Science at ETH Zürich, Switzerland. He is also a key member of the Message Passing Interface (MPI) Forum, where he chairs the “Collective Operations and Topologies” working group. His research interests revolve around the central topic of “Performance-centric System Design” and include scalable networks, parallel programming techniques, and performance modeling. Torsten has won best paper awards at the ACM/IEEE Supercomputing Conference (SC10, SC13, SC14, SC19), EuroMPI’13, HPDC’15, HPDC’16, IPDPS’15, and other conferences. He has published numerous peer-reviewed conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. He received the Gordon Bell Prize, the Latsis Prize of ETH Zurich, and an ERC Starting Grant. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.