16/5/2012CS: Taub 337

 

Dynamic Reconfiguration of Primary/Backup Clusters

Dr. Alex Shraer

Yahoo! Research

Dynamically changing (reconfiguring) the membership of a replicated distributed system while preserving data consistency and system availability is a challenging problem. In this talk I will discuss this problem in the context of Primary/Backup clusters and Apache Zookeeper. Zookeeper is an open source system which enables highly reliable distributed coordination. It is widely used in industry, for example in Yahoo!, Facebook,Twitter, VMWare, Box, Cloudera, Mapr, UBS, Goldman Sachs, Nicira, Netflix and many others. A common use-case of Zookeeper is to dynamically maintain membership and other configuration metadata for its users. Zookeeper itself is a replicated distributed system. Unfortunately, the membership and all other configuration parameters of Zookeeper are static - they're loaded during boot and cannot be altered. Operators resort to ''rolling restart'' - a manually intensive and error-prone method of changing the configuration that has caused data loss and inconsistency in production. Automatic reconfiguration functionality has been requested by operators since 2008. Several previous proposals were found incorrect and rejected. We designed and implemented a new reconfiguration protocol in Zookeeper and are currently integrating it into the codebase. It fully automates configuration changes: the set of Zookeeper servers, their roles, addresses, etc. can be changed dynamically, without service interruption and while maintaining data consistency. By leveraging the properties already provided by Zookeeper our protocol is considerably simpler than state of the art in reconfiguration protocols. Our protocol also encompasses the clients -- clients are rebalanced across servers in the new configuration, while keeping the extent of migration proportional to the change in membership.

This is a joint work with Benjamin Reed (Yahoo!), Dahlia Malkhi (MSR) and Flavio Junqueira (Yahoo!). A paper describing this work will appear in the 2012 Usenix ATC conference and in the 2012 Hadoop Summit. http://www.cs.technion.ac.il/~shralex/zkreconfig.pdf

Bio: Bio: Alexander (Alex) Shraer is a Research Scientist in Yahoo!. His research interests include large-scale and dynamic distributed systems, various aspects of fault-tolerance, secure cloud storage, publish-subscribe systems, etc. Alex received his B.Sc. (2004, summa cum laude) and M.Sc. (2006, cum laude) degrees in Computer Science and PhD in Electrical Engineering (2010) from the Technion, Israel. He is a recipient of the Levi Eshkol 3-year PhD fellowship, awarded by the Israeli Ministry of Science.

23/5/2012 14:30EE: Meyer 1003

 

Dr. Philip M. Merlin Memorial Lecture and Prize Award
Scheduling Algorithms for the 4G Cellular Networks

Prof. Reuven Cohen

Computer Science Department, Technion

Cellular networks are becoming more and more crucial to our daily lives and operators are seeking new technologies for increasing their bandwidth. Examples for such technologies are cell sectorization, fractional frequency reuse, and coordinated multipoint Tx/Rx. To take advantage of these new technologies, the scheduler logic at the base station needs to determine not only when to transmit each packet but also what modulation and coding scheme to use, which frequency reuse area's resources, and by which transmitting antenna. In this talk, I will discuss all these modern scheduling problems and present algorithms for solving them

Bio: Reuven Cohen received the Ph.D. degree from the Technion in 1991. Since 1993, he has been a professor in the Department of Computer Science at the Technion, working on protocols and architectures for computer networks. He has served as an editor of the IEEE/ACM Transactions on Networking and the ACM/Kluwer Journal on Wireless Networks (WINET). He was the technical program co-chair of IEEE Infocom 2010.

13/6/2012CS: Taub 337

 

Improved Bounds for Byzantine Self-Stabilizing Clock Synchronization

Dr. Christoph Lenzen

Weizmann Institue of Science

The challenging task of Byzantine self-stabilizing pulse synchronization requires that, in the presence of a minority of nodes that are permanently maliciously faulty, the non-faulty nodes must start firing synchronized pulses in a regular fashion after a finite amount of time, regardless of the initial state of the system. We study this problem under full connectivity in a model where nodes have local clocks of unknown, but bounded drift, and messages are delayed for an unknown, but bounded time.

We present a generic scheme that, given a synchronous consensus protocol P, defines a self-stabilizing pulse synchronization algorithm A(P). If P terminates within R rounds (deterministically or with high probability), A(P) stabilizes within O(R) time (deterministically or with high probability, respectively). Utilizing different consensus routines, our transformation yields various improvements over previous techniques in terms of stabilization time and bit complexity. Finally, we sketch how to establish the abstraction of synchronous, integer-labeled rounds on top of pulse synchronization, at essentially the same complexity bounds.

We will discuss our approach and its merits assuming no previous knowledge on the problem, however, basic familiarity with consensus will be beneficial.

Bio: Christoph Lenzen received a diploma degree in Mathematics from the University of Bonn, Germany, and subsequently performed his graduate studies in Distributed Computing in the group of Professor Roger Wattenhofer at ETH Zurich. In 2011, he was a postdoctoral Fellow at the Hebrew University of Jerusalem, with Danny Dolev. Currently, he is a postdoctoral fellow at the Weizmann Institue of Science, with Professor David Peleg. His research interests cover distributed computing in a wider sense, including topics such as randomized load balancing, graph algorithms, and clock synchronization. He published e.g. at PODC, SPAA, FOCS, and STOC, and in JACM. In 2009, he and his coauthors received the PODC best paper award for their work on gradient clock synchronization.