14/12/2016 11:30 Taub 401


Resampling with Feedback - A New Paradigm of Using Workload Data for Performance Evaluation

Dror Feitelson

Computer Science, Hebrew University

Reliable performance evaluations require representative workloads. This has led to the use of accounting logs from production systems as a source for workload data in simulations. I will survey 20 years of ups and downs in the use of workload logs, culminating with the idea of resampling with feedback. It all started with the realization that using workload logs directly suffers from various deficiencies, such as providing data about only one specific situation, and lack of flexibility, namely the inability to adjust the workload as needed. Creating workload models solves some of these problems but creates others, most notably the danger of missing out on important details that were not recognized in advance, and therefore not included in the model. Resampling solves many of these deficiencies by combining the best of both worlds. It is based on partitioning the workload data into basic components (e.g., the jobs contributed by different users), and then generating new workloads by sampling from this pool of basic components. This allows analysts to create multiple varied (but related) workloads from the same original log, while retaining much of the structure that exists in the original workload. However, resampling should not be applied in an oblivious manner. Rather, the generated workloads need to be adjusted dynamically to the conditions of the simulated system using a feedback loop. Resampling with feedback is therefore a new way to use workload logs that benefits from the realism of logs while eliminating many of their drawbacks. In addition, it enables evaluations of throughput effects that are impossible with static workloads.
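As a rough illustration of the resampling step described in the abstract, the following Python sketch partitions a log into per-user job lists and builds a new workload by sampling users from that pool. It is a simplified sketch and not the speaker's implementation: the job fields (`user`, `submit`, `runtime`), the function names, and the choice of sampling users with replacement are all assumptions made for this example, and the feedback loop that adapts the workload to the simulated system is omitted.

```python
import random
from collections import defaultdict


def partition_by_user(jobs):
    """Group jobs by submitting user; each user's job list is one basic component."""
    pool = defaultdict(list)
    for job in jobs:
        pool[job["user"]].append(job)
    return dict(pool)


def resample(pool, num_users, seed=None):
    """Build a new workload from num_users components drawn from the pool.

    Assumption for this sketch: users are drawn with replacement, and each
    drawn user's jobs are copied unchanged under a fresh user id, so the
    per-user structure of the original log is preserved.
    """
    rng = random.Random(seed)
    users = sorted(pool)  # fixed order keeps sampling reproducible for a given seed
    workload = []
    for new_id in range(num_users):
        source = rng.choice(users)
        for job in pool[source]:
            workload.append({**job, "user": f"u{new_id}", "source_user": source})
    # Merge the sampled components into a single stream ordered by submit time.
    workload.sort(key=lambda j: j["submit"])
    return workload


# Tiny hypothetical log: two users, three jobs.
log = [
    {"user": "a", "submit": 0, "runtime": 5},
    {"user": "a", "submit": 10, "runtime": 3},
    {"user": "b", "submit": 4, "runtime": 7},
]
new_workload = resample(partition_by_user(log), num_users=3, seed=1)
```

Calling `resample` with different seeds or a different `num_users` yields varied but related workloads from the same log, which is the flexibility the abstract attributes to resampling.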

Bio: Dror Feitelson is a professor of Computer Science at the Hebrew University of Jerusalem, where he has been on the faculty of the Rachel and Selim Benin School of Computer Science and Engineering since 1995. His research emphasizes experimental techniques and real-world data in computer systems performance evaluation, and more recently also in software engineering. Using such data, he and his students have demonstrated the importance of using correct workloads in performance evaluations, identified commonly made erroneous assumptions that may call research results into question, and developed methodologies to replace assumptions with real data. Other major contributions include co-founding the JSSPP series of workshops (now in its 20th year), establishing and maintaining the Parallel Workloads Archive (which has been used in about a thousand papers), and a recent book on Workload Modeling published by Cambridge University Press in 2015.

21/12/2016 11:30 TBA



Michael Chertkov



Bio: TBA

27/12/2016 11:30 Taub 301


Temporal planning: towards highly utilized clouds

Ishai Menache

Microsoft Research, Redmond

Existing resource management frameworks for large-scale cloud systems leave unresolved the tension between high resource utilization and job performance predictability, which are coveted by operators and users, respectively. In this talk, I will present recent efforts to resolve this tension through temporal planning: unlike popular scheduling and routing schemes, we propose mechanisms that plan resource allocations into future time steps. Intuitively, such planning allows the operator to pack the cloud more densely while offering performance SLAs to users. I will describe two recent systems that incorporate forms of temporal planning: (i) Morpheus, a cluster resource management system that offers automated SLAs to customers while reducing the cluster footprint; and (ii) Pretium, a combined traffic engineering and pricing framework for WAN bandwidth.

Bio: Ishai Menache is a researcher at Microsoft Research, Redmond. He received his PhD from the Technion - Israel Institute of Technology. Subsequently, he was a postdoc at the Laboratory for Information and Decision Systems at MIT. Ishai's research focuses on developing large-scale resource management and optimization frameworks for datacenters. More broadly, his areas of interest include systems and networking, algorithms, and machine learning.