Resampling with Feedback - A New Paradigm of Using Workload Data for Performance Evaluation
Computer Science, Hebrew University
Reliable performance evaluations require representative workloads. This has led to the use of accounting logs from production systems as a source for workload data in simulations. I will survey 20 years of ups and downs in the use of workload logs, culminating with the idea of resampling with feedback. It all started with the realization that using workload logs directly suffers from various deficiencies, such as providing data about only one specific situation, and lack of flexibility, namely the inability to adjust the workload as needed. Creating workload models solves some of these problems but creates others, most notably the danger of missing out on important details that were not recognized in advance, and therefore not included in the model. Resampling solves many of these deficiencies by combining the best of both worlds. It is based on partitioning the workload data into basic components (e.g. the jobs contributed by different users), and then generating new workloads by sampling from this pool of basic components. This allows analysts to create multiple varied (but related) workloads from the same original log, all the time retaining much of the structure that exists in the original workload. However, resampling should not be applied in an oblivious manner. Rather, the generated workloads need to be adjusted dynamically to the conditions of the simulated system using a feedback loop. Resampling with feedback is therefore a new way to use workload logs which benefits from the realism of logs while eliminating many of their drawbacks. In addition, it enables evaluations of throughput effects that are impossible with static workloads.
Bio: Dror Feitelson is a professor of Computer Science at the Hebrew University of Jerusalem, where he has been on the faculty of the Rachel and Selim Benin School of Computer Science and Engineering since 1995. His research emphasizes experimental techniques and real-world data in computer systems performance evaluation, and more recently also in software engineering. Using such data he and his students have demonstrated the importance of using correct workloads in performance evaluations, identified commonly made erroneous assumptions that may call research results into question, and developed methodologies to replace assumptions with real data. Other major contributions include co-founding the JSSPP series of workshops (now in its 20th year), establishing and maintaining the Parallel Workloads Archive (which has been used in about a thousand papers), and a recent book on Workload Modeling published by Cambridge University Press in 2015.