27/4/2015 11:30EE, 861


Catapulting beyond Moore's Law: Using FPGAs to Accelerate Data Centers

Derek Chiou


In this talk, I will describe a joint Microsoft Research and Bing project to study the possibility of using field programmable gate arrays to accelerate cloud applications. We developed an FPGA card that plugs into a Microsoft designed cloud server and used it to accelerate a significant portion of Bing's search engine. Because the application does not fit into a single FPGA, we spread the application across multiple FPGAs spread across multiple servers and connected together with a low latency network also implemented by the FPGA. Our 1,632 server pilot demonstrated a factor of two performance improvement over pure software at significant power savings. FPGAs will accelerate Bing searches in one data center starting in 2015.

Bio: Derek Chiou is a Principal Architect at Microsoft where he co-leads a team working on FPGAs for data center applications and an associate professor at The University of Texas at Austin. His research areas are FPGA acceleration, high performance computer simulation, rapid system design, computer architecture, parallel computing, Internet router architecture, and network processors. Before going to UT, Dr. Chiou was a system architect at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M. and S.B. degrees in Electrical Engineering and Computer Science from MIT.

29/4/2015 11:30CS, Taub 9 classroom


Scaling Concurrent Log-Structured Data Stores

Dr. Eshcar Hillel

Yahoo Labs

Log-structured data stores (LSM-DSs) are widely accepted as the state-of-the-art implementation of key-value stores. They replace random disk writes with sequential I/O, by accumulating large batches of updates in an in-memory data structure and merging it with the on-disk store in the background. While LSM-DS implementations proved to be highly successful at masking the I/O bottleneck, scaling them up on multicore CPUs remains a challenge. This is nontrivial due to their often rich APIs, as well as the need to coordinate the RAM access with the background I/O.

We present cLSM, an algorithm for scalable concurrency in LSM-DS, which exploits multiprocessor-friendly data structures and non-blocking synchronization. cLSM supports a rich API, including consistent snapshot scans and general non-blocking read-modify-write operations. We implement cLSM based on the popular LevelDB key value store, and evaluate it using intensive synthetic workloads as well as ones from production web-serving applications. Our algorithm outperforms state of the art LSMDS implementations, improving throughput by 1.5x to 2.5x. Moreover, cLSM demonstrates superior scalability with the number of cores (successfully exploiting twice as many cores as the competition).

This is a joint work with Guy Golan, Eddie Bortnikov (Yahoo Labs) and Idit Keidar (Technion, Yahoo Labs), presented at EuroSys 2015.

Bio: Eshcar Hillel is a Research Scientist at Yahoo Labs, currently working in the Scalable Search Systems Research team. Eshcar received the B.Sc. degree in Software Engineering and the Ph.D. degree in Computer Science from the Technion, in 2002 and 2011, respectively. Prior to her graduate studies she worked at Hewlett-Packard as part of the Non-Stop relational database system team.