Recently, Robin Harris attended the Google conference on scalability and then mused How Yahoo can beat Google. A few months ago, Google published the results of their work on disk drive failure in the paper Failure Trends in a Large Disk Drive Population [PDF]. It was extensively covered in the blogosphere, including by me in the blog entries SMART not so smart in predicting disk drive failure and Google Findings of Disk Failures Rates and Implications, and by Robin Harris in his blog entry Google’s Disk Failure Experience.
Google has done it again, presenting the results of their work on power consumption and provisioning in the paper Power Provisioning for a Warehouse-sized Computer [PDF] at the ACM International Symposium on Computer Architecture, San Diego CA, June 9 – 13, 2007. In this work, Google researchers Xiaobo Fan, Wolf-Dietrich Weber and Luiz Andre Barroso looked into 15,000 servers running three different applications (Websearch, Webmail and Mapreduce) for six months to determine the power usage characteristics at the rack, PDU and datacenter levels.
Websearch: A service with high request throughput and a large amount of data processing for each request.
Webmail: A disk I/O intensive service. Machines are configured with a large number of disk drives, and each request involves a relatively small number of servers.
Mapreduce: A cluster dedicated to running large offline batch jobs, which involve processing terabytes of data using thousands of machines.
The key findings from this work are:
- The gap between the maximum power actually drawn by a large group of computing devices, taken together, and their aggregate theoretical peak usage can be as much as 40% in datacenters.
- It may be more efficient to apply power management techniques at the datacenter level than at the rack level.
- Nameplate ratings are of little use in power provisioning as they significantly overestimate actual maximum usage.
- CPU utilization as a measure of machine-level activity produces accurate estimates of dynamic power usage, especially for large groups of machines. The dynamic power range is less than 30% for disks and negligible for motherboards.
- Provisioning the datacenter using the maximum power draw of individual machines leaves some capacity stranded.
- A mix of diverse workloads reduces the difference between average and peak power, an argument in favor of mixed deployment.
- Idle power is significantly lower than the actual peak power, but generally never below 50% of it.
- CPU dynamic voltage/frequency scaling may yield moderate energy savings (up to 23%) at the datacenter level.
- Peak power consumption at the data center level could be reduced by 30% and energy usage could be halved if systems were designed so that lower activity levels meant correspondingly lower power usage profiles.
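To make the findings on CPU utilization and stranded capacity a little more concrete, here is a minimal Python sketch. It assumes that a machine's power grows linearly with CPU utilization between its idle and peak draw; that linear form is my simplification for illustration, not necessarily the exact model in the paper. The function names, the wattages and the utilization samples are all hypothetical.

```python
def machine_power(cpu_util, idle_watts, peak_watts):
    """Estimate one machine's draw from CPU utilization (0.0 to 1.0).

    Assumption: power scales linearly between idle and peak draw.
    """
    if not 0.0 <= cpu_util <= 1.0:
        raise ValueError("cpu_util must be between 0 and 1")
    return idle_watts + (peak_watts - idle_watts) * cpu_util


def group_power(utilizations, idle_watts, peak_watts):
    """Sum the estimated draw of a group of identical machines."""
    return sum(machine_power(u, idle_watts, peak_watts) for u in utilizations)


def stranded_fraction(observed_group_peak_watts, machine_count, peak_watts):
    """Fraction of provisioned power never used when every machine is
    budgeted at its individual peak draw."""
    provisioned = machine_count * peak_watts
    return 1.0 - observed_group_peak_watts / provisioned


# Hypothetical numbers, chosen only for illustration.
IDLE_WATTS = 150.0   # idle draw is 60% of peak, i.e. not below 50%
PEAK_WATTS = 250.0

# Each row is one point in time; each value is one machine's CPU utilization.
samples = [
    [0.20, 0.90, 0.40, 0.10],
    [0.60, 0.30, 0.80, 0.50],
    [0.10, 0.20, 0.95, 0.30],
]

# Machines rarely peak at the same moment, so the group peak stays
# below the sum of the individual peaks.
group_peak = max(group_power(row, IDLE_WATTS, PEAK_WATTS) for row in samples)
headroom = stranded_fraction(group_peak, len(samples[0]), PEAK_WATTS)

print(f"observed group peak: {group_peak:.0f} W")
print(f"stranded capacity:   {headroom:.0%}")
```

Because the machines in the sample do not all hit high utilization at once, the group peak comes out lower than four times the single-machine peak, and the difference is the stranded capacity. With real measurements in place of the made-up samples, the same calculation could be repeated at the rack, PDU and datacenter level.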
More details from this study later.