Monday, July 16, 2007

Power Consumption by Google Services

Even though Google doesn’t share many details about its infrastructure, the limited information it has published shows a company obsessed with continuously monitoring, managing, and improving the efficiency of that infrastructure.

Recently, Robin Harris attended the Google conference on scalability and then mused on How Yahoo can beat Google. A few months ago, Google published the results of its work on disk drive failure in the paper Failure Trends in a Large Disk Drive Population [PDF]. It was covered extensively in the blogosphere, including by me in the blog entries SMART not so smart in predicting disk drive failure and Google Findings of Disk Failures Rates and Implications, and by Robin Harris in his blog entry Google’s Disk Failure Experience.

Google has done it again, presenting the results of its work on power consumption and provisioning in the paper Power Provisioning for a Warehouse-sized Computer [PDF] at the ACM International Symposium on Computer Architecture, San Diego, CA, June 9 – 13, 2007. In this work, Google researchers Xiaobo Fan, Wolf-Dietrich Weber and Luiz Andre Barroso looked into 15,000 servers running three different applications (Websearch, Webmail and Mapreduce) for six months to determine the power usage characteristics at the rack, PDU and datacenter levels.

Google Services

Websearch: A service with high request throughput and large data processing for each request.

Webmail: A disk I/O intensive service. Machines are configured with a large number of disk drives, and each request involves a relatively small number of servers.

Mapreduce: A cluster dedicated to running large offline batch jobs. These jobs process terabytes of data using thousands of machines.

Key Findings

The key findings from this work are:
  1. The gap between the maximum power actually drawn by a large group of computing devices and their aggregate theoretical peak usage can be as much as 40% in datacenters.

  2. It may be more efficient to apply power management techniques at the datacenter level than at the rack level.

  3. Nameplate ratings are of little use in power provisioning as they significantly overestimate actual maximum usage.

  4. CPU utilization as a measure of machine-level activity produces accurate estimates of dynamic power usage, especially for large groups of machines. The dynamic power range is less than 30% for disks and negligible for motherboards.

  5. Provisioning the datacenter using the maximum power draw of individual machines leaves some capacity stranded.

  6. A mix of diverse workloads reduces the difference between average and peak power, an argument in favor of mixed deployment.

  7. Idle power is significantly lower than actual peak power, but generally never below 50% of it.

  8. CPU dynamic voltage/frequency scaling may yield moderate energy savings (up to 23%) at the datacenter level.

  9. Peak power consumption at the data center level could be reduced by 30% and energy usage could be halved if systems were designed so that lower activity levels meant correspondingly lower power usage profiles.
I believe there is a value proposition in this study waiting to be discovered by Copan Systems: applying their MAID technology at PDU and datacenter scales, or even along the lines of Sun's Project Blackbox.
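
To make findings 1, 4, 5 and 7 a little more concrete, here is a rough Python sketch of the arithmetic behind them. It is my own illustration, not code or a formula taken from the paper. It assumes a simple linear relationship between CPU utilization and power, and the function names, the 250 W machine, the 1 MW budget and the 0.8 peak ratio are all hypothetical values chosen only to show the calculation.

```python
def machine_power(utilization, peak_watts, idle_fraction=0.5):
    """Estimate one machine's power draw (watts) from its CPU utilization.

    Assumption: power rises linearly from idle to peak as utilization
    goes from 0 to 1. idle_fraction=0.5 mirrors the observation that
    idle power is generally not below 50% of peak.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_watts = idle_fraction * peak_watts
    return idle_watts + (peak_watts - idle_watts) * utilization


def extra_machines(budget_watts, peak_watts, observed_peak_ratio):
    """Count machines that fit beyond a plan based on per-machine peak.

    observed_peak_ratio is the highest power a large group actually draws,
    expressed as a fraction of the sum of the individual machine peaks
    (hypothetical input; measure it for your own fleet).
    """
    if not 0.0 < observed_peak_ratio <= 1.0:
        raise ValueError("observed_peak_ratio must be in (0, 1]")
    # Conservative plan: assume every machine hits its peak at once.
    baseline = int(budget_watts // peak_watts)
    # Plan based on what the group was actually observed to draw.
    oversubscribed = int(budget_watts // (peak_watts * observed_peak_ratio))
    return oversubscribed - baseline


if __name__ == "__main__":
    # Hypothetical 250 W machine at idle, half load and full load.
    for u in (0.0, 0.5, 1.0):
        print(u, machine_power(u, peak_watts=250.0))
    # Hypothetical 1 MW budget with a group peak at 80% of aggregate peak.
    print(extra_machines(1_000_000, 250.0, 0.8))
```

The second function is the stranded capacity idea in miniature: if a large group of machines never gets close to the sum of its individual peaks, a power budget sized on those peaks leaves room for more machines.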

More details from this study later.

2 comments:

  1. Hi,

    I've also read this article, but I'm still looking for papers or studies on the 'power intensive' benchmarks mentioned in the document.

    Any ideas?

  2. Not sure I understood your question. Could you clarify what you mean by a power intensive benchmark? Check out the 'Related Works' section in the paper for some other studies that may be relevant to you.
