Google Compute Engine (GCE) experienced an outage on April 11, 2016, that caused instances in all regions to lose connectivity. The outage lasted a total of 18 minutes. The company has issued a reason for outage (RFO) explaining the cause and the steps GCE will take to prevent a recurrence. GCE customers will be offered a credit of up to 25% of their monthly charges for GCE and VPN services.
Google says it takes the outage very seriously and will be working to make sure such an incident does not happen again.
We recognize the severity of this outage, and we apologize to all of our customers for allowing it to occur. As of this writing, the root cause of the outage is fully understood and GCE is not at risk of a recurrence. In this incident report, we are sharing the background, root cause and immediate steps we are taking to prevent a future occurrence. Additionally, our engineering teams will be working over the next several weeks on a broad array of prevention, detection and mitigation systems intended to add additional defense in depth to our existing production safeguards.
We take all outages seriously, but we are particularly concerned with outages which affect multiple zones simultaneously because it is difficult for our customers to mitigate the effect of such outages. This incident report is both longer and more detailed than usual precisely because we consider the April 11th event so important, and we want you to understand why it happened and what we are doing about it. It is our hope that, by being transparent and providing considerable detail, we both help you to build more reliable services, and we demonstrate our ongoing commitment to offering you a reliable Google Cloud platform.
Benjamin Treynor Sloss | VP 24×7 | Google