Over the past few years, cloud computing has snowballed in popularity, and it is extending the reach of the internet more than ever. With all the benefits it provides, the cloud is becoming not only a vital part of businesses across the globe but also an essential foundation for the future of the internet.
In 2018, however, the world saw the cloud’s dark side. A series of unexpected outages hit even the most prominent cloud providers, causing embarrassment all around.
Major Cloud Outages Of 2018
Google Cloud
Many products and services today depend on platforms such as AWS and Google Cloud, so when one of these clouds goes down or runs into a problem, it affects not only the companies that rely on it but also their customers. In July 2018, Google Cloud suffered an outage that caused headaches for popular platforms such as Snapchat, Spotify, and Pokémon Go. According to Google’s status page, the outage started at 12:17 on 17 July 2018 and ended at 12:55 the same day (times are US/Pacific).
Discord, a free VoIP application and digital distribution platform for video gaming communities, attributed its issues entirely to the Google outage, whereas Snapchat, Spotify, and Pokémon Go did not mention Google in their updates.
We've identified the cause of our connectivity issues to @GCPcloud which is also impacting other popular apps such as SnapChat. No eta on the recovery time, but awaiting updates on https://t.co/2zsKQSmzKF
— Discord (@discordapp) July 17, 2018
Trainers, we're aware of a technical issue causing an outage. Stay tuned for more information, and thanks for your patience as we investigate.
— Pokémon GO (@PokemonGoApp) July 17, 2018
Something’s not quite right, and we’re looking into it. Thanks for your reports!
— SpotifyCares (@SpotifyCares) July 17, 2018
Google acted quickly, and the disruption lasted less than an hour. “The issue with Google App Engine should be resolved for some users and we expect a full resolution in the near future,” Google said in an update.
Amazon Web Services (AWS)
In March 2018, AWS was hit by an outage that silenced Amazon’s Alexa and affected hundreds of enterprise services, including Atlassian, Slack, and Twilio. It struck Amazon’s important Northern Virginia data centre region, when the Direct Connect dedicated links between the AWS Northern Virginia region and other data centres and customer premises on the East Coast were disabled.
AWS outages also tend to set Twitter alight. During an earlier AWS outage in February 2017, users posted some of the funniest responses:
We can’t publish our story about AWS being down because, well, AWS is down pic.twitter.com/cwUWEkLBuM
— Mashable (@mashable) February 28, 2017
wow this amazon outage is really taking a toll pic.twitter.com/7efX84789P
— JARRY (@jarry) February 28, 2017
That was not all. In May 2018, AWS suffered another outage, a critical connectivity issue caused by a hardware failure in a data centre in Northern Virginia. AWS’s EC2, Relational Database Service, WorkSpaces, and Redshift services were all affected.
In an update the same day, Amazon said that “customers with EC2 instances in the availability zone may see issues with connectivity to the affected instances.”
The company took steps to address the problem and restored power to the vast majority of the affected instances.
Slack
Slack had been running smoothly since a roughly 20-minute outage in May, with 100% uptime for most of June 2018. At the end of that month, however, the workplace messaging platform suffered a global outage, which the company attributed to a connectivity issue.
“We’ve received word that all workspaces are having troubles connecting to Slack. We’re currently investigating the issue, and will have updates shortly,” Slack confirmed on its website.
The investigation continues for our connectivity issues, and we're working hard to get things back to normal. https://t.co/uQIDJzyLSV
— Slack (@SlackHQ) June 27, 2018
With a massive customer base of organisations and individuals, Slack is a widely used platform, and even a brief outage can cost its customers dearly. So when Slack went down, frustrated users took to Twitter.
How am I supposed to tell my team that Slack is down when I don't have Slack to tell them that Slack is down?
— Justin Karp (@jskarp) June 27, 2018
Slack is down so I can’t share memes with my friends so have resorted to printing them out and leaving them on their desks. pic.twitter.com/xhg92anReH
— Dave Jewitt (@IrregularDave) June 27, 2018
Slack eventually resolved the issue and tweeted that users should be able to connect again.
Folks should be able to connect to Slack again. We're sorry for the disruption. https://t.co/uQIDJzyLSV
— Slack (@SlackHQ) June 27, 2018
Microsoft Azure
Announced in 2008, Microsoft Azure has grown into one of the most popular cloud computing services. In June 2018, Azure suffered a critical overnight outage that affected its storage and networking services in the North Europe region. The cause was an underlying temperature issue in one of the region’s data centres.
According to the company, the outage began at 5:45 PM and lasted until 4:30 AM. Even so, many customers appeared to face problems for much longer, despite Azure Support saying that engineers had “mitigated the issue and impacted services should be recovered at this time”.
We need an update and solution now. We have more that +40 customers that cannot use their services.
Please provide us with some information when this is solved.
— Peter Jensbøl (@PLJ4330) June 19, 2018
@azuresupport waiting 24 hours for SQL backups to restore from long term retention. Recent attempts for older backups restored in minutes. Have previous attempts failed or do we just keep waiting? #azTechHelp
— SpinnakerSoft (@SpinnakerSoft) June 20, 2018
Following the complaints, Microsoft resolved the issue and services returned to normal. Just over two months later, however, Microsoft suffered another outage, this time caused by a severe lightning storm near San Antonio. Azure’s South Central US data centre region was down for an extended period, and customers around the world using Active Directory and Visual Studio Team Services had problems for more than 24 hours.
Whether it is caused by human error or a natural disaster, an outage is something every cloud user should expect, so it pays to be prepared for downtime. Preparation starts with knowing every piece of your organisation’s technical chain: network, servers, load balancers, applications, databases, third-party vendors, and so on. Knowing the chain will help you prepare a backup plan, and prepare it quickly.
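One way to make "knowing the chain" concrete is to write it down as a dependency map and ask what breaks when a single link fails. The minimal Python sketch below assumes a hypothetical inventory, `CHAIN`, in which each component lists the components it depends on; the component names and the `impacted_by` helper are illustrative only, not taken from any real tool. Given a failed component (for example, a cloud region), it walks the reverse dependencies breadth-first and returns everything that would be affected.

```python
from collections import deque

# Hypothetical inventory: each component maps to the components it depends on.
CHAIN = {
    "web_app": ["load_balancer", "auth_service"],
    "load_balancer": ["app_servers"],
    "app_servers": ["database", "cloud_storage"],
    "auth_service": ["third_party_sso"],
    "database": ["cloud_region"],
    "cloud_storage": ["cloud_region"],
    "third_party_sso": [],
    "cloud_region": [],
}


def impacted_by(failed, chain):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map: dependency -> components that rely on it.
    dependents = {name: [] for name in chain}
    for component, deps in chain.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk upwards from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(impacted_by("cloud_region", CHAIN)))
# ['app_servers', 'cloud_storage', 'database', 'load_balancer', 'web_app']
print(sorted(impacted_by("third_party_sso", CHAIN)))
# ['auth_service', 'web_app']
```

Any component in the returned set that has no fallback, such as a second region, an alternative provider, or a cached copy, is a natural candidate for the backup plan.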