The Facebook outage and network configuration

Prof. Avishai Wool

Published: 10/6/21

Avishai Wool, CTO at AlgoSec, analyses the recent Facebook outage and the risks all organizations face in network configuration

Social media giant Facebook suffered a network outage on 4 October 2021 that lasted nearly six hours and took its sister platforms Instagram and WhatsApp offline.


As the story developed, it became apparent that the incident was caused by a configuration issue affecting Facebook’s BGP (Border Gateway Protocol) routing. BGP is the protocol that networks use to tell one another which routes lead to which addresses, which makes it one of the systems the internet relies on to get your traffic where it needs to go. The outage also cut off the company’s internal communications, along with authentication to third-party services including Google and Zoom. Some reports suggested that security passes stopped working, which prevented engineers from entering the building to reset the data center equipment in person.
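To illustrate the general mechanism, the sketch below models a router's view of the internet as a table of advertised prefixes and shows what happens to a lookup once a prefix is no longer advertised. It is a toy model, not a reconstruction of Facebook's network: the prefixes are documentation addresses, the peer names are made up, and the helper names `lookup` and `withdraw` are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

# Toy routing table: advertised prefix -> next hop.
# All values are hypothetical documentation addresses, not real Facebook prefixes.
RoutingTable = Dict[ipaddress.IPv4Network, str]


def lookup(table: RoutingTable, address: str) -> Optional[str]:
    """Return the next hop for address using longest-prefix match, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in table if ip in net]
    if not matches:
        return None  # no route known: the destination is unreachable
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


def withdraw(table: RoutingTable, prefix: str) -> None:
    """Remove an advertised prefix from the table, as a route withdrawal would."""
    table.pop(ipaddress.ip_network(prefix), None)


if __name__ == "__main__":
    table: RoutingTable = {
        ipaddress.ip_network("192.0.2.0/24"): "peer-a",
        ipaddress.ip_network("198.51.100.0/24"): "peer-b",
    }
    print(lookup(table, "192.0.2.10"))   # route exists
    withdraw(table, "192.0.2.0/24")      # a bad change removes the advertisement
    print(lookup(table, "192.0.2.10"))   # no route left for this address
```

Once the prefix disappears from the table, nothing downstream can reach the address, however healthy the servers behind it are. That is why a routing configuration error can look like an entire service has left the internet.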


The impact was felt worldwide, with Downdetector recording more than 10 million problem reports, the largest number for a single incident. In an official statement following the outage, Facebook said: “Our engineering teams learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication.”


While Facebook has assured its users that no data has been lost in this process, the outage is a stark reminder of how small configuration errors can have huge, far-reaching consequences.


The fundamentals of application availability

At the fundamental level, Facebook suffered from a lack of application availability. When a change was actioned, it set off a chain reaction that ultimately wiped Facebook and its related services from the internet, because the company could not see the entire lifecycle of that change and the impact it would have.


To avoid an incident like this in the future, organizations should consider a few simple steps:


  • Back up configuration files to allow for rollbacks should an issue arise

  • Use a test system alongside live processes to run scenarios without causing any disruptions

  • Retain low-tech alternatives to guarantee access to the network if the primary route fails
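The first of these steps, combined with a proactive check, can be sketched as a small "back up, apply, verify, roll back" routine. This is a minimal illustration under stated assumptions, not a description of Facebook's or any vendor's tooling: the configuration is assumed to be a plain text file, and the names `apply_with_rollback` and `has_default_route`, as well as the `route default` line format, are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def apply_with_rollback(
    config_path: Path,
    new_text: str,
    health_check: Callable[[Path], bool],
) -> bool:
    """Back up config_path, write new_text, and restore the backup if the check fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)   # back up before touching anything

    config_path.write_text(new_text)         # apply the change
    if health_check(config_path):            # proactive check on the result
        return True

    shutil.copy2(backup_path, config_path)   # roll back to the last known-good file
    return False


def has_default_route(path: Path) -> bool:
    """Hypothetical check: the config must still contain a default route line."""
    lines = path.read_text().splitlines()
    return any(line.strip().startswith("route default") for line in lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "router.conf"
        cfg.write_text("route default via 192.0.2.1\n")
        kept = apply_with_rollback(cfg, "# routes removed by mistake\n", has_default_route)
        print(kept, cfg.read_text().strip())
```

In practice the check would run against a test system first, as the second step suggests, and the rollback path itself must not depend on the network being changed, which is where the low-tech alternatives in the third step come in.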


The outages across Facebook’s infrastructure highlight the operational risk that all organizations face from faulty configuration changes, which can drastically affect application availability. Intelligent automation, thorough change management and proactive checks are key to avoiding such outages.
