When Cloudflare and GitHub Go Down on the Same Day: The Internet's Fragile Foundation
Cloudflare took down 20% of the internet for three hours. GitHub went down hours later. Here's why this keeps happening and why it won't get fixed.

November 18, 2025: Cloudflare goes down around 6:30am ET. X, ChatGPT, Spotify, Zoom, and thousands of other sites become unreachable. 20% of the internet stops working.
Three hours later, Cloudflare comes back up.
Then GitHub goes down. Git operations fail globally. Developers can't push code. CI/CD pipelines break. Deployments stop.
Two critical internet infrastructure providers failing on the same day. Millions of users affected. Billions in productivity lost.
This isn't a coincidence. It's the inevitable result of an internet built on a handful of single points of failure.
What Happened: Cloudflare
The Outage:
Started: ~11:30 UTC / 6:30 AM ET
Resolved: 14:30 UTC / 9:30 AM ET (~3-hour duration)
Cause: Auto-generated configuration file grew beyond expected size, crashed threat management system
Impact: Sites returning 500 errors, timeouts, Cloudflare error pages
Sites Affected:
X (Twitter)
ChatGPT
Claude
Spotify
Zoom
Canva
Amazon (some services)
Thousands more
Even Downdetector - the site people visit to check if other sites are down - went down.
Root Cause:
Cloudflare CTO Dane Knecht explained the failure:
"The root cause of the outage was a configuration file that is automatically generated to manage threat traffic. The file grew beyond an expected size of entries and triggered a crash in the software system that handles traffic for a number of Cloudflare's services. The Cloudflare team was able to diagnose the issue and revert to a previous version of the file which restored services as of 14:30 UTC. There is no evidence of an attack or malicious activity causing the issue."
Translation: An auto-generated config file for threat management grew too large. The software couldn't handle it. It crashed. That crash cascaded through Cloudflare's network. Everything broke.
Not an attack. Not malicious. Just a config file that grew beyond expected size and took down 20% of the internet.
What Happened: GitHub
Hours later, same day:
GitHub Status: "Git Operations is experiencing degraded availability."
Users seeing:
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Impact:
Can't clone repos
Can't push code
Can't pull updates
CI/CD pipelines blocked
Deployments stopped
Multiple accounts, multiple orgs, multiple repos. Global impact.
Two critical infrastructure providers. Same day. Unrelated failures.
The Single Point of Failure
Cloudflare powers 20% of all websites
Think about that. One company. One fifth of the internet.
When Cloudflare crashes, 20% of the web becomes unreachable. Not because those sites crashed. Because the infrastructure protecting and routing to those sites crashed.
GitHub hosts the majority of open source and enterprise code
Most companies use GitHub for source control. When GitHub goes down:
Developers can't work
Deployments stop
CI/CD breaks
Open source grinds to a halt
The consolidation problem:
The internet runs on a handful of companies:
Cloudflare (CDN, DDoS protection, DNS)
AWS (cloud infrastructure)
Microsoft Azure (cloud infrastructure)
Google Cloud (cloud infrastructure)
GitHub (source control)
Fastly (CDN)
When any of these have problems, significant chunks of the internet break.
Why This Keeps Happening
Within one month alone:
AWS outage (October 20)
Microsoft Azure outage (days after AWS)
Cloudflare outage (November 18)
GitHub outage (November 18, same day as Cloudflare)
Four major infrastructure providers. Four outages. One month.
Professor David Choffnes (Northeastern University):
"We now have AWS, Azure and Cloudflare outages in the span of a month. That's a very large portion of the biggest cloud providers in the world. It has not been the case that we have seen major outages like this in a short period of time."
This isn't normal. But it's becoming normalized.
Why Companies Rely on These Services
Cost
Building your own CDN: Millions in infrastructure, operations, staff.
Using Cloudflare: Free tier or $20-200/month.
Building your own Git infrastructure: Servers, backups, reliability engineering.
Using GitHub: $0-$21/user/month.
The economics are obvious.
Expertise
Cloudflare has 330 data centers globally. 13,000 networks directly connected.
Most companies can't build that. Even if they could, it would cost more than using Cloudflare.
DDoS protection
Cloudflare's primary value: protecting sites from DDoS attacks.
DDoS attacks can cost millions in downtime. Cloudflare prevents them for $20/month.
But when Cloudflare goes down, sites become unreachable anyway. The irony is thick.
Network effects
GitHub has the code. Developers use GitHub. Companies hire developers who know GitHub. Everyone uses GitHub.
Switching to GitLab, Bitbucket, or self-hosted Git means retraining, migration cost, and losing ecosystem integrations.
Lock-in is real.
The Latent Bug Problem
Cloudflare's outage was caused by a "latent bug" - a bug that existed in production but wasn't detected until a specific condition triggered it.
How this happens:
Code has bug
Bug doesn't manifest under normal conditions
Bug passes testing
Bug ships to production
Months/years pass
Configuration change or traffic pattern triggers bug
Service crashes
Cascading failure takes down everything
Why testing doesn't catch it:
Testing simulates normal conditions. Latent bugs manifest under abnormal conditions - edge cases, specific configurations, unusual traffic patterns.
You can't test for every possible scenario. Some bugs hide until production triggers them.
The cascade problem:
Bug in bot mitigation service crashed that service. That service is critical to other services. Those services crashed. Those services were critical to more services. Cascade.
Modern distributed systems have interdependencies. One failure propagates everywhere.
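The "latent bug" pattern above can be sketched in a few lines. This is a hypothetical illustration, not Cloudflare's actual code: a parser preallocates a fixed capacity for config entries, an assumption that holds for years, until an auto-generated file finally exceeds it.

```python
# Hypothetical sketch of a latent "file grew too large" bug -- not
# Cloudflare's actual code. A hard-coded capacity works fine for years,
# until an auto-generated config file exceeds it.

MAX_ENTRIES = 200  # design-time assumption: "plenty of headroom"

def load_threat_config(lines):
    """Parse threat-rule entries into a fixed-capacity table."""
    table = [None] * MAX_ENTRIES          # preallocated, never expected to fill
    for i, line in enumerate(lines):
        table[i] = line.strip()           # IndexError once input exceeds MAX_ENTRIES
    return table

# Normal conditions: well under the limit, passes every test.
load_threat_config(f"rule-{n}" for n in range(150))

# Months later, the generator emits more entries than anyone anticipated:
try:
    load_threat_config(f"rule-{n}" for n in range(250))
except IndexError:
    print("crash: config exceeded expected size")
```

Note that the bug is invisible to any test that uses realistically sized inputs; only the abnormal input, arriving in production, triggers it.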
The Apology
Cloudflare CTO Dane Knecht posted on LinkedIn:
"I won't mince words: earlier today we failed our customers and the broader Internet when a problem in Cloudflare's network impacted large amounts of traffic that rely on us. The sites, businesses, and organizations that rely on Cloudflare depend on us being available and I apologize for the impact that we caused... That issue, impact it caused, and time to resolution is unacceptable. Work is already underway to make sure it does not happen again, but I know it caused real pain today. The trust our customers place in us is what we value the most and we are going to do what it takes to earn that back."
Cloudflare's formal statement:
"We apologise to our customers and the Internet in general for letting you down today. Given the importance of Cloudflare's services, any outage is unacceptable."
Cloudflare's importance makes outages unacceptable. But outages will happen anyway. The apology is sincere. It also doesn't prevent the next outage.
Why This Won't Get Fixed
No alternative exists
You can't avoid Cloudflare by using... what? Build your own global CDN?
For most companies, that's not realistic.
Diversification is expensive
Using multiple CDN providers means:
Double the cost
Complex failover logic
More things to manage
Still vulnerable if primary provider fails and failover is slow
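The "complex failover logic" in that list boils down to something like the sketch below: probe the primary, fall back to a backup. Provider names and the health check are stand-ins, not a real API, and the sketch makes the last point in the list concrete: users still see errors for the whole window between the primary failing and the probe noticing.

```python
# Hypothetical sketch of multi-CDN failover logic. Provider names and
# the health probe are placeholders, not a real CDN API.

def serve_via(providers, is_healthy, probe_timeout=2.0):
    """Return the first provider that passes a health probe.

    Even with a backup configured, traffic is only rerouted after the
    probe detects the failure -- detection lag is downtime.
    """
    for name in providers:
        if is_healthy(name, timeout=probe_timeout):
            return name
    raise RuntimeError("all CDN providers down")

# Simulated outage: the primary fails its probe, traffic shifts to backup.
status = {"cdn-primary": False, "cdn-backup": True}
probe = lambda name, timeout: status[name]
assert serve_via(["cdn-primary", "cdn-backup"], probe) == "cdn-backup"
```

In practice this logic lives in DNS or a load balancer, adds a second bill, and must itself be tested and maintained, which is exactly the cost most companies decline to pay.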
The market rewards consolidation
Cloudflare wins because they're the biggest, cheapest, most feature-rich option.
Smaller competitors can't match their scale or price.
Market concentration increases. Single point of failure risk increases.
Regulatory inaction
Governments could regulate critical internet infrastructure. Require redundancy, disaster recovery, testing standards.
They don't. Cloudflare is a private company. Regulators don't care until something catastrophic happens.
Cost of downtime vs cost of prevention
Cloudflare's three-hour outage cost the internet billions.
But Cloudflare's cost was minimal:
No SLA violations for most customers (free tier has no SLA)
Stock down 3%, recovered quickly
No regulatory penalty
No lawsuits with teeth
The GitHub Timing
GitHub going down the same day as Cloudflare is probably coincidence.
But it illustrates the problem: we have no redundancy.
When GitHub is down, there's no "backup GitHub." You just wait.
When Cloudflare is down, there's no failover. Sites just show errors.
Why no backup:
Maintaining GitHub failover means:
Mirroring all repos to another provider
Keeping mirrors in sync
Switching workflows when GitHub is down
Training developers on two systems
Cost and complexity aren't worth it for most companies. So they accept the risk.
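For the companies that do pay that cost, "mirroring all repos to another provider" is mechanically simple. A minimal sketch, with the remote name, paths, and backup URL as placeholders; the `run` hook is injectable so the sync can be dry-run or tested without a network:

```python
# Hypothetical sketch of keeping a second Git remote in sync, so an
# outage at the primary host leaves a usable mirror. The remote name,
# repo path, and backup URL below are placeholders.
import subprocess

def mirror_commands(repo_dir, mirror_url, remote="mirror"):
    """Build the git commands that configure and sync a mirror remote."""
    return [
        ["git", "-C", repo_dir, "remote", "add", remote, mirror_url],
        # --mirror pushes all refs (branches, tags) and prunes deleted ones
        ["git", "-C", repo_dir, "push", "--mirror", remote],
    ]

def sync_mirror(repo_dir, mirror_url, run=subprocess.run):
    """Execute the sync; `run` is injectable for dry runs and tests."""
    for cmd in mirror_commands(repo_dir, mirror_url):
        run(cmd, check=False)  # tolerate "remote already exists" on re-runs

# Dry run: print the commands instead of executing them.
sync_mirror("/path/to/repo", "git@backup-host.example:org/repo.git",
            run=lambda cmd, check: print(" ".join(cmd)))
```

The hard part isn't this script; it's the other three bullets, running it on every repo on every push, and getting developers to actually use the mirror when the primary is down.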
The AWS/Azure/Cloudflare Trifecta
In one month:
AWS outage knocked out 1,000+ sites
Azure outage followed days later
Cloudflare outage took down 20% of the internet
Professor Timothy Edgar (Brown University):
"This is another alarming example of how dependent we have become on critical internet infrastructure, and how little the government is doing to hold big companies accountable."
The consolidation timeline:
2010: Many CDN providers, diverse cloud infrastructure
2015: Market consolidating around AWS, Cloudflare, Azure
2020: Three companies dominate cloud infrastructure
2025: Internet runs on a handful of companies; outages affect billions
Consolidation was economically rational for companies. Disastrous for internet resilience.
The Irony
Cloudflare's primary product: DDoS protection. Keeping sites online during attacks.
Cloudflare's outage: Made sites unreachable. Same result as DDoS.
The tool meant to prevent downtime caused downtime.
From Alp Toker (NetBlocks):
"What's striking is how much of the internet has had to hide behind Cloudflare infrastructure to avoid denial of service attacks in recent years. [It] has become one of the internet's largest single points of failure."
We built internet infrastructure to prevent attacks from taking down sites.
Instead we created infrastructure that can take down sites without any attack.
The Truth
The internet is fragile because it's consolidated.
20% of websites depend on one company. Most code is hosted by one company. Most cloud infrastructure is split between three companies.
When any of these fail, significant chunks of the internet stop working.
This happens because:
Consolidation is economically efficient
Building alternatives is expensive
Network effects create lock-in
Regulation doesn't require resilience
November 18, 2025 had two major outages. Same day. Unrelated causes. Both critical infrastructure.
This will happen again. More frequently. Because the internet's foundation is built on single points of failure, and nobody with the power to fix it has an incentive to do so.
What we're told: "Incidents are unacceptable. We're working to prevent them."
What will happen: More incidents. More apologies. No structural change.
The internet runs on a handful of companies. When they fail, the internet fails.
We've optimized for cost and convenience. We've sacrificed resilience.
November 18 was a reminder of that trade-off.
The next reminder is coming. We just don't know when.
Timeline:
6:30 AM ET / 11:30 UTC: Cloudflare outage begins
9:30 AM ET / 14:30 UTC: Cloudflare services restored
3:39 PM ET: GitHub Git operations begin failing
Same day: Two critical infrastructure providers down, millions affected
The internet's fragility has never been clearer.

