Cloudflare suffered a major global outage on Tuesday, disrupting a significant share of the web (the company handles traffic for roughly 20% of all websites) and taking down major platforms including X (Twitter), ChatGPT, Canva, and several other popular services. The outage, which lasted close to five hours, left users worldwide facing HTTP 500 errors, a common indicator of internal server failure.

Today, Cloudflare Co-Founder and CEO Matthew Prince published a detailed post-mortem explaining what caused the widespread disruption. Importantly, he confirmed that the issue was not a cyberattack but an internal configuration failure.
“An outage like today is unacceptable. We’ve architected our systems to be highly resilient to failure to ensure traffic will always continue to flow. When we’ve had outages in the past, it’s always led to us building new, more resilient systems,” said Prince.
Not a DDoS Attack – It Was an Internal System Error
Prince described the event as “Cloudflare’s worst outage since 2019”, apologizing to customers and Internet users for the disruption.
According to him, the outage was triggered by a permissions change in one of Cloudflare’s database systems. The change caused the query that generates a feature file used by its Bot Management system to return duplicate rows, so the file unexpectedly doubled in size and exceeded a hard limit in the software.
Once the oversized file was created, it propagated across Cloudflare’s global network and caused proxy software to fail, resulting in widespread HTTP 5xx errors.
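To make that failure mode concrete, here is a minimal, purely illustrative sketch in Rust of how a hard limit like the one described above can turn an oversized configuration file into a process-wide failure. The 200-entry limit, the function names, and the panic-on-error behaviour are assumptions for illustration, not details from Cloudflare’s post-mortem.

```rust
// Illustrative sketch only -- not Cloudflare's actual code. It assumes a
// proxy module that preallocates room for a fixed number of bot-management
// "features" and treats an oversized feature file as a fatal error, which is
// one plausible way a config file that suddenly doubles in size can take the
// process down.

use std::fmt;

const MAX_FEATURES: usize = 200; // hypothetical hard limit baked into the proxy

#[derive(Debug)]
struct FeatureFileError(String);

impl fmt::Display for FeatureFileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "feature file rejected: {}", self.0)
    }
}

/// Parse one feature per line, refusing files that exceed the preallocated limit.
fn load_features(contents: &str) -> Result<Vec<String>, FeatureFileError> {
    let features: Vec<String> = contents
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.trim().to_string())
        .collect();

    if features.len() > MAX_FEATURES {
        return Err(FeatureFileError(format!(
            "{} features exceeds limit of {}",
            features.len(),
            MAX_FEATURES
        )));
    }
    Ok(features)
}

fn main() {
    // A "good" file: well under the limit.
    let good: String = (0..50).map(|i| format!("feature_{i}\n")).collect();
    // A "bad" file: duplicate rows have doubled the entry count past the limit.
    let bad: String = (0..250).map(|i| format!("feature_{i}\n")).collect();

    println!("good file: {:?} features", load_features(&good).map(|f| f.len()));

    // If the caller unwraps the result instead of handling the error, the
    // oversized file crashes the process -- the failure mode described above.
    let features = load_features(&bad).unwrap(); // panics here
    println!("loaded {} features", features.len());
}
```

In a setup like this, a file that doubles past the limit is rejected, and any caller that treats that rejection as fatal takes the whole proxy process down with it.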
Bad File Originated from ClickHouse Cluster
The problematic feature file came from a query running on a ClickHouse database cluster.
The file was regenerated every five minutes. Because the permissions change had reached only some ClickHouse nodes, each run produced either a good or a bad file depending on which node served the query, so the network cycled between failures and partial recoveries, a pattern that made diagnosis much more difficult.
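As a rough illustration of that cycle, the sketch below (same assumptions as before, not Cloudflare’s actual pipeline) models a regeneration job that rebuilds the feature file each run from whichever database node answers: nodes that have already received the permissions change return duplicate rows and push the file over the limit, while nodes that have not yet been updated produce a valid file, so the output alternates between good and bad.

```rust
// Illustrative sketch only, not Cloudflare's pipeline. The node behaviour,
// row counts, and 200-entry limit are assumptions for illustration.

const MAX_FEATURES: usize = 200;

/// A node that has the new permissions returns each feature twice (duplicate
/// rows); a node that does not yet have them returns the expected 150 rows.
fn query_features(node_updated: bool) -> Vec<String> {
    let base: Vec<String> = (0..150).map(|i| format!("feature_{i}")).collect();
    if node_updated {
        base.iter().chain(base.iter()).cloned().collect() // 300 rows
    } else {
        base
    }
}

fn main() {
    // Simulate six five-minute regeneration cycles hitting different nodes.
    for cycle in 0..6 {
        let node_updated = cycle % 2 == 0; // which node serves the query varies
        let rows = query_features(node_updated);
        if rows.len() > MAX_FEATURES {
            println!("cycle {cycle}: BAD file ({} rows) propagated -> proxies fail", rows.len());
        } else {
            println!("cycle {cycle}: good file ({} rows) propagated -> partial recovery", rows.len());
        }
    }
}
```

Each bad file that ships triggers a wave of failures, and each good file that follows looks like a recovery, which is the wave pattern that initially resembled an attack.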
Initially, Cloudflare engineers suspected a massive DDoS attack due to the sudden wave-like pattern of failures. But further investigation revealed the propagation of the faulty configuration file to be the real culprit.
Once the root cause was identified, Cloudflare:
- Stopped the distribution of the corrupted file
- Rolled back to a stable version
- Restarted core proxy services
Traffic stabilized by 14:30 UTC (8:00 PM IST), and full recovery was achieved by 17:06 UTC (10:36 PM IST).
Services Affected
Several major Cloudflare systems were impacted, including:
- CDN & Security Services: Higher-than-normal HTTP 5xx errors
- Turnstile Bot Challenge: Completely failed to load
- Workers KV: Elevated error rates due to gateway failures
- Dashboard: Partially functional, but many users couldn’t log in
- Email Security: Reduced spam detection accuracy temporarily
Despite the severity of the incident, Cloudflare confirmed that its services are now stable and said preventive measures are being implemented to prevent a recurrence.
