Post-Mortem: Ory Network Outage caused by a massive DDOS attack

A post-mortem for the Ory Network Outage caused by a massive DDOS attack on 30/10/2023.

Piotr Mścichowski

Engineering

Oct 31, 2023

Incident Report: DDOS Attack

Dear Customers and Users,

We’d like to share a post-mortem of the incident on October 30th. We are sincerely sorry for the inconvenience it caused, and we are sharing this write-up to describe the actions we have taken to prevent such an incident in the future.

Incident Summary

Incident 1: 30.10.2023 13:43 CET to 30.10.2023 14:42 CET

Incident 2: 30.10.2023 16:03 CET to 30.10.2023 17:19 CET

Services affected:

  • Login & Identities APIs
  • OAuth2 & OIDC APIs
  • Account Experience UI

At 13:43 CET on October 30, 2023, our SRE team was alerted to multiple DDOS attacks. Although our security solution's automated response blocked a limited number of malicious requests, the overwhelming volume of requests led to increased service response times between 13:47 CET and 16:03 CET. Following an analysis of the DDOS attack, the SRE team modified the DDOS attack protection settings and created WAF rules to counter these attacks. In addition to the changes in the DDOS protection setup, adjustments were made to the auto-scaling configuration and node types. By 17:19 CET, Ory Network was fully operational again.

Root Cause Analysis

The existing DDOS prevention and rate-limiting configurations were inadequate for reliably detecting the DDOS requests. Initially, this caused a rise in service latency, and subsequently the services were overloaded. Auto-scaling responded by adding service instances, which resulted in a brief recovery. Nevertheless, the overwhelming volume of requests caused latency to surge once more, ultimately congesting all services.
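To illustrate the kind of rate limiting discussed above, here is a minimal token-bucket sketch in Python. It is not Ory's implementation: the class names, the per-client key, and the rate and capacity values are all assumptions chosen for illustration. It also hints at a general limitation of per-client limits: traffic spread across many sources can stay under each individual limit while still adding up to a large total.

```python
class TokenBucket:
    """Token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the last request.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (hypothetical key: the source IP)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_key: str, now: float) -> bool:
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_key] = bucket
        return bucket.allow(now)


# Illustrative values only: 1 request/second with a burst of 3.
limiter = RateLimiter(rate=1.0, capacity=3.0)
burst = [limiter.allow("198.51.100.7", now=0.0) for _ in range(5)]
```

In this sketch a single client that sends five requests at the same instant gets the first three through and the rest rejected, while a second client with a different key starts with its own full bucket. Timestamps are passed in explicitly so the behavior is easy to reason about and test.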

Resolution and Recovery

Our Site Reliability Engineering (SRE) team studied the attack method and traffic, then refined both the infrastructure configuration and the WAF/DDOS protection configuration to increase available bandwidth and detection sensitivity. Custom WAF rules were created to block the DDOS attack by targeting its key characteristics.
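As a rough illustration of what "targeting key attack characteristics" can mean, the sketch below models WAF-style rules as named predicates over a request and blocks on the first match. Everything here is hypothetical: the `Request` fields, the rule names, and the matched characteristics (a missing User-Agent header, POST requests to an example path) are assumptions for illustration and do not describe the actual rules or the actual attack traffic.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Request:
    method: str
    path: str
    headers: Mapping[str, str]  # header names assumed to be lowercase


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Request], bool]


def evaluate(request: Request, rules: Sequence[Rule]) -> Optional[str]:
    """Return the name of the first rule that matches, or None to allow."""
    for rule in rules:
        if rule.matches(request):
            return rule.name
    return None


# Hypothetical rules; real rules would be derived from observed traffic.
RULES = [
    Rule(
        name="block-missing-user-agent",
        matches=lambda r: r.headers.get("user-agent", "") == "",
    ),
    Rule(
        name="block-post-to-example-path",
        matches=lambda r: r.method == "POST" and r.path.startswith("/example/login"),
    ),
]

verdict = evaluate(Request("GET", "/", {}), RULES)
```

Rules are evaluated in order, so more specific or cheaper checks can be placed first. A request that matches no rule returns `None` and would be passed on to the service.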

Preventive Measures

  • Redefined infrastructure size and setup to provide more bandwidth for services (implemented)
  • Redefined infrastructure auto-scaling mechanism (implemented)
  • Additional traffic blocking rules (implemented)