Atomic Data's Response to CrowdStrike Windows Outage
What Happened?
On Friday, July 19, at 04:09 UTC, third-party security vendor CrowdStrike pushed a global update to software that underpins several of its products. This update caused Windows workstations and servers that were online at the time to reboot, with many endpoints then encountering the ‘Blue Screen of Death’, or BSOD. The BSOD forced the endpoints offline and left them inoperable.
At 05:27 UTC, CrowdStrike rolled back the defective update and issued a known-stable version of its software, which allowed most impacted endpoints to proceed past the BSOD and return to an online state. However, some servers and workstations have not successfully resumed operation and remain in a ‘reboot loop’.
Atomic Data’s 24×7 Reaction
Atomic Data engineers have successfully restored operations for the majority of client endpoints impacted by the CrowdStrike event. A small percentage of these endpoints remain offline due to access, network, or other factors, and engineers are working tirelessly to restore their operation.
The following is a brief recap of Atomic Data’s reaction so far to this global event:
At 11:45 PM CDT on Thursday, July 18, Atomic Data’s 24×7 NSOC began receiving reports and alerts from Atomic Monitoring® indicating that a large number of client and internal endpoints had gone offline. At 11:49 PM, a ‘War Room’ process was initiated, and engineers from across Atomic Data began reviewing alerts, logs, hypervisors, and third-party data sources. At 1:26 AM on Friday, they determined that this was a global outage caused by a defect in CrowdStrike’s latest channel file update. The defect caused an incompatibility with certain Windows operating systems, leading online servers and workstations to reboot and then remain stuck at the BSOD.
At 3:12 AM, Atomic Data engineers validated and began deploying a fix to purge the defective CrowdStrike file from impacted endpoints. For the next several hours, engineers worked carefully to bring as many endpoints back online as possible.
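For readers who want a concrete picture of what purging the defective file involves, the following is a minimal sketch and not Atomic Data’s actual tooling. It assumes the file pattern and driver directory given in CrowdStrike’s public remediation guidance; the function name and the dry-run flag are hypothetical. On an endpoint stuck at the BSOD, the equivalent step generally has to be performed from Safe Mode or the Windows Recovery Environment, so this only illustrates the logic.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: pattern and directory come from CrowdStrike's public guidance,
# not from Atomic Data's statement.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def purge_channel_files(
    driver_dir: Path = DEFAULT_DRIVER_DIR, dry_run: bool = True
) -> list[Path]:
    """Return the matching channel files; delete them unless dry_run is set."""
    if not driver_dir.is_dir():
        return []
    matches = sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()  # remove the defective channel file
    return matches
```

Calling purge_channel_files() with the default dry_run=True only lists the files that would be removed, which allows a review before anything is deleted; other channel files in the same directory are left untouched.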
As of 8:39 AM, only a small percentage of endpoints remain offline. Some servers, such as those hosted in Azure or those unreachable due to network configurations, require additional manual steps to revert the update and restore functionality. Engineers will not close the War Room until all servers are brought back online.
Additional Resources:
CrowdStrike Statement: https://www.crowdstrike.com/blog/statement-on-windows-sensor-update/
Microsoft Azure Status: https://azure.status.microsoft/en-us/status