2024-08-08
CrowdStrike Publishes External Technical Root Cause Analysis
CrowdStrike has published a technical root cause analysis of the July 19 incident that disrupted travel and commerce around the world. The analysis attributes the crash to an out-of-bounds read: the attempted access was 'beyond the end of the input data array and resulted in a system crash.' Separately, CrowdStrike has responded to claims that the Falcon sensor issue could be exploited to achieve privilege elevation or remote code execution.
Editor's Note
One of the critical claims, that the issue is not exploitable, has been disputed. In the end, I think this comes down to a public proof showing how the outlined exploit technique works (or doesn't work) against an unpatched CrowdStrike instance.
Johannes Ullrich
The root cause analysis shows a long list of mitigations CrowdStrike has put in place. The issues are mostly the usual causes of software errors - new features were tested more to make sure they worked than to make sure they couldn't cause bad things to happen. The two major mitigations (runtime bounds checking and increased test coverage) illustrate this and are what we expect security companies to routinely include in their highly privileged host-based software - especially for software that (as CrowdStrike puts it at the top of the Root Cause Analysis) uses powerful on-sensor AI and machine learning models to protect customer systems by identifying and remediating the latest advanced threats. These models are kept up-to-date and strengthened with learnings from the latest threat telemetry from the sensor and human intelligence from Falcon Adversary OverWatch, Falcon Complete and CrowdStrike threat detection engineers. Complex security software that requires frequent updates needs high levels of runtime protection and extensive pre-release testing of those updates.
John Pescatore
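As a rough illustration of the two mitigations named above, the sketch below pairs a bounds-checked read with tests that probe for failure rather than only for success. It is a minimal example under stated assumptions: `read_input` and `test_read_input` are hypothetical names, and nothing here is taken from CrowdStrike's code.

```python
from typing import Optional, Sequence


def read_input(inputs: Sequence[str], index: int) -> Optional[str]:
    """Return inputs[index], or None when index is out of range.

    Hypothetical helper: the range is checked explicitly so that a bad
    index is rejected instead of being read (negative indexes are also
    rejected rather than wrapping around, as Python would otherwise do).
    """
    if 0 <= index < len(inputs):
        return inputs[index]
    return None


def test_read_input() -> None:
    values = ["a", "b", "c"]
    # Positive test: the feature works on valid input.
    assert read_input(values, 2) == "c"
    # Negative tests: the feature cannot read outside the array.
    assert read_input(values, 3) is None
    assert read_input(values, -1) is None
    assert read_input([], 0) is None


test_read_input()
```

The negative cases are the point of the example: a test suite that only exercised valid indexes would pass with or without the range check.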
The root cause analysis reads like an audit report, providing insight into why the functionality introduced in Channel File 291 back in March wasn't a problem until July. The short version is that the IPC template for detecting malicious actions had 21 parameters but only 20 were supplied, a gap that went unnoticed until the interpreter tried to use the missing 21st and that was missed in early testing and validation. As with an audit, the issues have been addressed. If you're worried about the risks of a kernel-level plugin, CrowdStrike also published an analysis of the Falcon sensor and its limitations/mitigations as a service with that level of access. This would be a good time for OS providers to evaluate the viability of reducing or eliminating kernel-level access for third-party services.
Lee Neely
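The 21-versus-20 mismatch described above can be sketched as follows. This is a simplified model, not CrowdStrike's implementation: the names (`validate_instance`, `matches`) are hypothetical, and representing an unused parameter as `None` is an assumption made to show why the gap could stay dormant until the 21st parameter was actually used.

```python
from typing import List, Optional

TEMPLATE_FIELD_COUNT = 21  # parameters the template declares
SUPPLIED_INPUT_COUNT = 20  # values actually supplied at runtime


def validate_instance(criteria: List[Optional[str]], supplied: int) -> List[int]:
    """Pre-release check: 0-based indexes of criteria that would read past the inputs."""
    return [i for i, c in enumerate(criteria) if c is not None and i >= supplied]


def matches(criteria: List[Optional[str]], inputs: List[str]) -> bool:
    """Runtime comparison with a bounds check before every read."""
    for i, expected in enumerate(criteria):
        if expected is None:
            continue  # unused parameter (assumption): the input is never read
        if i >= len(inputs):
            return False  # refuse to read beyond the end of the input array
        if inputs[i] != expected:
            return False
    return True


inputs = [f"v{i}" for i in range(SUPPLIED_INPUT_COUNT)]

# An instance that never uses the 21st parameter: the mismatch stays hidden.
early = [None] * TEMPLATE_FIELD_COUNT
early[0] = "v0"

# An instance that does use the 21st parameter (index 20).
late = list(early)
late[20] = "anything"

print(validate_instance(early, len(inputs)))  # no out-of-range criteria
print(validate_instance(late, len(inputs)))   # flags index 20
print(matches(early, inputs), matches(late, inputs))
```

In this model either check alone would have contained the problem: the validator flags the instance before release, and the runtime check turns an out-of-bounds read into a failed match.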
CrowdStrike has been extremely forthcoming in acknowledging and subsequently releasing technical details of the flaw in their application development and update process. Generally, this type of software bug (memory safety) would be caught during QA testing but was somehow missed. Publishing the root cause analysis and hiring not one but two outside security review teams are each calculated damage-control steps by CrowdStrike. It appears to be working.
Curtis Dukes
For most of us, automatic updates are the low-risk option. For large enterprises running mission-critical applications, not so much. Changes to mission-critical applications should be more measured, cautious, and reversible.
William Hugh Murray
Read more in
CrowdStrike: External Technical Root Cause Analysis Channel File 291 (PDF)
CrowdStrike: Tech Analysis: Addressing Claims About Falcon Sensor Vulnerability
The Register: CrowdStrike hires outside security outfits to review troubled Falcon code
SC Magazine: Massive CrowdStrike outage caused by an out-of-bounds memory error
Security Online: CrowdStrike Identifies Root Cause of Massive Windows Outage
Security Week: CrowdStrike Dismisses Claims of Exploitability in Falcon Sensor Bug