CrowdStrike has published a detailed post-incident review (PIR) explaining how a recent configuration update crashed 8.5 million Windows machines. According to CrowdStrike, a bug in its test software failed to properly validate a content update, which was then distributed widely last Friday. In response, the company plans to enhance its content testing procedures, improve error handling, and implement a staggered deployment strategy to reduce the risk of similar incidents.
The problem arose with CrowdStrike’s Falcon software, which is widely used to protect against malware and security breaches. Last week, the company released a content configuration update intended to “gather telemetry on possible novel threat techniques.” These updates are a routine part of the software’s operation. However, the specific configuration update issued on Friday caused Windows systems to crash.
CrowdStrike delivers updates in two main forms: Sensor Content, which updates the Falcon sensor running at the kernel level, and Rapid Response Content, which adjusts the sensor’s behavior to better detect malware. The recent issue was triggered by a 40KB Rapid Response Content file.
CrowdStrike says its Sensor Content updates undergo rigorous validation. These updates typically include AI and machine learning models intended to enhance detection capabilities, along with what the company calls Template Types: code that enables new detection functionality and is configured by Rapid Response Content. Last week’s problematic update was Rapid Response Content delivered as a Template Instance, and it passed CrowdStrike’s validation checks because of a bug in the Content Validator.
Because the faulty content was not caught, the Falcon sensor loaded and misinterpreted the Rapid Response Content. This resulted in an out-of-bounds memory read that triggered an exception the sensor could not handle gracefully, crashing the Windows operating system with what is commonly known as a Blue Screen of Death (BSOD).
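To illustrate the general class of bug, here is a minimal Python sketch of a bounds-checked read. It is not CrowdStrike's code: the review does not describe the sensor's internals, and `read_field` and its parameters are hypothetical names chosen for this example. In a kernel driver an unchecked read past the end of a buffer can crash the machine, so the sketch checks the index before touching the data.

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return fields[index], or None when the index falls outside the data.

    Hypothetical helper: the caller is expected to treat None as
    "content is malformed" and skip it instead of crashing.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None


# Usage: content supplies two fields, but the reader asks for a third.
content = ["process_name", "command_line"]
print(read_field(content, 1))  # in range: returns the second field
print(read_field(content, 2))  # out of range: returns None
```

The design choice here is to fail closed: malformed content is reported to the caller as a missing value, which lets the caller discard the content rather than read memory it does not own.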
To prevent such incidents in the future, CrowdStrike is enhancing its testing procedures for Rapid Response Content. This includes implementing local developer testing, stress testing, fuzzing, and fault injection. The company is also updating its cloud-based Content Validator to better catch problematic content before it is released.
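As a rough sketch of how fuzzing can check a validator against an interpreter, the Python example below generates random inputs and confirms that anything the validator accepts can also be interpreted without error. The content format (length-prefixed fields) and the names `validate_content`, `interpret`, and `fuzz` are assumptions made for illustration; the review does not describe CrowdStrike's file format or tooling.

```python
import random
from typing import List


def validate_content(blob: bytes, expected_fields: int) -> bool:
    """Hypothetical validator: blob is a run of length-prefixed fields."""
    pos = 0
    count = 0
    while pos < len(blob):
        length = blob[pos]
        pos += 1
        if pos + length > len(blob):
            return False  # field claims more bytes than remain
        pos += length
        count += 1
    return count == expected_fields


def interpret(blob: bytes, expected_fields: int) -> List[bytes]:
    """Hypothetical interpreter that trusts content the validator accepted."""
    fields = []
    pos = 0
    for _ in range(expected_fields):
        length = blob[pos]  # raises IndexError if the blob is too short
        fields.append(blob[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return fields


def fuzz(rounds: int, seed: int = 0) -> int:
    """Count random inputs that pass validation but break the interpreter."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(rounds):
        size = rng.randrange(0, 32)
        blob = bytes(rng.randrange(256) for _ in range(size))
        if validate_content(blob, 3):
            try:
                interpret(blob, 3)
            except IndexError:
                failures += 1
    return failures


print(fuzz(2000))  # number of validator/interpreter disagreements found
```

A nonzero count would mean the validator and the interpreter disagree about what well-formed content looks like, which is the kind of gap the planned testing is meant to expose before release.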
Additionally, CrowdStrike plans to refine its error handling within the Falcon sensor’s Content Interpreter. The company will also roll out a staggered deployment process for Rapid Response Content, releasing each update to progressively larger portions of its customer base so that a faulty update can be caught before it spreads widely. This staggered approach has been recommended by security experts and aims to improve the reliability of future updates.
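A staggered deployment of the kind described above can be sketched in a few lines of Python. This is an illustrative outline only: the ring sizes, the failure threshold, and the names `staged_rollout` and `deploy_and_check` are assumptions, not details from CrowdStrike's review.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy_and_check: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> List[str]:
    """Deploy to growing rings of hosts, halting if a ring is unhealthy.

    stage_fractions are cumulative, e.g. (0.01, 0.1, 0.5, 1.0).
    deploy_and_check is a hypothetical callback that updates one host
    and returns True if the host is healthy afterwards.
    Returns the hosts that received the update.
    """
    updated: List[str] = []
    start = 0
    for fraction in stage_fractions:
        end = max(start, round(len(hosts) * fraction))
        ring = hosts[start:end]
        if not ring:
            continue
        failures = sum(1 for host in ring if not deploy_and_check(host))
        updated.extend(ring)
        if failures / len(ring) > max_failure_rate:
            break  # stop before the update reaches the next, larger ring
        start = end
    return updated


# Usage: a faulty update is contained to the first 1% of 1,000 hosts.
fleet = [f"host-{i}" for i in range(1000)]
reached = staged_rollout(fleet, (0.01, 0.1, 0.5, 1.0), lambda host: False)
print(len(reached))
```

The point of the early rings is containment: a bad update still breaks some machines, but the health check stops it from reaching the rest of the fleet.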
CrowdStrike’s planned changes are meant to reassure its customers, and they underscore the importance of robust validation and gradual deployment in software updates.