The industry speaks: Everyone’s talking about the CrowdStrike outage

Industry experts and observers had a lot to say about last week’s CrowdStrike incident, but the long story made short is it was not a good look.

David Hollingworth
Mon, 22 Jul 2024

As Australian companies and many more around the world continue to pick up the pieces following a catastrophic CrowdStrike update that crashed 8.5 million Windows PCs last week, it may not surprise you to learn that a lot of folks in the industry had some strong opinions.

Cyber Daily was inundated with commentary as the incident was still unfolding on Friday evening, 19 July.

Below, we’ve collected what the experts were saying as the world began to learn exactly what had happened, in the order we received their comments.


Professor of Practice Nigel Phair
Department of software systems and cyber security, faculty of information technology at Monash University

A major outage has occurred, affecting a number of Australian and global organisations; it appears not to be malicious in nature [but] rather an error stemming from a network outage. The type of outage is unknown at this stage, but it highlights the dependencies organisations have on the internet and related online technologies.

It is looking like the outage is focused on the Microsoft operating system linked to the global cyber security company CrowdStrike.

Organisations need to take an “all hazards” approach to the availability of their IT networks and adopt appropriate risk management practices to ensure they can be resilient against any future incidents.


Omer Grossman
CIO at CyberArk

It appears, even in July, that the current event will be one of the most significant cyber issues of 2024. The damage to business processes at the global level is dramatic. The glitch is due to a software update of CrowdStrike’s EDR product. This product runs with high privileges to protect endpoints. A malfunction in it can, as we are seeing in the current incident, cause the operating system to crash.

There are two main issues on the agenda: the first is how customers get back online and regain continuity of business processes. It turns out that because the endpoints have crashed – the blue screen of death – they cannot be updated remotely, and this problem must be solved manually, endpoint by endpoint. This is expected to be a process that will take days.

The second is what caused the malfunction. The possibilities range from human error – for instance, a developer who downloaded an update without sufficient quality control – to the complex and intriguing scenario of a deep cyber attack, prepared ahead of time and involving an attacker activating a “doomsday command” or “kill switch”. CrowdStrike’s analysis and updates in the coming days will be of the utmost interest.


Satnam Narang
Senior staff research engineer at Tenable

The outage affecting computer systems worldwide is severe. It is affecting critical systems, such as those in hospitals, airports, financial institutions and more. For instance, patients aren’t able to get medications in the hospital setting.

It’s impacted me personally as I have a loved one who is currently in the hospital setting. While the issue is associated with Windows systems, it does not appear to be an issue with Microsoft Windows, but rather with security software installed on millions of Windows computers worldwide. Because this is security software, it requires a higher level of privileges to the underlying operating system, so a bad or faulty security update can result in a catastrophic impact.

This event is unprecedented, and the ramifications of it are still developing.


Dmytro Tereshchenko
Head of information security department at Sigma Software Group

The CrowdStrike failure has significantly impacted many organisations globally. This includes critical sectors such as banking, stock exchanges, airports, and emergency services. Recovery protocols are in place for those affected, though a comprehensive restoration across many entities will likely be a protracted process.

For cyber security professionals, this incident isn’t something new and unexpected. It underscores a known issue within our highly interconnected supply chains. A disruption to any key supplier can indeed have extensive repercussions, affecting a broad spectrum of systems and services.

While this situation is neither unprecedented nor unexpected, the timeline for complete recovery remains uncertain. We clearly understand the problem’s scale, but precise recovery estimates are still forthcoming.


Jake Moore
Global security adviser at ESET

These outages are increasing in volume due to the sheer increase in numbers of online users and traffic. After witnessing the blue screen of death (BSOD), many people are quick to suspect a cyber attack or find similarities to Netflix’s Leave The World Behind, but this can often add to the confusion. It highlights the importance of these services and the millions of people they serve.

Businesses must test their infrastructure and have multiple fail-safes in place, however large the company is; this is typically referred to as a cyber resilience plan. But as is often the case, it is simply impossible to simulate the size and magnitude of the issue in a safe environment without testing the actual network.

The inconvenience caused by the loss of access to services for thousands of people serves as a reminder of our dependence on big tech, such as Microsoft, in running our daily lives and businesses. Upgrades and maintenance to systems and networks can unintentionally include small errors, which can have wide-reaching consequences, as experienced today by CrowdStrike’s customers.


Christiaan Beek
Senior director of threat analytics at Rapid7

The global Windows outage highlights the vulnerabilities and interdependencies within our IT infrastructure. A single update caused widespread disruptions, demonstrating the critical need for robust testing while rapidly protecting assets. This incident underscores our heavy reliance on IT systems, with significant impacts on banking, aviation, and government operations globally. The economic damage from such disruptions can be substantial, emphasising the need for resilient and adaptive response strategies. This event is a reminder of the delicate balance in our IT landscape and the importance of safeguarding against large-scale issues.


Maxine Holt
Senior director, cyber security at Omdia

The global IT outage crisis is escalating, and organisations everywhere are in full scramble mode, desperately implementing workarounds to keep their businesses afloat. Microsoft has pointed fingers at a third-party software update, while CrowdStrike admits to a “defect found in a single content update for Windows hosts” and is working feverishly with affected customers. Omdia analysts connect the dots: this isn’t a cyber attack, but it’s unquestionably a cyber security disaster.

Cyber security’s role is to protect and ensure uninterrupted business operations. Today, on 19 July 2024, many organisations are failing to operate, proving that even non-malicious cyber security failures can bring businesses to their knees. The workaround, involving booting into safe mode, is a nightmare for cloud customers. Cloud-dependent businesses are facing severe disruptions.

Omdia’s cloud and data centre analysts have long warned about over-reliance on cloud services. Today’s outages will make enterprises rethink moving mission-critical applications off-premises. The ripple effect is massive, hitting CrowdStrike, Microsoft, AWS, Azure, Google, and beyond. CrowdStrike’s shares have plummeted by more than 20 per cent in unofficial pre-market trading in the US, translating to a staggering US$16 billion loss in value.

Looking forward, there’s a shift towards consolidating security tools into integrated platforms. However, as one CISO starkly put it, “Consolidating with fewer vendors means that any issue has a huge operational impact. Businesses must demand rigorous testing and transparency from their vendors.”

CrowdStrike’s testing procedures will undoubtedly be scrutinised in the aftermath. For now, the outages continue to rise, and the tech world watches as the fallout unfolds.


Kevin Reed
Chief information security officer at Acronis

The recent CrowdStrike outage appears to stem from a bug in their EDR agent, which was unfortunately not thoroughly tested. This resulted in widespread disruption as many installations were affected globally. The flawed update necessitates manual intervention to resolve, specifically rebooting systems in “safe mode” and deleting the faulty driver file. This process is cumbersome and leaves systems vulnerable in the interim, potentially inviting opportunistic attacks.

This incident highlights the importance of rigorous testing and staged updates for EDR agents. Normally, testing is done with every release and can take days to weeks, depending on the size of the update or changes. The ease with which their driver files can be deleted also raises questions about the self-protection mechanisms of CrowdStrike’s software.

For our Acronis customers, those with recent backups can restore their systems to a stable state, minimising downtime and exposure. Moving forward, we recommend all businesses ensure robust backup solutions and advocate for better testing protocols from their security vendors.

This issue reminds us how fragile IT infrastructure is and why cyber security should be natively integrated with backup. An integrated solution is the only way to provide complete protection that would enable fast rollback to the working state.


Xavier Sheikrojan
Senior risk intelligence manager at Signifyd

In today’s fully digitised world, an IT outage like the one happening right now has a dire effect on businesses and consumers. Payment service providers (PSPs) are responsible for servicing thousands of merchants, each of which has thousands of transactions happening every second, all of which have to be reviewed, validated and actioned appropriately. As a result, merchants can experience a backlog of transactions, opening themselves up to the risk of fraud and other vulnerabilities.

Firewalls and other detection tools are likely to be down, giving cyber criminals an opportunity to gain access to exploit vulnerabilities, potentially resulting in new data breaches. There is a higher chance that suspicious activity and transactions that are normally automatically monitored will now slip through as fraud and security teams deal with a backlog of reviews that need to be conducted much more quickly.

Outages like these remind us that the payment ecosystem requires a holistic approach to transaction verification. This includes having contingency plans for these types of events, with enhanced security protocols, such as multifactor authentication (MFA), to deflect unauthorised access. Businesses also need to consider post-outage audits to identify anomalies and suspicious activity that might have occurred during the outage.


Matt Fedele-Sirotich
Chief technology officer of CSO Group and Cyber Wardens

It is crucial that businesses operate with heightened awareness after major outages or global events as attackers capitalise on our eagerness to resolve the issue or be better informed. We all need to slow down and think before we act as this will enable us to collectively better protect our customers.

While this incident was not a deliberate cyber attack, it underscores the importance of businesses taking proactive measures to mitigate the risk of such threats.

Unfortunately, it is often user error and lack of basic digital knowledge that opens the door to cyber threats, highlighting the need for ongoing education and awareness programs to strengthen cyber security resilience.

Shane Maher
Managing director of Intelliworx

This shows why disaster preparedness is so important. And it’s not just about security, it’s more about disaster recovery and handling the situation. There are so many people affected by this outage. It’s not just a technical problem, it’s a business problem. Businesses should have a plan for these kinds of situations because they can happen anytime. And they should communicate clearly and honestly with their customers and stakeholders when they do.

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
