The possibility of having to deal with a situation in which an attacker has compromised a host within your organisation can be uncomfortable to consider. Despite investing substantial efforts in attempting to prevent such a scenario, many organisations find that they have little in the way of concrete plans as to how to react should the unthinkable actually occur. Yet far from being unimaginable, the increasing porosity of networks and interconnectivity of systems has led to a new paradigm of “assume breach” becoming increasingly common. Under this paradigm, it is assumed that host and network compromises are impossible to provide absolute assurance against, and a proportionate amount of security investment and effort should be channelled into preparing for and mitigating the impact should a breach occur.
In this blog post we look at why an “assume breach” mentality has gained ground, what it involves, and some of the more common best practice guidance around how to react should the worst occur.
In the simplest terms, hosts typically operate both a core operating system shell and a set of one or more exposed network services. The core operating system shell can be exposed legitimately in some cases (such as SSH access provision or Remote Desktop Protocol (RDP) services) but in the context of web application servers and similar, it typically remains intended for administrative use by nominated staff within an organisation only, with access tightly restricted.
In the case of host compromise, an unauthorised attacker can gain access beyond the intentionally exposed network services and into a trusted control or administrative context – typically a host operating system or shell.
The exact vulnerabilities that can be exploited to gain unauthorised access to a host are manifold, but in general they can be categorised as involving either the compromise of a planned administrative access mechanism by an unauthorised party, or the introduction of an unplanned access vector. In the former case, examples might include the use of brute force or stolen authentication details to gain unauthorised access via an administrative login service. In the latter case, access may be achieved via some form of Remote Code Execution (RCE) vulnerability in an exposed network service, or via the installation of a Remote Access Trojan (RAT), other malware, or a “backdoor” access mechanism in legitimately installed software.
If a host is compromised, then the exact impact will depend to a large degree on the given context, both of the compromised host, and of the attacker. Different hosts may present different opportunities for attackers, and different attackers may similarly be motivated by different goals or objectives that govern their next move. A host may be directly exploited (for example, wiped of data, or data encrypted in a ransomware attack), or a more advanced attacker may seek to perform an onward “pivot” attack to hosts not accessible via the internet but accessible using the originally compromised host as an onward proxy or jump point. If the host is directly exploited, then the compromise may become clear very quickly. However, an attacker interested in ongoing or pivot access may use stealth to mask their presence. So-called Advanced Persistent Threats (APTs) are attackers who will take steps to remain hidden and work towards more long-term, and potentially more devastating, compromises of entire networks: working silently to establish outbound control channels, deleting logs to cover their tracks (a process known as log scrubbing), and potentially even fixing the original vulnerability that they exploited in order to secure the system against other, unrelated attackers.
Nobody wants to consider the possibility of a host being compromised. It definitely isn’t something that anyone wants to experience and the very thought of it can make for uncomfortable consideration. However, whilst a host being compromised certainly has the makings of a difficult day in the office, a host being compromised when there is no plan as to how to react has the makings of a far worse one.
Far from being mutually exclusive, the twin goals of preventing host breaches, but also being prepared for the worst should it happen, can together combine to deliver an effective security programme. As former CIA and NSA director General Michael Hayden said in 2012, “If somebody wants to get in, they’re getting in”.
There is a growing acceptance within cybersecurity that a traditional “prevention-only” focus on security controls is not enough to address determined and persistent adversaries and should not preclude serious consideration and preparation being given to what to do should a host become compromised. Rather than being alternative approaches to security, “prevent breach” and “assume breach” each suggest different controls that can be combined to provide a more robust security programme.
The “assume breach” paradigm can initially come across as a pessimistic and unwelcome mindset that is counter to the goals of cybersecurity: surely security should be about preventing cyber-attacks from happening, as opposed to operating on the assumption that they will occur?
However, the attack surface of organisations only increases in size and complexity over time, with networks becoming increasingly complex and porous. Supply-chain security often involves complicated inter-organisational and inter-networking connections to other providers, customers, and cloud services, as well as wider usage of third-party software libraries and platforms. In this environment, any guarantee that no breach can ever occur becomes increasingly difficult to sustain.
As with any endeavour, the need to plan and prepare starts well ahead of an actual occurrence of the event itself. Many of the topics in this blog post cover the need to prepare for a breach or compromise ahead of time, in order to provide the foundation necessary to support effective action should they occur. This article is necessarily only a basic overview, but detailed standards and guidance are available from a number of sources, including ISO/IEC 27001/27002, NIST publication 800-34, and NFPA codes 1600 and 1620.
An “assume breach” mentality, and the associated focus on preparedness for the eventuality of a breach or compromise in no way means that proportionate efforts should not be taken to try and prevent breaches and compromises.
The standard use of security controls involves the installation and operation of both safeguards and countermeasures. Safeguards are the collective use of processes, policies, procedures, applications, items of hardware or configuration items that mitigate a security risk. Countermeasures are the actions taken to patch a vulnerability or secure a system against an attack, and can include altering access controls, reconfiguring security settings as part of hardening efforts, installing new security devices or mechanisms, or adding or removing services.
The recognition in “assume breach” is simply that these efforts can never fully guarantee that a breach or compromise will not occur, and very often are not intended to. Controls will typically perform some variety of risk mitigation but not risk elimination: controls cannot completely eliminate every risk, and it would be prohibitively expensive to do so even if it were possible. Exploits that may lead to breaches can occur even in the most robustly architected and operated security environments due to control risk (failures in operated controls) and control gaps (areas where controls were not implemented).
Control gaps may exist simply because an organisation performed a Trade-Off Analysis (TOA), a formal or informal cost/benefit assessment, that argued against the introduction of the control. However, there is also a usability/security trade-off in most operated services. This describes the inversely proportional relationship between usability and security: data access can either be very secure but restrictive, or very open yet risky. New services must be secure, but they must also be fit for purpose. To give a somewhat extreme example, you could arguably make a web application very secure by air-gapping it (removing its network connection entirely), but it is no use making something so secure that the resulting service is unusable to its intended audience.
The favoured approach is therefore not to provide guarantees of absolute security, but to deliver appropriate security that is proportionate in cost and effort and implemented via the selection and operation of security controls that are proportionate to the risks facing the services they protect. The aim is to be confident in having observed both due diligence and due care in the selection, delivery, and operation of all elements of the security programme.
This means that the possibility always exists, no matter how negligible, that a breach will occur at some point. Rather than denying the possibility, it is important that all stakeholders, including senior management, are aware up front that this possibility always exists. Only with this understanding in place can due weight be given to efforts to prepare ahead of time for how to react should the worst occur.
Preparation for a breach largely revolves around having a clear Incident Response Plan (IRP) that is known to all parties, is kept up to date, and is modelled or assessed periodically to determine how effective it would be under various scenarios. An effective incident response plan makes clear what actions should be taken and can help to reduce the possibility of delayed or chaotic responses in the event of a breach. An incident response plan will typically cover any resources that are needed, as well as outline communication channels: key amongst these is the role of a central incident coordinator, who can function as a single, authoritative point of contact for tracking the incident response, as well as a point of contact to screen queries and interruptions from those tasked directly with responding to the incident.
An incident response plan can only be effective if it is simple to follow and staff are trained and drilled on carrying it out. The IRP covers what to do if controls to prevent unauthorised access fail, specifically the processes of initial triage following detection, and subsequent reaction/response and recovery operations.
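To make the elements described above concrete, the skeleton of an incident response plan record might look something like the following sketch. All field names and values here are hypothetical illustrations, not a prescribed format:

```python
# Hypothetical skeleton of the fields an incident response plan might record.
INCIDENT_RESPONSE_PLAN = {
    # A single, authoritative point of contact for tracking the response.
    "coordinator": "on-call security lead",
    # The phases the plan walks responders through, in order.
    "phases": ["detect", "triage", "contain", "eradicate", "recover", "review"],
    # Pre-agreed communication channels, so they need not be improvised mid-incident.
    "communication_channels": {
        "internal": "dedicated incident bridge",
        "external": "press and legal queries routed via the coordinator only",
    },
    # The plan is re-assessed and drilled periodically, not written once and shelved.
    "review_interval_days": 90,
}

assert INCIDENT_RESPONSE_PLAN["phases"][0] == "detect"
```

Keeping the plan in a simple structured form like this makes it easy to version-control and to check programmatically (for example, that every service has a named coordinator).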
A compromised host or network breach is never going to make for an enjoyable day, but equally it doesn’t have to be a terminal event for an organisation. Much will depend on the extent to which the compromise of a single host or exploit of a single vulnerability can be contained from spreading within an organisation’s network. This is typically referred to as the “blast radius” of an attack. Once inside an organisation’s systems, attackers will seek to establish remote command and control, exploit any weaknesses they can find, and attempt to compromise further systems as well as gain access to accounts with high-level access.
Rather than make security investments randomly or applying the same degree of security to all systems, it is generally advised to secure systems proportionate to their criticality in terms of function or sensitivity of data that they manage. Minimising the impact of a breach depends therefore upon both suitably screening key systems containing critical data so that they are not directly exposed to public networks but are in segregated network segments firewalled off from even other internal corporate network segments, as well as ensuring that there are effective restrictions preventing access across boundaries separating these different zones of trust.
An “assume breach” paradigm is therefore typically delivered via a layered approach to security that leverages the “defence in depth” principle. Rather than operating a single hardened network perimeter that is assumed impenetrable, defence in depth involves multiple layers of defence even within an organisational network, such as network segmentation and endpoint security. It is often delivered via what is known as a “zero trust” architecture, but more broadly covers any approach to security that recognises that total security is not possible and instead operates an “onion” of security measures, wherein no layer is infallible, but each partially blunts the effectiveness of any given attack.
Specific additional measures that can further hinder an attacker are: to ensure that any given service is operated at the lowest viable permission level, with rights sufficient only to operate the service in question and nothing further; to limit outbound access as well as inbound access to each layer, to prevent data exfiltration and remote shells being established from a compromised system; and to limit the usefulness of data to an attacker by ensuring that it is encrypted where possible and that passwords, for example, are hashed rather than stored in plaintext.
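The last of these measures, storing hashed rather than plaintext passwords, can be sketched using only the Python standard library. This is a minimal illustration of salted, iterated hashing; the iteration count is illustrative and should be tuned to your environment, and production systems would typically use a maintained library rather than hand-rolled code:

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; higher slows brute-force attempts


def hash_password(password, salt=None):
    """Derive a salted PBKDF2 hash so a stolen credential store is not directly usable."""
    salt = salt or os.urandom(16)  # a unique salt per password defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Re-derive the hash and compare in constant time to avoid timing leaks."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, stored)
assert not verify_password("wrong guess", salt, stored)
```

Even if an attacker exfiltrates the stored salt and digest, each password guess costs them the full iterated derivation, dramatically slowing offline cracking.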
The faster that a breach or host compromise can be detected, the faster it can be isolated and the less of a chance an attacker will have to perform further pivot attacks against other targets on the network. Published research in recent years indicates that the average dwell time for an intruder to remain inside a network undetected may be as long as six months, which sounds extraordinary, but may indicate that many organisations have already been compromised and simply do not know it. The actual active window for an attack may be extremely narrow, so rather than focusing only on detecting attacks in progress, it is important that a significant proportion of detective controls monitor for signs of an existing attacker presence or already-compromised hosts.
Signs that a host has been breached or compromised are termed Indicators of Compromise (IoCs). They include unusual outbound network traffic, unexplained or unusual activity by privileged or administrative user accounts, high numbers of or clusters of authentication failures, large or anomalous spikes in data transfer to or from a given system, and unexplained configuration changes.
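One of the indicators above, clusters of authentication failures, lends itself to a simple illustration. The following is a minimal sketch, not a production detector: the event format, window size and threshold are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta


def flag_auth_failure_clusters(events, window=timedelta(minutes=5), threshold=10):
    """Flag source addresses producing many login failures within a sliding window.

    `events` is an iterable of (timestamp, source_ip, success) tuples.
    """
    flagged = set()
    failures = {}  # source_ip -> timestamps of recent failures
    for ts, src, success in sorted(events):
        if success:
            continue
        # Keep only failures that fall inside the window ending at this event.
        recent = [t for t in failures.get(src, []) if ts - t <= window]
        recent.append(ts)
        failures[src] = recent
        if len(recent) >= threshold:
            flagged.add(src)
    return flagged


base = datetime(2024, 1, 1, 12, 0)
# Twelve failures in under two minutes from one address: a classic brute-force cluster.
events = [(base + timedelta(seconds=10 * i), "203.0.113.7", False) for i in range(12)]
# A handful of successful logins from another address: benign.
events += [(base + timedelta(minutes=i + 1), "198.51.100.2", True) for i in range(3)]

assert flag_auth_failure_clusters(events) == {"203.0.113.7"}
```

In practice this kind of rule would run inside a SIEM over parsed authentication logs, but the underlying logic, counting failures per source within a time window, is the same.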
Various technologies together provide some insight into one or more of these areas, including effective logging and monitoring (using alerts and ideally managed by a centralised security information and event management (SIEM) system) as well as host-based (HIDS) and network-based (NIDS) intrusion detection systems, and file integrity monitoring (FIM) solutions that can spot changes to key files based on calculated checksums of their contents. A more recent addition to the toolset are threat intelligence services such as Digital Shadows, which trawl the dark web and alert organisations if indicators are found that their data is available for sale there.
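The core of file integrity monitoring is straightforward to sketch: record checksums of key files ahead of time, then periodically compare. This is a minimal illustration of the checksum-comparison idea only; real FIM products add scheduling, alerting and tamper-protection of the baseline itself:

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path):
    """SHA-256 of a file's contents, read in chunks so large files are handled."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def detect_changes(baseline, paths):
    """Return the paths whose current checksum differs from the recorded baseline."""
    return [p for p in paths if checksum(p) != baseline.get(str(p))]


# Demonstration against a temporary file standing in for a monitored system file.
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "passwd"
    target.write_text("root:x:0:0\n")
    baseline = {str(target): checksum(target)}  # recorded ahead of time

    assert detect_changes(baseline, [target]) == []  # unchanged: no alert

    target.write_text("root:x:0:0\nattacker:x:0:0\n")  # simulated tampering
    assert detect_changes(baseline, [target]) == [target]  # change detected
```

The baseline itself must of course be stored somewhere an attacker on the monitored host cannot modify, otherwise they can simply re-baseline their own changes.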
Given the increasing number and complexity of third-party supplier and provider integrations within the typical organisation, it is important to consider that third parties may also be compromised: incident response plans should consider the impact should this occur, and channels established with key partners to ensure that any data breaches are swiftly communicated to allow appropriate action to be taken.
If a breach is detected or reported, then the initial step to take is one of triage. A term borrowed from the medical field, triage involves the initial and rapid assessment of harm or damage, as well as the determination of prioritisation of action, based on both urgency or timeliness, as well as importance or criticality. Different systems are used within different medical organisations, but each allows quick examination and categorisation of patients into one of a number of categories, which then determines their prioritisation for treatment, and target treatment deadlines. The same overall approach can be adopted for triaging compromised hosts, based on their data sensitivity and service criticality of operation.
A key pre-requisite to support this decision making, is being able to determine the relative data sensitivity and system criticality of a given system. Rather than attempting to decide this on the fly when an incident is in progress, it is best practice to establish an asset inventory ahead of time, detailing the criticality of each system and the sensitivity of the data that it contains or handles.
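Given such an inventory, triage ordering becomes a simple sort over the recorded attributes. The asset names, rating scales and tie-breaking rule below are hypothetical illustrations of the idea:

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    criticality: int       # 1 (low) .. 5 (mission-critical), recorded ahead of time
    data_sensitivity: int  # 1 (public) .. 5 (highly sensitive)


def triage_order(compromised):
    """Order compromised hosts for response: most critical first, sensitivity as tie-break."""
    return sorted(compromised, key=lambda a: (a.criticality, a.data_sensitivity), reverse=True)


inventory = [
    Asset("marketing-www", criticality=2, data_sensitivity=1),
    Asset("payments-db", criticality=5, data_sensitivity=5),
    Asset("staff-intranet", criticality=3, data_sensitivity=3),
]

assert [a.name for a in triage_order(inventory)] == ["payments-db", "staff-intranet", "marketing-www"]
```

The value of the pre-built inventory is precisely that this ordering can be produced immediately during an incident, rather than debated while the attacker is still active.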
Once the initial triage has been conducted, then the general reaction phase involves an organisation conducting steps involving more detailed analysis followed by attempted containment and elimination of the active threat. This can involve significant service disruption, since it may involve the removal of some systems from active service by removing their network connection, both to sever active connections to an attacker, as well as prevent the system’s use in onward “pivot” attacks that spread the breach to further systems. It is important that system or service outage is agreed in incident response plans ahead of time and that senior management are aware of the necessity for this step, so that actions are not delayed during a breach scenario to discuss whether this step is necessary or not.
An incident response plan will generally incorporate details such as the defined Maximum Tolerable Downtime (MTD) for a given system or service. This is the amount of time that a mission- or business-critical process can be disrupted without causing significant harm to the organisation’s mission, and the lower the value the more investment may be required ahead of time in measures to allow either service failover to an alternative, or rapid system restoration.
An area that many organisations will struggle with is that of digital forensics. During incident response processes, it can be important to preserve or capture system state or indicators of compromise detected. This is important in establishing a clear timeline and being able to reproduce or model the steps an attacker took, so that confidence can be gained that all detected systems have been discovered and contained. It is also important to make a record of any actions taken, as well as the time stamps for each, and for a chain of custody to be recorded for any data gathered that may potentially be required as evidence in a criminal investigation.
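The record-keeping described above, timestamped actions with evidence fingerprints, each entry chained to the one before it, can be sketched as follows. This is an illustrative structure only, not a substitute for the evidence-handling procedures a forensics specialist would apply:

```python
import hashlib
import json
from datetime import datetime, timezone


def record_action(log, actor, action, evidence=b""):
    """Append a timestamped entry, chaining each entry to the previous via its hash.

    Chaining means any later alteration of an earlier entry breaks every
    subsequent hash, making tampering with the record evident.
    """
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "evidence_sha256": hashlib.sha256(evidence).hexdigest(),
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry


log = []
record_action(log, "analyst1", "imaged disk of host web-01", b"<disk image bytes>")
record_action(log, "analyst1", "isolated host web-01 from network")

assert log[1]["prev_hash"] == log[0]["entry_hash"]
```

Fingerprinting evidence at the moment of capture, and recording who did what and when, is what allows a chain of custody to be demonstrated later if the material is needed in a criminal investigation.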
This is a highly specialist area and extremely easy to get wrong, and typically is made much easier by involving external specialists such as digital forensic investigation teams as early as possible in the event of a breach.
Once an attack has been contained and forensics used to establish with confidence that all compromised hosts have been detected and threats neutralised and the attacker’s presence purged from the network, then work can begin on the implementation of a recovery plan with the aim of restoring normal operations.
This step can only be performed once almost absolute confidence has been gained that an attacker is no longer present in the network and that any backdoors, remote shells or other access methods established by the attacker have been removed, eliminating their ongoing access. Critically, the initial root cause of the breach should have been identified and either fully resolved or else mitigated by firewall rules etc. before recovery can begin.
Recovery relies on systems having effective and tested backups of data available, and on these backups having been taken frequently enough to permit recovery to a defined recovery point. It is also critical to ensure that backups of systems or data are known to be “clean”, in having been taken prior to the attacker establishing their presence: if not, then in recovering a system, the attacker’s presence may simply be restored along with it.
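The “clean backup” check reduces to a comparison between each backup’s timestamp and the earliest known compromise time established during forensics. A minimal sketch, with hypothetical backup identifiers and dates:

```python
from datetime import datetime


def clean_backups(backups, earliest_compromise):
    """Return backups safe to restore from: taken strictly before the earliest
    known compromise, so they cannot contain the attacker's artefacts."""
    return [b for b in backups if b["taken_at"] < earliest_compromise]


backups = [
    {"id": "nightly-0612", "taken_at": datetime(2024, 6, 12, 2, 0)},
    {"id": "nightly-0619", "taken_at": datetime(2024, 6, 19, 2, 0)},
]
# Forensics established the attacker's earliest presence as 15 June.
earliest_compromise = datetime(2024, 6, 15, 0, 0)

assert [b["id"] for b in clean_backups(backups, earliest_compromise)] == ["nightly-0612"]
```

Note the dependence on forensics: if the earliest compromise time is underestimated, a “clean” backup may not be clean at all, which is why establishing a confident attacker timeline matters for recovery as well as for evidence.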
Rapid recovery to a given state relies not just on data backups but on the ability to restore a system or host to a known good operational state. Rather than having hand-crafted systems that are difficult to reproduce manually, most organisations now rely on some form of configuration management or infrastructure automation/infrastructure as code (IAC), such as Ansible, Puppet or Terraform, to be able to exactly replicate and recover a system to a specific known-good state.
The final stage of response to a breach involves the prevention of recurrence of the breach, as well as the production of a full report and debrief to senior management both on what occurred, why, and how future occurrences can be prevented, as well as the costs of doing so if necessary.
It is important that management are aware of the full and detailed impact of the breach, and whether any data belonging to or managed on behalf of customers and corporate partners has been impacted. Not only is it appropriate to let any affected parties know about the impact of a breach, but in many jurisdictions, there is often a “duty to report” to either customers, authorities, or both.
Host breach and compromise is an extremely broad topic but hopefully this brief introduction is useful in at least highlighting some of the key areas to consider, as well as how to take the first steps in incorporating “assume breach” measures into your security programme.
AppCheck helps you provide assurance across your entire organisation’s security footprint. AppCheck performs comprehensive checks for a massive range of web application vulnerabilities from first principles to detect vulnerabilities in in-house application code. AppCheck vulnerability scanning works by continually attempting to break into and compromise your applications in the same way that a hacker would, pinpointing any potential gaps in your security. Our proprietary scanning technology is built and maintained by leading penetration testing experts, allowing it to explore a given application as a penetration tester or attacker would, uncover the weaknesses a hacker would discover, and inform you of how to resolve them before they can be exploited.
AppCheck performs the same “kill chain” steps that an attacker would, combining open-source intelligence gathering and a sophisticated browser-based crawling engine to identify application components that could be vulnerable to attack. The AppCheck crawling engine uses a combination of application modelling techniques and subtle heuristic cues to automatically discover the complete attack surface of any given application in the shortest time possible.
The AppCheck Vulnerability Analysis Engine provides detailed rationale behind each finding including a custom narrative to explain the detection methodology, verbose technical detail, and proof of concept evidence through safe exploitation.
AppCheck is a software security vendor based in the UK, offering a leading security scanning platform that automates the discovery of security flaws within organisations websites, applications, network, and cloud infrastructure. AppCheck are authorised by the Common Vulnerabilities and Exposures (CVE) Program as a CVE Numbering Authority (CNA).
As always, if you require any more information on this topic or want to see what unexpected vulnerabilities AppCheck can pick up in your website and applications then please contact us: info@localhost
No software to download or install.
Contact us or call us on 0113 887 8380