Security Logging and Monitoring Failures are called out by OWASP (the Open Web Application Security Project) in their latest “Top 10” list for 2021 as one of the most critical security risks facing organizations. It is an area that is often overlooked as a security risk, since it is challenging to robustly test systems for logging failures in an easily automated and reportable way. Rather, good logging practices rely primarily on ensuring that developers, system administrators and others are aware of best practices, and that processes are put in place to audit, monitor, and test the effectiveness of logging and monitoring processes within and across systems. In this blog post, we take a look at why logging is important, what the “best practice” guidance is around configuring monitoring, and what risks and issues can occur if these practices are not followed.
Computer and networking systems are all based upon the principle of state change: they are not completely static but may exist in any one of a finite number of states. A system’s internal behaviour or interaction with its environment consists of separately occurring individual actions or events, such as accepting input or producing output, that may or may not cause the system to change its state.
In a web application taken as a whole, for example, the system will typically receive input via HTTP requests, and send output in the form of HTTP responses. HTTP methods or verbs such as DELETE, PUT, PATCH and POST are all explicitly designed to be used within CRUD (create, read, update, or delete) operations to change the state of the system on the server, typically by changing data within a server-side data store or database.
For example, a user may create an order by sending an HTTP request (typically a POST) that carries the order details, and the server will create a new record within its database using the details provided.
However, although these actions can change the system state, the system as a whole may not be inherently stateful, in that it may not preserve records of all previous states (i.e., track all changes made) but only record the current (latest) state.
Logging is a method of producing a persistent record of changes that occur within the system. It typically preserves both some data about the change that is made (such as a new order being added) and surrounding or contextual metadata (such as the time the order was placed, the requesting IP address, the server that performed the action, and the requesting user ID).
There are specific types of logs used within individual systems, such as transaction logs used within databases to allow the database state to be rewound or replayed to a previous state or to support database replication. In general, however, when we refer to logging we’re referring specifically to event logs: logs that record events taking place within the system to provide an audit trail for analysis by human agents, rather than for consumption by the system itself (as with transaction logs).
Event logs may record a range of system or application-level actions or activities, including requested (or granted) file access, authorized or unauthorized activity by users, policy and configuration changes, and other activities performed against system records, especially those containing sensitive data.
The recording of events in logs is fundamental in helping to explain the changes that are going on within a system or network, and to understand the activities that are taking place. This is not possible by simply examining the current system state as a snapshot at any given time.
It is very common for logs to be used by developers and support staff to investigate reports of issues that have arisen and to explain and understand how and why they may have occurred, in support of debugging – the process of finding and resolving defects or problems that prevent correct operation within the system in question.
However, from a security perspective in particular, logs are essential primarily within the areas of incident detection, incident response, and security forensics or incident analysis. It is important for organisations to be able to detect when there is an ongoing (actual or attempted) attack, exploit or breach, in order to be able to respond efficiently and implement corrective controls.
Consider a door modelled as a simple state machine with two states, “open” and “closed”, and let’s pretend it is the door to a bank vault. At any given time, you could check the door and determine that its state is “closed”. Perfect! However, there is a big difference in security terms between the door having remained closed for the entirety of the last 24 hours, versus the door having been opened and closed several times during the period. Since the state machine (the computer) doesn’t track past states, this is where logging comes in: to provide a record of how the current state was arrived at, and to record critical state changes as well as their instigators and other surrounding context.
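The vault-door example can be sketched in a few lines of Python. This is an illustration only: the `Door` class, its method names and the `actor` field are hypothetical and are not taken from any real system.

```python
from datetime import datetime, timezone


class Door:
    """A two-state machine ("open"/"closed") that also keeps an event log."""

    def __init__(self):
        self.state = "closed"
        self.events = []  # append-only record of state changes

    def _transition(self, new_state, actor):
        if new_state == self.state:
            return  # no state change, so nothing to log
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "from": self.state,
            "to": new_state,
        })
        self.state = new_state

    def open(self, actor):
        self._transition("open", actor)

    def close(self, actor):
        self._transition("closed", actor)


door = Door()
door.open("alice")
door.close("alice")

print(door.state)        # "closed", exactly as if nothing had happened
print(len(door.events))  # 2: the log shows the door was opened and closed
```

The current state alone cannot distinguish this door from one that was never touched; only the event list can.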
Effective incident detection relies on proportionate, reliable logging and monitoring practices in order to identify what are termed indicators of compromise. These may be individual events of note, or a pattern or volume of events that, whilst individually potentially benign, collectively represent an ongoing security incident. Where detected events are confirmed as security incidents, logging data can be used to help identify the source and the extent of compromise more effectively.
Notable events – Events are, in the broadest sense, all occurrences that involve some state change within a system, usually as a result of an input to the system. Identifying which events are especially notable will depend on the system in question, but a typical audit policy will classify as notable those events that involve a change to the system configuration or policies themselves, changes to user privileges, creation of accounts, authentication successes and failures, accesses to sensitive files or resources, and actions performed by security systems such as firewalls and intrusion detection/prevention systems. Some events may be notable individually based on their criticality, while others may be notable based on the volume or rate at which they occur.
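As a rough sketch of the "notable by volume or rate" case, the snippet below counts failed logins per source address in a sliding time window. The class name, the 60-second window and the threshold of five failures are all illustrative assumptions, not recommended values.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed window; tune per system
MAX_FAILURES = 5      # assumed threshold; tune per system


class FailureRateMonitor:
    """Flags a source once it exceeds MAX_FAILURES within WINDOW_SECONDS."""

    def __init__(self):
        self._failures = defaultdict(deque)  # source -> recent failure times

    def record_failure(self, source, timestamp):
        """Record one failed login; return True if the rate is now notable."""
        recent = self._failures[source]
        recent.append(timestamp)
        # Discard failures that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - WINDOW_SECONDS:
            recent.popleft()
        return len(recent) > MAX_FAILURES


monitor = FailureRateMonitor()
# Six failures from one address in six seconds: only the sixth is flagged.
flags = [monitor.record_failure("203.0.113.7", t) for t in range(6)]
print(flags)  # [False, False, False, False, False, True]
```

Each individual failure is unremarkable; it is the count inside the window that turns them into an event worth alerting on.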
System coverage – When considering which events to record, it is natural to focus attention on the perimeter of an organisation’s network in order to detect and prevent unauthorised access or attacks from malicious external parties. However, a security breach is just as likely, if not more likely, to originate from an internal source within the organisation, such as a disgruntled employee. It is therefore important to consider the various potential threat vectors and to ensure that events related to critical resources are logged regardless of where the event originates.
Event identification and correlation – It is important to be able to uniquely identify given events, both to allow effective coordination and communication during incident investigation and to allow multiple correlated events to be unambiguously linked. Since several events may share many or all of the same attributes, it is common to assign each event a unique log event ID, acting as an index or primary key, at the time of logging.
Event metadata – In addition to recording the details of the event in question, it greatly assists investigation of incidents if relevant metadata about the request is captured at the time of logging, as well as key data items from the event itself. Metadata includes any information relating to the state change being performed that is not inherent in the state change itself and helps to provide context as to how and why the state change was performed. It can include attributes such as the network origin (e.g., the IP address) that the state change request originated from, as well as details of the claimed client/subject that initiated the state change/event.
Timestamps – Where an incident is being pieced together from events across multiple systems, it is key to be able to correlate events not only by logical or spatial connections but also by temporal association (i.e., events that occur on disparate systems but within the same timeframe). It is therefore important not only to include timestamps with every log entry, but also to ensure that the clocks on all systems are closely synchronised. This is typically achieved by ensuring that all systems within an organisation use NTP (Network Time Protocol) to source accurate time data from a common origin system.
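The three points above (a unique event ID, contextual metadata and a timestamp) can be combined in one structured log event. The following is a minimal sketch; the `build_event` function and all field names are our own illustrative choices rather than any standard schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(action, detail, source_ip, user_id, host):
    """Return a log event as a dict with a unique ID, UTC timestamp and metadata."""
    return {
        "event_id": str(uuid.uuid4()),                        # unique per event
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "action": action,
        "detail": detail,
        "metadata": {
            "source_ip": source_ip,
            "user_id": user_id,   # the *claimed* identity of the requester
            "host": host,         # the server that performed the action
        },
    }


event = build_event("order.created", {"order_id": 1001},
                    source_ip="198.51.100.23", user_id="u-42", host="web-01")
print(json.dumps(event, indent=2))
```

Recording the timestamp in UTC avoids ambiguity when events from systems in different time zones are later correlated, although it is no substitute for keeping the clocks themselves synchronised.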
Exclusions and Redaction – It is important to consider what data is being logged, and to recognise that the log data store may itself represent a potentially exploitable security weakness if sensitive data is sent to it within log events. This is especially true if the log system and its data are sited in a less protected or lower sensitivity network segment or logical zone than the source system that generated the event. A typical security failure in this regard is for events reporting authentication failures to be sent to a logging system along with the password that was received in the failed login attempt. Although the password is invalid, it may differ from the real one only by a simple typo on an administrator’s part, so by harvesting these failed authentication records an attacker can often trivially reconstruct the password and gain access to the system in question.
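One simple way to avoid this failure is to mask sensitive fields before they ever reach the logger. The sketch below assumes the event data is available as a dictionary; the `redact` helper and the list of key names are hypothetical and would need extending for a real application.

```python
import logging

SENSITIVE_KEYS = {"password", "passwd", "token", "secret"}  # assumed key names


def redact(fields):
    """Return a copy of `fields` with sensitive values masked."""
    return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
            for k, v in fields.items()}


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("auth")

# A failed login attempt: the submitted password must never be logged.
attempt = {"username": "admin", "password": "Hunter3!", "source_ip": "192.0.2.10"}
safe = redact(attempt)
log.warning("authentication failure: %s", safe)
```

Redacting at the point of logging, rather than on the collector, means the sensitive value never leaves the source system.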
Immutability and Centralisation – If a host system is compromised, then one of the first things that an attacker may seek to do is to cover their tracks so that their presence cannot be detected by authorised system administrators and security personnel within the organisation. Attackers will therefore often seek to wipe or redact traces of their activity from system and application logs, so that there is no record of their having gained access to the system. This is particularly true if the attacker seeks to establish a long-term presence within the system or network in question (as is typical of an advanced persistent threat, or APT), in order to use it as a launchpad for further attacks or to silently harvest further data for exfiltration. It is therefore best practice to make sure that logs, or at least the higher priority or critical events within them, are not just stored locally on the system in question but are sent off-system across the network to a centralised collector or log aggregator. The system should be architected such that the logs received on the collector are immutable, in that they can be written but not updated, overwritten or deleted. The log store should also not be subject to rotation or purging in a FIFO (first in, first out) setup; otherwise an attacker can simply send a sufficiently high volume of benign events to the collector to cause it to overwrite the evidence of their presence that was captured in older logs.
Encryption and transport – If using the standard Syslog network protocol to transport logs from client machines to a centralised logging server or collector, it is worth bearing in mind that Syslog can operate across a number of underlying protocols. Historically the most common transport layer protocol for network logging has been User Datagram Protocol (UDP). However, UDP is connectionless and lacks both congestion control and delivery guarantees (meaning that messages may be silently lost), and plain Syslog over UDP is neither authenticated nor encrypted, leaving the log traffic more susceptible to interception or tampering by an attacker using a man-in-the-middle (MITM) attack. Where the logging client and collector support it, Syslog over TCP secured with TLS is generally the preferable transport.
A centralised log repository or collector may itself be a target of attack and a potential source of a data breach, so encryption of data at rest should also be considered, in addition to the transport encryption requirements.
Log Format Standardisation – Centralised log collectors may receive logs from a number of different applications, services, systems, and appliances, all of which may implement their own proprietary or custom log format for local logging. When looking to collate logs on a central collector, however, and particularly when looking to process logs in a standard manner and to cross-correlate events generated by different systems, a standard log format is vital. The de facto standard for log message format (as well as transmission/receipt) is the Syslog protocol. Syslog is both a log message format and a log transmission protocol, documented by the Internet Engineering Task Force (IETF) in RFC 3164 (which describes the original BSD Syslog) and standardised in RFC 5424, which obsoletes it. Networking devices, operating systems, and many software and hardware platforms and applications implement Syslog as a shared common logging format and transmission mechanism to collate logs in a centralised log management repository.
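To make the format concrete, the sketch below assembles a single message in the RFC 5424 layout: a priority value (facility × 8 + severity), the version number 1, a timestamp, then hostname, application name, process ID, message ID, structured data and the free-text message. The `format_rfc5424` helper and the hostname, application and message values are our own illustrative assumptions. The sketch does not validate field lengths or character sets as a full implementation would.

```python
from datetime import datetime, timezone

FACILITY_AUTH = 4      # "security/authorization messages" in RFC 5424
SEVERITY_WARNING = 4   # "Warning: warning conditions" in RFC 5424


def format_rfc5424(facility, severity, timestamp, hostname, app_name, msg,
                   procid="-", msgid="-", structured_data="-"):
    """Build one RFC 5424 syslog line; "-" is the NILVALUE for unused fields."""
    pri = facility * 8 + severity  # PRI combines facility and severity
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return (f"<{pri}>1 {ts} {hostname} {app_name} "
            f"{procid} {msgid} {structured_data} {msg}")


when = datetime(2021, 10, 11, 22, 14, 15, 3000, tzinfo=timezone.utc)
line = format_rfc5424(FACILITY_AUTH, SEVERITY_WARNING, when,
                      "web-01", "shop", "login failed for user u-42",
                      procid="4321", msgid="AUTHFAIL")
print(line)
# <36>1 2021-10-11T22:14:15.003000Z web-01 shop 4321 AUTHFAIL - login failed for user u-42
```

Because every sender uses the same field order, a collector can extract the severity, host and application from any message without needing a parser per source.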
Sanitisation and Neutralisation – Most developers are well aware of the potential for injection attacks, but the majority of the focus tends to be on parsers handling data received directly within network requests from external parties (clients). However, injection attacks can occur whenever data transitions across different system or code boundaries, particularly where the data is parsed, or encoded/decoded between different languages, character encodings or formatting schemes. Specifically, log parsers and collectors can be subject to injection attacks even though they may be significantly downstream from the areas of the application stack handling customer input.
In potentially the most benign form, an attacker may be able to insert false entries into the log file by providing the application with input that includes unexpected injected characters. As with attempts to simply overwrite or purge logs locally, such log forging on the centralised collector can be used to cover an attacker’s tracks. In more serious cases, it may be possible for an attacker to use log injection attacks to perform log file poisoning in which code execution or command injection is achieved, potentially allowing complete compromise of the centralised log collection system.
It is therefore very important to ensure that any data sent to and received by the logging system is appropriately escaped, neutralised and sanitised before being examined and interpreted.
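As a minimal sketch of neutralisation against log forging, the function below escapes carriage returns, line feeds and other ASCII control characters so that attacker-supplied input cannot start a new line in a line-oriented log. The `neutralise` name and the escaping scheme are our own assumptions. The function does not address Unicode line separators or the escaping rules of any specific downstream parser.

```python
def neutralise(value):
    """Escape CR, LF and other control characters so the value stays on one line."""
    out = []
    for ch in str(value):
        if ch == "\\":
            out.append("\\\\")               # escape the escape character itself
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\x{ord(ch):02x}")  # e.g. a line feed becomes \x0a
        else:
            out.append(ch)
    return "".join(out)


# An attacker-supplied username that tries to forge a second log entry.
username = "bob\n2021-10-11 INFO login succeeded for admin"
entry = f"WARN login failed for {neutralise(username)}"
print(entry)
# WARN login failed for bob\x0a2021-10-11 INFO login succeeded for admin
```

The forged "login succeeded" text still appears in the log, but only as part of the original failed-login entry, where an analyst can see it for what it is.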
Denial of service – Since the ability to understand the circumstances that led to a specific state on a server is related directly to capturing sufficient information on relevant events, it can be tempting to send large amounts of data to the log collector (both in terms of number of events that trigger a log event, as well as the amount of data or message size recorded for each individual event). However, logging too much can cause problems, just as logging too little can.
There are generally three specific concerns with excessive, insufficiently selective logging. Firstly, it can prevent efficient incident investigation if the signal-to-noise ratio is too low and too many irrelevant events are logged. Secondly, if a log rotation mechanism is in use, it can lead to the loss of relevant logs, with older entries overwritten by newer ones so quickly that the events relating to a security incident are no longer available by the time it is investigated. Finally, a sufficiently high volume of logs may actually cause a Denial of Service (DoS) outage of either the client or server system by consuming all available CPU, disk storage, memory or network bandwidth. The volume of logs can be managed by ensuring that all log events are assigned an appropriate log level or criticality/priority, with only log events over a defined threshold recorded and sent to the centralised collector. Additionally, writing logs asynchronously from a local buffer or queue allows the application to keep running rather than stalling or halting while it waits for logs to be written.
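Both mitigations, a severity threshold and asynchronous writing from a local queue, are available in Python's standard `logging` module. The sketch below is one possible arrangement; the logger name, the queue size and the use of `StreamHandler` as a stand-in for a network handler are assumptions made for illustration.

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(maxsize=10_000)   # bounded buffer; size is an assumption

# The application only ever writes to the in-memory queue, which is fast.
logger = logging.getLogger("shop")
logger.setLevel(logging.WARNING)          # threshold: drop DEBUG and INFO
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# A background thread drains the queue and does the slow I/O.
# StreamHandler stands in here for a handler that sends to the central collector.
sink = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

logger.info("page viewed")                       # below threshold, discarded
logger.warning("5 failed logins for user u-42")  # queued, then written

listener.stop()   # processes any queued records and joins the worker thread
```

Bounding the queue is a deliberate trade-off: if the sink cannot keep up, records that arrive while the queue is full are not queued, which protects the application's memory at the cost of losing some log entries under extreme load.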
Testing and monitoring – One aspect of logging that is often neglected is the continuous monitoring and testing of the logging and monitoring systems themselves to ensure that they are operational and working as expected. As with backups, log records are only useful if there is confidence that they will be there and available for use when needed, and that logging has not been silently failing for a period of time without system administrators noticing. In the event that a security incident occurs and needs investigating, you don’t want to be in a position where you discover that the logs required for incident investigation and analysis are simply missing and not available as expected.
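One simple check of this kind is to track when each source last delivered a log event and to flag any source that has been silent for too long. The sketch below assumes the collector can supply a "last seen" timestamp per source; the `silent_sources` function, the host names and the 15-minute limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_SILENCE = timedelta(minutes=15)  # assumed; depends on expected log volume


def silent_sources(last_seen, now, max_silence=MAX_SILENCE):
    """Return the sources whose most recent log event is older than max_silence."""
    return sorted(src for src, ts in last_seen.items() if now - ts > max_silence)


now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "web-01": now - timedelta(minutes=2),
    "db-01": now - timedelta(hours=3),   # has stopped sending logs
}
print(silent_sources(last_seen, now))  # ['db-01']
```

A check like this only detects total silence; it would typically be paired with periodic test events that are traced end to end through the logging pipeline.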
Retention and Archival – It is important, from both a compliance perspective and a capacity management angle, to explicitly define how long collected logs are retained, under an archival and retention policy. It may be necessary for compliance reasons in various jurisdictions to set both minimum and maximum data retention periods for different data types and sources, such as personal and/or financial data. Where data is being collected from across multiple, disparate systems, a single centralised log repository may contain heterogeneous data that has multiple competing data retention requirements. Data retention policies may therefore require careful consideration of how they can be technically implemented, and log data may need to be tagged or otherwise differentiated to support data retention processing.
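The idea of tagging log data to support differing retention periods can be sketched as follows. The category names and the periods in `RETENTION` are invented purely for illustration; real values must come from the organisation's own policy and legal obligations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category; real values come from policy.
RETENTION = {
    "security": timedelta(days=365),
    "financial": timedelta(days=7 * 365),
    "debug": timedelta(days=30),
}


def is_expired(record, now):
    """True if the record is older than the retention period for its category."""
    return now - record["timestamp"] > RETENTION[record["category"]]


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
records = [
    {"category": "debug", "timestamp": now - timedelta(days=45)},
    {"category": "security", "timestamp": now - timedelta(days=45)},
]
kept = [r for r in records if not is_expired(r, now)]
print([r["category"] for r in kept])  # ['security']
```

Two records of the same age are treated differently purely because of their tag, which is why the category needs to be recorded at the time the event is logged.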
SIEMs and SOCs – Although establishing a robust logging infrastructure is itself a significant undertaking, it should be seen as one component that enables or underpins a broader system for monitoring, analysis, investigation and alerting. Typically, the security events recorded in a log collector will be considered within the wider remit of a Security Information and Event Management (SIEM) system, which will also incorporate the ingestion of threat intelligence feeds as well as provide a platform for log correlation and analysis, security alerts and reporting. In turn, a SIEM itself requires both initial and ongoing configuration, tuning, monitoring and response. In many organisations these functions are performed by a Security Operations Centre (SOC), a team responsible for ensuring that security events are appropriately detected and investigated, and that any incidents are responded to in a timely fashion.
AppCheck is a software security vendor based in the UK, offering a leading security scanning platform that automates the discovery of security flaws within organisations’ websites, applications, network, and cloud infrastructure. AppCheck are authorized by the Common Vulnerabilities and Exposures (CVE) Program as a CVE Numbering Authority (CNA).
As always, if you require any more information on this topic or want to see what unexpected vulnerabilities AppCheck can pick up in your website and applications then please get in contact with us: firstname.lastname@example.org