SOF-ELK is a free and open source bootable virtual machine (VM) preconfigured with a fully functional and customized Elastic Stack implementation. SOF-ELK specifically focuses on the workflows and needs of computer forensic and security operations professionals, with dozens of parsers to extract useful and relevant fields from numerous log formats. Dashboards with visualizations and investigative views to examine the loaded data are also provided. The project aims to eliminate the significant system administration workload required to start using the powerful Elastic Stack, a preeminent big data analytics platform, in a production capacity. It is designed for both seasoned experts and newcomers to the digital forensics and incident response (DFIR) field to perform mass-scale analysis with disparate data sources. SOF-ELK ingests both at-rest data from evidence files and live data sources. The ability to ingest both types of data makes it suitable for both forensic investigation and security operations use cases.
Elastic Common Schema
The most recent release includes numerous updates to the Elastic Stack components, substantially overhauled log parsers, and many additional improvements. The most significant change is the adoption of the Elastic Common Schema (ECS) for nearly 1,100 fields parsed across all data types. The ECS is a consistent approach to field naming, enabling easier analysis while also allowing correlation across multiple tools and visualizations. Without such a standard, each tool in an investigator’s toolbox might use a unique field naming scheme, making searches across different platforms complex, frustrating, and potentially inaccurate.
Consider a simple example: a field containing the source IP address for a log entry or a NetFlow record. This field name could be reflected as:
source_ip, src_ip, srcip, ip.source, source.ip,
or countless other names in various tools.
However, when data is normalized to the ECS, analysts and investigators can create search filters that work equally across multiple tools while also flattening the learning curve they typically encounter when using new or unfamiliar tools.
Using the source IP address as an example again, the ECS specifies this value is always reflected in the source.ip field. With the ECS-based naming structure, a filter of
source.ip:192.168.6.75
can be used across ECS-compliant tools, with consistent results.
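To make the idea concrete, here is a minimal Python sketch of field-name normalization. It is an illustration only, not SOF-ELK's parser code: the alias table, the normalize_source_ip helper, and the matches helper are all hypothetical names invented for this example, and the sample records are made up.

```python
# Hypothetical alias table: tool-specific names that all mean "source IP".
SOURCE_IP_ALIASES = {"source_ip", "src_ip", "srcip", "ip.source", "source.ip"}


def normalize_source_ip(record: dict) -> dict:
    """Return a copy of the record with any known alias renamed to source.ip."""
    normalized = {}
    for key, value in record.items():
        if key in SOURCE_IP_ALIASES:
            normalized["source.ip"] = value
        else:
            normalized[key] = value
    return normalized


def matches(record: dict, field: str, value: str) -> bool:
    """Tiny stand-in for a `field:value` search filter."""
    return record.get(field) == value


# Made-up records as three different tools might emit them.
records = [
    {"src_ip": "192.168.6.75", "tool": "a"},
    {"srcip": "192.168.6.75", "tool": "b"},
    {"ip.source": "10.0.0.1", "tool": "c"},
]

normalized = [normalize_source_ip(r) for r in records]

# One filter now works regardless of which tool produced the record.
hits = [r for r in normalized if matches(r, "source.ip", "192.168.6.75")]
print(len(hits))  # prints 2
```

Before normalization, the same search would need one filter per naming scheme. After it, a single source.ip filter covers every record.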
This consistency also provides a path to future capabilities such as using community-built dashboards, Elastic’s security information and event management (SIEM) tool, machine learning capabilities, and more.
Another key feature of the ECS model, also implemented in SOF-ELK’s parsers, is the aggregation of values from similar fields to a single field for convenient searching. For example, a record with both a source.ip and destination.ip will have those values copied into a list of IP addresses named related.ip, which will also contain any other IP address values from the source data. This aggregated field lets an investigator search more broadly for records containing an IP address of interest, regardless of the specific type of IP address. For example:
related.ip:192.168.6.75
This aggregation is applied to numerous field types in addition to IP addresses, such as MAC addresses, network ports, hostnames, hash values for files, and many others.
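The aggregation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the logic SOF-ELK's parsers use: the add_related_ip helper is hypothetical, and it scans every string value in a flat record for valid IP addresses, whereas a real parser would likely draw from specific, known fields.

```python
import ipaddress


def add_related_ip(record: dict) -> dict:
    """Copy every valid IP address value in the record into related.ip."""
    related = []
    for key, value in record.items():
        if key == "related.ip" or not isinstance(value, str):
            continue
        try:
            ipaddress.ip_address(value)  # raises ValueError if not an IP
        except ValueError:
            continue
        if value not in related:  # deduplicate, keep first-seen order
            related.append(value)
    enriched = dict(record)
    if related:
        enriched["related.ip"] = related
    return enriched


# Made-up record for illustration.
record = {
    "source.ip": "192.168.6.75",
    "destination.ip": "10.3.58.7",
    "destination.port": 443,
    "network.transport": "tcp",
}

enriched = add_related_ip(record)
print(enriched["related.ip"])  # prints ['192.168.6.75', '10.3.58.7']
```

With the list in place, a search on related.ip finds the record whether the address of interest appeared as the source or the destination.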
Data Enrichments
SOF-ELK also implements several data source enrichments that provide additional insight not available in the original.
Geolocation and Network Provider
One key enrichment is the addition of both location and network provider information to IP addresses. This lookup is performed on the platform itself so no data is transmitted outside of the VM, ensuring sound operational security practices that are important to ongoing investigations. The location enrichment feature allows for visualizing the location context of network artifacts on a map, while the network provider enables searching for traffic involving a specified ISP or SaaS platform.
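As a rough illustration of an on-platform lookup, the Python sketch below resolves an address against a table held entirely in local memory, so nothing leaves the machine. Everything here is an assumption made for the example: the LOCAL_TABLE contents are placeholders built on reserved documentation address ranges, the lookup and enrich_source helpers are hypothetical, and the output field names follow ECS naming rather than anything confirmed about SOF-ELK's configuration. A real deployment would consult a full offline database instead of a hand-written table.

```python
import ipaddress

# Placeholder data only: documentation prefixes with invented labels.
LOCAL_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "Exampleland", "provider": "Example ISP"}),
    (ipaddress.ip_network("192.0.2.128/25"),
     {"country": "Exampleland", "provider": "Example Cloud"}),
]


def lookup(ip: str):
    """Return the entry for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for network, info in LOCAL_TABLE:
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, info)
    return best[1] if best else None


def enrich_source(record: dict) -> dict:
    """Add location and provider fields for source.ip when a match exists."""
    enriched = dict(record)
    ip = record.get("source.ip")
    info = lookup(ip) if ip else None
    if info:
        enriched["source.geo.country_name"] = info["country"]
        enriched["source.as.organization.name"] = info["provider"]
    return enriched


print(enrich_source({"source.ip": "192.0.2.200"}))
```

Once the provider name is stored on the record, searching for all traffic involving a given ISP or SaaS platform becomes an ordinary field filter.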
Community ID Network Conversation Hash
The automatic calculation of a Community ID value is another new enrichment. The Community ID is a hashed identifier that identifies a specific network conversation. This public algorithm, created by Corelight, is based on source and destination IP addresses and ports and transport protocol. The resulting string,
1:OS79QgipeMxLNHu2rB35Gx+682k=
can be used to search for the same network conversation across multiple investigatory tools.
While originally developed by Corelight for the Zeek Network Security Monitor (NSM) platform, the public nature of the algorithm means that the Community ID has been integrated into countless network analysis tools. This is an invaluable way to identify network conversations, but it is rarely available from the original network evidence itself. Therefore, SOF-ELK will calculate and store the Community ID for records from any data source if the original includes the necessary source fields.
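For readers curious how such a value is derived, the following Python sketch follows version 1 of the publicly documented Community ID algorithm for TCP and UDP flows. It is a simplified illustration, not SOF-ELK's implementation: the community_id_v1 function name and the small PROTOCOLS table are chosen for this example, and the special handling the specification defines for ICMP and other protocols is omitted.

```python
import base64
import hashlib
import ipaddress
import struct

# IP protocol numbers for the two transports this sketch supports.
PROTOCOLS = {"tcp": 6, "udp": 17}


def community_id_v1(src_ip, src_port, dst_ip, dst_port, proto, seed=0):
    """Compute a version 1 Community ID string for a TCP or UDP flow."""
    src = ipaddress.ip_address(src_ip).packed  # network-order bytes
    dst = ipaddress.ip_address(dst_ip).packed

    # Order the endpoints so both directions of a conversation hash
    # identically: the smaller (address, port) pair always goes first.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port

    data = (
        struct.pack("!H", seed)  # 2-byte seed
        + src
        + dst
        + struct.pack("!BBHH", PROTOCOLS[proto], 0, src_port, dst_port)
        # protocol, one zero padding byte, then the two ports
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


# Made-up flow, computed once in each direction.
forward = community_id_v1("192.168.6.75", 51234, "10.3.58.7", 443, "tcp")
reverse = community_id_v1("10.3.58.7", 443, "192.168.6.75", 51234, "tcp")
print(forward == reverse)  # prints True
```

Because the endpoints are ordered before hashing, a request and its response produce the same identifier, which is what allows one search string to find both halves of a conversation in any tool that stores the value.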
Dashboards for Visualization
One lesson every investigator or analyst has learned is that reviewing massive amounts of data from several millions of records can be a challenging task. Most tools are simply not built to accommodate that scale of source data, yet this has become a common requirement even for smaller cases. SOF-ELK aims to alleviate that problem with dashboards and visualizations that make quick work of spotting anomalies or trends, correlating disparate data points, and simplifying even the most complex of data sources into visually digestible components.
For example, the NetFlow dashboard shown below reflects the sample source data provided in the VM. Spotting the spike of traffic in the left-most time series graph is visually simple. However, finding that pattern in over 300,000 records of text would be quite difficult. Similarly, the two nested donut charts depict ports and protocols observed in the source data. Identifying the most heavily used ports and their ratio of occurrence is much easier to accomplish visually than from those same source records.
It’s also important to note that these dashboards are all interactive and designed to support the iterative nature of an investigation. An analyst can click on a particular slice from a donut chart, draw a box on the map covering a particular focus area, or select a time frame of interest, and immediately narrow an extremely large set of source records to a small subset of interest. This makes the dashboards investigative tools in their own right, in addition to visualization and reporting tools.
Extensive Parsing Capabilities
SOF-ELK already includes parsing capabilities for dozens of data types, with more being added all the time. Currently, the data types include:
- Syslog-formatted log entries from *NIX systems, covering numerous subtypes such as SSH, DHCP, DNS, firewalls, and more
- HTTP server logs in several formats including Common Log Format, Combined/Extended, IIS CSV formatted, web proxy logs, proxy server logs, and more
- Zeek NSM logs, in JSON form
- KAPE (suite of endpoint forensic software) logs, in JSON form
- Amazon Web Services (AWS) logs
- Google Cloud Platform (GCP) logs
- Microsoft Azure and Microsoft 365 logs
- Kubernetes logs
- NetFlow network traffic summaries covering NetFlow versions 5, 7, and 9, IPFIX, and equivalent files from Zeek, GCP, AWS, and Azure
SOF-ELK can process each of these data types from static files loaded to the platform. In most cases, it can also process source data from live sources transmitted over a network connection. This enables both a post-incident investigative workflow for DFIR purposes as well as security operations workflows to support ongoing collection and observation.
Free and Open Source with Dynamic Updates
All configuration files used on the SOF-ELK platform are maintained in a GitHub repository. This permits public review of all the project’s content and allows users to report and discuss any bugs, optimizations, or feature requests they identify. The GitHub repository also provides a means of updating platforms operating in the field without needing to download another VM. This update feature requires that the VM have Internet access, but only requires a single command to download and activate updated parsers, dashboards, or visualizations. Any newly added data sources are also accommodated using the field update process. Generally, a new VM download is only required for significant updates to the base operating system, Elastic Stack, or similar major components.
The SOF-ELK platform is a completely free community resource anyone can use for casework, research, or any other purpose. It is also used in several SANS courses. This allows students of all skill levels to gain experience in realistic hands-on scenarios using sample case data collected from controlled environments designed to model real-world enterprises. The SANS course FOR572™: Advanced Network Forensics and Analysis™ uses SOF-ELK to correlate log data from various sources and examine large volumes of NetFlow records. The FOR509: Enterprise Cloud Forensics and Incident Response™ course uses SOF-ELK to examine cloud data evidence across all cloud service providers, and FOR589: Cybercrime Intelligence™ incorporates SOF-ELK for large-scale data analysis. Other course authors are in the process of integrating SOF-ELK into more SANS courses. This will provide future students and practitioners a consistent user experience across a growing range of forensic evidence types.
Here are several online resources to help you get started with SOF-ELK:
- SOF-ELK project wiki
- Virtual machine README (includes download and usage instructions)
- SOF-ELK GitHub project page
If you’re looking for a turnkey tool that immediately adds value to massive volumes of common forensic evidence data types, consider giving SOF-ELK a try.
Discover the power of SOF-ELK! Enhance your forensic skills and master network and cloud evidence analysis with the SANS FOR572: Advanced Network Forensics and Analysis™, FOR509: Enterprise Cloud Forensics and Incident Response™, and FOR589: Cybercrime Intelligence™ courses. Ready to take the next step? Register today or request a demo to see these courses in action!