Introduction
Synthetic Adversarial Log Objects (SALO) is a novel framework for generating realistic and diverse log events without requiring any actual infrastructure or actions. Created by Marcus LaFerrera, a staff security strategist at Splunk, it enables security practitioners, data scientists, and researchers to create log events in a simple, repeatable, and scalable way. SALO can be used for various applications, such as testing and improving log analysis tools, simulating cyberattacks and defenses, generating synthetic datasets for machine learning models, and enhancing security education and awareness.
Why even use fake logs?
Let’s consider a hypothetical situation. Suppose a large financial institution has just detected an attempted cyberattack on its network. The attack appears to be sophisticated and multi-stage, and the security team is struggling to identify the root cause and the extent of the damage.
In order to investigate the attack, they would need to build an extensive infrastructure to replicate the environment, logging, and malicious events, which can be challenging, time-consuming, and costly. This is where tools like SALO come in handy. The security team could create synthetic log events that mimic the behavior of the attacker. By specifying the parameters and conditions of the log generation, the security team could create log events in a consistent and reproducible way. This would enable the team to test their log analysis tools, identify any weaknesses or blind spots, and improve their detection and response capabilities.
Moreover, by generating synthetic datasets, the security team could train machine learning models to detect and respond to similar attacks in the future. This would enable the team to improve their defenses and reduce the likelihood of a successful attack.
Why SALO?
There are several reasons to use SALO:
- Low barrier to entry: SALO is easy to install and use and does not require any prior knowledge of log formats or protocols. You can start generating log events with just a few lines of code
- Minimal effort: SALO automates the process of creating realistic and diverse log events, saving you time and resources. You do not need to set up any infrastructure or perform any actions to generate log events
- Repeatable process: SALO allows you to create log events in a consistent and reproducible way. You can specify the parameters and conditions of your log generation and run it as many times as you want (as long as your CPU can handle it)
- Highly customizable: SALO gives you full control over the content and format of your log events. You can customize every aspect of your log generation, such as the source, destination, timestamp, message, severity, etc. You can also create your own templates and modules to generate log events for specific scenarios or applications
A Simple Scenario
To kick off, suppose that we aim to produce a simple DNS log. How can we achieve this? We can create a new YAML file, which we will call `custom_dns.yaml`, and fill it with the following content.
We can now generate the necessary data by executing the following command: `salo recipe custom_dns.yaml | jq`
Here’s the output of that command:
And just like that, we’ve successfully created our DNS log! The data is generated randomly for us by default. However, if we want to customize or modify specific parameters, we need to provide the relevant details ourselves.
Note: Pipe the result to `jq` to pretty-print the JSON output on the console.
SALO Layers
SALO consists of four layers (recipes, stencils, events, and outputs), as shown in the following picture:
Recipes
The script above is what SALO calls a recipe. Recipes are configuration files that specify how SALO generates log events. They allow users to define which log events to generate, specify their values, determine the order in which they are logged, and make other customizations.
To adapt the scenario to our needs, the following script sets additional options. These options live under `sessions`, where the various `event` configuration options are listed. We will use a few of them in our example; the full list of available configurations can be found in the documentation.
- repeat → run the `event` and its `spawns` the specified number of times
- time → defines multiple attributes of the timestamp. In this example, we set the start time
- options → specify the values of the model attributes. Options are inherited from any and all parent event objects in the same session. In this example, the options declared for `suricata.DNSModel` are inherited by `zeek.DNSModel`. We modify model attributes such as `src_ip`, `src_port`, and `dest_ip`
The above output is repeated 10 times, with any unspecified options being randomly generated.
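How `repeat` and option inheritance interact can be sketched in plain Python. This is not SALO's implementation or its recipe syntax: `run_event`, `random_defaults`, the dictionary layout, and the sample addresses are assumptions made purely for illustration.

```python
import random


def random_defaults(rng):
    """Hypothetical stand-in for the values a model would generate on its own."""
    return {
        "src_ip": f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
        "src_port": rng.randint(1024, 65535),
        "dest_ip": f"10.1.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
        "query": rng.choice(["example.com", "example.org", "example.net"]),
    }


def run_event(event, inherited, rng):
    """Yield (model, fields) for an event and its spawns, `repeat` times."""
    # Options declared on this event override anything inherited from parents.
    options = {**inherited, **event.get("options", {})}
    for _ in range(event.get("repeat", 1)):
        # Unspecified fields are random; specified options always win.
        yield event["model"], {**random_defaults(rng), **options}
        for child in event.get("spawns", []):
            # Children receive the merged options of all their ancestors.
            yield from run_event(child, options, rng)


# Assumed, simplified stand-in for one event entry of a recipe.
recipe_event = {
    "model": "suricata.DNSModel",
    "repeat": 10,
    "options": {"src_ip": "192.0.2.10", "src_port": 53124, "dest_ip": "198.51.100.53"},
    "spawns": [{"model": "zeek.DNSModel"}],
}

logs = list(run_event(recipe_event, {}, random.Random(0)))
print(len(logs))
```

In this sketch every repetition emits the parent event followed by its spawned event, and both carry the same `src_ip`, `src_port`, and `dest_ip`, while `query` is still chosen at random.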
Stencils
Stencils are an optional layer that enables users to define specific patterns, behaviors, or features of an attack that they want SALO to emulate. With this, we can programmatically define the values of our log output and mimic actual adversaries’ activities.
For example, we can create a `stencil` that replicates command and control (C2) beacons over DNS. This `stencil` would send queries that follow a specific pattern to a DNS server and leverage TXT records with base64-encoded content.
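To make the beaconing pattern concrete, here is a minimal Python sketch that is independent of SALO and does not use its stencil syntax. The function `beacon_events`, the field names, the query naming scheme, and the domain `c2.example.com` are all hypothetical choices for illustration.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def beacon_events(domain, start, count, interval_s, payload):
    """Build synthetic DNS TXT records that recur at a fixed interval."""
    events = []
    for seq in range(count):
        ts = start + timedelta(seconds=seq * interval_s)
        # Patterned query name: a sequence-numbered label under the C2 domain.
        query = f"b{seq:04d}.{domain}"
        # The TXT answer carries base64-encoded content, as a tasking channel might.
        answer = base64.b64encode(f"{payload}:{seq}".encode()).decode("ascii")
        events.append(
            {
                "timestamp": ts.isoformat(),
                "query": query,
                "qtype": "TXT",
                "answer": answer,
            }
        )
    return events


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
events = beacon_events("c2.example.com", start, count=3, interval_s=60, payload="sleep")
for event in events:
    print(json.dumps(event))
```

The regular interval, the predictable query labels, and the encoded TXT answers are the kinds of features a detection rule would typically key on, which is why they are worth reproducing in synthetic logs.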
The stencil will produce the following logs:
Events
Events are schemas that describe the structure of a logged event; each one is responsible for generating an individual log object. For example, an event model could have a `src_ip` key with a value that follows the pattern of an IPv4 or IPv6 address. SALO comes with several pre-built Events, with more being developed and shared regularly.
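The idea of a schema whose fields are either supplied or randomly generated can be sketched with a Python dataclass. This is only an illustration under stated assumptions: `DNSEvent`, `random_ip`, and the field set are hypothetical and do not reflect how SALO's own event models are defined.

```python
import ipaddress
import random
from dataclasses import asdict, dataclass, field

_rng = random.Random()


def random_ip(rng, v6=False):
    """Return a random IPv4 (or IPv6) address as a string."""
    if v6:
        return str(ipaddress.IPv6Address(rng.getrandbits(128)))
    return str(ipaddress.IPv4Address(rng.getrandbits(32)))


@dataclass
class DNSEvent:
    # Each field falls back to a random value when the caller omits it.
    src_ip: str = field(default_factory=lambda: random_ip(_rng))
    src_port: int = field(default_factory=lambda: _rng.randint(1024, 65535))
    dest_ip: str = field(default_factory=lambda: random_ip(_rng))
    dest_port: int = 53

    def __post_init__(self):
        # ip_address() accepts both IPv4 and IPv6 and raises ValueError otherwise.
        ipaddress.ip_address(self.src_ip)
        ipaddress.ip_address(self.dest_ip)
        if not 0 < self.src_port < 65536:
            raise ValueError(f"invalid port: {self.src_port}")


# Only src_ip is pinned; the remaining fields are generated.
event = DNSEvent(src_ip="2001:db8::1")
print(asdict(event))
```

Validating in `__post_init__` means a user-supplied value is held to the same pattern as a generated one, so a malformed address fails early instead of ending up in a log line.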
Outputs
Outputs are the means of producing and saving the results of a SALO recipe. One or more outputs can be defined, allowing results to be saved to multiple locations. The currently supported outputs are:
- Console
- Raw Log File
- Directly to Splunk
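Sending the same events to several destinations can be sketched as a simple fan-out. The classes `ConsoleOutput` and `FileOutput` and the function `emit` below are hypothetical names, not SALO's output API, and the Splunk destination is left out because it would require a live endpoint.

```python
import json
import sys
import tempfile
from pathlib import Path


class ConsoleOutput:
    """Write each event as one JSON line to a stream (stdout by default)."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout

    def write(self, event):
        self.stream.write(json.dumps(event) + "\n")


class FileOutput:
    """Append each event as one JSON line to a raw log file."""

    def __init__(self, path):
        self.path = Path(path)

    def write(self, event):
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")


def emit(events, outputs):
    """Send every event to every configured output."""
    for event in events:
        for output in outputs:
            output.write(event)


log_path = Path(tempfile.mkdtemp()) / "salo_demo.log"
sample_events = [
    {"query": "example.com", "qtype": "A"},
    {"query": "example.org", "qtype": "TXT"},
]
emit(sample_events, [ConsoleOutput(), FileOutput(log_path)])
```

Because each output only needs a `write` method, adding another destination in this sketch would not change the code that generates the events.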
References