The Round I MITRE ATT&CK Product Evaluations: A Guide By Security Experts



MITRE ATT&CK is a comprehensive knowledge base and complex framework of over 200 techniques that adversaries may use over the course of an attack. These include specific and general techniques, as well as concepts and background information on well-known adversary groups and their campaigns.

One of the more recent and most exciting components of MITRE ATT&CK is its ATT&CK-based product evaluations. These evaluations use the ATT&CK framework to document, in detail, how security vendors approach detection. Their goal is to help MITRE's government sponsors and the industry as a whole make more informed decisions to combat security threats and to advance industry threat detection.




What are ATT&CK-based Product Evaluations?

MITRE ATT&CK-based Product Evaluations detail how security vendors approach detection in the context of ATT&CK. So far, the initial evaluation has been completed for several security vendors. It reports how each product identifies the techniques used by a specific APT, without scores, rankings, or comparisons.

To complete the evaluation, the MITRE red team executes two predetermined attack scenarios, both emulating the same APT group, end-to-end on a range of machines equipped with a vendor's endpoint detection and response (EDR) product. Each scenario chains techniques together in a logical attack flow, and each technique is identified as a step. For every step, the evaluation catalogs whether and how the technique was detected, and the EDR product is assigned a subset of six detection types. Each detection may also carry up to three modifiers that provide more detail on what the EDR product was able to identify.

What is MITRE Testing For?

MITRE does not explicitly state what it is looking for. Instead, its main objective is to state facts about each product's capabilities and avoid any scores, rankings, or comparisons. For each step taken by the MITRE red team, every product is assigned a subset of six detection types, plus any applicable modifiers. The hope is that MITRE will be able to define each product's coverage of the framework in an unbiased and factual way.

The resulting report is difficult to interpret at a glance, since there are multiple steps, several nuanced detection types for each step, and no ranking of how effective an EDR product is. In many ways, we find this powerful, since it cuts through the marketing noise. The results take extra effort to understand and extract value from, but the question is no longer which vendor has 100% coverage. Instead, it is which product best fits the mindset a security team is trying to drive. MITRE recognizes that any security product should be measured by balancing the data it collects against what it does with that data. To do that, it looks at several important factors, which we can deduce from the evaluation and its detection types.

Key Factors to Measure the Effectiveness of an EDR Product

  1. Which malicious activity was identified.
  2. What relevant data was collected about the malicious activity.
  3. What relevant context was provided about the malicious activity.
  4. What enrichment the vendor provided to the end user.
  5. How fast and easy it was to gather relevant information and artifacts about the malicious activity.

MITRE looks at the entire evaluation from an analyst's point of view, which is what makes this approach so useful to anyone assessing how a potential security tool performs against an emulated adversary.

What does MITRE Use for the Initial Evaluation?

The initial evaluation consists of two isolated, end-to-end attacks based on the threat group APT3 (also known as GOTHIC PANDA). APT3 was chosen largely because of substantial reporting on its post-exploitation behavior, including credential harvesting, issuing on-keyboard commands, and using programs trusted by the operating system. The first scenario uses Cobalt Strike, and the second uses PowerShell Empire.

What are the Main MITRE ATT&CK Detection Types?

Each security vendor is evaluated using a subset of six detection types for each step taken by the MITRE red team. When a detection occurs, the techniques used are recorded along with notes on how the detection occurred. A technique may have more than one detection if the evaluated vendor's product, or suite of products, detects it in multiple ways. The six main MITRE ATT&CK detection types are None, Telemetry, Indicator of Compromise, Enrichment, General Behavior, and Specific Behavior.

Main Detection Type: None

In this instance, the evaluated vendor is unable to detect red team activity due to capability limitations or other reasons.

This can mean one of two things: either the vendor was unable to collect the relevant data, or it was unable to identify the relevant behavior. Either way, this is never a good thing. Alert fatigue is absolutely a real concern, but even if you don't want an alert, you want to know you could find that information if you needed it.

Main Detection Type: Telemetry

In this case, the vendor produces minimally processed data that is accessible to the end user and allows them to perform their own analysis to identify red team activity.

Essentially, this means the vendor collected the right data and made it available, so an analyst could identify behavior relating to the attack through investigation queries. This is always good news: it is always good to be informed about what is happening in the environment. While you may not want to alert (i.e., raise a detection) on every step, you would definitely want to collect telemetry on it. That said, telemetry alone might not suffice. For some steps you may want just telemetry; for others, telemetry and a detection. Regardless, the more of these, the better.

Main Detection Type: Indicator of Compromise

Here, the vendor identifies red team activity based on known hashes, IP addresses, C2 domains, tool names, tool strings, or module names.

Essentially, this means the vendor created an alert based on reputation, not behavior. This is irrelevant and should be disregarded.
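
To illustrate why reputation-based detections are brittle, here is a minimal Python sketch that models an IoC check as an exact SHA-256 hash lookup. The payload bytes and the `ioc_match` helper are hypothetical, and real products match many indicator types (IP addresses, domains, tool strings), not only file hashes.

```python
import hashlib

# Hypothetical stand-in for the bytes of a known-bad tool binary.
known_bad_payload = b"MZ example tool bytes"

# Reputation-style IoC list: SHA-256 digests of previously seen files.
ioc_hashes = {hashlib.sha256(known_bad_payload).hexdigest()}


def ioc_match(payload: bytes) -> bool:
    """Alert only if the payload's exact hash is already on the IoC list."""
    return hashlib.sha256(payload).hexdigest() in ioc_hashes


# Appending a single byte produces a different SHA-256 digest, so the
# lookup misses, even though the tool could behave exactly as before.
modified_payload = known_bad_payload + b"\x00"

print(ioc_match(known_bad_payload))  # True
print(ioc_match(modified_payload))   # False
```

A behavior-based rule keys on what the process does rather than on what its bytes hash to, so a trivial change like this one would not by itself evade it.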

Main Detection Type: Enrichment

This type means the vendor captures data, typically data identified in the Telemetry detection type, and enriches it with additional information such as rule name, labels, or tags. It may also include ATT&CK tactics or techniques that would assist in a user’s analysis of the data beyond what would have been originally presented.

Essentially, this means the vendor tagged an element with something insightful, such as evidence, suspicion, or a MalOp, with bonus points for specifying the relevant ATT&CK category. This is always a good thing, so long as it is relevant and not overwhelming. If the product can enrich any piece of telemetry or any alert with something useful, this is good news.

Main Detection Type: General Behavior

This type shows that the vendor created an alert for suspicious or potentially malicious behavior based on complex logic or a rule. The alert indicates that the behavior is anomalous but does not provide specific details on the detected procedure.

Essentially, this means the vendor created a general alert that classified the type of malicious activity taking place without identifying the exact technique being used. This is useful, but never as useful as providing specific behavior. If you have the specific behavior, you can infer the general behavior, but not the other way around.

Main Detection Type: Specific Behavior

In this case, the vendor detects suspicious behavior based on a complex rule or logic and provides an ATT&CK “technique”-level description of the activity.

Essentially, this means the vendor created a specific alert describing the exact technique being used. If a detection is warranted, this is the best type. Note that if the action is often benign, the detection may need to be tainted (raised only through association with a known-malicious process) to avoid alert fatigue. Not every technique is malicious in and of itself.

Isn't It Always Better to Have More Specific/General Behaviors Than Fewer?

No. The evaluation allows vendors to alert more than once on the same activity, and alerting two or more times on each activity is unnecessary. It can create a large number of alerts, which could decrease SecOps efficiency; for many security teams, over-alerting (alert fatigue) is a serious problem. Because some of the evaluated vendors double- and triple-alerted over 40% of the time, simply adding up the number of general and specific behaviors could lead to a misleading interpretation of a vendor's results. Knowing when and how to alert is the difference between a crude tool and a sophisticated one, and it ultimately depends on the security philosophy of each vendor.
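
The gap between a raw alert total and actual step coverage is easy to see with a small Python sketch. The detection list below is hypothetical data, not taken from any vendor's results; it simply records one `(step, detection type)` pair per detection, as the evaluation allows several detections per step.

```python
from collections import Counter

# Hypothetical results: one (step_id, detection_type) pair per detection.
detections = [
    ("1", "Specific Behavior"),
    ("1", "Specific Behavior"),
    ("1", "General Behavior"),
    ("2", "General Behavior"),
    ("3", "Telemetry"),
    ("4", "Specific Behavior"),
    ("4", "General Behavior"),
]

BEHAVIOR_TYPES = {"General Behavior", "Specific Behavior"}


def behavior_summary(detections):
    """Compare a naive behavior-alert total with distinct steps alerted on."""
    per_step = Counter(step for step, dtype in detections if dtype in BEHAVIOR_TYPES)
    raw_total = sum(per_step.values())          # what simple addition reports
    steps_alerted = len(per_step)               # steps with at least one alert
    multi_alerted = sum(1 for n in per_step.values() if n > 1)  # duplicate alerting
    return raw_total, steps_alerted, multi_alerted


print(*behavior_summary(detections))  # 6 3 2
```

Six behavior alerts cover only three steps here, and two of those steps were alerted on more than once, which is exactly the kind of inflation a simple sum hides.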

Further consideration should be given to the stages of the attack that were detected. Not all techniques are created equal: a vendor may excel at alerting on behaviors at the beginning of an attack but not at the end, or vice versa. Ideally, you want to make sure a vendor is able to alert correctly across all stages of an attack.
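
One way to check coverage across stages is to group steps by ATT&CK tactic and compute the share of steps with at least one behavior alert. This Python sketch assumes tactics are a reasonable proxy for attack stages; the step data and the `coverage_by_tactic` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-step data: (ATT&CK tactic, detection types recorded).
steps = [
    ("Execution", {"Telemetry", "Specific Behavior"}),
    ("Execution", {"General Behavior"}),
    ("Credential Access", {"Telemetry"}),
    ("Credential Access", {"None"}),
    ("Lateral Movement", {"Specific Behavior"}),
]

BEHAVIOR_TYPES = {"General Behavior", "Specific Behavior"}


def coverage_by_tactic(steps):
    """Fraction of steps in each tactic with at least one behavior alert."""
    totals = defaultdict(int)
    alerted = defaultdict(int)
    for tactic, types in steps:
        totals[tactic] += 1
        if types & BEHAVIOR_TYPES:  # any general or specific behavior
            alerted[tactic] += 1
    return {tactic: alerted[tactic] / totals[tactic] for tactic in totals}


for tactic, share in coverage_by_tactic(steps).items():
    print(f"{tactic}: {share:.0%}")
```

In this made-up data, the vendor alerts on every execution and lateral movement step but on none of the credential access steps, a gap that an overall total would blur.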

What are the MITRE ATT&CK Modifier Detection Types?

Each detection in the evaluation may be modified to provide more detail on what the vendor was able to identify. The MITRE ATT&CK modifier detection types are Delayed, Tainted, and Configuration Change.
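
To make the structure of a result concrete, here is a minimal Python sketch of one step's record: several detections, each with a main detection type and an optional set of modifiers. The class names, the step identifier, and the example technique are assumptions for illustration, not MITRE's published data format.

```python
from dataclasses import dataclass, field
from enum import Enum


class DetectionType(Enum):
    """The six Round I main detection types."""
    NONE = "None"
    TELEMETRY = "Telemetry"
    INDICATOR_OF_COMPROMISE = "Indicator of Compromise"
    ENRICHMENT = "Enrichment"
    GENERAL_BEHAVIOR = "General Behavior"
    SPECIFIC_BEHAVIOR = "Specific Behavior"


class Modifier(Enum):
    """The three Round I modifier detection types."""
    DELAYED = "Delayed"
    TAINTED = "Tainted"
    CONFIGURATION_CHANGE = "Configuration Change"


@dataclass(frozen=True)
class Detection:
    detection_type: DetectionType
    modifiers: frozenset = frozenset()  # zero or more Modifier values
    notes: str = ""


@dataclass
class StepResult:
    step_id: str    # hypothetical identifier
    technique: str  # ATT&CK technique exercised in this step
    detections: list = field(default_factory=list)

    def detection_types(self) -> set:
        """All main detection types recorded for this step."""
        return {d.detection_type for d in self.detections}

    def has_modifier(self, modifier: Modifier) -> bool:
        """True if any detection for this step carries the given modifier."""
        return any(modifier in d.modifiers for d in self.detections)


step = StepResult(
    step_id="1.A.1",
    technique="Command-Line Interface",
    detections=[
        Detection(DetectionType.TELEMETRY),
        Detection(DetectionType.SPECIFIC_BEHAVIOR, frozenset({Modifier.TAINTED})),
    ],
)
print(sorted(t.value for t in step.detection_types()))  # ['Specific Behavior', 'Telemetry']
print(step.has_modifier(Modifier.DELAYED))              # False
```

Keeping modifiers on each detection, rather than on the step, mirrors the evaluation's approach: one step can have a plain telemetry entry alongside a tainted behavior alert.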

Modifier Detection Type: Delayed

In this instance, a detection was triggered a notable amount of time (e.g., hours) after the initial analysis rather than in real time, due to an external input (e.g., a vendor services team) or a lack of capability.

Essentially, this means either the product needs to do time-consuming processing or other services are needed. Real-time, product-driven detections are always superior, and delayed detection is always a bad thing.

Modifier Detection Type: Tainted

In this case, a detection was triggered by association, which typically means its parent process was already deemed malicious. This is usually good for detections that would be too noisy on their own.

Essentially, the tainted modifier denotes that the process in question is highly suspicious or malicious due to its association with a confirmed malicious process. If an event in and of itself is enough to raise a flag, tainting may not be necessary. However, if the event needs more context to decide whether something is bad, this is a powerful and useful thing for a product to be able to do. Tainted is always good, but not always necessary.
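
The idea of tainting by association can be sketched as propagation down a process tree: once a process is confirmed malicious, its descendants are treated as tainted, and a noisy event only raises an alert inside a tainted process. This is a simplified Python illustration under assumed names (`parent_of`, `tainted_processes`, `should_alert`) and made-up process IDs; actual products may also correlate by other relationships, such as injection or file writes.

```python
from collections import deque

# Hypothetical process tree: child PID -> parent PID.
parent_of = {
    101: 1,    # benign user shell
    202: 101,  # implant process, confirmed malicious
    303: 202,  # command interpreter spawned by the implant
    304: 303,  # discovery command spawned by the interpreter
    405: 101,  # unrelated benign child of the shell
}


def tainted_processes(parent_of, confirmed_malicious):
    """Return every process that is, or descends from, a confirmed-malicious one."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    tainted = set(confirmed_malicious)
    queue = deque(confirmed_malicious)
    while queue:  # breadth-first walk down the tree
        pid = queue.popleft()
        for child in children.get(pid, []):
            if child not in tainted:
                tainted.add(child)
                queue.append(child)
    return tainted


def should_alert(pid, event_is_noisy, tainted):
    """Alert on noisy events only when they occur in a tainted process."""
    return (not event_is_noisy) or pid in tainted


tainted = tainted_processes(parent_of, {202})
print(sorted(tainted))                                          # [202, 303, 304]
print(should_alert(304, event_is_noisy=True, tainted=tainted))  # True
print(should_alert(405, event_is_noisy=True, tainted=tainted))  # False
```

The same noisy event (say, a discovery command) alerts in process 304 but stays as telemetry in process 405, which is how tainting lets a product detect often-benign techniques without flooding analysts.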

Modifier Detection Type: Configuration Change

This type shows a detection was triggered thanks to a particular configuration change or additional API access that allows data not normally accessible to the end user to become available.

This is never ideal, but it is not always bad. It is not the end of the world, though it may be a red flag if there are many configuration changes.

How Does ATT&CK Differ from Other Frameworks?

ATT&CK is unique in the space and contrasts nicely with other frameworks because it is not solely a stand-alone list of Indicators of Compromise (IoCs) on which a vendor should alert. MITRE recognizes that real-world threats are constantly advancing, and legacy evaluation methods simply do not give buyers a clear understanding of how vendors will protect them. To give buyers the complete picture, MITRE maps events as well as stand-alone IoCs. By themselves, these events may appear benign. However, when correlated with other events, they give analysts the context needed to identify advanced persistent threats (APTs).

Threat Hunting and MITRE

This type of product evaluation is incredibly valuable from a threat hunting perspective. From a hunter's point of view, how a product handles events can be very context-dependent. It is nearly impossible to show a product's ability to assist in threat hunting with a mere score or chart. It is important to look at each type of event and know whether the product will provide telemetry, an alert, or nothing at all. You need to understand the details of the alert or telemetry to see whether they are enriched or tainted. Furthermore, you want to see what the alert looks like in the actual product. The MITRE evaluation provides all of this key information, so a threat hunter gets a true sense of what it would be like to use the product, in a way that a single number could never properly convey.

Conclusion: MITRE ATT&CK Product Evaluations

MITRE ATT&CK product evaluations are an exciting, useful resource for companies, government organizations, and threat hunters looking to identify the best vendor for their needs. Keep in mind that no single vendor can or should aim for 100% ATT&CK coverage. Some blocks in the ATT&CK matrix are completely irrelevant to many customers, while others are not relevant to EPP products. Moreover, every block can be tested in multiple ways. Learn more about MITRE ATT&CK and Cybereason.

Cybereason Team
About the Author


Cybereason is dedicated to partnering with Defenders to end attacks at the endpoint, in the cloud and across the entire enterprise ecosystem. Only the AI-driven Cybereason XDR Platform provides predictive prevention, detection and response that is undefeated against modern ransomware and advanced attack techniques. The Cybereason MalOp™ instantly delivers context-rich attack intelligence across every affected device, user and system with unparalleled speed and accuracy. Cybereason turns threat data into actionable decisions at the speed of business.
