Penetration Testing of A.I. Models

Penetration testing is a cornerstone of any mature security program: a well-understood practice supported by robust methodologies, tools, and frameworks. The tactical goals of these engagements typically revolve around identifying and exploiting vulnerabilities in technology, processes, and people to gain initial, elevated, and administrative access to the target environment. When executed well, the insights from penetration testing are invaluable and help the organization reduce IT-related risk.

Organizations are still discovering new ways in which Large Language Models (LLMs) and Machine Learning (ML) can create value for the business. Conversely, security practitioners are concerned with the unique and novel risks these solutions may bring to the organization. As such, the desire to expand penetration testing efforts to include these platforms is not surprising. However, this is not as straightforward as giving your testers the IP addresses of your AI stack during your next test. Thoroughly evaluating these platforms will require adjustments in approach, both for the organizations being evaluated and for the assessors.

Much of the attack surface to be tested for AI systems (e.g., cloud, network, system, and classic application-layer flaws) is well known and addressed by existing tools and methods. However, the models themselves may contain risks, as detailed in the OWASP Top Ten lists for LLMs (https://llmtop10.com/) and Machine Learning (https://mltop10.info/).

Unlike testing for legacy web application Top Ten flaws, where the impact of any adversarial action was ephemeral (e.g., SQL injection) or easily reversed (e.g., a stored XSS attack), this may not be the case when testing AI systems. The attacks submitted to the model during the penetration test could potentially influence long-term model behavior. While it is common to test web applications in production environments, for AI models that incorporate active feedback or other forms of post-training learning, where testing could lead to perturbations in responses, it may be best to perform penetration testing in a non-production environment.
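One practical way to check for the perturbations described above is to capture a baseline of the model's responses to a fixed probe set before the engagement and compare it against the same probes afterwards. The sketch below is a minimal illustration of that idea; `query_model` is a hypothetical stand-in for the target's inference API, and the comparison is only meaningful if decoding is deterministic (e.g., temperature 0, fixed seed).

```python
import hashlib
import json

def snapshot(query_model, prompts):
    """Capture the model's responses to a fixed set of probe prompts."""
    return {p: query_model(p) for p in prompts}

def baseline_digest(responses):
    """Hash the prompt->response map so drift becomes a one-line comparison."""
    canonical = json.dumps(responses, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical usage around an engagement:
#   before = snapshot(query_model, PROBES)   # captured pre-test
#   after  = snapshot(query_model, PROBES)   # captured post-test
#   if baseline_digest(before) != baseline_digest(after):
#       flag("model behavior changed during testing")
```

A digest mismatch does not tell you *what* changed, only that something did; keeping the raw `snapshot` output alongside the digest lets testers diff the individual responses when drift is detected.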

When testing against a non-production copy, checksum mechanisms can be used to verify that the production and test model versions are equivalent. Furthermore, several threat vectors in these lists deal specifically with the poisoning of training data to make the model generate malicious, false, or biased responses. If successful, such an attack would potentially impact other concurrent users of the environment and, having trained the model on such data, persist beyond the testing period. Lastly, there are hard dollar costs involved in training and operating these models. Accounting for the compute, storage, and transport costs of standing up test environments, or of retraining as part of recovering from a penetration test, will be a new consideration for most organizations.
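The checksum verification mentioned above can be as simple as hashing the model artifacts (weights, tokenizer files, configuration) in both environments and comparing the digests. A minimal sketch, assuming the artifacts are accessible as ordinary files:

```python
import hashlib
from pathlib import Path

def model_fingerprint(paths):
    """Return a single SHA-256 digest over a set of model artifact files.

    Files are hashed in sorted order, with each file's name mixed in,
    so the fingerprint does not depend on the order paths are supplied.
    """
    digest = hashlib.sha256()
    for p in sorted(Path(p) for p in paths):
        digest.update(p.name.encode())
        digest.update(p.read_bytes())
    return digest.hexdigest()

# Run against the artifact sets in each environment and compare:
#   assert model_fingerprint(prod_files) == model_fingerprint(test_files)
```

For large weight files, reading in chunks rather than with `read_bytes()` would keep memory use flat, but the comparison logic is the same.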

For penetration testers, the MITRE ATT&CK framework has long been a go-to resource for offensive security Tactics, Techniques, and Procedures (TTPs). With the attack surface expanding to AI platforms, MITRE has expanded its framework and created the Adversarial Threat Landscape for Artificial-Intelligence Systems, or "ATLAS", knowledge base (https://atlas.mitre.org/matrices/ATLAS). ATLAS, along with the OWASP lists, gives penetration testers a great place to start in understanding and assessing the unique attack surface presented by AI models.

The model's context will need to be considered both in the rules of engagement under which the test is performed and in judging model responses. Is the model public or private? Production or test? If access to training data is achieved, can poisoning attacks be conducted? If allowable, what tools and methods would be used to generate the malicious training data, and once trained, how is the effect of the attack demonstrated and documented? And how could we even evaluate some risk areas (for example, LLM09: Overreliance) as part of a technical test?

LLM and ML technologies have been evolving for many years and have only recently exploded to the forefront of most technology-related conversations. This makes the solutions seem like they have come out of nowhere to disrupt the status quo. From a security perspective, they are disruptive in that their adoption is outpacing the security team's ability to put as many technical controls in place as it might like. But the industry is making progress: there are a number of commercial and open-source tools to help evaluate the security posture of commonly deployed AI models, with more on the way. Regardless, we can rely on penetration testing to understand the areas of exposure these platforms introduce to our environments today. These tests may require a bit more preparation, transparency, and collaboration than before to evaluate all the potential areas of risk posed by AI models, especially as they become more complex and integrated into critical systems.

