Anyone holding out hope that AI can be made inherently secure are in for a disappointment, with Microsoft’s research team saying it is an impossible task.
AI has been both a blessing and a curse for cybersecurity. While it can be a useful tool for analyzing code and finding vulnerabilities, bad actors are already working overtime to use AI in ever-increasingly sophisticated attacks. Beyond direct cyberattacks, AI also poses everyday risks to data security and trade secrets, thanks to how AI consumes and indexes data.
To better understand the burgeoning field, a group of Microsoft researchers tackled the question of AI security, publishing their findings in a new paper. The research was done using Microsoft’s own AI models.
In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted.
The team focused on eight specific areas.
- Understand what the system can do and where it is applied
- You don’t have to compute gradients to break an AI system
- AI red teaming is not safety benchmarking
- Automation can help cover more of the risk landscape
- The human element of AI red teaming is crucial
- Responsible AI harms are pervasive but difficult to measure
- LLMs amplify existing security risks and introduce new ones
- The work of securing AI systems will never be complete
AI Will Never Be 100% Secure
When discussing the eighth point, the researchers highlighted the issues involved in securing AI systems.
Engineering and scientific breakthroughs are much needed and will certainly help mitigate the risks of powerful AI systems. However, the idea that it is possible to guarantee or “solve” AI safety through technical advances alone is unrealistic and overlooks the roles that can be played by economics, break-fix cycles, and regulation.
Ultimately, the researchers conclude that the key to AI security is raising the cost of attacking, using the three specific methods cited.
Economics of cybersecurity. A well-known epigram in cybersecurity is that “no system is completely foolproof” [2]. Even if a system is engineered to be as secure as possible, it will always be subject to the fallibility of humans and vulnerable to sufficiently well-resourced adversaries. Therefore, the goal of operational cybersecurity is to increase the cost required to successfully attack a system (ideally, well beyond the value that would be gained by the attacker) [2, 26]. Fundamental limitations of AI models give rise to similar cost-benefit tradeoffs in the context of AI alignment. For example, it has been demonstrated theoretically [50] and experimentally [9] that for any output which has a non-zero probability of being generated by an LLM, there exists a sufficiently long prompt that will elicit this response. Techniques like reinforcement learning from human feedback (RLHF) therefore make it more difficult, but by no means impossible, to jailbreak models. Currently, the cost of jailbreaking most models is low, which explains why real-world adversaries usually do not use expensive attacks to achieve their objectives.
Break-fix cycles. In the absence of safety and security guarantees, we need methods to develop AI systems that are as difficult to break as possible. One way to do this is using break-fix cycles, which perform multiple rounds of red teaming and mitigation until the system is robust to a wide range of attacks. We applied this approach to safety-align Microsoft’s Phi-3 language models and covered a wide variety of harms and scenarios [11]. Given that mitigations may also inadvertently introduce new risks, purple teaming methods that continually apply both offensive and defensive strategies [3] may be more effective at raising the cost of attacks than a single round of red teaming.
Policy and regulation. Finally, regulation can also raise the cost of an attack in multiple ways. For example, it can require organizations to adhere to stringent security practices, creating better defenses across the industry. Laws can also deter attackers by establishing clear consequences for engaging in illegal activities. Regulating the development and usage of AI is complicated, and governments around the world are deliberating on how to control these powerful technologies without stifling innovation. Even if it were possible to guarantee the adherence of an AI system to some agreed upon set of rules, those rules will inevitably change over time in response to shifting priorities.
The work of building safe and secure AI systems will never be complete. But by raising the cost of attacks, we believe that the prompt injections of today will eventually become the buffer overflows of the early 2000s – though not eliminated entirely, now largely mitigated through defense-in-depth measures and secure-first design.
Additional Findings
The study cites a number of issues involved in securing AI systems, not the least of which is the common scenario of integrating AI with legacy systems. Unfortunately, trying to marry the two often results in serious security issues.
The integration of generative AI models into a variety of applications has introduced novel attack vectors and shifted the security risk landscape. However, many discussions around GenAI security overlook existing vulnerabilities. As elaborated in Lesson 2, attacks that target end-to-end systems, rather than just underlying models, often work best in practice. We therefore encourage AI red teams to consider both existing (typically system-level) and novel (typically model-level) risks.
Existing security risks. Application security risks often stem from improper security engineering practices including outdated dependencies, improper error handling, lack of input/output sanitization, credentials in source, insecure packet encryption, etc. These vulnerabilities can have major consequences. For example, Weiss et al., [49] discovered a token-length side channel in GPT-4 and Microsoft Copilot that enabled an adversary to accurately reconstruct encrypted LLM responses and infer private user interactions. Notably, this attack did not exploit any weakness in the underlying AI model and could only be mitigated by more secure methods of data transmission. In case study #5, we provide an example of a well-known security vulnerability (SSRF) identified by one of our operations.
Model-level weaknesses. Of course, AI models also introduce new security vulnerabilities and have expanded the attack surface. For example, AI systems that use retrieval augmented generation (RAG) architectures are often susceptible to cross-prompt injection attacks (XPIA), which hide malicious instructions in documents, exploiting the fact that LLMs are trained to follow user instructions and struggle to distinguish among multiple inputs [13]. We have leveraged this attack in a variety of operations to alter model behavior and exfiltrate private data. Better defenses will likely rely on both system-level mitigations (e.g., input sanitization) and model-level improvements (e.g., instruction hierarchies [43]).
While techniques like these are helpful, it is important to remember that they can only mitigate, and not eliminate, security risk. Due to fundamental limitations of language models [50], one must assume that if an LLM is supplied with untrusted input, it will produce arbitrary output. When that input includes private information, one must also assume that the model will output private information. In the next lesson, we discuss how these limitations inform our thinking around how to develop AI systems that are as safe and secure as possible
Conclusion
The study’s conclusion is a fascinating look into the challenges involved in securing AI systems, and sets realistic expectations that organizations must account for.
Ultimately, AI is proving to be much like any other computer system, where it will become a never-ending battle of one-upmanship between security professionals and bad actors.
from WebProNews https://ift.tt/syk5UXM