We all want to leverage AI, but models are only as good as the data used to train them, and that training data often contains confidential information. How do you make an AI model perform effectively without exposing PII? It’s not only the initial training that exposes you to risk – models can drift or be poisoned over time based on the data they’re exposed to post-training. Much of the attention today goes to the models themselves and how they can become biased or hallucinate, but the burden falls on organizations to start by evaluating the initial training data to mitigate risk.

What happens when an AI tool is vulnerable and houses critical and sensitive data? In the intricate world of AI governance and security, recent events have highlighted a critical vulnerability in the widely used open-source AI framework, Ray. The ramifications of this vulnerability, CVE-2023-48022, dubbed “ShadowRay” by researchers, are profound, posing significant risks to thousands of companies across various sectors, including education, cryptocurrency, and medical and video analytics.

Jeff Sizemore, Truyo Chief Strategy Officer and cybersecurity expert, says, “There are a lot of tools on the market helping companies facilitate the usage of AI, utilized by data scientists and data engineers for projects of all sizes. You’re seeing a product that enables software stack builds with Python, but when you look at the technology they’re employing you begin to see gaps in the infrastructure that lead to vulnerabilities.”

Sizemore continues, “You’ll need the same level of security in AI tools and processes as you do in all other technical aspects of your organization. These are serious issues every company needs to examine. Look at the people in your company using these tools and emphasize that safety and security must be top of mind. Any tool accessing an AI model, personal data, or users must have robust authentication. Otherwise, it should not be a part of any enterprise company’s stack.”
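
To make that last point concrete, here is a minimal sketch of the kind of bearer-token check that should sit in front of any model-serving endpoint. The endpoint, token name, and model call are hypothetical illustrations (using FastAPI for brevity), not a prescribed implementation; the point is simply that anonymous callers never reach the model.

```python
# Minimal sketch of token-gated model access (hypothetical names throughout).
# Assumes FastAPI is installed; the model call is a stand-in for a real one.
import os
import secrets

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_TOKEN = os.environ["MODEL_API_TOKEN"]  # provisioned out of band

def require_token(authorization: str = Header(default="")) -> None:
    # Expect "Bearer <token>"; compare in constant time to avoid timing leaks.
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or not secrets.compare_digest(token, API_TOKEN):
        raise HTTPException(status_code=401, detail="invalid or missing token")

@app.post("/predict", dependencies=[Depends(require_token)])
def predict(payload: dict) -> dict:
    # Placeholder for the actual model invocation.
    return {"prediction": "..."}
```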

Let’s take a look at the issues posed by this particular open-source AI framework.

The Stakes of Exploiting AI Workloads

The vulnerability allowed attackers to exploit Ray’s computing power, potentially compromising sensitive data and even gaining remote code execution capabilities. What made it particularly alarming is that it remained disputed and unpatched: Ray’s maintainers considered the missing authentication intended behavior for trusted environments, so no fix shipped, and countless development teams remained unaware of the threat it posed to their AI infrastructure.
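
To see why the flaw is so severe, consider Ray’s job-submission API, which shipped without authentication. The sketch below uses Ray’s actual JobSubmissionClient against a hypothetical cluster address to show that anyone who can reach an exposed dashboard port can run arbitrary commands on the cluster, which is exactly the exposure ShadowRay describes.

```python
# Illustration of the ShadowRay exposure: Ray's job API accepts work from
# anyone who can reach the dashboard port (8265 by default). The address
# below is hypothetical; the client and call are Ray's real API.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://198.51.100.7:8265")  # an exposed cluster

# The entrypoint is an arbitrary shell command executed on the cluster;
# with no authentication, this is remote code execution in one call.
job_id = client.submit_job(entrypoint="echo 'this runs on the cluster'")
print(job_id)
```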

The exploitation of CVE-2023-48022 underscores the pressing need for robust AI governance practices. Attackers targeting AI workloads can compromise the integrity and accuracy of AI models, steal valuable datasets, and infect models during the training phase. That jeopardizes the confidentiality and integrity of sensitive data and undermines the reliability of AI-driven systems.

Navigating the Complexity of AI Deployments

One of the key challenges in mitigating such threats lies in the dynamic nature of AI workloads and the myriad software frameworks used in their deployment. As organizations race to leverage AI for diverse use cases, security and cloud operations teams find themselves grappling with the complexity of monitoring and securing these deployments effectively. Large tech companies are objecting to requirements to delete data all the way back to the training set, not on philosophical grounds but because data cannot practically be removed from a model once it has been trained on it. That makes it all the more important to protect the data in the first place, when it’s leveraged for training.

Testing for Model Drift and Scrambling Data

In light of these challenges, it becomes imperative for organizations to adopt proactive measures to safeguard their AI infrastructure. Regular testing for model drift is essential to ensure the continued accuracy and integrity of AI models. Additionally, scrambling data before training mitigates the risk of data exfiltration, thereby reducing the attractiveness of AI workloads as a target for attackers.
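
One way to implement that scrambling step, sketched below with hypothetical column names, is to replace direct identifiers with salted, keyed hashes before the data ever reaches a training pipeline. Records remain consistent and joinable for training purposes, but a stolen dataset no longer contains raw PII. This is one illustrative approach, not a description of any specific product’s mechanism.

```python
# Minimal sketch of pre-training data scrambling (hypothetical field names).
# Direct identifiers are replaced with keyed hashes so leaked training data
# does not expose raw PII; the secret key lives outside the dataset.
import hashlib
import hmac
import os

SCRAMBLE_KEY = os.environ["SCRAMBLE_KEY"].encode()  # provisioned separately
PII_FIELDS = {"email", "name", "phone"}             # assumed schema

def scramble(value: str) -> str:
    # HMAC-SHA256 keeps values consistent across rows (joins still work)
    # without being reversible by anyone who lacks the key.
    return hmac.new(SCRAMBLE_KEY, value.encode(), hashlib.sha256).hexdigest()

def scramble_record(record: dict) -> dict:
    return {k: scramble(v) if k in PII_FIELDS else v for k, v in record.items()}

# The training pipeline only ever sees the scrambled form.
print(scramble_record({"email": "jane@example.com", "age": 41}))
```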

However, addressing vulnerabilities like CVE-2023-48022 requires collaboration and vigilance across the AI ecosystem. Developers, maintainers, and security researchers must work together to identify and address vulnerabilities promptly. Transparent communication channels and clear documentation are essential to ensure that development teams are aware of potential threats and can take appropriate measures to mitigate risks.

When asked how organizations should identify vulnerabilities, Jeff Sizemore says, “You absolutely must test models continuously for drift and hallucinations, but you also have to make sure the data isn’t poisoned. If you’re not controlling the process by which you build these applications, you can essentially leave yourself unprotected. How can you trust the output? How can you trust the input? The security mechanisms and software out there are insufficient, and it’s up to the organization to perform the proper due diligence.”
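
A basic version of the continuous drift testing Sizemore describes can be as simple as comparing the distribution of a model’s recent scores (or input features) against a baseline window. The sketch below uses a two-sample Kolmogorov–Smirnov test from SciPy; the threshold, window sizes, and data are assumptions to tune per model, not a universal recipe.

```python
# Minimal drift check: compare a live window of scores against a baseline
# with a two-sample Kolmogorov-Smirnov test. The alpha threshold is an
# assumption; tune it per model and monitoring cadence.
import numpy as np
from scipy.stats import ks_2samp

def drifted(baseline: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    # A small p-value means the samples are unlikely to share a distribution.
    statistic, p_value = ks_2samp(baseline, recent)
    return p_value < alpha

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5_000)  # scores captured at deployment time
recent = rng.normal(0.4, 1.0, 1_000)    # scores from the latest serving window

if drifted(baseline, recent):
    print("Model drift detected; trigger review or retraining.")
```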

Best Practices for Isolation and Cloud Configuration

Furthermore, organizations must focus on isolating AI workloads and securing cloud configurations to prevent unauthorized access and lateral movement within their environments. This includes implementing strict access controls, monitoring for anomalous behavior, and regularly auditing configurations to identify and remediate potential security gaps.
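
A first pass at that kind of audit can be automated. The sketch below, with a hypothetical host inventory, simply checks whether machines answer on Ray’s default dashboard port (8265) from a network location that should not have access; any hit means the workload is not properly isolated.

```python
# Quick exposure audit: flag hosts that accept connections on Ray's default
# dashboard/job-API port (8265) from a network that should be blocked.
# The host list is a hypothetical inventory; run from outside the trusted segment.
import socket

HOSTS = ["10.0.1.15", "10.0.1.16"]  # assumed inventory of AI workload nodes
RAY_DASHBOARD_PORT = 8265

for host in HOSTS:
    try:
        with socket.create_connection((host, RAY_DASHBOARD_PORT), timeout=2):
            print(f"EXPOSED: {host}:{RAY_DASHBOARD_PORT} is reachable; isolate or firewall it")
    except OSError:
        print(f"ok: {host}:{RAY_DASHBOARD_PORT} not reachable")
```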

Ultimately, the exploitation of vulnerabilities like CVE-2023-48022 serves as a sobering reminder of the evolving threat landscape facing AI-driven systems. By prioritizing AI governance and implementing robust security measures such as data scrambling, organizations can mitigate risks and safeguard the integrity and confidentiality of their AI infrastructure. In an era defined by rapid technological advancement, proactive measures are crucial to ensuring the responsible and secure deployment of AI technologies.

To learn more about Truyo’s AI Governance Platform and data scrambling capability to protect sensitive data, reach out to hello@truyo.com.

About Ale Johnson

Ale Johnson is the Director of Marketing at Truyo.