Skip to main content

Songs on the Security of Networks
a blog by Michał "rysiek" Woźniak

AI will compromise your cybersecurity posture

Yes, “AI” will compromise your information security posture. No, not through some mythical self-aware galaxy-brain entity magically cracking your passwords in seconds or “autonomously” exploiting new vulnerabilities.

It’s way more mundane.

When immensely complex, poorly-understood systems get hurriedly integrated into your toolset and workflow, or deployed in your infrastructure, what inevitably follows is leaks, compromises, downtime, and a whole lot of grief.

Complexity means cost and risk

LLM-based systems are insanely complex, both on the conceptual level, and on the implementation level. Complexity has real cost and introduces very real risk. These costs and these risks are enormous, poorly understood – and usually just hand-waved away. As Suha Hussain puts it in a video I’ll discuss a bit later:

Machine learning is not a quick add-on, but something that will fundamentally change your system security posture.

The amount of risk companies and organizations take on by using, integrating, or implementing LLM-based – or more broadly, machine learning-based – systems is massive. And they have to eat all of that risk themselves: suppliers of these systems simply refuse to take any real responsibility for the tools they provide and problems they cause.

After all, taking responsibility is bad for the hype. And the hype is what makes the line go up.

The Hype

An important part of pushing that hype is inflating expectations and generating fear of missing out, one way or another. What better way to generate it than by using actual fear?

What if spicy autocomplete is in fact all that it is cracked up to be, and more? What if some kid somewhere with access to some AI-chatbot can break all your passwords or automagically exploit vulnerabilities, and just waltz into your internal systems? What if some AI agent can indeed “autonomously” break through your defenses and wreak havoc on your internal infrastructure?

You can’t prove that’s not the case! And your data and cybersecurity is on the line! Be afraid! Buy our “AI”-based security thingamajig to protect yourself!..

It doesn’t matter if you do actually buy that product, by the way. What matters is that investors believe you might. This whole theater is not for you, it’s for VCs, angel investors, and whoever has spare cash to buy some stock. The hype itself is the product.

Allow me to demonstrate what I mean by this.

Over two years ago “AI” supposedly could crack our passwords “in seconds”. Spoiler: it couldn’t, and today our passwords are no worse for wear.

The source of a sudden deluge of breathless headlines about AI-cracked passwords – and boy were there quite a few! – was a website of a particular project called “PassGAN”. It had it all: scary charts, scary statistics, scary design, and social media integrations to generate scary buzz.

What it lacked was technical details. What hardware and infrastructure was used to crack “51% popular passwords in seconds”? The difference between doing that on a single laptop GPU versus running it on a large compute cluster is pretty relevant. What does “cracking” a password actually mean here – presumably reversing a hash? What hashing function, then, was used to hash them in the first place? How does it compare against John the Ripper and other non-“AI” tools that had been out there for ages? And so on.

Dan Goodin of Ars Technica did a fantastic teardown of PassGAN. The long and short of it is:

As with so many things involving AI, the claims are served with a generous portion of smoke and mirrors. PassGAN, as the tool is dubbed, performs no better than more conventional cracking methods. In short, anything PassGAN can do, these more tried and true tools do as well or better.

If anyone was actually trying to crack any passwords, PassGAN was not a tool they’d use, simply because it wasn’t actually effective. In no way was PassGAN a real threat to your information security.

Exploiting “87% of one-day vulnerabilities”

Another example: over a year ago GPT-4 was supposedly able to “autonomously” exploit one-day vulnerabilities just based on CVEs. Specifically, 87% of them.

Even more specifically, that’s 87% of exactly 15 (yes, fifteen) vulnerabilities, hand-picked by the researchers for that study. For those keeping score at home, that comes out to thirteen “exploited” vulnerabilities. And even that only when the CVE included example exploit code.

In other words, code regurgitation machine was able to regurgitate code when example code was provided to it. Again, in no way is this an actual, real threat to you, your infrastructure, or your data.

“AI-orchestrated” cyberattack

A fresh example of generating hype through inflated claims and fear comes from Anthropic. The company behind an LLM-based programming-focused chatbot Claude pumps the hype by claiming their chatbot was used in a “first reported AI-orchestrated cyber-espionage campaign”.

Anthropic – who has vested interest in convincing everyone that their coding automation product is the next best thing since sliced bread – makes pretty bombastic claims, using sciencey-sounding language; for example:

Overall, the threat actor was able to use AI to perform 80-90% of the campaign, with human intervention required only sporadically (perhaps 4-6 critical decision points per hacking campaign). (…) At the peak of its attack, the AI made thousands of requests, often multiple per second—an attack speed that would have been, for human hackers, simply impossible to match.

Thing is, that just describes automation. That’s what computers were invented for.

A small script, say in Bash or Python, that repeats certain tedious actions during an attack (for example, generates a list of API endpoints based on a pattern to try a known exploit against) can easily “perform 80-90%” of a campaign that employs it. It can make “thousands of requests, often multiple per second” with curl and a for loop. And “4-6 critical decision points” can just as easily mean a few simple questions asked by that script, for instance: what API endpoint to hit when a given target does not seem to expose the attacked service on the expected one.

And while LLM chatbots somewhat expand the scope of what can be automated, so did scripting languages and other decidedly non-magic technologies at the time they were introduced. Anyone making a huge deal out of a cyberattack being “orchestrated” using Bash or Python would be treated like a clown, and so should Anthropic for making grandiose claims just because somebody actually managed to use Claude for something.

There is, however, one very important point that Anthropic buries in their write-up:

At this point [the attackers] had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.

The real story here is not that an LLM-based chatbot is somehow “orchestrating” a cyber-espionage campaign by itself. The real story is that a tech company, whose valuation is at around $180 billion-with-a-b, put out a product – “extensively trained to avoid harmful behaviors” – that is so hilariously unsafe that its guardrails can be subverted by a tactic a 13-year-old uses when they want to prank-call someone.

And that Anthropic refuses to take responsibility for that unsafe product.

Consider this: if Anthropic actually believed their own hype about Claude being so extremely powerful, dangerous, and able to autonomously “orchestrate” attacks, they should be terrified about how trivial it is to subvert it, and would take it offline until they fix that. I am not holding my breath, though.

The boring reality

The way to secure your infrastructure and data remains the same regardless of whether a given attack is automated using Bash, Python, or an LLM chatbot: solid threat modelling, good security engineering, regular updates, backups, training, and so on. If there is nothing that can be exploited, no amount of automation will make it exploitable.

The way “AI” is going to compromise your cybersecurity is not through some magical autonomous exploitation by a singularity from the outside, but by being the poorly engineered, shoddily integrated, exploitable weak point you would not have otherwise had on the inside. In a word, it will largely be self-inflicted.

Leaks

Already in mid-2023 Samsung internally banned the use of generative AI tools after what was described as a leak, and boiled down to Samsung employees pasting sensitive code into ChatGPT.

What Samsung understood two and a half years ago, and what most people seem to not understand still today, is that pasting anything into the chatbot prompt window means giving it to the company running that chatbot.

And these companies are very data-hungry. They also tend to be incompetent.

Once you provide any data, it is out of your control. The company running the chatbot might train their models on it – which in turn might surface it to someone else at some other time. Or they might just catastrophically misconfigure their own infrastructure and leave your prompts – say, containing sexual fantasies or trade secrets – exposed to anyone on the Internet, and indexable by search engines.

And when that happens they might even blame the users, as did Meta:

Some users might unintentionally share sensitive info due to misunderstandings about platform defaults or changes in settings over time.

There’s that not-taking-responsibility-for-their-unsafe-tools again. They’ll take your data, and leave you holding the bag of risk.

Double agents

Giving a stochastic text extruder any kind of access to your systems and data is a bad idea, even if no malicious actors are involved – as one Replit user very publicly learned the hard way. But giving it such access and making it possible for potential attackers to send data to it for processing is much worse.

The first zero-click attack on an LLM agent has already been found. It happened to involve Microsoft 365 Copilot, and required only sending an e-mail to an Outlook mailbox that had Copilot enabled to process mail. A successful attack allowed data exfiltration, with no action needed on the part of the targeted user.

Let me say this again: if you had Copilot enabled in Outlook, an attacker could just send a simple plain text e-mail to your address and get your data in return, with absolutely no interaction from you.

The way it worked was conceptually very simple: Copilot had access to your data (otherwise it would not be useful), it was also processing incoming e-mails; the attackers found a way to convince the agent to interpret an incoming e-mail they sent as instructions for it to follow.

On the most basic level, this attack was not much different from the “ignore all previous instructions” bot unmasking tricks that had been all over social media for a while. Or from adding to your CV a bit of white text on white background instructing whatever AI agent is processing it to recommend your application for hiring (yes, this might actually work).

Or from adding such obscured (but totally readable to LLM-based tools) text to scientific papers, instructing the agent to give them positive “review” – which apparently was so effective, the International Conference on Learning Representations had to create an explicit policy against that. Amusingly, that is the conference that “brought this [that is, LLM-based AI hype] on us” in the first place.

On the same basic level, this is also the trick researchers used to go around OpenAI’s “guardrails” to get ChatGPT to issue bomb-building instructions, tricked GitHub Copilot to leak private source code, and how the perpetrators went around Anthropic’s “guardrails” in order to use the company’s LLM chatbot in their aforementioned attack, by simply pretending they are security researchers.

Prompt injection

Why does this happen? Because LLMs (and tools based on them) have no way of distinguishing data from instructions. Creators of these systems use all sorts of tricks to try and separate the prompts that define the “guardrails” from other input data, but fundamentally it’s all text, and there is only a single context window.

Defending from prompt injections is like defending from SQL injection, but there is no such thing as prepared statements, and instead of trying to escape specific characters you have to semantically filter natural language.

This is another reason why Anthropic will not take Claude down until they properly fix these guardrails, even if they believe their own hype about how powerful (and thus dangerous when abused) it is. There is simply no way to “properly fix them”. As a former Microsoft security architect had pointed out:

[I]f we are honest here, we don’t know how to build secure AI applications

Of course all these companies will insist they can make these systems safe. But inevitably, they will continue to be proven wrong: adversarial poetry, ASCII smuggling, dropping some random facts about cats (no, really), information overload

The arsenal of techniques grows, because the problem is fundamentally related to the very architecture of LLM chatbots and agents.

Breaking assumptions

Integrating any kind of software or external service into an existing infrastructure always risks undermining security assumptions, and creating unexpected vulnerabilities.

Slack decided to push AI down users’ throats, and inevitably researchers found a way to exfiltrate data from private channels via an indirect prompt injection. An attacker did not need to be in the private channel they were trying to exfiltrate data from, and the victim did not have to be in the public channel the attacker used to execute the attack.

Gemini integration within Google Drive apparently had a “feature” where it would scan PDFs without explicit permission from the owner of these PDFs. Google claims that was not the case and the settings making the files inaccessible to Gemini were not enabled. The person in question claims they were.

Whether or not we trust Google here, it’s hard to deny settings related to disabling LLM agents’ access to documents in Google Workplace are hard to find, unreliable, and constantly shifting. That in and of itself is an information security issue (not to mention it being a compliance issue as well). And Google’s interface decisions are to blame for this confusion. This alone undermines your cybersecurity stance, if you happen to be stuck with Google’s office productivity suite.

Microsoft had it’s own, way better documented problem, where a user who did not have access to a particular file in SharePoint could just ask Copilot to provide them with its contents. Completely ignoring access controls.

You might think you can defend from that just by making certain files private, or (in larger organizations) unavailable to certain users. But as the Gemini example above shows, it might not be as simple because relevant settings might be confusing or hidden.

Or… they might just not work at all.

Bugs. So many bugs.

Microsoft made it possible to set a policy (NoUsersCanAccessAgent) in Microsoft 365 that would disable LLM agents (plural, there are dozens of them) for specific users. Unfortunately it seems to have been implemented with the level of competence and attention to detail we have grown to expect from the company – which is to say, it did not work:

Shortly after the May 2025 rollout of 107 Copilot Agents in Microsoft 365 tenants, security specialists discovered that the “Data Access” restriction meant to block agent availability is being ignored.

(…)

Despite administrators configuring the Copilot Agent Access Policy to disable user access, certain Microsoft-published and third-party agents remain readily installable, potentially exposing sensitive corporate data and workflows to unauthorized use.

This, of course, underlines the importance of an audit trail. Even if access controls were ignored, and even when agents turned out to be available to users whom they should not be available to, at least there are logs that can be used to investigate any unauthorized access, right? After all, these are serious tools, built by serious companies and used by serious institutions (banks, governments, and the like). Legal compliance is key in a lot of such places, and compliance requires auditability.

It would be pretty bad if it was possible for a malicious insider, who used these agents to access something they shouldn’t have, to simply ask for that fact not to be included in the audit log. Which, of course, turned out to be exactly the case:

On July 4th, I came across a problem in M365 Copilot: Sometimes it would access a file and return the information, but the audit log would not reflect that. Upon testing further, I discovered that I could simply ask Copilot to behave in that manner, and it would. That made it possible to access a file without leaving a trace.

In June 2024 Microsoft’s president, Brad Smith, promised in US Congress that security will be the top priority, “more important even than the company’s work on artificial intelligence.”

No wonder, then, that the company treated this as an important vulnerability. So important, in fact, that it decided not to inform anyone about it, even after the problem got fixed. If you work in compliance and your company uses Microsoft 365, I cannot imagine how thrilled you must be about that! Can you trust your audit logs from the last year or two? Who knows!

Code quality

Even if you are not giving these LLMs access to any of your data and just use them to generate some code, if you’re planning to use that code anywhere near a production system, you should probably think twice:

Businesses using artificial intelligence to generate code are experiencing downtime and security issues. The team at Sonar, a provider of code quality and security products, has heard first-hand stories of consistent outages at even major financial institutions where the developers responsible for the code blame the AI.

This is probably a good time for a reminder that availability is also a part of what information security is about.

But it gets worse. It will come as no surprise to anyone at this stage that LLM chatbots “hallucinate”. Consider what might happen if somewhere in thousands of lines of AI-generated code there is a “hallucinated” dependency? That seems to happen quite often:

“[R]esearchers (…) found that AI models hallucinated software package names at surprisingly high rates of frequency and repetitiveness – with Gemini, the AI service from Google, referencing at least one hallucinated package in response to nearly two-thirds of all prompts issued by the researchers.”

The code referencing a hallucinated dependency might of course not run; but that’s the less-bad scenario. You see, those “hallucinated” dependency names are predictable. What if an attacker creates a malicious package with such a name and pushes it out to a public package repository?

“[T]he researchers also uploaded a “dummy” package with one of the hallucinated names to a public repository and found that it was downloaded more than 30,000 times in a matter of weeks.”

Congratulations, you just got slopsquatted.

Roll your own?

If you are not interested in using the clumsily integrated, inherently prompt-injectable Big Tech LLMs, and instead you’re thinking of rolling your own more specialized machine learning model for some reason, you’re not in the clear either.

I quoted Suha Hussain at the beginning of this piece. Her work on vulnerability of machine learning pipelines is as important as it is chilling. If you’re thinking of training your own models, her 2024 talk on incubated machine learning exploits is a must-see:

Machine learning (ML) pipelines are vulnerable to model backdoors that compromise the integrity of the underlying system. Although many backdoor attacks limit the attack surface to the model, ML models are not standalone objects. Instead, they are artifacts built using a wide range of tools and embedded into pipelines with many interacting components.

In this talk, we introduce incubated ML exploits in which attackers inject model backdoors into ML pipelines using input-handling bugs in ML tools. Using a language-theoretic security (LangSec) framework, we systematically exploited ML model serialization bugs in popular tools to construct backdoors.

Danger ahead

In a way, people and companies fear-hyping generative AI are right that their chatbots and related tools pose a clear and present danger to your cybersecurity. But instead of being some nebulous, omnipotent malicious entities, they are dangerous because of their complexity, the recklessness with which they are promoted, and the break-neck speed at which they are being integrated into existing systems and workflows without proper threat modelling, testing, and security analysis.

If you are considering implementing or using any such tool, consider carefully the cost and risk associated with that decision. And if you’re worried about “AI-powered” attacks, don’t – and focus on the fundamentals instead.