Electrical engineer Gilbert Herrera was appointed research director of the US National Security Agency in late 2021, just as an AI revolution was brewing inside the US tech industry.
The NSA, sometimes jokingly said to stand for No Such Agency, has long hired top math and computer science talent. Its technical leaders have been early and avid users of advanced computing and AI. And yet when Herrera spoke with me by phone from NSA headquarters in Fort Meade, Maryland, about the implications of the latest AI boom, it seemed that, like many others, the agency had been stunned by the recent success of the large language models behind ChatGPT and other hit AI products. The conversation has been lightly edited for clarity and length.
How big of a surprise was the ChatGPT moment to the NSA?
Oh, I thought your first question was going to be “what did the NSA learn from the Ark of the Covenant?” That’s been a recurring one since about 1939. I’d love to tell you, but I can’t.
What I think everybody learned from the ChatGPT moment is that if you throw enough data and enough computing resources at AI, these emergent properties appear.
The NSA really views artificial intelligence as at the frontier of a long history of using automation to perform our missions with computing. AI has long been viewed as a way that we could operate smarter and faster and at scale. And so we’ve been involved in research leading to this moment for well over 20 years.
Large language models were around long before generative pretrained transformer (GPT) models. But this “ChatGPT moment,” once you could ask it to write a joke or engage it in a conversation, really differentiates it from other work that we and others have done.
The NSA and its counterparts among US allies have occasionally developed important technologies before anyone else but kept them secret, like public key cryptography in the 1970s. Did the same thing perhaps happen with large language models?
At the NSA we couldn’t have created these big transformer models, because we could not use the data. We cannot use US citizens’ data. Another thing is the budget. I listened to a podcast where someone shared a Microsoft earnings call, and they said they were spending $10 billion a quarter on platform costs. [The total US intelligence budget in 2023 was $100 billion.]
It really has to be people that have enough money for capital investment that is tens of billions and [who] have access to the kind of data that can produce these emergent properties. And so it really is the hyperscalers [largest cloud companies] and potentially governments that don’t care about personal privacy, don’t have to follow personal privacy laws, and don’t have an issue with stealing data. And I’ll leave it to your imagination as to who that may be.
Doesn’t that put the NSA—and the United States—at a disadvantage in intelligence gathering and processing?
I’ll push back a little bit: It doesn’t put us at a big disadvantage. We kind of need to work around it, and I’ll come to that.
It’s not a huge disadvantage for our responsibility, which is dealing with nation-state targets. If you look at other applications, it may make it more difficult for some of our colleagues that deal with domestic intelligence. But the intelligence community is going to need to find a path to using commercial language models and respecting privacy and personal liberties. [The NSA is prohibited from collecting domestic intelligence, although multiple whistleblowers have warned that it does scoop up US data.]
How could commercially available large language models be useful to the NSA?
One of the things that these large models have demonstrated they are pretty good at is reverse engineering and automating cyber defenses. And those things can be accomplished without being overly constrained when it comes to laws related to personal privacy [since it could be trained on software code that isn’t as sensitive].
Let’s say that we wanted to create an analyst “copilot,” something that uses a GPT-type thing to help an analyst analyze data. If we wanted to do that, then we’d need something with analytical skills in American culture and the English language, and that would be really hard for us to do, given the various laws [about accessing US data].
Hypothetically, we could use something like RAG [retrieval augmented generation, a technique in which a language model responds to a query by summarizing trusted information] so that an LLM looks only at data that has been through our compliance scrutiny.
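[The retrieval step described here can be sketched in a few lines of Python. This is a minimal illustration and not the NSA's method: the `Document` class, the `compliance_approved` flag, and the word-overlap ranking are all assumptions made for the example, and a real system would use a proper search index and pass the resulting prompt to a language model.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    compliance_approved: bool  # hypothetical flag set by a review process


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,;:!?\"'()").lower() for word in text.split()} - {""}


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k approved documents ranked by word overlap with the query."""
    query_terms = tokenize(query)
    # Unapproved documents are dropped before any ranking happens.
    approved = [doc for doc in corpus if doc.compliance_approved]
    scored = [(len(query_terms & tokenize(doc.text)), doc) for doc in approved]
    relevant = [(score, doc) for score, doc in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_prompt(query: str, context: list[Document]) -> str:
    """Assemble the text a language model would be asked to answer from."""
    sources = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in context)
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )


# Toy corpus: b1 is relevant but has not passed review, so it must not be used.
corpus = [
    Document("a1", "Malware sample uses a custom packer to hide its payload.", True),
    Document("a2", "Firewall logs show repeated scans of port 445.", True),
    Document("b1", "Unreviewed report about the malware payload and packer.", False),
]

query = "How does the malware hide its payload?"
hits = retrieve(query, corpus)
prompt = build_prompt(query, hits)
print(prompt)
```

[The point of the design is that filtering happens before retrieval, so text that has not cleared review never reaches the model's prompt, however relevant it is to the question.]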
How would the law complicate the development of language models at the NSA?
We might need to keep certain datasets that were used to train models for very long periods of time, and that raises data retention issues. The other issue is, imagine getting a lot of information and it was the entire internet. You might have US persons’ data in it and might have copyrighted data. But you don’t look at it [when feeding it to an AI model]. At what point do all the laws apply?
I think it will be difficult for the intelligence community to replicate something like GPT-10, because we already know the scale of investment they have. And they can do things with data that nobody in government would ever think of doing.
Does widespread use of AI create new security problems for the US?
On day one of the release of ChatGPT, there was evidence of improved phishing attacks. And if it improves their success rate from one in 100,000 to one in 10,000, that’s an order-of-magnitude improvement. Artificial intelligence is always going to favor people who don’t have to worry about quantifying margins and uncertainties in the usage of the product.
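[To make the arithmetic concrete, the short Python sketch below applies both success rates to a campaign of 1 million emails. The campaign size is a hypothetical figure chosen for illustration; only the two rates come from the interview.]

```python
emails_sent = 1_000_000  # hypothetical campaign size, not from the interview

# Expected successful phishes at each rate quoted above.
successes_before = emails_sent // 100_000  # one success per 100,000 emails
successes_after = emails_sent // 10_000    # one success per 10,000 emails

improvement = successes_after // successes_before

print(f"before: {successes_before}, after: {successes_after}, factor: {improvement}x")
```

[The same number of emails yields ten times as many victims, which is why a change that sounds small per message matters at scale.]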
Is AI opening a new frontier of information security then?
There are going to be huge new security threats. That’s one of the reasons why we formed an AI Security Center. There are a lot of things you can do to harm a model. You can steal models and engineer on them, and there are inversion attacks where you can try to steal some of the private data out of them.
The first line of defense in AI security is good cybersecurity. It means protecting your models, protecting the data that’s in there, protecting them from being stolen or manipulated.
The NSA Warns of Potential AI Advantage for US Adversaries through Private Data Mining
Introduction
In an increasingly interconnected world, data has become a valuable asset, and its potential for exploitation is a growing concern. As artificial intelligence (AI) continues to advance, the National Security Agency (NSA) has cautioned about the potential advantage that US adversaries could gain through private data mining. This article explores the implications of this warning and the need for robust safeguards to protect sensitive information.
The Rise of Artificial Intelligence
Artificial intelligence has revolutionized various industries, from healthcare and finance to transportation and entertainment. AI algorithms have the ability to process vast amounts of data, identify patterns, and make predictions with remarkable accuracy. This technology holds immense potential for enhancing efficiency and improving decision-making processes across different sectors.
However, the same AI capabilities that offer numerous benefits also raise concerns about privacy and security. The NSA has expressed apprehension that US adversaries could exploit private data mining using AI algorithms to gain an advantage in various domains, including national security.
Private Data Mining and National Security
Private data mining refers to the practice of extracting valuable insights from large volumes of personal data collected by private companies or government agencies. This data includes everything from social media posts and online purchases to medical records and financial transactions. By leveraging AI algorithms, adversaries can analyze this data to identify vulnerabilities and patterns, or even to predict future behaviors.
The NSA warns that such private data mining can provide adversaries with a significant advantage in areas like cyber warfare, espionage, and social engineering. For instance, by analyzing patterns in individuals’ online behavior, adversaries could identify potential targets for cyberattacks or manipulate public sentiment through targeted disinformation campaigns.
Protecting Sensitive Information
To counter this potential threat, it is crucial to implement robust safeguards to protect sensitive information from falling into the wrong hands. The NSA emphasizes the importance of collaboration between government agencies, private companies, and individuals to address this challenge effectively.
1. Strengthening Cybersecurity: Government agencies and private companies must invest in robust cybersecurity measures to protect data from unauthorized access. This includes implementing strong encryption protocols, regularly updating software, and conducting thorough security audits.
2. Enhanced Data Privacy Regulations: Governments should enact comprehensive data privacy regulations that ensure individuals have control over their personal information. Stricter regulations can limit the amount of data collected, increase transparency regarding data usage, and give individuals the right to consent to or opt out of data collection practices.
3. Ethical AI Development: AI algorithms should be developed with ethical considerations in mind. This involves ensuring fairness, transparency, and accountability in AI decision-making processes. Additionally, AI systems should be designed to minimize biases and prevent the misuse of personal data.
4. Public Awareness and Education: Raising public awareness about the potential risks associated with private data mining is crucial. Individuals should be educated about the importance of protecting their personal information and adopting secure online practices, such as using strong passwords, being cautious of phishing attempts, and regularly updating their privacy settings.
Conclusion
As AI technology continues to advance, the potential advantage that US adversaries could gain through private data mining poses a significant concern for national security. The NSA’s warning highlights the need for proactive measures to protect sensitive information and mitigate potential risks. By strengthening cybersecurity, enacting robust data privacy regulations, promoting ethical AI development, and raising public awareness, we can collectively safeguard our personal data and ensure a secure digital future.