Ethereum co-founder Vitalik Buterin claims it is a “bad idea” to use artificial intelligence (AI) for governance. In an X post on Saturday, Buterin wrote:
“If you use an AI to allocate funding for contributions, people WILL put a jailbreak plus “gimme all the money” in as many places as they can.”
Why AI governance is flawed
Buterin’s post was a response to Eito Miyamura, co-founder and CEO of EdisonWatch, an AI data governance platchorm who revealed a fatal flaw in ChatGPT. In a post on Friday, Miyamura wrote that the addition of full support for MCP (Model Context Protocol) tools on ChatGPT has made the AI agent susceptible to exploitation.
The update, which came into effect on Wednesday, allows ChatGPT to connect and read data from several apps, including Gmail, Calendar, and Notion.
Miyamura noted that with just an email address, the update has made it possible to “exfiltrate all your private information.” Miscreants can gain access to your data in three simple steps, Miyamura explained:
First, the attackers send a malicious calendar invite with a jailbreak prompt to the intended victim. A jailbreak prompt refers to code that allows an attacker to remove restrictions and gain administrative access.
Miyamura noted that the victim does not have to accept the attacker’s malicious invite for the data leak to take place.
The second step involves waiting for the intended victim to seek ChatGPT’s help to prepare for their day. Finally, once ChatGPT reads the jailbroken calendar invite, it gets compromised—the attacker can completely hijack the AI tool, make it search the victim’s private emails, and send the data to the attacker’s email.
Buterin’s alternative
Buterin suggests using the info finance approach to AI governance. The info finance approach consists of an open market where different developers can contribute their models. The market has a spot-check mechanism for such models, which can be triggered by anyone and evaluated by a human jury, Buterin wrote.
In a separate post, Buterin explained that the individual human jurors will be aided by large language models (LLMs).
According to Buterin, this type of ‘institution design’ approach is “inherently more robust.” This is because it offers model diversity in real time and creates incentives for both model developers and external speculators to police and correct for issues.
While many are excited at the prospect of having “AI as a governor,” Buterin warned:
“I think doing this is risky both for traditional AI safety reasons and for near-term “this will create a big value-destructive splat” reasons.”