The artificial intelligence space is strange. Significantly overfunded, overhyped and overcovered, in part because AI can easily produce bad, generic copywriting, which is how many journalists presently earn their livelihoods. Though AI tools have rapidly advanced over the past year, few look to be truly society-morphing, and so far it has either been fairly obvious when something is a product of AI, or it hasn’t mattered. Do you really care if a cliché-ridden cover letter was produced by an unimaginative human mind or a chatbot?
But Thursday’s announcement of OpenAI’s new video-creating tool, Sora, is something different. Just type your request into a text box — such as “historical footage of California during the gold rush” — and it will generate video that is almost indistinguishable from reality to the untrained eye. At the moment, Sora is locked down, only available to a small group at OpenAI, but other AI companies will not be far behind — and the ability for tools like this to ruin lives, reputations and potentially elections is immense.
In the recent past, AI footage has been inherently sloppy. The most famous example of this, from only ten months ago, was a horrific video of an AI Will Smith eating spaghetti. It was so sloppy because the software was generating an image of Will Smith and then trying to guess what the next frame would look like if that image were the start of a video. It’s the AI equivalent of bad stop-motion animation, and until recently this meant the best AI-generated footage stuttered, the physics were off (objects tended to bend as they turned, or as the camera turned around them) and elements were not consistent from frame to frame. If you requested footage of a woman walking down a sidewalk, the signs would flicker through various random word-like symbols, and her face and clothes would look markedly different from how “she” looked at the start of the video. With Sora, that’s all gone now.
Elements are consistent through the minute-long clips, as are the reflections and textures. The physics are lifelike: in “footage” of a cat waking up its sleeping owner, the blanket crumples in a fully realistic way. Though there are still some visual tics, like odd hands, stuttering frames and camera movement that just feels off, they are relatively minor. Even if you have an eye for what AI footage looks like, their best sample “footage” is staggeringly lifelike, whether it’s a Land Cruiser driving down a dirt path, a drone shot of Big Sur, a man reading on a cloud, a Chinese New Year celebration, or a close-up of a green chameleon or a Victoria crowned pigeon.
What’s particularly alarming is that the same intuitions we use to tell whether something is staged or faked no longer work. If you see shaky footage of an event, seemingly taken on a mobile phone as someone runs away, it looks a lot more credible than clean, Hollywood-esque images; but for new models like Sora, that rawer “authentic” style helps to conceal some of the detectable flaws of AI footage. If I hadn’t told you, you would assume this video of someone looking out the window of a Tokyo train was real. Instead, it was made with no effort by someone typing “reflections in the window of a train traveling through the Tokyo suburbs” into a text box. This video from LA isn’t fake (apparently), but it easily could be, and there’d be no way to tell.
As with every new technology, there are some obvious, acceptable economic and social consequences of this. Indie films are going to get a lot better looking, pornography will get more competition, video games will look even more realistic and there’s going to be a decreased need for real stock footage. OpenAI places heavy restrictions on the kind of content you can generate, but expect similar tools, with far fewer limitations, to be available within a matter of months. The detection systems meant to separate real footage from AI-generated footage are completely losing this arms race.
There are ways for social media feeds and AI companies to combat this. OpenAI and other AI companies could embed hidden markers in AI-produced images and videos, which would act much like the security holograms on cash and allow social media companies to instantly recognize and label AI-produced content for what it is. Tools such as PhotoDNA are already used to detect CSAM and other illegal content, so this isn’t impossible, but AI companies are spending far more time advancing their tools than learning how to restrain them. Silicon Valley is opening Pandora’s box, and no legislator, investor, or concerned member of the public can stop it.
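To make the idea concrete, here is a deliberately toy sketch of what such a hidden marker could look like: the generator stamps a known bit pattern into the least-significant bits of a frame’s pixels, and a platform checks for it at upload time. This is my own illustration, not how OpenAI, PhotoDNA or any real provenance system works; a production scheme would need to be cryptographically signed and survive compression, cropping and re-encoding, which this naive version would not.

```python
import numpy as np

# Hypothetical marker bytes; a real system would use signed, tamper-evident metadata.
TAG_BITS = np.unpackbits(np.frombuffer(b"AI-GENERATED", dtype=np.uint8))

def embed_tag(frame: np.ndarray) -> np.ndarray:
    """Hide the marker in the least-significant bits of a frame's first pixels."""
    out = frame.copy()
    flat = out.reshape(-1)
    flat[: TAG_BITS.size] = (flat[: TAG_BITS.size] & 0xFE) | TAG_BITS
    return out

def has_tag(frame: np.ndarray) -> bool:
    """Check whether the marker bits are present, e.g. when a video is uploaded."""
    return bool(np.array_equal(frame.reshape(-1)[: TAG_BITS.size] & 1, TAG_BITS))

# A platform could run has_tag() on uploaded frames and auto-label any matches.
frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)  # stand-in video frame
print(has_tag(embed_tag(frame)))  # True
print(has_tag(frame))             # almost certainly False
```

The fragility of this toy version is exactly the hard part: a single re-encode wipes the marker out, which is one reason detection keeps losing the arms race.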
It’s only a matter of time before Facebook’s news feed circulates completely realistic footage of a politician being assassinated or a building being blown up, or before a couple breaks up because the boyfriend stumbles upon a sex tape featuring his girlfriend, not knowing that this completely lifelike footage was conjured out of nothing by AI, with her face taken from her Instagram. Eventually someone will wind up in a police station, being asked questions about crimes he didn’t commit, seen on CCTV footage that looks completely real, but isn’t.
This sounds like science-fiction fear-mongering, but these tools are getting so realistic, so quickly, that it’s not impossible for the “October Surprise” of this election to be footage that isn’t real.