Deepfakes and deep media: A new battleground for security



This article is part of a VB special issue. Read the full series here: AI and Security.

The number of deepfakes, media that use AI to swap someone else's likeness into an existing photo, audio clip, or video, is growing rapidly. This is worrying not only because such fakes can be used to sway opinion during an election or implicate someone in a crime, but also because they have already been abused to create fake pornography and to defraud the chief executive of a UK energy company.

Anticipating this new reality, a coalition of academic institutions, technology firms, and nonprofits is developing ways to spot misleading AI-generated media. Their work suggests that detection tools are only a short-term fix and that the deepfake arms race is just beginning.

Deepfake text


There was a time when the best AI-generated prose read more like Mad Libs than The Grapes of Wrath, but modern language models can now write text that approaches human writing in fluency and persuasiveness. For example, GPT-2, the model released by San Francisco research firm OpenAI, takes just seconds to generate passages in the style of New Yorker articles or brainstorming scripts. Researchers at the Middlebury Institute's Center on Terrorism, Extremism, and Counterterrorism have suggested that GPT-2 and other models like it could be tuned to advocate white supremacy, jihadist Islamism, and other threatening ideologies, which raises even greater concern.


Above: The front end of GPT-2, a trained language model from research firm OpenAI.
Image credit: OpenAI


In search of a system capable of detecting synthetic content, researchers at the University of Washington's Paul G. Allen School of Computer Science and Engineering and the Allen Institute for Artificial Intelligence developed Grover, an algorithm they claim picked out 92% of the deepfake texts in a test set drawn from the open Common Crawl corpus. The team attributes this success to Grover's approach of generating text itself, which they say familiarized the model with the quirks of AI-generated language. A toy example of the underlying detection task appears below.
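To make that task concrete, here is a deliberately simple sketch, far more basic than Grover itself: a bag-of-words classifier fitted to a handful of hypothetical articles labeled human-written or machine-generated. Grover uses a large neural generator as its own discriminator; the code below only illustrates the framing of the problem, and all of its example texts and labels are made up.

```python
# Toy illustration only, not Grover: a minimal "human vs. machine-generated"
# text classifier. The example texts and labels are hypothetical placeholders;
# a real experiment would use thousands of news articles and model generations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Local council approves budget after months of public hearings.",       # human
    "The senator said the vote would be delayed until next week.",          # human
    "Researchers reported the results in a peer-reviewed journal.",         # human
    "The council budget was approved, officials said, after hearings.",     # generated
    "A vote on the measure is expected, the senator was quoted as saying.", # generated
    "The results were reported by researchers in a journal on Tuesday.",    # generated
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = human-written, 1 = machine-generated

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Classify a new, unseen sentence (purely illustrative).
print(detector.predict(["The mayor announced a new transit plan on Tuesday."]))
```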

Separately, a team from Harvard and the MIT-IBM Watson AI Lab released the Giant Language Model Test Room, a web environment that tries to determine whether a piece of text was written by an AI model. Given a semantic context, it predicts which words are most likely to appear next in a sentence, essentially writing its own text. If the words in the sample under test fall within the 10, 100, or 1,000 most likely candidates, they are highlighted green, yellow, or red, respectively. In effect, the tool uses its own predicted text as a yardstick for identifying artificially generated content.
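The same ranking idea can be sketched with the openly available GPT-2 model and the Hugging Face transformers library. This is not the GLTR code itself, just a minimal illustration of the green/yellow/red bucketing described above; the example sentence and the extra "purple" bucket for very surprising words are assumptions of the sketch.

```python
# Minimal sketch of the GLTR idea: for each token in a passage, ask GPT-2 how
# highly it ranks that token given the preceding context, then bucket the rank
# into the bands described above.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str):
    """Return (token, rank) pairs: rank of each actual token under the model."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]  # shape: (seq_len, vocab)
    ranks = []
    for i in range(1, len(ids)):
        # Rank of the true next token among the whole vocabulary, given the
        # context up to position i-1 (rank 0 = the model's top prediction).
        order = torch.argsort(logits[i - 1], descending=True)
        rank = (order == ids[i]).nonzero(as_tuple=True)[0].item()
        ranks.append((tokenizer.decode(int(ids[i])), rank))
    return ranks

def bucket(rank: int) -> str:
    if rank < 10:
        return "green"    # top-10: very predictable, typical of model output
    if rank < 100:
        return "yellow"
    if rank < 1000:
        return "red"
    return "purple"       # highly surprising: more typical of human writing

for tok, r in token_ranks("The quick brown fox jumps over the lazy dog."):
    print(f"{tok!r:>12}  rank={r:<6} {bucket(r)}")
```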

Deepfake videos


Modern video-generating AI is just as dangerous as its natural language counterpart, and just as capable, if not more so. An academic paper published by Hong Kong-based startup SenseTime, Nanyang Technological University, and the Institute of Automation of the Chinese Academy of Sciences details a framework that edits footage using audio to synthesize realistic video. And researchers at Hyperconnect in Seoul recently developed MarioNETte, a tool that can manipulate the facial features of a historical figure, politician, or CEO by synthesizing a face animated by another person's movements.

However, even the most realistic deepfakes contain artifacts that give them away. "Deepfakes produced by generative systems learn from a set of real images in a video, to which new images are added, and then generate a new video containing those new images," says Ishai Rosenberg, head of the deep learning group at cybersecurity company Deep Instinct. "The resulting video ends up subtly different, because the distribution of the artificially generated data differs from the distribution of the data in the original video. These so-called 'glimpses into the matrix' are what deepfake detectors are able to pick out."


Above: Two deepfake videos created using state-of-the-art techniques.
Image credit: SenseTime


Last summer, a team from the University of California, Berkeley and the University of Southern California trained a model to look for precise "facial action units", data on facial movements, tics, and expressions, such as raising the upper lip or how the head rotates when a person frowns, to identify faked videos with more than 90% accuracy. Similarly, in August 2018 participants in the Media Forensics program of the US Defense Advanced Research Projects Agency (DARPA) tested systems capable of detecting AI-generated video from cues such as unnatural blinking, strange head movements, unusual eye color, and more. A rough sketch of how such per-video cues can feed a classifier follows.
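The sketch below is not the Berkeley/USC system; it simply assumes that facial action-unit intensities have already been extracted per frame (for example, the AU intensity columns that the open-source OpenFace toolkit writes to CSV) and trains a simple classifier on per-video summaries of those features. The file names, labels, and feature choices are hypothetical.

```python
# Illustrative sketch: per-video summaries of facial action-unit intensities
# fed to a simple real-vs-fake classifier. Assumes one OpenFace-style CSV per
# video with columns such as "AU01_r", "AU12_r", etc.
import numpy as np
import pandas as pd
from sklearn.svm import SVC

def video_features(csv_path: str) -> np.ndarray:
    """Mean and standard deviation of every action-unit intensity column."""
    df = pd.read_csv(csv_path)
    au_cols = [c for c in df.columns
               if c.strip().startswith("AU") and c.strip().endswith("_r")]
    au = df[au_cols].to_numpy()
    return np.concatenate([au.mean(axis=0), au.std(axis=0)])

# Hypothetical manifest of (per-video CSV, label) pairs, where 1 = deepfake.
manifest = [
    ("real_clip_01.csv", 0),
    ("fake_clip_01.csv", 1),
    # ...many more labeled videos would be needed in practice.
]

X = np.stack([video_features(path) for path, _ in manifest])
y = np.array([label for _, label in manifest])

clf = SVC(kernel="rbf").fit(X, y)
# A real evaluation would score the classifier on held-out videos; the paper
# described above reports better than 90% accuracy on its own test data.
```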

Several startups are now commercializing similar tools for detecting fake video. Amsterdam-based Deeptrace Labs offers a suite of monitoring tools aimed at classifying deepfakes uploaded to social networks, video hosting platforms, and disinformation networks. Dessa has proposed methods for improving deepfake detectors trained on sets of fake videos. And in July 2018, Truepic raised $8 million to fund its deepfake detection service for video and photos. In December 2018, the company acquired the startup Fourandsix, whose forged-image detector was licensed by DARPA.


Above: Deepfake images edited by AI.

In addition to developing fully trained systems, a number of companies have published corpora of data in the hope that the research community will develop new detection methods. To speed the process, Facebook, together with Amazon Web Services (AWS), the Partnership on AI, and academics from several universities, is leading the Deepfake Detection Challenge, which includes a set of video samples labeled to indicate which were manipulated with AI. In September 2019, Google released a collection of visual deepfakes as part of the FaceForensics benchmark created by the Technical University of Munich and the University of Naples Federico II. And most recently, researchers from SenseTime and Nanyang Technological University in Singapore developed DeeperForensics-1.0, a face forgery detection data set that they claim is the largest of its kind.

Deepfake audio


AI and machine learning are suited not only to synthesizing video and text; they can copy voices, too. Countless studies have shown that a small data set is all it takes to recreate a person's speech. Commercial systems such as Resemble and Lyrebird need just a few minutes of audio recordings, while sophisticated models, such as the latest implementation of Baidu's Deep Voice, can copy a voice from a sample only 3.7 seconds long.

Tools for detecting audio deepfakes are still scarce, but solutions are starting to appear.



A few months ago, the Resemble team released an open-source tool called Resemblyzer, which uses AI and machine learning to detect deepfakes by deriving high-level representations of voice samples and predicting whether they are real or generated. Given an audio file of speech, it creates a mathematical representation summarizing the characteristics of the recorded voice. This lets developers compare the similarity of two voices or work out who is speaking at any given moment.
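A minimal sketch of that comparison workflow is shown below, using the open-source resemblyzer package. The file names and the similarity threshold are illustrative assumptions, not part of the library.

```python
# Sketch of a voice-comparison workflow with resemblyzer (pip install resemblyzer).
import numpy as np
from pathlib import Path
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Embed a reference recording of the known speaker and a suspect clip.
# Both file names are placeholders for your own audio.
reference = encoder.embed_utterance(preprocess_wav(Path("known_speaker.wav")))
suspect = encoder.embed_utterance(preprocess_wav(Path("suspect_clip.wav")))

# The embeddings are unit-length vectors, so a dot product is cosine similarity.
similarity = float(np.dot(reference, suspect))
print(f"voice similarity: {similarity:.3f}")

# Illustrative threshold only: a low score suggests the suspect clip does not
# closely match the reference voice and deserves further scrutiny.
if similarity < 0.75:
    print("Suspect clip does not closely match the reference voice.")
```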

In January 2019, as part of the Google News Initiative, Google released a speech corpus containing "thousands" of phrases spoken by text-to-speech models. The samples were drawn from English articles read aloud by 68 different synthetic voices in a range of dialects. The corpus is available to all participants in ASVspoof 2019, a competition aimed at fostering countermeasures against spoofed speech.

Much to lose


None of these detectors has achieved perfect accuracy, and researchers have not yet figured out how to determine who authored a given fake. Deep Instinct's Rosenberg expects this to embolden bad actors intent on spreading deepfakes. "Even if a deepfake created by an attacker is detected, only the deepfake itself is at risk of exposure," he said. "For the actor, the risk of being caught is minimal, so there is little to deter the creation of fakes."

Rosenberg's view is backed by a Deeptrace report, which found 14,698 deepfake videos online during its most recent count in June and July 2019, an 84% increase over a seven-month period. The vast majority of them (96%) were pornographic videos featuring women.

Given these figures, Rosenberg argues that companies that "have a lot to lose" from deepfakes should develop and deploy deepfake detection technology in their products, which he likens to antivirus software. And there are signs of movement on this front: Facebook announced in early January that it would use a combination of automated and manual systems to detect fake content, and Twitter recently proposed flagging deepfakes and removing those likely to cause harm.

Of course, the technologies underlying deepfake generation are just tools, and they hold great potential for good. Michael Clauser, head of data and trust at consultancy Access Partnership, notes that the technology is already being used to improve medical diagnostics and cancer detection, to fill gaps in mapping the universe, and to improve the training of driverless vehicles. He therefore warns against blanket campaigns to block generative AI.

"As leaders begin to apply existing legal norms to deepfake cases, it is important not to sweep up valuable technologies in the effort to root out fakes," said Clauser. "Ultimately, the case law and social norms around the use of this new technology are not mature enough to draw bright red lines between fair use and abuse."
