11:33 am

OpenAI steps into realm of generative video tech with Sora

ChatGPT maker’s new AI tool can generate instant videos depicting ‘complex scenes with multiple characters and accurate details’ lasting up to a minute

[Source photo: Chetan Jha/Press Insider]

ChatGPT maker OpenAI on Thursday stepped into the domain of generative video technology by unveiling Sora, a tool that can instantly generate realistic videos from text prompts.

The new system can generate videos lasting up to a minute that depict “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” OpenAI said.

Sora will initially be available to “red teamers”, domain experts in areas such as misinformation, hateful content, and bias, who will assess critical areas for harms or risks, the company said.

“We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals,” it added.

OpenAI is “starting red-teaming and offering access to a limited number of creators,” chief executive Sam Altman said, terming the moment “remarkable”.

The company said it is also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.

To ensure the authenticity and integrity of content generated by Sora, “we plan to include C2PA (Coalition for Content Provenance and Authenticity) metadata in future if we deploy the model in an OpenAI product,” it added.

How does Sora work?

Explaining how the AI system works, OpenAI said, “Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps.”

A diffusion model is a type of generative technique used in machine learning to create new data instances that resemble the training data. It works by gradually transforming a random noise pattern into a coherent image or, in the case of Sora, a video.

The process begins with what is essentially a random, meaningless pattern that looks like static on a television screen. This noise does not contain any useful information or resemble the final video in any way.

Over many steps, the model systematically alters this initial noise pattern by using the input text prompt as a guide to shape the noise into a video that matches the described scene. Each step in the process reduces the randomness (noise) and introduces more specific features and details, guided by the patterns the model learned during its training.

As the model progresses through its steps, it gradually eliminates the randomness and replaces it with elements that make up the final video, such as characters, objects, and backgrounds that align with the input text.
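The step-by-step idea described above can be illustrated with a toy Python sketch. This is not OpenAI's code and makes no claim about how Sora is built: the `LEARNED_SCENES` table, the `estimate_noise` and `generate` functions, the "sunrise" prompt, and the four "pixel" values are all hypothetical stand-ins. A real diffusion model uses a trained neural network to estimate the noise, whereas this sketch reads the answer from a lookup table so that the loop can run on its own.

```python
import random

# Hypothetical stand-in for what a trained model has learned: the "clean"
# pixel values that a prompt should produce. A real diffusion model has no
# such lookup table; it relies on a neural network trained on large datasets.
LEARNED_SCENES = {
    "sunrise": [0.1, 0.3, 0.6, 0.9],
}


def estimate_noise(pixels, prompt):
    """Toy denoiser: guess how far each pixel is from the prompted scene."""
    target = LEARNED_SCENES[prompt]
    return [p - t for p, t in zip(pixels, target)]


def generate(prompt, steps=50, seed=0):
    """Start from random static and strip out estimated noise step by step."""
    rng = random.Random(seed)
    target = LEARNED_SCENES[prompt]
    pixels = [rng.gauss(0.0, 1.0) for _ in target]  # pure static
    errors = []
    for step in range(steps):
        noise = estimate_noise(pixels, prompt)
        # Remove a growing share of the estimated noise; the final step
        # removes whatever remains.
        share = 1.0 / (steps - step)
        pixels = [p - share * n for p, n in zip(pixels, noise)]
        # Track how far the current result is from the prompted scene.
        errors.append(sum(abs(p - t) for p, t in zip(pixels, target)))
    return pixels, errors


if __name__ == "__main__":
    pixels, errors = generate("sunrise")
    print("distance from scene after first step:", round(errors[0], 3))
    print("distance from scene after last step:", round(errors[-1], 3))
```

Run as a script, it prints how far the result is from the prompted scene after the first and last steps, with the distance shrinking as noise is removed. The text prompt acts only as a guide for what the noise should be shaped into, which mirrors the role the article describes for the input text.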

What else does Sora do?

“Sora is capable of generating entire videos all at once or extending generated videos to make them longer. By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily,” OpenAI said.

The new tool is even capable of generating video from a still image.

“In addition to being able to generate a video solely from text instructions, the model is able to take an existing still image and generate a video from it, animating the image’s contents with accuracy and attention to small detail. The model can also take an existing video and extend it or fill in missing frames,” it added.

Other generative video tech

Facebook parent Meta Platforms, Google, and Runway AI have previously introduced their own versions of technology that transforms text into videos.

The new tool has the potential to improve creative workflows but also raises questions about its impact on the careers of creative professionals.

The likely role it may play in disseminating misinformation, particularly during critical election periods, is also a cause for concern.

OpenAI said it will be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for the new technology.


John Melvin Konath is the Managing Editor at Press Insider. John has close to two decades of experience in managing and editing a range of domestic and global publications, notably from Southeast Asia, the Middle East and North America.