This is the best summary I could come up with:
The prompt that generated one of OpenAI's demo videos: "A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors."
How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we're seeing now seemed like a distant fantasy to most people.
In May of last year, Runway Gen-2, previously considered the front-runner in the text-to-video space, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments.
It generates high-resolution (1920x1080) video up to 60 seconds long, maintains temporal consistency (keeping the same subject intact over time), and appears to follow text prompts with a great deal of fidelity.
It achieves temporal consistency by giving the model what OpenAI calls "foresight" of many frames at once, solving the problem of keeping a generated subject the same even if it temporarily falls out of view.
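OpenAI hasn't released Sora's code, but its report describes a transformer operating on "spacetime patches" of video, which hints at why this works: when patches from every frame sit in one attention window, a token from the last frame can directly reference the subject in the first frame, occlusion or not. Here's a minimal PyTorch sketch of that idea, with all sizes and the single attention layer chosen purely for illustration, not taken from Sora:

```python
# Toy illustration (not OpenAI's code): joint attention over
# spacetime patches gives the model "foresight" across frames.
import torch
import torch.nn as nn

frames, patches_per_frame, dim = 16, 64, 128  # assumed toy sizes

# Embed an entire clip as ONE sequence of spacetime patch tokens.
video_tokens = torch.randn(1, frames * patches_per_frame, dim)

# Standard self-attention over the full sequence: a patch in frame 0
# and a patch in frame 15 share the same attention window, so subject
# identity can flow across an occlusion in the middle frames.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
out, weights = attn(video_tokens, video_tokens, video_tokens)

print(out.shape)      # torch.Size([1, 1024, 128])
print(weights.shape)  # torch.Size([1, 1024, 1024]): every patch can
                      # attend to every other patch, in any frame
```

Contrast this with a frame-by-frame generator, which only attends within the current frame and must carry subject identity through some separate memory, which is one plausible reason earlier models let subjects mutate between frames.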
"Sora serves as a foundation for models that can understand and simulate the real world," OpenAI writes, "a capability we believe will be an important milestone for achieving AGI."