Sunday, December 8, 2024

Pyramid Flow open source AI video generator launches




The number of AI video generation models continues to grow with a new one, Pyramid Flow, launching this week and offering high quality video clips up to 10 seconds in length — quickly, and all open source.

Developed by a collaboration of researchers from Peking University, Beijing University of Posts and Telecommunications, and Kuaishou Technology — the latter the creator of the well-reviewed proprietary Kling AI video generator — Pyramid Flow leverages a new technique wherein a single AI model generates video in stages, most of them low resolution, saving only a full-res version for the end of its generation process.

It’s available as raw code for download on Hugging Face and GitHub, and can be tried in an inference demo here, though running it in full requires users to download and run the model code on their own machine.

At inference, the model can generate a 5-second, 384p video in just 56 seconds, on par with or faster than many full-sequence diffusion counterparts, though Runway’s Gen-3 Alpha Turbo still takes the cake in terms of AI video generation speed, coming in at under one minute and often at 10 to 20 seconds in our tests.

We haven’t had a chance to test Pyramid Flow yet, but the videos posted by the model creators appear to be incredibly lifelike, high enough in resolution, and compelling, comparable to those of proprietary offerings. You can see various examples here on its GitHub project page.

Indeed, Pyramid Flow is available now to download and use, even for commercial and enterprise purposes, and is designed to compete directly with paid proprietary offerings such as Runway’s Gen-3 Alpha, Luma’s Dream Machine, Kling, and Hailuo, which can cost hundreds or even thousands of dollars a year for users on unlimited generation subscriptions.

As the race between various AI video providers to gain users continues, Pyramid Flow aims to bring more efficiency and flexibility to developers, artists, and creators seeking advanced video generation capabilities.

A new technique for high-quality AI videos: ‘pyramidal flow matching’

AI video generation is a computationally intensive task that typically involves modeling large spatiotemporal spaces. Traditional methods often require separate models for different stages of the process, which limits flexibility and increases the complexity of training.

Pyramid Flow is built on the concept of pyramidal flow matching, a method that drastically cuts down the computational cost of video generation while maintaining high visual quality, completing the video generation process as a series of “pyramid” stages, with only the final stage operating at full resolution.
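To make the staged idea concrete, here is a minimal sketch of a coarse-to-fine resolution schedule. It is an illustration only, not the authors' implementation: the function name `pyramid_resolutions`, the choice of three stages, the halving between stages, and the 1280x768 frame size are all assumptions made for the example.

```python
def pyramid_resolutions(full_height, full_width, num_stages):
    """Return (height, width) for each stage, coarsest first.

    Assumption for illustration: each earlier stage halves the height and
    width of the stage after it, and only the last stage is full resolution.
    """
    stages = []
    for k in range(num_stages):
        scale = 2 ** (num_stages - 1 - k)  # 4, 2, 1 for three stages
        stages.append((full_height // scale, full_width // scale))
    return stages


# Hypothetical full-resolution frame of 768 x 1280 pixels, three stages.
schedule = pyramid_resolutions(768, 1280, 3)

# Share of full-resolution pixels handled at each stage.
full_pixels = 768 * 1280
pixel_share = [(h * w) / full_pixels for h, w in schedule]

for (h, w), share in zip(schedule, pixel_share):
    print(f"{h}x{w}: {share:.4f} of full-resolution pixels")
```

Under these assumptions, the two early stages work on one sixteenth and one quarter of the pixels of the final frame, which is where the savings from deferring full resolution to the last stage would come from.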

It’s described in a paper that has not yet been peer reviewed, “Pyramidal Flow Matching for Efficient Video Generative Modeling,” submitted to the open-access preprint server arXiv on October 8, 2024.

The authors include Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Most of these researchers are affiliated with Peking University, while others are from Kuaishou Technology.

As they write, the ability to compress and optimize video generation at different stages leads to faster convergence during training, allowing Pyramid Flow to generate more samples per training batch.

For example, the proposed pyramidal flow reduces the token count by a factor of four compared to traditional diffusion models, which results in more efficient training.
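One way to see how a factor of four can arise is simple arithmetic on a token grid: halving both the height and the width of each frame quarters the number of spatial tokens. The sketch below shows only that arithmetic. The helper `token_count` and the grid sizes are hypothetical, and this is not the paper's own accounting of its savings.

```python
def token_count(frames, height, width):
    """Number of tokens if every position in a frames x height x width grid is one token."""
    return frames * height * width


# Hypothetical latent grid sizes, chosen only for illustration.
full_tokens = token_count(frames=16, height=96, width=160)
half_tokens = token_count(frames=16, height=48, width=80)

reduction = full_tokens / half_tokens
print(f"full: {full_tokens}, half resolution: {half_tokens}, reduction: {reduction}x")
```

Because attention cost in transformer-style models typically grows faster than linearly with token count, a fourfold drop in tokens can translate into a larger drop in compute for the stages that run at reduced resolution.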

The model can produce 5- to 10-second videos at 768p resolution and 24 frames per second, all while being trained on open-source datasets. Specifically, the paper states that Pyramid Flow was trained on:

  • LAION-5B, a large dataset for multimodal AI research.
  • CC-12M, a dataset of web-crawled image-text pairs.
  • SA-1B, which features high-quality, non-blurred images.
  • WebVid-10M and OpenVid-1M, which are video datasets widely used for text-to-video generation.

In total, the authors curated approximately 10 million single-shot videos.

However, many of these “public” or “open source” datasets have in recent years come under fire from critics for including copyrighted material without the permission or informed consent of the copyright holders, and LAION-5B in particular has been accused of hosting child sexual abuse material.

Separately, Runway is among the companies being sued by artists in a class action lawsuit for training on materials without permission, compensation, or consent, allegedly in violation of U.S. copyright law. The case is still being argued in court.

Permissively licensed, open source for commercial usage

Pyramid Flow is released under the MIT License, allowing for a wide range of uses, including commercial applications, modifications, and redistribution, provided the copyright notice is preserved.

This makes Pyramid Flow an attractive option for developers and companies looking to integrate the model into proprietary systems, and could challenge Luma AI and Runway as both look to offer paid application programming interfaces for developers seeking to integrate their proprietary AI video generation technology into customer or employee-facing apps.

Yet those proprietary models are already offered as hosted inference services suitable for developers. While Pyramid Flow has a demo on Hugging Face, that demo is not suitable for building full applications on top of, so users would need to host their own inference deployment, which could be costly despite the model itself being “free.”

In addition, Pyramid Flow may prove to be enticing to film studios looking to leverage AI to gain efficiencies, cut costs, and explore new creative tools. One major film studio, Lionsgate, owner of the John Wick and Twilight film franchises among many other titles, recently inked a deal for an unspecified sum with Runway to train a custom AI video generation model. Furthermore, Titanic and Terminator director James Cameron joined the board of AI video and image model provider Stability (which is also subject to the same class-action lawsuit from artists as Runway).

Using Pyramid Flow, Lionsgate or any other film studio could fine-tune the open source version without paying a third party company. However, they would still need to have on hand or contract out the developer talent and computing resources necessary to do so, which may make partnering with established AI providers such as Runway more appealing, since that company and others like it already have the AI engineering talent at their disposal in house.

The research team behind Pyramidal Flow Matching has also made a commitment to openness and accessibility. All code and model weights will be made freely available to the public through their official project page, ensuring that researchers and developers around the world can utilize and build upon this work.

Despite its strengths, Pyramid Flow does have some limitations. For now, it lacks some of the fine-grained controls found in models like Runway Gen-3 Alpha, which offers precise control over cinematic elements like camera angles, keyframes, and human gestures. Similarly, Luma’s Dream Machine provides advanced camera control options that Pyramid Flow is still catching up to.

Moreover, the relatively recent launch of Pyramid Flow means its ecosystem—while robust—isn’t as mature as those of its competitors.

Looking ahead: AI video race shows no signs of slowing

As the AI video generation market continues to evolve, Pyramid Flow’s launch signals a shift toward more accessible, open-source solutions that can compete with proprietary offerings such as Runway and Luma.

For now, it offers a solid alternative for those looking to avoid the cost and limitations of closed models, while providing impressive video quality on par with its more commercial counterparts.

In the coming months, developers and creators will likely keep a close eye on Pyramid Flow’s growth. With the potential for further improvements and optimizations, it could very well become a go-to tool in the arsenal of video content creators everywhere. The companies and researchers involved are currently battling for both technological supremacy and users.

Meanwhile, OpenAI’s Sora, first shown off in February 2024, remains nowhere to be seen — outside of its collaborations with a handful of small early alpha users.


