Thursday, May 1, 2025

Alibaba Cloud open-sources advanced AI models for video generation


Alibaba Cloud, the technology arm of Alibaba Group, has announced the open-sourcing of its advanced AI models for video generation, marking a significant contribution to the global open-source community. This initiative aims to empower researchers, developers, and businesses by providing access to cutting-edge tools for creating high-quality visual content.

The company has released four models from its Wan2.1 series, the latest iteration of its video foundation model Tongyi Wanxiang. These include the 14-billion-parameter T2V-14B text-to-video and I2V-14B image-to-video models, along with the lighter 1.3-billion-parameter T2V-1.3B model and a 480p variant, I2V-14B-480P. Designed to generate videos and images from text or image inputs, these models are now available on Alibaba Cloud’s ModelScope platform and the collaborative AI hub Hugging Face. Within a week of their release, downloads across both platforms exceeded 1 million.


The Wan2.1 series stands out as the first video generation model to support text effects in both Chinese and English. It excels at generating realistic visuals by handling complex movements, improving pixel quality, adhering to physical principles, and executing instructions precisely. These capabilities have earned Wan2.1 the top spot on the VBench leaderboard, a benchmark suite for video generative models, with an overall score of 86.22%. The model leads in key dimensions such as dynamic degree, spatial relationships, color, and multi-object interactions.


To test the model, we used the text prompt: “In a wide-angle, frontal shot, a man dives from the platform in red swim trunks, arms out and legs together. As the camera lowers, he leaps into the water, creating splashes, with the blue pool in the background.”

Lowering barriers to AI innovation

By open-sourcing these models, Alibaba Cloud aims to lower entry barriers for businesses and researchers seeking to leverage AI for video creation. The T2V-14B model is optimized for generating visuals with intricate motion dynamics, while the T2V-1.3B model balances quality with computational efficiency, making it accessible to developers with limited resources. For instance, users can generate a five-second 480p video on a standard laptop in about four minutes.

The I2V models extend functionality by enabling image-to-video generation. Users can input an image alongside a brief text description to create dynamic videos. These models support images of any dimensions, offering flexibility for various use cases.

A legacy of open-source leadership

This release builds on Alibaba Cloud’s history of open-source contributions. In August 2023, the company introduced its Qwen large language model series with Qwen-7B, which has since topped Hugging Face’s Open LLM Leaderboard and inspired over 100,000 derivative models globally. These efforts underscore Alibaba’s commitment to fostering innovation through collaboration.

Driving industry transformation

Training video foundation models demands vast computing resources and high-quality data. By sharing these tools openly, Alibaba Cloud enables more organizations to create tailored visual content cost-effectively. This move also positions Alibaba among global leaders in AI innovation, competing with major players like OpenAI and Stability AI.

As the demand for advanced AI tools grows across industries such as marketing, entertainment, and gaming, Alibaba Cloud’s open-source initiative highlights its dedication to democratizing access to transformative technologies while driving progress in video generation AI.
