This week, all eyes are on DeepSeek, the Chinese AI lab making waves in the tech industry. Alibaba, however, isn't letting its rival steal the spotlight. The company's Qwen team has just unveiled Qwen2.5-VL, a new family of AI models designed to handle a variety of text and image analysis tasks. The release puts Alibaba in direct competition with leading players such as OpenAI and Google.

What Makes Qwen2.5-VL Stand Out?

Qwen2.5-VL covers a wide range of functions, including text parsing, video understanding, image analysis, and even controlling software. Its software-control abilities put it in the same category as the model behind OpenAI's Operator, letting users interact with computers in a more dynamic way. The models can also parse files, extract structured data from invoices and forms, and comprehend long videos.

But the real kicker? According to benchmark results published by the Qwen team, the most powerful model in the family, Qwen2.5-VL-72B, outperforms OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 2.0 Flash in a range of categories, including video understanding, math, document analysis, and question answering.

Advanced Image and Video Analysis Capabilities

One of the most intriguing features of Qwen2.5-VL is its ability to recognize and analyze a wide range of visual content. It can identify logos, products in images, and characters from movies and TV series, which suggests that it may have been trained on copyrighted works. The model can also analyze charts and extract data from scanned documents, capabilities that make it a useful tool for many business and research applications.

Restricted Topics and Censorship

However, like many Chinese-developed AI systems, Qwen2.5-VL is bound by the country's strict regulations on online content. When asked about sensitive topics, such as Xi Jinping's political actions, the model returned an error message. This limitation stems from China's internet censorship rules, which require AI models to align with "core socialist values." As a result, some topics, such as Taiwan's autonomy, are off-limits.

Qwen2.5-VL Can Control Software, But with Limitations

Another notable feature of Qwen2.5-VL is its ability to operate software on both PCs and mobile devices. In a demo, the model opened the Booking.com app on an Android phone and booked a flight from Chongqing to Beijing. When tested on a Linux desktop, however, it struggled to do much beyond switching tabs. This suggests that although it can operate apps, it still has difficulty in more complex environments.

Licensing and Availability

Qwen2.5-VL is available for testing through Alibaba's Qwen Chat app and can also be downloaded from the AI development platform Hugging Face. The flagship model, Qwen2.5-VL-72B, is released under Alibaba's custom license. Under that license, companies and developers with more than 100 million monthly active users must request permission before deploying the model commercially. The smaller versions, Qwen2.5-VL-3B and Qwen2.5-VL-7B, are available under a more permissive license that allows broader use.

Alibaba's new Qwen2.5-VL models show strong capabilities in text, image, and video analysis and challenge global competitors. However, censorship restrictions and the limits of their software control remain key factors to consider.

The Future of AI in China

With the release of Qwen2.5-VL, Alibaba has made it clear that it's ready to compete at the highest level in the AI space. The model's features rival the best in the industry, but the release is also a reminder of the particular challenges AI developers in China face, especially in navigating strict censorship laws. As the AI arms race continues, it will be interesting to see how Alibaba's latest models fare against global competition in an increasingly regulated tech landscape.