Alibaba has released two major generative AI models, Wan2.7-Video and Wan2.7-Image, expanding its multimedia AI suite with capabilities aimed at professional creators and enterprise workflows.
What Is Wan2.7-Video?
Wan2.7-Video is Alibaba's latest video generation release, designed to handle the full creative pipeline from script to finished output. It is not a single model but a suite of four components:
Wan2.7-t2v — Text-to-video generation
Wan2.7-i2v — Image-to-video generation
Wan2.7-r2v — Reference-to-video generation
Wan2.7-videoedit — Video editing
The suite supports video lengths from 2 to 15 seconds and outputs at 720p and 1080p resolution. Enterprise users can access batch processing and custom workflows through dedicated APIs.
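These stated limits can be checked on the client side before a job is submitted. The following is a minimal sketch in Python, not Alibaba's API: the VideoRequest class, its field names, and the validate function are hypothetical. Only the four component names, the 2 to 15 second range, and the two resolutions come from the announcement.

```python
from dataclasses import dataclass

# Values taken from the announcement; everything else here is hypothetical.
COMPONENTS = {"Wan2.7-t2v", "Wan2.7-i2v", "Wan2.7-r2v", "Wan2.7-videoedit"}
RESOLUTIONS = {"720p", "1080p"}
MIN_SECONDS, MAX_SECONDS = 2, 15


@dataclass
class VideoRequest:
    """Hypothetical request shape; not the real API schema."""
    component: str
    prompt: str
    seconds: float
    resolution: str


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if req.component not in COMPONENTS:
        problems.append(f"unknown component: {req.component}")
    if not MIN_SECONDS <= req.seconds <= MAX_SECONDS:
        problems.append(
            f"duration {req.seconds}s outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if req.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {req.resolution}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a batch workflow report every issue with a request in a single pass.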
Key Capabilities of Wan2.7-Video
Natural language editing
Users can modify video elements, including character actions, dialogue, appearance, scene settings, shooting style, and camera movements, using plain-text instructions, without specialized post-production software.
Lip sync and voice preservation
The model can rewrite dialogue while automatically syncing lip movements and preserving the original speaker's vocal characteristics.
Multi-character consistency
Wan2.7-Video can maintain consistent visual identities and custom voice profiles for up to five distinct characters across a video project.
Multimodal inputs
The system accepts audio clips, multi-panel images, and text as control inputs, allowing fine-grained direction over weather, environment, camera composition, and character behavior.
Cinematic shot generation
A single text prompt can produce storyboards that use professional camera techniques, such as FPV drone perspectives and 360-degree orbital shots, along with context-sensitive lighting. The model also includes a video continuation feature that enables smooth scene transitions.
Style and emotional range
The model supports thousands of style combinations and more than 50 distinct emotional expressions for character performance.
What Is Wan2.7-Image?
Released shortly before Wan2.7-Video, Wan2.7-Image is a visual generation model focused on personalization and color accuracy — two areas where AI image generators have historically underperformed.
Deep personalization
Users can fine-tune specific physical traits — such as bone structure and eye shape — for consistent character representation across outputs.
Exact color matching
A dedicated "color palette" feature allows users to input precise color codes, enabling brand-accurate color reproduction in generated images.
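The announcement does not say which color code formats the palette accepts. Assuming six-digit hex codes, which is an assumption and not a documented detail, a brand palette could be normalized to RGB values before it is sent. The hex_to_rgb helper below is a hypothetical illustration, not part of any Wan2.7 SDK.

```python
import re

# Assumption: colors are supplied as six-digit hex codes, with or without "#".
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a hex color code such as '#FF6A00' to an (R, G, B) tuple."""
    match = HEX_COLOR.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a six-digit hex color: {code!r}")
    digits = match.group(1)
    # Each channel is two hex digits: RR, GG, BB.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# Example palette (arbitrary colors chosen for illustration).
palette = [hex_to_rgb(c) for c in ("#FF6A00", "1A1A1A", "#ffffff")]
```

Rejecting malformed codes early matters for brand work, because a silently misread color defeats the purpose of exact matching.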
Advanced text rendering
The model uses a 3,000-token context window to produce print-quality text within images, including academic content, complex formulas, and tables across 12 languages.
High-volume batch generation
A single request can process up to nine reference images and generate 12 distinct outputs, supporting storyboard production and e-commerce workflows.
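For jobs larger than one request allows, the work has to be split. The sketch below assumes that nine reference images and 12 outputs are per-request maximums, which is one reading of the announcement. The plan_requests function and its parameters are hypothetical and do not correspond to a documented API.

```python
# Limits as stated in the announcement, read here as per-request maximums.
MAX_REFERENCE_IMAGES = 9
MAX_OUTPUTS_PER_REQUEST = 12


def plan_requests(reference_images: list[str], total_outputs: int) -> list[int]:
    """Split a desired number of outputs into per-request output counts."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(reference_images)} reference images exceeds "
            f"the limit of {MAX_REFERENCE_IMAGES}"
        )
    if total_outputs < 1:
        raise ValueError("total_outputs must be at least 1")
    full, rest = divmod(total_outputs, MAX_OUTPUTS_PER_REQUEST)
    # Full-size requests first, then one smaller request for any remainder.
    return [MAX_OUTPUTS_PER_REQUEST] * full + ([rest] if rest else [])
```

For example, a 30-frame storyboard built from a single reference image would be planned as three requests producing 12, 12, and 6 images.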
Pixel-level editing
A "click-to-edit" interface enables precise element manipulation, such as adding or repositioning components within generated images.
Wan2.7-Image-Pro
Alibaba simultaneously launched an enhanced version, Wan2.7-Image-Pro, featuring stronger prompt interpretation, more stable composition, and 4K output resolution.
Both models are part of Alibaba's Wan2.7 series. Enterprise API access is available for batch processing workflows. Specific pricing and regional availability have not been detailed in the official release.