Your Characters Just Found Their Voice

May 26, 2026

Genvid now closes one of the most requested gaps in AI-assisted production: synchronized, character-specific lip sync between generated audio and generated video, built directly into the timeline editor. Make your characters speak, with lips that match, in the same editor where the rest of your production lives.

How It Works

Every shot in Genvid can carry individually generated audio tracks for each character. Until now, those audio performances lived alongside the video rather than within it. Lip Sync combines the two: it analyzes each character's audio track and generates a new video with baked-in, phoneme-accurate lip movement for every speaker in the shot.

Once you select your shot in the Timeline Editor and choose your sync model, Genvid identifies how many speakers are present and generates synchronized lip movement for each one independently before merging everything into a single output video. The result is a new video asset, prepended to your shot's clip list, with all audio baked in and ready to select.

Multi-Character Lip Sync, Out of the Box

The feature handles multi-speaker shots natively. A scene with two characters produces a single merged output in which each character's lip movements are synchronized to their own audio track. The platform attributes audio to speakers automatically, so you never have to assign audio to characters by hand.

Part of a Complete Audio Pipeline

Lip Sync arrives alongside Genvid's existing Dialog Audio Timeline Editor, which supports trimming, resetting, and muting individual audio clips directly on the timeline. Together they form a complete audio production workflow: generate voices, edit timing, sync to video, and deliver, all without leaving Genvid.