Beginner · 3 min read

Long-audio lipsync (chop + wrap + Seedance + stitch)

Drop a long VO + a face. Hybrig chops at silences, wraps each segment, runs Seedance one at a time, and stitches the lot back into one continuous video.

What you’ll build

One continuous talking-head video from a long voiceover and a single face reference. The chop, wrap, and stitch steps run locally on your own GPU; only the Seedance lipsync step runs in the cloud, one segment at a time. No subscription. No per-render meter on the local steps. The chain is yours, the file is yours, the render lives on your drive forever.

The chain

4 pipeline nodes, in order:

1. Audio Chopper (Local)
Split a long audio file into ≤8s segments at natural silence boundaries. Local ffmpeg pass.
WHAT: Takes one long voiceover in, returns N short segments out. Each cut lands at the longest silence inside an 8s window, so no segment ends mid-word. If the input has no usable silences (rare, but it happens on rapid-fire reads), the chopper falls back to hard cuts and warns.
WHY: Seedance lipsyncs reliably on clips under 10 seconds; past that, the model drifts off tempo. The chopper is layer 1 of the long-audio lipsync pipeline taught at /learn/seedance-audio-partitioning.
WHERE in the process: between any audio source (F5-TTS, voice ref, bring-your-own file) and the audio-to-video-wrap node. Feed the segmented output into the wrap, then into the Seedance segment runner.
2. Audio → Video Wrap (Local)
Package an audio track inside a video container so Seedance reads cadence from the video timeline. Local ffmpeg pass, no upload.
WHAT: Takes raw audio in, returns a short MP4 with the audio muxed onto a blank or still-image visual track.
WHY: Seedance's lipsync model produces measurably better sync when its audio reference is delivered as a video. Fed raw audio, the model reshapes the tempo and, on songs, hallucinates lyrics. Two creators surfaced this independently; neither found it in fal.ai's docs, because fal.ai doesn't publish it. We bake it into the pipe so users never trip on it.
WHERE in the process: between any audio source (F5-TTS, UGC Audio Fix, voice ref) and a Seedance node. Skip the wrap if the downstream lipsync is local (LatentSync); the trick is Seedance-specific.
AUTO: When a Seedance render is dispatched with raw audio and no manual video ref, the pipeline performs this wrap server-side regardless of whether you dropped the node on the canvas. The node exists to make the wrap visible in graphs you author by hand.
3. Seedance Segment Runner (Cloud)
Run N wrapped segments through Seedance ONE AT A TIME and return the lipsynced clips in order. Never batches.
WHAT: Takes the output of audio-to-video-wrap (N wrapped MP4s) and a face reference, submits each segment to Seedance, waits for completion, then moves to the next. Returns lipsynced clips in original order.
WHY ONE AT A TIME: fal's queue has been known to silently park jobs and bill on retry. One bad batch fan-out can turn a $5 render into a $40 one. Sequential = predictable spend. Local renders can fan out freely; metered cloud goes single file.
WHERE in the process: layer 3 of the long-audio lipsync pipeline. Feed the wrapped-segments output into this node, then feed its lipsynced-segments output into the Remotion assembler.
4. Remotion Assembler (Local)
Stitch N lipsynced segments back into one continuous video with the master audio laid on top. Optional crossfades at the segment boundaries.
WHAT: Takes the Seedance segment runner's output (N lipsynced clips in order) plus the original master audio, returns one continuous MP4. The clips' own audio tracks are muted; the original VO plays over the top, so the output uses the clean source audio instead of Seedance's round-tripped version.
WHY REMOTION HERE: this is composition work: Sequence stacking, crossfades, master-audio overlay. Remotion's React-driven path scales cleanly when you later want text overlays, brand bugs, or per-segment treatments. ffmpeg owns the wrap step upstream; Remotion owns the assembly downstream.
WHERE in the process: layer 4, the final step of the long-audio lipsync pipeline. Output is a finished talking-head video.
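
To make node 1's cut rule concrete, here is a minimal Python sketch of "cut at the longest silence inside an 8s window, fall back to a hard cut". It is not Hybrig's implementation. The function name `plan_cuts`, the silence list format (start and end times in seconds, for example as reported by a silence-detection pass), and the choice to cut at the midpoint of the silence are all assumptions made for illustration.

```python
from typing import List, Tuple


def plan_cuts(duration: float,
              silences: List[Tuple[float, float]],
              max_len: float = 8.0) -> Tuple[List[float], bool]:
    """Return (cut_times, used_hard_cut) for one long audio file.

    Hypothetical helper: `silences` holds (start, end) pairs in seconds.
    """
    cuts: List[float] = []
    used_hard_cut = False
    start = 0.0
    # Keep cutting while the remaining audio is longer than one segment.
    while duration - start > max_len:
        window_end = start + max_len
        # Silences whose midpoint falls inside the current window.
        candidates = [(s, e) for s, e in silences
                      if start < (s + e) / 2 <= window_end]
        if candidates:
            # Pick the longest silence and cut in its middle (assumption).
            s, e = max(candidates, key=lambda iv: iv[1] - iv[0])
            cut = (s + e) / 2
        else:
            # No usable silence: hard cut at the window edge, flag it.
            cut = window_end
            used_hard_cut = True
        cuts.append(cut)
        start = cut
    return cuts, used_hard_cut
```

For a 20-second read with silences at 3.0-3.2s, 6.0-7.0s, and 13.5-14.5s, this plans cuts at 6.5s and 14.0s, giving three segments that each stay within 8 seconds. The caller can surface a warning whenever `used_hard_cut` comes back true.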
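
The wrap in node 2 amounts to muxing audio onto a blank video track. One plausible way to do that with stock ffmpeg is sketched below; the exact command Hybrig runs is not documented here, so the helper name `build_wrap_command`, the frame size, the frame rate, and the codec choices are assumptions. The flags themselves (`-f lavfi`, the `color` source, `-shortest`) are standard ffmpeg options.

```python
from typing import List


def build_wrap_command(audio_path: str, out_path: str,
                       size: str = "1280x720", fps: int = 25) -> List[str]:
    """Build an ffmpeg argument list that wraps audio in a blank MP4.

    Hypothetical helper; size, fps, and codecs are illustrative defaults.
    """
    return [
        "ffmpeg", "-y",
        # Input 0: an endless black frame generated by the lavfi color source.
        "-f", "lavfi", "-i", f"color=c=black:s={size}:r={fps}",
        # Input 1: the audio segment to wrap.
        "-i", audio_path,
        # Stop when the shortest input (the audio) ends.
        "-shortest",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        out_path,
    ]
```

To actually run it, pass the list to `subprocess.run(build_wrap_command("seg_000.wav", "seg_000.mp4"), check=True)` on a machine with ffmpeg on the PATH. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.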
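
Node 3's one-at-a-time rule can be sketched as a plain loop. This is an illustration of the sequencing only: `render` stands in for whatever blocking call submits one wrapped segment to Seedance and waits for the clip, and both it and `run_segments_sequentially` are hypothetical names, not part of any real SDK.

```python
from typing import Callable, List, Sequence


def run_segments_sequentially(segments: Sequence[str],
                              face_ref: str,
                              render: Callable[[str, str], str]) -> List[str]:
    """Render wrapped segments strictly one after another, in order.

    `render(segment, face_ref)` is a hypothetical blocking call that
    returns the path or URL of the lipsynced clip.
    """
    clips: List[str] = []
    for segment in segments:
        # Blocks until this segment is done before submitting the next one.
        # There is deliberately no automatic retry: an exception stops the
        # loop, so no further metered jobs are submitted after a failure.
        clips.append(render(segment, face_ref))
    return clips
```

Because at most one job is in flight, the worst-case spend after a failure is a single segment rather than a whole fanned-out batch, which is the point of the sequential design described above.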
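
For node 4, the clips have to line up with the untouched master audio, so each clip's start frame should come from the cumulative segment time rather than from a running sum of rounded frame counts. The sketch below shows only that arithmetic, without crossfades. It is written in Python for illustration and is not Remotion code; `segment_start_frames` is a hypothetical helper, and the 30 fps default is an assumption.

```python
from typing import List, Sequence, Tuple


def segment_start_frames(durations_s: Sequence[float],
                         fps: int = 30) -> Tuple[List[int], int]:
    """Return (start_frame_per_clip, total_frames) for back-to-back clips.

    Hypothetical helper: durations are the chopped segment lengths in seconds.
    """
    starts: List[int] = []
    elapsed = 0.0
    for duration in durations_s:
        # Round the cumulative time, not each clip, so error never piles up.
        starts.append(round(elapsed * fps))
        elapsed += duration
    return starts, round(elapsed * fps)
```

With segments of 6.5s, 7.5s, and 6.0s at 30 fps, the clips start at frames 0, 195, and 420, and the composition is 600 frames long. In a Remotion composition, values like these would typically feed the `from` and `durationInFrames` props of each `Sequence`.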

Prerequisites

An NVIDIA RTX GPU with 12 GB+ VRAM. 24 GB recommended for the full pipeline. Check system requirements →
Hybrig desktop installed and your worker connected. Get the desktop app →

Try it

The fastest way to learn this chain is to drop it onto the Studio canvas and run it on your own rig. The whole graph is pre-wired.

Open this in Studio →
Browse all lessons

Watch

Video walkthrough coming. For now, run the workflow and watch the pipeline timeline — every stage exposes its intermediate artifact and a plain-language description of what that node did.