Beginner · 3 min read

Long-audio lipsync (chop + wrap + Seedance + stitch)

Drop a long VO + a face. Hybrig chops at silences, wraps each segment, runs Seedance one at a time, and stitches the lot back into one continuous video.

What you’ll build

A four-node chain that turns one long voiceover and a face reference into a single continuous lipsynced video. The chop, wrap, and stitch steps run locally on your own machine, with no subscription and no per-render meter; only the Seedance step runs in the cloud, one segment at a time. The chain is yours, the file is yours, and the render lives on your drive forever.

The chain

4 pipeline nodes, in order:

  1. Audio Chopper (Local)

    Split a long audio file into ≤8s segments at natural silence boundaries. Local ffmpeg pass.

    WHAT: Takes one long voiceover in, returns N short segments out. Each cut lands at the longest silence inside an 8s window so no segment ends mid-word. If the input has no usable silences (rare, but it happens on rapid-fire reads), the chopper falls back to hard cuts and warns.

    WHY: Seedance reliably lipsyncs under 10 seconds; past that the model drifts off tempo. The chopper is layer 1 of the long-audio lipsync pipeline taught at /learn/seedance-audio-partitioning.

    WHERE in the process: between any audio source (F5-TTS, voice ref, BYO) and the audio-to-video-wrap node. Feed the segmented output into the wrap, then into the Seedance segment runner.

  2. Audio → Video Wrap (Local)

    Package an audio track inside a video container so Seedance reads cadence from the video timeline. Local ffmpeg pass, no upload.

    WHAT: Takes raw audio in, returns a short MP4 with the audio muxed onto a blank or still-image visual track.

    WHY: Seedance's lipsync model produces measurably better sync when its audio reference is delivered as a video. Raw audio in causes the model to reshape the tempo and, on songs, hallucinate lyrics. Two creators surfaced this independently; neither found it in fal.ai's docs, because fal.ai doesn't publish it. We bake it into the pipe so users never trip on it.

    WHERE in the process: between any audio source (F5-TTS, UGC Audio Fix, voice ref) and a Seedance node. Skip the wrap if the downstream lipsync is local (LatentSync); the trick is Seedance-specific.

    AUTO: When a Seedance render is dispatched with raw audio and no manual video ref, the pipeline performs this wrap server-side regardless of whether you dropped the node on the canvas. The node exists to make the wrap visible in graphs you author by hand.

  3. Seedance Segment Runner (Cloud)

    Run N wrapped segments through Seedance ONE AT A TIME and return the lipsynced clips in order. Never batches.

    WHAT: Takes the output of audio-to-video-wrap (N wrapped MP4s) and a face reference, submits each segment to Seedance, waits for completion, then moves to the next. Returns lipsynced clips in original order.

    WHY ONE AT A TIME: fal's queue has been known to silently park jobs and bill on retry. One bad batch fan-out can run a $5 render into a $40 one. Sequential = predictable spend. Local renders can fan out freely; metered cloud goes single file.

    WHERE in the process: layer 3 of the long-audio lipsync pipeline. Feed the wrapped-segments output into this node, then feed its lipsynced-segments output into the Remotion assembler.

  4. Remotion Assembler (Local)

    Stitch N lipsynced segments back into one continuous video with the master audio laid on top. Optional crossfades at the segment boundaries.

    WHAT: Takes the Seedance segment runner's output (N lipsynced clips in order) plus the original master audio, returns one continuous MP4. The clips' own audio tracks are muted; the original VO plays over the top so the output uses the clean source audio instead of Seedance's round-tripped version.

    WHY REMOTION HERE: this is composition work: Sequence stacking, crossfades, master-audio overlay. Remotion's React-driven path scales cleanly when you later want text overlays, brand bugs, or per-segment treatments. ffmpeg owns the wrap step upstream; Remotion owns the assembly downstream.

    WHERE in the process: layer 4, the final step of the long-audio lipsync pipeline. Output is a finished talking-head video.
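To make the Audio Chopper's cut rule concrete, here is a minimal sketch of "cut at the longest silence inside an 8s window, hard-cut if there is none". It is not the node's actual code. The function name `plan_cuts` and the `(start, end)` silence format are assumptions for illustration; in practice the silence list might come from parsing the output of ffmpeg's `silencedetect` filter.

```python
from typing import List, Tuple

def plan_cuts(
    duration: float,
    silences: List[Tuple[float, float]],
    max_len: float = 8.0,
) -> List[float]:
    """Return cut times in seconds so that no segment is longer than max_len.

    `silences` is a list of (start, end) silence intervals in seconds.
    (Hypothetical input shape, not the node's real interface.)
    """
    cuts: List[float] = []
    start = 0.0
    # Keep cutting while the remaining audio is still longer than one window.
    while duration - start > max_len:
        window_end = start + max_len
        # Silences that begin after the segment start and end inside the window.
        candidates = [(s, e) for s, e in silences if s > start and e <= window_end]
        if candidates:
            # Cut in the middle of the longest silence in this window.
            s, e = max(candidates, key=lambda iv: iv[1] - iv[0])
            cut = (s + e) / 2.0
        else:
            # No usable silence: fall back to a hard cut at the window edge.
            cut = window_end
        cuts.append(cut)
        start = cut
    return cuts
```

With a 20-second read and silences at 3.0–3.2s, 6.0–7.0s, and 13.5–14.5s, this yields cuts at 6.5s and 14.0s, giving three segments that each stay under 8 seconds.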
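For the Audio → Video Wrap step, the sketch below builds one plausible ffmpeg invocation that muxes a segment's audio onto a still image or a blank black frame. The exact command the node runs is not published here, so the function name `build_wrap_command`, the 512x512 frame size, and the codec choices are all assumptions; the ffmpeg flags themselves are standard.

```python
from typing import List, Optional

def build_wrap_command(
    audio_path: str,
    out_path: str,
    still_path: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg argument list that wraps audio in an MP4.

    Illustrative only: frame size, frame rate, and codecs are assumed values.
    """
    if still_path:
        # Loop a single still image for the whole length of the audio.
        video_input = ["-loop", "1", "-i", still_path]
    else:
        # Generate a blank black visual track with the lavfi color source.
        video_input = ["-f", "lavfi", "-i", "color=c=black:s=512x512:r=25"]
    return [
        "ffmpeg", "-y",
        *video_input,
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",  # stop when the audio ends, since the visual input is endless
        out_path,
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it on a machine with ffmpeg installed. Building the list separately keeps the command easy to inspect and log before anything runs.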
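The "one at a time" rule in the Seedance Segment Runner amounts to a plain blocking loop. The sketch below shows that shape under stated assumptions: `run_sequentially` is a hypothetical name, and `render` stands in for whatever call submits one wrapped segment and waits for the finished clip. No real fal or Seedance API is shown.

```python
from typing import Callable, List

def run_sequentially(
    segments: List[str],
    face_ref: str,
    render: Callable[[str, str], str],
) -> List[str]:
    """Render segments strictly one after another, preserving order.

    `render(segment_path, face_ref)` is a placeholder for a blocking call
    that returns the path of the lipsynced clip (hypothetical interface).
    """
    clips: List[str] = []
    for segment in segments:
        # The next job is only submitted after this one has returned.
        # If render raises, the loop stops and no further jobs are sent.
        clips.append(render(segment, face_ref))
    return clips
```

Because an exception stops the loop, a failed segment never triggers the remaining submissions, which is what keeps spend predictable compared with a batch fan-out.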

Prerequisites

Try it

The fastest way to learn this chain is to drop it onto the Studio canvas and run it on your own rig. The whole graph is pre-wired.

Watch

Video walkthrough coming. For now, run the workflow and watch the pipeline timeline — every stage exposes its intermediate artifact and a plain-language description of what that node did.